13

Models of uncertainty in spatial data


P F FISHER

Spatial information is rife with uncertainty for a number of reasons. The correct
conceptualisation of that uncertainty is fundamental to the correct use of the information.
This chapter attempts to document different types of uncertainty – specifically error,
vagueness, and ambiguity. Examples of these three types are used to illustrate the classes
of problems which arise, and to identify appropriate strategies for coping with them. The
first two categories are well documented and researched within the GIS field, and are now
recognised in many varied contexts. The third has not been so widely researched. Cases are
also identified where uncertainty is deliberately introduced into geographical information in
order to anonymise individuals. Examples are given where both error and vagueness can be
applied to the same phenomenon with different understandings and different results.
Methods to address the problems are identified and are explored at length.

1 INTRODUCTION

‘The universe, they said, depended for its operation on the balance of four forces which they identified as charm, persuasion, uncertainty and bloody-mindedness.’
Terry Pratchett (1986)

acuracy n. An absence of erors. ‘The computer offers both speed and acuracy, but the greatest of these is acuracy’ (sic)
Kelly-Bootle (1995)

The handling of large amounts of information about the natural and built environments, as is necessary in any GIS, is prone to uncertainty in a number of forms. Ignoring that uncertainty can, at best, lead to slightly incorrect predictions or advice and at worst can be completely fatal to the use of the GIS and undermine any trust which might have been put in the work of the system or operator. It is therefore of crucial importance to all users of GIS that awareness of uncertainty and error should be as widespread as possible. Fundamental to such understanding is the nature of the uncertainty, in its different guises. This is the subject of this chapter. A minimal response should be that users of the GIS be aware of the possible complications to their analysis caused by uncertainty, and at best present the user of the analysis with a report of the uncertainty in the final results together with a variety of plausible outcomes. A complete response to uncertainty is to present the results of a full modelling exercise which takes into account all types of uncertainty in the different data themes used in the analysis. It seems that neither response is widespread at present, and in any case the tools for doing the latter are currently the preserve only of researchers.

This chapter explores the developing area of the conceptual understanding (modelling) of different types of uncertainty within spatial information. These are illustrated in Figure 1. At the heart of the issue of uncertainty is the problem of defining both the class of object to be examined (e.g. soils) and the individual object (e.g. soil map unit) – the so-called problem of definition (Taylor 1982). Once the conceptual modelling identifies whether the class of objects to be described is well or poorly defined, the nature of the uncertainty follows:

1 If both the class of object and the individual are well defined then the uncertainty is caused by errors and is probabilistic in nature;
2 If the class of object or the individual is poorly defined then additional types of uncertainty may be recognised. Some have been explored by GIS researchers and others have not:
   a If the uncertainty is attributable to poor definition of class of object or individual object, then definition of a class or set within the universe is a matter of vagueness, and this can conveniently be treated with fuzzy set theory.
   b Uncertainty may also arise owing to ambiguity (the confusion over the definition of sets within the universe) owing, typically, to differing classification systems. This also takes two forms (Klir and Yuan 1995), namely:
      i Where one object or individual is clearly defined but is shown to be a member of two or more different classes under differing schemes or interpretations of the evidence, then discord arises;
      ii Where the process of assigning an object to a class at all is open to interpretation, then the problem is non-specificity.

In the context of spatial databases, only vagueness as expressed by fuzzy set theory and error as represented by probability theory have been researched, and these are the primary focus of the discussion below. The list is necessarily not exhaustive: however, the volume of research and the amount of interest in this area continues to increase. If a chapter had been written in this form for the first edition of this book, it would have focused on only one variety of uncertainty, namely error (Chrisman 1991). A few years later there are two equally important strands to be discussed. Although the strands discussed here seem to explain the majority of the long-recognised causes of uncertainty in spatial information, it is already possible to identify other types of uncertainty that should be addressed in future research.

[Figure 1: tree diagram. Uncertainty divides into well-defined objects, treated with probability, and poorly defined objects, which give rise to vagueness (treated with fuzzy set theory) and ambiguity (discord and non-specificity, both marked ‘?’).]
Fig 1. A conceptual model of uncertainty in spatial data (adapted from Klir and Yuan 1995: 268).

2 THE PROBLEM OF DEFINITION

The principal issue of geographical uncertainty is the understanding of the collector and user of the data as to the nature of that uncertainty. There are three facets to this, namely uncertainty in measurement of attributes, of space, and of time. In order to define the nature of the uncertainty of an object within the dimensions of space and time, a decision must be made as to whether or not it is clearly and meaningfully separable from other objects in whichever dimension is of interest – ideally it will be separable in both. This is a complex intellectual process, one which draws on the history and the critical appraisal of subject-specific scientists. This conceptual model has been complicated and muddied by conventions which influence the perception of geographical information. Foremost among these is the historical necessity of simplification of information for map production; what Fisher (1996) denotes the paradigm of ‘production cartography’. Equally important are the concepts of classification, commonly based on hierarchies, in which objects must fall into one class or another, and of computer database models in which objects are treated as unique individuals and form the basis to analysis.

If a spatial database is to be used, or to be created from scratch, then investigators or users have to ask themselves two apparently simple questions:

1 Is the class of objects to be mapped (e.g. soils, rocks, ownership, etc.) clearly separable from other possible classes?
2 Are the geographical individuals within the class of objects clearly and conceptually separable from other geographical individuals within the same class?


If it is possible to separate unequivocally the phenomenon to be mapped into mappable and spatially distinct objects using the spatial distribution of some individual attribute or collection of attributes, at a given time, then there is no problem of definition. A phenomenon which is well defined should have diagnostic properties for separating individuals into classes based on attributes and into spatially contiguous and homogeneous areas.

If it is not possible to define the spatial extent of an object to be mapped or analysed, there is a problem of definition, and it can be said to be ‘vague’ (Williamson 1994). In this circumstance, while specific properties may be measured and these measurements may be precise, no combination of properties allows the unequivocal allocation of individual objects to a class, or even the definition of the precise spatial extent of the objects. Most spatial phenomena in the natural environment share this problem of definition to some extent. Error analysis on its own does not help with the description of these classes, although any properties which are measured may be subject to errors just as they are in other cases.

2.1 Examples of well-defined geographical objects

In developed countries census geographies tend to be well defined; even in less developed countries the geographical concepts are generally well defined, if less clearly implemented. They usually consist of a set of regions each with precise boundaries within which specific attributes are enumerated (Openshaw 1995). The areas at the lowest level of enumeration (city blocks, enumeration districts, etc.) are grouped with specific instances of other areas at the same level to make up higher level areas, which in turn are grouped with other specific areas to form a complete and rigid hierarchy (e.g. see Martin, Chapter 6). The attributes to be counted within the areas are typically based on property units, individuals, and households: although the definitions of ‘household’ may differ between different surveys (Office of National Statistics 1997) and there is rarely any perfect correspondence between households and property units (e.g. houses in multiple occupation), each definition is nevertheless quite transparent and unambiguous. The data collection process in the western world relies on a certain level of cooperation and literacy amongst those being counted, and while there are frequently legal sanctions for non-cooperation these cannot easily be enforced if people are reluctant to cooperate. The primary errors associated with the US Census of Population arise out of underenumeration of groups such as illegal immigrants and the homeless (Bureau of the Census 1982).

A second example of a well-defined geographical phenomenon in western societies is land ownership. The concept of private ownership of land is fundamental to these societies; therefore the spatial and attribute interpretation of that concept is normally quite straightforward in its spatial expression. The boundary between land parcels is commonly marked on the ground, and marks an abrupt and total change in ownership. In point of fact, at least in the UK, the surveyed boundary is only deemed indicative of the actual position of the boundary, and so any property boundary has a defined uncertainty in position, otherwise it would require resurveying every time the boundary marker is rebuilt (Dale and McLaughlin 1988). Even in instances of collective ownership in which two groups may own two adjoining parcels and one person may belong to both groups, the question of ownership and responsibility remains clear in law.

Well-defined geographical objects are essentially created by human beings to order the world they occupy. They exist in well-organised and established political and legal realms. Some other objects in our built and natural environments may seem to be well defined, but they tend to be based on a single measurement, and close examination frequently shows the definition to be obscure. For example, the land surface seems well defined, and it should be possible to determine its height above sea level rigorously and to specified precision. But even the position of the ground under our feet is being brought into question. This is caused by the increasing availability of elevation models derived from photogrammetry to sub-centimetre precision, at which point the actual definition of the land surface being mapped must come into question, and whether the field was ploughed or the grass was cut becomes a serious issue in defining the so-called land surface. Most, if not all, other geographical phenomena are similarly poorly defined to some extent.

2.2 Examples of poorly-defined geographical objects

In aboriginal societies the concept of ownership is much less clear than in western society. There are many different native cultures, but many have a conception of the land owning the people, and responsibility for nurturing the land is a matter of
common trust within a group (Native North Americans and Australians, for example: Young 1992). Areas of responsibility are less well defined, with certain core areas for which a group or an individual may be responsible (e.g. the sacred sites of the Australian Aborigines: Davis and Prescott 1992), and other regions for which no one is actually responsible but many groups may use (so-called ‘frontier zones’). Among both North American and Australian native groups, the spatial extents of these core and peripheral areas have been shown to be well known to the groups concerned, although they may not be marked, precisely located, or fixed over time (Brody 1981; Davis and Prescott 1992). There are therefore acknowledged divisions of space, but the spatial location of the divider may be uncertain. The extent of the zones of uncertainty can be resource dependent, so that when resources are plentiful there may be relatively precise boundaries, and when scarce there may be very diffuse frontiers (Davis and Prescott 1992; Young 1992). Alternatively, ties of kinship between groups may create less specific frontiers, and lack of kinship hard boundaries (Brody 1981). These aboriginal territories have much in common with the documented ‘behavioural neighbourhoods’ of western individuals. Such neighbourhoods are also poorly defined both spatially and temporally: they may be discontinuous and will inevitably overlap with others, and while possibly unique to an individual or family, may nonetheless make up part of a geographical region that is occupied by a group.

Complexity is also inherent in the mapping of vegetation (Foody 1992). The allocation of a patch of woodland to the class of oak woodland, for example – as opposed to any other candidate woodland type – is not necessarily easy. It may be that in that region a threshold percentage of trees need to be oak for the woodland to be considered ‘oak’, but what happens if there is one per cent less than that threshold? Does it really mean anything to say that the woodland needs to be classed to a different category? Indeed, the higher level classification to woodland at all has the same problems. Mapping the vegetation is also problematic since in areas of natural vegetation there are rarely sharp transitions from one vegetation type to another, rather an intergrade zone or ecotone occurs where the dominant vegetation type is in transition (Moraczewski 1993). The ecotone may occupy large tracts of ground. The attribute and spatial assignments may follow rules, and may use indicator species to assist decisions, but strict deterministic rules may trivialise the classification process without generating any deeper meaning.

In discussion of most natural resource information we typically talk about central concepts and transitions or intergrades. Figure 2 shows a scatter plot of some remotely-sensed (LANDSAT) data from Band 3 and Band 5 (which record the amounts of reflected electromagnetic radiation in the wavelength ranges 0.63–0.69 and 1.55–1.75 µm, respectively). This is part of the information used in the assignment of pixels in an image to land covers. The conceptualisation of the land covers is as Boolean objects (discussed below), and yet it is clear from Figure 2 that there are no natural breaks in the distribution of points in the 2-dimensional space shown. This is typical of satellite imagery. Although LANDSAT actually records information in seven spectral bands which can give identification to some natural groups of pixels, the number of identifiable groups very rarely corresponds with the number of land cover types being mapped (Campbell 1987). The classification process involves the identification of prototypical values for land cover types, and the extension of that mapping from the attribute dimensions shown to the spatial context. Conceptually, the same basic process is executed in almost all traditional mapping operations, and the problem of the identification of objects is fundamental. It is apparent from Figure 2 that the intergrades (all possible locations in attribute space which are between the prototype or central concepts) are more commonly and continuously occupied than the prototypical classes.

Fig 2. A scatterplot of Bands 3 and 5 of a LANDSAT TM image.
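
The contrast between prototypical values and intergrades can be made concrete with a small sketch. It is offered only as an illustration under stated assumptions: the prototype values in `PROTOTYPES` are invented, the function names are hypothetical, and the graded memberships use the inverse-distance weighting of the fuzzy c-means algorithm with a fuzziness exponent `m`, which is one possible choice rather than the procedure behind Figure 2.

```python
import math

# Hypothetical prototype (Band 3, Band 5) values; not taken from the chapter.
PROTOTYPES = {
    "water": (15.0, 12.0),
    "woodland": (25.0, 45.0),
    "bare soil": (60.0, 65.0),
}


def hard_class(pixel):
    """Boolean assignment: the pixel goes wholly to the nearest prototype."""
    return min(PROTOTYPES, key=lambda name: math.dist(pixel, PROTOTYPES[name]))


def graded_memberships(pixel, m=2.0):
    """Memberships in [0, 1] that sum to 1 (fuzzy c-means weighting)."""
    dists = {name: math.dist(pixel, proto) for name, proto in PROTOTYPES.items()}
    # A pixel lying exactly on a prototype belongs fully to that class.
    for name, d in dists.items():
        if d == 0.0:
            return {n: float(n == name) for n in dists}
    power = 2.0 / (m - 1.0)
    weights = {name: d ** -power for name, d in dists.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    intergrade = (42.5, 55.0)  # midway between two of the prototypes
    print(hard_class(intergrade))
    print(graded_memberships(intergrade))
```

For a pixel midway between two prototypes the Boolean rule must still pick a single class, whereas the graded version reports equal membership of both, which is closer to how the intergrade zone is described above.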


The problem of identification may be extended into locations. Figure 3 shows a soil map of part of the Roujan catchment in France with numbers indicating soil map units (soil types) and the shading indicating the extent of boundary intergrades between types. The width of intergrades is based on the knowledge of soil surveyors who prepared the map (Lagacherie et al 1996).

Within natural resource disciplines the conceptualisation of mappable phenomena and the spaces they occupy is rarely clear cut, and is still more rarely achieved without invoking simplifying assumptions (see also Veregin, Chapter 12). In forestry, for example, tree stands are defined as being clearly separable and mappable; yet trees vary within stands by species, density, height, etc., and often the spatial boundary between stands is not well defined (Edwards 1994). Although theorists may recognise the existence of intergrades, the conceptual model of mapping used in this and other natural resource disciplines accepts the simplification and places little importance on them, although the significance has not been assessed. In other areas, such as soil science and vegetation mapping, some of the most interesting areas are at the intergrade, and these are rightly a focus of study in their own right (Burrough 1989; Burrough et al 1992; Lagacherie et al 1996). The interest in intergrades as boundaries is not a preserve of natural resource scientists, however, and in discussion of urban and political geography considerable attention is paid to these concepts (Prescott 1987; Batty and Longley 1994).

Fig 3. Soil map of the Roujan catchment in France showing the extent of soil intergrades (after Lagacherie et al 1996: 281).

3 ERROR

If an object is conceptualised as being definable in both attribute and spatial dimensions, then it has a Boolean occurrence; any location is either part of the object, or it is not. Yet within GIS, for a number of reasons, the assignment of an object or location to the class may be expressed as a probability. There are
any number of reasons why this might be the case. Three are briefly discussed here:

1 probability owing to error in the measurement;
2 probability because of the frequency of occurrence;
3 probability based on expert opinion.

Errors occur within any database, and for any number of reasons; some reasons are given in Table 1. They are given more complete treatment by Fisher (1991b) and Veregin (Chapter 12). The simplest to handle are those associated with measurement, because well-advanced error analysis procedures have been developed (Heuvelink, Chapter 14; Heuvelink et al 1989; Heuvelink and Burrough 1993; Taylor 1982). If the true value of a property of an object were precisely known, then it would be possible to identify the distribution of ‘real world’ measurement error by making repeated measurements of the property (which would each differ from the true value by a variable measurement error). It would then be possible to estimate the distribution of the error in its measurement, and thus to develop a full error model of the measurement error. This is, in fact, the basis of the ‘root mean square’ reporting of error in digital elevation models (see also Beard and Buttenfield, Chapter 15). Yet there are many instances in which such reductionist measures of error are over-simplistic and aspatial, failing to identify the spatial distribution of the error in GIS-based modelling (Monckton 1994; Walsby 1995).

Table 1 Common reasons for a database being in error.

Type of error | Cause of error
Measurement | Measurement of a property is erroneous
Assignment | The object is assigned to the wrong class because of measurement error by field, or laboratory scientist, or by surveyor
Class generalisation | Following observation in the field and for reasons of simplicity, the object is grouped with objects possessing somewhat dissimilar properties
Spatial generalisation | Generalisation of the cartographic representation of the object before digitising, including displacement, simplification, etc. (see Weibel and Dutton, Chapter 10)
Entry | Data are miscoded during (electronic or manual) entry to a GIS
Temporal | The object changes character between the time of data collection and of database usage
Processing | In the course of data transformations an error arises because of rounding or algorithm error

A further means of describing aspatial error is to create a confusion matrix which shows the cover-type actually present at a location crosstabulated against the cover-type identified in the image classification process. Typically the matrix is generated for a complete image. It reports errors in the allocation of pixels to cover types (Campbell 1987; Congalton and Mead 1983). However, the confusion matrix is of limited use if the precise interpretation of either the classification process or the ground information is not clear cut.

A different view of probability is based on the frequency of the occurrence of a phenomenon. The classic applications of probability in this area include weather and flood forecasting. Floods of a particular height are identified as having a particular return period which translates as a particular probability of a flood of that level occurring.

A third view of probability is as a manifestation of subjective opinion, where an expert states a ‘gut feeling’ of the likelihood of an event occurring. Much geological and soil mapping is actually the result of Boolean classification of subjective probability, since it is impracticable to observe directly either of these phenomena across the entire countryside: rather inference is made using sampled points such as outcrops and auger borings. Between those locations it is expert opinion as to what is there; so long as a Boolean model of soil and rock occurrence is applied, the map is implicitly a matter of the expert’s maximum probability (Clarke and Beckett 1971).

Probability has been studied in mathematics and statistics for hundreds of years. It is well understood, and the essential methods are well documented. There are many more approaches to probability than the three described here. Probability is a subject that is on the syllabus of almost every scientist qualified at degree level, and so it pervades the understanding of uncertainty through many disciplines. It is not, however, the only way to treat uncertainty.

4 VAGUENESS

In contrast with error and probability which are steeped in the mathematical and statistical literature, vagueness is the realm of philosophy and logic and has been described as one of the fundamental challenges to those disciplines (Williamson 1994; Sainsbury 1995). It is relatively easy to show that a concept is ‘vague’, and the classic pedagogic
exposition uses the case of the ‘bald’ man. If a person with no hair at all is considered bald, then is a person with one hair bald? Usually, in any working definition of ‘bald’, the answer to this would be ‘yes’. If a person with one hair is bald, then is a person with two hairs bald: again, ‘yes’. If you continue the argument, one hair at a time, then the addition of a single hair never turns a bald man into a man with a full head of hair. On the other hand, you would be very uncomfortable admitting that someone with plenty of hair was bald, since this is illogical (Burrough 1992; Burrough 1996; Zadeh 1965). This is known as the Sorites Paradox which, little by little, presents the logical argument that someone with plenty of hair is bald! A number of resolutions to the paradox have been suggested, but the most widely accepted is that the logic employed permits only a Boolean response (‘yes’ or ‘no’) to the question. A graded response is not acceptable. And yet there is a degree to which a person can be bald. It is also possible that the initial question is false, because ‘bald’ would normally be qualified if we were examining it in detail, so we might ask whether someone was ‘completely bald’, and we might define that as someone with no hair at all. Can we ever be certain that individuals have absolutely no hair on their heads? Furthermore, where on their neck and face is the limit of the head such that we can judge whether there is any hair on it? You are eventually forced to admit that by incremental logical argument, it is impossible to specify whether someone is ‘completely’, ‘absolutely’, ‘partially’, or ‘not at all’ bald, given a count of hairs on their head, even if the count is absolutely correct. So no matter the precision of the measurement, the allocation to the set of bald people is inherently vague.

The Sorites Paradox is one way which is commonly used to define vague concepts. If a concept is ‘Sorites susceptible’, it is vague. Many geographical phenomena are ‘Sorites susceptible’, including concepts and objects from the natural and built environments (e.g. see Band, Chapter 37). When, exactly, is a house a house; a settlement, a settlement; a city, a city; a podsol, a podsol; an oak woodland, an oak woodland? The questions always revolve around the threshold value of some measurable parameter or the opinion of some individual, expert or otherwise.

Fuzzy set theory was introduced by Zadeh (1965) as an alternative to Cantor (Boolean) sets, and built on the earlier work of Kaplan and Schott (1951). Membership of an object to a Cantor set is absolute, that is it either belongs or it does not, and membership is defined by one of the two integer values {0,1}. By contrast, membership of a fuzzy set is defined by a real number in the range [0,1] (the change in the type of brackets distinguishes the pair of integers from the continuous range of real numbers). Definite membership or non-membership of the set is identified by the terminal values, while all intervening values define an intermediate degree of belonging to the set, so that, for example, a membership of 0.25 reflects a smaller degree of belonging to the set than a membership of 0.5. The object described is less like the central concept of the set.

Fuzzy memberships are commonly identified by one of two methods (Robinson 1988):

1 the Similarity Relation Model is data driven and involves searching for patterns within a dataset in a manner similar to traditional clustering and classification methods, the most widespread method being the fuzzy c-means algorithm (Bezdek 1981). More recently, fuzzy neural networks have been employed (Foody 1996);
2 the Semantic Import Model, in contrast, is derived from a formula or formulae specified by the user or another expert (Altman 1994; Burrough 1989; Wang et al 1990).

Many studies have applied fuzzy set theory to geographical information processing. There are several good introductions to the application of fuzzy sets in geographical data processing, including books by Leung (1988) and Burrough and Frank (1996) – see also Eastman (Chapter 35).

Fuzzy set theory is now only one of an increasing number of soft set theories (Pawlak 1982), in contrast to hard, Cantor sets. However, a number of authorities consider that fuzzy set theory is mistakenly used for problems which more correctly fall within the realm of subjective probability (Laviolette and Seaman 1994). They have, however, primarily addressed fuzzy logic rather than fuzzy sets, and illustrated their arguments with Boolean conditions and decisions. As such, they have failed to address the nature of the underlying set and any inherent vagueness which may be present, as Zadeh (1980) has shown. Moreover, Kosko (1990) has argued that fuzzy sets are a superset of probability.

5 AMBIGUITY

The concepts and consequences of ambiguity (Figure 1) in geographical information are not well
researched. Ambiguity occurs when there is doubt as to how a phenomenon should be classified because of differing perceptions of it. Two types of ambiguity have been recognised, namely discord and non-specificity. In other areas of study some partial solutions have been suggested, but they are not reviewed here because of the lack of specific research with geographical information.

Within geography the most obvious form of discord through ambiguity is in the conflicting territorial claims of nation states over particular pieces of land. History is filled with this type of ambiguity, and the discord which results. Examples in the modern world include intermittent and ongoing border conflicts and disagreements in Kashmir (between India and Pakistan) and the neighbouring Himalayan mountains (between China and India). Similarly, the existence or non-existence of a nation of Kurds is another source of discord. All represent mismatches between the political geography of the nation states and the aspirations of people (Horn, Chapter 67; Prescott 1987; Rumley and Minghi 1991).

As has already been noted, many if not most phenomena in the natural environment are also ill-defined. The inherent complexity in defining soil, for example, is revealed by the fact that many countries have slightly different definitions of what a soil actually constitutes (cf. Avery 1980; Soil Survey Staff 1975), and by the complexity and the volume of literature on attempting to define the spatial and attribute boundaries between soil types (Webster and Oliver 1990; Lagacherie et al 1996). Furthermore, no two national classification schemes have either the same names for soils or the same definitions if they happen to share names. This causes many soil profiles to be assigned to different classes in different schemes, as shown in Table 2 (see also Isbell 1996; Soil Classification Working Group 1991; Soil Survey Staff 1975). Within a single country this is not a problem, yet ambiguity arises in the international efforts to produce supra-national or global soil maps. The individual national classifications cause considerable confusion in the process and the classification scheme becomes part of the national identity within the context. There is also rarely a one-to-one correspondence between classification systems (soil type x in this classification corresponds to soil type a in that), but rather a many-to-many correspondence (soil types a and b correspond broadly to soil type x, but some profiles of soil type a are also soil types y and z). This leads to different placement of soil boundaries in both attribute and spatial dimensions, and generates considerable problems in mapping soils across international and interstate boundaries (FAO/UNESCO 1990; Campbell et al 1989), as has been exemplified in the creation of the Soil Map of the European Communities (Tavernier and Louis 1984).

Several measures of social deprivation have been suggested which are based upon information from the UK Census of Population (Table 3). Enumeration areas are assigned to one class or another, and the classes have been used in the allocation of resources for a range of social and economic programmes. The fact that there are different bases to the measurement of deprivation means that enumeration areas may be afforded special policy status using one indicator, but not using another, and this is a source of potential discord.

Ambiguity through non-specificity can be illustrated from geographical relations. The relation ‘A is north of B’ is itself non-specific, because the concept ‘north of’ can have at least three specific meanings: that A lies on exactly the same line of longitude and towards the north pole from B; that A lies somewhere to the north of a line running east to west through B; or, in common use, that A lies in the sector between perhaps north-east and north-west, but is most likely to lie between north-north-east and north-north-west of B. The first two definitions are precise and specific, but equally valid. The third is the natural language concept which is itself vague. Any lack of definition as to which should be used means that uncertainty arises in interpreting ‘north of’.

Arguably, soil classification is a process whereby modern schemes have removed the problem of non-specificity which was inherent in earlier schemes and replaced it by supposedly objective, globally applicable diagnostic criteria. The remaining problems arise out of creating Boolean boundaries in a vague classification environment and the problem of discord.

None of this should be taken to imply that ambiguity is inappropriate or intrinsically ‘wrong’. The England and Wales soil classification scheme at the scale of England and Wales is possibly the most relevant classification scheme for the soils in that country. Similarly, the United States Department of Agriculture scheme (Soil Taxonomy) was the best scheme for the US when it was finalised in 1975 (although it does claim a global application). The problem of ambiguity arises when we move to a higher level, and data from the British Soil Survey

have to be fused with data from neighbouring countries or countries further afield. In preparing the Soil Map of the European Community, for example, the FAO/UNESCO classification was employed with some amendments.

Table 2 Alternative soil classification schemes for global and national use.

US Classification (Soil Survey Staff 1975): Entisol, Inceptisol, Spodosol, Mollisol, Oxisol, Ultisol, Alfisol, Aridisol, Histosol, Vertisol.
Australian Classification (Isbell 1996): Anthroposol, Organosol, Podsol, Hydrosol, Kurosol, Sodosol, Chromosol, Calcarosol, Ferrosol, Dermosol, Kandosol, Rudosol, Tenosol.
Soil Map of the World (FAO/UNESCO 1990): Fluvisol, Gleysol, Regosol, Lithosol, Arenosol, Rendzina, Ranker, Andosol, Vertisol, Solonchak, Solonetz, Yermosol, Xerosol, Kastanozem, Chernozem, Phaeozem, Greyzem, Cambisol, Luvisol, Podzoluvisol, Podzol, Planosol, Acrisol, Nitosol, Ferrasol, Histosol.
British Soil Classification (Avery 1980): Terrestrial raw soil, Hydric raw soil, Lithomorphic soil, Pelosol, Brown soil, Podzolic soil, Ground-water gley soil, Surface-water gley soil, Man-made soil, Peat soil.

In a like manner, there is nothing wrong with there being three different methods of defining deprived regions in Britain. Deprivation is a social construct and any quantitative index can only be an approximation which is deemed relevant and acceptable within its own terms of reference. If the constituent attributes of a particular index happen not to be measured in another country, that index simply ceases to have international application. (In fact, with regard to the use of the Jarman Index within the UK, housing indicators of deprivation replace ethnic indicators in Wales.) Ambiguity nevertheless does come into play in the allocation of social and economic programme resources, and can lead to contention between local, national, and (in the case of EU programmes) supra-national politicians over the basis for financial support.

Table 3 Measures of social deprivation used in the UK, with the associated census variables used in their calculation (Openshaw 1995).

Variable    Jarman    Townsend    Department of the Environment
Unemployment    X X X
No car    X
Unskilled    X
Overcrowding (more than 1 person per room)    X X X
Lacking amenities    X
Not owner occupied    X
Single-parent household    X X
Children under 5 years old    X
Lone pensioners    X
Ethnic minorities    X

6 CONTROLLED UNCERTAINTY

Many agencies distribute and allow access to spatial information which is degraded deliberately through creating uncertainty. Two examples of this are discussed (see also Heuvelink, Chapter 14, and Hunter, Chapter 45, for a discussion of the management of uncertainty).

If the exact locations of rare or precious objects, such as nesting sites of endangered birds or archaeological sites, are recorded in a dataset, any more widely distributed versions may have a systematic or random error introduced into the locational component. This may be done by only reporting information for large areal aggregations (e.g. 4 km² in the county flora of Leicestershire, England: Primavesi and Evans 1988; and a 100 km² grid in the state flora of Victoria, Australia, distributed on CD-ROM: Viridians 1996). In some cases both systematic and random elements are introduced in order to protect the phenomenon reported, and although the error may be inconvenient, the consequences of not introducing it may be worse.
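As a minimal sketch of the two kinds of locational degradation just described, the following Python fragment snaps a coordinate to the centre of its grid square (a systematic element) and then optionally adds a bounded random offset (a random element). The function names, the metre-based coordinates, the example site, and the 2 km cell (a 4 km² square) are illustrative assumptions; this is not the procedure of any of the agencies cited.

```python
import math
import random


def aggregate_to_cell(x, y, cell_size):
    """Systematic degradation: report the centre of the grid square containing (x, y)."""
    col = math.floor(x / cell_size)
    row = math.floor(y / cell_size)
    return (col * cell_size + cell_size / 2.0,
            row * cell_size + cell_size / 2.0)


def add_random_offset(x, y, max_offset, rng):
    """Random degradation: shift each coordinate by up to max_offset either way."""
    return (x + rng.uniform(-max_offset, max_offset),
            y + rng.uniform(-max_offset, max_offset))


# Hypothetical record: a nesting site given as easting/northing in metres.
site = (453210.0, 301875.0)

# 2 km x 2 km squares, i.e. 4 km2 areal aggregations (assumed cell size).
cell = aggregate_to_cell(site[0], site[1], 2000.0)

# Optionally combine with a random element, seeded here only for repeatability.
blurred = add_random_offset(cell[0], cell[1], 500.0, random.Random(42))

print(cell)
print(blurred)
```

Either step alone hides the true position; combining them means that a user cannot recover even the grid square with certainty, at the cost of a less convenient dataset.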


Uncertainty is also deliberately introduced into census data in order to preserve confidentiality. If only a few people living within any one enumeration area have a particular characteristic (for example, high income) and incomes are reported, it may be very easy to identify exactly which person that is. This is not socially acceptable, and so most census organisations withhold or falsify small counts. For example, in the USA, data for areas with small counts are withheld (Bureau of the Census 1982), whereas in the UK small counts have had a random value between +1 and -1 added (Dewdney 1983).

7 DISTINGUISHING BETWEEN VAGUENESS AND ERROR

Appropriate conceptualisation of uncertainty is a prerequisite to its modelling within GIS. In this section two areas of previous study are examined, and the reasons for the use of either vague or error models of uncertainty are discussed.

7.1 Viewshed

The viewshed is a simple operation within many current GIS, which, in its usual implementation, reports those areas in a landscape which are in view and those which are not (coded 1 and 0 respectively), whether in a triangulated or gridded dataset (De Floriani and Magillo, Chapter 38; De Floriani et al 1986; Fisher 1993). Fisher (1991a) has shown how, for a variety of reasons, the visible area is very susceptible to error in the measurement of elevations in the Digital Elevation Model (DEM). (While Fisher used a rectangular grid in his 1991 study, the same would be true for a triangulated model.) Error in the elevation database propagates into the binary viewshed (Fisher 1991a), and variation between different algorithms adds further uncertainty to the determination of visibility (Fisher 1993).

Fisher (1992, 1993, 1994) has proposed that it is possible to define the error term from the Root Mean Squared Error (RMSE) for the DEM such that the error has a zero mean and standard deviation equal to the RMSE. This is not in fact true and provides insufficient description of the error for a fully justifiable error model, since the mean error may be biased (non-zero) and the error must have spatial structure.

Spatial structure of the error may be identified through spatial autocorrelation measures or full specification of the variogram of the error field (Journel 1996). If the error field is generated using this method then it can be added to the DEM, yielding a revised DEM which includes the known error. If the viewshed is determined over that DEM with error, then a version of the Boolean viewshed is generated. If the process is repeated, then a second version of the Boolean viewshed, a third, a fourth, and so on are generated. If each Boolean viewshed image is coded 0 and 1 for areas which are out-of-sight and in-sight respectively, then summing the Boolean viewsheds with map algebra gives every location a value between 0 and the number of realisations, according to the number of realisations in which that location is visible. Dividing by the number of realisations will then give an estimate of the probability of that location actually being within the viewshed. The probability of any pixel being visible from the viewing point, that is, the probability that the land does not rise above the line of sight anywhere between the viewer and the viewed, is given by:

$$p(x_{ij}) = \frac{\sum_{k=1}^{n} x_{ijk}}{n} \qquad (1)$$

where
$p(x_{ij})$ is the probability of a cell at row $i$ and column $j$ in the raster image being visible; and
$x_{ijk}$ is the value at the cell of the binary-coded viewshed in realisation $k$ such that $k$ takes values 1 to $n$.

This is illustrated in Plate 9.

In contrast, using a Semantic Import Model it is possible to define a number of different fuzzy viewsheds (Fisher 1994; note that the term is used incorrectly by Fisher 1992) from a family of equations relating the distance from the viewer to the viewed to the fuzzy membership function (Plate 10). Any number of different circumstances can be described, and two are included here: Equation 2 represents normal atmospheric conditions, and Equation 3 describes the visibility through fog.

$$\mu(x_{ij}) = \begin{cases} 1 & \text{for } d_{vp \to ij} \le b_1 \\ \dfrac{1}{1 + \left( \dfrac{d_{vp \to ij} - b_1}{b_2} \right)^{2}} & \text{for } d_{vp \to ij} > b_1 \end{cases} \qquad (2)$$


Fig 4. Probable viewshed based on Equation 1.
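The realisation-summing procedure behind Equation 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the line-of-sight test is a deliberately simple nearest-cell sampler, the function names and the observer height of 1.7 m are hypothetical, and the error field is drawn as uncorrelated Gaussian noise with zero mean and standard deviation equal to the RMSE. As the text itself notes, that error model is an insufficient description because real DEM error may be biased and has spatial structure, so a fuller treatment would draw each realisation from a spatially autocorrelated field instead.

```python
import random


def line_of_sight(dem, viewer, target, viewer_height=1.7):
    """Return True if no sampled cell rises above the sight line to the target."""
    (r0, c0), (r1, c1) = viewer, target
    steps = max(abs(r1 - r0), abs(c1 - c0))
    if steps == 0:
        return True
    z0 = dem[r0][c0] + viewer_height      # eye level at the viewpoint
    z1 = dem[r1][c1]                      # ground level at the target
    for s in range(1, steps):
        t = s / steps
        r = round(r0 + t * (r1 - r0))     # nearest cell along the ray
        c = round(c0 + t * (c1 - c0))
        if dem[r][c] > z0 + t * (z1 - z0):
            return False
    return True


def boolean_viewshed(dem, viewer):
    """Binary viewshed: 1 for in-sight cells, 0 for out-of-sight cells."""
    return [[1 if line_of_sight(dem, viewer, (r, c)) else 0
             for c in range(len(dem[0]))]
            for r in range(len(dem))]


def probable_viewshed(dem, viewer, rmse, n, seed=0):
    """Equation 1: sum n Boolean viewsheds over noisy DEMs and divide by n."""
    rng = random.Random(seed)
    rows, cols = len(dem), len(dem[0])
    total = [[0] * cols for _ in range(rows)]
    for _ in range(n):
        # Assumed error model: independent N(0, rmse) noise in every cell.
        noisy = [[z + rng.gauss(0.0, rmse) for z in row] for row in dem]
        realisation = boolean_viewshed(noisy, viewer)
        for r in range(rows):
            for c in range(cols):
                total[r][c] += realisation[r][c]
    return [[count / n for count in row] for row in total]


# Hypothetical one-row DEM with a 10 m ridge in the middle cell.
dem = [[0.0, 0.0, 10.0, 0.0, 0.0]]
print(probable_viewshed(dem, viewer=(0, 0), rmse=2.0, n=200))
```

Each cell of the result is the proportion of realisations in which it was visible, which is the estimate of p(xij) in Equation 1.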

Fig 5. Fuzzy viewshed based on Equation 2.
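Equation 2 can be evaluated directly once the two parameters are chosen. The sketch below assumes that b1 is the radius of perfect clarity around the viewpoint and that b2 is the further distance at which membership falls to 0.5, as the chapter defines them. The function names, the cell size, and the choice to mask the result with a Boolean viewshed (so that out-of-sight cells receive a membership of 0) are illustrative assumptions rather than a statement of how any particular GIS implements the operation.

```python
import math


def fuzzy_membership(d, b1, b2):
    """Equation 2: membership is 1 within b1, then decays with distance beyond it."""
    if d <= b1:
        return 1.0
    return 1.0 / (1.0 + ((d - b1) / b2) ** 2)


def fuzzy_viewshed(visible, viewer, cell_size, b1, b2):
    """Apply Equation 2 to every cell of a 0/1 viewshed grid (assumed masking)."""
    r0, c0 = viewer
    result = []
    for r, row in enumerate(visible):
        out_row = []
        for c, flag in enumerate(row):
            d = math.hypot(r - r0, c - c0) * cell_size   # distance in ground units
            out_row.append(flag * fuzzy_membership(d, b1, b2))
        result.append(out_row)
    return result


# Hypothetical parameters: perfect clarity to 1 km, cross-over 2 km beyond that.
for distance in (500.0, 3000.0, 5000.0):
    print(distance, fuzzy_membership(distance, b1=1000.0, b2=2000.0))
```

Different atmospheric or perceptual circumstances are then represented simply by different values of b1 and b2, or by a different membership equation.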


$$\mu(x_{ij}) = \begin{cases} 1 & \text{for } d_{vp \to ij} \le b_1 \\ 0 & \text{for } d_{vp \to ij} > b_1 + 2 \cdot b_2 \\ \sin\left( \left( \dfrac{d_{vp \to ij} - b_1}{2 \cdot b_2} \right) \dfrac{\pi}{2} \right) & \text{for } d_{vp \to ij} > b_1 \end{cases} \qquad (3)$$

where
$\mu(x_{ij})$ is the fuzzy membership at the cell at row $i$, column $j$;
$d_{vp \to ij}$ is the distance from the viewpoint to row $i$, column $j$;
$b_1$ is the radius of the zone around the viewpoint where the clarity is perfect, and the target object can be seen at the defined level of detail;
$b_2$ is the distance from $b_1$ to fuzzy membership of 0.5, sometimes called the cross-over point.

The distinction between probable and fuzzy viewsheds is that the first describes the probability of a location being visible, while the second portrays the degree to which objects can be distinguished. Thus there is an objective definition of the first, and only subjective versions of the second which may describe group or even personal circumstances.

7.2 Remote sensing

Classification of remotely-sensed data has been a major source of land cover and land-use information for GIS. The basic methods, based on a number of discriminant functions from numerical taxonomy, are well known and widely documented (Campbell 1987). The assumptions implicit in this approach are threefold:
1 the cover type itself is a well-defined phenomenon with clear breaks reflected by there being more similarity within cover types than between them;
2 the digital numbers recorded in the original satellite image allow the discrimination of land cover/use types, mapping on a one-to-one basis between reflectance and cover type;
3 the area of the pixel on the ground can be identified as having a single cover type (that area can be assigned to one and only one land-cover or land-use).

From these assumptions it is possible to allow the conceptualisation of both the spatial extent of the pixel and the land cover attributes to be determined as Boolean concepts. Therefore uncertainties can be described by probability, and functional methods such as the maximum likelihood classifier are applicable. Unfortunately, all the assumptions are made for the convenience of the operator, and none matches the actual situation either pragmatically or theoretically.

It is a fact of life that the spatial extent of geographical objects is not coincident with the image pixel, that the class types on the ground are often hard to define precisely (many are Sorites susceptible), and that the digital numbers do not show greater similarity within cover type than between. Therefore, arguably, fuzzy set theory (as an expression of concepts of vagueness) is a more appropriate model for working with satellite imagery and has been the subject of a number of explorations (Foody 1992, 1996; Fisher and Pathirana 1991; Goodchild et al 1994). Both Foody (1992, 1996) and Fisher and Pathirana (1991) have shown that the fuzzy memberships extracted from digital images can be related to the proportion of the cover types within pixels. This can be seen as only a step towards a full interpretation of the fuzzy memberships derived from the imagery, since in the work reported the land covers analysed are still well-defined Boolean concepts; the vagueness is introduced by the sensor characteristics (Fisher and Pathirana 1991; Foody 1996). On the other hand, Foody (1992) uses the fuzzy sets to examine a zone of intergrade between vegetation communities, where both the communities and the intergrade are vague concepts.

The confusion between land cover and land use is also problematic (see also Barnsley, Chapter 32). Land use has a socioeconomic dimension to it, which cannot be sensed from satellites. Land cover, on the other hand, pertains to directly observable physical properties of the Earth’s surface, and so can be classified directly. Indeed, one reason for the poor results of classification accuracy is the confusion in the conceptualisation of this transformation, and the opacity of the relationship between the surface reflectance of land covers and land use. The most successful attempts at land-use mapping from satellite imagery have adopted rule-based (Wang et al 1991) or graph theoretic approaches to the problem (Barnsley, Chapter 32; Barr and Barnsley 1995), and a combination of fuzzy set theory with these other methods may well further improve the results.

Within remote sensing, it can therefore be seen that the conceptualisation of the problem is the controlling influence. If the assumptions as to the spatial and attribute discrimination of land cover
within a pixel noted above are accepted, then there is a Boolean mapping between land cover and digital number which can be extracted by classification, and uncertainty can be expressed probabilistically. If they cannot be accepted, then the Sorites susceptibility of the subjects of mapping indicates their vagueness, and so fuzzy set theory is a more appropriate approach to analysis. A clear conceptualisation of the nature of the phenomenon to be mapped and the approach to be taken is essential to the successful analysis of satellite imagery.

8 CONCLUSION: UNCERTAINTY IN PRACTICE

Through citing a number of different examples, this chapter has argued that within geographical information there are a number of different causes of and approaches to uncertainty. Anyone using uncertain information (i.e. the overwhelming majority of GIS users) needs to think carefully about the possible sources of uncertainty, and how they may be addressed. Uncertainty is a recurrent theme throughout many of the chapters of this book (e.g. Hunter, Chapter 45; Martin, Chapter 6; Raper, Chapter 5); the particular contribution of this chapter is to relate our conceptualisation of the nature of uncertainty to GIS-based data models. Analysis without accommodating data uncertainty (both error and vagueness) can quite severely limit its usefulness. Yet an appropriate conceptualisation of uncertainty and the application of related analytical methods creates a rich analytical environment where decision making based on spatial information is facilitated not only by objective orderings of alternatives but also by giving confidence in those alternatives. New analytical products are beginning to appear as a result of processing, and not ignoring, uncertainty (Burrough 1989; Burrough et al 1992; Davidson et al 1994; Wang et al 1990).

It is crucial to the correct use of geographical information systems that all aspects of uncertainty should be accommodated. This can only be achieved through awareness of the issues and a thorough and correct conceptualisation of uncertainty. The subject of uncertainty in spatial information has developed rapidly, and is still changing, particularly with the increasing use and exploration of alternative, soft set theories (Pawlak 1982).

References

Altman D 1994 Fuzzy set theoretic approaches for handling imprecision in spatial analysis. International Journal of Geographical Information Systems 8: 271–89
Avery B W 1980 Soil classification for England and Wales (higher categories). Harpenden, Soil Survey Technical Monograph 14
Barr S, Barnsley M 1995 A spatial modelling system to process, analyse, and interpret multi-class thematic maps derived from satellite sensor images. In Fisher P F (ed.) Innovations in GIS 2. London, Taylor and Francis: 53–65
Batty M, Longley P 1994 Fractal cities: a geometry of form and function. London/San Diego, Academic Press
Bezdek J C 1981 Pattern recognition with fuzzy objective function algorithms. New York, Plenum Press
Brody H 1981 Maps and dreams; Indians and the British Columbia frontier. Harmondsworth, Penguin
Bureau of the Census 1982 Census of Population and Housing. Washington DC, US Department of Commerce
Burrough P A 1989 Fuzzy mathematical methods for soil survey and land evaluation. Journal of Soil Science 40: 477–92
Burrough P A 1992a Are GIS data structures too simple minded? Computers & Geosciences 18: 395–400
Burrough P A 1996 Natural objects with indeterminate boundaries. In Burrough P A, Frank A U (eds) Geographic objects with indeterminate boundaries. London, Taylor and Francis: 3–28
Burrough P A, Frank A U (eds) 1996 Geographic objects with indeterminate boundaries. London, Taylor and Francis
Burrough P A, MacMillan R A, Deursen W van 1992 Fuzzy classification methods for determining land suitability from soil profile observations and topography. Journal of Soil Science 43: 193–210
Campbell J B 1987 Introduction to remote sensing. New York, Guilford Press
Campbell W G, Church M R, Bishop G D, Mortenson D C, Pierson S M 1989 The role for a geographical information system in a large environmental project. International Journal of Geographical Information Systems 3: 349–62
Chrisman N R 1991b The error component in spatial data. In Maguire D J, Goodchild M F, Rhind D W (eds) Geographical information systems: principles and applications. Harlow, Longman/New York, John Wiley & Sons Inc. Vol. 1: 165–74
Clarke G P, Beckett P 1971 The study of soils in the field, 5th edition. Oxford, Clarendon Press
Congalton R G, Mead R A 1983 A quantitative method to test for consistency and correctness in photointerpretation. Photogrammetric Engineering and Remote Sensing 49: 69–74
Dale P F, McLaughlin J D 1988 Land information management. Oxford, Oxford University Press


Davidson D A, Theocharopoulos S P, Bloksma R J 1994 A land evaluation project in Greece using GIS and based on Boolean and fuzzy set methodologies. International Journal of Geographical Information Systems 8: 369–84
Davis S L, Prescott J R V 1992 Aboriginal frontiers and boundaries in Australia. Melbourne, Melbourne University Press
De Floriani L, Falcidieno B, Pienovi C, Allen D, Nagy G 1986 A visibility-based model for terrain features. Proceedings, Second International Symposium on Spatial Data Handling. Columbus, International Geographical Union: 235–50
Dewdney J G 1983 Census past and present. In Rhind D W (ed.) A census user’s handbook. London, Methuen: 1–15
Edwards G 1994 Characterising and maintaining polygons with fuzzy boundaries in geographic information systems. In Waugh T C, Healey R G (eds) Advances in GIS research: Proceedings Sixth International Symposium on Spatial Data Handling. London, Taylor and Francis: 223–39
FAO/UNESCO 1990 Soil map of the world: revised legend. FAO, Rome, World Soil Resources Report 60
Fisher P F 1991a First experiments in viewshed uncertainty: the accuracy of the viewable area. Photogrammetric Engineering and Remote Sensing 57: 1321–7
Fisher P F 1991b Data sources and data problems. In Maguire D J, Goodchild M F, Rhind D W (eds) Geographical information systems: principles and applications. Harlow, Longman/New York, John Wiley & Sons Inc. Vol. 1: 175–89
Fisher P F 1992 First experiments in viewshed uncertainty: simulating the fuzzy viewshed. Photogrammetric Engineering and Remote Sensing 58: 345–52
Fisher P F 1993 Algorithm and implementation uncertainty in the viewshed function. International Journal of Geographical Information Systems 7: 331–47
Fisher P F 1994a Probable and fuzzy models of the viewshed operation. In Worboys M (ed.) Innovations in GIS 1. London, Taylor and Francis: 161–75
Fisher P F 1996a Concepts and paradigms of spatial data. In Craglia M, Couclelis H (eds) Geographic information research: bridging the Atlantic. London, Taylor and Francis: 297–307
Fisher P F, Pathirana S 1991 The evaluation of fuzzy membership of land cover classes in the suburban zone. Remote Sensing of Environment 34: 121–32
Foody G M 1992 A fuzzy sets approach to the representation of vegetation continua from remotely-sensed data: an example from lowland heath. Photogrammetric Engineering and Remote Sensing 58: 221–5
Foody G M 1996 Approaches to the production and evaluation of fuzzy land cover classification from remotely-sensed data. International Journal of Remote Sensing 17: 1317–40
Goodchild M F, Chi-Chang L, Leung Y 1994 Visualising fuzzy maps. In Hearnshaw H M, Unwin D J (eds) Visualisation in geographical information systems. Chichester, John Wiley & Sons: 158–67
Heuvelink G B M, Burrough P A 1993 Error propagation in cartographic modelling using Boolean logic and continuous classification. International Journal of Geographical Information Systems 7: 231–46
Heuvelink G B M, Burrough P A, Stein A 1989 Propagation of errors in spatial modelling with GIS. International Journal of Geographical Information Systems 3: 303–22
Isbell R F 1996 The Australian soil classification. Australian Soil and Land Survey Handbook 4, CSIRO, Collingwood
Journel A 1996 Modelling uncertainty and spatial dependence: stochastic imaging. International Journal of Geographical Information Systems 10: 517–22
Kaplan A, Schott H F 1951 A calculus for empirical classes. Methodos 3: 165–88
Kelly-Bootle S 1995 The computer contradictionary, 2nd edition. Cambridge (USA), MIT Press
Klir G J, Yuan B 1995 Fuzzy sets and fuzzy logic: theory and applications. Englewood Cliffs, Prentice-Hall
Kosko B 1990 Fuzziness vs probability. International Journal of General Systems 17: 211–40
Lagacherie P, Andrieux P, Bouzigues R 1996 The soil boundaries: from reality to coding in GIS. In Burrough P A, Frank A U (eds) Geographic objects with indeterminate boundaries. London, Taylor and Francis: 275–86
Laviolette M, Seaman J W 1994 The efficacy of fuzzy representations of uncertainty. IEEE Transactions on Fuzzy Systems 2: 4–15
Leung Y C 1988 Spatial analysis and planning under imprecision. New York, Elsevier Science
Monckton C G 1994 An investigation into the spatial structure of error in digital elevation data. In Worboys M (ed.) Innovations in GIS 1. London, Taylor and Francis: 201–11
Moraczewski I R 1993 Fuzzy logic for phytosociology 1: syntaxa as vague concepts. Vegetatio 106: 1–11
Office of National Statistics 1997 Harmonised concepts and questions for government social surveys. London, Her Majesty’s Stationery Office
Openshaw S (ed.) 1995a Census users’ handbook. Cambridge (UK), GeoInformation International
Pawlak Z 1982 Rough sets. International Journal of Computer and Information Sciences 11: 341–56
Pratchett T 1986 The light fantastic. Gerrards Cross, Colin Smythe
Prescott J R V 1987 Political frontiers and boundaries. London, Allen and Unwin
Primavesi A L, Evans P A 1988 Flora of Leicestershire. Leicester, Leicestershire County Museum Service
Robinson V B 1988 Some implications of fuzzy set theory applied to geographic databases. Computers, Environment, and Urban Systems 12: 89–98
Rumley D, Minghi J V (eds) 1991 The geography of border landscapes. London, Routledge
Sainsbury R M 1995 Paradoxes, 2nd edition. Cambridge (UK), Cambridge University Press


Soil Classification Working Group 1991 Soil classification, a taxonomic system for South Africa. Memoirs on Agricultural Natural Resources of South Africa 15, Pretoria
Soil Survey Staff 1975 Soil taxonomy: a basic system of soil classification for making and interpreting soil surveys. USDA Agricultural Handbook 436. Washington DC, Government Printing Office
Tavernier R, Louis A 1984 Soil map of the European Communities. Luxembourg, Office of Official Publications of the European Communities
Taylor J R 1982 An introduction to error analysis: the study of uncertainties in physical measurements. Oxford, Oxford University Press/Mill Valley, University Science Books
Viridians 1996 Victorian flora database CD-ROM. Brighton East, Victoria, Viridians Biological Databases
Walsby J C 1995 The causes and effects of manual digitising on error creation in data input to GIS. In Fisher P F (ed.) Innovations in GIS 2. London, Taylor and Francis: 113–22
Wang F, Hall G B, Subaryono 1990 Fuzzy information representation and processing in conventional GIS software: database design and application. International Journal of Geographical Information Systems 4: 261–83
Wang M, Gong P, Howarth P J 1991 Thematic mapping from imagery: an aspect of automated map generalisation. Proceedings of AutoCarto 10. Bethesda, American Congress on Surveying and Mapping: 123–32
Webster R, Oliver M A 1990 Statistical methods in soil and land resource survey. Oxford, Oxford University Press
Williamson T 1994 Vagueness. London, Routledge
Young E 1992 Hunter-gatherer concepts of land and its ownership in remote Australia and North America. In Anderson K, Gale F (eds) Inventing places; studies in cultural geography. Melbourne, Longman: 255–72
Zadeh L A 1965 Fuzzy sets. Information and Control 8: 338–53
Zadeh L A 1980 Fuzzy sets versus probability. Proceedings of the IEEE 68: 421
