

Shashi Shekhar • Hui Xiong • Xun Zhou

Editors

Encyclopedia of GIS
Second Edition

With 1054 Figures and 118 Tables

Editors
Shashi Shekhar
University of Minnesota
Minneapolis, MN
USA

Hui Xiong
Management Science and Information Systems Department
Rutgers Business School
Rutgers, The State University of New Jersey
Newark, NJ
USA

Xun Zhou
Department of Management Sciences
Tippie College of Business
University of Iowa
Iowa City, IA
USA

ISBN 978-3-319-17884-4 ISBN 978-3-319-17885-1 (eBook)


ISBN 978-3-319-17886-8 (print and electronic bundle)
DOI 10.1007/978-3-319-17885-1

Library of Congress Control Number: 2017930703

1st edition: © Springer Science+Business Media LLC 2008


© Springer International Publishing AG 2017
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole
or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way,
and transmission or information storage and retrieval, electronic adaptation, computer software,
or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are
exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in
this book are believed to be true and accurate at the date of publication. Neither the publisher
nor the authors or the editors give a warranty, express or implied, with respect to the material
contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword by Brian Berry

The publication of a definitive Encyclopedia of GIS that lays out many of
the computer science/mathematics foundations of the field is a major event,
the culmination of a half century of development. I was part of the earliest
stirrings in the mid-1950s. A small group of geography graduate students
at the University of Washington, William Garrison’s “space cadets,” began
to assemble what became the foundations of contemporary spatial analysis
and to refocus mathematical cartography while working with civil engineer
Edgar Horwood on his attempts to use the printers of the time to produce
gray-shaded maps. Our attention was captured by Sputnik; however, we did
not anticipate that the USA’s response, the rapid development of NASA and
satellite systems, would be the key to the equally rapid development of remote
sensing or that global positioning would rewrite cartography. Among the
innovations of the time were Torsten Hägerstrand’s first simulation models
of space-time diffusion processes and early econometric interest in spatial
autocorrelation. Both themes are now central to spatial analysis.
The GIS focus shifted when Garrison, Marble, and I relocated to the
Chicago region. Garrison and I helped fund civil engineer Howard Fisher’s
first-generation computer graphics software, SYMAP I, and Marble and
I organized NSF workshops to spread the word and drew together an
initial overview of the field in spatial analysis. Fisher took his ideas to the
Ford Foundation and a subsequent grant to Harvard University, where he
established the Laboratory for Computer Graphics. The lab served as the
focus for research in the field well into the 1970s, providing the spark to
such innovators as Jack Dangermond, who subsequently established ESRI
and created what became the world’s most widely used computer graphics
software. Meanwhile, hardware development proceeded apace, as did imag-
ing and positioning capabilities created by the Department of Defense and
NASA, facilitating the resulting emergence of digital cartography and the
establishment of the first large-scale geographic information systems such as
the Canada Land Inventory. The rest, as they say, is history – albeit given
a new dynamic by the Internet and the continued evolution of computing
capabilities both on the desktop and in the supercomputer.
Fifty years after these beginnings, the result is a large and complex
field spanning many disciplines, continuing to grow and to expand into an
expanding array of applications. Cartographers have eschewed their pen-and-
ink, and rudimentary mapmaking is at the fingertips of everyone with Internet
access. Road atlases are fast giving way to satellite navigation systems.

Congress continues to be concerned with the privacy issues raised by
geographic information system capabilities, yet police and fire departments
can no longer function effectively without GIS and homeland security
without modern database and data mining capabilities. From city planning
to store location, property taxation to highway building, disaster response to
environmental management, there are few arenas in which GIS is not playing
a significant role. What is important is the cross-cutting capability that was
recognized when the NSF funded the Center for Spatially Integrated Social
Science (CSISS) at the University of California, Santa Barbara, or my own
university’s PhD program, a joint venture of the School of Economic, Political
and Policy Sciences, the School of Natural Sciences and Mathematics, and the
School of Engineering and Computer Sciences.
I like to tell my colleagues that there are three levels of GIS education:
“Driver’s Ed,” “Mr. Goodwrench,” and “Design Team.” Driver’s Ed provides
essential skills to the broad base of software users; the Mr. Goodwrenches of
the world learn how to put together and maintain working software-hardware
installations, while Design Teams create new and improved GIS capabilities.
Most of the new data handling capabilities reside in the arenas of computer
science/mathematics, while advances in inference are coming from inno-
vations in the ability to handle space-time dynamics while simultaneously
accounting for serial and spatial dependence.
The Encyclopedia of GIS provides an essential reference work for all three
categories of users. In contrast to the current textbooks in the field, which are
keyed to Driver’s Ed, the Encyclopedia also provides a handy sourcebook
for the Mr. Goodwrenches while defining the platform on which future
Design Teams will build by focusing – more than existing works – on the
computer science/mathematical foundations of the field. I know that the GIS
community will value this important reference and guide, and I will cherish it
as a milestone. It marks how far we have come – far beyond what our group
of pioneers dreamed might be possible a half century ago. Professors Shekhar
and Xiong and the considerable community of contributors to the collection
have provided us with a comprehensive and authoritative treatment of the
field, extensively cross-referenced with key citations and further readings.
Importantly, it is available in both printed and XML online editions, the latter
with hyperlinked citations. The GIS world will be indebted to them. No GIS
bookshelf should, as they say, be without it.

Brian J. L. Berry
School of Economic, Political and Policy Sciences
The University of Texas, Dallas
Dallas, TX, USA
McKinney, Texas
August 2007
Foreword by Michael Goodchild

Geographic information systems date from the 1960s, when computers were
mostly seen as devices for massive computation. Very significant technical
problems had to be solved in those early days: how did one convert the
contents of a paper map to digital form (by building an optical scanner from
scratch); how did one store the result on magnetic tape (in the form of a
linear sequence of records representing the geometry of each boundary line as
sequences of vertices); and how did one compute the areas of patches (using
an elegant algorithm involving trapezia). Most of the early research was about
algorithms, data structures, and indexing schemes and, thus, had strong links
to emerging research agendas in computer science.
Over the years, however, the research agenda of GIS expanded away
from computer science. Many of the technical problems of computation were
solved, and attention shifted to issues of data quality and uncertainty, the
cognitive principles of user interface design, the costs and benefits of GIS, and
the social impacts of the technology. Academic computer scientists interested
in GIS wondered if their research would be regarded by their colleagues as
peripheral – a marginally interesting application – threatening their chances
of getting tenure. Repeated efforts were made to have GIS recognized as
an ACM Special Interest Group, without success, though the ACM GIS
conferences continue to attract excellent research.
The entries in this encyclopedia should finally lay any lingering doubts to
rest about the central role of computer science in GIS. Some research areas,
such as spatiotemporal databases, have continued to grow in importance be-
cause of the fundamental problems of computer science that they address and
are the subject of several respected conference series. Geospatial data mining
has attracted significant attention from computer scientists as well as spatial
statisticians, and it is clear that the acquisition, storage, manipulation, and
visualization of geospatial data are special, requiring substantially different
approaches and assumptions from those in other domains.
At the same time, GIS has grown to become a very significant application
of computing. Sometime around 1995, the earlier view of GIS as an assistant,
performing tasks that the user found too difficult, complex, tedious, or
expensive to do by hand, was replaced by one in which GIS became the
means by which humans communicate what they know about the surface of
the Earth, with which they collectively make decisions about the management
of land, and by which they explore the effects of alternative plans. A host
of new issues suddenly became important: how to support processes of

search, assessment, and retrieval of geospatial data; how to overcome lack of
interoperability between systems; how to manage large networks of fixed or
mobile sensors providing flows of real-time geographic data; how to offer
useful services on the very limited platform of a cell phone; and how to
adapt and evolve the technology in order to respond to emergencies and
to provide useful intelligence. A revitalized research agenda for computer
science emerged that shows no sign of diminishing and is reflected in many
of the topics addressed in this encyclopedia.
For example, computer scientists are engaged in the development of data
structures, algorithms, and indexing schemes to support the hugely popular
virtual globes (Google Earth, Microsoft’s Virtual Earth, NASA’s World Wind)
that have emerged in the past few years and are stimulating a whole new
generation of applications of geospatial technology. Research is ongoing on
sensor networks and the complex protocols that are needed to handle flows of
real-time data from massive numbers of devices distributed over the Earth’s
surface, in areas of scientific interest such as the seafloor, in vehicles acquiring
data on traffic movement, and in battlefields. Semantic interoperability, or the
ability of systems to share not only data but the meaning of data, remains a
thorny problem that will challenge the research community for many years to
come.
As a collection of well-written articles on this expanding field, this
encyclopedia is a welcome addition to the GIS bookshelf. The fact that its
compilers have chosen to emphasize the links between GIS and computer
science is especially welcome. GIS is in many ways a boundary object, to
use a term common in the community of science historians: a field that has
emerged between two existing and recognized fields, in this case computer
science and geography, and which has slowly established its own identity. As
it does so, contributions such as this will help to keep those links alive and
to ensure that GIS continues to attract the interest of leading researchers in
computer science.

Michael F. Goodchild
National Center for Geographic Information and Analysis
and Department of Geography
University of California
Santa Barbara, CA, USA
Preface

It has been over 7 years since the publication of the first edition of the
Encyclopedia of GIS. During this period of time, we have witnessed numerous
significant advances in mobile technology and disruptive development in
business that are transforming the world: the widespread use of smartphones,
the increasing popularity of mobile apps, the wide deployment of location-
based services (LBSs), the fast-growing taxi-hailing services like Uber, the
evolution of mobile social networks, and, more recently, the global interest
in big data, unmanned aerial vehicles, and self-driving vehicles to improve
people’s lives. While various disciplines have been contributing to these new
advances, spatial computing and GIS techniques no doubt are playing a key
role here. For instance, localization is a fundamental issue for smartphones,
connected and self-driving vehicles, unmanned aerial vehicles, taxi-hailing
services, etc. Location information and location privacy are the essentials of
LBSs. Check-in recommendation is a key function of mobile social networks.
The study of spatial big data, such as Global Positioning System (GPS) traces
of vehicles and global climate data, helps people better understand human
mobility patterns as well as Earth climate change. Consequently, an influential
2011 report on big data from McKinsey included a chapter on location-based
big data.
To acknowledge the growth, the Association for Computing Machinery
(ACM) formed a special interest group, namely, SIGSPATIAL, and its annual
meeting attracts over 300 attendees. In addition, the Computing Research
Association’s Computing Community Consortium organized a multi-sector
multidisciplinary workshop titled “From GPS and Virtual Globes to Spatial
Computing 2020” at the National Academies in 2012 to assess the state of
the art and catalyze new research visions. A summary of the workshop
report appeared in the Communications of the ACM in January 2016 as the
cover article titled “Spatial Computing.” In summary, experts in GIS-related
fields and researchers from other disciplines have shown strong interest in
understanding these new spatial technologies and developments. Therefore,
we believe it is time to develop the second edition of the encyclopedia
and include entries on the new emerging topics.
The second edition of the Encyclopedia of GIS also provides us an
opportunity to enhance the topic coverage and content timeliness of the first
edition. While over 200 entries across 50 different fields were included in

the first edition, a few important topics were still left out, such as basic
concepts in GIS and GPS. As suggested by GIS colleagues, we have included
some of these topics in the second edition. Moreover, new research advances
on some existing fields of the first edition are also updated either by adding
new entries or through the revision of existing entries.
The second edition inherited all the key features from the previous edition.
Typical entries are 3000 words with sections such as definition, scientific
fundamentals, application domains, and future trends. Regular entries include
key citations and a list of recommended reading materials regarding the
literature. The encyclopedia is also simultaneously available as an HTML
online reference with hyperlinked citations, cross-references, four-color art,
links to web-based maps, and other interactive features.
It is worth noting that the first edition of the Encyclopedia of GIS has been
well received by a broad audience in the industry and academia. It is available
at thousands of libraries worldwide as well as on third-party websites such
as Google Books. By March 2016, cumulative downloads via Springer had
exceeded 133,000, not counting additional downloads via other
websites such as Google Books. Furthermore, it has received numerous
recognitions such as the CHOICE Outstanding Title Award. At the University
of Minnesota, the encyclopedia has been used as teaching materials in
two graduate-level spatial computing and spatial database research courses.
Its articles were used for the Fall 2014 Coursera’s massively open online
course titled “From GPS and Google Maps to Spatial Computing,” with
over 21,800 students from 182 countries. We hope that the second edition
will continue to serve the research community and the general public as a
helpful introductory material to GIS, a resourceful research reference, and an
illustrative GIS textbook.

New Fields and Topics

The second edition includes 25 additional fields that were either absent
from the first edition or have recently emerged as new research topics. Each
field has typically 3–10 articles. These fields include spatial computing infras-
tructure, spatial cognitive assistance, volunteering geographic information
(VGI), GPS-denied environment, statistically significant spatiotemporal pat-
tern mining, mobile economy, mobile recommender systems, spatial network
routing, spatial optimization, web-based GIS (industry perspective), location-
based recommendation systems, linear anomaly window detection, intelligent
transportation, GPU-based spatial computing, spatiotemporal analysis of
climate data, geospatial weather and climate nexus, spatial statistics, concepts
in spatial statistics, data science for GIS applications, 3D modeling and
analysis, geometric nearest-neighbor queries, modeling of spatial relations,
concepts in statistics for spatial and spatiotemporal data, high-performance
computing in GIS, and trends. Furthermore, there are two fields, road network
databases and constraint databases and data mining, which have been updated
by the original editors with new concepts added or existing articles revised to
accommodate more recent research results and technical advances.
August 2016 Shashi Shekhar
Hui Xiong
Xun Zhou
0-9

3D City Models

► Photogrammetric Applications

3D Crisp Clustering of Geo-Urban Data

Suhaibah Azri1, Alias Abdul Rahman1, Uznir Ujang1, François Anton2, and Darka Mioc2
1 Department of Geoinformation 3D GIS Research Lab, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
2 Department of Geodesy, Technical University of Denmark, Lyngby, Denmark

Synonyms

3D data clustering; 3D Geo-DBMS; 3D spatial indexing; Access method; Urban data management; Vector quantization

Definition

Crisp clustering is a technique that clusters objects into groups without overlapping partitions. Each data point either belongs to a group or does not. Most clustering algorithms are categorized as crisp clustering. There are several categories of crisp clustering algorithm, such as partitional, hierarchical, density-based, and grid-based algorithms. Each category can be defined as follows (Kovács et al. 2005):

• Partitional algorithms: These algorithms divide the data into a set of separate categories. They attempt to determine the number of partitions that optimizes a certain criterion function. This optimization is an iterative procedure.
• Hierarchical algorithms: These algorithms create clusters repeatedly by merging small clusters into larger ones or by splitting a cluster into several smaller ones.
• Density-based algorithms: With this technique, clusters are generated based on a density function, producing arbitrarily shaped clusters.
• Grid-based algorithms: These algorithms are widely used in spatial data mining. The search space is quantized into a finite number of cells.

Historical Background

Crisp clustering algorithms have been used ubiquitously in many fields, such as web mining, spatial data analysis, business, and group-based prediction. In the past few years, a number of algorithms have been
invented and proposed for various applications. These algorithms are presented below by category.

Partitional Algorithms
• k-means
k-means is the most widely used crisp clustering algorithm in applications such as machine learning, statistical analysis, and computer visualization. It was introduced by MacQueen to deal with the problem of data clustering (MacQueen 1967). The aim of this clustering technique is to optimize the following objective function:

$$E = \sum_{i=1}^{c} \sum_{x \in C_i} d(x, m_i) \tag{1}$$

In Eq. (1), m_i is the center of cluster C_i, and d is the distance from point x to m_i. Minimizing the criterion function E minimizes the distance between each point and its cluster center. A set of c cluster centers is chosen in the initial step. Then, each object is assigned to the nearest cluster center. The centers are then recomputed, and the process continues until the cluster centers stop changing.
• PAM (Partitioning Around Medoids)
This algorithm attempts to find a medoid for each cluster. The algorithm starts by searching for the nearest objects located in the cluster. PAM first computes k representative objects, the medoids. A medoid is the object with the minimal average dissimilarity. After the medoids are found, each object is assigned to the nearest medoid: object i is placed in cluster P_i when medoid m_{P_i} is nearer than any other medoid:

$$d(i, m_{P_i}) \le d(i, m_j) \quad \text{for all } j = 1, \ldots, k \tag{2}$$

The k representative objects are expected to minimize the objective function of PAM, which is described as follows:

$$\sum_{i} d(i, m_{P_i}) \tag{3}$$

According to Ng (1994), PAM is an expensive algorithm for finding medoids. This is because it repeatedly exchanges medoids with other objects until all of the selected objects meet the requirement of a medoid.
• CLARA (Clustering Large Applications)
CLARA uses PAM as part of its technique. From a set of data, it draws multiple samples and applies PAM to each sample.
• CLARANS (Clustering Large Applications based on Randomized Search)
Building on PAM, CLARANS searches a graph in which each node represents a potential solution, that is, a set of k medoids. Replacing a medoid produces a new clustering, which is a neighbor of the existing one. In this technique, a node is selected and compared with a user-defined number of its neighbors. When a better candidate is found, CLARANS moves to that neighboring node and restarts the process. If not, a local optimum has been found, and a new node is selected randomly to search for another local optimum.

Hierarchical Algorithms
A hierarchical algorithm produces a hierarchy of clusterings. These clustering approaches are applied in fields such as modern biology, biological taxonomy, computer science, and engineering. According to Theodoridis and Koutroumbas (2009), hierarchical algorithms can be divided into two categories:

• Agglomerative algorithms: These algorithms produce a decreasing number of clusters at each step. The two nearest clusters are merged to produce a sequence of clustering schemes.
• Divisive algorithms: In contrast to agglomerative algorithms, these algorithms produce an increasing number of clusters at each step. A group is split into two clusters to produce a sequence of clustering schemes.
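The agglomerative scheme just described, in which the two nearest clusters are merged at every step, can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions: the entry does not say how the distance between two clusters is measured, so the sketch assumes the Euclidean distance between cluster centroids, and the function names `centroid` and `agglomerate` as well as the sample coordinates are hypothetical.

```python
import math


def centroid(points):
    """Return the mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(3))


def agglomerate(points, target):
    """Merge the two nearest clusters until `target` clusters remain.

    Assumption: cluster distance is the Euclidean distance between
    centroids (the entry does not fix a linkage criterion).
    """
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > target:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]  # one cluster fewer after every step
    return clusters


# Hypothetical 3D points (x, y, height) forming two well-separated groups.
buildings = [(0, 0, 0), (1, 0, 0), (0, 1, 2),
             (10, 10, 5), (11, 10, 5), (10, 11, 6)]
groups = agglomerate(buildings, 2)
```

Because every point stays in exactly one cluster throughout the merging, the result is a crisp (non-overlapping) partition. The exhaustive pair search is chosen for readability only and would be too slow for large urban datasets.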

Some examples of hierarchical algorithms are as follows:

• BIRCH
BIRCH by Zhang et al. (1996) uses a CF-tree as a hierarchical structure to partition a point dataset. BIRCH is also the first algorithm that could handle noise efficiently.
• CURE
CURE by Guha et al. (1998) selects points from a set of data and then pulls them toward the cluster center. To cater for large-volume applications such as large databases, CURE combines random sampling with partition clustering.

Density-Based Algorithms
This type of algorithm considers a cluster as a region in the n-dimensional space. Most of these algorithms do not impose any restriction on the produced result, and they are able to handle outliers. The time complexity is $O(N^2)$, which is suitable for large data processing.

• DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
In the DBSCAN algorithm (Ester et al. 1996), each point in a cluster is required to have at least a minimum number of points within a certain radius. This algorithm handles noise and outliers effectively. DBSCAN is used as the basic clustering algorithm for incremental clustering. Efficient insertion of an object into, and deletion from, an existing cluster can also be handled with DBSCAN.
• DENCLUE (Density-Based Clustering)
DENCLUE was suggested by Hinneburg and Keim (1998). This technique is intended for clustering in large database applications such as multimedia. The algorithm models the point density analytically. By determining the density attractors, clusters are easily identified.

Grid-Based Algorithms
A grid-based algorithm is a clustering technique that quantizes a space or region into a finite number of cells. Recently, this type of clustering has been used increasingly in spatial applications:

• STING (Statistical Information Grid-based method)
STING was proposed by Wang et al. (1997). It divides the space or region into several rectangular cells based on a hierarchical structure. Statistical parameters (e.g., min, max, and mean) are used to calculate the numerical features of the objects in each cell. Clustering information is then represented based on the hierarchical structure of the grid cells. This clustering approach offers efficient search queries.
• WaveCluster
WaveCluster was introduced by Sheikholeslami et al. (2000). The algorithm originates from signal processing and the frequency domain. The process starts by imposing a multidimensional grid structure onto the space. Information is represented by grid cells and transformed using a wavelet transformation. To find the clusters, dense regions in the transformed domain need to be identified.

Scientific Fundamentals

3D geospatial data are expected to be the core of spatial data in the near future. This is due to the increasing demand for 3D geospatial applications and the state of the art in 3D spatial data capture, such as LiDAR (light detection and ranging), UAV (unmanned aerial vehicle), and TLS (terrestrial laser scanning). 3D data provide a better understanding of the real-world environment through realistic visualization. However, data management issues arise when the data need to be organized in a database system. One of the issues is the volume of 3D geospatial data. 3D geospatial data are large compared with 2D data because of the geometric detail attached to them and other information such as images and attributes. Thus, more storage space is needed for 3D geospatial data. For example, the 3D geospatial data produced for an urban area using laser scanning techniques can require up to 63 GB of disk space (Wand et al. 2007). For a 3D urban dataset, the volume is usually large due to the high building density.
Massive 3D geospatial datasets are very complex to organize in a database system. Thus, a data model is used as a guideline to manage all these data. Using the data model, geospatial data are transformed into a set of rows and records in the database. This dataset is then retrieved, processed, and analyzed to transform it into valuable information. However, due to the large volume of geospatial data, data retrieval performance easily deteriorates during query operations because each row and record in the database has to be inspected. In some applications, data retrieval performance is very important. For example, in a business service application, retrieving customer information at a specific time is important for efficient delivery service. For a service-based business, punctuality is very important for the company's reputation. Fast data retrieval is also important for emergency response applications such as hospitals and fire stations. In these cases, time management is very important because every second counts.

Since time is very important for data retrieval, a specific technique is required to boost performance during query operations. In spatial databases, spatial access methods are used to support efficient spatial selection, especially for range queries, map overlay, spatial analysis, and spatial joins. Without spatial indexing, however, full table scans need to be performed to meet the spatial selection criterion. Therefore, spatial indexing is required to address objects efficiently without examining every row and record. In spatial databases, 2D spatial indexing is well established compared with its 3D counterpart. 2D spatial index structures are not the best fit for 3D geospatial data, since the data types and relationships between objects are defined differently than in 2D. A well-established index structure for 3D spatial information is still an open research problem. Thus, a dedicated index structure for 3D geospatial information is significant for efficient data retrieval.

Efforts to develop 3D spatial indexing can be seen in several studies; see Wang and Guo (2012), Gong et al. (2009), Zhu et al. (2007), Deren et al. (2004), and Zlatanova (2000). Based on those studies and reviews, most researchers agree that the transition from the 2D R-tree structure to a 3D R-tree would be a starting point toward a promising 3D spatial index structure. The R-tree index structure was invented by Guttman (1984). It is a simple data structure that bounds objects with minimum bounding rectangles (MBRs). The structure of the 3D R-tree does not differ much from that of the original R-tree, even after the change in dimensionality. However, when the R-tree is extended to the third dimension (3D R-tree), the minimum bounding volumes (MBVs) of nodes frequently overlap. In certain cases, the MBV of a node can even be covered by another MBV. Overlapping nodes are the main reason for low query performance, because they lead to multipath queries and replicated data entries.

In several urban applications, such as real-time applications, geospatial data or urban objects are frequently updated. Thus, rows or records in the database are modified through insert, delete, and update operations. This process affects the index structure of the 3D R-tree. In certain cases, nodes in the tree structure overflow with M + 1 entries or underflow with n < m entries, where m is the minimum number of entries. In these cases, nodes may need to be merged with other nodes or split using a splitting operation. The splitting operation is the most critical process for the R-tree index structure (Fu et al. 2002; Liu et al. 2009; Korotkov 2012; Sleit and Al-Nsour 2014). In this phase the tree structure is altered, and, at the same time, it should produce minimal node overlap, minimal coverage area, and minimal tree height. These issues become critical in 3D. Minimizing the overlap coverage of MBVs is more complex, and the splitting operation requires a different approach than in 2D.

Crisp clustering considers non-overlapping partitions in its approach. Thus, each object either belongs to a class or does not. This characteristic suits the aim of the R-tree structure (Guttman 1984), in which an object appears only once in an index node. The idea is to cluster 3D geospatial data into classes. Each class represents a node or MBV of the 3D R-tree. This
approach is different from the original R-tree approach, and it is expected to produce a better 3D R-tree structure.

Among the crisp clustering techniques, k-means is the most widely used in various applications. However, the k-means objective is NP-hard (non-deterministic polynomial time hard) to optimize, and as a result this clustering approach can end up with more than one cluster center in the same group. Having more than one cluster center in the same group can cause serious node overlap since the cluster centers are not evenly spread.

To overcome this issue, we propose using an improved k-means crisp clustering algorithm, k-means++. Arthur and Vassilvitskii (2007) introduced careful seeding to improve the k-means algorithm. With this approach, initial seeds are defined and then the remaining objects are clustered based on the nearest distance to the initial seeds. This algorithm is proven to improve on the accuracy of the original algorithm. The cluster centers are more evenly spread than with the k-means algorithm. In this paper, the k-means++ algorithm is extended to 3D for urban data. The algorithm can be described as follows:

Input:
a set of 3D vector data $P = \{p_1, p_2, \ldots, p_n\} \subset \mathbb{R}^d$

Step 1:
Choose an initial center $C_1$
Step 2:
Choose a new center $C_i$ by choosing $p \in P$ with probability

$$\frac{D(p)^2}{\sum_{p \in P} D(p)^2} \tag{4}$$

Step 3:
Repeat Step 2 until k centers $C_1, \ldots, C_k$ are chosen
Step 4:
Proceed with the standard k-means approach

The proposed k-means++ crisp clustering algorithm is shown to produce a better 3D R-tree than the k-means approach. In this paper, we adopt this clustering approach for the construction of the 3D R-tree as well as for the splitting operation of an overflowing node N with M + 1 entries.

The workflow of the 3D R-tree based on the proposed crisp clustering approach is illustrated in Fig. 1. Using this workflow, a set of 3D objects is tested. There are 200 objects (n = 200) clustered in this test, as shown in Fig. 2. The maximum entry M for each node is set to 25, which means only 25 objects are allowed in each MBV. As a result, objects are grouped into eight classes: P, Q, R, S, T, U, V, and W. However, among these groups there are three MBVs (R, V, and W) that exceed M entries.
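As a rough illustration of Steps 1 to 4, the following Python sketch seeds k centers by sampling proportionally to D(p)^2 and then performs a single nearest-center assignment. It is not the authors' implementation. It assumes that D(p) is the Euclidean distance from p to the nearest center already chosen (as in Arthur and Vassilvitskii 2007), that the first center is drawn uniformly at random, and that each 3D object is reduced to one point such as its centroid. The names kmeans_pp_seeds and assign_to_nearest are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centers from `points` by D(p)^2-weighted sampling."""
    rng = rng or random.Random()
    # Step 1 (assumption: the first center is chosen uniformly at random).
    centers = [rng.choice(points)]
    # Step 3: repeat Step 2 until k centers are chosen.
    while len(centers) < k:
        # D(p)^2: squared distance from p to its nearest chosen center.
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            centers.append(rng.choice(points))
            continue
        # Step 2: sample p with probability D(p)^2 / sum of all D(p)^2.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def assign_to_nearest(points, centers):
    """First assignment pass of Step 4: index of the nearest center per point."""
    return [
        min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
        for p in points
    ]


if __name__ == "__main__":
    # Hypothetical centroids of 3D objects.
    pts = [(0, 0, 0), (0, 1, 0), (10, 10, 10), (10, 11, 10), (50, 50, 50)]
    seeds = kmeans_pp_seeds(pts, k=3, rng=random.Random(0))
    print(seeds, assign_to_nearest(pts, seeds))
```

A full k-means run would alternate this assignment with recomputing each center as the mean of its members until the assignment stops changing; only the seeding differs from plain k-means.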

3D Crisp Clustering of Geo-Urban Data, Fig. 1 3D R-tree workflow

[Figure: parent node with eight MBVs (P, Q, R, S, T, U, V, W) and the child-node entries of each. Results of the first cycle: number of clusters = 8; maximum entry = 25; last listed entries per node: P19, Q25, R29, S20, T22, U23, V34, W29; nodes for second cycle = R, V, and W.]

3D Crisp Clustering of Geo-Urban Data, Fig. 2 Clustered objects using crisp clustering

[Figure: bar chart of coverage percentage and overlap percentage for the original R-tree, k-means clustering, and the proposed crisp clustering. Bar labels in extraction order: 94%, 92%, 88%, 82%, 45%, 23%.]

3D Crisp Clustering of Geo-Urban Data, Fig. 3 Comparison of overlap percentage and coverage percentage

Thus, MBVs R, V, and W qualify for the next cycle. In the second cycle, each of these nodes is split into two subgroups: R (Sub-R1 and Sub-R2), V (Sub-V1 and Sub-V2), and W (Sub-W1 and Sub-W2).

To evaluate the efficiency of the proposed approach in constructing an efficient 3D R-tree structure, a set of 3D vector data is tested in this experiment. The datasets are 3D volumetric objects (i.e., 3D buildings). For the first experiment, a set of 500 buildings in an urban area, as represented in Fig. 3, is clustered based on the proposed approach. The input data of these 3D buildings are based on LoD 2 (Level of Detail) of the CityGML format. The number of cluster classes for this dataset is set to 20, with the maximum entry M set to 25 for each class. Classes are then formed into MBVs of the 3D R-tree. The result of this experiment is then compared with the original R-tree and the original k-means crisp clustering. Figure 3 shows the comparison of overlap percentage and coverage percentage of the proposed approach with the other approaches.
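The step from cluster classes to MBVs, and the check for nodes that must go to a second cycle, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the method used in the experiment: MBVs are taken to be axis-aligned boxes, each object is represented by a single 3D point, and the helper names (mbv, overlap_volume, group_by_class, overflowing) are hypothetical. The text does not define how the overlap and coverage percentages are computed, so the sketch only returns the raw intersection volume of two MBVs.

```python
from collections import defaultdict


def mbv(points):
    """Axis-aligned minimum bounding volume as (min_corner, max_corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def overlap_volume(a, b):
    """Volume of the intersection of two axis-aligned boxes (0.0 if disjoint)."""
    (a_min, a_max), (b_min, b_max) = a, b
    volume = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a_min, a_max, b_min, b_max):
        side = min(hi_a, hi_b) - max(lo_a, lo_b)
        if side <= 0:
            return 0.0  # boxes are separated (or only touch) on this axis
        volume *= side
    return volume


def group_by_class(points, labels):
    """Collect the points of each cluster class; one class becomes one node."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return dict(groups)


def overflowing(groups, max_entries):
    """Classes holding more than max_entries objects, i.e. nodes to split again."""
    return sorted(k for k, members in groups.items() if len(members) > max_entries)


if __name__ == "__main__":
    # Hypothetical points and class labels, with M = 2 for a tiny example.
    pts = [(0, 0, 0), (2, 2, 2), (1, 1, 1), (3, 3, 3), (4, 4, 4)]
    labels = ["P", "P", "R", "R", "R"]
    groups = group_by_class(pts, labels)
    boxes = {k: mbv(v) for k, v in groups.items()}
    print(overflowing(groups, max_entries=2))
    print(overlap_volume(boxes["P"], boxes["R"]))
```

In the setting described above, max_entries would be 25, and each class returned by overflowing would be clustered again into two subgroups.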
[Figure: overlap percentage by splitting approach. Exhaustive: 97%; New Linear: 88%; Proposed Crisp Clustering: 20%.]

3D Crisp Clustering of Geo-Urban Data, Fig. 4 Percentage of overlap using different approaches

The same dataset in Fig. 3 is tested for the node splitting operation. As mentioned in the previous section, the splitting operation of the 3D R-tree should preserve minimal overlap among nodes, minimal coverage area, and minimal tree height. In this test, three different splitting approaches are compared: new linear (Ang and Tan 1997), exhaustive R-tree (Guttman 1984), and crisp clustering. In Fig. 4, the percentage of total overlap between nodes indicates that crisp clustering offers the lowest percentage, 20%. Meanwhile, the percentage for the original exhaustive R-tree is 97% and for new linear 88%.

Key Applications

Collision Detection
Collision detection is important in many computer graphics and visualization applications. Usually a classical hierarchical traversal scheme is used for collision detection. However, problems arise with this approach, such as visiting a node more than once and the transformation of nodes into a local coordinate system (Figueiredo et al. 2010). As a consequence, the performance of the collision detection process deteriorates. By using a 3D R-tree based on the crisp clustering approach, collision detection would be very efficient, without visiting nodes repetitively.

Real-Time Application
Real-time applications such as in-vehicle satellite navigation or web-based systems are exposed to active data updating operations, such as updated coordinate information and the number of online users. To retrieve a set of data within a specific time, a performance booster such as 3D R-tree spatial indexing could be used for this application. Frequent data updating needs an efficient index structure with minimal overlap.
Point Cloud Data Management
Dealing with millions of points collected from airborne sensors or terrestrial laser scanners often creates many problems in data management and visualization. In this case, spatial indexing is used to retrieve points efficiently from a massive dataset. One of the best-known spatial indexing techniques used for this application is the R-tree index structure. However, the R-tree suffers from serious overlap among nodes, which can cause multipath queries and deteriorate the performance of data retrieval. By using the crisp clustering algorithm, the risk of multipath queries can be reduced, increasing the efficiency of search and query operations on a massive point cloud collection.

Future Directions

Based on our observation, the 3D R-tree has its own limitation during data updating operations. Whenever an update occurs, such as an insert or delete operation, the tree structure needs to be revised and all nodes, including the root node, need to be modified. This cost may be significant for frequent-update applications or moving objects. Thus, a special technique for handling data updates in an R-tree without revising its structure, which could also reduce processing time and increase performance efficiency, would be a very interesting topic for future directions of this study.

Cross-References

▶ Access Method
▶ R-tree
▶ Spatial Indexing

References

Ang CH, Tan TC (1997) New linear node splitting algorithm for R-trees. In: Scholl M, Voisard A (eds) Advances in spatial databases, vol 1262. Lecture notes in computer science. Springer, Berlin/Heidelberg, pp 337–349. doi:10.1007/3-540-63238-7_38
Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, New Orleans. Society for Industrial and Applied Mathematics, pp 1027–1035
Deren L, Qing Z, Qiang L, Peng X (2004) From 2D and 3D GIS for CyberCity. Geo-Spat Inf Sci 7(1):1–5. doi:10.1007/bf02826668
Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. Paper presented at the proceeding of 2nd international conference on knowledge discovery and data mining, Portland
Figueiredo M, Oliveira J, Araújo B, Pereira J (2010) An efficient collision detection algorithm for point cloud models. In: 20th international conference on computer graphics and vision, Warsaw. Citeseer, p 44
Fu Y, Teng J-C, Subramanya S (2002) Node splitting algorithms in tree-structured high-dimensional indexes for similarity search. In: Proceedings of the 2002 ACM symposium on applied computing, Madrid. ACM, pp 766–770
Gong J, Ke S, Li X, Qi S (2009) A hybrid 3D spatial access method based on quadtrees and R-trees for globe data. 74920R–74920R. doi:10.1117/12.837594
Guha S, Rastogi R, Shim K (1998) CURE: an efficient clustering algorithm for large databases. SIGMOD Rec 27(2):73–84. doi:10.1145/276305.276312
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. SIGMOD Rec 14(2):47–57. doi:10.1145/971697.602266
Hinneburg A, Keim DA (1998) An efficient approach to clustering in large multimedia databases with noise. Paper presented at the proceedings of the 4th ACM SIGKDD, New York
Korotkov A (2012) A new double sorting-based node splitting algorithm for R-tree. Programm Comput Softw 38(3):109–118
Kovács F, Legány C, Babos A (2005) Cluster validity measurement techniques. In: Proceeding of sixth international symposium Hungarian researchers on computational intelligence (CINTI), Barcelona. Citeseer,
Liu Y, Fang J, Han C (2009) A new R-tree node splitting algorithm using MBR partition policy. In: 2009 17th international conference on geoinformatics, Fairfax. IEEE, pp 1–6
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Berkeley, p 14
Ng RT, Han J (1994) Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 20th VLDB conference, Santiago
Sheikholeslami G, Chatterjee S, Zhang A (2000) WaveCluster: a wavelet-based clustering approach for spatial data in very large databases. VLDB J 8(3–4):289–304. doi:10.1007/s007780050009
Sleit A, Al-Nsour E (2014) Corner-based splitting: an improved node splitting algorithm for R-tree. J Inf Sci. doi:10.1177/0165551513516709
Theodoridis S, Koutroumbas K (2009) Chapter 13 – clustering algorithms II: hierarchical algorithms. In: Theodoridis S, Koutroumbas K (eds) Pattern recognition, 4th edn. Academic, Boston, pp 653–700. doi:10.1016/B978-1-59749-272-0.50015-3
Wand M, Berner A, Bokeloh M, Fleck A, Hoffmann M, Jenke P, Maier B, Staneker D, Schilling A (2007) Interactive editing of large point clouds. In: SPBG, Prague, pp 37–45
Wang W, Yang J, Muntz RR (1997) STING: a statistical information grid approach to spatial data mining. In: Paper presented at the proceedings of the 23rd international conference on very large data bases, Athens
Wang Y, Guo M (2012) An integrated spatial indexing of huge point image model. In: Paper presented at the international archives of the photogrammetry, remote sensing and spatial information sciences, Melbourne, 25 Aug–01 Sept 2012
Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. SIGMOD Rec 25(2):103–114. doi:10.1145/235968.233324
Zhu Q, Gong J, Zhang Y (2007) An efficient 3D R-tree spatial index method for virtual geographic environments. ISPRS J Photogramm Remote Sens 62(3):217–224. doi:10.1016/j.isprsjprs.2007.05.007
Zlatanova S (2000) 3D GIS for urban development. International Institute for Aerospace Survey and Earth Sciences (ITC)

3D Data Clustering

▶ 3D Crisp Clustering of Geo-Urban Data

3-D Data Models

▶ Validation of Three-Dimensional Geometries

3D Geo-DBMS

▶ 3D Crisp Clustering of Geo-Urban Data

3-D GIS

▶ Validation of Three-Dimensional Geometries

3D Indoor Models and Their Applications

Sisi Zlatanova¹ and Umit Isikdag²
¹ 3D Geoinformation, Faculty of Architecture and the Built Environment, Delft University of Technology, Delft, The Netherlands
² Department of Informatics, Mimar Sinan Fine Arts University, Istanbul, Turkey

Definition

Indoor environments are often referred to as enclosed spaces. However, the general definition of space already indicates that a space can be bounded. WordNet (http://wordnet.princeton.edu) defines space as “an empty area usually bounded in some way between things.” Specialized ontologies such as OmniClass (http://www.omniclass.org, a classification for architecture, engineering, and construction in North America) distinguish between spaces by form and spaces by function. “Spaces by form are basic units of the built environment delineated by physical or abstract boundaries and characterized by physical form.” “Spaces by function are basic units of the built environment delineated by physical or abstract boundaries and characterized by their function.” The spaces can be both 2D and 3D. For example, a space by form can be a 3D room or a 2D walking path. An interesting example is a wall (interior, exterior), which is considered a space by function, which implies that spaces can be filled with some material, i.e., not just air.

Indoor spaces are artificial constructs designed and developed to support human activities. 3D indoor models, being virtual digital representations of indoor spaces, have to be able to support these activities.

Historical Background

Indoor mapping and modeling has received an increased level of attention during the last decade (Worboys 2011; Zlatanova et al. 2014). Indoor
space differs from outdoor space in many aspects: the space is smaller and closed; there are many constraints such as walls, doors, stairs, and furniture; the structure is multilayered, frequently containing intermediate and irregular spaces; the lighting is largely artificial; and so forth (Figs. 1 and 2). To represent indoor spaces in a proper manner, many data acquisition concepts, data models, and ISO/OGC standards have to be defined or redefined to meet the requirements of indoor spatial applications (Figs. 3 and 4).

Indoor models representing 3D information can be generated using various manual, semiautomatic, and automatic methods. The global research trends are focused on finding methods for automatic generation. Many of them concern model transformation, such as the generation of application-specific indoor models from general digital indoor models such as Industry Foundation Classes (IFC) or CityGML LOD4. Although this is a valuable approach, it is often insufficient. The existing models might be outdated, incomplete, or even nonexistent. In such cases, new measurements are required using a range of sensors and processing techniques. The processed raw data are then organized in 3D geometry representations such as 3D vector (B-reps, CSG, BIM) and 3D raster (or dense colored point clouds). Some of these representations have semantics and topology. The tendency is to identify semantics and topology at a very early stage of data processing, to avoid post-processing and the so-called semantic enrichment of geometric models (Billen et al. 2014). Azri et al. (2012) have identified several possible approaches for automatic generation of 3D indoor models (Table 1).

3D Indoor Models and Their Applications, Fig. 1 Example of obstacles (left) and intermediate floors (right)

3D Indoor Models and Their Applications, Fig. 2 Examples of “rooms inside rooms” (left) and complex layered
structures (right)
3D Indoor Models and Their Applications, Fig. 3 The overlap between GIS and CAD/BIM domains (Modified after Jacob Beetz)

[Figure labels: IfcSpace, ExteriorObject (Building), RelatingSpace (Boundedby), IfcRelSpaceBoundary, RelatedBuildingElement, Ifcwall, InteriorWallSurface, Room (space surrounded by surfaces)]

3D Indoor Models and Their Applications, Fig. 4 Conceptual difference between IFC (left) and CityGML LOD4 (right) for modeling interiors (Courtesy Fillipo Mortari)

Computer-aided design (CAD) and, lately, architecture, engineering, and construction (AEC) are the oldest domains offering 3D tools for the representation of indoors. CAD was primarily developed for engineers responsible for designing and building facilities (Azri et al. 2012). It is easy to compute and design with CAD tools due to their friendly environment and dynamic interaction. CAD tools, which dealt with large-scale and detailed models, did not focus on the maintenance of attributes and lack support for geodetic reference systems. Although CAD models are convenient for representing indoor information, several drawbacks of CAD models have been revealed. For instance, CAD is only a platform to design and model geometries. Thus, information such as attributes and topology can only be tagged externally during the design process. Some
3D Indoor Models and Their Applications, Table 1 Approaches and Methodologies of Automatic Indoor Model Generation

| Generation approaches | Method(s) to be utilized | Enriching semantics | Enriching geometry |
|---|---|---|---|
| Document analysis | Text analysis; Speech analysis; Video analysis | Documents; recordings | Documents; recordings |
| Data fusion | Data processing; Model integration | ID tags; CAD files; Documents | CAD/GIS files; Point clouds; Videos/images |
| Model transformation | Transformation | BIM; city models | BIM; city models |
| User-based | SLAM | GUI/software | GUI/software |

new extensions of CAD/AEC (Bentley Systems, Autodesk products) do allow the maintenance of topology and semantics but in a quite vendor-dependent way. Therefore, the topology and semantics are lost when the model is exported to another software tool. If the information attached to the model is not transferred together with the model, the users can only interpret information from what they see in the model. In addition, if the building model was developed with a low level of detail, there may not be much geometric and semantic information that can be extracted and used.

The building information model (BIM) is the next stage in the digital representation of building interiors and facilities. BIMs can be used to model building information in 3D with the support of an intelligent database that contains information for design decision making, production of accurate construction documents, prediction of performance factors, cost estimating, design scenario planning, and construction planning. BIM is an object-oriented, semantically rich model. The spatial relationships between building elements are maintained in a hierarchical manner. It maintains many geometric primitives ranging from simple B-reps to free-form curves and surfaces. Today, the most prominent BIM standard is the Industry Foundation Classes (IFC).

3D indoor models are investigated by researchers in the GIS domain as well. Digital city models have become widely used for the digital representation of major cities. With the advent of 3D city models such as in Google Earth, CityGML, and others, indoor modeling became a priority topic of research in the GIS community. Today, CityGML is the best-known model for 3D indoor modeling. CityGML is developed for representing 3D city geometry, (a kind of) topology, and thematic-semantic modeling. CityGML can be used to represent buildings, building parts, and properties in different levels of detail (LOD) (i.e., from LOD0 up to LOD4). CityGML LOD4 provides a semantic-thematic model for representing indoors. The indoor objects are far fewer than the objects that can be represented in IFC. However, their simplicity seems quite sufficient for a large group of outdoor and indoor applications (Billen et al. 2014).

Which of the two most prominent standards will be used for 3D indoor modeling depends very much on the application. The CAD/BIM domain has traditionally dealt with very large-scale representations, while GIS has dealt with very small scales (up to km). In the last decade a fusion and overlap between the two domains has been observed (Fig. 3). However, there are fundamental differences between the two models related to the conceptual definition of the indoor objects. IFC objects are defined from the view of the constructor and CityGML LOD4 objects from the view of the user (Fig. 3). IFC is very appropriate for maintaining information about construction parts of a building such as concrete walls, slabs, and columns. CityGML is focused on the modeling of the visible environment, such as surfaces of the walls as part of one room or surfaces of walls as part of the façade of a building. This poses numerous challenges to the transformations between the two models (Fig. 4).
3D Indoor Models and Their Applications, Fig. 5 Semantics of IndoorGML (IndoorGML, OGC)

The interest, research, and developments in modeling indoors resulted in the first standard dedicated to indoor navigation, i.e., IndoorGML. Similar to all OGC standards, IndoorGML is designed to represent and allow exchange of geo-information that is intended to support indoor navigation applications. As mentioned previously, the characteristics of CityGML and IFC might not be sufficient (either too complex or lacking information) for all kinds of indoor applications. Indoor navigation requires specific semantics and a topological (connectivity) model, which would allow user-oriented path computation. IndoorGML semantics, geometry, and connectivity can be derived from other 3D indoor models such as IFC and CityGML following the rules of the model. In contrast to CityGML and IFC, IndoorGML requires complete subdivision of the space into cellular units. The subdivision can be done with respect to different themes: a topographic theme (i.e., representing the internal structure of the building), a sensor theme (representing the coverage of Wi-Fi access points), or a security theme (representing accessible areas due to security restrictions) (Becker et al. 2008). Therefore, the semantics is quite general; it indicates whether a space can be used for navigation or not (Fig. 5). The topology can then be derived automatically from the semantics following the duality-graph principle.

Key Applications

Indoor applications have traditionally not been a topic of research in the GIS community. Designers, constructors, and engineers have worked with and used 3D indoor representations for airflow simulation, smoke modeling, interior design, and facility management. However, the two prominent indoor applications are indoor navigation and facility management.

Indoor Navigation
Generally speaking, a navigation system consists of the following components: positioning of a user, calculation of a best path (cheapest, fastest, safest, etc.) to some destination(s), and guidance along the path. Indoor navigation is a very prominent and active research area. It originated in robot navigation and moved to human navigation in the last two decades. However,
it remains a challenging topic for several reasons: indoor positioning is not very accurate, users can move freely within the building, the construction of a topology model (or path network) may not be straightforward due to the complexity of indoor space, and humans need appropriate guidance. Many papers have provided extended overviews of navigation systems and models (2D and 3D) to support indoor navigation (Afyouni et al. 2012; Montello 1993; Fallah et al. 2013; Bandi and Thalmann 1998; Zlatanova et al. 2014). The majority of the indoor models found in the current literature are still 2D. They very often ignore architectural characteristics such as the number of doors, openings, and windows. The granularity of the models is still very low, i.e., they do not take into consideration movable obstacles (such as furniture) or functional spaces such as “coffee corner,” “reception area,” etc. Most of the topological models used for navigation are still predefined and pre-computed and cannot reflect dynamic changes such as closures because of renovations. There is a vast amount of research in the area of indoor navigation and localization. Several conferences are organized annually by various international organizations (ACM SIGSPATIAL, ISPRS, LBS, ICA, etc.). For example, the Indoor3D conference organized in December 2013 discussed topics related to indoor model definition, model generation, indoor localization, and indoor navigation applications.

Agreeing on standards for indoor models is one of the most investigated topics. It is well understood that standards will speed up application development. Some researchers take into consideration not only the internal structure of a building but also the manner in which people can be localized indoors in order to give directions. Geographical coordinates commonly do not make sense to humans. Humans, however, understand expressions such as “10 m left from the door” and “in front of the restaurant.” Xiong et al. (2013) presented work on a multidimensional indoor location and information model, which aims to define absolute, relative, semantic, and metric expressions of location. The model is complementary to 3D concepts such as CityGML and IndoorGML and is accepted as a Chinese standard for coding location. Research on semantic expression of spatial relationships, directions, and locations such as “in room 321,” “on the second floor,” as well as “two meters from the second window” and “12 steps from the door,” has been discussed by a number of researchers, e.g., Billen et al. (2014).

As mentioned previously, 3D indoor models can be generated in various ways. Becker et al. (2013) presented an approach based on shape grammars applied to point clouds. Shape grammars have been proven to be successful and efficient in delivering volumetric LOD2 and LOD3 models, and the next challenge is their application to indoor modeling, i.e., LOD4 models. In building interiors, where the available observation data may be inaccurate, shape grammars can be used to make the reconstruction process robust and to verify the reconstructed geometries. The potential benefit of using the grammar as a support for indoor modeling was evaluated in the study based on an example in which the grammar was applied to automatically generate an indoor model from erroneous and incomplete traces gathered by foot-mounted MEMS/IMU positioning systems.

Point clouds are widely used for the generation of 3D indoor models. They can be created using different range techniques or from images and videos. Obtaining the vector model can also be done using many different approaches and algorithms. El Meouche et al. (2013) investigated automatic reconstruction of 3D building models from terrestrial laser scanned data. They proposed a surface reconstruction technique for buildings by processing data from a 3D laser scanner. Funk et al. (2013) presented a paper on implicit scene modeling from imprecise point clouds. The authors stated that when applying optical methods for automated 3D indoor modeling, the 3D reconstruction of objects and surfaces is very sensitive to both lighting conditions and the observed surface properties. This ultimately compromises the utility of the acquired 3D point clouds. The authors presented a reconstruction method based upon the observation that most objects contain only a small set of primitives. The approach
combined sparse approximation techniques from the compressive sensing domain with surface rendering approaches from computer graphics. The amalgamation of these techniques allows a scene to be represented by a small set of geometric primitives while generating perceptually appealing results. The resulting surface models are defined as implicit functions and may be processed using conventional rendering algorithms, such as marching cubes, to deliver polygonal models of arbitrary resolution.

Wohlfeil et al. (2013) expressed the importance of using multi-scale sensor systems and photogrammetric approaches in 3D reconstruction. The authors discussed that 3D surface models with high resolution and high accuracy are of great importance in many applications, especially if these models are true to scale. As a promising alternative to active scanners (e.g., light section, structured light, laser scanners, etc.), the authors believe that new photogrammetric approaches are attracting more attention. They use modern structure from motion (SfM) techniques, with the camera as the main sensor. Their research combined the strengths of novel surface reconstruction techniques from the remote sensing sector with novel SfM technologies, resulting in accurate 3D models of indoor and outdoor scenes. Starting with the image acquisition, all steps to a final 3D model were explained in their study.

The most prominent topic in indoor navigation is indoor localization. Indoor localization is in demand for a variety of applications within the built environment, and an overall solution based on a single technology has not been determined yet. This research is developed rather independently from indoor modeling. The focus is on the technology that would allow localizing a person in a building, and therefore the indoor model is used mostly for visualization of the location. In the context of localization, 3D indoor models have been used for improving the localization accuracy (Girard et al. 2011; Liu et al. 2015). Many different localization technologies are investigated indoors as well (Fallah et al. 2013). Much attention is given to WLAN applications, which do not require a person to carry specialized devices. Two research papers presented at the workshop focused on the use of Wi-Fi technologies in indoor positioning. Verbree et al. (2013) investigated how Wi-Fi-based indoor positioning can be used in a museum environment to navigate three categories of users: visitors, employees, and emergency services. They compared two different Wi-Fi-based localization techniques. The first one is based on Wi-Fi scanners, i.e., the Libelium Meshlium Wi-Fi scanner. The second method was traditional Wi-Fi fingerprinting. In similar research, Chan et al. (2013) worked on improving Wi-Fi fingerprinting by applying a probabilistic approach based on a previously recorded Wi-Fi fingerprint database. In addition, the authors developed a 3D modeling module that allows for efficient reconstruction of outdoor building models to be integrated with indoor building models. The architecture consisted of a sensor module for receiving, distributing, and visualizing real-time sensor data and a web-based visualization module for users to explore the dynamic urban life in a virtual world.

Research on algorithms for indoor navigation is also very intensive, with the aim of adapting them to human perception and understanding. Particularly indoors, well-known outdoor strategies such as the shortest and the fastest path might not be relevant, while the safest or least crowded path might be. Applications that support indoor navigation and way finding have become one of the booming industries in the last couple of years. In spite of this, the algorithmic support for indoor navigation has been left mostly untouched so far, and most applications mainly rely on adapting Dijkstra’s shortest path algorithm to an indoor network. In outdoor spaces, several alternative algorithms have been proposed by adding a more cognitive notion to the calculated paths and adhering to natural way-finding behavior (e.g., simplest paths, least risk paths). The need for indoor cognitive algorithms is highlighted by the more challenging navigation and orientation requirements due to the specific indoor structure (e.g., fragmentation, less visibility, confined areas) (Vanclooster et al. 2013).

Today, various indoor applications are available on the market. Google Maps, Open Street Map (the 3D indoor project), airports, museums,
16 3D Indoor Models and Their Applications

3D Indoor Models and Their Applications, Fig. 6 Visualization of a navigation path in 3D environment: Paris
airport (left) and Hubei Museum (right) (Xu et al. 2013)

and shopping malls have their own indoor navigation applications (Fig. 6, left). The real 3D
applications are however still very sparse. One of the reasons is that 3D visualization of
enclosed indoor spaces is usually more disturbing than guiding; the other reason is that the
calculations that are performed on 2D plans and 3D models are therefore not maintained. Xu et
al. (2013) presented a 3D model-based indoor navigation system for a museum in Wuhan, China.
The system was based on a 3D model, organized in a DBMS on a server, and a game engine for
visualization on an Android device. The authors argue that 3D models are more powerful because
3D models can provide accurate descriptions of locations of indoor objects (e.g., doors,
windows, tables), which are exhibited in walls and shelves. The experimental system is an
example of a flexible client-server, user-oriented application. The system is composed of
three layers: mobile app, web services, and a database (PostGIS). There were three main
strengths of this system:

• It stores all data needed in one database and processes most calculations on the web
  server, which makes the mobile client very lightweight.
• The network used for navigation is extracted semiautomatically and is renewable.
• The graphical user interface (GUI), which is based on a game engine, has high performance
  in visualizing the 3D model on a mobile display (Fig. 6, right).

Facility Management
Facility management is an area of research that is increasingly gaining attention. Building
owners are actively seeking models that can answer questions such as “how much paint do I need
for the renovation of floor x,” “what is the area of the window frames that have to be
painted,” and “how many square meters of carpet do I need for room y.” Facility managers need
to have information about pipes and cables in case of regular checks and/or failures. Local
governments, institutions performing taxation, and so forth are also becoming interested in
systems that can easily compute net areas and volumes of apartments and offices. All these
questions usually require information about vertical elements, internal
structure of buildings, and even “invisible” information about pipes and cables integrated in
walls and floors/ceilings. IFC and CityGML are very often compared and discussed, but there is
still no agreement on which model is more appropriate. For daily building and facility
management, IFC appears to be too heavy and complex, and numerous solutions are investigated
considering CityGML.

Several 3D indoor models have been developed with the ultimate goal of finding an
intermediate solution between IFC and CityGML. Hijazi et al. (2012) present a model that
integrates the building structure concepts of CityGML with the IFC concepts to provide a
simplified 3D model for maintenance of utility networks. The model is accessed by a simple
application, which allows facility managers to explore and query their electricity and water
facilities (Fig. 7).

Boeters et al. (2015) argue that CityGML should be extended with more indoor LODs to be able
to deal with some of the building taxation issues such as area and volume computation. The
authors propose a new LOD2+, which enriches LOD2 with indoor floor information (Fig. 8). The
floors are volumes; the thickness of exterior walls is taken into consideration. The LOD2+ is
created automatically from LOD2 and additional information about floors and year of building,
which is used to estimate the thickness of the walls.

Monitoring of Indoor Environments
Internet of Things (IoT) will be a key concept in monitoring of indoor environments. The IoT
concentrates on making every physical and virtual “thing” a publisher of information. The IoT
approach enables “things” to publish information once a state change occurs in them or in
predetermined intervals. For instance, in a building that implements the IoT concepts, a door
will publish information such as “I am closed now!” or a light bulb will indicate “I am on at
the moment.” In addition, the “things” will become capable of taking actions based on messages
coming from other “things” or humans. A building will be considered as a living entity, and
applications will require information from the “things” (i.e., real and virtual) and the
“models” (such as CityGML/IndoorGML) in real time. In essence, applications such as Smart
Buildings would require the fusion of information acquired from multiple resources, such as
things, models, virtual objects, and real objects. The efficient monitoring of indoor
environments will be directly proportional to the effectiveness in provision and fusion of
real-time information related to

3D Indoor Models and Their Applications, Fig. 7 Google Earth-based prototype of the 3D facility management
application (Hijazi et al. 2012)

3D Indoor Models and Their Applications, Fig. 8 Example of an LOD2+ building with indoor information about
floors (left) and the same building in reality (right)

indoors. By the utilization of ubiquitous monitoring of indoors, the information regarding
building elements would be available 24/7 regardless of the situation (i.e., emergency or
nonemergency). Building and city dashboard applications would be the main consumers of this
ubiquitous information. Combining semantic information coming from the indoor models with IoT
data provides advantages in answering emergency scene questions such as “would you provide the
average CO2 level in the rooms which are not affected by the fire?” and “would you provide the
number of doors which are open in the floors that are affected by the flood?” As another
example, in a fire response operation, an emergency responder will acquire information from
the sensors located in each floor regarding the spreading of the fire; in response, the
responder can then invoke the web services to interact with IoT nodes, which will then invoke
the actuators to close the doors in certain floors to prevent spreading of the fire to other
floors. Furthermore, machine-to-machine (M2M) autonomous interaction is also possible, and a
sensor can collect information regarding the emergency situation and interact with another IoT
node to perform a preventive action. As another example, sensors in the building can interact
with the actuators to close doors to prevent some parts of the building from being flooded by
water; however, if there are people in these parts of the building, they can be trapped as
they cannot get out. In this situation, the people in the rooms can interact with the IoT
nodes (to control sensors and actuators) to let them out of that building part. IoT provides
unique opportunities for indoor monitoring.

Future Directions

3D indoor models are going to be further explored and adjusted as the demand for indoor
applications is increasing. Research in support of indoor mapping and modeling has been an
active field for over 30 years. 3D indoor modeling research is related to all aspects of
creating digital models of the real world: data acquisition, data structuring, visualization
techniques, applications, and legal issues and standards. The research topics are investigated
by a large group of scientists coming from photogrammetry, computer vision and image analysis,
computer graphics, robotics, laser scanning, and many other technologies. 3D indoor models are
no longer a research area only of engineers, planners, constructors, and designers. GIS
specialists as well as governments, commercial enterprises, and individuals are also beginning
to seek and apply 3D indoor models in their business applications. This reshaping of the users
poses higher requirements to the models and the tools that would use them. There are many
problems before the 3D indoor models become commonly available, standardized, and used for the
development of flexible
[Diagram: existing problems and emerging problems listed under six headings: Acquisition and
Sensors; Data Structures and Modelling; Visualization and Guidance; Navigation; Applications;
Legal Issues and Standards]

3D Indoor Models and Their Applications, Fig. 9 Challenges in indoor mapping and modeling

user-oriented applications. Zlatanova et al. (2013) argue that there are many challenges to 3D
modeling, and they attempted to create an overview of existing and emerging problems (Fig. 9).
These problems can be categorized as related to acquisition and sensors, data structures and
modeling, visualization and guidance, navigation, applications, legal issues, and standards.
Furthermore, many of the challenges in creating 3D models are not new. They are inherited from
current 3D outdoor modeling and applications, for example, sensor fusion, data processing, or
data standards. However, there are many new challenges specific to indoor, such as real-time
data acquisition, simultaneous localization and mapping, integration of BIM and GIS models,
and appropriate 3D graphic user interfaces to avoid the “tunnel” effect in indoor
visualization and interaction. The area of indoor applications will be boosted if cognitive
approaches for navigation, orientation, and localization are developed. In this respect
semantic annotations of 3D indoor models will play a critical role. Further research is
essential in order to develop more functional models for better positioning and navigation
systems.

Cross-References

Emergency Evacuations, Transportation Networks
Indoor Localization
Indoor Positioning
Location-Aware Technologies
Location-Based Services: Practices and Products
Spatiotemporal Data Models

References

Afyouni I, Ray C, Claramunt C (2012) Spatial models for context-aware indoor navigation
systems: a survey. J Spat Inf Sci 4:85–123
Azri S, Isikdag U, Abdul-Rahman A (2012) Automatic generation of 3D indoor models: current
state of the art and new approaches. In: Proceedings of international workshop on
geoinformation advances, Malaysia
Bandi S, Thalmann D (1998) Space discretization for efficient human navigation. Computer
Graphics Forum, vol. 11 (3), pp. 195–206
Becker T, Nagel C, Kolbe TH (2008) A multilayered space-event model for navigation in indoor
spaces. In: Lee J, Zlatanova S (eds) 3D geo-information science. Lecture notes in geoinformation
and 3D Geo-information Sciences, Springer, pp. 60–77. Springer, Berlin/Heidelberg
Becker S, Peter M, Fritsch D, Philipp D, Baier P, Dibak C (2013) Combined grammar for the
modeling of building interiors. ISPRS Ann Photogramm Remote Sens Spat Inf Sci II-4/W1:1–6
Billen R, Cutting-Decelle A-F, Marina O, de Almeida J-P, Caglioni M, Falquet G, Leduc T,
Métral C, Moreau G, Perret J, Rabino G, San Jose R, Yatskiv I, Zlatanova S (2014) 3D city
models and urban information: current issues and perspectives, European COST Action TU0801,
EDP science, 130 p
Boeters RK, Ohori A, Biljecki F, Zlatanova S (2015) Automatically enhancing CityGML LOD2
models with a corresponding indoor geometry. Int J Geogr Inf Sci 29:2248–2268
Chan S, Sohn G, Wang L, Lee W (2013) Dynamic WIFI-based indoor positioning in 3D virtual
world. Int Arch Photogramm Remote Sens Spat Inf Sci XL-4/W4:1–6
El Meouche R, Rezoug M, Hijazi I, Dieter M (2013) Automatic reconstruction of 3D building
models from terrestrial laser scanner data. ISPRS Ann Photogramm Remote Sens Spat Inf Sci
II-4/W1:7–12
Fallah N, Apostolopoulos I, Bekris K, Folmer E (2013) Indoor human navigation systems: a
survey. Interact Comput 25(1):21–33
Funk E, Dooley LS, Boerner A, Griessbach D (2013) Implicit surface modeling from imprecise
point clouds. Int Arch Photogramm Remote Sens Spat Inf Sci XL-4/W4:7–12
Girard G, Côté S, Zlatanova S, Barette Y, St-Pierre J, Van Oosterom P (2011) Indoor pedestrian
navigation using foot-mounted IMU and portable ultrasound range sensors. Sensors
11(8):7606–7624
Hijazi I, Ehlers M, Zlatanova S (2012) NIBU: a new approach to representing and analyzing
interior utility networks within 3D geo-information systems. Int J Digit Earth 5(1):22–42
Liu L, Xu W, Penard W, Zlatanova S (2015) Leveraging spatial model to improve indoor tracking.
In: Fuse T, Nakagawa M (eds) ISPRS Arch Photogramm Remote Sens Spatial Inf Sci, XL-4/W5,
pp 75–80
Montello D (1993) Scale and multiple psychologies of space. In: Frank A, Campari I (eds)
Spatial information theory: a theoretical basis for GIS. Lecture notes in computer science,
vol 716. Springer, Berlin, pp 312–321
Vanclooster A, Viaene P, Van de Weghe N, Fack V, De Maeyer P (2013) Analyzing the
applicability of the least risk path algorithm in indoor space. ISPRS Ann Photogramm Remote
Sens Spat Inf Sci II-4/W1:19–26
Verbree E, Zlatanova S, van Winden K, van der Laan E, Makri A, Taizhou L, Haojun A (2013) To
localise or to be localised with WiFi in the Hubei museum? Int Arch Photogramm Remote Sens
Spat Inf Sci XL-4/W4:31–35
Wohlfeil J, Strackenbrock B, Kossyk I (2013) Automated high resolution 3D reconstruction of
cultural heritage using multi-scale sensor systems and semi-global matching. Int Arch
Photogramm Remote Sens Spat Inf Sci XL-4/W4:37–43
Worboys M (2011) Modelling indoor space. In: Proceedings of the third ACM SIGSPATIAL
international workshop on indoor spatial awareness, Chicago, pp 1–6
Xiong Q, Zhu Q, Zlatanova S, Huang L, Zhou Y, Du Z (2013) Multi dimensional indoor location
information model. Int Arch Photogramm Remote Sens Spat Inf Sci XL-4/W4:11–13
Xu W, Kruminaite M, Onrust B, Liu H, Xiong Q, Zlatanova S (2013) A 3D model based indoor
navigation system for Hubei provincial museum. Int Arch Photogramm Remote Sens Spat Inf Sci
XL-4/W4:51–55
Zlatanova S, Sithole G, Nakagawa M, Zhu Q (2013) Problems in indoor mapping and modelling. Int
Arch Photogramm Remote Sens Spat Inf Sci XL-4/W4:63–68
Zlatanova S, Liu L, Sithole G, Zhao J, Mortari F (2014) Space subdivision for indoor
applications. GISt Report 66, 2014

3D Models

Photogrammetric Products

3D Network Analysis for User Centric Evacuation Systems

Umit Atila, Ismail Rakip Karas, and Yasin Ortakci
Department of Computer Engineering, Karabuk University, Karabuk, Turkey

Introduction

Research on evacuation of high-rise buildings in case of disasters such as fire, terrorist
attacks, indoor air pollution incidents, etc., has become popular in the last decade. In case
of such disasters, people inside the buildings should be evacuated out of the area as soon as
possible. However, organizing a quick and safe evacuation is a difficult procedure due to the
complexity of high-rise buildings and the huge number of people inside such buildings.
Besides, problems such as smoke inhalation, confluence,
panic, and inaccessibility of some exits may arise during the evacuation procedure. Therefore,
an efficient user-centric evacuation system should be developed for quick and safe evacuation
from high and complex buildings.

Routing someone safely to an appropriate exit is only possible with a system that can manage
the 3D topological transportation network of a building. Realizing an evacuation of a building
in such systems, also called navigation systems, by guiding people in real time requires
complex analysis on 3D spatial data.

Interest in 3D navigation systems has increased especially after the 9/11 attacks, and many
researchers concentrated on how a safe and quick evacuation could be realized in case of such
disasters (Lee 2007). Most of the navigation systems operate on 2D data to find and simulate
the shortest path (which is lacking building environment) (Musliman and Rahman 2008).
Therefore, there is a need for different approaches which use the 3D objects and eliminate the
network analysis limitations on multilevel structures (Cutter et al. 2003; Pu and Zlatanova
2005; Kwan and Lee 2005; Zlatanova et al. 2004).

In a study conducted by Kwan and Lee (2005), relative accessibility of the emergency response
between a disaster site and an emergency station in a building was measured. Their results
showed that extending 2D-GIS to 3D-GIS representations of the interiors of high-rise buildings
can improve the overall speed of the rescue process.

Most of the GIS researchers use graph networks for indoor routing and evacuation analysis
(Karas et al. 2006; Jun et al. 2009). While most of the 3D visualization problems have been
solved by CityGML, initial requirements, concepts, frameworks, and applications from a wide
point of view have been represented by some other research (Pu and Zlatanova 2005; Musliman et
al. 2006). However, there is still a lack of implementation of 3D network analysis and
navigation specifically for evacuation purposes.

The objective of this study is to investigate and implement 3D visualization and navigation
techniques and solutions for indoor spaces within 3D GIS. We explain how to perform 3D network
analysis using Oracle Spatial and Graph within a Java-based 3D-GIS implementation. As an
initial implementation, a GUI provides a 3D visualization of a building. A network model based
on CityGML data stores spatial data in Oracle database and then performs network analysis
under different constraints, such as avoiding nodes or links in the network model. All
experiments highlighted in this chapter are performed on the 3D model of the Corporation
Complex in Putrajaya, Malaysia.

Sections “Evacuation Process” and “Evacuation Systems” summarize evacuation process and
evacuation systems, respectively. Section “Visualization of 3D Network Models for Evacuation”
gives examples of visualization of 3D building and network models from the CityGML format.
Section “Representing Network Model in Geo-DBMS” gives some information on storing spatial
data and explains how to create Network Models in Oracle Spatial and Graph. Section “Network
Analysis Tool” introduces a 3D network analysis tool and gives visualized results of 3D
network analysis performed by our proposed 3D-GIS implementation. Section “Simulation of User
Centric Evacuation” elaborates the routing engine integrated in the simulation module and
presents a visualization sample.

Evacuation Process

One of the most dangerous disasters threatening high-rise and complex buildings is fire, in
which most people may lose their lives due to smothering rather than burning. In case of fire
disasters, extraordinary indoor air pollution (EIAP) incidents happen suddenly and cause fatal
consequences such as airlessness, excessive temperature, explosions, and smoke and toxic gas
leakages. Table 1 indicates the number of people who died due to various reasons after a
residential fire incident (Holborn et al. 2003). As can be deduced from Table 1, the major
cause of death was breathing in smoke, followed by a combination of burning and smothering.

There are three main stages in extraordinary indoor air pollution incidents. In the first
stage,
occupants are not affected by smoke, gas, or temperature; therefore, this stage is the most
appropriate stage for evacuation. In the second stage, the occupants are heavily exposed to
smoke, toxic gas, and excessive temperature.

3D Network Analysis for User Centric Evacuation Systems, Table 1 Number of people who died due
to various reasons after a residential fire incident (Holborn et al. 2003)

Reason of death                                      Number of people who lost their lives   Percentage
Inhalation                                           101                                     36
Smothering                                           8                                       3
Burned bronchus                                      8                                       3
Burning                                              53                                      19
Combination of burning and smothering                69                                      25
Others                                               20                                      7
Injuries due to heart attack, stroke, and falling    20                                      7

In previous studies, the behaviors of the occupants during a disaster are analyzed in the two
main stages discussed in the previous paragraph (Purser and Bensilum 2001). The first stage is
the premovement time or response time, and the second stage is the movement time or action
time. Premovement time is defined as the period between the time the alarm system activates
and the time people react to escape from the building. Table 2 compares the main factors that
triggered occupant evacuation in buildings in England and the USA (Wood 1972; Bryan 1977).
This indicates that the effect of alarm systems in initiating people to react is unexpectedly
low.

3D Network Analysis for User Centric Evacuation Systems, Table 2 The main factors that
triggered occupant evacuation in buildings (Wood 1972; Bryan 1977)

The main factors that triggered occupant evacuation   England %   USA %
Smoke                                                 34.0        35.1
Shouting and voices                                   33.0        34.7
Flames                                                15.0        8.1
Noise                                                 9.0         11.2
Alarm                                                 7.0         7.4
Others                                                2.0         2.8

A study conducted by Purser and Bensilum (2001) in a shopping mall indicated that when
occupants are informed by an announcement system, most of the evacuation time was spent on
realizing the need to evacuate, rather than on movement. Figure 1 indicates that the
percentages of realization, response, and reaction times were 65%, 16%, and 19%, respectively.
Therefore, premovement time (realization plus response) is 81% of the total evacuation time.

The studies also indicated that when an alarm system sounds, occupants spend the most critical
time period trying to understand the reason for the alarm rather than evacuating the building.
Also, studies indicated that the occupants give different responses based on the type and
method of the alarm system or the content and time of the announcement (Bryan 2002).
Uncertainty and insufficient information during the event may delay the evacuation procedure.

The second stage of evacuation is movement time or action time. Movement time is the period
between the time people react to escape from the building and the time they get out of the
building or reach some safe place in the building (Purser and Bensilum 2001). Movement time
varies based on two main factors: exit preferences and smoke problems.

Current evacuation systems assume that occupants use the closest exit in a time of emergency
evacuation. Table 3 indicates the results from a study where the preferences of the occupants
were investigated in a building where there was one emergency exit door and one entrance door
located opposite each other. As seen in Table 3, most of the guests used the entrance, which
they were more familiar with (Mawson 1980), while almost all of the occupants used the
emergency exit door. People use the closest exit only if they know the building well (Gwynne
et al. 1999). When the guidance of the evacuation systems is insufficient, people consider
various factors in choosing the evacuation path.
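
The closest-exit assumption mentioned above can be illustrated on a small 3D network. The
following Python sketch is only an illustration under stated assumptions: the node names,
coordinates, links, and the `closest_exit` helper are hypothetical and do not come from the
cited studies or from the implementation described in this entry. It uses Dijkstra's
algorithm, chosen here as a standard shortest-path method, with 3D Euclidean link lengths, and
it can skip blocked nodes such as a smoke-filled hall.

```python
import heapq
import math

# Hypothetical node coordinates (x, y, z) in meters; z separates the floors.
NODES = {
    "room": (0.0, 0.0, 3.0),
    "corridor": (4.0, 0.0, 3.0),
    "east_hall": (7.0, 0.0, 3.0),
    "emergency_exit": (10.0, 0.0, 3.0),
    "stairs_top": (4.0, 6.0, 3.0),
    "stairs_bottom": (4.0, 6.0, 0.0),
    "entrance": (4.0, 20.0, 0.0),
}
# Hypothetical undirected links between nodes.
LINKS = [
    ("room", "corridor"),
    ("corridor", "east_hall"),
    ("east_hall", "emergency_exit"),
    ("corridor", "stairs_top"),
    ("stairs_top", "stairs_bottom"),
    ("stairs_bottom", "entrance"),
]
EXITS = {"entrance", "emergency_exit"}


def closest_exit(start, exits=EXITS, blocked=frozenset()):
    """Return (exit_node, path_length) for the nearest reachable exit."""
    adjacency = {name: [] for name in NODES}
    for a, b in LINKS:
        length = math.dist(NODES[a], NODES[b])  # 3D Euclidean link length
        adjacency[a].append((b, length))
        adjacency[b].append((a, length))

    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > best[node]:
            continue  # stale queue entry
        if node in exits:
            return node, dist  # the first exit settled is the closest one
        for neighbor, length in adjacency[node]:
            if neighbor in blocked:
                continue  # e.g., a smoke-filled or inaccessible node
            candidate = dist + length
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None, math.inf


print(closest_exit("room"))                         # ('emergency_exit', 10.0)
print(closest_exit("room", blocked={"east_hall"}))  # ('entrance', 27.0)
```

In this toy network the emergency exit is closest, but blocking one hall redirects the route to
the more distant entrance, which is the kind of constraint a static closest-exit rule cannot
express.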
3D Network Analysis for User Centric Evacuation Systems, Fig. 1 Occupant behavior time (Purser
and Bensilum 2001)
[Chart: time (min, 0 to 90) for occupants 1 to 11, split into realization time, response time,
and reaction time]
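
As a small worked example of the time shares reported for Fig. 1, the sketch below adds the
realization and response shares to obtain the premovement share and applies it to a total
evacuation time. The function name and the 10-minute total are hypothetical and serve only as
an illustration; the percentages are the ones quoted in the text.

```python
# Shares of total evacuation time quoted in the text (percent).
SHARES = {"realization": 65, "response": 16, "reaction": 19}


def split_evacuation_time(total_minutes, shares=SHARES):
    """Split a total time into (premovement, remaining reaction) minutes."""
    assert sum(shares.values()) == 100
    premovement_pct = shares["realization"] + shares["response"]  # 65 + 16 = 81
    reaction_pct = shares["reaction"]
    return (total_minutes * premovement_pct / 100,
            total_minutes * reaction_pct / 100)


# Hypothetical 10-minute evacuation.
print(split_evacuation_time(10))  # (8.1, 1.9)
```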

3D Network Analysis for User Centric Evacuation Systems, Table 3 Exit preference rates of
people (Sime 1985)

Exit preference        Guest   Occupant   Total
Entrance door          37      1          38
Emergency exit door    24      13         37

3D Network Analysis for User Centric Evacuation Systems, Table 4 Percentages of occupants
returning back due to low sight distance

Visibility (meter)   England (%)   USA (%)
0–2                  29.0          31.8
3–6                  37.0          22.3
7–12                 25.0          22.3
13–30                6.0           17.6
31–36                0.5           1.3
37–45                1.0           0
46–60                0.5           4.7
> 60                 1.0           0

Previous studies reported that when occupants encounter a smoke problem, they keep moving
through the smoke if the sight distance is more than 20 m; however, they hesitate and do not
take the risk when sight distance is less than 20 m (Bryan 1995). Thus, smoke is a serious
problem which affects the movement time in the evacuation process. People slow down in smoke,
and they cannot determine an optimum evacuation path or cannot follow a straight route due to
diminished sight distance (Jin 1976). However, it can sometimes be necessary to pass through a
smoke area for survival. Based on a previous study, Table 4 indicates the percentages of
occupants returning back due to low sight distance in smoke-filled zones (Bryan 1995).

Evacuation Systems

Traditional evacuation systems can be divided into three main groups: sensors to detect heat,
smoke, or radiation; alarm systems to alert people at the early stages of a disaster; and
evacuation lighting to allow occupants to continue to navigate (Fig. 2). Traditional
evacuation systems are not sufficient for safe and quick evacuation of today’s high-rise and
complex buildings (Pu and Zlatanova 2005). These evacuation systems are not flexible due to
their static predefined scenarios. This may guide people to blocked exits or places where
there are gas leakages. Also, traditional evacuation systems become useless when sight
distance is very low due to smoke and electricity cuts. They also provide insufficient
evacuation information, especially for people who are not familiar with the building.

Emergency incidents are not static, but they are dynamic and variable events. However,
traditional evacuation instructions are generally
[Diagram components: detectors and sensors, control room, alarm devices, evacuation lights,
evacuating people]

3D Network Analysis for User Centric Evacuation Systems, Fig. 2 The components of current evacuation systems
(Pu and Zlatanova 2005)

insufficient in a dynamic evacuation process. In an emergency, people spend most of the time
not reacting or taking action but realizing the event before starting to move. Uncertainty at
the time of the emergency and the lack of clear information about the incident are factors in
delaying the evacuation of the building. Therefore, a system that can provide understandable
and clear information to all users in real time and resolve their concerns will definitely
shorten the evacuation process. Such an ideal system is a smart evacuation system that can
avoid congestion by allocating traffic across the available routes or guide people away from
areas of risk (smoky and dangerous) in case of necessity. Therefore, an ideal evacuation
system allows people to progress rapidly without hesitation and without the need for
determining the route themselves. To realize an ideal indoor evacuation system, a number of
main functionalities should be addressed. These functionalities are listed as follows:

• A spatial database for the management of the building and network models.
• 3D-GIS-based routing engine centralized in an appropriate host.
• Mobile-based navigation software for passing user-related data to the host and for
  presenting routing instructions to the user clearly.
• An accurate 3D indoor positioning system.
• Well-organized wireless communication and sensor network architectures inside the building.

In the rest of this topic, we will concentrate on formalization of 3D building and network
models within 3D GIS needed to construct a dynamic evacuation system and present a shortest
path analysis and various evacuation simulation examples.

Visualization of 3D Network Models for Evacuation

A Java-based 3D-GIS implementation has been developed that is able to visualize a 3D building
model and perform network analysis on the network model of the building. The implementation
uses citygml4j Java class library and API for
facilitating work with the CityGML and JOGL Java bindings for the OpenGL graphic library to
carry out visualization of 3D spatial objects.

CityGML is introduced as one of the international standards for representing and exchanging
spatial data, making it easier to visualize, store, and manage 3D city models data
efficiently. CityGML is able to represent 3D city models in five well-defined Levels of Detail
(LOD), namely, LOD0 to LOD4. The accuracy and structural complexity of the 3D objects increase
with the LOD level, where LOD0 is the simplest LOD with a two-and-a-half-dimensional Digital
Terrain Model, while LOD4 is the most complex LOD including architectural details with
interior structures. LOD1 is the well-known blocks model comprising prismatic buildings with
flat roofs. Differentiated from LOD1, LOD2 has roof structures. LOD3 denotes architectural
models with detailed wall and roof structures and balconies (Gröger et al. 2008).

The implemented system reads CityGML datasets from LOD0 to LOD2. 3D building models are
represented in LOD2 described by polygons using the Building Module of CityGML (Fig. 3).
Network models are represented as linear networks in LOD0 using CityGML’s Transportation
Module (Fig. 4).

Representing Network Model in Geo-DBMS

While CityGML is used to store and visualize 3D spatial objects, a graph model managed in a
geo-database management system (DBMS) is used to perform 3D network analysis. Oracle Spatial
and Graph is one of the most powerful geo-DBMS, which offers a combination of geometry models
and graph models (Murray 2009).

Oracle Spatial and Graph maintains a combination of geometry and graph models within the
Network Data Model. A spatial network consists of nodes and links which are SDO_GEOMETRY
objects representing points and lines, respectively (Kothuri et al. 2010).

Network support in the Oracle database is composed of the following elements:

• A data model to store networks inside the database as a set of network tables: This is the
  persistent copy of a network.
• SQL functions to define and maintain networks (i.e., the SDO_NET package).
• Network analysis functions in Java programming language: The Java API works on a copy of
  the network loaded from the database. This is the volatile copy of the network.

3D Network Analysis for User Centric Evacuation Systems, Fig. 3 Building model (textured viewing mode)

3D Network Analysis for User Centric Evacuation Systems, Fig. 4 Network model

3D Network Analysis for User Centric Evacuation Systems, Fig. 5 Oracle network data model

• Network analysis functions in PL/SQL (the SDO_NET_MEM package).

Figure 5 illustrates the relationship between the elements of the Oracle Network Model (Kothuri et al. 2010).

To define a network in Oracle Spatial and Graph, at least two tables should be created: a node table and a link table. These tables should be provided with the proper structure and content to model the network. A node table (see Table 5) describes all nodes in the network. Each node has a unique numeric identifier (the NODE_ID column). A link table (see Table 6) describes all links in the network. Each link has a unique numeric identifier (the LINK_ID column) and contains the identifiers of the two nodes it connects (Kothuri et al. 2010).

3D Network Analysis for User Centric Evacuation Systems, Table 5 Example entry in node table in network model
NODE_ID        230
NODE_NAME      NODE-230
GEOMETRY       MDSYS.SDO_GEOMETRY(3001, NULL, MDSYS.SDO_POINT_TYPE(42.2019449799705, 100.382921548946, 3.7), NULL, NULL)
ACTIVE         Y

3D Network Analysis for User Centric Evacuation Systems, Table 6 Example entry in link table in network model
LINK_ID        15
START_NODE_ID  452
END_NODE_ID    455
LINK_NAME      Link-452-455-Corridor
GEOMETRY       MDSYS.SDO_GEOMETRY(3002, NULL, NULL, MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1), MDSYS.SDO_ORDINATE_ARRAY(115.306027729301, 85.9775129777152, 1.8, 115.306027729301, 82.9483382781573, 1.8))
LINK_LENGTH    3.029174699557899
ACTIVE         Y
LINK_TYPE      Corridor

In this study, as we define a spatial network containing both connectivity and geometric information, we use SDO_GEOMETRY for representing points and lines.

To complete the network creation process, Oracle Spatial and Graph needs a metadata table called USER_SDO_NETWORK_METADATA (see Table 7) to ensure the table structures are consistent with the metadata. The metadata table USER_SDO_NETWORK_METADATA describes the elements that compose a network, such as the names of the tables and the names of the optional columns.

There are two ways to create a network: automatically, using the CREATE_SDO_NETWORK function of the SDO_NET package, or manually. The CREATE_SDO_NETWORK function creates all the structures of a network, but it is not flexible, as it gives very little control over the actual structuring of the tables. The sample code given below illustrates creation of the CORPORATION_PUTRAJAYA network with explicit table and column names.

SQL> BEGIN
       SDO_NET.CREATE_SDO_NETWORK (
         NETWORK                => 'CORPORATION_PUTRAJAYA',
         NO_OF_HIERARCHY_LEVELS => 1,
         IS_DIRECTED            => FALSE,
         NODE_TABLE_NAME        => 'CORP_NETWORK_NODE',
         NODE_GEOM_COLUMN       => 'GEOMETRY',
         NODE_COST_COLUMN       => NULL,
         LINK_TABLE_NAME        => 'CORP_NETWORK_LINK',
         LINK_GEOM_COLUMN       => 'GEOMETRY',
         LINK_COST_COLUMN       => 'LINK_LENGTH'
       );
     END;

The alternative way is to create the network tables manually. When defining a network manually, one has to create all needed tables and insert proper data into the tables using SQL statements. Manual creation gives total flexibility over the table structures, but one must ensure that the table structures are consistent with the metadata.

3D Network Analysis for User Centric Evacuation Systems, Table 7 Example entry in USER_SDO_NETWORK_METADATA view
NETWORK                 CORPORATION_PUTRAJAYA
NETWORK_CATEGORY        SPATIAL
GEOMETRY_TYPE           SDO_GEOMETRY
NO_OF_HIERARCHY_LEVELS  1
NO_OF_PARTITIONS        1
LINK_DIRECTION          UNDIRECTED
NODE_TABLE_NAME         CORP_NETWORK_NODE
NODE_GEOM_COLUMN        GEOMETRY
NODE_COST_COLUMN        NULL
LINK_TABLE_NAME         CORP_NETWORK_LINK
LINK_GEOM_COLUMN        GEOMETRY
LINK_COST_COLUMN        LINK_LENGTH
PATH_TABLE_NAME         CORP_NETWORK_PATH
PATH_LINK_TABLE_NAME    CORP_NETWORK_PATH_LINK
PATH_GEOM_COLUMN        GEOMETRY

Network Analysis Tool

In this section, we present our implementation, whose network analysis tool is based on a Java API provided by the Network Data Model of Oracle Spatial and Graph. The Java API, which resides in a package called oracle.spatial.network, is very rich and provides a range of analysis functions. Our network analysis tool supports the most common 3D network analyses offered by Oracle Spatial and Graph. With this tool, it is also possible to perform these analyses with full functionality, including constraints, and to see the results on a 3D graphical screen. In this section, a shortest path example is presented. Figure 6 shows a UML diagram that summarizes the network analysis process.

The analysis functions are provided by the methods of the NetworkManager class. These methods operate on the volatile copy of the network. Therefore, the first step in the network analysis tool is to load the network from the database. The following loads the complete network named CORPORATION_PUTRAJAYA from the database in read-only mode.

Network corporation_Putrajaya = NetworkManager.readNetwork
    (dbConnection, "CORPORATION_PUTRAJAYA");

If we want to define a set of constraints for any of the analysis methods to limit the search space, we simply define a SystemConstraint object. The SystemConstraint class allows defining constraints such as MaxCost, MaxDepth, MaxDistance, MaxMBR, MustAvoidNodes, and MustAvoidLinks. Once we define the SystemConstraint object, we can pass it as the last parameter to any of the analysis methods of the NetworkManager class. The following sets a constraint to avoid the use of the links identified by 6012, 6013, and 6014 in the network.

// Identifiers of the links that must not be used
Vector avoidLinks = new Vector();
avoidLinks.add("6012");
avoidLinks.add("6013");
avoidLinks.add("6014");
SystemConstraint myConstraint = new SystemConstraint
    (corporation_Putrajaya);
myConstraint.setMustAvoidLinks(avoidLinks);
3D Network Analysis for User Centric Evacuation Systems, Fig. 6 UML diagram summarizing network analysis process
[Diagram steps: read CityGML data and visualize 3D building model (citygml4j Java library); create 3D network model in Oracle Spatial; return network model in CityGML format; open network analysis form; perform spatial analysis (SDONM.JAR Java library); return analysis result to graphical display; open routing simulation form; perform routing simulation on 3D network model; return routing instructions]

A fundamental operation on a network is to find the shortest path between two nodes. The shortestPath() method returns the best path between two nodes in a network. This method takes the network object on which we perform the analysis and the start and end nodes. The best path between two nodes is the one with the smallest cost. The cost of a node or a link is defined in the tables with numeric values. The cost can represent anything, such as the length of a link or the time to travel along that link. If there is no cost column in the tables, then all links are considered to have a cost of 1 and nodes have a cost of 0. As stated in the previous section, in this study we use link lengths as costs.

The shortestPath() method returns a Path object. A number of methods extract various pieces of information from a Path object, such as the cost of the path, the number of links, and an array of Link objects from which further information can be extracted. The following finds the shortest path between nodes 3059 and 3368 on the network, applying the constraint defined above, and then prints various information on the found path.

Path foundPath = NetworkManager.shortestPath(corporation_Putrajaya,
    3059, 3368, myConstraint);
System.out.println("Path cost is " + foundPath.getCost());
System.out.println("Number of links " + foundPath.getNoOfLinks());
Link[] linkArray = foundPath.getLinkArray();
Node[] nodeArray = foundPath.getNodeArray();

// Print identifier, name, and cost of every link on the path
for (int i = 0; i < linkArray.length; i++)
    System.out.println(" Link " + linkArray[i].getID() + "\t"
        + linkArray[i].getName() + "\t" + linkArray[i].getCost());
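To make the effect of a MustAvoidLinks-style constraint concrete, the following is a minimal, self-contained sketch of a shortest-path search that skips a set of avoided link identifiers. It is not the oracle.spatial.network implementation: the class AvoidLinkShortestPath, its Link type, and the small four-node network are hypothetical, and Dijkstra's algorithm is assumed here as the search technique, with link lengths as costs and node costs of 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical illustration; not part of the Oracle Network Data Model API.
final class AvoidLinkShortestPath {

    // Undirected link with an identifier and a non-negative cost (e.g., LINK_LENGTH).
    static final class Link {
        final int id, startNode, endNode;
        final double cost;
        Link(int id, int startNode, int endNode, double cost) {
            this.id = id; this.startNode = startNode; this.endNode = endNode; this.cost = cost;
        }
    }

    // Queue entry: a node together with the cost of the path that reached it.
    private static final class Entry {
        final int node;
        final double cost;
        Entry(int node, double cost) { this.node = node; this.cost = cost; }
    }

    // Returns the node IDs from start to end, or an empty list if end is unreachable.
    static List<Integer> shortestPath(List<Link> links, int start, int end, Set<Integer> avoidLinkIds) {
        Map<Integer, List<Link>> adjacency = new HashMap<>();
        for (Link link : links) {
            if (avoidLinkIds.contains(link.id)) continue; // constraint: this link must not be used
            adjacency.computeIfAbsent(link.startNode, k -> new ArrayList<>()).add(link);
            adjacency.computeIfAbsent(link.endNode, k -> new ArrayList<>()).add(link);
        }
        Map<Integer, Double> best = new HashMap<>();
        Map<Integer, Integer> previous = new HashMap<>();
        PriorityQueue<Entry> queue = new PriorityQueue<>((a, b) -> Double.compare(a.cost, b.cost));
        best.put(start, 0.0);
        queue.add(new Entry(start, 0.0));
        while (!queue.isEmpty()) {
            Entry current = queue.poll();
            if (current.cost > best.get(current.node)) continue; // stale queue entry
            if (current.node == end) break;
            for (Link link : adjacency.getOrDefault(current.node, Collections.<Link>emptyList())) {
                int next = (link.startNode == current.node) ? link.endNode : link.startNode;
                double candidate = current.cost + link.cost;
                if (candidate < best.getOrDefault(next, Double.POSITIVE_INFINITY)) {
                    best.put(next, candidate);
                    previous.put(next, current.node);
                    queue.add(new Entry(next, candidate));
                }
            }
        }
        if (!best.containsKey(end)) return Collections.emptyList();
        LinkedList<Integer> path = new LinkedList<>();
        for (Integer node = end; node != null; node = previous.get(node)) path.addFirst(node);
        return path;
    }

    public static void main(String[] args) {
        // Hypothetical network: two routes from node 1 to node 4.
        List<Link> links = Arrays.asList(
            new Link(10, 1, 2, 1.0), new Link(11, 2, 4, 1.0),
            new Link(12, 1, 3, 2.0), new Link(13, 3, 4, 2.0));
        System.out.println(shortestPath(links, 1, 4, new HashSet<Integer>()));
        System.out.println(shortestPath(links, 1, 4, new HashSet<>(Arrays.asList(11))));
    }
}
```

In this sketch the avoided links are simply left out of the adjacency structure before the search starts, so the search falls back to the longer route through node 3 once link 11 is excluded. This mirrors the recalculated route described for the avoided elevator links, but only as an analogy.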

3D Network Analysis for User Centric Evacuation Systems, Fig. 7 Shortest path between two nodes without any constraint

3D Network Analysis for User Centric Evacuation Systems, Fig. 8 Recalculated shortest path considering avoided elevators in a part of building

Figure 7 shows the shortest path analysis result on a graphical screen without any constraint. The found path follows nodes 3059-3067-3066-3366-3367-3359-3365-3368. Figure 8 shows how the shortest path is updated after the links associated with one of the elevators are avoided; the avoided links are shown by red lines, which means that the elevator is no longer in use. The updated path to the destination follows nodes 3059-3065-3070-3069-3369-3370-3365-3368.

3D Network Analysis for User Centric Evacuation Systems, Fig. 9 Routing simulation process of the instruction engine-Scene-1 (The red point is the user)

Simulation of User Centric Evacuation

Our implementation has an instruction engine, integrated into the simulation module, that produces voice commands and visual instructions to assist users dynamically on the way to their destination. In the simulation stage, a floating cursor moves over the path to simulate a walking person following the given instructions. In this procedure, first, the turns and the descending and ascending ways are calculated between nodes and floors, and instructions are defined with respect to these calculations. Then, according to the path and the calculations, the moving person is simulated on the screen by a floating cursor over the path line. Guiding instructions are spoken by the computer using text-to-speech algorithms and written on the screen (Figs. 9, 10 and 11).

3D Network Analysis for User Centric Evacuation Systems, Fig. 10 Routing simulation process of the instruction engine-Scene-2

3D Network Analysis for User Centric Evacuation Systems, Fig. 11 Routing simulation process of the instruction engine-Scene-3

Conclusions

Modern buildings are designed higher and more complex than ever before, which makes them vulnerable to many potential disasters such as terrorist bombings, fire, and toxic gas leakage. Considering the complexity of modern buildings and the great numbers of people inside, it is rather difficult to organize a quick emergency evacuation.

Many evacuation systems have been developed to minimize losses in such disasters. 3D geo-information has been widely used in disaster management phases such as mitigation, preparedness, and recovery. However, it has not really been applied to the response phase under extraordinary circumstances.

In this study, some samples of performing 3D network analysis with visualized results, supporting both graph-based and geometric constraints, were presented. It has been shown how Oracle Spatial and Graph can be a powerful geo-DBMS for realizing 3D network analysis and developing evacuation systems that provide dynamic, specific, and accurate evacuation guidance based on indoor geo-information.

We also presented a simulation module that produces voice commands and visual instructions for assisting people dynamically on the way to the destination. The instruction engine presented in this study for simulating evacuation is intended to be the infrastructure of a voice-enabled mobile navigation system for indoor spaces in our work currently in progress. In our future study, we intend to design an intelligent user-centric evacuation model based on neural networks for high-rise building fires, in which we will consider the physical conditions of the environment and the properties of the person who requests to be evacuated and produce personalized instructions in real time.

Acknowledgements This study was supported by TUBITAK, The Scientific and Technological Research Council of Turkey, research grant [grant number: 112Y050]. We are indebted for its financial support.
References

Bryan J (1977) Smoke as a determinant of human behaviour in fire situations: project people. NBS GCR 77–94, National Bureau of Standards, Washington, DC
Bryan J (1995) Behavioural response to fire and smoke. In: SFPE, handbook of fire protection engineering, vol 3, 2nd edn. National Fire Protection Association, Quincy, pp 241–262
Bryan J (2002) Human behaviour in fire. Fire Prot Eng 16:4–16
Cutter S, Richardson DB, Wilbanks TJ (eds) (2003) The geographical dimensions of terrorism. Routledge, New York/London, pp 75–117
Gröger G, Kolbe TH, Czerwinski A, Nagel C (2008) OpenGIS city geography markup language (CityGML) encoding standard: OGC 08-007r1. Open Geospatial Consortium Inc., 7–10, 22–25, 56–62, 77–79
Gwynne S, Galea ER, Lawrence PJ, Filippidis L (1999) A review of the methodologies used in the computer simulation of evacuation from the built environment. Build Environ 34:741–749
Holborn PG, Nolan PF, Golt J (2003) An analysis of fatal unintentional dwelling fires investigated by London Fire Brigade between 1996 and 2000. Fire Saf J 38(1):1–42
Jin T (1976) Visibility through fire smoke – Part 5, allowable smoke density for escape from fire. Report No. 42, Fire Research Institute of Japan
Jun C, Kim H, Kim G (2009) Developing an indoor evacuation simulator using a hybrid 3D model. In: Lee J, Zlatanova S (eds) 3D Geo-information science. Springer, Berlin, pp 173–178
Karas IR, Batuk F, Akay AE, Baz I (2006) Automatically extracting 3D models and network analysis for indoors. In: Abdul-Rahman A, Zlatanova S, Coors V (eds) Innovation in 3D-geo information system. Springer, Berlin, pp 395–404
Kothuri R, Godfrind A, Beinat E (2010) Pro Oracle spatial for Oracle database 11g. Apress, New York
Kwan MP, Lee J (2005) Emergency response after 9/11: the potential of real-time 3D GIS for quick emergency response in micro-spatial environments. Comput Environ Urban Syst 29:93–113
Lee J (2007) A three-dimensional navigable data model to support emergency response in microspatial built-environments. Ann Assoc Am Geogr 97(3):512–529
Mawson A (1980) Is the concept of panic useful for scientific purposes? In: Second international seminar on human behavior in fire emergencies. National Bureau of Standards, Washington, DC
Murray C (2009) Oracle spatial developer's guide, 11g release 1 (11.1). Oracle. pp 4–8
Musliman IA, Rahman AA (2008) Implementing 3D network analysis in 3D GIS. In: International archives of ISPRS, Beijing, vol 37, Part B, Comm. 4/4
Musliman IA, Rahman AA, Coors V (2006) 3D navigation for 3D-GIS – initial requirements. In: Abdul-Rahman A, Zlatanova S, Coors V (eds) Innovations in 3D geo information systems. Springer, Berlin/New York, pp 125–134
Pu S, Zlatanova S (2005) Evacuation route calculation of inner buildings. In: van Oosterom PJM, Zlatanova S, Fendel EM (eds) Geo-information for disaster management. Springer, Heidelberg, pp 1143–1161
Purser DA, Bensilum M (2001) Quantification of a behaviour for engineering design standards and escape time calculations. Saf Sci 38:157–182
Sime JD (1985) Movement towards the familiar person and place affiliation a fire entrapment setting. Environ Behav 17:697–724
Wood PG (1972) The behaviour of people in fires. Fire research note, No. 953, Building Research Establishment. Fire Research Station, Borehamwood, p 113
Zlatanova S, van Oosterom P, Verbree E (2004) 3D technology for improving disaster management: geo-DBMS and positioning. In: Proceedings of the XXth ISPRS congress, Istanbul

3-D RDBMS

► Validation of Three-Dimensional Geometries

3D Spatial Indexing

► 3D Crisp Clustering of Geo-Urban Data

3-Value Indeterminacy

► Objects with Broad Boundaries

4IM

► Dimensionally Extended Nine-Intersection Model (DE-9IM)

4-Intersection Calculus

► Mereotopology
9IM

► Dimensionally Extended Nine-Intersection Model (DE-9IM)

9-Intersection Calculus

► Mereotopology

A

A* Algorithm

► Fastest-Path Computation

Absolute Positional Accuracy

► Positional Accuracy Improvement (PAI)

Abstract Features

► Feature Extraction, Abstract

Abstract Representation of Geographic Data

► Feature Catalogue

Abstraction

► Hierarchies and Level of Detail

Abstraction of Geodatabases

Monika Sester
Institute of Cartography and Geoinformatics, Leibniz University of Hannover, Hannover, Germany

Synonyms

Cartographic generalization; Conceptual generalization of databases; Geographic data reduction; Model generalization; Multiple resolution database

Definition

Model generalization is used to derive a simpler and easier to handle digital representation of geometric features (Grünreich 1995). It is applied mainly by National Mapping Agencies to derive different levels of representation with less detail from their topographic data sets, usually called Digital Landscape Models (DLMs). Model generalization is also called geodatabase abstraction, as it relates to generating a simpler digital representation of geometric objects in a database, leading to a considerable data reduction. The simplification refers to both the thematic diversity and the geometric complexity of the objects. Among the well-known map generalization operations, the following subset is used for model generalization: selection, (re-)classification, aggregation, and area collapse. Sometimes, the reduction in the number of points used to represent a geometric feature is also applied in the model generalization process, although this is mostly considered a problem of cartographic generalization. This is achieved by line generalization operations.

Historical Background

Generalization is a process that has been applied by human cartographers to generate small scale maps from detailed ones. The process is composed of a number of elementary operations that have to be applied in accordance with each other in order to achieve optimal results. The difficulty is the correct interplay and sequencing of the operations, which depends on the target scale, the type of objects involved, as well as the constraints these objects are embedded in (e.g., topological constraints, geometric and semantic context, ...). Generalization is always subjective and requires the expertise of a human cartographer (Spiess 1995). In the digital era, attempts to automate generalization have led to the differentiation between model generalization and cartographic generalization, where the operations of model generalization are considered to be easier to automate than those of cartographic generalization.

After model generalization has been applied, the thematic and geometric granularity of the data set corresponds appropriately to the target scale. However, there might be some geometric conflicts remaining that are caused by applying signatures to the features as well as by imposing minimum distances between adjacent objects. These conflicts have to be solved by cartographic generalization procedures, among which typification and displacement are the most important (for a comprehensive overview, see Mackaness et al. 2007). As opposed to cartographic generalization, model generalization processes have already achieved a high degree of automation. Fully automatic processes are available that are able to generalize large data sets, e.g., the whole of Germany (Urbanke and Dieckhoff 2006).

Scientific Fundamentals

Operations of model generalization are selection, re-classification, aggregation, area collapse, and line simplification.

Selection
According to a given thematic and/or geometric property, objects are selected which are to be preserved in the target scale. Typical selection criteria are object type, size, or length. Objects fulfilling these criteria are preserved, whereas the others are discarded. In some cases, when an area partitioning of the whole data set has to be preserved, the deleted objects have to be replaced appropriately by neighboring objects.

Re-Classification
Often, the thematic granularity of the target scale is also reduced when reducing the geometric scale. This is realized by reclassification or new classification of object types. For example, in the German ATKIS system, when going from scale 1:25.000 to 1:50.000, the variation of settlement structures is reduced by merging two different settlement types into one class in the target scale.

Area Collapse
When going to smaller scales, higher-dimensional objects may be reduced to lower-dimensional ones. For instance, a city represented as an area is reduced to a point; an areal river is reduced to a linear river object. These reductions can be achieved using skeleton operations. For the area-to-line reduction, the use of the Medial Axis is popular, which is defined as the locus of points that have more than one closest neighbor on the polygon boundary. There are several approximations and special forms of axes (e.g., Straight Skeleton (David and Erickson 1998)). Depending on the object and the task at hand, there are forms that may be more favorable than others (e.g., Chin et al. 1995 and Haunert and Sester 2007).
Abstraction of Geodatabases, Fig. 1 Different aggregation methods
[Panels: Definition (dark object -> light gray object); Neighbours (object -> max_neighbors); Size (object -> biggest neighbor); All neighbors (object -> equal distribution to all neighbors)]

Aggregation
This is a very important operation that merges two or more objects into a single one, thus leading to a considerable amount of data reduction. Aggregation often follows a selection or area collapse process: when an object is too small (or unimportant) to be presented in the target scale, it has to be merged with a neighboring object. For the selection of the most appropriate neighbor, there are different strategies (see Fig. 1), e.g., selecting the neighbor according to thematic priority rules, the neighbor with the longest common boundary, or the largest neighbor, or distributing the area equally among the neighbors (Haunert and Sester 2007; van Oosterom 1995; Podrenek 2002; van Smaalen 2003). Another criterion is to select a neighbor which leads to a compact aggregated region and to solve the whole problem as a global optimization process (Haunert and Wolff 2006). Aggregation can also be performed when the objects are not topologically adjacent. Then, appropriate criteria for the determination of the neighborhood are needed as well as measures to fill the gaps between the neighboring polygons (Bundy et al. 1995). Aggregation can also be applied to other geometric features such as points and lines. This leads to point aggregations that can be approximated by convex hulls, or to aggregations of line features.

Line Simplification
Line simplification is a very prominent generalization operation. Many operations have been proposed, mainly taking the relative distance between adjacent points and their relative context into account. The most well-known operator is the Douglas-Peucker algorithm (Douglas and Peucker 1973).

Key Applications

The key application of database abstraction or model generalization is the derivation of less detailed data sets for different applications.

Cartographic Mapping
The production of small scale maps requires a detailed data set to be reduced in number and granularity of features. This reduction is achieved using database abstraction. It has to be followed by cartographic generalization procedures that are applied in order to generate the final symbolized map without graphical conflicts.
38 Abstraction of Geodatabases

Visualization on Small Displays isting data sets (Hampe et al. 2004). Although
The size of mobile display devices requires the different approaches already exist, there is still
presentation of a reduced number of features. research needed to fully exploit this data structure
To this end, the data can be reduced using data (Sheeren et al. 2004).
abstraction processes.

Internet Mapping: Streaming Data Update


Generalization An MRDB in principle offers the possibility of
Visualization of maps on the internet requires ef ciently keeping the information in linked data
the transmission of an appropriate level of detail sets up-to-date. The idea is to exploit the link
to the display of the remote user. To achieve structure and propagate the updated information
an adequate data reduction that still ensures that to the adjacent and linked scales. There are sev-
the necessary information is communicated to eral concepts for this, however, the challenge is to
the user, database abstraction methods are used. restrict the in uence range to a manageable size
Also, it allows for the progressive transmission (Haunert and Sester 2005).
of more and more detailed information (Brenner
and Sester 2005; Yang 2005).
Cross-References
Spatial Data Analysis
Spatial analysis functions usually relate to a cer-  Generalization, On-the-Fly
tain level of detail where the phenomena are  Hierarchies and Level of Detail
best observed, e.g., for planning purposes, a scale  Map Generalization
of approximately 1:50.000 is very appropriate.  Mobile Usage and Adaptive Visualization
Database abstraction can be used to generate this  Voronoi Diagram
scale from base data sets. The advantage is that  Web Mapping and Web Cartography
the level of detail is reduced while still preserving
the geometric accuracy.
References
Future Directions
Balley S, Parent C, Spaccapietra S (2004) Modelling
geographic data with multiple representation. Int J
MRDB: Multiple Resolution Database Geogr Inf Sci 18(4):327 352
For topographic mapping, often data sets of dif- Brenner C, Sester M (2005) Continuous generalization
ferent scales are provided by Mapping Agen- for small mobile displays. In: Agouris P, Croitoru A
(eds) Next generation geospatial information. Taylor
cies. In the past, these data sets were typically
& Francis, Hoboken, pp 33 41
produced manually by generalization processes. Bundy G, Jones C, Furse E (1995) Holistic generaliza-
With the availability of automatic generalization tion of large-scale cartographic data. In: M ller JC,
tools, such manual effort can be replaced. In order Lagrange JP, Weibel R (eds) GIS and generalization
methodology and practice. Taylor & Francis, London,
to make additional use of this lattice of data sets,
pp 106 119
the different scales are stored in a database where Chin FY, Snoeyink J, Wang CA (1995) Finding the medial
the individual objects in the different data sets are axis of a simple polygon in linear time. In: Springer
connected with explicit links. These links then (ed) ISAAC 95: proceedings of the 6th international
symposium on algorithms and computation, London,
allow for an ef cient access of the corresponding
pp 382 391
objects in the neighboring scales, and thus an David E, Erickson J (1998) Raising roofs, crashing cycles,
ease of movement up and down the different and playing pool: applications of a data structure for
scales. There are several proposals for appro- nding pairwise interactions. In: SCG 98: proceedings
of the 14th annual symposium on computational ge-
priate MRDB data structures, see e.g., Balley
ometry, Minneapolis, pp 58 67
et al. (2004). The links can be created either in Douglas D, Peucker T (1973) Algorithms for the reduc-
the generalization process or by matching ex- tion of the number of points required to represent
Accident Impact Prediction 39

a digitized line or its caricature. Can Cartogr 10(2):


112 122 Access Control
Gr nreich D (1995) Development of computer-assisted
generalization on the basis of cartographic model the-  Privacy Threats in Location-Based Services
A
ory. In: M ller JC, Lagrange JP, Weibel R (eds) GIS
and generalization methodology and practice. Taylor
& Francis, London, pp 47 55
Hampe M, Sester M, Harrie L (2004) Multiple repre-
sentation databases to support visualisation on mobile Access Method
devices. In: International archives of photogrammetry,
remote sensing and spatial information sciences, IS-
PRS, Istanbul, vol 35  3D Crisp Clustering of Geo-Urban Data
Haunert JH, Sester M (2005) Propagating updates be-
tween linked datasets of different scales. In: Proceed-
ings of 22nd international cartographic conference, La
Coruna, pp 9 16 Access Method, High-Dimensional
Haunert JH, Sester M (2007, in press) Area collapse and
road centerlines based on straight skeletons. Geoinfor-
matica  Indexing, X-Tree
Haunert JH, Wolff A (2006) Generalization of land cover
maps by mixed integer programming. In: Proceedings
of 14th international symposium on advances in geo-
graphic information systems, Arlington
Mackaness WA, Sarajakoski LT, Ruas A (2007) Gen- Access Structures for Spatial
eralisation of geographic information: cartographic modelling and applications. Published on behalf of the international cartographic association by Elsevier, Amsterdam
Müller JC, Lagrange JP, Weibel R (eds) (1995) GIS and generalization methodology and practice. Taylor & Francis, London
Podrenek M (2002) Aufbau des DLM50 aus dem BasisDLM und Ableitung der DTK50 – Lösungsansatz in Niedersachsen. Kartographische Schriften Band 6. Kirschbaum Verlag, Bonn, pp 126–130
Sheeren D, Mustière S, Zucker JD (2004) Consistency assessment between multiple representations of geographical databases: a specification-based approach. In: Proceedings of the 11th international symposium on spatial data handling, Leicester
Spiess E (1995) The need for generalization in a gis environment. In: Müller JC, Lagrange JP, Weibel R (eds) GIS and generalization methodology and practice. Taylor & Francis, London, pp 31–46
Urbanke S, Dieckhoff K (2006) The adv-project atkis generalization, part model generalization (in German). Kartographische Nachrichten 56(4):191–196
van Oosterom P (1995) The gap-tree, an approach to on-the-fly map generalization of an area partitioning. In: Müller JC, Lagrange JP, Weibel R (eds) GIS and generalization methodology and practice. Taylor & Francis, London, pp 120–132
van Smaalen J (2003) Automated aggregation of geographic objects. A new approach to the conceptual generalisation of geographic databases. PhD thesis, Wageningen University, The Netherlands
Yang B (2005) A multi-resolution model of vector map data for rapid transmission over the internet. Comput Geosci 31(5):569–578

Constraint Databases

Spatial Constraint Databases, Indexing

Accident Impact Prediction

Cyrus Shahabi1,2,3,4 and Bei (Penny) Pan5
1 Computer Science Department, University of Southern California, Los Angeles, CA, USA
2 Information Laboratory (InfoLab), Computer Science Department, University of Southern California, Los Angeles, CA, USA
3 University of Southern California, Los Angeles, CA, USA
4 Integrated Media Systems Center, University of Southern California, Los Angeles, CA, USA
5 Microsoft Corp., Redmond, WA, USA

Definition

For the first time, real-time high-fidelity spatiotemporal data on the transportation networks of major cities have become available. This gold mine of data can be utilized to learn about the behavior of traffic congestion at different times and locations, potentially
resulting in major savings in time and fuel, two important commodities of the twenty-first century. According to FASANA Motion report (Report 2012), approximately 50% of freeway congestion is caused by nonrecurring issues, such as traffic accidents, weather hazards, special events, and construction zone closures. Hence, it is important to quantify and predict the impact of traffic incidents on the surrounding traffic. This quantification can alleviate the significant financial and time losses attributed to traffic incidents. For example, it can be used by city transportation agencies to provide evacuation plans that eliminate potential gridlock, to dispatch emergency vehicles effectively, or even to support long-term policy-making. Moreover, the predictive information can either be used by a driver directly to avoid potential gridlock or be consumed by a predictive route-planning algorithm (e.g., Demiryurek et al. 2011) to ensure that a driver selects the best route from the start.
The McKinsey report (McK 2011) predicts a worldwide consumer saving of more than $600 billion annually by 2020 for location-based services, where the biggest single consumer benefit will be from time and fuel savings from navigation services tapping into real-time traffic data. Therefore, let us consider a navigation system utilizing a predictive route-planning algorithm as a next-generation consumer navigation system (in-car or on smartphone). We refer to such a system as ClearPath and use it as a motivating application; it can help drivers plan their routes effectively in real time by avoiding the impact areas of incidents. That is, suppose an accident is reported in real time (by crowdsourcing (WAZE 2014) or through agency reports or SIGALERTS (2013)) in front of a driver, but the accident is 20 min away. If we can effectively predict the impact of the accident, ClearPath would know that this accident would be cleared in the next 10 min. ClearPath would therefore guide the driver directly toward the accident because it knows that by the time the driver arrives in the area, there would be no accident.
Problem Definition: For a traffic accident e occurring at time t_0, given two transportation datasets, (1) traffic accident reports and (2) traffic sensor data collected from a historical time stamp until t_0, the following three sets of parameters must be predicted:
(a) The set of road segments that are impacted by the incident: {r_i}.
(b) For each impacted road segment r_i, the significance of the impact (i.e., scale of speed decrease): v_i.
(c) For each impacted road segment r_i, the time stamp when the impact starts: t_i.
In this definition, a sensor refers to a loop detector or any other sensing device built on a road segment. It continuously (e.g., every 30 s) reports readings (e.g., speed) that reflect the traffic situation on road segments. In this problem setting, the traffic situation on a road segment (e.g., impacted or not) is quantified using the readings collected from the sensors located on this segment. Other terms that are seen frequently in this entry are defined as follows:
Impacted Road Segment: For a road segment r_i equipped with a sensor s and a time stamp t (e.g., 8:30 AM), if the speed readings reported by s present an anomalous decrease (e.g., 40% drop) compared with historical daily readings at time t (i.e., the average of all readings collected at 8:30 AM in the dataset), we consider r_i as impacted by traffic events.
Backlog: For a particular accident ev, its backlog (b) refers to the total length of all impacted road segments between ev's location and the last impacted road segment, along the opposite direction of vehicle flow.
Propagation Behavior: Given a traffic accident (ev) that occurred at time t_0, ev's propagation behavior is defined as the time series of backlog (b) after t_0 and until it propagates to the maximum backlog. Assuming ev reaches the maximum backlog after t time units, its propagation behavior is represented as the vector b, or {b_0, b_1, ..., b_t}, where the subscript i of b_i represents the time unit after t_0.

Historical Background

Several disciplines, such as transportation science, civil engineering, policy planning, and
operations research have studied the traffic congestion problem through mathematical models, simulation studies, and field surveys. However, due to the recent sensor instrumentation of road networks in major cities as well as the vast availability of auxiliary commodity sensors from which traffic information can be derived (e.g., CCTV cameras, GPS devices), for the first time a large volume of real-time traffic data at very high spatial and temporal resolutions has become available. While this is a gold mine of data, the most popular utilization of this data is to simply visualize and utilize the current real-time traffic congestion on online maps, car navigation systems, sig-alerts, or mobile applications. However, the most useful application of this data is to predict the traffic ahead of you during the course of a commute to avoid traffic congestion, especially in the presence of traffic accidents.
In the last decade, most of the studies on accident impact prediction have been based on theoretical modeling and simulations, which can be classified into three groups: (1) deterministic queuing theory or shock wave theory (e.g., Lawson et al. (1997) and Wirasinghe (1978)), (2) heuristic methods and simulations (e.g., Pal and Sinha 2002), and (3) microscopic modeling of driver's behavior (e.g., Daganzo (1994) and Wang and Murray-Tuite (2010)). However, the outcome of these studies relies on theoretical simulations of road network traffic instead of traffic data collected in the real world. Also, none of these studies uses a source of incident data with description variables and reporting techniques, and their spatial transferability is limited.
When working with real-world data, it is important to identify certain characteristics of traffic data, such as temporal patterns of rush hours or the spatial impacts of accidents, which need to be incorporated into a data-mining technique to make the prediction much more accurate. For example, for generic time series, the observations made in the immediate past are usually a good indication of the short-term future. However, for traffic time series, this is not true at the beginning of a traffic accident. Specifically, for accident impact prediction, it is necessary to predict the sudden speed changes caused by traffic accidents in a faraway future (e.g., the next 30 min). In fact, the occurrence of most accidents involves two phenomena: (1) abrupt speed changes, for example, it is very common for the traffic speed to drop 60% when an accident occurs on freeways in LA, and (2) long-lasting propagation of the speed changes, for example, a sensor closer to the accident may report a speed decrease in the 3rd min after its occurrence, and a sensor further away may report a similar decrease in the 30th min. Since traditional prediction approaches rely on the immediate past data to predict the future, they cannot effectively predict the abrupt speed changes and how they propagate over a long term. Hence, navigation systems that rely on these approaches can hardly navigate drivers around the accident impact area.

Scientific Fundamentals

For the motivating navigation application, ClearPath, to be effective, it is essential to predict specific values of speed changes and backlog lengths over the lifetime (i.e., temporal) and impact area (i.e., spatial) of an accident. In particular, the following three aspects need to be considered:
First, the numeric values of speed changes and backlog lengths. There are two major approaches to measure the impact of accidents: (1) qualitative approaches (i.e., classifying an accident's impact into conceptual categories such as "severe" or "non-severe" and "significant delay" or "slight delay") and (2) quantitative approaches (i.e., providing numeric measurements such as 45% speed decrease and 3.2 miles of congested backlog). In the past, most studies focused on qualitative approaches for measuring impact, which makes the impact harder to quantify (e.g., Ozbay and Kachroo 1999). The qualitative measurement may be sufficient for general decision-making or response analysis; however, it is not precise and informative enough for ClearPath. In section "Impact Parameters", the prediction of quantitative information, which provides numeric measurements of the impact on the surrounding areas, is introduced.
Second, the spatiotemporal behavior of the impact. In previous studies, it was sufficient to predict the impact of an accident as a single value or a set of aggregate values. For example, in the literature by Pan et al. (2012), the impact is predicted as the average speed decrease or the average backlog length. Since the impact region of an accident evolves over time and space, the outcome of the prediction approach should be the exact length of time-varying backlogs (i.e., evolution of the congested spatial span) with different scales of speed changes. The section "Impact Propagation" will explain the prediction strategy for the propagation behavior of a traffic accident.
Third, the comprehensive area impacted by a traffic accident. Most existing research has focused on predicting the impact with respect to the set of upstream road segments impacted by a traffic accident (Kwon et al. 2006). In reality, traffic incidents may cause surges in traffic demand that overwhelm the system in their vicinity with a radically different flow from typical patterns. Section "Impact on Other Streets" explains the algorithms to forecast the impact of incidents on the nearby streets and intersecting freeways, which can (1) identify a set of road segments that will be impacted given a new incident and (2) for each impacted road segment, predict the spatiotemporal performance decrease, i.e., determine when and how the impact will occur in time and space.

Impact Parameters
In this entry, we utilize two real-world transportation datasets, (1) accident reports and (2) traffic sensor data, and we address the problem of predicting and quantifying the impact of traffic accidents. The main idea is to analyze historic accident data and classify accidents based on their features (e.g., time, location, type of accident). Subsequently, we model the impact of each accident class on its surrounding traffic by analyzing the archived traffic data at the time and location of the accidents. Consequently, if a similar accident (from real-time accident data) is observed, its impact on the surrounding traffic can be predicted and quantified in real time using the information from past accidents.
The impact of a traffic accident can be characterized in multiple ways. Three typical quantitative impact parameters are (1) impact backlog, (2) speed decrease caused by the accident, and (3) congestion duration.
Based on the analysis of real-world data, it is observed that the impact parameters vary across accidents with different attributes. The accident reports normally contain (but are not limited to) the following metadata: (1) accident date, (2) accident start time, (3) accident location (i.e., street name, latitude, longitude), (4) accident type (usually one of the following: Traffic collision+no/minor injuries, Traffic collision+major injuries/ambulance, Traffic collision-no details, Signal alert, Natural weather hazard, Lane closure, Fire, etc.), (5) type of vehicles involved if the incident is an accident, and (6) number of affected lanes. Let us consider one of the attributes, "start time", as an example. The impact backlog of accidents that happen during daytime may be large compared with accidents happening at midnight, due to higher traffic flow during the daytime. Thus, the key to predicting impact parameters (e.g., impact backlog) is to investigate which accident attributes are correlated with them. It is likely that some accident attributes are irrelevant or redundant for inferring the impact backlog. In order to identify the most correlated subset, we first process the accident attributes as normalized features and the impact backlog as numerical classes. Then we apply the Correlation-based Feature Selection (CFS) algorithm (Hall and Smith 1998) on top of this normalized data to select correlated features. From the result of this procedure, the following accident attributes are selected as the most relevant: {start time, location, direction, type, # of affected lanes}. We use the selected attributes to categorize the traffic accidents according to the values of their attributes and utilize the average value of the impact parameters in each category to predict the impact of an accident with corresponding attributes.
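
The category-averaging step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that accident records are plain dictionaries and that feature selection has already been done. The field names (start_hour, location, direction, type, affected_lanes, backlog_miles), the time-of-day buckets, and the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def time_bucket(hour):
    """Map an hour of day (0-23) to a coarse bucket; the boundaries are assumed."""
    if 6 <= hour < 10:
        return "am_peak"
    if 10 <= hour < 16:
        return "midday"
    if 16 <= hour < 20:
        return "pm_peak"
    return "night"


def category_key(accident):
    """Build an accident category from the selected attributes."""
    return (
        time_bucket(accident["start_hour"]),
        accident["location"],
        accident["direction"],
        accident["type"],
        accident["affected_lanes"],
    )


def fit_category_averages(history):
    """Average the observed backlog of past accidents within each category."""
    groups = defaultdict(list)
    for accident in history:
        groups[category_key(accident)].append(accident["backlog_miles"])
    return {key: mean(values) for key, values in groups.items()}


def predict_backlog(averages, accident):
    """Return the category average, or None if no similar accident was seen."""
    return averages.get(category_key(accident))


# Made-up historical records, for illustration only.
HISTORY = [
    {"start_hour": 8, "location": "I-10", "direction": "E",
     "type": "collision", "affected_lanes": 2, "backlog_miles": 1.0},
    {"start_hour": 9, "location": "I-10", "direction": "E",
     "type": "collision", "affected_lanes": 2, "backlog_miles": 2.0},
    {"start_hour": 23, "location": "I-10", "direction": "E",
     "type": "collision", "affected_lanes": 2, "backlog_miles": 0.5},
]
AVERAGES = fit_category_averages(HISTORY)

# A newly reported accident that falls into the morning-peak category.
NEW_ACCIDENT = {"start_hour": 7, "location": "I-10", "direction": "E",
                "type": "collision", "affected_lanes": 2}
PREDICTED = predict_backlog(AVERAGES, NEW_ACCIDENT)
```

The same lookup could be repeated for the other impact parameters (speed decrease, congestion duration) by averaging a different field. Returning None for an unseen category leaves the fallback policy, such as dropping the least relevant attribute, to the caller.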
Impact Propagation

For next-generation navigation systems to be beneficial, it is essential to predict specific values of speed changes and backlog lengths over the lifetime (i.e., temporal) and impact area (i.e., spatial) of an accident. This is in contrast to previous scenarios where forecasting abstract or aggregate impact parameters (e.g., backlog) was sufficient.
To calculate the propagation behavior for an accident, one naive way is to record the speed changes at all possible upstream locations. However, this method requires a fairly dense placement of sensors. In most sensor networks, the sensors reporting traffic speed are placed at a certain distance interval (e.g., 0.5 mile) from each other. Therefore, due to the limited data availability, it is only feasible to derive the impact backlog from the locations equipped with sensors. To solve this problem and create a continuous propagation behavior, interpolation can be used, which may be achieved by curve fitting or regression analysis. An example fitting strategy is summarized as follows:

1. Use the distance of each impacted sensor from the incident location to represent the impact backlog at time t, where t is the time at which that sensor starts to be impacted.
2. Subsequently, plot the derived impact backlogs in 2D space (e.g., the scatter points in Fig. 1a) using the information from all the impacted sensors (e.g., sensors S1 to S4 in Fig. 1a) and train a function (e.g., a polynomial function) to fit the plotted discrete points with minimal error.
3. Finally, utilize the learned fitting function to interpolate the backlogs at missing time stamps and generate a complete propagation behavior. Figure 1b shows the propagation behavior for our running example, where the impact backlog {b_0, b_1, ..., b_19} is plotted at each minute.

Accident Impact Prediction, Fig. 1 Sample propagation behavior. (a) Fitting result: distance (mile) of sensors S1 to S4 versus time elapsed (min). (b) Interpolation result: impact backlog (mile) versus time elapsed (min)

There are alternative modeling approaches, such as using the learned parameters of the fitting function to represent the propagation curve. Using the interpolation result has two advantages over using the parameters. (1) When the interpolation is constructed, the above strategy only uses the fitting function to interpolate the missing impact backlogs; for existing impact backlogs, it still uses the original data. However, if the coefficient vectors of the fitting function are used directly, additional fitting error might be introduced into the original data, which may result in an inaccurate representation of the propagation behavior. (2) When evaluating the prediction accuracy, the variation between the actual backlog vector and the predicted backlog vector can be used directly as an error measurement, and it is also straightforward to interpret. In contrast, the differences between the actual and predicted coefficient vectors cannot intuitively explain the prediction accuracy.
With the propagation behavior constructed, the same prediction strategy as for impact parameters can be utilized to predict propagation behavior. However, in some particular cases, it is observed that although two incidents have similar attributes, their propagation behaviors are still highly different from each other. Therefore, it is important to incorporate more information such as traffic density measures (e.g., volume and occupancy) to improve the prediction accuracy. Moreover, a multistep prediction approach that takes into account the initial behavior (i.e., sub-pattern of propagation behavior) of an incident may further improve the prediction accuracy.

Impact on Other Streets
As illustrated in Fig. 2, the impact caused by a traffic accident on a freeway may affect the traffic flow in the following three types of locations:

(1) Upstream stretch of the occurrence freeway,
(2) Adjacent arterial streets, and
(3) Other surrounding freeways.

Accident Impact Prediction, Fig. 2 Impact of a traffic incident (figure labels: traffic sensors s0 to s4, the traffic incident, and potential impact directions 1 to 3)

This section focuses on how to forecast the impact of incidents on the nearby streets and freeways (i.e., the locations (2) and (3)).
The intuitive way to predict the impact of accidents on the nearby streets and freeways is to identify the causal interactions among traffic at different road segments. To identify the causality relationship, the main idea is to utilize archived traffic sensor datasets to train causality models that determine whether one time series (e.g., collected from s_0 in Fig. 2) is useful for predicting another time series (e.g., collected from s_1). If a change in traffic performance (e.g., decrease or increase in traffic speed) at s_0 leads to a change in traffic performance at another location s_1, in the presence of a traffic accident near s_0, then s_1 could be identified as part of the impacted area. Consequently, given a traffic accident and its attributes, the detected causality can be utilized to predict the impact in the vicinity of a traffic accident.
Given the strategy above, the challenge is how to detect the causality between the traffic speed time series. One straightforward idea is to use a traditional causality test (e.g., Granger 1969). However, with real-world traffic data, it is observed that hardly any Granger causality exists between any pair of traffic speed time series. This observation was surprising and counterintuitive, as a strong causality relationship is expected among traffic time series. Further investigation of the unique characteristics of traffic speed time series revealed two types of time-sensitive causality that are unique to them. Specifically, for two traffic speed time series with correlated historical patterns, it is observed that sometimes the causality only exists during the beginning of rush hours, when the traffic starts to become congested; this is named slowdown causality. Such causality only exists between two road segments that have strong connectivity in the road network. Conversely, in other connectivity scenarios, especially when the two time series are not correlated, another type of causality is observed that only exists in the presence of traffic accidents and during non-rush hours; this is named intervention causality. Consequently, the detected causalities can be utilized for predicting the impact of traffic accidents, with the procedure illustrated in Fig. 3.

Accident Impact Prediction, Fig. 3 Flow chart for impact prediction. Online: incident e occurs, and its nearest sensor s_0 and adjacent sensors {s_i} are located. Offline: for each pair <s_0, s_i>, causality detection (correlated pattern, slowdown causality, intervention causality), then selection of important lag(s) based on lasso-Granger and retraining of the regressive model. Online: sensor s_i is identified as to be impacted, and its traffic speed is predicted from the real-time traffic speed of s_0 and the regressive model

Given that a new incident e has just occurred, its closest upstream sensor s_0 is sent to the archived database to retrieve the relevant time intervals for causality detection. It is also utilized to search among the nearby sensors and retrieve a potential candidate sensor to be impacted. Then, the sensor pair <s_0, s_i>, together with the corresponding dataset and the causality detection model, is used to identify whether slowdown causality or intervention causality exists from s_0 to s_i. If slowdown causality exists, there is no need to examine intervention causality because the impact of significant speed drops from traffic incidents is already covered in the definition of slowdown causality. At the end of the causality detection, the sensor pairs <s_0, s_i> holding a causality relationship proceed to the next step, and the sensor pairs <s_0, s_i> holding neither slowdown nor intervention causality are disregarded. In the former case, s_i is considered one of the impacted sensors that contribute to the spatial impact range. In the latter case, s_i is excluded from the spatial impact range caused by incident e. For sensor pairs <s_0, s_i> holding a causality relationship, the next step is to select the most important time stamps (i.e., t + h given that the accident occurs at t) to identify when s_i starts to become impacted. In the pipeline illustrated in Fig. 3, we resort to the lasso-Granger approach (Arnold et al. 2007) to achieve this step. Note that after lasso, we need to retrain the regressive model for predicting s_i based on the selected lag in s_0. Finally, the real-time traffic speed data collected from s_0 and the learned regressive model are utilized to predict the speed of s_i.
To enable real-time impact prediction, the causality detection and important variable selection steps in Fig. 3 need to be implemented off-line for every sensor pair on the road networks. The training step of the regressive model and the lasso approach require access to large amounts of archived traffic time series data, so performing causality detection and important variable selection online would significantly delay the prediction process because of the training time consumed. With the off-line results in place, when a new incident occurs, the system searches within the off-line training results to identify whether causality exists between the corresponding sensor pairs and retrieves the learned regressive model for the online traffic speed prediction for the sensor to be impacted.
Note that in the domains of social science and economics, causality models have already been widely applied (Pearl 1988; Glymour et al. 1987; Spirtes et al. 2001), many of which are superior to Granger causality in multivariate causality inference. However, for the impact problem, the Granger causality model is a better candidate for causality detection for the following reasons. First, in this study, the ultimate goal
is to enable better prediction of the traffic time series in the presence of traffic incidents by taking advantage of the detected causality relationship. Revealing the complete causality relationship among all traffic data on the road network is not our focus. Compared with other causality inference models, the regressive model of Granger causality serves as a fairly effective predictor for time series data, such as traffic sensor data. Second, the accident impact prediction problem requires not only the identification of the causality but also the time lag of the causality (i.e., how much time needs to pass until a road segment's traffic gets impacted by a traffic incident). For the time series-based Granger causality model, such a time lag can be effectively learned through the model's learning process. However, for the graph-based causal inference models (e.g., Pearl (1988) and Glymour et al. (1987)), it is fairly difficult to learn such a temporal dependency in the detected causality. Finally, the existing literature focuses on predicting the impact of one traffic incident at a time. In particular, the one-to-one causality relationship detection between the traffic at the incident location and one other location is the major focus in transportation networks. It is entirely possible that traffic at different locations is causally dependent and that it has more than one cause. However, such cases are barely useful to this problem, which is predicting the impact of a single cause (i.e., a particular traffic incident). Thus, Granger causality, even though it ignores multivariate dependencies, is particularly effective for this purpose.

Key Applications

Navigation Systems
The result of impact prediction can be applied in smart-routing applications in real time to help users avoid unexpected congestion. Specifically, when there is a traffic accident, the predicted event impact, including the backlogs and speed decrease caused by the traffic event, can be utilized for the purpose of avoiding traffic congestion. To be more specific, consider another example illustrated in Fig. 4. In this figure, the caution mark, the directed solid red lines, and the dashed blue lines represent the incident location, the congested region caused by the incident, and the route a driver plans to follow, respectively. Without prediction, but with the knowledge of the incident, a typical navigation application, such as Waze (WAZE 2014), may suggest the route shown in Fig. 4a to the drivers. If the driver follows this route, the driver would be stuck in the traffic congestion caused by the incident, as illustrated in Fig. 4b, because the congested region has grown. On the other hand, if we can predict how the impacted spatial span (i.e., congested region) evolves over time, ClearPath could calculate a route that effectively avoids the congestion from the beginning, as shown in Fig. 4c.

Accident Impact Prediction, Fig. 4 (a) Route calculated based on current incident's impact. (b) Time-varying expansion of impacted region as driver approaches the incident location. (c) Route calculated based on accurate prediction of impact

Public Policy and Decision-Making
Accident impact prediction can benefit not only individual drivers through navigation systems but also transportation authorities, e.g., by notifying drivers when they are approaching an accident and suggesting alternative routes, as well as by implementing traffic jam diagnosis and dispersal. Moreover, the predicted result can be visualized through a web-based user interface, which provides the transportation authorities with a global view of all the traffic accidents in a city. Equipped with such a service, transportation authorities could efficiently monitor all the traffic accidents with detailed diagnoses of their impact regions for the purpose of better policy and decision-making.

Future Directions

The research on accident impact prediction can be extended in several directions. First, besides traffic accidents, more complex events causing congestion, such as large-scale parades or sporting events, can be studied, and their impacts can be predicted. Second, besides traffic sensor datasets, other modalities of data acquired from video cameras and/or mobile phones can also be utilized for better prediction of accident impacts. Finally, studying long-term or online strategies to update the prediction models using streaming data is of great importance.

Cross-References

Predictive Route Planning

References

Arnold A, Liu Y, Abe N (2007) Temporal causal modeling with graphical granger methods. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, KDD '07. ACM, New York, pp 66–75
Daganzo CF (1994) The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory. Transp Res Part B: Methodol 28:269–287
Demiryurek U, Banaei-Kashani F, Shahabi C, Ranganathan A (2011) Online computation of fastest path in time-dependent spatial networks. In: SSTD, Minneapolis
Glymour C, Scheines R, Spirtes P, Kelly K (1987) Discovering causal structure. Academic, Orlando
Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37:424–438
Hall MA, Smith LA (1998) Practical feature subset selection for machine learning. In: ACSC98, Perth. Springer, Berlin, pp 181–191
Kwon J, Mauch M, Varaiya PP (2006) Components of congestion: delay from incidents, special events, lane closures, weather, potential ramp metering gain, and excess demand. Transp Res Rec 1959:84–91
Lawson TW, Lovell DJ, Daganzo CF (1997) Using the input-output diagram to determine the spatial and temporal extents of a queue upstream of a bottleneck. Transp Res Rec 1572:140–147
Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers AH (2011) Big data: the next frontier for innovation, competition, and productivity. McKinsey Global Institute, New York
Ozbay K, Kachroo P (1999) Incident management in intelligent transportation systems. Artech House, Norwood, MA
Pal R, Sinha KC (2002) Simulation model for evaluating and improving effectiveness of freeway service patrol programs. J Transp Eng 128:355–365
Pan B, Demiryurek U, Shahabi C (2012) Utilizing real-world transportation data for accurate traffic prediction. In: ICDM, Brussels
Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Mateo
Report FM (2012) Http://www.metro.net/board/Items/2012/03March/20120322RBMItem57.pdf
SIGALERT (2013) Http://www.sigalert.com. Last visited May 2013
Spirtes P, Glymour C, Scheines R (2001) Causation, prediction, and search. MIT, Cambridge
Wang Z, Murray-Tuite PM (2010) A cellular automata approach to estimate incident-related travel time on interstate 66 in near real time. Virginia Transportation Research Council, Charlottesville
WAZE (2014) Http://www.waze.com. Last visited May 2014
Wirasinghe SC (1978) Determination of traffic delays from shock-wave analysis. Transp Res 12:343–348


Accuracy

Uncertain Environmental Variables in GIS


Accuracy, Map

Imprecision and Spatial Uncertainty


Accuracy, Spatial

Imprecision and Spatial Uncertainty


Active Data Mining

Gaussian Process Models in Spatial Data Mining


ActiveX Components

MapWindow GIS


Activities and Occurrences

Processes and Events


Activities Flexible

Time Geography


Activities, Fixed

Time Geography


Activity

Temporal GIS and Applications


Activity Analysis

Time Geography


Activity Theory

Time Geography


Acyclic Directed Graph

Hierarchies and Level of Detail


Adaptation

Climate Adaptation, Introduction
Climate Change and Developmental Economies
Climate Extremes and Informing Adaptation
Geospatial Semantic Web: Personalization


Adaptation, Complete

User Interfaces and Adaptive Maps


Adaption, Complete

Mobile Usage and Adaptive Visualization


Adaptive

User Interfaces and Adaptive Maps


Adaptive, Context-Aware

Mobile Usage and Adaptive Visualization


Ad-hoc Localization

Localization, Cooperative


Aerial

Photogrammetric Applications


Aerial Imagery

Photogrammetric Sensors


Affordance

Wayfinding: Affordances and Agent Simulation


Agent Simulation

Wayfinding: Affordances and Agent Simulation


Agent-Based Models

Geographic Dynamics, Visualization and Modeling


Aggregate Data: Geostatistical Solutions for Reconstructing Attribute Surfaces

Phaedon Kyriakidis
Cyprus University of Technology, Lemesos, Cyprus

Synonyms

Downscaling; Spatial Interpolation; Surface modeling

Definition

Geographic information systems (GIS) are routinely used to integrate different layers of geospatial information for spatial analysis and decision making. Such an integration often involves data of geospatial attributes available at different geographical units, e.g., administrative zones versus pixels of remotely sensed imagery. What is required in such cases is a transformation of attribute values from one existing spatial partition to another, that is, a change of the geographical units over which the original data were acquired with an associated change in the actual attribute values reported. In public health applications, for example, socioeconomic data reported over census tracts must be integrated with disease data available over different administrative reporting zones, to assess disease risk at increasingly finer spatial resolutions. Similarly, in remote sensing applications, reflectance data recorded by different sensors with different spatial resolutions must be integrated with predictions of biophysical variables furnished by environmental models at yet different spatial resolutions, for enhanced environmental monitoring and assessment possibly at sub-pixel scales. A particular case of geospatial attribute transformation is the construction of continuous attribute (e.g., population density or temperature) surfaces, from originally aggregate
data reported over arbitrary-shaped polygons or regular pixels. Attribute surfaces have appealing processing and interoperability characteristics, as they are amenable to spatial operations in GIS and they can be aggregated at arbitrary spatial resolutions for subsequent data integration purposes. This contribution provides an overview of geostatistical methods developed for the purpose of reconstructing attribute surfaces from aggregate (areal) data.

Historical Background

In the spatial analysis literature, the task of changing an attribute's geographical unit frame falls in the realm of areal interpolation (Haining 2003). It is customary in the literature to designate the known data and their corresponding measurement units as source data and source zones and similarly the unknown attribute values and measurement units as target values and target zones. When the target zones are infinitesimally small, i.e., points, areal interpolation is tantamount to surface creation. Surface reconstruction can be either based on point source data (the classical punctual spatial interpolation case) or on source data defined as aggregate values over regular pixels or irregular polygons, the problem of surface reconstruction addressed in this contribution (a particular case of downscaling).

The simplest (and earliest) form of surface reconstruction from aggregate data is the choropleth map, whereby all point attribute values within the same polygon receive the same value; see, for example, Haining (2003). Tobler's celebrated mass-preserving or pycnophylactic interpolation method aims at smoothing the patchy attribute surface corresponding to the choropleth map, by invoking explicitly a smoothness criterion for that surface subject to constraints of aggregation consistency or mass preservation (Tobler 1979). Ancillary data, e.g., land cover information in a population interpolation context, have also been accounted for in surface reconstruction, via dasymetric mapping or regression models, to better guide the redistribution of aggregate attribute values to finer resolutions while maintaining consistency with the aggregate data (Haining 2003). It should be noted, here, that, apart from statistical (regression-based) models for surface reconstruction, the reliability of the resulting target predictions is rarely reported since most surface reconstruction methods are cast in a deterministic framework.

Geostatistics is a branch of spatial statistics, with origins in mining applications, that deals with the analysis of spatially distributed data (Journel and Huijbregts 1978). Geostatistical analytical methods appear in numerous and diverse scientific disciplines, ranging from geoinformatics, to earth sciences, to environmental and atmospheric sciences, as well as to socioeconomic applications. Geostatistical interpolation methods, i.e., Kriging and its variants, have historically addressed the exact same problem as areal interpolation. In particular, the concept of predicting attribute values at arbitrary blocks (in 3D) from known measurements defined over points or blocks, termed change of support, was one of the early selling points of geostatistics, particularly in mining applications, along with the assessment of uncertainty in the reported predictions (Journel and Huijbregts 1978). The problem of change of support has close connections with two celebrated issues in spatial analysis, namely, the modifiable areal unit problem (MAUP) that pertains to the effects of aggregation on the statistics of spatial attributes and the ecological inference problem (EIP) that pertains to the inference of the statistics of disaggregate attribute values from aggregate data (Haining 2003).

The connection between geostatistical methods and areal interpolation, however, was until recently limited mostly to the application of punctual (point-to-point) Kriging and (point-to-)block Kriging (Haining 2003). It was not until recently that several commonly used areal interpolation methods for surface reconstruction were formulated within a geostatistical (area-to-point Kriging) framework (Kyriakidis 2004). The remainder of this contribution provides an overview of geostatistical surface reconstruction methods from aggregate attribute data, with and
without ancillary information, highlights recent extensions, and discusses open problems and future directions.

Scientific Fundamentals

In its discrete approximation, surface reconstruction can be formulated within a general spatial prediction framework as the task of predicting the unknown entries of the ($M \times 1$) target attribute vector $\mathbf{y}_t = [y(\mathbf{c}_m),\ m = 1, \ldots, M]^T$ at a set of $M$ point locations from the known entries of the ($N \times 1$) source data vector $\mathbf{y}_s = [y(C_n),\ n = 1, \ldots, N]^T$ available at $N$ source supports. Here, $y(\mathbf{c}_m)$ denotes the unknown attribute value at a target location with coordinate vector $\mathbf{c}_m$, assumed representative of an elemental region around $\mathbf{c}_m$ for discrete integration purposes, and $y(C_n)$ denotes the known attribute value pertaining to a source support defined as a polygon with vertex coordinates stored in matrix $C_n$; superscript $T$ denotes transposition. For simplicity and without loss of generality, it is assumed that the union of the $N$ source polygons identifies the study region $A$; that is, source polygons do not overlap and cover completely the study region. In addition, it is assumed that the $M$ point locations provide an adequate approximation to a continuous surface, that is, an adequate discretization of the study region $A$, implying that $M \gg N$.

Links Between Attribute Surface, Aggregate Data, and Their Statistics

Source data are defined via the aggregation of point attribute values within their respective supports. In particular, the aggregation procedure is specified as a weighted linear averaging of point values:

$$y(C_n) = \sum_{m=1}^{M} g_n(\mathbf{c}_m)\, y(\mathbf{c}_m) \qquad (1)$$

where $g_n(\mathbf{c}_m)$ denotes the known contribution of point attribute value $y(\mathbf{c}_m)$ to the aggregate (source) data value $y(C_n)$; that contribution is termed sampling function or sampling kernel, and Eq. (1) constitutes a discrete convolution of point attribute values with the sampling kernel.

In its simplest form, the sampling kernel $g_n(\mathbf{c}_m)$ can attain a binary (0/1) value, indicating that a particular target point $\mathbf{c}_m$ lies within a given source support $C_n$ or not, accounting for the representative region of the target point. That indicator value could be divided by the measure (length, area, volume) of the source support, depending on whether the geospatial attributes undergoing transformation pertain to area averages (spatially intensive variables), e.g., population density or average income, or to area totals (spatially extensive variables), e.g., population counts or total income. More elaborate weighting schemes or sampling kernels can be defined, e.g., based on buffers or distance to roads or other geographical features in population density estimation applications, or based on a sensor's point-spread function in remote sensing applications.

In the above formulation, the source data vector $\mathbf{y}_s$ and the target attribute vector $\mathbf{y}_t$, i.e., the discrete approximation of the sought-after attribute surface, are linked as

$$\mathbf{y}_s = \mathbf{G}\mathbf{y}_t \qquad (2)$$

where $\mathbf{G} = [g_n(\mathbf{c}_m),\ n = 1, \ldots, N,\ m = 1, \ldots, M]$ denotes a ($N \times M$) matrix of sampling function values; the $n$-th row of $\mathbf{G}$ consists of the $M$ sampling function values for all point values within the source polygon $C_n$. Equation (2) contains the $N$ measurement equations defining the $N$ known source data, and matrix $\mathbf{G}$ can be regarded as a linear spatial aggregation operator. Note that the aggregation matrix $\mathbf{G}$ can accommodate both point and aggregate data. In other words, some elements of the source data vector $\mathbf{y}_s$ could pertain to point support attribute values, known, for example, from fine-resolution surveys. In this case, some rows of the aggregation matrix $\mathbf{G}$ contain only one nonzero entry corresponding to the locations of the point-level source attribute data.

In geostatistics, the spatial distribution of an attribute surface $y$ is regarded as a realization of a random field model $\{Y(\mathbf{c}),\ \mathbf{c} \in A\}$, or its
discrete counterpart, a random vector (Journel and Huijbregts 1978). In the second-order stationary case, that random field is parameterized by a constant mean $\mu_Y$ and a positive-definite covariogram model $\sigma_Y(\mathbf{h}; \boldsymbol{\theta})$, specified as a decreasing parametric function of distance; here $\mathbf{h}$ denotes a lag vector between any two locations, and $\boldsymbol{\theta}$ denotes a vector with covariogram model parameters (range, sill, nugget). This implies that, in the discrete case, the target attribute surface $\mathbf{y}_t$ is characterized by a ($M \times 1$) constant expectation (mean) vector $\boldsymbol{\mu}_t = \mathbf{1}_t \mu_Y = \boldsymbol{\mu}$ and a ($M \times M$) covariance matrix $\boldsymbol{\Sigma}_{tt} = [\sigma_Y(\mathbf{c}_m - \mathbf{c}_{m'}; \boldsymbol{\theta}),\ m = 1, \ldots, M,\ m' = 1, \ldots, M] = \boldsymbol{\Sigma}(\boldsymbol{\theta})$, where $\sigma_Y(\mathbf{c}_m - \mathbf{c}_{m'}; \boldsymbol{\theta})$ denotes the covariance value pertaining to a location pair $\mathbf{c}_m$ and $\mathbf{c}_{m'}$, built from the point covariogram model $\sigma_Y(\mathbf{h}; \boldsymbol{\theta})$, and $\mathbf{1}_t$ denotes a ($M \times 1$) vector of ones.

Being functionally linked to the unobserved attribute surface, the source (aggregate) data vector $\mathbf{y}_s$ is also a realization of a random vector characterized by a ($N \times 1$) expectation vector $\boldsymbol{\mu}_s = \mathbf{G}\boldsymbol{\mu}$ and a ($N \times N$) covariance matrix $\boldsymbol{\Sigma}_{ss} = [\sigma_Y(C_n, C_{n'}),\ n = 1, \ldots, N,\ n' = 1, \ldots, N] = \mathbf{G}\boldsymbol{\Sigma}(\boldsymbol{\theta})\mathbf{G}^T$, where $\sigma_Y(C_n, C_{n'})$ denotes the covariance value pertaining to a pair of supports $C_n$ and $C_{n'}$. The two random vectors $\mathbf{y}_s$ and $\mathbf{y}_t$ are also correlated with ($N \times M$) (cross)covariance matrix $\boldsymbol{\Sigma}_{st} = [\sigma_Y(C_n, \mathbf{c}_m),\ n = 1, \ldots, N,\ m = 1, \ldots, M] = \mathbf{G}\boldsymbol{\Sigma}(\boldsymbol{\theta})$, where $\sigma_Y(C_n, \mathbf{c}_m)$ denotes the covariance value pertaining to a polygon-point pair $C_n$ and $\mathbf{c}_m$. When the entries of the point-level covariance matrix $\boldsymbol{\Sigma}(\boldsymbol{\theta})$ are computed using a positive-definite covariogram model $\sigma_Y(\mathbf{h}; \boldsymbol{\theta})$, both covariance matrices $\boldsymbol{\Sigma}_{st}$ and $\boldsymbol{\Sigma}_{ss}$ are positive definite. Note that in the case of irregular supports, e.g., polygons, second-order stationarity cannot be assumed for the statistics of the source data, even if that assumption is reasonable for the statistics of the underlying attribute surface; this is a consequence of the spatially varying characteristics of aggregation.

In practical applications of surface reconstruction, one has access to the aggregate data and not to the underlying attribute surface. This implies that the statistics $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}(\boldsymbol{\theta})$ of the underlying surface must be inferred from those of the aggregate data. In particular, the inference of a point covariogram model $\sigma_Y(\mathbf{h}; \boldsymbol{\theta})$ from aggregate data is termed covariogram deconvolution (or deregularization) and constitutes an ill-posed (under-determined) inverse problem, as is the ecological inference problem; some proposals for possible solutions to such an inference objective are offered by Kyriakidis (2004) and Goovaerts (2008). In what follows, it is assumed that the functional form and parameters of such a point-level covariogram model $\sigma_Y(\mathbf{h}; \boldsymbol{\theta})$ have been inferred, and the resulting model is used to construct all necessary covariance matrices $\boldsymbol{\Sigma}_{tt}$, $\boldsymbol{\Sigma}_{st}$, and $\boldsymbol{\Sigma}_{ss}$.

Surface Reconstruction Using Aggregate Data Only

When the expectation vector $\boldsymbol{\mu}_t$ of the point attribute values (hence the expectation vector $\boldsymbol{\mu}_s$ of the source data) is known, surface reconstruction can be performed via simple Kriging (SK). In particular, the ($M \times 1$) vector $\hat{\mathbf{y}}_t = [\hat{y}(\mathbf{c}_m),\ m = 1, \ldots, M]^T$ of SK predictions for the unknown target attribute values is expressed as:

$$\hat{\mathbf{y}}_t = \boldsymbol{\mu}_t + \mathbf{W}^T(\mathbf{y}_s - \boldsymbol{\mu}_s) \qquad (3)$$

where $\mathbf{W}$ is a ($N \times M$) matrix of SK weights; the $m$-th column of matrix $\mathbf{W}$ contains the $N$ weights applied to the $N$ source data for computing the target prediction $\hat{y}(\mathbf{c}_m)$ at location $\mathbf{c}_m$.

In the formulation above, all $N$ source data are considered for predicting any target attribute value $y(\mathbf{c}_m)$, a procedure termed global interpolation. Local variants of spatial interpolation amount to considering only a subset $N' < N$ of source data for prediction. In the isotropic case, this subset is typically limited to a circular neighborhood centered at the target location $\mathbf{c}_m$; the neighborhood radius is linked to the range of the point-level covariogram model $\sigma_Y(\mathbf{h}; \boldsymbol{\theta})$. In what follows, the discussion pertains to the global interpolation case, unless otherwise noted.

The SK weights of Eq. (3) are determined by solving a ($N \times N$) system of (normal) equations, termed the simple Kriging (SK) system:
$$\boldsymbol{\Sigma}_{ss}\mathbf{W} = \boldsymbol{\Sigma}_{st} \quad \text{or} \quad \mathbf{G}\boldsymbol{\Sigma}(\boldsymbol{\theta})\mathbf{G}^T\mathbf{W} = \mathbf{G}\boldsymbol{\Sigma}(\boldsymbol{\theta}) \qquad (4)$$

which has one and only one solution provided that the covariance matrices $\boldsymbol{\Sigma}_{ss}$ and $\boldsymbol{\Sigma}_{st}$ are positive definite. Such a requirement is satisfied if those covariance matrices are consistently built through the aggregation matrix $\mathbf{G}$ using a positive-definite point-level covariogram model $\sigma_Y(\mathbf{h}; \boldsymbol{\theta})$, i.e., using the second expression of the SK system of Eq. (4).

The resulting SK predictions constitute a reconstructed surface that is consistent, upon aggregation, with the original source data. In other words, when the same aggregation procedure, encapsulated in matrix $\mathbf{G}$, is applied to the vector $\hat{\mathbf{y}}_t$ of SK predictions, one recovers the known source data vector $\mathbf{y}_s$. Indeed, using Eqs. (3) and (4), one finds:

$$\mathbf{G}\hat{\mathbf{y}}_t = \mathbf{y}_s \qquad (5)$$

a (coherence) property of SK that is independent of the particular point-level covariogram model $\sigma_Y(\mathbf{h}; \boldsymbol{\theta})$ adopted. The aggregation consistency of SK predictions stems from the fact that Kriging is an (exact) interpolator (Kyriakidis 2004). Indeed, it is well known that when Kriging is used to create an attribute surface from point attribute data, that surface passes through the data; i.e., target point predictions computed by SK at source points reproduce the observed source data. When area-to-point SK is used for surface reconstruction with consistently populated (accounting for aggregation) covariance matrices, the exactitude property of SK predictions holds for any type of aggregate data defined as weighted linear combinations of point attribute values (Kyriakidis 2004).

The SK weights obtained by solving the SK system of Eq. (4), and consequently the resulting SK predictions, account for (i) the expected correlation or relevance between the source data and target values via the source-to-target covariance matrix $\boldsymbol{\Sigma}_{st}$ and (ii) the relative redundancy or clustering (linked to the size and shape of the source polygons) between the source data via the source-to-source covariance matrix $\boldsymbol{\Sigma}_{ss}$; both these terms depend on the particular point-level covariance model $\sigma_Y(\mathbf{h}; \boldsymbol{\theta})$ adopted, as well as on the aggregation matrix $\mathbf{G}$. Note that Eqs. (3) and (4) do not explicate the nature (aggregate or not) of the source data and target values, since this is encapsulated in the aggregation matrix $\mathbf{G}$; this implies that surface reconstruction via SK can accommodate both point and aggregate data.

In simple Kriging, the vector $\boldsymbol{\mu}_t$ of point-level attribute expectations could in principle contain spatially varying (nonconstant) entries, as long as those entries are assumed known (hence, certain) or previously estimated without, however, accounting for the uncertainty inherent in their estimation. In most practical applications, and in the absence of auxiliary data, the mean of the attribute surface is assumed constant, $\mu_Y$, since it is unknown. In this case, surface reconstruction can be achieved by ordinary Kriging (OK). More precisely, the ($M \times 1$) vector $\hat{\mathbf{y}}_t^{OK}$ of OK target point-level predictions is expressed as:

$$\hat{\mathbf{y}}_t^{OK} = \mathbf{W}_{OK}^T\mathbf{y}_s \qquad (6)$$

where $\mathbf{W}_{OK}$ is a ($N \times M$) matrix of OK weights, subject to the unbiasedness constraints $\mathbf{W}_{OK}^T\mathbf{G}\mathbf{1}_t = \mathbf{1}_t$, with $\mathbf{G}\mathbf{1}_t$ being a ($N \times 1$) vector containing the values of the integrals of the sampling functions over each of the $N$ source supports (Sales et al. 2013).

The matrix of ordinary Kriging weights $\mathbf{W}_{OK}$ is obtained by solving a constrained ($(N + 1) \times (N + 1)$) system of normal equations, termed the ordinary Kriging (OK) system:

$$\begin{bmatrix} \mathbf{G}\boldsymbol{\Sigma}(\boldsymbol{\theta})\mathbf{G}^T & \mathbf{G}\mathbf{1}_t \\ (\mathbf{G}\mathbf{1}_t)^T & 0 \end{bmatrix} \begin{bmatrix} \mathbf{W}_{OK} \\ \boldsymbol{\lambda}_{OK} \end{bmatrix} = \begin{bmatrix} \mathbf{G}\boldsymbol{\Sigma}(\boldsymbol{\theta}) \\ \mathbf{1}_t^T \end{bmatrix} \qquad (7)$$

where $\boldsymbol{\lambda}_{OK}$ is a ($1 \times M$) vector of Lagrange multipliers due to the constraints on the weights. Surface reconstruction via OK is also consistent with the aggregate source data, no matter the particular point-level covariogram model $\sigma_Y(\mathbf{h}; \boldsymbol{\theta})$ adopted. In other words, the coherence property of Eq. (5) applies also to the vector of OK-derived predictions $\hat{\mathbf{y}}_t^{OK}$.

Surface reconstruction via OK is typically performed in a local interpolation mode, whereby
a constant but unknown local attribute mean $\mu_Y(\mathbf{c}_m)$ is assumed for the target location $\mathbf{c}_m$ and all locations within the $N' < N$ source supports considered within the search neighborhood around $\mathbf{c}_m$. This amounts to the assumption of intrinsic stationarity, a weaker assumption than second-order stationarity, whereby (i) the point-level attribute mean is assumed locally (within each search neighborhood) constant, and (ii) a more general distance-based metric of spatial association (dissimilarity), the semivariogram function, can be defined even in cases (infinite attribute variance) where the covariogram cannot (Journel and Huijbregts 1978). That point-level local attribute mean $\mu_Y(\mathbf{c}_m)$ is implicitly estimated, in conjunction with the OK weights, using a local version of the OK system of Eq. (7) from the $N'$ source data within each search neighborhood.

No matter the formulation (SK or OK) adopted, it should be stressed that surface reconstruction from aggregate data is an under-determined (ill-posed) inverse problem, as is the classical problem of surface construction from point measurements via spatial interpolation. In other words, there are multiple alternative surfaces that could be defined at the point level, all of which could be consistent with (reproduce) the available source data; such surfaces constitute solutions to the inverse problem of surface reconstruction. In both cases (source data aggregate or not), what is required is a (prior) model of attribute spatial structure at the fine (target) resolution to resolve the inherent ambiguity of the ill-posed inverse problem and render it solvable. In geostatistics, that prior structural information is explicitly specified in terms of a (typically parametric) semivariogram (or covariogram) model that characterizes the spatial variability or smoothness of the unobserved attribute surface. Such semivariogram models can range from pure-nugget effect models, indicative of an ultimately rough (random) attribute surface, to models with extremely large range and no nugget contribution, indicative of an extremely smooth surface (Journel and Huijbregts 1978).

When only aggregate source data are available, the shape of the point semivariogram model at small distances, particularly the nugget effect contribution, can only be indirectly (if at all) estimated, since any information at resolutions smaller than the source supports is lost due to aggregation. This implies that surface reconstruction from aggregate data can only be achieved in this case after invoking, explicitly or implicitly, assumptions regarding the point semivariogram model corresponding to the underlying attribute surface. The work of Kyriakidis (2004) demonstrated that several commonly used areal interpolation methods for surface reconstruction from aggregate data can be actually formulated as particular cases of (area-to-point) Kriging under very particular point semivariogram models. In particular, it was demonstrated that (a) the choropleth map corresponds to area-to-point Kriging with a white-noise (pure-nugget effect) point semivariogram model, (b) kernel smoothing methods often do not explicitly account for the aggregate nature of source data, and (c) Tobler's pycnophylactic interpolation (Tobler 1979) corresponds to area-to-point Kriging with a logarithmic point semivariogram model (in 2D); this was also shown in practice by Yoo et al. (2010).

Several extensions and improvements of the original formulation of geostatistical surface reconstruction from aggregate data have been proposed in the literature. In particular, Yoo and Kyriakidis (2006) incorporated nonnegativity constraints in the formulation of area-to-point Kriging, Guan et al. (2011) proposed efficient numerical methods based on the fast Fourier transform for evaluating the source-to-source $\mathbf{G}\boldsymbol{\Sigma}(\boldsymbol{\theta})\mathbf{G}^T$ and source-to-target $\mathbf{G}\boldsymbol{\Sigma}(\boldsymbol{\theta})$ covariance integrals involved in all Kriging systems, whereas Nagle (2010) incorporated measurement error in the source data via area-to-point factorial Kriging. In this latter case, factorial Kriging predictions do not reproduce the aggregate source data, since such data are deemed error prone, and the resulting surface is smoother than the one computed via area-to-point simple or ordinary Kriging. Last, Goovaerts (2006) developed a variant of area-to-point Kriging, termed Poisson Kriging, capable of accounting for aggregate source data following a non-Gaussian distribution.
No matter the effort put into ameliorating surface reconstruction with better or more realistic predictors, however, the final attribute surface reflects the information content of the aggregate source data. For a given attribute surface, the larger the aggregation extent, the less informative the aggregate source data are. Surface reconstruction thus becomes more realistic and more accurate as long as reconstruction methods are able to incorporate auxiliary geospatial information available at fine spatial resolutions, particularly at the point support level.

Surface Reconstruction Incorporating Point-Level Auxiliary Data

When data on $K - 1$ auxiliary variables $\{X_k,\ k = 1, \ldots, K - 1\}$ relevant to the attribute $Y$ being predicted are available at the point level, they can be used to inform the expectation vector $\boldsymbol{\mu}_t$ of the latent attribute surface. In this case, the unknown target attribute vector $\mathbf{y}_t$ can be linked to data of the $K - 1$ auxiliary variables via a linear model:

$$\mathbf{y}_t = \mathbf{X}_t\boldsymbol{\beta}_t + \mathbf{e}_t \qquad (8)$$

where $\mathbf{X}_t$ is a ($M \times K$) design matrix with point data on the $K - 1$ covariates and a vector of $M$ ones in its first column, and $\boldsymbol{\beta}_t$ is a ($K \times 1$) vector of point-level regression coefficients. Term $\mathbf{e}_t$ is a ($M \times 1$) vector of multivariate Gaussian errors or disturbances, uncorrelated with the $K - 1$ covariates, having zero expectation and ($M \times M$) covariance matrix $\boldsymbol{\Sigma}_E(\tilde{\boldsymbol{\theta}})$; here $\tilde{\boldsymbol{\theta}}$ denotes a parameter vector pertaining to the point-level covariogram model $\sigma_E(\mathbf{h}; \tilde{\boldsymbol{\theta}})$ of the error term.

Based on the above linear model, the aggregation of the point attribute values leads to an updated definition of the source data vector $\mathbf{y}_s$ to account for the auxiliary data:

$$\mathbf{y}_s = \mathbf{G}(\mathbf{X}_t\boldsymbol{\beta}_t + \mathbf{e}_t) = \mathbf{G}\mathbf{X}_t\boldsymbol{\beta}_t + \mathbf{G}\mathbf{e}_t = \mathbf{X}_s\boldsymbol{\beta}_t + \mathbf{e}_s \qquad (9)$$

where $\mathbf{G}\mathbf{X}_t = \mathbf{X}_s$ is a ($N \times K$) matrix of area-level data on the $K - 1$ covariates at the $N$ source supports (plus a vector of $N$ ones), and $\mathbf{G}\mathbf{e}_t = \mathbf{e}_s$ is a ($N \times 1$) vector of unobserved area-level errors. Equation (9) implies that the area-level regression coefficients, e.g., $\boldsymbol{\beta}_s$, are the same as those of the point level $\boldsymbol{\beta}_t$; the reason behind this resolution invariance is the fact that the point-level values of the dependent variable $Y$ and of the predictors $X$ are subjected to the same linear aggregation encapsulated in matrix $\mathbf{G}$.

Under the above linear model, surface reconstruction is achieved via Kriging with external drift (KED) (Sales et al. 2013). In particular, the ($M \times 1$) vector $\hat{\mathbf{y}}_t^{KD}$ of KED predictions is expressed as:

$$\hat{\mathbf{y}}_t^{KD} = \mathbf{W}_{KD}^T\mathbf{y}_s \qquad (10)$$

where $\mathbf{W}_{KD}$ is a ($N \times M$) matrix of KED weights, computed under the unbiasedness constraints $\mathbf{W}_{KD}^T\mathbf{G}\mathbf{X}_t = \mathbf{X}_t$.

The matrix of KED weights $\mathbf{W}_{KD}$ is obtained by solving a constrained ($(N + K) \times (N + K)$) system of normal equations, termed the KED system:

$$\begin{bmatrix} \mathbf{G}\boldsymbol{\Sigma}_E(\tilde{\boldsymbol{\theta}})\mathbf{G}^T & \mathbf{G}\mathbf{X}_t \\ (\mathbf{G}\mathbf{X}_t)^T & \mathbf{O} \end{bmatrix} \begin{bmatrix} \mathbf{W}_{KD} \\ \boldsymbol{\Lambda}_{KD} \end{bmatrix} = \begin{bmatrix} \mathbf{G}\boldsymbol{\Sigma}_E(\tilde{\boldsymbol{\theta}}) \\ \mathbf{X}_t^T \end{bmatrix} \qquad (11)$$

where $\boldsymbol{\Lambda}_{KD}$ is a ($K \times M$) matrix of Lagrange multipliers due to the constraints on the weights, and $\mathbf{O}$ is a ($K \times K$) matrix of zeros. Note that the KED system for surface reconstruction reverts to the OK system of Eq. (7) when the design matrix $\mathbf{X}_t$ used above includes only a vector of ones.

The KED system of equations calls for knowledge of the covariance matrix $\boldsymbol{\Sigma}_E(\tilde{\boldsymbol{\theta}})$ of the point-level linear model disturbance term $\mathbf{e}_t$, instead of the covariance matrix $\boldsymbol{\Sigma}(\boldsymbol{\theta})$ of the underlying attribute surface $\mathbf{y}_t$ used in the SK system of Eq. (4) and the OK system of Eq. (7). This implies that surface reconstruction via KED requires inference of the functional form and parameter vector $\tilde{\boldsymbol{\theta}}$ of a point-level error covariance model $\sigma_E(\mathbf{h}; \tilde{\boldsymbol{\theta}})$, along with the vector $\boldsymbol{\beta}_t$ of regression coefficients implicitly estimated in the KED formulation. No matter the point-level error covariance model $\sigma_E(\mathbf{h}; \tilde{\boldsymbol{\theta}})$ adopted, surface reconstruction via KED is also consistent with the aggregate source data. In other words, the
coherence property of Eq. (5) applies also to the vector of KED-derived predictions $\hat{\mathbf{y}}_t^{KD}$ (Sales et al. 2013).

Surface reconstruction via KED can also be performed in a local interpolation mode, whereby a new linear regression model similar to Eq. (8) is postulated for the $N' < N$ source supports considered within the search neighborhood centered at a target location $\mathbf{c}_m$. The vector $\boldsymbol{\beta}_t$ of point-level regression coefficients is implicitly estimated, in conjunction with the KED weights, using a local version of the KED system of Eq. (11) from the $N'$ source data within each search neighborhood.

A rather restrictive requirement of area-to-point KED is that aggregate data of both the dependent variable $Y$ and the $K - 1$ independent variables $X$ pertaining to the same support $C_n$ be all defined using the same aggregation mechanism, since the sampling function $g_n(\mathbf{c}_m)$ does not depend on any particular variable. When this requirement is not satisfied, surface reconstruction can be achieved via area-to-point coKriging and its variants accounting for a spatially varying attribute mean (Atkinson et al. 2008). CoKriging weights furnish the contribution of each source datum value, be it of the dependent variable $Y$ or of an auxiliary variable $X_k$, to the target prediction as a function of both point-level and regularized (aggregate-level) auto- and cross-covariogram values. The solution of the corresponding coKriging system of equations calls for a permissible (positive-definite) joint model, e.g., the linear model of coregionalization (Journel and Huijbregts 1978), for all point-level auto- and cross-covariograms defined between all pairs of variables involved. Although surface reconstruction based on area-to-point coKriging is more flexible than KED-based reconstruction, it requires parameter estimation for a significantly larger number of point-level covariogram models, thus increasing considerably the required inference effort.

Uncertainty in Surface Reconstruction

Kriging is a stochastic surface reconstruction method and as such provides an estimate of uncertainty or reliability for each predicted point attribute value. That uncertainty is quantified by the prediction error variance at each target location $\mathbf{c}_m$, taking into account (conditional on) the configuration of the source supports, the point-level covariogram model $\sigma_Y(\mathbf{h}; \boldsymbol{\theta})$, as well as the aggregate nature of the source data encapsulated in the aggregation matrix $\mathbf{G}$.

For the case of SK, the prediction error variance $\hat{\sigma}_Y(\mathbf{c}_m)$ at location $\mathbf{c}_m$ can be derived from the ($M \times M$) SK prediction error covariance matrix $\hat{\boldsymbol{\Sigma}}_{tt} = [\hat{\sigma}_Y(\mathbf{c}_m, \mathbf{c}_{m'}),\ m = 1, \ldots, M,\ m' = 1, \ldots, M]$, with $\hat{\sigma}_Y(\mathbf{c}_m, \mathbf{c}_{m'})$ denoting the conditional covariance value for a location pair $\mathbf{c}_m$ and $\mathbf{c}_{m'}$, expressed as:

$$\hat{\boldsymbol{\Sigma}}_{tt} = \boldsymbol{\Sigma}_{tt} - \mathbf{W}^T\boldsymbol{\Sigma}_{st} = \boldsymbol{\Sigma}(\boldsymbol{\theta}) - \boldsymbol{\Sigma}(\boldsymbol{\theta})\mathbf{G}^T\left[\mathbf{G}\boldsymbol{\Sigma}(\boldsymbol{\theta})\mathbf{G}^T\right]^{-1}\mathbf{G}\boldsymbol{\Sigma}(\boldsymbol{\theta}) \qquad (12)$$

where the $M$ entries on the diagonal of matrix $\hat{\boldsymbol{\Sigma}}_{tt}$ correspond to the SK prediction error variances at the $M$ target locations; such conditional variance values represent the uncertainty in the target predictions and are typically mapped along with the SK-derived attribute surface of Eq. (3).

When the attribute expectation vector $\boldsymbol{\mu}_t$ is unknown and linked to auxiliary data via the regression model of Eq. (8), the corresponding prediction error variance $\hat{\sigma}_Y^{KD}(\mathbf{c}_m)$ at location $\mathbf{c}_m$ can be derived from the ($M \times M$) KED prediction error covariance matrix $\hat{\boldsymbol{\Sigma}}_{tt}^{KD} = [\hat{\sigma}_Y^{KD}(\mathbf{c}_m, \mathbf{c}_{m'}),\ m = 1, \ldots, M,\ m' = 1, \ldots, M]$, with $\hat{\sigma}_Y^{KD}(\mathbf{c}_m, \mathbf{c}_{m'})$ denoting the KED-derived conditional covariance value for a location pair $\mathbf{c}_m$ and $\mathbf{c}_{m'}$, expressed as (Sales et al. 2013):

$$\hat{\boldsymbol{\Sigma}}_{tt}^{KD} = \boldsymbol{\Sigma}_E(\tilde{\boldsymbol{\theta}}) - \boldsymbol{\Sigma}_E(\tilde{\boldsymbol{\theta}})\mathbf{G}^T\left[\mathbf{G}\boldsymbol{\Sigma}_E(\tilde{\boldsymbol{\theta}})\mathbf{G}^T\right]^{-1}\mathbf{G}\boldsymbol{\Sigma}_E(\tilde{\boldsymbol{\theta}}) + \boldsymbol{\Lambda}_{KD}^T\mathbf{X}_t \qquad (13)$$

where term $\boldsymbol{\Lambda}_{KD}^T\mathbf{X}_t$ represents the increase (with respect to SK) in prediction uncertainty brought by the fact that the attribute expectation vector $\boldsymbol{\mu}_t$ (hence $\boldsymbol{\mu}_s$) assumed known in the SK formulation of Eq. (3) is now implicitly estimated by KED.

In the multivariate Gaussian case, the area-to-point Kriging prediction and the associated

prediction error variance furnish the parameters ing and its variants, as well as ATP stochastic
of a local Gaussian probability distribution of simulation, has been employed in several fields,
possible attribute values given the aggregate ranging from remote sensing and geoinformation
source data, possibly including data on to environmental science, population mapping, as
relevant auxiliary variables used for spatial well as public health. In terms of remote sens-
prediction (in the case of KED), as well as ing applications, ATP Kriging and ATP coKrig-
the particular point-level covariogram model ing have been extensively used for downscaling
adopted. That probability distribution can be moderate resolution imaging spectroradiometer
used for propagating (either analytically or (MODIS) data to finer spatial resolutions; see,
numerically through statistical simulation) the for example, Atkinson et al. (2008), Sales et al.
local uncertainty in interpolated attribute values (2013), and Wang et al. (2015), as well as Truong
to quantify uncertainty in the results of local GIS et al. (2014) who extended ATP Kriging and
operations involving one location at a time. simulation to account for expert knowledge when
Reconstructed attribute surfaces, however, of- downscaling MODIS temperature pro le data.
ten undergo spatial operations involving multiple These applications showcase the great poten-
locations at a time, e.g., gradient computations or tial of geostatistical surface reconstruction when
other focal or zonal operations in a GIS. In such combined with MODIS data for a wide variety
cases, however, knowledge of the local Kriging of environmental monitoring purposes, such as
attribute prediction and variance at a set of target global deforestation mapping.
locations, considered one at a time, is not ade- In terms of soil science applications, Kerry
quate for such a multiple-point uncertainty anal- et al. (2012) employed ATP Kriging to disaggre-
ysis task. The preferred means for uncertainty gate legacy soil data for mapping soil organic
propagation in this case is surface reconstruction carbon, and Horta et al. (2014) applied ATP
via geostatistical simulation (Kyriakidis and Yoo stochastic simulation for mapping soil hydraulic
2005). As stated before, attribute surface recon- properties integrating measurements of different
struction from (aggregate or point) source data is spatial resolutions. Last, geostatistical surface re-
an under-determined inverse problem, which can construction has been also employed in socioe-
be rendered solvable once a covariogram model is conomic applications. In particular, Goovaerts
postulated or inferred for the underlying attribute (2012) employed ATP binomial Kriging for map-
surface (when using SK and OK) or for the ping cancer mortality risk while accounting for
regression error surface (when using KED). Even different levels of spatial aggregation and for non-
when such point-level statistics related to surface Gaussian distribution of the aggregate data, Liu
smoothness have been inferred, multiple plausi- et al. (2008) applied ATP Kriging to the residuals
ble solutions exist to the stochastic surface recon- of a regression model linking urban population
struction inverse problem, all sharing the same data from census units to land-use zones, Yoo and
covariogram model and being consistent with Kyriakidis (2009) applied ATP Kriging for down-
the aggregate source data. Surface reconstruction scaling housing prices within a hedonic pricing
via geostatistical simulation can be regarded as model framework, and Nagle (2010) applied ATP
the procedure of generating or exploring such factorial Kriging to predict employment density
alternative attribute surfaces, thus furnishing mul- from aggregate data.
tiple solutions to the stochastic disaggregation
inverse problem (Kyriakidis and Yoo 2005).
Future Directions

Key Applications Geostatistical surface reconstruction from


aggregate attribute data calls for knowledge
Geostatistical reconstruction of attribute surfaces of a point-level covariogram or semivariogram
from aggregate data via area-to-point (ATP) Krig- model, even when auxiliary point-level attribute
data are available. Although this requirement might seem problematic at first sight, it explicates the subjective decisions made at the surface (point) level by existing methods for surface reconstruction. Explicit model specification at the point level is more flexible and creates more opportunities for interdisciplinary problem-solving than downscaling relying on somewhat arbitrary decisions invoked implicitly by traditional surface reconstruction methods. More research is thus required to develop guidelines for selecting appropriate models of point-level spatial correlation for selected classes of downscaling problems, e.g., depending on the particular attribute surface being reconstructed and/or the particular region or environment where the aggregate source data are available.

The inclusion of time as an additional data and modeling component has been one of the major areas of development in geostatistics during the last decade. Several space-time semivariogram functions have been proposed in the literature for modeling joint attribute variation in a spatiotemporal context. In addition, space-time semivariogram functions derived from analytical solutions of partial differential equations have also been developed to account for the dynamic evolution of spatiotemporal processes. Such models can furnish the much sought-after, yet often elusive, point-level semivariogram function required for geostatistical surface reconstruction, as well as infuse process-based expert or prior knowledge in the downscaling procedure.

Another recent development in geostatistics is that of multiple-point geostatistics, whereby attribute spatial patterns involving more than two points at a time (a semivariogram is a two-point statistic) are learned from training images (Mariethoz and Caers 2014). Such images could be constructed from remotely sensed images or even attribute surfaces stemming from numerical models of physical or social processes and represent prior (before aggregate data acquisition) repositories of spatial patterns. These learned spatial patterns provide more realistic models of spatial heterogeneity and complexity than parametric (or nonparametric) semivariogram functions. Fine spatial resolution training images along with multiple-point geostatistics are increasingly used as geostatistical downscaling methods.

It is expected that as more of these developments find their way into commercial or open-source GIS software, geostatistical surface reconstruction from aggregate data will become an even more popular downscaling method across multiple disciplines.

Recommended Reading

Atkinson PM, Pardo-Igúzquiza E, Chica-Olmo M (2008) Downscaling cokriging for super-resolution mapping of continua in remotely sensed images. IEEE Trans Geosci Remote Sens 46:573–580
Goovaerts P (2006) Geostatistical analysis of disease data: accounting for spatial support and population density in the isopleth mapping of cancer mortality risk using area-to-point Poisson kriging. Int J Health Geogr 5:52
Goovaerts P (2008) Kriging and semivariogram deconvolution in the presence of irregular geographical units. Math Geosci 40:101–128
Goovaerts P (2012) Geostatistical analysis of health data with different levels of spatial aggregation. Spat Spatiotemporal Epidemiol 3:83–92
Guan Q, Kyriakidis PC, Goodchild MF (2011) A parallel computing approach to fast geostatistical areal interpolation. Int J Geogr Inf Sci 25:1241–1267
Haining R (2003) Spatial data analysis: theory and practice. Cambridge University Press, Cambridge
Horta A, Pereira MJ, Gonçalves M, Ramos T, Soares A (2014) Spatial modelling of soil hydraulic properties integrating different supports. J Hydrol 51:1–9
Journel AG, Huijbregts CJ (1978) Mining geostatistics. Academic, London
Kerry R, Goovaerts P, Rawlings BG, Marchant BP (2012) Disaggregation of legacy soil data using area to point kriging for mapping soil organic carbon at the regional scale. Geoderma 170:347–358
Kyriakidis PC (2004) A geostatistical framework for area-to-point spatial interpolation. Geogr Anal 36:259–289
Kyriakidis PC, Yoo E-H (2005) Geostatistical prediction and simulation of point values from areal data. Geogr Anal 37:124–151
Liu X, Kyriakidis PC, Goodchild MF (2008) Population density estimation using regression and area-to-point residual Kriging. Int J Geogr Inf Sci 22:431–447
Mariethoz G, Caers J (2014) Multiple-point geostatistics: stochastic modeling with training images. Wiley-Blackwell, Chichester
Nagle NN (2010) Geostatistical smoothing of areal data: mapping employment density with factorial Kriging. Geogr Anal 42:99–117
Sales MHR, Sousa CM Jr, Kyriakidis PC (2013) Fusion of MODIS images using Kriging with external drift. IEEE Trans Geosci Remote Sens 51:2250–2259
Tobler WR (1979) Smooth pycnophylactic interpolation for geographical regions. J Am Stat Assoc 74:519–530
Truong PN, Heuvelink GBM, Pebesma E (2014) Bayesian area-to-point kriging using expert knowledge as informative priors. Int J Appl Earth Obs Geoinf 30:128–138
Wang Q, Shi W, Atkinson PM, Zhao Y (2015) Downscaling MODIS images with area-to-point regression kriging. Remote Sens Environ 166:191–204
Yoo E-H, Kyriakidis PC (2006) Area-to-point Kriging with inequality-type data. J Geogr Syst 8:357–390
Yoo E-H, Kyriakidis PC (2009) Area-to-point Kriging in spatial hedonic pricing models. J Geogr Syst 11:381–406
Yoo E-H, Kyriakidis PC, Tobler W (2010) Reconstructing population density surfaces from areal data: a comparison of Tobler's pycnophylactic interpolation method and area-to-point Kriging. Geogr Anal 42:78–98


Aggregate Nearest Neighbor Queries

▶ Variations of Nearest Neighbor Queries in Euclidean Space


Aggregate Queries, Progressive Approximate

Iosif Lazaridis and Sharad Mehrotra
Department of Computer Science, University of California, Irvine, CA, USA

Synonyms

Approximate aggregate query; On-line aggregation

Definition

Aggregate queries generally take a set of objects as input and produce a single scalar value as output, summarizing one aspect of the set. Commonly used aggregate types include MIN, MAX, AVG, SUM, and COUNT.

If the input set is very large, it might not be feasible to compute the aggregate precisely and in reasonable time. Alternatively, the precise value of the aggregate may not even be needed by the application submitting the query, e.g., if the aggregate value is to be mapped to an 8-bit color code for visualization. This motivates the use of approximate aggregate queries, which return a value close to the exact one, but in a fraction of the time.

Progressive approximate aggregate queries go one step further. They do not produce a single approximate answer, but continuously refine the answer as time goes on, progressively improving its quality. Thus, if the user has a fixed deadline, he can obtain the best answer within the allotted time; conversely, if he has a fixed answer accuracy requirement, the system will use the least amount of time to produce an answer of sufficient accuracy. Thus, progressive approximate aggregate queries are a flexible way of implementing aggregate query answering.

Multi-Resolution Aggregate trees (MRA-trees) are spatial or, in general, multi-dimensional indexing data structures whose nodes are augmented with aggregate values for all the indexed subsets of data. They can be used very efficiently to provide an implementation of progressive approximate query answering.

Historical Background

Aggregate queries are extremely useful because they can summarize a huge amount of data by a single number. For example, many users expect to know the average and highest temperature in their city and are not really interested in the temperature recorded by all environmental monitoring stations used to produce this number. The simplest aggregate query specifies a selection condition specifying the subset of interest, e.g., "all monitoring stations in Irvine," and an aggregate type to be computed, e.g., "MAX temperature."

The normal way to evaluate an aggregate query is to collect all data in the subset of interest and evaluate the aggregate query over them. This approach has two problems: first, the user may not need to know that the
temperature is 34.12 °C, but 34 ± 0.5 °C will suffice; second, the dataset may be so large that exhaustive computation may be infeasible. These observations motivated researchers to devise approximate aggregate query answering mechanisms.

Off-line synopsis-based strategies, such as histograms (Ioannidis and Poosala 1999), samples (Acharya et al. 1999), and wavelets (Chakrabarti et al. 2000), have been proposed for approximate query processing. These use small data summaries that can be processed very easily to answer a query at a small cost. Unfortunately, summaries are inherently unable to adapt to the query requirements. The user usually has no way of knowing how good an approximate answer is and, even if he does, it may not suffice for his goals. Early synopsis-based techniques did not provide any guarantees about the quality of the answer, although this has been incorporated more recently (Garofalakis and Kumar 2005).

Online aggregation (Hellerstein et al. 1997) was proposed to deal with this problem. In online aggregation, the input set is sampled continuously, a process which can, in principle, continue until this set is exhausted, thus providing an answer of arbitrarily good quality; the goal is, however, to use a sample of small size, thus saving on performance while giving a "good enough" answer. In online aggregation, a running aggregate is updated progressively, finally converging to the exact answer if the input is exhausted. The sampling usually occurs by sampling either the entire data table or a subset of interest one tuple at a time; this may be expensive, depending on the size of the table and also its organization: if tuples are physically ordered in some way, then sampling may need to be performed with random disk accesses, which are costlier than sequential accesses.

Multi-resolution trees (Lazaridis and Mehrotra 2001) were designed to deal with the limitations of established synopsis-based techniques and sampling-based online aggregation. Unlike off-line synopses, MRA-trees are flexible and can adapt to the characteristics of the user's quality/time requirements. Their advantage over sampling is that they help queries quickly zero in on the subset of interest without having to process a great number of tuples individually. Moreover, MRA-trees provide deterministic answer quality guarantees to the user that are easy for him to prescribe (when he poses his query) and to interpret (when he receives the results).

Scientific Fundamentals

Multi-dimensional index trees such as R-trees, quad-trees, etc., are used to index data existing in a multi-dimensional domain. Consider a d-dimensional space $\mathbb{R}^d$ and a finite set of points (input relation) $S \subset \mathbb{R}^d$. Typically, for spatial applications, $d \in \{2, 3\}$. The aggregate query is defined as a pair (agg, $R_Q$) where agg is an aggregate function (e.g., MIN, MAX, SUM, AVG, COUNT) and $R_Q \subseteq \mathbb{R}^d$ is the query region. The query asks for the evaluation of agg over all tuples in $S$ that are in region $R_Q$. Multi-dimensional index trees organize this data via a hierarchical decomposition of the space $\mathbb{R}^d$ or grouping of the data in $S$. In either case, each node $N$ indexes a set of data tuples contained in its subtree which are guaranteed to have values within the node's region $R_N$.

MRA-trees (Lazaridis and Mehrotra 2001) are generic data techniques that can be applied over any standard multi-dimensional index method; they are not yet another indexing technique. They modify the underlying index by adding to each tree node $N$ the value of agg over all data tuples indexed by (i.e., in the sub-tree of) $N$. Only a single such value, e.g., MIN, may be stored, but in general, all aggregate types can be used without much loss of performance. An example of an MRA-quad-tree is seen in Fig. 1.

The key observation behind the use of MRA-trees is that the aggregate value of all the tuples indexed by a node $N$ is known by just visiting $N$. Thus, in addition to the performance benefit of a standard spatial index (visiting only a fraction of selected tuples, rather than the entire set), the MRA-tree also avoids traversing the entire sub-tree of nodes contained within the query region. Nodes that partially overlap the region may or may not contribute to the aggregate, depending

Aggregate Queries, Progressive Approximate, Fig. 1 Example of an MRA-quad-tree

Aggregate Queries, Progressive Approximate, Fig. 2 A snapshot of MRA-tree traversal

on the spatial distribution of points within them. Such nodes can be further explored to improve performance. This situation is seen in Fig. 2: nodes at the perimeter of the query (set $N_p$) can be further explored, whereas nodes at the interior ($N_c$) need not be.

The progressive approximation algorithm (Fig. 3) has three major components:

• Computation of a deterministic interval of confidence guaranteed to contain the aggregate value, e.g., [30, 40].

Aggregate Queries, Progressive Approximate, Fig. 3 Progressive approximation algorithm

• Estimation of the aggregate value, e.g., 36.2.
• A traversal policy which determines which node to explore next by visiting its children nodes.

The interval of confidence can be calculated by taking the set of nodes partially overlapping/contained in the query into account (Fig. 2). The details of this for all the aggregate types can be found in Lazaridis and Mehrotra (2001). For example, if the SUM of all contained nodes is 50 and the SUM of all partially overlapping nodes is 15, then the interval is [50, 65] since all the tuples in the overlapping nodes could either be outside or inside the query region.

There is no single best way for aggregate value estimation. For example, taking the middle of the interval has the advantage of minimizing the worst-case error. On the other hand, intuitively, if a node barely overlaps with the query, then it is
Aggregate Queries, Progressive Approximate, Fig. 4 Answer error improves as more MRA-tree nodes are visited (plot "Relative Error (COUNT, 25%)": average relative error versus number of MRA-tree nodes visited)

expected that its overall contribution to the query will be slight. Thus, if the SUM of all contained nodes is again 50 and there are two partially overlapping nodes, A and B, with SUM(A) = 5 and SUM(B) = 15, and 30% of A and 50% of B overlap with the query, respectively, then a good estimate of the SUM aggregate will be 50 + 5 × 0.3 + 15 × 0.5 = 59.

Finally, the traversal policy should aim to shrink the interval of confidence by the greatest amount, thus improving the accuracy of the answer as fast as possible. This is achieved by organizing the partially overlapping nodes using a priority queue. The queue is initialized with the root node; subsequently, the front node of the queue is repeatedly picked, its children are examined, the confidence interval and aggregate estimate are updated, and the partially overlapping children are placed in the queue. In our example, the policy would prefer to explore node B before A, since B contributes more (15) to the uncertainty inherent in the interval of confidence than A (5). Detailed descriptions of the priority used for the different aggregate types can be found in Lazaridis and Mehrotra (2001).

Performance of MRA-trees depends on the underlying data structure used as well as on the aggregate type and query selectivity. MIN and MAX queries are typically evaluated very efficiently since the query processing system uses the node aggregates to quickly zero in on a few candidate nodes that contain the minimum (or maximum) value; very rarely is the entire perimeter needed to compute even the exact answer. Query selectivity affects processing speed; like all multi-dimensional indexes, performance degrades as a higher fraction of the input table $S$ is selected. However, unlike traditional indexes, the degradation is more gradual since the interior area of the query region is not explored. A typical profile of answer error as a function of the number of nodes visited can be seen in Fig. 4.

MRA-trees use extra space (to store the aggregates) in exchange for time. If the underlying data structure is an R-tree, then storage of aggregates in tree nodes results in decreased fanout since fewer bounding rectangles and their accompanying aggregate values can be stored within a disk page. Decreased fanout may imply increased height of the tree. Fortunately, the overhead of aggregate storage does not negatively affect performance since it is counter-balanced by the benefits of partial tree exploration. Thus, even for computing the exact answer, MRA-trees are usually faster than regular R-trees, and the difference grows further if even a small error, e.g., in the order of 10%, is allowed (Fig. 5).

Key Applications

Progressive approximate aggregate queries using a multi-resolution tree structure can be used in many application domains when data is either large, difficult to process, or the exact answer is not needed.
Aggregate Queries, Progressive Approximate, Fig. 5 MRA-R-tree performance compared to regular R-tree (plot "MRA-RTree vs. R-Tree I/Os (2D Synthetic)": page I/Os as % of database size versus spatial query selectivity as % of space, for RTree, MRA-RTree (exact), and MRA-RTree (10% max. rel. error))

On-line Analytical Processing: On-line analytical processing (OLAP) is often applied to huge transaction datasets, such as those produced by merchants or other geographically distributed enterprises. If these data are indexed using an MRA-tree, aggregate queries, the most frequent type of query found in OLAP, can be processed efficiently.

Wireless Sensor Networks: Sensor networks consist of numerous small sensors deployed over a geographical region of interest. Interestingly, sensors are often organized in a routing tree leading to an access point from which data is forwarded to the data infrastructure. This routing tree itself could become a spatial index, thus limiting the number of hops of wireless communication needed to obtain the aggregate value. Thus, fewer hops of wireless communication, analogous to disk I/Os in disk-based data structures such as R-trees, will be necessary.

Virtual Reality and Visualization: Information about a geographical region is often presented in visual form, in either a static or a dynamic presentation (e.g., a virtual fly-through). Queries may come at a very high rate (equal to the frame rate), whereas the precision of the visualization is inherently limited by the color coding and limitations of human perception. Approximate aggregate queries are thus ideally suited to drive interactive visualizations (Porkaew et al. 2001).

Future Directions

A limitation of MRA-trees is that they have to maintain the aggregate values at each node of the tree. Thus, whenever a data insertion or deletion takes place, all nodes in the path from the root to the modified leaf have to be updated. This cost may be significant, e.g., in applications with frequent updates, such as those involving moving objects. This extra cost may be reduced if updates are deferred; this would improve performance, but with an accompanying loss of accuracy.

Beyond aggregate queries, progressive approximation can also be used in queries producing a set of objects as output. Unlike aggregate queries, which admit a natural definition of accuracy (the length of the confidence interval), there is no clear metric to assess the quality of set-based answers. Precision and recall, as used in information retrieval systems, may quantify the purity and completeness of the answer set (Lazaridis and Mehrotra 2004), but more elaborate methods can be devised, particularly if the answer set is visualized in a GIS. While datasets continue to grow exponentially in size, visualization media and the human perceptual system do not; hence, it will be useful to adapt query processing to their limitations rather than to process data exhaustively at a great cost, but with no observable benefit for the user.
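
The maintenance cost noted under Future Directions, where every node on the path between the root and the modified leaf must be updated, can be illustrated with a minimal Python sketch. The class `AggNode` and the function `insert_value` are hypothetical names introduced here, and the three-level tree is a made-up example; the sketch covers insertion only, since deleting a value may require recomputing MIN and MAX from a node's children rather than adjusting them in place.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AggNode:
    parent: Optional["AggNode"] = None
    count: int = 0
    total: float = 0.0
    lo: float = float("inf")     # MIN over the subtree
    hi: float = float("-inf")    # MAX over the subtree

def insert_value(leaf, value):
    """Fold one new value into every node on the leaf-to-root path.

    Returns the number of nodes touched, which equals the path length.
    """
    touched = 0
    node = leaf
    while node is not None:
        node.count += 1
        node.total += value
        node.lo = min(node.lo, value)
        node.hi = max(node.hi, value)
        touched += 1
        node = node.parent
    return touched

# A three-level toy tree: root -> internal node -> two leaves.
root = AggNode()
inner = AggNode(parent=root)
leaf_a = AggNode(parent=inner)
leaf_b = AggNode(parent=inner)

touched_per_insert = [
    insert_value(leaf_a, 34.12),
    insert_value(leaf_b, 29.5),
    insert_value(leaf_a, 31.0),
]
```

Each insertion touches one node per tree level, so the extra work grows with the height of the tree; deferring updates would batch these path walks at the price of temporarily stale aggregates in the upper nodes.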
Cross-References

▶ Multi-resolution Aggregate Tree
▶ Progressive Approximate Aggregation

Recommended Reading

Acharya S, Gibbons P, Poosala V, Ramaswamy S (1999) Joint synopses for approximate query answering. In: SIGMOD '99: proceedings of the 1999 ACM SIGMOD international conference on management of data. ACM Press, New York, pp 275–286
Chakrabarti K, Garofalakis MN, Rastogi R, Shim K (2000) Approximate query processing using wavelets. In: VLDB '00: proceedings of the 26th international conference on very large data bases. Morgan Kaufmann, San Francisco, pp 111–122
Garofalakis M, Kumar A (2005) Wavelet synopses for general error metrics. ACM Trans Database Syst 30(4):888–928
Hellerstein JM, Haas PJ, Wang HJ (1997) Online aggregation. In: SIGMOD '97: proceedings of the 1997 ACM SIGMOD international conference on management of data. ACM Press, New York, pp 171–182
Ioannidis YE, Poosala V (1999) Histogram-based approximation of set-valued query-answers. In: VLDB '99: proceedings of the 25th international conference on very large data bases. Morgan Kaufmann, San Francisco, pp 174–185
Lazaridis I, Mehrotra S (2001) Progressive approximate aggregate queries with a multi-resolution tree structure. In: SIGMOD '01: proceedings of the 2001 ACM SIGMOD international conference on management of data. ACM Press, New York, pp 401–412
Lazaridis I, Mehrotra S (2004) Approximate selection queries over imprecise data. In: ICDE '04: proceedings of the 20th international conference on data engineering, Washington, DC. IEEE Computer Society
Porkaew K, Lazaridis I, Mehrotra S, Winkler R (2001) Database support for situational awareness. In: Vassiliop MS, Huang TS (eds) Computer-science handbook for displays - summary of findings from the Army Research Lab's advanced displays & interactive displays federated laboratory. Rockwell Scientific Company

Recommended Reading

Karras P, Mamoulis N (2005) One-pass wavelet synopses for maximum-error metrics. In: VLDB '05: proceedings of the 31st international conference on very large data bases, Trondheim. VLDB Endowment, pp 421–432
Lenz HJ, Jurgens M (1998) The Ra*-tree: an improved R-tree with materialized data for supporting range queries on OLAP-data. In: DEXA workshop, Vienna
Papadias D, Kalnis P, Zhang J, Tao Y (2001) Efficient OLAP operations in spatial data warehouses. In: SSTD '01: proceedings of the 7th international symposium on advances in spatial and temporal databases. Springer, London, pp 443–459


Aggregation

▶ Hierarchies and Level of Detail


Aggregation Query, Spatial

Donghui Zhang
College of Computer and Information Science, Northeastern University, Boston, MA, USA

Synonyms

Spatial Aggregate Computation

Definition

Given a set O of weighted point objects and a rectangular query region r in the d-dimensional space, the spatial aggregation query asks for the total weight of all objects in O which are contained in r.

This query corresponds to the SUM aggregation. The COUNT aggregation, which asks for the number of objects in the query region, is a special case when every object has equal weight.

The problem can actually be reduced to a special case, called the dominance-sum query. An object o1 dominates another object o2 if o1 has a larger value in all dimensions. The dominance-sum query asks for the total weight of objects dominated by a given point p. It is a special case of the spatial aggregation query, when the query region is described by two extreme points: the lower-left corner of space and p.

The spatial aggregation query can be reduced to the dominance-sum query in the 2D space, as illustrated below. Given a query region r (a 2D
rectangle), let the four corners of r be lowerleft, upperleft, lowerright, and upperright. It is not hard to verify that the spatial aggregate with respect to r is equal to

$$\mathit{dominancesum}(\mathit{upperright}) - \mathit{dominancesum}(\mathit{lowerright}) - \mathit{dominancesum}(\mathit{upperleft}) + \mathit{dominancesum}(\mathit{lowerleft})$$

Historical Background

In computational geometry, to answer the dominance-sum query, an in-memory and static data structure called the ECDF-tree (Bentley 1980) can be used. The ECDF-tree is a multi-level data structure, where each level corresponds to a different dimension. At the first level (also called main branch), the d-dimensional ECDF-tree is a full binary search tree whose leaves store the data points, ordered by their position in the first dimension. Each internal node of this binary search tree stores a border for all the points in the left subtree. The border is itself a (d−1)-dimensional ECDF-tree; here points are ordered by their positions in the second dimension. The collection of all these border trees forms the second level of the structure. Their respective borders are (d−2)-dimensional ECDF-trees (using the third dimension and so on). To answer a dominance-sum query for point $p = (p_1, \ldots, p_d)$, the search starts with the root of the first-level ECDF-tree. If $p_1$ is in the left subtree, the search continues recursively on the left subtree. Otherwise, two queries are performed, one on the right subtree and the other on the border; the respective results are then added together.

In the fields of GIS and spatial databases, one seeks disk-based and dynamically updateable index structures. One approach is to externalize and dynamize the ECDF-tree. To dynamize a static data structure, some standard techniques can be used (Chiang and Tamassia 1992), for example, global rebuilding (Overmars 1983) or the logarithmic method (Bentley and Saxe 1980). To externalize an internal memory data structure, a widely used method is to augment it with block-access capabilities (Vitter 2001). Unfortunately, this approach is either very expensive in query cost, or very expensive in index size and update cost.

Another approach to solving the spatial aggregation query is to index the data objects with a multidimensional access method like the R*-tree (Beckmann et al. 1990). The R*-tree (and the other variations of the R-tree) clusters nearby objects into the same disk page. An index entry is used to reference each disk page. Each index entry stores the minimum bounding rectangle (MBR) of objects in the corresponding disk page. The index entries are then recursively clustered based on proximity as well. Such multidimensional access methods provide efficient range query performance in that subtrees whose MBRs do not intersect the query region can be pruned. The spatial aggregation query can be reduced to the range search: retrieve the objects in the query region and aggregate their weights on the fly. Unfortunately, when the query region is large, the query performance is poor.

An optimization proposed by Lazaridis and Mehrotra (2001) and Papadias et al. (2001) is to store, along with each index entry, the total weight of objects in the referenced subtree. The index is called the aggregate R-tree, or aR-tree for short. Such aggregate information can improve the aggregation query performance in that if the query region fully contains the MBR of some index entry, the total weight stored along with the index entry contributes to the answer, while the subtree itself does not need to be examined. However, even with this optimization, the query effort is still affected by the size of the query region.

Scientific Fundamentals

This section presents a better index for the dominance-sum query (and in turn the spatial aggregation query) called the Box-Aggregation Tree, or BA-tree for short.

The BA-tree is an augmented k-d-B-tree (Robinson 1981). The k-d-B-tree is a disk-based
index structure for multidimensional point objects. Unlike the R-tree, the k-d-B-tree indexes the whole space. Initially, when there are only a few objects, the k-d-B-tree uses a single disk page to store them. The page is responsible for the whole space in the sense that any new object, wherever it is located in space, should be inserted into this page. When the page overflows, it is split into two using a hyperplane corresponding to a single dimension. For instance, order all objects based on dimension one and move the half of the objects with larger dimension-one values to a new page. Each of these two disk pages is referenced by an index entry, which contains a box: the space the page is responsible for. The two index entries are stored in a newly created index page. As more splits happen, the index page contains more index entries.
For ease of understanding, let us focus the discussion on the 2D space. Figure 1 shows an exemplary index page of a BA-tree in the 2D space. As in the k-d-B-tree, each index record is associated with a box and a child pointer. The boxes of records in a page do not intersect, and their union creates the box of the page.
As done in the ECDF-tree, each index record in the k-d-B-tree can be augmented with some border information. The goal is that a dominance-sum query can be answered by following a single subtree (in the main branch). Suppose in Fig. 1a, there is a query point contained in the box of record F. The points that may affect the dominance-sum query of a query point in F.box are those dominated by the upper-right point of F.box. Such points belong in four groups: (1) the points contained in F.box; (2) the points dominated by the low point of F (in the shadowed region of Fig. 1a); (3) the points below the lower edge of F.box (Fig. 1b); and (4) the points to the left of the left edge of F.box (Fig. 1c).
To compute the dominance-sum for points in the first group, a recursive traversal of subtree(F) is performed. For points in the second group, a single value (called subtotal) is kept in record F, which is the total value of all these points. For computing the dominance-sum in the third group, an x-border is kept in F which contains the x positions and values of all these points. This dominance-sum is then reduced to a 1D dominance-sum query for the border. It is then sufficient to maintain these x positions in a 1D BA-tree. Similarly, for the points in the fourth group, a y-border is kept which is a 1D BA-tree for the y positions of the group's points.
To summarize, the 2D BA-tree is a k-d-B-tree where each index record is augmented with a single value subtotal and two 1D BA-trees called x-border and y-border, respectively. The computation for a dominance-sum query at point p starts at the root page R. If R is an index node, it locates the record r in R whose box contains p. A 1D dominance-sum query is performed on the x-border of r regarding p.x. A 1D dominance-sum query is performed on the y-border of r regarding p.y. A 2D dominance-sum query is performed recursively on page(r.child). The final query result is the sum of these three query results plus r.subtotal.
The insertion of a point p with value v starts at the root R. For each record r where r.lowpoint

Aggregation Query, Spatial, Fig. 1 The BA-tree is a k-d-B-tree with augmented border information. Each panel shows the same index page with records A to H. (a) Points affecting the subtotal of F. (b) Points affecting the x-border of F. (c) Points affecting the y-border of F

dominates p, v is added to r.subtotal. For each record r where p is below the x-border of r, position p.x and value v are added to the x-border. For each record r where p is to the left of the y-border of r, position p.y and value v are added to the y-border. Finally, for the record r whose box contains p, p and v are inserted in subtree(r.child). When the insertion reaches a leaf page L, a leaf record that contains point p and value v is stored in L.
Since the BA-tree aims at storing only the aggregate information, not the objects themselves, some inserted points need not actually be stored in the index, which saves storage space. For instance, if a point to be inserted falls on some border of an index record, there is no need to insert the point into the subtree at all. Instead, it is simply kept in the border that it falls on. If the point to be inserted falls on the low point of an internal node, there is not even a need to insert it in the border; rather, the subtotal value of the record is updated.
The BA-tree extends to higher dimensions in a straightforward manner: a d-dimensional BA-tree is a k-d-B-tree where each index record is augmented with one subtotal value and d borders, each of which is a (d-1)-dimensional BA-tree.

Key Applications

One key application of efficient algorithms for the spatial aggregation query is interactive GIS systems. Imagine a user interacting with such a system. She sees a map on the computer screen. Using the mouse, she can select a rectangular region on the map. The screen zooms in to the selected region. In addition, some statistics about the selected region, e.g., the total number of hotels, total number of residents, and so on, can be quickly computed and displayed on the side.
Another key application is in data mining, in particular, to compute range sums over data cubes. Given a d-dimensional array A and a query range q, the range-sum query asks for the total value of all cells of A in range q. It is a crucial query for online analytical processing (OLAP). The best known in-memory solutions for data cube range-sum appear in Chung et al. (2001) and Geffner et al. (2000). When applied to this problem, the BA-tree differs from Geffner et al. (2000) in two ways. First, it is disk based, while Geffner et al. (2000) present a main-memory structure. Second, the BA-tree partitions the space based on the data distribution, while Geffner et al. (2000) partition based on a uniform grid.

Future Directions

The update algorithm for the BA-tree is omitted here, but can be found in Zhang et al. (2002). Also discussed in Zhang et al. (2002) are more general queries, such as spatial aggregation over objects with extent.
The BA-tree assumes that the query region is an axis-parallel box. One practical direction for extending the solution is to handle arbitrary query regions, in particular, polygonal query regions.

Cross-References

▶ Aggregate Queries, Progressive Approximate
▶ OLAP, Spatial

References

Beckmann N, Kriegel HP, Schneider R, Seeger B (1990) The R*-tree: an efficient and robust access method for points and rectangles. In: SIGMOD, Atlantic City, pp 322–331
Bentley JL (1980) Multidimensional divide-and-conquer. Commun ACM 23(4):214–229
Bentley JL, Saxe NB (1980) Decomposable searching problems I: static-to-dynamic transformations. J Algorithms 1(4):301–358
Chiang Y, Tamassia R (1992) Dynamic algorithms in computational geometry. Proc IEEE Spec Issue Comput Geom 80(9):1412–1434
Chung C, Chun S, Lee J, Lee S (2001) Dynamic update cube for range-sum queries. In: VLDB, Roma, pp 521–530
Geffner S, Agrawal D, El Abbadi A (2000) The dynamic data cube. In: EDBT, Konstanz, pp 237–253
Lazaridis I, Mehrotra S (2001) Progressive approximate aggregate queries with a multi-resolution tree structure. In: SIGMOD, Santa Barbara, pp 401–412
Overmars MH (1983) The design of dynamic data structures. Lecture notes in computer science, vol 156. Springer, Heidelberg

Papadias D, Kalnis P, Zhang J, Tao Y (2001) Efficient OLAP operations in spatial data warehouses. In: Jensen CS, Schneider M, Seeger B, Tsotras VJ (eds) Advances in spatial and temporal databases, 7th international symposium, SSTD 2001, Redondo Beach, July 2001. Lecture notes in computer science, vol 2121. Springer, Heidelberg, pp 443–459
Robinson J (1981) The K-D-B tree: a search structure for large multidimensional dynamic indexes. In: SIGMOD, Orlando, pp 10–18
Vitter JS (2001) External memory algorithms and data structures. ACM Comput Surv 33(2):209–271
Zhang D, Tsotras VJ, Gunopulos D (2002) Efficient aggregation over objects with extent. In: PODS, Madison, pp 121–132


Air Borne Sensors

▶ Photogrammetric Sensors


akNN

▶ Nearest Neighbor Problem


Algorithm

▶ Data Structure


All-k-Nearest Neighbors

▶ Nearest Neighbor Problem


All-Lanes-Out

▶ Contraflow for Evacuation Traffic Management


All-Nearest-Neighbors

▶ Nearest Neighbor Problem


Ambient Spatial Intelligence

▶ Geosensor Networks, Qualitative Monitoring of Dynamic Fields


Ambiguity

▶ Retrieval Algorithms, Spatial
▶ Uncertainty, Semantic


Analysis, Robustness

▶ Multicriteria Decision-Making, Spatial


Analysis, Sensitivity

▶ Multicriteria Decision-Making, Spatial


Anamolies

▶ Data Analysis, Spatial


Anchor Points

▶ Wayfinding, Landmarks


Anchors, Space-Time

▶ Time Geography


Anomaly Detection

▶ Homeland Security and Spatial Data Mining
▶ Outlier Detection

Anonymity

▶ Cloaking Algorithms for Location Privacy


Anonymity in Location-Based Services

▶ Privacy Threats in Location-Based Services


Anonymization of GPS Traces

▶ Privacy Preservation of GPS Traces


ANSI NCITS 320-1998

▶ Spatial Data Transfer Standard (SDTS)


Application

▶ Photogrammetric Applications


Application Schema

Jean Brodeur1 and Thierry Badard2
1 Center for Topographic Information, Natural Resources Canada, Sherbrooke, QC, Canada
2 Department of Geomatic Science, Center for Research in Geomatics (CRG), Université Laval, Québec, QC, Canada

Synonyms

Conceptual model; Conceptual schema; Data models; Data schema; ISO/TC 211; Object model; Object schema

Definition

In the context of geographic information and ISO/TC 211 vocabulary, an application schema is an application-level conceptual schema that renders, to a certain level of detail, a universe of discourse described as data. Such data is typically required by one or more applications (ISO/TC211 ISO19109:2005 2005). Typically, additional information not found in the schema is included in a feature catalogue to semantically enrich the schema. Levels of detail regarding schemata (models) and catalogues (data dictionaries) are described in the cross-references.

Main Text

An application schema documents the content and the logical structure of geographic data, along with the manipulating and processing operations of the application, to a level of detail that allows developers to set up consistent, maintainable, and unambiguous geographic databases and related applications (Brodeur et al. 2000). As such, an application schema both contributes to the semantics of geographic data and describes the structure of the geographic information in a computer-readable form. It specifies spatial and temporal objects and may also specify reference systems and data quality elements used to depict geographic features. It also supports the appropriate use of the geographic data (i.e., fitness for use). Typically, an application schema is depicted in a formal conceptual schema language.

Cross-References

▶ Modeling with ISO 191xx Standards
▶ Modeling with Pictogrammic Languages

References

Brodeur J, Bédard Y, Proulx MJ (2000) Modelling geospatial application databases using UML-based repositories aligned with international standards in geomatics. In: Eighth ACM symposium on advances in geographic information systems, Washington, DC (ACMGIS)
ISO/TC211 ISO19109:2005 (2005) Geographic information – rules for application schema (ISO)

Approximate Aggregate Query

▶ Aggregate Queries, Progressive Approximate


Approximation

Thomas Bittner1 and John G. Stell2
1 Department of Philosophy; Department of Geography, State University of New York at Buffalo, Buffalo, NY, USA
2 School of Computing, University of Leeds, Leeds, UK

Synonyms

Rough approximation; Rough set theory

Definition

Approximations are representations that describe entities in terms of relations to cells in a partition which serves as a frame of reference. Approximations give rise to an indiscernibility relation: in the approximation space, two entities are indiscernible if and only if they have identical approximations. Approximations are used as tools for the representation of objects with indeterminate boundaries and multi-resolution spatial, temporal, and attribute data.

Example

At every moment in time, your body axes create a partition of space consisting of the cells front-left (fl), back-left (bl), front-right (fr), and back-right (br) as depicted in Fig. 1. Every object, including your-brother (yb), your-sister1 (ys1), your-sister2 (ys2), and your-house (yh), can be characterized in terms of its relations to the cells of the partition. For example, part-of(ys1,fl), disjoint(ys1,fr), disjoint(ys1,br), disjoint(ys1,bl), partly-overlaps(yh,fl), partly-overlaps(yh,fr), and so on. Two objects are indiscernible with respect to the underlying partition if they have the same mereological relations to all cells of the partition. For example, your two sisters, ys1 and ys2, are indiscernible when described in terms of the relations to the cells of the partition P1 = {fl,fr,br,bl} since they have the same relations to the members of P1. Notice that in a coarser frame of reference, more objects become indiscernible. For example, with respect to the partition P2 = {left,right}, all your siblings become indiscernible (all three are part of left and disjoint from right). Notice also that from the facts that partition cells are disjoint and part-of(ys1,fl) and part-of(yb,bl) hold, one can derive that ys1 and yb are disjoint, i.e., the structure of the partition can be taken into account in reasoning processes.

Approximation, Fig. 1 Approximation in a frame of reference created by your major body axes (objects shown: house, sister1, sister2, brother, you; axes: front/back and left/right)

Historical Background

Rough set theory, the formal basis of the theory of approximations as reviewed in this entry, was introduced by Pawlak (1982; 1991) as a

formal tool for data analysis. The main areas of application are still data mining and data analysis (Duentsch and Gediga 2000; Orłowska 1998; Slezak et al. 2005); however, there are successful applications in GIScience (Bittner and Stell 2002b; Worboys 1998a, b) and in other areas. Ongoing research in rough set theory includes research on rough mereology (Polkowski and Skowron 1996) and its application to spatial reasoning (Polkowski 2004). Rough mereology is a generalization of rough set theory and of the research presented here.

Scientific Fundamentals

Rough Set Theory
Rough set theory (Orłowska 1998; Pawlak 1982, 1991) provides a formalism for approximating subsets of a set when the set is equipped with an equivalence relation. An equivalence relation is a binary relation which is reflexive, symmetric, and transitive. Given a set $X$, an equivalence relation $\sim$ on $X$ creates a partition $I$ of $X$ into a set of jointly exhaustive and pairwise disjoint subsets. Let $[x]$ be the set of all members of $X$ that are equivalent to $x$ with respect to $\sim$, i.e., $[x] = \{y \in X \mid x \sim y\}$. Then, $I = \{[x] \mid x \in X\}$ is a partition of $X$: the union of all members of $I$ is $X$, and no distinct members of $I$ overlap.
An arbitrary subset $b \subseteq X$ can be approximated by a function $\varphi_b : I \to \{fo, po, no\}$. The value of $\varphi_b([x])$ is defined to be fo if $[x] \subseteq b$, it is no if $[x] \cap b = \emptyset$, and otherwise the value is po. The three values fo, po, and no stand respectively for "full overlap", "partial overlap", and "no overlap"; they measure the extent to which $b$ overlaps the members of the partition $I$ of $X$.

Regional Approximations
In spatial representation and reasoning, it is often not necessary to approximate subsets of an arbitrary set, but subsets of a set with topological or geometric structure. Thus, rather than considering arbitrary sets and subsets thereof, regular closed subsets of the plane are considered. The cells (elements) of the partitions are regular closed sets which may overlap on their boundaries, but not their interiors.
Consider Fig. 2. Let $X = \{(x, y) \mid 0 \le x \le 7 \;\&\; 0 \le y \le 7\}$ be a regular closed subset of the plane and $c_{(0,0)} = \{(x, y) \mid 0 \le x \le 1 \;\&\; 0 \le y \le 1\}$, $c_{(0,1)} = \{(x, y) \mid 0 \le x \le 1 \;\&\; 1 \le y \le 2\}$, $\ldots$, $c_{(6,6)} = \{(x, y) \mid 6 \le x \le 7 \;\&\; 6 \le y \le 7\}$ a partition of $X$ formed by the regular closed sets $c_{(0,0)}, \ldots, c_{(6,6)}$ (cells), i.e., $I = \{c_{(0,0)}, \ldots, c_{(6,6)}\}$. Two members of $X$ are equivalent if and only if they are part of the interior of the same cell $c_{(i,j)}$.

Approximation, Fig. 2 Rough approximations of spatial regions (Bittner and Stell 2002b) (grid with both axes labeled 1 to 6; regions x, y, z, and u)

The subsets $x$, $y$, $z$, and $u$ now can be approximated in terms of their relations to the cells $c_{(i,j)}$ of $I$, which is represented by the mappings $\varphi_x$, $\varphi_y$, $\varphi_z$, $\varphi_u$ of signature $I \to \Omega$ with $\Omega = \{fo, po, no\}$:

\[ X = \varphi_x = \begin{pmatrix} \cdots & c_{(2,6)} & c_{(3,6)} & \cdots & c_{(3,5)} & \cdots \\ \cdots & po & po & \cdots & fo & \cdots \end{pmatrix} \]
\[ Y = \varphi_y = \begin{pmatrix} \cdots & c_{(5,4)} & c_{(6,4)} & c_{(5,3)} & c_{(6,3)} & \cdots \\ \cdots & po & po & po & po & \cdots \end{pmatrix} \]
\[ Z = \varphi_z = \begin{pmatrix} \cdots & c_{(0,1)} & c_{(1,2)} & c_{(2,2)} & c_{(3,2)} & \cdots \\ \cdots & po & fo & po & no & \cdots \end{pmatrix} \]
\[ U = \varphi_u = \begin{pmatrix} \cdots & c_{(0,1)} & c_{(1,1)} & c_{(1,2)} & c_{(1,3)} & \cdots \\ \cdots & po & no & no & no & \cdots \end{pmatrix} \]
In GIScience, regular closed sets like $X$, the members of $I$, as well as $x$, $y$, $z$, $u$ are usually considered to model crisp regions, i.e., regions with crisp and well-defined boundaries. The mappings $\varphi_x$, $\varphi_y$, $\varphi_z$, $\varphi_u$ are called rough approximations of the (crisp) regions $x$, $y$, $z$, $u$ with respect to the partition $I$. In the remainder, the notions "regular closed set", "crisp region", and "region" are used synonymously. Non-capitalized letters are used as variables for regions, and capitalized letters are used as variables for approximation mappings, i.e., $X$ is used instead of $\varphi_x$ to refer to the rough approximation of $x$.

Indiscernibility
Given a partition $I$ of a regular set $X$, each of the approximation functions $X$, $Y$, $Z$, $U$ stands for a whole set of regular subsets of $X$; the set of all regular subsets of $X$ is written $R(X)$. For example, $X$ stands for all sets having the approximation $X$. This set (of regular sets) will be denoted $[x] = \{y \in R(X) \mid X = Y\}$. Correspondingly, one can introduce an indiscernibility relation $\sim$ between regular subsets of $X$: $x$ and $y$ are indiscernible with respect to the partition $I$, $x \sim_I y$, if and only if $x$ and $y$ have the same approximation with respect to $I$, i.e., $X = Y$. Through this indiscernibility relation, the notion of approximation is closely related to the notion of granularity in the sense of Hobbs (1985).

Operations on Approximations
The domain of regions is equipped with a meet operation interpreted as the intersection of regions. In the domain of approximation functions, the meet operation between regions is approximated by pairs of greatest minimal, $\wedge_{\min}$, and least maximal, $\wedge_{\max}$, meet operations on approximation mappings (Bittner and Stell 2002a).
Consider the operations $\wedge_{\min}$ and $\wedge_{\max}$ on the set $\Omega = \{fo, po, no\}$ that are defined as follows.

$\wedge_{\min}$   no   po   fo
no                no   no   no
po                no   no   po
fo                no   po   fo

$\wedge_{\max}$   no   po   fo
no                no   no   no
po                no   po   po
fo                no   po   fo

These operations extend to elements of $\Omega^I$ (i.e., the set of functions from $I$ to $\Omega$) by:

\[ (X \wedge_{\min} Y)\,c =_{\mathrm{def}} (X c) \wedge_{\min} (Y c) \]
\[ (X \wedge_{\max} Y)\,c =_{\mathrm{def}} (X c) \wedge_{\max} (Y c). \]

In the example it holds that $(Z \wedge_{\min} X)\,c_{(2,2)} = no$ and $(Z \wedge_{\max} X)\,c_{(2,2)} = po$ since $(Z c_{(2,2)}) = po$, $(X c_{(2,2)}) = po$, $po \wedge_{\min} po = no$, and $po \wedge_{\max} po = po$.

Reasoning About Approximations
Consider the RCC5 relations (Randell et al. 1992) disjoint (DR), partial overlap (PO), proper part (PP), has proper part (PPi), and equal (EQ) between two regions as depicted in Fig. 3. Given two regions $x$ and $y$, these relations between them can be determined by considering the triple of Boolean values:

\[ (x \wedge y \neq \bot,\; x \wedge y = x,\; x \wedge y = y). \]

The correspondence between such triples and the relations DR, PO, PP, PPi, and EQ is given in Table 1. Notice that these definitions of RCC5 relations are exclusively formulated in terms of statements about meet operations (intersections). The set of triples is partially ordered by setting

\[ (a_1, a_2, a_3) \le (b_1, b_2, b_3) \quad \text{iff} \quad a_i \le b_i \ \text{for}\ i = 1, 2, 3. \]

Approximation, Table 1 Definition of RCC5 relations exclusively using the meet operator (Bittner and Stell 2002a)

$x \wedge y \neq \bot$   $x \wedge y = x$   $x \wedge y = y$   RCC5
F                        F                  F                  DR
T                        F                  F                  PO
T                        T                  F                  PP
T                        F                  T                  PPi
T                        T                  T                  EQ
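The two operation tables and the triple lookup of Table 1 can be encoded directly. The Python sketch below is illustrative only: the dictionary layout and the names `meet_min`, `meet_max`, and `rcc5` are assumptions made here, approximation mappings are modeled as dictionaries from cell labels to values, and crisp regions are modeled as non-empty finite sets rather than regular closed subsets of the plane.

```python
NO, PO, FO = "no", "po", "fo"

# Operation tables transcribed from the text, indexed as TABLE[row][column].
MEET_MIN = {
    NO: {NO: NO, PO: NO, FO: NO},
    PO: {NO: NO, PO: NO, FO: PO},
    FO: {NO: NO, PO: PO, FO: FO},
}
MEET_MAX = {
    NO: {NO: NO, PO: NO, FO: NO},
    PO: {NO: NO, PO: PO, FO: PO},
    FO: {NO: NO, PO: PO, FO: FO},
}


def meet_min(X, Y):
    """Cell-wise greatest minimal meet of two approximation mappings."""
    return {cell: MEET_MIN[X[cell]][Y[cell]] for cell in X}


def meet_max(X, Y):
    """Cell-wise least maximal meet of two approximation mappings."""
    return {cell: MEET_MAX[X[cell]][Y[cell]] for cell in X}


# Table 1: (meet is not bottom, meet equals x, meet equals y) -> RCC5 relation.
RCC5_BY_TRIPLE = {
    (False, False, False): "DR",
    (True, False, False): "PO",
    (True, True, False): "PP",
    (True, False, True): "PPi",
    (True, True, True): "EQ",
}


def rcc5(x: frozenset, y: frozenset) -> str:
    """RCC5 relation between two non-empty crisp regions given as finite sets."""
    meet = x & y
    return RCC5_BY_TRIPLE[(bool(meet), meet == x, meet == y)]


# The worked example from the text: both Z and X map cell c(2,2) to po.
Z = {"c(2,2)": PO}
X = {"c(2,2)": PO}
print(meet_min(Z, X)["c(2,2)"], meet_max(Z, X)["c(2,2)"])  # no po
print(rcc5(frozenset({1, 2}), frozenset({1, 2, 3})))        # PP
```

The lookup deliberately has no entry for triples that cannot arise for non-empty regions, so passing an empty set raises a `KeyError` instead of returning a misleading relation.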
In the ordering of triples, the Boolean values are ordered by F < T. The resulting ordering (which is similar to the conceptual neighborhood graph (Goodday and Cohn 1994)) is indicated by the arrows in Fig. 3.
There are two approaches one can take to generalize the RCC5 classification from precise regions to approximations of regions. These two may be called the semantic and the syntactic.
Semantic generalization. One can define the RCC5 relationship between approximations $X$ and $Y$ to be the set of relationships which occur between any pair of precise regions having the approximations $X$ and $Y$. That is, one can define

\[ SEM(X, Y) = \{RCC5(x, y) \mid x \in X \ \text{and}\ y \in Y\}. \]

Syntactic generalization. One can take a formal definition of RCC5 in the precise case which uses meet operations between regions and generalize this to work with approximations of regions by replacing the meet operations on regions by analogous ones for approximations.
If $X$ and $Y$ are approximations of regions (i.e., functions from $I$ to $\Omega$), one can consider the two triples of Boolean values:

\[ (X \wedge_{\min} Y \neq \bot,\; X \wedge_{\min} Y = X,\; X \wedge_{\min} Y = Y), \]
\[ (X \wedge_{\max} Y \neq \bot,\; X \wedge_{\max} Y = X,\; X \wedge_{\max} Y = Y). \]

In the context of approximations of regions, the bottom element, $\bot$, is the function from $I$ to $\Omega$ which takes the value no for every element of $I$. Each of the above triples provides an RCC5 relation; thus the relation between $X$ and $Y$ can be measured by a pair of RCC5 relations. These relations will be denoted by $R_{\min}(X, Y)$ and $R_{\max}(X, Y)$. One can then prove that the pairs $(R_{\min}(X, Y), R_{\max}(X, Y))$ which can occur are all pairs $(a, b)$ where $a \le b$, with the exception of (PP, EQ) and (PPi, EQ).
Let the syntactic generalization of RCC5 be defined by

\[ SYN(X, Y) = (R_{\min}(X, Y), R_{\max}(X, Y)), \]

where $R_{\min}$ and $R_{\max}$ are defined as described in the previous paragraph. It then follows that for any approximations $X$ and $Y$, the two ways of measuring the relationship of $X$ to $Y$ are equivalent in the sense that

\[ SEM(X, Y) = \{\rho \in RCC5 \mid R_{\min}(X, Y) \le \rho \le R_{\max}(X, Y)\}, \]

where RCC5 is the set {EQ, PP, PPi, PO, DR} and $\le$ is the ordering as indicated by the arrows in Fig. 3.

Approximation, Fig. 3 RCC5 relations with ordering relations (relations shown: DR(x,y), PO(x,y), PP(x,y), PPi(x,y), EQ(x,y))

Key Applications

The theoretical framework of rough approximations presented above has been applied to geographic information in various ways,

including spatial representation of objects with indeterminate boundaries, representation of spatial data at multiple levels of resolution, representation of attribute data at multiple levels of resolution, and the representation of temporal data.

Objects with Indeterminate Boundaries
Geographic information is often concerned with natural phenomena and with cultural and human resources. These domains are often formed by objects with indeterminate boundaries (Burrough and Frank 1995) such as "The Ruhr", "The Alps", etc. Natural phenomena and cultural and human resources are not studied in isolation. They are studied in certain contexts. In the spatial domain, context is often provided by regional partitions forming frames of reference. Consider, for example, the location of the spatial object "The Alps". It is impossible to draw exact boundaries for this object. However, in order to specify its location, it is often sufficient to say that parts of "The Alps" are located in South Eastern France, Northern Italy, Southern Germany, and so on. This means that one can specify the rough approximation of "The Alps" with respect to the regional partition created by the regions of the European states. This regional partition can be refined by distinguishing northern, southern, eastern, and western parts of countries. It provides a frame of reference and an ordering structure which is used to specify the location of "The Alps" and which can be exploited in the representation and reasoning process as demonstrated above.
The utilization of rough approximations in the above context allows one to separate two aspects: (a) the exact representation of the location of well-defined objects using crisp regions and (b) the finite approximation of the location of objects with indeterminate boundaries in terms of their relations to the regions of the well-defined ones. The approximation absorbs the indeterminacy (Bittner and Stell 2002b) and allows for determinate representation and reasoning techniques as demonstrated above.
An important special case is the approximation of objects with indeterminate boundaries with respect to so-called egg-yolk partitions (Cohn and Gotts 1996). Here the partition consists of three concentric disks, called the central core, the broad boundary, and the exterior. An egg-yolk partition is chosen such that an object with indeterminate boundaries has the relation fo to the central core, the relation po to the broad boundary, and the relation no to the exterior cell of the partition.

Processing Approximate Geographic Information at Multiple Levels of Detail
Partitions that form frames of reference for rough approximations can be organized hierarchically. In Fig. 4, three partitions which partition the region A at different levels of resolution are depicted: {A}, {B,C}, {D,E,F,G,H}. Obviously, parts/subsets of A can be approximated at different levels of granularity with respect to {A}, {B,C}, or {D,E,F,G,H}. Various approaches to processing approximations at and across different levels of granularity in such hierarchical subdivisions have been proposed, including Bittner and Stell (2003), Stell and Worboys (1998), and Worboys (1998a, b).

Approximation, Fig. 4 Partitions at multiple levels of resolution (Bittner and Stell 2003) (levels shown: {A}, {B,C}, {D,E,F,G,H})

Processing Attribute Data
From the formal development of rough approximations, it should be clear that its application

Approximation, Fig. 5 A classification tree (left), the corresponding lattice of possible partitions of the root class (Bittner and Stell 2003) (tree nodes: a; b, c; d, e, f, g, h; lattice elements: {a}, {b,c}, {d,e,f,c}, {b,g,h}, {d,e,f,g,h})

is not limited to the approximation of spatial location, but can be applied in the same way to attribute data. For example, at a coarse level of representing attribute data, one might ignore the distinction between different kinds of roads (motorways, major roads, minor roads, etc.) and represent only a single class "road". Consider the classification tree in the left part of Fig. 5. One can create partitions of the class a (sets of jointly exhaustive and pairwise disjoint subclasses of a) at different levels of resolution, as indicated in the lattice in the right part of the figure. Let a–h be subsets of a; then other subsets of a can be approximated with respect to the various partitions in the ways described above. Again, see also Bittner and Stell (2003), Stell (2000), Stell and Worboys (1998), Worboys (1998a), and Worboys (1998b).

Temporal Data
Humans have sophisticated calendars that hierarchically partition the timeline in different ways, for example, into minutes, hours, days, weeks, months, etc. Representations of and reasoning about the temporal location of events and processes need to take into account that events and processes often lie skew to the cells of calendar partitions (i.e., "x happened yesterday" does not mean that x started at 12 a.m. and ended at 0 p.m.). Thus, descriptions of the temporal location of events and processes are often approximate and rough in nature rather than exact and crisp. As demonstrated in Bittner (2002) and Stell (2003), rough approximation and reasoning methods of the sort introduced above can be used to represent and to reason about approximate temporal location.

Cross-References

▶ Representing Regions with Indeterminate Boundaries
▶ Uncertainty, Semantic

References

Bittner T (2002) Approximate qualitative temporal reasoning. Ann Math Artif Intell 35(1-2):39–80
Bittner T, Stell JG (2002a) Approximate qualitative spatial reasoning. Spat Cognit Comput 2(4):435–466
Bittner T, Stell JG (2002b) Vagueness and rough location. GeoInformatica 6:99–121
Bittner T, Stell JG (2003) Stratified rough sets and vagueness. In: Kuhn W, Worboys M, Impf S (eds) Spatial information theory. Cognitive and computational foundations of geographic information science. International conference COSIT'03. Springer, Berlin, pp 286–303
Burrough P, Frank AU (eds) (1995) Geographic objects with indeterminate boundaries, GISDATA series II. Taylor and Francis, London
Cohn AG, Gotts NM (1996) The egg-yolk representation of regions with indeterminate boundaries. In: Burrough PA, Frank AU (eds) Geographic objects with indeterminate boundaries, GISDATA series II. Taylor and Francis, London, pp 171–187
Duentsch I, Gediga G (2000) Rough set data analysis: a road to non-invasive knowledge discovery. Methodos Publishers, Bangor
Goodday JM, Cohn AG (1994) Conceptual neighborhoods in temporal and spatial reasoning. In: ECAI-94 spatial and temporal reasoning workshop, Amsterdam
Hobbs J (1985) Granularity. In: Proceedings of the IJCAI 85, Los Angeles
Orłowska E (ed) (1998) Incomplete information – rough set analysis. Studies in fuzziness and soft computing, vol 13. Physica-Verlag, Heidelberg
Pawlak Z (1982) Rough sets. Internat J Comput Inform 11:341–356
Pawlak Z (1991) Rough sets: theoretical aspects of reasoning about data. Theory and decision library. Series D, system theory, knowledge engineering, and

problem solving, vol 9. Kluwer Academic Publishers, Dordrecht/Boston
Polkowski L, Skowron A (1996) Rough mereology: a new paradigm for approximate reasoning. J Approx Reason 15(4):333–365
Polkowski L (2004) A survey of recent results on spatial reasoning via rough inclusions. In: Bolc L, Michalewicz Z, Nishida T (eds) Intelligent media technology for communicative intelligence, second international workshop (IMTCI 2004). Lecture notes in computer science. Springer, Berlin
Randell DA, Cui Z, Cohn AG (1992) A spatial logic based on regions and connection. In: Nebel B, Rich C, Swartout W (eds) Principles of knowledge representation and reasoning. Proceedings of the third international conference (KR'92). Morgan Kaufmann, San Mateo, pp 165–176
Slezak D, Wang G, Szczuka MS, Düntsch I, Yao Y (eds) (2005) Rough sets, fuzzy sets, data mining, and granular computing. In: 10th international conference RSFDGrC 2005 (Part I). Lecture notes in computer science, vol 3641. Springer, Regina, 31 Aug–3 Sept 2005
Stell JG (2000) The representation of discrete multi-resolution spatial knowledge. In: Cohn AG, Giunchiglia F, Selman B (eds) Principles of knowledge representation and reasoning: proceedings of the seventh international conference (KR2000). Morgan Kaufmann, San Francisco, pp 38–49
Stell JG (2003) Granularity in change over time. In: Duckham M, Goodchild M, Worboys M (eds) Foundations of geographic information science. Taylor and Francis, New York, pp 95–115
Stell JG, Worboys MF (1998) Stratified map spaces: a formal basis for multi-resolution spatial databases. In: Poiker TK, Chrisman N (eds) Proceedings 8th international symposium on spatial data handling (SDH'98). International Geographical Union, pp 180–189
Worboys MF (1998a) Computation with imprecise geospatial data. Comput Environ Urban Syst 22:85–106
Worboys MF (1998b) Imprecision in finite resolution spatial data. GeoInformatica 2:257–279

Recommended Reading

Pawlak Z, Grzymala-Busse J, Slowinski R, Ziarko RA (1995) Rough sets. Commun ACM 38(11):89–95


ArcExplorer

▶ Web Feature Service (WFS) and Web Map Service (WMS)


ArcGIS: General-Purpose GIS Software

David J. Maguire
ESRI, Redlands, CA, USA

Synonyms

ESRI; GIS software

Definition

ArcGIS is a general-purpose GIS software developed by Environmental Systems Research Institute (ESRI). It is an extensive and integrated software platform technology for building operational GIS. ArcGIS comprises four key software parts: a geographic information model for modeling aspects of the real world; components for storing and managing geographic information in files and databases; a set of out-of-the-box applications for creating, editing, manipulating, mapping, analyzing, and disseminating geographic information; and a collection of web services that provide content and capabilities (data and functions) to networked software clients. Parts of the ArcGIS software can be deployed on mobile devices, laptop and desktop computers, and servers.
From the end-user perspective, ArcGIS has very wide-ranging functionality packaged up into a generic set of menu-driven GIS applications that implement key geographic workflows. The applications deal with geographic data creation, import and editing, data integration and management, data manipulation and organization, and data analysis, mapping, and reporting. Additionally, ArcGIS Online provides a set of web services that can be accessed from any web-enabled device, browser, or other application.
ArcGIS is also a developer-friendly product. The software is accessible to developers using several programming paradigms including within application scripting (Python, VBScript, and JScript) and web services end points

ArcGIS: General-Purpose GIS Software, Fig. 1 ArcGIS platform

(SOAP/XML, KML) and as component interfaces (.Net and Java). Developers can personalize and customize the existing software applications, build whole new applications, embed parts of ArcGIS in other software, and interface to other software systems.

Historical Background

ArcGIS is developed by a company called Environmental Systems Research Institute, Inc. (ESRI; pronounce each letter, it is not an acronym). Headquartered in Redlands, California, and with offices throughout the world, ESRI was founded in 1969 by Jack and Laura Dangermond (who to this day are president and vice-president) as a privately held consulting firm that specialized in land use analysis projects. The early mission of ESRI focused on the principles of organizing and analyzing geographic information; projects included developing plans for rebuilding the City of Baltimore, Maryland, and assisting Mobil Oil in selecting a site for the new town of Reston, Virginia.

During the 1980s ESRI devoted its resources to developing and applying a core set of application tools that could be applied in a computer environment to create a geographic information system. In 1982 ESRI launched its first commercial GIS software called ARC/INFO. It combined computer display of geographic features, such as points, lines, and polygons (the ARC software), with a database management tool for assigning attributes to these features (the Henco, Inc., INFO DBMS). Originally designed to run on minicomputers, ARC/INFO was the first modern GIS software. As the technology shifted operating systems, first to UNIX and later to Windows, ESRI evolved software tools that took advantage of these new platforms. This shift enabled users of ESRI software to apply the principles of distributed processing and data management.

The 1990s brought more change and evolution. The global presence of ESRI grew with the release of ArcView, an affordable, relatively easy-to-learn desktop mapping tool, which shipped 10,000 copies in the first 6 months of 1992. In the mid-1990s, ESRI released the

ArcGIS: General-Purpose GIS Software, Fig. 2 ArcGIS desktop user interface

first of a series of Internet-based map servers that published maps, data, and metadata on the web. These laid the foundation for today's server-based GIS called ArcGIS Server and a suite of online web services called ArcWeb Services.

In 1997 ESRI embarked on an ambitious research project to reengineer all of its GIS software as a series of reusable software objects. Several hundred person years of development later, ArcInfo 8 was released in December 1999. In April 2001, ESRI began shipping ArcGIS 8.1, a family of software products that formed a complete GIS built on industry standards that provides powerful, yet easy-to-use, capabilities right out of the box. ArcGIS 9 followed in 2003 and saw the addition of ArcGIS Server and ArcGIS Online, a part of ArcGIS that ESRI hosts on its own servers and makes accessible to users over the web.

Although developed as a complete system, ArcGIS 9 is a portfolio of products and is available in individual parts. The major product groups are desktop, server, online, and mobile (Fig. 1). ArcGIS desktop has a scalable set of products, in increasing order of functionality, ArcReader, ArcView, ArcEditor, and ArcInfo (Fig. 2). ESRI has built plug-in extensions (3D Analyst, Spatial Analyst, Network Analyst, etc.) which add new functional capabilities to the main desktop products. There is a desktop runtime called ArcGIS Engine which is a set of software components that developers can use to build custom applications and embed GIS functions in other applications. ArcGIS Server is also a scalable set of products, namely, ArcGIS Server Basic, Standard, and Advanced (with each available in either workgroup or enterprise editions). The mobile products include ArcPad and ArcGIS Mobile, and to complete the picture, there is a suite of ArcGIS Online web services which provide data and applications to desktop, server, and mobile clients.

Today, ESRI employs more than 4,000 staff worldwide, over 1,900 of whom are based at the worldwide headquarters in California. With 27 international offices, a network of more than 50 other international distributors, and over 2,000 business partners, ESRI is a major force in the GIS industry. ESRI's lead software architect, Scott Morehouse, remains the driving force

behind ArcGIS development, and he works closely with Clint Brown, product development director; David Maguire, product director; and, of course, Jack Dangermond, president.

Scientific Fundamentals

Fundamental Functional Capabilities
ArcGIS is a very big software system with literally thousands of functional capabilities and tens of millions of lines of software code. It is impossible, and in any case worthless, to try to describe each piece of functionality here. Instead, the approach will be to present some of the core foundational concepts and capabilities.

The best way to understand ArcGIS is to start with the core information (some people use the term data) model since it is this which defines what aspects of the world can be represented in the software and is the push-off point for understanding how things can be manipulated. ArcGIS's core information model is called the geographic database or geodatabase for short. The geodatabase defines the conceptual and physical model for representing geographic objects and relationships within the system. Geodatabases work with maps, models, globes, data, and metadata. Instantiated geodatabases comprise information describing geographic objects and relationships that are stored in files or DBMS. These are bound together at runtime with software component logic that defines and controls the applicable processes. It is this combination of data (form) and software (process) which makes the geodatabase object oriented and so powerful and useful. For example, a geodatabase can represent a linear network such as an electricity or road network. The data for each link and node in a network is stored as a separate record. Functions (tools or operators), such as tracing and editing that work with networks, access all the data together and organize it into a network data structure prior to manipulation. Geodatabases can represent many types of geographic objects and associated rules and relationships including vector features (points, lines, polygons, annotations [map text], and 3D multipatches), rasters, addresses, CAD entities, topologies, terrains, networks, and surveys. In ArcGIS, geographic objects of the same type (primarily the same spatial base dimensionality, projection, etc.) are conventionally organized into a data structure called a layer. Several layers can be integrated together using functions such as overlay processing, merge, and map algebra. Geodatabases can be physically stored in both file system files and DBMS tables (e.g., in DB2, Oracle, and SQL Server).

It is convenient to discuss the functional capabilities of ArcGIS in three main categories: geovisualization, geoprocessing, and geodata management.

Geovisualization, as the name suggests, is concerned with the visual portrayal of geographic information. It should come as no surprise that many people frequently want to visualize geographic information in map or chart form. Indeed many people's primary use for a GIS is to create digital and/or paper maps. ArcGIS has literally hundreds of functions for controlling the cartographic appearance of maps. These include specifying the layout of grids, graticules, legends, scale bars, north arrows, titles, etc., the type of symbolization (classification, color, style, etc.) to be used, and also the data content that will appear on the final map. Once authored, maps can be printed or published in softcopy formats such as PDF or served up over the web as live map services. Additionally, many geographic workflows are best carried out using a map-centric interface. For example, editing object geometries, examining the results of spatial queries, and verifying the results of many spatial analysis operations can only really be performed satisfactorily using a map-based interface. ArcGIS supports multiple dynamic geovisualization display options such as 2D geographic (a continuous view of many geodatabase layers), 2D layout (geodatabase layers presented in paper space), 3D local scenes (strictly a 2.5D scene graph view of local and regional data), and 3D global (whole-Earth view with continuous scaling of data).

The term geoprocessing is used to describe the spatial analysis and modeling capabilities

of ArcGIS. ArcGIS adopts a data transformation framework approach to analysis and modeling: data + operator = data. For example, streets data + buffer operator = streets_with_buffers data. ArcGIS has both a framework for organizing geoprocessing and an extensive set of hundreds of operators that can be used to transform data. The framework is used to organize operators (also called functions or tools) and compile and execute geoprocessing tasks or models (collections of tools and data organized as a workflow) and interfaces to the other parts of ArcGIS that deal with geodata management and geovisualization. The set of operators includes tools for classic GIS analysis (overlay, proximity, etc.), projection/coordinate transformation, data management and conversion, domain-specific analysis (3D, surfaces, network, raster, geostatistics, linear referencing, cartography, etc.), and simulation modeling. Geoprocessing is widely used to automate repetitive tasks (e.g., load 50 CAD files into a geodatabase); to integrate data (e.g., join major_streets and minor_streets data layers to create a single complete_streets layer); as part of quality assurance workflows (e.g., find all buildings that overlap); and to create process models (e.g., simulate the spread of fire through a forested landscape).

Geodata management is a very important part of GIS not least because geodata is a very valuable and critical component of most well-established operational GIS. It is especially important in large enterprise GIS implementations because the data volumes tend to be enormous, and multiple users often want to share access. ArcGIS has responded to these challenges by developing advanced technology to store and manage geodata in databases and files. An efficient storage schema and well-tuned spatial and attribute indexing mechanisms support rapid retrieval of data record sets. Coordinating multiuser updates to continuous geographic databases has been a thorny problem for GIS developers for many years. ArcGIS addresses this using an optimistic concurrency strategy based on versioning. The versioning data management software, data schema, and application business logic are a core part of ArcGIS. The data in ArcGIS can be imported/exported in many standard formats (e.g., dxf and mif) and is accessible via standards-based interfaces (e.g., OGC WMS and WFS) and open APIs (application programming interfaces, e.g., SQL, .Net, and Java), and the key data structure formats are openly published (e.g., shapefile and geodatabase).

Fundamental Design Philosophy
The ArcGIS software has evolved considerably over the two and a half decades of its existence as the underlying computer technologies, and concepts and methods of GIS have advanced. Nevertheless, many of the original design philosophies are still cornerstones of each new release. Not surprisingly, the original design goals have been supplemented by more recent additions which today drive the software development process. This section discusses the fundamental design philosophies of ArcGIS in no particular order of significance.

Commercial off-the-shelf (COTS) hardware. ArcGIS has always run on industry standard COTS hardware platforms (including computers and associated peripherals, such as digitizers, scanners, and printers). Today, hardware is insulated by a layer of operating system software (Windows, Linux, Solaris, etc.), and this constitutes much of the underlying computing platform on which the GIS software runs. The operating system affords a degree of hardware neutrality. ArcGIS runs on well-established mainstream operating systems and hardware platforms.

Multiple computer architectures. Parts of the ArcGIS software system can run on desktop, server, and mobile hardware. There is also a portion of ArcGIS that is available online for use over the web. The software can be configured to run stand alone on desktop and mobile machines. It can also be configured for workgroup and enterprise use so that it runs as a client-server and/or distributed server-based implementation. This offers considerable flexibility for end-use deployment. The newest release of the software is adept at exploiting the web as a platform for distributed solutions.
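
The data transformation idea described for geoprocessing in this section (data + operator = data) can be illustrated with a small, self-contained Python sketch. It is not ArcGIS code and uses no ArcGIS API: the names Feature, buffer_op, merge_op, and run_model are hypothetical, and geometry is reduced to axis-aligned bounding boxes, so the "buffer" here is only a crude stand-in for a true geometric buffer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Feature:
    name: str
    # Axis-aligned bounding box (xmin, ymin, xmax, ymax).
    # Assumption: a simplified stand-in for real street geometry.
    bbox: Tuple[float, float, float, float]


Layer = List[Feature]
Operator = Callable[[Layer], Layer]


def buffer_op(distance: float) -> Operator:
    """Build an operator that grows every feature's box by `distance`."""
    def apply(layer: Layer) -> Layer:
        return [
            Feature(
                f.name + "_buffer",
                (f.bbox[0] - distance, f.bbox[1] - distance,
                 f.bbox[2] + distance, f.bbox[3] + distance),
            )
            for f in layer
        ]
    return apply


def merge_op(*layers: Layer) -> Layer:
    """Combine several layers of the same type into one new layer."""
    merged: Layer = []
    for layer in layers:
        merged.extend(layer)
    return merged


def run_model(data: Layer, operators: List[Operator]) -> Layer:
    """Run a model: every step is data + operator = data."""
    for op in operators:
        data = op(data)
    return data


major_streets = [Feature("main_st", (0.0, 0.0, 10.0, 0.0))]
minor_streets = [Feature("elm_ln", (2.0, 0.0, 2.0, 5.0))]

# Integrate two layers, then transform the result with a buffer operator.
complete_streets = merge_op(major_streets, minor_streets)
streets_with_buffers = run_model(complete_streets, [buffer_op(1.5)])
```

Because every operator takes a layer and returns a new layer, operators can be chained into a model in any order, and the input layers are left unchanged.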

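The optimistic concurrency strategy mentioned under geodata management can be sketched in generic terms. The text does not describe how ArcGIS versioning is implemented, so the following is only an assumed, minimal illustration of the general technique: editors work without locks on private edits, and a conflict is detected at posting time if a record was changed by someone else after the editor's checkout. The class and method names (VersionedStore, EditSession, checkout, post) are hypothetical.

```python
from typing import Dict


class ConflictError(Exception):
    """Raised when a record changed after the editor's checkout."""


class VersionedStore:
    def __init__(self, records: Dict[str, str]) -> None:
        self.records = dict(records)
        self.revision = 0
        # Revision at which each record was last modified.
        self.modified_at = {key: 0 for key in records}

    def checkout(self) -> "EditSession":
        return EditSession(self, self.revision)


class EditSession:
    def __init__(self, store: VersionedStore, base_revision: int) -> None:
        self.store = store
        self.base_revision = base_revision
        self.edits: Dict[str, str] = {}

    def update(self, key: str, value: str) -> None:
        # Edits stay private until posted; no locks are taken.
        self.edits[key] = value

    def post(self) -> None:
        # Validate first: reject if any edited record is newer than our base.
        for key in self.edits:
            if self.store.modified_at.get(key, 0) > self.base_revision:
                raise ConflictError(key)
        self.store.revision += 1
        for key, value in self.edits.items():
            self.store.records[key] = value
            self.store.modified_at[key] = self.store.revision
        self.base_revision = self.store.revision
        self.edits = {}


store = VersionedStore({"parcel_1": "residential", "parcel_2": "vacant"})

# Two editors touching different records both succeed.
a, b = store.checkout(), store.checkout()
a.update("parcel_1", "commercial")
b.update("parcel_2", "park")
a.post()
b.post()

# Two editors touching the same record: the second post is rejected.
c, d = store.checkout(), store.checkout()
c.update("parcel_1", "industrial")
d.update("parcel_1", "school")
c.post()
try:
    d.post()
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

In this sketch the rejected editor would have to reconcile against the current state and post again; how a real system presents and resolves such conflicts is outside what the text specifies.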
GIS professionals. The target user for the core of ArcGIS is the GIS professional (loosely defined as a career GIS staff person). GIS professionals often build and deploy professional GIS applications for end users (e.g., planners, utility engineers, military intelligence analysts, and marketing staff). The software is also frequently incorporated in enterprise IT systems by IT professionals and is increasingly being used by consumers (members of the general public with very limited GIS skills).

Generic toolbox with customization. From the outset ArcGIS was designed as a toolbox of generic GIS tools. This means that functional GIS capabilities are engineered as self-contained software components or tools that can be applied to many different data sets and application workflows. This makes the software very flexible and easily adaptable to many problem domains. The downside to this is that the tools need to be combined into application solutions that solve problems, and this adds a degree of complexity. In recent releases of the software, this issue has been ameliorated by the development of menu-driven applications for key geographic workflows (editing, map production, 3D visualization, business analysis, utility asset management and design, etc.).

Strong release control. ArcGIS is a software product which means that it has well-defined capabilities, extensive online help and printed documentation, and add-on materials (third-party scripts, application plug-ins, etc.), a license agreement that controls usage, and that it is released under carefully managed version control. This means that additions and updates to the product are added only at a new release (about two to three times a year).

Internationalized and localized. The core software is developed in English and is internationalized so that it can be localized into multiple locales (local languages, data types, documentation, data, etc.). The latest release of ArcGIS has been localized into more than 25 local languages including Farsi, French, German, Hebrew, Japanese, Italian, Mandarin, Spanish, and Thai.

Key Applications

ArcGIS has been applied to thousands of different application arenas over the years. It is a testament to the software's flexibility and adaptability that it has been employed in so many different application areas. By way of illustration, this section describes some example application areas in which ArcGIS has been widely adopted.

Business
Businesses use many types of information (geographic locations, addresses, service boundaries, sales territories, delivery routes, and more) that can be viewed and analyzed in map form. ArcGIS software integrated with business, demographic, geographic, and customer data produces applications that can be shared across an entire organization. Typical applications include selecting the best sites, profiling customers, analyzing market areas, updating and managing assets in real time, and providing location-based services (LBSs) to users. These applications are used extensively in banking and financial services, retailing, insurance, media and press, and real estate sectors.

Education
In the education sector, ArcGIS is applied daily in administration, research, and teaching at the primary, secondary, and tertiary levels. In recent years, ArcGIS use has grown tremendously, becoming one of the hottest new research and education tools. At the primary and secondary level, GIS provides a set of life skills and a stimulating learning environment. More than 100 higher education academic disciplines have discovered the power of spatial analysis with GIS. Researchers are using GIS to find patterns in drug arrests, study forest rehabilitation, improve crop production, define urban empowerment zones, facilitate historic preservation, develop plans to control toxic waste spills, and much more. GIS is also a useful tool for the business of education. It is used to manage large campuses, plan campus

expansion, and provide emergency campus response plans. It is also used by administrators to track graduates and alumni or identify from where potential new students may be recruited.

Government
Government organizations throughout the world are under increasing pressure to improve services and streamline business practices while adhering to complex political or regulatory requirements. To do so, they must digest huge amounts of information, most of which is tied to a very specific geographic location: a street, an address, a park, or a piece of land. As a result, ArcGIS has become indispensable for most large and many small governments. The applications of ArcGIS are very diverse and their implementation extremely extensive. Very many major government organizations at the national, state, regional, and local levels use ArcGIS. Some of the main application areas include economic development, elections, national policy formulation, homeland security, land records and cadastral solutions, law enforcement, public safety, public works, state and local, sustainable development, and urban and regional planning.

Military
Although ArcGIS is deployed as a niche tool in some application domains, there is increasing realization that enterprise ArcGIS implementations are providing defense-wide infrastructures, capable of supporting fighting missions, command and control, installation management, and strategic intelligence. GIS plays a critical role within the defense community in the application areas of command and control (C2), defense mapping organizations, base operations and facility management, force protection and security, environmental security and resource management, health and hygiene intelligence, surveillance and reconnaissance systems, logistics, military engineering, mine clearance and mapping, mission planning, peacekeeping operations, modeling and simulation, training, terrain analysis, visualization, and chemical, biological, radiological, nuclear, and high explosive (CBRNE) incident planning and response.

Natural Resources
Just as ArcGIS is routinely used in managing the built environment, it is also very popular in measuring, mapping, monitoring, and managing the natural environment. Again the application areas are very wide ranging, extending from agriculture to archaeology, environmental management, forestry, marine and coast, mining and earth science, petroleum, and water resources. ArcGIS provides a strong set of tools for describing, analyzing, and modeling natural system processes and functions. Interactions and relationships among diverse system components can be explored and visualized using the powerful analytical and visualization tools that GIS software provides.

Utilities
Utilities (electric, gas, pipeline, telco, and water/wastewater) were among the first users of GIS. Today ArcGIS is involved in many of the core activities of utilities including asset information, business for utilities, network design, emergency management, electricity generation and transmission, land management, outage management, pipeline management, and work force productivity. ArcGIS is used to manage the flow of water and wastewater to service homes and businesses and to track the location and condition of water mains, valves, hydrants, meters, storage facilities, sewer mains, and manholes. The same systems make keeping up with regulatory compliance, TV inspection data, and condition ratings easier. Competitive pressure and regulatory constraints are placing increasing demands on pipeline operators to operate in an efficient and responsible manner. Responding to these demands requires accessibility to information regarding geographically distributed assets and operations. ArcGIS is enabling telecommunication professionals to integrate location-based data into analysis and management processes in network planning and operations, marketing and sales, customer care, data management, and many other planning and problem-solving tasks.

Future Directions

ArcGIS is in a constant state of evolution, and even though the core values and capabilities are well established, there is always a need for improvement and expansion into new application areas. While there is new development in all areas, the research agenda currently is centered on the following key topics:

ArcGIS Online. ArcGIS Online is a suite of web-based applications that combine data and functionality in a way that supplements the desktop, server, and mobile software which is installed on computers in user organizations. The web services include framework and coverage data typically at global and regional scales (e.g., global imagery and street centerline files) and several functional services (e.g., geocoding and routing). Initially released with ArcGIS 9.2, the online services are undergoing considerable enhancement in both the 2D and 3D domains.

ArcGIS Mobile. ArcGIS has included mobile capabilities for several releases. The current development focus is on enhancing the mobile capabilities to support the deployment of professional mobile applications by end users. A central piece of this effort is the development of a GIS server that is responsible for data management and running central applications (e.g., mapping, geocoding, and routing). There are also clients for several hardware devices including smartphones and web browsers.

Distributed GIS. In keeping with the general progress of building ArcGIS using industrial strength IT standard technologies, much is being done to make it possible to integrate the GIS software into enterprise information systems and thus distribute GIS throughout an organization. This includes additional work on standards (both GIS domain specific and general IT), web services (especially XML), security (for single sign on authentication), and integration APIs (such as SQL, .Net, and Java).

Ease of Use. A key goal of future work is the continued improvement in ease of use. ArcGIS has been feature rich for many releases, but a little daunting for new users. A new desktop user interface design and careful attention to user workflows, combined with improvements in interactive performance, should go some way to satisfying the requirements of usability.

Cross-References

▶ Distributed Geospatial Computing (DGC)
▶ Internet GIS
▶ Web Services, Geospatial

Recommended Reading

ESRI web site. http://www.esri.com
ESRI (2006) What is ArcGIS 9.2? Redlands, California
Hoel E, Menon S, Morehouse S (2003) Building a robust relational implementation of topology. In: Proceedings of 8th international symposium on spatial and temporal databases (SSTD'03), Santorini Island
Morehouse S (1985) ARC/INFO: a georelational model for spatial information. In: Proceedings of AUTO-CARTO 8. ASPRS, Falls Church, pp 388–97
Morehouse S (1992) The ARC/INFO geographic information system. Comput Geosci 18(4):435–443
Ormsby T, Napoleon E, Burke R, Feaster L, Groessl C (2004) Getting to know ArcGIS desktop second edition: basics of ArcView, ArcEditor, and ArcInfo. ESRI Press, Redlands
Zeiler M (1999) Modeling our world: the ESRI guide to database design. ESRI, Redlands

ArcIMS

▶ Web Feature Service (WFS) and Web Map Service (WMS)

Arrival, Angle of

▶ Indoor Localization
Atlas Information Systems 85

Arrival, Time of

▶ Indoor Localization

Artificial Neural Network

▶ Self-Organizing Map (SOM) Usage in LULC Classification

aR-Tree

▶ Multi-resolution Aggregate Tree

Asset Pricing

▶ Financial Asset Analysis with Mobile GIS

Association

▶ Co-location Patterns, Algorithms

Association Measures

▶ Co-location Patterns, Interestingness Measures

Association Rules, Spatiotemporal

▶ Movement Patterns in Spatio-Temporal Data

Association Rules: Image Indexing and Retrieval

▶ Image Mining, Spatial

Atlas Information System

▶ Atlas Information Systems

Atlas Information Systems

Lorenz Hurni
Institute of Cartography and Geoinformation, ETH Zurich, Zurich, Switzerland

Synonyms

Atlas information system; Atlas, electronic; Atlas, interactive; Atlas, multimedia; Atlas, virtual; Atlas, web; Cartographic information system; Earth, digital; Globe, virtual; Google Map/Earth

Definition

Atlas information systems (AIS) are systematic, targeted collections of spatially related knowledge in electronic form, allowing a user-oriented communication for information and decision-making purposes. As in a conventional atlas, an AIS mainly consists of a harmonized collection of maps with different topics, scales, and/or from different regions. The maps usually come in standardized scales or degrees of generalization, respectively. The different map types have a common legend and symbolization. The access to the maps is granted through thematic or geographic indexes. AIS offer special interactive functions for geographic and thematic navigation, querying, analysis, and visualization in 2D and 3D mode. Unlike in many geographic information systems (GIS) applications, the data in AIS is cartographically edited, and the functionality is intentionally limited in order to provide a user-targeted set of data as well as adapted analysis and visualization functions. In multimedia atlases,

additional related multimedia information, like graphics, diagrams, tables, text, images, videos, animations, and audio documents, are linked to the geographic entities. The access to data and functions is provided through a graphical user interface (GUI). Efficient management of the increasing amount of information led to the development of database-driven AIS. Whereas some years ago most AIS were based on CD-ROM and DVD, they are now increasingly based on Internet and WWW technologies.

Historical Background

The technological leap, which caused the transition from analog to digital cartography in the 1980s, has also stimulated the development of interactive atlases. GIS, computer-aided design (CAD) systems, desktop publishing (DTP) systems, and the thereby evoked releases of geometric and thematic cartographic data were the catalysts of both digital and interactive cartographies. It is disputed which atlas was the first digital one: Some authors claim that an early version of the Electronic Atlas of Canada was the first digital atlas (Ormeling 1995); others consider that it was the Electronic Atlas of Arkansas (Siekierska and Williams 1996). Early digital atlases had a rather limited functionality, like name search, zoom, and layer selection. Other atlases like the PC version of the National Atlas of Sweden were based on commercial GIS software. In the following years, interactive atlases were evolving with respect to content, data, and technology. In several countries national atlases on CD-ROM were produced, either as a digital version of a conventional paper atlas (such as the National Atlas of Germany) or as entirely interactive version (such as the Atlas of Switzerland). In the late 1990s, national mapping authorities began to publish their topographical map series on CD-ROM/DVD. A third group of atlases are counterparts to conventional world or school atlases, such as the Swiss World Atlas – interactive. Technologically, the first atlases were based on raster data maps like most of the electronic national map series. Modern interactive atlases make use of vector data sets and/or statistical data which are symbolized and visualized on the fly (e.g., the Tirol Atlas). The atlases evolved from CD-ROM, then DVD to Internet-based or combined interactive atlases.

Scientific Fundamentals

For the case of interactive maps on new media, the classical graphical variables and their expressions are extended as shown in Table 1 (Buziek 2001).

Atlas Information Systems, Table 1 Aspects of cartographic expression forms (After Buziek 2001)

Aspects of cartographic expression forms    Ordering of aspects
Display media                               Print, screen, projection
Dimension of representation                 2D, pseudo-3D, 3D
Degree of dynamics                          Static, cinematographic, dynamic
Degree of interaction                       Noninteractive, partially interactive, interactive
Channels of representation                  Visual, acoustic, haptic
User-map relation                           Separating, integrative, amplification of reality

The added values and advantages of AIS compared to paper atlases can be summed up as follows: interactivity, navigation, maps as interface, exploration, customized/customizable to user's need, updatable, dynamics/animation, and multimedia integration (Ormeling 1996; Borchert 1999).

The degree of interactivity, a very significant element of the usability of a cartographic application, is mainly based on the richness of available cartographic functions. Table 2 shows the most important functions, arranged in five main groups (Cron 2006).

In addition, AIS can be characterized according to the basic concepts as shown in Table 3. Today, most atlases still consist of raster and vector-based data, but a transition to relational or

object-oriented vector databases can be observed. While most atlases are still bound to classic computer interfaces like keyboards, mice, and screens, many have adapted to the touch user interface found in tablets and mobile phones. Furthermore, Internet and mobile technologies increase the degree of system distribution. With respect to interactivity, atlases are arranged into three groups: view-only atlases, interactive atlases, and analytical atlases (Siekierska and Taylor 1991). The latter can be subdivided into simple, constructive, and automatic analytical atlas types (Hurni 2006). Furthermore, many atlases no longer serve as the main interface but as one of several possible interfaces to the data, e.g., in the Google search engine.

Today's AIS comprise basic topographic and thematic data and software allowing the creation of maps on demand, as in GIS (Da Silva Ramos and Cartwright 2006). However, the differences between AIS and GIS can be perceived when comparing three approaches for applying GIS to the development of AIS (Bär and Sieber 1999; Schneider 2001). The concept "multimedia in GIS" proposes the integration of multimedia functionality in GIS, mainly at the cost of user friendliness. "GIS in multimedia" incorporates explicitly defined and developed GIS functions in a cartographic multimedia environment. The third concept, "GIS analysis for multimedia atlases," combines a GIS, the authoring system, and a multimedia map extension (GIS data converter) in one common multimedia atlas develop-

Atlas Information Systems, Table 2 Main functions in a multimedia atlas information system (MAIS) (After Cron 2006)
Function groups | Function subgroups | Functions
General functions | | Mode selection, language selection, file import/export, printing, placing bookmarks, hot spots, forward/backward, settings (preferences), tooltips, display of system state, help, imprint, home, exit
Navigation functions | Spatial navigation | Spatial unit selection, enlarge/reduce of map extent (zoom in, zoom out, magnifier), move map (pan, scroll), reference map/globe, map rotation, determination of location (coordinates, altitude), line of sight and angle, placement of pins, spatial/geographical index, spatial/geographical search, tracking
Navigation functions | Thematic navigation | Theme selection and change, index of themes, search by theme, theme favorites
Navigation functions | Temporal navigation | Time selection (positioning of time line, selection of time period), animation (start/stop, etc.)
Didactic functions | Explanatory functions | Guided tours, preview, explanatory texts, graphics, images, sounds, films
Didactic functions | Self-control functions | Quizzes, games
Cartographic and visualization functions | Map manipulation | Switch on/off layers, switch on/off legend categories, modification of symbolization, change of projection
Cartographic and visualization functions | Redlining | Addition of user-defined map elements, addition of labels (labeling)
Cartographic and visualization functions | Explorative data analysis | Modification of classification, modification of appearance/state (brightness, position of sun), map comparison, selection of data
GIS functions | Space- and object-oriented query functions | Spatial query/position query (coordinates query/query of altitude), measurement/query of distance and area, creating profile
GIS functions | Thematic query functions | Thematic queries (data/attribute queries), access to statistical table data
GIS functions | Analysis functions | Buffering, intersection, aggregation and overlapping (transparent overlapping/fading), terrain analysis (exposition, slope, etc.)

Atlas Information Systems, Table 3 Main characteristics and concepts of AIS (After Hurni 2006)
Main characteristic | Characteristic/functionality | Subgroups/remarks | Examples
Data type and modeling | Raster | | Raster GIS, map layers in raster format
Data type and modeling | Vector | Sequentially attributive (file oriented) | DTP files with attributes
Data type and modeling | Vector | Relational-topological | Database-based system (geometry and thematic data)
Data type and modeling | Vector | Object-oriented-topological | OO-geo-databases
Medium, communication channel | Text | | Keyboard, alphanumeric output
Medium, communication channel | Language | | Voice output in car navigation systems
Medium, communication channel | Screen | Stationary screens | Computer screen
Medium, communication channel | Screen | Portable screens | Tablet, mobile phone
Degree of system distribution | Off-line | Local system (client based) | AIS on storage media
Degree of system distribution | Online (1:1) | Client/server based | Swiss World Atlas interactive
Degree of system distribution | Distributed (1:n) | One client/several servers | WMS
Degree of system distribution | Multiple distributed (n:m) | Several clients/several servers | Sensor networks coupled with distributed real-time information systems
Degree of interactivity | View only | Display of prepared maps | Information maps on the Internet
Degree of interactivity | Interactive | Queries by criteria, adjustment of output/display | AIS like Atlas of Switzerland V1
Degree of interactivity | Simple analytical | Combined queries, more complex (GIS-like) analysis functions | AIS with GIS functions, data however prepared (Atlas of Switzerland 2 partially)
Degree of interactivity | Constructive analytical | Direct processing of user data, design possibilities | Web-GIS, projection web services
Degree of interactivity | Automatic analytical | Automatic data analysis and rule-based processing | Cartographic real-time web information systems, e.g., online avalanche maps, radar precipitation maps, egocentric real-time information display on LBS, online generalization
Priority of cartographic functionality | Map information systems | Map functions as main interaction tools | AIS, web map information systems
Priority of cartographic functionality | General information systems | Map functions as further query and export/display possibility | Digital encyclopediae (e.g., Encarta), environmental information systems, real estate portals, etc.

Atlas Information Systems, Table 4 Differences between GIS and multimedia atlas information systems (MAIS) (Adapted after Schneider 1999)
Criterion | GIS | AIS
Target users | Experts | Nonexperts (and experts)
Use of interface | Complex | Easy
Control of functions and data | By users | By authors
Guidance | Minimal | Distinct
Flow of information | Unstructured | Structured (narrative)
Main focus | Handling, analysis, and presentation of data | Visualization of themes
Data | Raw, not integrated | Edited, integrated
Data model | Primary model | Secondary model
Covered area | Open | Usually predefined: regional, national
Computation time | Short to long | Short
Purpose | Open for any kind of data and analysis | Specific purpose

ment environment. Table 4 shows the main differences between GIS and AIS (Schneider 1999).

Key Applications

World Atlases and School Atlases
Interactive world atlases mainly consist of physical (and some thematic) maps of the world with search and index functions. An example is Google Map/Earth, which provides detailed base map information with additional services such as routing functions. User-generated information may be included, although with only limited quality control by the publisher. School atlases are a special type of world atlas that also includes more thematic maps and numerous carefully edited exemplary maps for didactic purposes. An example is the Swiss World Atlas – interactive.

National Atlases and Regional Atlases
National and regional atlases depict a country or a region in a broad variety of mainly thematic maps. Today many national atlases have been converted from the printed to the interactive form. There also exist mixed versions like the National Atlas of Germany, which consists of a series of theme books and accompanying CD-ROMs with the full text and the maps plus some additional interactive maps (Hanewinkel and Tzschaschel 2005). An example of an entirely digital atlas is the DVD-based Atlas of Switzerland 3, which consists of 1,700 interactive maps derived on the fly from digital topographic, environmental, and statistical base data, combined with multimedia elements (Fig. 1) (Sieber et al. 2009).

Topographic Atlases
Many national or state mapping authorities publish their topographic map series on web-based portals. In most cases, the maps are stored in raster format, but enriched with place names and vector line data for routes and trails. Some products allow users to import their own data, such as GPS tracks, or to draw map overlays. Simple analyses like measurement functions, profiles, and 3D displays are possible. Examples are the USGS TOPO! Map layers available on various web portals and the online version of the Swiss National Map Series. Many atlases also display geo-referenced satellite or aerial images as an additional layer.

Thematic Atlases and Statistical Atlases
Numerous atlases cover specific thematic topics like geology, hydrology, climate, planning, history, etc., both in 2D and 3D (Fig. 2). Statistical

Atlas Information Systems, Fig. 1 Example of a multimedia national atlas: soil map in the Atlas of Switzerland, combined with legend, text, and image information (© Atlas of Switzerland)

atlases allow the visualization of statistical data as choropleth or diagram maps, usually on the basis of administrative boundaries (e.g., the Interactive Statistical Atlas of Switzerland).

Future Directions

A major focus will be the further development of spatial data models and structures. Such data are usually managed and processed in relatively specialized systems like GIS. Data are enriched with graphical attributes for cartographic visualization and thematic attributes. Often, this attributing is already handled the other way round: thematic data is stored in standardized, distributed databases and they are additionally annotated with spatial information, i.e., they are geo-referenced. In the future the functionality of GIS and general information systems will therefore converge. Search engines will, for instance, be equipped with more sophisticated geographical search functions.

Specific cartographic functions will be developed further, e.g., automatic generalization functions, rule-based display functions, or analysis functions. Real-time data, for instance, will be analyzed automatically and visualized on the fly. The integration of user-generated data will be simplified, and an AIS will become a collaborative platform that can constantly be maintained and updated by crowdsourcing (shared editing). Furthermore, the quality of atlas maps generated on the fly by web services has increased. Coupled with data stored in distributed geospatial databases, AIS will evolve toward a service-oriented architecture (Iosifescu Enescu 2011).

Cross-References

▶ ArcGIS: General-Purpose GIS Software
▶ Constraint Data, Visualizing
▶ Cyberinfrastructure for Spatial Data Integration
▶ Data Analysis, Spatial
▶ Data Infrastructure, Spatial
▶ Distributed Geospatial Computing (DGC)
▶ Exploratory Visualization
▶ Generalization and Symbolization
▶ Generalization, On-the-Fly

Atlas Information Systems, Fig. 2 Example of a 3D atlas: 3D display of geological data as overlay on a digital elevation model; combined with legend, automatically labeled mountain names, and star constellations (© Atlas of Switzerland)

▶ Google Map/Earth
▶ Internet GIS
▶ Map Generalization
▶ OGC Web Services
▶ Open-Source GIS Libraries
▶ Quantum GIS
▶ Scalable Vector Graphics (SVG)
▶ Web Feature Service (WFS)
▶ Web Feature Service (WFS) and Web Map Service (WMS)
▶ Web Mapping and Web Cartography
▶ Web Services
▶ Web Services, Geospatial

References

Bär H, Sieber R (1999) Towards high standard interactive atlases: the GIS and multimedia cartography approach. Paper presented at the 19th international cartographic conference, Ottawa, 14–21 Aug 1999
Borchert A (1999) Multimedia atlas concepts. In: Cartwright W, Peterson M, Gartner G (eds) Multimedia cartography. Springer, Berlin, pp 75–86
Buziek G (2001) Eine Konzeption der kartographischen Visualisierung. Habilitation thesis, Universität Hannover, Hannover
Cron J (2006) Graphische Benutzeroberflächen interaktiver Atlanten – Konzept zur Strukturierung und praktischen Umsetzung der Funktionalität; mit Kriterienkatalog. Diploma thesis, Hochschule für Technik und Wirtschaft, Dresden, and ETH Zurich, Dresden and Zurich
Da Silva Ramos C, Cartwright W (2006) Atlases from paper to digital medium. In: Stefanakis E, Peterson M, Armenakis C, Delis V (eds) Geographic hypermedia: concepts and systems. Lecture notes in geoinformation and cartography. Springer, Berlin, p 486
Hanewinkel C, Tzschaschel S (2005) Germany in maps – a multi-purpose toolbox. Paper presented at the 22nd international cartographic conference, A Coruña, 9–16 July 2005
Hurni L (2006) Interaktive Karteninformationssysteme – quo vaditis? Kartographische Nachrichten 56(3):136–142
Iosifescu Enescu I (2011) Cartographic web services. Diss, Eidgenössische Technische Hochschule ETH Zürich, Nr 19824, 2011, ETH, Zürich
Ormeling F (1995) Atlas information systems. Paper presented at the 17th international cartographic conference, Barcelona, 3–9 Sept 1995
Ormeling F (1996) Functionality of electronic school atlases. Paper presented at the seminar on electronic atlases II, Prague/The Hague, 31 July–8 Aug 1996
Schneider B (1999) Integration of analytical GIS-functions in multimedia atlas information systems. Paper presented at the 19th international cartographic conference, Ottawa, 14–21 Aug 1999

Schneider B (2001) GIS functionality in multimedia atlases: spatial analysis for everyone. In: 20th international cartographic conference, Beijing, 6–10 Aug 2001. International Cartographic Association, pp 829–840
Sieber R, Geisthövel R, Hurni L (2009) Atlas of Switzerland 3: a decade of exploring interactive atlas cartography. Paper presented at the 24th international conference of the ICA, Santiago de Chile, 15–21 Nov 2009
Siekierska E, Taylor F (1991) Electronic mapping and electronic atlases: new cartographic products for the information era – the electronic atlas of Canada. CISM J ACSGC 45(1):11–21
Siekierska E, Williams D (1996) National atlas of Canada on the internet and schoolnet. Paper presented at the seminar on electronic atlases II, Prague/The Hague, 31 July–8 Aug 1996


Atlas, Electronic

▶ Atlas Information Systems


Atlas, Interactive

▶ Atlas Information Systems


Atlas, Multimedia

▶ Atlas Information Systems


Atlas, Virtual

▶ Atlas Information Systems


Atlas, Web

▶ Atlas Information Systems


Attribute and Positional Error in GIS

▶ Uncertain Environmental Variables in GIS


Autocorrelation, Spatial

Chandana Gangodagamage (1), Xiaobo Zhou (2), and Henry Lin (2)
(1) Saint Anthony Falls Laboratory, Department of Civil Engineering, University of Minnesota, Minneapolis, MN, USA
(2) Department of Crop and Soil Sciences, The Pennsylvania State University, University Park, PA, USA

Synonyms

Spatial correlation; Spatial dependence; Spatial interdependence

Definition

In many spatial data applications, the events at a location are highly influenced by the events at neighboring locations. In fact, this natural inclination of a variable to exhibit similar values as a function of distance between the spatial locations at which it is being measured is known as spatial dependence. Spatial autocorrelation is used to measure this spatial dependence. If the variable exhibits a systematic pattern in its spatial distribution, it is said to be spatially autocorrelated. The existence and strength of such interdependence among values of a specific variable with reference to a spatial location can be quantified as a positive, zero, or negative spatial autocorrelation. Positive spatial autocorrelation indicates that similar values or properties tend to be collocated, while negative spatial autocorrelation indicates that dissimilar values or properties tend to be near each other. Random patterns indicate zero spa-
tial autocorrelation since independent, identically distributed random data are invariant with regard to their spatial location.

Historical Background

The idea of spatial autocorrelation is not new in the literature and was conceptualized as early as 1854, when nebula-like spatial clusters with distance decay effects were readily apparent in mapped cholera cases in the city of London (Moore and Carpenter 1999). This led to the hypothesis that the systematic spatial pattern of cholera outbreak decayed smoothly with distance from a particular water supply which acted as the source for the disease. This concept of spatial autocorrelation was also documented in the first law of geography in 1970, which states: "Everything is related to everything else, but near things are more related than distant things" (Tobler 1970).

Scientific Fundamentals

Spatial autocorrelation is a property of a variable that is often distributed over space (Shekhar and Chawla 2003). For example, land-surface elevation values of adjacent locations are generally quite similar. Similarly, temperature, pressure, slopes, and rainfall vary gradually over space, thus forming a smooth gradient of a variable between two locations in space. The propensity of a variable to show a smooth gradient across space aggregates similar values or properties adjacent to each other.

In classical statistics, the observed samples are assumed to be independent and identically distributed (iid). This assumption is no longer valid for inherently spatially autocorrelated data. This fact suggests that classical statistical tools like linear regression are inappropriate for spatial data analysis. The inferences made from such analyses are either biased, because the observations are spatially aggregated and clustered, or overly precise, because the number of truly independent observations is less than the sample size. When the number of truly independent observations is less than the sample size, the degrees of freedom of the observed data are lower than assumed in the model.

Scale Dependence of Spatial Autocorrelation
The strength of spatial autocorrelation is often a function of scale or spatial resolution, as illustrated in Fig. 1 using black and white cells. High negative spatial autocorrelation is exhibited in Fig. 1a since each cell has a different color from its neighboring cells. Each cell can be subdivided into four half-size cells (Fig. 1b), assuming the cell's homogeneity. Then, the strength of spatial autocorrelation among the black and white cells increases, while maintaining the same cell arrangement. This illustrates that spatial autocorrelation varies with the study scale.
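The scale effect described for the black and white cells can be checked numerically. The sketch below is an illustration only and not part of the original entry: it builds a 4-by-4 checkerboard, subdivides every cell into four sub-cells, and reports the share of edge-adjacent cell pairs whose colors differ. This join-count style summary is a stand-in chosen for brevity, and the function names `subdivide` and `unlike_join_share` are hypothetical.

```python
def subdivide(grid):
    """Split every cell into 2-by-2 identical sub-cells (cell assumed homogeneous)."""
    out = []
    for row in grid:
        fine_row = [v for v in row for _ in range(2)]
        out.append(fine_row)
        out.append(list(fine_row))
    return out


def unlike_join_share(grid):
    """Share of edge-adjacent (rook) cell pairs whose two colors differ."""
    rows, cols = len(grid), len(grid[0])
    unlike = total = 0
    for i in range(rows):
        for j in range(cols):
            # Look only right and down so that each pair is counted once.
            for di, dj in ((0, 1), (1, 0)):
                ni, nj = i + di, j + dj
                if ni < rows and nj < cols:
                    total += 1
                    unlike += grid[i][j] != grid[ni][nj]
    return unlike / total


# 4-by-4 checkerboard: 0 = white, 1 = black.
coarse = [[(i + j) % 2 for j in range(4)] for i in range(4)]
fine = subdivide(coarse)  # 8-by-8 raster with the same arrangement

print(unlike_join_share(coarse))  # every neighboring pair differs
print(unlike_join_share(fine))    # fewer than half of the pairs differ
```

On the coarse raster every neighbor is unlike, which corresponds to strong negative autocorrelation. After subdivision most neighbors inside a former cell are alike, so the same arrangement reads as more positively autocorrelated at the finer resolution.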

Autocorrelation, Spatial, Fig. 1 The strength of spatial autocorrelation as a function of scale using: (a) 4-by-4 raster and (b) 8-by-8 raster

Autocorrelation, Spatial, Fig. 2 Three different data distributions. (a) Binary distributed data in space. (b) Random uniformly distributed lattice data. (c) Random normally distributed lattice data in space

Differentiating Random Data from Spatial Data
Consider three different random distributions and three lattice grids of 64-by-64 cells (see Fig. 2), one for each distribution: the first lattice data set (Fig. 2a) is generated from a binary distribution, the second data set (Fig. 2b) is generated from a uniform distribution, and the third data set (Fig. 2c) is generated from a normal distribution. The value at pixel (i, j), P(i, j), with {P(i, j); i = 1, …, 64; j = 1, …, 64}, is assumed to be independent and identically distributed. As shown in Fig. 2, the absence of clustering or spatial segregation in the data suggests that the value P(i, j), where i, j ∈ R, has no correlation (zero correlation) with itself in space.

Each pixel (i, j) has eight neighbors, and each neighbor also has its own eight adjacent neighbors, except for the cells located on the boundary. The variability of P(i, j) in one direction will not be the same in other directions, thus forming an anisotropic system, indicating that the spatial autocorrelation varies with direction. The quantification of this directional spatial autocorrelation is computationally expensive; thus, the average over all directions at distance k is used to quantify the spatial autocorrelation. The distance k (e.g., a separation of k pixels from (i, j) in any direction) is called lag distance k. The spatial autocorrelation from each spatial entity to all other entities can be calculated. The average value over all entities of the same lag distance is expressed as a measure of spatial autocorrelation. The above three data sets are illustrative examples, demonstrating the nonexistence of spatial autocorrelation in randomly generated data sets.

Consider a digital elevation model (DEM) that shows an array of elevations of the land surface at each spatial location (i, j) as shown in Fig. 3a. The values of this data set do not change abruptly, whereas in Fig. 3b, the difference of the elevations between the location (i, j) and its neighbors changes abruptly, as shown in its corresponding color scheme.

The variogram, a plot of the dissimilarity against the spatial separation (i.e., the lag distance) (Wackernagel 2003) in spatial data, quantifies spatial autocorrelation and represents how spatial variability changes with lag distance (Devary and Rice 1982). In Fig. 4a, the semi-variogram value of the DEM surface is zero at the zero lag distance and increases with the lag distance, whereas in Fig. 4b, the semi-variogram value of the random surface varies erratically with the increasing lag distance. Contrary to spatial autocorrelation, the semi-variogram has higher values in the absence of spatial correlation and lower values in the presence of spatial correlation. This indicates that spatial autocorrelation gradually disappears as the separation distance increases (Spatial Autocorrelation 2006) (Fig. 4a). These variogram figures are generated at a point $x_i$ by comparing the values at its four adjacent neighbors such that:

$$
\gamma(h) = \frac{1}{N(h)} \sum_{i=1}^{n} \left( z(x_i) - z(x_i + h) \right)^2, \tag{1}
$$

Autocorrelation, Spatial, Fig. 3 (a) One meter spatial resolution LIDAR DEM for South Fork Eel, California. (b) One meter normally distributed DEM reconstructed for same statistics (i.e., mean and variance) as LIDAR DEM in (a)

where $z(x_i)$ and $z(x_i + h)$ are the values of the function $z$ located at $x_i$ and $(x_i + h)$, respectively. The squared differences to the four adjacent neighbors along the X and Y axes at lag distance h are averaged to build these variogram clouds. The semi-variogram values for the data in Fig. 3a (generated from lattice data) increase with increasing lag distance, whereas the semi-variogram values generated from point data reach a steady state with increasing lag distance.

How to Quantify Spatial Autocorrelation
Several indices can be used to quantify spatial autocorrelation. The most common techniques are Moran's I, Geary's C, and spatial autoregression. These techniques are described in the following sections.

Moran's I Method
Moran's I index (Moran 1950) is one of the oldest methods in spatial autocorrelation and is still the de facto standard method of quantifying it. This method is applied to points or zones with continuous variables associated with them. The value obtained at a location is compared with the values at other locations. Moran's I can be defined as:

$$
I = \frac{N \sum_i \sum_j W_{i,j} (X_i - \bar{X})(X_j - \bar{X})}{\left( \sum_i \sum_j W_{i,j} \right) \sum_i (X_i - \bar{X})^2}, \tag{2}
$$

where N is the number of cases, $\bar{X}$ is the mean value of the variable X, $X_i$ and $X_j$ are the values of the variable X at locations i and j, respectively, and $W_{i,j}$ is the weight applied to the comparison between the values at i and j.

The same equation in matrix notation can also be represented as (Shekhar and Chawla 2003):

$$
I = \frac{z W z^t}{z z^t}, \tag{3}
$$

where $z = (x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x})$, $z^t$ is the transpose of matrix z, and W is the same n-by-n contiguity matrix that has been introduced in Eq. 2. Equations 2 and 3 give the same value when W is row-normalized, because the weights then sum to N.

An important property of Moran's I is that the index I depends not only on the variable X, but also on the data's spatial arrangement. The spatial arrangement is quantified by the contiguity matrix, W. If a location i is adjacent to location j, then this spatial arrangement receives the weight of 1; otherwise the value of the weight is 0. Another option is to define W based on the squared inverse distance $(1/d_{ij}^2)$ between the locations i and j (Lembo 2006). There are also other methods to quantify this contiguity matrix. For example, the sum of the products of the variable x can be compared at locations i and j and then weighted by the inverse distance between i and j.
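As a minimal sketch of Eqs. 2 and 3, the following code computes Moran's I for a hypothetical transect of five locations in which only immediate neighbors are contiguous. The data values and the helper names (`moran_i`, `moran_i_matrix`, `row_normalize`) are assumptions made for illustration. The example also shows that the summation form and the matrix form coincide once W is row-normalized.

```python
def moran_i(x, w):
    """Moran's I as in Eq. 2 for values x and an n-by-n weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in z)


def moran_i_matrix(x, w):
    """Matrix form of Eq. 3: z W z^t / (z z^t)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    zw = [sum(z[i] * w[i][j] for i in range(n)) for j in range(n)]  # row vector z W
    return sum(zw[j] * z[j] for j in range(n)) / sum(v * v for v in z)


def row_normalize(w):
    """Scale each row of w so that its weights sum to 1."""
    return [[v / sum(row) for v in row] for row in w]


# Hypothetical transect: values rise steadily, so neighbors are similar.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
w_binary = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
w_row = row_normalize(w_binary)

print(moran_i(x, w_binary))                          # binary contiguity weights
print(moran_i(x, w_row), moran_i_matrix(x, w_row))   # the two forms agree
```

Both weightings give a positive index for this smoothly varying transect, which matches the interpretation of positive spatial autocorrelation.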

Autocorrelation, Spatial, Fig. 4 (a) Variogram for spatial data in Fig. 3a. (b) Variogram for the random data in Fig. 3b. Both panels plot the semivariogram against the lag distance in m (0 to 200); the vertical axis of panel (b) is scaled by 10^-3
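The contrast between the two variograms can be reproduced with a small sketch of Eq. 1. The code below is an illustration under stated assumptions, not the procedure used to produce the figures: it averages squared differences between each cell and its four neighbors at lag h along the X and Y axes, for a hypothetical smooth ramp and a hypothetical surface of independent random values. It follows Eq. 1 as written; many texts divide by 2N(h) instead of N(h).

```python
import random


def semivariogram(z, h):
    """Eq. 1 at lag h: mean squared difference between each cell and its
    neighbors h cells away along the X and Y axes."""
    rows, cols = len(z), len(z[0])
    total, pairs = 0.0, 0
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((h, 0), (-h, 0), (0, h), (0, -h)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    total += (z[i][j] - z[ni][nj]) ** 2
                    pairs += 1
    return total / pairs  # pairs plays the role of N(h)


# Hypothetical smooth surface: elevation rises by one unit per cell.
ramp = [[float(i + j) for j in range(20)] for i in range(20)]

# Hypothetical uncorrelated surface: values drawn independently.
random.seed(0)
noise = [[random.gauss(0.0, 1.0) for _ in range(20)] for _ in range(20)]

for h in (1, 2, 4, 8):
    print(h, semivariogram(ramp, h), round(semivariogram(noise, h), 3))
```

For the ramp the value grows with the lag, whereas for the independent values it stays at roughly the same level for every lag, mirroring the behavior described for the DEM and the random surface.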

The value of I is close to 1 when positive spatial autocorrelation is strong and close to −1 when negative spatial autocorrelation is strong (Lembo 2006).

Geary's C Method
Geary's method (Geary 1954) differs from Moran's method mainly in that the interaction between i and j is measured not as the deviation from the mean, but by the difference of the values of each observation. Geary's C can be defined as:

$$
C = \frac{(N - 1) \sum_i \sum_j W_{ij} (X_i - X_j)^2}{2 \left( \sum_i \sum_j W_{ij} \right) \sum_i (X_i - \bar{X})^2}, \tag{4}
$$

where C typically varies between 0 and 2. If the value of one zone is spatially unrelated to any other zone, the expected value of C will be 1. If the value of C is less than 1, a positive spatial autocorrelation is inferred; if it is greater than 1, a negative spatial autocorrelation is inferred (Lembo 2006). Geary's C values are inversely related to Moran's I values.

Geary's C and Moran's I will not provide identical inference because the former deals with differences and the latter deals with covariance. The other difference between these two methods is that Moran's I gives a global indication while Geary's C is more sensitive to differences in small neighborhoods (Lembo 2006).
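A short sketch of Eq. 4 illustrates how Geary's C behaves on the two sides of its expected value of 1. The two transects and the function name `geary_c` are hypothetical; contiguity is taken as binary, with only immediate neighbors adjacent.

```python
def geary_c(x, w):
    """Geary's C as in Eq. 4 for values x and an n-by-n weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    s0 = sum(sum(row) for row in w)  # sum of all weights
    diff = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    spread = sum((v - mean) ** 2 for v in x)
    return (n - 1) * diff / (2 * s0 * spread)


# Binary contiguity for five locations on a line.
w = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]

smooth = [1.0, 2.0, 3.0, 4.0, 5.0]       # neighbors hold similar values
alternating = [1.0, 5.0, 1.0, 5.0, 1.0]  # neighbors hold dissimilar values

print(geary_c(smooth, w))       # below 1
print(geary_c(alternating, w))  # above 1
```

The smooth transect yields a value below 1 and the alternating transect a value above 1, which is the opposite direction to Moran's I for the same data.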

Spatial Autoregression
The disadvantage of linear regression methods is that they assume the iid condition, which does not strictly hold in spatial data analysis. Research in spatial statistics has suggested many alternative methods to incorporate spatial dependence into autoregressive regression models, as explained in the following section.

Spatial Autoregressive Regression Model
The spatial autoregressive regression model (SAR) is one of the commonly used autoregressive models for spatial data regression. The spatial dependence is introduced into the autoregressive model using the contiguity matrix. Based on this model, the spatial autoregressive regression (Shekhar and Chawla 2003) can be written as:

$$
Y = \rho W Y + X \beta + \varepsilon, \tag{5}
$$

where

Y   Observation or dependent variable,
ρ   Spatial autoregressive parameter,
W   Contiguity matrix,
β   Regressive coefficient,
ε   Unobservable error term (N(0, I)),
X   Feature values or independent variable.

When ρ = 0, this model is reduced to the ordinary least squares regression equation.

The solution for Eq. 5 is not straightforward, and the size of the contiguity matrix W grows quadratically with the size of the data sample. However, most of the elements of W are zero; thus, sparse matrix techniques are used to speed up the solution process (Shekhar and Chawla 2003).

Illustration of SAR Using Sample Data Set
Consider the following 2-by-2 DEM grid data set.

100 101
102 103

The contiguity matrix (neighborhood matrix) W, with rows and columns ordered by the cells 100, 101, 102, and 103, can be written as follows:

$$
W = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}
\;\rightarrow\;
\text{normalized contiguity matrix} =
\begin{bmatrix} 0 & 0.5 & 0.5 & 0 \\ 0.5 & 0 & 0 & 0.5 \\ 0.5 & 0 & 0 & 0.5 \\ 0 & 0.5 & 0.5 & 0 \end{bmatrix}
$$

The normalized contiguity matrix is shown on the right. We assumed that ρ = 0.1, β = 1.0 [1; 2; 3; 4], and that the column vector ε is equal to the column vector 0.01*rand(4,1). Then, Eq. 5 can be written as:

$$
Y = (QX)\beta + Q\varepsilon, \tag{6}
$$

where $Q = (I - \rho W)^{-1}$.

Demonstration Using Mathworks Matlab Software
Matlab software (MATLAB 1997) is used to demonstrate this example. The following five matrices are defined for W, ρ, ε, β, and X:

    W = [0, 0.5, 0.5, 0; 0.5, 0, 0, 0.5; 0.5, 0, 0, 0.5; 0, 0.5, 0.5, 0];
    rho = 0.1;
    epsilon = 0.01*rand(4, 1);
    beta = 1.0;
    X = [100; 101; 102; 103];

The above defined values are substituted into Eq. 6, which can be shown in Matlab notation as:

    y = inv(eye(4, 4) - rho.*W)*(beta*X + epsilon)

The solution provides an estimation of y = [111.2874; 112.2859; 113.2829; 114.2786]. With ε = 0 the same expression gives [111.2778; 112.2778; 113.2778; 114.2778]; the small remaining differences come from the random error term and change from run to run.

Key Applications

The key application of spatial autocorrelation is to quantify the spatial dependence of spatial variables. The following are examples from various disciplines where spatial autocorrelation is used:

Sampling Design
The spatial autocorrelation measured among contiguous or close locations can be used to answer how large an area a single measurement represents. The answer to such questions allows estimates of the best places to make further observations and of the number of samples required in accuracy assessment, and provides useful information for interpolation to estimate values at unobserved locations (Wei and Chen 2004).

Cartography
A main assumption on which statistical estimates of uncertainty are usually based is the independence of the samples during mapping processes. A spatial autocorrelation analysis can be used to test the validity of such an assumption and the related mapping errors. Adjacent elevation differences are usually correlated rather than independent, and errors tend to occur in clusters. In addition, the level of accuracy of GIS output products depends on the level of spatial autocorrelation in the source data sets.

Soil Science
Spatial autocorrelation has been used to study the domain that a soil water content or soil temperature measurement can represent. The distinctive spatial autocorrelations of soil solutes manifest the different reaction and migration patterns for solutes in soil. With high-resolution soil sampling, a spatial autocorrelation analysis provides another means to delineate boundaries between soil series.

Biology
Patterns and processes of genetic divergence among local populations have been investigated using spatial autocorrelation statistics to describe the autocorrelation of gene frequencies for increasing classes of spatial distance. Spatial autocorrelation analysis has also been used to study a variety of phenomena, such as the genetic structure of plant and animal populations and the distribution of mortality patterns.

Ecology
Ecologists have used spatial autocorrelation statistics to study species-environment relationships. Spatial autocorrelation analysis is a useful tool to investigate mechanisms operating on species richness at different spatial scales (Diniz et al. 2003). It has been shown that spatial autocorrelation can be used to explore how organisms respond to environmental variation at different spatial scales.

Environmental Science
The physical and chemical processes controlling the fate and transport of chemicals in the environment do not operate at random. All measurable environmental parameters exhibit spatial autocorrelation at certain scales (Haining 1993). The patterns of spatial autocorrelation in stream water quality can be used to predict stream segments with impaired water quality. The spatial autocorrelation test of environmental variables provides important information to policy-makers for more efficient control of environmental contaminants.

Risk Assessment
It is often the case that the occurrence of natural hazardous events such as floods and forest fires shows spatial dependence. Spatial autocorrelation allows risk assessment of such undesirable events. It can be used to estimate the probability of a forest fire, as an example, taking place at a specific location. Spatial autocorrelation analysis is also useful in geographical disease clustering tests.

Economics
Because of the heterogeneity across regions and a large number of regions strongly interacting with each other, economic policy measures are targeted at the regional level. Superimposed spatial structures from spatial autocorrelation analysis improve the forecasting performance of nonspatial forecasting models. The spatial dependence and spatial heterogeneity can be used to investigate the effect of income and human capital

inequalities on regional economic growth. Spatial autocorrelation analysis is also a useful tool to study the distribution of unemployment rate and price fluctuation within a specific area.

Political Science
After spatial autocorrelation has been defined, geographic units (countries, counties, or census tracts) can be used as predictors of the political outcomes. For example, spatial autocorrelation methods can use geographic data coordinates to check if a location has a significant impact on the voting choice.

Sociology
Spatial autocorrelation has been used to study the correlation between population density and pathology. The spatial interaction has been taken into consideration to study the relationship between population density and fertility. Spatial autocorrelation can also be used to investigate the variations in crime rates and school test scores.

Future Directions

A good knowledge and understanding of spatial autocorrelation is essential in many disciplines which often need predictive inferences from spatial data. Ignoring spatial autocorrelation in spatial data analysis and model development may lead to unreliable and poor fit results. The result of spatial autocorrelation analysis can guide our experiment design, trend analysis, model development, and decision-making. For example, long-term field data monitoring is tedious and costly. Spatial autocorrelation analysis would benefit the design of sampling strategies and the selection of optimal field monitoring sites. Spatial autocorrelation analysis for variables of interest can also assist in the selection of a supermarket location or of a new school.

Cross-References

▶ Semivariogram Modeling

Recommended Reading

Devary JL, Rice WA (1982) Geostatistics software users manual for geosciences research and engineering. Pacific Northwest Laboratory, Richland, Washington
Diniz JAF, Bini LM, Hawkins BA (2003) Spatial autocorrelation and red herrings in geographical ecology. Glob Ecol Biogeogr 12:53–64
Geary RC (1954) The contiguity ratio and statistical mapping. Inc Stat 5:115–145
Haining RP (1993) Spatial data analysis in the social and environmental sciences. Cambridge University Press, Cambridge
Lembo J (2006) Lecture 9, Spatial autocorrelation. http://www.css.cornell.edu/courses/620/lecture9.ppt#256,1. Accessed 10 Apr 2006
MATLAB (1997) The student edition of MATLAB, The language of technical computing, version 5, users guide. The Mathworks, Inc., New Jersey
Moore DA, Carpenter TE (1999) Spatial analytical methods and geographic information systems: use in health research and epidemiology. Epidemiol Rev 21(2):143–161
Moran PAP (1950) Notes on continuous stochastic phenomena. Biometrika 37:17–23
Pace RK, Barry R, Sirmans CF (1998) Spatial statistics and real estate. J Real Estate Finance Econ 17:5–13
Shekhar S, Chawla S (2003) Spatial databases: a tour. Pearson Education Inc., Prentice Hall, Upper Saddle River
Spatial Autocorrelation (2006), Natural resources Canada. http://www.pfc.forestry.ca/profiles/wulder/mvstats/spatial_e.html. Accessed 10 Apr 2006
Tobler WR (1970) A computer movie simulating urban growth in the Detroit region. Econ Geogr 46(2):234–240
Wackernagel H (2003) Multivariate geostatistics: an introduction with applications, 3rd edn. Springer, New York, pp 45–58
Wei H, Chen D (2004) The effect of spatial autocorrelation on the sampling design in accuracy assessment: a case study with simulated data. Environ Inform Arch 2:910–919


Automated Map Compilation

▶ Conflation of Features

Automated Map Generalization

▶ Feature Extraction, Abstract



Automated Vehicle Location (AVL)

▶ Intergraph: Real-Time Operational Geospatial Applications

Automatic Graphics Generation

▶ Information Presentation, Dynamic

Automatic Information Extraction

▶ Data Acquisition, Automation

Autonomous Navigation

▶ Computer Vision Augmented Geospatial Localization

Autonomy, Space Time

▶ Time Geography


B

Balanced Box Decomposition Tree (Spatial Index)

▶ Nearest Neighbor Problem

Bayesian Estimation

▶ Indoor Positioning, Bayesian Methods

Bayesian Inference

▶ Hurricane Wind Fields, Multivariate Modeling

Bayesian Maximum Entropy

▶ Uncertainty, Modeling with Spatial and Temporal

Bayesian Network Integration with GIS

Daniel P. Ames and Allen Anselmo
Department of Geosciences, Geospatial Software Lab, Idaho State University, Pocatello, ID, USA

Synonyms

Directed acyclic graphs; Influence diagrams; Probabilistic map algebra; Probability networks; Spatial representation of Bayesian networks

Definition

A Bayesian network (BN) is a graphical-mathematical construct used to probabilistically model processes which include interdependent variables, decisions affecting those variables, and costs associated with the decisions and states of the variables. BNs are inherently system representations and, as such, are often used to model environmental processes. Because of this, there is a natural connection between


certain BNs and GIS. BNs are represented as a directed acyclic graph structure with nodes (representing variables, costs, and decisions) and arcs (directed lines representing conditionally probabilistic dependencies between the nodes). A BN can be used for prediction or analysis of real-world problems and complex natural systems where statistical correlations can be found between variables or approximated using expert opinion. BNs have a vast array of applications for aiding decision-making in areas such as medicine, engineering, natural resources, and decision management. BNs can be used to model geospatially interdependent variables as well as conditional dependencies between geospatial layers. Additionally, BNs have been found to be useful and highly efficient in performing image classification on remotely sensed data.

Historical Background

Originally described by Pearl (1988), BNs have been used extensively in medicine and computer science (Heckerman 1997). In recent years, BNs have been applied in spatially explicit environmental management studies. Examples include the Neuse Estuary Bayesian ecological response network (Borsuk and Reckhow 2000), Baltic salmon management (Varis and Kuikka 1996), climate change impacts on Finnish watersheds (Kuikka and Varis 1997), the Interior Columbia Basin Ecosystem Management Project (Lee and Bradshaw 1998), and waterbody eutrophication (Haas 1998). As illustrated in these studies, a BN graph structures a problem such that it is visually interpretable by stakeholders and decision-makers while serving as an efficient means for evaluating the probable outcomes of management decisions on selected variables.

Both BNs and GIS can be used to represent spatially explicit, probabilistically connected environmental and other systems; however, the integration of the two techniques has only been explored relatively recently. BN integration with GIS typically takes one of the four distinct forms: (1) BN-based layer combination (i.e., probabilistic map algebra) as demonstrated in Taylor (2003); (2) BN-based classification as demonstrated in Stassopoulou et al. (1998); (3) using BNs for intelligent, spatially oriented data retrieval, as demonstrated in Walker et al. (2004) and Walker et al. (2005); and (4) GIS-based BN decision support system (DSS) frameworks where BN nodes are spatially represented in a GIS framework as presented by Ames et al. (2005).

Scientific Fundamentals

Bayesian Network Integration with GIS, Fig. 1 Umbrella Bayesian decision network structure. A and B nature nodes, C a decision node, and D a utility node. Node states shown in the figure: A. Forecast (Sunny, Cloudy, Rainy); B. Weather (No Rain, Rain); C. Take Umbrella (Take, Do not Take); D. Satisfaction

As noted above, BNs are used to model reality by representing conditional probabilistic dependencies between interdependent variables, decisions, and outcomes. This section provides an in-depth explanation of BN analysis using an example BN model called the “Umbrella” BN (Fig. 1), an augmented version of the well-known “Weather” influence diagram presented by Shachter and Peot (1992). This simple BN attempts to model the variables and outcomes associated with the decision to take or not take an umbrella on a given outing. This problem is represented in the

BN by four nodes. “Weather” and “Forecast” are nature or chance nodes where “Forecast” is conditioned on the state of “Weather” and “Weather” is treated as a random variable with a prior probability distribution based on historical conditions. “Take Umbrella” is a decision variable that, together with the “Weather” variable, defines the status of “Satisfaction.” The “Satisfaction” node is known as a “Utility” or “Value” node. This node associates a resultant outcome value (monetary or otherwise) to represent the satisfaction of the individual based on the decision to take the umbrella and whether or not there is rain. Each of these BN nodes contains discrete states where each variable state represents abstract events, conditions, or numeric ranges of each variable.

The Umbrella model can be interpreted as follows: if it is raining, there is a higher probability that the forecast will predict it will rain. In reverse, through the Bayesian network “backward propagation of evidence,” if the forecast predicts rain, it can be inferred that there is a higher chance that rain will actually occur. The link between “Forecast” and “Take Umbrella” indicates that the “Take Umbrella” decision is based largely on the observed forecast. Finally, the link to the “Satisfaction” utility node from both “Take Umbrella” and “Weather” captures the relative gains in satisfaction derived from every combination of states of the BN variables.

Bayesian networks are governed by two mathematical techniques: conditional probability and Bayes’ theorem.

Conditional probability is defined as the probability of one event given the occurrence of another event and can be calculated as the joint probability of the two events occurring divided by the probability of the second event:

\[ P(A \mid B) = \frac{P(A, B)}{P(B)}. \tag{1} \]

From Eq. 1, the fundamental rule for probability calculus and the downward propagation of evidence in a BN can be derived. Specifically, it is seen that the joint probability of A and B equals the conditional probability of event A given B, multiplied by the probability of event B (Eq. 2):

\[ P(A, B) = P(A \mid B)\, P(B). \tag{2} \]

Equation 2 is used to compute the probability of any state in the Bayesian network given the states of the parent node events. In Eq. 3, the probability of state A_x occurring given parent B is the sum of the probabilities of the state of A_x given state B_i, with i being an index to the states of B, multiplied by the probability of that state of B:

\[ P(A_x, B) = \sum_{i} P(A_x \mid B_i)\, P(B_i). \tag{3} \]

Similarly, for calculating states with multiple parent nodes, the equation is modified to make the summation of the conditional probability of the state A_x given states B_i and C_j multiplied by the individual probabilities of B_i and C_j:

\[ P(A_x, B, C) = \sum_{i,j} P(A_x \mid B_i, C_j)\, P(B_i)\, P(C_j). \tag{4} \]

Finally, though similar in form, utility nodes do not calculate probability, but instead calculate the utility value as a metric or index given the states of its parent or parents as shown in Eqs. 5 and 6:

\[ U(A, B) = \sum_{i} U(A \mid B_i)\, P(B_i) \tag{5} \]

\[ U(A, B, C) = \sum_{i,j} U(A \mid B_i, C_j)\, P(B_i)\, P(C_j). \tag{6} \]

The second equation that is critical to BN modeling is Bayes’ theorem:

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}. \tag{7} \]
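
As a minimal sketch of how Eqs. 3, 6, and 7 might be evaluated in practice, the following Python snippet applies them to the Umbrella network. The probabilities and utilities are the ones this entry tabulates for the Umbrella model (Tables 1, 2, and 3); the dictionary layout and the function names `marginal`, `posterior`, and `expected_utility` are illustrative assumptions and do not correspond to any particular BN software package.

```python
# Sketch of Eqs. 3, 6, and 7 for the Umbrella BN.
# Numbers are those of the entry's Tables 1-3; all names are illustrative.

# Table 1: prior distribution P(Weather)
P_WEATHER = {"No rain": 0.7, "Rain": 0.3}

# Table 2: conditional distribution P(Forecast | Weather)
P_FORECAST_GIVEN_WEATHER = {
    "No rain": {"Sunny": 0.70, "Cloudy": 0.20, "Rainy": 0.10},
    "Rain": {"Sunny": 0.15, "Cloudy": 0.25, "Rainy": 0.60},
}

# Table 3: utility for each (Weather, decision) combination
UTILITY = {
    ("No rain", "Take"): 20,
    ("No rain", "Do not take"): 100,
    ("Rain", "Take"): 70,
    ("Rain", "Do not take"): 0,
}


def marginal(forecast):
    """Eq. 3: sum over weather states of P(forecast | weather) * P(weather)."""
    return sum(
        P_FORECAST_GIVEN_WEATHER[w][forecast] * p for w, p in P_WEATHER.items()
    )


def posterior(weather, forecast):
    """Eq. 7 (Bayes' theorem): P(weather | forecast)."""
    likelihood = P_FORECAST_GIVEN_WEATHER[weather][forecast]
    return likelihood * P_WEATHER[weather] / marginal(forecast)


def expected_utility(decision, weather_dist=P_WEATHER):
    """Eq. 6 with the decision held fixed (probability 1 on the chosen state)."""
    return sum(UTILITY[(w, decision)] * p for w, p in weather_dist.items())


if __name__ == "__main__":
    for forecast in ("Sunny", "Cloudy", "Rainy"):
        print(forecast, round(marginal(forecast), 3))
    print("P(Rain | Rainy forecast):", round(posterior("Rain", "Rainy"), 2))
    for decision in ("Take", "Do not take"):
        print(decision, round(expected_utility(decision), 1))
```

With the tabulated values, the posterior probability of rain given a “Rainy” forecast works out to 0.18/0.25 = 0.72, compared with a prior of 0.30, which is the backward propagation of evidence described above. Passing a posterior distribution as `weather_dist` would give the expected utility after a forecast has been observed; that extension is a design choice of this sketch rather than something the entry prescribes.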

The conditional probability inversion represented by Bayes’ theorem (Eq. 7) allows for the powerful technique of Bayesian inference, for which BNs are particularly well suited. In the Umbrella model, inferring a higher probability of rain given a rainy forecast is an example application of Bayes’ theorem.

Connecting each node in the BN is a conditional probability table (CPT). Each nature node (state variable) includes a CPT that stores the probability distribution for the possible states of the variable given every combination of the states of its parent nodes (if any). These probability distributions can be assigned by frequency analysis of the variables and expert opinion based on observation or experience, or they can be set to some “prior” distribution based on observations of equivalent systems.

Tables 1 and 2 show CPTs for the Umbrella BN. In Table 1, the probability distribution of rain is represented as 70% chance of no rain and 30% chance of rain. This CPT can be assumed to be derived from historical observations of the frequency of rain in the given locale. Table 2 represents the probability distribution of the possible weather forecasts (“Sunny,” “Cloudy,” or “Rainy”) conditioned on the actual weather event. For example, when it actually rained, the prior forecast called for “Rainy” 60% of the time, “Cloudy” 25% of the time, and “Sunny” 15% of the time. Again, these probabilities can be derived from historical observations of prediction accuracies or from expert judgment.

Bayesian Network Integration with GIS, Table 1 Probability of rain

Weather
No rain    Rain
70%        30%

Bayesian Network Integration with GIS, Table 2 Forecast probability conditioned on rain

           Forecast
Weather    Sunny    Cloudy    Rainy
No rain    70%      20%       10%
Rain       15%      25%       60%

Bayesian Network Integration with GIS, Table 3 Satisfaction utility conditioned on rain and the “Take Umbrella” decision

Weather    Take Umbrella    Satisfaction
No Rain    Take             20 units
No Rain    Do not Take      100 units
Rain       Take             70 units
Rain       Do not Take      0 units

Table 3 is a utility table defining the relative gains in utility (in terms of generic “units” of satisfaction) under all of the possible states of the BN. Here, satisfaction is highest when there is no rain and the umbrella is not taken and lowest when the umbrella is not taken but it does rain. Satisfaction “units” are in this case assigned as arbitrary ratings from 0 to 100, but in more complex systems, utility can be used to represent monetary or other measures.

Following is a brief explanation of the implementation and use of the Umbrella BN. First it is useful to compute P(Forecast = Sunny) given unknown Weather conditions as follows:

\[
\begin{aligned}
P(\text{Forecast} = \text{Sunny}) &= \sum_{i \in \{\text{NoRain},\, \text{Rain}\}} P(\text{Forecast} = \text{Sunny} \mid \text{Weather}_i)\, P(\text{Weather}_i) \\
&= 0.7 \times 0.7 + 0.15 \times 0.3 = 0.535 \approx 54\%.
\end{aligned}
\]

Next P(Forecast = Cloudy) and P(Forecast = Rainy) can be computed as

\[ P(\text{Forecast} = \text{Cloudy}) = 0.2 \times 0.7 + 0.25 \times 0.3 = 0.215 \approx 22\% \]

\[ P(\text{Forecast} = \text{Rainy}) = 0.1 \times 0.7 + 0.6 \times 0.3 = 0.25 = 25\%. \]

Finally, evaluate the “Satisfaction” utility under both possible decision scenarios (take or leave the umbrella):

\[
\begin{aligned}
U(\text{Satisfaction} \mid \text{TakeUmbrella} = \text{Take}) &= \sum_{i,j} U(\text{Satisfaction} \mid \text{TakeUmbrella}_i, \text{Weather}_j)\, P(\text{TakeUmbrella}_i)\, P(\text{Weather}_j) \\
&= 20 \times 1.0 \times 0.7 + 100 \times 0.0 \times 0.7 + 70 \times 1.0 \times 0.3 + 0 \times 0.0 \times 0.3 = 35.
\end{aligned}
\]

Similarly, the utility of not taking the umbrella is computed as

\[
\begin{aligned}
U(\text{Satisfaction} \mid \text{TakeUmbrella} = \text{NoTake}) &= 20 \times 0.0 \times 0.7 + 100 \times 1.0 \times 0.7 + 70 \times 0.0 \times 0.3 + 0 \times 1.0 \times 0.3 = 70.
\end{aligned}
\]

Clearly, the higher satisfaction is predicted for leaving the umbrella at home, thereby providing an example of how a simple BN analysis can aid the decision-making process. While the Umbrella BN presented here is quite simple and not particularly spatially explicit, it serves as a generic BN example. Specific application of BNs in GIS is presented in the following section.

Key Applications

As discussed before, integration of GIS and BNs is useful in any BN which has spatial components, whether displaying a spatially oriented BN, using GIS functionality as input to a BN, or forming a BN from GIS analysis. Given this, the applications of such integration are limited only by that spatial association. One example mentioned above, a watershed management BN, has shown the usefulness of such a spatial orientation, but other types of BNs may also benefit from this form of integration. For instance, many ecological, sociological, and geological studies which might benefit from a BN could also have strong spatial associations. As another example, traffic analysis BNs often have very clear spatial associations. Finally, even BNs trying to characterize the spread of diseases in epidemiology would likely have clear spatial association.

As outlined above, GIS-based BN analysis typically takes one of the four distinct forms including:

• Probabilistic map algebra
• Image classification
• Automated data query and retrieval
• Spatial representation of BN nodes

A brief explanation of the scientific fundamentals of each of these uses is presented here.

Probabilistic Map Algebra
Probabilistic map algebra involves the use of a BN as the combinatorial function used on a cell-by-cell basis when combining raster layers. For example, consider the ecological habitat models described by Taylor (2003). Here, several geospatial raster data sets are derived representing proximity zones for human-caused landscape disturbances associated with the development of roads, wells, and pipelines. Additional data layers representing known habitat for each of several threatened and endangered species are also developed and overlaid on the disturbance layers. Next, a BN was constructed representing the probability of habitat risk conditioned on both human disturbance and habitat locations. CPTs in this BN were derived from interviews with acknowledged ecological experts in the region. Finally, this BN was applied on a cell-by-cell basis throughout the study area, resulting in a risk probability map for the region for each species of interest.

The use of BNs in this kind of probabilistic map algebra is currently hindered only by the lack of specialized tools to support the analysis. However, the concept holds significant promise as an alternative to the more traditional GIS-based “indicator analysis” where each layer is reclassified to represent an arbitrary index and then summed to give a final metric (often on a 1 to 100 scale of either suitability or unsuitability). Indeed, the BN approach results in a more interpretable probability map. For example, such an analysis could be used to generate a map of the probability of landslide conditioned on slope, wetness, vegetation, etc. Certainly a map that indicates percent chance of landslide could

be more informative for decision-makers than an indicator model that simply displays the sum of some number of reclassified indicators.

Image Classification
In the previous examples, BN CPTs are derived from historical data or information from experts. However, many BN applications make use of the concept of Bayesian learning as a means of automatically estimating probabilities from existing data. BN learning involves a formal automated process of “creating” and “pruning” the BN node-arc structure based on rules intended to maximize the amount of unique information represented by the BN CPTs. In a GIS context, BN learning algorithms have been extensively applied to image classification problems. Image classification using a BN requires the identification of a set of input layers (typically multispectral or hyperspectral bands) from which a known set of objects or classifications are to be identified.

Learning data sets include both input and output layers where output layers clearly indicate features of the required classes (e.g., polygons indicating known land cover types). A BN learning algorithm applied to such a data set will produce an optimal (in BN terms) model for predicting land cover or other classification schemes at a given raster cell based on the input layers. The application of the final BN model to predict land cover or other classifications at an unknown point is similar to the probabilistic map algebra described previously.

Automated Data Query and Retrieval
In the case of application of BNs to automated query and retrieval of geospatial data sets, the goal is typically to use expert knowledge to define the CPTs that govern which data layers are loaded for visualization and analysis. Using this approach in a dynamic web-based mapping system, one could develop a BN for the display of layers using a CPT that indicates the probability that the layer is important, given the presence or absence of other layers or features within layers at the current view extents. Such a tool would supplant the typical approach which is to activate or deactivate layers based strictly on “zoom level.” For example, consider a military GIS mapping system used to identify proposed targets. A BN-based data retrieval system could significantly optimize data transfer and bandwidth usage by only showing specific high-resolution imagery when the probability of needing that data is raised due to the presence of other features which indicate a higher likelihood of the presence of the specific target.

BN-based data query and retrieval systems can also benefit from Bayesian learning capabilities by updating CPTs with new information or evidence observed during the use of the BN. For example, if a user continually views several data sets simultaneously at a particular zoom level or in a specific zone, this increases the probability that those data sets are interrelated and should result in modified CPTs representing those conditional relationships.

Spatial Representation of BN Nodes
Many BN problems and analyses, though not completely based on geospatial data, have a clear geospatial component and as such can be mapped on the landscape. This combined BN-GIS methodology is relatively new but has significant potential for helping improve the use and understanding of a BN. For example, consider the East Canyon Creek BN (Ames et al. 2005) represented in Fig. 2. This BN is a model of streamflow (FL_TP and FL_HW) at both a wastewater treatment plant and in the stream headwaters, conditional on the current season (SEASON). Also the model includes estimates of phosphorus concentrations at the treatment plant and in the headwaters (PH_TP and PH_HW) conditional on the season and also on operations at both the treatment plant (OP_TP) and in the headwaters (OP_HW). Each of these variables affects phosphorus concentrations in the stream (PH_ST) and ultimately reservoir visitation (VIS_RS). Costs of operations (CO_TP and CO_HW) as well as revenue at the reservoir (REV_RS) are represented as utility nodes in the BN.

Bayesian Network Integration with GIS, Fig. 2 The East Canyon Creek BDN from Ames et al. (2005), as seen in the GeNIe (Decision Systems Laboratory 2006) graphical node editor application

Bayesian Network Integration with GIS, Fig. 3 (a) East Canyon displayed with the East Canyon BN overlain on it. (b) Same, but with the DEM layer turned off and the BN network lines displayed

Most of the nodes in this BN (except for SEASON) have an explicit spatial location (i.e., they represent conditions at a specific place). Because of this intrinsic spatiality, the East Canyon BN can be represented in a GIS with points indicating nodes and arrows indicating the BN arcs (i.e., Fig. 3). Such a representation of a BN

within a GIS can give the end users a greater understanding of the context and meaning of the BN nodes. Additionally, in many cases, it may be that the BN nodes correspond to specific geospatial features (e.g., a particular weather station) in which case spatial representation of the BN nodes in a GIS can be particularly meaningful.

Future Directions

It is expected that research and development of tools for the combined integration of GIS and BNs will continue in both academia and commercial entities. New advancements in each of the application areas described are occurring on a regular basis and represent an active and interesting study area for many GIS analysts and users.

References

Ames DP, Neilson BT, Stevens DK, Lall U (2005) Using Bayesian networks to model watershed management decisions: an East Canyon Creek case study. J Hydroinform 7:267–282. IWA Publishing
Borsuk ME, Reckhow KH (2000) Summary description of the Neuse estuary Bayesian ecological response network (Neu-BERN). http://www2.ncsu.edu/ncsu/CIL/WRRI/neuseltm.html. 26 Dec 2001
Haas TC (1998) Modeling waterbody eutrophication with a Bayesian belief network. Working paper, School of Business Administration, University of Wisconsin, Milwaukee
Heckerman D (1997) Bayesian networks for data mining. Data Mining Knowl Discov 1:79–119
Kuikka S, Varis O (1997) Uncertainties of climate change impacts in Finnish watersheds: a Bayesian network analysis of expert knowledge. Boreal Environ Res 2:109–128
Lee DC, Bradshaw GA (1998) Making monitoring work for managers: thoughts on a conceptual framework for improved monitoring within broad-scale ecosystem management. http://icebmp.gov/spatial/lee_monitor/preface.html (26 Dec 2001)
MapWindow Open Source Team (2007) MapWindow GIS 4.3 Open Source Software. Accessed 06 Feb 2007 at the MapWindow Website: http://www.mapwindow.org/
Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Francisco
Shachter R, Peot M (1992) Decision making using probabilistic inference methods. In: Proceedings of the eighth conference on uncertainty in artificial intelligence, Stanford, pp 275–283
Stassopoulou A, Petrou M, Kittler J (1998) Application of a Bayesian network in a GIS based decision making system. Int J Geograph Inf Sci 12(1):23–45
Taylor KJ (2003) Bayesian belief networks: a conceptual approach to assessing risk to habitat. Utah State University, Logan
Varis O, Kuikka S (1996) An influence diagram approach to Baltic salmon management. In: Proceedings of the conference on decision analysis for public policy in Europe, INFORMS decision analysis society, Atlanta
Walker A, Pham B, Maeder A (2004) A Bayesian framework for automated dataset retrieval in geographic information systems. In: 10th International Multimedia Modelling Conference (MMM), Brisbane, p 138
Walker A, Pham B, Moody M (2005) Spatial Bayesian learning algorithms for geographic information retrieval. In: Proceedings 13th annual ACM international workshop on geographic information systems, Bremen, pp 105–114

Recommended Reading

Ames DP (2002) Bayesian decision networks for watershed management. Utah State University, Logan
Norsys Software Corp (2006) Netica Bayesian belief network software. Acquired from http://www.norsys.com/
Stassopoulou A, Caelli T (2000) Building detection using Bayesian networks. Int J Pattern Recognit Artif Intell 14(6):715–733


Bayesian Spatial Regression

▶ Bayesian Spatial Regression for Multisource Predictive Mapping

Bayesian Spatial Regression for Multisource Predictive Mapping

Andrew O. Finley¹ and Sudipto Banerjee²
¹ Department of Forestry and Department of Geography, Michigan State University, East Lansing, MI, USA
² Biostatistics, School of Public Health, The University of Minnesota, A460 Mayo Bldg. MMC303, Minneapolis, MN, USA

Synonyms

Bayesian spatial regression; Pixel-based prediction; Spatial regression

Definition

Georeferenced ground measurements for attributes of interest and a host of remotely sensed variables are coupled within a Bayesian spatial regression model to provide predictions across the domain of interest. As the name suggests, multisource refers to multiple sources of data which share a common coordinate system and can be linked to form sets of regressands or response variables, y(s), and regressors or covariates, x(s), where the s denotes a known location in R² (e.g., easting-northing or latitude-longitude). Interest here is in producing spatially explicit predictions of the response variables using the set of covariates. Typically, the covariates can be measured at any location across the domain of interest and help explain the variation in the set of response variables. Within a multisource setting, covariates commonly include multitemporal spectral components from remotely sensed images, topographic variables (e.g., elevation, slope, aspect) from a digital elevation model (DEM), and variables derived from vector or raster maps (e.g., current or historic land use, distance to stream or road, soil type, etc.). Numerous methods have been used to map the set of response variables. The focus here is linking the y(s) and x(s) through Bayesian spatial regression models. These models provide unmatched flexibility for partitioning sources of variability (e.g., spatial, temporal, random), simultaneously predicting multiple response variables (i.e., multivariate or vector spatial regression), and providing access to the full posterior predictive distribution of any base map unit (e.g., pixel, multipixel, or polygon). This entry offers a brief overview of remotely sensed data which is followed by a more in-depth presentation of Bayesian spatial modeling for multisource predictive mapping. Multisource forest inventory data is used to illustrate aspects of the modeling process.

Historical Background

In 1970, the National Academy of Sciences recognized remote sensing as “the joint effects of employing modern sensors, data-processing equipment, information theory and processing methodology, communications theory and devices, space and airborne vehicles, and large-systems theory and practices for the purpose of carrying out aerial or space surveys of the earth’s surface” (National Academy of Sciences 1970, p. 1). In the nearly four decades since this definition was offered, every topic noted has enjoyed productive research and development. As a result, a diverse set of disciplines routinely use remotely sensed data including natural resource management, hazard assessment, environmental assessment, precision farming and agricultural yield assessment, coastal and oceanic monitoring, freshwater quality assessment, and public health. Several key publications document these advancements in remote sensing research and application including Remote Sensing of Environment, Photogrammetric Engineering and Remote Sensing, International Journal of Remote Sensing, and IEEE Transactions on Geoscience and Remote Sensing.

With the emergence of highly efficient geographical information system (GIS) databases and associated software, the modeling and analysis of spatially referenced data sets have also received much attention over the last decade. In parallel with the use of remotely sensed data, spatially referenced data sets and their analysis using GIS are often an integral part of scientific and engineering investigations; see, for example, texts in geological and environmental sciences (Webster and Oliver 2001), ecological systems (Scheiner and Gurevitch 2001), digital terrain cartography (Jones 1997), computer experiments (Santner et al. 2003), and public health (Cromley and McLafferty 2002). The last decade has also seen significant development in statistical modeling of complex spatial data; see, for example, the texts by Chilès and Delfiner (1999), Cressie (1993), Möller (2003), Schabenberger and Gotway (2004), and Wackernagel (2006) for a variety of methods and applications.

A new approach that has recently garnered popularity in spatial modeling follows the Bayesian inferential paradigm. Here, one constructs hierarchical (or multilevel) schemes by assigning probability distributions to parameters a priori, and inference is based upon the

distribution of the parameters conditional upon the data a posteriori. By modeling both the observed data and any unknown regressor or covariate effects as random variables, the hierarchical Bayesian approach to statistical analysis provides a cohesive framework for combining complex data models and external knowledge or expert opinion. A theoretical foundation for contemporary Bayesian modeling can be found in several key texts, including Banerjee et al. (2004), Carlin and Louis (2000), Gelman et al. (2004), and Robert (2001).

Scientific Fundamentals

This entry focuses on predictive models that use covariates derived from digital imagery captured by sensors mounted on orbiting satellites. These modern spaceborne sensors are categorized as either passive or active. Passive sensors detect the reflected or emitted electromagnetic radiation from natural sources (typically solar energy), while active sensors, such as radar or light detection and ranging (LIDAR), emit energy that travels to the surface feature and is reflected back toward the sensor. The discussion and illustration covered here focus on data from passive sensors, but can be extended to imagery obtained from active sensors.

Resolution and scale are additional sensor characteristics. There are three components to resolution: (1) spatial resolution refers to the size of the image pixel, with high spatial resolution corresponding to small pixel size; (2) radiometric resolution is the sensor’s ability to resolve levels of brightness, and a sensor with high radiometric resolution can distinguish between many levels of brightness; and (3) spectral resolution describes the sensor’s ability to define wavelength intervals, and a sensor with high spectral resolution can record many narrow wavelength intervals. These three components are related. Specifically, higher spatial resolution (i.e., smaller pixel size) results in lower radiometric and/or spectral resolution. In general terms, if pixel size is large, the sensor receives a more robust signal and can then distinguish smaller degrees of radiometric and spectral change. As the pixel size decreases, the signal is reduced and so too is the sensor’s ability to detect changes in brightness. Scale refers to the geographic extent of the image or scene recorded by the sensor. Scale and spatial resolution hold an inverse relationship; that is, the greater the spatial resolution, the smaller the extent of the image.

In addition to the academic publications noted above, numerous texts (see, e.g., Campbell 2006; Mather 2004; Richards and Xiuping 2005) provide detail on acquiring and processing remotely sensed imagery for use in prediction models. The modeling illustrations offered in this entry use imagery acquired from the Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) sensors mounted on the Landsat 5 and Landsat 7 satellites, respectively (see, e.g., http://landsat.gsfc.nasa.gov for more details). These are considered mid-resolution sensors because the imagery has moderate spatial, radiometric, and spectral resolution. Specifically, the sensors record reflected or emitted radiation in blue-green (band 1), green (band 2), red (band 3), near-infrared (band 4), mid-infrared (bands 5 and 7), and far-infrared (band 6) portions of the electromagnetic spectrum. Their radiometric resolution within the bands records brightness at 256 levels (i.e., 8 bits) with a spatial resolution of 30 × 30 m pixels (with the exception of band 6, which is 120 × 120 m). The scale of these images is typically 185 km wide by 170 km long, which is ideal for large-area moderate-resolution mapping.

In addition to the remotely sensed covariates, predictive models require georeferenced measurements of the response variables of interest. Two base units of measure and mapping are commonly encountered: locations that are areas or regions with well-defined neighbors (such as pixels in a lattice, counties in a map, etc.), whence they are called areally referenced data, or locations that are points with coordinates (latitude-longitude, easting-northing, etc.), in which case they are called point referenced or geostatistical. Statistical theory and methods play a crucial role in the modeling and analysis of such data by developing spatial process models, also known

as stochastic process or random function models, that help in predicting and estimating physical phenomena. This entry deals with the latter: modeling of point-referenced data.

The methods and accompanying illustration presented here provide pixel-level prediction at the lowest spatial resolution offered in the set of remotely sensed covariates. In the simplest setting, it is assumed that the remotely sensed covariates cover the entire area of interest, referred to as the domain, $D$. Further, all covariates share a common spatial resolution (not necessarily common radiometric or spectral resolution). Finally, each point-referenced location $s$ in the set $S = \{s_1, \ldots, s_n\}$ where a response variable is measured must coincide with a covariate pixel. In this way, the elements in the $n \times 1$ response vector, $y = [y(s_i)]_{i=1}^{n}$, are uniquely associated with the rows of the $n \times p$ covariate matrix, $X = [x^T(s_i)]_{i=1}^{n}$. This setup implies that, of the $N$ pixels which define $D$, $n$ are associated with a known response value and $n^* = N - n$ require prediction. This is the typical setup for model-based predictive mapping.

The univariate spatial regression model for point-referenced data is written as

$$y(s) = x^T(s)\beta + w(s) + \epsilon(s), \qquad (1)$$

where $\{w(s) : s \in D\}$ is a spatial random field, with $D$ an open subset of $\mathbb{R}^d$ of dimension $d$; in most practical settings, $d = 2$ or $d = 3$. A random field is said to be a valid spatial process if for any finite collection of sites $S$ of arbitrary size, the vector $w = [w(s_i)]_{i=1}^{n}$ follows a well-defined joint probability distribution. Also, $\epsilon(s) \overset{iid}{\sim} N(0, \tau^2)$ is a white-noise process, often called the nugget effect, modeling measurement error or microscale variation (see, e.g., Chilès and Delfiner 1999).

A popular modeling choice for a spatial random field is the Gaussian process, $w(s) \sim GP(0, K(\cdot, \cdot))$, specified by a valid covariance function $K(s, s'; \theta) = \mathrm{Cov}(w(s), w(s'))$ that models the covariance corresponding to a pair of sites $s$ and $s'$. This specifies the joint distribution for $w$ as $MVN(0, \Sigma_w)$, where $\Sigma_w = [K(s_i, s_j; \theta)]_{i,j=1}^{n}$ is the $n \times n$ covariance matrix with $(i, j)$-th element given by $K(s_i, s_j; \theta)$. Clearly $K(s, s'; \theta)$ cannot be just any function; it must ensure that the resulting $\Sigma_w$ matrix is symmetric and positive definite. Such functions are known as positive definite functions and are characterized as the characteristic function of a symmetric random variable (a famous theorem due to Bochner). Further technical details about positive definite functions can be found in Banerjee et al. (2004), Chilès and Delfiner (1999), and Cressie (1993).

For valid inference on model parameters and subsequent prediction, model (1) requires that the underlying spatial random field be stationary and isotropic. Stationarity, in spatial modeling contexts, refers to the setting when $K(s, s'; \theta) = K(s - s'; \theta)$; that is, the covariance function depends upon the separation of the sites. Isotropy goes further and specifies $K(s, s'; \theta) = K(\|s - s'\|; \theta)$, where $\|s - s'\|$ is the distance between the sites. Usually one further specifies $K(s, s'; \theta) = \sigma^2 \rho(s, s'; \theta)$, where $\rho(\cdot; \theta)$ is a correlation function and $\theta$ includes parameters quantifying rate of correlation decay and smoothness of the surface $w(s)$. Then $\mathrm{Var}(w(s)) = \sigma^2$ represents a spatial variance component in the model in (1). A very versatile class of correlation functions is the Matérn correlation function given by

$$\rho(\|s - s'\|; \theta) = \frac{1}{2^{\nu - 1}\Gamma(\nu)} \left(\|s - s'\|\phi\right)^{\nu} K_{\nu}\left(\|s - s'\|\phi\right); \quad \phi > 0, \ \nu > 0, \qquad (2)$$

where $\theta = (\phi, \nu)$ with $\phi$ controlling the decay in spatial correlation and $\nu$ yielding smoother process realizations for higher values. Also, $\Gamma$ is the usual gamma function, while $K_{\nu}$ is a modified Bessel function of the third kind with order $\nu$, and $\|s - s'\|$ is the Euclidean distance between the sites $s$ and $s'$.

With observations $y$ from $n$ locations, the data likelihood is written in the marginalized form $y \sim MVN(X\beta, \Sigma_y)$, with $\Sigma_y = \sigma^2 R(\theta) + \tau^2 I_n$ and $R(\theta) = [\rho(s_i, s_j; \theta)]_{i,j=1}^{n}$ that is the

spatial correlation matrix corresponding to $w(s)$. For hierarchical models, one assigns prior (hyperprior) distributions to the model parameters (hyperparameters), and inference proceeds by sampling from the posterior distribution of the parameters (see, e.g., Banerjee et al. 2004). Generically denoting by $\Omega = (\beta, \sigma^2, \theta, \tau^2)$ the set of parameters that are to be updated in the marginalized model, one samples from the posterior distribution

$$p(\Omega \mid y) \propto p(\beta)\, p(\sigma^2)\, p(\theta)\, p(\tau^2)\, p(y \mid \beta, \sigma^2, \theta, \tau^2). \qquad (3)$$

An efficient Markov Chain Monte Carlo (MCMC) algorithm is obtained by updating $\beta$ from its full conditional $MVN(\mu_{\beta|\cdot}, \Sigma_{\beta|\cdot})$, where

$$\Sigma_{\beta|\cdot} = \left[\Sigma_{\beta}^{-1} + X^T \Sigma_y^{-1} X\right]^{-1} \quad \text{and} \quad \mu_{\beta|\cdot} = \Sigma_{\beta|\cdot} X^T \Sigma_y^{-1} y. \qquad (4)$$

All the remaining parameters must be updated using Metropolis steps. Depending upon the application, this may be implemented using block updates. On convergence, the MCMC output generates $L$ samples, say $\{\Omega^{(l)}\}_{l=1}^{L}$, from the posterior distribution in (3).

In updating $\Omega$ as outlined above, the spatial coefficients $w$ are not sampled directly. This shrinks the parameter space, resulting in a more efficient MCMC algorithm. A primary advantage of first-stage Gaussian models (as in (1)) is that the posterior distribution of $w$ can be recovered in a posterior predictive fashion by sampling from

$$p(w \mid y) \propto \int p(w \mid \Omega, y)\, p(\Omega \mid y)\, d\Omega. \qquad (5)$$

Once the posterior samples from $p(\Omega \mid y)$, $\{\Omega^{(l)}\}_{l=1}^{L}$, have been obtained, posterior samples from $p(w \mid y)$ are drawn by sampling $w^{(l)}$ from $p(w \mid \Omega^{(l)}, y)$, one for each $\Omega^{(l)}$. This composition sampling is routine because $p(w \mid \Omega, y)$ in (5) is Gaussian, and the posterior estimates can subsequently be mapped with contours to produce image and contour plots of the spatial processes.

For predictions, if $\{s_{0i}\}_{i=1}^{n^*}$ is a collection of $n^*$ locations, one can compute the posterior predictive distribution $p(w^* \mid y)$, where $w^* = [w(s_{0k})]_{k=1}^{n^*}$. Note that

$$p(w^* \mid y) \propto \int p(w^* \mid w, \Omega, y)\, p(w \mid \Omega, y)\, p(\Omega \mid y)\, d\Omega\, dw.$$

This can be computed by composition sampling by first obtaining the posterior samples $\{\Omega^{(l)}\}_{l=1}^{L} \sim p(\Omega \mid y)$, then drawing $w^{(l)} \sim p(w \mid \Omega^{(l)}, y)$ for each $l$ as described in (5), and finally drawing $w^{*(l)} \sim p(w^* \mid w^{(l)}, \Omega^{(l)}, y)$. This last distribution is derived as a conditional distribution from a multivariate normal distribution.

As a more specific instance, consider prediction at a single arbitrary site $s_0$, which evaluates $p(y(s_0) \mid y)$. This latter distribution is sampled in a posterior predictive manner by drawing $y^{(l)}(s_0) \sim p(y(s_0) \mid \Omega^{(l)}, y)$ for each $\Omega^{(l)}$, $l = 1, \ldots, L$, where the $\Omega^{(l)}$ are the posterior samples. This is especially convenient for Gaussian likelihoods (such as (1)) since $p(y(s_0) \mid \Omega^{(l)}, y)$ is itself Gaussian with

$$E[y(s_0) \mid \Omega, y] = x^T(s_0)\beta + \gamma^T(s_0)\left[R(\theta) + (\tau^2/\sigma^2) I_n\right]^{-1}(y - X\beta) \qquad (6)$$

and

$$\mathrm{Var}[y(s_0) \mid \Omega, y] = \sigma^2\left[1 - \gamma^T(s_0)\left[R(\theta) + (\tau^2/\sigma^2) I_n\right]^{-1}\gamma(s_0)\right] + \tau^2, \qquad (7)$$

where $\gamma(s_0) = [\rho(s_0, s_i; \theta)]_{i=1}^{n}$ when $s_0 \neq s_i$, $i = 1, \ldots, n$, while the nugget effect $\tau^2$ is added when $s_0 = s_i$ for some $i$. This approach is called “Bayesian kriging.”

These concepts are illustrated with data from permanent georeferenced forest inventory plots on the USDA Forest Service Bartlett Experimental Forest (BEF) in Bartlett, New Hampshire. The 1,053 hectare BEF covers a large elevation gradient from the village of Bartlett in the Saco River Valley at 207 m to about 914 m above

sea level. For this illustration, the focus is on predicting the spatial distribution of total tree biomass per hectare across the BEF. Tree biomass is measured as the weight of all aboveground portions of the tree, expressed here as metric tons per hectare. Within the data set, biomass per hectare is recorded at 437 forest inventory plots across the BEF (Fig. 2). Satellite imagery and other remotely sensed variables have proved useful regressors for predicting forest biomass. Spring, summer, and fall 2002 dates of 30 × 30 m Landsat 7 ETM+ satellite imagery were acquired for the BEF. Following Huang et al. (2002), the imagery was transformed to tasseled cap components of brightness (1), greenness (2), and wetness (3) using data reduction techniques. Three of the nine resulting spectral variables, labeled TC1, TC2, and TC3, are depicted in Fig. 1b–d.

[Figure 1: four map panels: (a) Elevation, (b) Tasseled cap brightness, (c) Tasseled cap greenness, (d) Tasseled cap wetness; axes: longitude (meters) and latitude (meters).]

Bayesian Spatial Regression for Multisource Predictive Mapping, Fig. 1 Remotely sensed variables georectified to a common coordinate system (North American Datum 1983) and projection (Albers Conical Equal Area) and resampled to a common pixel resolution and alignment. The images cover the US Forest Service Bartlett Experimental Forest near Bartlett, New Hampshire, USA. (a) is the elevation measured in meters above sea level derived from the 1 arc-second (approximately 30 × 30 m) US Geological Survey national elevation dataset DEM data (Gesch et al. 2002). (b)–(d) are the tasseled cap components of brightness, greenness, and wetness derived from bands 1 to 5 and 7 of a spring 2002 date of Landsat 7 ETM+ sensor imagery (Huang et al. 2002). This Landsat imagery was acquired from the National Land Cover Database for the USA (Homer et al. 2004)
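
The Bayesian kriging predictor in (6) and (7) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used to produce the maps in this entry. The function names (`matern_corr`, `solve`, `krige`) and the toy data are hypothetical. The Matérn correlation (2) is coded only for ν = 0.5 and ν = 1.5, where it reduces to closed forms that need no Bessel function, and the new site s0 is assumed not to coincide with an observed site. The parameter values passed to `krige` stand for a single posterior draw of Ω.

```python
import math


def matern_corr(d, phi, nu=0.5):
    """Matern correlation (2) at distance d, for nu = 0.5 or nu = 1.5 only."""
    x = phi * d
    if nu == 0.5:
        return math.exp(-x)  # exponential correlation
    if nu == 1.5:
        return (1.0 + x) * math.exp(-x)
    raise ValueError("this sketch only implements nu = 0.5 and nu = 1.5")


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def krige(sites, y, X, s0, x0, beta, sigma2, tau2, phi, nu=0.5):
    """Return (mean, variance) of y(s0) for one parameter draw, as in (6)-(7)."""
    n = len(sites)
    ratio = tau2 / sigma2
    # A = R(theta) + (tau^2 / sigma^2) I_n
    A = [[matern_corr(math.dist(sites[i], sites[j]), phi, nu)
          + (ratio if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # gamma(s0): correlations between the new site and the observed sites
    gamma = [matern_corr(math.dist(s0, s), phi, nu) for s in sites]
    # residuals y - X beta
    resid = [y[i] - sum(xk * bk for xk, bk in zip(X[i], beta)) for i in range(n)]
    trend = sum(xk * bk for xk, bk in zip(x0, beta))
    mean = trend + sum(g * v for g, v in zip(gamma, solve(A, resid)))
    var = sigma2 * (1.0 - sum(g * v for g, v in zip(gamma, solve(A, gamma)))) + tau2
    return mean, var


# Toy example (made-up numbers): two observed sites, intercept-only design.
sites = [(0.0, 0.0), (1.0, 0.0)]
mean, var = krige(sites, y=[1.0, 3.0], X=[[1.0], [1.0]], s0=(0.5, 0.0), x0=[1.0],
                  beta=[2.0], sigma2=2.0, tau2=0.5, phi=1.0)
```

Calling `krige` once per posterior draw and sampling from the resulting Gaussian would give draws from the posterior predictive distribution at s0. For more than a handful of sites, a dense linear algebra library would normally replace the small `solve` helper.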

In addition to these nine spectral variables, digital elevation model data was used to produce a 30 × 30 m elevation layer for the BEF (Fig. 1a). The centroids of the 437 georeferenced inventory plots were intersected with the elevation and spectral variables to form the 437 × 11 covariate matrix (i.e., an intercept, elevation, and nine spectral components).

[Figure 2: two map panels: (a) Inventory plots, (b) OLS residuals; axes: longitude (meters) and latitude (meters).]

Bayesian Spatial Regression for Multisource Predictive Mapping, Fig. 2 The circle symbols in (a) represent georeferenced forest inventory plots on the US Forest Service Bartlett Experimental Forest near Bartlett, New Hampshire, USA. (b) is an interpolated surface of residual values from an ordinary least squares regression of total tree biomass per hectare measured at forest inventory plots depicted in (a) and remotely sensed regressors, some of which are depicted in Fig. 1. Note that the spatial trends in (b) suggest that observations of total tree biomass per hectare are not conditionally independent (i.e., conditional on the regressors)

[Figure 3: empirical semivariogram; x-axis: Distance (0 to 3000); y-axis: Semivariance (2500 to 5500).]

Bayesian Spatial Regression for Multisource Predictive Mapping, Fig. 3 Empirical semivariograms and exponential function REML parameter estimates of residual values from an ordinary least squares regression of total tree biomass per hectare measured at forest inventory plots depicted in Fig. 2 and remotely sensed regressors (Fig. 1). The estimates of the nugget (bottom horizontal line), sill (upper horizontal line), and range (vertical line) are approximately 3250, 5200, and 520, respectively

Choice of priors is often difficult, and therefore in practice it is helpful to initially explore the data with empirical semivariograms and density plots. Examination of the semivariogram in Fig. 3 suggests an appropriate prior would center the spatial variance parameter $\sigma^2$ at 2,000 (i.e., the distance between the lower and upper horizontal lines), the nugget or measurement error parameter $\tau^2$ at 3,000, and a support of 0–1,500 for the spatial range $\phi$. There are several popular choices of priors for the variance parameters including inverse gamma, half-Cauchy, and half-normal (see, e.g., Gelman 2006). Commonly, a uniform prior is placed on the spatial range parameter. Once priors are chosen, the Bayesian specification is complete and the model can be fit and predictions made as described above. Predictive maps are depicted in Fig. 4. Here, the mean of the posterior predictive distribution of the random spatial effects, $E[w^* \mid y]$, is given in (a). The mean predicted main effect over the

domain, $Xb$, where $b = E[\beta \mid y]$, is depicted in (b), and (c) gives the mean predictive surface, $E[y^* \mid y]$, where $y^* = Xb + w^*$. The uncertainty in surface (c) is summarized in (d) as the range between the upper and lower bounds of the 95% credible interval for each pixel’s predictive distribution. As noted, the trend in pixel-level uncertainty in (d) is consistent with the inventory plot locations in Fig. 2; specifically, uncertainty in prediction increases with increasing distance from observed sites.

[Figure 4: four map panels: (a) Predicted random spatial effects, w*; (b) Predicted main effects, X(s)* b; (c) Predicted response, y*; (d) Uncertainty in y*.]

Bayesian Spatial Regression for Multisource Predictive Mapping, Fig. 4 Results of pixel-level prediction from a spatial regression model of the response variable total tree biomass per hectare, depicted in Fig. 2, and remotely sensed regressors, some of which are depicted in Fig. 1

In practical data analysis, it is more common to explore numerous alternative models representing different hypotheses and perhaps incorporating varying degrees of spatial richness. This brings up the issue of comparing these models and perhaps ranking them in terms of better performance. Better performance is usually judged employing posterior predictive model checks with predictive samples and perhaps by computing a global measure of fit. One popular approach proceeds by computing the posterior predictive distribution of a replicated data set, $p(y_{rep} \mid y) = \int p(y_{rep} \mid \Omega)\, p(\Omega \mid y)\, d\Omega$, where $p(y_{rep} \mid \Omega)$ has the same distribution as the data

likelihood. Replicated data sets from the above distribution are easily obtained by drawing, for each posterior realization $\Omega^{(l)}$, a replicated data set $y_{rep}^{(l)}$ from $p(y_{rep} \mid \Omega^{(l)})$. Preferred models will perform well under a decision-theoretic balanced loss function that penalizes both departure from the corresponding observed value (lack of fit) and departure from what the replicate is expected to be (variation in replicates). Motivated by a squared error loss function, the measures for these two criteria are evaluated as $G = (y - \mu_{rep})^T (y - \mu_{rep})$ and $P = \mathrm{tr}(\mathrm{Var}(y_{rep} \mid y))$, where $\mu_{rep} = E[y_{rep} \mid y]$ is the posterior predictive mean for the replicated data points and $P$ is the trace of the posterior predictive dispersion matrix for the replicated data; both are easily computed from the samples $y_{rep}^{(l)}$. Gelfand and Ghosh (1998) suggest using the score $D = G + P$ as a model selection criterion, with lower values of $D$ indicating better models.

Another measure of model choice that has gained much popularity in recent times, especially due to computational convenience, is the deviance information criterion (DIC) (Spiegelhalter et al. 2002). This criterion is the sum of the Bayesian deviance (a measure of model fit) and the (effective) number of parameters (a penalty for model complexity). The deviance, up to an additive quantity not depending upon $\Omega$, is $D(\Omega) = -2 \log L(y \mid \Omega)$, where $L(y \mid \Omega)$ is the first-stage Gaussian likelihood as in (1). The Bayesian deviance is the posterior mean, $\overline{D(\Omega)} = E_{\Omega \mid y}[D(\Omega)]$, while the effective number of parameters is given by $p_D = \overline{D(\Omega)} - D(\bar{\Omega})$, where $\bar{\Omega}$ is the posterior mean of $\Omega$. The DIC is then given by $\overline{D(\Omega)} + p_D$ and is easily computed from the posterior samples. It rewards better fitting models through the first term and penalizes more complex models through the second term, with lower values indicating favorable models for the data.

Key Applications

Risk Assessment

Spatial and/or temporal risk mapping and automatic zonation of geohazards have been modeled using traditional geostatistical techniques that incorporate both raster and vector data. These investigations attempt to predict the spatial and temporal distribution of risk to humans or components of an ecosystem. For example, Thayer et al. (2003) explore the utility of geostatistics for human risk assessments of hazardous waste sites. Another example is from Kooistra et al. (2005), who investigate the uncertainty of ecological risk estimates concerning important wildlife species. As noted, the majority of the multisource risk prediction literature is based on non-Bayesian kriging models; however, as investigators begin to recognize the need to estimate the uncertainty of prediction, they will likely embrace the basic Bayesian methods reviewed here and extend them to fit their specific domain. For example, Kneib and Fahrmeir (2007) have proposed one such Bayesian extension to spatially explicit hazard regression.

Agricultural and Ecological Assessment

Spatial processes, such as agricultural crop yield and environmental conditions (e.g., deforestation, soil or water pollution, or forest species change in response to changing climates), are often modeled using multisource spatial regression (see, e.g., Atkinson et al. 1994; Berterretche et al. 2005; Bhatti et al. 1991). Only recently have Bayesian models been used for predicting agricultural and forest variables of interest within a multisource setting. For example, in an effort to quantify forest carbon reserves, Banerjee and Finley (2007) used single and multiple resolution Bayesian spatial regression to predict the distribution of forest biomass. An application of such models to capture spatial variation in growth patterns of weeds is discussed in Banerjee and Johnson (2006).

Atmospheric and Weather Modeling

Arrays of weather monitoring stations provide a rich source of spatial and temporal data on atmospheric conditions and precipitation. These data are often coupled with a host of topographic and satellite derived variables through a spatial regression model to predict short- and long-term weather conditions. Recently, several investigators used these data to illustrate the virtues of a Bayesian approach to spatial prediction (see

Diggle and Ribeiro Jr 2002; Paciorek and Schervish 2006; Riccio et al. 2006).

Future Directions

Over the last decade, hierarchical models implemented through MCMC methods have become especially popular for spatial modeling, given their flexibility to estimate models that would be infeasible otherwise. However, fitting hierarchical spatial models often involves expensive matrix decompositions whose complexity increases in cubic order with the number of spatial locations. In a fully Bayesian paradigm, where one seeks formal inference on the correlation parameters, these matrix computations occur in each iteration of the MCMC, rendering them completely infeasible for large spatial data sets. This situation is further exacerbated in multivariate settings with several spatially dependent response variables, where the matrix dimensions increase by a factor of the number of spatially dependent variables being modeled. This is often referred to as the “big N problem” in spatial statistics and has been receiving considerable recent attention. While attempting to reduce the dimension of the problem, existing methods incur certain drawbacks such as requiring the sites to lie on a regular grid, which entails realigning irregular spatial data using an algorithmic approach that might lead to unquantifiable errors in precision.

Another area receiving much attention in recent times is that of multivariate spatial models with multiple response variables as well as dynamic spatiotemporal models. With several spatially dependent variables, the association between variables accrues further computational costs for hierarchical models. More details about multivariate spatial modeling can be found in Banerjee et al. (2004), Chilès and Delfiner (1999), Cressie (1993), and Wackernagel (2006). A special class of multivariate models known as coregionalization models is discussed and implemented in Finley et al. (2007). Unless simplifications such as separable or intrinsic correlation structures (see Wackernagel 2006) are made, multivariate process modeling proves too expensive for reasonably large spatial data sets. This is another area of active investigation.

Cross-References

Data Analysis, Spatial
Hierarchical Spatial Models
Hurricane Wind Fields, Multivariate Modeling
Kriging
Public Health and Spatial Modeling
Spatial Regression Models
Spatial Uncertainty in Medical Geography: A Geostatistical Perspective
Statistical Descriptions of Spatial Patterns
Uncertainty, Modeling with Spatial and Temporal

References

Atkinson PM, Webster R, Curran PJ (1994) Cokriging with airborne MSS imagery. Remote Sens Environ 50:335–345
Banerjee S, Carlin BP, Gelfand AE (2004) Hierarchical modeling and analysis for spatial data. Chapman and Hall/CRC Press, Boca Raton
Banerjee S, Finley AO (2007) Bayesian multi-resolution modelling for spatially replicated datasets with application to forest biomass data. J Stat Plann Inference. doi:10.1016/j.jspi.2006.05.024
Banerjee S, Johnson GA (2006) Coregionalized single and multi-resolution spatially varying growth-curve modelling. Biometrics 62:864–876
Berterretche M, Hudak AT, Cohen WB, Maiersperger TK, Gower ST, Dungan J (2005) Comparison of regression and geostatistical methods for mapping Leaf Area Index (LAI) with Landsat ETM+ data over a boreal forest. Remote Sens Environ 96:49–61
Bhatti AU, Mulla DJ, Frazier BE (1991) Estimation of soil properties and wheat yields on complex eroded hills using geostatistics and thematic mapper images. Remote Sens Environ 37(3):181–191
Campbell JB (2006) Introduction to remote sensing, 4th edn. The Guilford Press, New York, p 626
Carlin BP, Louis TA (2000) Bayes and empirical Bayes methods for data analysis, 2nd edn. Chapman and Hall/CRC Press, Boca Raton
Chilès JP, Delfiner P (1999) Geostatistics: modelling spatial uncertainty. Wiley, New York
Cressie NAC (1993) Statistics for spatial data, 2nd edn. Wiley, New York
Cromley EK, McLafferty SL (2002) GIS and public health. Guilford Publications Inc., New York
Diggle PJ, Ribeiro PJ Jr (2002) Bayesian inference in Gaussian model-based geostatistics. Geogr Environ Model 6:129–146
Finley AO, Banerjee S, Carlin BP (2007) spBayes: an R package for univariate and multivariate hierarchical point-referenced spatial models. J Stat Softw 19:4

Gelfand AE, Ghosh SK (1998) Model choice: a minimum posterior predictive loss approach. Biometrika 85:1–11
Gelman A (2006) Prior distributions for variance parameters in hierarchical models. Bayesian Anal 3:515–533
Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Bayesian data analysis, 2nd edn. Chapman and Hall/CRC Press, Boca Raton
Gesch D, Oimoen M, Greenlee S, Nelson C, Steuck M, Tyler D (2002) The national elevation dataset. Photogramm Eng Remote Sens 68(1):5–12
Homer C, Huang C, Yang L, Wylie B, Coan M (2004) Development of a 2001 national land-cover database for the United States. Photogramm Eng Remote Sens 70:829–840
Huang C, Wylie B, Homer C, Yang L, Zylstra G (2002) Derivation of a tasseled cap transformation based on landsat 7 at-satellite reflectance. Int J Remote Sens 8:1741–1748
Jones CB (1997) Geographical information systems and computer cartography. Addison Wesley Longman, Harlow
Kneib T, Fahrmeir L (2007) A mixed model approach for geoadditive hazard regression. Scand J Stat 34:207–228
Kooistra L, Huijbregts MAJ, Ragas AMJ, Wehrens R, Leuven RSEW (2005) Spatial variability and uncertainty in ecological risk assessment: a case study on the potential risk of cadmium for the little owl in a Dutch River Flood Plain. Environ Sci Technol 39:2177–2187
Mather PM (2004) Computer processing of remotely-sensed images, 3rd edn. Wiley, Hoboken, p 442
Møller J (2003) Spatial statistics and computational methods. Springer, New York
National Academy of Sciences (1970) Remote Sensing with Special Reference to Agriculture and Forestry. National Academy of Sciences, Washington, DC, p 424
Paciorek CJ, Schervish MJ (2006) Spatial modelling using a new class of nonstationary covariance functions. Environmetrics 17:483–506
Riccio A, Barone G, Chianese E, Giunta G (2006) A hierarchical Bayesian approach to the spatio-temporal modeling of air quality data. Atmosph Environ 40:554–566
Richards JA, Xiuping J (2005) Remote sensing digital image analysis, 4th edn. Springer, Heidelberg, p 439
Robert C (2001) The Bayesian choice, 2nd edn. Springer, New York
Santner TJ, Williams BJ, Notz WI (2003) The design and analysis of computer experiments. Springer, New York
Schabenberger O, Gotway CA (2004) Statistical methods for spatial data analysis. Texts in statistical science series. Chapman and Hall/CRC, Boca Raton
Scheiner SM, Gurevitch J (2001) Design and analysis of ecological experiments, 2nd edn. Oxford University Press, Oxford
Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A (2002) Bayesian measures of model complexity and fit (with discussion and rejoinder). J R Stat Soc Ser B 64:583–639
Thayer WC, Griffith DA, Goodrum PE, Diamond GL, Hassett JM (2003) Application of geostatistics to risk assessment. Risk Anal Int J 23(5):945–960
Wackernagel H (2006) Multivariate geostatistics: an introduction with applications, 3rd edn. Springer, New York
Webster R, Oliver MA (2001) Geostatistics for environmental scientists. Wiley, New York

Recommended Reading

Handcock MS, Stein ML (1993) A Bayesian analysis of kriging. Technometrics 35:403–410
Wang Y, Zheng T (2005) Comparison of light detection and ranging and national elevation dataset digital elevation model on floodplains of North Carolina. Natl Hazards Rev 6(1):34–40

Bead

Space-Time Prism Model

Best Linear Unbiased Prediction

Spatial Econometric Models, Prediction

Big Data

Informing Climate Adaptation with Earth System Models and Big Data

Big Data and Spatial Constraint Databases

Peter Z. Revesz
Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA

Synonyms

Spatial big data; Spatial constraint database

Definition precision. The feasibility of compressing weather


data in a constraint database was already demon-
Business, government, and scientific spatial data strated in Chapter 18 of Revesz (2010). We need
are all growing rapidly fueled by the gathering of information using mobile devices, computers, cameras, microphones, remote sensors, and other technologies. The immense wealth of new spatial data overwhelms most relational database management systems.

The collected spatial data is complex and messy and needs to be simplified for database representation. Constraint databases were designed to represent mathematical equations and inequalities in a way that extends relational databases while preserving the simplicity of querying of relational databases (Kanellakis et al. 1995; Revesz 2010). Constraint databases are particularly useful in representing spatial and spatiotemporal relationships. Data regarding spatial objects and their movements can often be greatly compressed by the use of scientific laws. For example, in the seventeenth century, Kepler’s laws of planetary motion summarized over twenty years of experimental data collected by Tycho Brahe, the Danish astronomer. Kepler’s laws and many other spatial and spatiotemporal relationships in real data can be expressed by constraint relations.

Historical Background

For example, real estate data can be interpolated and the interpolation stored compactly in a constraint database (Li and Revesz 2004). As another example, in one of our recent data mining studies of citation data, we simplified very complex citation curves by numerical approximations, which enabled us to discover citation patterns that were not visible before (Revesz 2014).

Scientific Fundamentals

Much of the collected big data is redundant and needs to be compressed and represented in a compact form even if it comes with some loss of [...] to store 3653 tuples in a relational database to record the daily high temperatures from a single weather station. That looks reasonable until we consider a simple query such as “Find all pairs of days such that for each pair the high temperature in the first day is greater than the high temperature in the second day.” That would result in an output relation with 6,645,646 tuples. The well-recognized problem is that SQL queries with $k$ joins could result in output relations of size $O(n^k)$. That motivates every current proposal to deal with big data to drop SQL as a query language. Instead of dropping SQL queries, we propose another idea. We note that the temperature data can be represented as a piecewise linear function. We developed an algorithm that, given an error tolerance parameter $\varepsilon$ and the original measurements $(t_0, y_0), \ldots, (t_n, y_n)$, returns a piecewise linear function $f$ that approximates each data point such that $|f(t_i) - y_i| \le \varepsilon$ for all $t_i$. Figure 1 gives a visualization of a possible approximation around one data point $P_1$. The larger the error tolerance value, the more compression can be achieved. The piecewise linear approximation can be easily represented in a constraint database.

There is a trade-off between compression and query accuracy, which is commonly evaluated by the precision and recall values. With compressed data of about 5% of the original size, in the MLPQ constraint database system (Revesz et al. 2000), the output of the SQL query was only 51,681 tuples, which is less than 1% of its size in a relational database. However, the precision and the recall both remained above 93% as shown in the following table:

Error tolerance   Constraint approx. (tuples)   Precision   Recall
-                 6,645,646                     100.00%     100.00%
1.0               2,006,001                     99.39%      99.49%
2.0               1,208,010                     98.54%      98.67%
4.0               235,639                       96.83%      96.97%
8.0               51,681                        93.39%      93.53%
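The entry does not spell out its approximation algorithm, so the following is only a minimal sketch of one simple greedy strategy that satisfies the stated guarantee (every measurement lies within the error tolerance of the piecewise linear function). It assumes strictly increasing time stamps and at least two measurements; the names `piecewise_linear`, `_fits`, and `evaluate` are illustrative and are not part of MLPQ or any cited system.

```python
from bisect import bisect_right


def _fits(points, i, j, eps):
    """True if the line from points[i] to points[j] is within eps of every sample between them."""
    (t0, y0), (t1, y1) = points[i], points[j]
    slope = (y1 - y0) / (t1 - t0)
    return all(abs(y0 + slope * (t - t0) - y) <= eps for t, y in points[i + 1:j])


def piecewise_linear(points, eps):
    """Greedy sketch: return the breakpoints (knots) of a piecewise linear function.

    Each segment starts and ends at a measured point, and every skipped
    measurement stays within eps of the segment (assumed: increasing t values).
    """
    knots = [points[0]]
    start = 0
    while start < len(points) - 1:
        end = start + 1
        # Extend the current segment as long as all skipped samples still fit.
        while end + 1 < len(points) and _fits(points, start, end + 1, eps):
            end += 1
        knots.append(points[end])
        start = end
    return knots


def evaluate(knots, t):
    """Evaluate the piecewise linear function defined by the knots at time t."""
    times = [k[0] for k in knots]
    idx = min(max(bisect_right(times, t) - 1, 0), len(knots) - 2)
    (t0, y0), (t1, y1) = knots[idx], knots[idx + 1]
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)


# Hypothetical daily high temperatures (day index, degrees).
temps = [(0, 50.0), (1, 52.0), (2, 54.5), (3, 56.0), (4, 40.0), (5, 41.0), (6, 42.5)]
knots = piecewise_linear(temps, eps=1.0)
```

Each pair of consecutive knots corresponds to one linear piece, which is the kind of object a constraint tuple over time and temperature can describe. A larger `eps` lets segments skip more measurements, which is the compression versus accuracy trade-off discussed above.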
Big Data and Spatial Constraint Databases, Fig. 1 Piecewise linear approximation

Big Data and Spatial Constraint Databases, Fig. 2 A mountain range

Similarly, a good spatial approximation would always be within the error tolerance at any measured location. Spatial approximations also show trade-offs between the compression ratio and the precision and recall measures. For example, suppose that we need to find an approximation for the shape of the mountain range shown in Fig. 2. In that case, a triangulated irregular network (TIN) can give a rough approximation of the mountain range where each point on the surface of the mountain is within at most some meters from the surface of the triangulated irregular network. The triangulated irregular network can be recorded in a constraint database with only as many constraint tuples as there are triangular elements (Chomicki and Revesz 1999; Li and Revesz 2004).

For example, suppose that a simple triangulated irregular network represents the largest mountain as a pyramid with the corner vertices $(0, 0, 0)$, $(0, 6, 0)$, $(6, 0, 0)$, and $(6, 6, 0)$ and the apex $(3, 3, 9)$; then it can be represented as the following constraint relation.

Mountain_TIN

X   Y   Z
x   y   z   $y \ge 0,\ y \le x,\ x + y \le 6,\ 3y - z = 0$
x   y   z   $x \ge 0,\ y \ge x,\ x + y \le 6,\ 3x - z = 0$
x   y   z   $y \le 6,\ y \ge x,\ x + y \ge 6,\ 3y + z = 18$
x   y   z   $x \le 6,\ y \le x,\ x + y \ge 6,\ 3x + z = 18$

Even with good data compression techniques, big data will need to be stored in a distributed way in many data stores. We envision that the cooperation among these distributed data stores will be more sophisticated than the ones envisioned in HADOOP (White 2009). In particular, each data store will communicate with the central data store using constraint databases as their main data format. Integration of the constraint data from the various data stores needs to be accomplished as easily as the integration of aggregate data, for example, the count of customer names, is accomplished in HADOOP.

The aggregation of spatial big data when described by constraint databases can be accomplished by the appropriate type of constraint solving, which generalizes the aggregate operators on relational data. Some of the aggregation can be done by spatial indexing algorithms. Another type of aggregation calls for finding the intersection (or union) of various spatial areas. The intersection operator is commutative and associative and therefore can be computed using a tree structure where the leaves are the input constraint databases describing various areas, and each internal node is the intersection of all the leaves below it, and the root is the intersection of all the areas.

Key Applications

There are many possible applications of spatial data ranging from astronomy, to geography, and urban planning. As an example from astronomy, suppose that we have sky survey data with a distributed storage where each location records observations at midnight on different days. The location of one particular galaxy is identified in the records at each location. Then a query may be to find any star that is always near the galaxy on a specific set of days.

Another application is in integrating data that is stored at each location and classified using machine-learning techniques such as support vector machines (SVMs) or decision trees. The local classifications can be represented using constraint databases. Classification integration (Revesz and Triplet 2010, 2011) combines the classifications at each location into a single classification. The classification integration can also be performed using a tree structure similar to the intersection query described above.

Weather forecasting and climate change modeling is another application where a distributed sensor network continuously collects a vast amount of data to be mined for both local and global trends (Revesz and Woodward 2014).

Finally, the explosion of genomic data when spatial allele variations are also considered, as in the case of personalized medicine, such as predicting a person’s chances of developing specific types of cancer (Revesz and Assi 2013) or other diseases, provides another growing application area.

Future Directions

The following are some open problems or barely addressed topics in the context of big data and spatial constraint databases:

• Constraint relations for communications in a map-reduce architecture need to be facilitated by development of efficient algorithms that are locally produced at each data store triangular irregular networks, spatiotemporal functions describing the trajectories of moving objects, and other constraint relations. How much computational time and space are required by the aggregation? How can conflicts in the constraint data be handled?
• Good methods need to be developed to estimate the error in the spatial constraint database approximation. This is similar to numerical analysis methods that find an error term when estimating the integral of a function using a particular numerical integration method, for example, composite Simpson’s rule (Burden and Faires 2014). We can reduce the size of the error by reducing the error tolerance value. That is similar to numerical integration methods that can reduce the size of the error by considering smaller interval sizes.

Cross-References

MLPQ Spatial Constraint Database System
Spatial
Spatial Join with Hadoop

References

Burden RL, Faires JD (2014) Numerical analysis, 9th edn. Springer, New York
Chomicki J, Revesz PZ (1999) Constraint-based interoperability of spatiotemporal databases. Geoinformatica 3(3):211–243
Li L, Revesz PZ (2004) Interpolation methods for spatio-temporal geographic data. Comput Env Urban Syst 28(3):201–227
Kanellakis PC, Kuper GM, Revesz PZ (1995) Constraint query languages. J Comput Syst Sci 51(1):26–52
Revesz PZ (2010) Introduction to databases: from biological to spatio-temporal. Springer, New York
Revesz PZ (2014) A method for predicting the citations to the scientific publications of individual researchers. In: 18th international database engineering and applications symposium. ACM Press, pp 9–18
Revesz PZ, Assi C (2013) Data mining the functional characterizations of proteins to predict their cancer-relatedness. Int J Biol Biomed Eng 7(1):7–14
Revesz PZ, Chen R, Kanjamala P, Li Y, Liu Y, Wang Y (2000) The MLPQ/GIS constraint database system. In: ACM SIGMOD international conference on management of data. ACM Press
Revesz PZ, Triplet T (2010) Classification integration and reclassification using constraint databases. Artif Intell Med 49(2):79–91
Revesz PZ, Triplet T (2011) Temporal data classification using linear classifiers. Inf Syst 36(1):30–41
Revesz PZ, Woodward R (2014) Variable bounds analysis of a climate model using software verification techniques. In: Balicki J et al (eds) Applications of information systems in engineering and bioscience. WSEAS Press, pp 31–36
White T (2009) Hadoop: the definitive guide. O’Reilly Media

Recommended Reading

Anderson S, Revesz PZ (2009) Efficient maxCount and threshold operators of moving objects. Geoinformatica 13(4):355–396


Binary Correlation

Statistically Significant Co-location Pattern Mining


Bioinformatics, Spatial Aspects

Sudhanshu Panda
GIS/Environmental Science, Gainesville State College, Gainesville, GA, USA

Synonyms

Biological data mining; Epidemiological mapping; Genome mapping

Definition

The web definition suggests that bioinformatics is the science that deals with the collection, organization, and analysis of large amounts of biological data using advanced information technology such as networking of computers, software, and databases. This is the field of science in which biology, computer science, and information technology merge into a single discipline. Bioinformatics deals with information about human and other animal genes and related biological structures and processes. Genome structures such as DNA, RNA, proteins in cells, etc., are spatial in nature (Fig. 1a–c). They can be represented as two- or three-dimensional.

Because of the spatial nature of genome data, geographic information systems (GIS), being an information science, has a larger role to play in the area of bioinformatics. GIS and bioinformatics have much in common. Digital maps, large databases, visualization mapping, pattern recognition and analysis, etc., are common tasks in GIS and bioinformatics (Anonymous 2007). While GIS research is based on analyzing satellite or aerial photos, gene research deals with high-resolution images (Fig. 2) generated by microscopes or laboratory sensing technologies. GIS techniques and tools are used by researchers for pattern recognition, e.g., geographic distribution of cancer and other diseases in human, animal, and plant populations (Anonymous 2007; Kotch 2005).

Historical Background

According to Kotch (2005), spatial epidemiology mapping and analysis has been driven by software developments in geographic information systems (GIS) since the early 1980s. GIS mapping and analysis of spatial disease patterns and geographic variations of health risks is helping understand the spatial epidemiology since past centuries (Jacquez 2000). Researchers in bioinformatics also deal with similar pattern recognition and analysis regarding very small patterns,
Bioinformatics, Spatial Aspects, Fig. 1 Images of DNA and proteins in cell as 2-D and 3-D form

Bioinformatics, Spatial Aspects, Fig. 2 High-resolution (4000 × 4000 pixels) images of genome maps showing the spatial nature of the data

such as those in DNA structure that might predispose an organism to developing cancer (Anonymous 2007). As both bioinformatics and GIS are based on common mapping and analytical approaches, there is a good possibility of gaining an important mechanistic link between individual-level processes tracked by genomics and proteomics and population-level outcomes tracked by GIS and epidemiology (Anonymous 2007). Thus, the scope of bioinformatics in health research can be enhanced by collaborating with GIS.

Scientific Fundamentals

As discussed earlier, data in bioinformatics are of spatial nature and could be well understood if represented, analyzed, and comprehended just like other geospatial data. GIS can interactively be used in bioinformatics projects for better dynamism, versatility, and efficiency. Figure 3 shows mapping of genome data using ArcGIS software. This helps in managing the genome data interactively with the application of superior GIS functionality. Below is a description of a GIS application in bioinformatics for different aspects of management and analysis.

Use of GIS for Interactive Mapping of Genome Data
In bioinformatics application, genome browsers are developed for easy access of the data. They use only simple keyword searches and limit the display of detailed annotations to one chromosomal region of the genome at a time (Dolan et al. 2006). Spatial data browsing and management could be done with efficiency using
Bioinformatics, Spatial Aspects, Fig. 3 Display of a mouse genome in ArcGIS (Adapted from Dolan et al. 2006)

ArcGIS software (ESRI). Dolan et al. (2006) have employed concepts, methodologies, and the tools that were developed for the display of geographic data to develop a Genome Spatial Information System (GenoSIS) for spatial display of genomes (Fig. 4). The GenoSIS helps users to dynamically interact with genome annotations and related attribute data using query tools of ArcGIS, such as query by attributes, query by location, query by graphics, and developed definition queries. The project also helps in producing dynamically generated genome maps for users. Thus, the application of GIS in bioinformatics projects helps genome browsing become more versatile and dynamic.

GIS Application as a Database Tool for Bioinformatics
GIS can be applied for efficient biological database management. While developing a database for the dynamic representation of marine microbial biodiversity, the GIS option provided Pushker et al. (2005) with an interface for selecting a particular sampling location on the world map and getting all the genome sequences from that location and their details. Geodatabase management ability helped them obtain the following information: (i) taxonomy report, taxonomic details at different levels (domain, phylum, class, order, family, and genus); (ii) depth report, a plot showing the number of sequences vs. depth; (iii) biodiversity report, a list of organisms found; (iv) collection of all entries; and (v) advanced search for a selected region on the map. Using GIS tools, they retrieved sequences corresponding to a particular taxonomy, depth, or biodiversity (Pushker et al. 2005). This means that bioinformatics data with a spatial scale can be well managed by GIS database development and management.

While developing a “Global Register of Migratory Species (GROMS),” Riede (2000) developed a geodatabase of the bird features including their genomes. It was efficient in accessing the
Bioinformatics, Spatial Aspects, Fig. 4 Comparative use of GIS paradigm of the map layers to the integration and visualization of genome data (Adapted from Dolan et al. 2006). Geographic layers shown: buildings; roads, rivers; land parcels; elevation data; satellite image data; underlying geographic space. Genome layers shown: conserved regions; expression levels; genes; chromosome space. Source: http://www.gis.com/whatisgis/whyusegis.html

Bioinformatics, Spatial Aspects, Fig. 5 Image analysis of a barley leaf for cell transformation analysis (Adapted from Schweizer 2007)

information about the migratory birds including their spatial location through the geodatabase.

Genome Mapping and Pattern Recognition and Analysis
While studying the genome structure, it is essential to understand the spatial extent of its structure. As genome mapping is done through imaging, image processing tools are used to analyze the pattern. GIS technology could be used for pattern recognition and analysis. Figure 5 is an example of an image of a microscopic top view of a barley leaf with a transformed β-glucuronidase-expressing cell (blue-green). From the image analysis, it is observed that two fungal spores of Blumeria graminis f.sp. hordei (fungus, dark blue) are interacting with the transformed cell and the spore at the left-hand side successfully penetrated into the transformed cell and started to grow out on the leaf surface illustrated by the elongating secondary hyphae. This study shows the spatial aspect of bioinformatics (Schweizer 2007).

GIS Software in Bioinformatics as Spatial Analysis Tool
Software has been developed as tools of bioinformatics to analyze nucleotide or amino acid sequence data and extract biological information. “Gene prediction software (Pavy et al. 1999)” and “sequence alignment software (Anonymous: Sequence alignment software 2007)” are examples of some of the software developed for bioinformatics.

Gene prediction software is used to identify a gene within a long gene sequence. As described
by Dolan et al. (2006), if the genome database can be presented through ArcGIS, they can be visualized, analyzed, and queried better than the present available techniques. Thus, GIS function development techniques can be replicated to make the available software more efficient. Programs like GenoSIS (Dolan et al. 2006) are a step in that direction. Sequence alignment software is a compilation of bioinformatics software tools and web portals which are used in sequence alignment, multiple sequence alignment, and structural alignment (Anonymous: Sequence alignment software 2007). They are also used for database searching. Thus, GIS can be used for bringing dynamism to the database search. Matching of DNA structures could be efficiently done with the GIS application.

Molecular modeling and 3-D visualization is another aspect of genome research. To understand the function of proteins in cells, it is essential to determine a protein’s structure (BSCS 2003). The process of determining a protein’s exact structure is labor intensive and time consuming. Traditionally, X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy techniques are used to solve protein structure determination (BSCS 2003). The maps developed by these instruments are preserved in the form of a Protein Data Bank (PDB). The PDB is the first bioinformatics resource to store three-dimensional protein structures. Currently, it is possible to visualize the utility of GIS regarding molecular modeling and the 3-D visualization process. ArcScene™ is the 3-D visualization and modeling software for spatial data. It could very well be the best tool for PDB data visualization, modeling, and analysis.

Key Applications

Bioinformatics is playing important roles in many areas such as agriculture, medicine, biotechnology, environmental science, animal husbandry, etc., as a genome is not only a principal component of the human body but also of plants. GIS or geotechnology has been successfully used in these areas for a long time. Therefore, using bioinformatics in these areas could be associated with spatial technology of GIS. A study by Nielson and Panda (2006) was conducted on predictive modeling and mapping of fasting blood glucose level in Hispanics in southeastern Idaho. The levels were mapped according to the racial genes and their spatial aspect of representation. This study shows how bioinformatics can be used in several other areas including epidemiology.

Future Directions

According to Virginia Bioinformatics Institute (VBI) Director Bruno Sobral, “The notion of a map goes all the way from the level of a genome to a map of the United States,” he said, “bioinformatics has focused on modeling from the level of the molecules up to the whole organism, while GIS has created tools to model from the level of the ecosystem down.” This indicates that there is great potential for bioinformatics and geospatial technology to be combined in a mutually enhancing fashion for important applications.

Cross-References

Biomedical Data Mining, Spatial
Public Health and Spatial Modeling

References

Anonymous (2007) VT conference puts new research area on the map; GIS expert Michael Goodchild echoes its value. https://www.vbi.vt.edu/article/articleview/48/1/15/. Accessed 05 Mar 2007
Anonymous (2007) Sequence alignment software. http://en.wikipedia.org/wiki/Sequence_alignment_software. Accessed
BSCS (2003) Bioinformatics and the human genome project. A curriculum supplement for high school biology. Developed by BSCS under a contract from the Department of Energy
Dolan ME, Holden CC, Beard MK, Bult CJ (2006) Genomes as geography: using GIS technology to build interactive genome feature maps. BMC Bioinf 7:416
ESRI. ArcIMS. http://www.esri.com/software/arcgis/arcims/index.html. Accessed 05 Mar 2007
Jacquez GM (2000) Spatial analysis in epidemiology: nascent science or a failure of GIS? J Geogr Syst 2:91–97
Kotch T (2005) Cartographies of disease: maps, mapping and disease. ESRI Press, Redlands
Nielson J, Panda SS (2006) Predictive modeling and mapping of fasting blood glucose: a preliminary study of diabetes among Hispanics in SE Idaho. Presented in the intermountain GIS user’s conference, Helena, 4–8 Apr 2006
Pavy N, Rombauts S, Dehais P, Mathe C, Ramana DVV, Leroy P, Rouze P (1999) Evaluation of gene prediction software using a genomic data set: application to Arabidopsis thaliana sequence. Bioinformatics 15(11):887–899
Pushker R, D’Auria G, Alba-Casado JC, Rodríguez-Valera F (2005) Micro-Mar: a database for dynamic representation of marine microbial biodiversity. BMC Bioinf 6:222
Riede K (2000) Conservation and modern information technologies: the global register of migratory species (GROMS). J Int Wildl Law Policy 3(2):152–165
Schweizer P (2007) IPK Gatersleben. www.ipk-gatersleben.de/en/02/04/04/. Accessed 05 Mar 2007


Biological Data Mining

Bioinformatics, Spatial Aspects


Biomedical Data Mining, Spatial

Keith Marsolo1, Michael Twa2, Mark A. Bullimore2, and Srinivasan Parthasarathy1
1 Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
2 College of Optometry, The Ohio State University, Columbus, OH, USA

Synonyms

Polynomials; Polynomials, orthogonal; Wavelets; Zernike; Zernike polynomials

Definition

The use of biomedical data in object classification presents several challenges that are well-suited to knowledge discovery and spatial modeling methods. In general, this problem consists of extracting useful patterns of information from large quantities of data with attributes that often have complex interactions. Biomedical data are inherently multidimensional and therefore difficult to summarize in simple terms without losing potentially useful information. A natural conflict exists between the need to simplify the data to make it more interpretable and the associated risk of sacrificing information relevant to decision support.

Transforming spatial features in biomedical data quantifies, and thereby exposes, underlying patterns for use as attributes in data mining exercises (Teh and Chin 1988). To be useful, a data transformation must faithfully represent the original spatial features. Orthogonal polynomials such as the Zernike polynomials have been successfully used to transform very different types of spatial data (Hoekman and Varekamp 2001; Zernike 1934). The use of wavelets (Mallat 1999), coupled with space-filling curves (Platzman and Bartholdi 1989) remains another possibility of representing spatial data.

The following entry presents general findings from work in the use of spatial modeling methods to represent and classify the shape of the human cornea.

Historical Background

Zernike polynomials were originally derived to describe optical aberrations and their geometric modes have a direct relation to the optical function of the eye (Born and Wolf 1980; Zernike 1934). Several of the circular polynomial modes of Zernike polynomials, presented in Fig. 1 show strong correlation with natural anatomical features of the cornea (i.e., normal corneal asphericity and astigmatic toricity). Zernike polynomials have been used within the
Biomedical Data Mining, Spatial, Fig. 1 Graphical representation of the modes that constitute a 4th order Zernike polynomial. Each mode is labeled using the single indexing convention created by the Optical Society of America (Thibos et al. 1999)

medical community for a number of different applications, most recently for modeling the shape of the cornea (Schwiegerling et al. 1995). Studies have shown that these models can effectively characterize aberrations that may exist on the corneal surface (Iskander et al. 2001a, b).

The use of wavelets is natural in applications that require a high degree of compression without a corresponding loss of detail, or where the detection of subtle distortions and discontinuities is crucial (Mallat 1999). Wavelets have been used in a number of applications, ranging from signal processing, to image compression, to numerical analysis (Daubechies 1992). They play a large role in the processing of biomedical instrument data obtained through techniques such as ultrasound, magnetic resonance imaging (MRI) and digital mammography (Laine 2000).

Scientific Fundamentals

Zernike Polynomials
Zernike polynomials are a series of circular polynomials defined within the unit circle. They are orthogonal by the following condition:

$$\int_0^{2\pi}\!\int_0^1 Z_n^m(\rho,\theta)\, Z_{n'}^{m'}(\rho,\theta)\,\rho\, d\rho\, d\theta = \frac{\pi}{2(n+1)}\,\delta_{nn'}\,\delta_{mm'} \qquad (1)$$

where $\delta$ is the Kronecker delta. Their computation results in a series of linearly independent circular geometric modes that are orthonormal with respect to the inner product given above. Zernike polynomials are composed of three elements: a normalization coefficient, a radial polynomial component and a sinusoidal angular component (Thibos et al. 1999). The general form for Zernike polynomials is given by:

$$Z_n^m(\rho,\theta) = \begin{cases} \sqrt{2(n+1)}\;\mathit{ZR}_n^m(\rho)\cos(m\theta) & \text{for } m > 0 \\ \sqrt{2(n+1)}\;\mathit{ZR}_n^m(\rho)\sin(|m|\theta) & \text{for } m < 0 \\ \sqrt{n+1}\;\mathit{ZR}_n^m(\rho) & \text{for } m = 0 \end{cases} \qquad (2)$$

where $n$ is the radial polynomial order and $m$ represents azimuthal frequency. The normalization coefficient is given by the square root term preceding the radial and azimuthal components. The radial component of the Zernike polynomial, the second portion of the general formula, is defined as:

$$\mathit{ZR}_n^m(\rho) = \sum_{s=0}^{(n-|m|)/2} \frac{(-1)^s\,(n-s)!}{s!\,\left(\frac{n+|m|}{2}-s\right)!\,\left(\frac{n-|m|}{2}-s\right)!}\;\rho^{\,n-2s} \qquad (3)$$

Note that the value of $n$ is a positive integer or zero. For a given $n$, $m$ can only take the values $-n, -n+2, -n+4, \ldots, n$. In other words, $n - |m|$ is even and $|m| \le n$. Thus, only certain combinations of $n$ and $m$ will yield valid Zernike polynomials. Any combination that is not valid simply results in a radial polynomial component of zero.

Polynomials that result from fitting raw data with these functions are a collection of approximately orthogonal circular geometric modes. The coefficients of each mode are proportional to its contribution to the overall topography of the original image data. As a result, one can effectively reduce the dimensionality of the data to a subset of polynomial coefficients that represent spatial features from the original data as the magnitude of discrete orthogonal geometric modes.

Wavelets
Given a decomposition level $S$ and a one-dimensional signal of length $N$, where $N$ is divisible by $2^S$, the discrete wavelet transform consists of $S$ stages. At each stage in the decomposition, two sets of coefficients are produced, the approximation and the detail. The approximation coefficients are generated by convolving the input signal with a low-pass filter and down-sampling the results by a factor of two. The detail coefficients are similarly generated, convolving the input with a high-pass filter and down-sampling by a factor of two. If the final stage has not been reached, the approximation coefficients are treated as the new input signal and the process is repeated. In many cases, once the wavelet transformation is complete, the original signal will be represented using combinations of approximation and detail coefficients. In the applications presented here, however, only the final level of approximation coefficients is used. The rest are simply ignored.

To operate on a two-dimensional signal of size $N \times N$ (with $N$ divisible by $2^S$), the decomposition proceeds as follows: First, the rows are convolved with a low-pass filter and down-sampled by a factor of two, resulting in matrix $L$ ($N/2 \times N$). The process is repeated on the original signal using a high-pass filter, which leads to matrix $H$ ($N/2 \times N$). The columns of $L$ are convolved two separate times, once with a low-pass filter and again with a high-pass filter. After passing through the filters, the signals are down-sampled by a factor of two. This results in a matrix of approximation ($LL$) and horizontal detail coefficients ($LH$), respectively (both of size $N/2 \times N/2$). These steps are executed once more, this time on the columns of $H$, resulting in a matrix of diagonal ($HH$) and vertical ($HL$) detail coefficients (again of size $N/2 \times N/2$). The whole procedure can then be repeated on the approximation coefficients contained in matrix $LL$. As in the one-dimensional case, only the final level of approximation coefficients is considered.

Space-Filling Curves
The two-dimensional wavelet decomposition will, to some degree, capture the spatial locality of the original data. There may be instances, however, where the data does not lend itself to the two-dimensional decomposition and the one-dimensional transformation is preferred. In order to retain some of the spatial locality, space-filling curves can be used to sample the data (Platzman and Bartholdi 1989). There are a number of different space-filling curves, but this work utilizes the Hilbert curve. Hilbert curves can be used to generate a one-dimensional signal that visits every point in two-dimensional space. Figure 2 shows the Hilbert curves of first, second and third order. These curves can be used to sample a matrix of size 4, 16, and 64, respectively. Higher-order curves can be used to sample larger input matrices, or the input data can be scaled to fit accordingly.

Corneal Topography
Clear vision depends on the optical quality of the corneal surface, which is responsible for nearly 75% of the total optical power of the eye (Kiely et al. 1982). Subtle distortions in the shape of the cornea can have a dramatic effect on vision. The process of mapping the surface features on the cornea is known as corneal topography (Mandell 1996).

The use of corneal topography has rapidly increased in recent years because of the popularity of refractive surgical procedures such as laser assisted in-situ keratomileusis (LASIK). Corneal topography is used to screen patients for corneal disease prior to this surgery and to monitor the effects of treatment after. It is also used to diagnose and manage certain diseases of the eye, including keratoconus, a progressive, non-inflammatory corneal disease that can lead to corneal transplant.

The most common method of determining corneal shape is to record an image of a series of concentric rings reflected from the corneal surface. Any distortion in corneal shape will cause a distortion of the concentric rings. By comparing the size and shape of the imaged rings with their known dimensions, it is possible to mathematically derive the topography of the corneal surface.
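The Zernike modes that serve as the basis for describing surfaces such as the cornea can be evaluated directly from the definitions in Eqs. (2) and (3). The following is a minimal sketch in plain Python; the function names `zernike_radial` and `zernike` are illustrative and not taken from the cited works, and invalid combinations of n and m are mapped to a zero radial component, as described in the Zernike Polynomials section.

```python
from math import cos, factorial, sin, sqrt


def zernike_radial(n, m, rho):
    """Radial component ZR_n^m(rho) following Eq. (3)."""
    m = abs(m)
    # Valid modes need n >= 0, |m| <= n and n - |m| even; otherwise return zero.
    if n < 0 or m > n or (n - m) % 2 != 0:
        return 0.0
    total = 0.0
    for s in range((n - m) // 2 + 1):
        numerator = (-1) ** s * factorial(n - s)
        denominator = (
            factorial(s)
            * factorial((n + m) // 2 - s)
            * factorial((n - m) // 2 - s)
        )
        total += numerator / denominator * rho ** (n - 2 * s)
    return total


def zernike(n, m, rho, theta):
    """Normalized mode Z_n^m(rho, theta) following Eq. (2)."""
    radial = zernike_radial(n, m, rho)
    if m > 0:
        return sqrt(2 * (n + 1)) * radial * cos(m * theta)
    if m < 0:
        return sqrt(2 * (n + 1)) * radial * sin(abs(m) * theta)
    return sqrt(n + 1) * radial
```

As a quick sanity check, `zernike_radial(2, 0, rho)` reproduces the familiar defocus term 2ρ² − 1, and every valid radial component evaluates to 1 at the edge of the unit circle (ρ = 1).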
Biomedical Data Mining, Spatial, Fig. 2 Graphical representation of first (a), second (b), and third-order Hilbert curves (c). Curves used to sample matrices of size 4, 16, and 64, respectively
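Sampling a matrix along a Hilbert curve can be sketched with the widely used iterative index-to-coordinate conversion. This is an illustration under stated assumptions rather than the authors' implementation: the matrix is assumed to be square with a side length that is a power of two, and the names `hilbert_d2xy` and `hilbert_sample` are hypothetical. An order-k curve visits each of the 4^k cells of a 2^k × 2^k grid exactly once.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order by 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the current sub-square before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x  # rotate by swapping the coordinates
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_sample(matrix):
    """Flatten a square matrix (side 2**k, an assumption) into a 1-D signal along the curve."""
    size = len(matrix)
    order = size.bit_length() - 1
    if size == 0 or (1 << order) != size or any(len(row) != size for row in matrix):
        raise ValueError("matrix must be square with a side length of 2**k")
    return [
        matrix[y][x]
        for x, y in (hilbert_d2xy(order, d) for d in range(size * size))
    ]


# Consecutive samples of the 1-D signal come from neighboring matrix cells.
signal = hilbert_sample([[1, 2], [3, 4]])
```

Because consecutive positions on the curve are always adjacent cells, a one-dimensional wavelet transform applied to `signal` still sees some of the two-dimensional locality of the original data.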

Figure 3 shows an example of the output produced by a corneal topographer for a patient suffering from Keratoconus, a patient with a normal cornea, and an individual who has undergone LASIK eye surgery. The methods presented here are intended to differentiate between these patient groups. These images represent illustrative examples of each class, i.e., they are designed to be easily distinguishable by simple visual inspection. The top portion of the figure shows the imaged concentric rings. The bottom portion of the image shows a false color map representing the surface curvature of the cornea. This color map is intended to aid clinicians and is largely instrument-dependent.

Key Applications

This section covers techniques employed to represent the topography data with the above modeling methods, to use those representations to classify corneal shape, and finally, to present the results of those decisions to allow clinicians to visually inspect the classification criteria.

Data Transformation
The data from a corneal topographer are largely instrument-specific but typically consist of a 3D point cloud of approximately 7000 spatial coordinates arrayed in a polar grid. The height of each point $z$ is specified by the relation $z = f(\rho, \theta)$, where the height relative to the corneal apex is a function of radial distance from the origin ($\rho$) and the counter-clockwise angular deviation from the horizontal meridian ($\theta$). The inner and outer borders of each concentric ring consist of a discrete set of 256 data points taken at a known angle $\theta$, but a variable distance $\rho$, from the origin.

Zernike
The method described here is based on techniques detailed by Schwiegerling et al. (1995) and Iskander et al. (2001a). In summary, the

Biomedical Data Mining, Spatial, Fig. 3 Characteristic corneal shapes for three patient groups. The top image shows a picture of the cornea and reflected concentric rings. The bottom image shows the false color topographical map representing corneal curvature, with an increased curvature given a color in the red spectrum, decreased curvature in blue. From left to right: Keratoconus, Normal and post-refractive surgery

data is modeled over a user-selected circular region of variable-diameter, centered on the axis of measurement. The generation of the Zernike model surface proceeds in an iterative fashion, computing a point-by-point representation of the original data at each radial and angular location up to a user-specified limit of polynomial complexity. The polynomial coefficients of the surface that will later be used to represent the proportional magnitude of specific geometric features are computed by performing a least-squares fit of the model to the original data, using standard matrix inversion methods (Schwiegerling et al. 1995).

2D Wavelets
To create a model using the 2D wavelet decomposition, the data must first be transformed from polar to Cartesian coordinates. Once this step has been completed, the matrix is normalized to a power of 2, typically $64 \times 64$ or $128 \times 128$. After normalization, the 2D wavelet decomposition is applied, with the final level of approximation coefficients serving as the feature vector.

1D Wavelets
This section details two methods for transforming the topography data into a 1D signal. The first method is to simply trace along each ring in a counter-clockwise fashion, adding each of the 256 points to the end of the signal. Upon reaching the end of a ring, one moves to the next larger ring and repeats the process. Next, a 1D decomposition is applied and the final level approximation coefficients are taken to serve as a feature vector for classification. The second method involves

transforming the data into a distance matrix as with the 2D wavelet decomposition. Then, one can sample the data using a space-filling curve, which will result in a 1D representation of the 2D matrix.

Classification
Given a dataset of normal, diseased, and post-operative LASIK corneas, the above representations were tested using a number of different classification strategies, including decision trees, Naïve Bayes, and neural networks. The data was modeled with Zernike polynomials and several polynomial orders were tested, ranging from 4 to 10. The experiments showed that the low-order Zernike polynomials, coupled with decision trees, provided the best classification performance, yielding an accuracy of roughly 90% (Marsolo et al. 2006; Twa et al. 2003). The 2D, 1D Hilbert and 1D ring-based wavelet representations were tested as well. For the 2D representation, a normalized matrix of size $128 \times 128$ and a 3rd level decomposition yielded the highest accuracy. For the Hilbert-based approach, a normalized matrix of size $64 \times 64$ and a 6th level transformation performed best. With the ring-based model, the 7th level decomposition yielded the highest accuracy. The accuracy of the different wavelet-based models was roughly the same, hovering around 80%, approximately 10% lower than the best Zernike-based approach.

Visualization
While accuracy is an important factor in choosing a classification strategy, another attribute that should not be ignored is the interpretability of the final results. Classifiers like decision trees are often favored over “black-box” classifiers such as neural networks because they provide more understandable results. One can manually inspect the tree produced by the classifier to examine the features used in classification. In medical image interpretation, decision support for a domain expert is preferable to an automated classification made by an expert system. While it is important for a system to provide a decision, it is often equally important for clinicians to know the basis for an assignment.

Part of the rationale behind using Zernike polynomials as a transformation method over other alternatives is that there is a direct correlation between the geometric modes of the polynomials and the surface features of the cornea. An added benefit is that the orthogonality of the series allows each term to be considered independently. Since the polynomial coefficients used as splitting attributes represent the proportional contributions of specific geometric modes, one can create a surface representation that reflects the spatial features deemed “important” in classification. These features discriminate between the patient classes and give an indication as to the specific reasons for a decision.

The section below discusses a method that has been designed and developed to visualize decision tree results and aid in clinical decision support (Marsolo et al. 2005). The first step of the process is to partition the dataset based on the path taken through the decision tree. Next, a surface is created using the mean values of the polynomial coefficients of all the patients falling into each partition. For each patient, an individual polynomial surface is created from the patient’s coefficient values that correspond to the splitting attributes of the decision tree. This surface is contrasted against a similar surface that contains the mean partition values for the same splitting coefficients, providing a measure to quantify how “close” a patient lies to the mean.

Figure 4 shows a partial decision tree for a 4th order Zernike model. The circles correspond to the Zernike coefficients used as splitting criteria, while the squares represent the leaf nodes. In this tree, three example leaf nodes are shown, one for each of the patient classes considered in this experiment (K – keratoconus, N – normal, L – post-operative LASIK). The black triangles are simply meant to represent subtrees that were omitted from the figure in order to improve readability. While each node of the tree represents one of the Zernike modes, the numbers on each branch represent the relation that must be true in order to proceed down that path. (If no relation is provided, it is simply the negation of the relation(s) on the opposing branch(es)). An object can be traced through the tree until a leaf node is

Biomedical Data Mining, Spatial, Fig. 4 Partial decision tree for a 4th order Zernike model

reached and the object is then assigned the label of that leaf. Thus, given the tree in Fig. 4, if an object had a $Z_3^{-1}$ coefficient with a value $\le 2.88$, one would proceed down the left branch. If the value was $> 2.88$, the right branch would be taken and the object would be labeled as keratoconus. In this manner, there is a path from the root node to each leaf.

As a result, one can treat each possible path through a decision tree as a rule for classifying an object. For a given dataset, a certain number of patients will be classified by each rule. These patients will share similar surface features. Thus, one can compare a patient against the mean attribute values of all the other patients who were classified using the same rule. This comparison will give clinicians some indication of how “close” a patient is to the rule average. To compute these “rule mean” coefficients, denoted rule, the training data is partitioned and the average of each coefficient is calculated using all the records in that particular partition.

For a new patient, a Zernike transformation is computed and the record is classified using the decision tree to determine the rule for that patient. Once this step has been completed, the visualization algorithm is applied to produce five separate images (illustrated in Fig. 5). The first panel is a topographical surface representing the Zernike model for the patient. It is constructed by plotting the 3-D transformation surface as a 2-D topographical map, with elevation denoted

Biomedical Data Mining, Spatial, Fig. 5 Example surfaces for each patient class. (a) represents a Keratoconic eye, (b) a Normal cornea, and (c) a post-operative LASIK eye. The top panel contains the Zernike representation of the patient. The next panel illustrates the Zernike representation using the rule values. The third and fourth panels show the rule surfaces, using the patient and rule coefficients, respectively. The bottom panel consists of a bar chart showing the deviation between the patient’s rule surface coefficients and the rule values

by color. The second section contains a topographical surface created in a similar manner by using the rule coefficients. These surfaces are intended to give an overall picture of how the patient’s cornea compares to the average cornea of all similarly-classified patients.

The next two panels in Fig. 5 (rows 3 and 4) are intended to highlight the features used in classification, i.e., the distinguishing surface details. These surfaces are denoted as rule surfaces. They are constructed from the value of the coefficients that were part of the classification rule (the rest are zero). The first rule surface (third panel of Fig. 5) is created by using the relevant coefficients, but instead of the patient-specific values, the values of the rule coefficients are used. This surface will represent the mean values of the distinguishing features for that rule. The second rule surface (row 4, Fig. 5) is created in the same fashion, but with the coefficient values from the patient transformation, not the average values.

Finally, a measure is provided to illustrate how close a patient lies to those falling in the same rule partition. For each of the distinguishing coefficients, the relative error between the patient and the rule is computed. The absolute value of the difference between the coefficient value of the patient and the value of the rule is taken and divided by the rule value. A bar chart of these error values is provided for each coefficient (the error values of the coefficients not used in classification are set to zero). This plot is intended to provide a clinician with an idea of the influence of the specific geometric modes in classification and the degree to which the patient deviates from the mean.

The surfaces in Fig. 5 correspond to the three example rules shown in the partial tree found in Fig. 4. For each of the patient classes, the majority of the objects were classified by the example rule. Since these are the most discriminating rules for each class, one would expect that the rule surfaces would exhibit surface features commonly associated with corneas of that type. These results are in agreement with expectations of domain experts.
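
The "rule mean" coefficients and the relative-error measure behind the bar chart can be sketched as below. This is an illustrative reading of the description, not the authors' code: the function names `rule_means` and `relative_error` are hypothetical, coefficients are assumed to be stored as equal-length vectors, and the sketch divides by the magnitude of the rule value so that the error stays non-negative when a rule coefficient is negative (the text says only "divided by the rule value").

```python
import numpy as np


def rule_means(coeffs, rule_ids):
    """Average each coefficient over the training records in each rule partition.

    coeffs   : one row of Zernike coefficients per training record
    rule_ids : the rule (decision-tree path) that classified each record
    """
    coeffs = np.asarray(coeffs, dtype=float)
    labels = list(rule_ids)
    means = {}
    for rule in set(labels):
        mask = np.array([label == rule for label in labels])
        means[rule] = coeffs[mask].mean(axis=0)
    return means


def relative_error(patient, rule_mean, used):
    """Relative error between patient and rule coefficients.

    Only the coefficient indices in `used` (those appearing in the
    classification rule) are compared; all other entries are zero.
    A rule value of exactly zero would need special handling, which
    this sketch omits.
    """
    patient = np.asarray(patient, dtype=float)
    rule_mean = np.asarray(rule_mean, dtype=float)
    idx = np.asarray(list(used), dtype=int)
    err = np.zeros_like(patient)
    err[idx] = np.abs(patient[idx] - rule_mean[idx]) / np.abs(rule_mean[idx])
    return err
```

A bar chart of the vector returned by `relative_error` then corresponds to the bottom panel described for Fig. 5.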
Future Directions

The above methods are not limited just to modeling the shape of the cornea. They could be applied to almost any domain dealing with structure-based data.

Cross-References

▸ Space-Filling Curves

References

Born M, Wolf E (1980) Principles of optics: electromagnetic theory of propagation, interference and diffraction of light, 6th edn. Pergamon Press, Oxford/New York
Daubechies I (1992) Ten lectures on wavelets. Society for Industrial and Applied Mathematics, Philadelphia
Hoekman DH, Varekamp C (2001) Observation of tropical rain forest trees by airborne high-resolution interferometric radar. IEEE Trans Geosci Remote Sens 39(3):584–594
Iskander DR, Collins MJ, Davis B (2001a) Optimal modeling of corneal surfaces with Zernike polynomials. IEEE Trans Biomed Eng 48(1):87–95
Iskander DR, Collins MJ, Davis B, Franklin R (2001b) Corneal surface characterization: how many Zernike terms should be used? (ARVO) abstract. Invest Ophthalmol Vis Sci 42(4):896
Kiely PM, Smith G, Carney LG (1982) The mean shape of the human cornea. J Modern Opt 29(8):1027–1040
Laine AF (2000) Wavelets in temporal and spatial processing of biomedical images. Annu Rev Biomed Eng 02:511–550
Mallat S (1999) A wavelet tour of signal processing, 2nd edn. Academic, New York
Mandell RB (1996) A guide to videokeratography. Int Contact Lens Clin 23(6):205–228
Marsolo K, Parthasarathy S, Twa MD, Bullimore MA (2005) A model-based approach to visualizing classification decisions for patient diagnosis. In: Proceedings of the conference on artificial intelligence in medicine (AIME), Aberdeen, 23–27 July 2005
Marsolo K, Twa M, Bullimore MA, Parthasarathy S (2006) Spatial modeling and classification of corneal shape. IEEE Trans Inf Technol Biomed
Platzman L, Bartholdi J (1989) Spacefilling curves and the planar travelling salesman problem. J Assoc Comput Mach 46:719–737
Schwiegerling J, Greivenkamp JE, Miller JM (1995) Representation of videokeratoscopic height data with Zernike polynomials. J Opt Soc Am A 12(10):2105–2113
Teh C-H, Chin RT (1988) On image analysis by the methods of moments. IEEE Trans Pattern Anal Mach Intell 10(4):496–513
Thibos LN, Applegate RA, Schwiegerling JT, Webb R (1999) Standards for reporting the optical aberrations of eyes. In: TOPS, Santa Fe. OSA
Twa MD, Parthasarathy S, Raasch TW, Bullimore MA (2003) Automated classification of keratoconus: a case study in analyzing clinical data. In: SIAM international conference on data mining, San Francisco, 1–3 May 2003
Zernike F (1934) Beugungstheorie des Schneidenverfahrens und seiner verbesserten Form, der Phasenkontrastmethode. Physica 1:689–704

Bi-temporal

▸ Spatiotemporal Query Languages

Bitmap

▸ Raster Data

Bivariate Median

▸ Geometric Median

Black Hole Detection

▸ Ring-Shaped Hotspot Detection

BLUP

▸ Spatial Econometric Models, Prediction

Branch and Bound

▸ Skyline Queries

B-tree, Versioned

▸ Smallworld Software Suite

Bubble Estimation

▸ Financial Asset Analysis with Mobile GIS

Bundle Adjustment

▸ Photogrammetric Methods

Business Application

▸ Decision-Making Effectiveness with GIS

Bx-Tree

▸ Indexing of Moving Objects, Bx-Tree


C

Caching

▸ OLAP Results, Distributed Caching

CAD and GIS Platforms

▸ Computer Environments for GIS and CAD

Cadaster

▸ Cadastre

Cadastre

Erik Stubkjær
Department of Development and Planning, Aalborg University, Aalborg, Denmark

© Springer International Publishing AG 2017
S. Shekhar et al. (eds.), Encyclopedia of GIS,
DOI 10.1007/978-3-319-17885-1

Synonyms

Cadaster; Land administration system; Land information system; Land policy; Land registry; Property register; Spatial reference frames

Definition

A cadastre may be defined as an official geographic information system (GIS) which identifies geographical objects within a country, or more precisely, within a jurisdiction. Just like a land registry, it records attributes concerning pieces of land, but while the recordings of a land registry are based on deeds of conveyance and other rights in land, the cadastre is based on measurements and other renderings of the location, size, and value of units of property. The cadastre and the land registry in some countries, e.g., the Netherlands and New Zealand, are managed within the same governmental organization. From the 1990s, the term land administration system came into use, referring to a vision of a complete and consistent national information system, comprising the cadastre and the land registry.

The above definition of cadastre accommodates the various practices within continental Europe, the British Commonwealth, and elsewhere. Scientific scrutiny emerged from the 1970s, where the notions of a land information system or property register provided a frame for comparison of cadastres across countries. However, GIS and sciences emerged as the main approach of research in the more technical aspects of the cadastre. In fact, the

above definitional exercise largely disregards the organizational, legal, and other social science aspects of the cadastre. These are more adequately addressed when the cadastre is conceived of as a sociotechnical system, comprising technical, intentional, and social elements.

Historical Background

The notion of the cadastre has been related to Byzantine ledgers, called katastichon in Greek, literally “line by line”. A Roman law of 111 BC required that land surveyors (agrimensores) should complete maps and registers of certain tracts of Italy. Also, an archive, a tabularium, was established in Rome for the deposit of the documents. Unfortunately, no remains of the tabularium seem to have survived the end of the Roman Empire.

The Cadastre reemerged in the fifteenth century in some Italian principalities as a means of recording tax liabilities. This seems part of a more general trend of systematic recording of assets and liabilities, e.g., through double-entry bookkeeping, and spread to other parts of Europe. In order to compensate for the lack of mapping skills, landed assets and their boundaries were described through a kind of written maps or cartes parlantes. During the sixteenth century, landscape paintings or so called picture maps were prepared e.g., for the court in Speyer, Germany, for clarifying argumentation on disputed land. During the same period in the Netherlands, the need for dike protection from the sea called for measurements and the organization of work and society; the practice of commissioning surveys for tax collection became increasingly common there.

A new phase was marked by new technology, the plane table with the related methods for distance measurement, mapping, and area calculation. The technology was introduced in 1720 in Austrian Lombardy through a formal trial against alternative mapping methods. The resulting Milanese census, the Censimento, with its integrated recordings in ledgers and maps became a model for other European principalities and kingdoms.

The uneven diffusion of cadastral technology reveals a power struggle between the ruling elite and the landed gentry and clerics, who insisted on their tax exemption privileges. In the early modern states, the cadastre was motivated by reference to a God-given principle of equality (German: gottgefällige Gerechtigkeit or gottgefällige Gleichheit (Kain and Baigent 1993)). Generally, absolutist monarchs were eager to establish accounts of the assets of their realm, as the basis for decisions concerning their use in wars and for the general benefit of the realm. A continental European version of mercantilism, “cameralism”, was lectured at universities, seeking a quasirational exploitation of assets and fair taxation, for which the cadastre was needed, as well as regulations and educational programs, for example in agriculture, forestry, and mining. Cadastral technology, the related professions, and the centralized administration together became an instrument of unification of the country, providing the technical rationale for greater equality in taxation. Taxation thus gradually became controlled by the central authority, rather than mediated through local magnates. This change was recognized by Adam Smith, by physiocrats, and by political writers in France. The administrative technology was complemented by codification, that is, a systematic rewriting of laws and by-laws that paved the way for the modern state where individual citizens are facing the state, basically on equal terms.

The reorganization of institutions after the French revolution of 1789 also changed the role of the cadastre as introduced by Enlightenment monarchs. In part, this was due to university reforms, e.g., by Wilhelm von Humboldt in Berlin. Cameralism was split into economics, which increasingly became a mathematically based discipline and a variety of disciplines lectured at agricultural and technical universities. Cadastral expertise was largely considered a subfield of geodesy, the new and rational way of measuring and recording the surface of the Earth. From the end of eighteenth century, cadastral maps were increasingly related to or based on geodetic

triangulations, as was the case for the Napoleonic cadastre of France, Belgium and the Netherlands. The same cadastral reform included the intention of using the measured boundaries and areas and the parcel identification for legal purposes, primarily by letting the cadastral documentation with its fixed boundaries prove title to land and become the final arbiter in case of boundary disputes. However, the courts that were in charge of land registries, generally for more than a century, and the legal profession were reluctant to adopt what might be considered an encroachment on their professional territory. During the nineteenth century, most countries improved their deeds recording system, and German-speaking countries managed to develop them into title systems backed by legal guaranties. Similarly, from South Australia the so-called Torrens system, adopted in 1858, influenced the English-speaking world. However, with few exceptions, the integration of cadastral and legal affairs into one information system had to await the introduction of computer technology and the adoption of business approaches in government.

The above historical account is Eurocentric and bypasses possible scientific exchange with South-Eastern neighbors during the twelfth to fifteenth centuries. Also, it leaves out the development in England and its colonies worldwide. The account describes how the notion of the cadastre emerged and varied across time and place. This calls for special care in scientific communications, since no standardized and theory-based terminology has been established.

Scientific Fundamentals

Spatial Reference Frames
The center and rotational axis of the Earth, together with the Greenwich meridian, provide a reference frame for the location of objects on the surface of the Earth. Furthermore, a map projection relates the three-dimensional positions to coordinates on a two-dimensional map plane. The skills of the geodesist and land surveyor are applied in the cadastral field to record agreed legal boundaries between property units, the assumption being that such recordings substantially reduce the number of boundary disputes.

Application of the above-mentioned fixed-boundary solution raises serious problems, even if the assumption may be largely confirmed. In the cases of land slides and land drifting due to streams, the solution is insensitive to the owner’s cost of getting access to all parts of the property. Furthermore, the solution does not accommodate for later and better measurements of the boundary. Moreover, the boundary may have been shifted years ago for reasons that have become too costly to elicit, relative to the value of the disputed area and even acknowledging the fact that justice may not be served. Some jurisdictions hence allow for “adverse possession”, that is: an official recognition of neighbors’ agreement on the present occupational boundary, even if it differs from previous cadastral recordings. Likewise, legal emphasis on merestones and other boundary marks, as well as the recording of terrain features which determine permanent and rather well defined boundary points may supplement the pure fixed-boundary approach.

The geodetic surveyors’ reference frames locate points in terms of coordinates, but the naming and identification of parcels relates to names, which is a subfield of linguistics. The cadastral identifier is a technical place name, related to the place names of towns, roads, parishes, and topographic features. Hierarchically structured administrative units or jurisdictions and their names provide a means of location of property units. Even if such ordinal structuring of a jurisdiction through place names is coarse, relative to the metric precision of measured boundary points, it provides in many cases for a sufficient localization of economic activities and it reduces the dependency of specialized and costly competence.

The linguistic approach to localization refers to another spatial reference frame than the Earth, namely the human body. The everyday expressions of “left” and “right”, “up” and “down” all refer to the body of the speaker or the listener, as practiced when giving directions to tourists or explaining where to buy the best offers of the day. Important research areas include the

balancing of nominal, ordinal and metric means of localization and the consideration of relations amongst various spatial reference frames.

Communication and Standardization
The national information systems (cadastre, land registry, or whatever may be the name) and organizational structure of real property information systems depend on databases and related archives and need updating if they are to render trustworthy minutes of the spatial and legal situation of the property units. Computer and communication sciences provide the theoretical background for these structures and tasks. However, until recently the methods provided in terms of systems analysis and design, data modeling, etc. addressed the information system within a single decision body, while the situation pertaining to real property information depends on an interplay between ministries, local government, and the private sector. The notion of a geospatial data infrastructure is used to describe this scope. The modeling of this infrastructure compares to the modeling of an industrial sector for e-business, an emergent research issue that includes the development of vocabulary and ontology resources of the domain.

The specification of the property unit is a fundamental issue. Often, the unit is supposed to be in individual ownership, but state or other public ownership is common enough to deserve consideration, as are various forms of collective ownership. Collective ownership is realized by attributing rights and obligations to a so-called legal person, which may be an association, a limited company, or another social construct endorsed by legislation or custom. Comparative studies reveal that the property unit itself can be specified in a host of variations: Is the unit a single continuous piece of land, or is it defined as made up of one or more of such entities? Relations among pieces of land can be specified in other ways: In Northern and Central Europe, a construct exists where a certain share of a property unit is owned by the current owners of a number of other property units. In cadastral parlance, such unit is said to be owned by the share-holding property units. Furthermore, land and the building erected on it may establish one unit, yet alternatively the building always or under certain conditions constitutes a unit in its own right. Variations also occur as to whether parts of buildings can become units, for example in terms of condominiums, which may depend on conditions related to use for housing or business purposes. Research efforts under the heading of standardization of the core cadastral unit have contributed substantially to the understanding of the complexity of the property unit.

Updating the information in property registers is as essential as the specification of units. From an informatics point of view, a survey of the information flows in what may be called the geodata network may reveal uncoordinated, and perhaps duplicated, efforts to acquire information and other suboptimal practices. However, from the end users’ point of view, what takes place is a transaction of property rights and related processes, for example subdivision of property units. The updating of property registers is from this point of view a by-product of the transaction.

The end-user point of view is taken also by economists, who offer a theoretical basis for investigations of the mentioned processes, a field known as “institutional economics”. New institutional economics (NIE) introduces transaction costs as an expense in addition to the cost of producing a commodity to the market. In the present context, the transaction costs are the fees and honoraries, etc. to be paid by the buyer and seller of a property unit, besides the cost of the property itself. Buyers’ efforts to make sure that the seller is in fact entitled to dispose of the property unit concerned can be drastically reduced, that is: transaction costs are lowered, where reliable title information from land registries is available.

The NIE approach, as advocated by Nobel laureate Douglass C North, was applied in recent, comparative research. His notion of “institution”: the norms which restrict and enable human behavior, suggested research focused on practices rather than legal texts. Methods were developed and applied for the systematic description and comparison of property transactions, including a formal, ontology-based approach. The methods developed were feasible and

initial national accounts of transaction costs were drafted. However, advice for optimal
solutions is not to be expected, partly because many of the agents involved, both in the
private and the public sector, have other preferences than the minimizing of transaction
costs. Moreover, property transactions are, for various political reasons, often regulated,
for example through municipal preemption rights or spatial planning measures. The NIE approach
does however offer an adequate theoretical basis for analyses of the infrastructure of real
property rights, analyses which assist in the identification and remedy of the most
inappropriate practices.

Cadastral Development Through Institutional Transactions
Institutional economics divides into two strands: NIE, and institutional political economy,
respectively. The former may be applied to the cadastre and its related processes, conceived
as a quasi-rational, smooth-running machine that dispatches information packets between agents
to achieve a certain outcome, e.g., the exchange of money against title to a specific property
unit. However, this approach does not account for the fact that the various governmental
units, professional associations, etc. involved in the cadastral processes have diverse
mandates and objectives. Development projects pay attention to these conflicting interests
through stakeholder analyses. Research in organizational sociology suggests identification of
policy issue networks and investigation of the exchange of resources among the actors
involved, for example during the preparation and passing of a new act. The resources are
generally supposed not to be money, although bribery occurs, but more abstract entities such
as legal-technical knowledge, access to decision centers, organizational skills, reputation,
and mobilizing power.

Institutional economics provides a frame for relating this power game to the routine
transactions of conveyance of real estate. It is done by introducing two layers of social
analysis: the layer of routine transactions, and the layer of change of the rules which
determine the routine transactions (Williamson 2000). In economic terms, conveyance is an
example of a transaction in commodities or other assets. These transactions are performed
according to a set of rules: acts, by-laws, professional codes of conduct, etc. which are, in
the terminology of Douglass North, a set of institutions. The process of change of these
institutions is the object of analysis on the second layer and, following Daniel Bromley,
called "institutional transactions". Institutional transactions may reduce transaction costs
within the jurisdiction concerned, but Bromley shows at length that this need not be so;
generally, the initiator of an institutional transaction cannot be sure whether the intended
outcome is realized. Among other things, this is because the transaction is open to unplanned
interference from actors on the periphery and also because the various resources of the actors
are only partly known at the outset.

The strand of institutional political economy is researching such institutional transactions
in order to explain why some countries grow rich, while others fail to develop their economy.
Here we have the theoretical basis for explaining the emergence and diffusion of the cadastre
in early modern Europe, cf. "Historical Background" above. The pious and enlightened
absolutist monarchs and their advisors established a set of norms that framed institutional
transactions in a way that encouraged a growing number of the population to strive for the
"common weal".

Key Applications

The definition of cadastre specifies the key application: the official identification and
recording of information on geographical objects: pieces of land, buildings, pipes, etc. as
well as documents and extracts from these on rights and restrictions pertaining to the
geographical objects. Cadastral knowledge is applied not only in the public sector, but also
in private companies, as we shall see from the following overview of services:

• Facilitation of financial inflow to central and local government by specification of property
  units and provision of information on identification, size, and boundaries and by providing
  standardized assessment of property units (mass-appraisal)
• Supporting the market in real estate by providing trustworthy recordings of title,
  mortgages, and restrictions
• Supporting sustainable land use (urban, rural, and environmental aspects) by providing data
  for the decision process and applying the decision outcome in property development
• Supporting the construction industry and utilities with data and related services, and
  cooperating on the provision of statistics
• Assisting in continuous improvement of administration, often called "good governance",
  through:
    • Considering data definitions and data flows with other bodies in government and
      industry, by mutual relating of post addresses, units of property, units for small-area
      statistics and spatial planning, as well as road segments and other topographic elements
    • Applying cost-effective technology
    • Contributing towards good governance through participating in national high-level
      commissions that aim at change in the institutional structure concerning real property
      rights, housing, and land use.

The taxation issue is mentioned first among public sector issues, partly for historical
reasons, but more essentially because the operation of the cadastre costs money. In a specific
country, the list of tasks or functions may include more or fewer operations, and the grouping
of tasks may differ. However, the list may give an idea of career opportunities, as does the
corresponding list of services within the private sector:

• Assisting bodies in need of recording and management of a property portfolio, for example
  utilities, road, railroad and harbor agencies, defense, property owners' associations and
  hunting societies, charitable trusts, ecclesiastical property; supporting the selection of
  antenna sites for wireless networks
• Supporting property development by assisting owners in reorganizing the shape of and rights
  in the property unit and its surroundings, in compliance with the existing private and
  public restrictions, for example easements and zoning. Delivering impartial expertise in
  boundary disputes.
• Assisting bodies in need of measuring and recording of location-specific themes with legal
  implications, including cultural and natural heritage and opencast mining
• Facilitating the diffusion of open source and proprietary GIS software by adapting it to
  local needs

Again, in a specific country the structure of the construction industry and the utilities, as
well as the status and scope of professional associations like those of geodetic surveyors and
property developers may vary.

Future Directions

Formalization of Practice
Much of cadastral knowledge is still tacit. Thus a large task ahead is to elicit this
knowledge and integrate it with relevant existing conceptual structures and ontologies.
Cadastral work is embedded in legal prescripts and technical standards. The hypothesis
supported by institutional economics is that the needed formalization will be achieved better
through observation and analysis of human routine behavior than through legal analysis of
prescripts or model building of information systems. Furthermore, the need for formalization
is not only in regard to the production of cadastral services, but also the articulation of
the intricate structure of process objectives or functions. The description of realized
functions and function clusters depends to a certain extent on the local expertise that has an
obvious interest in staying in business. The description effort thus has to face a value-laden
context. Good places to begin research are with Kuhn (2001), Frank (2001), Stubkjær (2003),
van Oosterom et al. (2005), and Zevenbergen et al. (2007).

Prioritized Issues
The International Federation of Surveyors (FIG) provides a framework for exchanges among
professionals and academics. Of the ten commissions, commission 7 on Cadastre and Land
Management at the International FIG Congress in Munich, October 2006, adopted a Work Plan
2007–2010. Major points of the plan include:

• Pro-poor land management and land administration, supporting good governance
• Land administration in post-conflict areas
• Land administration and land management in the marine environment
• Innovative technology and ICT support in land administration and land management
• Institutional aspects for land administration and land management

Current working relations with UN bodies will continue and possibly include the UN Habitat
Global Land Tool Network (GLTN). A good starting point would be
http://www.fig.net/commission7/ and Kaufmann and Steudler (1998).

Development Research
From a global perspective, the past decades have been marked by an attempt, on the part of
donors and experts, to install the Western model of individual real property rights and
cadastre in developing or partner countries. The efforts have not had the intended effects for
a variety of reasons. One may be that cadastral processes interfere with the preferences and
habits of end users, including the large number of owners, lease-holders, tenants, mortgagees,
etc. This is the case especially where the general economy and the structure of professions
and university departments hamper the provision of skilled and impartial staff (in public and
private sector) who could mediate between the wishes of the right holders and the
technicalities of property rights, transactions, and recordings.

The provision of resources for such cadastral development likely depends on one or more global
networks which comprise university departments, central and local governments,
nongovernmental organizations (NGOs) and international organizations, for example as
demonstrated in the Cities Alliance. Furthermore, the United Nations University (UNU) includes
UNU-IIST, the International Institute for Software Technology, enabling software technologies
for development, as well as UNU-WIDER which analyses property rights and registration from the
point of view of development economics.

The Netherlands-based International Institute for Geo-Information Science and Earth
Observation (ITC) has become an associated institution of UNU. With the Netherlands Cadastre,
Land Registry and Mapping Agency, the ITC is establishing a School for Land Administration
Studies at ITC. The school will, among other things, execute a joint land administration
programme with UNU, consisting of a series of seminars, short courses and networking. Bruce
(1993), de Janvry and Sadoulet (2001), Palacio, and North (1990) are good places to start
further reading on this.

Spatial Learning: Naming Objects of the Environment
In autumn 2006, the US National Science Foundation awarded a $3.5 million grant to establish a
new research center to investigate spatial learning and use this knowledge to enhance the
related skills students will need. Research goals include how to measure spatial learning.

Cadastral studies combine the verbal and the graphical-visual-spatial modes of communication
and need reflection on the teaching and learning methods, also with web-supported distance
learning in mind. More specifically, the need of balancing of nominal, ordinal and metric
means of localization, etc. which was mentioned above in "Spatial Reference Frames", would
benefit from such reflection.

References

Bruce J (1993) Review of tenure terminology. Land tenure center report.
  http://agecon.lib.umn.edu/ltc/ltctb01.pdf. Accessed 14 Aug 2007
de Janvry A, Sadoulet E (2001) Access to land and land policy reforms. Policy brief no. 3, Apr
  2001. http://www.wider.unu.edu/publications/pb3.pdf. Accessed 14 Aug 2007
Frank AU (2001) Tiers of ontology and consistency constraints in geographic information
  systems. Int J Geogr Inf Sci 15:667–678
Kain RJP, Baigent E (1993) The cadastral map in the service of the state: a history of
  property mapping. The University of Chicago Press, Chicago
Kaufmann J, Steudler D (1998) Cadastre 2014. http://www.fig.net/cadastre2014/index.htm.
  Accessed 14 Aug 2007
Kuhn W (2001) Ontologies in support of activities in geographical space. Int J Geogr Inf Sci
  15:613–631
North DC (1990) Institutions, institutional change and economic performance. Cambridge
  University Press, Cambridge
Palacio A, Legal empowerment of the poor: an action Agenda for the world bank. Available via
  http://rru.worldbank.org/Documents/PSDForum/2006/background/legal_empowerment_of_poor.pdf.
  Accessed 14 Aug 2007
Stubkjær E (2003) Modelling units of real property rights. In: Virrantaus K, Tveite H (eds)
  ScanGIS 03 proceedings, 9th Scandinavian research conference on geographical information
  sciences, Espoo, June 2003
van Oosterom P, Schlieder C, Zevenbergen J, Hess C, Lemmen C, Fendel E (eds) (2005)
  Standardization in the cadastral domain. In: Proceedings, standardization in the cadastral
  domain, Bamberg, Dec 2004. The International Federation of Surveyors, Frederiksberg
Williamson OE (2000) The new institutional economics: taking stock, looking ahead. J Econ
  Literat 38:595–613
Zevenbergen J, Frank A, Stubkjær E (eds) (2007) Real property transactions: procedures,
  transaction costs and models. IOS Press, Amsterdam

Recommended Reading

Chang H-J Understanding the relationship between institutions and economic development: some
  key theoretical issues. UNU/WIDER Discussion Paper 2006/05. July 2006.
  http://www.wider.unu.edu/publications/dps/dps2006/dp2006-05.pdf. Accessed 14 Aug 2007
de Janvry A, Gordillo G, Platteau J-P, Sadoulet E (eds) (2001) Access to land, rural poverty
  and public action. UNU/WIDER studies in development economics. Oxford University Press,
  Oxford/New York


Camera Model

► Photogrammetric Methods


Carbon Emissions

► Climate Risk Analysis for Financial Institutions


Carbon Finance

► Climate Risk Analysis for Financial Institutions


Carbon Trading

► Climate Risk Analysis for Financial Institutions


Cardinal Direction Relations

► Directional Relations


Cartographic Data

► Photogrammetric Products


Cartographic Generalization

► Abstraction of Geodatabases


Cartographic Information System

► Atlas Information Systems


Catalog Entry

► Metadata and Interoperability, Geospatial


Catalogue Information Model

Liping Di¹ and Yuqi Bai²
¹ Center for Spatial Information Science and Systems (CSISS), George Mason University,
  Fairfax, VA, USA
² Center for Spatial Information Science and Systems (CSISS), George Mason University,
  Greenbelt, MD, USA

Synonyms

Catalogue information schema; Catalogue metadata schema; Registry information model

Definition

The catalogue information model is a conceptual model that specifies how metadata is organized
within the catalogue. It defines a formal structure representing catalogued resources and
their interrelationships, thereby providing a logical schema for browsing and searching the
contents in a catalogue.

There are multiple and slightly different definitions of the catalogue information model used
by various communities. The Open Geospatial Consortium (OGC) defines the catalogue information
model in the OGC Catalogue Services Specification (OGC) as an abstract information model that
specifies a BNF grammar for a minimal query language, a set of core queryable attributes
(names, definitions, conceptual data types), and a common record format that defines the
minimal set of elements that should be returned in the brief and summary element sets. The
Organization for the Advancement of Structured Information Standards (OASIS) defines the
registry information model in the ebXML Registry Information Model (ebRIM) specification
(OASIS) as the information model which provides a blueprint or high-level schema for the ebXML
registry. It provides the implementers of the ebXML registry with information on the type of
metadata that is stored in the registry as well as the relationships among metadata classes.
The registry information model defines what types of objects are stored in the registry and
how stored objects are organized in the registry. The common part from these two definitions
is the schema for describing the objects catalogued/registered in and for organizing the
descriptions in a catalogue/registry: the catalogue metadata schema.

Historical Background

The first catalogues were introduced by publishers serving their own business of selling the
books they printed. At the end of the fifteenth century, they made lists of the available
titles and distributed them to those who frequented the book markets. Later on, with the
increasing volume of books and other inventories, the library became one of the earliest
domains providing a detailed catalogue to serve their users. These library catalogues hold
much of the reference information (e.g., author, title, subject, publication date, etc.) of
bibliographic items found in a particular library or a group of libraries.

People began to use the term "metadata" in the late 1960s and early 1970s to identify this
kind of reference information. The term "meta" comes from a Greek word that denotes
"alongside, with, after, next." More recent Latin and English usage would employ "meta" to
denote something transcendental or beyond nature (Using Dublin Core).

The card catalogue was a familiar sight to users for generations, but it has been effectively
replaced by the computerized online catalogue which provides more advanced information tools
helping to collect, register, browse, and search digitized metadata information.

Scientific Fundamentals

Metadata can be thought of as data about other data. It is generally used to describe the
characteristics of information-bearing entities to aid in the identification, discovery,
assessment, management, and utilization of the described
entities. Metadata standards have been developed to standardize the description of
information-bearing entities for specific disciplines or communities. For interoperability and
sharing purposes, a catalogue system usually adopts a metadata standard used in the community
the system intends to serve as its catalogue information model.

A metadata record in a catalogue system consists of a set of attributes or elements necessary
to describe the resource in question. It is an example of the catalogue information model
being used by the catalogue system. A library catalogue, for example, usually consists of the
author, title, date of creation or publication, subject coverage, and call number specifying
the location of the item on the shelf. The structures, relationships, and definitions for
these queryable attributes, known as conceptual schemas, exist for multiple information
communities. For the purposes of interchange of information within an information community,
a metadata schema may be created that provides a common vocabulary which supports search,
retrieval, display, and association between the description and the object being described.

A catalogue system needs to reference an information model for collecting and manipulating the
metadata of the referenced entities catalogued in the system. The information model provides
specific ways for users to browse and search them. Besides the metadata information that
directly describes those referenced entities themselves, a catalogue might hold another type
of metadata information that describes the relationship between these entities.

Some catalogue services may only support one catalogue information model, each with the
conceptual schema clearly defined, while others can support more than one catalogue
information model. For example, in the US Geospatial Data Clearinghouse, the affiliated Z39.50
catalogue servers supported only the US Content Standard for Digital Geospatial Metadata
(CSDGM) in their initial development stage. In the OGC Catalogue Service base specification,
by contrast, the catalogue information model to be used is left undefined. Developers are
encouraged to propose their own catalogue information model as profiles. However, to
facilitate interoperability between diverse OGC-compliant catalogue service instances, a set
of core queryable parameters originating from Dublin Core is proposed in the base
specification and should preferably be supported in each catalogue service instance. OGC
further endorsed the OASIS ebRIM (e-Business Registry Information Model) as the preferred
basis for future profiles of OGC Catalogue (OGC).

How a catalogue information model can be formally discovered and described in a catalogue
service is another issue. Some catalogue services do not provide specific operations for
automatic discovery of the underlying catalogue information model, while others support
particular operations to fulfill this task. In the OGC Catalogue Services Specification, the
names of supported information model elements can be listed in the capabilities files, and a
mandatory DescribeRecord operation allows the client to discover elements of the information
model supported by the target catalogue service. This operation allows some or all of the
information model to be described.

Key Applications

The concept of the catalogue information model has been widely applied in many disciplines for
information management and retrieval. Common metadata standards are widely adopted as the
catalogue information model. Among them, the Dublin Core is one of the most referenced and
commonly used metadata information models for scientific catalogues. In the area of geographic
information science, ISO 19115 is being widely adopted as the catalogue information model for
facilitating the sharing of a large volume of geospatial datasets.

Dublin Core
The Dublin Core metadata standard is a simple yet effective element set for describing a wide
range of networked resources (Dublin Core). The "Dublin" in the name refers to Dublin, Ohio,
USA, where the work originated from
a workshop hosted by the Online Computer Library Center (OCLC), a library consortium which is
based there. The "Core" refers to the fact that the metadata element set is a basic but
expandable core list (Using Dublin Core).

The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements: title,
creator, subject, description, publisher, contributor, date, type, format, identifier, source,
language, relation, coverage, and rights. Each element is optional and may be repeated.

The Dublin Core Metadata Initiative (DCMI) continues the development of exemplary terms or
qualifiers that extend or refine these original 15 elements. Currently, the DCMI recognizes
two broad classes of qualifiers: element refinement and encoding scheme. Element refinement
makes the meaning of an element narrower or more specific. Encoding scheme identifies schemes
that aid in the interpretation of an element value.

There are many syntax choices for Dublin Core metadata, such as SGML, HTML, RDF/XML, and
key-value pair TXT file. In fact, the concepts and semantics of Dublin Core metadata are
designed to be syntax independent and are equally applicable in a variety of contexts.

Earth Science
With the advances in sensor and platform technologies, the Earth science community has
collected a huge volume of geospatial data in the past 30 years via remote sensing methods. To
facilitate the archival, management, and sharing of these massive geospatial data, the Earth
science community has been one of the pioneers in defining metadata standards and using them
as information models in building catalogue systems.

FGDC Content Standard for Digital Geospatial Metadata
The Federal Geographic Data Committee (FGDC) of the USA is a pioneer in setting geospatial
metadata standards for the US federal government. To provide a common set of terminology and
definitions for documenting digital geospatial data, FGDC initiated work on setting the
Content Standard for Digital Geospatial Metadata (CSDGM) in June of 1992 through a forum on
geospatial metadata. The first version of the standard was approved on June 8, 1994, by the
FGDC.

Since the issuance of Executive Order 12906, "Coordinating Geographic Data Acquisition and
Access: The National Spatial Data Infrastructure," by President William J. Clinton on April
11, 1994, this metadata standard has been adopted as the catalogue information model in
numerous geospatial catalogue systems operated by US federal, state, and local agencies as
well as companies and groups. It has also been used by other nations as they develop their own
national metadata standards.

In June of 1998, the FGDC approved the CSDGM version 2, which is fully backward compatible
with and supersedes the June 8, 1994, version. This version provides for the definition of
profiles (Appendix E) and extensibility through user-defined metadata extensions (Appendix D).
The June 1998 version also modifies some production rules to ease implementation.

The Content Standard for Digital Geospatial Metadata (CSDGM) (FGDC, CSDGM) identifies and
defines the metadata elements used to document digital geospatial data sets for many purposes,
including (1) preservation of the meaning and value of a dataset, (2) contribution to a
catalogue or clearinghouse, and (3) aid in data transfer. CSDGM groups the metadata
information into the following seven types:

• Identification_Information
• Data_Quality_Information
• Spatial_Data_Organization_Information
• Spatial_Reference_Information
• Entity_and_Attribute_Information
• Distribution_Information
• Metadata_Reference_Information

For each type, it further defines composed elements and their type, short name, and/or domain
information.

To provide a common terminology and set of definitions for documenting geospatial data
obtained by remote sensing, the FGDC defined the Extensions for Remote Sensing Metadata within
the framework of the June 1998 version of the
CSDGM (FGDC, Content Standard for Digital Geospatial Metadata). These remote sensing extensions
provide additional information particularly relevant to remote sensing: the geometry of the
measurement process, the properties of the measuring instrument, the processing of raw
readings into geospatial information, and the distinction between metadata applicable to an
entire collection of data and those applicable only to component parts. For that purpose,
these remote sensing extensions establish the names, definitions, and permissible values for
new data elements and the compound elements of which they are the components. These new
elements are placed within the structure of the base standard, allowing the combination of the
original standard and the new extensions to be treated as a single entity (FGDC, Content
Standard for Digital Geospatial Metadata).

ISO 19115
In May 2003, ISO published ISO 19115: Geographic Information - Metadata (ISO/TC 211). The
international standard was developed by ISO Technical Committee (TC) 211 as a result of
consensus among TC national members as well as its liaison organizations on geospatial
metadata. ISO 19115, rooted in FGDC CSDGM, provides a structure for describing digital
geographic data. Actual clauses of 19115 cover properties of the metadata: identification,
constraints, quality, maintenance, spatial representation (grid and vector), reference
systems, content (feature catalogue and coverage), portrayal, distribution, extensions, and
application schemas. Complex data types used to describe these properties include extent and
citations. ISO 19115 has been adopted by OGC as a catalogue information model in its Catalogue
Service for Web - ISO 19115 Profile (OGC). Figure 1 depicts the top-level UML model of the
metadata standard.

ISO 19115 defines more than 300 metadata elements (86 classes, 282 attributes, 56 relations).
The complex, hierarchical nested structure and relationships between the components are shown
using 16 UML diagrams.

To address the issue whether a metadata entity or metadata element shall always be documented
in the metadata or sometimes be documented, ISO 19115 defines a descriptor for each package
and each element. This descriptor may have the following values:

• M (mandatory)
• C (conditional)
• O (optional)

Mandatory (M) means that the metadata entity or metadata element shall be documented.

Conditional (C) specifies an electronically manageable condition under which at least one
metadata entity or a metadata element is mandatory. Conditions are defined in the following
three possibilities:

• Expressing a choice between two or more options. At least one option is mandatory and must
  be documented.
• Documenting a metadata entity or a metadata element if another element has been documented.
• Documenting a metadata element if a specific value for another metadata element has been
  documented. To facilitate reading by humans, plain text is used for the specific value.
  However, the code shall be used to verify the condition in an electronic user interface.

In short, if the answer to the condition is positive, then the metadata entity or the metadata
element shall be mandatory.

Optional (O) means that the metadata entity or the metadata element may be documented or may
not be documented. Optional metadata entities and optional metadata elements provide a guide
to those looking to fully document their data. If an optional entity is not used, the elements
contained within that entity (including mandatory elements) will also not be used. Optional
entities may have mandatory elements; those elements only become mandatory if the optional
entity is used.

ISO 19115 defines the core metadata that consists of a minimum set of metadata required to
serve the full range of metadata applications. All the core elements must be available in
[Figure 1: UML package diagram. Each package is stereotyped <<Leaf>>: Constraint information;
Content information; Citation and responsible party information; Portrayal catalogue
information; Distribution information; Maintenance information; Metadata entity set
information; Metadata extension information; Application schema information; Identification
information; Data quality information; Reference system information; Spatial representation
information; Units of Measure (from Derived); Extent information.]

Catalogue Information Model, Fig. 1 ISO 19115 metadata UML package

a given metadata system. The optional ones need not be instantiated in a particular dataset.
These 22 metadata elements are shown in Table 1.

Currently, ISO is developing ISO 19115-2, which extends ISO 19115 for imagery and gridded
data. Similar to the FGDC efforts and using FGDC CSDGM Extensions for Remote Sensing Metadata
as its basis, ISO 19115-2 will define metadata elements particularly for imagery and gridded
data within the framework of ISO 19115. According to the ISO TC 211 program of work, the final
CD was posted in March 2007; barring major objection, it will be published as DIS in June
2007.

US NASA ECS Core Metadata Standard
To enable an improved understanding of the Earth as an integrated system, in 1992, the
National Aeronautics and Space Administration (NASA) of the USA started the Earth Observing
System (EOS) program, which coordinates efforts to study the Earth as an integrated system.
This program, using spacecraft, aircraft, and ground instruments, allows humans to better
understand climate and environmental changes and to distinguish between natural and
human-induced changes. The EOS program includes a series of satellites, a science component,
and a data system for long-term global observations of the land surface, biosphere, solid
Earth, atmosphere,
Catalogue Information Model, Table 1 ISO 19115 core metadata
(M = mandatory, C = conditional, O = optional)

Dataset title (M)
  MD_Metadata > MD_DataIdentification.citation > CI_Citation.title
Dataset reference date (M)
  MD_Metadata > MD_DataIdentification.citation > CI_Citation.date
Dataset responsible party (O)
  MD_Metadata > MD_DataIdentification.pointOfContact > CI_ResponsibleParty
Geographic location of the dataset (by four coordinates or by geographic identifier) (C)
  MD_Metadata > MD_DataIdentification.extent > EX_Extent > EX_GeographicExtent >
  EX_GeographicBoundingBox or EX_GeographicDescription
Dataset language (M)
  MD_Metadata > MD_DataIdentification.language
Dataset character set (C)
  MD_Metadata > MD_DataIdentification.characterSet
Dataset topic category (M)
  MD_Metadata > MD_DataIdentification.topicCategory
Spatial resolution of the dataset (O)
  MD_Metadata > MD_DataIdentification.spatialResolution > MD_Resolution.equivalentScale or
  MD_Resolution.distance
Abstract describing the dataset (M)
  MD_Metadata > MD_DataIdentification.abstract
Distribution format (O)
  MD_Metadata > MD_Distribution > MD_Format.name and MD_Format.version
Additional extent information for the dataset (vertical and temporal) (O)
  MD_Metadata > MD_DataIdentification.extent > EX_Extent > EX_TemporalExtent or
  EX_VerticalExtent
Spatial representation type (O)
  MD_Metadata > MD_DataIdentification.spatialRepresentationType
Reference system (O)
  MD_Metadata > MD_ReferenceSystem
Lineage (O)
  MD_Metadata > DQ_DataQuality.lineage > LI_Lineage
Online resource (O)
  MD_Metadata > MD_Distribution > MD_DigitalTransferOption.onLine > CI_OnlineResource
Metadata file identifier (O)
  MD_Metadata.fileIdentifier
Metadata standard name (O)
  MD_Metadata.metadataStandardName
Metadata standard version (O)
  MD_Metadata.metadataStandardVersion
Metadata language (C)
  MD_Metadata.language
Metadata character set (C)
  MD_Metadata.characterSet
Metadata point of contact (M)
  MD_Metadata.contact > CI_ResponsibleParty
Metadata date stamp (M)
  MD_Metadata.dateStamp

and oceans. The program aims at accumulating extent, and content. The ECS Core Metadata
15 years of Earth observation data at a rate Standard has been used as the catalogue infor-
of over 2 terabytes per day. To support data mation model for EOSDIS Data Gateway (EDG)
archival, distribution, and management, NASA and the EOS ClearingHOuse (ECHO). The ECS
has developed an EOS Data and Information Core Metadata Standard was the basis for the
System (EOSDIS) and its core system (ECS), development of FGDC CSDGM Extensions for
the largest data and information system for Earth Remote Sensing Metadata.
observation in the world. With new satellites being launched and instru-
In order to standardize the descriptions of ments being operational, this standard will incor-
data collected by the EOS program, NASA has porate new keywords from them into the new ver-
developed the ECS Core Metadata Standard. The sion. The current version is 6B, released on Octo-
standard de nes metadata in several areas: al- ber 2002. The 6B version is logically segmented
gorithm and processing packages, data sources, into eight modules for the purpose of readability,
references, data collections, spatial and temporal including data originator, ECS collection, ECS
Catalogue Information Model, Fig. 2 The high-level UML model of ebRIM (OASIS). [UML class diagram relating RegistryObject and RegistryEntry to RegistryPackage, ExternalLink, ExternalIdentifier, Slot, Association, Classification, ClassificationScheme, ClassificationNode, AuditableEvent, User, Organization, PostalAddress, EmailAddress, TelephoneNumber, Service, ServiceBinding, and SpecificationLink]

data granule, locality spatial, locality temporal, contact, delivered algorithm package, and document (ECS).

ebRIM

The ebXML Registry Information Model (ebRIM) was developed by OASIS to specify the information model for the ebXML registry (OASIS). The goal of the ebRIM specification is to communicate what information is in the registry and how that information is organized. The high-level view of the information model is shown in Fig. 2.
ebRIM has been widely used in the world of e-business web services as the standardized information model for service registries. In the geospatial community, OGC has adopted the ebRIM specification as one of the application profiles of its Catalogue Service for Web Specification (OGC). And this model has been further approved as the preferred meta-model for future OGC CS-W Catalogue Application Profiles in the OGC technical meeting in December 2007 (OGC).

Future Directions

Catalogue information models define the organization of metadata in the catalogue. Each catalogue system normally adopts a metadata standard as its catalogue information model. With the wide acceptance of the concept of metadata, there are usually multiple related metadata standards used by different catalogue systems in the same application domain. Metadata crosswalks are needed to build maps between two metadata elements and/or attributes so that the interoperability among the legacy catalogue systems becomes possible. Besides these crosswalks, another direction is to organize a formal representation of every metadata concept into a geospatial ontology for enabling semantic interoperability between catalogues.
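As a rough illustration of what a metadata crosswalk does, the sketch below maps a flat record from one standard onto element paths of another. The pairing of Dublin Core element names with ISO 19115 core element paths is an assumption made for this example only, not an official crosswalk, and the function name `crosswalk` is hypothetical.

```python
# Illustrative crosswalk only: the pairing of Dublin Core element names
# with ISO 19115 core element paths below is an assumption for this sketch.
DC_TO_ISO19115 = {
    "title": "MD_Metadata > MD_DataIdentification.citation > CI_Citation.title",
    "date": "MD_Metadata > MD_DataIdentification.citation > CI_Citation.date",
    "description": "MD_Metadata > MD_DataIdentification.abstract",
    "language": "MD_Metadata > MD_DataIdentification.language",
}


def crosswalk(record, mapping):
    """Translate a flat source record into target element paths.

    Returns the translated record and the list of source elements
    that have no counterpart in the mapping.
    """
    translated = {}
    unmapped = []
    for element, value in record.items():
        target = mapping.get(element)
        if target is None:
            unmapped.append(element)  # no equivalent element in the target model
        else:
            translated[target] = value
    return translated, unmapped


source_record = {"title": "Lichen map 1998", "language": "en", "creator": "unknown"}
translated, unmapped = crosswalk(source_record, DC_TO_ISO19115)
```

Elements reported as unmapped are the cases where a plain element-to-element map is not enough, which is where an ontology-based approach would come in.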
In addition to the catalogue information models, there are other necessary parts to compose an operational catalogue service, such as query language, query and response protocol, etc. Related research has been directed in the OGC, FGDC, and other agencies.

References

Dublin Core, http://en.wikipedia.org/wiki/Dublin_Core. Accessed 13 Sept 2007
ECS, Release 6B Implementation Earth Science Data Model for the ECS Project. http://spg.gsfc.nasa.gov/standards/heritage/eosdis-core-system-data-model. Accessed 13 Sept 2007
FGDC, CSDGM. http://www.fgdc.gov/standards/standards_publications/index_html. Accessed 13 Sept 2007
FGDC, Content Standard for Digital Geospatial Metadata, Extensions for Remote Sensing Metadata. http://www.fgdc.gov/standards/standards_publications/index_html. Accessed 13 Sept 2007
ISO/TC 211, ISO 19115 geographic information: metadata
OASIS, ebXML Registry Information Model (ebRIM). http://www.oasis-open.org/committees/regrep/documents/2.0/specs/ebrim.pdf. Accessed 13 Sept 2007
OGC, OGC Catalogue Services Specification, OGC 04-021r3. http://portal.opengeospatial.org/files/?artifact_id=5929&Version=2. Accessed 13 Sept 2007
OGC, OpenGIS® Catalogue Services Specification 2.0 ISO19115/ISO19119 Application Profile for CSW 2.0. OGC 04-038r2. https://portal.opengeospatial.org/files/?artifact_id=8305. Accessed 13 Sept 2007
OGC, EO Products Extension Package for ebRIM (ISO/TS 15000-3) Profile of CSW 2.0. OGC 06-131. http://portal.opengeospatial.org/files/?artifact_id=17689. Accessed 13 Sept 2007
OGC, OGC Adopts ebRIM for Catalogues. http://www.opengeospatial.org/pressroom/pressreleases/655. Accessed 13 Sept 2007
Using Dublin Core, http://www.dublincore.org/documents/2001/04/12/usageguide/. Accessed 13 Sept 2007

Catalogue Information Schema

▶ Catalogue Information Model

Catalogue Metadata Schema

▶ Catalogue Information Model

Category, Geographic

▶ Geospatial Semantic Integration

Central Perspective

▶ Photogrammetric Methods

Central Projection

▶ Photogrammetric Methods

Centrographic Measures

▶ CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents

CGI

▶ Web Mapping and Web Cartography

CGIS

▶ Geocollaboration

Chain

▶ Spatial Data Transfer Standard (SDTS)

Chain of Space-Time Prisms

▶ Space-Time Prism Model

Change Detection

Jérôme Théau
GIS Training and Research Center, Idaho State University, Pocatello, ID, USA

Synonyms

Detection of changes; Digital change detection methods; Land cover change detection

Definition

Change Detection can be defined as the process of identifying differences in the state of an object or phenomenon by observing it at different times (Singh 1989). This process is usually applied to earth surface changes at two or more times. The primary source of data is geographic and is usually in digital format (e.g., satellite imagery), analog format (e.g., aerial photos), or vector format (e.g., feature maps). Ancillary data (e.g., historical, economic, etc.) can also be used.

Historical Background

Change detection history starts with the history of remote sensing and especially the first aerial photography taken in 1859 by Gaspard Felix Tournachon, also known as Nadar. Thereafter, the development of change detection is closely associated with military technology during World Wars I and II and the strategic advantage provided by temporal information acquired by remote sensing. Civilian applications of change detection were developed following these events in the twentieth century using mostly interpretation and analog means. However, civilian availability of data was limited until the 1970s and 1980s due to military classification of imagery.
The digital change detection era really started with the launch of Landsat-1 (first called Earth Resources Technology Satellite) in July 1972. The regular acquisition of digital data of the earth surface in multispectral bands allowed scientists to get relatively consistent data over time and to characterize changes over relatively large areas for the first time. The continuity of this mission as well as the launch of numerous other ones ensured the development of change detection techniques from that time.
However, the development of digital change detection techniques was limited by data processing technology capacities and followed closely the development of computer technologies. The situation evolved from the 1960s, when a few places in the world were equipped with expensive computers, to the present, when personal computers are fast and cheap enough to apply even complex algorithms and change detection techniques to satellite imagery. The computer technology also evolved from dedicated hardware to relatively user-friendly software specialized for image processing and change detection.
Based on published literature, algebra techniques such as image differencing or image ratioing were the first techniques used to characterize changes in digital imagery during the 1970s (Lunetta and Elvidge 1998). These techniques are simple and fast to perform and are still widely used today. More complex techniques have been developed since then with the improvement of processing capacities but also with the development of new theoretical approaches. Change detection analysis of the earth surface is a very active topic due to the concerns about consequences of global and local changes. This field of expertise is constantly progressing.

Scientific Fundamentals

Changes on Earth Surface
The earth surface is changing constantly in many ways. First, the time scales, at which changes can occur, are very heterogeneous. They may vary from catastrophic events (e.g., flood) to geological events (e.g., continental drift) which correspond to a gradient between punctual and continuous changes respectively. Secondly, the spatial scales, at which changes can occur, are
also very heterogeneous and may vary from local events (e.g., road construction) to global changes (e.g., ocean water temperature). Due to this very large spatio-temporal range, the nature and extent of changes are complex to determine because they are interrelated and interdependent at different scales (spatial, temporal). Change detection is, therefore, a challenging task.

Imagery Characteristics Regarding Changes
Since the development of civilian remote sensing, the earth benefits from a continuous and increasing coverage by imagery such as aerial photography or satellite imagery. This coverage is ensured by various sensors with various properties. First, in terms of the time scale, various temporal resolutions (i.e., revisit time) and mission continuities allow coverage of every point of the earth from days to decades. Secondly, in terms of the spatial scale, various spatial resolutions (i.e., pixel size, scene size) allow coverage of every point of the earth at a sub-meter to a kilometer resolution. Thirdly, sensors are designed to observe the earth surface using various parts of the electromagnetic spectrum (i.e., spectral domain) at different resolutions (i.e., spectral resolution). This diversity allows the characterization of a large spectrum of earth surface elements and change processes. However, change detection is still limited by data availability and data consistency (i.e., multisource data).

Changes in Imagery
Changes in imagery between two dates translate into changes in radiance. Various factors can induce changes in radiance between two dates, such as changes in sensor calibration, solar angle, atmospheric conditions, seasons, or earth surface. The first premise of using imagery for change detection of the earth surface is that change in the earth surface must result in a change in radiance values. Secondly, the change in radiance due to earth surface changes must be large compared to the change in radiance due to other factors. A major challenge in change detection of the earth surface using imagery is to minimize these other factors. This is usually performed by carefully selecting relevant multidate imagery and by applying pre-processing treatments.

Data Selection and Pre-processing
Data selection is a critical step in change detection studies. The acquisition period (i.e., season, month) of multidate imagery is an important parameter to consider in image selection because it is directly related to phenology, climatic conditions, and solar angle. A careful selection of multidate images is therefore needed in order to minimize the effects of these factors. In vegetation change studies (i.e., over different years), for example, summer is usually used as the target period because of the relative stability of phenology, solar angle, and climatic conditions. The acquisition interval between multidate imagery is also important to consider. As mentioned before, earth surface changes must cause enough radiance changes to be detectable. However, the data selection is often limited by data availability and the choice is usually a compromise between the targeted period, interval of acquisition, and availability. The cost of imagery is also a limiting factor in data selection.
However, a careful data selection is usually not enough to minimize radiometric heterogeneity between multidate images. First, atmospheric conditions and solar angle differences usually need additional corrections and secondly other factors such as sensor calibration or geometric distortions need to be considered. In change detection analysis, multidate images are usually compared on a pixel basis. Very accurate registration therefore needs to be performed between images in order to compare pixels at the same locations. Misregistration between multidate images can cause significant errors in change interpretation. The sensitivity of change detection approaches to misregistration is variable though. The minimization of radiometric heterogeneity (due to sources other than earth surface change) can be performed using different approaches depending on the level of correction required and the availability of atmospheric data. The techniques such as dark object subtraction, relative
radiometric normalization or radiative transfer code can be used.

Change Detection Methods
Summarized here are the most common methods used in change detection studies (Singh 1989; Lunetta and Elvidge 1998; Coppin et al. 2004; Lu et al. 2004; Mas 1999). Most of these methods use image processing approaches applied to multidate satellite imagery.

Image differencing: This simple method is widely used and consists of subtracting registered images acquired at different times, pixel by pixel and band by band. No changes between times result in pixel values of 0, but if changes occurred these values should be positive or negative (Fig. 1). However, in practice, exact image registration and perfect radiometric corrections are never obtained for multidate images. Residual differences in radiance not caused by land cover changes are still present in images. The challenge of this technique is therefore to identify threshold values of change and no-change in the resulting images. Standard deviation is often used as a reference value to select these thresholds. Different normalization, histogram matching, and standardization approaches are used on multidate images to reduce scale and scene dependent effects on differencing results. The image differencing method is usually applied to single bands but can also be applied to processed data such as multidate vegetation indices or principal components.

Image ratioing: This method is comparable to the image differencing method in terms of its simplicity and challenges. However, it is not as widely used. It is a ratio of registered images acquired at different times, pixel by pixel and band by band. Changes are represented by pixel values higher or lower than 1 (Fig. 1). Pixels with no change will have a value of one. In practice, for the same reasons as in image differencing, the challenge of this technique is in selecting threshold values between change and no change. This technique is often criticized because the non-normal distribution of results limits the validity of threshold selection using the standard deviation of resulting pixels.

Change Detection, Fig. 1 Example of image differencing and image ratioing procedures

Post-classification: This method is also commonly referred to as "Delta classification". It is widely used and easy to understand. Two images acquired at different times are independently classified and then compared. Ideally, similar thematic classes are produced for each classification. Changes between the two dates can be visualized using a change matrix indicating, for both dates, the number of pixels in each class (Fig. 2). This matrix allows one to interpret what changes occurred for a specific class. The main advantage of this method is the minimal impact of radiometric and geometric differences between multidate images. However, the accuracy of the final result is the product of the accuracies of the two independent classifications (e.g., 64 % final accuracy for two 80 % independent classification accuracies, since 0.80 × 0.80 = 0.64).

Change Detection, Fig. 2 Example of a post-classification procedure

Direct multidate classification: This method is also referred to as "Composite analysis", "Spectral-temporal combined analysis", "Spectral-temporal change classification", "Multidate clustering", or "Spectral change pattern analysis". Multidate images are combined into a single dataset on which a classification is performed (Fig. 3). The areas of changes are expected to present different statistics (i.e., distinct classes) compared to the areas with no changes. The approach can be unsupervised or supervised and necessitates only one classification procedure. However, this method usually produces numerous classes corresponding to spectral changes within each single image but also to temporal changes between images. The interpretation of results is often complex and requires a good knowledge of the study area. Combined approaches using principal component analysis or a Bayesian classifier can be performed to reduce data dimensionality or the coupling between spectral and temporal change respectively.

Change Detection, Fig. 3 Example of direct multidate classification and linear transformation procedures

Linear transformations: This approach includes different techniques using the same theoretical basis. The Principal Component Analysis (PCA) and the Tasseled-Cap transformations are the most common ones. Linear transformations are often used to reduce spectral data dimensionality by creating fewer new components. The first components contain most of the variance in the data and are uncorrelated. When used for change detection purposes, linear transformations are performed on multidate images that are combined as a single dataset (Fig. 3).
After performing a PCA, unchanged areas are mapped in the first component (i.e., information common to multidate images) whereas areas of changes are mapped in the last components (i.e., information unique to either one of the different dates). Usually the PCA is calculated from a variance/co-variance matrix. However, a standardized matrix (i.e., correlation matrix) is also used. The PCA is scene dependent and results can be hard to interpret. The challenging steps are to label changes from principal components and to select thresholds between change and no-change areas. A good knowledge of the study area is required.
The Tasseled-Cap is also a linear transformation. However, unlike PCA, it is independent of the scene. The new component directions are selected according to pre-defined spectral properties of vegetation. Four new components are computed and oriented to enhance brightness, greenness, wetness, and yellowness. Results are also difficult to interpret and change labeling is challenging. Unlike PCA, Tasseled-Cap transformation for change detection requires accurate atmospheric calibration of multidate imagery.
Other transformations such as multivariate alteration detection or Gram-Schmidt transformation were also developed but used to a lesser extent.

Change vector analysis: This approach is based on the spatial representation of change in a spectral space. When a pixel undergoes a change between two dates, its position in n-dimensional spectral space is expected to change. This change is represented by a vector (Fig. 4) which is defined by two factors, the direction which provides information about the nature of change and the magnitude which provides information about the level of change. This approach has the advantage of processing any number of spectral bands concurrently. It also provides detailed information about change. The challenging steps are to define thresholds of magnitude, discriminating between change and no change, and to interpret vector direction in relation with the nature of change. This approach is often performed on transformed data using methods such as Tasseled-Cap.

Image regression: This approach assumes that there is a linear relationship between pixel values of the same area at two different times. This implies that a majority of the pixels did not encounter changes between the two dates (Fig. 5). A regression function that best describes the relationship between pixel values of each spectral band at two dates is developed. The residuals of the regression are considered to represent the areas of changes. This method has the advantage of reducing the impact of radiometric heterogeneity (i.e., atmosphere, sun angle, sensor calibration) between multidate images. However, the challenging steps are to select an appropriate regression function and to define thresholds between change and no change areas.

Multitemporal spectral mixture analysis: The spectral mixture analysis is based on the premise that a pixel reflectance value can be computed from individual values of its composing elements (i.e., end-members) weighted by their respective proportions. This case assumes a linear mixing of these components. This method allows retrieving sub-pixel information (i.e., surface proportions of end-members) and can be used for change detection purposes by performing separate analysis and comparing results at different dates (Fig. 6). The advantage of this method is to provide precise and repeatable results. The challenging step of this approach is to select suitable end-members.
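The per-pixel arithmetic behind several of the methods above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the threshold factor `k`, the array layouts, and the unconstrained least-squares unmixing are all choices made for this example, and real studies would first register and radiometrically correct the images.

```python
import numpy as np


def difference_mask(t1, t2, k=2.0):
    """Image differencing on one band: flag pixels whose difference lies
    more than k standard deviations from the mean difference.
    The factor k is an assumed, user-chosen threshold."""
    diff = t2.astype(float) - t1.astype(float)
    return np.abs(diff - diff.mean()) > k * diff.std()


def change_vectors(t1, t2):
    """Change vector analysis for images shaped (bands, rows, cols).
    Returns the per-pixel magnitude (level of change) and the per-band
    change components (which carry the direction of change)."""
    delta = t2.astype(float) - t1.astype(float)
    magnitude = np.sqrt((delta ** 2).sum(axis=0))
    return magnitude, delta


def regression_residuals(t1, t2):
    """Image regression on one band: fit t2 = a * t1 + b over all pixels
    and return the residuals, treated as candidate change areas."""
    x = t1.astype(float).ravel()
    y = t2.astype(float).ravel()
    a, b = np.polyfit(x, y, 1)  # slope and intercept of the linear fit
    return (y - (a * x + b)).reshape(t1.shape)


def unmix(pixel, endmembers):
    """Linear spectral unmixing: least-squares fractions f such that
    pixel is approximately endmembers @ f, with endmembers shaped
    (bands, n_endmembers). Unconstrained sketch: fractions are not
    forced to be non-negative or to sum to one."""
    fractions, *_ = np.linalg.lstsq(endmembers, pixel, rcond=None)
    return fractions
```

In each case the remaining difficulty is the one the text names: choosing thresholds on the difference, magnitude, or residual images, and choosing suitable end-members for the unmixing.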
Change Detection, Fig. 4 Example and principle of the change vector procedure

Combined approaches: The previous techniques represent the most common approaches used for change detection purposes. They can be used individually, but are often combined together or with other image processing techniques to provide more accurate results. Numerous combinations can be used and they will not be described here. Some of them include the combination of vegetation indices and image differencing, change vector analysis and principal component analysis, direct multidate classification and principal component analysis, multitemporal spectral analysis and image differencing, or image enhancement and post-classification.

Example of change detection analysis: Mapping changes in caribou habitat using multitemporal spectral mixture analysis: The George River Caribou Herd (GRCH), located in northeastern Canada, increased from about 5,000 in the 1950s to about 700,000 head in the 1990s. This has led to an over-utilization of summer habitat, resulting in degradation of the vegetation cover. This degradation has had a direct impact on health problems observed in the caribou (Rangifer tarandus) population over the last few years and may also have contributed to the recent decline of the GRCH (404,000 head in 2000-2001). Lichen habitats are good indicators of caribou herd activity because of their sensitivity to overgrazing and overtrampling, their widespread distribution over northern territories, and their influence on herd nutrition. The herd range covers a very large territory which is not easily accessible. As a result, field studies over the whole territory are limited and aerial surveys cannot be conducted frequently. Satellite imagery offers the synoptic view and temporal resolution necessary for mapping and monitoring caribou habitat. In this example, a change detection approach using Landsat imagery was used. The procedure was based on spectral mixture analysis and produced maps showing the lichen proportion inside each pixel. The procedure was applied to multidate imagery to monitor the spatio-temporal evolution of the lichen resource over the past three decades
and gave new information about the habitat used by the herd in the past, which was very useful to better understand population dynamics. Figure 6 summarizes the approach used in this study and illustrates the steps typical of a change detection procedure.

Change Detection, Fig. 5 Example and principle of the image regression procedure. [Raster data covering the exact same location (e.g., Digital Number Band a) at Time x and Time y; two scatterplots of pixel values at Time y against pixel values at Time x: a theoretical situation without any changes in pixel values between the two dates, and a realistic situation with various changes in pixel values between the two dates]

Key Applications

The earth surface is changing constantly in many ways. Changes occur at various spatial and temporal scales in numerous environments. Change detection techniques are employed for different purposes such as research, management, or business (Lunetta and Elvidge 1998; Canada Centre for Remote Sensing; Diversitas; ESA; Global Change Master Directory; IGBP; IHDP; WCRP). Monitoring changes using GIS and remote sensing is therefore used in a wide field of applications. A non-exhaustive list of key applications is presented here.

Forestry
Deforestation (e.g., clear cut mapping, regeneration assessment)
Fire monitoring (e.g., delineation, severity, detection, regeneration)
Logging planning (e.g., infrastructures, inventory, biomass)
Herbivory (e.g., insect defoliation, grazing)
Habitat fragmentation (e.g., landcover changes, heterogeneity)
Change Detection, Fig. 6 Example of a change detection procedure. Case study of mapping changes in caribou habitat using multitemporal spectral mixture analysis. [Flow chart with the following steps]
Image selection: sensors selected are Landsat Thematic Mapper and Multispectral Scanner (MSS); targeted years are 1998, 1988, and 1978; targeted period is the end of August (to minimize phenological effects); limitations are cloud-free scenes and Landsat MSS availability.
Image registration: relative registration of the image to correct (e.g., 1988) to the master image (e.g., 1998).
Radiometric and atmospheric corrections: radiometric normalization of scenes using statistical selection of pseudo-invariant features in the area common to the master image and the image to correct.
Multitemporal analyses: multitemporal spectral mixture analysis. The Digital Number (DN) of a mixed pixel in channel c is represented as
$DN_{c,\text{mixed pixel}} = (DN_c \times \text{Fraction})_{\text{Lichen}} + (DN_c \times \text{Fraction})_{\text{Canopy}} + (DN_c \times \text{Fraction})_{\text{Shadow}} + \text{Error}_c$
Spectral mixture analysis provides for each pixel a lichen fraction, a canopy fraction, and a shadow fraction, giving lichen maps (0 to 100 % lichen) for 1978, 1988, and 1998.
Image differencing: change detection results for lichen fractions between 1978 and 1998, mapped as increased or decreased % lichen.
For more details see: Théau and Duguay (2004) Mapping Lichen Habitat Changes inside the Summer Range of the George River Caribou Herd (Québec-Labrador, Canada) using Landsat Imagery (1976-1998). Rangifer. 24: 31-50.

Agriculture and Rangelands
Crop monitoring (e.g., growing, biomass)
Invasive species (e.g., detection, distribution)
Soil moisture condition (e.g., drought, flood, landslides)
Desertification assessment (e.g., bare ground exposure, wind erosion)

Urban
Urban sprawl (e.g., urban mapping)
Transportation and infrastructure planning (e.g., landcover use)

Ice and Snow
Navigation route (e.g., sea ice motion)
Infrastructure protection (e.g., flooding monitoring)
Glacier and ice sheet monitoring (e.g., motion, melting)
Permafrost monitoring (e.g., surface temperature, tree line)

Ocean and Coastal
Water quality (e.g., temperature, productivity)
Aquaculture (e.g., productivity)
Intertidal zone monitoring (e.g., erosion, vegetation mapping)
Oil spill (e.g., detection, oil movement)

Future Directions

In the past decades, a constant increase of remotely sensed data availability was observed. The launch of numerous satellite sensors as well as the reduction of product costs can explain this trend. The same evolution is expected in the future. The access to constantly growing archive contents also represents a potential for the development of more change detection studies in the future. Long-term missions such as Landsat, SPOT (Satellite pour l'Observation de la Terre), and AVHRR (Advanced Very High Resolution Radiometer) have now provided continuous data for more than 20-30 years. Although radiometric heterogeneity between sensors represents a serious limitation in time series analysis, these data are still very useful for long term change studies. These data are particularly suitable in the development of temporal trajectory analysis which usually involves the temporal study of indicators (e.g., vegetation indices, surface temperature) on a global scale.
Moreover, as mentioned before in the Historical Background section, the development of change detection techniques is closely linked with the development of computer technologies and data processing capacities. In the future, these fields will still evolve in parallel and new developments in change detection are expected with the development of computer technologies.
Developments and applications of new image processing methods and geospatial analysis are also expected in the next decades. Artificial intelligence systems as well as knowledge-based expert systems and machine learning algorithms represent new alternatives in change detection studies (Coppin et al. 2004). These techniques have gained considerable attention in the past few years and their use in change detection approaches is expected to increase in the future. One of the main advantages of these techniques is that they allow the integration of existing knowledge and non-spectral information of the scene content (e.g., socio-economic data, shape, and size data). With the increasing interest in using integrated approaches such as coupled human-environment systems, these developments look promising.
The recent integration of change detection and spatial analysis modules in most GIS software also represents a big step towards integrated tools in the study of changes on the earth surface. This integration also includes an improvement of compatibility between image processing software and GIS software. More developments are expected in the future which will provide new tools for integrating multisource data more easily (e.g., digital imagery, hard maps, historical information, vector data).

Cross-References

▶ Co-location Pattern Discovery
▶ Correlation Queries in Spatial Time Series Data
▶ Spatiotemporal Change Footprint Pattern Discovery

References

Canada Centre for Remote Sensing, http://ccrs.nrcan.gc.ca/index_e.php. Accessed Nov 2006
Coppin P, Jonckheere I, Nackaerts K, Muys B, Lambin E (2004) Digital change detection methods in ecosystem monitoring: a review. Int J Remote Sens 25(9):1565–1596
Diversitas – Integrating biodiversity science for human well-being, http://www.diversitas-international.org/. Accessed Nov 2006
ESA – Observing the Earth, http://www.esa.int/esaEO/index.html. Accessed Nov 2006
Global Change Master Directory, http://gcmd.nasa.gov/index.html. Accessed Nov 2006
IGBP – International Geosphere-Biosphere Programme, http://www.igbp.net/. Accessed Nov 2006
IHDP – International Human Dimensions Programme on Global Environmental Change, http://www.ihdp.org/. Accessed Nov 2006
Lu D, Mausel P, Brondízio E, Moran E (2004) Change detection techniques. Int J Remote Sens 25(12):2365–2407
Lunetta RS, Elvidge CD (1998) Remote sensing change detection: environmental monitoring methods and applications. Ann Arbor Press, Chelsea, p 318
Mas J-F (1999) Monitoring land-cover changes: a comparison of change detection techniques. Int J Remote Sens 20(1):139–152
Singh A (1989) Digital change detection techniques using remotely-sensed data. Int J Remote Sens 10(6):989–1003
WCRP – World Climate Research Programme, http://wcrp.wmo.int/. Accessed Nov 2006

Change of Support Problem

Error Propagation in Spatial Prediction

Channel Modeling and Algorithms for Indoor Positioning

Muzaffer Kanaan, Bardia Alavi, Ahmad Hatami, and Kaveh Pahlavan
Center for Wireless Information Network Studies, Worcester Polytechnic Institute, Worcester, MA, USA

Synonyms

Indoor geolocation; Indoor location estimation; Indoor position estimation

Definition

One of the new frontiers in wireless networking research is location awareness. Knowledge of a user's location enables a number of location-based services (LBS) to be delivered to that user. However, while this problem has been largely addressed for the outdoor environment, indoor positioning is an open area of research. Here, the problem of accurate indoor positioning is discussed, and the current state of the art in accurate position estimation techniques is reviewed.

Historical Background

Serious research in the field of positioning first began in the 1960s, when several US government agencies, including the Department of Defense (DoD), National Aeronautics and Space Administration (NASA), and the Department of Transportation (DOT) expressed interest in developing systems for position determination (Kaplan 1996). The result, known as the Global Positioning System (GPS), is the most popular positioning system in use today. Activity in this area continued after cellular networks flourished in the 1990s, driven largely by regulatory requirements for position estimation, such as E-911.

While these developments were taking place, similar research and development activity started in the field of indoor positioning, due to emerging applications in the commercial as well as public safety/military areas. In the commercial space, indoor positioning is needed for applications such as tracking people with special needs (such as people who are sight impaired), as well as locating equipment in warehouses and hospitals. In the public safety and military space, very accurate indoor positioning is required to help emergency workers as well as military personnel effectively complete their missions inside buildings. Some of these applications also require simple, low-power user terminals such as those that might be found in ad hoc sensor networks.

Positioning techniques developed for GPS and cellular networks generally do not work well
in indoor areas, owing to the large amount of signal attenuation caused by building walls. In addition, the behavior of the indoor radio channel is very different from the outdoor case, in that it exhibits much stronger multipath characteristics. Therefore, new methods of position estimation need to be developed for the indoor setting. In addition, the accuracy requirements of indoor positioning systems are typically a lot higher. For an application such as E-911, an accuracy of 125 m for 67 % of the time is considered acceptable (FCC 1996), while a similar indoor application typically requires an accuracy level on the order of only a few meters (Sayed et al.). In the next few sections, an overview of positioning techniques is provided for the indoor environment.

[Figure 1: block diagram. Reference points RP #1 to RP #N each pass a location metric for the sensor to the positioning algorithm, which takes the location metrics from the network and outputs the sensor position estimate.]
Channel Modeling and Algorithms for Indoor Positioning, Fig. 1 General structure of an indoor geolocation system. RP reference point

Scientific Fundamentals

Structure of a Positioning System
The basic structure of a positioning system is illustrated in Fig. 1, where a sensor (whose location is to be determined) is shown. The system consists of two parts: reference points (RPs) and the positioning algorithm. The RPs are radio transceivers, whose locations are assumed to be known with respect to some coordinate system.
Each RP measures various characteristics of the signal received from the sensor, which is referred to in this entry as location metrics. These location metrics are then fed into the positioning algorithm, which then produces an estimate of the location of the sensor.

The location metrics are of three main types:

Angle of arrival (AOA)
Time of arrival (TOA)
Received signal strength (RSS)

This section is organized in four subsections; in the first three, each of these location metrics is discussed in greater detail, while the last is devoted to a nonexhaustive survey of position estimation techniques using these metrics.

Angle of Arrival
As its name implies, AOA gives an indication of the direction the received signal is coming from. In order to estimate the AOA, the RPs need to be equipped with special antennae arrays. Figure 2 shows an example of the AOA estimation in an ideal nonmultipath environment. The two RPs measure the AOAs from the sensor as 78.3° and 45°, respectively. These measurements are then used to form lines of position, the intersection of which is the position estimate.

In real-world indoor environments, however, multipath effects will generally result in AOA estimation error. This error can be expressed as

$$\hat{\theta} = \theta_{\mathrm{true}} \pm \beta \tag{1}$$

where $\theta_{\mathrm{true}}$ is the true AOA value, generally obtained when the sensor is in the line-of-sight (LOS) path from the RP. In addition, $\hat{\theta}$ represents the estimated AOA, and $\beta$ is the AOA estimation error. As a result of this error, the sensor position is restricted over an area defined with an angular spread of $2\beta$, as illustrated in Fig. 3 below for the two-RP scenario. This clearly illustrates that in order to use AOA for indoor positioning, the sensor has to be in the LOS path to the RP, which is generally not possible.

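[Figure 2: two reference points, RP1 and RP2, with AOAs of 78.3° and 45.0° toward the sensor S.]
Channel Modeling and Algorithms for Indoor Positioning, Fig. 2 Illustration of angle of arrival (AOA). S sensor

[Figure 3: the same two reference points, RP1 and RP2, with AOAs of 78.3° and 45.0° toward the sensor S, in the presence of multipath.]
Channel Modeling and Algorithms for Indoor Positioning, Fig. 3 Illustration of AOA in the presence of multipath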
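
As a minimal sketch of how two AOA measurements form lines of position, the Python snippet below intersects the two bearing lines and then perturbs each bearing by an assumed error to show how far the estimate can move. The RP coordinates, the sensor location, the 5° error, and the function names are illustrative assumptions, not values from this entry; angles are taken counterclockwise from the +x axis.

```python
import math
from itertools import product


def aoa_intersection(rp1, theta1_deg, rp2, theta2_deg):
    """Intersect two lines of position.

    Each angle is the bearing from an RP toward the sensor, measured
    counterclockwise from the +x axis (an assumed convention).
    """
    t1, t2 = math.radians(theta1_deg), math.radians(theta2_deg)
    d1 = (math.cos(t1), math.sin(t1))
    d2 = (math.cos(t2), math.sin(t2))
    # Solve rp1 + s*d1 = rp2 + u*d2 for s with the 2-D cross product.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        raise ValueError("lines of position are parallel")
    dx, dy = rp2[0] - rp1[0], rp2[1] - rp1[1]
    s = (dx * d2[1] - dy * d2[0]) / denom
    return (rp1[0] + s * d1[0], rp1[1] + s * d1[1])


def bearing_deg(rp, point):
    """True AOA from an RP to a point (line-of-sight geometry)."""
    return math.degrees(math.atan2(point[1] - rp[1], point[0] - rp[0]))


# Hypothetical layout: two RPs 10 m apart and a sensor between them.
RP1, RP2, SENSOR = (0.0, 0.0), (10.0, 0.0), (4.0, 3.0)
theta1, theta2 = bearing_deg(RP1, SENSOR), bearing_deg(RP2, SENSOR)

# Ideal case: error-free AOAs recover the sensor position.
estimate = aoa_intersection(RP1, theta1, RP2, theta2)

# With an AOA error of +/- 5 degrees at each RP (assumed value), the
# estimate can fall anywhere inside the region bounded by these corners.
ERR = 5.0
corners = [aoa_intersection(RP1, theta1 + e1, RP2, theta2 + e2)
           for e1, e2 in product((-ERR, ERR), repeat=2)]
spread = max(math.dist(c, SENSOR) for c in corners)
print(estimate, round(spread, 2))
```

The four corner points bound the uncertainty region that opens up around the sensor once each bearing carries an error, which is the effect Fig. 3 illustrates.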

Time of Arrival (TOA)
TOA gives an indication of the range (i.e., distance between a transmitter and a receiver). The basic concept can be illustrated with reference to the channel profile of Fig. 4 below. Since the speed of light in free space, $c$, is constant, the TOA of the direct path (DP) between the transmitter and the receiver, $\tau$, will give the true range between the transmitter and receiver as defined by the equation:

$$d = c \times \tau \tag{2}$$

In practice, the TOA of the DP cannot be estimated perfectly, as illustrated in Fig. 4. The result is ranging error [also referred to as the distance measurement error (DME) in the literature], given as

$$\varepsilon = \hat{d} - d \tag{3}$$

where $\hat{d}$ is the estimated distance and $d$ is the true distance.

[Figure 4: channel profile, amplitude versus time (nsec), marking the first path (expected TOA), the first peak (estimated TOA), the TOA estimation error, and the reflected and transmitted paths.]
Channel Modeling and Algorithms for Indoor Positioning, Fig. 4 Illustrating basic time of arrival (TOA) principles for positioning

There are two main sources of ranging error: multipath effects and undetected direct path (UDP) conditions. Multipath effects will result in the DP, as well as reflected and transmitted paths to be received. It has been shown empirically that multipath ranging error can be reduced by increasing the bandwidth of the system used for the TOA estimation (Alavi and Pahlavan 2005). UDP conditions, on the other hand, refer to cases where the DP cannot be detected at all, as shown in Fig. 5 below. UDP conditions generally occur at the edge of coverage areas, or in cases where there are large metallic objects in the path between the transmitter and the receiver. As a result, the difference between the first detected path (FDP) and the DP is beyond the dynamic range of the receiver, and the DP cannot be detected, as shown in Fig. 5. Unlike multipath-based ranging error, UDP-based ranging error typically cannot be reduced by increasing the bandwidth. In addition, the occurrence of UDP-based ranging error is itself random in nature (Alavi and Pahlavan 2005).

Through UWB measurements in typical indoor areas, it has been shown that both multipath ranging error and UDP-based ranging error follow a Gaussian distribution, with mean and variance that depend on the bandwidth of operation (Alavi and Pahlavan 2005). The overall model can be expressed as follows:

$$\hat{d} = d + G(m_w, \sigma_w)\,\log(1 + d) + \zeta \cdot G(m_{\mathrm{UDP},w}, \sigma_{\mathrm{UDP},w}) \tag{4}$$
where $G(m_w, \sigma_w)$ and $G(m_{\mathrm{UDP},w}, \sigma_{\mathrm{UDP},w})$ are the Gaussian random variables (RVs) that refer to multipath and UDP-based ranging error, respectively. The subscript $w$ in both cases denotes the bandwidth dependence. The parameter $\zeta$ is a binary RV that denotes the presence or absence of UDP conditions, with a probability density function (PDF) given as

$$f(\zeta) = (1 - P_{\mathrm{UDP},w})\,\delta(\zeta) + P_{\mathrm{UDP},w}\,\delta(\zeta - 1) \tag{5}$$

where $P_{\mathrm{UDP},w}$ denotes the probability of occurrence of UDP-based ranging error, so that $\zeta = 1$ (UDP conditions present) occurs with probability $P_{\mathrm{UDP},w}$.

[Figure 5: channel profile at BW = 200 MHz, amplitude (mU) versus time (nsec), marking the DP, the detection threshold, the dynamic range, the strongest path and FDP, and a distance error of 5.3247 m.]
Channel Modeling and Algorithms for Indoor Positioning, Fig. 5 Illustration of undetected direct path (UDP)-based distance measurement error (DME) at a bandwidth (BW) of 200 MHz. FDP stands for first detected path

Received Signal Strength
RSS is a simple metric that can be measured and reported by most wireless devices. For example, the MAC layer of the IEEE 802.11 WLAN standard provides RSS information from all active access points (APs) in a quasiperiodic beacon signal that can be used as a metric for positioning (Bahl and Padmanabhan 2000). RSS can be used in two ways for positioning purposes.

If the RSS decays linearly with the log-distance between the transmitter and receiver, it is possible to map an observed RSS value to a distance from a transmitter and consequently determine the user's location by using distances from three or more APs. In other words:

$$RSS_d = 10 \log_{10} P_r = 10 \log_{10} P_t - 10\,\alpha \log_{10} d + X \tag{6}$$

where $\alpha$ is the distance-power gradient, $X$ is the shadow fading (a lognormally distributed random variable), $P_r$ is the received power, and $P_t$ is the transmitted power. While simple, this method yields a highly inaccurate estimate of distance in indoor areas, since instantaneous RSS inside a building varies over time, even at a fixed location; this is largely due to shadow fading and multipath fading. If, on the other hand, the RSS value to expect at a given point in an indoor area is known, then the location can be estimated as the point where the expected RSS values approximate the observed RSS values most closely. This is the essence of the pattern recognition approach to position estimation, which will be discussed in greater detail in the following section.

Position Estimation Techniques
Position estimation techniques can be categorized in a number of different ways. They can be grouped in terms of whether the sensing infrastructure used for measuring location metrics is deployed in a fixed or an ad hoc manner. They can also be grouped according to how the position computations are performed. In the category of centralized algorithms, all the location metrics
are sent to one central node, which then carries out the computations. In contrast, the term distributed algorithms refers to a class of algorithms where the computational load for the position calculations is spread out over all the nodes in the network. In the next few sections, some examples of centralized and distributed positioning algorithms for both fixed positioning and ad hoc scenarios will be discussed. Owing to space limitations, the treatment is by no means exhaustive; the interested reader is referred to Hightower and Borriello (2001) and Niculescu as well as any associated references contained therein.

[Figure 6: four reference points, RP-1 to RP-4, arranged in a grid with spacing D, with ranges d_1 to d_4 to the sensor.]
Channel Modeling and Algorithms for Indoor Positioning, Fig. 6 System scenario for performance evaluation

Centralized Algorithms
In this section, two algorithms for fixed position estimation and one algorithm for ad hoc positioning are discussed. For fixed location estimation, the closest neighbor with TOA grid (CN-TOAG) (Kanaan and Pahlavan 2004) and ray-tracing assisted closest neighbor (RT-CN) algorithms (Hatami and Pahlavan 2006) are discussed. For ad hoc positioning, a distributed version of the least-squares (LS) algorithm is presented (Di Stefano et al. 2003).

CN-TOAG Algorithm
The CN-TOAG algorithm leverages the fact that at any given point in an indoor area covered by a number of RPs, the exact value of the TOA is known (Kanaan and Pahlavan 2004). Consider the grid arrangement of RPs in an indoor setting, as shown in Fig. 6. Each of these RPs would perform a range measurement, $d_i$ ($1 \le i \le N$, where $N$ is the number of RPs in the grid), to the user to be located.

Let $D$ represent the vector of range measurements that are reported by the RPs, and let $Z$ represent the vector of expected TOA-based range measurements at a certain point, $r = (x, y)$. For the purposes of this algorithm, $Z$ is known as the range signature associated with the point $r$. An estimate of the user's location, $\hat{r}$, can be obtained by finding that point $r$, where $Z$ most closely approximates $D$. The error function, $e(r) = e(x, y)$, is defined as

$$e(r) = e(x, y) = \lVert D - Z(r) \rVert = \lVert D - Z(x, y) \rVert \tag{7}$$

where $\lVert \cdot \rVert$ represents the vector norm. Equation (7) can also be written as

$$e(x, y) = \sqrt{\sum_{k=1}^{N} \left( d_k - \sqrt{(x - X_k)^2 + (y - Y_k)^2} \right)^2} \tag{8}$$

where $N$ is the number of RPs, $d_k$ is the range measurement performed by the $k$th RP ($1 \le k \le N$), and $(X_k, Y_k)$ represents the location of the $k$th RP in Cartesian coordinates (assumed to be known precisely). The estimated location of the mobile, $\hat{r}$, can then be obtained by finding the point $(x, y)$ that minimizes (8). This point can be found by using the gradient relation:

$$\nabla e(x, y) = 0 \tag{9}$$

Owing to the complexity of the function in (8), it is not possible to find an analytical solution to this problem. CN-TOAG provides a numerical method of solving (9), detailed in Kanaan and Pahlavan (2004).

RT-CN Algorithm
The RT-CN algorithm is based on the RSS metric. The general idea is that the RSS characteristics of the area covered by the RPs are characterized in a data structure known as a radio map. Generally, the radio map is generated using on-site measurement in a process called training or fingerprinting. On-site measurement is a time- and labor-consuming process in a large and dynamic indoor environment. In Hatami and Pahlavan (2006) two alternative methods to generate a radio map without on-site measurements are introduced. The RT-CN algorithm uses two-dimensional ray-tracing (RT) computations to generate the reference radio map. During localization, the mobile station (MS) applies the nearest neighbor (NN) algorithm to the simulated radio map and selects the point that is the closest in signal space to the observed RSS values. In this way, a very high-resolution radio map can be generated and higher localization accuracy results. In order to generate an accurate radio map in this technique, the localization system requires knowledge of the location of access points within the coverage area. In addition, a powerful central entity is required, both to perform the RT computations for the radio map and to execute the NN algorithm on the radio map to come up with the final position estimate. As such, it is an example of a centralized pattern recognition algorithm.

Distributed LS Algorithm
The algorithm that is featured in Di Stefano et al. (2003) is a distributed implementation of the steepest descent LS algorithm. The system scenario assumes ultrawide band (UWB) communications between sensor nodes. The sensor nodes perform range measurements between themselves and all the neighbors that they are able to contact. Then the following objective function is minimized using the distributed LS algorithm:

$$E = \frac{1}{2} \sum_{i} \sum_{j \in N(i)} \left( d_{ij} - \hat{d}_{ij} \right)^2 \tag{10}$$

where $d_{ij}$ is the actual distance between two nodes $i$ and $j$ and $\hat{d}_{ij}$ is the estimated distance between the same two nodes. Assuming some transmission range $R$ for every sensor node, $N(i)$ represents the set of neighbors for node $i$, i.e., $N(i) = \{ j : d_{ij} \le R,\ i \ne j \}$.

Effects of the Channel Behavior on TOA-Based Positioning Algorithm Performance
Channel behavior is intimately linked with the performance of the positioning algorithms. As already noted above, the main effect of the channel is to introduce errors into the measurement of the metrics used for the positioning process. The precise manner in which these errors are introduced is determined by the quality of link (QoL) between the sensor and all the RPs that it is in contact with. In a TOA-based system, the exact amount of error from a given RP depends on whether UDP conditions exist or not. In this case, the channel is said to exhibit bipolar behavior, i.e., it suddenly switches from the detected direct path (DDP) state to the UDP state from time to
time and this results in large DME values. These will then translate to large values of estimation error; in other words, the quality of estimation (QoE) will be degraded (Kanaan et al. 2006).

Owing to the site-specific nature of indoor radio propagation, the very occurrence of UDP conditions is random and is best described statistically (Alavi and Pahlavan 2005). That being the case, the QoE (i.e., location estimation accuracy) will also need to be characterized in the same manner. Different location-based applications will have different requirements for QoE. In a military or public safety application (such as keeping track of the locations of firefighters or soldiers inside a building), high QoE is desired. In contrast, lower QoE might be acceptable for a commercial application (such as inventory control in a warehouse). In such cases, it is essential to be able to answer questions like: "What is the probability of being able to obtain a mean square error (MSE) of 1 m² from an algorithm x over different building environments that give rise to different amounts of UDP?" or "What algorithm should be used to obtain an MSE of 0.1 cm² over different building environments?" Answers to such questions will heavily influence the design, operation, and performance of indoor geolocation systems.

Given the variability of the indoor propagation conditions, it is possible that the distance measurements performed by some of the RPs will be subject to DDP errors, while some will be subject to UDP-based errors. Various combinations of DDP and UDP errors can be observed. To illustrate, consider the example system scenario shown in Fig. 6. For example, the distance measurements performed by RP-1 may be subject to UDP-based DME, while the measurements performed by the other RPs may be subject to DDP-based DME; this combination can be denoted as UDDD. Other combinations can be considered in a similar manner.

Since the occurrence of UDP conditions is random, the performance metric used for the location estimate (such as the MSE) will also vary stochastically and depends on the particular combination observed. For the four-RP case shown in Fig. 6, it is clear that the following distinct combinations will have to be used: UUUU, UUUD, UUDD, UDDD, and DDDD. Each of these combinations can be used to characterize a different QoL class. The occurrence of each of these combinations will give rise to a certain MSE value in the location estimate. This MSE value will also depend on the specific algorithm used. There may be more than one way to obtain each DDP/UDP combination. If UDP conditions occur with probability $P_{udp}$, then the overall probability of occurrence of the $i$th combination $P_i$ can be generally expressed as

$$P_i = \binom{N}{N_{udp,i}} P_{udp}^{N_{udp,i}} \left( 1 - P_{udp} \right)^{N - N_{udp,i}} \tag{11}$$

where $N$ is the total number of RPs (in this case four), and $N_{udp,i}$ is the number of RPs where UDP-based DME is observed. Combining the probabilities, $P_i$, with the associated MSE values for each QoL class, a discrete cumulative distribution function (CDF) of the MSE can be obtained. This discrete CDF is known as the MSE profile (Kanaan et al. 2006). The use of the MSE profile will now be illustrated with examples, focusing on the CN-TOAG algorithm.

The system scenario in Fig. 6 is considered with $D = 20$ m. A total of 1,000 uniformly distributed random sensor locations are simulated for different bandwidth values. In line with the FCC's formal definition of UWB signal bandwidth as being equal to or more than 500 MHz (US Federal Communications Commission 2004), the results are presented for bandwidths of 500, 1,000, 2,000, and 3,000 MHz. For each bandwidth value, different QoL classes are simulated, specifically UUUU, UUUD, UUDD, UDDD, and DDDD. Once a sensor is randomly placed in the simulation area, each RP calculates TOA-based distances to it. The calculated distances are then corrupted with UDP- and DDP-based DMEs in accordance with the DME model based on UWB measurements as given in Alavi and Pahlavan (2005). The positioning algorithm is then applied to estimate the sensor location. Based on 1,000 random trials, the MSE is calculated for each bandwidth value and the corresponding combinations of UDP- and DDP-based
DMEs. The probability of each combination is also calculated in accordance with (11).

The results are shown in Figs. 7 and 8. Figure 7 shows the MSE profiles for the CN-TOAG algorithm. From this plot, it is observed that as the bandwidth increases from 500 MHz to 2,000 MHz, the range of MSE profile values gets smaller. This correlates with the findings of Alavi and Pahlavan (2005), where it was observed that the overall DME goes down over this specific range of bandwidths. Above 2,000 MHz, however, the MSE profile becomes wider as a result of increased probability of UDP conditions (Alavi and Pahlavan 2005), which increases the overall DME. This, in turn, translates into an increase in the position estimation error. In order to gain further insight into the variation of the QoE across the different QoL classes, again considering bandwidth as a parameter, just the MSE is plotted, as seen in Fig. 8.

[Figure 7: plot titled "CDF of the MSE for CN-TOAG (h = 0.3125 m)"; P(MSE ≤ abscissa) versus MSE, with curves for 500, 1000, 2000, and 3000 MHz.]
Channel Modeling and Algorithms for Indoor Positioning, Fig. 7 Mean square error (MSE) profile for the closest neighbor with TOA grid (CN-TOAG) algorithm

[Figure 8: plot titled "MSE with different QoL classes: CN-TOAG algorithm"; MSE versus QoL class (1 to 5), with curves for w = 500, 1000, 2000, and 3000 MHz.]
Channel Modeling and Algorithms for Indoor Positioning, Fig. 8 Quality of estimation (QoE) variation across the various QoL classes
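
The building blocks of the evaluation described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: a brute-force search over grid points stands in for the CN-TOAG numerical method detailed in Kanaan and Pahlavan (2004), the four RPs are assumed to sit at the corners of the D × D square, the spacing h = 0.3125 m is borrowed from the title of Fig. 7 and interpreted as the grid spacing, and the per-class MSE values and the UDP probability of 0.1 are made-up placeholders. The DME corruption step is omitted because the model parameters are not given in this entry.

```python
import math


def cn_toag_grid_search(ranges, rps, side, h):
    """Return the grid point (x, y) that minimizes e(x, y) of Eq. (8)."""
    steps = int(round(side / h))
    best_point, best_err = None, math.inf
    for ix in range(steps + 1):
        for iy in range(steps + 1):
            x, y = ix * h, iy * h
            # Squared norm ||D - Z(x, y)||^2; the square root in Eq. (8)
            # is monotonic, so the minimizer is the same.
            err = sum((d - math.hypot(x - xk, y - yk)) ** 2
                      for d, (xk, yk) in zip(ranges, rps))
            if err < best_err:
                best_point, best_err = (x, y), err
    return best_point


def qol_class_probabilities(n_rps, p_udp):
    """Eq. (11): probability that exactly k of the N RPs are in UDP."""
    return [math.comb(n_rps, k) * p_udp ** k * (1 - p_udp) ** (n_rps - k)
            for k in range(n_rps + 1)]


def mse_profile(mse_by_class, probabilities):
    """Discrete CDF of the MSE as sorted (mse, P(MSE <= mse)) pairs."""
    profile, cumulative = [], 0.0
    for mse, p in sorted(zip(mse_by_class, probabilities)):
        cumulative += p
        profile.append((mse, cumulative))
    return profile


SIDE, H = 20.0, 0.3125
RPS = [(0.0, 0.0), (SIDE, 0.0), (SIDE, SIDE), (0.0, SIDE)]  # assumed corners

# Noise-free check: exact ranges to a point on the grid are recovered.
true_pos = (7.5, 12.5)
ranges = [math.hypot(true_pos[0] - x, true_pos[1] - y) for x, y in RPS]
estimate = cn_toag_grid_search(ranges, RPS, SIDE, H)

# Classes ordered DDDD, UDDD, UUDD, UUUD, UUUU (0 to 4 RPs in UDP).
probs = qol_class_probabilities(4, 0.1)
placeholder_mse = [0.2, 0.6, 1.1, 1.9, 2.8]  # hypothetical values, in m^2
profile = mse_profile(placeholder_mse, probs)
print(estimate, profile)
```

In a fuller simulation, the ranges passed to the grid search would first be corrupted with multipath and UDP errors, and the MSE per class would come from averaging the squared position error over many random sensor locations rather than from placeholder values.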

Key Applications

The applications of indoor localization technology are vast and can be broadly classified into two categories: commercial and public safety/military. Commercial applications range from inventory tracking in a warehouse to tracking children, elderly, and people with special needs (McKelvin et al. 2005). Location-sensitive web browsing and interactive tour guides for museums are other examples (Koo et al. 2003). In the public safety/military space, the most prevalent application is to help emergency workers (police, firefighters, etc.).

Accurate indoor localization is also an important part of various personal robotics applications (Jensfelt 2001) as well as of the more general area of context-aware computing (Ward et al.).

More recently, location sensing has found applications in location-based handoffs in wireless networks (Pahlavan et al. 2000), location-based ad hoc network routing (Ko and Vaidya 1998), and location-based authentication and security. Many of these applications require low-cost, low-power terminals that can be easily deployed with little or no advance planning; this is the basis for developments in ad hoc sensor networks. Recent developments in integrated circuit (IC) technology as well as microelectromechanical systems (MEMS) have made it possible to realize such low-cost, low-power terminals. In the next few years, there will undoubtedly be scores of new applications for indoor localization.

Future Directions

Indoor positioning is a relatively new area of research, and, as such, there are a number of different problems to be solved. Among these are questions such as: "What algorithms and techniques should be used to obtain a certain level of positioning error performance?" and "What is the best performance that can be obtained from a given positioning algorithm under UDP conditions?" Issues such as these will need to be looked at in order for the indoor positioning field to mature.

Cross-References

Indoor Positioning

References

Alavi B, Pahlavan K (2005) Indoor geolocation distance error modeling with UWB channel measurements. In: Proceedings of the IEEE personal indoor mobile radio communications conference (PIMRC), Berlin, 11–14 Sept 2005
Bahl P, Padmanabhan VN (2000) RADAR: an in-building RF-based user location and tracking system. In: Proceedings of the IEEE INFOCOM 2000, Tel Aviv, 26–30 March 2000
Di Stefano G, Graziosi F, Santucci F (2003) Distributed positioning algorithm for ad-hoc networks. In: Proceedings of the IEEE international workshop on UWB systems, Oulu, June 2003
FCC Docket No. 94-102. Revision of the commission's rules to ensure compatibility with enhanced 911 emergency calling systems. Federal Communications Commission Technical report RM-8143, July 1996
Hatami A, Pahlavan K (2006) Comparative statistical analysis of indoor positioning using empirical data and indoor radio channel models. In: Proceedings of the consumer communications and networking conference, Las Vegas, 8–10 Jan 2006
Hightower J, Borriello G (2001) Location systems for ubiquitous computing. IEEE Comput Mag 34(8):57–66
Jensfelt P (2001) Approaches to mobile robot localization in indoor environments. Ph.D. thesis, Royal Institute of Technology, Stockholm
Kanaan M, Pahlavan K (2004) CN-TOAG: a new algorithm for indoor geolocation. In: Proceedings of the IEEE international symposium on personal, indoor and mobile radio communications, Barcelona, 5–8 Sept 2004
Kanaan M, Akgül FO, Alavi B, Pahlavan K (2006) Performance benchmarking of TOA-based UWB indoor geolocation systems using MSE profiling. In: Proceedings of the IEEE vehicular technology conference, Montreal, 25–28 Sept 2006
Kaplan ED (1996) Understanding GPS. Artech, Boston
Ko Y, Vaidya NH (1998) Location-aided routing (LAR) in mobile ad hoc networks. In: Proceedings of the ACM/IEEE international conference on mobile computing and networking, 1998 (MOBICOM '98), Dallas, 25–30 Oct 1998
Koo SGM, Rosenberg C, Chan H-H, Lee YC (2003) Location-based e-campus web services: from design to deployment. In: Proceedings of the first IEEE international conference on pervasive computing and communications, Fort Worth, 23–26 Mar 2003
McKelvin ML, Williams ML, Berry NM (2005) Integrated radio frequency identification and wireless sensor network architecture for automated inventory management and tracking applications. In: Proceedings of the Richard Tapia celebration of diversity in computing conference (TAPIA '05), Albuquerque, 19–22 Oct 2005
Niculescu D. Positioning in ad-hoc sensor networks. IEEE Netw 18(4):24–29
Pahlavan K, Krishnamurthy P, Hatami A, Ylianttila M, Mäkelä J, Pichna R, Vallström J (2000) Handoff in hybrid mobile data networks. IEEE Personal Commun Mag 7:34–47
Sayed AH, Tarighat A, Khajehnouri N. Network-based wireless location. IEEE Signal Process Mag 22(4):24–40
US Federal Communications Commission (2004) Revision of Part 15 of the commission's rules regarding ultra-wideband transmission systems, FCC 02-48, First Report & Order, April 2004
Ward A, Jones A, Hopper A. A new location technique for the active office. IEEE Personal Commun Mag 4(5):42–47

Characteristic Travel Time

Dynamic Travel Time Maps

Check-Out

Smallworld Software Suite

Chronological

Time-Aware Personalized Location Recommendation

Classification Integration

Integration of Spatial Constraint Databases

Clementini Operators

Dimensionally Extended Nine-Intersection Model (DE-9IM)

Climate Adaptation

Climate Extremes and Informing Adaptation

Climate Adaptation, Introduction

Shahed Najjar, Udit Bhatia, and Auroop R. Ganguly
Sustainability and Data Sciences Laboratory (SDS Lab), Department of Civil and Environmental Engineering, Northeastern University, Boston, MA, USA

Synonyms

Adaptation; Learning; RFC; Scenario planning; Transformations

Definitions

Climate Change
Climate change is defined as a change in the state of climate variables that can be identified (by using statistical tests) by changes in the mean and/or the variability of their properties and that persists for an extended period. An extended period in climate context implies decades or an even longer time scale (IPCC 2014). Climate change may be due to natural internal processes or external forcings and persistent anthropogenic changes in the composition of the atmosphere or in land use (Stocker et al. 2013).

Adaptation
In the context of climate and climate-related extremes, IPCC's Special Report on Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation (SREX) defines adaptation as "the process of adjustment to actual or expected climate and its effects, in order to moderate harm or exploit benefit opportunities" (Field 2012).

Reasons for Concerns (RFCs)

Five integrative reasons for concerns (RFCs) provide a framework for summarizing key risks across sectors and regions. RFCs provide one starting point for evaluating dangerous anthropogenic interference with the climate system. IPCC's fifth assessment report highlights the following five RFCs (IPCC 2014):

1. Unique and threatened systems: Ecosystems and cultures that are already at risk from changing climate.
2. Extreme weather events: Amplified risks of extreme weather events such as precipitation extremes, heat waves, and coastal flooding in a changing climate.
3. Distribution of impacts: Uneven distribution of risk and pronounced impacts of these risks for disadvantaged populations and communities.
4. Global aggregate impacts: Mass extinctions in biodiversity, perturbed global transportation, communication, and trade networks as a consequence of increased risks from climate- and weather-related events in a changing climate.
5. Large-scale singular events: For example, there is medium confidence that sustained warming greater than a certain threshold could result in near-complete loss of the Greenland ice, which in turn will contribute up to 7 m of global mean sea level rise.

Nonstationarity

As climate change became an increasingly prominent topic, technical terms like stationarity and nonstationarity also became more noticeable. Nonstationarity can be defined as processes that have statistical properties that are deterministic functions of time. Demonstrating nonstationarity is more complex than stationarity because it is necessary to do so through analysis of the process physics. In the context of risk management and informing adaptation in the context of climate, frequency analysis of variables such as precipitation events, floods, and temperature extremes is marked by the assumption that these variables conform to a stationary, independent, and identically distributed (i.i.d.) random process. These assumptions are at odds with the recognition that there is high statistical confidence in the fact that climate varies naturally at all scales, both spatially and temporally, as well as in response to anthropogenic activities.

Transformational Adaptation

IPCC SREX defines transformations as "The altering of fundamental attributes of a system (including value systems; regulatory, legislative, or bureaucratic regimes; financial institutions; and technological or biological systems)." While incremental adaptation aims to improve efficiency within existing systems, transformational adaptation may involve changing the fundamental attributes of these systems. Where vulnerability is high and adaptive capacity low, changes in climate extremes can make it difficult for systems to adapt sustainably without transformational changes.

Historical Background

Across the globe, specifically in urban coastal areas, some extreme weather events such as heat waves have become more frequent, cold extremes have become less frequent, and patterns of rainfall are likely changing (Mishra et al. 2015). Even if future greenhouse gas emissions were to be committed to lower levels, there is moderate to high confidence that the climate would continue to change for decades to come. It has been estimated that a heat wave of the same magnitude as the 2003 European heat wave could prove five times more lethal in a large American city in terms of mortality. In ecosystems, changing climate could reduce the productivity and abundance of species and induce mass extinctions and habitat changes (Urban 2015).
Society's need to manage changing environmental conditions is not new; people have been adjusting to their environment since the emergence of civilizations. Modern efforts to stabilize and protect our homes, livelihoods, and resources
in the face of a variable climate include the development of floodplain regulations, insurance, wildlife reserves, drinking water reservoirs, and building codes. However, these actions have been taken in response to a climate that has been relatively stable for many centuries.
While climate mitigation deals with energy and economic policies to avoid the unmanageable, climate adaptation is about engineering the coupled natural-built human system to manage the unavoidable. A survey at the World Economic Forum ranked failure of climate adaptation and mitigation as well as greater incidences of extreme weather as two of the top ten global risks of highest concern. Two other concerns among the top ten were water and food crises, which in turn are strongly influenced by climate. The United States Department of Defense, an agency with a broad mission space encompassing the political, military, economic, social, informational, and infrastructural sectors, has called climate change a threat multiplier in its Quadrennial Defense Review Report.
Climate mitigation relies on the perceived urgency of the climate challenge by nations of the world. Developing nations may find the trade-offs particularly difficult to justify, given the immediate impacts on the poor and on the aspirations of their middle class, as well as the disparities in per capita and in historical emissions when compared to developed nations. However, the poorer sections of the developing nations are expected to have to bear the brunt of climate-related hazards and resource scarcity. Conversely, developed nations do not consistently rank climate change as among the highest of policy concerns but expect the developing nations to bear the burden of emission reductions by appealing not to per capita but to the total emissions per country. Belief systems, ranging from political ideologies and the ability of technological innovations to solve society's problems to humanity's manifest destiny, in addition to a host of cultural and historical experiences on individuals and nations, color the perceptions around climate mitigation. What adds to the complexity is that the costs of mitigation have to be borne by the current generation but the perceived benefits are to future generations and to the planet at large.
However, if mitigation is not an easy problem to address, climate adaptation is perhaps relatively better poised. In addition, adaptation becomes an absolute necessity if mitigation pathways fail to fully materialize and/or if historical emissions are already causing significant damage. The impacts of natural hazards such as hurricanes, floods, and heat or cold waves are felt immediately, and the scarcity of water, food, and energy resources affects lives and well-being. Thus, the need to adapt is often acceptable to even those who may not perceive climate change as a major threat or even as a driver of weather extremes or resource constraints. This is especially true if adaptation measures are low regret (e.g., they reinforce what needed to be done anyway irrespective of the nature of climate change in any given region), even though there may be occasions when transformational adaptations may be the only strategy. The importance and near centrality of water to adaptation have been well recognized, both directly through water security and through its impacts on energy and food security. The resilience of interdependent critical lifelines and infrastructures, in sectors such as water (including waste water), transportation, energy (or power and fuel), and communications, has been recognized as an urgent societal need. Natural ecosystems, which may act as soft infrastructures (e.g., in coastal and/or urban regions where marine ecosystems could help slow down the impacts of sea level rise or reduce the strength of storm surges), may have interdependencies with built, or the hard, infrastructures. One step to adaptation is what has been sometimes called translational climate science or the ability to develop actionable yet credible insights from climate science through computational modeling and data sciences.

Incremental and Transformative Adaptation

Incremental adaptations to climate change can be thought of as extensions of actions and behaviors that already reduce the loss and enhance the benefits of variations in changing climate and weather extremes. Incremental adaptations are
doing slightly more of what is already being done to deal with natural variation in climate and with extreme events. However, transformative adaptation measures seek to change the basic attributes of the systems that are affected or likely to be affected by the variations. Kates et al. classify transformational adaptation into three broad categories, which are discussed in further detail in this section (Kates et al. 2012):

1. Adaptation new to resources/location: Examples include introduction of technologies into places where they have not been used before. This can either be done through technology transfer or by technological inventions relevant for the location. Environmental, human-induced changes and mass migrations have posed serious challenges to urban coastal megacities. As a consequence, these cities are facing an ever-growing challenge of food and water security. For instance, the population of Chennai, an urban coastal metropolitan city on the eastern coast of India, has increased fourfold in the last five decades (Chennai City Population Census 2011). When the city was facing an unprecedented crisis of drinking water, the introduction of seawater desalination plants proved to be a transformative adaptation to the drinking water problem.
2. Enlarged scale or intensity: Incremental adaptations can become transformative when they are used at a greater scale with much larger effects. This kind of adaptation measure generally requires a system-level view with an underlying philosophy that the whole is greater than the sum of its parts.
3. Different places and locations: Some adaptations collectively transform place-based human-environment systems or shift such systems to other locations. Resettlement associated with climate variability, and, by some accounts, climate change per se, is already under way in a few locations. This category of transformations becomes imperative when risks have increased beyond the threshold, where incremental transformations and even technology transformations may not result in a significant positive effect.

Future Directions: Risks and Opportunities for Adaptation

In the context of climate change, key risks refer to dangerous human-induced interferences with changing climate. Identification of key risks is based on the following criteria:

(a) Large magnitude
(b) High probability or irreversibility of impacts
(c) Timing of impact
(d) Persistent vulnerability
(e) Limited potential to reduce risks through mitigation or adaptation

Some examples of key risks in changing climate include:

(a) Risk of disruption of livelihoods in low-lying coastal zones and small islands due to sea level rise, storms, and coastal flooding (Aerts et al. 2014).
(b) Risk of deaths and mass migrations of large urban populations as a consequence of inland flooding.
(c) Systemic risks due to extremes leading to breakdown of infrastructure networks (Bhatia et al. 2015).
(d) Increased risk of mortality and illness during periods of extreme heat, particularly for vulnerable urban populations (Meehl and Tebaldi 2004).
(e) Risk of loss of biodiversity in terrestrial, marine, and/or inland water ecosystems.
(f) Risk of food insecurity linked to warming, drought, flooding, and extreme precipitation events, particularly for poorer populations in urban and rural settings.

Sectors that are likely to be impacted by the key risks include freshwater resources, ecosystems (including terrestrial, marine, and inland), food sector, infrastructure sector, urban and rural areas, and human health. Table 1 summarizes the selected key risks and possible adaptation scenarios for these risks in changing climate.

Climate Adaptation, Introduction, Table 1 Summary of sectorial risks in changing climate and potential adaptation strategies

Sector | Key risks | Potential adaptation strategies
Water resources | Drought frequency likely to increase by the end of the twenty-first century; raw water quality likely to reduce; increased concentration of pollutants during droughts | Adaptive water management strategies; scenario planning; low regret solutions (IPCC 2014)
Ecosystems (terrestrial, marine, inland) | Increasing ocean acidification in medium to high emission scenarios to impact population dynamics, physiology, and behavior of marine species; carbon stored in terrestrial biosphere susceptible to loss to atmosphere as a consequence of deforestation and climate change; increased risk of species extinction and habitat migrations | Promote genetic diversity; assisted migration and dispersal from severely impacted ecosystems; manipulation of disturbance regimes (such as forest fires, coastal flooding)
Urban areas | Heat stress; inland and coastal flooding; drought and water scarcity; risks amplified by lack of resilient infrastructure systems | Multilevel urban risk governance; including voice of low-income groups in informing policy; building resilient infrastructure systems
Rural areas | Moderate to severe impact on food security, agriculture income, shifts in production area of crops, and freshwater availability | Trade reforms and investments in rural areas; adaptations for agriculture and water through policies taking account of rural decision-making contexts

Risks will vary through time across regions and populations, dependent on innumerable factors including the extent of adaptation and mitigation. Moreover, adaptation is region and context specific, and with no universal strategy to reduce the risk, characterizing the risks and understanding context and place-based adaptation strategies are critical to inform adaptation strategies. Table 2 summarizes the selected regional risks and feasible adaptation scenarios for Australasia (Australia-Asia), Europe, North America, and Central America (IPCC 2014).

Climate Adaptation, Introduction, Table 2 Summary of regional risks in changing climate, climate drivers, and potential adaptation strategies. Climate drivers and adaptation strategies for a given key risk are identified by the same number (Adapted from IPCC 2014)

Region | No. | Key risk | Climate driver | Potential adaptation strategies
Australasia | I | Increased risk in riverine, coastal, and urban flood | Extreme precipitation, cyclones, and sea level rise | Exposure reduction and protecting natural barriers (e.g., mangroves)
Australasia | II | Heat waves: increased risk of heat-wave-related mortality, forest fires, decreasing crop output/hectare | Warming trends and extreme temperature events | Heat health warning systems, urban planning to reduce heat islands, and new work practices to avoid heat stress among outdoor population
Australasia | III | Significant change in community composition and structure of coral reef systems in Australia | Warming trends and cyclones | Direct interventions such as assisted colonization and shading and reducing stresses such as fishing, tourism
Australasia | IV | Increased risk of drought-related water and food shortage in South Asia and the Indian subcontinent | Extreme temperature, drying trends, and warming trends | Integrated water resource management, water infrastructure and reservoir development, water reuse, and desalinated sea water usage in coastal areas
Central and South America | I | Decreased food production and quality | Precipitation, extreme precipitation, temperature, and warming trends | Strengthening traditional indigenous systems and developing a new variety of crops more adaptable to temperature and droughts
Central and South America | II | Waterborne diseases | Warming trends and extreme precipitation | Developing early warning systems for disease control
Central and South America | III | Water availability in Central America and extreme precipitation events resulting in floods and landslides | Drying trends, warming trends, and extreme precipitation | Programs to extend public health services
North America | I | Wildfire induces loss of ecosystem integrity and human mortality | Warming trends and drying trend | Introducing resilient vegetation and prescribed burning
North America | II | Heat-related human mortality | Extreme temperature and warming trends | Early heat warning systems, cooling centers, residential air conditioning, and community and household scale adaptations through family support
North America | III | Urban floods in riverine and coastal area resulting in ecosystem damage, human mortality, mass migrations, and infrastructure damage | Extreme precipitation and cyclones | Low impervious surface pavement designs, updating old rainfall-based infrastructure design to reflect current and changing climate conditions, and protecting natural flood barriers (e.g., mangroves)
Europe | I | Significant reduction in water availability from river abstraction and from groundwater resources, combined with increased water demand (e.g., for irrigation, energy and industry, domestic use) and with reduced water drainage and runoff as a result of increased evaporative demand | Drying trend, warming trend, and extreme temperature | Implementation of best practices and governance instruments in river basin management plans and integrated water management
Europe | II | Increased economic losses and people affected by extreme heat events: impacts on health and well-being, labor productivity, crop production, air quality, and increasing risk of wildfires in southern Europe and in Russian boreal region | Extreme temperature | Implementation of warning systems
Europe | III | | | Adaptation of dwellings and workplaces and of transport and energy infrastructure
Europe | IV | | | Reductions in emissions to improve air quality
Europe | V | | | Improved wildfire management
Europe | VI | | | Development of insurance products against weather-related yield variations
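
As a rough illustration of the Nonstationarity definition given earlier in this entry (statistical properties that are deterministic functions of time), the following Python sketch builds a synthetic series whose mean drifts linearly and estimates that drift with an ordinary least-squares slope. It is a minimal sketch under stated assumptions, not a method taken from the cited reports: the function names, the linear form of the drift, and all numbers are hypothetical. A fitted trend alone does not demonstrate nonstationarity, which, as noted above, requires analysis of the process physics.

```python
import random
from statistics import mean


def synthetic_series(n, base, drift, noise_sd, seed=0):
    """Hypothetical series with mean base + drift * t plus Gaussian noise."""
    rng = random.Random(seed)
    return [base + drift * t + rng.gauss(0.0, noise_sd) for t in range(n)]


def least_squares_slope(series):
    """Slope of the ordinary least-squares line through (t, x_t), t = 0..n-1."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    t_bar = (n - 1) / 2
    x_bar = mean(series)
    num = sum((t - t_bar) * (x - x_bar) for t, x in enumerate(series))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den


# Same noise (same seed), with and without a drifting mean.
stationary = synthetic_series(50, base=20.0, drift=0.0, noise_sd=1.0)
drifting = synthetic_series(50, base=20.0, drift=0.1, noise_sd=1.0)
print(least_squares_slope(stationary), least_squares_slope(drifting))
```

A design value estimated under the i.i.d. assumption from the drifting series would mix early and late observations that no longer share the same mean, which is the mismatch the definition points to.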
Cross-References

▶ Climate Change and Developmental Economies
▶ Climate Extremes and Informing Adaptation

References

Aerts JCJH, Botzen WJW, Emanuel K, Lin N, de Moel H, Michel-Kerjan EO (2014) Evaluating flood resilience strategies for coastal megacities. Science 344:473–475. doi:10.1126/science.1248222
Bhatia U, Kumar D, Kodra E, Ganguly AR (2015) Network science based quantification of resilience demonstrated on the Indian railways network. PLoS ONE 10:e0141890. doi:10.1371/journal.pone.0141890
Chennai City Population Census 2011 | Tamil Nadu n.d. http://www.census2011.co.in/census/city/463-chennai.html. Accessed 3 Sept 2016
Field CB (2012) Managing the risks of extreme events and disasters to advance climate change adaptation: special report of the intergovernmental panel on climate change. Cambridge University Press, New York
IPCC (2014) Climate change 2014: impacts, adaptation, and vulnerability. Part B: regional aspects. Contribution of working group II to the fifth assessment report of the intergovernmental panel on climate change [Barros VR, Field CB, Dokken DJ, Mastrandrea MD, Mach KJ, Bilir TE, Chatterjee M, Ebi KL, Estrada YO, Genova RC, Girma B, Kissel ES, Levy AN, MacCracken S, Mastrandrea PR, White LL (eds)]. Cambridge University Press, Cambridge/New York
Kates RW, Travis WR, Wilbanks TJ (2012) Transformational adaptation when incremental adaptations to climate change are insufficient. Proc Natl Acad Sci 109:7156–7161. doi:10.1073/pnas.1115521109
Meehl GA, Tebaldi C (2004) More intense, more frequent, and longer lasting heat waves in the 21st century. Science 305:994–997. doi:10.1126/science.1098704
Mishra V, Ganguly AR, Nijssen B, Lettenmaier DP (2015) Changes in observed climate extremes in global urban areas. Environ Res Lett 10:24005. doi:10.1088/1748-9326/10/2/024005
Stocker TF, Qin D, Plattner GK, Tignor M, Allen SK, Boschung J et al (2013) Climate change 2013: the physical science basis. Intergovernmental panel on climate change, working group I contribution to the IPCC fifth assessment report (AR5), New York
Urban MC (2015) Accelerating extinction risk from climate change. Science 348:571–573. doi:10.1126/science.aaa4984

Climate and Human Stresses on the Water-Energy-Food Nexus

Laura Blumenfeld1, Tyler Hall1, Hayden Henderson1,2, Lindsey Bressler1, Catherine Moskos1, Udit Bhatia1, Poulomi Ganguli1, Devashish Kumar1, and Auroop R. Ganguly1
1 Sustainability and Data Sciences Laboratory (SDS Lab), Department of Civil and Environmental Engineering, Northeastern University, Boston, MA, USA
2 Department of Mechanical and Industrial Engineering, Northeastern University, Boston, MA, USA

Synonyms

Food security; Integrated assessment models; Mass migration; Resource scarcity; Resources management; WEF

Definition

Water, energy, and food are inextricably linked. Water is an input for producing agricultural goods in the fields and along the entire agro-food supply chain. Energy is required for food production and water distribution: to power agricultural and irrigation machinery and for processing and transportation of agricultural goods. Agriculture accounts for nearly 70 % of total water withdrawal across the globe, while food production and processing account for nearly 30 % of energy consumption worldwide (Water 2014).
The synergies and trade-offs between water consumption, food production, and energy consumption are manifold:

• Using water to irrigate crops might promote food production, but it can also reduce river flows and hydropower potential.
• Growing bioenergy crops under irrigated agriculture can increase overall water withdrawals and endanger food security.
• Converting surface irrigation into high-efficiency pressurized irrigation may save water but may also result in higher energy use.

Water, energy, and food (WEF) represent the greatest global risks because they are expected to be highly impacted by climate change, demographic shifts including mass migrations, aging infrastructure, global trade networks, and other challenges of the twenty-first century (Andrews-Speed et al. 2012). The nexus approach considers the different dimensions of water, energy, and food equally and recognizes the interdependencies of different resource uses to develop sustainably (Bazilian et al. 2011).

Historical Background

In 2011, Texas experienced its most extreme drought on record, stressing the ability of the power grid to meet demand. A 2013 study discussed the water-energy nexus in the context of Texas droughts. The electricity supply grid in Texas is unique in that it is almost entirely separated from the rest of the power infrastructure in the USA. As such, there is limited capacity to purchase power from other geographic regions in case of a generation deficit. In the event of a statewide drought, this isolation presents vulnerability. Texas also encompasses a range of climates. In the subhumid eastern half of the state, most power plants (70 %) use once-through cooling and draw from surface water, most often reservoirs. In the semiarid west, power plants use wet cooling systems to minimize water demand, which is met mostly with groundwater. During 2011 Texas experienced 100 days of above 100 °F temperatures and a record low level of precipitation. The combination of high demand, low rainfall, and higher temperatures increased evaporation, lowering the state's reservoirs by 30 % compared to the previous year. At one point 88 % of the state was experiencing exceptional drought. The drought was accompanied by greater electricity demand for air conditioning, leading to a 6 % rise in electricity use. For three days in August, peak demand was so high that utilities shut off 1.5 GW of nonessential industrial loads to avoid instating rolling blackouts (Texas).
Given the severity of the drought, the Texas power system demonstrated exceptional resiliency. Because the state is prone to such dry weather, most power plants have either been built or retrofitted with equipment to ensure operability with restricted water use. Natural gas, for example, has become a major source of electricity in Texas and requires no cooling water if used in a combustion turbine. The construction of efficient combined-cycle power plants reduces water use per unit generation. Many plants that rely primarily on once-through cooling have supplementary cooling towers for use during drought conditions. One plant even has an 8.5-mile pipeline to bring cooling water from a secondary source. Lastly, wind power has seen enormous growth in Texas during the past decade, with 10 GW capacity now installed. Wind power requires no cooling water. As evidence of the effectiveness of these alternatives, not a single power plant was cited for water discharges above the allowable level during the drought.
The population of Texas is projected to increase dramatically in the coming decades, and infrastructure planners are working on new ways to ensure that electricity demand is met even under extreme drought. Some have suggested adding supplemental cooling towers to all plants, but critics argue that this option is too costly. Rather, those critics recommend wisely choosing what type of new generation and cooling systems to build. These include dry cooling systems that, while expensive, use air rather than water for cooling. To meet the rising demand for cooling water, the Texas State Water Plan calls for the construction of 26 new reservoirs. Some are advocating the increased utilization of groundwater resources, specifically using aquifers to store water and eliminate evaporative losses. This is a common practice in California, Arizona, and Florida but has yet to be implemented in Texas. Another option is drawing water from non-freshwater sources, including treated wastewater, brackish water, and seawater.
Seawater in particular is an untapped resource, water towers supplying a majority of freshwater


accounting for 30 % of cooling withdrawals in to India, Pakistan, and Bangladesh, including the
the whole USA, but only 2 % in Texas. volatile ashpoints of the India-Pakistan borders.
A 2014 Pew survey at the World Economic However, a 1960 Indus Water Treaty signed by
Forum ranked the top ten global risks; water both nations has been in place for over 50 years
crises ranked third after scal crises and unem- and is considered to be one of the most successful
ployment, closely followed by failure of climate water sharing agreements. The issues are made C
mitigation and adaptation, as well as greater in- more complicated by the fact that the major
cidence of extreme events such as oods, storms, rivers with sources in the Himalayas originate
and res, in fth and sixth positions, respectively, from Tibet in China. Other ashpoints include
and food crises in the eighth. Water and climate the Nile river basin (Fig. 1).
are not only interrelated to each other but as
noted in the recent US Department of Defense
Quadrennial Defense Review report act as threat Scientific Fundamentals
multipliers for energy and food crises, in addition
to in uencing governance failures and global Water-Energy Nexus
con icts. Nearly 90 % of the electricity produced in the
Certain areas of the world are at a greater risk USA requires water for cooling, amounting to
of experiencing water stress than others, with about 170 billion gallons of water withdrawn and
larger potential for stresses on the water-food- six billion gallons consumed daily. Thermoelec-
energy nexus. Unfortunately, many of the water- tric power plants boil water into steam to drive
stressed regions happen to be in geopolitically a turbine and then discharge the remaining heat
sensitive regions of the world. The region energy into a ow of cooling water (Fig. 2).
encompassing the Middle East and North Africa Thermoelectric power plants get energy from
is particularly at risk; it is home to 6.3 % of the a variety of fuels and utilize several types of cool-
world s population but only 1.4 % of the world s ing systems. The most basic distinction among
renewable fresh water. Researchers quantify water use in these systems is withdrawal versus
water availability by measuring the amount of consumption. Withdrawal refers to water that is
annual renewable freshwater per person. Less pumped through the plant and discharged back
than one thousand cubic meters per person per to the source. Withdrawn water remains available
year is considered water scarce while having for other uses downstream, such as irrigation.
between one thousand and one thousand eight Consumption refers to water that is used in the
hundred cubic meters per person classi es a cooling process and not returned to the source.
country as water stressed. Currently, there For example, cooling systems that evaporate wa-
are 15 water-scarce countries in the world, ter have high consumption rates (Rutberg and
and 12 of them are located in the Middle East Michael 2012).
and North Africa. On the other hand, Israel The amount of water consumed by a power
happens to be among the most technologically plant depends on the type of the cooling sys-
advanced nations in terms of water technologies tem installed. Once-through systems pull water
related to water use, including for agriculture, from the source, use it for cooling, and then return
as well as desalination. The relatively unique it to the source. As this method withdraws very
perspective of and about Israel, with advanced large quantities, it requires an abundant water
water technologies, in the water-scarce Middle supply. The bene t of such systems is that little
East region, leads to the possibility of intriguing water is consumed. One other typical cooling sys-
treaties occasionally negotiated in outside of full tem is wet cooling, which runs water through a
public view. Water sharing considerations lead cooling tower. Cooling towers make use of evap-
to intriguing geopolitical considerations in the oration to reduce the temperature of the water.
Indian subcontinent as well, with the Himalayan While this method withdraws much less water
182 Climate and Human Stresses on the Water-Energy-Food Nexus

Climate and Human Stresses on the Water-Energy- duce electricity for other activities. Water is also essential
Food Nexus, Fig. 1 A simpli ed schematic view of the for agriculture and food. Some agricultural products are
relationship between water, food, and energy. Water is re ned into fuels in a process that connects water, energy,
used for cooling thermoelectric power plants, which pro- and food

than once-through systems, it has a higher rate of consumption. A third type of cooling method is known as dry cooling. These systems circulate water in a closed loop and dissipate the heat to the air through a large heat exchanger, similar to the radiator in a car. Dry cooling technology has the benefit of consuming virtually no water but is expensive, large, and dependent on the local climate. Some power plants use a combination of wet and dry cooling. These are called hybrid systems. Dammed hydroelectric power production relies on an adequate supply of water; lower water levels in reservoirs correlate with reduced generating capacity. Extracting a single gallon of oil requires between 2 and 350 gallons of water. Growing biofuel crops has a higher water intensity than any method of fossil fuel extraction: corn and soy irrigation each consume upward of 10,000 gallons of water for each MMBTU of fuel produced. The refining of petroleum and plant-based fuels also requires large amounts of water, in the range of 1–2 billion gallons every day. While this chapter focuses on water for cooling, water plays a vital role throughout the extraction, refining, and transport of fuels as well (DoE US 2006).

Climate Impacts on Water and Consequences on Power Generation
As mentioned at the outset of this chapter, climate change will have wide-reaching effects on the water cycle, altering the temperature and quantity of water available in all regions of the country. Also as discussed, electricity generation is highly

Climate and Human Stresses on the Water-Energy-Food Nexus, Fig. 2 Power plant information from the Energy Information Administration (EIA) overlaid with county-level water availability projections for 2040. Power plants are broken down by the type of cooling system employed and the rate of water intake. Water availabilities are median values taken from an ensemble of precipitation-minus-evaporation models. Values are less domestic demand, taken as per capita demand times the projected 2040 population (Adapted from Ganguly et al. 2015)
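
The bookkeeping described in the Fig. 2 caption (median of an ensemble of precipitation-minus-evaporation estimates, less per capita demand times projected population) can be sketched in a few lines. This is only an illustration under stated assumptions, not the code behind the figure: the function name, the sample ensemble values, the per capita demand, and the population are hypothetical, and all quantities are assumed to share one volume unit per year (here Mgal/yr).

```python
from statistics import median


def net_water_availability(p_minus_e_ensemble, per_capita_demand, population_2040):
    """Median ensemble supply minus projected domestic demand.

    All inputs are assumed to use the same volume unit per year (e.g., Mgal/yr).
    """
    supply = median(p_minus_e_ensemble)           # median across ensemble members
    demand = per_capita_demand * population_2040  # domestic demand in 2040
    return supply - demand


# Hypothetical county: five model estimates of precipitation minus evaporation.
ensemble = [5200.0, 4800.0, 6100.0, 5000.0, 4500.0]
net = net_water_availability(ensemble, per_capita_demand=0.03, population_2040=100_000)
print(f"Net water availability: {net:.1f} Mgal/yr")
```

A negative result would indicate that projected domestic demand alone exceeds the median projected supply for that county.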

dependent on abundant sources of water for cooling. The confluence of these two relationships forms a water-energy nexus that will stress water supplies and potentially limit power production capacity in the future (Förster and Lilliestam 2009).
The relationship between energy production and water for once-through cooling systems can be broken down into the following sub-relations:

1. The temperature of the water discharged by a plant cannot exceed a specified level
2. The mixed river and discharged water temperature must not exceed a specified level
3. The temperature difference of discharged versus river water must not exceed a specified level
4. The plant can only withdraw up to a certain fraction of the available streamflow
5. The cooling system must not exceed its pumping capacity

With the exception of the last criterion, which is a mechanical limit, these regulations are put in place at the local, state, and federal levels to protect the aquatic environment. The first three criteria are temperature dependent. As water temperatures increase due to climate change, it becomes increasingly difficult for power plants to meet these discharge requirements. If the intake rate of cooling water is kept constant, higher intake temperatures equate to higher discharge temperatures. In some cases, the higher discharge temperature will exceed criteria 1 or 2. Alternatively, a cooling system may compensate for higher intake temperatures by increasing the intake rate. In this case criterion 4 limits plant operation, especially if the availability of water is reduced, as in drought conditions. Additionally, criterion 5 prevents the plant from drawing in more cooling water than the capability of the plant's equipment (Kimmell 2009). Water availability can also become an issue in light of criteria

Climate and Human Stresses on the Water-Energy-Food Nexus, Fig. 3 A flowchart of water availability with both its drivers and the responses (Source: IPCC WG-II 2014 report, Chapter 3; IPCC 2014)
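
The five once-through cooling criteria discussed in this section can be turned into a simple screening calculation. The sketch below is an assumption-laden illustration, not a regulatory or plant model: it assumes the intake water is at river temperature, that the rejected heat warms only the withdrawn water (a steady energy balance with constant density and specific heat), and that the discharge mixes fully with the river. The `Limits` fields, the function name, and every numeric limit are hypothetical.

```python
from dataclasses import dataclass

RHO = 1000.0  # water density in kg/m^3 (assumed constant)
CP = 4186.0   # specific heat of water in J/(kg*K) (assumed constant)


@dataclass
class Limits:
    max_discharge_temp: float  # criterion 1, deg C
    max_mixed_temp: float      # criterion 2, deg C
    max_temp_rise: float       # criterion 3, deg C
    max_flow_fraction: float   # criterion 4, fraction of streamflow
    pump_capacity: float       # criterion 5, m^3/s


def violated_criteria(river_temp, streamflow, intake_rate, heat_load, limits):
    """Return the numbers (1 to 5) of the criteria that would be violated.

    river_temp in deg C, streamflow and intake_rate in m^3/s, heat_load in W.
    """
    # Energy balance: the rejected heat warms the withdrawn water.
    discharge_temp = river_temp + heat_load / (RHO * CP * intake_rate)
    # Fully mixed river temperature after the discharge rejoins the stream.
    mixed_temp = river_temp + (discharge_temp - river_temp) * intake_rate / streamflow
    checks = {
        1: discharge_temp <= limits.max_discharge_temp,
        2: mixed_temp <= limits.max_mixed_temp,
        3: discharge_temp - river_temp <= limits.max_temp_rise,
        4: intake_rate <= limits.max_flow_fraction * streamflow,
        5: intake_rate <= limits.pump_capacity,
    }
    return [number for number, ok in checks.items() if not ok]


# Hypothetical limits and a plant rejecting 1 GW of heat.
limits = Limits(max_discharge_temp=32.0, max_mixed_temp=28.0, max_temp_rise=10.0,
                max_flow_fraction=0.25, pump_capacity=40.0)
print(violated_criteria(20.0, 200.0, 30.0, 1.0e9, limits))  # normal conditions
print(violated_criteria(26.0, 200.0, 30.0, 1.0e9, limits))  # warmer river
print(violated_criteria(20.0, 80.0, 30.0, 1.0e9, limits))   # drought streamflow
```

With these made-up numbers, the warmer river triggers the discharge temperature limit while the drought case triggers the withdrawal fraction limit, which mirrors the two failure modes described in the text.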

3 and 4. In the event of low water levels, a plant may be forced to withdraw less water so as to satisfy criterion 4. As a result of the lower intake rate, the temperature of the discharged water will be higher than usual. In this case, criterion 1 limits the plant's generating capacity. In addition to the above five criteria, in certain cases reduced water availability may lower the water level in bodies of water from which power plants withdraw cooling water. If the water level falls below the intake level, the plant will be unable to take in sufficient quantities of cooling water. Many systems have intakes at depths shallower than 10 feet and may run dry under drought conditions. In any of these situations, power plants are forced to either reduce generation or shut down entirely. As a result, the availability of electricity is reduced. In regions at high risk of increased water temperatures and/or reduced water availability induced by climate change, power production is especially vulnerable (Fig. 3).

Climate Change Consequences on the Marine Food Web
Agriculture and water systems have an obvious connection: plants need water to grow and yield food. Aside from this main connection, climate influences the agriculture sector through changes in temperature, carbon dioxide levels, and water levels. Climate disruptions will cause a significant decrease in the yield of most crops and livestock because of changes in the atmosphere and in water availability. Changes in water availability will affect what crops will grow in certain areas and the amount of yield and hence influence food production around the world.
In a world with a growing population, the demand for food is growing, and the agriculture sector needs to increase production to match this projected growth. Without the necessary increase in agricultural lands, there would need to be fundamental improvements in yield and management, changes in consumption patterns, or both. However, consumption is expected to increase, with large middle-class populations in developing nations aspiring to the standards of living available in the First World countries. Water requirements vary by crop type, and scarcity or variability in water availability may reduce yield substantially. In a study on wheat, rice, maize, and soybean conducted by Parry et al., carbon dioxide levels, temperature, and water availability were projected to future levels to determine how the yield would be affected (Parry et al. 2004). The results illustrate slight to moderate negative impact worldwide with the most

[Figure: process flow (left) running from surface and ground water, climate change and variability, multi-sector demand, population change, and EPA regulation, through changes in regional hydro-climatology, water supply, and water demand, to rising stream temperature, water stress, net water availability, and installed power production capacities at risk; maps (right) for Plausible Projection A and Plausible Projection B with legends for stream temperature (°C), power plant capacity (Quad/yr), and net water availability (Mgal/yr)]

Climate and Human Stresses on the Water-Energy-Food Nexus, Fig. 4 A proof of concept on aspects of the water-energy nexus. The process flow (left) considered water stress resulting from scarcer and warmer freshwater on thermoelectric power production based on climate model and population projections, which combined with stream temperature sensor and GIS data on energy infrastructures resulted in new insights (right) (Adapted from Ganguly et al. 2015)
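
The overlay idea behind Fig. 4 (joining projected stream temperature and net water availability to power plant records to find capacity at risk) can be illustrated with a small screening function. This is a hypothetical sketch rather than the workflow used for the figure: the record fields, the plant values, and both thresholds are assumptions. The 32 °C default simply echoes the top stream temperature class in the figure legend and is used here only as an illustrative cutoff.

```python
def capacity_at_risk(plants, max_stream_temp=32.0, min_net_water=0.0):
    """Sum the capacity of plants whose location is projected to be too warm or too dry.

    Each plant is a dict with hypothetical keys: name, capacity_quad_yr,
    stream_temp_c, and net_water_mgal_yr.
    """
    at_risk = [
        p for p in plants
        if p["stream_temp_c"] > max_stream_temp
        or p["net_water_mgal_yr"] <= min_net_water
    ]
    total = sum(p["capacity_quad_yr"] for p in at_risk)
    return total, [p["name"] for p in at_risk]


# Made-up plant records for illustration only.
plants = [
    {"name": "A", "capacity_quad_yr": 0.04, "stream_temp_c": 33.5, "net_water_mgal_yr": 1200.0},
    {"name": "B", "capacity_quad_yr": 0.06, "stream_temp_c": 24.0, "net_water_mgal_yr": -300.0},
    {"name": "C", "capacity_quad_yr": 0.02, "stream_temp_c": 21.0, "net_water_mgal_yr": 900.0},
]
total, names = capacity_at_risk(plants)
print(names, round(total, 3))
```

In a real analysis the plant attributes would come from a spatial join between infrastructure data and gridded or county-level projections; the dictionaries above only stand in for that step.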

potential changes in yield in Asia and Africa. Yields, according to this study, will fall by 10 % overall across all of the regions, which has the potential to affect the food security of the growing population, especially in developing economies.
Climate change is severely impacting marine food webs. Marine life exists within an intricate network; currents make nutrient-rich waters available to marine life, sustaining single-celled plants called phytoplankton, the zooplankton that feed on them, and the larger fish that eat the zooplankton and may be consumed as food by humans.
Climate change and associated changes in ocean circulations, sea level rise, and coastal winds are altering the patterns of nutrient upwelling and thereby changing the timing of fish spawning and the yield from fisheries (Wang et al. 2015). Rising sea temperatures are threatening the habitats of sea life by straining the ability of these creatures to cope with temperature changes. Marine mammals are being forced to travel longer distances to find food or to rely on less nutritious, energy-expending substrate for survival. Changing marine patterns are also affecting human food production and the fishing industry.

Climate, Humans, and WEF
Changes in population and lifestyles, changes in climate, resilience of infrastructures and societies, and regional land use and urbanization are the main stressors of the water-food-energy nexus. They impact water resources (both quantity and quality), water-related hazards (floods and droughts), built and natural infrastructures, and coupled natural-engineered human systems across many scales. The influence and impact of these stressors are critically dependent on time horizons, spatiotemporal resolutions, and the resilience of the coupled systems. As climate change influences water availability, additional stress will be placed on the balance between food and biofuel crops. As discussed earlier, biofuel crops are much more water intensive than traditional fuels. As such, the energy and food needs of people must be carefully considered in light of changes in the geographic distribution of water availability (Fig. 4).

Planning Horizons and Spatial Resolutions
The spectrum of water challenges operates at multiple space and time scales, and across disparate planning horizons, over which scientific insights, projections, and policy insights need

to be generated. They include near-real time to weeks, seasonal to interannual, and decadal to mid-century.
High-resolution predictions are possible over near-real time to weekly time scales. Beyond this range, chaotic characteristics and/or extreme sensitivity to initial conditions limit the predictability of hydrological and meteorological systems. Within these time frames, short-term monitoring and predictions may inform responses to urgent and immediate events and emergency management. Specific examples include flood and flash flood warnings, monitoring water quality of source lakes to generate advance warning of possible pathogens in drinking water, urgent warnings about crop destruction, or stoppage of power plant operations owing to lack of water during a drought or excess water because of floods.
Seasonal to interannual time scales encompass changes in climate patterns such as El Niño, seasonal changes in monsoons in the Southwest USA, seasonal floods in Iowa, the severity of winter in the Northeast USA, the number of tropical cyclones striking the Gulf Coast during the hurricane season, seasonal droughts in California, and regional energy production. Specific weather events are not predictable at these time scales, but average seasonal and annual patterns can be characterized.
Decadal to mid-century time scales, ranging from about 5–30 years in the future, are not predictable on a seasonal or even annual basis. However, physics-based climate models can project climate trends and variability based on assumed emission scenarios. Trends in global warming and changes in weather patterns start becoming prominent at these horizons. Also at this scale, variability in mean and extreme climate is expected to be large. While climate change and the associated uncertainty are expected to be a major factor in water challenges at these time horizons, demographic changes may dominate the impact on water resources. The intensity and severity of floods and droughts may not be predictable on a single-event, seasonal, or annual basis at these time horizons, but duration and frequency changes may be predictable. The nexus of water with food and energy can be examined and characterized together with their uncertainties, and appropriate changes can be made in emergency preparedness. Reinforcements and remediation measures to build the resilience of communities, engineered systems, and natural ecosystems may be revived or made more resilient based on this knowledge.
Beyond decadal to mid-century time scales, climate trends are expected to dominate over uncertainties. Projections for population and human systems in this range are not available and are difficult to create. Similarly, infrastructure and technology changes are nearly impossible to foresee. Most stakeholder decision horizons do not extend to these scales. The climate change adaptation community, as well as the related integrated assessments community, has been working at these ranges with a variety of simplified models. Data and geographic information science may be useful at these scales primarily to develop predictability and uncertainty studies in climate, evaluate the performance of models of water, energy, and food systems, develop enhanced projections with uncertainty, and examine aggregate impacts in terms relevant for adaptation and mitigation. Water in the atmosphere, oceans, land, and biosphere significantly impacts climate variability yet remains among the largest sources of uncertainty. Climate impacts regional hydrometeorology and the water balance, including availability and quality of surface and groundwater, and influences natural hazards such as floods and droughts. Water availability and temperature affect terrestrial and marine ecosystems, as well as energy and food security. Climate influences the probability of hazards such as floods and droughts at multiple space-time resolutions. Adapting to changing hazards requires the resilient design of critical infrastructures and effective management of key resources.

Key Applications

The challenge of understanding energy, water, and food policy interactions, and addressing them in an integrated manner, appears daunting. A comprehensive understanding of the WEF nexus and the subsequent impacts of human and climate stresses is required to:

• Assess the current state and pressures on natural and human resources systems
• Forecast expected demands, trends, and drivers on resources systems and interactions between water, energy, and food systems
• Delineate different sectorial goals, policies, and strategies in regard to water, energy, and food. This includes an analysis of the degree of coordination and coherence of policies, as well as the extent of regulation of uses
• Identify the need for planned investments, acquisitions, reforms, and large-scale infrastructure
• Inform key stakeholders, decision-makers, and user groups

Future Directions

The key challenges/questions for future directions are fivefold:

1. What are the relationships among the interlinked stressors and the stressed systems across the components of the water-climate-energy-food-ecosystem nexus at different time and spatial scales?
2. Can remotely sensed and other information about lakes or rivers including water levels and quality, capacity, location and water use of power production, resilience of energy infrastructures, food crops and biofuels, as well as freshwater or marine ecosystems be related to each other through graphical dependencies to form interconnected network structures across the disparate systems of the nexus, with a view to understanding their systemic dependencies, feedback, and resilience?
3. What are the characteristics of the stressors, especially the attributes of their extremes, how do they impact the nexus as well as the individual components of the nexus, and how do failures or loss of functionality propagate along the tightly interconnected system of systems?
4. Can the future flows, feedback, and vulnerability along the nexus network, as well as the perturbations of the nexus owing to possible non-stationary behavior of the extreme stressors, be predicted across multiple time scales to enable short-term recovery and long-term preparedness?
5. How would uncertainties along the interconnected networks of the nexus be quantified, including in how the impacts of changes in the stressors and their extremes propagate along the nexus?

Cross-References

Climate Extremes and Informing Adaptation
Climate Hazards and Critical Infrastructures Resilience

References

Andrews-Speed P, Bleischwitz R, Boersma T, Johnson C, Kemp G, VanDeveer SD (2012) The global resource nexus: the struggles for land, energy, food, water, and minerals. Transatlantic Academy, Washington, DC
Bazilian M, Rogner H, Howells M, Hermann S, Arent D, Gielen D et al (2011) Considering the energy, water and food nexus: towards an integrated modelling approach. Energy Policy 39:7896–906. doi:10.1016/j.enpol.2011.09.039
DoE US (2006) Energy demands on water resources: report to Congress on the interdependency of energy and water, vol 1. U.S. Department of Energy, Washington, DC
Förster H, Lilliestam J (2009) Modeling thermoelectric power generation in view of climate change. Reg Environ Change 10:327–338. doi:10.1007/s10113-009-0104-x
IPCC (2014) Climate change 2014: impacts, adaptation, and vulnerability. Part B: regional aspects. Contribution of working group II to the fifth assessment report of the intergovernmental panel on climate change [Barros VR, Field CB, Dokken DJ, Mastrandrea MD, Mach KJ, Bilir TE, Chatterjee M, Ebi KL, Estrada YO, Genova RC, Girma B, Kissel ES, Levy AN, MacCracken S, Mastrandrea PR, White LL (eds)]. Cambridge University Press, Cambridge/New York
Ganguly AR, Kumar D, Ganguli P, Short G, Klausner J (2015) Climate adaptation informatics: water stress on power production. Comput Sci Eng 17:53–60. doi:10.1109/MCSE.2015.106
Kimmell TA, Veil JA, Division ES (2009) Impact of drought on U.S. steam electric power plant cooling water intakes and related water resource management issues. Argonne National Laboratory (ANL), Washington, DC
Parry ML, Rosenzweig C, Iglesias A, Livermore M, Fischer G (2004) Effects of climate change on global food production under SRES emissions and socio-economic scenarios. Glob Environ Change 14:53–67. doi:10.1016/j.gloenvcha.2003.10.008
188 Climate Change

Rutberg MJ, Michael J (2012) Modeling water use at thermoelectric power plants. Thesis, Massachusetts Institute of Technology
Texas NS. Dried out: confronting the Texas drought n.d. https://stateimpact.npr.org/texas/drought/. Accessed 22 Aug 2016
Wang D, Gouhier TC, Menge BA, Ganguly AR (2015) Intensification and spatial homogenization of coastal upwelling under climate change. Nature 518:390–394. doi:10.1038/nature14235
Water UN (2014) The United Nations world water development report 2014: water and energy. UNESCO, Paris


Climate Change

Climate Extremes and Informing Adaptation
Climate Risk Analysis for Financial Institutions


Climate Change and Developmental Economies

Lindsey Bressler1, Kara Morgan1, Allison Traylor1, Hayden Henderson1, Udit Bhatia1, Babak Fard1, Devashish Kumar1, Rajarshi Majumder2, Sourav Mukherji3, Joyashree Roy4, Matthias Ruth5,6, and Auroop R. Ganguly1
1 Sustainability and Data Sciences Laboratory (SDS Lab), Department of Civil and Environmental Engineering, Northeastern University, Boston, MA, USA
2 Department of Economics, The University of Burdwan, Burdwan, West Bengal, India
3 Organisational Behaviour & Human Resources Management, Indian Institute of Management Bangalore, Bengaluru, India
4 Global Change Programme, Department of Economics, Jadavpur University, Kolkata, India
5 Resilient Cities Lab, School of Public Policy and Urban Affairs, Boston, MA, USA
6 Department of Civil and Environmental Engineering, Northeastern University, Boston, MA, USA

Synonyms

Adaptation; Kuznets curve; Mitigation; Sustainable development

Definitions

Climate Change: Climate change is defined as changes in the state of the climate variables that can be identified (by using statistical tests) by changes in the mean and/or the variability of their properties and that persist for an extended period. In the climate context, an extended period implies three decades or longer (IPCC 2014). Climate change may be due to natural internal processes or external forcings and persistent anthropogenic changes in the composition of the atmosphere and/or in land use (Stocker et al. 2013).

Adaptation: In the context of climate and climate-related extremes, the Intergovernmental Panel on Climate Change Special Report on Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation (SREX) defines adaptation as the process of adjustment to actual or expected climate and its effects, in order to moderate harm or exploit beneficial opportunities (Field 2012).

Mitigation: The IPCC defines mitigation in its third assessment report as an anthropogenic intervention to reduce the sources or enhance the sinks of greenhouse gases (IPCC-TAR 2001). Although it is hard to completely distinguish mitigation from adaptation measures, the key difference is that mitigation reduces all impacts (positive and negative) of climate change and thus reduces the adaptation challenge, whereas adaptation is selective; it can take advantage of positive impacts and reduce negative ones.

Gini coefficient: The Gini coefficient is a measure of statistical dispersion intended to represent the income distribution of residents in a nation. Developed in 1912, this is the most commonly used measure of inequality (van Ginneken 2003).

Sustainable development: Fundamental to our understanding of climate change and developmental economics is the term sustainable development. The most commonly used definition for sustainable development comes from Our
Climate Change and Developmental Economies 189

Common Future, a paper released by the Brundtland Commission. The commission was established in the 1980s as a response to the inadequate response of the 1970s environmental movement. Sustainable development, according to the commission, is "development that meets the needs of the present without compromising the ability of future generations to meet their own needs" (Brundtland 1985). The commission further specified this definition by adding three pillars: economic growth, environmental protection, and social equality. Although many focus on the second pillar, the commission emphasized the interconnectivity of all three. Only when each pillar is achieved can sustainable development truly be realized.

Historical Background

Climate change is expected to worsen the scarcity of water, food, and energy resources and exacerbate hazards such as heat waves and heavy precipitation. Developing economies are characterized by growing and vulnerable populations, ever-increasing income and wealth inequality, deteriorating or inadequate infrastructures, as well as the inability to mobilize resources for relief, rescue, or recovery efforts. Thus, while no region is immune to climate change, developing economies have been hit the hardest and will continue to be disproportionately impacted in the coming decades. However, climate adaptation and mitigation remain hotly debated, especially in situations where economic inequalities are severe and the disparity between future societal aspirations and the current status is large. Reduced reliance on fossil fuels may be viewed as less urgent compared to industrialization and economic growth, while land use and urban planning may be viewed as hindrances to improving quality of life. Adapting to projected disasters or resource scarcity may seem a case of misplaced priorities amidst a lack of adequate investments in education, health, or food security. However, it is within these developing economies that low-regrets or transformational adaptation may significantly reduce loss of lives and economic devastation of the underprivileged. In the longer term, renewable energy or other low-carbon investments may make good business sense while lessening the more severe effects of climate change. The Intergovernmental Panel on Climate Change (IPCC) findings on mitigation are outlined in its Working Group III report (Summary for Policymakers 2014). According to the IPCC, mitigation efforts, by limiting the impacts of climate change, can enhance sustainable development, allow for more equitable distribution of resources, and help alleviate poverty. IPCC Working Group III describes the challenge of distributive justice, or the division of mitigation efforts equitably, to account for the greater impact of climate change on developing nations with historically lower emissions. For the 1.3 billion people in the world without access to electricity (Summary for Policymakers 2014), sustainable development will be required for this population's economic growth to not simultaneously expand fossil fuel usage. The report presents developing, urbanizing cities as a significant mitigation target to reduce emissions, as many developed nations remain locked into excessive emissions by existing infrastructure.
Mitigation policy will require a systemic approach, innovation, and difficult decisions: effective greenhouse gas emission reductions cannot be achieved if all nations do not act. The systemic change required will necessitate significant public, private, and institutional spending at a global level that takes into consideration local and historical practices and the potential for injustice between developing and developed nations.
The IPCC's WGII report (IPCC 2014) identifies different areas of society that are at risk and the impact of climate change on different populations in the face of remaining uncertainty on the exact timing and extent of its impacts. The IPCC report finds the most at-risk, vulnerable populations to be those that are already the most disadvantaged in society: socially, economically, politically, culturally, institutionally, or otherwise. The WGII report recommends first evaluating levels of vulnerability and risk. Once risks are defined, policy makers must evaluate resilience: the ability of a society to recover from disasters and hazardous events, efficiently and effectively. Climate resilience, specifically, is the ability to manage the impacts of climate change,

Climate Change and Developmental Economies, Fig. 1 Risk of climate-related impacts results from the interaction of climate-related hazards with the vulnerability and exposure of human and natural systems. Changes in both the climate system (left) and socioeconomic processes including adaptation and mitigation (right) are drivers of hazards, exposure, and vulnerability (Source: IPCC 2014, WGII AR5 SPM, p 26)

reducing disruptions and expanding opportunities. In Fig. 1, the IPCC's iterative risk management process is illustrated. Risk, according to the IPCC, is the intersection of hazards, vulnerability, and exposure. While climate causes hazards, risk arises out of vulnerability and exposure due to socioeconomic processes.
Amid the uncertainty of climate change and its impacts, it is widely accepted that effective adaptation requires localized and targeted approaches. The IPCC recommends investing in infrastructures, development assistance, and existing disaster risk management institutions. However, the report emphasizes that a one-size-fits-all solution will be ineffective globally due to varying social values, interests, expectations, and circumstances (Smith et al. 2014).

Scientific Fundamentals

The relationship between economic growth and environmental degradation is a fundamental one for sustainable development. Many hope that this relationship can be decoupled, that is, that economic growth should not be conditional on environmental degradation. Two ideas, the environmental Kuznets curve (EKC) and the more recent Ecomodernist Manifesto, explore the topic of decoupling further.

The Environmental Kuznets Curve

A country's level of economic development is a key component in the debate on climate change mitigation. Do developed countries need to take on the majority of emissions reductions? Is there such a thing as a fair share carbon space? Does a country's GDP growth inherently lead to a worse-off environment? Questions like these are at the heart of the climate change-development conundrum. The environmental Kuznets curve, developed in 1991 by Grossman and Krueger, shows a graphical relationship between development and environmental degradation. Based on the economic Kuznets curve, which described the relationship between a country's gross domestic

product (GDP) per capita and Gini coefficient, the EKC, too, is shaped like an inverted parabola, which is shown in Fig. 2. A key difference between the two curves is that the Kuznets curve went from observation to a theory, whereas the EKC is the reverse.

Climate Change and Developmental Economies, Fig. 2 Hypothetical environmental Kuznets curve (environmental degradation plotted against per capita income; the environment worsens up to a turning point and improves beyond it). Actual EKC may not be smooth or symmetrical

The EKC is important because it challenges the idea that environmental degradation necessarily continues with economic growth and instead provides a paradigm for how growth and sustainability can work simultaneously. The EKC holds for many local pollutants but not for CO2. EKC has not been interpreted as a justification for maintaining business as usual or encouraging greater GDP growth. Under the logic of the EKC, developing countries will, with economic development, ultimately reach an optimal level of pollution. This theory supports the idea that in order to lower emissions overall, a country's economy must be stronger. However, the concept that some countries are too poor to be green misinterprets development's relationship to the environment in two key ways. First, it leaves out movements that are directed against the environmental degradation that is caused, rather than mitigated, by increasing wealth. Second, it disregards the environmentalism of the poor (Martinez-Alier 1995). This type of environmentalism is centered on maintaining community resources that are threatened by the state or market system. In addition, Stern et al. point out that an EKC-type relationship may be the result of the effects of free trade (Stern 2004). As countries' economies begin to grow, they specialize in human capital-based activities and trade with countries that specialize in resource-intensive ones. Because of specialization, the results of the EKC may not accurately reflect a true improvement in environmental outcomes.

Climate Resilient Pathways

Effective policy to address climate change must extend beyond adaptation. IPCC's Working Group II recommends the implementation of climate-resilient pathways, an approach to sustainable development that combines both adaptation and mitigation strategies. The pathways require transformations on social, economic, political, and technological levels (IPCC 2014). Mitigation strives to reduce the rate and magnitude of climate change, allowing more time, even decades, to implement effective adaptation.
The IPCC implores policy makers to implement deliberative policy, strive for innovation, and make changes as mistakes are made and lessons are learned. Policy makers face challenges of limited resources, poor planning, misinformation, miscalculation, and the prioritization of short-term consequences over the long term. Though the task of climate mitigation is daunting, the impact is predicted to do more than reduce climate change risks: it should also improve lives, well-being, and global management of the environment.

New Methods of Measuring Development

Although the empirical validity of EKC specifically for CO2 has been disputed, there is no single official alternative for measuring the relationship between environmental degradation and the economy. Barrett and Graddy proposed that there exists an inverted U-shape relationship between environmental degradation and civil and
192 Climate Change and Developmental Economies

political freedoms, rather than economic development (Barrett and Graddy 2000).

Some ecological economists argue that a better measure of human well-being is reflected in human development rather than in the measures of GDP growth alone. Steinberger and Roberts looked at the relationships between four different measures of human development (life expectancy, literacy, GDP per capita, and Human Development Index) and two measures of resource use (primary energy use and carbon emissions). They concluded that the attainment of human well-being is becoming steadily more efficient over time. This challenges the perception that increased energy usage and increased emissions are necessary for better living conditions.

Ultimately, Steinberger and Roberts's research reflects new possibilities of dissociating a rising standard of living from degradation of the environment. In their words, high human development can be generated at lower and lower energy and carbon emissions costs, and the quality of life is steadily decoupling from its material underpinnings. They found that different measures of development can be achieved at different rates of energy usage. Literacy, for instance, requires far less energy output than GDP (Steinberger and Roberts 2010). This new paradigm of development is useful. The justification for decoupling is reflected in other literature as well, including the Ecomodernist Manifesto.

Ecomodernist Manifesto

The Ecomodernist Manifesto rejects the idea that humanity will run out of resources and that increased human development is a problem. According to the Manifesto, the real issues are the misuse of energy sources, inefficient technology, and excessive carbon emissions. Predicted outcomes of the current path of resource usage, including ocean acidification, the loss of ozone in the earth's stratosphere, and climate shifts, could result in economic, population, and ecological loss. Along with long-term effects, the Manifesto points out well-known immediate impacts on populations, including water and air pollution. The way for humanity to develop further while preserving the surface of the earth is for society to decouple from nature and decrease resource dependency. Simply put, nature unused is nature spared (Asafu-Adjaye et al. 2015).

The Manifesto defines decoupling as the decrease in the rate of environmental impact of a process as economic output increases. The authors set a goal of absolute decoupling, in which the rate of human consumption of resources and energy peaks and then declines. The Manifesto suggests humanity is on this path to peak environmental impact by the end of this century, due to trends of increased urbanization, agricultural technology expansion, and the introduction of efficient technology. The Manifesto's recommended strategies to reduce human dependency on natural resources and strive for rapid decarbonization include urbanization, aquaculture, agricultural intensification, nuclear power, and desalination.

A strength of the Manifesto is that it recommends innovation, not a return to earlier practices. It rejects nostalgia in environmentalism, and the idea that humanity has previously lived lighter on the land, since three-fourths of global deforestation occurred before the industrial revolution. It calls for the expansion of electricity in the developing world, in contrast to environmentalists who theorize that resources are not available for such development. It presents the idea that global poverty is an environmental problem, one that cannot be ignored.

However, the Manifesto lacks a concrete strategy for more efficient use of resources. It presents scalable, power-dense technologies as an alternative to carbon energy sources, even though present technologies are not yet capable of achieving that transition. It relies on the discovery of such technologies, but even admits that such progress is not inevitable.

Case Study: Paradox of Development and Mitigation

The relationship between development and its conflict with sustainable development is well

documented both in theoretical and empirical literature. Though the start was made in developed countries and most of the empirical studies are still in the context of high-income industrialized countries, developing countries are slowly taking up analytical research that explores the link between economic activities, environmental impacts, climate change, and sustainability. Several such studies, as well as the reports brought out by the UNFCCC, have put developing countries, especially India and China, in a dilemma. On one hand lie the aspirations of their people to achieve a decent standard of living and come out of poverty, which requires these economies to achieve a high macroeconomic growth rate. On the other is the global responsibility of these countries to check GHG emissions and adopt mitigation measures to delay climate change. Matters are made more complex by the domestic heterogeneity of these countries: one part has a lifestyle and values akin to the first world, and another has troubles, struggles, and aspirations that resemble those of the least developed countries. While the terms of and solutions to challenges at the macro level remain gray, there is some evidence that at the micro level policy makers prefer development at any cost. We shall refer to some case studies to understand how some of these activities are degrading the environment in developing countries.

The Asansol-Durgapur region lies in the eastern part of India. The region came to be known as the Ruhr of India because of the large number of large industrial units set up in the region postindependence. These units were set up with adequate attention to environmental standards and impact on local pollution. However, from the early 1990s the region faced an economic downturn and industries shuttered. By the early 2000s, the government started a new industrialization drive in the region by providing several concessions to entrepreneurs, including fiscal benefits, a map of deregulated land use, and fast licensing. A large number of industrial units came up over the next five years, most of them in the earlier green belts and close to the residential areas. Most of them were also polluting in nature (sponge iron, carbon products, and smelting units, among others). The industrial units were belching fumes containing substantial amounts of harmful gases, toxic chemicals, and Suspended Particulate Matter (SPM). This has resulted in pollution of air and water in the surrounding areas, health and other related damages to both workers and residents, and another impetus to global climate change. The SPM10 level at Durgapur stood at 350 µg/m3 in August 2014, more than five times the permissible standard level of 60 µg/m3. The nitrogen dioxide level stood at 51.5 µg/m3 in 2014 and continues to increase. It has been observed by researchers that the neo-industrialization drive has raised pollution levels and that there are substantial costs involved, as estimated using Willingness To Pay and Willingness To Accept methods. More than one-third of the respondents report the pollution level as unbearable, while half of them say that it has increased over the last decade. The estimated value of environmental damage is about 2 % of the gross output of the new industries, clearly pointing at the substantial cost of development without looking at environmental impacts.

It is not that only industrialization in developing countries is to be blamed. Agriculture has its own way of affecting the environment and bringing about climate change. Biswas (2010) looked at the impact of extensive agricultural expansion on water availability and land use/land cover (LULC) changes in the rice bowl of Eastern India. It was observed that over a period of three decades [1971–2001, roughly coinciding with the period of agricultural revolution in the region that turned mono-crop land to three-crop land through intensive irrigation, mechanization, improved seeds, pesticides, and chemical fertilizer], land use shifted strongly in favor of agriculture. However, the strategy was water intensive and resulted in lowering of the groundwater table from 8 m to 15 m in just 10 years. The number of surface water bodies decreased drastically, as did their total surface area. Markov chain modeling predicts a 50 % decline in water bodies over the next 25 years along with a corresponding rise in cultivated area and settlements. However, this situation is not

sustainable as reducing water availability leads to increased cost of cultivation, making agriculture a nonprofitable and vulnerable livelihood option, defeating the initial objective of agricultural expansion. Another associated threat is that of arsenic contamination of groundwater, a serious health risk in the lower Gangetic plain of Bangladesh and Eastern India mainly caused by groundwater depletion for irrigation (Chakraborti et al. 2010).

Key Application

Social Entrepreneurship

As outlined in previous sections, climate change's consequences could lead to even greater vulnerabilities and worse outcomes for all of us, even more so for the world's most vulnerable populations. For instance, in developing economies such as India, the government's priorities seem to be at odds with those of climate mitigation. Economic development, more often than not, is likely to result in adverse impact on climate in this context. It is also not practical to expect business organizations to do much for climate mitigation apart from what is mandated by the law or regulatory authority. After all, their objective function is maximization of shareholders' wealth, and climate-related priorities can at best be looked at as one of their several constraints. Does that mean all is lost for countries such as India, as far as climate mitigation is concerned? Are climate mitigation priorities going to be necessarily sacrificed at the altar of developmental priorities, or is there any hope? Social enterprises, an emerging class of organizations that work at the intersection of economic and environmental sustainability, might offer such hope. While a consistent definition of a social enterprise is yet to emerge, what is generally accepted is that these organizations pursue a social objective such as poverty alleviation or environmental sustainability as their primary goal (objective function) and leverage market principles to become financially sustainable. Their objective of financial sustainability distinguishes them from the non-governmental/not-for-profit organizations, and the fact that they do not want to maximize profits makes them different from the commercial enterprises.

A key application of mitigation and of enabling resilience in a resource-constrained setting is exemplified by the story of Selco, a social enterprise that provides solar lighting systems to the economically underprivileged. A significant number of India's 620,000 villages, which accommodate nearly 60 % of her population, do not enjoy the benefits of electricity. The electric grid has not yet reached many of them. Even in those villages where there is grid connectivity, a lack of adequate power generation results in long periods without a supply. As a consequence, life and livelihood activities in these villages are governed by the availability of sunlight. Availability of sources of energy at an affordable price is a critical problem for the rural poor in India, the negative impact of which is disproportionately borne by the womenfolk. The compromises that they make in order to overcome this challenge have deleterious effects both on their health and on the environment. In 1994, Harish founded Selco as a for-profit enterprise that would sell solar lights to the rural poor in India. While organizations and institutions in those days that wanted to address the needs of the poor typically operated as not-for-profits, depending on grants and philanthropy, Harish was confident that he could create a financially sustainable enterprise that had a social objective. He has often been quoted saying that he set up Selco to prove three things: the poor can afford technology, the poor can maintain technology, and it is possible to run a commercial venture that fulfills a social objective. By 2014, after nearly two decades of operation, Selco had sold 200,000 solar lighting systems across five different Indian states and had remained a financially sustainable enterprise. Selco's success clearly demonstrates that it is possible to tackle the twin problems of sustainable energy and poverty alleviation simultaneously, even while maintaining the bottom line of an enterprise.

However, the journey in the beginning was arduous, to say the least. Even though Harish

and his team were able to put together a solar lighting system that was suitable for the harsh environment of rural Karnataka, at about INR 5000 per light it was not affordable for those at the base of the economic pyramid. They could only buy the product if they were provided credit. However, the banks were not ready to lend to the poor, especially because lighting systems were viewed as a consumer durable product and banks were instructed to provide loans to the poor only for income-generating activities. Thus, Harish realized that it was necessary to link the purchase of solar lights to a stream of income. That was not too difficult to do because solar lights could increase the number of business hours for those who had to close shop after sundown because of lack of electricity. Moreover, there were others who were purchasing kerosene to do business after sunset. Selco could structure a financing plan for them such that the money they saved from not having to buy kerosene was more than the money they would have to pay for loan repayment. Finally, after a lot of convincing, banks started to provide credit for purchasing solar lights, and Harish's dream of selling solar lighting systems to the poor took concrete shape.

Harish also realized that apart from financing, Selco also needed to provide prompt service to its customers. Since customers would depend on its lights to run their business, any downtime would imply loss of livelihood opportunity and thereby loss of credibility. Selco therefore established a wide network of service centers all across Karnataka so that service engineers could reach even the most remotely located customer within a reasonable amount of time.

Selco's journey, apart from being inspirational, holds a lot of lessons for social entrepreneurs and others who engage with the problem of seeking market-based solutions for poverty alleviation. However, changes need to be systemic in order to have any perceptible impact on resilience or mitigation. Therefore, such efforts need to be scaled in multiple domains such as healthcare, education, and livelihood generation. This, on one hand, will reduce the economic vulnerability of a large section of the population, giving them opportunity for self-determination. On the other hand, it will make them resilient to deal with the adverse impact of climate change.

Future Directions

First and foremost, the Ecomodernist Manifesto is a call to action and discussion. Though the Manifesto identifies the challenges of global climate change to be technological ones, it recognizes the need to adopt certain values in society to fully address them, including democracy, tolerance, and pluralism. However, in order to reach the great Anthropocene era that the Manifesto strives for, private businesses and state institutions must invest in technological research and embrace regulations to mitigate emissions. Technologies including nuclear power, wind power, solar power, and desalination remain either unsustainable and carbon intensive or economically inefficient. Scalable, power-dense alternatives to carbon energy must be developed in order to both urbanize and intensify agriculture and simultaneously reduce human impacts on the environment. The environmental Kuznets curve, too, reflects a certain sense of optimism in its own modeling. One could argue that either the Ecomodernist Manifesto or the EKC simplifies the idea of decoupling too much. However, the concept that economic growth does not necessarily need to be the cause of environmental degradation may be a positive framework to encourage action on sustainable development.

Cross-References

▶ Climate Adaptation, Introduction
▶ Climate and Human Stresses on the Water-Energy-Food Nexus

References

Asafu-Adjaye J, Blomquist L, Brand S, Brook BW, DeFries R, Ellis E, Foreman C, Keith D, Lewis M, Lynas M, Nordhaus T, Pielke R, Pritzker R, Roy J, Sagoff M, Shellenberger M, Stone R, Teague P (2015) An ecomodernist manifesto, pp. 32

Barrett S, Graddy K (2000) Freedom, growth, and the environment. Env Dev Econ. core/journals/environment-and-development-economics/article/freedom-growth-and-the-environment/393DCC0CAB23F8A9837DCC892B3CB90A. Accessed 6 Sept 2016
Biswas B (2010) Changing water resources study using GIS and spatial model: a case study of Bhatar Block, district Burdwan, West Bengal, India. J Indian Soc Remote Sens 37:705–717. doi:10.1007/s12524-009-0049-z
Brundtland GH (1985) World commission on environment and development. Env Policy Law 14:26–30
Chakraborti D, Rahman MM, Das B, Murrill M, Dey S, Chandra Mukherjee S et al (2010) Status of groundwater arsenic contamination in Bangladesh: a 14-year study report. Water Res 44:5789–5802. doi:10.1016/j.watres.2010.06.051
Field CB (2012) Managing the risks of extreme events and disasters to advance climate change adaptation: special report of the intergovernmental panel on climate change. Cambridge University Press, Cambridge
IPCC. Climate Change (2014) Impacts, adaptation, and vulnerability. Part B: regional aspects. Contribution of working group II to the fifth assessment report of the intergovernmental panel on climate change [Barros VR, Field CB, Dokken DJ, Mastrandrea MD, Mach KJ, Bilir TE, Chatterjee M, Ebi KL, Estrada YO, Genova RC, Girma B, Kissel ES, Levy AN, MacCracken S, Mastrandrea PR, White LL (eds)]. Cambridge University Press, Cambridge/New York
IPCC-TAR M (2001) Third assessment report of the intergovernmental panel on climate change. Cambridge University Press, New York
Martinez-Alier J (1995) The environment as a luxury good or "too poor to be green"? Ecol Econ 13:1–10
Smith KR, Woodward A, Campbell-Lendrum D, Chadee DD, Honda Y, Liu Q et al (2014) Human health: impacts, adaptation, and co-benefits. In: Field CB, Barros VR, Dokken DJ, Mach KJ, Mastrandrea MD, Bilir TE et al (eds) Climate change 2014: impacts, adaptation, and vulnerability. Part A: global and sectoral aspects. Contribution of working group II to the fifth assessment report of the intergovernmental panel on climate change. Cambridge University Press, Cambridge/New York, pp 709–754
Steinberger JK, Roberts JT (2010) From constraint to sufficiency: the decoupling of energy and carbon from human needs, 1975–2005. Ecol Econ 70:425–433. doi:10.1016/j.ecolecon.2010.09.014
Stern DI (2004) The rise and fall of the environmental Kuznets curve. World Dev 32:1419–1439. doi:10.1016/j.worlddev.2004.03.004
Stocker TF, Qin D, Plattner GK, Tignor M, Allen SK, Boschung J et al (2013) Climate change 2013: the physical science basis. Intergovernmental panel on climate change, working group I contribution to the IPCC fifth assessment report (AR5), New York
Summary for Policymakers (2014) Climate change 2014: mitigation of climate change. Contribution of working group III to the fifth assessment report of the intergovernmental panel on climate change [Edenhofer O, Pichs-Madruga R, Sokona Y, Farahani E, Kadner S, Seyboth K, Adler A, Baum I, Brunner S, Eickemeier P, Kriemann B, Savolainen J, Schlömer S, von Stechow C, Zwickel T, Minx JC (eds)]. Cambridge University Press, Cambridge/New York
van Ginneken W (2003) Extending social security: policies for developing countries. Int Labour Rev 142:277–294. doi:10.1111/j.1564-913X.2003.tb00263.x

Climate Extremes

▶ Climate Extremes and Informing Adaptation

Climate Extremes and Informing Adaptation

Hayden Henderson1,2, Laura Blumenfeld1, Allison Traylor1,3, Udit Bhatia1, Devashish Kumar1, Evan Kodra1,4, and Auroop R. Ganguly1
1 Sustainability and Data Sciences Laboratory (SDS Lab), Department of Civil and Environmental Engineering, Northeastern University, Boston, MA, USA
2 Department of Mechanical and Industrial Engineering, Northeastern University, Boston, MA, USA
3 Department of Political Science, Northeastern University, Boston, MA, USA
4 risQ Incorporated, Cambridge, MA, USA

Synonyms

Climate adaptation; Climate change; Climate impacts; Climate resilience; Climate risks; Climate variability; Disaster risks; Floods and droughts; Weather extremes

Definitions

Climate Extremes
Climate extremes may be defined inclusively as severe hydrological or weather events, as well as significant regional changes in hydrometeorology, which are caused or exacerbated by climate change and which may in turn cause

severe stresses on regional resources, economy, and the environment. While regional warming and heat waves, and perhaps heavy precipitation, can be attributed to climate change with a degree of credibility and projected relatively reliably, significant uncertainties continue to exist for regional hydrology, including floods and soil moisture, as well as tropical cyclones or hurricanes and droughts. The Intergovernmental Panel on Climate Change (IPCC) Special Report on Extremes 2012 (SREX) has adopted an event-based definition of climate extremes. It defines extremes as "occurrence of a value of a weather or climate variable above (or below) a threshold value near the upper (or lower) ends of the range of observed values of the variable" (Field 2012). It is noted that our definition of the extremes includes stresses induced by severe events as well as regional changes in hydrometeorology.

Adaptation
IPCC SREX 2012 defines adaptation as "the process of adjustment to actual or expected climate and its effects, in order to moderate harm or exploit beneficial opportunities." Adaptation measures can range from local actions, including regulated water usage in households or farmers planting eco-friendly crops, to large-scale infrastructure changes such as building defenses (e.g., levees, natural barriers such as mangroves in coastal areas) to protect against rising sea levels or improving the quality of infrastructure to withstand high-intensity hurricane events.

The two primary responses of policy makers and stakeholders to climate change are mitigation and adaptation. Both mitigation and adaptation are essential, because even if emissions are drastically reduced in future decades, adaptation measures will still be required to cope with the changes that have already been induced. Mitigation addresses the root causes of factors inducing climate change. For example, measures to reduce the emissions of greenhouse gases are considered mitigation efforts. Adaptation seeks to reduce the risks posed by consequences of environmental changes, weather extremes exacerbated by climate change, or natural variability.

Historical Background

The United States experienced 32 weather extreme events including floods, hurricanes, and droughts between 2011 and 2013. Each of these events caused at least one billion dollars in damages. 2012 ranks as the second costliest year on record, with more than $110 billion in damages (Karl et al. 2009). While heat waves are expected to grow more intense, more frequent, and longer in duration despite considerable uncertainties and geographic variability, cold snaps are expected to reduce in frequency but to persist with current intensity and duration. Overall, the tails (or extremes) of temperature distributions at regional and seasonal scales have been shown to change asymmetrically under warming scenarios. Precipitation extremes, specifically high rainfall events, are expected to intensify under warming scenarios, although design (intensity-duration-frequency, or IDF) curves exhibit considerable uncertainty, especially from local to regional scales (Kao and Ganguly 2011). Snowfall averages may decrease over certain regions, but the extremes of snowfall may not reduce in intensity (Kodra et al. 2011). Wind speeds appear to have shown a decline globally, although projections of wind extremes under climate scenarios are difficult, and certain regional wind-derived marine circulation has intensified and is expected to continue to do so under warming scenarios (Kulkarni et al. 2016). Analysis of climate extremes averaged over global urban regions suggests more intense heat waves, lack of consistent patterns in precipitation extremes, and reduction in wind extremes in cities.

In addition to intensification of weather extremes in changing environments, damage and losses from weather-related events have markedly increased over the past 30 years, mostly due to increase in exposure owing to mass population migrations and increased value of properties in urban coastal areas (Aerts et al. 2014). According to one reinsurance company (Munich Re), approximately $150 billion in economic losses were caused by weather-related events in the year 2012 alone. The situation is further exacerbated by lack of adaptation, i.e., reactive responses and

anticipatory planning. Weather and hydrologic hazards may be caused or exacerbated by natural climate variability and climate change. However, the hazards turn into disasters and indeed catastrophic events when infrastructures and lifelines are vulnerable and when exposure to hazards is high.

For example, in 2005 during Hurricane Katrina, the eye of the hurricane passed east of the city of New Orleans without causing catastrophic damage to buildings and structures. However, flood walls and levees designed to protect the city from floods were breached at more than 50 locations, leaving approximately 80 % of New Orleans flooded. Hence, how much of the human population is affected by changes in extreme weather also depends on the level of adaptability and preparedness in addition to exposure and vulnerability.

The major constraints in translating climate extreme science to adaptation-relevant insights are the uncertainties in our understanding and in projections at the (local to regional) scales and (decadal) planning horizons relevant to stakeholders. At local to regional scales, process understanding and model projections are less accurate, while at decadal scales the uncertainties are dominated by natural variability and hence difficult to translate to risk-based design principles. While there is strong evidence of human influence in the warming of the atmosphere and the ocean, in changes in the global water cycle, and in changes in climatic extremes (Qin et al. 2013), the low confidence in the presence of trends in certain extreme events, such as intensification of hurricanes and droughts, and in the subsequent attribution to human activities makes adaptation and planning for these extreme events a daunting task (Table 1).

Scientific Methods

IPCC's fifth assessment calls for more attention to how adaptation is implemented in response to climate risks, with special focus on the role of extremes in the adaptation process (IPCC 2014). However, future climate simulations display large uncertainty in mean changes. As a result, the uncertainty in future changes of extreme events, especially at the local and larger scale, is great. The uncertainty created by a changing climate and dynamic development trajectories poses challenges for decision-making. This section outlines methods that can be used to quantify, characterize, and attribute extremes to inform adaptation and policies. In the context of climate, while there are different types of extremes, heat waves and cold snaps are the most difficult to quantify, and hence we focus on the methods related to these. Methods to quantify extremes are classified into three broad categories: (a) impact relevant metrics, (b) methods to quantify trends in time and space, and (c) extreme attribution.

(a) Impact relevant metrics
Impact relevant metrics include heat waves and cold spells. A heat wave is defined as a prolonged period of excessively hot weather; while definitions vary, a heat wave is measured relative to the usual weather in the area and relative to normal temperatures for the season. A cold spell is defined as a rapid fall in temperature within a 24-h period requiring substantially increased protection to agriculture, industry, commerce, and social activities. The precise criterion for a cold wave is determined by the rate at which the temperature falls and the minimum to which it falls; this minimum temperature is dependent on the geographical region and time of year.

(b) Methods to quantify trends in time and space
A few examples of these methods include (but are not limited to) generalized extreme value (GEV) theory, trend analysis, and covariates in extremes.

Generalized Extreme Value

Generalized extreme value (GEV) theory provides a family of continuous distributions that combines the type I (Gumbel), type II (Fréchet), and type III (Weibull) extreme value distributions. The GEV is the only possible limit distribution of sequence
Climate Extremes and Informing Adaptation, Table 1 Summary of global-scale increase in uncertainty as we move down the table (Source: IPCC AR5 (Field 2012)
assessment of recent observed changes and human contribution to the extremes, both Working Group I (WGI) Summary for Policy Makers, Table SP)
in terms of detection of change and attribution to humans for the changes. Note the
Phenomenon and direction of trend Assessment that changes occurred (typically since 1950 unless Assessment of a human contribution to
otherwise indicated) observed changes
Warmer and/or fewer cold days and nights over most land areas Very likely Very likely
Very likely Likely
Very likely Likely
Warmer and/or more frequent hot days and nights over most land Very likely Very likely
areas Very likely Likely
Very likely Likely (nights only)
Climate Extremes and Informing Adaptation

Warm spells/heat waves. Frequency and/or duration increases over Medium con dence on a global scale Likelya
most land areas Likely in large parts of Europe, Asia
and Australia
Medium con dence in many (but not Not formally assessed
all) regions
Likely More likely than not
Heavy precipitation events, Increase in the frequency, intensity, Likely more land areas with increases Medium con dence
and/or amount of heavy precipitation than decreasesc

Likely more land areas with increases Medium con dence


than decreases
Likely over most land areas More likely than not
Increases in intensity and/or duration of drought Low con dence on a global scale Low con dence
Likely changes in some regionsd
Medium con dence in some regions Medium con dencef
Likely in many regions, since 1970e More likely than not
Increases in intense tropical cyclone activity Low con dence in long term (centen- Low con dencei
nial) changes
Virtually certain in North Atlantic
since 1970
(continued)
199

C
200

Climate Extremes and Informing Adaptation, Table 1 (continued)


Phenomenon and direction of trend Assessment that changes occurred (typically since 1950 unless Assessment of a human contribution to
otherwise indicated) observed changes
Low con dence Low con dence
Likely in some regions, since 1970 More likely than not
Increased incidence and/or magnitude of Likely (since 1970) Likelyk
extreme high sea level Likely (late twentieth century) Likelyk
Likely More likely than notk
(a) Attribution is based on available case studies. It is likely that human influence has more than doubled the probability of occurrence of some observed heat waves in some locations.
(b) Models project near-term increases in the duration, intensity and spatial extent of heat waves and warm spells.
(c) In most continents, confidence in trends is not higher than medium except in North America and Europe where there have been likely increases in either the frequency or intensity of heavy precipitation with some seasonal and/or regional variation. It is very likely that there have been increases in central North America.
(d) The frequency and intensity of drought has likely increased in the Mediterranean and West Africa, and likely decreased in central North America and north-west Australia.
(e) AR4 assessed the area affected by drought.
(f) SREX assessed medium confidence that anthropogenic influence had contributed to some changes in the drought patterns observed in the second half of the 20th century, based on its attributed impact on precipitation and temperature changes. SREX assessed low confidence in the attribution of changes in droughts at the level of single regions.
(g) There is low confidence in projected changes in soil moisture.
(h) Regional to global-scale projected decreases in soil moisture and increased agricultural drought are likely (medium confidence) in presently dry regions by the end of this century under the RCP8.5 scenario. Soil moisture drying in the Mediterranean, Southwest US and southern African regions is consistent with projected changes in Hadley circulation and increased surface temperatures, so there is high confidence in likely surface drying in these regions by the end of this century under the RCP8.5 scenario.
(i) There is medium confidence that a reduction in aerosol forcing over the North Atlantic has contributed at least in part to the observed increase in tropical cyclone activity since the 1970s in this region.
(j) Based on expert judgment and assessment of projections which use an SRES A1B (or similar) scenario.
(k) Attribution is based on the close relationship between observed changes in extreme and mean sea level.
(l) There is high confidence that this increase in extreme high sea level will primarily be the result of an increase in mean sea level. There is low confidence in region-specific projections of storminess and associated storm surges.
(m) SREX assessed it to be very likely that mean sea level rise will contribute to future upward trends in extreme coastal high water levels.

independent and identically distributed random variables maxima that are properly normalized. The GEV has the cumulative distribution function:

$$F(x; \mu, \sigma, \xi) = \exp\left\{-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/\xi}\right\} \qquad (1)$$

It is a three-parameter distribution in which $\mu$, $\sigma$, and $\xi$ are the location parameter, the scale parameter, and the shape parameter, respectively. In statistics, the location parameter determines the shift of the distribution, the scale parameter quantifies the spread (or variability) of the distribution, and the shape parameter controls the symmetry of the distribution (Coles 2001). In the context of climate, Fig. 1 (top row) shows how changes in the location parameter would impact the distribution of extremes; similarly, the middle and bottom rows show the corresponding changes in extremes (or tails) when the scale and shape parameters are changed (Kodra and Ganguly 2014).

Climate Extremes and Informing Adaptation, Fig. 1 The IPCC SREX discussed the potential global warming consequences on (temperature, in this case) extremes through three representational images and assuming Gaussian (normal bell shaped, symmetrical distribution). The first image (top) depicts a shift in the mean without any other change in the temperature distribution, leading to more hot extremes but less cold extremes. The middle figure shows no change in mean but changes in variability, leading to more extremes on either tails, i.e., hotter and colder extremes. The last figure shows a change in the distributional symmetry with or without climate change (Source: IPCC SREX Field 2012)
[Figure panels: (a) Shifted Mean, (b) Increased Variability, (c) Changed Symmetry. Vertical axis: probability of occurrence. Horizontal axis: temperature, from extreme cold through cold and hot to extreme hot. Curves: without climate change and with climate change.]

To model a series of extremes, a series of independent observations $X_1, X_2, \ldots, X_n$ is considered for some large value of $n$. The data are blocked into such sequences, giving a series of block maxima $M_{n,1}, M_{n,2}, \ldots, M_{n,m}$ to which the GEV is fitted. For example, if $n$ corresponds to the number of observations in each year and $m$ years are considered, the block maxima correspond to annual maxima.

Estimates of extreme quantiles of the annual maximum distribution are then obtained by inverting (1):

$$x_p = \mu - \frac{\sigma}{\xi}\left[1 - y_p^{-\xi}\right] \qquad (2)$$

where $y_p = -\log(1 - p)$ and $x_p$ is the return level associated with the return period $1/p$. In other words, $x_p$ is exceeded by the annual maximum in any given year with probability $p$.

Analysis of Trends

The detection, estimation, and prediction of trends and their statistical significance are important aspects of analyzing climate extremes. For example, given a time series of temperature, the trend is the rate at which temperature changes over a given period of time; the trend may be linear or nonlinear. To test for the presence of trends, simple linear regression is most commonly used to estimate the slope, in combination with significance tests such as the parametric Student's t-test or the nonparametric Mann-Kendall test (to test both linear and nonlinear significance), with the underlying null hypothesis that no trend is present.

(c) Extreme attribution

Weather and climate extremes occur all the time, with or without climate change. However, as shown in Table 1, there is a justifiable and strong sense that some of these extremes are evolving and becoming more frequent, and the primary reason can be attributed to human-induced changes in climate. Even so, given the small signal-to-noise ratio and uncertain nature of forced changes, attributing changes solely to human-induced changes or natural variability can be misleading (Trenberth et al. 2015). Extreme attribution studies aim to determine to what extent human-induced climate change has altered the probability or magnitude of particular events with significant confidence levels (Stott et al. 2016). This section discusses some of the methods used for extreme attribution.

Fractional Attributable Risk

If $A$ is the probability of a climatic event occurring in the presence of human-induced forcing, and $B$ is the probability of it occurring if the same forcing had not been present, then the fraction of the current risk that is attributable to past greenhouse gas emissions (fraction of attributable risk; FAR) is given by $\mathrm{FAR} = 1 - B/A$.

Model Approaches

General circulation models (GCMs), which often include biological, chemical, geological, atmospheric, and oceanic processes, provide the most comprehensive simulations of the climate system. Data from model experiments with different climate forcing combinations are available from the World Climate Research Programme's Coupled Model Intercomparison Project Phase 5 (CMIP5) (Taylor et al. 2012). This typically involves pooling data from multimodel ensembles of simulations with and without anthropogenic influences to generate large samples of the relevant variables such as temperature, precipitation, and humidity. The distributions of these variables in the world with human influences and in the world without these influences can thus be constructed, from which estimates of FAR can be obtained (Stott et al. 2016).
Climate Extremes and Informing Adaptation, Fig. 2 The 2012 IPCC SREX depicts the connection between climate-related hazards and vulnerability or exposure. While hazards have traditionally been considered acts of God, disasters are caused by the very human failure to be prepared. In the current era of greenhouse gas-driven climate change, even hazards are not immune to human influences (Figure source: IPCC SREX (1.1.2, figure 1-1) Lavell et al. 2012)

Informing Adaptation: Risk Management

Adaptation to climate extremes and preparedness to disaster seek to reduce factors and modify environmental and human contexts that contribute to climate-related risk, to promote sustainability in social and economic development (Lavell et al. 2012). The promotion of adequate preparedness for disaster is also a function of disaster risk management and adaptation to climate change.

One of the many ways in which climate change is likely to affect societies and ecosystems around the world is through extremes and changes in extreme events (Fig. 2). As a result, regularly updated appraisals of evolving climate conditions and extreme weather would be immensely beneficial for adaptation planning. In fact, in a conventional risk framework, one of the components is the probability of occurrence of a hazard. Risk analysis methods identify the vulnerabilities of specific components of a system to an adverse event and quantify the loss of functionality of the system as a consequence of that event. In the context of climate and weather extremes, hazard (H) can be visualized as an outcome of an extreme event. For example, in the context of planning for transportation systems, H may represent the severe snowstorm or hurricane that can potentially deviate the system from its normal functionality. In addition to the hazard (H) and its probability of occurrence p(H), risk to the system also depends on the likelihood of vulnerability p(V) and the chances of the system being exposed to these risks, p(E). Mathematically, risk can be quantified as:

$$\mathrm{Risk} = p(H) \times p(E) \times p(V) \qquad (3)$$

Risk in a system is interpreted as total reduction in functionality and is related to the temporal effect of an extreme event on the system (Linkov et al. 2014).

Resilience Framework

As discussed in the previous section, adaptation to climate extremes seeks to reduce factors
that contribute to climate-related risk. While a risk management system framework focuses on strengthening the specific components within a system, increasing interconnectivity and complexity of systems makes risk analysis of individual components constituting these complex systems unrealistic (Linkov et al. 2014; Bhatia et al. 2015). Moreover, uncertainties associated with the three components of risk challenge our ability to fully comprehend the risk related to the system. To address these challenges, resilience must be built into systems to help them quickly recover and adapt when adverse events do occur. The National Academy of Sciences defines resilience as the ability to prepare and plan for, absorb, recover from, and more successfully adapt to adverse events (Disaster Resilience: A National Imperative n.d.). Resilient systems are able to minimize the negative impacts of adverse events on impacted societies and even improve their functionality by adapting to and learning from fundamental changes caused by the extreme events.

Key Applications

Concepts to quantify extremes play a fundamental role in understanding the evolving nature of weather and climatic extremes in global environmental change. This understanding is translated to planning and managing diverse critical infrastructure sectors including four lifelines, namely, transportation (Bhatia et al. 2015), water resources (Kao and Ganguly 2011), healthcare (Semenza et al. 1996), and energy (Pryor and Barthelmie 2010). Given the increase in interdependencies, complexities and interconnectivity of infrastructure systems, concepts to quantify the resilience of complex systems ranging from ecosystems to power grid management (Gao et al. 2016), and communication and transportation (Bhatia et al. 2015) networks are gaining attention in the scientific community.

Future Directions

Leveraging Physics and Data Science-Based Methods

Geographic information sciences (GI science or GIS) are necessary to develop data science methods that can translate the science of climate extremes to insights and metrics that are actionable by stakeholders and policy makers. Note that here GIS is used broadly to denote two different connotations. This section describes how GIS in both forms can address stakeholder-relevant crucial gaps in climate (extremes) science.

The first major gap is to help generate deeper understanding of the processes that may relate to climate extremes. A combination of data and process understanding, "physics-guided data mining," may help us gain a better understanding of relevant processes such as convection and aerosols, El Niño and other climate oscillators, and the monsoons (Ganguly et al. 2014). A second gap is the ability to characterize extremes and develop downscaling strategies. A comprehensive assessment of uncertainty, including those arising from multiple emissions scenarios, climate models and initial conditions, may be developed by bringing together physical understanding and data sciences, characterizing predictability and natural variability, and balancing historical skills with multimodel consensus. The third gap is the development of metrics for geospatial and temporal climate extremes and their spatial heterogeneity and temporal change, as well as decision support tools and policy aids for enabling effective decisions. The massive volume and complexity of data take the data and decision science challenges to the realm of what has been called "Big Data." However, the need to examine extremes and large changes as well as their early indicators makes this a "small data" concern. This "Big Data, small data" problem is perhaps the enduring GIS challenge of climate extremes. Newer big data-driven methods in rare or extreme events and in the analysis of complex data are likely to lead to innovative solutions.

Arguably the most significant knowledge gap in climate science relevant for informing
stakeholders and policy makers is the inability to produce credible assessments of local to regional climate extremes. Results from the latest generation of global climate model runs do not suggest the possibility of significant improvements in the near future, while regional climate models remain promising. However, ultrahigh-resolution models and physical understanding continue to improve process models. On the other hand, climate-related data, from archived model simulations, and remote or in situ sensors, have already moved into the petabyte scale and are projected to reach 350 PB by 2030. Thus, data-driven hypothesis examination and hypothesis generation need to leverage methods for handling massive and complex data. Geographical information science, comprising both geospatial process models and data science developments, can help address these challenges.

Role of Big Data in Extreme Event Mining

The generalized extreme value distribution is the only possible limit distribution of properly normalized maxima of a sequence of independent and identically distributed random variables. However, climate extreme events that can be correlated with space and time may deviate from the assumption of proper normalization. Hence, statistical approaches have not been well developed for a majority of climate extremes. Nonlinear dynamical approaches are better at characterizing the climate system rather than generating projections and, even so, are not well developed for predictability assessment in climate. Traditional spatial and spatiotemporal data mining in computer science, while well suited to certain kinds of geographic data, cannot handle the complex dependence structures, low-frequency variability, and nonlinear data generation processes relevant for predicting climate extremes. The barriers are particularly challenging given the so-called deep uncertainties in climate arising from both natural variability in the climate system, such as from oceanic oscillators, combined with our lack of understanding of the relevant processes. The so-called Big Data methods can succeed in the context of climate extremes if, in addition to handling massive data volumes, they can directly address nonlinear data generation processes, complex proximity-based as well as long-memory and long-range dependence in time and space, and extreme events or change.

Cross-References

▶ Climate Adaptation, Introduction
▶ Informing Climate Adaptation with Earth System Models and Big Data

References

Aerts JCJH, Botzen WJW, Emanuel K, Lin N, Moel H de, Michel-Kerjan EO (2014) Evaluating flood resilience strategies for coastal megacities. Science 344:473–475. doi:10.1126/science.1248222
Bhatia U, Kumar D, Kodra E, Ganguly AR (2015) Network science based quantification of resilience demonstrated on the Indian Railways network. PLoS ONE 10:e0141890. doi:10.1371/journal.pone.0141890
Intergovernmental Panel on Climate Change (2014) Climate change 2014: impacts, adaptation and vulnerability: regional aspects. Cambridge University Press, New York
Coles S (2001) An introduction to statistical modeling of extreme values. Springer, London
Disaster Resilience: A National Imperative n.d. http://www.nap.edu/openbook.php?record_id=13457. Accessed 1 July 2015
Field CB (2012) Managing the risks of extreme events and disasters to advance climate change adaptation: special report of the intergovernmental panel on climate change. Cambridge University Press, New York
Ganguly AR, Kodra EA, Agrawal A, Banerjee A, Boriah S, Chatterjee S et al (2014) Toward enhanced understanding and projections of climate extremes using physics-guided data mining techniques. Nonlinear Process Geophys 21:777–795. doi:10.5194/npg-21-777-2014
Gao J, Barzel B, Barabási A-L (2016) Universal resilience patterns in complex networks. Nature 530:307–312. doi:10.1038/nature16948
Kao S-C, Ganguly AR (2011) Intensity, duration, and frequency of precipitation extremes under 21st-century warming scenarios. J Geophys Res Atmos 116:D16119. doi:10.1029/2010JD015529
Karl TR, Melillo JT, Peterson TC (2009) Global climate change impacts in the United States. Cambridge University Press, New York
Kodra E, Ganguly AR (2014) Asymmetry of projected increases in extreme temperature distributions. Sci Rep 4. doi:10.1038/srep05884
Kodra E, Steinhaeuser K, Ganguly AR (2011) Persisting cold extremes under 21st-century warming scenarios. Geophys Res Lett 38:L08705. doi:10.1029/2011GL047103
Kulkarni S, Deo MC, Ghosh S (2016) Evaluation of wind extremes and wind potential under changing climate for Indian offshore using ensemble of 10 GCMs. Ocean Coast Manag 121:141–152. doi:10.1016/j.ocecoaman.2015.12.008
Lavell A, Oppenheimer M, Diop C, Hess J, Lempert R, Li J et al (2012) Climate change: new dimensions in disaster risk, exposure, vulnerability, and resilience. In: Field CB (ed) Managing the risks of extreme events and disasters to advance climate change adaptation. Cambridge University Press, New York, pp 25–64
Linkov I, Bridges T, Creutzig F, Decker J, Fox-Lent C, Kröger W et al (2014) Changing the resilience paradigm. Nat Clim Change 4:407–409. doi:10.1038/nclimate2227
Pryor SC, Barthelmie RJ (2010) Climate change impacts on wind energy: a review. Renew Sustain Energy Rev 14:430–437. doi:10.1016/j.rser.2009.07.028
Qin D, Plattner GK, Tignor M, Allen SK, Boschung J, Nauels A et al (2013) Summary for policymakers. Climate change 2013: the physical science basis. Contribution of working group I to the fifth assessment report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge/New York
Semenza JC, Rubin CH, Falter KH, Selanikio JD, Flanders WD, Howe HL et al (1996) Heat-related deaths during the July 1995 heat wave in Chicago. N Engl J Med 335:84–90. doi:10.1056/NEJM199607113350203
Stott PA, Christidis N, Otto FEL, Sun Y, Vanderlinden J-P, van Oldenborgh GJ et al (2016) Attribution of extreme weather and climate-related events. Wiley Interdiscip Rev Clim Change 7:23–41. doi:10.1002/wcc.380
Taylor KE, Stouffer RJ, Meehl GA (2012) An overview of CMIP5 and the experiment design. doi:10.1175/BAMS-D-11-00094.1. http://journals.ametsoc.org/doi/abs/10.1175/BAMS-D-11-00094.1. Accessed 10 Dec 2015
Trenberth KE, Fasullo JT, Shepherd TG (2015) Attribution of climate extreme events. Nat Clim Change 5:725–730. doi:10.1038/nclimate2657


Climate Finance

▶ Climate Risk Analysis for Financial Institutions


Climate Hazards and Critical Infrastructures Resilience

Udit Bhatia1, Allison Traylor1, Catherine Moskos1, Laura Blumenfeld1, Lindsey Bressler1, Tyler Hall1, Rachael Heiss1, Kevin D. Clark1,4, Nan Deng1, Devashish Kumar1, Evan Kodra1, Stephen E. Flynn2, Haris N. Koutsopoulos5, Jerome F. Hajjar5, and Auroop R. Ganguly1
1 Sustainability and Data Sciences Laboratory (SDS Lab), Department of Civil and Environmental Engineering, Northeastern University, Boston, MA, USA
2 College of Social Sciences and Humanities, Northeastern University, Boston, MA, USA
3 Northeastern University, Boston, MA, USA
4 risQ Corporation, Cambridge, MA, USA
5 Department of Civil and Environmental Engineering, Northeastern University, Boston, MA, USA

Introduction

Climate Hazards and Critical Infrastructures Resilience

Civil and environmental engineers design structures such as buildings, bridges, and levees based on implicit or explicit assessments of risks. Thus, dead and live loads are considered along with acute stressors including natural hazards, in addition to the strength of materials and the fragility of components. Safety factors in engineering design essentially attempt to account for unknown or perhaps even unknowable uncertainties and change. The consequences of failure, measured in terms of economic damage and danger to human lives, result in safety factor assignments. Probabilistic risk assessments and reliability analyses attempt to explicitly consider risks in engineering design. Performance-based engineering and fuse-based systems attempt to develop design paradigms that rely on anticipated stresses to structural components. In certain implementations, system level functionality may
be maintained while allowing for a systematic failure of components. The methods have been successfully applied to earthquake engineering. However, the challenges and the opportunities become significantly different when concepts from engineering design need to be generalized to embedding resilience in critical infrastructures, especially in the context of adapting to threats resulting from climate change. The current state of practice and research considers three related issues, specifically, the nature of the climate and related stressors, the definition of the stressed systems under consideration, and the evolving concept of resilience. Resilience in this context goes beyond robustness to the immediate effects of a hazard and includes the ability to gracefully recover from the aftermath in a timely, cost-effective, and efficient manner. In other words, resilience is defined as the ability of the entire system to maintain essential functionality despite acute or chronic stressors and, in the event of failure or loss of functions, to get back to normalcy quickly and easily. The stressed systems of primary concern are what have been called critical infrastructures and lifeline infrastructure networks. The United States Department of Homeland Security defines 16 critical infrastructure sectors, specifically, chemical, commercial facilities, communications, critical manufacturing, dams, defense industrial base, emergency services, energy, financial services, food and agriculture, government facilities, healthcare and public health, information technology, nuclear reactors and materials and waste, transportation, and water and wastewater. The National Infrastructure Advisory Council lists four critical lifeline infrastructure networks: transportation, electricity and power, communications, and water and wastewater. Developing resilience across these lifelines and sectors requires an understanding of the cascading interdependencies across infrastructure elements and networks, the ability to design systems for effective response and recovery, the ability to design for greater resilience, the availability of appropriate metrics and financial instruments or economic incentives, as well as the ability to effectively govern across organizational and jurisdictional barriers. Adaptation to climate change and climate-related weather or hydrologic extremes, especially over the lifetime of infrastructure sectors and lifelines, requires an understanding of both the nonstationary nature of climate stressors and the deep uncertainties. The earth's climate system is fundamentally changing in ways such that the past is no longer an effective guide to the future in terms of design parameters. Uncertainties resulting from both our lack of understanding and the intrinsic variability of the climate system cannot be assigned likelihoods. The situation calls for flexible design principles, which remain risk informed and resilience centric. Case studies discuss urban heat islands, sea level rise and land subsidence, hurricanes and storm surge in coastal megacities, and severe droughts with consequences for the nexus of food-energy-water.

Probabilistic Risk Assessments and Climate Hazards

The Special Report on Extremes (IPCC 2012) as well as the Intergovernmental Panel on Climate Change's Fifth Assessment Report (AR5) (IPCC 2014a, b) published in 2013–2014 depict how climate extremes may turn into disasters depending on vulnerability and exposure. The framework relies on risk computations, where three aspects are considered: hazards, or the probability of threats; vulnerability, or the probability of damage conditional on hazards; and consequences, or economic damages and/or losses of human lives. Climate hazards may be broadly defined to include either extreme weather or hydrological events or changes in regional hydrometeorology, which may be caused or exacerbated by climate variability or change and which could stress all or parts of the coupled natural-engineered-built systems (Fig. 1). Recent climate hazards in the United States include hurricanes Katrina in New Orleans in 2005 and Sandy in New York/New Jersey in 2012, floods in Iowa in 2013, the 2010 (ongoing) droughts in California, the 2014 cold snaps in the Northeast,
and 2012 summer heat waves across the United States. Hurricane Katrina was a Category 5 over the Gulf of Mexico but weakened to a Category 3 by the time it made landfall on the Gulf Coast. However, the natural phenomenon, the hurricane hazard itself in this case, was not the sole reason why Hurricane Katrina was the costliest natural disaster in US history. In fact, post-landfall news for a while appeared to suggest that the storm was moving northward over land, but no major destruction was reported. However, it was then that the levee, which was known to be highly vulnerable to start with, broke from the weight of the water. The resulting floodwater devastated New Orleans, where the human settlement patterns were susceptible to start with. This is where the hazard (Hurricane Katrina) interacted with the vulnerability of a critical infrastructure (levee) as well as with exposure (e.g., human settlements in this case) to result in levels of losses that were historically unprecedented and thus far unsurpassed within the United States. Probabilistic risk assessments (PRA) thus remain important to extract a comprehensive characterization of climate hazards, understand how and when the hazards may turn into catastrophic disasters, and perhaps even be used to examine the impacts of strategic policy and tactical interventions.

Climate Hazards and Critical Infrastructures Resilience, Fig. 1 Schematic representation of probabilistic risk assessment (PRA) methods in context of climate-related hazards. PRA methods such as this can be used to identify the vulnerabilities of both natural and built environments to an expected climate-related hazard and quantify the losses as a result of consequences of these events

Figure 1 shows a comprehensive depiction of PRA and PRA-inspired methods, which have been or could be used in the context of climate hazards. In the context of climate-change impacts, risk is often represented as the probability of occurrence of hazards, including but not limited to extremes such as heat waves, droughts, floods, and cold snaps, multiplied by the impacts these events may cause on natural and human systems. Climate observations from in situ and remote sensors such as satellites, reanalysis data (Kalnay et al. 1996), and data from general circulation models (GCMs) are assimilated together with greenhouse gas emission scenarios, multi-model ensembles, and multiple initial conditions of GCMs (see next sections) to project the changes and variability in climate and climate-related extremes. However, global circulation models are run at a coarse spatial resolution, typically of the order of 100 km, and are unable to provide information at the local to regional scales relevant to policymakers and stakeholders. As a result, GCM output cannot be directly used for impact assessment at regional or local scales. To overcome this problem, downscaling is often
Climate Hazards and Critical Infrastructures Resilience 209

used to obtain local-scale climate projections enormous interest, most of the research endeavors
at ner resolution from atmospheric variables have focused on the isolated systems. However,
provided by GCMs (Ghosh and Mujumdar 2008). Interaction of the climate-related hazards with the exposure and vulnerabilities of critical infrastructures and population put these systems at risk, resulting in the loss of economy or/and human lives. However, quantification of climate-related hazards and related risk is associated with uncertainties arising out of natural variability, anthropogenic climate change, or a combination of both. Hence, uncertainty quantification and characterization forms a crucial part of PRA-inspired methods before they can be deployed to motivate strategic policy changes and resilient design practices.

Resilience Paradigm: Beyond Probabilistic Risks

While critical infrastructure systems and lifelines were built as isolated entities, in actuality they are functionally interdependent. Disasters ranging from hurricane to large-scale power outages have shown how failure in one system may trigger a cascade of failures in interdependent infrastructure systems. Although investigation of resilience in infrastructure systems has triggered critical infrastructures including lifelines exhibit a large number of interdependencies. These interdependencies could be cyber or cyber-physical (Buldyrev et al. 2010), geographical (Solé et al. 2008) or political, and so on. Traditional risk analysis methods focus on identification of vulnerabilities of specific system components. Subsequent risk management frameworks, hence, focus on strengthening these specific components to prevent overall system failure (Linkov et al. 2014). However, the factors which make traditional risk assessment tools unviable are as follows: (1) complexity and interconnectedness of infrastructure networks including lifelines and (2) nonstationarity and deep uncertainty associated with climate hazards. However, the development of resilience at system level faces the following challenges: the lack of consensus over defining and quantifying resilience, lack of preparedness for foreseeable and unforeseeable risks under changing climate, absence of incentive structure for public and private infrastructure owners to create resilience, and organizational barriers to creating resilience. Figure 2 sums up the barriers and plausible solutions to overcome these in order to translate resilience from

Climate Hazards and Critical Infrastructures Resilience, Fig. 2 Deficiencies in critical infrastructure resilience arise from four broad challenges. The four pillars outline the elements of solution to overcome these challenges to embed resilience in functioning of critical infrastructure and lifeline systems. Visualization and understanding resilience is an obligatory part of the framework to enforce resilient engineering and policy practices, which in turn, requires exhaustive understanding of interdependencies of various infrastructure systems
210 Climate Hazards and Critical Infrastructures Resilience

Climate Hazards and Critical Infrastructures Resilience, Fig. 3 Resilience management framework adapted from the commentary in Nature by Linkov et al. While probabilistic risk assessment-enabled methods give the probability of system hitting the lowest point of its essential functionality and thus help the system prepare and plan for adverse events, resilience management goes beyond and integrates the capacity of a system to absorb and recover from adverse events, and then adapt. The dashed line suggests that state of the system after recovery may be better or worse with respect to the initial performance, depending upon the system resilience

a mere buzzword to operational paradigm for system management (Linkov et al. 2014). As highlighted in the correspondence piece (Fisher 2015), resilience has been defined in more than 70 ways in the literature. While the National Academy of Sciences (Disaster Resilience 2015) defines resilience as "the ability to prepare and plan for, absorb, recover from, and more successfully adapt to adverse events," many scientists have just focused on the recovery part (Fig. 3a) to define resilience as the system's ability to bounce back after stress. Long-term policies based on the two extreme ends of definitions are likely to be very different and would be associated with different costs, depending on the definitions and metrics we adopt to measure resilience. At the regional scale, the structure and function of infrastructure systems, particularly in the lifeline sectors, are appropriately represented using network models and network science tools (Solé et al. 2008; Albert et al. 2000; Sen et al. 2003; Guimerà et al. 2005). A key issue for assessing and improving the resilience of infrastructure systems is to understand the behavior of the lifeline sectors during normal operating conditions, as well as in the presence of both nondeliberate hazards (e.g., natural hazards, human accidents, technology failures) and deliberate threats (e.g., terrorism, sabotage). Over the last decade, there have been considerable advances in the understanding of cascading interdependencies of the lifeline networks (Buldyrev et al. 2010; Ko et al. 2013; Hernandez-Fajardo and Dueñas-Osorio 2013). However, they have had relatively little impact on the design of resilient interconnected infrastructures to mitigate the risk of cascading failures. Applying these frameworks to real-life networked infrastructures is not a trivial task, because the oversimplified assumptions on which these models are based may not be valid for the inextricably interdependent systems (Vespignani 2010).

Climate Hazards: Variability and Deep Uncertainty

As discussed in previous sections, both PRAs and resilience management framework include risk analysis as a central component. However, climate change might produce extreme events that cannot be predicted precisely, particularly at the spatial resolutions and time horizons relevant to the infrastructure owners and managers. Time horizons to be considered for emergency management (Aerts et al. 2014)

and infrastructure planning are near real time, seasonal to interannual, decadal to mid-century and multi-decadal to centennial, and so on. Infrastructures and lifelines are expected to remain the same on near real-time horizons and seasonal to interannual time horizons. However, weather predictions at these time scales may inform emergency response and management. On decadal to mid-century time scale, significant changes in reinforcements and remediation measures to build resilience of communities are expected. However, predicting climate-related hazards would be a major challenge as nonstationarity, including trends in global warming and changes in the statistics of weather pattern and relations (Salvi et al. 2015), and variability in mean and extreme climate (Ghosh et al. 2012) are expected to be predominant at these time scales (Hawkins and Sutton 2009). For example, Ganguly et al. (2009) demonstrated that increased trends in temperature and heat waves call for urgent mitigation and adaptation strategies, but these projections are concurrent with large uncertainty and variability, making the decision-making process complicated.

Since projections of weather and climate come from the numerical models that resolve the relevant processes, uncertainties in applying these models may result from:

1. Internal variability: Arises out of initial condition uncertainty and is more relevant for the short time scales (Palmer et al. 2005)
2. Multimodel uncertainty: The climate system is highly complex, and it is practically impossible to model all the processes and parameters that govern the climate. Depending upon the choice of the parameters to model these processes, different GCMs differ substantially in their projections in the future.
3. Boundary condition uncertainty: The source of this uncertainty resides in the assumption over the future world economic and social development, leading to alternative sources of greenhouse emissions whose relative likelihood cannot be easily assessed (Tebaldi and Knutti 2007).

Case Studies

This section elucidates three case studies in the context of lifeline infrastructures, where climate extremes act as a stressor. In case study I, the impact of blizzards on the national airspace system of the United States is discussed. Particularly, this case study highlights that a regional climate event such as blizzard impacts the entire air traffic across the nation.

The second case study demonstrates the cascading interdependencies that exist between the various lifeline networks. In 2012, the power-grid failure in the northern and eastern parts of India resulted in catastrophic impacts on various lifelines dependent upon the power grid, directly or indirectly. Triggered by the combination of climatic events (delayed monsoon and heat waves, in this case) and manmade error, this has been recorded as the worst power blackout in the global history in terms of number of people affected.

Finally, the vulnerability of Boston's Massachusetts Bay Transit Authority (MBTA) system, whose rail service lost its essential functionality when hit by severe blizzards in 2015, is discussed.

Case Study 1: Blizzard 2015 and Its Impact on National Airspace System of the United States of America

FlightAware (2015) reported that 1,200 flights were expected to be cancelled on January 26, 2015, to reduce air traffic volume in the northeast prior to a forecasted heavy winter storm. Delta Airlines preemptively cancelled 600 flights; furthermore, a dozen flights from London Heathrow to New York, Philadelphia, and Boston were cancelled on the same date. These are just a few of the steps commercial airlines, the Federal Aviation Administration (FAA), State Officials, and Airport Authorities took in preparation for the January 27, 2015, blizzard, designated Juno by the National Weather Service (NWS). Across New England, contingency plans were implemented to sustain critical functions and return air transportation to normal operations as soon as possible. The impact of the storm was felt

Climate Hazards and Critical Infrastructures Resilience, Fig. 4 Flowchart showing events resulting in 2012 blackouts and resulting consequences on other interdependent lifeline services, including water distribution and wastewater distribution networks, transportation networks, and healthcare services

locally and nationally. This study addresses air traffic delays, diversions, and flight cancellations caused by extreme winter weather events and system recovery. An airport that is better prepared to respond to weather hazards operates more efficiently for passengers and airlines and can avoid significant negative impact to the NAS as a whole.

As of August 21, 2014, there were 19,453 airports in the United States (IPCC 2014a). Five of the busiest are located in the Eastern Service Area (ESA) of the National Airspace System (NAS): Atlanta, New York's JFK, Boston, Philadelphia, and Washington DC. Numerous studies (Jarrah et al. 1993; Abdelghany et al. 2004) have shown that convective weather in and around airports is a major cause of flight delays and a significant causal factor in aircraft accidents. In 2012, FlightStats.com (FlightStats 2015) issued a report stating that from October 27 to November 1 in North America alone, 20,254 flights were canceled due to Hurricane Sandy. Roughly 9,978 flights were canceled at New York area airports alone. United stands as the airline with the most cancellations due to Sandy (2,149), followed by JetBlue (1,469), US Airways (1,454), Southwest (1,436), Delta (1,293), and American (759). In an examination of weather events over the past 7 years, Sandy comes in second in terms of total number of cancelled flights, behind the North American Blizzard of February 2010 (22,441 flights), against which the Blizzard of January 2015, designated Juno, is compared in this report. Airport system capacity directly relates to NAS capacity, and Juno adversely affected airports and air traffic in the system.
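To put the Hurricane Sandy figures quoted above in proportion, the short Python sketch below tabulates each carrier's share of the reported cancellations. The counts are the ones attributed to FlightStats in the text; the variable and function names are illustrative assumptions and do not correspond to any FlightStats or FAA interface.

```python
# Cancellation counts quoted in the text (FlightStats 2015, Hurricane Sandy).
TOTAL_CANCELLED = 20_254    # North America, October 27 to November 1
NY_AREA_CANCELLED = 9_978   # New York area airports

# Per-airline counts as listed in the text; names of the dict and keys are illustrative.
BY_AIRLINE = {
    "United": 2_149,
    "JetBlue": 1_469,
    "US Airways": 1_454,
    "Southwest": 1_436,
    "Delta": 1_293,
    "American": 759,
}


def share(count, total=TOTAL_CANCELLED):
    """Return `count` as a fraction of all reported cancellations."""
    return count / total


# Cancellations attributed to the six carriers named in the text.
named_total = sum(BY_AIRLINE.values())

if __name__ == "__main__":
    ranked = sorted(BY_AIRLINE.items(), key=lambda kv: kv[1], reverse=True)
    for airline, count in ranked:
        print(f"{airline:<12}{count:>6}  {share(count):6.1%}")
    print(f"{'Named total':<12}{named_total:>6}  {share(named_total):6.1%}")
    print(f"{'NY area':<12}{NY_AREA_CANCELLED:>6}  {share(NY_AREA_CANCELLED):6.1%}")
```

With these figures the six named carriers account for roughly 42 % of all cancellations, and New York area airports for about 49 %, which illustrates how a regional event concentrates its impact on a few hubs and carriers.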

Climate Hazards and Critical Infrastructures Resilience, Fig. 5 Massachusetts Bay Transit System: light rail routes (Green, orange, blue, red lines) and bus route to international airport (Adapted from Massachusetts Bay Transportation Authority, Boston)

Case Study 2: 2012 India Blackouts

On July 30–31, 2012, two severe blackouts hit northern and eastern India, which impacted over 620 million people, across 22 out of 29 states of the nation. Given the population size affected, this has been recorded as the largest power outage in history. Figure 4 shows how both non-intentional manmade and natural events resulted in the collapse of the power grid. In the summer of 2012, extreme heat caused record power consumption in northern India. The situation was further exacerbated by delayed monsoons, which resulted in drawing of increased power from the grid for running water pumps to irrigate the paddy fields in Kharif season.

On July 30, circuit breakers on a 400 kV line between the cities of Bina and Gwalior tripped. As this line fed into another transmission section (Agra-Bareilly), circuit breakers at that section also tripped. As a result of this sequential tripping, power failure cascaded through the grid. The system failed again on the afternoon of July 31 due to a relay problem. As a result, power stations across the affected parts went offline, resulting in a shortage of 32 GW of power. The failure cascaded through other dependent

Climate Hazards and Critical Infrastructures Resilience, Fig. 6 Illustrative representation of interdependencies between power grid and Indian Railways network. 2012 power blackout brought more than 300 trains in northern and eastern India to a standstill, leaving people confined in the trains

infrastructures, hence severely affecting the functioning of lifeline systems including transportation, water distribution and wastewater treatment units, and health care services. Several hospitals faced interruptions in providing health services. Water treatment plants in affected regions were shut down for several hours. More than 300 trains, which include both long distance trains and local trains, were stalled, leaving passengers stuck midway. An illustration of cascading interdependencies between the power grid and Indian Railways Network is shown in Fig. 6. This case study highlights the imperative need to model the complexities of integrated systems to embed resilient design practices into large-scale lifeline infrastructure networks (Linkov et al. 2014). Also, the role of geographic information systems is implicit and ubiquitous to model and visualize complex systems such as these, operating at spatial scales ranging from local to regional to global (Fig. 4).

Case Study 3: Blizzard 2015 and Massachusetts Bay Transit System

In 2015, Boston confronted the snowiest winter ever in the history of recording climate events. In February alone, four storms had brought Boston record-breaking snowfall of over 100 in. Thousands of citizens' lives were affected. Boston's transportation system underwent an unprecedented test: highways blocked, flights canceled, and train service shut down. A thorough analysis of dwell time and boarding data of the northern stations of the Orange Line (shown in Fig. 5), provided by MBTA Overhead Contact System center, shows that ridership decreased dramatically by nearly 30 % on the first day after the blizzard and recovered rapidly on the next

one. Meanwhile, the travel time and dwell time among, between, and within stations increased by almost 50 %, which means the Orange Line train system lost one-third of its capacity. According to the boarding record and peak hour statistics, the remaining capacity can just meet the highest demand of current ridership. However, with growing population and transit use, the capacity limit might become a bottleneck in the face of extreme weather or an emergency event, let alone worse weather conditions. The subsequent snowstorms during February 2015 have proved this hypothesis and resulted in system shutdown at times (Fig. 6).

Given that the capacity one train can carry is equivalent to almost 15 buses, it is almost impossible to completely replace the train service by using the shuttle bus. As a result, passengers had to turn away from transit and resort to driving cars as their commuter mode, which brought even more congestion on the highways. The transition from SOV (single-occupancy vehicle) to HOV (high-occupancy vehicle) usage cannot be widely accepted if robust and reliable transit service is not being provided. Given the fragility and unreliability of current rail service that the northern part of MBTA Orange Line presented, a more comprehensive evaluation of the whole MBTA transit system, including its capacity, resilience, and future evolution, is recommended.

Cross-References

Internet-Based Spatial Information Retrieval

References

Abdelghany KF, Shah SS, Raina S, Abdelghany AF (2004) A model for projecting flight delays during irregular operation conditions. J Air Transp Manag 10:385–394. doi:10.1016/j.jairtraman.2004.06.008
Aerts JCJH, Botzen WJW, Emanuel K, Lin N, Moel H de, Michel-Kerjan EO (2014) Evaluating flood resilience strategies for coastal megacities. Science 344:473–475. doi:10.1126/science.1248222
Albert R, Jeong H, Barabási A-L (2000) The Internet's Achilles' Heel: error and attack tolerance of complex networks. Nature 406:200–0
Buldyrev SV, Parshani R, Paul G, Stanley HE, Havlin S (2010) Catastrophic cascade of failures in interdependent networks. Nature 464:1025–1028. doi:10.1038/nature08932
Disaster Resilience: A National Imperative (2015) [Internet]. [cited 1 Jul 2015]. Available: http://www.nap.edu/openbook.php?record_id=13457
Fisher L (2015) Disaster responses: More than 70 ways to show resilience. Nature 518:35–35. doi:10.1038/518035a
FlightAware Flight Tracker/Flight Status/Flight Tracking. In: FlightAware [Internet]. [cited 1 Jul 2015]. Available: http://flightaware.com/
FlightStats Global Flight Tracker, Status Tracking and Airport Information [Internet]. [cited 1 Jul 2015]. Available: http://www.flightstats.com/go/Home/home.do
Ganguly AR, Steinhaeuser K, Erickson DJ, Branstetter M, Parish ES, Singh N et al (2009) Higher trends but larger uncertainty and geographic variability in 21st century temperature and heat waves. Proc Natl Acad Sci 106:15555–15559. doi:10.1073/pnas.0904495106
Ghosh S, Mujumdar PP (2008) Statistical downscaling of GCM simulations to streamflow using relevance vector machine. Adv Water Resour 31:132–146. doi:10.1016/j.advwatres.2007.07.005
Ghosh S, Das D, Kao S-C, Ganguly AR (2012) Lack of uniform trends but increasing spatial variability in observed Indian rainfall extremes. Nat Clim Change 2:86–91. doi:10.1038/nclimate1327
Guimerà R, Mossa S, Turtschi A, Amaral LAN (2005) The worldwide air transportation network: anomalous centrality, community structure, and cities' global roles. Proc Natl Acad Sci USA 102:7794–7799. doi:10.1073/pnas.0407994102
Hawkins E, Sutton R (2009) The potential to narrow uncertainty in regional climate predictions. Bull Am Meteorol Soc 90:1095–1107. doi:10.1175/2009BAMS2607.1
Hernandez-Fajardo I, Dueñas-Osorio L (2013) Probabilistic study of cascading failures in complex interdependent lifeline systems. Reliab Eng Syst Saf 111:260–272. doi:10.1016/j.ress.2012.10.012
IPCC (2012) Managing the risks of extreme events and disasters to advance climate change adaptation: special report of the intergovernmental panel on climate change [Internet]. Available: https://www.ipcc.ch/pdf/special-reports/srex/SREX_Full_Report.pdf
IPCC (2014a) Climate change 2014 impacts, adaptation and vulnerability: part A: global and sectoral aspects [Internet]. Cambridge University Press. Available: http://www.cambridge.org/us/academic/subjects/earth-and-environmental-science/climatology-and-climate-change/climate-change-2014-impacts-adaptation-and-vulnerability-part-global-and-sectoral-aspects-working-group-ii-contribution-ipcc-fifth-assessment-report-volume-1?format=PB
IPCC (2014b) Climate change 2014 impacts, adaptation and vulnerability: part B: regional aspects [Internet]. Cambridge University Press. Available:

http://www.cambridge.org/us/academic/subjects/earth-and-environmental-science/climatology-and-climate-change/climate-change-2014-impacts-adaptation-and-vulnerability-part-b-regional-aspects-working-group-ii-contribution-ipcc-fifth-assessment-report-volume-2?format=PB#contentsTabAnchor
Jarrah AIZ, Yu G, Krishnamurthy N, Rakshit A (1993) A decision support framework for airline flight cancellations and delays. Transp Sci 27:266–280. doi:10.1287/trsc.27.3.266
Kalnay E, Kanamitsu M, Kistler R, Collins W, Deaven D, Gandin L et al (1996) The NCEP/NCAR 40-year reanalysis project. Bull Am Meteorol Soc 77:437–471. doi:10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2
Ko Y, Warnier M, Kooij RE, Brazier FMT (2013) An entropy-based metric to quantify the robustness of power grids against cascading failures. Saf Sci 59:126–134. doi:10.1016/j.ssci.2013.05.006
Linkov I, Bridges T, Creutzig F, Decker J, Fox-Lent C, Kröger W et al (2014) Changing the resilience paradigm. Nat Clim Change 4:407–409. doi:10.1038/nclimate2227
Palmer TN, Shutts GJ, Hagedorn R, Doblas-Reyes FJ, Jung T, Leutbecher M (2005) Representing model uncertainty in weather and climate prediction. Annu Rev Earth Planet Sci 33:163–193. doi:10.1146/annurev.earth.33.092203.122552
Salvi K, Ghosh S, Ganguly AR (2015) Credibility of statistical downscaling under nonstationary climate. Clim Dyn 1–33. doi:10.1007/s00382-015-2688-9
Sen P, Dasgupta S, Chatterjee A, Sreeram PA, Mukherjee G, Manna SS (2003) Small-world properties of the Indian railway network. Phys Rev E 67:036106. doi:10.1103/PhysRevE.67.036106
Solé RV, Rosas-Casals M, Corominas-Murtra B, Valverde S (2008) Robustness of the European power grids under intentional attack. Phys Rev E 77:026102. doi:10.1103/PhysRevE.77.026102
Tebaldi C, Knutti R (2007) The use of the multi-model ensemble in probabilistic climate projections. Philos Trans R Soc Lond Math Phys Eng Sci 365:2053–2075. doi:10.1098/rsta.2007.2076
Vespignani A (2010) Complex networks: the fragility of interdependency. Nature 464:984–985. doi:10.1038/464984a

Climate Impacts

Climate Extremes and Informing Adaptation

Climate Resilience

Climate Extremes and Informing Adaptation

Climate Risk Analysis for Financial Institutions

Farid Razzak
Rutgers Business School, Rutgers University, New Brunswick, NJ, USA

Synonyms

Carbon Emissions; Carbon Finance; Carbon Trading; Climate Change; Climate Finance; Climate Trend Analysis; Emissions Trading; GHG; GIS Mobile Remote Sensors; MRV; REDD+; Sequestration; Sustainability Risk

Definition

The climate change phenomenon is widely understood to be magnified by harmful greenhouse gases (GHGs) that are by-products of emissions yielded from advances in human engineering in the energy, technology, transportation, and land development industries. Effectively, the pollution that is being generated from human activities is actively contributing to the imbalance in the planet's climate, therefore creating the scenario where human prosperity may be severely hindered in the near future. Global industrial incentives, regulations, and policies have been formed to mitigate the climate change phenomenon in the form of monetized financial instruments that can help manage the amount of global pollution permitted, financial climate risk disclosures that keep investors informed about climate-related impacts to investments, and environmental sustainability analysis that validates the business continuity of an investment impacted by environmental risks.

The management of future pollution that may contribute to furthering climate change by financially incentivizing more prudent business practices and climate-friendly organizational strategies has created opportunities for climate change investment research. Geographical Information Systems that can provide insight into different aspects of global climate change facilitate data-driven financial investment

decisions which can create a dynamic and robust relationship between the financial and scientific aspects of leveraging climate change mitigation.

This chapter will explore the historical background of global climate change policies and legislation over recent decades; the financial instruments, markets, and risk disclosures that resulted from these policies; the relevant scientific and investment approaches regarding climate change mitigation; how Geographical Information Systems can serve as a crucial tool in the financial applications of climate change mitigation; and the future prospects of Geographical Information Systems in this domain.

Historical Background

United Nations Climate Mitigation Policies

The environmental impacts of climate change were not clearly understood by the nations of the world in the early 1980s. The United States was the first government to lead an exploratory study of international environmental risks which included a thorough analysis on climate change effects. This study brought significant awareness to the potential impacts of climate change, warranting a more specified scientific study of climate change to illuminate the future risks that nations of the world may have to encounter (Moore 2012).

In 1988, the United Nations World Meteorological Organization (WMO) and the United Nations Environment Program (UNEP) established the Intergovernmental Panel on Climate Change (IPCC) to provide research on the science of climate change, analyze the societal and economical risks due to climate change, and produce strategies to mitigate the impacts that climate change presents for further discussion on the international topic (Moore 2012).

The first assessment report from the IPCC was delivered in 1990 and provided ample evidence to suggest that climate change would be of crucial importance for the near future of environmental risks and policy planning. With subsequent reports from the IPCC echoing similar sentiments and additional evidence supporting the analysis, it was decided at the 1992 United Nations Conference on Environment and Development (UNCED) to formally begin action to create policies for climate change mitigation by commissioning the United Nations Framework Convention on Climate Change (UNFCCC) (Moore 2012; Raufer and Iyer 2012).

The purpose of the UNFCCC was to establish a voluntary commitment from the United States and 153 other nations to reduce harmful greenhouse gas (GHG) emissions to environmentally acceptable levels within the next few decades, to find strategies to reduce the global warming epidemic, and to assess viable options to address inevitable climate change effects on the environment (Moore 2012; Raufer and Iyer 2012). Annual meetings of the parties involved with the UNFCCC have been conducted since the inception of the convention, formally referred to as the UNFCCC conference of parties (COP), yielding progressive legislation and policies toward the mitigation of climate change (Moore 2012).

The meetings considered the most significant and progressive milestones for climate change mitigation policies have been the COP of 1997 in Kyoto, Japan, and the COP of 2009 in Copenhagen, Denmark (Moore 2012; Raufer and Iyer 2012; Alexander 2013).

Kyoto Protocol

On December 11, 1997, during an annual UNFCCC COP in Kyoto, Japan, the Kyoto Protocol was adopted and given an effective date of February 16, 2005. The Kyoto Protocol is widely seen as the first significant step toward an internationally standardized GHG emissions reduction plan that seeks to manage harmful emissions and provide a formalized scalability platform to continuously improve on climate change mitigation strategies (Moore 2012; Raufer and Iyer 2012; Baranzini and Carattini 2014).

The Kyoto Protocol facilitated the reduction of emissions by the establishment of binding agreements among 37 industrialized nations and European nations which committed the nations to reduce their GHG emissions output by an average

of about 5–8 % from the year 1990 emissions output over the 5-year span of 2008–2012 (Moore 2012; Raufer and Iyer 2012; Baranzini and Carattini 2014). More importantly, the Kyoto Protocol placed a larger responsibility and burden on the developed countries due to the accepted notion that they were the primary contributors to the current amount of GHG emissions in the atmosphere (Moore 2012).

Enforcement of the Kyoto Protocol was generally conducted through industrial policies and regulations at the federal and local government levels of each respective participatory nation (Moore 2012). However, the Kyoto Protocol also offered market-based financial and economical incentives to achieve promotion of environment-friendly investments, business practices, and technologies as well as to meet GHG reduction targets via economic and efficient options. The market-based options that the Kyoto Protocol introduced were GHG Emissions Trading, the Clean Development Mechanism (CDM), and the Joint Implementation (JI). Each of the options followed a "cap and trade" framework in which there was a "cap" or quota on the allowed amount of commodities (emissions allowed to be produced) that were in the market. The "trade" aspect refers to the ability and platform to trade the commodities as an instrument with other market participants (Moore 2012; Raufer and Iyer 2012; Baranzini and Carattini 2014; Henríquez 2013; Kossoy and Guigon 2012).

Greenhouse Gas Emissions Trading

As previously mentioned, the Kyoto Protocol receives commitments from participatory nations to reduce GHG emission levels over the 5-year span of 2008–2012. Some countries may be able to facilitate these targets while being well under target emissions levels, but some may require additional allowance of emissions to meet practical industrial and economic demands. To address this potential issue, Article 17 of the Kyoto Protocol allows for different market-based financial instruments that can allow for trade of excess emissions allowances to countries that may exceed emissions targets (Raufer and Iyer 2012; Baranzini and Carattini 2014; Henríquez 2013; Kossoy and Guigon 2012).

The provision in the Kyoto Protocol also allows the trade of other equally important environmental reduction targets such as the removal units (RMU) based on land use, land use change, and forestry (LULUCF) activities to help mitigate deforestation activities which directly contribute to the natural mitigation of climate change (Moore 2012). Additionally, the Kyoto Protocol offers the global Clean Development Mechanism, which acts as the authority for GHG emissions offset programs and allows industrialized or developing countries that engage in qualified local projects designed to help reduce GHG emissions or to provide environmental sustainability to earn certified emissions credits (CER). A CER is the equivalent of 1 ton of carbon dioxide (CO2) allowed to be emitted into the atmosphere. CO2 is one of the harmful GHG emissions that contributes to climate change. These earned CER credits can be traded, sold, or purchased on international markets for the benefit of nations to meet or exceed their GHG emissions reduction targets (Moore 2012; Raufer and Iyer 2012). It is also important to note that 2 % of the income proceeds from CDM projects goes toward the Kyoto Protocol Adaptation Fund, which financially backs projects and programs for countries that are impacted most adversely from climate change effects without the ability to mitigate them (Moore 2012). Lastly, the Joint Implementation provision in the Kyoto Protocol under Article 6 allows participating nations to engage in qualified projects that reduce GHG emissions in other countries to earn emissions reduction credits which can be used toward the participatory nation's GHG emissions reduction targets. Joint Implementation allows for mutually beneficial partnerships that help to foster prosperity in developing nations while also keeping a focus on the mitigation of climate change (Moore 2012). It is important to note that all of these market-based mechanisms are heavily dependent on accurate analysis, measurement, and forecasting of GHG emissions to be considered a viable climate change mitigation strategy (Raufer and Iyer 2012; Rosenqvist et al. 2003).
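The cap-and-trade bookkeeping described above can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the Protocol's actual accounting rules: the `Party` class, the `headroom`, `trade`, and `issue_cdm_credits` names, and all numbers in the usage example are hypothetical. The only figures taken from the text are that one credit stands for 1 ton of CO2 and that 2 % of CDM proceeds goes to the Adaptation Fund, which is modeled here, as a simplifying assumption, as credits withheld at issuance.

```python
from dataclasses import dataclass

ADAPTATION_LEVY = 0.02  # 2 % of CDM proceeds goes to the Adaptation Fund (from the text)


@dataclass
class Party:
    """A participating nation in a toy cap-and-trade market (hypothetical model)."""
    name: str
    cap: float            # emissions quota for the period, in tons of CO2
    emitted: float        # measured emissions for the period, in tons of CO2
    credits: float = 0.0  # net credits held; one credit stands for 1 ton of CO2

    def headroom(self) -> float:
        """Tons the party may still emit; a negative value means it is over target."""
        return self.cap + self.credits - self.emitted


def trade(seller: Party, buyer: Party, tons: float) -> None:
    """Move `tons` of unused allowance from seller to buyer.

    The sum of all caps and credits is unchanged, which is the "cap" part
    of cap and trade; only the distribution between parties moves.
    """
    if tons <= 0:
        raise ValueError("trade size must be positive")
    if tons > seller.headroom():
        raise ValueError("seller cannot sell more than its unused allowance")
    seller.credits -= tons
    buyer.credits += tons


def issue_cdm_credits(host: Party, reduction_tons: float) -> float:
    """Credit a qualified project net of the 2 % levy and return the levy.

    Assumption: the levy is taken in credits rather than in money.
    """
    levy = ADAPTATION_LEVY * reduction_tons
    host.credits += reduction_tons - levy
    return levy


if __name__ == "__main__":
    a = Party("A", cap=100.0, emitted=80.0)   # 20 tons under its quota
    b = Party("B", cap=100.0, emitted=115.0)  # 15 tons over its quota
    trade(a, b, 15.0)
    print(a.headroom(), b.headroom())         # both parties are now within target
```

In this sketch party B reaches compliance by buying the allowance that party A did not use, so total permitted emissions stay fixed. The check inside `trade` also shows why the text stresses accurate measurement: the unused allowance a party can sell depends directly on its measured emissions.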

GHG emissions trading relies on the overall the Bali Action Plan was ratified, yielding the
calculated emissions quotas for each respective REDD+ program (REDDplus). REDD+ included
nation to determine the appropriate amount of all of the original REDD stipulations but also
commoditized GHG emissions to be allowed into incorporated a focus on funding projects that
the market. For this market-based platform to created sustainable management of forests and
be successful, accurate monitoring and measure- further enhancement of forest carbon stocks in
ments of actual GHG emissions from each nation developing countries (Alexander 2013). REDD+ C
is required though regulated carbon registries and programs are based on the science that terres-
authorities (Moore 2012; Rosenqvist et al. 2003). trial forests, wetland forests, and biodiversity are
Once regulated appropriately and accurately, capable of natural carbon sequestration, where
national and regional marketplaces are allowed GHG emissions such as carbon dioxide (CO2 )
to be established so long as they follow the is captured by plant life and where carbon is
Kyoto Protocol s fundamental stipulations. This stored in the soil beneath the plant life (Alexander
has allowed for emissions marketplaces such 2013; T nzler and Ries 2012; Plugge et al. 2011;
as the then Chicago Climate Exchange (CCX) Nzunda and Mahuve 2011; Wertz-Kanounnikoff
and European Climate Exchange (ECX), both of et al. 2008).
which operated as a trading platform similar to
that of other nancial commodities exchanges, Copenhagen Accord
and now the Intercontinental Exchange Futures As the Kyoto Protocol s framework approached
Europe which is now the leading market in its expiration date of 2012, a mounting need to
emissions trading. All of which followed the develop a new framework that may extend and/or
European Union s emissions trading scheme enhance the Kyoto Protocol s principles for
(EU ETS) (Raufer and Iyer 2012; Baranzini and climate change mitigation was direly needed. The
Carattini 2014; Rosenqvist et al. 2003; Kossoy December 2009 United Nations Climate Change
and Guigon 2012). Conference in Copenhagen, Denmark, addressed
the concerns of the expiration of the Kyoto
Reducing Emissions from Deforestation and Protocol by developing, negotiating, and ratifying
Forest Degradation the Copenhagen Accord. The Copenhagen
At the 11th Conference of Parties (COP11) of the Accord committed 186 nations (including the
UNFCCC in the year 2005, the reducing emis- United States) to reduce GHG emissions levels,
sions from deforestation and forest degradation engage in clean energy projects, and put focus on
(REDD) program was established to assist with adaptation projects due to the impacts of climate
the reduction of carbon emissions and preser- change. The Copenhagen Accord also requests
vation of the forests (Alexander 2013; T nzler a technical analysis due in 2015 to determine
and Ries 2012). The program was initially devel- the need of a new potential CO2 atmospheric
oped to support the Clean Development Mech- concentration level to maintain to achieve
anism (CDM) policies under the Kyoto Proto- the underlying goals behind climate change
col to allow developing countries to gain funds mitigation (Moore 2012). The main highlights
for projects focused around the conservation, of the Copenhagen Accord included continued
afforestation, and reforestation leading to the re- action by countries to manage global temperature
duction of GHG emissions. The IPCC had ear- increases to under 2 C, submission of GHG
lier concluded that the continual degradation of emissions reduction goals by January 2010
terrestrial and wetland forests has direct impacts from each participatory country, reports from
on the mitigation of climate change (Alexander developing countries about climate mitigation
2013; T nzler and Ries 2012; Plugge et al. 2011; actions, and nancial funding for environmental
Nzunda and Mahuve 2011; Wertz-Kanounnikoff conservation projects in developing countries
et al. 2008). At the 12th Conference of Par- (Moore 2012). The Copenhagen Accord also
ties (COP15) of the UNFCC in the year 2007, stipulates that the UNFCCC will continue its
role for financial governance, GHG emissions reporting and monitoring, and scientific climate
analysis for the years beyond the expiration of the Kyoto Protocol and will conduct meetings
as necessary to achieve appropriate mitigation of climate change (Moore 2012).

Scientific Fundamentals

Given the importance of the climate change phenomenon evidenced by the international climate
change mitigation policies mentioned in the previous section, a substantial focus on
accurately assessing, measuring, forecasting, and validating the variables of climate change
emerges. All the policies and strategies to mitigate climate change fundamentally require
measurement and validation methodologies in order to succeed. The fundamental scientific
approaches to analyzing climate change and its mitigation provide a key perspective of the
future and how to maneuver accordingly to adapt to the potential impacts from climate change.

Proper scientific analysis can benefit all stakeholders within the climate change mitigation
framework by providing relative perspective and data interpretation that can potentially
drive strategic decisions. This section will briefly review the popular scientific methods
that examine aspects of climate change and its mitigation.

Climate Trend Analysis
Climate can be defined as the weather conditions that prevail over an arbitrary period of
time, which is usually supported through conventional statistical analysis or statistical
diagnostics. Trend, relative to climate, can be defined as the gradual differences of certain
climate-related variables over some period of time (Shea 2014).

Traditional statistical time series analysis can be conducted on temperature changes,
rainfall measurements, snow patterns, flooding, and other climate change indicators to
detect, estimate, and predict possible emerging climate trends; such analyses are significant
scientific tools to better understand climate change (Shea 2014).

More advanced statistical techniques can be applied to derive more specific data analysis,
such as Taylor diagrams, which graphically compare statistical correlation summaries between
individual climate patterns (observed or modeled), and empirical orthogonal function (EOF)
and rotated EOF analysis, which interpret potential spatial modes or patterns of variability
changes over time (Shea 2014).

All the fundamental Climate Trend Analysis techniques are important for statistical analysis
and modeling that can help produce climate change projections for the near future. These
projections directly impact climate change mitigation policies, adaptation projects, and
business decisions of respective stakeholders.

Surface and Air Temperature Analysis for Land and Sea
Measurements of land air temperature and sea surface temperature (SST) are of significant
importance to understand climate conditions in respective regions. This is evidenced by the
many decades of available measurement data that predate the climate change mitigation
conversation. These measurements can provide the data necessary to corroborate findings from
climate data models by serving as the ground truth validation source (Hansen et al. 2006).
More importantly, the temperature measurements over land and sea can be coordinated in a
spatiotemporal plane for pattern analysis, data modeling, and statistical analysis.

Land surface air temperature weather stations are usually stationed in strategic locations
throughout a specified region to collect appropriate data and summarize it as the highest and
lowest temperatures recorded for a particular day, which are then reported to a central
station that may collect the raw data and combine it with data from other regional surface
temperature weather stations for further analysis. Appropriate standards are followed in the
placement of the temperature sensors to ensure that they are not affected by influences that
may be in close proximity (Hansen et al. 2006).

Similarly, sea surface temperatures (SST) can be collected by remote stations on ships or buoys
equipped with sensors that take measurements of the water surface and summarize the highs and
lows of daily water temperature and levels, which can later be polled and combined at a
central station (Reynolds et al. 2007). Climate scientists rely on statistical anomaly
analysis of the water temperature and levels to assess potential inclement weather in the
form of cyclones, hurricanes, and tropical storms. With the mentioned techniques,
climatologists can develop statistical models that can help estimate, detect, and project
future weather patterns (Reynolds et al. 2007).

Satellite resolution imaging may give a broader, less granular depiction of the overall
temperature ranges worldwide to help focus on particular patterns or regions of interest, but
it is unable to produce the amount of detail that surface-level temperature sensors can
provide (Kungvalchokechai and Sawada 2013).

The monitoring and analysis of land surface temperature are scientifically linked to the
planet's weather and climate patterns, which can be a direct result of increasing atmospheric
GHGs. The temperature increases in certain regions can have effects on global glaciers,
arctic ice sheets, and vegetation on the planet. Accurately understanding the aspects of the
surface temperatures can give scientists a clearer picture about adaptation needs and climate
impact projections.

Emissions Analysis
The term greenhouse gas emissions refers directly to the emissions produced from industrial
processes, transportation by-products, agricultural by-products, and societal waste products.
The gases in question are the following: carbon dioxide (CO2), methane (CH4), nitrous oxide
(N2O), perfluorocarbons (PFCs), hydrofluorocarbons (HFCs), and sulfur hexafluoride (SF6), as
well as the indirect gases that will not be mentioned here (Raufer and Iyer 2012). As
mentioned in the previous section, the success of climate change mitigation policies relies
directly on the accurate measurement of past, present, and future GHG emissions that could
reach the atmosphere, thereby increasing global temperatures.

GHG emissions controls stipulated by climate change mitigation policies require monitoring
sensors that can accurately audit the amount of GHG emissions produced. These policies are
driven by the science that each GHG has a direct impact on the planet's climate. The
greenhouse gases that are emitted to the atmosphere create a barrier which does not allow
solar heat received from the sun to escape the planet's atmosphere once it has reached
surface level, thereby warming the climate (Myhre et al. 2013).

Scientific methods derive the atmospheric lifetime, which is the amount of time a gas may
stay in the atmosphere; GHG concentrations, which are the estimated values, measured in
respective units, of current GHG emissions in the atmosphere; radiative forcing, which is the
amount of heat energy the gases absorb and keep in the earth's atmosphere rather than allow
it to leave back to space; and global warming potential (GWP), which is a ratio derived from
the atmospheric lifetime and radiative forcing over a specified timescale to determine the
impact of the gas on global warming relative to carbon dioxide (CO2). These metrics give
climate scientists quantifiable terms to weigh and assess the different intensities and
impacts of each GHG emission in appropriate mathematical models and climate data models
(Myhre et al. 2013). An important note is that emissions that impact climate change include
both natural (water vapor) and anthropogenic (pollution or pollutants from human activity)
sources, both of which need to be accurately quantified and analyzed (Myhre et al. 2013).

Carbon Capture and Sequestration Analysis
The term carbon sequestration refers to the natural or synthetic process of capturing and/or
storing carbon dioxide (CO2) emissions, thereby mitigating climate change by reducing the
amount of the GHG emission that reaches or remains in the atmosphere. The natural process of
achieving a balance of CO2 emissions and climate change comes in the form of forested
wetlands, terrestrial
forests, and plant life, all of which have the capability to capture CO2 emissions for
consumption and store carbon in the soil in which their roots are deeply entrenched (Freedman
2014; Alexander 2013). The synthetic process captures carbon-based emissions at the point of
production from industrial facilities that produce the emissions and transports them deep
underneath land or sea, where they may dissolve or be stored indefinitely (Katzer et al.
2007).

Both the natural and synthetic carbon sequestration processes require accurate calculations
and depictions of the amount of CO2 being captured and/or stored to determine the
effectiveness of the mitigation (Freedman 2014; Alexander 2013; Katzer et al. 2007). To
achieve this feat synthetically, scientists need to mathematically calculate the amount of
CO2, in units of metric tons, that can be properly captured and stored under the planet's
land and sea without causing adverse effects to the environment. The terrestrial or natural
approach would require scientists to determine the amount of CO2 that plant life from
forested areas can capture and store to achieve a substantial mitigation of climate change
(Freedman 2014; Katzer et al. 2007). This is evidenced by the ratification of the REDD+
policy mentioned in the previous section.

Geographical Information Systems
The scientific analysis techniques, data sources, and respective stakeholder interests in
climate change mitigation have created a demand for platforms that can dynamically bring
together the different aspects that are required to perform effective climate change
mitigation analysis. Advances in information technology, accessibility to data sources, and
economic costs of data storage have allowed Geographical Information Systems (GIS) platforms
to be developed for the robust analysis requirements of climate change mitigation research.
GIS serves as a tool for scientific-based climate research by practically combining the many
different scientific analysis techniques with appropriate data streams and visualizations to
provide data-driven insights for climate change stakeholders.

Geographical Information Systems can be developed and customized to successfully achieve the
feature requirements for different climate analysis purposes, but the conceptual fundamentals
of a GIS system developed to analyze climate change usually revolve around the following
abilities.

Mapping
A GIS system for climate change analysis should have the ability to render a data canvas of
the geographical region of interest or global map where data overlays can be produced based
on appropriate data streams to represent appropriate depictions of the said data.

Gridding and Regridding
Gridded data can be high-resolution images of a certain geographical region that do not give
the total perspective of surrounding regions due to computational or storage limitations.
Segments or fragments of a larger overall high-resolution image are provided, each of which
is part of a sequenced grid of neighboring images that can be examined individually. Due to
the nature of the high-resolution image, data overlays, points of interest, and data streams
can still be integrated using GIS technologies but only specific to the gridded image
provided (Shea 2014; Reynolds and Smith 1994).

Regridding refers to the interpolation of one grid resolution image to a different grid
resolution image, usually that of a sequence that depicts the immediate neighboring
resolutions of a specific geographical region. Different methods such as temporal, vertical,
or horizontal interpolation are used to combine the resolutions, but most commonly spatial
(horizontal) interpolation is utilized (Shea 2014; Reynolds and Smith 1994). Depending on the
type of analysis and data, appropriate interpolation techniques are required. To perform
quantitative analysis on data points across many gridded resolutions, regridding to a common
grid is required to avoid misleading numerical calculations among the data from different
grid images. GIS applications and platforms provide many different interpolation techniques
for regridding
which allows for more accurate data analysis (Shea 2014; Reynolds and Smith 1994). This is a
crucial tool that can ensure the accuracy of computationally very large amounts of
geographical data.

Monitoring and Measurement
GIS applications and systems can be configured to dynamically operate with real-time data
streams from third-party data vendors or remote sensors that may provide climate-based or
emissions-based information. A platform that can actively receive the data streams from the
sensors and spatially visualize and overlay the data on a geographical plane relative to the
sensor's logistical location can provide an automated monitoring system to detect potentially
interesting climate or emissions patterns, which can be practically interpreted depending on
stakeholder interests (Rosenqvist et al. 2003; Reynolds and Smith 1994; Gibbs et al. 2007;
Palmer Fry 2011). The monitoring aspect of GIS indicates the ability to process large amounts
of data, store the data, and visualize the data in minimal amounts of time to provide insight
to the stakeholder. Without this aspect or ability of a GIS platform or system, climate
change analysis techniques would not benefit greatly from GIS technologies.

Reporting
The ability to retrieve information and analysis dynamically in an easy-to-interpret format
is a key fundamental for a GIS system that may be developed for the purposes of climate
analysis. The reporting mechanism allows the user of the system to gather important data and
intelligence that could lead to insight-driven decisions. Both monitoring and reporting are
crucial aspects of a GIS system designed for climate analysis because reporting is based on
the data derived from monitoring, and the insights from reporting are the primary output that
analysis will be conducted on. Inaccuracies or inconsistencies in reporting may render the
GIS system obsolete, but accurate reporting could mean a substantial increase in
productivity, efficiency, and progress in conducting relevant analysis and research on
climate change (Rosenqvist et al. 2003; Reynolds and Smith 1994; Gibbs et al. 2007; Palmer
Fry 2011).

Verification
The ability to monitor and report on different aspects of climate change based on statistical
models or projections derived from historical data may not always accurately portray the
actual observational data. Scientific analysis requires corroborated ground truth data to
validate whether the data models developed from historical data or data from a different
region are statistically significant enough to be accurate. Verification is a critical factor
in climate change mitigation policies due to the reliance on the ability to correctly
determine climate change and emissions levels to properly incentivize global participants to
achieve the common goal of limiting atmospheric temperature increases (Moore 2012). To
achieve ground truth validations, climate change mitigation policies emphasize the
requirement of approved sensors that can accurately verify the integrity of measurements
taken at the point of production. This can be interpreted as remote sensors that are capable
of measuring the ground truth data that is required in climate-based analysis scenarios
(Rosenqvist et al. 2003; Reynolds and Smith 1994; Gibbs et al. 2007; Palmer Fry 2011). GIS
systems need to be scalable and adaptable to incorporate regulatory ground truth data or
provide the appropriate information technology that meets the standards of climate mitigation
policies.

Key Applications

Some of the aspects of climate change mitigation policies discussed offer financial
instruments, incentives, and platforms for interested investors, impacted industrial
stakeholders, and participating nations to explore opportunities and strategies that can
directly, indirectly, or residually impede the global temperature increase. Stakeholders who
may decide to participate in the incentives offered by climate mitigation policies are aware
that proper knowledge and analysis of climate
change aspects that may be related to respective interests may provide a competitive edge for
potential investment decisions (Kossoy and Guigon 2012). This section will explore some
practical examples of Geographical Information Systems that perform climate science-related
analysis and their application to different financial investment research.

Climate Finance
The term climate finance represents the financial mechanisms set in place by climate change
mitigation policies, such as the Kyoto Protocol and Copenhagen Accord, which allow for
national, regional, and international parties to have access to financing channels
specifically for climate change mitigation and adaptation projects and programs (Kossoy and
Guigon 2012; Buchner et al. 2011). These projects and programs are developed based on
achieving minimal carbon-based emissions footprints and resiliency to climate change through
appropriate research and economic development. The term had been originally coined to refer
to the obligations that developed countries committed to developing countries under the
ratified UNFCCC policies; however, the term is now more synonymous with all the financial
procedures and flows relating to climate change mitigation and adaptation projects and
programs (Kossoy and Guigon 2012; Buchner et al. 2011). Financial funding can be provided
from government budgets, domestic budgets, capital markets, and public and/or private sectors
mediated through bilateral financial institutions, multilateral financial institutions, and
development cooperation agencies or directly from the UNFCCC itself via the Green Climate
Fund, NGOs (nongovernmental organizations), and/or the private sector. Investment decisions
and strategies in renewable energy can potentially be considered climate finance if the
renewable energy projects and programs qualify under the UNFCCC guidelines (Kossoy and Guigon
2012; Buchner et al. 2011).

The financing of projects and programs designed to mitigate or adapt to the effects of
climate change, such as the previously discussed Carbon Offset Programs, Clean Development
Mechanisms, and Joint Implementation programs, has reached billions of dollars a year on
average, which is forecasted to grow into the trillions in the near future (Buchner et al.
2011; Moore 2012). To effectively monitor the funding needs, progress, success, and
completion of projects and programs, the intermediaries of the climate finance framework rely
on GIS-based tools and analysis to make informed data-driven decisions.

Carbon Finance
The UNFCCC stipulations of pollution and emissions control create a realm in which carbon
footprints and greenhouse gases are constrained to limit the potential increase in global
climate change (Moore 2012). This constraint creates a commodity out of the amount of
carbon-based or GHG emissions permitted for industrial and national interests (Raufer and
Iyer 2012). Climate mitigation policy frameworks have promoted investments in projects and
programs that reduce the previously mentioned GHG emissions and have provided a platform
where the commodity of allowed emissions amounts is monetized into financial instruments that
are tradable in a market-based cap and trade framework. The platform where the commoditized
emissions allowances are exchanged is typically referred to as the carbon market, while the
overall concept of investing in and trading these commodities can be represented by the term
carbon finance (Moore 2012; Raufer and Iyer 2012; Kossoy and Guigon 2012).

Carbon finance leverages the Kyoto Protocol's Clean Development Mechanisms and Joint
Implementation framework to help facilitate investments into emissions reductions projects to
earn or trade emissions allowances or credits (Kossoy and Guigon 2012; Henríquez 2013). The
World Bank facilitates carbon finance through its own carbon finance unit, which purchases
carbon credits or GHG emissions reductions generated from projects or programs in developing
countries or transitioning economies for its fund contributors that employ its services,
usually in the form of governments or companies with an interest in attaining or trading the
carbon credits (Lewis 2010). The World Bank can achieve this by providing carbon funds
and facilities which contribute to projects and programs that can yield carbon credits or GHG
emissions reductions according to the Kyoto Protocol's Clean Development Mechanism and Joint
Implementation frameworks (Lewis 2010; Henríquez 2013; Kossoy and Guigon 2012; Moore 2012).
Essentially, the World Bank invests in and supports projects and programs that qualify to
earn carbon credits, which the World Bank can acquire and sell to interested parties through
its carbon finance business (Lewis 2010; Henríquez 2013). Carbon credits are the official
allowance of 1 metric ton of CO2 or equivalent gases earned through approved projects or
programs that progress the climate change mitigation agenda (Moore 2012).

The pricing of allowed carbon-based emissions is based on the limited supply and high demand
for carbon credits (Litterman 2013; Henríquez 2013; Kossoy and Guigon 2012). To make informed
investment decisions from the buyers' and sellers' positions, proper financial analysis and
careful investment research need to occur on the projects and programs that yield the carbon
credits. Climate finance shares elements with carbon finance with respect to the dependence
on GIS tools and analysis to determine if projects and programs properly qualify and succeed
to earn carbon credits. Carbon finance, however, depends on both financial analysis and
scientific research to make proper investment decisions (Litterman 2013; Henríquez 2013;
Kossoy and Guigon 2012). GIS tools that combine both give stakeholders in carbon finance more
insight when making critical decisions. Investment research can potentially incorporate
GIS-based tools to understand estimates or forecasts of potential deficit or surplus in
carbon emissions. GIS serves as an important investment research tool in the carbon finance
domain because of the similarities to the commodities markets. Understanding the fundamentals
of the commodity may help provide beneficial insight when investing in such a commodity. In
the case of carbon finance, the commodity is the carbon credits or GHG emissions allowances
yielded from climate mitigation projects or purchased from a carbon market at a
market-competitive price.

Sustainability Risk Management
The awareness of the effects climate change may have on society and economy creates a
legitimate business concern for investor and stakeholder confidence. Organizations which
employ business strategies that do not take environmental risk factors, such as climate
change, into consideration when planning, operating, or expanding may be adversely impacted
by evolving climate change mitigation policies or effects of climate change. Organizations
must solidify confidence with investors and stakeholders by engaging in business strategies
that align operations and revenue goals with environmentally friendly policies. Global and
national compliance regulations stemming from global climate change mitigation efforts call
for renewable energy, environmentally friendly business practices, and sustainability
initiatives. Organizations may not have a strategic initiative or outlook to align their
business strategies with considerations for regulatory and environmental risks associated
with climate change, which can lead to a lack of investment confidence and appeal (Zu 2013;
Baumast 2013; Schmiedeknecht 2013). An example of this can be evidenced by organizational
climate risk disclosures that are disseminated as public information for investors and
stakeholders to review potential liabilities and assets of the respective business that can
be impacted by environmental risk factors.

Sustainability Risk Management considers the optimal business strategy for an organization to
achieve an effective and efficient balance between the prosperity of a business and its
adherence to environmentally friendly policies. Traditional risk management and climate
science techniques may be performed on business assets, interests, and liabilities to assess
relative impacts to the organizational profit goals (Zu 2013; Baumast 2013). Sustainability
and vulnerability assessments conducted in depth consider environmental risk factors such as
floods, natural disasters, and climate change and how they may be detrimental to the business
(Zu 2013; Baumast 2013; Schmiedeknecht 2013). They also consider adaptation strategies to
achieve resiliency in the wake of such environmental risks for business
continuity and prosperity which can be translated into long-term confidence for investors. To
achieve such assessments, GIS tools and analysis can be employed to analyze environmental
risk factors to business operations, supply chains, and other applicable business assets.
Forecasting and simulation models of risk factors are considered as well as financial burdens
that may be experienced by the impacted business (Zu 2013). After each business process and
the potential environmental risks that may impact it are analyzed, strategies are developed
to minimize the risks (Zu 2013). Integrated technologies utilizing GIS-based analysis and
data management tools can be employed to conduct automated monitoring, auditing, and
reporting on sustainability models and goals to achieve compliance. The identification of
potential risks and issues impacting business interests early on can help the business
maneuver its directional strategy to avoid costly regulatory failures or reputational damage
(Zu 2013; Schmiedeknecht 2013).

Sustainability Risk Management extends into the financial markets by allowing organizations
that satisfy corporate sustainability assessments to be held in Sustainability Indices
(Schmiedeknecht 2013). Sustainability indices represent an index of organizations considered
to be socially responsible, environmentally friendly, and sustainable in the event of
environmental risks. Investment firms that offer Sustainability Indices may market them as
safer and more resilient to climate change to potential investors who seek investment
confidence relative to environmental risks (Schmiedeknecht 2013). Organizations that are able
to reach Sustainability Indices may be considered a safer investment option compared to
organizations that cannot achieve the same qualifications.

Future Directions

Climate change analysis and Geographical Information Systems share a relationship that will
only evolve as the awareness and applications of climate change mitigation become more
prevalent. GIS systems and tools are used to provide meaningful insight and intelligence for
monitoring, reporting, and verification applications of climate change analysis. GIS
technologies and systems may combine climate-related research and analysis data sources on
geographical planes that can help perform traditional analytical techniques to yield data
models that can be used to make strategic decisions. The demand for GIS and geospatial
analysis skills in the climate science and investment research markets can be expected to
grow as the relevance of climate-related applications, such as climate finance, carbon
finance, and sustainability management, increases.

Important trends in climate change research, GIS, and financial applications are briefly
discussed in the following sections.

Mobile GIS Remote Sensor Networks
Mobile GIS remote sensor networks are considered an important topic in both climate research
and GIS. Optimally and efficiently designed monitoring, reporting, and verification GIS
systems that can potentially be incorporated into REDD+ projects and programs or other
climate finance-funded projects are crucial to attaining accurate data at the highest
integrity standards (Samek et al. 2013; Rosenqvist et al. 2003; Patenaude et al. 2004). Much
research is being conducted to optimize and propose equipment and techniques for an
economically and practically feasible approach to GIS remote sensor networks that can
potentially become a standardized method to collect and validate data such as emissions,
carbon storage, carbon sequestration rates, air temperatures, and other climate
change-related data points.

Data-Driven GIS Decision-Making Tools
GIS systems that can properly collect data from multiple data sources and conduct
application-specific analysis on the said data with potential business logic are an area that
climate change stakeholders are seeking to expand (Sizo et al. 2014; Benz et al. 2004;
Ganguly et al. 2005). Climate-based financial and regulatory applications to automate
business intelligent GIS
Climate Risk Analysis for Financial Institutions 227

systems that can perform dynamic analytical observations to yield insights to assist in decision-making scenarios can directly provide value-added service for climate change stakeholders. Automated sustainability assessments for organizations or GIS-based systems that can signal important investment research analysis are some of the many applications that data-driven geospatial analysis and technology is making available to the climate research and finance-based industries (Tomlinson 2007; Zu 2013).

Cross-References

▸ ArcGIS: General-Purpose GIS Software
▸ Climate Adaptation, Introduction
▸ Climate Change and Developmental Economies
▸ Climate Extremes and Informing Adaptation
▸ Climate Hazards and Critical Infrastructures Resilience
▸ Data Models in Commercial GIS Systems
▸ Financial Asset Analysis with Mobile GIS
▸ Geosensor Networks, Formal Foundations
▸ GPS Data Processing for Scientific Studies of the Earth's Atmosphere and Near-Space Environment

References

Alexander S (2013) Reducing emissions from deforestation and forest degradation. In: Finlayson M, McInnes R, Everard M (eds) Encyclopedia of wetlands: wetland management, vol 2. Springer, Berlin/Heidelberg
Baranzini A, Carattini S (2014) Taxation of emissions of greenhouse gases. In: Freedman B (ed) Global environmental change. Handbook of global environmental pollution, vol 1. Springer, Berlin/Heidelberg, pp 543–560
Baumast A (2013) Carbon disclosure project. In: Idowu SO, Capaldi N, Zu L, Gupta AD (eds) Encyclopedia of corporate social responsibility. Springer, Berlin/Heidelberg, pp 302–309
Benz UC, Hofmann P, Willhauck G, Lingenfelder I, Heynen M (2004) Multi-resolution, object-oriented fuzzy analysis of remote sensing data for gis-ready information. ISPRS J Photogramm Remote Sens 58(3):239–258
Buchner B, Falconer A, Hervé-Mignucci M, Trabacchi C, Brinkman M (2011) The landscape of climate finance. Climate Policy Initiative, Venice, p 27
Freedman B (2014) Maintaining and enhancing ecological carbon sequestration. In: Freedman B (ed) Global environmental change. Handbook of global environmental pollution, vol 1. Springer, Berlin/Heidelberg, pp 783–801
Ganguly AR, Gupta A, Khan S (2005) Data mining technologies and decision support systems for business and scientific applications. In: Encyclopedia of data warehousing and mining. Idea Group Publishing
Gibbs HK, Brown S, Niles JO, Foley JA (2007) Monitoring and estimating tropical forest carbon stocks: making REDD a reality. Environ Res Lett 2(4):045023
Hansen JE, Ruedy R, Sato M, Lo K (2006) NASA GISS surface temperature (GISTEMP) analysis. Trends: a compendium of data on global change
Henríquez BLP (2013) Environmental commodities markets and emissions trading, towards a low carbon future. Routledge
Katzer J, Ansolabehere S, Beer J, Deutch J, Ellerman AD, Friedmann SJ, Herzog H, Jacoby HD, Joskow PL, McRae G et al (2007) The future of coal: options for a carbon-constrained world. Massachusetts Institute of Technology, Boston
Kossoy A, Guigon P (2012) State and trends of the carbon market. World Bank, Washington DC
Kungvalchokechai S, Sawada H (2013) The filtering of satellite imagery application using meteorological data aiming to the measuring, reporting and verification (MRV) for REDD. Asian J Geoinf 13(3)
Lewis JI (2010) The evolving role of carbon finance in promoting renewable energy development in China. Energy Policy 38(6):2875–2886
Litterman B (2013) What is the right price for carbon emissions. Regulation 36:38
Moore C (2012) Climate change legislation: current developments and emerging trends. In: Chen W-Y, Seiner J, Suzuki T, Lackner M (eds) Handbook of climate change mitigation. Springer, Berlin/Heidelberg, pp 43–87
Myhre G, Shindell D, Bréon F-M, Collins W, Fuglestvedt J, Huang J, Koch D, Lamarque J-F, Lee D, Mendoza B, Nakajima T, Robock A, Stephens G, Takemura T, Zhang H (2013) Climate change 2013: the physical science basis. Contribution of working group I to the fifth assessment report of the intergovernmental panel on climate change, book section 8. Cambridge University Press, Cambridge/New York, pp 659–740
Nzunda EF, Mahuve TG (2011) A swot analysis of mitigation of climate change through REDD. In: Filho WL (ed) Experiences of climate change adaptation in Africa. Climate change management. Springer, Berlin/Heidelberg, pp 201–216
Palmer Fry BP (2011) Community forest monitoring in REDD+: the M in MRV? Environ Sci Policy 14(2):181–187
Patenaude G, Hill RA, Milne R, Gaveau DLA, Briggs BBJ, Dawson TP (2004) Quantifying forest above ground carbon content using LiDAR remote sensing. Remote Sens Environ 93(3):368–380
Plugge D, Baldauf T, Köhl M (2011) Reduced emissions from deforestation and forest degradation (REDD): why a robust and transparent monitoring, reporting and verification (MRV) system is mandatory. In: Climate change research and technology for adaptation and mitigation. InTech, Rijeka, pp 155–170
Raufer R, Iyer S (2012) Emissions trading. In: Chen W-Y, Seiner J, Suzuki T, Lackner M (eds) Handbook of climate change mitigation. Springer, New York, pp 235–275
Reynolds RW, Smith TM (1994) Improved global sea surface temperature analyses using optimum interpolation. J Clim 7(6):929–948
Reynolds RW, Smith TM, Liu C, Chelton DB, Casey KS, Schlax MG (2007) Daily high-resolution-blended analyses for sea surface temperature. J Clim 20(22):5473–5496
Rosenqvist Å, Milne A, Lucas R, Imhoff M, Dobson C (2003) A review of remote sensing technology in support of the kyoto protocol. Environ Sci Policy 6(5):441–455
Samek JH, Skole DL, Thongmanivong S, Lan DX, Van Khoa P (2013) Deploying internet-based MRV tools and linking ground-based measurements with remote sensing for reporting forest carbon. APN Sci Bull Issue 3, 4
Schmiedeknecht MH (2013) Dow jones sustainability indices. In: Idowu SO, Capaldi N, Zu L, Gupta AD (eds) Encyclopedia of corporate social responsibility. Springer, Berlin/Heidelberg, pp 832–838
Shea D (2014) Climate data guide retrieved from https://climatedataguide.ucar.edu/climate-data-tools-and-analysis/
Sizo A, Bell S, Noble B (2014) Automated gis routine for strategic environmental assessment: a spatiotemporal analysis of urban and wetland change
Tänzler D, Ries F (2012) International climate change policies: the potential relevance of REDD+ for peace and stability. In: Scheffran J, Brzoska M, Brauch HG, Link PM, Schilling J (eds) Climate change, human security and violent conflict. Hexagon Series on Human and Environmental Security and Peace, vol 8. Springer, Berlin/Heidelberg, pp 695–705
Tomlinson RF (2007) Thinking about GIS: geographic information system planning for managers. ESRI, Inc., Redlands
Wertz-Kanounnikoff S, Verchot LV, Kanninen M, Murdiyarso D (2008) How can we monitor, report and verify carbon emissions from forests. Moving ahead with REDD: issues, options, and implications. Center for International Forestry Research (CIFOR), Bogor, pp 87–98
Zu L (2013) Sustainability risk management. In: Idowu SO, Capaldi N, Zu L, Gupta AD (eds) Encyclopedia of corporate social responsibility. Springer, Berlin/Heidelberg, pp 2395–2407


Climate Risks

▸ Climate Extremes and Informing Adaptation


Climate Trend Analysis

▸ Climate Risk Analysis for Financial Institutions


Climate Variability

▸ Climate Extremes and Informing Adaptation


Cloaking Algorithms

Chi-Yin Chow
Department of Computer Science, City University of Hong Kong, Hong Kong, China

Definition

Spatial cloaking is a technique used to blur a user's exact location into a spatial region in order to preserve her location privacy. The blurred spatial region must satisfy the user's specified privacy requirement. The most widely used privacy requirements are k-anonymity and minimum spatial area. The k-anonymity requirement guarantees that a user location is indistinguishable among k users. On the other hand, the minimum spatial area requirement guarantees that a user's exact location must be blurred into a spatial region with an area of at least A, such that the probability of the user being located in any point within the spatial region is 1/A. A user location must be blurred by a spatial cloaking algorithm either on the client side or a trusted third-party before it is submitted to a location-based database server.
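
As a minimal sketch of the two privacy requirements in the definition above, the following Python snippet checks whether a candidate rectangular region satisfies a k-anonymity requirement and a minimum spatial area requirement. The `Region` type, the `satisfies` function, and the sample coordinates are illustrative assumptions and do not come from any of the surveyed algorithms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle (hypothetical helper type for illustration)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

    def contains(self, point) -> bool:
        x, y = point
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def satisfies(region, user_locations, k, min_area):
    """True if the region holds at least k users and has area >= min_area."""
    users_inside = sum(1 for p in user_locations if region.contains(p))
    return users_inside >= k and region.area() >= min_area


# Made-up sample data: three users fall inside the 2 x 2 region, one outside.
locations = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (3.8, 3.9)]
region = Region(0.5, 0.5, 2.5, 2.5)
ok = satisfies(region, locations, k=3, min_area=4.0)
too_strict = satisfies(region, locations, k=4, min_area=4.0)
```

With these sample values the region meets a requirement of k = 3 and A = 4 but not k = 4, because only three users lie inside it.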
Main Text

This article surveys existing spatial cloaking techniques for preserving users' location privacy in location-based services (LBS) where users have to continuously report their locations to the database server in order to obtain the service. For example, a user asking about the nearest gas station has to report her exact location. With untrustworthy servers, reporting the location information may lead to several privacy threats. For example, an adversary may check a user's habit and interest by knowing the places she visits and the time of each visit. The key idea of a spatial cloaking algorithm is to perturb an exact user location into a spatial region that satisfies user specified privacy requirements, e.g., a k-anonymity requirement guarantees that a user is indistinguishable among k users.

Cross-References

▸ Location-Based Services: Practices and Products
▸ Privacy Preservation of GPS Traces


Cloaking Algorithms for Location Privacy

Chi-Yin Chow
Department of Computer Science, City University of Hong Kong, Hong Kong, China

Synonyms

Anonymity; Location anonymization; Location blurring; Location perturbation; Location-based services; Location-privacy; Nearest neighbor; Peer to peer; Privacy

Definition

Spatial cloaking is a technique to blur a user's exact location into a spatial region in order to preserve her location privacy. The blurred spatial region must satisfy the user's specified privacy requirement. The most widely used privacy requirements are k-anonymity and minimum spatial area. The k-anonymity requirement guarantees that a user location is indistinguishable among k users. On the other hand, the minimum spatial area requirement guarantees that a user's exact location must be blurred into a spatial region with an area of at least A, such that the probability of the user being located in any point within the spatial region is 1/A. A user location must be blurred by a spatial cloaking algorithm either on the client side or a trusted third party before it is submitted to a location-based database server.

Historical Background

The emergence of the state-of-the-art location-detection devices, e.g., cellular phones, global positioning system (GPS) devices, and radio-frequency identification (RFID) chips, has resulted in a location-dependent information access paradigm, known as location-based services (LBS). In LBS, mobile users have the ability to issue snapshot or continuous queries to the location-based database server. Examples of snapshot queries include "where is the nearest gas station" and "what are the restaurants within one mile of my location," while examples of continuous queries include "where is the nearest police car for the next one hour" and "continuously report the taxis within one mile of my car location." To obtain the precise answer of these queries, the user has to continuously provide her exact location information to a database server. With untrustworthy database servers, an adversary may access sensitive information about
individuals based on their location information and queries. For example, an adversary may identify a user's habits and interests by knowing the places she visits and the time of each visit.

The k-anonymity model (Sweeney 2002a, b) has been widely used in maintaining privacy in databases (Bayardo and Agrawal 2005; LeFevre et al. 2006, 2005; Meyerson and Williams 2004). The main idea is to have each tuple in the table as k-anonymous, i.e., indistinguishable among other k - 1 tuples. However, none of these techniques can be applied to preserve user privacy for LBS, mainly for the reason that these approaches guarantee the k-anonymity for a snapshot of the database. In LBS, the user location is continuously changing. Such dynamic behavior requires continuous maintenance of the k-anonymity model. In LBS, k-anonymity is a user-specified privacy requirement which may have a different value for each user.

Scientific Fundamentals

Spatial cloaking algorithms can be divided into two major types: k-anonymity spatial cloaking (Chow et al. 2006; Gedik and Liu 2005; Gruteser and Grunwald 2003; Gruteser and Liu 2004; Kalnis et al. 2006; Mokbel et al. 2006) and uncertainty spatial cloaking (Cheng et al. 2006). k-anonymity spatial cloaking aims to blur user locations into spatial regions which satisfy the user's specified k-anonymity requirement, while uncertainty spatial cloaking aims to blur user locations into spatial regions which satisfy the user's specified minimum spatial area requirement.

Adaptive Interval Cloaking
This approach assumes that all users have the same k-anonymity requirements (Gedik and Liu 2005). For each user location update, the spatial space is recursively divided in a KD-tree-like format until a minimum k-anonymous subspace is found. Such a technique lacks scalability as it deals with each single movement of each user individually. Figure 1 depicts an example of the adaptive interval cloaking algorithm in which the k-anonymity requirement is three. If the algorithm wants to cloak user A's location, the system space is first divided into four equal subspaces, ⟨(1,1),(2,2)⟩, ⟨(3,1),(4,2)⟩, ⟨(1,3),(2,4)⟩, and ⟨(3,3),(4,4)⟩. Since user A is located in the subspace ⟨(1,1),(2,2)⟩, which contains at least k users, this subspace is further divided into four equal subspaces, ⟨(1,1),(1,1)⟩, ⟨(2,1),(2,1)⟩, ⟨(1,2),(1,2)⟩, and ⟨(2,2),(2,2)⟩. However, the subspace containing user A does not have at least k users, so the minimum suitable subspace is ⟨(1,1),(2,2)⟩. Since there are three users, D, E, and F, located in the cell (4,4), this cell is the cloaked spatial region of their locations.

[Figure: users A to F plotted on a grid with x and y axes running from 1 to 4]
Cloaking Algorithms for Location Privacy, Fig. 1 Adaptive interval cloaking (k = 3)

CliqueCloak
This algorithm assumes a different k-anonymity requirement for each user (Gedik and Liu 2005). CliqueCloak constructs a graph and cloaks user locations when a set of users forms a clique in the graph. All users share the same cloaked spatial region which is a minimum bounding box covering them. Then, the cloaked spatial region is reported to a location-based database server as their locations. Users can also specify the maximum area of the cloaked region which is
considered as a constraint on the clique graph, i.e., the cloaked spatial region cannot be larger than the user's specified maximum acceptable area.

k-Area Cloaking
This scheme keeps suppressing a user location into a region which covers at least k - 1 other sensitive areas, e.g., restaurants, hospitals, and cinemas around the user's current sensitive area (Gruteser and Liu 2004). Thus, the user resident area is indistinguishable among k sensitive areas. This spatial cloaking algorithm is based on a map which is partitioned into zones, and each zone contains at least k sensitive areas. Thus, the continuous movement of users is just abstracted as moving between zones. Users can specify their own privacy requirements by generalizing personalized sensitivity maps.

Hilbert k-Anonymizing Spatial Region (hilbASR)
Here, users are grouped together into variant buckets based on the Hilbert ordering of user locations and their own k-anonymity requirements (Kalnis et al. 2006). Using the dynamic hilbASR, the cloaked spatial regions of users A to F can be determined by using two equations, start(u) and end(u), which are depicted in Fig. 2, where start(u) and end(u) indicate the start and end rankings of a cloaked spatial region, respectively, u is a user identity, and the dotted line represents the Hilbert ordering.

Nearest-Neighbor k-Anonymizing Spatial Region (nnASR)
This is the randomized version of a k-nearest neighbor scheme (Kalnis et al. 2006). For a user location u, the algorithm first determines a set S of k-nearest neighbors of u, including u. From S, the algorithm selects a random user u' and forms a new set S' that includes u' and the k - 1 nearest neighbors of u'. Then, another new set S'' is formed by taking a union between S and S'. Finally, the required cloaked spatial region is the bounding rectangle or circle which covers all the users of S''.

Uncertainty
This approach proposes two uncertainty spatial cloaking schemes, uncertainty region and coverage of sensitive area (Cheng et al. 2006). The uncertainty region scheme simply blurs a user location into an uncertainty region at a particular time t, denoted as U(t). The larger region size means a more strict privacy requirement. The coverage of sensitive area scheme is proposed for preserving the location privacy of users who are located in a sensitive area, e.g., hospital or home. The coverage of sensitive area for a user is defined as $\text{Coverage} = \frac{\text{Area(sensitive area)}}{\text{Area(uncertainty region)}}$. The lower value of the coverage indicates a more strict privacy requirement.

Casper
Casper supports both the k-anonymity and minimum spatial area requirements (Mokbel et al. 2006). System users can dynamically change their own privacy requirements at any instant. It proposes two grid-based pyramid structures to improve system scalability, complete pyramid and incomplete pyramid.

Complete Pyramid
Figure 3a depicts the complete pyramid data structure which hierarchically decomposes the spatial space into H levels where a level of height h has 4^h grid cells. The root of the pyramid is of height zero and has only one grid cell that covers the whole space. Each pyramid cell is represented as (cid, N), where cid is the cell identifier and N is the number of mobile users within the cell boundaries. The pyramid structure is dynamically maintained to keep track of the current number of mobile users within each cell. In addition, the algorithm keeps track of a hash table that has one entry for each registered mobile user with the form (uid, profile, cid), where uid is the mobile user identifier, profile contains the user-specified privacy requirement, and cid is the cell identifier in which the mobile user is located. The cid is always in the lowest level of the pyramid (the shaded level in Fig. 3a).
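
The complete pyramid described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Casper's actual implementation: the space is taken to be the unit square, cell identifiers are (level, col, row) tuples, a privacy profile is a (k, min_area) pair, and the class and method names are hypothetical. The `cloak` method shows a plain bottom-up walk from the user's lowest-level cell toward the root, returning the first cell that satisfies the profile.

```python
class CompletePyramid:
    """Toy complete pyramid over the unit square [0, 1) x [0, 1).

    Level h is a 2**h x 2**h grid, i.e. 4**h cells; level 0 is the root.
    Cell identifiers are (level, col, row) tuples, an illustrative choice.
    """

    def __init__(self, height):
        self.height = height   # number of levels H
        self.counts = {}       # cid -> N, the number of users inside the cell
        self.users = {}        # uid -> (profile, cid at the lowest level)

    def lowest_cell(self, x, y):
        level = self.height - 1
        side = 2 ** level
        return (level, int(x * side), int(y * side))

    @staticmethod
    def parent(cid):
        level, col, row = cid
        return (level - 1, col // 2, row // 2)

    def _adjust(self, cid, delta):
        # Update N for the cell and for every ancestor up to the root.
        while True:
            self.counts[cid] = self.counts.get(cid, 0) + delta
            if cid[0] == 0:
                break
            cid = self.parent(cid)

    def update(self, uid, x, y, profile):
        """Register a user or move it; profile is a (k, min_area) pair."""
        if uid in self.users:
            self._adjust(self.users[uid][1], -1)
        cid = self.lowest_cell(x, y)
        self.users[uid] = (profile, cid)
        self._adjust(cid, +1)

    def cloak(self, uid):
        """Walk up from the user's lowest-level cell until the profile is met."""
        (k, min_area), cid = self.users[uid]
        while True:
            cell_area = 1.0 / 4 ** cid[0]
            if self.counts[cid] >= k and cell_area >= min_area:
                return cid
            if cid[0] == 0:
                return None    # even the whole space does not satisfy the profile
            cid = self.parent(cid)


pyramid = CompletePyramid(height=3)   # levels 0, 1, 2 with 1, 4, 16 cells
pyramid.update("A", 0.10, 0.10, profile=(3, 0.0))
pyramid.update("B", 0.30, 0.20, profile=(1, 0.0))
pyramid.update("C", 0.40, 0.40, profile=(2, 0.5))
cell_a = pyramid.cloak("A")
cell_b = pyramid.cloak("B")
cell_c = pyramid.cloak("C")
```

In this made-up example, user B is satisfied by her own lowest-level cell, user A needs the level 1 cell that contains all three users, and user C is pushed to the root because her minimum area of 0.5 exceeds the area of any level 1 cell.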
[Figure: users A to F on a grid with X and Y axes running from 1 to 4, connected by the Hilbert ordering]

Users     A  B  C  D  E  F
k_u       6  2  2  3  3  3
Rank(u)   0  1  2  3  4  5
Start(u)  0  0  2  3  3  3
End(u)    5  1  3  5  5  5

Start(u) = Rank(u) - (Rank(u) mod k_u)
End(u) = Start(u) + k_u - 1

Cloaking Algorithms for Location Privacy, Fig. 2 hilbASR
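
The two equations in Fig. 2 can be checked with a short Python sketch. The function names `hilbasr_buckets` and `bucket_members` are hypothetical, and the Hilbert ranks are taken directly from the figure rather than computed from coordinates, which the figure does not give.

```python
def hilbasr_buckets(ranks, k):
    """Map each user to (start, end) Hilbert ranks using the Fig. 2 equations.

    ranks: dict user -> Rank(u) along the Hilbert ordering
    k:     dict user -> k_u, the user's own anonymity requirement
    """
    buckets = {}
    for user, rank in ranks.items():
        start = rank - (rank % k[user])
        end = start + k[user] - 1
        buckets[user] = (start, end)
    return buckets


def bucket_members(ranks, bucket):
    """Users whose Hilbert rank falls inside the inclusive (start, end) range."""
    start, end = bucket
    return sorted(u for u, r in ranks.items() if start <= r <= end)


# Values copied from the table in Fig. 2.
ranks = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "F": 5}
k = {"A": 6, "B": 2, "C": 2, "D": 3, "E": 3, "F": 3}
buckets = hilbasr_buckets(ranks, k)
members_of_d = bucket_members(ranks, buckets["D"])
```

The computed (start, end) pairs reproduce the Start(u) and End(u) rows of the table; the cloaked region of a user would then be a region covering the locations of the users in her bucket.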

Incomplete Pyramid
The main idea of the incomplete pyramid structure is that not all grid cells are appropriately maintained. The shaded cells in Fig. 3b indicate the lowest level cells that are maintained.

Cloaking Algorithm
Casper adopts a bottom-up cloaking algorithm which starts at the cell in which the user is located at the lowest maintained level and then traverses up the pyramid structure until a cell satisfying the user-specified privacy requirement is found. The resulting cell is used as the cloaked spatial region of the user location. In addition to the regular maintenance procedures as that of the basic location anonymizer, the adaptive location anonymizer is also responsible for maintaining the shape of the incomplete pyramid. Due to the highly dynamic environment, the shape of the incomplete pyramid may have frequent changes. Two main operations are identified in order to maintain the efficiency of the incomplete pyramid structure, namely, cell splitting and cell merging.

In the cell splitting operation, a cell cid at level i needs to be split into four cells at level i + 1 if there is at least one user u in cid with a privacy profile that can be satisfied by some cell at level i + 1. To maintain such criterion, Casper keeps track of the most relaxed user u_r for each cell. If a newly coming object u_new to the cell cid has a more relaxed privacy requirement than u_r, the algorithm checks if splitting cell cid into four cells at level i + 1 would result in having a new cell that satisfies the privacy requirements of u_new. If this is the case, the algorithm will split cell cid and distribute all its contents to the four new cells. However, if this is not the case, the algorithm just updates the information of u_r. In case one of the users leaves cell cid, the algorithm will just update u_r if necessary.

In the cell merging operation, four cells at level i are merged into one cell at a higher level i - 1 only if all the users in the level i cells have strict privacy requirements that cannot be satisfied within level i. To maintain this criterion, the algorithm keeps track of the most relaxed user u'_r for the four cells of level i together. If such a user leaves these cells, the algorithm has to check upon all existing users and make sure that they still need cells at level i. If this is the case, the algorithm just updates the new information of u'_r. However, if there is no need for any cell at level i, the algorithm merges the four cells together into their parent cell. In the case of a new user entering cells at level i, the algorithm just updates the information of u'_r if necessary.

[Figure: panels (a) and (b), each showing a hash table with columns UID and CID next to a grid hierarchy: the entire system area (level 0), a 2 x 2 grid structure (level 1), a 4 x 4 grid structure (level 2), and an 8 x 8 grid structure (level 3)]
Cloaking Algorithms for Location Privacy, Fig. 3 Grid-based pyramid data structures. (a) Complete pyramid. (b) Incomplete pyramid

Peer-to-Peer Spatial Cloaking
This algorithm also supports both the k-anonymity and minimum spatial area requirements (Chow et al. 2006). The main idea is that before requesting any location-based service, the mobile user will form a group from her peers via single-hop and/or multi-hop communication. Then, the spatial cloaked area is computed as the region that covers the entire group of peers. Figure 4 gives an illustrative example of peer-to-peer spatial cloaking. The mobile user A wants to find her nearest gas station while being five-anonymous, i.e., the user is indistinguishable among five users. Thus, the mobile user A has to look around and find four other peers to collaborate as a group. In this example, the four peers are B, C, D, and E. Then, the mobile user A cloaks her exact location into a spatial region that covers the entire group of mobile users A, B, C, D, and E. The mobile user A randomly selects one of the mobile users within the group as an agent. In the example given in Fig. 4, the mobile user D is selected as an agent. Then, the mobile user A sends her query (i.e., what is the nearest gas station) along with her cloaked spatial region to the agent. The agent forwards the query to the location-based database server through a base station. Since the location-based database server processes the query based on the cloaked spatial region, it can only give a list of candidate answers that includes the actual answers and some false positives. After the agent receives the candidate answers, it forwards the candidate answers to the mobile user A. Finally, the mobile user A gets the actual answer by filtering out all the false positives.

Key Applications

Spatial cloaking techniques are mainly used to preserve location privacy, but they can be used in a variety of applications.

Location-Based Services
Spatial cloaking techniques have been widely adopted to blur user location information before it is submitted to the location-based database server, in order to preserve user location privacy in LBS.

Spatial Database
Spatial cloaking techniques can be used to deal with some specific spatial queries. For example, given an object location, find the minimum area which covers the object and other k - 1 objects.
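
A spatial query of this kind can be approximated with a simple nearest-neighbor heuristic: take the object together with its k - 1 nearest objects and return their minimum bounding rectangle. The sketch below rests on assumptions that are not in the original text: the function names are hypothetical, distances are Euclidean, and the result is not guaranteed to be the smallest possible covering area.

```python
import math


def bounding_box(points):
    """Minimum axis-aligned rectangle (x_min, y_min, x_max, y_max) over points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def k_cover_region(target, others, k):
    """Bounding box of the target plus its k - 1 nearest neighbors (heuristic)."""
    if len(others) < k - 1:
        raise ValueError("not enough objects to cover k of them")
    nearest = sorted(others, key=lambda p: math.dist(target, p))[: k - 1]
    return bounding_box([target] + nearest)


# Made-up sample data for illustration only.
target = (2.0, 2.0)
others = [(2.5, 2.0), (1.0, 2.5), (4.0, 4.0), (2.0, 3.0)]
region = k_cover_region(target, others, k=3)
```

Here the two nearest objects are (2.5, 2.0) and (2.0, 3.0), so the returned rectangle spans x from 2.0 to 2.5 and y from 2.0 to 3.0. A bounding region over a group of users is also what the peer-to-peer scheme above reports to the server in place of the exact location.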
[Figure: mobile users A, B, D, and E, a gas station, and a base station]
Cloaking Algorithms for Location Privacy, Fig. 4 An example of peer-to-peer spatial cloaking

Data Mining
To perform data mining on spatial data, spatial cloaking techniques can be used to perturb individual location information into lower resolution to preserve their privacy.

Sensor-Based Monitoring System
Wireless sensor networks (WSNs) promise to have a significant academic and commercial impact by providing real-time and automatic data collection, monitoring applications, and object positioning. Although sensor-based monitoring or positioning systems clearly offer convenience, the majority of people are not convinced to use such systems because of privacy issues. To overcome this problem, an in-network spatial cloaking algorithm can be used to blur user locations into spatial regions which satisfy user-specified privacy requirements before location information is sent to a sink or base station.

Future Directions

Existing spatial cloaking algorithms have limited applicability as they are:

(a) Applicable only for snapshot locations and queries. As location-based environments are characterized by the continuous movements of mobile users, spatial cloaking techniques should allow continuous privacy preservation for both user locations and queries. Currently, existing spatial cloaking algorithms only support snapshot location and queries.
(b) Not distinguishing between location and query privacy. In many applications, mobile users do not mind that their exact location information is revealed; however, they would like to hide the fact that they issue some location-based queries as these queries may reveal their personal interests. Thus far, none of the existing spatial cloaking algorithms support such a relaxed privacy notion where it is always assumed that users have to hide both their locations and the queries they issue.

Examples of applications that call for such a new relaxed notion of privacy include:

(1) Business operation. A courier business company has to know the location of its employees in order to decide which employee is the nearest one to collect a certain package. However, the company is not allowed to keep track of the employees' behavior in terms of their location-based queries. Thus, company employees reveal their location information, but not their query information.
(2) Monitoring system. Monitoring systems (e.g., transportation monitoring) rely on
the accuracy of user locations to provide their valuable services. In order to convince users to participate in these systems, certain privacy guarantees should be imposed on their behavior through guaranteeing the privacy of their location-based queries even though their locations will be revealed.

Cross-References

▸ Location-Based Services: Practices and Products
▸ Privacy and Security Challenges in GIS
▸ Privacy Preservation of GPS Traces

References

Bayardo RJ Jr, Agrawal R (2005) Data privacy through optimal k-anonymization. In: Proceedings of the international conference on data engineering (ICDE), Tokyo, pp 217–228
Cheng R, Zhang Y, Bertino E, Prabhakar S (2006) Preserving user location privacy in mobile data management infrastructures. In: Proceedings of privacy enhancing technology workshop, Cambridge, pp 393–412
Chow CY, Mokbel MF, Liu X (2006) A peer-to-peer spatial cloaking algorithm for anonymous location-based services. In: Proceedings of the ACM symposium on advances in geographic information systems (ACM GIS), Arlington, pp 171–178
Gedik B, Liu L (2005) A customizable k-anonymity model for protecting location privacy. In: Proceedings of the international conference on distributed computing systems (ICDCS), Columbus, pp 620–629
Gruteser M, Grunwald D (2003) Anonymous usage of location-based services through spatial and temporal cloaking. In: Proceedings of the international conference on mobile systems, applications, and services (MobiSys), San Francisco, pp 31–42
Gruteser M, Liu X (2004) Protecting privacy in continuous location-tracking applications. IEEE Secur Priv 2(2):28–34
Kalnis P, Ghinita G, Mouratidis K, Papadias D (2006) Preserving anonymity in location based services. Technical report TRB6/06, Department of Computer Science, National University of Singapore
LeFevre K, DeWitt DJ, Ramakrishnan R (2005) Incognito: efficient full-domain k-anonymity. In: Proceedings of the ACM international conference on management of data (SIGMOD), Baltimore, pp 29–60
LeFevre K, DeWitt D, Ramakrishnan R (2006) Mondrian multidimensional k-anonymity. In: Proceedings of the international conference on data engineering (ICDE), Atlanta
Meyerson A, Williams R (2004) On the complexity of optimal K-anonymity. In: Proceedings of the ACM symposium on principles of database systems (PODS), Paris, pp 223–228
Mokbel MF, Chow CY, Aref WG (2006) The new casper: query processing for location services without compromising privacy. In: Proceedings of the international conference on very large data bases (VLDB), Seoul, pp 763–774
Sweeney L (2002a) Achieving k-anonymity privacy protection using generalization and suppression. Int J Uncertain Fuzziness Knowl Sys 10(5):571–588
Sweeney L (2002b) k-anonymity: a model for protecting privacy. Intern J Uncertain Fuzziness Knowl-based Syst 10(5):557–570


Close Range

▸ Photogrammetric Applications


Closest Point Query

▸ Nearest Neighbor Query


Closest Topological Distance

▸ Conceptual Neighborhood


Cloud

▸ Medical Image Dataset Processing over Cloud/MapReduce with Heterogeneous Architectures


Cluster Analysis

▸ Geodemographic Segmentation
processing in real time. It is possible to scale


Clustering of Geospatial Big Data a system vertically to some extent, that is, add
in a Distributed Environment extra storage, more memory, or a faster CPU to a
single machine. This approach is however usually
Thomas Triplet and Samuel Foucher
costly, remains sensitive to failures, and does not
Computer Research Institute of Montreal,
scale well with the number of users.
Montreal, QC, Canada
As an alternative, it is also possible to scale
a system horizontally, that is, combine several
commodity servers to solve a problem too com-
Synonyms plex for a single machine. Besides the sheer
amount of data, distributed systems have many
Distributed computing; Machine learning; Spa- benefits compared to a monolithic system, in-
tiotemporal clustering; Unsupervised learning cluding greater reliability, higher availability, and
better performance depending on the use cases.
However, distributed systems also present a num-
Historical Background ber of challenges, network latency, and hardware
failure, to name a few.
Clustering, sometimes called unsupervised learn- To be effective, this horizontal approach also
ing/classi cation or exploratory data analysis, is requires specially designed algorithms that can
one of the most fundamental steps in understand- be ef ciently distributed between the different
ing a dataset, aiming to discover the unknown machines. After reviewing distributed comput-
nature of data through the separation of a nite ing systems, this article presents clustering algo-
dataset, with little or no ground truth, into a rithms suitable for geospatial big data in a dis-
nite and discrete set of natural, hidden data tributed environment. The last section describes
structures. Given a set of n points in a two- several key applications that can bene t from
dimensional space, the purpose of clustering is large-scale geospatial clustering.
to group them into a number of sets based on
similarity measures and distance vectors. Clus-
tering is also useful for compression purpose Scientific Fundamentals
in large databases (Daschiel and Datcu 2005).
The term Unsupervised Learning is sometimes We can distinguish two main types of geospatial
used in some elds (i.e., in Machine Learning data: rasters and vectors. While the same data can
and Data Mining). Clustering will usually aim at usually be represented in both models, there are
creating homogeneous groups that are maximally key differences between the two models, result-
separable. It is a fundamental tool in Knowledge ing in speci c clustering algorithms for the two
Discovery and Data (KDD) mining when looking cases. In this section, we rst describe those two
for meaningful patterns (Alam et al. 2014). Ge- geospatial data types and their speci cities, and
ographical Knowledge Discovery (GKD) is seen then we detail key architectural differences be-
as an extension of KDD to the case of spatial data tween major distributed databases and computing
(Miller 2010). systems. Last, we present geospatial clustering
Geospatial data are data coupled with some algorithms designed to perform ef ciently on
information about the location where the data those distributed computing systems.
were collected or measured. For example, a pho-
tography may be associated with the location Geospatial Data Types
where it was shot or a temperature reading may The raster data model relies a discrete regular
be associated with the location of the sensor. grid of individual and usually square cells, where
Many geospatial applications now rely on each cell represents a spatial position and each
massive amounts of data that may require piece of data is associated with one or more cells.
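
As a minimal sketch of this grid model, the Python snippet below maps a
position to the cell that contains it and back to the cell center. The
class name RasterGrid, its fields, and the convention that row 0 is the
top row are assumptions made for illustration; they do not come from any
particular GIS library.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RasterGrid:
    """Hypothetical regular grid of square cells (illustrative names only)."""
    x_min: float      # x coordinate of the left edge of the grid
    y_max: float      # y coordinate of the top edge of the grid
    cell_size: float  # side length of a cell, in coordinate units
    n_rows: int
    n_cols: int

    def cell_of(self, x: float, y: float) -> tuple[int, int]:
        """Return the (row, col) of the cell containing position (x, y)."""
        col = math.floor((x - self.x_min) / self.cell_size)
        row = math.floor((self.y_max - y) / self.cell_size)  # row 0 is the top row
        if not (0 <= row < self.n_rows and 0 <= col < self.n_cols):
            raise ValueError("position lies outside the grid")
        return row, col

    def center_of(self, row: int, col: int) -> tuple[float, float]:
        """Return the coordinates of the center of cell (row, col)."""
        x = self.x_min + (col + 0.5) * self.cell_size
        y = self.y_max - (row + 0.5) * self.cell_size
        return x, y


# A 10 x 10 grid of 10-unit cells; the raster itself is a matrix of values.
grid = RasterGrid(x_min=0.0, y_max=100.0, cell_size=10.0, n_rows=10, n_cols=10)
values = [[0.0] * grid.n_cols for _ in range(grid.n_rows)]

# Attach a measurement (say, a temperature reading) to the cell at (25, 95).
row, col = grid.cell_of(25.0, 95.0)
values[row][col] = 21.5
```

The cell size fixes the spatial resolution: every position inside a cell
is represented by the same stored value.
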
Raster models are best suited to represent data that vary continuously,
for example, aerial and satellite imagery or elevation surfaces. The
spatial resolution of raster data depends on the resolution of the grid
and is determined at the acquisition phase.

Data in raster format is basically a matrix of data points on a regular
grid. It comes in various forms depending on the source: satellite
imagery, digital elevation models, or gridded data in meteorology. Data
volumes are generally significant, as the spatial coverage can be
extensive and is sometimes combined with a high resolution (Miller 2010).
In addition, the number of raster dimensions can be fairly high, as in
hyperspectral imagery.

The vectorial data model relies on geometric shapes such as points,
lines, or polygons that can be defined by mathematical functions. Points
are defined by their coordinates, typically latitude and longitude in 2D
space. Altitude or depth may also be used to define coordinates in 3D
space. Points can be joined together in a specific order to define a
line. A closed line, where the last point corresponds to the first point
of the line, defines a polygon. The vector model is most useful to
represent data with discrete and well-defined boundaries such as country
borders, parcels, or streets. Various data structures can be used to
store vector data, in particular the spaghetti model, which simply
describes each object independently of the others, and more sophisticated
topological models, where each object includes information about the
elements it is related to. For example, using the spaghetti model, a
polygon is defined by the coordinates of its boundary points; using a
topological approach, a polygon can be described as a series of connected
lines, each of which was previously defined in the model as a series of
points.

The vector data model was standardized jointly by the Open Geospatial
Consortium (OGC) and the International Organization for Standardization
(ISO 19125). This standard, called Simple Features, defines the
specifications of vector data (coordinates, points, lines, and polygons)
(ISO 2004), as well as a number of spatial operators, including an
extension of the SQL for traditional relational databases (ISO 2008).

Recently, with the rapid increase of geospatial archives and the
availability of large temporal data stacks, research efforts have focused
on spatiotemporal clustering in order to extract meaningful
spatiotemporal patterns (Kisilevich et al. 2010b). The direct approach is
to simply add the time dimension in the distance metric between points.
With the addition of a time dimension on a single spatial entity, the
notion of trajectory appears. The time measurements can be regular or
irregular. Geo-referenced time series are common in meteorology such as
sea surface temperature. Moving spatial objects and trajectories such as
sea surface temperature, or to represent moving spatial clouds.
Kisilevich et al. (2010a) distinguish three kinds of spatiotemporal data
according to the way they are collected: movement, cellular networks, and
environmental. Movement datasets are typically associated with
location-based services or sometimes video surveillance applications
(Kuijpers et al. 2008); patterns will be formed by grouping similar
trajectories (Andrienko 2008). Environmental data collected either from a
network of sensors or satellite imagery are used in many applications
(seismology, meteorology, remote sensing, etc.).

Distributed Systems for Geospatial Big Data

Many geospatial applications (see next section) now rely on massive
amounts of data that may require processing in real time. It is possible
to scale a single machine vertically to some extent by adding extra
storage, more memory, or a faster CPU. This approach is however usually
costly, is vulnerable to failures, and does not scale well with the
number of users.

CAP Theorem

To overcome those limitations, it is possible to scale a system
horizontally, that is, combine several machines to form a cluster. The
cluster is leveraged by distributing data and processing algorithms
across the different machines. Distributed systems are typically used to
process amounts of data that are too large for a
single machine. Besides handling the sheer amount of data, distributed
systems have many benefits compared to a monolithic system, including
greater reliability, higher availability, and better performance
depending on the use cases. However, distributed systems also present a
number of challenges, such as network latency and hardware failure, to
name a few.

In general, distributed systems should feature the following
characteristics:

• Consistency (C): all nodes in the cluster see the same data;
• Availability (A): all requests get a success or error notification,
  even if one or several nodes are unavailable (failure or planned
  maintenance);
• Partition tolerance (P): the system remains fully functional, even if
  one or several nodes are unavailable.

However, Brewer's CAP theorem (Brewer 2012; Gilbert and Lynch 2002)
stipulates that a distributed system can present at most two of the three
traits above. It is therefore possible to design CA, CP, or AP systems,
and the choice of one architecture over another largely depends on the
intended usage of the system.

Distributed Geospatial Databases

The fundamental principle that allows spatial big data to be processed in
a reasonable time is parallelism, which is often not trivial and requires
new algorithms that can be distributed across the different machines of
the cluster. For example, while many geospatial systems rely on
PostgreSQL (http://www.postgresql.org/) and PostGIS (http://postgis.net/)
to store geospatial data, the traditional relational database model is
difficult to distribute, and parallelizing complex SQL queries is
challenging. New paradigms such as Not-Only-SQL (NoSQL) (Cattell 2011)
have thus emerged. Most NoSQL systems relax the transactional properties
of traditional databases, which guarantee consistency, to favor high
availability and partition tolerance (type AP). They usually implement a
Shared-Nothing architecture (Stonebraker et al. 1986) where all the nodes
are independent, which facilitates horizontal scalability.

NoSQL systems can be broadly classified into five families:

• Key-Value (KV): the simplest form of NoSQL system, where data are
  represented as a list of pairs <key; value>, similar to a hash table.
  In many systems, this list is stored in memory for better performance.
  This family includes in particular MemcacheDB
  (http://memcachedb.org/), Redis (http://redis.io/), and Amazon
  DynamoDB (DeCandia et al. 2007).
• Column: those systems are used to logically organize <key; value>
  pairs into tables, conceptually similar to tables in the relational
  model. Notable column-based systems include Google's proprietary
  Bigtable (Chang et al. 2006), used in particular to index massive
  amounts of geospatial data from Google Earth
  (https://www.google.com/earth/), and its open-source derivatives from
  the Apache foundation: HBase (http://hbase.apache.org/), Cassandra
  (http://cassandra.apache.org/), and Accumulo
  (https://accumulo.apache.org/).
• Document: this family is the most common among NoSQL systems and is
  used to store semi-structured data, typically in XML or JSON format.
  The main systems of this type are MongoDB (http://www.mongodb.org/),
  Apache CouchDB (http://couchdb.apache.org/), and ElasticSearch
  (http://www.elasticsearch.org/).
• Graph: Graph systems can efficiently represent strongly connected
  data. The most popular graph database is Neo4J (Webber 2012).
• Constraint: Constraint databases (Kanellakis et al. 1995) rely on
  constraint programming to represent geospatial data and reason about
  them. While they can represent raster data, their capabilities are
  best leveraged with vectorial data sets. Unlike other NoSQL families,
  the constraint programming paradigm is inherently difficult to
  parallelize and distribute efficiently. As a result, current systems
  do not adopt a Shared-Nothing architecture and do
  not scale well. NoSQL constraint databases will thus not be considered
  in the remainder of this article.

Most NoSQL databases do not natively support geospatial operations or
raster data types. Notable exceptions are ElasticSearch, MongoDB, and
Amazon DynamoDB, which now feature a limited set of geospatial data types
(points, lines, and polygons), indexes (geohashes, quadtrees, or
R-trees), and queries (bounding-box, radius, or arbitrary shapes).
However, a number of NoSQL databases now support geospatial features
using lightweight extensions similar to PostGIS for PostgreSQL, for
example, GeoCouch (https://github.com/couchbase/geocouch/) for Apache
CouchDB.

Distributed Computing and Clustering

While they have been studied for many years, horizontally distributed
computing systems became widely popular in 2004 when Google described the
architecture of its distributed file system GFS (Ghemawat et al. 2003)
and the MapReduce (Dean and Ghemawat 2004) paradigm they modernized to
process and index billions of web pages. Apache Hadoop
(http://hadoop.apache.org/) is the most popular open-source framework
that implements and reproduces the distributed architecture developed by
Google.

The MapReduce paradigm relies on a form of divide-and-conquer technique
to simplify some of the challenges of distributed computing: a complex
task is first decomposed into simpler tasks that are executed on the
machines of the cluster (map), and the individual results are then
aggregated (reduce) to produce the desired output. This approach is very
effective for batch processing or when subsets of the data can be
processed individually. For example, to process geospatial data,
geohashes or quadtrees can be leveraged to effectively divide a large
dataset into smaller pieces that can be processed by the machines in the
cluster.

A number of geospatial frameworks for Hadoop have been developed
recently, in particular SpatialHadoop (Eldawy and Mokbel 2015), GeoCloud2
(http://www.mapcentia.com/en/geocloud/), and HadoopGIS (Chen et al.
2014). PostGIS users can also leverage Hadoop and Cassandra by using
BigSQL (http://www.bigsql.org/se/). While those frameworks feature
geospatial indexes and operators, they do not provide tools for more
advanced analysis using machine learning algorithms. It is however
possible to integrate a third-party machine learning library for that
purpose and benefit from the MapReduce paradigm to apply distributed
classification and clustering algorithms to geospatial big data. One of
the most popular libraries is Apache Mahout (http://mahout.apache.org/).
It includes a wide variety of algorithms covering supervised learning
(random forest, naive Bayes, hidden Markov models, etc.), collaborative
filtering (user and item-based collaborative filtering, matrix
factorization, etc.), dimensionality reduction (Lanczos, stochastic,
principal component analysis), natural language processing (latent
Dirichlet allocation, TF-IDF vectors), and clustering (k-means algorithm
and its fuzzy and streaming variants, spectral clustering).

More recently, the MapReduce paradigm was generalized to handle streaming
data, which are very frequent in GIS applications (see next section), and
several alternative frameworks supporting this use case are now
available, including Storm (https://storm.apache.org/) and Apache Spark
(Zaharia et al. 2010). The key feature of Spark is the introduction of
Resilient Distributed Datasets (Zaharia et al. 2012), a distributed
memory abstraction that lets programmers perform in-memory computations
on large clusters in a fault-tolerant manner. As a result, Spark can
improve the performance of MapReduce by up to two orders of magnitude,
and its popularity is now surpassing Hadoop (According to Google Trends,
Spark became more popular than Hadoop in September 2014. Source:
http://www.google.ca/trends/explore#q=ApacheSpark,Apache_Hadoop). While
Spark does not offer any geospatial features, several geospatial
frameworks are making use of its distributed computing capabilities to
process geospatial big data in real time. Notable systems include
GeoMesa (Fox et al. 2013) to process vectorial data and GeoTrellis
(http://geotrellis.io/), more adequate for raster data. Both libraries
provide a geospatial extension of the standard Spark RDDs. This key
feature allows programmers to leverage other Spark features and apply
them to geospatial data. For example, an important component of Spark is
MLlib, a library for data analysis similar to Mahout. It implements
several key machine learning algorithms, including the k-means clustering
algorithm and its streaming variant to build data clusters in real time
as new data feed the system.

Distributed Clustering Algorithms

This section details properties of the algorithms that are important for
their parallelization in a distributed environment. It is however beyond
the scope of this article to extensively review all characteristics and
use cases of each clustering algorithm. This article also does not aim to
compare the accuracy of those algorithms: all clustering algorithms rely
on some assumption about the distribution of the data (clusters of
similar shapes, similar densities, etc.), and the best method to cluster
some data depends on the actual distribution of the data.

Clustering techniques can be categorized into two broad categories:
density-based and distance-based algorithms.

Distance-based algorithms rely on the distance between data points in the
feature space to establish the clusters. Distance-based algorithms assume
that the clusters to find are of similar shapes and will perform well if
this hypothesis is verified by the actual data. Those algorithms are also
well suited to cluster raster data types: while we can look at a raster
dataset simply as a collection of points, clustering techniques specific
to raster will try to take into account or enforce a certain notion of
spatial homogeneity between neighboring points. Spatial homogeneity, the
notion that nearby points are more similar than far-apart points, is
often captured via the estimation of the local autocorrelation function
(Hagenauer and Helbich 2013). High spatial correlation values between
spatially close points often limit the number of independent points in a
local neighborhood, which can clash with the independence assumptions of
some clustering methods.

Density-based algorithms on the other hand make use of the density of
data points within a region to discover the clusters. Unlike
distance-based techniques, density-based algorithms can uncover clusters
of various shapes but assume that they are of similar density. The choice
of a clustering algorithm therefore depends on the distribution of the
data. Density-based algorithms are easier to parallelize and more
scalable as they usually rely on local search techniques to identify
dense regions in the feature space. One of the most popular density-based
algorithms is DBSCAN (Ester et al. 1996), and many distributed variants
(He et al. 2014; Noticewala and Vaghela 2014; Kisilevich et al. 2010a;
Patwary et al. 2012) have been implemented using MapReduce and show
significant running-time improvements, even when handling billions of
data points. Another popular example of a density-based algorithm is
DenClue (Hinneburg and Keim 1998) and its recent improvements (Hinneburg
and Gabriel 2007), which is also suitable for segmenting and clustering
raster data. More recently, Cludoop (Yu et al. 2015) was implemented
using MapReduce, and experiments on geospatial data showed significant
improvements in terms of performance and scalability over MR-DBSCAN (He
et al. 2014).

In addition, multiple dense regions can be explored simultaneously by
discretizing the input feature space into a finite number of grid cells
and applying the clustering method within each cell. Existing algorithms
include STING (Wang et al. 1997), WaveCluster (Sheikholeslami et al.
1998; Jestes et al. 2011), and Clique (Agrawal et al. 1998). Parallel
grid-based clustering further divides cells into sub-cells, processes
each sub-cell, and combines the individual results to build the final
clusters (Xiaoyun et al. 2009; Zhang et al. 2010). More recently,
PatchWork (Gouineau et al. 2016) was implemented using Apache Spark to
distribute local density computations and showed significant performance
improvements over MapReduce implementations of DBSCAN. This approach is
particularly useful to mine geospatial data as the cell grid can be
defined using Hilbert or Z-order space-filling curves (Dai and Su 2003;
Hong-bo et al. 2009), which are implemented in the distributed GeoMesa
framework (see the section above).

It is also possible to further categorize clustering techniques depending
on the output of the algorithm (Hruschka et al. 2009): hierarchical or
partitioning.

Partitioning algorithms may define mutually exclusive hard clusters or
soft clusters that allow a certain degree of overlap measured by a
membership function. Fuzzy clustering techniques are typical of this kind
of soft partitioning approach (Ehrlich et al. 1984). The most popular
partitioning algorithm is k-means clustering (MacQueen 1967; Lloyd et al.
1982): given k clusters to find, the technique determines the centers of
the clusters and updates the membership of each cluster iteratively using
the distance to the center of the cluster. The approach can be easily
parallelized and distributed because the distance from each point to the
cluster centers can be computed independently of the other points.
Several distributed implementations are available, in particular for
MapReduce and Spark as part of the Mahout and MLlib libraries,
respectively. For the same reason, distributed implementations of the
streaming variant of k-means are available to cluster spatiotemporal data
in real time. Distributed implementations are also available for related
algorithms such as CLARANS (Ng et al. 2005).

Hierarchical algorithms represent input data as a nested set of
partitions, that is, as a tree also called a dendrogram. Hierarchical
techniques implementing a divisive top-down strategy, where a larger
cluster is split into several subclusters, have been proposed. However,
hierarchical agglomerative clustering (HAC) is the most popular strategy.
It is a bottom-up strategy that iteratively groups together the two most
similar clusters to form a new cluster. The computation of the similarity
measurement between two clusters depends on the linkage method and is one
of the main factors that differentiate HAC methods. In single-linkage
clustering, the link between two clusters is determined by a single pair
of elements (one in each cluster): the two that are closest to each
other. In complete-linkage clustering, the link between two clusters
considers all element pairs, and the distance between clusters equals the
distance between those two elements (one in each cluster) that are
farthest away from each other. Other linkage methods such as UPGMA have
been proposed and are often used in bioinformatics for phylogenetic
studies. However, HAC algorithms rely on a global distance matrix and are
notoriously difficult to parallelize. Furthermore, most of those
algorithms have a computational complexity in $O(N^2 \log N)$ or
$O(N^2)$, with $N$ the number of data points, and do not scale well.
Clustering of geospatial big data using naive HAC algorithms will
therefore quickly become problematic, and more sophisticated methods have
been proposed, including DISC (Jin et al. 2013), MR-VPSOM (Gao et al.
2010), etc. Those HAC methods can be efficiently distributed and were
implemented using the MapReduce framework for batch computations and
NoSQL databases for the storage of large distance matrices.

Key Applications

This section presents various key use cases of clustering algorithms that
facilitate the analysis of spatial and spatiotemporal big data.

Internet of Things

The Internet of Things (IoT) is one of the key applications of spatial
big data and machine learning. It is a recent domain that emerged from
the proliferation of devices that are connected to the Internet or to
other devices. Examples of such connected devices include smartphones,
drones, wearable electronics (e.g., clothes and watches), light bulbs,
home appliances, etc. The IoT has a tremendous potential in numerous
industries such as domotics, transportation, retail, health, or resource
consumption.

Those devices are usually equipped with a variety of sensors that can
collect data in real time or several times per minute. They also often
include a geolocation tracker or can be paired with a device that has a
geolocation tracker. Those connected devices thus generate very large
amounts of data with a strong spatiotemporal component. Those devices
rely on the integration of many spatiotemporal data sources, including
the geolocation of the user (from their connected smartphone) and
meteorological data.

One of the key benefits of the IoT is to enable machine-to-machine
communication, thereby facilitating the automation of various tasks.
Automation relies on the spatiotemporal data collected by the devices, as
well as rules to define triggers and actions. Web services to facilitate
the implementation of those rules have emerged and are gaining popularity
as new devices become connected. Some devices are now relying on machine
learning algorithms to learn how they are being used. Connected home
thermostats, for instance, are now capable of learning from the habits of
the home owners to automatically define rules and triggers to adjust the
temperature.

Smart Cities

Another key application of large-scale geospatial information systems is
the modeling of public infrastructures for the development of smarter
cities. Smart cities such as Barcelona, Stockholm, or Montreal heavily
rely on digital technologies to reduce resource consumption and to engage
more effectively with their citizens.

An example of a smart city project that also relies on the Internet of
Things is the public bicycle sharing system, such as Bixi
(http://www.publicbikesystem.com/), now deployed in many large cities. In
such a system, bikes are equipped with GPS trackers, allowing the
operator to monitor the usage and adjust the service accordingly (Wood
et al. 2011). Studies of the usage of public bicycle sharing systems
using spatial clustering algorithms (Austwick et al. 2013) were also
conducted to reveal structures of social communities in major cities.

As another example, distributed computing systems could also be leveraged
to optimize the engineering and planning of new infrastructures. For
example, the city of Riyadh, Saudi Arabia, modeled the entire
transportation network, including constraints for transit time between
major activity centers. The goal was to analyze the network to highlight
infrastructure deficiencies where usage exceeds capacity and predict
future travel demand and potential congestion areas under different
scenarios and network topologies. Similar studies were conducted in
Jaipur, India (Gahlot et al. 2012), and Vancouver, Canada (Foth 2010),
for the design of their public transit network.

Remote Sensing

Remote sensing is the science of obtaining information about objects or
areas from a distance, typically from aircraft, boats, or satellites,
without making physical contact with the object and thus in contrast to
on-site observation. It refers to the use of aerial sensor technologies
to detect and classify objects on Earth (both on the surface and in the
atmosphere and oceans) by means of propagated signals, including
electromagnetic (RADAR, LiDAR, etc.), acoustic (SONAR, seismograms,
etc.), and geodetic (gravitational field measurement). The remote sensor
can collect the signal passively emitted from a surface of interest
(e.g., a photometer measuring sunlight) or actively transmit a signal and
collect its reflection (e.g., RADARs in airplanes).

Remote sensing has an immense range of applications: agriculture (e.g.,
crop monitoring), geology (terrain analysis, topography, etc.), hydrology
(flood monitoring), environment (sea ice coverage, biomass mapping,
forestry, land usage, etc.), and oceanography (oil spill detection,
tsunami detection, phytoplankton concentration, etc.), to name a few.

For example, many oceanographic characteristics (such as currents) vary
over both time and space. At a fixed location, an important spatial
coordinate is the vertical axis through the water column, or profile,
from the surface to the ocean bottom. An individual profile can be viewed
as a vertical-line plot. A time series of profiles is best viewed by
stacking sequential profiles next to each other to form an image.
Remote sensors thus usually produce massive amounts of raster data, which
may also be combined with data from other on-site sensors. Distributed
clustering algorithms (Lv et al. 2010) and color coding then reveal both
the vertical and temporal structure of the measured quantity, which
depends on the sensor.

Ocean Networks Canada has developed several types of sensors, including
an active zooplankton acoustic profiler (ZAP). This sensor emits an
acoustic pulse through water; when it encounters fish, suspended
particulates, or zooplankton floating in the water, a part of the sound
is reflected back. By gating the reflected signals in time, the vertical
distribution of scatterers is recorded and provides useful information
about marine life.

Medical Area and Disaster Monitoring

As modern means of transportation allow people to travel faster around
the globe, more effective infection monitoring tools are needed to help
in the control of disease outbreaks, as illustrated by the recent H1N1
flu and Ebola pandemics. The ability to quickly analyze the evolution of
the disease, and to discover patterns in the data, is critical to
understand the root cause of the pandemics and take appropriate measures
to control an emerging disease situation and prevent its further spread.

Epidemiological data consist of spatiotemporal data describing the
evolution of the disease in both space and time. Key challenges of
epidemiological data are the ability to analyze new trends and patterns
in pseudo-real time as the disease spreads, as well as the recurrent
nature of those patterns, that is, patterns from previous pandemics are
likely to give important clues for the prediction of the evolution of the
current pandemics. Note that those challenges are not unique to
pandemics, and other disasters that have strong geospatial and temporal
dimensions, such as tornadoes, flooding, or oil spills, share the same
characteristics.

Crisis detection and management can also be facilitated by integrating
traditional geospatial data sources with alternative sources, in
particular from social media. The underlying idea is that social sensing,
which is the set of information obtained from a group of people, can
provide information similar to that obtainable from a sensor network.
Twitter (https://twitter.com/), one of the most popular social networks,
allows people to share short messages in real time, many of which are now
associated with a geolocation. Using geospatial data mining and natural
language processing techniques, it is thus possible to leverage Twitter
as an effective data source for social sensing. This approach has been
successfully applied to the detection of seismic events as they occur
(Avvenuti et al. 2014): the system is able to detect an earthquake within
seconds of the event and to notify people far earlier than official
channels.

Future Directions

While data clustering has been extensively studied and proved to be
tremendously useful in data mining and knowledge discovery, clustering of
geospatial big data presents several challenges to be addressed in the
future.

First, an increasing number of geospatial applications are now generating
very large volumes of data. Those applications include remote sensing,
drones, and the Internet of Things. The number of sensors that collect
geocoded data is increasing exponentially, from 500 million devices in
2003 to an anticipated 50 billion sensors in the next 5 years, resulting
in volumes of data far exceeding the computing power of a single machine.
However, many clustering algorithms which were developed in the past two
decades rely on an iterative approach, which is inherently difficult to
parallelize and distribute efficiently. While a few parallelized variants
of popular algorithms, such as k-means, have been proposed and
implemented using the MapReduce paradigm, the variety of geospatial
clustering algorithms that can be efficiently distributed is limited. As
a result, scaling a system horizontally to accommodate larger amounts of
data or to reduce the running time of the algorithms remains challenging.

In addition to the rapidly increasing number of sensors, a large portion
of those sensors
244 Clustering of Geospatial Big Data in a Distributed Environment

can collect data several times per second or per minute: geospatial datasets thus often present a strong temporal dimension. A growing number of applications require real-time or near real-time processing of those spatiotemporal data, for example, traffic optimization or crime prevention in smart cities. For those applications, a batch-oriented approach to distributed computing such as the popular MapReduce paradigm is not suitable because of latency issues. Alternative distributed computing frameworks such as Apache Storm or Spark can handle continuous streams of data and significantly reduce the latency of the system. However, very few clustering algorithms suitable for geospatial applications have been implemented for those frameworks so far.

Last, most popular distributed computing frameworks, Hadoop and Spark in particular, were developed only recently and are still under active development. The sets of features of those systems are often not stable or mature yet. In addition, with the notable exception of Accumulo, most distributed systems have not yet emphasized development on data access control and privacy concerns, which can be critical for geospatial applications.

Cross-References

▶ Big Data and Spatial Constraint Databases
▶ Distributed Geospatial Computing (DGC)
▶ Irregular Shaped Spatial Clusters: Detection and Inference
▶ k-NN Search in Time-dependent Road Networks
▶ Movement Patterns in Spatio-Temporal Data
▶ Outlier Detection
▶ Outlier Detection, Spatial
▶ Patterns, Complex

Cognition

▶ Hierarchies and Level of Detail

Cognitive Engineering

▶ Geospatial Semantic Web: Personalization

Cognitive Mapping

▶ Wayfinding, Landmarks

Cognitive Psychology

▶ Wayfinding: Affordances and Agent Simulation

Collaborative Geographic Information Systems

▶ Geocollaboration

Collaborative Tracking

▶ Feature Detection and Tracking in Support of GIS

Collocation Pattern

▶ Co-location Pattern

Collocation, Spatiotemporal

▶ Movement Patterns in Spatio-Temporal Data

Co-location

▶ Patterns, Complex


Co-location Mining

▶ Co-location Pattern Discovery

Co-location Pattern

Nikos Mamoulis
Department of Computer Science, University of Hong Kong, Hong Kong, China

Synonyms

Collocation pattern; Spatial association pattern

Definition

A (spatial) co-location pattern P can be modeled by an undirected connected graph where each node corresponds to a nonspatial feature and each edge corresponds to a neighborhood relationship between the corresponding features. For example, consider a pattern with three nodes labeled timetabling, weather, and ticketing and two edges connecting timetabling with weather and timetabling with ticketing. An instance of a pattern P is a set of objects that satisfy the unary (feature) and binary (neighborhood) constraints specified by the pattern's graph. An instance of the example pattern is a set {o1, o2, o3} of three spatial locations where label(o1) = timetabling, label(o2) = weather, label(o3) = ticketing (unary constraints), and dist(o1, o2) ≤ ε, dist(o1, o3) ≤ ε (spatial binary constraints). In general, there may be an arbitrary spatial (or spatiotemporal) constraint specified at each edge of a pattern graph (e.g., topological, distance, direction, and time-difference constraints).

Main Text

Co-location patterns are used to derive co-location rules that associate the existence of nonspatial features in the same spatial neighborhood. An example of such a rule is "if a water reservoir is contaminated, then people who live in nearby houses have a high probability of having a stomach disease." The interestingness of a co-location pattern is quantified by two measures: the prevalence and the confidence. Co-location patterns can be mined from large spatial databases with the use of algorithms that combine (multi-way) spatial join algorithms with spatial association rule mining techniques.

Cross-References

▶ Patterns, Complex
▶ Retrieval Algorithms, Spatial

Co-location Pattern Discovery

Wei Hu
International Business Machines Corp., Rochester, MN, USA

Synonyms

Co-location mining; Co-location rule discovery; Co-location rule finding; Co-location rule mining; Co-occurrence; Spatial association; Spatial association analysis

Definition

Spatial co-location rule discovery or spatial co-location pattern discovery is the process that identifies spatial co-location patterns from large spatial datasets with a large number of Boolean spatial features.

Historical Background

The co-location pattern and rule discovery are part of the spatial data mining process. The
differences between spatial data mining and other classical data mining are mainly related to data input, statistical foundation, output patterns, and computational process. The research accomplishments in this field are primarily focused on the output pattern category, specifically the predictive models, spatial outliers, spatial co-location rules, and clusters (Shekhar et al. 2003).

The spatial pattern recognition research presented here, which is focused on co-location, is most commonly referred to as spatial co-location pattern discovery and co-location rule discovery. To understand these concepts, we first have to examine a few basic concepts in spatial data mining.

The first term to be defined is Boolean spatial features. Boolean spatial features are geographic object types. They are either absent or present at different locations within the domain of a two-dimensional or higher (three)-dimensional metric space such as the surface of the earth (Shekhar et al. 2003). Some examples of Boolean spatial features are categorizations such as plant species, animal species, and types of roads, cancers, crimes, and businesses.

The next concept relates to co-location patterns and rules. Spatial co-location patterns represent the subsets of Boolean spatial features whose instances are often located in close geographic proximity (Shekhar et al. 2003). They resemble frequent patterns in many aspects. Good examples are symbiotic species. The Nile crocodile and Egyptian plover in ecology prediction (Fig. 1) are one good illustration of a point spatial co-location pattern representation. Frontage roads and highways (Fig. 2) in specified metropolitan road maps could be used to demonstrate line-string co-location patterns. Examples of various categories of spatial co-location patterns are given in Table 1. We can see that the domains of co-location patterns are distributed in many interesting fields of science research and daily services, which proves their great usefulness and importance.

Spatial co-location rules are models to associate the presence of known Boolean spatial features referencing the existence of instances of other Boolean spatial features in the neighborhood. Figure 1 also provides good examples of spatial co-location rules. As can be seen, the rule "Nile crocodiles → Egyptian plover" can predict the presence of Egyptian plover birds in the same areas where Nile crocodiles live. A dataset consisting of several different Boolean spatial feature instances is marked on the space. Each type of Boolean spatial feature is distinguished by a distinct representation shape. A careful examination reveals two co-location patterns: ('+', 'x') and ('o', '*') (Shekhar et al. 2003). Spatial co-location rules can be further classified into popular rules and confident rules, according to the frequency of cases showing in the dataset. The major concern here is the difference between dealing with rare events and popular events. Usually, rare events are ignored, and only the popular co-location rules are mined. So if there is a need to identify the confident co-location rules, then special handling and a different approach must be taken to reach them (Huang et al. 2003).

Spatial co-location rule discovery is the process that identifies spatial co-location patterns from large spatial datasets with a large number of Boolean spatial features (Shekhar et al. 2003). The problems of spatial co-location rule discovery are similar to the spatial association rule mining problem, which identifies the inter-relationships or associations among a number of spatial datasets. The difference between the two has to do with the concept of transactions.

An example of association rule discovery can be seen with market basket datasets, in which transactions represent sets of merchandise item categories purchased together by customers (Shekhar et al. 2003). The association rules are derived from all the associations in the data with support values that exceed a user-defined threshold. In this example, we can define in detail the process of mining association rules as identifying frequent item sets in order to plan store layouts or marketing campaigns as a part of related business intelligence analysis.

On the other hand, in a spatial co-location rule discovery problem, we usually see that the transactions are not explicit (Shekhar et al. 2003). There are no dependencies among the
transactions analyzed in market basket data, because the transaction data do not share instances of merchandise item categories but rather instances of Boolean spatial features instead. These Boolean spatial features are distributed into a continuous space domain and thus share varied spatial types of relationships, such as overlap, neighbor, etc., with each other.

Co-location Pattern Discovery, Fig. 1 Illustration of point spatial co-location patterns (plot titled "Co-location Patterns - Sample Data"; both axes run from 0 to 80). Shapes represent different spatial feature types. Spatial features in sets {'+', 'x'} and {'o', '*'} tend to be located together (Shekhar et al. 2003)

Co-location Pattern Discovery, Fig. 2 Illustration of line-string co-location patterns. Highways, e.g., Hwy100, and frontage roads, e.g., Normandale Road, are co-located (Shekhar et al. 2003)

Although spatial co-location patterns and co-location rules differ slightly, according to the
previous definitions, it can be said that spatial co-location pattern discovery is merely another phrasing for spatial co-location rule finding. Basically, the two processes are the same and can be used in place of each other. Both are used to find the frequent co-occurrences among Boolean spatial features from given datasets.

Co-location Pattern Discovery, Table 1 Examples of co-location patterns (Xiong et al. 2004)
Domains | Example features | Example co-location patterns
Ecology | Species | Nile crocodile, Egyptian plover
Earth science | Climate and disturbance events | Wildfire, hot, dry, lightning
Economics | Industry types | Suppliers, producers, consultants
Epidemiology | Disease types and environmental events | West Nile disease, stagnant water sources, dead birds, mosquitoes
Location-based service | Service type requests | Tow, police, ambulance
Weather | Fronts, precipitation | Cold front, warm front, snowfall
Transportation | Delivery service tracks | US Postal Service, UPS, newspaper delivery

Scientific Fundamentals

According to one categorization, there are three methods of finding co-location patterns in spatial datasets, depending on the focus of the search. These three categories are the reference feature-centric model, the window-centric model, and the event-centric model (Shekhar and Huang 2001).

The reference feature-centric model is relevant to application domains that focus on a specific Boolean spatial feature such as cancer. The goal of the scientists is to find the co-location patterns between this Boolean spatial feature and other task-related features such as asbestos or other substances. This model uses the concept of neighborhood relationship to materialize the transactions from datasets. Measurements of support and confidence can be used to show the degree of interestingness (Shekhar and Huang 2001). For example, if there are two features, A and B, and if A is the relevant feature, then B is said to be close to A if B is a neighbor of A. But how can we tell that B is a neighbor of A? Here we can use either the Euclidean distance or the Manhattan distance, depending on the type of application domain we are investigating. With the chosen definition of the distance between the features, we can then declare them to be neighbors. Thus by considering A, all the other Boolean spatial features surrounding A are used as transactions. Once the data is materialized as above, the support and confidence are computed and used to measure the degree of interestingness (Shekhar and Huang 2001). Table 2 shows an instance of data, and Fig. 3 illustrates the layout and process we described with detailed data from Table 2.

Co-location Pattern Discovery, Table 2 Boolean feature A and the defined transactions related to B and C
Instance of A | Transaction
(0,0) | ∅
(2,3) | {B,C}
(3,1) | {C}
(5,5) | ∅

The second method of finding co-location patterns is a window-centric model or data partitioning model. The process defines proper sized windows and then enumerates all possible windows as transactions. Each window is actually a partition of the whole space, and the focus is on the local co-location patterns, which are bounded by the window boundaries. Patterns across multiple windows are of no concern. Each window
is a transaction, and the process tries to find which features appear together most often in these transactions (that is, windows), using support and confidence measurements (Shekhar and Huang 2001).

Figure 4 shows the processing with window partitions on data similar to that shown in Fig. 3. As this is a local model, even though A and C could have formed a pattern here, these features are completely ignored since they are not within a single window.

Co-location Pattern Discovery, Fig. 3 Transactions are defined around instances of feature A, relevant to B and C (Shekhar and Huang 2001)

Co-location Pattern Discovery, Fig. 4 Example of window-centric model (Shekhar and Huang 2001)

The third modeling method is the event-centric model. This model is mostly related to ecology-specific domains where scientists want to investigate specific events such as drought, El Niño, etc. The goal of this model is to find the subsets of spatial features likely to occur in the neighborhood of a given event type. One of the assumptions of this algorithm is that the neighbor relation is symmetric, that is, interchangeable. For example, if A is a neighbor of B, then B is also a neighbor of A.

The event-centric model defines key concepts as follows: A neighborhood of l is a set of locations L = {l1, l2, l3, ..., lk} such that li is a neighbor of l (Shekhar and Huang 2001). I = {I1, ..., Ik} is a row instance of a co-location C = {f1, ..., fk} if Ij is an instance of feature fj (Shekhar and Huang 2001). The participation ratio and participation index are two measures which replace support and confidence here. The participation ratio of a feature fi in a co-location C is the number of instances of fi that take part in row instances of C divided by the total number of instances of fi; the participation index of C is the minimum participation ratio over its features. Figure 5 shows an example of this model.

Table 3 shows a summary of the interest measures for the three different models.

With different models to investigate different problems of various application domains, there are also multiple algorithms used in the discovery process. Approaches to discover co-location rules can be categorized into two classes, spatial statistics and data mining approaches.

Spatial statistics-based approaches use measures of spatial correlation to characterize the relationship between different types of spatial features. Measures of spatial correlation include the cross-K function with Monte Carlo simulation, mean nearest-neighbor distance, and spatial regression models. Computing spatial correlation measures for all possible co-location patterns can be computationally expensive due to the exponential number of candidate subsets extracted from a large collection of spatial Boolean features that we are interested in (Huang et al. 2004).
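As a minimal sketch of the reference feature-centric model described earlier in this section, the Python below materializes one transaction per instance of the reference feature A and then computes support and confidence over those transactions. The instances of A are the ones listed in Table 2; the locations of B and C, the neighbor radius of 1.5, the choice of Euclidean distance, and all function names are assumptions made for illustration, picked so that the resulting transactions match Table 2.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def materialize_transactions(reference_points, other_features, radius):
    """Build one transaction per reference instance: the set of feature
    names having at least one instance within `radius` of that instance."""
    transactions = []
    for ref in reference_points:
        nearby = frozenset(
            name
            for name, points in other_features.items()
            if any(dist(ref, p) <= radius for p in points)
        )
        transactions.append(nearby)
    return transactions


def support(transactions, items):
    """Fraction of transactions that contain every feature in `items`."""
    items = frozenset(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate Pr(consequent | antecedent) over the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / base


# Instances of the reference feature A, as listed in Table 2.
A = [(0, 0), (2, 3), (3, 1), (5, 5)]
# Hypothetical locations of B and C (the text does not give them).
others = {"B": [(2, 4)], "C": [(3, 2)]}

transactions = materialize_transactions(A, others, radius=1.5)
for point, transaction in zip(A, transactions):
    print(point, sorted(transaction))
print("support({C}) =", support(transactions, {"C"}))
print("confidence(C -> B) =", confidence(transactions, {"C"}, {"B"}))
```

Swapping `dist` for a Manhattan distance function would change only the neighbor test, which is the single place where the choice of distance measure enters this sketch.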
Co-location Pattern Discovery, Fig. 5 Event-centric model example (Huang et al. 2004). Legend: T.i represents instance i with feature type T; lines between instances represent neighbor relationships. The panels list candidate co-locations of sizes k = 1, 2, and 3 over features A, B, and C with their table instances, row instances, and participation indexes; a candidate is pruned when its participation index falls below the min-prevalence threshold (set to .5 in the example)
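The participation ratio and participation index of the event-centric model can be sketched in Python as follows. The sketch assumes the usual reading of these measures (the ratio counts the distinct instances of a feature that take part in at least one row instance, and the index is the minimum ratio over the features of the pattern), treats a row instance as one instance per feature with all pairs being neighbors, and enumerates candidates by brute force. The coordinates, the radius, and the function names are hypothetical and do not reproduce the data of Fig. 5.

```python
from itertools import product
from math import dist  # Euclidean distance (Python 3.8+)


def row_instances(points, pattern, radius):
    """Return row instances of `pattern` as tuples of instance indexes,
    one index per feature, such that all chosen instances are pairwise
    within `radius` of each other."""
    candidates = [list(enumerate(points[f])) for f in pattern]
    rows = []
    for combo in product(*candidates):
        coords = [p for _, p in combo]
        if all(
            dist(coords[i], coords[j]) <= radius
            for i in range(len(coords))
            for j in range(i + 1, len(coords))
        ):
            rows.append(tuple(index for index, _ in combo))
    return rows


def participation_ratio(points, pattern, feature, radius):
    """Fraction of instances of `feature` that occur in some row instance."""
    position = pattern.index(feature)
    participating = {row[position] for row in row_instances(points, pattern, radius)}
    return len(participating) / len(points[feature])


def participation_index(points, pattern, radius):
    """Minimum participation ratio over the features of the pattern."""
    return min(participation_ratio(points, pattern, f, radius) for f in pattern)


# Hypothetical instances: two A-B pairs are close, the third A is isolated.
points = {
    "A": [(0, 0), (4, 0), (10, 10)],
    "B": [(0, 1), (4, 1)],
}
pattern = ("A", "B")
print(row_instances(points, pattern, radius=1.5))
print(participation_index(points, pattern, radius=1.5))
```

Because every instance that participates in a larger pattern also participates in each of its sub-patterns, the index cannot grow when a feature is added, so a candidate whose index falls below the prevalence threshold can be discarded together with all of its supersets.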

Co-location Pattern Discovery, Table 3 Interest measures for different models (Shekhar et al. 2003). Prevalence and conditional probability are the interest measures for a rule C1 → C2
Model | Items | Transactions defined by | Prevalence | Conditional probability
Reference feature centric | Predicates on reference and relevant features | Instances of reference feature C1 and C2 involved with | Fraction of instances of reference feature with C1 ∪ C2 | Pr(C2 is true for an instance of reference features given C1 is true for that instance of reference feature)
Data partitioning | Boolean feature types | A partitioning of spatial dataset | Fraction of partitions with C1 ∪ C2 | Pr(C2 in a partition given C1 in that partition)
Event centric | Boolean feature types | Neighborhoods of instances of feature types | Participation index of C1 ∪ C2 | Pr(C2 in a neighborhood of C1)
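The data partitioning (window-centric) row of the table can be illustrated with a short Python sketch. It assumes square, non-overlapping grid windows and counts only non-empty windows when computing the fraction of partitions; the window size, the point coordinates, and the function names are hypothetical choices made for illustration.

```python
from collections import defaultdict


def window_transactions(points_by_feature, window_size):
    """Assign every instance to a square grid window and return a mapping
    from window cell to the set of feature types found in that window."""
    windows = defaultdict(set)
    for feature, points in points_by_feature.items():
        for x, y in points:
            cell = (int(x // window_size), int(y // window_size))
            windows[cell].add(feature)
    return dict(windows)


def window_support(windows, items):
    """Fraction of (non-empty) windows that contain every feature in `items`."""
    if not windows:
        return 0.0
    items = set(items)
    return sum(items <= features for features in windows.values()) / len(windows)


# Hypothetical instances: A and B share a window, while the second A and C
# are close in space but fall on opposite sides of a window boundary.
points = {
    "A": [(0.5, 0.5), (1.9, 1.0)],
    "B": [(0.6, 0.6)],
    "C": [(2.1, 1.0)],
}
windows = window_transactions(points, window_size=2.0)
print(windows)
print("support({A, B}) =", window_support(windows, {"A", "B"}))
print("support({A, C}) =", window_support(windows, {"A", "C"}))
```

The zero support for {A, C} shows the boundary effect of this local model: the two instances are only 0.2 apart, yet they never appear in the same transaction because a window boundary separates them.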

Data mining approaches can be further divided into two categories: the clustering-based map overlay approach and the association rule-based approaches.

The clustering-based map overlay approach regards every spatial attribute as a map layer and considers spatial clusters (regions) of point data in each layer as candidates for mining the associations among them. Association rule-based approaches can again be divided into two categories: the transaction-based approaches and the distance-based approaches.

Transaction-based approaches aim to define transactions over space such that an Apriori-like algorithm can be used just as in the association rule discovery process. Transactions over space can be defined by a reference-centric model as discussed previously, which enables the derivation of association rules using the Apriori algorithm. There are a few major shortcomings of this approach: generalization of this paradigm is nontrivial in the case where no reference feature is specified, and duplicate counts for many candidate associations may result when defining transactions around locations of instances of all features.

Distance-based approaches are relatively novel. A couple of different approaches have
Co-location Pattern Discovery 253

been presented by different research groups. One proposes the participation index as the prevalence measure, which possesses a desirable anti-monotone property (Huang et al. 2003). Thus, a unique subset of co-location patterns can be specified with a threshold on the participation index, independent of algorithmic details such as the order in which instances of a co-location are examined. Another advantage of using the participation index is that it can define the correctness and completeness of co-location mining algorithms.

Key Applications

The problem of mining spatial co-location patterns can be applied to many useful science research or public interest domains.
As shown in Table 1, one of the top application domains is location-based services. With advances such as GPS and mobile communication devices, many location-based services have been introduced to fulfill users' increasing desires for convenience. Many of the services requested by service subscribers from their mobile devices benefit from the support of spatial co-location pattern mining. The location-based service provider needs to know which requests are submitted frequently together and which are located in spatial proximity (Xiong et al. 2004).
Ecology is another good field to apply this technology because ecologists are very interested in finding frequent co-occurrences among spatial features, such as drought, El Niño, substantial increase/drop in vegetation, and extremely high precipitation (Xiong et al. 2004).
A third important domain whose future cannot be imagined without spatial data mining is weather services. The identification of correct and valuable co-location patterns or rules from huge amounts of collected historical data can be expected to lead to better predictions about incoming weather, deeper insights into environmental impacts on weather patterns, and suggestions of possible effective steps to prevent the future deterioration of the environment.
A final example in our list of applications is traffic control or transportation management. With the knowledge of co-location rules discovered from existing datasets, better supervision and management could be carried out to make transportation systems run in the most efficient way, as well as to gain clearer foresight into future road network development and expansion.
There are many more interesting fields related to the spatial co-location application domain, such as disease research, economics, earth science, etc. (Shekhar et al. 2002). With the availability of more spatial data from different areas, we can expect more research and studies to benefit from this technology.

Future Directions

Spatial co-location pattern discovery and co-location rule mining are very important, even essential, tasks of spatial data mining systems (SDMS), which extract previously unknown but interesting spatial patterns and relationships from large spatial datasets. These methods have the potential to serve multiple application domains and have a wide impact on many scientific research fields and services. Current approaches to mine useful co-location patterns are still evolving with new studies carried out in the field. We can expect extended development of such techniques to improve the algorithms and efficiency in future studies.
A first potential direction of research is to find more efficient algorithms for extended spatial data types other than points, such as line segments and polygons (Huang et al. 2004).
A second direction follows from the fact that only Boolean spatial features are mined here: future studies can extend the co-location mining framework to handle categorical and continuous features that also exist in the real world (Huang et al. 2004).
A third potential extension is to generalize the notion of co-location pattern to de-colocation patterns or co-incidence patterns (Xiong et al. 2004).

References

Huang Y, Xiong H, Shekhar S, Pei J (2003) Mining confident colocation rules without a support threshold. In: Proceedings of the 18th ACM symposium on applied computing (ACM SAC), Melbourne
Huang Y, Shekhar S, Xiong H (2004) Discovering co-location patterns from spatial datasets: a general approach. IEEE Trans Knowl Data Eng (TKDE) 16(12) December
Shekhar S, Huang Y (2001) Discovering spatial co-location patterns: a summary of results. In: Proceedings of 7th international symposium on spatial and temporal databases (SSTD), Redondo Beach
Shekhar S, Schrater P, Raju W, Wu W (2002) Spatial contextual classification and prediction models for mining geospatial data. IEEE Trans Multimed
Shekhar S, Zhang P, Huang Y, Vatsavai RR (2003) Trends in spatial data mining. In: Kargupta H, Joshi A, Sivakumar K, Yesha Y (eds) Data mining: next generation challenges and future directions. AAAI/MIT Press, Cambridge, MA
Xiong H, Shekhar S, Huang Y, Kumar V, Ma X, Yoo J (2004) A framework for discovering co-location patterns in data sets with extended spatial objects. In: Proceedings of SIAM international conference on data mining (SDM)


Co-location Patterns

 Co-location Patterns, Interestingness Measures


Co-location Patterns, Algorithms

Nikos Mamoulis
Department of Computer Science, University of Hong Kong, Hong Kong, China

Synonyms

Association; Co-occurrence; Mining collocation patterns; Mining spatial association patterns; Participation index; Participation ratio; Reference-feature centric

Definition

A spatial co-location pattern associates the co-existence of a set of non-spatial features in a spatial neighborhood. For example, a co-location pattern can associate contaminated water reservoirs with a certain disease within 5 km distance from them. For a concrete definition of the problem, consider a number $n$ of spatial datasets $R_1, R_2, \ldots, R_n$, such that each $R_i$ contains objects that have a common non-spatial feature $f_i$. For instance, $R_1$ may store locations of water sources, $R_2$ may store locations of appearing disease symptoms, etc. Given a distance threshold $\varepsilon$, two objects on the map (independent of their feature labels) are neighbors if their distance is at most $\varepsilon$. We can define a co-location pattern $P$ by an undirected connected graph where each node corresponds to a feature and each edge corresponds to a neighborhood relationship between the corresponding features. Figure 1 shows examples of a star pattern, a clique pattern and a generic one. A variable labeled with feature $f_i$ is only allowed to take instances of that feature as values. Variable pairs that should satisfy a spatial relationship (i.e., constraint) in a valid pattern instance are linked by an edge. In the representations of Fig. 1, we assume that there is a single constraint type (e.g., close to); however, in the general case, any spatial relationship could label each edge. Moreover, in the general case, a feature can label more than two variables. Patterns with more than one variable of the same label can be used to describe spatial autocorrelations on a map.
Interestingness measures (Huang et al. 2003; Shekhar and Huang 2001) for co-location patterns express the statistical significance of their instances. They can assist the derivation of useful rules that associate the instances of the features.

Historical Background

The problem of mining association rules based on spatial relationships (e.g., adjacency, proximity, etc.) of events or objects was first discussed in Koperski and Han (1995). The spatial data are

converted to transactional data according to a reference feature model. Later, the research interest shifted toward mining co-location patterns, which are feature-centric sets with instances that are located in the same neighborhood (Huang et al. 2003; Morimoto 2001; Munro et al. 2003; Shekhar and Huang 2001; Zhang et al. 2004). Huang et al. (2003), Morimoto (2001), Munro et al. (2003), and Shekhar and Huang (2001) focused on patterns where the closeness relationships between features form a complete graph (i.e., every pair of features should be close to each other in a pattern), whereas Zhang et al. (2004) extended this model to feature-sets with closeness relationships between arbitrary pairs and proposed an efficient algorithm for mining such patterns (which is herein reviewed). Yang (2005) extended the concept of co-locations for objects with extent and shape, whereas Wang et al. (2005) studied the mining of co-location patterns that involve spatio-temporal topological constraints.

Co-location Patterns, Algorithms, Fig. 1 Three pattern representations. (a) Star. (b) Clique. (c) Generic

Co-location Patterns, Algorithms, Fig. 2 Mining example

Scientific Fundamentals

Consider a number $n$ of spatial datasets $R_1, R_2, \ldots, R_n$, such that each $R_i$ contains all objects that have a particular non-spatial feature $f_i$. Given a feature $f_i$, we can define a transactional database as follows. For each object $o_i$ in $R_i$ a spatial query is issued to derive a set of features $I = \{f_j : f_j \neq f_i \wedge \exists\, o_j \in R_j\ (\mathrm{dist}(o_i, o_j) \le \varepsilon)\}$. The collection of all feature sets $I$ for each object in $R_i$ defines a transactional table $T_i$. $T_i$ is then mined using some itemsets mining method (e.g., Agrawal and Srikant 1994; Zaki and Gouda 2003). The frequent feature sets $I$ in this table, according to a minimum support value, can be used to define rules of the form:

$$(\mathrm{label}(o) = f_i) \Rightarrow (o \text{ close to some } o_j \in R_j,\ \forall f_j \in I).$$

The support of a feature set $I$ defines the confidence of the corresponding rule. For example, consider the three object-sets shown in Fig. 2. The lines indicate object pairs within a distance $\varepsilon$ from each other. The shapes indicate different features. Assume that one must extract rules having feature $a$ on their left-hand side. In other words, find features that occur frequently close to feature $a$. For each instance of $a$, generate an itemset; $a_1$ generates $\{b, c\}$ because there is at least one instance of $b$ (e.g., $b_1$ and $b_2$) and one instance of $c$ (e.g., $c_1$) close to $a_1$. Similarly, $a_2$ generates itemset $\{b\}$ (due to $b_2$). Let 75% be the minimum confidence. One first discovers frequent itemsets (with minimum support 75%) in $T_a = \langle \{b, c\}, \{b\} \rangle$, which gives us a sole itemset $\{b\}$. In turn, one can generate the rule

$$(\mathrm{label}(o) = a) \Rightarrow (o \text{ close to } o_j \text{ with } \mathrm{label}(o_j) = b),$$

with confidence 100%. For simplicity, in the rest of the discussion, $f_i \Rightarrow I$ will be used to denote rules that associate instances of feature $f_i$ with instances of feature sets $I$, $f_i \notin I$, within its proximity. For example, the rule above can be expressed by $a \Rightarrow \{b\}$. The mining process for feature $a$ can be repeated for the other features (e.g., $b$ and $c$) to discover rules having them on their left side (e.g., one can discover rule $b \Rightarrow \{a, c\}$ with conf. 100%). Note that the features on the right hand side of the rules are not required to be close to each other. For example, rule $b \Rightarrow \{a, c\}$ does not imply that for each $b$ the nearby instances of $a$ and $c$ are close to each other. In Fig. 2, observe that although $b_2$ is close to instances $a_1$ and $a_2$ of $a$ and instance $c_2$ of $c$, $c_2$ is neither close to $a_1$ nor to $a_2$.
A co-location clique pattern $P$ of length $k$ is described by a set of features $\{f_1, f_2, \ldots, f_k\}$. A valid instance of $P$ is a set of objects $\{o_1, o_2, \ldots, o_k\} : (\forall\, 1 \le i \le k,\ o_i \in R_i) \wedge (\forall\, 1 \le i < j \le k,\ \mathrm{dist}(o_i, o_j) \le \varepsilon)$. In other words, all pairs of objects in a valid pattern instance should be close to each other; that is, the closeness relationships between the objects should form a clique graph. Consider again Fig. 2 and the pattern $P = \{a, b, c\}$. $\{a_1, b_1, c_1\}$ is an instance of $P$, but $\{a_1, b_2, c_2\}$ is not.
Huang et al. (2003) and Shekhar and Huang (2001) define some useful measures that characterize the interestingness of co-location patterns. The first is the participation ratio $\mathrm{pr}(f_i, P)$ of a feature $f_i$ in pattern $P$, which is defined by the following equation:

$$\mathrm{pr}(f_i, P) = \frac{\#\ \text{instances of } f_i \text{ in any instance of } P}{\#\ \text{instances of } f_i}. \tag{1}$$
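As a rough illustration of the reference-feature rules and of Eq. (1), the sketch below encodes the neighbor pairs that the text attributes to Fig. 2 and recomputes the values quoted in this entry. The edge list, the assumption that $c_3$ has no neighbors, and all function names (`transactions`, `rule_confidence`, `clique_instances`, `participation_ratio`) are illustrative choices made here, not part of the cited algorithms.

```python
from itertools import product

# Neighbor pairs inferred from the textual description of Fig. 2.
# Assumption: c3 has no neighbor; the text implies this but never states it.
NEIGHBORS = {
    frozenset(pair)
    for pair in [
        ("a1", "b1"), ("a1", "b2"), ("a1", "c1"),
        ("a2", "b2"), ("b1", "c1"), ("b2", "c2"),
    ]
}
OBJECTS = {"a": ["a1", "a2"], "b": ["b1", "b2"], "c": ["c1", "c2", "c3"]}


def close(x, y):
    """True if objects x and y are within the distance threshold."""
    return frozenset((x, y)) in NEIGHBORS


def transactions(ref):
    """One feature set per instance of the reference feature `ref`."""
    table = []
    for obj in OBJECTS[ref]:
        items = {
            feat
            for feat, others in OBJECTS.items()
            if feat != ref and any(close(obj, other) for other in others)
        }
        table.append(items)
    return table


def rule_confidence(ref, itemset):
    """Support of `itemset` in the table of `ref`, read as confidence of ref => itemset."""
    table = transactions(ref)
    return sum(1 for row in table if set(itemset) <= row) / len(table)


def clique_instances(pattern):
    """All object sets with one object per feature and every pair close."""
    feats = sorted(pattern)
    found = []
    for combo in product(*(OBJECTS[f] for f in feats)):
        if all(close(x, y) for i, x in enumerate(combo) for y in combo[i + 1:]):
            found.append(dict(zip(feats, combo)))
    return found


def participation_ratio(feature, pattern):
    """Eq. (1): share of `feature` instances that occur in some instance of `pattern`."""
    used = {inst[feature] for inst in clique_instances(pattern)}
    return len(used) / len(OBJECTS[feature])
```

With this edge list, `transactions("a")` yields the itemsets {b, c} and {b}, and `participation_ratio("b", {"a", "b", "c"})` evaluates to 0.5, in line with the values quoted in the text. Enumerating all object combinations is exponential in the pattern length and is only meant for toy inputs.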

Using this measure, one can define co-location rules that associate features with the existences of other features in their neighborhood. In other words, one can define rules of the form $(\mathrm{label}(o) = f_i) \Rightarrow$ ($o$ participates in an instance of $P$ with confidence $\mathrm{pr}(f_i, P)$). These rules are similar to the ones defined in Koperski and Han (1995); the difference here is that there should be neighborhood relationships between all pairs of features on the right hand side of the rule. For example, $\mathrm{pr}(b, \{a, b, c\}) = 0.5$ implies that 50% of the instances of $b$ (i.e., only $b_1$) participate in some instance of pattern $\{a, b, c\}$ (i.e., $\{a_1, b_1, c_1\}$).
The prevalence $\mathrm{prev}(P)$ of a pattern $P$ is defined by the following equation:

$$\mathrm{prev}(P) = \min\{\mathrm{pr}(f_i, P),\ f_i \in P\}. \tag{2}$$

For example, $\mathrm{prev}(\{b, c\}) = 2/3$ since $\mathrm{pr}(b, \{b, c\}) = 1$ and $\mathrm{pr}(c, \{b, c\}) = 2/3$. The prevalence captures the minimum probability that whenever an instance of some $f_i \in P$ appears on the map, it will then participate in an instance of $P$. Thus, it can be used to characterize the strength of the pattern in implying co-locations of features. In addition, prevalence is monotonic; if $P \subseteq P'$, then $\mathrm{prev}(P) \ge \mathrm{prev}(P')$. For example, since $\mathrm{prev}(\{b, c\}) = 2/3$, we know that $\mathrm{prev}(\{a, b, c\}) \le 2/3$. This implies that the a priori property holds for the prevalence of patterns and algorithms like a generalized Apriori (Agrawal and Srikant 1994) can be used to mine them in a level-wise manner (Shekhar and Huang 2001).
Finally, the confidence $\mathrm{conf}(P)$ of a pattern $P$ is defined by the following equation:

$$\mathrm{conf}(P) = \max\{\mathrm{pr}(f_i, P),\ f_i \in P\}. \tag{3}$$

For example, $\mathrm{conf}(\{b, c\}) = 1$ since $\mathrm{pr}(b, \{b, c\}) = 1$ and $\mathrm{pr}(c, \{b, c\}) = 2/3$. The confidence captures the ability of the pattern to derive co-location rules using the participation ratio. If $P$ is confident with respect to a minimum confidence threshold, then it can derive at least one co-location rule (for the attribute $f_i$ with $\mathrm{pr}(f_i, P) = \mathrm{conf}(P)$). In Fig. 2, $\mathrm{conf}(\{b, c\}) = 1$ implies that we can find one feature in $\{b, c\}$ (i.e., $b$), every instance of which participates in an instance of $\{b, c\}$. Given a collection of spatial

objects characterized by different features, a minimum prevalence threshold min_prev, and a minimum confidence threshold min_conf, a data analyst could be interested in discovering prevalent and/or confident patterns and the co-location rules derived by them. The confidence of a co-location rule between two patterns, $P_1 \to P_2$, $P_1 \cap P_2 = \emptyset$, can be defined by the conditional probability that an instance of $P_1$ participates in some instance of $P_1 \cup P_2$ (given that $P_1 \cup P_2$ is prevalent with respect to min_prev) (Shekhar and Huang 2001).
It is now discussed how co-location patterns are mined from a spatial database. Star-like patterns are the first area of focus (as seen in Fig. 1a). As an example, consider the rule: "given a pub, there is a restaurant and a snack bar within 100 meters from it with confidence 60%". Assume that the input is $n$ datasets $R_1, R_2, \ldots, R_n$, such that for each $i$, $R_i$ stores instances of feature $f_i$. The mining algorithm, a high-level description of which is shown in Fig. 4, operates in two phases: the hashing phase and the mining phase. During the hashing phase, each dataset $R_i$ is read and the instances of the corresponding feature are spatially partitioned with the help of a regular grid. Each object is extended by the distance threshold $\varepsilon$ to form a disk and hashed into the partitions intersected by this disk. Figure 3 shows an example. The space is partitioned into $3 \times 3$ cells. Object $a_1$ (which belongs to dataset $R_a$, corresponding to feature $a$) is hashed to exactly one partition (corresponding to the central cell $C_5$). Object $b_1$ is hashed to two partitions ($C_2$ and $C_5$). Finally, object $c_1$ is hashed into four partitions ($C_4$, $C_5$, $C_7$, and $C_8$).

Co-location Patterns, Algorithms, Fig. 3 A regular grid and some objects

The mining phase employs a main memory algorithm to efficiently find the association rules in each cell. This method is in fact a multiway main memory spatial join algorithm based on the plane sweep technique (Brinkhoff et al. 1993; Mamoulis and Papadias 2001; Preparata and Shamos 1985). The synch_sweep procedure extends the plane sweep technique used for pairwise joins to (i) apply for multiple inputs and (ii) for each instance of one input, find if there is at least one instance from other inputs close to it.
synch_sweep takes a feature $f_i$ as input and a set of partitions of all feature instances hashed into the same cell $C$, and finds the maximal patterns in which each feature instance is included directly (without computing their sub-patterns first). The objects in the partition $R_i^C$ (corresponding to feature $f_i$) in cell $C$ are scanned in sorted order of their x-value. For each object $o_i$, we initialize the maximal star pattern $L$ where $o_i$ can participate as $L$'s center. Then for each other feature, we sweep a vertical line along the x-axis to find

if there is any instance (i.e., object) within $\varepsilon$ distance from $o_i$; if there is, we add the corresponding feature to $L$. Finally, $L$ will contain the maximal pattern that includes $f_i$; for each subset of it we increase the support of the corresponding co-location rule. For more details about this process, the reader can refer to Zhang et al. (2004).

Co-location Patterns, Algorithms, Fig. 4 An algorithm for reference feature co-locations

Overall, the mining algorithm requires two database scans: one for hashing and one for reading the partitions, performing the spatial joins and counting the pattern supports, provided that the powerset of all features but $f_i$ can fit in memory. This is a realistic assumption for typical applications (with 10 or fewer feature types). Furthermore, it can be easily extended for arbitrary pattern graphs like those of Fig. 1b and c.

Key Applications

Sciences
Scientific data analysis can benefit from mining spatial co-location patterns (Salmenkivi 2004; Yang 2005). Co-location patterns in census data may indicate features that appear frequently in spatial neighborhoods. For example, residents of high income status may live close to areas of low pollution. As another example from geographical data analysis, a co-location pattern can associate contaminated water reservoirs with a certain disease in their spatial neighborhood. Astronomers may use spatial analysis to identify features that commonly appear in the same constellation (e.g., low brightness, similar colors). Biologists may identify interesting feature combinations appearing frequently in close components of protein or chemical structures.

Decision Support
Co-location pattern analysis can also be used for decision support in marketing applications. For example, consider an E-commerce company that provides different types of services such as weather, timetabling and ticketing queries (Morimoto 2001). The requests for those services may be sent from different locations by (mobile or fixed line) users. The company may be interested in discovering types of services that are requested by geographically neighboring users in order to provide location-sensitive recommendations to them for alternative products. For example, having known that ticketing requests are frequently asked close to timetabling requests, the company may choose to advertise the ticketing service to all customers that ask for a timetabling service.

Future Directions

Co-location patterns can be extended to include the temporal dimension. Consider, for instance, a database of moving objects, such that each object

is characterized by a feature class (e.g., private cars, taxis, buses, police cars, etc.). The movements of the objects (trajectories) are stored in the database as sequences of timestamped spatial locations. The objective of spatio-temporal co-location mining is to derive patterns composed by combinations of features like the ones seen in Fig. 1. In this case, each edge in the graph of a pattern corresponds to features that are close to each other (i.e., within distance $\varepsilon$) for a large percentage (i.e., large enough support) of their locations during their movement. An exemplary pattern is "ambulances are found close to police cars with a high probability". Such extended spatial co-location patterns including the temporal aspect can be discovered by a direct application of the existing algorithms. Each temporal snapshot of the moving objects database can be viewed as a segment of a huge map (that includes all frames) such that no two segments are closer to each other than $\varepsilon$. Then, the spatio-temporal co-location patterns mining problem is converted to the spatial co-locations mining problem we have seen thus far.
A more interesting (and more challenging) type of spatio-temporal collocation requires that the closeness relationship has a duration of at least $\tau$ time units, where $\tau$ is another mining parameter. For example, we may consider, as a co-location instance, a combination of feature instances (i.e., moving objects), which move closely to each other for $\tau$ continuous time units. To count the support of such durable spatio-temporal patterns, we need to slide a window of length $\tau$ along the time dimension and for each position of the window, find combinations of moving objects that qualify the pattern. Formally, given a durable pattern $P$, specified by a feature-relationship graph (like the ones of Fig. 1) which has a node $f_i$ and distance/duration constraints $\varepsilon$ and $\tau$, the participation ratio of feature $f_i$ in $P$ is defined by:

$$\mathrm{pr}(f_i, P) = \frac{\#\ \tau\text{-length windows with an instance of } P}{\#\ \tau\text{-length windows with a moving object of type } f_i}. \tag{4}$$
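One possible reading of Eq. (4) can be sketched as follows. The sketch assumes trajectories sampled at integer timestamps, a pattern given as a list of feature pairs (its edges), and the convention that the objects of an instance must be within $\varepsilon$ of each other at every timestamp of a window. The function name `durable_participation_ratio`, its parameters, and the toy data are hypothetical and serve only to illustrate the window counting.

```python
from itertools import product
from math import dist


def durable_participation_ratio(trajs, labels, feature, edges, eps, tau, n_steps):
    """Windowed participation ratio of `feature`, one reading of Eq. (4).

    trajs:   object id -> {timestamp: (x, y)}
    labels:  object id -> feature name
    edges:   feature pairs that must stay within eps of each other
    n_steps: timestamps run over 0 .. n_steps - 1
    """
    feats = sorted({f for edge in edges for f in edge} | {feature})
    by_feat = {f: [o for o in trajs if labels[o] == f] for f in feats}

    def alive(obj, window):
        # The object has a recorded position at every timestamp of the window.
        return all(t in trajs[obj] for t in window)

    def is_instance(pick, window):
        # Every pattern edge holds at every timestamp of the window.
        return all(
            dist(trajs[pick[f]][t], trajs[pick[g]][t]) <= eps
            for f, g in edges
            for t in window
        )

    with_feature = with_instance = 0
    for start in range(n_steps - tau + 1):
        window = range(start, start + tau)
        if not any(alive(o, window) for o in by_feat[feature]):
            continue  # no object of the given type covers this window
        with_feature += 1
        for combo in product(*(by_feat[f] for f in feats)):
            if all(alive(o, window) for o in combo) and is_instance(
                dict(zip(feats, combo)), window
            ):
                with_instance += 1
                break  # one qualifying instance per window is enough
    return with_instance / with_feature if with_feature else 0.0


# Toy data (assumed): one ambulance and one police car over four timestamps.
trajs = {
    "amb": {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (3, 0)},
    "pol": {0: (0, 1), 1: (1, 1), 2: (2, 5), 3: (3, 1)},
}
labels = {"amb": "ambulance", "pol": "police"}
edges = [("ambulance", "police")]
ratio = durable_participation_ratio(trajs, labels, "ambulance", edges, 1.5, 2, 4)
```

In the toy data the two vehicles are close at timestamps 0, 1 and 3 but far apart at timestamp 2, so only the first of the three windows of length 2 contains an instance of the pattern. Other conventions, such as requiring closeness for only a fraction of the window, would change both counts.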

Thus, the participation ratio of a feature $f_i$ in $P$ is the ratio of window positions that define a sub-trajectory of at least one object of type $f_i$ which also defines an instance of the pattern. Prevalence and confidence in this context are defined by (2) and (3), as for spatial co-location patterns. The efficient detection of such patterns from historical data as well as their on-line identification from streaming spatio-temporal data are interesting problems for future research.

Cross-References

 Co-location Pattern
 Patterns, Complex
 Retrieval Algorithms, Spatial

References

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference on very large data bases, pp 487–499
Brinkhoff T, Kriegel HP, Seeger B (1993) Efficient processing of spatial joins using r-trees. In: Proceedings of the ACM SIGMOD international conference
Huang Y, Xiong H, Shekhar S, Pei J (2003) Mining confident co-location rules without a support threshold. In: Proceedings of the 18th ACM symposium on applied computing (ACM SAC)
Koperski K, Han J (1995) Discovery of spatial association rules in geographic information databases. In: Proceedings of the 4th international symposium on advances in spatial databases (SSD), vol. 951, pp 47–66
Mamoulis N, Papadias D (2001) Multiway spatial joins. ACM Trans Database Syst 26(4):424–475
Morimoto Y (2001) Mining frequent neighboring class sets in spatial databases. In: Proceedings of the ACM SIGKDD international conference knowledge discovery and data mining
Munro R, Chawla S, Sun P (2003) Complex spatial relationships. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM)
Preparata FP, Shamos MI (1985) Computational geometry: an introduction. Springer, New York
Salmenkivi M (2004) Evaluating attraction in spatial point patterns with an application in the field of cultural history. In: Proceedings of the 4th IEEE international conference on data mining

Shekhar S, Huang Y (2001) Discovering spatial co-location patterns: a summary of results. In: Proceedings of the 7th international symposium on advances in spatial and temporal databases (SSTD)
Wang J, Hsu W, Lee ML (2005) A framework for mining topological patterns in spatio-temporal databases. In: Proceedings of the 14th ACM international conference on Information and knowledge management. Full paper in IEEE Trans. KDE 16(12), 2004
Yang H, Parthasarathy S, Mehta S (2005) Mining spatial object associations for scientific data. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence
Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: Proceedings of the ACM SIGKDD Conference
Zhang X, Mamoulis N, Cheung DWL, Shou Y (2004) Fast mining of spatial collocations. In: Proceedings of the ACM SIGKDD Conference


Co-location Patterns, Interestingness Measures

Marko Salmenkivi
HIIT Basic Research Unit, Department of Computer Science, University of Helsinki, Helsinki, Finland

Synonyms

Association Measures; Co-location Patterns; Interestingness Measures; Selection Criteria; Significance Measures

Definition

Interestingness measures for spatial co-location patterns are needed to select from the set of all possible patterns those that are, in some (quantitatively measurable) way, characteristic for the data under investigation, and, thus, possibly provide useful information.
Ultimately, interestingness is a subjective matter, and it depends on the user's interests, the application area, and the final goal of the spatial data analysis. However, there are properties that can be objectively defined, such that they can often be assumed as desirable. Typically, these properties are based on the frequencies of pattern instances in the data.
Spatial association rules, co-location patterns and co-location rules were introduced to address the problem of finding associations in spatial data, and at a more general level, they are applications of the problem of finding frequent patterns on spatial domain. Interestingness of a pattern in data is often related to its frequency, and that is the reason for the name of the problem.
In practice, a pattern is considered as interesting, if the values of the interestingness measures (possibly only one) of the pattern exceed the thresholds given by the user.

Historical Background

Finding patterns in data and evaluating their interestingness has traditionally been an essential task in statistics. Statistical data analysis methods cannot always be applied to large data masses. For more detailed discussion of the problems, see Scientific Fundamentals. Data mining, or knowledge discovery from databases, is a branch of computer science that arose in the late 1980s, when classical statistical methods could no longer meet the requirements of analysis of the enormously increasing amount of digital data. Data mining develops methods for finding trends, regularities, or patterns in very large datasets. One of the first significant contributions of data mining research was the notion of association rule, and algorithms, e.g., Apriori (Agrawal and Srikant 1994), for finding all interesting association rules from transaction databases. Those algorithms were based on first solving the subproblem of the frequent itemset discovery. The interesting association rules could easily be deduced from the frequent itemsets.
When applying association rules in spatial domain, the key problem is that there is no natural notion of transactions, due to the continuous two-dimensional space. Spatial association rules were first introduced in Koperski and Han (1995). They were analogous to association rules with the exception that at least one of the predicates

in a spatial association rule expresses spatial relationship (e.g., adjacent_to, within, close_to). The rules always contain a reference feature. Support and confidence were used as interestingness measures similarly to the transaction-based association rule mining. Another transaction-based approach was proposed in Morimoto (2001): spatial objects were grouped into disjoint partitions. One of the drawbacks of the method is that different partitions may result in different sets of transactions, and, thus, different values for the interestingness measures of the patterns. As a solution to the problem, co-location patterns in the context of the event-centric model were introduced in Shekhar and Huang (2001).

Scientific Fundamentals

Different models can be employed to model the spatial dimension, and the interpretation of co-location patterns as well as the interestingness measures are related to the selected model. The set of proposed models includes at least the window-centric model, reference feature-centric model, event-centric model, and buffer-based model (Xiong et al. 2004).
Co-location patterns and co-location rules can be considered in the general framework of frequent pattern mining as pattern classes. Other examples of pattern classes are itemsets and association rules (in relational databases), episodes (in event sequences), strings, trees and graphs (Mannila et al. 1995; Zaki 2002).
In the window-centric model the space is discretized by a uniform grid, and the set of all the possible windows of size $k \times k$ form the set of transactions. The items of the transaction are the features present in the corresponding window. Thus, support can be used as the interestingness measure. The interpretation of the confidence of the rule $A \to B$ is the conditional probability of observing an instance of $B$ in an arbitrary $k \times k$-window, given that an instance of feature $A$ occurs in the window.
The reference feature-centric model focuses on a specific Boolean spatial feature, and all the discovered patterns express relationships of the reference feature and other features. The spatial association rules introduced in Koperski and Han (1995) are based on selecting a reference feature, and then creating transactions over space. Transactions make it possible to employ the interestingness measures introduced for transaction databases in the context of frequent itemset discovery: support of a feature set (analogously to the support of an itemset in transaction databases), and confidence (or conditional probability) of an association rule.

Co-location Patterns, Interestingness Measures, Fig. 1 Examples of (row) instances of co-location patterns in the event-centric model

In the event-centric model introduced in Shekhar and Huang (2001), the spatial proximity of objects is modeled by using the notion of neighborhood. The neighborhood relation $R(x, y)$, $x, y \in O$, where $O$ is the set of spatial objects, is assumed to be given as input. The objects and the neighborhood relation can be represented as an undirected graph, where nodes correspond to objects, and an edge between nodes indicates that the objects are neighbors (see Fig. 1). A limitation of the event-centric model is that it can be used only when the objects are points. An advantage is that the

pattern discovery is not restricted to patterns with respect to the pattern class of co-location
with a reference feature. Furthermore, no explicit patterns, since adding features to $P$ can clearly
transactions need to be formed. This fact also only decrease $\mathrm{prev}(P)$.
has consequences as to the choice of relevant Let $P$ and $Q$ be co-location patterns, and
interestingness measures. In a transaction-based $P \cap Q = \emptyset$. Then $P \to Q$ is a co-location
model a single object can only take part in one rule. The confidence (or conditional probability)
transaction, whereas in the event-centric model it of $P \to Q$ (in a given dataset) is the fraction
is often the case that a single object participates of instances of $P$ such that they are also in-
in several instances of a particular pattern. stances of $P \cup Q$. A co-location rule is confident
Figure 1 shows an example. There are nine if the confidence of the rule exceeds the user-
spatial point objects. The set of features consists specified threshold value. A sufficiently high
of three features indicated by a triangle (denote prevalence of a co-location pattern indicates that
it by $A$), circle ($B$), and rectangle ($C$). In this the pattern can be used to generate confident
example only one feature is assigned to each co-location rules. Namely, assume that the user-
object, in general there may be several of them. specified confidence threshold for interesting co-
There are three instances of feature $A$, two location rules is min_conf. Then, if $\mathrm{prev}(P) \ge$
instances of $B$, and four instances of $C$. The min_conf, rule $f \to f_1, \ldots, f_n$ is confident for
solid lines connect the objects that are neighbors. all $f \in P$.
Cliques of the graph indicate the instances of In the example of Fig. 1 the prevalence
co-location patterns. Hence, there is only one $\mathrm{prev}(AB) = \min(2/3, 1) = 2/3$. Thus, one
instance of pattern $\{ABC\}$ containing all the can generate rules $A \to B$, the confidence of rule
features. being 2/3, and $B \to A$ (confidence 1).
The participation ratio of a feature $f$ in a Another interestingness measure proposed for
co-location pattern $P$ is the number of instances co-location patterns is maximum participation
of the feature that participate in an instance of ratio (MPR). Prevalence of a pattern is the min-
$P$ divided by the number of all instances of imum of the participation ratios of its features,
$f$. For instance, in the example data on the whereas MPR is defined as the maximum of them.
left panel of Fig. 1 the participation ratio of Correspondingly, a sufficiently high MPR im-
feature $A$ in pattern $\{AB\}$, $\mathrm{pr}(A, \{AB\}) = 2/3$, plies that at least one of the features, denote it
since two out of three instances of feature by $T$, rarely occurs outside $P$. Hence, the co-
$A$ also participate in instances of $\{AB\}$. location rule $\{T\} \to P \setminus \{T\}$ is confident (Huang
Correspondingly $\mathrm{pr}(B, \{AB\}) = 2/2 = 1$, since et al. 2003). The motivation of using the MPR is
there is no instance of $B$ that is not participating that rare features can more easily be included in
in $\{AB\}$. The objects on the right panel of Fig. 1 the set of interesting patterns.
are equal to those of the left panel, except for A drawback of MPR is that it is not
an additional point with feature $B$. Now, there monotonous. However, a weaker property ("weak
are two different points with feature $B$ such that monotonicity") can be proved for MPR. This
they both are neighbors of the same instance of property is utilized in Huang et al. (2003) to
$A$. The instances of pattern $\{A, B\}$ have been develop a level-wise search algorithm for mining
indicated by the dashed lines. Thus, one instance confident co-location rules.
of $A$ participates in two instances of $\{A, B\}$. The The buffer-based model extends the co-
participation ratios are equal to the left-side case: location patterns to polygons and line strings
$\mathrm{pr}(A, \{AB\}) = 2/3$ and $\mathrm{pr}(B, \{AB\}) = 3/3 = 1$. (Xiong et al. 2004). The basic idea is to
Prevalence of a co-location pattern is defined introduce a buffer, which is a zone of a
as $\mathrm{prev}(P) = \min\{\mathrm{pr}(f, P),\ f \in P\}$. A co- specified distance, around each spatial object.
location pattern is prevalent, if its prevalence The boundary of the buffer is the isoline of
exceeds the user-speci ed threshold value. Preva- equal distance to the edge of the objects (see
lence is a monotonous interestingness measure Fig. 2). The (Euclidean) neighborhood N(o) of
Co-location Patterns, Interestingness Measures 263

Co-location Patterns,
Interestingness
Measures, Fig. 2
Examples of
neighborhoods in the
buffer-based model

an object o is the area covered by its buffer. The (Euclidean) neighborhood of a feature f is the union of N(o_i), where o_i ∈ O_f, and O_f is the set of instances of f. Further, the (Euclidean) neighborhood N(C) for a feature set C = {f1, f2, …, fn} is defined as the intersection of N(f_i), f_i ∈ C.

The coverage ratio Pr(C), where C = {f1, f2, …, fn} is a feature set, is defined as N(C)/Z, where Z is the total size of the investigation area. Intuitively, the coverage ratio of a set of features measures the fraction of the investigation area that is influenced by the instances of the features.

The coverage ratio is a monotonous interestingness measure in the pattern class of co-location patterns in the buffer-based model, with respect to the size of the co-location pattern (Xiong et al. 2004). Now in the buffer-based model the conditional probability (confidence) of a co-location rule P → Q expresses the probability of finding the neighborhood of Q in the neighborhood of P. Due to the monotonicity of coverage ratio, it can be computed as N(P ∪ Q)/N(P). Xiong et al. also demonstrate that the definition of conditional probability (confidence) of a co-location rule in the event-centric model does not satisfy the law of compound probability: it is possible that Prob(BC|A) ≠ Prob(C|AB) Prob(B|A), where Prob(BC|A) is equal to the confidence of the rule A → BC. They show, however, that in the buffer-based model this law holds.

Statistical Approaches

An essential difference in the viewpoints of spatial statistics and co-location pattern mining is that in statistics the dataset is considered as a sample. The aim in statistics is typically to infer, based on the sample, knowledge of properties of the "reality", that is, the phenomenon, that generated the data. The goal of co-location pattern mining is to find descriptions of the data, that is, only the content of the available database is the object of investigation. In a sense, statistical analysis is more ambitious. However, sophisticated statistical data analysis methods cannot always be applied to large data masses. This may be due to the lack of computational resources, expert knowledge, or other human resources needed to preprocess the data before statistical analysis is possible.

Furthermore, depending on the application, treating the content of a spatial database as a sample may be relevant, or not. Consider, for instance, roads represented in a spatial database. Clearly, it is usually the case that (practically) all of them are included in the database, not only a sample. On the other hand, in an ecological database that includes the known locations of nests of different bird species, it is obvious that not all the nests have been observed, and thus a part of the information is missing from the database. Another example is a linguistic database that contains dialect variants of words in different regions. Such variants cannot in practice be exhaustively recorded everywhere, and, thus, the data in the database is a sample.

Statistical analysis of spatial point patterns is closely related to the problem of finding interesting co-location patterns (see, e.g., Bailey and Gatrell 1995; Diggle 1983). In statistics, features are called event types, and their instances are events. The set of events in the investigation area form a spatial point pattern. Point patterns
of several event types (called marked point patterns) may be studied, for instance, to evaluate spatial correlation (either positive, i.e., clustering of events, or negative, i.e., repulsion of events). Analogously, the point pattern of a single event type can be studied for evaluating possible spatial autocorrelation, that is, clustering or repulsion of the events of the event type.

In order to evaluate spatial (auto)correlation, point patterns, that is the data, are modeled as realizations (samples) generated by spatial point processes. A spatial point process defines a joint probability distribution over all point patterns. The most common measures of spatial correlation in point patterns are the G(h)- and K(h)-functions. For a single event type the value of the G(h)-function in data is the number of events such that the closest other event is within a distance less than h divided by the number of all events. For two event types, instead of the closest event of the same type, the closest event of the other event type is considered. Thus, the confidence of the co-location rule A → B, where A and B are single features in the event-centric model, is equal to the value of the G_{A,B}(h)-function in the data, when the neighborhood relation is defined as the maximum distance of h between objects.

The statistical framework implies that the relationship of the phenomenon and the data, which is a sample, has to be modeled in some way. In spatial statistics, the interestingness measures can be viewed from several perspectives, depending on the statistical framework, and the methods used in the data analysis. One of the most common frameworks is hypothesis testing.

Hypothesis testing sets up a null hypothesis, typically assuming no correlation between features, and an alternative hypothesis that assumes spatial correlation. A test statistic, e.g., the G(h)- or K(h)-function, for measuring spatial correlation is selected, denote it by T. The value of the test statistic in data, denote it by t, is compared against the theoretical distribution of the test statistic, assuming that the null hypothesis holds. Then, a natural interestingness measure of the observed spatial correlation is based on the so-called p-value, which is defined as Pr(T > t | H0). The smaller the p-value, the smaller the probability that the observed degree of spatial correlation could have occurred by chance. Thus, the correlation can be interpreted as interesting if the p-value is small. If the p-value is not greater than a predefined α, the deviation is defined to be statistically significant with the significance level α.

The correlation patterns introduced in Salmenkivi (2006) represent an intermediate approach between spatial point pattern analysis and co-location pattern mining. Correlation patterns are defined as interesting co-location patterns (in the event-centric model) of the form A → B, where A and B are single features. The interestingness is determined by the statistical significance of the deviation of the observed G(h)-value from a null hypothesis assuming no spatial correlation between features A and B.

Key Applications

Large spatial databases and spatial datasets. Examples: digital road map (Shekhar and Ma), census data (Malerba et al. 2001), place name data (Leino et al. 2003; Salmenkivi 2006).

Future Directions

A collection of interesting patterns can be regarded as a summary of the data. However, the pattern collections may be very large. Thus, condensation of the pattern collections and pattern ordering are important challenges for research on spatial co-location patterns.

Co-location patterns and rules are local in the sense that, given a pattern, only the instances of the features that appear in the pattern are taken into account when evaluating the interestingness of the pattern. However, the overall distribution and density of spatial objects and features may, in practice, provide significant information as to the interestingness of a pattern. This challenge is to some extent related to the challenge of integrating statistical and data mining approaches.
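The event-centric measures defined in this entry (participation ratio, prevalence, MPR, and rule confidence) can be illustrated with a short Python sketch. The object identifiers and the neighbor edges below are hypothetical: they were chosen only so that the counts match the figures quoted for the left panel of Fig. 1 (three A, two B, four C objects; pr(A, {A, B}) = 2/3), since the actual coordinates of the figure are not given in the text. The function names are likewise illustrative and do not come from any particular library.

```python
from itertools import combinations, product

# Hypothetical toy data: object id -> feature (3 x A, 2 x B, 4 x C).
features = {
    "a1": "A", "a2": "A", "a3": "A",
    "b1": "B", "b2": "B",
    "c1": "C", "c2": "C", "c3": "C", "c4": "C",
}
# Assumed undirected neighbor relation (not taken from the figure).
neighbors = {
    frozenset(edge) for edge in [
        ("a1", "b1"), ("a2", "b2"),   # two of the three A objects have a B neighbor
        ("a1", "c1"), ("b1", "c1"),   # a1-b1-c1 forms the only {A, B, C} clique
        ("a3", "c2"), ("b2", "c3"),
    ]
}

def instances(pattern):
    """Cliques that contain exactly one object of each feature in the pattern."""
    pools = [[o for o, f in features.items() if f == feat] for feat in sorted(pattern)]
    return [
        combo for combo in product(*pools)
        if all(frozenset(pair) in neighbors for pair in combinations(combo, 2))
    ]

def participation_ratio(feat, pattern):
    """Share of the objects with feature `feat` that occur in some instance."""
    idx = sorted(pattern).index(feat)
    participating = {inst[idx] for inst in instances(pattern)}
    total = sum(1 for f in features.values() if f == feat)
    return len(participating) / total

def prevalence(pattern):
    """Minimum participation ratio over the features of the pattern."""
    return min(participation_ratio(f, pattern) for f in pattern)

def max_participation_ratio(pattern):
    """Maximum participation ratio (MPR) over the features of the pattern."""
    return max(participation_ratio(f, pattern) for f in pattern)

def confidence(lhs, rhs):
    """Fraction of instances of lhs that extend to an instance of lhs ∪ rhs."""
    full = sorted(set(lhs) | set(rhs))
    keep = [full.index(f) for f in sorted(lhs)]
    extended = {tuple(inst[i] for i in keep) for inst in instances(full)}
    return len(extended) / len(instances(lhs))

print(prevalence({"A", "B"}), max_participation_ratio({"A", "B"}))
print(confidence({"A"}, {"B"}), confidence({"B"}, {"A"}))
```

On this toy data the prevalence of {A, B} is 2/3 while its MPR is 1, and the two rule confidences agree with the values quoted in the text for A → B and B → A. The brute-force enumeration in `instances` is only meant to mirror the definitions; it is not how a level-wise mining algorithm would generate candidates.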
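The link between rule confidence and the two-type G-function, and the use of a p-value as an interestingness measure, can also be sketched in code. The sketch below makes two assumptions that are not part of the entry: it treats "within distance h" as "less than or equal to h" so that it matches a neighborhood relation with maximum distance h, and it replaces the theoretical null distribution with a Monte Carlo test under random labelling (event types are shuffled over the fixed locations). The coordinates and the function names `g_cross` and `random_labelling_p_value` are hypothetical.

```python
import math
import random

def g_cross(points_a, points_b, h):
    """Fraction of A events whose nearest B event lies within distance h."""
    hits = sum(
        1 for a in points_a
        if min(math.dist(a, b) for b in points_b) <= h
    )
    return hits / len(points_a)

def random_labelling_p_value(points_a, points_b, h, n_sim=999, seed=0):
    """Monte Carlo estimate of Pr(T >= t | H0) under random labelling.

    Assumption: H0 is modelled by shuffling the event types over the
    observed locations; the entry itself only refers to a theoretical
    distribution of the test statistic.
    """
    rng = random.Random(seed)
    observed = g_cross(points_a, points_b, h)
    pooled = list(points_a) + list(points_b)
    n_a = len(points_a)
    at_least = 0
    for _ in range(n_sim):
        rng.shuffle(pooled)
        if g_cross(pooled[:n_a], pooled[n_a:], h) >= observed:
            at_least += 1
    # Add one to numerator and denominator so the observed pattern counts
    # as one of the simulated outcomes.
    return (at_least + 1) / (n_sim + 1)

# Hypothetical coordinates: two of the three A events have a B event nearby.
events_a = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
events_b = [(0.0, 1.0), (5.0, 6.0)]

print(g_cross(events_a, events_b, h=1.5))
print(random_labelling_p_value(events_a, events_b, h=1.5, n_sim=199))
```

With these points the value of `g_cross` equals the confidence of the rule A → B under the same neighborhood distance. With so few events the simulated p-value is of little practical use; the point is only to show where the test statistic and the null model enter.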

Cross-References

▶ Co-location Pattern
▶ Co-location Pattern Discovery
▶ Data Analysis, Spatial
▶ Frequent Itemset Discovery
▶ Frequent Pattern
▶ Statistical Descriptions of Spatial Patterns

References

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases, Santiago, 12–15 Sept, pp 487–499
Bailey TC, Gatrell AC (1995) Interactive spatial data analysis. Longman, Harlow
Diggle PJ (1983) Statistical analysis of spatial point patterns. Mathematics in biology. Academic, London
Huang Y, Xiong H, Shekhar S, Pei J (2003) Mining confident co-location rules without a support threshold. In: Proceedings of the 2003 ACM symposium on applied computing (ACM SAC March, 2003), Melbourne, pp 497–501
Koperski K, Han J (1995) Discovery of spatial association rules in geographic information databases. In: Proceedings of 4th international symposium on large spatial databases (SSD95), Portland, pp 47–66
Leino A, Mannila H, Pitkänen R (2003) Rule discovery and probabilistic modeling for onomastic data. In: Lavrac N, Gamberger D, Todorovski L, Blockeel H (eds) Knowledge discovery in databases: PKDD 2003. Lecture notes in artificial intelligence, vol 2838. Springer, Heidelberg, pp 291–302
Malerba D, Esposito F, Lisi FA (2001) Mining spatial association rules in census data. In: Proceedings of 4th international seminar on new techniques and technologies for statistics (NTTS 2001), Crete
Mannila H, Toivonen H, Verkamo AI (1995) Discovering frequent episodes in sequences. In: First international conference on knowledge discovery and data mining (KDD 95, August), pp 210–215, Montreal. AAAI Press
Morimoto Y (2001) Mining frequent neighboring class sets in spatial databases. In: International proceedings of the 7th ACM SIGKDD conference on knowledge and discovery and data mining, San Francisco, pp 353–358
Salmenkivi M (2006) Efficient mining of correlation patterns in spatial point data. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds) Knowledge discovery in databases: PKDD-06, Berlin, Proceedings. Lecture notes in computer science, vol 4213. Springer, Berlin, pp 359–370
Shekhar S, Huang Y (2001) Discovering spatial co-location patterns: a summary of results. In: Proceedings of 7th international symposium on advances in spatial and temporal databases (SSTD 2001), Redondo Beach
Shekhar S, Ma X. GIS subsystem for a new approach to accessing road user charges
Xiong H, Shekhar S, Huang Y, Kumar V, Ma X, Yoo JS (2004) A framework for discovering co-location patterns in data sets with extended spatial objects. In: Proceedings of the fourth SIAM international conference on data mining (SDM04), Lake Buena Vista
Zaki MJ (2002) Efficiently mining frequent trees in a forest. In: Proceedings of 8th ACM SIGKDD international conference on knowledge discovery and data mining, Edmonton

Recommended Reading

Mannila H, Toivonen H (1997) Levelwise search and borders of theories in knowledge discovery. Data Min Knowl Disc 1(3):241–258

Co-location Rule Discovery

▶ Co-location Pattern Discovery

Co-location Rule Finding

▶ Co-location Pattern Discovery

Co-location Rule Mining

▶ Co-location Pattern Discovery

COM/OLE

▶ Smallworld Software Suite

Combinatorial Map

▶ Geosensor Networks, Qualitative Monitoring of Dynamic Fields
Complex Event Processing

▶ Data Stream Systems, Empowering with Spatiotemporal Capabilities

Components; Reuse

▶ Smallworld Software Suite

Composite Geographic Information Systems Web Application

▶ GIS Mashups

Computational Grid

▶ Grid

Computational Infrastructure

▶ Grid

Computer Cartography

▶ Conflation of Geospatial Data

Computer Environments for GIS and CAD

Joe Astroth
Autodesk Location Services, San Rafael, CA, USA

Synonyms

CAD and GIS platforms; Convergence of GIS and CAD; Evolution of GIS and LBS; Geo mashups; Technological inflection points in GIS and CAD development

Definition

In the last few decades, computing environments have evolved to accommodate the need for integrating the separate, and often incompatible, processes of Geographic Information Systems (GIS) and Computer Assisted Design (CAD). This chapter will explore the evolution of GIS and CAD computing environments, from desktop to Web and finally to wireless, along with the industry requirements that prompted these changes.

Historical Background

Before the 1980s, Computer Assisted Design (CAD) and Geographic Information Systems (GIS) functions were performed primarily on minicomputers running 32-bit operating systems such as VAX, VMS, or UNIX. Since minicomputers were expensive (approximately $200,000), many CAD and GIS solutions were bundled with hardware and offered as turnkey solutions. Although the combination of software and hardware was a popular option, it was still prohibitively expensive for small organizations, making CAD and GIS affordable only for government, academic institutions, and major corporations. With the advent of the personal computer, particularly the IBM PC in 1981, which sold for approximately $1600, GIS and CAD became affordable for small- and medium-sized organizations.

Soon after the introduction of affordable personal computers, Autodesk developed the first PC-based CAD software, AutoCAD®, which sold for approximately $1000. Desktop GIS products appeared on the market shortly thereafter. Rather than being retrofitted from minicomputer programs, the most successful of these applications were engineered specifically for the PC. With the availability of these powerful desktop programs, small- to medium-sized organizations that had previously relied on analog mapping and drafting had access to the wealth of information and time-saving tools formerly available only to large organizations.
Scientific Fundamentals

CAD and GIS During the Workstation Phase

Although both GIS and CAD were introduced on the PC at around the same time, they were considered completely separate, and often incompatible, applications, making data sharing difficult. For example, precision and accuracy in GIS, unlike those in CAD, are variable, depending on scale. In CAD, precision was represented as 64-bit units (double precision), and in GIS as 32-bit units (single precision). Positional accuracy, which indicates the proximity of a feature on a map to its real location on earth, is quite high for CAD relative to GIS. For example, a 1:10,000 scale map might have a positional accuracy of 2.5 m (8.2 ft).

Another barrier to data sharing between CAD and GIS on the PC was the process of data collection. Many GIS survey instruments, such as Total Stations and GPS, collect data in ground units, rather than grid units. Ground units, which represent features on the earth's surface exactly, are both longer and bigger than grid units. In addition, elevations and scale are not factored into ground units.

Since a great deal of map data was collected in the field, maps drawn in CAD were stored in ground units, and scale and coordinate systems were added afterwards. CAD engineers found that using ground units, rather than grid units, was advantageous. For example, assuming that dimensions on the map were accurate, if the line on the map measured 100 m (328 ft), it corresponded to 100 m on the ground. With GIS grid units, 100 m on the map might actually correspond to 100.1 m (328 ft 3 in) on the ground.

Another significant difference between CAD and GIS applications is that in CAD, unlike in GIS, points and polylines represent objects in the real world, but contain no attached information. In GIS, points, polylines, and polygons can represent wells, roads, and parcels, and include attached tables of information. In many cases, when spatial information was transferred from CAD to GIS applications, features were unintelligent and had to be assigned meaningful information, such as topology and attributes, manually. Because of this manual step, translation from CAD to GIS was extremely difficult, even with automated import tools.

GIS and CAD applications also provided different types of tools, making it difficult for users to switch systems. In GIS applications, tools were designed for data cleanup, spatial analysis, and map production, whereas tools in CAD were intended for data entry, drafting, and design. Since CAD drafting tools were much easier to use, CAD technicians were wary of using GIS software for creation of design drawings.

CAD drawings themselves also made it difficult to transfer data to GIS. A typical CAD drawing contains objects made up of arcs and arbitrarily placed label text. However, in GIS, text can be generated based on attributes or database values, often producing a result that is not aesthetically pleasing to a cartographer.

The representation of land parcels, common in GIS applications for municipalities, presented another challenge for integrating CAD drawings and GIS. Polylines portraying lot and block lines in a survey plan need to be translated into meaningful polygons in GIS that represent the parcel. Cleanup tools are used to ensure the accuracy of the lot and block lines. Each parcel must also be associated with its appropriate attributes, and a polygon topology must be created so that Parcel Identification Numbers (PINs or PIDs) inside each polygon are linked to the parcel database.

These barriers to integrating GIS and CAD led to the development of software solutions in each phase of the technological advancement in computing environments.

Bridging the Gap Between CAD and GIS in the Workstation Phase

In the initial workstation phase, the only way to integrate GIS data with AutoCAD data was to use DXF (drawing exchange format). This process was extremely time-consuming and error-prone. Many CAD drawings were drawn for a particular project or plan and never used again. Often these drawings were not in the same coordinate system as the GIS and had to be transformed on import. Even today, a GIS enterprise is built
and maintained by importing data from CAD drawings. Graphic representations of layers of information, such as water, sewer, roads and parcels, are imported into the GIS using the file-based method.

To better merge the CAD world with the GIS world, a partnership was formed between Autodesk, Inc. and ESRI, leading to the creation of ArcCAD®. ArcCAD was built on AutoCAD and enabled users to create GIS layers and to convert GIS layers into CAD objects. This tool also facilitated data cleanup and the attachment of attributes. Because ArcCAD enabled GIS data to be shared with a greater number of people, the data itself became more valuable.

Although ArcCAD solved some of the integration problems between CAD and GIS, it still did not provide full GIS or CAD functionality. For example, overlay analysis still had to be performed in ArcInfo®, and arcs and splines were not available in the themes created by ArcCAD.

In order to provide a fully functional GIS built on the AutoCAD platform, Autodesk developed AutoCAD Map® (now called Autodesk Map®), which made it simple for a CAD designer to integrate with external databases, build topology, perform spatial analysis, and utilize data cleaning, without file translation or lost data. In AutoCAD Map, lines and polygons were topologically intelligent with regard to abstract properties such as contiguity and adjacency. Since DWG files were already file-based packets of information, they became GIS-friendly when assigned topology and connected to databases. Precision was enforced instantly, since the DWG files could now store coordinate systems and perform projections and transformations. AutoCAD Map represented the first time a holistic CAD and GIS product was available for the PC Workstation environment.

Although AutoCAD Map could import and export the standard GIS file types (circa 1995: ESRI SHP, ESRI Coverage, ESRI E00, MicroStation DGN, MapInfo MID/MIF, Atlas BNA), users began to request real-time editing of layers from third-party GIS files. To meet this demand, Autodesk created a new desktop GIS/CAD product called Autodesk World®. World was designed for users who were not GIS professionals or AutoCAD engineers, and offered the basic tools of both systems: precision drafting and the capability to query large geospatial data and perform rudimentary analysis and reports.

World used a Microsoft Office interface to access and integrate different data types, including geographic, database, raster, spreadsheet, and images, and supported Autodesk DWG as a native file format, increasing the value of maps created in AutoCAD and AutoCAD Map. World enabled users to open disparate GIS data files simultaneously and perform analysis regardless of file type. Autodesk World could access, analyze, edit and save data in all the standard formats without import or export.

Although Autodesk World represented a real breakthrough in integrating GIS and CAD files, it lacked an extensive CAD design environment. AutoCAD was still the CAD environment of choice, and AutoCAD Map continued to offer better integration of GIS within a full CAD environment. Autodesk World filled a need, much like other desktop GIS solutions at the time, but there was still a gap between the CAD design process and analysis and mapping within the GIS environment.

In the same time period, AutoCAD Map continued to evolve its GIS capabilities for directly connecting, analyzing, displaying, and theming existing GIS data (in SDE, SHP, DGN, DEM, and Raster formats, for example) without import or export. In support of the Open GIS data standard, AutoCAD Map could read OpenGIS information natively. GIS and CAD integration continues to be one of the key features of AutoCAD Map.

CAD and GIS During the Web Phase

The next significant inflection point in technology was the World Wide Web, which increased the number of users of spatial data by an order of magnitude. With the advent of this new technology and communication environment, more people had access to information than ever before.

Initially, CAD and GIS software vendors responded to the development of the Web by Web-enabling existing PC applications. These Web-enabled applications offered the ability to assign
Uniform Resource Locators (URLs) to graphic objects or geographic features, such as points, lines and polygons, and enabled users to publish their content for viewing in a browser as an HTML (Hypertext Markup Language) page and a series of images representing maps or design.

Software developers also Web-enabled CAD and GIS software by providing a thin client or browser plug-in, which offered rich functionality similar to the original application.

CAD for the Web

In the early Web era, slow data transfer rates required thin clients and plug-ins to be small (less than one megabyte) and powerful enough to provide tools such as pan and zoom. In light of this, Autodesk developed a CAD plug-in called Whip!, which was based on AutoCAD's ADI video driver.

Although the Whip! viewer today has evolved into the Autodesk DWF Viewer, the file format, DWF (Design Web Format), remains the same. DWF files can be created with any AutoCAD-based product, including AutoCAD Map, and the DWF format displays the map or design on the Web as it appears on paper. DWF files are usually much smaller than the original DWGs, speeding their transfer across the Web. With the development of DWF, Internet users had access to terabytes of information previously available only in DWG format. This was a milestone in information access.

From a GIS perspective, 2D DWF files were useful strictly for design and did not represent true coordinate systems or offer GIS functionality. Although Whip!-based DWF was extremely effective for publishing digital versions of maps and designs, GIS required a more comprehensive solution.

Note: Today, DWF is a 3D format that supports coordinate systems and object attributes.

GIS for the Web

As the Web era progressed, it became clear that a simple retrofit of existing applications would not be sufficient for Web-enabled GIS. In 1996, Autodesk purchased MapGuide® from Argus Technologies. MapGuide viewer was a browser plug-in that could display full vector-format GIS data streamed from an enormous repository using very little bandwidth. Each layer in MapGuide viewer could render streamed data from different MapGuide Servers around the Internet. For example, road layers could be streamed directly from a server in Washington, DC, while the real-time location of cars could be streamed directly from a server in Dallas, Texas. MapGuide managed its performance primarily with scale-dependent authoring techniques that limited the amount of data based on the current scale of the client map.

MapGuide could perform basic GIS functions such as buffer and selection analysis, as well as address-matching navigation with zoom-goto. One of the more powerful aspects of MapGuide was the generic reporting functionality, in which MapGuide could send a series of unique IDs of selected objects to any generic Web page for reporting. Parcels, for example, could be selected in the viewer and the Parcel IDs could be sent to a server at City Hall that had the assessment values. A report was returned, as a Web page, containing all the information about the selected parcels. Again, the report could reside on any server, anywhere. The maps in MapGuide were just stylized pointers to all the potential servers around the Internet, containing spatial and attribute data. MapGuide was revolutionary at the time, and represented, in the true sense, applications taking advantage of the distributed network called the Web.

MapGuide continued to evolve, using ActiveX controls for Microsoft Internet Explorer, a plug-in for Netscape and a Java applet that could run on any Java-enabled browser. Initially, MapGuide used only its own file format, SDF, for geographic features. Later, MapGuide could natively support DWG, DWF, SHP, Oracle Spatial, and ArcSDE.

Although MapGuide was an extremely effective solution, it could run only on Microsoft Windows servers. The development of MapGuide OpenSource and Autodesk MapGuide Enterprise was inspired by the need to move toward a neutral server architecture and plug-in-free client experience. MapGuide could now be used either without a plug-in or with the newest DWF Viewer as a thin client.
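The idea of scale-dependent authoring mentioned above for MapGuide, limiting what is sent to the client according to the current map scale, can be sketched in a few lines of Python. This is a hypothetical illustration only: the `Layer` class, the field names, and the scale ranges are assumptions made for the example and do not reflect MapGuide's actual configuration format or API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    name: str
    min_scale: float  # smallest scale denominator at which the layer is served
    max_scale: float  # largest scale denominator at which the layer is served

def visible_layers(layers, scale_denominator):
    """Names of the layers whose scale range contains the current map scale."""
    return [
        layer.name for layer in layers
        if layer.min_scale <= scale_denominator <= layer.max_scale
    ]

# Assumed scale ranges: detailed layers are only served when zoomed in.
layers = [
    Layer("parcels", 0, 10_000),
    Layer("streets", 0, 100_000),
    Layer("highways", 0, 5_000_000),
]

print(visible_layers(layers, 2_000))      # zoomed in to 1:2,000
print(visible_layers(layers, 50_000))     # city overview at 1:50,000
print(visible_layers(layers, 1_000_000))  # regional view at 1:1,000,000
```

At a regional scale only the coarse layer is requested, which is the mechanism by which the amount of streamed data stays small regardless of how large the underlying repository is.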
Within AutoCAD Map, users could now publish directly to the MapGuide Server and maintain the data dynamically, further closing the GIS-CAD gap.

CAD and GIS During the Wireless Phase

Wireless CAD and GIS marked the beginning of the next inflection point on the information technology curve, presenting a new challenge for GIS and CAD integration. Since early wireless Internet connection speeds were quite slow (approximately one quarter of wired LAN speed), Autodesk initially decided that the best method for delivering data to a handheld device was "sync and go," which required physically connecting a handheld to a PC and using synchronization software to transfer map and attribute data to the device. GIS consumers could view this data on their mobile devices in the field without being connected to a server or desktop computer. Since handheld devices were much less expensive than PCs, mobile CAD and GIS further increased the number of people who had access to geospatial information.

Wireless CAD

Autodesk OnSite View (circa 2000) allowed users to transfer a DWG file to a Palm OS handheld and view it on the device. When synchronized, the DWG file was converted to an OnSite Design file (OSD), which, when viewed, allowed users to pan, zoom and select features on the screen.

With the advent of Windows CE support, OnSite View allowed redlining, enabling users to mark up a design without modifying the original. Redlines were saved as XML (Extensible Markup Language) files on the handheld and were transferred to the PC on the next synchronization or docking. These redline files could be imported into AutoCAD, where modifications to the design could be made.

Autodesk OnSite View could be considered more mobile than wireless, since no direct access to the data was available without connecting the mobile device to the PC. OnSite View filled a temporary niche before broadband wireless connections became available.

Wireless GIS and Location-Based Services

Initially, the mobile GIS solution at Autodesk was OnSite Enterprise, which leveraged the mobility of OnSite and the dynamism of MapGuide. OnSite Enterprise created handheld MapGuide maps in the form of OSD files that users could simply copy off the network and view on their mobile devices with OnSite.

In 2001, when true broadband wireless came on the horizon, Autodesk created a new corporate division focused solely on Location-Based Services (LBS). The burgeoning Wireless Web required a new type of software, designed specifically to meet the high transaction volume, performance (40+ transactions per second), and privacy requirements of wireless network operators (WNOs). The next technological inflection point had arrived, where maps and location-based services were developed for mass-market mobile phones and handheld devices.

Autodesk Location Services created LocationLogic, a middleware platform that provides infrastructure, application services, content provisioning, and integration services for deploying and maintaining location-based services. The LocationLogic platform was built by the same strong technical leadership and experienced subject matter experts that worked on the first Autodesk GIS products. The initial version of LocationLogic was a core Geoserver specifically targeted for wireless and telecom operators that required scalability and high-volume transaction throughput without performance degradation.

The LocationLogic Geoserver was able to provide:

• Point of Interest (POI) queries
• Geocoding and reverse geocoding
• Route planning
• Maps
• Integrated user profile and location triggers

Points of Interest (POIs) usually comprise a set of businesses that are arranged in different categories. POI directories, which can include hundreds of categories, are similar to Telecom Yellow Pages, but with added location intelligence.
Common POI categories include Gas Stations, Hotels, Restaurants, and ATMs, and can be customized for each customer. Each listing in the POI tables is spatially indexed so users can search for relevant information based on a given area or the mobile user's current location.
Geocoding refers to the representation of a feature's location or address in coordinates (x, y) so that it can be indexed spatially, enabling proximity and POI searches within a given area. Reverse geocoding converts x, y coordinates to a valid street address. This capability allows the address of a mobile user to be displayed once their phone has been located via GPS or cell tower triangulation. Applications such as "Where am I?" and friend or family finders utilize reverse geocoding.
Route planning finds the best route between two or more geographical locations. Users can specify route preferences, such as shortest path based on distance, fastest path based on speed limits, and routes that avoid highways, bridges, tollways, and so on. Other attributes of route planning include modes of transportation (such as walking, subway, car), which are useful for European and Asian countries.
The maps produced by LocationLogic's Geoserver are actually authored in Autodesk MapGuide. Although the Geoserver was built from the ground up, LocationLogic was able to take advantage of MapGuide's effective mapping software.
LocationLogic also supports user profiles for storing favorite routes or POIs. Early versions of LocationLogic also allowed applications to trigger notifications if the mobile user came close to a restaurant or any other point of interest. This capability is now used for location-based advertising, child zone notifications, and so on.

Key Applications

Early LBS applications built on LocationLogic included traffic alerts and friend finder utilities. For example, Verizon Wireless subscribers could receive TXT alerts about traffic conditions at certain times of day and on their preferred routes. Friend finder utilities alerted the phone user that people on their list of friends were within a certain distance of the phone.
More recently, Autodesk Location Services has offered two applications built on LocationLogic that can be accessed on the cell phone and via a Web browser: Autodesk Insight and Autodesk Family Minder.
Autodesk Insight is a service that enables any business with a PC and Web browser to track and manage field workers who carry mobile phones. Unlike traditional fleet tracking services, Insight requires no special investment in GPS hardware. Managers and dispatchers can view the locations of their staff, determine the resource closest to a customer site or job request, and generate turn-by-turn travel instructions from the Web interface. Managers can also receive alerts when a worker arrives at a given location or enters or leaves a particular zone. Reports on travel, route histories, and communications for groups or individuals, over the last 12 or more months, can be generated from the Web interface.
Family Minder allows parents and guardians to view the real-time location of family members from a Web interface or their handset. Parents and guardians can also receive notifications indicating that a family member has arrived at or left a location. The recent advances in mobile phone technology, such as sharper displays, increased battery life and strong processing power, make it possible for users to view attractive map displays on regular handsets.

Enterprise GIS: Workstation, Web and Wireless Synergy
In 1999, Autodesk acquired VISION*, along with its expertise in Oracle and enterprise GIS integration. This was a turning point for Autodesk GIS. File-based storage of information (such as DWG) was replaced with enterprise database storage of spatial data. Currently, Autodesk has integrated VISION* into its development, as seen in Autodesk GIS Design server. Autodesk Topobase, which also stores its data in Oracle, connects to AutoCAD Map and MapGuide to provide enterprise GIS Public Works and Municipal solutions.
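The POI proximity search described earlier in this entry (finding listings of a given category near the mobile user's current location) can be sketched as follows. This is only an illustration under stated assumptions: the listings, the function names `haversine_km` and `find_pois`, and the dictionary layout are hypothetical, and a linear scan stands in for the spatial index that a production POI table would use. It does not reproduce the LocationLogic implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; adequate for a sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def find_pois(pois, lat, lon, category, radius_km):
    """Return (distance_km, name) pairs for matching POIs, nearest first."""
    hits = []
    for poi in pois:  # linear scan; a real POI table would be spatially indexed
        if poi["category"] != category:
            continue
        d = haversine_km(lat, lon, poi["lat"], poi["lon"])
        if d <= radius_km:
            hits.append((d, poi["name"]))
    return sorted(hits)


# Hypothetical listings, invented for illustration only.
POIS = [
    {"name": "Cafe A", "category": "Restaurants", "lat": 44.980, "lon": -93.270},
    {"name": "Diner B", "category": "Restaurants", "lat": 45.100, "lon": -93.270},
    {"name": "Fuel C", "category": "Gas Stations", "lat": 44.981, "lon": -93.271},
]

# Restaurants within 2 km of an assumed user position.
nearby = find_pois(POIS, 44.978, -93.265, "Restaurants", radius_km=2.0)
```

Here `nearby` holds only the restaurant that lies inside the 2 km radius. The more distant restaurant and the gas station are filtered out by distance and by category, respectively.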
[Figure: "Technological Inflection Points: Workstation, Web and Wireless". A curve of Users against Time (1970, 1982, 1994, 2001) with inflection points marking the PC, Web and Wireless phases. Products labeled along the curve: AutoCAD, Autodesk Map, Autodesk Raster Design, Autodesk OnSite Enterprise, Autodesk OnSite View, Autodesk DWF, Autodesk MapGuide, Autodesk InSight, Family Minder, Autodesk LocationLogic.]

Computer Environments for GIS and CAD, Fig. 1 Technological inflection points along the information technology curve: exponential jumps in access to geospatial information

MapGuide and AutoCAD Map support Oracle Spatial and Locator, which allow all spatial data to be stored in a central repository. All applications can view the data without duplication and reliance on file conversion. AutoCAD Map users can query as-built information from the central repository for help in designs, and any modifications are saved and passed to GIS users. The central GIS database can also be published and modified from Web-based interfaces, such as MapGuide. Real-time wireless applications, such as Autodesk Insight, can use the repository for routing and mobile resource management.

Summary
At each technological inflection point (workstation, Web and wireless), Autodesk has leveraged infrastructural changes to exponentially increase the universe of potential consumers of geospatial information. The shift from minicomputer to PC saw Autodesk create AutoCAD and AutoCAD Map to enable sharing of geographic and design information. The next inflection point, workstation to Web, spurred another jump in the number of spatial data consumers, and the CAD and GIS gap continued to close. The most recent inflection point, Web to wireless, saw the number of spatial data users reach a new high, as GIS applications were embedded in the users' daily tools, such as cell phones (see Fig. 1). At this point in the technology curve, the need for synergy between CAD and GIS is apparent more than ever. Since the value of spatial data increases exponentially with the number of users who have access to it, Autodesk's enterprise GIS solution, with its centralized spatial database, provides significant value to a wide variety of spatial data consumers.
Autodesk has a history of leveraging inflection points along the computing and communication technology curve to create exciting and innovative solutions. For over two decades, Autodesk's mission has been to spearhead the "democratization of technology" by dramatically increasing the accessibility of heretofore complex and expensive software. This philosophy has been pervasive in the GIS and LBS solutions that it has brought to a rapidly growing geospatial user community.
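Route planning, which this entry describes as choosing between the shortest path based on distance and the fastest path based on speed limits, can be illustrated with a small sketch. It uses Dijkstra's algorithm as a stand-in technique. The source does not say which algorithm the Autodesk products use, and the road network, the edge attributes (`km`, `kmh`) and the function name `best_route` are all hypothetical.

```python
import heapq


def best_route(graph, start, goal, cost):
    """Dijkstra's algorithm; `cost` maps an edge record to a non-negative weight."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == goal:
            return total, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, edge in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (total + cost(edge), nxt, path + [nxt]))
    return None  # goal not reachable from start


# Hypothetical road segments: (neighbor, {"km": length, "kmh": speed limit}).
ROADS = {
    "A": [("B", {"km": 5.0, "kmh": 50}), ("C", {"km": 8.0, "kmh": 100})],
    "B": [("D", {"km": 5.0, "kmh": 50})],
    "C": [("D", {"km": 8.0, "kmh": 100})],
    "D": [],
}

# Same network, two route preferences: distance in km versus travel time in hours.
shortest = best_route(ROADS, "A", "D", cost=lambda e: e["km"])
fastest = best_route(ROADS, "A", "D", cost=lambda e: e["km"] / e["kmh"])
```

Only the cost function changes between the two queries. The shortest route goes through B (10 km), while the fastest goes through C because of its higher speed limit. A preference such as avoiding highways could be handled in the same spirit by skipping or penalizing the corresponding edges.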
[Figure: "Future Technological Inflection Point". The curve of Fig. 1 continues past the Wireless phase (Autodesk LocationLogic, Autodesk InSight, Family Minder) to a FUTURE inflection point labeled "SOA for GIS" and "Application Mashups".]

Computer Environments for GIS and CAD, Fig. 2 Future technological inflection point (continued from Fig. 1)

Future Directions

The next potential inflection point will emerge with the development of Service Oriented Architecture (SOA), built upon a Web 2.0 and Telco 2.0 framework. Not only will the distributed data and application architecture continue to increase the number of geospatial data consumers, but it will increase the use and accessibility of powerful analytical and visual tools as well.
Historically, the Web was leveraged to distribute data with wireless and Web technology. Now, geo-mashups between tools such as Google Earth and AutoCAD Civil 3D make use of the interaction of Web-based applications and data. A simple example of an SOA application is LocationLogic's geocoder, which performs geocoding and reverse geocoding via Asynchronous JavaScript and XML (AJAX) calls to a URL that return sets of coordinates and addresses, respectively.
As GIS applications become integrated into current technologies (such as cars and courier boxes), demand for rapid data and application processing will apply pressure to all aspects of the distribution model. One challenge will be to provide rapid information updates, such as current aerial photographs and the latest traffic conditions. These just-in-time applications will require a massive increase in scale to accommodate the large number of business and personal users. At each technological inflection point, the accessibility to this vital information will increase exponentially (see Fig. 2).
CAD and GIS will soon be so integrated that the location on the timeline from design to physical feature or survey to map will be the only way to determine which technology is currently being used. Seamless and transparent services and data distribution will bring subsets of CAD and GIS utilities together to produce dynamic applications on demand. Servers will no longer host only databases, but will run self-supported applications, functions, and methods that are CAD, GIS, database, and business oriented. These services will be offered through the new Web 2.0 to provide powerful solutions.
Transparent GIS Services and integrated geospatial data will affect a larger segment of the population. No longer will the technology just be "cool"; it will be completely integral to daily life. Autodesk's role will be to continue to provide tools that will leverage this new reality and meet the coming new demands in information and technology.

Cross-References

Data Models in Commercial GIS Systems
Internet GIS
Internet-Based Spatial Information Retrieval
Location Intelligence
Location-Based Services: Practices and Products
Oracle Spatial, Raster Data
Privacy Threats in Location-Based Services
Vector Data
Web Mapping and Web Cartography
Web Services, Geospatial

Recommended Reading

Autodesk Geospatial. http://images.autodesk.com/adsk/files/autodesk_geospatial_white_paper.pdf
Autodesk Inc. (2007) Map 3D 2007 essentials. Autodesk Press, San Rafael
Barry D (2003) Web services and service-orientated architectures, the Savvy Manager's guide, your road map to emerging IT. Morgan Kaufmann Publishers, San Francisco
Best Practices for Managing Geospatial Data. http://images.autodesk.com/adsk/files/%7B574931BD-8C29-4A18-B77C-A60691A06A11%7D_Best_Practices.pdf
CAD and GIS Critical Tools, Critical Links: Removing Obstacles Between CAD and GIS Professionals. http://images.autodesk.com/adsk/files/3582317_CriticalTools0.pdf
Hjelm J (2002) Creating location services for the wireless guide. Professional Developer's guide series. Wiley Computer Publishing, New York
Jagoe A (2003) Mobile location services, the definitive guide. Prentice Hall, Upper Saddle River
Kolodziej K, Hjelm J (2006) Local positioning systems, LBS applications and services. Taylor and Francis group. CRC Press, Boca Raton
Laurini R, Thompson D (1992) Fundamentals of spatial information systems. The APIC series. Academic, London/San Diego
Longley P, Goodchild M, Maguire D, Rhind D (1999) Geographical information systems, 2nd edn. Principles and technical issues, vol 1; Management issues and applications, vol 2. Wiley, New York
Sharma C (2001) Wireless internet enterprise applications. Wiley tech brief series. Wiley Computer Publishing, New York
Schiller J, Voisard A (2004) Location-based services. Morgan Kaufmann, San Francisco
Plewe B (1997) GIS ONLINE; information retrieval, mapping and the internet. OnWord Press, Santa Fe
Vance D, Smith C, Appell J (1998) Inside Autodesk World. OnWord Press, Santa Fe
Vance D, Walsh D, Eisenberg R (2000) Inside AutoCAD Map 2000. Autodesk Press, San Rafael
Vance D, Eisenberg R, Walsh D (2000) Inside AutoCAD Map 2000, the ultimate how-to resource and desktop reference for AutoCAD Map. OnWord Press, Florence


Computer Supported Cooperative Work

Geocollaboration


Computer Vision Augmented Geospatial Localization

Ashish Gupta
Department of Civil, Environmental, and Geodetic Engineering, Ohio State University, Columbus, OH, USA

Synonyms

Autonomous navigation; Global navigation satellite systems; GPS-denied geo-localization; Simultaneous localization and mapping; Unmanned aerial vehicles; Visual odometry

Definition

Geospatial localization is the estimation of global geographic location using, in part, geospatial analysis. Geospatial analysis uses statistical and other analytic techniques for data that has a geographic or spatial context to it, typically available in geographic information systems (GIS). Geographic location is typically ascertained using Global Navigation Satellite Systems (GNSS) like GPS and GLONASS, which require simultaneous line-of-sight connection with multiple satellites to estimate location within an error margin of a few meters. These constraints limit the use of GNSS-based localization to outdoors with few obstructing structures in close proximity and a tolerance to uncertainty in exact location. In addition to these constraints, in many environments such as indoors, urban canyons, under dense foliage, underwater, and underground, there is limited or no GPS access. Besides these naturally occurring constraints, GPS access can be easily blocked by jamming, spoofing, and other GPS-denial threats in adversarial environments. Consequently, for positioning, navigation, and timing (PNT) applications, GPS must be augmented or supplanted by other sensors and systems. In such cases GPS is used for an approximate localization within a geographic region, which can range from tens to thousands of square
meters based on the environment. Alternate techniques are used to ascertain exact location within this geographic region. Simultaneous localization and mapping (SLAM) techniques are popularly used in robotics to estimate the location of a robot in real time based on information acquired from the environment using sensors on board the robot (Lategahn et al. 2011). It is typical to use multiple sensors like inertial measurement units (IMU), single or double video cameras for monocular or stereo vision, light detection and ranging (LIDAR), and sound navigation and ranging (SONAR). The choice of sensor suite is based on the environment (aerial, ground, underwater, indoor), the type of mobile platform, and the performance, processing, and cost budget. Vision-based sensors are among the most popular in SLAM techniques since they are informative and cost-effective. In addition to acquiring sensor information of its vicinity and estimating its position, SLAM also builds a map of the geographic region in real time. There are different types of maps. However, with navigation being the principal objective, topological maps are most relevant. A topological map focuses on the connectivity between important entities in the environment with disregard to their exact location (Paul and Newman 2010). Metric mapping can be used in conjunction with topological maps to compute a topometric map, which is used to compute exact localization within that geographic region (Badino et al. 2011). This technique records sensor information with its registered location in a map database. Subsequently, while moving through that geographic region, sensor information can be used as a query to the recorded map database to retrieve geospatial location in real time.

Historical Background

Long-range navigation (LORAN) was a hyperbolic radio navigation system developed prior to the advent of GNSS-based PNT. It used low-frequency radio waves and covered several thousand miles, but had a poor accuracy of hundreds of meters. It was used for localization of naval vessels while cruising oceans. Such systems are too inaccurate for the precision demands of present-day PNT-dependent systems. To overcome the accuracy issues of GNSS for precision aviation, a Local Area Augmentation System (LAAS) is used for precision aircraft landing in all weather conditions (Enge 1999). A VHF signal link from airport transmitters is used by aircraft to correct the GPS signal for precise localization. Cellular-capable devices can use Assisted GPS (A-GPS) for improved localization, using information provided by the cellular network in conjunction with the satellite signal for a quicker estimation of location. Localization using cellular tower triangulation is another alternative with a reasonable error of several tens of meters, but it is only feasible outdoors in urban areas. For indoor navigation, the IEEE 802.11 wireless LAN (WLAN) location tracking system is an option (Emery and Denko 2007). It uses received signal strength indication (RSSI) on mobile devices and estimates location by comparison with a precomputed database of RSSI measurements in that indoor environment. It accounts for signal propagation loss and can provide accuracy of a few meters.
These PNT systems are currently operational but are encumbered with high infrastructure cost and have other limitations like availability exclusively in urban environments. Moreover, they have a natural limitation of localization accuracy. In comparison, computer vision-based pose estimation techniques used in robotics for SLAM have a comparatively high localization accuracy, but have historically been used for mapping small-sized environments. However, success in the DARPA Grand Challenges for autonomous navigation of driverless vehicles over large distances using SLAM-based techniques established computer vision augmented geospatial localization as a viable PNT alternative in GPS-denied or GPS-degraded environments (Thrun et al. 2006).

Scientific Fundamentals

Visual sensor-based localization using computer vision comprises two main parts in its approach: metric and topological localization. Metric
[Figure 1: one panel shows a feature point P_j on an object observed from camera images k-1, k and k+1 as P_{j,k-1}, P_{j,k} and P_{j,k+1}. The other panel plots trajectories in the x-y plane (axes in mm, 10 m scale bar) for wheel odometry, visual odometry (SURF), visual odometry (SIFT) and laser odometry.]

Computer Vision Augmented Geospatial Localization, Fig. 1 Estimation of pose of sensors onboard moving vehicle. Trajectory of moving vehicle is estimated based on sensor locations in current and previous frames. Localization accuracy depends on the type of sensor and type of features computed from the data stream. The graph is illustrative of the difference between trajectories recovered using different types of odometry techniques

Computer Vision Augmented Geospatial Localization, Fig. 2 Map data is abstracted as graph data structure. The transport network layer from OpenStreetMap of Washington, DC, is abstracted as a graph, where edges typically represent roads and nodes represent intersections and other relevant points in the map

location is estimated by computing the coordinates of the location of the sensor on board the vehicle. These could be geographic coordinates of latitude and longitude. The coordinates of the vehicle pose are typically computed by triangulation, using methods like structure from motion (SfM) (Koenderink and van Doorn 1991) or Visual Odometry (VO) (Alonso et al. 2012). An illustrative example is shown in Fig. 1. Pose of the sensor is estimated based on matching features across different frames in the data stream of a moving vehicle. The sequence of poses is used to estimate a 3D trajectory of this moving vehicle. In topological localization, the position of the sensor on board the vehicle is retrieved from a finite set of possible locations. Topological localization provides a coarse location estimate. Topological maps are typically stored as graph structures, where nodes indicate possible locations and edges are connections between locations. An example is shown in Fig. 2 for the city of Washington, DC, where the transport network layer acquired from OpenStreetMap for the county area has been abstracted as a graph. The weight of an edge can indicate the similarity or proximity between locations. The size of the finite set of locations is typically kept small so that efficient retrieval in real-time applications is a tractable
problem. While metric approaches provide accurate localization results, they tend to fail and drift over time as the vehicle traverses large distances in its geographic region. On the other hand, due to their finite state space, topological approaches provide a robust localization but only rough position estimates. A fusion of the metric and topological approaches achieves accurate metric results while maintaining the robustness of topological matching, a technique typically referred to as topometric localization. It uses a fine-grained topological map, where each node has an associated coordinate of its real metric location. Such topological maps can be acquired from sources like GIS databases for outdoor navigation. Finding the node of the current location translates to finding the metric coordinate of the vehicle.
A generic topometric localization algorithm involves the two stages of map creation and then localization. A vehicle equipped with cameras, IMU, and a GNSS-capable device first traverses the routes to be mapped. GPS and inertial sensors are used to create a graph of this environment. The graph is metric in the sense that the nodes contain the exact location of the vehicle. From the images acquired using an onboard camera in monocular or stereo configuration, visual local features are extracted. These features are processed and stored in a database with a reference to the node corresponding to its real location. At runtime, the vehicle drives over the routes included in the a priori map. Video imagery is processed online to obtain features. As the vehicle moves, these visual features are matched with those in the database. Since there are potentially multiple feature matches from different parts of the mapped region, a method like Bayesian filtering is utilized to estimate the probability density function of the position of the vehicle. This facilitates pruning of false-positive matches and provides accurate localization and a smooth estimated trajectory of the moving vehicle.

Key Applications

Geospatial localization using a sensor suite that includes visual sensors in a GNSS-denied environment has several applications in numerous scenarios. Since it provides very accurate location information in real time, it can be used for autonomous navigation for self-driving cars, unmanned aerial vehicles (UAV) and unmanned ground vehicles (UGV). A vision-based system provides rich real-time information that allows an autonomous navigation system to tackle a dynamic environment, such as the appearance of unexpected objects in the vicinity of the vehicle that were absent during the mapping phase. Such information is not available in other PNT systems.
A robust GPS alternative is particularly important for military applications since GPS signals can easily be denied to mission-critical navigation systems on several assets, especially in contested territory. Relatively cost-effective vision-based geo-localization can be alternatively used by guidance systems on weapons platforms like missiles, drones, and UGVs.
Community-driven map generation projects like OpenStreetMap are extremely popular (Floros et al. 2013). Accurate and information-rich vision-based localization can simultaneously correct registration errors in these maps and also annotate the maps with geo-referenced objects like buildings, road signs, vegetation, and other geographic entities.
Vision-based localization is typically unhampered by its environment, unlike radio signals, which suffer issues of multipath and propagation path losses by absorption. Since it can be used in most environments, it can also be used ubiquitously with disregard to changes in environment like transitioning from outdoors to indoors, driving through tunnels, etc., which otherwise typically require a hand-off between different PNT systems operational in their respective environments.

Future Directions

Computer vision augmented geospatial localization is a rapidly emerging technology. Future developments include improved sensor fusion where vision, inertial, LIDAR, SONAR, magnetometer, and gravimeter sensors will be
efficiently combined for ubiquitous navigation while traveling across different environments without degradation in localization accuracy. The quality of information in GIS databases and accuracy of geospatial localization are synergistic where one improves the other and vice versa. Cross-referencing and registration of visual information from different mobile platforms including UAV and UGV can improve GIS databases and provide a ground and aerial map of a geographic region for accurate 3D geospatial localization.

Cross-References

Bayesian Network Integration with GIS
Feature Detection and Tracking in Support of GIS
Indoor Localization
OpenStreetMap
Optimal Location Queries on Road Networks
Road Network Data Model
Spatial Analysis along Networks

References

Alonso IP, Llorca DF, Gavilan M, Pardo SA, Garcia-Garrido MA, Vlacic L, Sotelo MA (2012) Accurate global localization using visual odometry and digital maps on urban environments. IEEE Trans Intell Transp Syst 13(4):1535–1545
Badino H, Huber D, Kanade T (2011) Visual topometric localization. In: IEEE intelligent vehicles symposium, proceedings (2011), Baden-Baden, pp 794–799
Emery M, Denko M (2007) IEEE 802.11 WLAN based real-time location tracking in indoor and outdoor environments. In: Canadian conference on electrical and computer engineering, CCECE 2007, Apr 2007, Vancouver, pp 1062–1065
Enge P (1999) Local area augmentation of GPS for the precision approach of aircraft. Proc IEEE 87(1):111–132
Floros G, Van Der Zander B, Leibe B (2013) OpenStreetSLAM: global vehicle localization using OpenStreetMaps. In: Proceedings IEEE international conference on robotics and automation, Karlsruhe, pp 1054–1059
Koenderink JJ, van Doorn AJ (1991) Affine structure from motion. J Opt Soc Am A 8(2):377–385
Lategahn H, Geiger A, Kitt B (2011) Visual SLAM for autonomous ground vehicles. In: Proceedings IEEE international conference on robotics and automation, Shanghai, pp 1732–1737
Paul R, Newman P (2010) FAB-MAP 3D: topological mapping with spatial and visual appearance. In: 2010 IEEE international conference on robotics and automation, Anchorage, pp 2649–2656
Thrun S, Montemerlo M, Dahlkamp H, Stavens D, Aron A, Diebel J, Fong P, Gale J, Halpenny M, Hoffmann G, Lau K, Oakley C, Palatucci M, Pratt V, Stang P, Strohband S, Dupont C, Jendrossek LE, Koelen C, Markey C, Rummel C, van Niekerk J, Jensen E, Alessandrini P, Bradski G, Davies B, Ettinger S, Kaehler A, Nefian A, Mahoney P (2006) Stanley: the robot that won the DARPA grand challenge: research articles. J Robot Syst 23(9):661–692


Computing Fitness of Use of Geospatial Datasets

Leen-Kiat Soh and Ashok Samal
Department of Computer Science and Engineering, The University of Nebraska at Lincoln, Lincoln, NE, USA

Synonyms

Conflict Resolution; Dempster-Shafer Belief Theory; Evidence; Frame of Discernment; Information Fusion; Plausibility; Quality of Information; Timeseries Data

Definition

Geospatial datasets are widely used in many applications including critical decision support systems. The goodness of the dataset, called the Fitness of Use (FoU), is used in the analysis and has direct bearing on the quality of derived information from the dataset that ultimately plays a role in decision making for a specific application. When a decision is made based on different sources of datasets, it is important to be able to fuse information from datasets of different degrees of FoU. Dempster-Shafer belief theory is used as the underlying conflict resolution mechanism during information fusion. Furthermore, the Dempster-Shafer belief theory is demonstrated as a viable approach to fuse information derived
from different approaches in order to compute the and reliability in many real applications when
FoU of a dataset. it is impossible to obtain precise measurements
and results from real experiments. In addition,
the Dempster-Shafer belief Theory provides a
Historical Background framework to combine the evidence from mul-
tiple sources and does not assume disjoint out-
In most applications, sometimes it is assumed comes (Sentz and Ferson 2002). Additionally, the C
that the datasets are perfect and without any Dempster-Shafer s measures are not less accurate
blemish. This assumption is, of course, not true. than Bayesian methods, and in fact reports have
The data is merely a representation of a continu- shown that it can sometimes outperform Bayes
ous reality both in space and time. It is dif cult theory (Cremer et al. 1998; Braun 2000).
to measure the values of a continuous space and
time variable with in nite precision. Limitations
Scientific Fundamentals
are also the result of inadequate human capac-
ity, sensor capabilities and budgetary constraints.
Assume that there is a set of geospatial datasets,
Therefore, the discrepancy exists between the re-
S D fS1 ; S2 ; ; Sn g. A dataset Si may consist
ality and the datasets that are derived to represent
of many types of information including (and not
it. It is especially critical to capture the degree of this discrepancy when decisions are made based on the information derived from the data. Thus, this measure of quality of a dataset is a function of the purpose for which it is used, hence it is called its fitness of use (FoU). For a given application, this value varies among the datasets. Information derived from high-FoU datasets is more useful and accurate for the users of the application than that from low-FoU datasets. The challenge is to develop appropriate methods to fuse derived information of varying degrees of FoU as well as of derived information from datasets of varying degrees of FoU. This will give insights as to how the dataset can be used or how appropriate the dataset is for a particular application (Yao 2003).

An information theoretic approach is used to compute the FoU of a dataset. The Dempster-Shafer belief theory (Shafer 1976) is used as the basis for this approach in which the FoU is represented as a range of possibilities and integrated into one value based on the information from multiple sources. There are several advantages of the Dempster-Shafer belief theory. First, it does not require that the individual elements follow a certain probability. In other words, Bayes theory considers an event to be either true or untrue, whereas the Dempster-Shafer allows for unknown states (Konks and Challa 2005). This characteristic makes the Dempster-Shafer belief theory a powerful tool for the evaluation of risk

limited to) spatial coordinates, metadata about the dataset, denoted by $aux_i$, and the actual time series data, denoted by $ts_i$.

The metadata for a dataset may include the type of information being recorded (e.g., precipitation or volume of water in a stream), the period of record, and the frequency of measurement. Thus,

$$aux_i = \langle type_i, tb_i, te_i, int_i \rangle,$$

where $tb_i$ and $te_i$ denote the beginning and the ending time stamps for the measurements, and $int_i$ is the interval at which the measurements are made. Other metadata such as the type and age of the recording device can also be added.

The time series data in a dataset may consist of a sequence of measurements,

$$ts_i = \langle m_{i,1}, m_{i,2}, \ldots, m_{i,p} \rangle.$$

Each measurement stores both the time the measurement was taken and the actual value recorded by the sensor. Thus, each measurement is given by

$$m_{i,j} = \langle t_{i,j}, v_{i,j} \rangle.$$

It is assumed that the measurements in the dataset are kept in chronological order. Therefore,

$$t_{i,j} < t_{i,k}, \quad \text{for } j < k.$$
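The dataset structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and field names (`Aux`, `Measurement`, `Dataset`, `kind`, `location`) and the sample values are hypothetical and do not come from the original entry; only the chronological-order condition $t_{i,j} < t_{i,k}$ for $j < k$ is taken from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Aux:
    """Metadata aux_i = <type_i, tb_i, te_i, int_i> (field names are assumptions)."""
    kind: str        # type of information recorded, e.g. "precipitation"
    tb: float        # beginning time stamp of the period of record
    te: float        # ending time stamp of the period of record
    interval: float  # interval at which measurements are made


@dataclass
class Measurement:
    """One measurement m_{i,j} = <t_{i,j}, v_{i,j}>."""
    t: float  # time the measurement was taken
    v: float  # value recorded by the sensor


@dataclass
class Dataset:
    """A dataset S_i: spatial coordinates, metadata, and the time series."""
    location: Tuple[float, float]
    aux: Aux
    ts: List[Measurement]


def is_chronological(ds: Dataset) -> bool:
    """True if t_{i,j} < t_{i,k} holds for all j < k."""
    times = [m.t for m in ds.ts]
    # Strictly increasing neighbors imply the condition for every pair j < k.
    return all(a < b for a, b in zip(times, times[1:]))


# Hypothetical example dataset with three measurements.
rain = Dataset(
    location=(40.8, -96.7),
    aux=Aux(kind="precipitation", tb=0.0, te=2.0, interval=1.0),
    ts=[Measurement(0.0, 1.2), Measurement(1.0, 0.0), Measurement(2.0, 3.4)],
)
```

Keeping the metadata separate from the series mirrors the $aux_i$ / $ts_i$ split in the text, so that rules over the metadata and statistics over the series can be evaluated independently.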

Furthermore, the first and last measurement times should match the period of record stored in the metadata,

$$tb_i = t_{i,1} \quad \text{and} \quad te_i = t_{i,p}.$$

The problem of finding the suitability of a dataset for a given application is to define a function for the FoU that computes the fitness of use of a dataset described above. The function FoU maps $S_i$ to a normalized value between 0 and 1:

$$\mathrm{FoU}(S_i, A) \in [0, 1],$$

where $S_i$ is a single dataset and $A$ is the intended application of the data. The application $A$ is represented in the form of domain knowledge that describes how the goodness of a dataset is viewed. A set of rules may be used to specify this information. Thus,

$$A = \{R_1, R_2, \ldots, R_d\},$$

where $R_i$ is a domain rule that describes the goodness of a dataset and $d$ is the number of rules. Therefore, the FoU function is defined with respect to an application domain. Different applications can use different rules for goodness and derive different FoU values for the same dataset.

Dempster-Shafer Belief Theory
The two central ideas of the Dempster-Shafer belief theory are: (a) obtaining degrees of belief from subjective probabilities for a related question, and (b) Dempster's rule for combining such degrees of belief when they are based on independent items of evidence. For a given proposition $P$, and given some evidence, a confidence interval is derived from an interval of probabilities within which the true probability lies within a certain confidence. This interval is defined by the belief and plausibility supported by the evidence for the given proposition. The lower bound of the interval is called the belief and measures the strength of the evidence in favor of a proposition. The upper bound of the interval is called the plausibility. It brings together the evidence that is compatible with the proposition and is not inconsistent with it. The values of both belief and plausibility range from 0 to 1. The belief function (bel) and the plausibility function (pl) are related by:

$$\mathrm{pl}(P) = 1 - \mathrm{bel}(\bar{P}),$$

where $\bar{P}$ is the negation of the proposition $P$. Thus, $\mathrm{bel}(\bar{P})$ is the extent to which evidence is in favor of $\bar{P}$.

The Frame of Discernment (FOD) consists of all hypotheses for which the information sources can provide evidence. This set is finite and consists of mutually exclusive propositions that span the hypotheses space. For a finite set of mutually exclusive propositions ($\Theta$), the set of possible hypotheses is its power set ($2^{\Theta}$), i.e., the set of all possible subsets including itself and a null set. Each of these subsets is called a focal element and is assigned a confidence interval (belief, plausibility).

Based on the evidence, a probability mass is first assigned to each focal element. The masses are probability-like in that they are in the range [0, 1] and sum to 1 over all hypotheses. However, they represent the belief assigned to a focal element. In most cases, this basic probability assignment is derived from the experience and the rules provided by some experts in the application domain.

Given a hypothesis $H$, its belief is computed as the sum of all the probability masses of the subsets of $H$ as follows:

$$\mathrm{bel}(H) = \sum_{e \subseteq H} m(e),$$

where $m(e)$ is the probability mass assigned to the subset $e$. The probability mass function distributes the values on subsets of the frame of discernment. Nonzero values are assigned only to those hypotheses for which it has direct evidence. Therefore, the Dempster-Shafer belief theory allows for having a single piece of evidence supporting a set of multiple propositions being true. If there are multiple sources of information, probability mass functions can be derived for each data source. These mass

values are then combined using Dempster's Combination Rule to derive joint evidence in order to support a hypothesis from multiple sources. Given two basic probability assignments, $m_A$ and $m_B$, for two independent sources ($A$ and $B$) of evidence in the same frame of discernment, the joint probability mass, $m_{AB}$, can be computed according to Dempster's Combination Rule:

$$m_{AB}(C) = \frac{\sum_{A \cap B = C} m(A)\, m(B)}{1 - \sum_{A \cap B = \emptyset} m(A)\, m(B)}.$$

Furthermore, the rule can be repeatedly applied for more than two sources sequentially, and the results are order-independent. That is, combining different pieces of evidence in different sequences yields the same results.

Finally, to determine the confidence in a hypothesis $H$ being true, belief and plausibility are multiplied together:

$$\mathrm{confidence}(H) = \mathrm{bel}(H) \cdot \mathrm{pl}(H).$$

Thus, the system is highly confident regarding a hypothesis being true if it has high belief and plausibility for that hypothesis being true.

Suppose that there are three discrete FoU outcomes of the datasets, suitable (s), marginal (m), and unsuitable (u), and $\Theta = \{s, m, u\}$. Then, the frame of discernment is

$$FOD = 2^{\Theta} = \{\emptyset, \{s\}, \{m\}, \{u\}, \{s, m\}, \{s, u\}, \{m, u\}, \{s, m, u\}\}.$$

To illustrate how the Dempster-Shafer belief theory can be used to fuse derived information from geospatial databases, two novel approaches to derive information from geospatial databases are herein presented, i.e., (1) heuristics and (2) statistics, before fusing them. For each approach, the computation of the FoU of the dataset based on the derived information is demonstrated.

The first approach uses a set of domain heuristics and then the combination rule to compute the FoU of the datasets. The heuristics can be based on common sense knowledge or can be based on expert feedback. The following criteria are used:

Consistency: A dataset is consistent if it does not have any gaps. A consistent dataset has a higher fitness value.
Length: The period of record for the dataset is also an important factor in the quality. Longer periods of record generally imply a higher fitness value.
Recency: Datasets that record more recent observations are considered to be of a higher fitness value.
Temporal Resolution: Data are recorded at different time scales (sampling periods). For example, the datasets can be recorded daily, weekly or monthly. Depending on the application, higher or lower resolution may be better. This is also called the granularity (Mihaila et al. 1999).
Completeness: A data record may have many attributes, e.g., time, location, and one or more measurements. A dataset is complete if all the relevant attributes are recorded. Incomplete datasets are considered to be inferior (Mihaila et al. 1999).
Noise: All datasets have some noise due to many different factors. All these factors may lead to data not being as good for use in applications.

For each of the above criteria, one or more heuristics can be defined to determine the probability mass for different data quality values. The heuristics in the form of rules are specified as follows:

$$C_1(S_i) \wedge C_2(S_i) \wedge \cdots \wedge C_n(S_i) \rightarrow \mathrm{mass}(S_i, \{qtype\}) = m,$$

where $C_j$ specifies a condition of the dataset, $C_j(S_i)$ evaluates to true if the condition $C_j$ holds for the dataset $S_i$, and $\mathrm{mass}(S_i, \{qtype\})$ denotes the mass of evidence that the dataset $S_i$ contributes to the FoU outcome types in $\{qtype\}$. Then, a rule is triggered or fires if all the conditions are met. When the rule fires, the right-hand side of

the rule is evaluated, which assigns a value $m$ to the probability mass for a given set of outcome types, which in the example is $\{qtype\} \subseteq \{\text{suitable}, \text{marginal}, \text{unsuitable}\}$.

Applying a set of rules as defined above to dataset $S_i$ thus yields a set of masses for different combinations of outcome types. These masses are then combined using Dempster's Combination Rule to yield a coherent set of masses for each element of FOD. The result may be further reduced by considering only the singletons: {suitable}, {marginal}, and {unsuitable}, which allows one to compute the belief, plausibility, and confidence values on only these three outcome types.

The following shows how one may compute the FoU of a dataset using information derived statistically. The statistical method used is called temporal variability analysis. Suppose that $S_i$ has the following measurements:

$$ts_i = \langle m_{i,1}, m_{i,2}, \ldots, m_{i,p} \rangle, \quad \text{and} \quad m_{i,j} = \langle t_{i,j}, v_{i,j} \rangle.$$

Suppose that the measurements are collected periodically at some regular intervals. Suppose that the period of a data series is defined as the time between two measurements collected at the same spatial location at the same time mark. Given this notion of periodicity, the average value of all measurements at each particular time mark over the interval of measurements can be computed. Formally, $ts_i = \langle m_{i,1}, m_{i,2}, \ldots, m_{i,p} \rangle$ can be re-written as:

$$ts_i = \langle m_{i,1}, m_{i,2}, \ldots, m_{i,period}, m_{i,period+1}, m_{i,period+2}, \ldots, m_{i,2 \cdot period}, m_{i,2 \cdot period+1}, \ldots, m_{i,k \cdot period} \rangle,$$

such that $t_{i,k \cdot period} - t_{i,1} = int_i$. Given the above representation, the periodic mean can be derived at each time mark $j$ as

$$mean_{i,j} = \frac{\sum_{p=0}^{k} m_{i,\,p \cdot period + j}}{k}.$$

Likewise, the periodic variance can be derived for the time marks $j$ as

$$var_{i,j} = \frac{k \sum_{p=0}^{k} m_{i,\,p \cdot period + j}^{2} - \left( \sum_{p=0}^{k} m_{i,\,p \cdot period + j} \right)^{2}}{k\,(k - 1)}.$$

Given the set of means and variances for all time marks in a period, the coefficient of variation at each time mark $j$ can be further computed as

$$cov_{i,j} = \frac{\sqrt{var_{i,j}}}{mean_{i,j}}.$$

The temporal variability of the dataset $S_i$ can then be defined as the average value of coefficient of variation for all time marks:

$$\bar{c}(S_i) = \frac{\sum_{j=1}^{period} cov_{i,j}}{period}.$$

Heuristics can then be used to assign probability masses to the different outcomes based on the value of $\bar{c}$. For example, to assign probability masses to the outcomes, the temporal variability can be divided into three ranges: the upper (largest) one-third, the middle one-third and the lower (smallest) one-third. For each range, one or more heuristics are defined to determine the probability mass for different FoU values. The heuristics are specified in the form of rules as

$$(\bar{c}(S_i) \text{ within range } k) \rightarrow \mathrm{mass}(S_i, \{qtype\}) = m,$$

where $\bar{c}(S_i)$ is the average coefficient of variation of the dataset $S_i$, and the range $k$ is one of the three ranges mentioned above. For a given dataset $S_i$, the right-hand side of the above rule is evaluated and a value $m$ is assigned to the probability mass for a given type (suitable, marginal, or unsuitable). These probability masses can also be combined using Dempster's Combination Rule.
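The statistical approach and the fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the range thresholds (`low`, `high`), the mass values attached to each range, the heuristic masses, and the sample series are all hypothetical. The sketch also assumes that the series holds at least two complete periods of nonzero-mean values and uses the sample variance over those periods.

```python
import statistics
from itertools import product

THETA = frozenset("smu")  # s = suitable, m = marginal, u = unsuitable


def temporal_variability(values, period):
    """Average coefficient of variation over the time marks of one period.

    Assumption: `values` holds k >= 2 complete periods and every periodic
    mean is nonzero.
    """
    k = len(values) // period
    covs = []
    for j in range(period):
        column = [float(values[p * period + j]) for p in range(k)]
        covs.append(statistics.variance(column) ** 0.5 / statistics.mean(column))
    return sum(covs) / period


def mass_from_variability(c_bar, low=0.1, high=0.3):
    """Hypothetical rule set: one rule per range of the temporal variability."""
    if c_bar < low:
        return {frozenset("s"): 0.7, THETA: 0.3}
    if c_bar < high:
        return {frozenset("m"): 0.6, THETA: 0.4}
    return {frozenset("u"): 0.7, THETA: 0.3}


def combine(m_a, m_b):
    """Dempster's Combination Rule for two mass functions on the same frame."""
    joint, conflict = {}, 0.0
    for (a, w_a), (b, w_b) in product(m_a.items(), m_b.items()):
        c = a & b
        if c:
            joint[c] = joint.get(c, 0.0) + w_a * w_b
        else:
            conflict += w_a * w_b  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the combination is undefined")
    return {c: w / (1.0 - conflict) for c, w in joint.items()}


def bel(m, h):
    """Sum of the masses of all non-empty focal elements contained in h."""
    return sum(w for e, w in m.items() if e and e <= h)


def pl(m, h):
    """Sum of the masses of all focal elements that intersect h."""
    return sum(w for e, w in m.items() if e & h)


def confidence(m, h):
    """Product of belief and plausibility, as defined in the text."""
    return bel(m, h) * pl(m, h)


# Hypothetical inputs: masses from heuristic rules and a short series
# covering three periods with two time marks each.
heuristic = {frozenset("s"): 0.6, frozenset("sm"): 0.3, THETA: 0.1}
series = [10, 20, 12, 18, 8, 22]
statistical = mass_from_variability(temporal_variability(series, period=2))
fused = combine(heuristic, statistical)
suitable = frozenset("s")
print(bel(fused, suitable), pl(fused, suitable), confidence(fused, suitable))
```

Representing focal elements as `frozenset` values keeps the intersection in the combination rule and the subset test in `bel` as direct set operations, and leaving mass on the whole frame `THETA` is how a rule expresses that part of the evidence is uncommitted.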

Thus, at this point, there are two pieces of derived FoU values for a dataset. One is through the heuristic approach and the other through the statistical approach. Both FoU values represent a confidence in the dataset belonging to a particular type (suitable, marginal, or unsuitable). To obtain one single composite FoU out of the two values, yet another fusion can be performed. That is, to fuse the two derived information of varying FoU, one may simply treat each FoU as a mass for the dataset to belong to a particular qtype. Thus, by employing Dempster's Combination Rule, one can repeat the same process in order to obtain different mass values that support the notion that the dataset has a FoU of a certain type. This allows the FoUs to be fused at subsequent levels as well.

The idea of FoU has also been applied in several different contexts. De Bruin et al. (2001) have proposed an approach based on decision analysis where value of information and value of control were used. Value of information is the expected desirability of reducing or eliminating uncertainty in a chance node of a decision tree while value of control is the expected amount of control that one could affect the outcome of an uncertain event. Both of these values can be computed from the probability density of the spatial data. Vasseur et al. (2003) have proposed an ontology-driven approach to determine fitness of use of datasets. The approach includes conceptualization of the question and hypothesis of work to create the ontology of the problem, browsing and selecting existing sources informed by the metadata of the datasets available, appraisal of the extent of these databases matching or missing the expected data with the quality expected by the user, translation of the corresponding (matched) part into a query on the database, reformulation of the initial concepts by expanding the ontology of the problem, querying the actual databases with the query formulae, and final evaluation by the user to accept or reject the retrieved results. Further, Ahonen-Rainio and Kraak (2005) have investigated the use of sample maps to supplement the fitness for use of geospatial datasets.

Key Applications

The key category of applications for the proposed technique is to categorize or cluster spatio-temporal entities (objects) into similar groups. Through clustering, the techniques can be used to group a dataset into different clusters for knowledge discovery and data mining, classification, filtering, pattern recognition, decision support, knowledge engineering and visualization.

Drought Mitigation: Many critical decisions are made based on examining the relationship between the current value of a decision variable (e.g., precipitation) and its historic norms. Incorporating the fitness of use in the computation will make the decision making process more accurate.
Natural Resource Management: Many natural resources are currently being monitored using a distributed sensor network. The number of these networks continues to grow as the sensors and networking technologies become more affordable. The datasets are stored as typical time series data. When these datasets are used in various applications, it would be useful to incorporate the fitness of use values in the analysis.

Future Directions

The current and future work focuses on extending the above approach to compute the fitness of use for derived information and knowledge. FoU, for example, is applied directly to raw data. However, as data is manipulated, fused, integrated, filtered, cleaned, and so on, the derived metadata or information appears, and with it, an associated measure of fitness. This fitness can be based on the intrinsic FoU of the data that the information is based on and also on the technique that derives the information. For example, one may say that the statistical approach is more rigorous than the heuristic approach and thus should be given more mass or confidence. Likewise, this notion can

be extended to knowledge that is the result of using information. A piece of knowledge is, for example, a decision. Thus, by propagating FoU from data to information, and from information to knowledge, one can tag a decision with a confidence value.

Cross-References
▶ Crime Mapping and Analysis
▶ Data Collection, Reliable Real-Time
▶ Error Propagation in Spatial Prediction
▶ Indexing and Mining Time Series Data

References
Ahonen-Rainio P, Kraak MJ (2005) Deciding on fitness for use: evaluating the utility of sample maps as an element of geospatial metadata. Cartogr Geogr Inf Sci 32(2):101–112
Braun J (2000) Dempster-Shafer theory and Bayesian reasoning in multisensor data fusion. In: Sensor fusion: architectures, algorithms and applications IV. Proceedings of SPIE, vol 4051, Orlando, pp 255–266
Cremer F, den Breejen E, Schutte K (1998) Sensor data fusion for antipersonnel land mine detection. In: Proceedings of EuroFusion98, Great Malvern, pp 55–60
De Bruin S, Bregt A, Van De Ven M (2001) Assessing fitness for use: the expected value of spatial data sets. Int J Geogr Inf Sci 15(5):457–471
Konks D, Challa S (2005) An introduction to Bayesian and Dempster-Shafer data fusion. Available via DSTO-TR-1436, Edinburgh, Nov 2005. http://www.dsto.defence.gov.au/publications/2563/DSTO-TR-1436.pdf
Mihaila G, Raschid L, Vidal ME (1999) Querying "Quality of Data" metadata. In: Proceedings of the third IEEE meta-data conference, Bethesda, Apr 1999
Sentz K, Ferson S (2002) Combination of evidence in Dempster-Shafer belief theory. Available via SANDIA technical report SAND2002-0835. http://www.sandia.gov/epistemic/Reports/SAND2002-0835.pdf
Shafer G (1976) A mathematical theory of evidence. Princeton University Press, Princeton
Vasseur B, Devillers R, Jeansoulin R (2003) Ontological approach of the fitness of use of geospatial datasets. In: Proceedings of 6th AGILE conference on geographic information science, Lyon, pp 497–504
Yao X (2003) Research issues in spatiotemporal data mining. A white paper submitted to the University Consortium for Geographic Information Science (UCGIS) workshop on geospatial visualization and knowledge discovery, Lansdowne, 18–20 Nov 2003

Computing Performance
▶ Network GIS Performance

Conceptual Generalization of Databases
▶ Abstraction of Geodatabases

Conceptual Model
▶ Application Schema

Conceptual Modeling
▶ Spatiotemporal Database Modeling with an Extended Entity-Relationship Model

Conceptual Modeling of Geospatial Databases
▶ Modeling with ISO 191xx Standards

Conceptual Neighborhood
Anthony G. Cohn
School of Computing, University of Leeds, Leeds, UK

Synonyms
Closest topological distance; Continuity network; Qualitative similarity

Definition
A standard assumption concerning reasoning about spatial entities over time is that change is continuous. In qualitative spatial calculi, such

as the mereotopological RCC or 9-intersection calculi in which a small finite set of jointly exhaustive and pairwise disjoint sets of relations are defined, this can be represented as a conceptual neighborhood diagram (also known as a continuity network). A pair of relations $R_1$ and $R_2$ are conceptual neighbors if it is possible for $R_1$ to hold at a certain time, and $R_2$ to hold later, with no third relation holding in between. The diagram to be found in the definitional entry for mereotopology illustrates the conceptual neighborhood for RCC-8.

Cross-References
▶ Knowledge Representation, Spatial
▶ Mereotopology
▶ Representing Regions with Indeterminate Boundaries

Recommended Reading
Cohn AG, Hazarika SM (2001) Qualitative spatial representation and reasoning: an overview. Fundam Inf 46(1–2):1–29
Cohn AG, Renz J (2007) Qualitative spatial representation and reasoning. In: Lifschitz V, van Harmelen F, Porter F (eds) Handbook of knowledge representation, Ch. 13. Elsevier, München

Conceptual Schema
▶ Application Schema

Concurrency Control for Spatial Access
Jing (David) Dai¹ and Chang-Tien Lu²
¹Google, New York City, NY, USA
²Department of Computer Science, Virginia Tech, Falls Church, VA, USA

Synonyms
Concurrency control protocols; Concurrent spatial operations; Simultaneous spatial operations

Definition
Concurrency control for spatial access method refers to the techniques providing the serializable operations in multi-user spatial databases. Specifically, the concurrent operations on spatial data should be safely executed and follow the ACID rules (i.e., Atomicity, Consistency, Isolation, and Durability). With concurrency control, multi-user spatial databases can process the search and update operations correctly without interfering with each other.

The concurrency control techniques for spatial databases have to be integrated with particular spatial access methods to process simultaneous operations. There are two major concerns in concurrency control for the spatial access method. One is how to elevate the throughput of concurrent spatial operations and the other is concerned with preventing phantom access.

Main Text
In the last several decades, spatial data access methods have been proposed and developed to manage multi-dimensional databases as required in GIS, computer-aided design, and scientific modeling and analysis applications. In order to apply the widely studied spatial access methods in real applications, particular concurrency control protocols are required for multi-user environments. The simultaneous operations on spatial databases need to be treated as exclusive operations without interfering with one another.

The existing concurrency control protocols mainly focus on the R-tree family. Most of them were developed based on the concurrency protocols on the B-tree family. Based on the locking strategy, these protocols can be classified into two categories, namely, link-based approaches and lock-coupling methods.

Concurrency control for spatial access methods is generally required in commercial database management systems. In addition, concurrency control methods are required in many specific spatial applications, such as the taxi management

systems that need to continuously query the locations of taxis. The study on spatial concurrency control is far behind the research on spatial query processing approaches. There are two interesting and emergent directions in this field. One is to apply concurrency control methods on complex spatial operations, such as nearest neighbor search and spatial join; the other is to design concurrency control protocols for moving object applications.

Cross-References
▶ Indexing, Hilbert R-Tree, Spatial Indexing, Multimedia Indexing

Recommended Reading
Chakrabarti K, Mehrotra S (1999) Efficient concurrency control in multi-dimensional access methods. In: Proceedings of ACM SIGMOD international conference on management of data, Philadelphia
Kornacker M, Mohan C, Hellerstein J (1997) Concurrency and recovery in generalized search trees. In: Proceedings of ACM SIGMOD international conference on management of data, Tucson
Song SI, Kim YH, Yoo JS (2004) An enhanced concurrency control scheme for multidimensional index structure. IEEE Trans Knowl Data Eng 16(1):97–111

Concurrency Control for Spatial Access Method
Jing (David) Dai¹ and Chang-Tien Lu²
¹Google, New York City, NY, USA
²Department of Computer Science, Virginia Tech, Falls Church, VA, USA

Synonyms
Concurrency control protocols; Concurrent spatial operations; Phantom update protection; Simultaneous spatial operations

Definition
The concurrency control for spatial access method refers to the techniques providing the serializable operations in multi-user spatial databases. Specifically, the concurrent operations on spatial data should be safely executed and follow the ACID rules (i.e., Atomicity, Consistency, Isolation, and Durability). With concurrency control, multi-user spatial databases can perform the search and update operations correctly without interfering with each other.

There are two major concerns in the concurrency control for spatial data access. One is the throughput of concurrent spatial operations. The throughput refers to the number of operations (i.e., search, insertion, and deletion) that are committed within each time unit. It is used to measure the efficiency of the concurrency control protocols. The other concern is to prevent phantom access. Phantom access refers to the update operation that occurs before the commitment and in the ranges of a search/deletion operation, while not reflected in the results of that search/deletion operation. The ability to prevent phantom access can be regarded as a certain level of consistency and isolation in ACID rules.

Historical Background
In the last several decades, spatial data access methods have been proposed and developed to manage multi-dimensional databases as required in GIS, computer-aided design, and scientific modeling and analysis applications. Representative spatial data access methods are R-trees (Guttman 1984), and space-filling curve with B-trees (Gaede and Gunther 1998). As shown in Fig. 1a, the R-tree groups spatial objects into Minimum Bounding Rectangles (MBR), and constructs a hierarchical tree structure to organize these MBRs. Differently, Fig. 1b shows the space-filling curve which splits the data space into equal-sized rectangles and uses their particular curve (e.g., Hilbert curve) identifications to index the objects in the cells

into one-dimensional access methods, e.g., the B-tree family.

Concurrency Control for Spatial Access Method, Fig. 1 Representative spatial access methods

In order to apply the widely studied spatial access methods to real applications, particular concurrency control protocols are required for the multi-user environment. The simultaneous operations on spatial databases need to be treated as exclusive operations without interfering with each other. In other words, the results of any operation have to reflect the current stable snapshot of the spatial database at the commit time.

The concurrency control techniques for spatial databases have to be integrated with spatial access methods to process simultaneous operations. Most of the concurrency control techniques were developed for one-dimensional databases. However, the existing spatial data access methods, such as R-tree family and grid files, are quite different from the one-dimensional data access methods (e.g., overlaps among data objects and among index nodes are allowed). Therefore, the existing concurrency control methods are not suitable for these spatial databases. Furthermore, because the spatial data set usually is not globally ordered, some traditional concurrency control techniques such as link-based protocols are difficult to adapt to spatial databases.

Spatial Concurrency Control Techniques
Since the last decade of the twentieth century, concurrency control protocols on spatial access methods have been proposed to meet the requirements of multi-user applications. The existing concurrency control protocols mainly focus on the R-tree family, and most of them were developed based on the concurrency protocols on the B-tree family. Based on the locking strategy, these protocols can be classified into two categories, namely, link-based methods and lock-coupling methods.

The link-based methods rely on a pseudo global order of the spatial objects to isolate each concurrent operation. These approaches process update operations by temporarily disabling the links to the indexing node being updated so that the corresponding search operations will not retrieve any inconsistent data. For instance, to split node A into A1 and A2 in Fig. 2, a lock

will be requested to disable the link from A to its right sibling node B (step a) before the actual split is performed. Then, a new node A2 will be created in step b by using the second half of A, and linked to node B. In step c, A will be modified to be A1 (by removing the second half), and then unlocked. Node F will be locked before adding a link from F to A2 in step d. Finally, F will be unlocked in step e, and thus the split is completed. Following this split process, no search operations can access A2, and no update operations can access A (A1) before step c. Therefore, the potential conflicts caused by concurrent update operations on node A can be prevented. As one example of the link-based approach, R-link tree, a right-link style algorithm (Kornacker and Banks 1995), has been proposed to protect concurrent operations by assigning logical sequence numbers (LSNs) on the nodes of R-trees. This approach assures each operation has at most one lock at a time. However, when propagating node splits and MBR updates, this algorithm uses lock coupling. Also, in this approach, additional storage is required to maintain additional information, e.g., LSNs of associated child nodes. Concurrency on the Generalized Search Tree (CGiST) (Kornacker et al. 1997) protects concurrent operations by applying a global sequence number, the Node Sequence Number (NSN). The counter of the NSN is incremented in a node split and a new value is assigned to the original node with the new sibling node receiving the original node's prior NSN and right-link pointer. In order for an insert operation to execute correctly in this algorithm, multiple locks on two or more levels must be held. Partial lock coupling (PLC) (Song et al. 2004) has been proposed to apply a link-based technique to reduce query delays due to MBR updates for multi-dimensional index structures. The PLC technique provides high concurrency by using lock coupling only in MBR shrinking operations, which are less frequent than expansion operations.

Concurrency Control for Spatial Access Method, Fig. 2 Example of node split in link-based protocol (panels: a Lock A; b Create A2; c Modify A; d Lock F and Modify F; e Unlock F)

The lock-coupling-based algorithms (Chen et al. 1997; Ng and Kamada 1993) release the lock on the current node only when the lock on the next node to be visited has been granted while processing search operations. As shown in Fig. 3, using the R-tree in Fig. 1a, suppose objects C, E, D, and F are indexed by an R-tree with two leaf nodes A and B. A search window WS can be processed using the lock-coupling approach. The locking sequence in Fig. 3 can protect this search operation from reading the intermediate results of update operations as well as the results of update operations submitted after WS. During node splitting and MBR updating,

this scheme holds multiple locks on several nodes simultaneously. The dynamic granular locking approach (DGL) has been proposed to provide phantom update protection (discussed later) in R-trees (Chakrabarti and Mehrotra 1998) and GiST (Chakrabarti and Mehrotra 1999). The DGL method dynamically partitions the embedded space into lockable granules that can adapt to the distribution of the objects. The lockable granules are defined as the leaf nodes and external granules. External granules are additional structures that partition the non-covered space in each internal node to provide protection. Following the design principles of DGL, each operation requests locks only on sufficient granules to guarantee that any two conflicting operations will request locks on at least one common granule.

Concurrency Control for Spatial Access Method, Fig. 3 Example of locking sequence using lock-coupling for WS (sequence shown in the figure: 1. Lock(Root, S); 2. Lock(A, S); 3. Unlock(Root, S); 4. GetObject(A); 5. Unlock(A, S))

Concurrency Control for Spatial Access Method, Fig. 4 Example of phantom update (object deleted by WU: D; objects selected by WS: C, E, D)

Scientific Fundamentals

ACID Rules
Concurrency control for spatial access methods should assure the spatial operations are processed following the ACID rules (Ramakrishnan and Gehrke 2001). These rules are defined as follows.

Atomicity: Either all or no operations are completed.
Consistency: All operations must leave the database in a consistent state.
Isolation: Operations cannot interfere with each other.
Durability: Successful operations must survive system crashes.

The approaches to guarantee Atomicity and Durability in traditional databases can be applied in spatial databases. Current research on spatial concurrency control approaches mainly focuses on the Consistency and Isolation rules. For example, in order to retrieve the valid records, spatial queries should not be allowed to access the intermediate results of location updates. Similarly, the concurrent location updates with common coverage have to be isolated as sequential execution; otherwise, they may not be processed correctly.

Phantom Update Protection
In addition to the ACID rules, phantom update protection is used to measure the effectiveness of a concurrency control. An example of phantom update is illustrated in Fig. 4, where C, E, D, and F are objects indexed in an R-tree, and leaf nodes A, B are their parents, respectively. A deletion with the window WU is completed before the commitment of the range query WS. The range query returns the set {C, E, D}, even though object D
290 Concurrency Control for Spatial Access Method

Concurrency Control for Spatial Access Method, Fig. 5 Example of efficient concurrency control (diagram labels: A, B, C, D, E, F, WU, WS)
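
The situation in Fig. 5 can be illustrated with a small sketch. It assumes axis-aligned rectangular windows given as (xmin, ymin, xmax, ymax) tuples; the function names and the coordinates are hypothetical and are not taken from the cited protocols. The point is only that two operations touching the same leaf node need to be serialized when their windows actually overlap.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """Return True if two axis-aligned rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def node_level_conflict(leaf: Rect, update: Rect, query: Rect) -> bool:
    """Coarse test: both operations touch the same leaf node."""
    return intersects(leaf, update) and intersects(leaf, query)


def window_level_conflict(update: Rect, query: Rect) -> bool:
    """Finer test: the operations can interact only if their windows overlap."""
    return intersects(update, query)


# Hypothetical coordinates loosely modeled on Fig. 5.
leaf_a = (0.0, 0.0, 10.0, 10.0)
w_u = (1.0, 1.0, 3.0, 3.0)  # update window WU
w_s = (6.0, 6.0, 9.0, 9.0)  # search window WS

print(node_level_conflict(leaf_a, w_u, w_s))  # True
print(window_level_conflict(w_u, w_s))        # False
```

Locking at the leaf-node level would serialize these two operations, whereas a finer lockable granule would let them proceed concurrently, at the cost of more locks to maintain.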

should have been deleted by WU. A solution to prevent phantom update in this example is to lock the area affected by WU (which is D ∪ WU) in order to prevent the execution of WS.

Measurement
The efficiency of concurrency control for spatial access methods is measured by the throughput of concurrent spatial operations. The key to providing high throughput is reducing the number of unnecessary conflicts among locks. For the example shown in Fig. 5, even if the update operation with window WU and the range query with window WS intersect with the same leaf node A, they will not affect each other's results. Therefore, they should be allowed to access A simultaneously. Obviously, the smaller the lockable granules, the more concurrent operations will be allowed. However, this may significantly increase the number of locks in the database, and therefore generate additional overhead on lock maintenance. This is a tradeoff that should be considered when designing concurrency control protocols.

Key Applications

Concurrency control for spatial access methods is generally required in commercial multidimensional database systems. These systems are designed to provide efficient and reliable data access. Usually, they are required to reliably handle a large number of simultaneous queries and updates. Therefore, sound concurrency control protocols are required in these systems.
In addition, concurrency control methods are required in many specific spatial applications which have frequent updates or need fresh query results. For instance, a mobile advertise/alarm system needs to periodically broadcast time-sensitive messages to cell phone users within a certain range. Concurrency control methods should be employed to protect the search process from frequent location updates, because the updates are not supposed to reveal their intermediate or expired results to the search process. Another example is a taxi management system that needs to assign a nearest available taxi based on a client's request. Concurrency control methods need to be applied to isolate the taxi location updating and queries so that the query results are consistent with the up-to-date snapshot of the taxi locations.

Future Directions

The study on spatial concurrency control is far behind the research on spatial query processing approaches. There are two interesting and emergent directions in this field. One is to apply concurrency control methods on complex spatial operations; the other is to design concurrency control protocols for moving object applications.
Complex spatial operations, such as spatial join, k-nearest neighbor search, range nearest neighbor search, and reverse nearest neighbor search, require special concern on concurrency control to be applied in multi-user applications. For example, how to protect the changing search range, and how to protect the large overall search range have to be carefully designed. Furthermore, the processing methods of those complex operations may need to be redesigned based on the concurrency control protocol in order to improve the throughput.
Spatial applications with moving objects have attracted significant research efforts. Even though many of these applications assume that the query processing is based on main memory, their frequent data updates require
Conflation of Features 291

sophisticated concurrency control protocols to assure the correctness of the continuous queries. In this case, concurrency access framework will be required to support the frequent location updates of the moving objects. Frequent update operations usually result in a large number of exclusive locks which may significantly degrade the throughput. Solutions to improve the update speed and reduce the coverage of operations have to be designed to handle this scenario.

Cross-References

▶ Indexing, Hilbert R-tree, Spatial Indexing, Multimedia Indexing

References

Chakrabarti K, Mehrotra S (1998) Dynamic granular locking approach to phantom protection in R-trees. In: Proceedings of IEEE international conference on data engineering, Orlando
Chakrabarti K, Mehrotra S (1999) Efficient concurrency control in multi-dimensional access methods. In: Proceedings of ACM SIGMOD international conference on management of data, Philadelphia
Chen JK, Huang YF, Chin YH (1997) A study of concurrent operations on R-trees. Inf Sci Int J 98(1–4):263–300
Gaede V, Gunther O (1998) Multidimensional access methods. ACM Comput Surv 30(2):170–231
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of ACM SIGMOD international conference on management of data, Boston
Kornacker M, Banks D (1995) High-concurrency locking in R-trees. In: Proceedings of international conference on very large data bases, Zurich
Kornacker M, Mohan C, Hellerstein J (1997) Concurrency and recovery in generalized search trees. In: Proceedings of ACM SIGMOD international conference on management of data, Tucson
Ng V, Kamada T (1993) Concurrent accesses to R-trees. In: Proceedings of symposium on advances in spatial databases, Singapore
Ramakrishnan R, Gehrke J (2001) Database management systems, 2nd edn. McGraw-Hill, New York
Song SI, Kim YH, Yoo JS (2004) An enhanced concurrency control scheme for multidimensional index structure. IEEE Trans Knowl Data Eng 16(1):97–111

Concurrency Control Protocols

▶ Concurrency Control for Spatial Access Method

Concurrent Processing

▶ Spatial Data Analytics on Homogeneous Multi-Core Parallel Architectures

Concurrent Spatial Operations

▶ Concurrency Control for Spatial Access Method

Conditional Spatial Regression

▶ Spatial and Geographically Weighted Regression

Conflation

▶ Ontology-Based Geospatial Data Integration
▶ Positional Accuracy Improvement (PAI)

Conflation of Features

Sharad Seth and Ashok Samal
Department of Computer Science and Engineering, The University of Nebraska at Lincoln, Lincoln, NE, USA

Synonyms

Automated Map Compilation; Data Integration; Entity Integration; Feature Matching; Realignment; Rubber-Sheeting; Vertical Conflation

Definition

In GIS, conflation is defined as the process of combining geographic information from overlapping sources so as to retain accurate data, minimize redundancy, and reconcile data conflicts (Longley et al. 2001). The need for conflation typically arises in updating legacy data for accuracy or missing features/attributes by reference to newer data sources with overlapping coverage. For example, the street-name and address-range data from the US Census Bureau can be conflated with the spatially accurate USGS digital-line-graph (DLG) to produce a more accurate and useful source than either dataset. Conflating vector GIS data with raster data is also a common problem.
Conflation can take many different forms. Horizontal conflation refers to the matching of features and attributes in adjacent GIS sources for the purpose of eliminating positional and attribute discrepancies in the common area of the two sources. Vertical conflation solves a similar problem for GIS sources with overlapping coverage. As features are the basic entities in a GIS, the special case of feature conflation has received much attention in the published research. The data used for conflation are point, line, and area features and their attributes.
Figure 1 illustrates a problem solved by feature conflation. The first two GIS data layers show a digital ortho-photo and a topographic map of the Mall area in Washington D.C. In the third layer on the right, showing an overlay of the two sources, the corresponding features do not exactly line up. With conflation, these discrepancies can be minimized, thus improving the overall accuracy of the data sources.

Historical Background

Until the 1980s, the collection of geographical information in digital form was expensive enough that having multiple sources of data for the same region, as required for conflation, was possible only for large governmental organizations. It is not surprising, therefore, that early use of conflation was initiated by governmental agencies. The applications related to the automation of map compilation for transferring positional information from a base map to a non-georeferenced target map. An iterative process, called alignment or rubber-sheeting, was used to bring the coordinates of the two maps into mutual consistency. The latter term alludes to stretching the target map that is printed on a rubber sheet so as to align it with the base map at all points. Although contemplated many years earlier (White 1981), the first semi-automated systems for alignment came into existence only in the mid-1980s. These interactive systems were screen-based and image-driven (Lynch and Saalfeld 1985). The operator was allowed, and even assisted, to select a pair of intersections to be matched. With each additional selected pair, the two maps were brought into closer agreement.
Fully automated systems, later developed in a joint project between the US Geological Survey (USGS) and the Bureau of Census, aimed at consolidating the agencies' 5,700 pairs of metropolitan map sheet files (Saalfeld 1988). This development was facilitated by parallel advances in computer graphics devices, computational-geometry algorithms, and statistical pattern recognition. Automation of the process required replacing the operator's skills at discerning like features with an analogous feature-matching algorithm on the computer.
Alignment may be thought of as a mathematical transformation of one image that preserves topology. A single global transformation may be insufficient to correct errors occurring due to local distortions, thus necessitating local alignments for different regions of the image. Delaunay triangulation defined by selected points is preferred for rubber-sheeting because it minimizes the formation of undesirable thin, long triangles.
Early work in feature conflation was based on proximity of features in non-hierarchical ("flattened") data sources. Because GIS data are typically organized into a hierarchy of classes and carry much information that is not position-related, such as names, scalar quantities, and

Conflation of Features, Fig. 1 Two GIS data layers for Washington DC and their overlay

geometrical shapes, the methods used to discover identical objects can go beyond proximity matches and include rule-based approaches (Cobb et al. 1998), string matching, and shape similarity (Samal et al. 2004).

Scientific Fundamentals

A prototype of a feature conflation system is shown in Fig. 2. Such a system would typically form the back end of a geographic information and decision-support system used to respond to user queries for matched features. Some systems may not implement all three steps while others may further refine some of the steps, e.g., like-feature detection may be split into two steps that either use or ignore the geographical context of the features during comparison. Further details of the basic steps appear in the following sections.

Conflation of Features, Fig. 2 Feature conflation steps (diagram labels: Geo-Source, Geo-Source, Geo-Source; Registration; Like-Feature Detection; Similarity Sets; Feature Matching; Consolidated Data)

Registration and Rectification
Registration refers to a basic problem in remote sensing and cartography of realigning a recorded digital image with known ground truth or another image. An early survey in geographic data processing (Nagy and Wagle 1979) formulates the registration problem in remote sensing as follows:

The scene under observation is considered to be a 2D intensity distribution $f(x, y)$. The recorded digital image, another 2D distribution $g(u, v)$, is related to the true scene $f(x, y)$ through an unknown transformation $T$:

$$g(u, v) = T(f(x, y)).$$

Thus, in order to recover the original information from the recorded observations, we must first determine the nature of the transformation $T$, and then execute the inverse operation $T^{-1}$ on this image.

Often, because only indirect information is available about $T$, in the form of another image or map of the scene in question, the goal of registration becomes finding a mathematical transformation on one image that would bring it into concurrence with the other image. Geometric

distortions in the recorded image, which affect only the position and not the magnitude, can be corrected by a rectification step that only transforms the coordinates.

Like-Feature Detection
The notion of similarity is fundamental to matching features, as it is to many other fields, including pattern recognition, artificial intelligence, information retrieval, and psychology. While the human view of similarity may be subjective, automation requires objective (quantitative) measures.
Similarity and distance are complementary concepts. It is often intuitively appealing to define a distance function $d(A, B)$ between objects $A$ and $B$ in order to capture their dissimilarity and convert it to a normalized similarity measure by its complement:

$$s(A, B) = 1 - \frac{d(A, B)}{U}, \tag{1}$$

where the normalization factor $U$ may be chosen as the maximum distance between any two objects that can occur in the data set. The normalization makes the value of similarity a real number that lies between zero and one.
Mathematically, any distance function must satisfy the properties of minimality ($d(a, b) \geq d(a, a) \geq 0$), symmetry ($d(a, b) = d(b, a)$), and triangular inequality ($d(a, b) + d(b, c) \geq d(a, c)$). However, in human perception studies, the distance function must be replaced by the "judged distance" for which all of these mathematical axioms have been questioned (Santini and Jain 1999). Tversky (1977) follows a set-theoretic approach in defining similarities between two objects as a function of the attributes that are shared by the two or by one but not the other. His definition is not required to follow any of the metric axioms. It is particularly well suited to fuzzy attributes with discrete overlapping ranges of values.
In GIS, the two objects being compared often have multiple attributes, such as name, location, shape, and area. The name attribute is often treated as a character string for comparison using the well-known Hamming or Levenshtein metric aimed at transcription errors. An alternative measure of string comparison, based on their phonetic representation (Hall and Dowling 1980), may be better suited to transcription errors. However, the names are often word phrases that may look very different as character strings, but connote the same object, e.g., "National Gallery of Art" and "National Art Gallery". Table 1 (Samal et al. 2004) shows the type of string errors that string matching should accommodate.
For locations or points, the Euclidean distance is commonly used for proximity comparison. A generalization to linear features, such as streets or streams, is the Hausdorff distance, which denotes the largest minimum distance between the two linear objects. Goodchild and Hunter (1997) describe a less computer-intensive and robust method that relies on comparing two representations with varying accuracy. It estimates the percentage of the total length of the low-accuracy representation that is within a specified distance of the high-accuracy representation.
The shape is an important attribute of polygonal features in GIS, such as building outlines and region boundaries. As polygons can be regarded as linear features, the Goodchild and Hunter approach may be adapted to define shape comparison. A two-step process is described for this purpose by Samal et al. (2004). First, a veto is imposed if the aspect ratios are significantly different. Otherwise, the shapes are scaled to match the lengths of their major axes and overlaid by aligning their center points. The similarity of a less accurate shape A to a more accurate shape B is the percentage of A within the buffer zone of B (see Fig. 3). When the accuracy of the two sources is comparable, the measure could be taken as the average value of the measure computed both ways.
Comparing scalars (reals or integers) seems to be straightforward: take the difference as their distance and convert to similarity by using Eq. (1). The normalization factor $U$, however, must be chosen carefully to match intuition. For example, one would say that the pair of numbers 10 and 20 is less similar than the pair 123,010 and 123,020, even though the difference is the

Conflation of Features, Table 1 Typical string errors and differences that matching should accommodate

Error type               Sample 1                    Sample 2
Word omission            Abraham Lincoln Memorial    Lincoln Memorial
Word substitution        Reagan National Airport     Washington National Airport
Word transposition       National Art Gallery        National Gallery of Art
Word abbreviation        National Archives           Nat'l Archives
Character omission       Washington Monument         Washington Monument
Character substitution   Frear Gallery               Freer Gallery
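
As a rough illustration of how the error types in Table 1 might be handled, the sketch below combines a character-level Levenshtein distance with a word-level comparison that ignores word order. The tokenization, the stop-word list, and the function names are assumptions made for this example; they are not the method of Samal et al. (2004).

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete a character of s
                            curr[j - 1] + 1,      # insert a character of t
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_similarity(s: str, t: str) -> float:
    """Normalized similarity 1 - d/U, with U the length of the longer string."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


STOP_WORDS = {"of", "the"}  # assumed list of words to ignore


def word_similarity(s: str, t: str) -> float:
    """Jaccard overlap of the two word sets; insensitive to word order."""
    a = {w for w in s.lower().split() if w not in STOP_WORDS}
    b = {w for w in t.lower().split() if w not in STOP_WORDS}
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


print(levenshtein("Frear Gallery", "Freer Gallery"))  # 1
print(word_similarity("National Art Gallery", "National Gallery of Art"))  # 1.0
print(word_similarity("Abraham Lincoln Memorial", "Lincoln Memorial"))
```

The character-level measure suits single-character substitutions and omissions, while the word-level measure tolerates transposition and, to a lesser degree, omission. Word abbreviation and substitution would need additional knowledge, such as a lookup table, which this sketch does not attempt.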

Conflation of Features, Fig. 3 Two polygons and their buffered intersection

Conflation of Features, Fig. 4 Two similar features and their geographic contexts

same in both cases. Hence, the normalization factor should be equated with the magnitude of the range of values defined for scalars a and b.
The hierarchical nature of GIS data makes it possible to also assess the similarities of two objects along their categorical structure. A knowledge base of spatial categories can be built using Wordnet (Fellbaum 1998) and Spatial Data Transfer Standard (SDTS) (Rodriguez and Egenhofer 2003).

Context
Clearly, context plays an important role in the human perception of similarity. The similarity measures described above, however, are context-independent: two features are compared in isolation without reference to other features on their respective sources. Context-independent similarity measures alone are not always sufficient to determine feature matches unambiguously, necessitating the use of some form of context to resolve such cases.
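
The context-independent measures discussed above can be sketched in a few lines. The code below applies Eq. (1) to scalars and to sampled linear features, using a discrete Hausdorff distance over vertex lists. Taking U as the larger magnitude of the two scalars is only one possible reading of the normalization advice; that choice, the function names, and the sample coordinates are assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def similarity(distance: float, u: float) -> float:
    """Eq. (1): s = 1 - d / U, clamped so the result stays in [0, 1]."""
    if u <= 0:
        raise ValueError("normalization factor U must be positive")
    return max(0.0, 1.0 - distance / u)


def scalar_similarity(a: float, b: float) -> float:
    """Scalar similarity with U taken as the larger magnitude (an assumption)."""
    u = max(abs(a), abs(b))
    return 1.0 if u == 0 else similarity(abs(a - b), u)


def directed_hausdorff(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Largest distance from a vertex of p to its nearest vertex of q."""
    return max(min(math.dist(a, b) for b in q) for a in p)


def hausdorff(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Symmetric discrete Hausdorff distance between two vertex lists."""
    return max(directed_hausdorff(p, q), directed_hausdorff(q, p))


print(scalar_similarity(10, 20))          # 0.5
print(scalar_similarity(123010, 123020))  # about 0.99992

# Two hypothetical street centerlines, offset by half a unit.
street_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
street_b = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]
print(hausdorff(street_a, street_b))      # 0.5
```

With this choice of U, the pair 10 and 20 scores lower than the pair 123,010 and 123,020 although both differ by 10. The vertex-based Hausdorff distance only approximates the distance between the continuous lines, so densely sampled vertices give a closer estimate.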

The geographic context is defined as the spatial relationships between objects in an area (Samal et al. 2004). Examples of such relationships include topologies, distances, and directions. Topological relationships, such as disjoint, meet, overlap, and covering, are used by the researchers at the University of Maine to model the context of areal features in applications involving query by sketch and similarity of spatial scenes (Bruns and Egenhofer 1996). Distances and angles of a feature to other features have also been used to represent the geographic context (Samal et al. 2004). Figure 4 shows the geographic contexts of two nearby features with similar shapes. The contexts can be seen to be different enough to disambiguate these two features when they are compared with a candidate feature in another source. Further, to keep the cost of context-dependent matching under control, it may be enough to define the geographic context with respect to only a small number of well-chosen landmark features.

Feature Matching

The similarity measures discussed above for individual attributes of a feature must be combined in some fashion to provide overall criteria for feature matching. According to Cobb et al. (1998), "The assessment of feature match criteria is a process in which evidence must be evaluated and weighed and a conclusion drawn, not one in which equivalence can be unambiguously determined . . . after all, if all feature pairs matched exactly, or deviated uniformly according to precise processes, there would be no need to conflate the maps!"
The problem can be approached as a restricted form of the classification problem in pattern recognition: Given the evidence provided by the similarity scores of different attributes of two features, determine the likelihood of one feature belonging to the same class as the other feature. Because of this connection, it is not surprising that researchers have used well-known techniques from pattern recognition to solve the feature matching problem. These techniques include clustering (Baraldi and Blonda 1999) and fuzzy logic (Zadeh 1965).
For example, similarity of two buildings appearing in different GIS data layers (as in Fig. 1a, b) could be established by comparing their individual attributes, such as shape and coordinates. These context-independent measures, however, may not be sufficient and it may become necessary to use the geographical context to resolve ambiguities or correct errors.

Key Applications

Coverage consolidation: Data gathering is the most expensive part of building a geographical information system (GIS). In traditional data gathering, this expense is directly related to the standards of rigor used in data collection and data entry. Feature conflation can reduce the cost of GIS data acquisition by combining inexpensive sources into a superior source. With the widespread use of the Web and GPS, the challenge in consolidation is shifting from improving accuracy to integrating an abundance of widely distributed sources by automated means.
Spatial data update: By identifying common and missing features between two sources through feature conflation, new features can be added to an old source or their attributes updated from a newer map.
Coverage registration: Non-georeferenced spatial data must be registered before it can be stored in a GIS. Good registration requires choosing a number of features for which accurate geo-positional information is available and which are also spatially accurate on the source. Spatial data update can help in identifying good candidate features for registration.
Error detection: Feature conflation can not only tell which features in two sources are alike, but also provide a degree of confidence for these assertions. The pairs with low confidence can be checked manually for possible errors.

Future Directions

Conflation in GIS can be thought of as part of the broader problems in the information age of searching, updating, and integration of data.
Conflation of Geospatial Data 297

Because the sense of place plays such an important role in our lives, all kinds of non-geographical data related to history and culture can be tied to a place and thus become a candidate for conflation. In this view, geographical reference becomes a primary key used by search engines and database applications to consolidate, filter, and access the vast amount of relevant data distributed among many data sources. The beginnings of this development can already be seen in the many applications already in place or envisaged for Google Earth and other similar resources. If the consolidated data remains relevant over a period of time and finds widespread use, it might be stored and used as a new data source, much in the same fashion as the results of conflation are used today.
The traditional concern in conflation for positional accuracy will diminish in time with the increasing penetration of GPS in consumer devices and the ready availability of the accurate position of all points on the earth. The need for updating old data sources and integrating them with new information, however, will remain an invariant.

Cross-References

▶ Conflation of Geospatial Data
▶ Geospatial Semantic Integration
▶ Ontology-Based Geospatial Data Integration

References

Baraldi A, Blonda P (1999) A survey of fuzzy clustering algorithms for pattern recognition. IEEE Trans Syst Man Cybern I–II 29(6):778–785
Bruns H, Egenhofer M (1996) Similarity of spatial scenes. In: Molenaar M, Kraak MJ (eds) Proceedings of the 7th international symposium on spatial data handling. Taylor and Francis, London, pp 31–42
Cobb M, Chung MJ, Foley H III, Petry FE, Shaw KB (1998) A rule-based approach for the conflation of attributed vector data. Geoinformatica 2(1):7–35
Fellbaum C (ed) (1998) Wordnet: an electronic lexical database. MIT, Cambridge
Goodchild MF, Hunter GJ (1997) A simple positional accuracy measure for linear features. Int J Geogr Inf Sci 11(3):299–306
Hall P, Dowling G (1980) Approximate string matching. ACM Comput Surv 12(4):381–402
Longley PA, Goodchild MF, Maguire DJ, Rhind DW (2001) Geographic information systems and science. Wiley, Chichester
Lynch MP, Saalfeld A (1985) Conflation: automated map compilation, a video game approach. In: Proceedings of the auto-carto 7 ACSM/ASP, Falls Church, 11 Mar 1985
Nagy G, Wagle S (1979) Geographic data processing. ACM Comput Surv 11(2):139–181
Rodriguez A, Egenhofer M (2003) Determining semantic similarity among entity classes from different ontologies. IEEE Trans Knowl Data Eng 15:442–456
Saalfeld A (1988) Conflation: automated map compilation. Int J GIS 2(3):217–228
Samal A, Seth S, Cueto K (2004) A feature-based approach to conflation of geospatial sources. Int J GIS 18(5):459–589
Santini S, Jain R (1999) Similarity measures. IEEE Trans Pattern Anal Mach Intell 21(9):871–883
Tversky A (1977) Features of similarity. Psychol Rev 84:327–352
White M (1981) The theory of geographical data conflation. Internal Census Bureau draft document
Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353

Recommended Reading

Rodriguez A, Egenhofer M (1999) Assessing similarity among geospatial feature class definitions. In: Vckorski A, Brasel KE, Schek HJ (eds) Lecture notes in computer science, vol 1580. Springer, Berlin, pp 189–202

Conflation of Geospatial Data

Ching-Chien Chen1 and Craig A. Knoblock2
1 Geosemble Technologies, El Segundo, CA, USA
2 Department of Computer Science, University of Southern California, Marina del Rey, CA, USA

Synonyms

Computer cartography; Geospatial data alignment; Geospatial data reconciliation; Imagery conflation

Definition

Geospatial data conflation is the compilation or reconciliation of two different geospatial datasets covering overlapping regions (Saalfeld 1988). In general, the goal of conflation is to combine the best quality elements of both datasets to create a composite dataset that is better than either of them. The consolidated dataset can then provide additional information that cannot be gathered from any single dataset.
Based on the types of geospatial datasets dealt with, the conflation technologies can be categorized into the following three groups:

Vector to vector data conflation: A typical example is the conflation of two road networks of different accuracy levels. Figure 1 shows a concrete example to produce a superior dataset by integrating two road vector datasets: road network from US Census TIGER/Line files, and road network from the department of transportation, St. Louis, MO (MO-DOT data).
Vector to raster data conflation: Fig. 2 is an example of conflating a road vector dataset with a USGS 0.3 m per pixel color image. Using the imagery as the base dataset for position, the conflation technique can correct the vector locations and also annotate the image with appropriate vector attributes (as Fig. 2b).
Raster to raster data conflation: Fig. 3 is an example of conflating a raster street map (from MapQuest) with a USGS image. Using the imagery as the base dataset for position, the conflation technique can create intelligent images that combine the visual appeal and accuracy of imagery with the detailed attributes often contained in maps (as Fig. 3b).

Also note that although the examples shown in Figs. 1, 2, and 3 are the conflation of datasets covering the same region (called vertical conflation), the conflation technologies can also be applied to merge adjacent datasets (called horizontal conflation).

Historical Background

For a number of years, significant manual effort has been required to conflate two geospatial datasets by identifying features in two datasets that represent the same real-world features, then aligning spatial attributes and non-spatial attributes of both datasets. Automated vector to vector conflation was first proposed by Saalfeld (1988), and the initial focus of conflation was using geometrical similarities between spatial attributes (e.g., location, shape, etc.) to eliminate the spatial inconsistency between two overlapping vector maps. In particular, Saalfeld (1988) discussed mathematical theories to support the automatic process. Since then, various vector to vector conflation techniques have been proposed (Walter and Fritsch 1999; Ware and Jones 1998) and many GIS systems (such as Conflex (http://www.digitalcorp.com/conflex.htm)) have been implemented to achieve the alignments of geospatial datasets. More recently, with the proliferation of attributed vector data, attribute information (i.e., non-spatial information) has become another prominent feature used in the conflation systems, such as ESEA MapMerger (http://www.esea.com/products/) and the system developed by Cobb et al. (1998).
Most of the approaches mentioned above focus on vector to vector conflation by adapting different techniques to perform the matching. However, due to the rapid advances in remote sensing technology from the 1990s to capture high resolution imagery and the ready accessibility of imagery over the Internet, such as Google Maps (http://maps.google.com/) and Microsoft TerraService (http://terraservice.net/), the conflation with imagery (such as vector to imagery conflation, imagery to imagery conflation and raster map to imagery conflation) has become one of the central issues in GIS. The objectives of imagery-related conflation are, of course, to take full advantage of updated high resolution imagery to improve out-of-date GIS data and to display the ground truth in depth with attributes inferred from other data sources (as the examples shown in Figs. 2b and 3b). Due to

Conflation of Geospatial Data, Fig. 1 An example of vector to vector conflation

the natural characteristics of imagery (or, more generally, geospatial raster data), the matching strategies used in conflation involve more image-processing or pattern recognition technologies. Some proposed approaches (Cobb et al. 1998; Flavie et al. 2000) rely on edge detections or interest-point detections to extract and convert features from imagery to vector formats, and then apply vector to vector conflation to align them. Other approaches (Agouris et al. 2001; Chen et al. 2006a; Eidenbenz et al. 2000), however, utilize the existing vector data as prior knowledge to perform a vector-guided image processing. Conceptually, the spatial information on the vector data represents the existing knowledge about the approximate location and shape of the counterpart elements in the image, thus improving the accuracy and running time to detect matched features from the image. Meanwhile, there are also numerous research activities (Chen et al. 2004a; Seedahmed and Martucci 2002; Dare and Dowman 2000) focusing on conflating different geospatial raster datasets. Again, these approaches perform diverse image-processing techniques to detect and match counterpart elements, and then geometrically align these raster datasets so that the respective pixels or their derivatives (edges, corner points, etc.) representing the same underlying spatial structure are fused.

Conflation of Geospatial Data, Fig. 2 An example of vector to raster data conflation (Modified figure from Chen et al. 2006a)

Today, with the popularity of various geospatial data, automatic geospatial data conflation is an area of active research. Consequently, there are various commercial products, such as MapMerger and Conflex, supporting automatic vector to vector data conflation with limited human intervention. However, there are no commercial products to provide automatic vector to raster or raster to raster conflation.

Scientific Fundamentals

A geospatial data conflation system requires efficient and robust geometric and statistical algorithms, and image processing and pattern recognition techniques to implement a rather broad spectrum of mathematical theories. The framework of the conflation process can be generalized into the following steps: (1) Feature matching: Find a set of conjugate point pairs, termed control

Conflation of Geospatial Data, Fig. 3 An example of raster map to imagery conflation (Modified figure from Chen et al. 2004a)

point pairs, in two datasets, (2) Match checking: Filter inaccurate control point pairs from the set of control point pairs for quality control, and (3) Spatial attribute alignment: Use the accurate control points to align the rest of the geospatial objects (e.g., points or lines) in both datasets by using space partitioning techniques (e.g., triangulation) and geometric interpolation techniques.

During the late 1980s, Saalfeld (1988) initiated the study of automating the conflation process. He provided a broad mathematical context for conflation theory. In addition, he proposed an iterative conflation paradigm based on the above-mentioned conflation framework by repeating the matching and alignment until no further new matches are identified. In particular, he investigated the techniques to automatically construct the influence regions around the control points to reposition other features into alignment by appropriate local interpolation (i.e., to automate the third step in the above-mentioned conflation framework). The conclusion of Saalfeld's work is that Delaunay triangulation is an effective strategy to partition the domain space into triangles (influence regions) to define local adjustments (see the example in Fig. 4). A Delaunay triangulation is a triangulation of the point set with the property that no point falls in the interior of the circumcircle of any triangle (the circle passing through the three triangle vertices). The Delaunay triangulation maximizes the minimum angle of all the angles in the triangulation, thus avoiding elongated, acute-angled triangles. The triangle vertices (i.e., control points) of each triangle define the local transformation within each triangle to reposition other features. The local transformation used for positional interpolation is often the affine transformation, which consists of a linear transformation (e.g., rotation and scaling) followed by a translation. An

affine transformation can preserve collinearity and topology. The well-known technique, rubber-sheeting (imagine stretching a dataset as if it were made of rubber), typically refers to the process comprising triangle-based space partition and the transformation of features within each triangle. What Saalfeld discovered had a profound impact upon conflation techniques. From then on, the rubber-sheeting technique (with some variants) has been widely used in conflation algorithms, because of the sound mathematical theories and because of its success in many practical examples. In fact, these days, most commercial conflation products support piecewise rubber-sheeting. Because rubber-sheeting has become a commonly known strategy to geometrically merge datasets based on the control points, many algorithms have been invented around this conflation paradigm with a major focus on solving the matching (correspondence) problem to find accurate control point pairs (i.e., to automate the first two steps in the above-mentioned conflation framework). However, feature matching algorithms differ with the types of datasets undergoing the match operation. In the following, we discuss existing conflation (matching) technologies based on the types of geospatial datasets dealt with.

Vector to vector conflation: There have been a number of efforts to automatically or semi-automatically accomplish vector to vector

Conflation of Geospatial Data, Fig. 4 An example of Delaunay triangulation based on control points (Modified figure from Chen et al. 2006a)
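
The local affine transformation inside one triangle of such a triangulation can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions: the function names (`barycentric`, `rubber_sheet_point`) are hypothetical, the triangle containing the point is assumed to have been found already (for example from a Delaunay triangulation of the control points), and the affine map is expressed through barycentric weights, which gives the same result as the unique affine transformation that sends the three source control points to their counterparts.

```python
def barycentric(p, a, b, c):
    """Return the barycentric weights (wa, wb, wc) of point p in triangle abc."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle: control points are collinear")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def rubber_sheet_point(p, src_tri, dst_tri):
    """Move p with the affine map that sends src_tri onto dst_tri.

    src_tri and dst_tri are lists of three (x, y) control points in
    corresponding order (hypothetical inputs for this sketch).
    """
    wa, wb, wc = barycentric(p, *src_tri)
    (ax, ay), (bx, by), (cx, cy) = dst_tri
    return (wa * ax + wb * bx + wc * cx, wa * ay + wb * by + wc * cy)


# Control points in the dataset to be moved, and their matched counterparts.
src = [(0, 0), (4, 0), (0, 4)]
dst = [(0, 0), (8, 0), (0, 8)]
moved = rubber_sheet_point((1, 2), src, dst)
```

Each control point is carried exactly onto its counterpart, and every other feature vertex inside the triangle is interpolated between them, which is why neighbouring triangles that share an edge agree along that edge.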

conflation. Most of the existing vector-vector conflation algorithms focus on road vector data. These approaches differ in the methods utilized for locating the counterpart elements from both vector datasets. The major approaches include:
• Matching vector data based on the similarities of geometric information (such as nodes and lines) (Saalfeld 1988; Walter and Fritsch 1999; Ware and Jones 1998).
• Matching attribute-annotated vector data based on the similarities of vector shapes as well as the semantic similarities of vector attributes (Cobb et al. 1998).
• Matching vector data with unknown coordinates based on the feature point (e.g., the road intersection) distributions (Chen et al. 2006b).

Vector to imagery conflation: Vector to imagery (and vector to raster) conflation, on the other hand, mainly focuses on developing effective and efficient image processing techniques to resolve the correspondence problem. The major approaches include:
• Detecting all salient edges from imagery and then comparing with vector data (Filin and Doytsher 2000).
• Utilizing vector data to identify corresponding image edges based on the (modified) Snakes algorithm (Agouris et al. 2001; Kass et al. 1987).
• Utilizing stereo images, elevation data and knowledge about the roads (e.g., parallel lines and road marks) to compare vector and imagery (Eidenbenz et al. 2000).
• Exploiting auxiliary spatial information (e.g., the coordinates of imagery and vector, the shape of roads around intersections, etc.) and non-spatial information (e.g., the image color/resolution and road widths) to perform a localized image processing to compute the correspondence (Chen et al. 2004b, 2006a). Figure 2b is the example result based on this technology.

Raster to raster conflation: In general, raster to raster conflation (e.g., imagery to imagery conflation and map to imagery conflation) requires more data-specific image processing techniques to identify the corresponding features from raster data. Some existing approaches, for example, include:
• Conflating two images by extracting and matching various features (e.g., edges and feature points) across images (Seedahmed and Martucci 2002; Dare and Dowman 2000).
• Conflating a raster map and imagery by computing the relationship between two feature point sets detected from the datasets (Chen et al. 2004a). In this approach, these feature points are generated by exploiting auxiliary spatial information (e.g., the coordinates of imagery, the orientations of road segments around intersections from the raster map, etc.) and non-spatial information (e.g., the image resolution and the scale of raster maps). Figure 3b is the example result based on this technology.

Key Applications

Conflation technologies are used in many application domains, most notably the sciences and domains using high quality spatial data such as GIS.

Cartography
It is well known that computers and mathematical methods have had a profound impact upon cartography. There has been a massive proliferation of geospatial data, and no longer is the traditional paper map the final product. In fact, the focus of cartography has shifted from map production to the presentation, management and combination of geospatial data. Maps can be produced on demand for specialized purposes. Unfortunately, the data used to produce maps may not always be consistent. Geospatial data conflation can be used to address this issue. For example, we can conflate out-of-date maps with up-to-date imagery to identify inconsistencies.

GIS
Geographic information provides the basis for many types of decisions ranging from economic and community planning, land and natural resource management, health, safety and military services. Improved geographic data should lead to better conclusions and better decisions. In general, superior data would include greater positional accuracy, topological consistency and abundant attribution information. Conflation technology, of course, plays a major role in producing high quality data for various GIS applications requiring high-quality spatial data.

Computational Geometry
Although originally the conflation technology is intended for consolidating geospatial datasets that are known to contain the same features, the methods employed in conflation can be adapted for other applications. For example, the variants of rubber-sheeting techniques are widely used to support general spatial interpolation. The point or line matching algorithms, in turn, can be used in a broad spectrum of geometric object comparisons.

Aerial Photogrammetry
With the wide availability of high resolution aerial photos, there is a pressing need to analyze aerial photos to detect changes or extract up-to-date features. In general, the problem of extracting features from imagery has been an area of active research for the last 25 years and, given the current state-of-the-art, will be unlikely to provide near-term fully-automated solutions to the feature extraction problem. For many regions, there are detailed feature datasets that have already been constructed, but these may need to be conflated with the current imagery. Conflating and correlating vector data with aerial photos is more likely to succeed over a pure feature extraction approach since we are able to exploit significant prior knowledge about the properties of the features to be extracted from the photos. Furthermore, after conflation, the attribution information contained in the vector dataset can be used to annotate spatial objects to better understand the context of the photos.

Homeland Security
The conflation of geospatial data can provide insights and capabilities not possible with individual data. It is important to the national interest that this automatic conflation problem be addressed since significant statements concerning the natural resources, environment, urban settlements, and particularly internal or Homeland Security, are dependent on the results of accurate conflation of geospatial datasets such as satellite images and geospatial vector data including transportation, hydrographic and cadastral data.

Military Training and Intelligence
Many military training and preparation systems require high quality geospatial data for correctly building realistic training environments across diverse systems/applications. An integrated view of geographic datasets (especially satellite imagery and maps) can also help military intelligence analysts to more fully exploit the information contained in maps (e.g., road/railroad networks and textual information from the map, such as road names and gazetteer data) for analyzing imagery (i.e., identify particular targets, features, and other important geographic characteristics) and use the information in imagery to confirm information in maps. The geospatial data conflation technique is the key technology to accomplish this.

Crisis Management
In a crisis, such as a large fire, a category 5 hurricane, or a dirty bomb explosion, emergency personnel must have access to relevant geographic information quickly. Typically, geographic data, such as maps and imagery, are important data sources for personnel who are not already familiar with a local area. The conflation technology enables emergency personnel to rapidly integrate the maps, vector data, and imagery for a local area to provide an integrated geographic view of an area of interest.

Transportation Data Update
Many GIS applications require the road vector data for navigation systems. These days, up-to-date high resolution imagery is often utilized to
Conflation of Geospatial Data, Fig. 5 An example of parcel vector data to imagery conflation: (a) before conflation, (b) after conflation

verify and update road vector data. The ability to automatically conflate the original road vector data with images supports more efficient and accurate updates of road vector data.

Real Estate
With the growth of the real estate market, there are many online services providing real estate records by superimposing the parcel boundaries on top of high-resolution imagery to show the location of parcels on imagery. However, as is typically the case in integrating different geospatial datasets, a general problem in combining parcel vector data with imagery from different sources is that they rarely align (as shown in Fig. 5a). These displacements can mislead the interpretation of parcel and land use data. As the example in Fig. 5 shows, parcel data are often represented as polygons and include various attributes such as ownership information, mailing address, acreage, market value and tax information. The cities and counties use this information for watershed and flood plain modelling, neighborhood and transportation planning. Furthermore, various GIS applications rely on parcel data for more accurate geocoding. By conflating parcel vector data and imagery, the detailed attribution information provided by the parcel data (as in the example shown in Fig. 5b) can be combined with the visible information provided by the imagery. Therefore, the conflation of these datasets can provide cost savings for many applications, such as county, city, and state planning, or integration of diverse datasets for more accurate address geocoding or emergency response.

Future Directions

With the rapid improvement of geospatial data collection techniques, the growth of the Internet and the implementation of Open GIS standards, a large amount of geospatial data is now readily available. There is a pressing need to combine these datasets together using conflation technology. Although there has been significant progress on automatic conflation technology in the last few years, there is still much work to be done. Important research problems include, but are not limited to, the following: (1) resolving discrepancies between datasets with very different levels of resolution and thematic focus, (2) extending existing technologies to handle a broad range of datasets (in addition to road networks), such as

elevation data and hydrographic data, (3) allowing for uncertainty in the feature matching stage, and (4) improving the processing time (especially for raster data) to achieve conflation on the fly.

Cross-References

► Change Detection
► Intergraph: Real-Time Operational Geospatial Applications
► Photogrammetric Applications
► Uncertain Environmental Variables in GIS
► Voronoi Diagram

References

Agouris P, Stefanidis A, Gyftakis S (2001) Differential snakes for change detection in road segments. Photogramm Eng Remote Sens 67(12):1391–1399
Chen C-C, Knoblock CA, Shahabi C, Chiang Y-Y, Thakkar S (2004a) Automatically and accurately conflating orthoimagery and street maps. In: Proceedings of the 12th ACM international symposium on advances in geographic information systems, Washington, DC
Chen C-C, Shahabi C, Knoblock CA (2004b) Utilizing road network data for automatic identification of road intersections from high resolution color orthoimagery. In: Proceedings of the second workshop on spatiotemporal database management (co-located with VLDB2004), Toronto
Chen C-C, Knoblock CA, Shahabi C (2006a) Automatically conflating road vector data with orthoimagery. Geoinformatica 10(4):495–530
Chen C-C, Shahabi C, Knoblock CA, Kolahdouzan M (2006b) Automatically and efficiently matching road networks with spatial attributes in unknown geometry systems. In: Proceedings of the third workshop on spatiotemporal database management (co-located with VLDB2006), Seoul
Cobb M, Chung MJ, Miller V, Foley H III, Petry FE, Shaw KB (1998) A rule-based approach for the conflation of attributed vector data. GeoInformatica 2(1):7–35
Dare P, Dowman I (2000) A new approach to automatic feature based registration of SAR and SPOT images. Int Arch Photogramm Remote Sens 33(B2):125–130
Eidenbenz C, Kaser C, Baltsavias E (2000) ATOMI – automated reconstruction of topographic objects from aerial images using vectorized map information. Int Arch Photogramm Remote Sens 33(Part 3/1):462–471
Filin S, Doytsher Y (2000) A linear conflation approach for the integration of photogrammetric information and GIS data. Int Arch Photogramm Remote Sens 33:282–288
Flavie M, Fortier A, Ziou D, Armenakis C, Wang S (2000) Automated updating of road information from aerial images. In: Proceedings of American society photogrammetry and remote sensing conference, Amsterdam
Kass M, Witkin A, Terzopoulos D (1987) Snakes: active contour models. Int J Comput Vis 1(4):321–331
Saalfeld A (1988) Conflation: automated map compilation. Int J Geogr Inf Sci 2(3):217–228
Seedahmed G, Martucci L (2002) Automated image registration using geometrical invariant parameter space clustering (GIPSC). In: Proceedings of the photogrammetric computer vision, Graz
Walter V, Fritsch D (1999) Matching spatial data sets: a statistical approach. Int J Geogr Inf Sci 5(1):445–473
Ware JM, Jones CB (1998) Matching and aligning features in overlayed coverages. In: Proceedings of the 6th ACM symposium on geographic information systems, Washington, DC


Conflict Resolution

► Computing Fitness of Use of Geospatial Datasets
► Smallworld Software Suite


Consequence Management

► Emergency Evacuations, Transportation Networks


Conservation Medicine

► Exploratory Spatial Analysis in Disease Ecology


Constrained Nearest Neighbor Queries

► Variations of Nearest Neighbor Queries in Euclidean Space
Constraint Data, Visualizing 307

Constraint Data, Visualizing

Shasha Wu
Department of Mathematics, Computer Science and Physics, Spring Arbor University, Spring Arbor, MI, USA

Synonyms

Constraint database visualization; Isometric color bands displays

Definition

In general, visualization is any technique for creating images, diagrams, or animations in order to present any message. Scientific visualization is an application of computer graphics which is concerned with the presentation of potentially huge quantities of laboratory, simulation, or abstract data to aid cognition, hypotheses building, and reasoning.

In a spatial database system, spatial information is usually stored in the format of raster data or vector data. To visualize raster data, the visualization application has to convert the geographical information associated with each pixel into a specific color and present each pixel individually. To visualize vector data, the visualization application has to identify the geometrical primitives such as points, lines, curves, and polygons, convert the original geospatial coordinate system to the screen coordinate system, associate a particular color to each shape, and then output those shapes through the drawing functions provided by the operating system.

The geometrical primitives used by vector data are all based upon mathematical equations to represent images in computer graphics. Constraint databases use linear equality and inequality constraints as their primitive data type to represent spatial data. That makes constraint databases a natural solution for storing, retrieving, and displaying vector-based spatial data.

The visualization of spatial data in a constraint database system is the process of transforming the vector-based spatial data, which is represented by linear equality and linear inequality constraints in a disjunctive normal form (DNF) (Revesz 2002; Rigaux et al. 2003), into a set of points, segments, and convex polygons, associating a color with the individual shapes, and then projecting those shapes onto the output devices.

For example, Fig. 1 is the visualization result of the following three linear constraint relations:

PointA(x, y) :- x = 0, y = 5.
LineAB(x, y) :- x ≥ 0, y ≥ 0, x + 2y = 10.
PolygonC.i; x; y/ W - i D 1; x 2y 5; x C y 15;
x 5:
PolygonC.i; x; y/ W - i D 2; x C 2y 5x C y 25;
x 3y 5; x 5:
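
How a relation in disjunctive normal form answers the question "is this point part of the object?" can be sketched as follows. This is a minimal illustration and not the storage format of any particular constraint database: the tuple layout `(a, b, op, c)`, read as a*x + b*y op c, and the function names are assumptions made for the example. The two relations encoded are the point and line from the example above, read as PointA: x = 0, y = 5 and LineAB: x ≥ 0, y ≥ 0, x + 2y = 10.

```python
EPS = 1e-9  # tolerance for floating-point comparisons

# Comparison operators allowed in a linear constraint.
OPS = {
    "=": lambda lhs, rhs: abs(lhs - rhs) <= EPS,
    "<=": lambda lhs, rhs: lhs <= rhs + EPS,
    ">=": lambda lhs, rhs: lhs >= rhs - EPS,
}


def satisfies(constraint, x, y):
    """Check one linear constraint (a, b, op, c), meaning a*x + b*y op c."""
    a, b, op, c = constraint
    return OPS[op](a * x + b * y, c)


def in_relation(relation, x, y):
    """A relation is a list of constraint tuples (a disjunction);
    each constraint tuple is a list of constraints (a conjunction)."""
    return any(all(satisfies(con, x, y) for con in tup) for tup in relation)


POINT_A = [[(1, 0, "=", 0), (0, 1, "=", 5)]]
LINE_AB = [[(1, 0, ">=", 0), (0, 1, ">=", 0), (1, 2, "=", 10)]]

on_line = in_relation(LINE_AB, 4, 3)       # 4 + 2*3 = 10, inside the quadrant
off_line = in_relation(LINE_AB, 12, -1)    # on the line but y < 0
```

A polygon stored as several convex pieces would simply contribute one constraint tuple per piece to the same list.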

Constraint databases are well suited for animation because they allow any granularity for the animation without requiring much data storage (Revesz 2002). Beyond that, the ability of representing spatiotemporal and non-spatiotemporal data in an identical format and the support of recursive queries make constraint databases a good approach for many difficult visualization problems, such as the visualization of recursively defined
Constraint Data, Visualizing, Fig. 1 Visualization of point, polyline, and polygon in constraint databases (labels A, B, C, 1, and 2; x axis from 0 to 20, y axis from 0 to 10)

spatiotemporal concepts discussed in Revesz and Wu (2004, 2006).

Although most existing constraint database systems can only visualize 2-D spatiotemporal objects, they can be extended to visualize three or even higher-dimensional spatiotemporal objects. By introducing new variables into the linear constraints, constraint databases can represent higher-dimensional objects similar to 2-D objects. The visualization of those objects is reduced to a process of visualizing the union of basic higher-dimensional blocks.

Historical Background

Constraint databases, including spatial constraint databases, were proposed by Kanellakis, Kuper, and Revesz (1990). They showed in Kanellakis et al. (1995) that efficient, declarative database programming can be combined with efficient constraint solving and suggested that the constraint database framework can be applied to manage spatial data. A few years later, several spatial constraint database systems, such as the MLPQ system (Revesz and Li 1997), the CCUBE system (Brodsky et al. 1997), the DEDALE system (Grumbach et al. 1998), and the CQA/CDB system (Goldin et al. 2003), were developed. During the development of those systems, convex polygons were the major visualization blocks presenting the outputs of the spatial constraint database systems. Extreme point data models like the rectangles data model (Revesz 2002) and Worboys data model are also implemented in some constraint database systems. For example, the MLPQ system implements both the regular polygon visualization and the parametric rectangle visualization. The latter, named the PReSTO system, implements several special animation features like Collide and Block. With the increased number of applications developed from the spatial constraint database systems, the aim of efficiently and naturally visualizing sophisticated spatial or spatiotemporal constraint data attracts more and more attention.

Scientific Fundamentals

Static Displays
Any 2-D static display can be reduced to the visualization of points, polylines, and polygons. In constraint databases, a point can be directly represented by linear equations over two variables (x, y). For example, point A(1,1) in Fig. 2 can be represented as

A(x, y) :- x = 1, y = 1.

It is a trivial problem to visualize a point with x and y coordinates. Things are a little bit more complex for polylines and polygons.

The line segment between points B(1,3) and C(3,1) can be represented as

BC(x, y) :- x + y = 4, x ≥ 1, x ≤ 3, y ≥ 1, y ≤ 3.
Constraint Data, Visualizing, Fig. 2 Representing spatial object by convex polygon(s) (points A(1,1), B(1,3), and C(3,1); x and y axes from 0 to 3)

A polygon can be either a convex polygon or a non-convex polygon. A convex polygon can be directly represented by a set of conjunctive linear inequality constraints over the two variables (x, y). A non-convex polygon must first be partitioned into convex components. Then, it can be represented by constraint databases and visualized through the union of the convex components.

This way, any vector data can be represented by disjunctive normal form formulas of linear equations and linear inequality constraints over x and y (Revesz 2002). For example, the triangle formed by vertices A(1,1), B(1,3), and C(3,1) can be represented by the following conjunction of linear inequality constraints:

ABC(x, y) :- x ≥ 1, y ≥ 1, x + y ≤ 4.    (1)

Most of the original spatial objects are described by polygons. To represent those objects, constraint databases have to first decompose polygons into convex components. For some kinds of polygons, this is an NP-hard problem (O'Rourke and Supowit 1983). But for most of the commonly used polygons, it is possible to find polynomial polygon partition algorithms (Chazelle and Dobkin 1979; Keil 1985; Schachter 1978). Among all of them, the simplest algorithm is polygon triangulation, which represents a polygon through a set of triangles. However, triangulation results in a large, sometimes prohibitive, number of convex components in the partition. Given a polygon with n vertices, the number of triangles in the partition is n − 2. To solve the problem, Keil (1985) proposed an algorithm that can generate an optimal number of convex components in the partition for most types of polygons. However, the time complexity of his algorithm is O(N^2 n log n). To reduce the time complexity, a less optimal algorithm is proposed and implemented in Rigaux et al. (2003), in which the polygon is first triangulated and then the adjacent triangles are merged to reduce the number of convex components in the result.

Animation of Spatiotemporal Objects
Each spatiotemporal object has a spatial extent and a temporal extent. The spatial extent represents the set of points in space that belong to the object. The temporal extent represents the set of time instances when the object exists. The shape and the location of a 2-D spatiotemporal object may change over time. In constraint databases, each 2-D spatiotemporal object is represented by linear constraints over the three variables (x, y, t) in a disjunctive normal form formula (Revesz 2002).

There are two different animation methods, as shown in Fig. 3, to visualize the spatiotemporal constraints:

The naive animation method works directly on constraint databases. For each time instance ti, it instantiates the variable t to ti in the original linear constraint tuple, which yields linear constraint relations over only the two spatial variables x and y, and from these it finds the extreme points of the polygon. Then it calls the graphic library functions provided by the operating system to output the polygon. The whole computation is executed every time the user requests an animation display. It is a time-consuming process and often causes many delays and jumps in the animation.
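
The naive method can be illustrated with a short sketch. It is written under several assumptions that are not taken from any existing system: each constraint is stored as `(a, b, c, d)`, meaning a*x + b*y + c*t ≤ d, the object is a single bounded convex polygon, and the drawing call is left out so that the functions only return the extreme points for each frame. The function names are hypothetical.

```python
from itertools import combinations
from math import atan2

EPS = 1e-9


def instantiate(constraints, t):
    """Substitute a time value into a*x + b*y + c*t <= d,
    leaving half-planes (a, b, d') over x and y only."""
    return [(a, b, d - c * t) for a, b, c, d in constraints]


def extreme_points(halfplanes):
    """Vertices of the convex polygon defined by half-planes a*x + b*y <= d."""
    points = []
    for (a1, b1, d1), (a2, b2, d2) in combinations(halfplanes, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < EPS:
            continue  # parallel boundaries never meet
        x = (d1 * b2 - d2 * b1) / det
        y = (a1 * d2 - a2 * d1) / det
        feasible = all(a * x + b * y <= d + EPS for a, b, d in halfplanes)
        duplicate = any(abs(x - px) < EPS and abs(y - py) < EPS for px, py in points)
        if feasible and not duplicate:
            points.append((x, y))
    if not points:
        return []
    cx = sum(px for px, _ in points) / len(points)
    cy = sum(py for _, py in points) / len(points)
    # Order the vertices counter-clockwise so they can be drawn as a polygon.
    return sorted(points, key=lambda p: atan2(p[1] - cy, p[0] - cx))


def naive_frames(constraints, times):
    """Recompute the extreme points from scratch for every requested time."""
    return [extreme_points(instantiate(constraints, t)) for t in times]


# A unit square moving right at speed 1: t <= x <= t + 1, 0 <= y <= 1.
square = [(-1, 0, 1, 0), (1, 0, -1, 1), (0, -1, 0, 0), (0, 1, 0, 1)]
frames = naive_frames(square, [0, 1, 2])
```

Every frame repeats the pairwise intersection and feasibility test, which is the per-request cost that the text identifies as the source of delays.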
Constraint Data, Visualizing, Fig. 3 Naive and parametric animation methods (See Fig. 16.11 in Revesz 2002). Naive method: the x, y, t constraints are instantiated at t = 1, 2, ..., n into x, y constraints, from which the extreme points of the convex polygon are computed for display. Parametric method: a preprocessing step derives x = x(t) and y = y(t), which give the extreme points of the convex polygon at t = 1, 2, ..., n for display.

The parametric animation method has a preprocessing step and a display step to speed up the animation. The preprocessing step is executed at the time the constraint relation is loaded or constructed. It first computes the extreme points of each polygon based on its constraint tuple. Then, each polygon is describable by a sequence of extreme points. Finally, each extreme point is represented by parametric functions x = x(t) and y = y(t), which will be kept in memory until the close of the constraint relation. The display step is executed every time the user requests an animation display. After the user specifies the range and the granularity of time for the animation and sends the request to the system, the extreme point parametric functions are loaded and the time variable t is instantiated several times based on the required granularity. It generates a sequence of polygon outputs and sequentially and smoothly outputs them onto the monitor. This way, the spatiotemporal data are visualized as an animation display.

Key Applications

The visualization of spatial constraint databases is similar to the visualization of other GIS systems, such as the ARC/GIS system. However, the power of efficiently describing infinite spatial and spatiotemporal data and the support of recursive queries make the visualization of spatial constraint databases more attractive for complex problems like visualization of the recursively defined spatiotemporal concepts (Revesz and Wu 2004). These applications typically include problems where various kinds of spatial and spatiotemporal information such as maps, population, meteorology phenomena, and moving objects are represented and visualized. The following are some examples of such applications.
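
The split between preprocessing and display in the parametric animation method described earlier in this section can be sketched as follows. This is only an illustration under a simplifying assumption that is not stated in the source: every extreme point is taken to move linearly in time, so its parametric functions can be fitted from the vertex positions at two time instances. The function names `preprocess` and `display` are hypothetical, and the drawing itself is omitted.

```python
def preprocess(vertices_t0, vertices_t1, t0, t1):
    """Fit x(t) = x0 + vx*t and y(t) = y0 + vy*t for each extreme point.

    vertices_t0 and vertices_t1 list the same extreme points, in the same
    order, at times t0 and t1 (linear motion is assumed in this sketch).
    """
    functions = []
    for (xa, ya), (xb, yb) in zip(vertices_t0, vertices_t1):
        vx = (xb - xa) / (t1 - t0)
        vy = (yb - ya) / (t1 - t0)
        functions.append((xa - vx * t0, vx, ya - vy * t0, vy))
    return functions


def display(functions, start, end, step):
    """Instantiate t over the requested range and granularity.

    Only the stored parametric functions are evaluated; no constraint
    solving happens at display time.
    """
    count = int(round((end - start) / step))
    frames = []
    for i in range(count + 1):
        t = start + i * step
        frames.append([(x0 + vx * t, y0 + vy * t) for x0, vx, y0, vy in functions])
    return frames


# Done once, when the relation is loaded.
funcs = preprocess([(0, 0), (1, 0), (1, 1), (0, 1)],
                   [(1, 0), (2, 0), (2, 1), (1, 1)], 0, 1)
# Done on every animation request.
frames = display(funcs, start=0, end=2, step=1)
```

Changing the granularity only changes how often the stored functions are evaluated, which is why the display step stays cheap.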

Visualization Functions of the MLPQ Constraint Database System
The MLPQ constraint database system implemented many visualization operators. For example, the Complement operator returns the complement of the given spatial object. The Difference operator generates the difference between two spatial objects. Three commonly used visualization operators are described as follows.

2D Animation
The MLPQ constraint database system can display the spatiotemporal relations in animations. It provides a set of buttons for the user to control the displaying of the animation. The animation button allows the user to set the start and end time, the time interval of two frames, and the speed of the animation. The play and playback buttons play the animation forward and backward, respectively. The first, forward, next, and last buttons allow the user to navigate between frames.

Block
Some spatial objects like light and fire are formed by a set of independent points. If some of the points are blocked by the presence of another object, the rest of the points just continue moving along the trajectory determined by the transformation function. The Block operator is designed to visualize such situations in the constraint databases. It takes two relations and a time instance tk as the inputs and returns a new relation that represents the points of the first relation at time instance tk that are not blocked by the second relation at any time before tk. Based on the Block operator, people can easily visualize spatial objects such as the shadow of a ball or show how a lake can block the movement of a fire in the forest.

Collide
A common scenario between moving objects is the collision. The Collide operator is designed to visualize the collision situation in animation. It assumes an extra attribute for spatiotemporal objects called mass. Suppose there are two objects that do not change their shape and that are traveling at uniform speed defined by the transition function in the constraint databases. The Collide operator will generate a new relation that expresses the motion of the objects before and after collision. This operator can be used to visualize applications like the crash of two cars or the contact of two billiard balls.

Applications Based on Recursively Defined Concepts
Visualization of recursively defined concepts is a general problem that appears in many areas. For example, drought areas based on the Standardized Precipitation Index (SPI) and long-term air pollution areas based on safe and critical level standards are recursively defined concepts. In Revesz and Wu (2004), a general and efficient representation and visualization method was proposed to display recursively defined spatiotemporal concepts. Sample applications such as visualization of drought and pollution areas were implemented to illustrate the method.

Applications for Epidemiology
Efficient computerized reasoning about epidemics is important to public health and national security, but it is a difficult task because epidemiological data are usually spatiotemporal, recursive, and fast changing, hence hard to handle in traditional relational databases and geographic information systems. In Revesz and Wu (2006), a particular epidemiological system called WeNiVIS was implemented based on the visualization of spatiotemporal constraint databases. It enables the visual tracking of the West Nile virus epidemic in Pennsylvania and helps people to predict the high-risk areas.

Future Directions

Spatial constraint databases provide an efficient way to store and query spatial or spatiotemporal data over the Internet. There are a growing number of web-based spatial constraint database applications. Most of the applications ask for high-level visualization methods of constraint data to improve their user interfaces. Methods to enhance
312 Constraint Database Queries

the performance of visualizing spatial constraint databases over the Internet are being developed.

By adding one more parameter in the database, spatial constraint databases can easily represent 3-D data. However, at the time of writing this entry, only primitive visualization methods like isometric color bands are implemented in some existing constraint database systems. For example, the map can be visualized by discrete color zones according to the values of variable z, which can be used to represent the value of elevation, precipitation, or temperature. Using 2-D images to visualize 3-D objects is a temporary solution for this problem. Compared to the 3-D visualization of other commercial GIS systems, this method has many restrictions on the 3-D objects to be visualized, and the result is not impressive. Implementations of real 3-D visualization of constraint databases are being developed.

Cross-References

Constraint Database Queries
Constraint Databases and Data Interpolation
Constraint Databases and Moving Objects
Constraint Databases, Spatial
MLPQ Spatial Constraint Database System
Raster Data
Vector Data

References

Brodsky A, Segal V, Chen J, Exarkhopoulo P (1997) The CCUBE constraint object-oriented database system. Constraints 2(3–4):245–277
Chazelle B, Dobkin D (1979) Decomposing a polygon into its convex parts. In: Proceedings of 11th annual ACM symposium on theory of computing, Atlanta, pp 38–48
Goldin D, Kutlu A, Song M, Yang F (2003) The constraint database framework: lessons learned from CQA/CDB. In: Proceedings of international conference on data engineering, Bangalore, pp 735–737
Grumbach S, Rigaux P, Segoufin L (1998) The DEDALE system for complex spatial queries. In: Proceedings of ACM SIGMOD international conference on management of data, Seattle, pp 213–224
Kanellakis PC, Kuper GM, Revesz P (1990) Constraint query languages. In: Proceedings of ACM symposium on principles of database systems, Nashville, pp 299–313
Kanellakis PC, Kuper GM, Revesz P (1995) Constraint query languages. J Comput Syst Sci 51(1):26–52
Keil JM (1985) Decomposing a polygon into simpler components. SIAM J Comput 14(4):799–817
O'Rourke J, Supowit KJ (1983) Some NP-hard polygon decomposition problems. IEEE Trans Inf Theory 29(2):181–190
Revesz P (2002) Introduction to constraint databases. Springer, New York
Revesz P, Li Y (1997) MLPQ: a linear constraint database system with aggregate operators. In: Proceedings of 1st international database engineering and applications symposium. IEEE Press, Washington DC, pp 132–137
Revesz P, Wu S (2004) Visualization of recursively defined concepts. In: Proceedings of the 8th international conference on information visualization. IEEE Press, Washington DC, pp 613–621
Revesz P, Wu S (2006) Spatiotemporal reasoning about epidemiological data. Artif Intell Med 38(2):157–170
Rigaux P, Scholl M, Segoufin L, Grumbach S (2003) Building a constraint-based spatial database system: model, languages, and implementation. Inf Syst 28(6):563–595
Schachter I (1978) Decomposition of polygons into convex sets. IEEE Trans Comput 27(11):1078–1082

Constraint Database Queries

Lixin Li
Department of Computer Sciences, Georgia Southern University, Statesboro, GA, USA

Synonyms

Constraint query languages; Datalog, SQL; Logic programming language

Definition

A database query language is a special-purpose programming language designed for retrieving information stored in a database. Structured query language (SQL) is a very widely used commercially marketed query language for relational databases. Different from conventional

programming languages such as C, C++, or Java, a SQL programmer only needs to specify the properties of the information to be retrieved, but not the detailed algorithm required for retrieval. Because of this property, SQL is said to be declarative. In contrast, conventional programming languages are said to be procedural.

To query spatial constraint databases, any query language can be used, including SQL. However, Datalog is probably the most widely used rule-based query language for spatial constraint databases because of its power of recursion. Datalog is also declarative.

Historical Background

The Datalog query language is based on the logic programming language Prolog. The history of Datalog queries and logic programming is discussed in several textbooks such as Ramakrishnan (1998), Silberschatz et al. (2006), and Ullman (1989). Early work on constraint logic programming was done by Jaffar and Lassez (1987). The concepts of constraint data model and query language were explored by Kanellakis et al. (1990, 1995). Recent books on constraint databases are Kuper et al. (2000) and Revesz (2002).

Scientific Fundamentals

A Datalog query program consists of a set of rules of the following form (Revesz 2002):

$$R_0(x_1, \ldots, x_k) \;{:}{-}\; R_1(x_{1,1}, \ldots, x_{1,k}), \ldots, R_n(x_{n,1}, \ldots, x_{n,k}).$$

where each $R_i$ is either an input relation name or a defined relation name and the $x$s are either variables or constants.

Query 1 For the ultraviolet radiation example in section "Scientific Fundamentals" of the entry "Constraint Databases and Data Interpolation", find the amount of ultraviolet radiation for each ground location (x, y) at time t.

Since the input relations in Tables 1 and 2 of the entry "Constraint Databases and Data Interpolation" only record the incoming ultraviolet radiation u and filter ratio r at a few sample points, they cannot be used directly to answer the query. Therefore, to answer this query, the interpolation results INCOMING(y, t, u) and FILTER(x, y, r) are needed. To write queries, it is not necessary to know precisely what kind of interpolation method is used or which constraints are used to represent the interpolation. The above query can be expressed in Datalog as follows (Li 2003):

    GROUND(x, y, t, i) :- INCOMING(y, t, u),
                          FILTER(x, y, r),
                          i = u(1 - r).

The above query could also be expressed in SQL. Whatever language is used, it is clear that the evaluation of the above query requires a join of the INCOMING and FILTER relations. Unfortunately, join operations are difficult to express in simple GIS, including the ArcGIS system. However, join processing is very natural in constraint database systems.

If the IDW interpolation is used (see section "Scientific Fundamentals", "Key Applications", entry "Constraint Databases and Data Interpolation"), the final result of the Datalog query, GROUND(x, y, t, i), can be represented by Table 1. Since there are five second-order Voronoi regions for Incoming and four regions for Filter, as shown in Figs. 8 and 9 of the entry "Constraint Databases and Data Interpolation", there should be 20 tuples in GROUND(x, y, t, i) in Table 1. Note that the constraint relations can be easily joined by taking the conjunction of the constraints from each pair of tuples of the two input relations. Finally, in a constraint database system, the constraint in each tuple is automatically simplified by eliminating the unnecessary variables u and r.
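The join by conjunction described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the way MLPQ or any other constraint database system implements joins: each constraint tuple is modeled as a list of constraint strings, and the names `region_I1`, `idw_F3`, and so on are hypothetical placeholders, not the actual constraints of Table 1.

```python
from itertools import product

def join(rel_a, rel_b):
    """Join two constraint relations by conjoining the constraints
    of every pair of tuples (one tuple from each relation)."""
    return [tuple_a + tuple_b for tuple_a, tuple_b in product(rel_a, rel_b)]

# Hypothetical stand-ins: one constraint tuple per second-order Voronoi region.
incoming = [[f"region_I{k}(y, t)", f"idw_I{k}(y, t, u)"] for k in range(1, 6)]
filter_rel = [[f"region_F{k}(x, y)", f"idw_F{k}(x, y, r)"] for k in range(1, 5)]

# GROUND(x, y, t, i) :- INCOMING(y, t, u), FILTER(x, y, r), i = u(1 - r).
ground = [pair + ["i = u * (1 - r)"] for pair in join(incoming, filter_rel)]

print(len(ground))  # 5 regions x 4 regions = 20 tuples
```

The sketch stops at the conjunction; the final simplification step, in which a constraint database eliminates the variables u and r from each tuple, is not modeled here.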

Constraint Database Queries, Table 1 GROUND(x, y, t, i) using IDW

X  Y  T  I
x  y  t  i   $2x - y - 20 \le 0,\; 12x + 7y - 216 \le 0,\; 13y + 7t - 286 \le 0,\; 2y - 3t - 12 \le 0,\; y \le 15,$
             $((x-2)^2 + (y-14)^2)\,0.9 + ((x-2)^2 + (y-1)^2)\,0.5 = (2(x-2)^2 + (y-14)^2 + (y-1)^2)\,r,$
             $((y-13)^2 + (t-22)^2)\,60 + (y^2 + (t-1)^2)\,20 = ((y-13)^2 + (t-22)^2 + y^2 + (t-1)^2)\,u,$
             $i = u(1 - r)$
x  y  t  i   $2x - y - 20 \ge 0,\; 12x + 7y - 216 \le 0,\; 13y + 7t - 286 \le 0,\; 2y - 3t - 12 \le 0,\; y \le 15,$
             $((x-25)^2 + (y-1)^2)\,0.9 + ((x-2)^2 + (y-1)^2)\,0.8 = (2(y-1)^2 + (x-25)^2 + (x-2)^2)\,r,$
             $((y-13)^2 + (t-22)^2)\,60 + (y^2 + (t-1)^2)\,20 = ((y-13)^2 + (t-22)^2 + y^2 + (t-1)^2)\,u,$
             $i = u(1 - r)$
⋮  ⋮  ⋮  ⋮
x  y  t  i   $2x - y - 20 \le 0,\; 12x + 7y - 216 \ge 0,\; y \ge 15,\; y + 3t - 54 \le 0,\; 7y - t - 136 \le 0,\; 2y + 5t - 60 \ge 0,$
             $((x-25)^2 + (y-14)^2)\,0.5 + ((x-2)^2 + (y-14)^2)\,0.3 = (2(y-14)^2 + (x-25)^2 + (x-2)^2)\,r,$
             $((y-29)^2 + t^2)\,20 + ((y-13)^2 + (t-22)^2)\,40 = ((y-29)^2 + t^2 + (y-13)^2 + (t-22)^2)\,u,$
             $i = u(1 - r)$
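To make the content of a single GROUND tuple concrete, the following Python sketch evaluates the first tuple of Table 1 at a given point. It is an illustration, not part of the original entry: the function names are hypothetical, the inequality directions are those read from the first tuple (several comparison and minus signs are hard to recover in the printed table, so they are an assumption here), and the sample values are the Filter samples (2, 1, 0.9) and (2, 14, 0.5) and the Incoming samples (0, 1, 60) and (13, 22, 20).

```python
def idw2(d1_sq, v1, d2_sq, v2):
    """Two-nearest-neighbor IDW with inverse squared-distance weights.

    Solves d2_sq * v1 + d1_sq * v2 = (d1_sq + d2_sq) * v for v, which is
    the form of the equality constraints in the table.
    """
    return (d2_sq * v1 + d1_sq * v2) / (d1_sq + d2_sq)

def ground_first_tuple(x, y, t):
    """Return i for the first GROUND tuple, or None outside its region."""
    in_region = (
        2 * x - y - 20 <= 0 and 12 * x + 7 * y - 216 <= 0   # Filter region
        and 13 * y + 7 * t - 286 <= 0 and 2 * y - 3 * t - 12 <= 0
        and y <= 15                                          # Incoming region
    )
    if not in_region:
        return None
    # Filter ratio r from samples (2, 1, 0.9) and (2, 14, 0.5).
    r = idw2((x - 2) ** 2 + (y - 1) ** 2, 0.9,
             (x - 2) ** 2 + (y - 14) ** 2, 0.5)
    # Incoming radiation u from samples (0, 1, 60) and (13, 22, 20).
    u = idw2(y ** 2 + (t - 1) ** 2, 60,
             (y - 13) ** 2 + (t - 22) ** 2, 20)
    return u * (1 - r)  # i = u(1 - r)

print(ground_first_tuple(2, 0, 1))   # point covered by the first tuple
print(ground_first_tuple(20, 1, 1))  # None: outside the tuple's region
```

A full evaluation of GROUND would try each of the 20 tuples in turn and return the value from the one whose region contains the query point.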

Constraint Database Queries, Table 2 Sample(x, y, t, p)

X     Y     T    P (price/square foot)
888   115   4    56.14
888   115   76   76.02
1630  115   118  86.02
1630  115   123  83.87
2240  2380  51   91.87
2650  1190  43   63.27

Key Applications

There are many possible queries for a particular set of GIS data. For example, a very basic query for a set of spatiotemporal data would be, "What is the value of interest at a specific location and time instance?" With good interpolation results and an efficient representation of the interpolation results in constraint databases, many spatiotemporal queries can be easily answered by query languages. In the following, some examples of Datalog queries are shown.

Ozone Data Example

Based on the ozone data example in section "Historical Background", "Key Applications", entry "Constraint Databases and Data Interpolation", some sample spatiotemporal queries are given below. Assume that the input constraint relations are (Li et al. 2006):

Ozone_orig(x, y, t, w), which records the original measured ozone value w at monitoring site location (x, y) and time t;
Ozone_interp(x, y, t, w), which stores the interpolation results of the ozone data by any spatiotemporal interpolation method, such as 3-D shape function or IDW;
Ozone_loocv(x, y, t, w), which stores the interpolated ozone concentration level at each monitoring site (x, y) and time t after applying the leave-one-out cross-validation.

Note

The leave-one-out cross-validation is a process that removes one of the n observation points and uses the remaining n − 1 points to estimate

its value, and this process is repeated at each observation point (Hjorth 1994). The observation points are the points with measured original values. For the experimental ozone data, the observation points are the spatiotemporal points (x, y, t), where (x, y) is the location of a monitoring site and t is the year when the ozone measurement was taken. After the leave-one-out cross-validation, each of the observation points will have not only its original value but also an interpolated value. The original and interpolated values at each observation point can be compared for the purpose of an error analysis. The interpolation error at each data point is calculated from the difference between its original and interpolated values as follows:

$$E_i = \frac{|I_i - O_i|}{O_i} \tag{1}$$

where $E_i$ is the interpolation error at observation point i, $I_i$ is the interpolated value at point i, and $O_i$ is the original value at point i.

Query 2 For a given location with longitude x and latitude y, find the ozone concentration level in year t.

This can be expressed in Datalog as follows:

    Ozone_value(w) :- Ozone_interp(x, y, t, w).

Query 3 Suppose that in future years there will be a budget increase so that new ozone monitoring sites can be added. Find the best areas where new monitoring sites should be installed.

In order to decide the best locations to add new monitoring sites, it is necessary to first find those monitoring sites that have large average interpolation errors according to equation (1), for example, over 20 %. Then, do a buffer operation on the set of monitoring sites with big errors to find the areas within a certain distance of each site, for example, 50 miles. Since the buffered areas are the areas with poor interpolation results, these areas can be considered the possible areas where new monitoring sites should be built. To find the monitoring sites with more than 20 % interpolation errors, perform the following Datalog queries:

    Error(x, y, t, r) :- Ozone_orig(x, y, t, w1),
                         Ozone_loocv(x, y, t, w2),
                         r = |w1 - w2| / w1.
    Avg_error(x, y, avg(r)) :- Error(x, y, t, r).
    Sites_Chosen(x, y) :- Avg_error(x, y, ae),
                          ae >= 0.2.

To find the areas within 50 miles of the sites with more than 20 % interpolation errors, a GIS buffer operation on the relation Sites_Chosen should be performed. The buffer operation is provided by many GIS software packages and the MLPQ constraint database system. After performing the buffer operation, an output relation will be created which contains a 50-mile buffer around the locations stored in the Sites_Chosen relation.

Similarly, if there is a budget cut, similar queries can be designed to find and shut down the monitoring sites with small interpolation errors.

House Price Data Example

The house price data consist of a set of real estate data obtained from the Lancaster County assessor's office in Lincoln, Nebraska. House sale histories since 1990 are recorded in the real estate data set and include sale prices and times. In the experiment, 126 residential houses are randomly selected from a quarter of a section of a township, which covers an area of 160 acres. Furthermore, from these 126 houses, 76 houses are randomly selected as sample data, and the remaining 50 houses are used as test data. Figure 1 shows the 76 houses with circles and the 50 remaining houses with stars.

Constraint Database Queries, Fig. 1 Seventy-six sample houses (circles) and 50 test houses (stars)

Tables 2 and 3 show instances of these two data sets. Based on the fact that the earliest sale of the houses in this neighborhood is in 1990, the time is encoded in such a way that 1 represents January 1990, 2 represents February 1990, ..., and 148 represents April 2002. Note that some houses are sold more than once in the past, so they have more than one tuple in Table 2. For example, the house at the location (888, 115) was sold three times in the past at time 4 and 76 (which represent 4/1990 and 4/1996) (Li and Revesz 2002).

Constraint Database Queries, Table 3 Test(x, y, t)

X    Y     T
115  1525  16
115  1525  58
115  1525  81
115  1610  63
120  1110  30
615  780   59

Assume that the input constraint relations are House(x, y, t, p) and Built(x, y, t). House(x, y, t, p) represents the interpolation result of the house price data, and Built(x, y, t) records the time t (in months) when the house at location (x, y) was built. The Built relation can usually be obtained easily from real estate or city planning agencies.

Query 4 For each house, find the starting sale price when the house was built.

This can be expressed as follows:

    Start(x, y, p) :- Built(x, y, t),
                      House(x, y, t, p).

Query 5 Suppose it is known that house prices in general decline for some time after the first sale. For each house, find the first month when it becomes profitable, that is, the first month when its price exceeded its initial sale price.

This can be expressed as follows:

    not_Profitable(x, y, t) :- Built(x, y, t).
    not_Profitable(x, y, t2) :- not_Profitable(x, y, t1),
                                House(x, y, t2, p2), Start(x, y, p),
                                t2 = t1 + 1, p2 <= p.
    Profitable(x, y, t2) :- not_Profitable(x, y, t1),
                            House(x, y, t2, p2), Start(x, y, p),
                            t2 = t1 + 1, p2 > p.

Query 6 How many months did it take for each house to become profitable?

This translates as

    Time_to_Profit(x, y, t3) :- Built(x, y, t1),
                                Profitable(x, y, t2), t3 = t2 - t1.
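The recursion in Queries 4 to 6 can be read procedurally. The Python sketch below is one such reading for a single house, offered as an illustration rather than as the way a Datalog engine evaluates the rules. The function name `time_to_profit`, the `horizon` parameter, and the example price series are hypothetical; `price_at` stands in for the interpolated House relation restricted to one location.

```python
def time_to_profit(built_month, price_at, horizon=148):
    """Return (first profitable month, months since built), or None.

    built_month: month t with Built(x, y, t) for this house.
    price_at:    callable month -> interpolated price (stand-in for House).
    horizon:     last month considered (148 encodes April 2002 in the entry).
    """
    start_price = price_at(built_month)   # Query 4: Start(x, y, p)
    t = built_month                       # not_Profitable(x, y, t) :- Built(x, y, t)
    while t + 1 <= horizon:
        t += 1                            # t2 = t1 + 1
        if price_at(t) > start_price:     # Profitable rule: p2 > p
            return t, t - built_month     # Query 6: t3 = t2 - t1
        # Otherwise p2 <= p, so not_Profitable(x, y, t) holds and we continue.
    return None

# Hypothetical price series for one house built in month 1.
prices = {1: 50.0, 2: 48.0, 3: 49.5, 4: 50.0, 5: 52.0}
print(time_to_profit(1, prices.__getitem__, horizon=5))  # (5, 4)
```

Because the not_Profitable chain stops at the first month whose price exceeds the starting price, the loop returns only the first profitable month, matching the intent of Query 5.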
Constraint Databases and Data Interpolation 317

All of the above queries could be a part of a more complex data mining or decision support task. For example, a buyer may want to find out which builders tend to build houses that become profitable in a short time or keep their values best.

Future Directions

An interesting direction for future work is to continue designing queries in spatial constraint databases that can be a valuable part of decision support systems.

Cross-References

Constraint Databases and Data Interpolation
Constraint Databases and Moving Objects
Constraint Databases, Spatial
MLPQ Spatial Constraint Database System

Recommended Reading

Hjorth U (1994) Computer intensive statistical methods, validation, model selection, and bootstrap. Chapman and Hall/CRC, London/New York
Jaffar J, Lassez JL (1987) Constraint logic programming. In: Proceedings of the 14th ACM symposium on principles of programming languages, Munich, pp 111–119
Kanellakis PC, Kuper GM, Revesz P (1990) Constraint query languages. In: ACM symposium on principles of database systems, Nashville, pp 299–313
Kanellakis PC, Kuper GM, Revesz P (1995) Constraint query languages. J Comput Syst Sci 51(1):26–52
Kuper GM, Libkin L, Paredaens J (eds) (2000) Constraint databases. Springer, Berlin/Heidelberg
Li L (2003) Spatiotemporal interpolation methods in GIS. Ph.D thesis, University of Nebraska-Lincoln, Lincoln
Li L, Revesz P (2002) A comparison of spatio-temporal interpolation methods. In: Proceedings of the second international conference on GIScience 2002. Lecture notes in computer science, vol 2478. Springer, Berlin/Heidelberg/New York, pp 145–160
Li L, Zhang X, Piltner R (2006) A spatiotemporal database for ozone in the conterminous US. In: Proceedings of the thirteenth international symposium on temporal representation and reasoning, Washington, DC. IEEE, pp 168–176
Ramakrishnan R (1998) Database management systems. McGraw-Hill, New York
Revesz P (2002) Introduction to constraint databases. Springer, New York
Silberschatz A, Korth H, Sudarshan S (2006) Database system concepts, 5th edn. McGraw-Hill, New York
Ullman JD (1989) Principles of database and knowledge-base systems. Computer Science Press, New York

Constraint Database Systems

Linear Versus Polynomial Constraint Databases
Polynomial Spatial Constraint Databases

Constraint Database Visualization

Constraint Data, Visualizing

Constraint Databases and Data Interpolation

Lixin Li
Department of Computer Sciences, Georgia Southern University, Statesboro, GA, USA

Synonyms

Constraint relations; Data approximation; Delaunay triangulation; Fourier series; Inverse distance weighting; Nearest neighbors; Shape function; Spatial interpolation; Spatiotemporal interpolation; Splines; Trend surfaces

Definition

Constraint databases generalize relational databases by finitely representing infinite relations. In the constraint data model, each attribute is associated with an attribute variable, and the value of an attribute in a relation is specified implicitly using constraints. Compared with traditional relational databases, constraint databases offer an extra layer of data abstraction, which is called the constraint level (Revesz

2002). It is the constraint level that makes it possible for computers to use a finite number of tuples to represent an infinite number of tuples at the logical level.

It is very common in GIS that sample measurements are taken only at a set of points. Interpolation is based on the assumption that things that are close to one another are more alike than those that are farther apart. Interpolation is needed in order to estimate the values at unsampled points.

Constraint databases are very suitable for representing spatial/spatiotemporal interpolation results. In this entry, several spatial and spatiotemporal interpolation methods are discussed, and the representation of their interpolation results in constraint databases is illustrated by some examples. The performance analysis and comparison of different interpolation methods in GIS applications can be found in Li and Revesz (2002), Li and Revesz (2004), and Li et al. (2006).

Historical Background

There exist a number of spatial interpolation algorithms, such as inverse distance weighting (IDW), Kriging, splines, trend surfaces, and Fourier series. Spatiotemporal interpolation is a growing research area. With the additional time attribute, the above traditional spatial interpolation algorithms are insufficient for spatiotemporal data, and new spatiotemporal interpolation methods must be developed. There have been some papers addressing the issue of spatiotemporal interpolation in GIS. Gao (2006), Li et al. (2003), and Revesz and Wu (2006) deal with the use of spatiotemporal interpolations for different applications. Li et al. (2004) and Li and Revesz (2004) discuss several newly developed shape function-based spatial/spatiotemporal interpolation methods. There have been some applications of the shape function-based methods. For example, Li et al. (2006) apply a shape function interpolation method to a set of ozone data in the conterminous USA, and Li and Revesz (2004) compare shape function-, IDW-, and Kriging-based spatiotemporal interpolation methods by using an actual real estate data set with house prices. Revesz and Wu (2006) also use a shape function-based interpolation method to represent the West Nile virus data in constraint databases and implement a particular epidemiological system called WeNiVIS that enables the visual tracking of and reasoning about the spread of the West Nile virus epidemic in Pennsylvania.

Scientific Fundamentals

Suppose that the following two sets of sensory data are available in the database (Revesz and Li 2002):

Incoming(y, t, u) records the amount of incoming ultraviolet radiation u for each pair of latitude degree y and time t, where time is measured in days.
Filter(x, y, r) records the ratio r of ultraviolet radiation that is usually filtered out by the atmosphere above location (x, y) before reaching the earth.

Suppose that Fig. 1 shows the locations of the (y, t) and (x, y) pairs where the measurements for u and r, respectively, are recorded. Then Tables 1 and 2 could be instances of these two relations in a relational database.

The above relational database can be translated into a constraint database with the two constraint relations shown in Tables 3 and 4.

Although any relational relation can be translated into a constraint relation as above, not all constraint relations can be converted back to relational databases. This is because a constraint relation can store an infinite number of solutions. For example, the infinite number of interpolation results of u and r for all the points in the domains of Incoming(y, t, u) and Filter(x, y, r) can be represented in a constraint database by a finite number of tuples. The representation of interpolation results in constraint databases by different methods for Incoming and Filter will be given in Key Applications.
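The translation from a relational tuple to a constraint tuple is mechanical: each attribute value becomes an equality constraint on the corresponding attribute variable. The following Python sketch illustrates this for the four Incoming sample rows of this entry. The function name `to_constraint_tuple` and the string representation of constraints are assumptions made for illustration only.

```python
# Relational Incoming(y, t, u) rows as (id, y, t, u), taken from the entry's sample data.
incoming_rows = [(1, 0, 1, 60), (2, 13, 22, 20), (3, 33, 18, 70), (4, 29, 0, 40)]

def to_constraint_tuple(row, attrs=("id", "y", "t", "u")):
    """Rewrite one relational row as a conjunction of equality constraints."""
    return ", ".join(f"{attr} = {value}" for attr, value in zip(attrs, row))

constraint_incoming = [to_constraint_tuple(row) for row in incoming_rows]

for constraint in constraint_incoming:
    print(constraint)
# First line printed: id = 1, y = 0, t = 1, u = 60
```

The reverse direction fails in general: a constraint tuple that uses inequalities, such as one describing an interpolation region, denotes infinitely many points and therefore has no finite list of relational rows.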

Constraint Databases and Data Interpolation, Fig. 1 The spatial sample points for Incoming (left) and Filter (right)

Constraint Databases and Data Interpolation, Table 1 Relational Incoming(y, t, u)

ID  Y   T   U
1   0   1   60
2   13  22  20
3   33  18  70
4   29  0   40

Constraint Databases and Data Interpolation, Table 2 Relational Filter(x, y, r)

ID  X   Y   R
1   2   1   0.9
2   2   14  0.5
3   25  14  0.3
4   25  1   0.8

Constraint Databases and Data Interpolation, Table 3 Constraint Incoming(y, t, u)

ID  Y  T  U
id  y  t  u   id = 1, y = 0, t = 1, u = 60
id  y  t  u   id = 2, y = 13, t = 22, u = 20
id  y  t  u   id = 3, y = 33, t = 18, u = 70
id  y  t  u   id = 4, y = 29, t = 0, u = 40

Constraint Databases and Data Interpolation, Table 4 Constraint Filter(x, y, r)

ID  X  Y  R
id  x  y  r   id = 1, x = 2, y = 1, r = 0.9
id  x  y  r   id = 2, x = 2, y = 14, r = 0.5
id  x  y  r   id = 3, x = 25, y = 14, r = 0.3
id  x  y  r   id = 4, x = 25, y = 1, r = 0.8

Key Applications

Applications Based on Shape Function Spatial Interpolation

Shape functions, which can be viewed as a spatial interpolation method, are popular in engineering applications, for example, in finite element algorithms (Zienkiewics and Taylor 2000). There are various types of 2-D and 3-D shape functions. 2-D shape functions for triangles and 3-D shape functions for tetrahedra are of special interest, both of which are linear approximation methods. Shape functions have recently been found to be a good interpolation method for GIS applications, and the interpolation results are very suitable to be represented in linear constraint databases (Li et al. 2004; Li and Revesz 2002, 2004; Li et al. 2006; Revesz and Li 2002).

2-D Shape Function for Triangles

When dealing with complex two-dimensional geometric domains, it is convenient to divide the total domain into a finite number of simple sub-domains which can have triangular or quadrilateral shapes. Mesh generation using triangular or quadrilateral domains is important in finite element discretization of engineering problems. For the generation of triangular meshes, quite successful algorithms have been developed. A popular method for the generation of triangular meshes is the Delaunay triangulation (Preparata and Shamos 1985).

A linear interpolation function for a triangular area can be written in terms of three shape functions $N_1$, $N_2$, $N_3$, and the corner values $w_1$, $w_2$, $w_3$. In Fig. 2, two triangular finite elements, I and II, are combined to cover the whole domain considered (Li and Revesz 2004).

In this example, the function in the whole domain is interpolated using four discrete values $w_1$, $w_2$, $w_3$, and $w_4$ at four locations. A particular feature of the chosen interpolation method is that the function values inside the sub-domain I can

be obtained by using only the three corner values $w_1$, $w_2$, and $w_3$, whereas all function values for the sub-domain II can be constructed using the corner values $w_2$, $w_3$, and $w_4$. Suppose A is the area of the triangular element I. The linear interpolation function for element I can be written as

$$w(x, y) = N_1(x, y)\,w_1 + N_2(x, y)\,w_2 + N_3(x, y)\,w_3 = \begin{bmatrix} N_1 & N_2 & N_3 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} \tag{1}$$

where $N_1$, $N_2$, and $N_3$ are the following shape functions:

$$
\begin{aligned}
N_1(x, y) &= \frac{(x_2 y_3 - x_3 y_2) + x(y_2 - y_3) + y(x_3 - x_2)}{2A} \\
N_2(x, y) &= \frac{(x_3 y_1 - x_1 y_3) + x(y_3 - y_1) + y(x_1 - x_3)}{2A} \\
N_3(x, y) &= \frac{(x_1 y_2 - x_2 y_1) + x(y_1 - y_2) + y(x_2 - x_1)}{2A}
\end{aligned}
\tag{2}
$$

It should be noted that for every sub-domain, a local interpolation function similar to expression (1) is used. Each local interpolation function is constrained to the local triangular sub-domain. For example, the function w of expression (1) is valid only for sub-domain I. For sub-domain II, the local approximation takes a similar form as expression (1), with the corner values $w_1$, $w_2$, and $w_3$ replaced by the new values $w_2$, $w_3$, and $w_4$.

Constraint Databases and Data Interpolation, Fig. 2 Linear interpolation in space for triangular elements

Alternatively, considering only sub-domain I, the 2-D shape function (2) can also be expressed as follows (Revesz and Li 2002):

$$N_1(x, y) = \frac{A_1}{A}, \quad N_2(x, y) = \frac{A_2}{A}, \quad N_3(x, y) = \frac{A_3}{A} \tag{3}$$

where $A_1$, $A_2$, and $A_3$ are the three sub-triangle areas of sub-domain I as shown in Fig. 3, and A is the area of the outside triangle $w_1 w_2 w_3$.

Constraint Databases and Data Interpolation, Fig. 3 Computing shape functions by area divisions

3-D Shape Function for Tetrahedra

Three-dimensional domains can also be divided into a finite number of simple sub-domains, such as tetrahedral or hexahedral sub-domains. Tetrahedral meshing is of particular interest. With a large number of tetrahedral elements, complicated 3-D objects can be approximated. There exist several methods to generate tetrahedral meshes automatically, such as the 3-D Delaunay tetrahedralization and some tetrahedral mesh improvement methods to avoid poorly shaped tetrahedra.

A linear interpolation function for a 3-D tetrahedral element can be written in terms of four shape functions $N_1$, $N_2$, $N_3$, $N_4$ and the corner values $w_1$, $w_2$, $w_3$, $w_4$. In Fig. 4, two tetrahedral elements, I and II, cover the whole domain considered (Li and Revesz 2004).

In this example, the function in the whole domain is interpolated using five discrete values $w_1$, $w_2$, $w_3$, $w_4$, and $w_5$ at five locations in space. To obtain the function values inside the tetrahedral element I, the four corner values $w_1$, $w_2$, $w_3$, and $w_4$ can be used. Similarly, all function

values for element II can be constructed using the corner values $w_1$, $w_3$, $w_4$, and $w_5$. Suppose V is the volume of the tetrahedral element I. The linear interpolation function for element I can be written as:

$$w(x, y, z) = N_1(x, y, z)\,w_1 + N_2(x, y, z)\,w_2 + N_3(x, y, z)\,w_3 + N_4(x, y, z)\,w_4 = \begin{bmatrix} N_1 & N_2 & N_3 & N_4 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix} \tag{4}$$

where $N_1$, $N_2$, $N_3$, and $N_4$ are the following shape functions:

$$
\begin{aligned}
N_1(x, y, z) &= \frac{a_1 + b_1 x + c_1 y + d_1 z}{6V}, &
N_2(x, y, z) &= \frac{a_2 + b_2 x + c_2 y + d_2 z}{6V}, \\
N_3(x, y, z) &= \frac{a_3 + b_3 x + c_3 y + d_3 z}{6V}, &
N_4(x, y, z) &= \frac{a_4 + b_4 x + c_4 y + d_4 z}{6V}.
\end{aligned}
\tag{5}
$$

By expanding the other relevant determinants into their cofactors, one obtains

$$
a_1 = \det \begin{bmatrix} x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \\ x_4 & y_4 & z_4 \end{bmatrix}, \quad
b_1 = -\det \begin{bmatrix} 1 & y_2 & z_2 \\ 1 & y_3 & z_3 \\ 1 & y_4 & z_4 \end{bmatrix}, \quad
c_1 = -\det \begin{bmatrix} x_2 & 1 & z_2 \\ x_3 & 1 & z_3 \\ x_4 & 1 & z_4 \end{bmatrix}, \quad
d_1 = -\det \begin{bmatrix} x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \\ x_4 & y_4 & 1 \end{bmatrix}
$$

with the other constants defined by cyclic interchange of the subscripts in the order 4, 1, 2, 3 (Zienkiewics and Taylor 2000).

Constraint Databases and Data Interpolation, Fig. 4 Linear interpolation in space for tetrahedral elements

Alternatively, considering only the tetrahedral element I, the 3-D shape function (5) can also be expressed as follows (Li and Revesz 2004):

$$N_1(x, y, z) = \frac{V_1}{V}, \quad N_2(x, y, z) = \frac{V_2}{V}, \quad N_3(x, y, z) = \frac{V_3}{V}, \quad N_4(x, y, z) = \frac{V_4}{V}. \tag{6}$$

$V_1$, $V_2$, $V_3$, and $V_4$ are the volumes of the four sub-tetrahedra $w w_2 w_3 w_4$, $w_1 w w_3 w_4$, $w_1 w_2 w w_4$, and $w_1 w_2 w_3 w$, respectively, as shown in Fig. 5; and V is the volume of the outside tetrahedron $w_1 w_2 w_3 w_4$.

Constraint Databases and Data Interpolation, Fig. 5 Computing shape functions by volume divisions

Representing Interpolation Results in Constraint Databases

In traditional GIS, spatial data are represented in the relational data model, which is the most popular data model. Many database systems are based on the relational model, such as Oracle and MySQL. However, the relational model has disadvantages for some applications, which may lead to infinite relational databases (Revesz 2002). An infinite relational database means the database has relations with an infinite number of tuples. In reality, only a finite

set of the tuples can be stored in a relation. Therefore, a finite set of tuples has to be extracted, which leads to data incompleteness. Using constraint databases can solve this infinity problem.

The sensory data of the ultraviolet radiation example in Scientific Fundamentals will be used to illustrate how to represent 2-D shape function spatial interpolation results in constraint databases. In this example, Incoming(y, t, u) is treated as if it contains a set of 2-D spatial data. Let INCOMING(y, t, u) be the constraint relation that represents the shape function interpolation result of the Incoming relation. Similarly, let FILTER(x, y, r) be the constraint relation that represents the shape function interpolation result of the Filter relation.

Triangulation of the set of sampled points is the first step in using 2-D shape functions. Figure 6 shows the Delaunay triangulations for the sample points in Incoming(y, t, u) and Filter(x, y, r) illustrated in Fig. 1.

The domain of a triangle can be represented by a conjunction C of three linear inequalities corresponding to the three sides of the triangle. Then, by the shape function (2), the value w of any point (x, y) inside a triangle can be represented by the following linear constraint tuple:

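$$
\begin{aligned}
R(x, y, w) \mathrel{:-}\; & C, \\
w = {} & \frac{(y_2 - y_3) w_1 + (y_3 - y_1) w_2 + (y_1 - y_2) w_3}{2A}\, x \\
 & + \frac{(x_3 - x_2) w_1 + (x_1 - x_3) w_2 + (x_2 - x_1) w_3}{2A}\, y \\
 & + \frac{(x_2 y_3 - x_3 y_2) w_1 + (x_3 y_1 - x_1 y_3) w_2 + (x_1 y_2 - x_2 y_1) w_3}{2A}.
\end{aligned}
$$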

where A is a constant for the area value of the triangle. By representing the interpolation in each
triangle by a constraint tuple, a constraint relation to represent the interpolation in the whole
domain can be found in linear time.

Table 5 illustrates the constraint representation for the interpolation result of FILTER using 2-D
shape functions. The result of INCOMING is similar, and the details can be found in reference
Revesz and Li (2002).

Constraint Databases and Data Interpolation, Fig. 6 Delaunay triangulations for Incoming (left)
and Filter (right)

Constraint Databases and Data Interpolation, Table 5 FILTER (x, y, r) using 2-D shape functions

X  Y  R
x  y  r   13x − 23y + 296 ≥ 0, x ≥ 2, y ≥ 1,
          r = 0.0004x − 0.0031y + 0.1168
x  y  r   13x − 23y + 296 ≤ 0, x ≤ 25, y ≤ 14,
          r = 0.0013x − 0.0038y + 0.1056

Applications Based on Shape Function Spatiotemporal Interpolation

There are two fundamentally different ways for spatiotemporal interpolation: reduction and
extension (Li and Revesz 2002). These methods can be described briefly as follows:

Reduction  This approach reduces the spatiotemporal interpolation problem to a regular spatial
interpolation case. First, interpolate (using any 1-D interpolation in time) the measured value
over time at each sample point. Then get spatiotemporal interpolation results by substituting the
desired time instant into some regular spatial interpolation functions.

Extension  This approach deals with time as another dimension in space and extends the
spatiotemporal interpolation problem into a one-higher dimensional spatial interpolation problem.

Reduction Approach
This approach for 2-D space and 1-D time problems can be described by two steps: 2-D spatial
interpolation by shape functions for triangles and approximation in space and time. The second
step, interpolation in space and time, can be implemented by combining a time shape function with
the space approximation function (1). Assume the value at node i at time $t_1$ is $w_{i1}$, and at
time $t_2$ the value is $w_{i2}$. The value at the node i at any time between $t_1$ and $t_2$ can
be interpolated using a 1-D time shape function in the following way:

$$
w_i(t) = \frac{t_2 - t}{t_2 - t_1}\, w_{i1} + \frac{t - t_1}{t_2 - t_1}\, w_{i2}. \tag{7}
$$

Using the example shown in Fig. 2 and utilizing formulas (1) and (7), the interpolation function
for any point constrained to element I at any time between $t_1$ and $t_2$ can be expressed as
follows (Li and Revesz 2004):

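$$
\begin{aligned}
w(x, y, t) = {} & N_1(x, y)\left[\frac{t_2 - t}{t_2 - t_1}\, w_{11} + \frac{t - t_1}{t_2 - t_1}\, w_{12}\right]
 + N_2(x, y)\left[\frac{t_2 - t}{t_2 - t_1}\, w_{21} + \frac{t - t_1}{t_2 - t_1}\, w_{22}\right] \\
 & + N_3(x, y)\left[\frac{t_2 - t}{t_2 - t_1}\, w_{31} + \frac{t - t_1}{t_2 - t_1}\, w_{32}\right] \\
= {} & \frac{t_2 - t}{t_2 - t_1}\left[N_1(x, y)\, w_{11} + N_2(x, y)\, w_{21} + N_3(x, y)\, w_{31}\right] \\
 & + \frac{t - t_1}{t_2 - t_1}\left[N_1(x, y)\, w_{12} + N_2(x, y)\, w_{22} + N_3(x, y)\, w_{32}\right].
\end{aligned}
$$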
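The two steps of the reduction approach can be sketched in a few lines of code. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names, the triangle, and the node values are hypothetical, and the spatial shape functions are written in the linear form used in the constraint tuple for triangles.

```python
def shape_functions(p, tri):
    """Linear 2-D shape functions N1, N2, N3 of point p for triangle tri."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    x, y = p
    # 2A: twice the signed area of the triangle.
    two_a = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    n1 = ((x2 * y3 - x3 * y2) + (y2 - y3) * x + (x3 - x2) * y) / two_a
    n2 = ((x3 * y1 - x1 * y3) + (y3 - y1) * x + (x1 - x3) * y) / two_a
    n3 = ((x1 * y2 - x2 * y1) + (y1 - y2) * x + (x2 - x1) * y) / two_a
    return n1, n2, n3


def time_shape(t, t1, t2, w1, w2):
    """1-D time shape function, formula (7)."""
    return (t2 - t) / (t2 - t1) * w1 + (t - t1) / (t2 - t1) * w2


def reduction_interpolate(p, t, tri, t1, t2, w_at_t1, w_at_t2):
    """Interpolate each node in time first, then combine the nodes in space."""
    n = shape_functions(p, tri)
    return sum(
        ni * time_shape(t, t1, t2, wa, wb)
        for ni, wa, wb in zip(n, w_at_t1, w_at_t2)
    )


# Hypothetical element: three nodes with values measured at t1 = 0 and t2 = 10.
TRIANGLE = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
W_T1 = (1.0, 2.0, 3.0)
W_T2 = (3.0, 4.0, 5.0)
```

At a node and a measured time instant the sketch returns the measured value, and at the centroid halfway between the two instants it returns the average of the three time-interpolated node values.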

The reduction approach for 3-D space and 1-D time problems can be developed in a similar way by
combining the 3-D interpolation formula (4) and the 1-D shape function (7). Using the example shown
in Fig. 4, the interpolation function for any point constrained to the sub-domain I at any time
between $t_1$ and $t_2$ can be expressed as follows (Li and Revesz 2004):

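$$
\begin{aligned}
w(x, y, z, t) = {} & N_1(x, y, z)\left[\frac{t_2 - t}{t_2 - t_1}\, w_{11} + \frac{t - t_1}{t_2 - t_1}\, w_{12}\right]
 + N_2(x, y, z)\left[\frac{t_2 - t}{t_2 - t_1}\, w_{21} + \frac{t - t_1}{t_2 - t_1}\, w_{22}\right] \\
 & + N_3(x, y, z)\left[\frac{t_2 - t}{t_2 - t_1}\, w_{31} + \frac{t - t_1}{t_2 - t_1}\, w_{32}\right]
 + N_4(x, y, z)\left[\frac{t_2 - t}{t_2 - t_1}\, w_{41} + \frac{t - t_1}{t_2 - t_1}\, w_{42}\right] \\
= {} & \frac{t_2 - t}{t_2 - t_1}\left[N_1(x, y, z)\, w_{11} + N_2(x, y, z)\, w_{21} + N_3(x, y, z)\, w_{31} + N_4(x, y, z)\, w_{41}\right] \\
 & + \frac{t - t_1}{t_2 - t_1}\left[N_1(x, y, z)\, w_{12} + N_2(x, y, z)\, w_{22} + N_3(x, y, z)\, w_{32} + N_4(x, y, z)\, w_{42}\right].
\end{aligned}
\tag{8}
$$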

Since the 2-D/3-D space shape functions and the 1-D time shape function are linear, the
spatiotemporal interpolation function (8) is not linear but quadratic.
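To see where the quadratic terms come from, a single summand of the 2-D case can be expanded. The coefficients $\alpha_i$, $\beta_i$, $\gamma_i$ below are shorthand introduced only for this sketch; they stand for the constant, x, and y coefficients of the linear shape function $N_i$.

```latex
% Shorthand (assumption of this sketch): N_i(x, y) = \alpha_i + \beta_i x + \gamma_i y
\begin{aligned}
N_i(x, y)\,\frac{t_2 - t}{t_2 - t_1}\, w_{i1}
  &= \frac{w_{i1}}{t_2 - t_1}\,(\alpha_i + \beta_i x + \gamma_i y)(t_2 - t) \\
  &= \frac{w_{i1}}{t_2 - t_1}\,\bigl(\alpha_i t_2 + \beta_i t_2\, x + \gamma_i t_2\, y
     - \alpha_i\, t - \beta_i\, x t - \gamma_i\, y t\bigr).
\end{aligned}
```

The products $xt$ and $yt$ have degree two, so the resulting constraint is quadratic in the variables x, y, and t even though each factor is linear.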

Extension Approach
For 2-D space and 1-D time problems, this method treats time as a regular third dimension. Since it
extends 2-D problems to 3-D problems, this method is very similar to the linear approximation by
3-D shape functions for tetrahedra. The only modification is to substitute the variable z in
Eqs. (4), (5), and (6) by the time variable t.

For 3-D space and 1-D time problems, this method treats time as a regular fourth dimension. New
linear 4-D shape functions based on 4-D Delaunay tessellation can be developed to solve this
problem. See reference Li (2003) for details on the 4-D shape functions.

Representing Interpolation Results in Constraint Databases
The previous section pointed out the infinity problem for relational databases to represent
spatial data. The relational data model shows more disadvantages when handling spatiotemporal
data. For example, using the relational model, the current contents of a database (database
instance) is a snapshot of the data at a given instant in time. When representing spatiotemporal
data, frequent updates have to be performed in order to keep the database instance up to date,
which erases the previous database instance. Therefore, the information in the past will be lost.
This irrecoverable problem makes the relational data model impractical for handling spatiotemporal
data. Using the constraint data model can solve this problem. A set of Aerometric Information
Retrieval System (AIRS) data will be used to illustrate how spatiotemporal interpolation data can
be represented accurately and efficiently in constraint databases.

The experimental AIRS data is a set of data with annual ozone concentration measurements in the
conterminous USA (website www.epa.gov/airmarkets/cmap/data/category1.html). AIRS is a
computer-based repository of information about airborne pollution in the US and various World
Health Organization (WHO) member countries. The system is administered by the US Environmental
Protection Agency (EPA). The data coverage contains point locations of the monitoring sites for
which AIRS data are collected, the annual concentration level measurements of ozone (O3), and the
years of the measurement. Several datasets from the US EPA (website http://cfpub.epa.gov/gdm) were
obtained and reorganized into a dataset with schema (x, y, t, w), where x and y attributes are the
longitude and latitude coordinates of monitoring site locations, t is the year of the ozone
measurement, and w is the O34MAX (4th Max of 1-h Values for O3) value of the ozone measurement.
The original dataset has many zero entries for ozone values, which means that no measurements are
available at a particular site. After filtering out all the zero entries from the original dataset,
there are 1209 sites left with measurements. Figure 7 shows the locations of the 1209 monitoring
sites (Li et al. 2006).

Among the 1209 monitoring sites with measurements, some sites have complete measurements of yearly
ozone values from 1994 to 1999, while the other sites have only partial records. For example, some
sites only have measurements of ozone values in 1998 and 1999. In total, there are 6135 ozone value
measurements recorded. Each measurement corresponds to the ozone value at a spatiotemporal point
(x, y, t), where (x, y) is the location of one of the 1209 monitoring sites, and t is a year
between 1994 and 1999.

The spatiotemporal interpolation extension method based on 3-D shape functions is implemented as a
Matlab program and applied to the AIRS ozone data. The Matlab function delaunayn is used to compute
the tetrahedral mesh with the 6135 spatiotemporal points as corner vertices. There are 30,897
tetrahedra in the resulting mesh. Using the mesh and the 6135 original ozone values measured at its
corner vertices, the annual ozone value at any location and year can be interpolated, as long as
the spatiotemporal point is located inside the domain of the tetrahedral mesh.

Since the 3-D shape function based spatiotemporal interpolation Eq. (4) is linear, the
interpolation results can be stored in a linear constraint database. Suppose the constraint
relation Ozone_interp is used to store the interpolation results. Table 6 shows one sample tuple of
Ozone_interp. The other omitted tuples

Constraint Databases and Data Interpolation, Fig. 7 1209 AIRS monitoring sites with measurements in the
conterminous US

Constraint Databases and Data Interpolation, Table 6 The constraint relation Ozone_interp(x, y, t, w),
which stores the 3-D shape function interpolation results of the ozone data

X  Y  T  W
x  y  t  w   0.002532x + 0.003385y + 0.000511t ≥ 1,
             0.002709x + 0.003430y + 0.000517t ≥ 1,
             0.002659x + 0.003593y + 0.000511t ≤ 1,
             0.002507x + 0.003175y + 0.000515t ≤ 1,
             v = 0.0127,
             v1 = 1/6 |1.71x + 2.17y + 0.35t − 682.87|,
             v2 = 1/6 |2.10x + 2.84y + 0.40t − 790.39|,
             v3 = 1/6 |1.28x + 1.63y + 0.24t − 474.05|,
             v4 = 1/6 |2.53x + 3.38y + 0.51t − 999.13|,
             wv = 0.063v1 + 0.087v2 + 0.096v3 + 0.074v4
⋮  ⋮  ⋮  ⋮   ⋮

are of similar format. Since there are 30,897 tetrahedra generated in the tetrahedral mesh, there
should be 30,897 tuples in Ozone_interp.

The tuple shown in Table 6 corresponds to the interpolation results of all the points located in
the tetrahedron with corner vertices (−68.709, 45.217, 1996), (−68.672, 44.736, 1999),
(−67.594, 44.534, 1995), and (−69.214, 45.164, 1999). The ozone values measured at these four
points are 0.063, 0.087, 0.096, and 0.074, respectively. In this constraint tuple, there are 10
constraints. The relationship among these constraints is AND. The first four constraints define the
four facets of the tetrahedron, the next five constraints give the volume values, and the last
constraint is the interpolation function.
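The way one such tuple is evaluated can be sketched as follows. The sketch treats the year as the third coordinate (the extension approach) and uses the volume-ratio form of the shape functions, with the four corner vertices and ozone values quoted above. The function names are hypothetical, and the point-inside test by comparing volumes is a simplification chosen for this illustration rather than the facet inequalities stored in the tuple.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def volume(p, q, r, s):
    """Unsigned volume of the tetrahedron with corners p, q, r, s."""
    rows = [tuple(corner[i] - p[i] for i in range(3)) for corner in (q, r, s)]
    return abs(det3(*rows)) / 6.0


def interpolate(point, vertices, values, tol=1e-9):
    """Volume-ratio interpolation inside one (x, y, t) tetrahedron."""
    v = volume(*vertices)
    sub = []
    for i in range(4):
        corners = list(vertices)
        corners[i] = point          # sub-tetrahedron opposite to vertex i
        sub.append(volume(*corners))
    # Inside (or on the boundary) exactly when the sub-volumes add up to v.
    if sum(sub) > v * (1.0 + tol):
        raise ValueError("point lies outside the tetrahedron")
    return sum(vi * wi for vi, wi in zip(sub, values)) / v


# Corner vertices (longitude, latitude, year) and ozone values from the text.
VERTICES = [
    (-68.709, 45.217, 1996.0),
    (-68.672, 44.736, 1999.0),
    (-67.594, 44.534, 1995.0),
    (-69.214, 45.164, 1999.0),
]
VALUES = [0.063, 0.087, 0.096, 0.074]
```

With these vertices the computed tetrahedron volume agrees with the value v = 0.0127 listed in the tuple, and the last line of `interpolate` corresponds to the constraint wv = 0.063v1 + 0.087v2 + 0.096v3 + 0.074v4.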

Applications Based on IDW Spatial Interpolation
Inverse distance weighting (IDW) interpolation (Shepard 1968) assumes that each measured point has
a local influence that diminishes with distance. Thus, points in the near neighborhood are given
high weights, whereas points at a far distance are given small weights. Reference Revesz and Li
(2003) uses IDW to visualize spatial interpolation data.

The general formula of IDW interpolation for 2-D problems is the following:

$$
w(x, y) = \sum_{i=1}^{N} \lambda_i w_i, \qquad
\lambda_i = \frac{(1/d_i)^p}{\sum_{k=1}^{N} (1/d_k)^p} \tag{9}
$$

where $w(x, y)$ is the predicted value at location $(x, y)$, $N$ is the number of nearest known
points surrounding $(x, y)$; $\lambda_i$ are the weights assigned to each known point value $w_i$
at location $(x_i, y_i)$; $d_i$ are the 2-D Euclidean distances between each $(x_i, y_i)$ and
$(x, y)$, and $p$ is the exponent, which influences the weighting of $w_i$ on $w$.

For 3-D problems, the IDW interpolation function is similar to formula (9), by measuring 3-D
Euclidean distances for $d_i$.

Representing Interpolation Results in Constraint Databases
To represent the IDW interpolation, the nearest neighbors for a given point should be found. The
idea of higher-order Voronoi diagrams (or kth order Voronoi diagrams) can be borrowed from
computational geometry to help find the nearest neighbors. Higher-order Voronoi diagrams generalize
ordinary Voronoi diagrams by dealing with k closest points. The ordinary Voronoi diagram of a
finite set S of points in the plane is a partition of the plane so that each region of the
partition is the locus of points which are closer to one member of S than to any other member
(Preparata and Shamos 1985). The higher-order Voronoi diagram of a finite set S of points in the
plane is a partition of the plane into regions such that points in each region have the same
closest members of S. As in an ordinary Voronoi diagram, each Voronoi region is still convex in a
higher-order Voronoi diagram. From the definition of higher-order Voronoi diagrams, it is easy to
see that the problem of finding the k closest neighbors for a given point in the whole domain,
which is closely related to the IDW interpolation method with N = k, is equivalent to constructing
kth order Voronoi diagrams.

Although higher-order Voronoi diagrams are very difficult to create by imperative languages, such
as C, C++, and Java, they can be easily constructed by declarative languages, such as Datalog. For
example, a second-order Voronoi region for points (x1, y1), (x2, y2) can be expressed in Datalog as
follows.

First, let P(x, y) be a relation that stores all the points in the whole domain. Also let
Dist(x, y, x1, y1, d1) be a Euclidean distance relation where d1 is the distance between (x, y)
and (x1, y1). It can be expressed in Datalog as:

    Dist(x, y, x1, y1, d1) :- d1 = sqrt((x - x1)^2 + (y - y1)^2).

Note that any point (x, y) in the plane does not belong to the second-order Voronoi region of the
sample points (x1, y1) and (x2, y2) if there exists another sample point (x3, y3) such that (x, y)
is closer to (x3, y3) than to either (x1, y1) or (x2, y2). Using this idea, the complement can be
expressed as follows:

    Not_2Vor(x, y, x1, y1, x2, y2) :- P(x3, y3),
                                      Dist(x, y, x1, y1, d1),
                                      Dist(x, y, x3, y3, d3),
                                      d1 > d3.

    Not_2Vor(x, y, x1, y1, x2, y2) :- P(x3, y3),
                                      Dist(x, y, x2, y2, d2),
                                      Dist(x, y, x3, y3, d3),
                                      d2 > d3.

Finally, the negation of the above can be taken to get the second-order Voronoi region as follows:

    2Vor(x, y, x1, y1, x2, y2) :- not Not_2Vor(x, y, x1, y1, x2, y2).    (10)
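For readers more familiar with imperative code, the membership test behind rule (10) and the IDW formula (9) can be sketched directly. This is a brute-force illustration, not a construction of the Voronoi diagram itself. The function names and the four sample points with their values are assumptions made for this example, and, following the wording "another sample point", the two points that define the region are skipped when looking for a closer competitor.

```python
import math


def dist(p, q):
    """2-D Euclidean distance, as in the Dist relation."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def in_second_order_region(p, a, b, samples):
    """True if no other sample point is closer to p than a or b is."""
    for s in samples:
        if s == a or s == b:
            continue
        d3 = dist(p, s)
        if dist(p, a) > d3 or dist(p, b) > d3:
            return False
    return True


def idw(p, samples, n=2, power=2):
    """Formula (9) with the n nearest sample points and exponent `power`."""
    nearest = sorted(samples, key=lambda s: dist(p, s))[:n]
    for s in nearest:
        if dist(p, s) == 0.0:       # p coincides with a sample point
            return samples[s]
    weights = [(1.0 / dist(p, s)) ** power for s in nearest]
    total = sum(weights)
    return sum(w * samples[s] for w, s in zip(weights, nearest)) / total


# Hypothetical sample points mapped to their measured values.
SAMPLES = {(2, 1): 0.9, (2, 14): 0.5, (25, 14): 0.3, (25, 1): 0.8}
```

For the point (2, 7.5), which is equally far from (2, 1) and (2, 14), the sketch reports membership in the second-order region of those two points and an interpolated value halfway between 0.9 and 0.5.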
The second-order Voronoi diagram will be the union of all the nonempty second-order Voronoi
regions. Similarly to the second order, any kth-order Voronoi diagram can be constructed.

After finding the closest neighbors for each point by constructing higher-order Voronoi diagrams,
IDW interpolation in constraint databases can be represented. The representation can be obtained by
constructing the appropriate Nth-order Voronoi diagram and using formula (9).

Based on formula (10), assume that the second-order Voronoi region for points (x1, y1), (x2, y2) is
stored by the relation Vor2nd(x, y, x1, y1, x2, y2), which is a conjunction C of some linear
inequalities corresponding to the edges of the Voronoi region. Then, using IDW interpolation with
N = 2 and p = 2, the value w of any point (x, y) inside the Voronoi region can be expressed by the
constraint tuple as follows:

$$
\begin{aligned}
R(x, y, w) \mathrel{:-}\; & \left[(x - x_2)^2 + (y - y_2)^2 + (x - x_1)^2 + (y - y_1)^2\right] w \\
 & = \left[(x - x_2)^2 + (y - y_2)^2\right] w_1 + \left[(x - x_1)^2 + (y - y_1)^2\right] w_2, \\
 & \mathit{Vor2nd}(x, y, x_1, y_1, x_2, y_2).
\end{aligned}
\tag{11}
$$

or equivalently as,

$$
\begin{aligned}
R(x, y, w) \mathrel{:-}\; & \left[(x - x_2)^2 + (y - y_2)^2 + (x - x_1)^2 + (y - y_1)^2\right] w \\
 & = \left[(x - x_2)^2 + (y - y_2)^2\right] w_1 + \left[(x - x_1)^2 + (y - y_1)^2\right] w_2, \\
 & C.
\end{aligned}
\tag{12}
$$

In the above polynomial constraint relation, there are three variables: x, y, and w. The
highest-order terms in the relation are $2x^2 w$ and $2y^2 w$, which are both cubic. Therefore,
this is a cubic constraint tuple.

Constraint Databases and Data Interpolation, Fig. 8 The 2nd order Voronoi diagram for Incoming

Constraint Databases and Data Interpolation, Fig. 9 The 2nd order Voronoi diagram for Filter

The sensory data of the ultraviolet radiation example in Scientific Fundamentals will be used to
illustrate how to represent IDW spatial interpolation results in constraint databases. Figures 8
and 9 show the second-order Voronoi diagrams for the sample points in Incoming(y, t, u) and
Filter(x, y, r), respectively. Please note that some second-order Voronoi regions are empty. For
example, there is no (1,3) region in Fig. 8, and there are no (1,3) and (2,4) regions in Fig. 9
(Revesz and Li 2002).

Let INCOMING(y, t, u) and FILTER(x, y, r) be the constraint relations that store the IDW
interpolation results of Incoming(y, t, u) and Filter(x, y, r). Based on formula (12), Table 7
shows the result of FILTER. Note that the four tuples in

Constraint Databases and Data Interpolation, Table 7 FILTER (x, y, r) using IDW

X  Y  R
x  y  r   2x − y − 20 ≤ 0, 12x + 7y − 216 ≤ 0,
          ((x − 2)² + (y − 14)²) 0.9 + ((x − 2)² + (y − 1)²) 0.5
          = (2(x − 2)² + (y − 14)² + (y − 1)²) r
x  y  r   2x − y − 20 ≥ 0, 12x + 7y − 216 ≤ 0,
          ((x − 25)² + (y − 1)²) 0.9 + ((x − 2)² + (y − 1)²) 0.8
          = (2(y − 1)² + (x − 25)² + (x − 2)²) r
x  y  r   2x − y − 20 ≥ 0, 12x + 7y − 216 ≥ 0,
          ((x − 25)² + (y − 14)²) 0.8 + ((x − 25)² + (y − 1)²) 0.3
          = (2(x − 25)² + (y − 14)² + (y − 1)²) r
x  y  r   2x − y − 20 ≤ 0, 12x + 7y − 216 ≥ 0,
          ((x − 25)² + (y − 14)²) 0.5 + ((x − 2)² + (y − 14)²) 0.3
          = (2(y − 14)² + (x − 25)² + (x − 2)²) r

Table 7 represent the four second-order Voronoi regions in Fig. 9. The result of INCOMING is
similar and the details can be found in reference Li (2003).

Applications Based on IDW Spatiotemporal Interpolation

Similar to shape functions, IDW is originally a spatial interpolation method, and it can be
extended by reduction and extension approaches to solve spatiotemporal interpolation problems
(Li 2003).

Reduction Approach
This approach first finds the nearest neighbors for each unsampled point and calculates the
corresponding weights $\lambda_i$. Then, it calculates for each neighbor the value at time t by
some time interpolation method. If 1-D shape function interpolation in time is used, the time
interpolation will be similar to (7). The formula for this approach can be expressed as:

$$
w(x, y, t) = \sum_{i=1}^{N} \lambda_i w_i(t), \qquad
\lambda_i = \frac{(1/d_i)^p}{\sum_{k=1}^{N} (1/d_k)^p} \tag{13}
$$

where $d_i = \sqrt{(x_i - x)^2 + (y_i - y)^2}$ and
$w_i(t) = \frac{t_{i2} - t}{t_{i2} - t_{i1}}\, w_{i1} + \frac{t - t_{i1}}{t_{i2} - t_{i1}}\, w_{i2}$.

Extension Approach
Since this method treats time as a third dimension, the IDW-based spatiotemporal formula is in the
form of (9) with

$$
d_i = \sqrt{(x_i - x)^2 + (y_i - y)^2 + (t_i - t)^2}.
$$

Future Directions

Interesting directions for future work could be to represent more interpolation methods in spatial
constraint databases, apply more interesting data sets to the interpolation methods, compare the
performances of the methods, and animate/visualize the interpolation results.

Cross-References

Constraint Database Queries
Constraint Databases and Moving Objects
Constraint Databases, Spatial
Voronoi Diagram

References

Gao J, Revesz P (2006) Voting prediction using new spatiotemporal interpolation methods. In:
  Proceedings of the seventh annual international conference on digital government research,
  San Diego
Li J, Narayanan R, Revesz P (2003) A shape-based approach to change detection and information
  mining in remote sensing. In: Chen CH (ed) Frontiers of remote sensing information processing.
  WSP, Singapore/River Edge, pp 63–86
Li L (2003) Spatiotemporal interpolation methods in GIS. Ph.D thesis, University of
  Nebraska-Lincoln, Lincoln
Li L, Li Y, Piltner R (2004) A new shape function based spatiotemporal interpolation method. In:
  Proceedings of the first international symposium on constraint databases 2004. Lecture notes in
  computer science, vol 3074. Springer, Berlin/Heidelberg/New York, pp 25–39
Li L, Revesz P (2002) A comparison of spatio-temporal interpolation methods. In: Proceedings of the
  second international conference on GIScience 2002. Lecture notes in computer science, vol 2478.
  Springer, Berlin/Heidelberg/New York, pp 145–160
Li L, Revesz P (2004) Interpolation methods for spatio-temporal geographic data. J Comput Environ
  Urban Syst 28(3):201–227
Li L, Zhang X, Piltner R (2006) A spatiotemporal database for ozone in the conterminous U.S. In:
  Proceedings of the thirteenth international symposium on temporal representation and reasoning,
  Budapest. IEEE, pp 168–176
Preparata FP, Shamos MI (1985) Computational geometry: an introduction. Springer,
  Berlin/Heidelberg/New York
Revesz P (2002) Introduction to constraint databases. Springer, New York
Revesz P, Li L (2002) Constraint-based visualization of spatial interpolation data. In: Proceedings
  of the sixth international conference on information visualization. IEEE Press, London/England,
  pp 563–569
Revesz P, Li L (2002) Representation and querying of interpolation data in constraint databases.
  In: Proceedings of the second national conference on digital government research. Los Angeles,
  California, pp 225–228
Revesz P, Li L (2003) Constraint-based visualization of spatiotemporal databases. In: Advances in
  geometric modeling, chapter 20. Wiley, England, pp 263–276
Revesz P, Wu S (2006) Spatiotemporal reasoning about epidemiological data. Artificial Intelligence
  in Medicine 38(2):157–170
Shepard D (1968) A two-dimensional interpolation function for irregularly spaced data. In: 23rd
  national conference ACM, Las Vegas. ACM, pp 517–524
Zienkiewicz OC, Taylor RL (2000) Finite element method. The basis, vol 1. Butterworth Heinemann,
  London


Constraint Databases and Moving Objects

Lixin Li
Department of Computer Sciences, Georgia Southern University, Statesboro, GA, USA

Synonyms

Continuously changing maps; Moving object constraint databases; Moving points; Moving regions;
Spatio-temporal objects

Definition

Moving objects can be represented in spatial constraint databases, given the trajectory of the
objects.

Historical Background

In general, there are two fundamental ways to abstract moving objects: moving points and moving
regions (Güting 1994). Moving points can describe objects for which only the time-dependent
position is of interest, while moving regions are able to describe those for which both
time-dependent position and spatial extent are of interest. Parametric Rectangles (PReSTO) (Revesz
and Cai 2002) belong to the moving region abstraction. They use growing or shrinking parametric
rectangles to model spatiotemporal objects in constraint databases. One advantage in using moving
regions is the ability to represent spatial dimensions of objects. Moving points can also be a
proper abstraction for moving objects,

Constraint Databases and Moving Objects, Fig. 1 A city street network Strada and its
adjacency-list representation

such as people, animals, stars, cars, planes, ships, and missiles (Erwig et al. 1997). Saglio and
Moreira (1999) argued that moving points are possibly the simplest class of continuously changing
spatial objects and there are many systems, including those dealing with the position of cars,
ships, or planes, which only need to keep the position of the objects.

Continuously changing maps are special cases of moving regions. There are many applications that
need to be visualized by continuously changing maps which will be illustrated in Key Applications.

Scientific Fundamentals

The following example describes how to illustrate the movements of snow removal vehicles in a city
street network by moving points in spatial constraint databases.

Suppose that snow removal vehicles are going to clear the snow on the streets of a city.
Adjacency-list representation (Cormen et al. 1999) of directed weighted graphs can be applied to
model such networks. The city street network Strada is shown in Fig. 1a and its adjacency-list
representation is shown in Fig. 1b. Each street has the following attributes: slope, speed limit,
and snow clearance priority (the lower the value, the higher the priority). These three attributes
are shown as labels of each edge in Fig. 1a. They are also displayed in the property fields of each
node in Fig. 1b. For example, for the street segment $\overrightarrow{s_{bc}}$, the slope is 15°,
the speed limit is 35 mph, and the clearance priority value is 1. The movements of snow removal
vehicles in Strada can be represented in Fig. 2 by eight Datalog rules in constraint databases.

Key Applications

There are many applications that need to be modeled as moving objects, such as continuously
changing maps. Constraint databases are capable of handling such applications. The MLPQ

Constraint Databases and Moving Objects, Fig. 2 Constraint databases representation of the snow removal
vehicles for Strada

(Management of Linear Programming Queries) system is a good example. The MLPQ system is a
constraint database system for linear constraint databases (Kanjamala et al. 1998; Revesz 2002;
Revesz and Li 1997). This system has a graphic user interface (GUI) which supports Datalog-based
and icon-based queries as well as visualization and animations. The MLPQ system can outdo the
popular ArcGIS system by powerful queries (such as recursive queries) and the ability to display
continuously changing maps. A few examples are given below.

SPI Spatiotemporal Data

The point-based spatiotemporal relation Drought_Point (x, y, year, SPI) stores the average yearly
SPI (Standardized Precipitation Index) values sampled by 48 major weather stations in Nebraska from
1992 to 2002. SPI is a common and simple measure of drought which is based solely on the
probability of precipitation for a given time period. Values of SPI range from 2.00 and above
(extremely wet) to −2.00 and less (extremely dry), with near normal conditions ranging from 0.99 to
−0.99. A drought event is defined when the SPI is continuously negative and reaches a value of −1.0
or less, and continues until the SPI becomes positive. The Drought_Point relation, as shown in
Table 1, was obtained from the Unified Climate Access Network (UCAN) (Li 2003).

Constraint Databases and Moving Objects, Table 1 A point-based spatiotemporal relation

Drought_Point
x (easting)   y (northing)   Year   SPI
315515.56     2178768.67     1992   0.27
315515.56     2178768.67     1993   0.17
⋮             ⋮              ⋮      ⋮

Assume that in the point-based spatiotemporal relation Drought_Point, the 48 weather stations have
not changed their locations for the last 10 years and measured SPI values every year. The spatial
and temporal parts of the 2nd-order Voronoi region-based relation of Drought_Point are shown in
Table 2.

Continuously changing maps in MLPQ can be used to visualize the 2nd-order Voronoi diagrams. Users
need to push the color animation button in the MLPQ GUI and input the following three parameters:
the beginning time instance, ending time instance and step size. Then, the color of each region of
the map will be animated according to its value at a specific time instance. Figure 3 shows the
2nd-order Voronoi diagram for the 48 weather stations in Nebraska at the snapshot when t = 1992
(Li 2003).
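The drought-event definition given above can be turned into a small scan over a series of SPI values. The sketch below is only an illustration: the function name and the sample series are hypothetical, and it assumes that a value of exactly zero ends a run of negative values, a case that the definition leaves open.

```python
def drought_events(spi, threshold=-1.0):
    """Return (start, end) index pairs of drought events in an SPI series.

    An event is a maximal run of negative SPI values that reaches
    `threshold` or less at least once; it ends at the last negative
    value before the SPI stops being negative.
    """
    events = []
    start = None          # index where the current negative run began
    reached = False       # whether the run has hit the threshold
    for i, value in enumerate(spi):
        if value < 0:
            if start is None:
                start, reached = i, False
            if value <= threshold:
                reached = True
        else:
            if start is not None and reached:
                events.append((start, i - 1))
            start = None
    if start is not None and reached:
        events.append((start, len(spi) - 1))
    return events


# Hypothetical series of SPI values for one station.
SERIES = [0.3, -0.4, -1.2, -0.6, 0.5, -0.8, -0.2, 0.1, -1.5]
```

In this series the run at indices 1 to 3 counts as an event because it reaches −1.2, the run at indices 5 and 6 does not because it never reaches −1.0, and the final value starts an event that is still open at the end of the data.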

Constraint Databases and Moving Objects, Table 2 A 2nd-order Voronoi region-based database

Drought_Vo2_Space
(x1, y1), (x2, y2)                                    Boundary
(-9820.18, 1929867.40), (-42164.88, 1915035.54)       (-17122.48, 2203344.58), (3014.51, 2227674.50),
                                                      (33051.50, 2227674.50), (33051.5, 2140801.51)
⋮                                                     ⋮

Drought_Vo2_Time
(x1, y1), (x2, y2)                                    Year   avgSPI
(-9820.18, 1929867.4), (-42164.88, 1915035.54)        1992   -0.47
(-9820.18, 1929867.4), (-42164.88, 1915035.54)        1993   0.71
⋮                                                     ⋮      ⋮
(-507929.66, 2216998.17), (-247864.81, 1946777.44)    2002   -0.03

Constraint Databases and Moving Objects, Fig. 3 The 2nd-order Voronoi diagram for 48 weather
stations in Nebraska, which consists of 116 regions

NASS Spatiotemporal Data

A NASS (National Agricultural Statistics Service) region-based spatiotemporal database shows the
yearly corn yield and production in each county of the state of Nebraska. The spatial part of the
database is shown in the upper half of Table 3 which uses the vector representation of counties in
Nebraska, while the temporal part is shown in the lower half of Table 3 (Li and Revesz 2003).

Continuously changing maps can be used to animate the total corn yield in each county in Nebraska
during a given period. First, each county

Constraint Databases and Moving Objects, Table 3 A region-based spatiotemporal database with
separate spatial and temporal relations

Nebraska_Corn_Space_Region
County   Boundary
1        (-656160.3, 600676.8), (-652484.0, 643920.3), (-607691.1, 639747.6), (-608934.8, 615649.0),
         (-607875.6, 615485.8), (-610542.0, 576509.1), (-607662.7, 576138.5), (-611226.9, 537468.5),
         (-607807.7, 536762.1), (-608521.1, 527084.0), (-660885.4, 531441.2), (-661759.8, 532153.1)
⋮        ⋮

Nebraska_Corn_Time_Region
County   Year   Practice        Acres   Yield   Production
1        1947   Irrigated       2700    49      132300
1        1947   Non-irrigated   81670   18      1470060
1        1947   Total           84370   19      1602360
⋮        ⋮      ⋮               ⋮       ⋮       ⋮

Constraint Databases and Moving Objects, Fig. 4 A snapshot of continuously changing maps for
county-based corn yield in Nebraska when t = 1998

polygon needs to be represented in MLPQ. Although such county vector data in the US are usually
available in ArcView shape file format, a program can be implemented to convert ArcView shape files
to MLPQ input text files. The conversion from MLPQ files to shape files can also be implemented.
Figure 4 shows the snapshot during the color map animation when t = 1998 (Li 2003).

Future Directions

Interesting directions for future work could be to continue the discovery of moving objects that
are difficult to model in relational databases but can be conveniently modeled in spatial
constraint databases, and to extend the MLPQ system so as to improve the visualization/animation
power of the system.

Cross-References

Constraint Database Queries
Constraint Databases and Data Interpolation
Constraint Databases, Spatial
MLPQ Spatial Constraint Database System
Visualization of Spatial Constraint Databases

References

Cormen TH, Leiserson CE, Rivest R (1999) Introduction to algorithms. McGraw-Hill, New York
Erwig M, Güting R, Schneider M, Vazirgiannis M (1997) Spatio-temporal data types: an approach to
  modeling and querying moving objects in databases. ChoroChronos Research Project, Technical
  report CH-97-08
Güting R (1994) An introduction to spatial database systems. VLDB J 3(4):357–399
Kanjamala P, Revesz P, Wang P (1998) MLPQ/GIS: a GIS using linear constraint databases. In: Prabhu
  CSR (ed) Proceedings of the 9th COMAD international conference on management of data, Hyderabad,
  pp 389–393
Li L (2003) Spatiotemporal interpolation methods in GIS. Ph.D thesis, University of
  Nebraska-Lincoln, Lincoln
Li L, Revesz P (2003) The relationship among GIS-oriented spatiotemporal databases. In: Proceedings
  of the third national conference on digital government research, Boston
Revesz P (2002) Introduction to constraint databases. Springer, New York
Revesz P, Cai M (2002) Efficient querying of periodic spatiotemporal objects. Ann Math Artif Intell
  36(4):437–457
Revesz P, Li Y (1997) MLPQ: a linear constraint database system with aggregate operators. In:
  Proceedings of the 1st international database engineering and applications symposium. IEEE Press,
  Washington, DC, pp 132–137
Saglio J-M, Moreira J (1999) Oporto: a realistic scenario generator for moving objects. In:
  Proceedings of the DEXA 99 workshop on spatio-temporal data models and languages (STDML),
  Florence, pp 426–432


Constraint Databases, Spatial

Peter Z. Revesz
Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA

Synonyms

Databases, Relational; Query, Datalog

Definition

Spatial constraint databases form a generalization of relational databases for the purpose of
representing spatial and spatiotemporal data. Whereas in a relational database each table is a
finite set of tuples, in a spatial constraint database, each table is a finite set of
quantifier-free conjunctions of atomic constraints. In spatial constraint databases, the most
frequent type of atomic constraints used are linear equations and linear inequalities. The
variables of the atomic constraints correspond to the attributes in the relation; hence, they are
called attribute variables.

As an example from Revesz (2010), consider the highly simplified map of the town of Lincoln,
Nebraska, shown in Fig. 1.

This map can be represented in a spatial constraint database with linear inequality constraints as
follows (Table 1).
Constraint Databases, Spatial 335

Constraint Databases, Spatial, Fig. 1 A map of Lincoln, Nebraska

Constraint Databases, Spatial, Table 1 Lincoln

Name     X  Y
Lincoln  x  y  $y \ge x + 8,\ y \ge 14,\ x \ge 2,\ y \le 18,\ y \le -x + 24$
Lincoln  x  y  $y \le x + 8,\ y \le 0.5x + 12,\ y \le 18,\ x \le 14,\ y \ge 8,\ y \ge -3x + 32$
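
As a minimal sketch of how the two constraint rows of Table 1 define the town, the following Python snippet encodes each row as a conjunction of linear inequalities and tests whether a point belongs to the Lincoln relation. The names LINCOLN and in_lincoln are illustrative assumptions and do not come from any constraint database system; the inequality directions are the ones implied by the polygon that the text describes.

```python
# Each row of the constraint table is a conjunction (all must hold) of
# linear inequalities; the table as a whole is the union of its rows.
# LINCOLN and in_lincoln are hypothetical names used only for illustration.

LINCOLN = [
    # Row 1: convex pentagon above the line y = x + 8
    [
        lambda x, y: y >= x + 8,
        lambda x, y: y >= 14,
        lambda x, y: x >= 2,
        lambda x, y: y <= 18,
        lambda x, y: y <= -x + 24,
    ],
    # Row 2: convex hexagon below the line y = x + 8
    [
        lambda x, y: y <= x + 8,
        lambda x, y: y <= 0.5 * x + 12,
        lambda x, y: y <= 18,
        lambda x, y: x <= 14,
        lambda x, y: y >= 8,
        lambda x, y: y >= -3 * x + 32,
    ],
]


def in_lincoln(x, y):
    """Return True if (x, y) satisfies every constraint of at least one row."""
    return any(all(c(x, y) for c in row) for row in LINCOLN)


print(in_lincoln(4, 16))     # a point in the pentagon
print(in_lincoln(10, 12))    # a point in the hexagon
print(in_lincoln(7, 17.5))   # a point in the notch on the northern edge
```

The finite list of constraints stands in for the infinite set of points of the town: no point is stored explicitly, and membership is decided by evaluating the constraints.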

In Table 1, the attribute variables x and y represent the longitude and latitude, respectively, as measured in units from the $(0, 0)$ point in the above map. In general, any polygonal shape can be represented by first dividing it into a set of convex polygons and then representing each convex polygon with n sides by a conjunction of n linear inequality constraints. Note that the above town map is a concave polygon, but it can be divided along the line $y = x + 8$ into two convex polygons. The convex pentagon above line $y = x + 8$ is represented by the first row of the constraint table, while the convex hexagon below line $y = x + 8$ is represented by the second row of the constraint table. Within any row, the atomic constraints are connected by commas, which simply mean conjunction. While the atomic constraints can be given in any order in each row, it is customary to present them in an order that corresponds to a clockwise ordering of the sides of the convex polygon that they together represent.

Historical Background

Constraint databases, including spatial constraint databases, were proposed by Kanellakis et al. in 1990. A much-delayed journal version of their original conference paper appeared in Kanellakis et al. (1995). These papers considered a number of constraint database query languages and challenged researchers to investigate their properties further. Benedikt et al. (1998) showed that relational calculus queries of constraint databases, when the constraint database contains polynomial constraints over the reals, cannot express even simple Datalog expressible queries. On the other hand, Datalog queries with linear constraints can already express some computationally hard or even undecidable problems. Only in special cases, such as with gap-order constraints of the form $x - y \ge c$ where x and y are integer or rational variables and c is a nonnegative constant, can an algorithm be given for evaluating Datalog queries (Revesz 1993).

The above results influenced researchers to implement several spatial constraint database systems with non-recursive query languages, usually some variation of non-recursive SQL and linear equality and linear inequality constraints. These systems include, in historical order, the MLPQ system (Revesz and Li 1997), the CCUBE system (Brodsky et al. 1997), the DEDALE system (Grumbach et al. 1998), and the CQA/CDB system (Goldin et al. 2003). The MLPQ system implements both SQL and Datalog queries.

Constraint databases are reviewed in a number of books. Chapter 5.6 of Abiteboul et al. (1995), a standard reference in database theory, is a compact description of the main ideas of constraint databases. Kuper et al. (2000) is a collection of research articles devoted to constraint databases. It is a good introduction for already advanced researchers. Revesz (2002) is the standard textbook for the subject. It is used at many universities. Chapter 4 of Rigaux et al. (2002), which is an excellent source on all aspects of spatial databases, is devoted exclusively to constraint databases. Chapter 6 of Güting and Schneider (2005), which is a sourcebook on moving object databases, is also devoted exclusively to constraint databases.

Scientific Fundamentals

The semantics of logical models of a spatial constraint database table is a relational database that contains all the rows that can be obtained by substituting values into the attribute variables of any constraint row such that the conjunction of constraints in that row is true. For example, it is easy to see that the semantics of the spatial constraint database table Lincoln is a relation that contains all $(x, y)$ points that belong to the town map. Since there are an infinite number of such $(x, y)$ points when x and y are real numbers, spatial constraint databases are also called finitely representable infinite relational databases.

Spatial constraint databases can represent not only areas but also boundaries by using linear equality constraints. Representing the boundary of an n-sided (concave or convex) polygonal area requires n rows in a constraint table. For example, the boundary of the town of Lincoln, Nebraska, can be represented as shown in Table 2.

Constraint Databases, Spatial, Table 2 Lincoln_Boundary

X  Y
x  y  $x \ge 2,\ x \le 6,\ y = 18$
x  y  $x \ge 6,\ x \le 8,\ y = -x + 24$
x  y  $x \ge 8,\ x \le 12,\ y = 0.5x + 12$
x  y  $x \ge 12,\ x \le 14,\ y = 18$
x  y  $y \ge 8,\ y \le 18,\ x = 14$
x  y  $x \ge 8,\ x \le 14,\ y = 8$
x  y  $x \ge 6,\ x \le 8,\ y = -3x + 32$
x  y  $x \ge 2,\ x \le 6,\ y = 14$
x  y  $y \ge 14,\ y \le 18,\ x = 2$

In the above each range constraint of the form $a \le x \le b$ is an abbreviation of $a \le x,\ x \le b$ where x is any variable and a and b are constants.

Spatial constraint databases can be extended to higher dimensions. For example, a Z attribute for height or a T attribute for time can be added. As an example, suppose that Fig. 1 shows the map of Lincoln, Nebraska, in year 2000, and since then the town has expanded to the east continuously at the rate of one unit per year. Then the growing town area between years 2000 and 2007 can be represented as shown in Table 3.

Spatial constraint databases with polynomial constraints are also possible. With the increased complexity of constraints, more complex spatial and spatiotemporal objects can be represented. For example, suppose that an airplane flies over Lincoln, Nebraska. Its shadow can be represented as a spatial constraint database
relation Airplane_Shadow using polynomial constraints over the variables x, y, and t. (Here the time unit t will be measured in seconds and not years as in the Lincoln_Growing example.)

Spatial constraint databases can be queried by the same query languages as relational databases. For example, the popular Structured Query Language (SQL) for relational databases is also applicable to spatial constraint databases. For instance, the following query finds when the towns of Lincoln, Nebraska, and Omaha, Nebraska, will grow into each other.

SELECT Min(Lincoln_Growing.T)
FROM Lincoln_Growing, Omaha_Growing
WHERE Lincoln_Growing.X=Omaha_Growing.X AND
Lincoln_Growing.Y=Omaha_Growing.Y AND
Lincoln_Growing.T=Omaha_Growing.T

Suppose that the airplane flies over Lincoln, Nebraska. The next query finds when the shadow of the airplane will leave the town.

SELECT Max(Airplane_Shadow.T)
FROM Lincoln, Airplane_Shadow
WHERE Lincoln.X = Airplane_Shadow.X AND
Lincoln.Y = Airplane_Shadow.Y

Besides SQL queries, spatial constraint databases can be queried by relational calculus queries (i.e., first-order logic queries) and by Datalog queries.

Key Applications

Many applications of spatial constraint database systems are similar to the applications of other GIS systems, such as the ARC/GIS system. These applications typically include problems where various kinds of maps, road networks, and land utilization information are represented and overlaying of different maps plays a key role in information processing and querying. However, efficiently describing and querying spatial or geographic data is just one application area of spatial constraint databases.

Spatial constraint databases are also useful in applications that go beyond traditional GIS systems. These applications typically require the representation of moving objects or spatiotemporal data and high-level, often recursive, SQL or Datalog queries. The following are some examples of such applications.

Applications Based on Interpolated Spatiotemporal Data
Spatiotemporal data, just like spatial data, often contain missing pieces of information that need to be estimated based on the known data. For example, given the prices of the houses when they were sold during the past 30 years in a town, one may need to estimate the prices of those houses which were not sold at any time within the past 30 years. In such applications the results of the interpolation can be represented as a spatial constraint database with x, y fields for the location of houses, a t field for time, and a p field for price (Li and Revesz 2004). The value of p will be estimated as some function, often a linear equation, of the other variables. If the spatiotemporal interpolation result is represented

Constraint Databases, Spatial, Table 3 Lincoln_Growing

X  Y  T
x  y  t  $y \ge x + 8,\ y \ge 14,\ x \ge 2,\ y \le 18,\ y \le -x + 24,\ 2000 \le t \le 2007$
x  y  t  $y \le x + 8,\ y \le 0.5x + 12,\ y \le 18,\ x \le 14 + (t - 2000),\ y \ge 8,\ y \ge -3x + 32,\ 2000 \le t \le 2007$
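
The role of the T attribute in Table 3 can be illustrated with a short Python sketch. It assumes the constraint rows as reconstructed from the description of the town growing eastward by one unit per year; the function names in_lincoln_growing and first_year_covered are hypothetical, and checking only whole years is a simplification made for this example.

```python
# Hypothetical helpers illustrating the Lincoln_Growing relation of Table 3.

def in_lincoln_growing(x, y, t):
    """Return True if (x, y, t) satisfies at least one constraint row."""
    if not (2000 <= t <= 2007):
        return False
    # Row 1: the western pentagon, which does not change over time.
    row1 = (y >= x + 8 and y >= 14 and x >= 2
            and y <= 18 and y <= -x + 24)
    # Row 2: the eastern hexagon, whose east side moves with t.
    row2 = (y <= x + 8 and y <= 0.5 * x + 12 and y <= 18
            and x <= 14 + (t - 2000) and y >= 8 and y >= -3 * x + 32)
    return row1 or row2


def first_year_covered(x, y):
    """Return the first whole year in 2000..2007 in which the town covers (x, y)."""
    for year in range(2000, 2008):
        if in_lincoln_growing(x, y, year):
            return year
    return None  # never covered within the represented time span


print(in_lincoln_growing(16, 10, 2000))
print(in_lincoln_growing(16, 10, 2003))
print(first_year_covered(16, 10))
```

Only the constraint on the eastern side mentions t, so a single extra term, t - 2000, is enough to turn the static map into a spatiotemporal relation.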

in a spatial constraint database like MLPQ, then it becomes easy to use the data for applications like estimating price and tax payments for particular houses, estimating total taxes received by the town from sale of houses in any subdivision of the town, etc.

Applications Based on Continuously Changing Maps
In many applications maps are not static but continuously changing. For example, the regions where drought occurs or where an epidemic occurs change with time within every country or state. Such changing maps are conveniently represented in spatial constraint databases similarly to the representation of the growing town area (Revesz and Wu 2006). In these cases too, when the changing map representations are available in a spatial constraint database, many types of applications and specific queries can be developed. For example, a drought monitoring application can estimate the damage done to insured agricultural areas. Here the changing drought map needs to be overlaid with static maps of insured areas and particular crop areas. Another example is tracking the spread of an epidemic and estimating when it may reach certain areas of the state or country and how many people may be affected.

Applications Based on Moving Objects
Moving objects can be animate living beings, that is, people and animals, natural phenomena such as hurricanes and ocean currents, or man-made moving objects such as airplanes, cars, missiles, robots, and trains. Moving objects can also be represented in spatial constraint databases. The representation can then be used in a wide range of applications from weather prediction to airplane and train scheduling or some combination application. For example, one may need to find which airplanes are in danger of being influenced by a hurricane. Endangered airplanes can be given a warning and rerouted if need be. Another application is checking whether the airspace surrounding an airport ever gets too crowded by the arriving and departing airplanes.

Future Directions

Spatial constraint database systems that implement polynomial constraints are being developed. There are a growing number of spatial constraint database applications. Improved algorithms for indexing and querying spatial constraint data are also being developed. Finally, improved high-level visualization methods of constraint data are sought after to enhance the user interface of spatial constraint database systems.

Cross-References

 Constraint Database Queries
 Constraint Databases and Data Interpolation
 Constraint Databases and Moving Objects
 Linear Versus Polynomial Constraint Databases
 MLPQ Spatial Constraint Database System
 Spatial Constraint Databases, Indexing
 Visualization of Spatial Constraint Databases

References

Abiteboul S, Hull R, Vianu V (1995) Foundations of databases. Addison-Wesley, Reading, Massachusetts
Benedikt M, Dong G, Libkin L, Wong L (1998) Relational expressive power of constraint query languages. J ACM 45(1):1–34
Brodsky A, Segal V, Chen J, Exarkhopoulo P (1997) The CCUBE constraint object-oriented database system. Constraints 2(3-4):245–277
Goldin D, Kutlu A, Song M, Yang F (2003) The constraint database framework: lessons learned from CQA/CDB. In: Proceedings of international conference on data engineering, pp 735–737
Grumbach S, Rigaux P, Segoufin L (1998) The DEDALE system for complex spatial queries. In: Proceedings of ACM SIGMOD international conference on management of data, pp 213–24
Güting RH, Schneider M (2005) Moving objects databases. Morgan Kaufmann, Amsterdam
Kanellakis PC, Kuper GM, Revesz PZ (1990) Constraint query languages. In: Proceedings of ACM symposium on principles of database systems, pp 299–313
Kanellakis PC, Kuper GM, Revesz PZ (1995) Constraint query languages. J Comput Syst Sci 51(1):26–52
Kuper GM, Libkin L, Paredaens J (eds) (2000) Constraint databases. Springer, Berlin
Li L, Revesz PZ (2004) Interpolation methods for spatiotemporal geographic data. Comput Environ Urban Syst 28:201–227
Revesz PZ (1993) A closed-form evaluation for Datalog queries with integer (gap)-order constraints. Theor Comput Sci 116(1):117–49
Revesz PZ (2002) Introduction to constraint databases. Springer, New York
Revesz PZ (2010) Introduction to databases: from biological to spatio-temporal. Springer, New York
Revesz P, Wu S (2006) Spatiotemporal reasoning about epidemiological data. Artif Intell Med 38(2):157–170
Rigaux P, Scholl M, Voisard A (2002) Spatial databases with application to GIS. Morgan Kaufmann, San Francisco

Recommended Reading

Revesz P, Li Y (1997) MLPQ: a linear constraint database system with aggregate operators. In: Proceedings of 1st international database engineering and applications symposium. IEEE Press, Washington, pp 132–137

Constraint Programming

 Integration of Spatial Constraint Databases

Constraint Query Languages

 Constraint Database Queries
 Linear Versus Polynomial Constraint Databases
 Polynomial Spatial Constraint Databases

Constraints, Authority

 Time Geography

Constraints, Capability

 Time Geography

Constraints, Coupling

 Time Geography

Content Metadata

 Feature Catalogue

Context-Aware

 User Interfaces and Adaptive Maps

Context-Aware Dynamic Access Control

 Security Models, Geospatial

Context-Aware Presentation

 Information Presentation, Dynamic

Context-Aware Role-Based Access Control

 Security Models, Geospatial

Context-Sensitive Visualization

 Information Presentation, Dynamic

Contextualization

 Geospatial Semantic Web: Personalization

Contingency Management System

 Emergency Evacuation Plan Maintenance

Continuity Matrix

 Spatial Weights Matrix

Continuity Network

 Conceptual Neighborhood

Continuous Location-Based Queries

 Continuous Queries in Spatio-Temporal Databases

Continuous Queries

 Indexing, Query and Velocity-Constrained
 Queries in Spatiotemporal Databases, Time Parameterized

Continuous Queries in Spatio-Temporal Databases

Xiaopeng Xiong1, Mohamed F. Mokbel2, and Walid G. Aref1
1 Department of Computer Science, Purdue University, West Lafayette, IN, USA
2 Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA

Synonyms

Continuous location-based queries; Continuous query processing; Long-running spatiotemporal queries; Moving queries

Definition

A continuous query is a new query type that is issued once and is evaluated continuously in a database server until the query is explicitly terminated. The most important characteristic of continuous queries is that their query result does not only depend on the present data in the databases but also on continuously arriving data. During the execution of a continuous query, the query result is updated continuously when new data arrives. Continuous queries are essential to applications that are interested in transient and frequently updated objects and require monitoring query results continuously. Potential applications of continuous queries include but are not limited to real-time location-aware services, network flow monitoring, online data analysis and sensor networks.

Continuous queries are particularly important in Spatiotemporal Databases. Continuous spatiotemporal queries are evaluated continuously against spatiotemporal objects and their results are updated when interested objects change spatial locations or spatial extents over time. Figure 1 gives an example of a continuous query in a spatiotemporal database. In Fig. 1, o1 to o8 are objects moving in the data space and Q is a continuous spatiotemporal query that tracks moving objects within the shaded query region. As plotted in Fig. 1a, the query answer of Q with respect to time t1 consists of three objects: {o2, o3, o4}. Assume that at a later time t2, the objects change their locations as shown in Fig. 1b. Particularly, o2 and o3 move out of the query region while o5 moves inside the query region. o4 also moves; however, it remains inside the query region. Due to the continuous evaluation of Q, the query answer of Q will be updated to {o4, o5} at time t2.

Historical Background

The study of continuous spatiotemporal queries started in the 1990s as an important part of the study of Spatiotemporal Databases. Since then, continuous spatiotemporal queries have received
increasing attention due to the advances and combination of portable devices and locating technologies. Recently, the study of continuous queries in spatiotemporal databases has become one of the most active fields in the database domain.

Continuous Queries in Spatio-Temporal Databases, Fig. 1 An example of continuous query: (a) at time t1; (b) at time t2

Scientific Fundamentals

There are various types of continuous queries that can be supported in spatiotemporal databases. In general, continuous spatiotemporal queries can be classified into various categories based on different classifying criteria. The most common classifying criteria are based on the type of query interest, the mobility of query interest and the time of query interest (Mokbel et al. 2003).

According to the type of query interest, there are a wide variety of continuous spatiotemporal queries. The following describes some interesting query types that are widely studied.

Continuous range queries (Kalashnikov et al. 2002; Mokbel et al. 2004). This type of continuous query is interested in spatiotemporal objects inside or overlapping with a given spatial region. Continuous range queries have many important applications and are sometimes used as a filter step for other types of continuous queries.

Examples:
Continuously report the number of trucks on Road US-52.
Continuously show me all the hotels within 3 miles during my drive.

Continuous k-nearest neighbor (CkNN) queries (Tao et al. 2002; Xiong et al. 2005; Li and Han 2004). A CkNN query is a query tracking continuously the k objects that are the nearest ones to a given query point. The objects of interest and/or the query point may move during query evaluation.

Examples:
Continuously track the nearest maintenance truck to my vehicle (in the battlefield).
Continuously show me the 10 nearest hotels during my drive.

Continuous Reverse Nearest neighbor (CRNN) queries (Kang et al. 2007; Xia and Zhang 2006). A CRNN query continuously identifies a set of objects that have the querying object as their nearest neighbor object.

Examples:
Continuously find soldiers who need my help (I am the nearest doctor to him/them).
Continuously send electronic advertisement of our hotel to vehicles that have our hotel as their nearest hotel.

Based on the mobility of interest, continuous spatiotemporal queries can be classified as moving queries over static objects, static queries over moving objects and moving queries over moving objects (Mokbel et al. 2003).

Moving queries over static objects. In this query category, the objects of interest are
static while the query region or query point of the continuous spatiotemporal query may change over time. This query type is abstracted in Fig. 2a. In Fig. 2a, the query Q moves along with time (e.g., at t1, t2 and t3) and the objects (represented by black dots) are stationary.

Continuous Queries in Spatio-Temporal Databases, Fig. 2 Continuous spatiotemporal query types based on mobility

Examples:
Continuously return all the motels within 3 miles to John during John's road-trip.
Continuously find the nearest gas station to Alice while she is driving along Highway IN-26.

In the examples above, the objects of interest (i.e., the gas stations and the motels) are static objects and the query regions/points (i.e., the 3-mile region surrounding John's location and the location of Alice) are continuously changing.

Static queries over moving objects. In this query category, the query region/point of continuous spatiotemporal query remains static while the objects of interest are continuously moving. This query type is abstracted in Fig. 2b. As plotted in Fig. 2b, the objects keep moving along with time (e.g., at t1, t2 and t3) while the query Q is stationary.

Examples:
Continuously monitor the number of buses in the campus of Purdue University.
Continuously find the nearest 100 taxis to a certain hotel.

In the above examples, the query region (i.e., the university campus) or the query point (i.e., the hotel) does not move while the objects of interest (i.e., the buses and the taxis) continuously move.

Moving queries over moving objects. In this query category, both the query region/point of continuous spatiotemporal query and the objects of interest are capable of moving. This query type is abstracted in Fig. 2c. As shown in Fig. 2c, the query Q and the objects are both moving over time (e.g., at t1, t2 and t3).

Example:
Continuously report all the cars within 1 mile when the sheriff drives along the State street.

In this example, the query region (i.e., 1-mile region surrounding the location of the sheriff) and the objects of interest (i.e., the cars) are both moving.

Based on the time of query interest, continuous spatiotemporal queries can be classified as historical queries, present queries and future queries (Mokbel et al. 2003). Figure 3 plots the three types of queries. In Fig. 3, the gray dots represent historical object locations and the black dots represent current object locations. The dotted lines with arrows represent anticipated object movements in the future based on the objects' current velocities.

Historical queries. These types of continuous queries are interested in events of the past. Historical queries are especially interesting to applications in data warehouse analysis and business intelligence.

Example:
Continuously calculate the average number of trucks on Highway I-65 for the past 2 hours.
In the above example, the query result depends on the historical data of the past 2 h and is continuously updated when time evolves.

Continuous Queries in Spatio-Temporal Databases, Fig. 3 Continuous spatiotemporal query types based on the time of interest

Present queries. In this query category, continuous queries are evaluated against only the current status of spatiotemporal objects. Present queries are important to real-time monitoring and location-based services.

Examples:
Continuously return the total number of vehicles in the monitored area.
Send an alarm once the child steps out of the neighbor's home.

The query results of the examples above depend only on the current locations of the objects of interest (i.e., the locations of vehicles and the location of the child).

Future queries. In this query type, the query results are based on the prediction of future events. The evaluation of future queries usually relies on the knowledge of expected object movement, such as the velocity information of the objects. Future queries are particularly useful for alarm systems and risk prevention.

Example:
Send an alarm if two aircraft are becoming less than 5 miles apart from each other in 3 minutes.

The above query can be evaluated continuously based on the velocity information of the aircraft. When the velocities of the aircraft are changed, the query is re-evaluated and the query result may change accordingly.

Key Applications

Continuous spatiotemporal queries have numerous applications in a wide range of fields.

Traffic Monitoring
Continuous spatiotemporal queries can be applied to monitor real-time traffic conditions in interested areas. Abnormal traffic conditions such as vehicle collision and traffic congestion can be detected and monitored by continuously analyzing incoming traffic data. Besides traffic detection and monitoring, recommendations of alternative routes can be sent to drivers, allowing them to bypass the slow-traffic roads.

Traffic Pattern Detection
Traffic usually demonstrates a repeated pattern with respect to location and time. Detection of such a pattern is important to predict the traffic conditions in the future at the interested area. Continuous spatiotemporal queries can be used to detect such patterns by continuously analyzing traffic data and maintaining spatio-temporal histograms.

Danger Alarming
Depending on the present or predicted locations of interested objects, continuous queries can trigger alarms to prevent potential dangers. Danger alarming has applications in both daily life and national security. For example, alarms can be sent
when kids step out of the home backyard or when an unidentified flight is predicted to fly over a military base in 5 min.

Digital Battlefield
In the digital battlefield, continuous queries can help commanders make decisions by continuously monitoring the context of friendly units such as soldiers, tanks and flights.

Road-Trip Assistance
Travel by driving will become more convenient with the aid of continuous spatiotemporal queries. A driver can be continuously informed about information such as nearby hotels, gas stations and grocery stores. More dynamic information such as nearby traffic conditions and weather alarms based on the current location can also be supported by integrating corresponding information in spatiotemporal databases.

Location-Based E-Commerce
Continuous spatiotemporal queries can be utilized to shift E-commerce from web-based E-commerce to location-based E-commerce. Location-based E-commerce is E-commerce associated with the locations of potential customers. One example of location-based E-commerce is "sending coupons to all vehicles that are within twenty miles of my hotel".

Climate Analysis and Predicting
Climatology study can benefit from employing continuous spatiotemporal queries over climate data. Climate phenomena such as hurricanes, storms and cumulonimbus clouds can be modeled as spatiotemporal objects with changing spatial extents and moving locations. Climate phenomena can be continuously monitored and it is plausible to analyze historical climate phenomena and to predict future climate phenomena.

Environmental Monitoring
Continuous spatiotemporal queries can work with sensor networks to monitor environmental phenomena such as forest fires or polluted water domains. Sensors in a sensor network continuously detect environmental events and feed the data into spatiotemporal databases. Then, the properties of the environmental phenomena (e.g., the shape and the movement of the fire or the polluted water area) can be continuously monitored.

Future Directions

The study of continuous spatiotemporal queries is steadily progressing. Current efforts focus on a variety of challenging issues (Roddick et al. 2004; Sellis 1999a, b). The following provides some directions among a long list of research topics.

Query Language. This direction is to define an expressive query language so that any continuous spatiotemporal queries can be properly expressed.
Novel Query Types. More query types are proposed based on new properties of spatiotemporal data. Examples of new continuous spatiotemporal query types include continuous probabilistic queries, continuous group nearest queries etc.
Query Evaluation. Due to the continuous evaluation of queries, efficient query evaluation algorithms are needed to process queries incrementally and continuously whenever data is updated.
Data Indexing. Traditional data indexing structures usually do not perform well under frequent object updates. There is a challenging issue on designing update-tolerant data indexing to cope with continuous query evaluation.
Scalability. When the number of moving objects and the number of continuous queries become large, the performance of spatiotemporal databases will degrade and cannot provide a timely response. Increasing the scalability of spatiotemporal databases is an important topic to address.
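
The incremental processing called for under Query Evaluation can be illustrated with a minimal Python sketch of a continuous range query over moving objects. It mirrors the example from the Definition, in which the answer changes from {o2, o3, o4} to {o4, o5}. The class name, the rectangular region, and all coordinates are assumptions made for illustration; they are not read from the figures and do not reproduce any published algorithm.

```python
# Hypothetical sketch: incremental maintenance of a continuous range query.
# The answer set is adjusted per location update instead of being
# recomputed from scratch over all objects.

class ContinuousRangeQuery:
    def __init__(self, xmin, ymin, xmax, ymax):
        self.region = (xmin, ymin, xmax, ymax)
        self.answer = set()  # ids of objects currently inside the region

    def _contains(self, x, y):
        xmin, ymin, xmax, ymax = self.region
        return xmin <= x <= xmax and ymin <= y <= ymax

    def on_update(self, obj_id, x, y):
        """Process one location update and return the change to the answer.

        Returns ("+", obj_id) if the object entered the region,
        ("-", obj_id) if it left, and None if the answer is unchanged.
        """
        inside = self._contains(x, y)
        if inside and obj_id not in self.answer:
            self.answer.add(obj_id)
            return ("+", obj_id)
        if not inside and obj_id in self.answer:
            self.answer.remove(obj_id)
            return ("-", obj_id)
        return None


q = ContinuousRangeQuery(2, 2, 6, 6)

# Time t1: o2, o3 and o4 report positions inside the region, o5 outside.
for oid, x, y in [("o2", 3, 5), ("o3", 4, 4), ("o4", 5, 3), ("o5", 7, 1)]:
    q.on_update(oid, x, y)
print(sorted(q.answer))

# Time t2: o2 and o3 leave, o4 moves but stays inside, o5 enters.
updates_t2 = [("o2", 1, 7), ("o3", 7, 5), ("o4", 4, 3), ("o5", 5, 2)]
changes = [q.on_update(oid, x, y) for oid, x, y in updates_t2]
print(changes)
print(sorted(q.answer))
```

Reporting only the entries and exits keeps the work per update independent of the size of the answer, which is one way to read the requirement that queries be processed incrementally whenever data is updated.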
Cross-References

 Queries in Spatiotemporal Databases, Time Parameterized
 Spatiotemporal Database Modeling with an Extended Entity-Relationship Model

References

Kalashnikov DV, Prabhakar S, Hambrusch SE, Aref WA (2002) Efficient evaluation of continuous range queries on moving objects. In: DEXA '02: proceedings of the 13th international conference on database and expert systems applications. Springer, Heidelberg, pp 731–740
Kang JM, Mokbel MF, Shekhar S, Xia T, Zhang D (2007) Continuous evaluation of monochromatic and bichromatic reverse nearest neighbors. In: ICDE, Istanbul
Li JYY, Han J (2004) Continuous k-nearest neighbor search for moving objects. In: SSDBM '04: proceedings of the 16th international conference on scientific and statistical database management (SSDBM '04). IEEE Computer Society, Washington, DC, p 123
Mokbel MF, Aref WA, Hambrusch SE, Prabhakar S (2003) Towards scalable location-aware services: requirements and research issues. In: GIS, New Orleans
Mokbel MF, Xiong X, Aref WA (2004) SINA: scalable incremental processing of continuous queries in spatiotemporal databases. In: SIGMOD, Paris
Roddick JF, Hoel E, Egenhofer ME, Papadias D, Salzberg B (2004) Spatial, temporal and spatiotemporal databases – hot issues and directions for PHD research. SIGMOD Rec 33(2):126–131
Sellis T (1999) Chorochronos – research on spatiotemporal database systems. In: DEXA '99: proceedings of the 10th international workshop on database & expert systems applications. IEEE Computer Society, Washington, DC, p 452
Sellis TK (1999) Research issues in spatiotemporal database systems. In: SSD '99: proceedings of the 6th international symposium on advances in spatial databases. Springer, London, pp 5–11
Tao Y, Papadias D, Shen Q (2002) Continuous nearest neighbor search. In: VLDB, Hong Kong
Xia T, Zhang D (2006) Continuous reverse nearest neighbor monitoring. In: ICDE '06: proceedings of the 22nd international conference on data engineering (ICDE '06), Washington, DC, p 77
Xiong X, Mokbel MF, Aref WG (2005) SEA-CNN: scalable processing of continuous K-nearest neighbor queries in spatiotemporal databases. In: ICDE, Tokyo

Continuous Query Processing

 Continuous Queries in Spatio-Temporal Databases

Continuously Changing Maps

 Constraint Databases and Moving Objects

Contraflow for Evacuation Traffic Management

Brian Wolshon
Department of Civil and Environmental Engineering, Louisiana State University, Baton Rouge, LA, USA

Synonyms

All-lanes-out; Emergency preparedness; Evacuation planning; Merge designs; One-way-out evacuation; Reversible and convertible lanes; Split designs

Definition

Contraflow is a form of reversible traffic operation in which one or more travel lanes of a divided highway are used for the movement of traffic in the opposing direction (The common definition of contraflow for evacuations has been broadened over the past several years by emergency management officials, the news media, and the public to include the reversal of flow on any roadway during an evacuation.) (American Association of State Highway and Transportation Officials 2001). It is a highly effective strategy because it can both immediately and significantly increase the directional capacity of a roadway without the time or cost required to plan, design, and construct additional lanes. Since 1999, contraflow has been widely applied to evacuate regions of the southeastern United States (US) when under threat from hurricanes. As a result of its recent demonstrated effectiveness during Hurricane Katrina (Wolshon 2006), it is also now looked upon as a potential preparedness measure for other mass-scale hazards.
346 Contraflow for Evacuation Traffic Management

Contraflow segments are most common and logical on freeways because they are the highest capacity roadways and are designed to facilitate high speed operation. Contraflow is also more practical on freeways because these routes do not incorporate at-grade intersections that interrupt flow or permit unrestricted access into the reversed segment. Freeway contraflow can also be implemented and controlled with fewer manpower resources than unrestricted highways.

Nearly all of the contraflow strategies currently planned on US freeways have been designed for the reversal of all inbound lanes. This configuration, shown schematically in Inset 1d of Fig. 1, is commonly referred to as a One-Way-Out or All-Lanes-Out evacuation. Though not as popular, some contraflow plans also include options for the reversal of only one of the inbound lanes (Inset 1b) with another option to use one or more of the outbound shoulders (Inset 1c) (Wolshon 2001). Inbound lanes in these plans are maintained for entry into the threat area by emergency and service vehicles to provide assistance to evacuees in need along the contraflow segment.

Contraflow for Evacuation Traffic Management, Fig. 1 Freeway contraflow lane use configurations for evacuations (Wolshon 2001)

Historical Background

Although evacuation-specific contraflow is a relatively recent development, its application for other types of traffic problems is not new (Wolshon and Lambert 2004). In fact, various forms of reversible traffic operation have been used throughout the world for decades to address many types of directionally unbalanced traffic conditions. They have been most common around major urban centers where commuter traffic is heavy in one direction while traffic is light in the other. Reverse and contraflow operations have also been popular for managing the infrequent, but periodic and predictable, directionally imbalanced traffic patterns associated with major events like concerts, sporting events, and other public gatherings. Reversible lanes have also been cost effective on bridges and in tunnels where additional directional capacity is needed, but where additional lanes cannot be easily added.

While the date of the first use of contraflow for an evacuation is not known with certainty, interest in its potential began to be explored after Hurricane Andrew struck Florida in 1992. By 1998, transportation and emergency management officials in both Florida and Georgia had plans in place to use contraflow on segments of Interstate freeways. Ultimately, the watershed event for evacuation contraflow in the United States was Hurricane Floyd in 1999. Since then, every coastal state threatened by hurricanes has developed and maintains plans for the use of evacuation contraflow.

Hurricane Floyd triggered the first two major implementations of contraflow, one on a segment of Interstate (I) 16 from Savannah to Dublin, Georgia and the other on I-26 from Charleston to Columbia, South Carolina. The results of both of these applications were generally positive, although numerous areas for improvement were also identified. The contraflow application in South Carolina was particularly interesting because it was not pre-planned. Rather, it was implemented on an improvisational basis after a strong public outcry came from evacuees trapped for hours in congested lanes of westbound I-26 seeking ways to use the near-empty eastbound lanes.

The first post-Floyd contraflow implementations occurred in Alabama for the evacuation of Mobile and Louisiana for the evacuation of New Orleans. Once again, many lessons were learned and numerous improvements in both physical and operational aspects of the plans were suggested. The timing of these events was quite fortuitous for New Orleans. Within 3 months of the major changes that were implemented to the Louisiana contraflow plan after Hurricane Ivan, they were put into operation for Hurricane Katrina. The changes, so far the most aggressive and far-ranging of any developed until that time (Wolshon et al. 2006), involved the closure of lengthy segments of interstate freeway, forced traffic onto alternative routes, established contraflow segments across the state boundary into Mississippi, coordinated parallel non-freeway routes, and reconfigured several interchanges to more effectively load traffic from surface streets. The results of these changes were reflected in a clearance time for the city that was about half of the previous prediction (Wolshon and McArdle).

Scientific Fundamentals

Although the basic concept of contraflow is simple, it can be complex to implement and operate in actual practice. If not carefully designed and managed, contraflow segments also have the potential to be confusing to drivers. To ensure safe operation, improper access and egress movements must be prohibited at all times during its operation. Segments must also be fully cleared of opposing traffic prior to initiating contraflow operations. These are not necessarily easy to accomplish, particularly in locations where segments are in excess of 100 miles and where interchanges are frequent. For these reasons some transportation officials regard them as risky and only for use during daylight hours and under the most dire situations. They are also the reason why contraflow for evacuation has been planned nearly exclusively for freeways, where access and egress can be tightly controlled.

To date, contraflow evacuations have also been used only for hurricane hazards and wildfires and no other type of natural or manmade hazard. The first reason for this is that these two hazards affect much greater geographic areas and tend to be slower moving relative to other hazards. Because of their scope they also create the need to move larger numbers of people over greater distances than other types of hazards. The second reason is that contraflow requires considerable manpower and materiel resources as well as time to mobilize and implement. Experiences in Alabama and Louisiana showed that the positioning of traffic control devices and enforcement personnel takes at least 6 h, not including the time to plan and preposition equipment for the event. In Florida, where needs are great and manpower resources are stretched thin, evacuation contraflow requires involvement from the Florida National Guard. For this reason (among others), Florida officials require a minimum of 49 h of advanced mobilization time for contraflow to be implemented (Wolshon et al. 2005).

Operational Effects of Contraflow
As the goal of an evacuation is to move as many people out of the hazard threat zone as quickly as possible, the primary goal of contraflow is to increase the rate of flow and decrease the travel time from evacuation origins to destinations. Prior to field measurement, it was hypothesized that the flow benefits of contraflow would be substantial, but less than that of an equivalent normally flowing lane (Wolshon 2001). These opinions were based on measurements of flow on I-26 during the Hurricane Floyd evacuation and the theory that drivers would drive at slower speeds and with larger spacing in contraflow lanes.

The highest flow rates measured by the South Carolina Department of Transportation (DOT) during the Floyd evacuation were between 1500 and 1600 vehicles per hour per lane (vphpl) (United States Army Corps of Engineers 2000). Traffic flows measured during the evacuations for Hurricanes Ivan and Katrina on I-55 in Louisiana were somewhat less than the South Carolina rates. Flows in the normal-flow lanes of I-55 averaged about 1230 vphpl during the peak 10 h of the evacuation. Flow rates in the contraflow lanes during the same period averaged about 820 vphpl. These volumes compare to daily peaks of about 400 vphpl during routine periods and a theoretical capacity of 1800–2000 vphpl for this segment.

The graph of Fig. 2 illustrates the hourly traffic flow on I-55 during the evacuations for Hurricanes Ivan (when contraflow was not used) and Katrina (when contraflow was used). During the 48 h period of the Ivan evacuation (shown on the left side of the graph) a total of 60,721 vehicles traveled northbound through this location. During the Katrina evacuation, the total volume was 84,660 vehicles during a corresponding 48 h period. It is also worth noting that the duration of the peak portion of the evacuation (i.e., when the volumes were noticeably above the prior 3 week average) was about the same for both storms.

Contraflow for Evacuation Traffic Management, Fig. 2 Northbound traffic volume on I-55 at Fluker, Louisiana (Data source: LA DOTD)
[Figure: hourly volume (axis from 0 to 5000) by hour of day for Hurricane Ivan (9/14 and 9/15, 2004; Tuesday and Wednesday) and Hurricane Katrina (8/26 thru 8/29, 2005; Friday through Monday). Plotted series: total northbound volume, total northbound volume with contraflow, and northbound volume in "normal" lanes.]

The data in Fig. 2 are also of interest because they are consistent with prior analytical models of evacuation that have estimated maximum evacuation flow on freeways with contraflow to be about 5000 vph. One of the difficulties in making full analyses of evacuation volume in general, and of contraflow volume in particular, has been a lack of speed data. Although the flow rates recorded during the two recent Louisiana hurricane evacuations are considerably below the theoretical capacity of this section of freeway, it cannot be determined with certainty if the conditions were congested with low operating speeds and small headways or relatively free flowing at more moderate levels of demand. It is also interesting to note that empirical observation of speed at a point toward the end of the segment did not appear to support the popular theory of elevated driver caution during contraflow. In fact, traffic enforcement personnel in Mississippi measured speeds well in excess of posted speed limits as the initial group of drivers moved through the newly opened lanes.

Elements of Contraflow Segments
Reversible roadways have a number of physical and operational attributes that are common among all applications. The principal physical attributes are related to spatial characteristics of the design, including its overall length, number of lanes, as well as the configuration and length of the inbound and outbound transition areas. The primary operational attributes are associated with the way in which the segment will be used and include the temporal control of traffic movements. The temporal components of all reversible lane segments include the frequency and duration of a particular configuration and the time required to transition traffic from one direction to another. The duration of peak-period commuter reversible applications, for example, typically lasts about 2 h (not including set-up, removal, and transition time) with a twice daily frequency. Evacuation contraflow, however, may only be implemented once in several years, and its duration of operation may last several days.

Like all reversible flow roadways, contraflow lanes need to achieve and maintain full utilization to be effective. Although this sounds like an obvious fact, it can be challenging to achieve in practice. The most common reason for underutilization has been inadequate transitions into and out of the contraflow segment. Contraflow requires a transition section at the inflow and outflow ends to allow drivers to maneuver into and out of the reversible lanes from the unidirectional lanes on the approach roadways leading into it. Since these termini regulate the ingress and egress of traffic entering and exiting the segment and they are locations of concentrated lane changing as drivers weave and merge into the desired lane of travel, they effectively dictate the capacity of the entire segment.

Through field observation and simulation studies (Theodoulou 2003; Williams et al. 2007) it has been shown that contraflow entry points with inadequate inflow transitions result in traffic congestion and delay prior to the contraflow segment and prevent the segment from carrying capacity-level demand. This was illustrated by the I-10 contraflow segment in New Orleans during the Hurricane Ivan evacuation. At that time, evacuating vehicles in the left and center outbound lanes of I-10 were transitioned across the median and into the contraflow lanes using a paved crossover. However, the combination of the crossover design, temporary traffic control devices, presence of enforcement personnel, and weaving vehicles created a flow bottleneck that restricted inflow into the contraflow lanes. This caused two problems. First, it limited the number of vehicles that could enter the contraflow lanes, limiting flow beyond the entry point significantly below its vehicle carrying capability. The other was that it caused traffic queues upstream of the crossover that extended back for distances in excess of 14 miles. This plan was significantly improved prior to the Katrina evacuation 1 year later by permitting vehicles to enter the contraflow lanes at multiple points, spatially spreading the demand over a longer distance and reducing the length and duration of the congested conditions (Wolshon et al. 2006).

Inadequate designs at the downstream end of contraflow segments can also greatly limit their effectiveness. Prior experience and simulation modeling (Lim 2003) have shown that an inability to move traffic from contraflow lanes back into normally flowing lanes will result in congestion backing up from the termination transition point in the contraflow lanes. Under demand conditions associated with evacuations, queue formation can occur quite rapidly and extend upstream for many miles within hours. To limit the potential for such scenarios, configurations that require merging of the normal and contraflowing lanes are discouraged, particularly if they also incorporate lane drops. Two popular methods that are used to terminate contraflow include routing the two traffic streams at the termination on to separate routes and reducing the level of outflow demand at the termination by including egress points along the intermediate segment. Several of the more common configurations are discussed in the following section.
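The I-55 counts and lane flow rates quoted under Operational Effects of Contraflow can be compared with a few lines of arithmetic. The Python sketch below is illustrative only: the numbers are those reported in the text, while the function and variable names are hypothetical and are not drawn from any agency's analysis.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# 48 h northbound totals on I-55, as quoted in the text.
ivan_total = 60_721      # Hurricane Ivan, contraflow not used
katrina_total = 84_660   # Hurricane Katrina, contraflow used

# Average lane flows during the peak 10 h (vehicles per hour per lane).
normal_lane_flow = 1230
contraflow_lane_flow = 820
theoretical_capacity = (1800, 2000)  # quoted range for this segment

volume_gain = percent_change(ivan_total, katrina_total)
lane_ratio = contraflow_lane_flow / normal_lane_flow
# Share of theoretical capacity used by a normal-flow lane (low and high bound).
utilization = tuple(normal_lane_flow / c for c in theoretical_capacity)

print(f"Katrina vs. Ivan 48 h volume: {volume_gain:+.1f}%")
print(f"Contraflow lane flow / normal lane flow: {lane_ratio:.2f}")
print(f"Normal lane share of capacity: {utilization[1]:.2f} to {utilization[0]:.2f}")
```

On these figures the Katrina total is roughly 39% above the Ivan total, and a contraflow lane carried about two thirds of the flow of a normal lane. The two storms differed in demand, so the comparison indicates scale rather than a controlled effect of contraflow.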

Contraflow Plans and Designs
The primary physical characteristics of contraflow segments are the number of lanes and the length. A 2003 study (Urbina and Wolshon 2003) of hurricane evacuation plans revealed that 18 controlled access evacuation contraflow segments and three additional arterial reversible roadway segments have been planned for use in the US. Currently, all of the contraflow segments are planned for a full One-Way-Out operation. The shortest of the contraflow freeway segments was the I-10 segment out of New Orleans at about 25 miles long. The longest were two 180-mile segments of I-10 in Florida: one eastbound from Pensacola to Tallahassee and the other westbound from Jacksonville to Tallahassee. Most of the others were between 85 and 120 miles.

In the earliest versions of contraflow, nearly all of the planned segments that were identified in the study were initiated via median crossovers. Now that single point loading strategies have been shown to be less effective, many locations are changing to multi-point loading. Most popular of these are median crossovers, with supplemental loading via nearby reversed interchange ramps.

The termination configurations for the reviewed contraflow segments were broadly classified into one of two groups. The first were split designs, in which traffic in the normal and contraflowing lanes was routed onto separate roadways at the terminus. The second group were the merge designs, in which the separate lane groups are reunited into the normal-flow lanes using various geometric and control schemes. The selection of one or the other of these termination configurations at a particular location by an agency has been a function of several factors, most importantly the level of traffic volume and the configuration and availability of routing options at the end of the segment.

In general, split designs offer the higher level of operational efficiency of the two designs. The obvious benefit of a split is that it reduces the potential for bottleneck congestion resulting from merging four lanes into two. Its most significant drawback is that it requires one of the two lane groups to exit to a different route, thereby eliminating route options at the end of the segment. In some older designs, the contraflow traffic stream was planned to be routed onto an intersecting arterial roadway. One of the needs for this type of split design is adequate capacity on the receiving roadway.

Merge termination designs also have pros and cons. Not surprisingly, however, these costs and benefits are nearly the exact opposite of split designs in their end effect. For example, most merge designs preserve routing options for evacuees because they do not force vehicles on to adjacent roadways and exits. Unfortunately, the negative side to this is that they also have a greater potential to cause congestion since they merge traffic into a lesser number of lanes. At first glance it would appear illogical to merge two high volume roadways into one. However, in most locations where they are planned, exit opportunities along the intermediate segment will be maintained to decrease the volumes at the end of the segment.

Key Applications

The list of applications for contraflow continues to grow as transportation and emergency preparedness agencies recognize its benefits. As a result, the number of locations that are contemplating contraflow for evacuations is not known. However, a comprehensive study of contraflow plans (Urbina and Wolshon 2003) in 2003 included 21 reverse flow and contraflow sections. The locations and lengths of these sections are detailed in Table 1.

Contraflow for Evacuation Traffic Management, Table 1 Planned contraflow/reverse flow evacuation routes (Urbina and Wolshon 2003)

State           Route(s)                     Approx. distance (miles)  Origin location       Termination location
New Jersey      NJ-47/NJ-347 (a)             19                        Dennis Twp            Maurice River Twp
                Atlantic City Expressway     44                        Atlantic City         Washington Twp
                NJ-72/NJ-70 (a)              29.5                      Ship Bottom Boro      Southampton
                NJ-35 (a)                    3.5                       Mantoloking Boro      Pt. Pleasant Beach
                NJ-138/I-195                 26                        Wall Twp              Upper Freehold
Maryland        MD-90                        11                        Ocean City            US 50
Virginia (a)    I-64                         80                        Hampton Roads Bridge  Richmond
North Carolina  I-40                         90                        Wilmington            Benson (I-95)
South Carolina  I-26                         95                        Charleston            Columbia
Georgia         I-16                         120                       Savannah              Dublin
Florida         I-10 Westbound               180                       Jacksonville          Tallahassee
                I-10 Eastbound               180                       Pensacola             Tallahassee
                SR 528 (Beeline)             20                        SR 520                SR 417
                I-4 Eastbound                110                       Tampa                 Orange County
                I-75 Northbound              85                        Charlotte County      I-275
                FL Turnpike                  75                        Ft. Pierce            Orlando
                I-75 (Alligator Alley)       100                       Coast                 Coast
Alabama         I-65                         135                       Mobile                Montgomery
Louisiana       I-10 Westbound               25                        New Orleans           I-55
                I-10/I-59 (east/north)       115 (a)                   New Orleans           Hattiesburg (a)
Texas           I-37                         90                        Corpus Christi        San Antonio

(a) Notes: Delaware and Virginia contraflow plans are still under development. The actual length of the New Orleans, LA to Hattiesburg, MS contraflow segment will vary based on storm conditions and traffic demand. Since they are undivided highways, operations on NJ-47/NJ-347, NJ-72/NJ-70, and NJ-35 are reverse flow rather than contraflow

Future Directions

As experiences with contraflow increase and its effectiveness becomes more widely recognized, it is likely that contraflow will be accepted as a standard component of emergency preparedness planning and its usage will grow. Several recent high profile negative evacuation experiences have prompted more states to add contraflow options to their response plans. The most notable of these was in Houston, where scores of evacuees (including 23 in a single tragic incident) reportedly perished during the highly criticized evacuation for Hurricane Rita in 2005 (Senior Citizens From Houston Die When Bus Catches Fire 2005). Plans for contraflow are currently under development and should be ready for implementation by the 2007 storm season. Contraflow is also being evaluated for use in some of the larger coastal cities of northeast Australia.

In other locations where hurricanes are not a likely threat, contraflow is also being studied. Some of these examples include wildfires in the western United States (Wolshon and Marchive 2007) and tsunamis and volcanoes in New Zealand. Greater emphasis on terrorism response has also led cities with few natural hazards to begin examining contraflow for various accidental and purposeful manmade hazards (Sorensen and Vogt 2006).

It is also expected that as contraflow gains in popularity, the application of other developing technologies will be integrated into this strategy. Such has already been the case in South Carolina, Florida, and Louisiana, where various intelligent transportation systems (ITS) and other remote sensing technologies have been applied to monitor the state and progression of traffic on contraflow sections during an evacuation. In Washington DC, where reversible flow has been evaluated for use on primary arterial roadways during emergencies, advanced control systems for modifying traffic signal timings have also been studied (Chen et al.).

Cross-References

▶ Contraflow in Transportation Network
▶ Dynamic Travel Time Maps

References

American Association of State Highway and Transportation Officials (2001) A policy on geometric design of highways and streets, 5th edn. American Association of State Highway and Transportation Officials, Washington, DC
Chen M, Chen L, Miller-Hooks E, Traffic signal timing for urban evacuation. ASCE J Urban Plan Dev Spec Emerg Transp Issue 133(1):30–42
Lim YY (2003) Modeling and evaluating evacuation contraflow termination point designs. Master's thesis, Department of Civil and Environmental Engineering, Louisiana State University
Senior Citizens from Houston Die When Bus Catches Fire (2005) Washington post staff writer, Saturday, Sept 24, p A09. Also available online at: http://www.washingtonpost.com/wp-dyn/content/article/2005/09/23/AR200509230 0505.html
Sorensen J, Vogt B (2006) Interactive emergency evacuation planning guidebook. Chemical Stockpile Emergency Preparedness Program, Department of Homeland Security. Available online at: http://emc.ornl.gov/CSEPPweb/evac_files/index.htm
Theodoulou G (2003) Contraflow evacuation on the westbound I-10 out of the city of New Orleans. Master's thesis, Department of Civil and Environmental Engineering, Louisiana State University
United States Army Corps of Engineers (2000) Southeast United States hurricane evacuation traffic study. Buckeley, Schuh, and Jernigan, Inc., Tallahassee (Performed by post)
Urbina E, Wolshon B (2003) National review of hurricane evacuation plans and policies: a comparison and contrast of state practices. Transp Res Part A Policy Pract 37(3):257–275
Williams B, Tagliaferri PA, Meinhold SS, Hummer JE, Rouphail NM (2007) Simulation and analysis of freeway lane reversal for coastal hurricane evacuation. ASCE J Urban Plan Dev Spec Emerg Transp Issue 133(1):61–72
Wolshon B (2006) Planning and engineering for the Katrina evacuation. Bridge Natl Acad Sci Eng 36(1):27–34
Wolshon B (2001) One-way-out: contraflow freeway operation for hurricane evacuation. Natl Hazard Rev Am Soc Civil Eng 2(3):105–112
Wolshon B, Lambert L (2004) Convertible lanes and roadways. National Cooperative Highway Research Program, Synthesis 340, Transportation Research Board, National Research Council, Washington, DC, 92pp
Wolshon B, Marchive E (2007) Evacuation planning in the urban-wildland interface: moving residential subdivision traffic during wildfires. ASCE J Urban Plan Dev Spec Emerg Transp Issue 133(1):73–81
Wolshon B, McArdle B (in press) Temporospatial analysis of hurricane Katrina regional evacuation traffic patterns. ASCE J Infrastruct Syst Spec Infrastruct Plan Design Manag Big Events Issue
Wolshon B, Catarella-Michel A, Lambert L (2006) Louisiana highway evacuation plan for hurricane Katrina: proactive management of regional evacuations. ASCE J Transp Eng 132(1):1–10
Wolshon B, Urbina E, Levitan M, Wilmot C (2005) National review of hurricane evacuation plans and policies, part II: transportation management and operations. ASCE Natl Hazard Rev 6(3):142–161

Contraflow in Transportation Network

Sangho Kim
Rancho Cucamonga, CA, USA

Synonyms

Counterflow; Emergency response; Evacuation routes; Lane reversal; Networks, spatial; Road networks

Definition

Contraflow is a method designed to increase the capacity of transportation roads in a certain direction by reversing the direction of the opposing road segments. Figure 1 shows a change of road direction under contraflow operation. Contraflow has been primarily used as a part of evacuation schemes. Incoming lanes are reversed to an outbound direction from an affected area during an evacuation to increase the efficiency of traffic movement. When contraflow is implemented on a road network, a significant amount of resources is required in terms of manpower and safety facilities. Automated contraflow execution requires controlled access at both starting and end points of a road. Manual execution requires police officers and barricade trucks. Contraflow planners also need to take into account other factors from the perspectives of planning, design, operation, and financial cost.

Contraflow in Transportation Network, Fig. 1 Contraflow road direction. (a) Normal operation. (b) Contraflow operation

Today, there are many attempts to generate automated contraflow plans using advanced computing power. However, computerized contraflow involves a combinatorial optimization problem because the number of possible contraflow network configurations increases exponentially with the number of road segments (i.e., edges in a graph). In addition, a direction change of a road segment affects the flow (or scheduling) of overall traffic movement at the system level. Thus, it is very hard to find an optimal contraflow network configuration among a huge number of possibilities.

Historical Background

The use of contraflow methods on road networks is not a new concept. Since the beginning of the modern roadway system, there has been a search for solutions to resolve unbalanced flow of traffic due to limited capacity. As the need for massive evacuations of population began to increase around southeastern coastal states threatened by hurricanes, more efficient methods of moving surface transportation have been discussed over the past 20 years (Wolshon et al. 2002). However, the utilization of contraflow has been considered and executed only in recent years. During the evacuation for Hurricane Floyd in 1999, the South Carolina Department of Transportation measured the traffic flow of Interstate Highway 26 with varying numbers of contraflow lanes (Wolshon 2001). Table 1 summarizes their results. The first important finding is that flow rate increases as contraflow lanes (either a lane of opposing traffic or a shoulder) are added. Second, the amount of increased flow per lane is less than the average flow rate under normal conditions. It is known that the flow rate per lane under normal operation is about 1,500 vehicles/h. However, the increased flow rate per lane in the table is under 1,000 vehicles/h. The limited increases are caused by the unfamiliarity of drivers and their uneasiness driving through an opposite or shoulder lane. Finally, it is observed that the use of shoulder lanes is not as effective as that of normal lanes for contraflow.

Contraflow in Transportation Network, Table 1 Evacuation traffic flow rates with varying number of contraflow lanes (Source: Wolshon 2001; FEMA 2000)

Use configuration                               Estimated average outbound flow rate (vehicle/h)
Normal (two-lanes outbound)                     3,000
Normal plus one contraflow lane                 3,900
Normal and shoulder plus one contraflow lane    4,200
Normal plus two contraflow lanes                5,000

During the Rita evacuation in 2005, much evidence showed how ill-planned contraflow negatively affected traffic flow. The following are quoted observations (Litman 2006) of the traffic problems during the Rita evacuation:

High-occupancy-vehicle lanes went unused, as did many inbound lanes of highways, because authorities inexplicably waited until late Thursday to open some up. . . . As congestion worsened state officials announced that contraflow lanes would be established on Interstate Highway 45 (Fig. 2), US Highway 290 and Interstate Highway 10. But by mid-afternoon, with traffic immobile on 290, the plan was dropped, stranding many and prompting others to reverse course. "We need that route so resources can still get into the city," explained an agency spokeswoman.

Contraflow in Transportation Network, Fig. 2 Hurricane Rita evacuation required contraflow on Interstate Highway 45. Notice that traffic on both sides of I-45 is going north (Source: dallasnews.com)

Scientific Fundamentals

Why is Planning Contraflow Difficult?: Figuring out an optimal contraflow network configuration is very challenging due to the combinatorial nature of the problem. Figure 3 shows examples of contraflow network configurations. Suppose that people (e.g., evacuees) in a source node S want to escape to destination node D on the network. Figure 3a is a road network with all edges in two way directions. In other words, no edge is reversed in the network. Figure 3b is an example of a so-called "infeasible" contraflow configuration because no evacuee can reach destination node D due to the ill-flipped road segments. The network in Fig. 3c allows only two types of flippings (i.e., ↑, ↓). A network in Fig. 3d allows three types of flippings (i.e., ↑, ↓, ↑↓).

Each network used in these examples has 17 edges. If two types of flippings are allowed as shown in Fig. 3c, the number of possible network configurations is 2^17, that is, 131,072. Among them, 89,032 configurations are feasible. An experiment was conducted by assigning some number of evacuees on node S and travel time/capacity attributes on edges. If evacuation time is measured for all feasible configurations (89,023), only 346 configurations have minimum (i.e., optimal) evacuation time, which corresponds to 0.26 % out of total possible configurations. For the same network with three types of flippings as shown in Fig. 3d, the number of possible networks is 3^17, which is more than 100 million. It is impossible to handle such an exponentially large number of configurations even with the most advanced computing system. These examples with such a small size network show why it is difficult to find an optimal contraflow network. The problem is classified as an NP-hard problem in the computer science domain.

Modeling Contraflow using Graph: It is often necessary to model a contraflow problem using a mathematical graph. S. Kim et al. (Kim and Shekhar 2005) presented a modeling approach for the contraflow problem based on graph and flow network. Figure 4 shows a simple evacuation situation on a transportation network. Suppose that each node represents a city with initial occupancy and its capacity, as shown in Fig. 4a. City A has 40 people and also capacity 40. Nodes A and C are modeled as source nodes, while node E is modeled as a destination node (e.g., shelter). Each edge represents a road between two cities with travel time and its capacity. For example, a highway segment between cities A and B has travel time 1 and capacity 3. If a time unit is 5 min, it takes 5 min for evacuees to travel from A to B and a maximum of 3 evacuees can simultaneously travel through the edge. Nodes B and D have no initial occupancy and only serve as

Contraflow in Transportation Network, Fig. 3 Examples of infeasible contraflow network and 2 or 3 types of flippings. (a) All two way (b) Infeasible configuration (c) Two types flippings (d) Three types flippings. Each panel shows a road network between source node S and destination node D.
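
The configuration counts quoted for the 17-edge network of Fig. 3 can be checked directly, and the notion of a feasible configuration can be illustrated by brute force on a very small network. The sketch below is only an illustration under stated assumptions: the four-edge toy network, the node numbering, and the function names `reaches` and `count_feasible` are hypothetical and are not taken from Kim and Shekhar (2005). It models only the two-type case, in which every edge points one way or the other.

```python
from itertools import product


def reaches(n_nodes, arcs, source, target):
    """Depth-first search: is `target` reachable from `source` over directed arcs?"""
    successors = {node: [] for node in range(n_nodes)}
    for tail, head in arcs:
        successors[tail].append(head)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def count_feasible(n_nodes, edges, source, target):
    """Try both directions of every edge; count orientations that keep a path."""
    feasible = 0
    for flips in product((False, True), repeat=len(edges)):
        arcs = [(v, u) if flip else (u, v) for (u, v), flip in zip(edges, flips)]
        if reaches(n_nodes, arcs, source, target):
            feasible += 1
    return feasible, 2 ** len(edges)


# Counts quoted in the text for the 17-edge network of Fig. 3.
two_type_count = 2 ** 17    # each edge takes one of two directions
three_type_count = 3 ** 17  # each edge takes one of three states

# Hypothetical toy network (not the one in Fig. 3): S=0, A=1, B=2, D=3.
TOY_EDGES = [(0, 1), (0, 2), (1, 3), (2, 3)]
toy_feasible, toy_total = count_feasible(4, TOY_EDGES, source=0, target=3)

if __name__ == "__main__":
    print(two_type_count, three_type_count)
    print(toy_feasible, "of", toy_total, "toy orientations are feasible")
```

Exhaustive enumeration of this kind doubles in cost with every added edge, which is the practical reason heuristic approaches are used for realistic road networks.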

[Figure 4, three panels. (a) Graph Representation of an Evacuation Situation, Evacuation Time: 22. (b) Contraflow Configuration 1, Evacuation Time: 11. (c) Contraflow Configuration 2, Evacuation Time: 14. Node labels give {initial occupancy, node capacity}: A {40,40}, B {0,10}, C {20,20}, D {0,10}, E {0,-}. Edge labels give (travel time, edge capacity). Source and destination nodes are marked.]

Contraflow in Transportation Network, Fig. 4 Graph representation of a simple evacuation situation and two following contraflow configuration candidates

transshipment nodes. The evacuation time of the original network in Fig. 4a is 22, which can be measured using a minimum cost flow algorithm. Figure 4b and c illustrate two possible contraflow configurations based on the original graph. All the two-way edges used in the original configuration are merged by capacity and directed in favor of increasing outbound evacuation capacity. There are two candidate configurations that differ in the direction of edges between nodes B and D. If the evacuation times of both configurations are measured, the configuration in Fig. 4b has evacuation time 11, while the configuration in Fig. 4c has evacuation time 14. Both configurations not only reduce but also differ in evacuation time. Even though the time difference is just 3 in this example, the difference may be significant in the case of a complicated real network. This example illustrates the importance of choice among possible network configurations. In addition, there are critical edges affecting the evacuation time, such as edge (B, D) in Fig. 4.

Solutions for Contraflow Planning: S. Kim et al. (Kim and Shekhar 2005) presented heuristic approaches to find a sub-optimal contraflow network configuration from a given network. Their approaches used the congestion status of a road network to select the most effective target

Contraflow in Transportation Network, Fig. 5 Day-time population distribution in the Twin Cities, Minnesota. Legend: highways; population classes 0 to 1,000, up to 3,000, up to 8,000, up to 16,000, and up to 40,000.

road segments. The experimental results showed that reversing less than 20 % of all road segments was enough to reduce evacuation time by more than 40 %. Tuydes and Ziliaskopoulos (2004) proposed a mesoscopic contraflow network model based on a dynamic traffic assignment method. They formulated capacity reversibility using a mathematical programming method. Theodoulou and Wolshon (2004) used CORSIM microscopic traffic simulation to model the freeway contraflow evacuation around New Orleans. With the help of a micro scale traffic simulator, they were able to suggest alternative contraflow configurations at the level of entry and termination points.

Datasets for Contraflow Planning: When emergency managers plan contraflow schemes, the following datasets may be considered. First, population distribution is important to predict congested road segments and to prepare resources accordingly. Figure 5 shows a day-time population distribution in the Twin Cities, Minnesota. The dataset is based on Census 2000

Contraflow in Transportation Network, Fig. 6 Monticello nuclear power plant located around Twin Cities,
Minnesota

and employment Origin-Destination estimate from the Minnesota Department of Employment and Economic Development, 2002.

Second, a scenario dataset needs to be prepared. The scenario may include road network, accident location, affected area, and destination (e.g., evacuation shelter). Figure 6 shows a virtual scenario of a nuclear power plant failure in Monticello, Minnesota. There are twelve cities directly affected by the failure within 10 miles of the facility and one destination shelter. The affected area is highly dynamic in this case because wind direction can change the shape of the affected area. The road network in the scenario is based on Interstate highway (I-94) and major arterial roads. Figure 7 shows a possible contraflow scheme based on the Twin Cities population distribution and Monticello nuclear power plant scenario. In this scheme, the dotted road segments represent suggested contraflow. According to the results of computer simulation, if the suggested road segments are reversed as contraflow, the evacuation time can be reduced by a third.

Key Applications

Evacuation under Emergency: When a contraflow program is executed under an emergency situation, several factors should be taken into account: traffic control, accessibility, merging of lanes, use of roadside facilities, safety, labor requirements, and cost (Wolshon 2001). Among these factors, there is a tradeoff between contraflow and safety because most freeways and arterial roads are originally designed for one-way direction. An easy example would be a driver who cannot see a traffic light if he drives in the opposite direction. Thus, considerable resources (e.g., police officers, barricade trucks, etc.) and cost are required when a contraflow program is implemented. Most coastal states threatened by hurricanes every year prepare contraflow schemes. According to Wolshon (2001), 11 out of 18 coastal states have contraflow plans in place. The application of contraflow for various disaster types is somewhat limited due to the following reason. Table 2 presents various types

Contraflow in Transportation Network, Fig. 7 A possible contraflow scheme for Monticello nuclear power plant scenario

Contraflow in Transportation Network, Table 2 Different types of disasters present different types of evacuation properties (Source: Litman 2006)

Type of disaster        | Geographic scale | Warning   | Contraflow before | Contraflow after
Hurricane               | Very large       | Days      | ✓                 | ✓
Flooding                | Large            | Days      | ✓                 | ✓
Earthquake              | Large            | None      | -                 | ✓
Tsunami                 | Very large       | Short     | -                 | ✓
Radiation/toxic release | Small to large   | Sometimes | -                 | ✓

of disasters and their properties. According to Litman (2006), evacuation route plans should take into account the geographic scale and length of warning. Contraflow preparedness is most appropriate for disasters with large geographic scale and long warning time, which gives responders time to dispatch resources and establish reversed lanes. Thus, hurricane and flooding are the most appropriate candidates to apply contraflow plans before disaster. Other types of disasters with relatively short warning time may consider contraflow only after disaster to resolve traffic congestion for back home traffic.

Automated Reversible Lane System: Washington, D.C. has an automated reversible lane system to address the daily traffic jams during morning and evening peak time (Metropolitan Washington Council of Governments 2004). For example, Interstate Highway 95 operates 28 miles of reversible lanes from 6:00 to 9:00 AM and from 3:30 to 6:00 PM. The reversible lane system has proven to provide substantial savings in travel time. Reversible lanes are also commonly found in tunnels and on bridges. The Golden Gate Bridge in San Francisco has 2 reversible lanes. Figure 8 shows a controlled access of reversible lane system.

Others: Contraflow programs are used for events with high density population such as football games, concerts, and fireworks on the Fourth of July. Highway construction sometimes requires contraflow. Figure 9 shows an example of contraflow use for highway construction.

Future Directions

For emergency professionals, there are many issues to be solved with regard to contraflow. Currently many planners rely on educated guesses with handcrafted maps to plan contraflow. They need computerized contraflow tools to acquire precise quantification (e.g., evacuation time) of

Contraflow in Transportation Network, Fig. 8 Controlled access of automated reversible lane system in Adelaide (Source: wikipedia.org)

Contraflow in Transportation Network, Fig. 9 Use of contraflow for highway construction on I-10 in Arizona (Source: map.google.com)

contraflow networks with geographic and demographic data. Other assessments are also required to plan efficient resource use, appropriate termination of contraflow, road markings, post disaster re-entry, etc. For researchers, the development of efficient and scalable contraflow simulation models and tools is an urgent task. As shown in the Rita evacuation, a large scale evacuation (i.e., three million evacuees) is no longer an unusual event. Efficient tools available in the near future should handle such large scale scenarios with high fidelity traffic models.

Cross-References

▶ Emergency Evacuation, Dynamic Transportation Models
▶ Emergency Evacuations, Transportation Networks

References

Kim S, Shekhar S (2005) Contraflow network reconfiguration for evacuation planning: a summary of results. In: Proceedings of the 13th ACM symposium on advances in geographic information systems, Bremen, pp 250-259
Litman T (2006) Lessons from Katrina and Rita: what major disasters can teach transportation planners. J Trans Eng 132(1):11-18
Metropolitan Washington Council of Governments (2005) 2004 performance of regional high-occupancy vehicle facilities on freeways in the Washington region. Analysis of person and vehicle volumes
Theodoulou G, Wolshon B (2004) Alternative methods to increase the effectiveness of freeway contraflow evacuation. J Trans Res Board Transp Res Rec 1865:48-56
Tuydes H, Ziliaskopoulos A (2004) Network re-design to optimize evacuation contraflow. Technical report 04-4715, Presented at 83rd Annual Meeting of the Transportation Research Board
Wolshon B (2001) One-way-out: contraflow freeway operation for hurricane evacuation. Natl Hazard Rev 2(3):105-112
Wolshon B, Urbina E, Levitan M (2002) National review of hurricane evacuation plans and policies. Technical report, Hurricane Center, Louisiana State University, Baton Rouge

Contraint Relations

▶ Constraint Databases and Data Interpolation

Convergence of GIS and CAD

▶ Computer Environments for GIS and CAD

Converging

▶ Movement Patterns in Spatio-Temporal Data

Co-occurrence

▶ Co-location Pattern Discovery
▶ Co-location Patterns, Algorithms
▶ Statistically Significant Co-location Pattern Mining

Coordinate Systems

▶ Reference Frames

Coregistration

▶ Registration

Co-registration

▶ Registration

Correlated

▶ Patterns, Complex

Correlated Frailty Models

▶ Spatial Survival Analysis

Correlated Walk

▶ CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents

Correlation and Spatial Autocorrelation

Sang-Il Lee
Department of Geography Education, Seoul National University, Seoul, South Korea

Definition

Spatial autocorrelation or spatial dependence can be defined as a particular relationship between the spatial proximity among observational units and the numeric similarity among their values; positive spatial autocorrelation refers to situations in which the nearer the observational units, the more similar their values (and vice versa for
its negative counterpart). The presence of spatial autocorrelation or dependence means that a certain amount of information is shared and duplicated among neighboring locations, and thus, an entire data set possesses a certain amount of redundant information. This feature violates the assumption of independent observations upon which many standard statistical treatments are predicated. This entry revolves around what happens to the nature and statistical significance of correlation coefficients (e.g., Pearson's r) when spatial autocorrelation is present in both or either of the two variables under investigation.

Historical Background

A lack of independence results in reduced degrees of freedom or effective sample size; the greater the level of spatial autocorrelation, the smaller the number of degrees of freedom or effective sample size. This means that any type of statistical test based on an original sample size could be flawed in the presence of spatial autocorrelation, thus heightening the probability of committing a Type I error. Suppose that n different map patterns are generated from n observations. Because the n different map patterns are identical in terms of sample mean and variance, any statistical inferences based on these values are identical. However, all of the map patterns possess different degrees of freedom or effective sample size, and thus n different statistical estimations should be obtained.

This type of problem occurs in situations dealing with the correlation between two variables, which has long been known (Bivand 1980; Griffith 1980; Haining 1980; Richardson and Hémon 1981). The presence of spatial autocorrelation in both or either of two variables under investigation (i.e., bivariate spatial dependence) means that when the nature of a bivariate association at a location is known, one can guess the nature of bivariate associations at nearby locations. For example, if a location has a pair of higher-than-average values for two variables, there is a more-than-random chance to observe similar pairs in nearby locations. This feature again violates the assumption of independent observations and reduces the number of degrees of freedom or effective sample size. In this context, standard inferential tests tend to underestimate the true sampling variance of Pearson's correlation coefficient when positive spatial autocorrelation is present in two variables under investigation, resulting in a heightened chance of committing a Type I error. One can generate n different pairs of spatial patterns from the original variables; all of the pairs are identical in terms of Pearson's correlation coefficient, but they are different in terms of the number of degrees of freedom or effective sample size (Clifford and Richardson 1985; Clifford et al. 1989; Haining 1991; Dutilleul 1993). These notions can extend to situations dealing with a pair of regression residuals (Tiefelsdorf 2001).

Two different approaches exist, addressing the problem of spatial autocorrelation in bivariate correlation. One is to seek to remedy the problem by providing modified hypothesis testing procedures taking the degree of spatial autocorrelation into account (for a comprehensive review and discussion, see Griffith and Paelinck 2011). The other is to develop bivariate spatial autocorrelation statistics to capture the degree of spatial co-patterning between two map patterns and, further, to propose some techniques for exploratory spatial data analysis (ESDA) that allow the detecting of bivariate spatial clusters (among others, Lee 2001; Anselin et al. 2002; Lee 2012).

Scientific Fundamentals

For this section, I seek to conceptualize and illustrate the concept of bivariate spatial dependence with which the problems of correlation in the presence of spatial autocorrelation are better captured and tackled. For simplicity, subsequent discussions about spatial autocorrelation tend to refer to its positive component.

Nearly all studies about spatial autocorrelation focus on univariate cases, i.e., on the similarity/dissimilarity in nearby locations in a single map pattern in terms of their values. However, correlation could be a legitimate statistical
concept endemic to bivariate situations. A correlation coefficient should gauge the nature (direction and magnitude) of the relationship between two variables under investigation. Interestingly, spatial autocorrelation can be viewed as a particular case of correlation, although only a single variable is involved, which is why it is known as autocorrelation. Because any type of correlation should entail two vectors, another vector should be spatially derived for spatial autocorrelation to be a type of correlation. One of the most commonly used concepts for this case is a spatial lag vector, each element of which represents a weighted mean of a location's neighbors. In this sense, spatial autocorrelation could be rephrased as the correlation between one variable and its spatial lag vector (Lee 2001).

But, what kinds of issues can arise when we combine the two concepts, correlation and spatial autocorrelation? This question might be better captured by a rather new concept known as bivariate spatial dependence, which is a simple extension of the general concept of spatial dependence, and can be defined as a particular relationship between the spatial proximity among observational units and the numeric similarity of their bivariate associations (Lee 2001, 2012). In a bivariate situation, each observational unit contains a pair of values, and the nature of the bivariate association is assumed to be conceptually defined and numerically evaluated. If the distribution of bivariate associations is not spatially random, then we might legitimately state that bivariate spatial dependence exists.

Before attempting to illustrate the concept of bivariate spatial dependence, we begin with univariate spatial dependence. Any local set composed of a reference observational unit and its neighbors takes on one of the following four types of univariate spatial association:

\[ H - \tilde{H} \qquad H - \tilde{L} \qquad L - \tilde{H} \qquad L - \tilde{L} \qquad (1) \]

Here, $H$ denotes a value at a reference unit that is greater than or equal to a threshold value (usually the average) or a positive z-score (original values having the mean subtracted and then divided by the standard deviation), and $L$ denotes the opposite. $\tilde{H}$ denotes a spatial lag that is greater than or equal to the global average, and $\tilde{L}$ denotes the opposite. The symbol $-$ denotes a univariate horizontal relationship. The symbol $\sim$ is introduced here to make a clear distinction between an original value at a location and a derived value from a set of locations. This conceptualization is the basis for the Moran's $I$ statistic.

If another concept (i.e., the spatial moving average) is introduced, the situation changes substantively. Unlike the spatial lag, this concept treats the reference unit itself as one of its neighbors. Consider

\[ \tilde{H}^{*} \qquad \tilde{L}^{*} \qquad (2) \]

Here, $\tilde{H}^{*}$ and $\tilde{L}^{*}$ denote the spatial moving averages at each location. This conceptualization forms the foundation for the Getis-Ord's $G_i^{*}$ statistic. The four types of univariate spatial association listed in (1) reduce to the two values in (2); $H - \tilde{H}$ and $L - \tilde{L}$ respectively are linked to $\tilde{H}^{*}$ and $\tilde{L}^{*}$, but $H - \tilde{L}$ and $L - \tilde{H}$ can point either way, depending on the differences in values and/or spatial weights. These two values can be conceptualized as two different types of univariate spatial clusters (Lee and Cho 2013). This distinction between spatial association types and spatial cluster types is critical because it can represent the two contrasting perspectives of spatial modeling and spatial exploration. This distinction plays a pivotal role in addressing various issues about multivariate spatial dependence, a particular case of which is bivariate spatial dependence.

We now move to bivariate situations in which two variables, denoted by $X$ and $Y$, are under investigation. Each observational unit should take on one of the following four types of bivariate association (Lee 2012):

\[ \begin{array}{cccc} H & H & L & L \\ | & | & | & | \\ H & L & H & L \end{array} \qquad (3) \]

In this work, the symbol $|$ denotes a bivariate vertical relationship at a location. Pearson's correlation coefficient is predicated upon this conceptualization and is aspatial in nature in the
sense that it does not consider the spatial distribution of the pair-wise local bivariate associations. Suppose that a location has only one neighbor at which the four different types of bivariate association are possible, resulting in the following 16 different types of bivariate spatial association:

\[
\newcommand{\bsa}[4]{\begin{smallmatrix} #1 & - & #2 \\ | & & | \\ #3 & - & #4 \end{smallmatrix}}
\begin{array}{cccc}
\bsa{H}{H}{H}{H} & \bsa{H}{H}{H}{L} & \bsa{H}{L}{H}{H} & \bsa{H}{L}{H}{L} \\[8pt]
\bsa{H}{H}{L}{H} & \bsa{H}{H}{L}{L} & \bsa{H}{L}{L}{H} & \bsa{H}{L}{L}{L} \\[8pt]
\bsa{L}{H}{H}{H} & \bsa{L}{H}{H}{L} & \bsa{L}{L}{H}{H} & \bsa{L}{L}{H}{L} \\[8pt]
\bsa{L}{H}{L}{H} & \bsa{L}{H}{L}{L} & \bsa{L}{L}{L}{H} & \bsa{L}{L}{L}{L}
\end{array} \qquad (4)
\]

In each element of (4), the left column is the pair of values (X above Y) at the location and the right column is the pair at its neighbor. This association is both bivariate and spatial because two pairs (bivariate) in adjacent locations (spatial) are compared. The four main diagonal elements clearly show positive bivariate spatial dependence because exactly the same types of pairs of bivariate association are connected. In contrast, the four anti-diagonal elements can be viewed as examples of negative bivariate spatial dependence because rather different types of bivariate association are placed next to each other.

We next consider additional neighbors. If one more neighbor is added, then we have $4^3 = 64$ different types of bivariate spatial association for each local set. The situation becomes more complicated, although a decent chance still exists for observational units to show perfect and positive bivariate spatial dependence. Because areal units in the real world (i.e., administrative units, school districts, and other types of functional regions) have been reported to have approximately six contiguous neighbors on average, we consider $4^7 = 16{,}384$ different types of bivariate spatial associations at each location, and little chance exists of identifying those showing typical positive bivariate spatial dependence.

We might be able to simplify the situation by applying the notion of spatial lag as seen in (1). Because each variable has four different types of univariate spatial association at a location, we always have only 16 different types of bivariate spatial association (Lee 2012; Lee and Cho 2013), no matter how many neighbors are involved. Consider

\[
\newcommand{\bsl}[4]{\begin{smallmatrix} #1 & - & \tilde{#2} \\ | & & | \\ #3 & - & \tilde{#4} \end{smallmatrix}}
\begin{array}{cccc}
\bsl{H}{H}{H}{H} & \bsl{H}{H}{H}{L} & \bsl{H}{H}{L}{H} & \bsl{H}{H}{L}{L} \\[8pt]
\bsl{H}{L}{H}{H} & \bsl{H}{L}{H}{L} & \bsl{H}{L}{L}{H} & \bsl{H}{L}{L}{L} \\[8pt]
\bsl{L}{H}{H}{H} & \bsl{L}{H}{H}{L} & \bsl{L}{H}{L}{H} & \bsl{L}{H}{L}{L} \\[8pt]
\bsl{L}{L}{H}{H} & \bsl{L}{L}{H}{L} & \bsl{L}{L}{L}{H} & \bsl{L}{L}{L}{L}
\end{array} \qquad (5)
\]

In each element of (5), the upper row is the univariate spatial association of X and the lower row is that of Y. Each observational unit is assigned to one of these 16 types in terms of local bivariate spatial dependence. Certain interesting findings are drawn from this illustration. First, the four cases of perfect positive spatial dependence are observed in the four corners, and their four negative counterparts are observed in the middle. Second, the four cases in the main diagonal are more closely associated with a positive aspatial correlation, measured by Pearson's correlation coefficient, and the four cases in the anti-diagonal are more strongly associated with a negative aspatial correlation. This notion is not confined to Pearson's r, but can extend to other linear correlation coefficients (see Griffith and Amrhein 1991).

By combining these two aspects, we can make certain general statements. First, with a decent level of positive Pearson's r, the main diagonal
cases are expected to be more observable than the anti-diagonal cases. If the first and last cases prevail for a local set, a positive bivariate spatial dependence can be said to exist; if the second and third cases prevail, a negative bivariate spatial dependence can be said to exist. Second, with a decent level of negative Pearson's r, the anti-diagonal cases are expected to be more observable than the main diagonal counterparts. If the first and last cases prevail for a local set, a positive bivariate spatial dependence can be said to exist; if the second and third cases prevail, a negative bivariate spatial dependence can be said to exist. In an overall sense, if no bivariate spatial autocorrelation exists, the 16 different types of bivariate spatial association (occurrences of which are subordinate to the nature of the global aspatial correlation) must be randomly distributed; otherwise, they should show a certain degree of spatial clustering.

These situations are further simplified by incorporating the notion of spatial moving average. Because the four different types of univariate spatial association defined in (1) reduce to the two different values seen in (2), the 16 different types of bivariate spatial association defined in (5) can reduce to the following four:

\[ \begin{array}{cccc} \tilde{H}^{*} & \tilde{H}^{*} & \tilde{L}^{*} & \tilde{L}^{*} \\ | & | & | & | \\ \tilde{H}^{*} & \tilde{L}^{*} & \tilde{H}^{*} & \tilde{L}^{*} \end{array} \qquad (6) \]

These classifications can be referred to as four different types of bivariate spatial clusters (Lee and Cho 2013). The cases in the four corners in (5) represent typical examples of the four types; the others are classified into one of the four cases, depending on differences in values and/or spatial weights.

Key Applications

For this section, I focus on two strands of endeavors that have been undertaken in this particular field: one is to develop a means to remedy the problem of correlation in the presence of bivariate spatial dependence; the other is to devise bivariate spatial autocorrelation statistics for the bivariate counterparts of Moran's $I$ and Getis-Ord's $G_i^{*}$ statistics.

The test statistic for Pearson's r is given by

\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} \qquad (7) \]

with $n - 2$ degrees of freedom when the following two assumptions are satisfied: pairs of observations are drawn from the same, approximately bivariate normal, distribution with constant expectation and finite variance (Haining 1991) and observations of each variable are mutually independent. This standard hypothesis testing procedure for the correlation coefficient might not hold for spatial data. The first assumption of a constant mean structure cannot be assumed because of the potential presence of a global trend. More importantly, the second assumption cannot be sustained because of the usual presence of univariate spatial autocorrelation for both or either of the variables under investigation, which alludes to bivariate spatial dependence.

The standard error of Pearson's r, which is also a part of (7), is given by

\[ \hat{\sigma}_{r} = \sqrt{\frac{1-r^{2}}{n-2}}, \qquad (8) \]

where the denominator is associated with the number of degrees of freedom. This standard error should be adjusted according to the degree of spatial autocorrelation in the variables; it should be larger when positive spatial autocorrelation prevails (and vice versa for negative spatial autocorrelation) (Haining 1991). This outcome can be shown in (8); the lack of independence among pairs of observations due to positive bivariate spatial dependence reduces the number of degrees of freedom or effective sample size, thus making the standard error larger.

Several approaches have been proposed in order to remedy or at least alleviate the problem of underestimation of the true sampling variance that the standard inferential test commits (Clifford and Richardson 1985; Dutilleul 1993). In this entry, we focus solely
on the Clifford-Richardson solution (for a more comprehensive treatment, see Griffith and Paelinck 2011). They redefine the equation for the standard error by replacing $n$ in (8) with $n'$, their effective sample size, which arguably refers to the number of equivalent, independent samples:

\[ \hat{\sigma}_{r} = \sqrt{\frac{1-r^{2}}{n'-2}}. \qquad (9) \]

They also provide the equation for computing the effective sample size as

\[ n' = 1 + n^{2}\left[\operatorname{trace}\left(\hat{R}_{X}\hat{R}_{Y}\right)\right]^{-1}, \qquad (10) \]

where $\hat{R}_{X}$ and $\hat{R}_{Y}$ are the estimated $n \times n$ spatial autocorrelation matrices for the two variables and the trace is a matrix operation which is the sum of the diagonal elements. Because each diagonal element of matrix $\hat{R}_{X}\hat{R}_{Y}$ can be seen as the relative degree of spatial autocorrelation at each location (1 for no spatial autocorrelation, more than 1 for positive spatial autocorrelation), $\operatorname{trace}(\hat{R}_{X}\hat{R}_{Y})$ captures the overall degree of bivariate spatial dependence. If no spatial autocorrelation is present for either of the two variables across all locations, each diagonal element of matrix $\hat{R}_{X}\hat{R}_{Y}$ is 1, $\operatorname{trace}(\hat{R}_{X}\hat{R}_{Y}) = n$, and thus $n' \approx n$ (Haining 1991). If a positive bivariate spatial dependence prevails, $n'$ is less than $n$, resulting in a reduced effective sample size or a lesser number of degrees of freedom.

Suppose, for example, that we have 50 pairs of observations and a Pearson's r of 0.3. The test statistic and the number of degrees of freedom according to the standard hypothesis testing method as shown in (7) are, respectively, 2.179 and 48, which implies that r is statistically significant ($p = 0.0343$). If we have a positive bivariate spatial autocorrelation of 2.0 on average across locations, then we have $t = 1.541$ with the effective sample size of 26 ($1 + 50^{2}/100$) according to (10), which is not statistically significant ($p = 0.1365$). The Clifford-Richardson solution is implemented in an R package named SpatialPack (Vallejos et al. 2013).

Any bivariate spatial autocorrelation statistic should capture the degree of spatial co-patterning by measuring both pair-wise covariance and spatial clustering (Lee 2001). One of the most important considerations in determining how to measure bivariate spatial dependence might be the fact that both Pearson's $r$ and Moran's $I$ are cross-product statistics (Getis 1991), which take the form of an average of the sum of products of two vectors. Pearson's $r$ is defined as an average of the cross-product of two standardized vectors, $z_X$ and $z_Y$; similarly, Moran's $I$ can be defined as an average of the cross-product of two standardized vectors, $z_X$ and $\tilde{z}_X$ (a standardized spatial lag vector), when a spatial weights matrix is row standardized (Lee 2001):

\[ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^{2}}\,\sqrt{\sum_i (y_i - \bar{y})^{2}}} = \frac{1}{n}\sum_i z_{X_i} z_{Y_i}; \text{ and} \qquad (11) \]

\[ I = \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^{2}} = \frac{\sum_i (x_i - \bar{x}) \sum_j w_{ij}(x_j - \bar{x})}{\sqrt{\sum_i (x_i - \bar{x})^{2}}\,\sqrt{\sum_i (x_i - \bar{x})^{2}}} = \frac{1}{n}\sum_i z_{X_i} \tilde{z}_{X_i}. \qquad (12) \]

Predicated upon all of the discussions about univariate statistics for spatial autocorrelation, Lee (2012) identifies six vectors that may play roles in defining bivariate spatial autocorrelation statistics conforming to the general form of cross-product statistic: $z_X$, $\tilde{z}_X$, and $\tilde{z}^{*}_X$ (a standardized spatial moving average vector) for the $X$ variable and $z_Y$, $\tilde{z}_Y$, and $\tilde{z}^{*}_Y$ for the $Y$ variable. Using these two sets of vectors, one can obtain various types of bivariate spatial autocorrelation statistics. In this entry, only the following two are discussed (i.e., the cross-Moran or bivariate Moran statistic denoted by $CM$ and Lee's $L$ statistic):

\[ CM = \frac{n}{\sum_i \sum_j w_{ij}} \cdot \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(y_j - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^{2}}\,\sqrt{\sum_i (y_i - \bar{y})^{2}}} = \frac{1}{n}\sum_i z_{X_i} \tilde{z}_{Y_i}; \text{ and} \]

\[ L = \frac{n}{\sum_i \left(\sum_j w^{*}_{ij}\right)^{2}} \cdot \frac{\sum_i \left[\left(\sum_j w^{*}_{ij}(x_j - \bar{x})\right)\left(\sum_j w^{*}_{ij}(y_j - \bar{y})\right)\right]}{\sqrt{\sum_i (x_i - \bar{x})^{2}}\,\sqrt{\sum_i (y_i - \bar{y})^{2}}} = \frac{1}{n}\sum_i \tilde{z}^{*}_{X_i} \tilde{z}^{*}_{Y_i}. \qquad (13) \]

Here, $w_{ij}$ and $w^{*}_{ij}$ are elements from a zero diagonal and nonzero diagonal spatial weights matrix, respectively. The former statistic is one derived from a multivariate spatial correlation matrix proposed by Wartenberg (1985) and is a simple extension of univariate Moran's $I$, thus gauging the correlation between one variable at original locations and the other variable at the neighboring locations (a spatial lag vector). In contrast, the latter, which was proposed by Lee (2001, 2004, 2009), is defined as the correlation between one variable and the other variable's spatial moving average vectors. In comparison, cross-Moran is more congruent with the concept of cross-correlation, whereas Lee's $L$ deals more directly with the concept of co-patterning by considering not only bivariate association at the original locations but also their spatial association with neighboring locations.

In examining the different advantages and weaknesses, one can conclude that the bivariate Moran's statistic is more congruent with the spatial modeling perspective, whereas Lee's statistic is more strongly associated with the spatial exploration perspective. For example, many situations might exist in which one should postulate that a dependent variable at a given set of locations is influenced by independent variables in the neighboring locations. However, if the main interest lies in measuring the spatial similarity between the two map patterns, and exploring and detecting possible bivariate spatial clusters, $L$ might be the better option. In addition, $L$ is much more congruent with what is documented in (6). The higher the Pearson's aspatial correlation coefficient, and at the same time the higher the level of spatial clustering of bivariate association, the higher the $L$ statistic. Certain exploratory spatial data analysis (ESDA) techniques using Lee's local $L_i$ (see Eq. 14) can be developed like ones using cross-Moran (Anselin et al. 2002), which is beyond the scope of this entry (see Lee 2012; Lee and Cho 2013):

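$$
L_i = \frac{n^2}{\sum_i \left(\sum_j w^{*}_{ij}\right)^2} \cdot \frac{\left(\sum_j w^{*}_{ij}\,(x_j - \bar{x})\right)\left(\sum_j w^{*}_{ij}\,(y_j - \bar{y})\right)}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} = \tilde{z}^{*}_{X_i}\,\tilde{z}^{*}_{Y_i} \tag{14}
$$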
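The two global statistics in (13) and the local statistic in (14) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not code from the cited papers or from any package: the function names, the five-location toy data, and the choice of row-standardized weights on a simple chain of neighbors are all assumptions made for the example. `w` stands for the zero-diagonal matrix used by the cross-Moran, and `w_star` for the nonzero-diagonal matrix used by Lee's statistics.

```python
import numpy as np


def _deviations(x, y):
    """Mean-centered x and y plus the denominator shared by Eqs. 13 and 14."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    denom = np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum())
    return dx, dy, denom


def cross_moran(x, y, w):
    """Cross-Moran CM: x at each location against y at its neighbors."""
    w = np.asarray(w, dtype=float)
    dx, dy, denom = _deviations(x, y)
    # dx @ w @ dy is the double sum over i and j of w_ij * dx_i * dy_j.
    return (len(dx) / w.sum()) * (dx @ w @ dy) / denom


def lee_local(x, y, w_star):
    """Local L_i (Eq. 14): one value per location."""
    w_star = np.asarray(w_star, dtype=float)
    dx, dy, denom = _deviations(x, y)
    n = len(dx)
    scale = n ** 2 / (w_star.sum(axis=1) ** 2).sum()
    # w_star @ dx is the weighted sum of deviations around each location.
    return scale * (w_star @ dx) * (w_star @ dy) / denom


def lee_global(x, y, w_star):
    """Global L (Eq. 13), which is the mean of the local L_i values."""
    return lee_local(x, y, w_star).mean()


# Hypothetical toy data: five locations arranged along a line.
x = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
y = np.array([2.0, 1.0, 5.0, 3.0, 4.0])

# Chain adjacency, row-standardized; the diagonal is zero for CM.
adj = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
w = adj / adj.sum(axis=1, keepdims=True)

# Same neighbors plus the location itself (nonzero diagonal) for L.
adj_star = adj + np.eye(5)
w_star = adj_star / adj_star.sum(axis=1, keepdims=True)

print("CM  =", cross_moran(x, y, w))
print("L   =", lee_global(x, y, w_star))
print("L_i =", lee_local(x, y, w_star))
```

One quick sanity check on the sketch: when `w_star` is the identity matrix, no smoothing takes place and `lee_global` reduces to Pearson's aspatial correlation coefficient.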

The distributional properties for all bivariate spatial autocorrelation statistics have been established with the randomization assumption (Lee 2004, 2009), which might be crucial to develop certain kinds of ESDA techniques, such as bivariate cluster maps.

Future Directions

Bivariate spatial dependence points to situations in which nearby observational units carry shared information in terms of bivariate association, thus violating the assumption of independent sampling, and the shared information spuriously strengthens (or weakens) the nature of correlation between two variables under investigation, making any conventional statistical inferences or judgments considerably questionable.

The notion and procedure of correlation coefficient decomposition based on the eigenvector spatial filtering (ESF) technique (Griffith and Paelinck 2011; Chun and Griffith 2013) provides
an invaluable insight into our understanding of correlation with spatial autocorrelation. It allows an aspatial correlation coefficient to be decomposed into five sub-correlations between spatially filtered variables, common spatial autocorrelation components, unique spatial autocorrelation components, one's spatially filtered variable and the other's unique spatial autocorrelation component, and one's unique spatial autocorrelation component and the other's spatially filtered variable.

Bivariate spatial dependence or autocorrelation is a special case of multivariate spatial dependence or autocorrelation (Wartenberg 1985). For example, trivariate spatial dependence is simply defined as a particular relationship between the spatial proximity among observational units and the numeric similarity of their trivariate associations. Thus, we have $4^3 = 64$ different types of trivariate spatial association, similar to (5), and $2^3 = 8$ different types of trivariate spatial clusters, similar to (6).

Because each pair of variables in a multivariate data set can be viewed as a building block for statistical treatments, the notion of bivariate spatial dependence should have certain implications in spatializing any form of multivariate statistical techniques, e.g., spatial principal components analysis (e.g., Griffith 1988; Dray et al. 2008; Lee and Cho 2014; Lee 2015) and spatial canonical correlation analysis.

Cross-References

▶ Spatial Autocorrelation and Spatial Interaction
▶ Spatial Autocorrelation Measures
▶ Spatial Filtering
▶ Spatial Statistics and Geostatistics: Basic Concepts

References

Anselin L, Syabri I, Smirnov O (2002) Visualizing multivariate spatial correlation with dynamically linked windows. In: Anselin L, Rey S (eds) New tools for spatial data analysis: proceedings of the specialist meeting, Center for Spatially Integrated Social Science (CSISS), University of California, Santa Barbara
Bivand R (1980) A Monte Carlo study of correlation coefficient estimation with spatially autocorrelated observations. Quaest Geogr 6:5–10
Clifford P, Richardson S (1985) Testing the association between two spatial processes. Stat Decis Suppl Issue 2:155–160
Clifford P, Richardson S, Hémon D (1989) Assessing the significance of the correlation between two spatial processes. Biometrics 45:123–134
Chun Y, Griffith DA (2013) Spatial statistics & geostatistics: theory and applications for geographic information science & technology. Sage, Los Angeles
Dray S, Sonia S, François D (2008) Spatial ordination of vegetation data using a generalization of Wartenberg's multivariate spatial correlation. J Veg Sci 19:45–56
Dutilleul P (1993) Modifying the t test for assessing the correlation between two spatial processes. Biometrics 49:305–314
Getis A (1991) Spatial interaction and spatial autocorrelation: a cross-product approach. Environ Plan A 23:1269–1277
Griffith DA (1980) Towards a theory of spatial statistics. Geogr Anal 12:325–339
Griffith DA (1988) Advanced spatial statistics: special topics in the exploration of quantitative spatial data series. Kluwer, Dordrecht
Griffith DA, Amrhein CG (1991) Statistical analysis for geographers. Prentice-Hall, Englewood Cliffs
Griffith DA, Paelinck JH (2011) Non-standard spatial statistics and spatial econometrics. Springer, New York
Haining RP (1980) Spatial autocorrelation problems. In: Herbert DT, Johnston RJ (eds) Geography and the urban environment, vol 3. Wiley, New York, pp 1–44
Haining RP (1991) Bivariate correlation with spatial data. Geogr Anal 23:210–227
Lee S-I (2001) Developing a bivariate spatial association measure: an integration of Pearson's r and Moran's I. J Geogr Syst 3:369–385
Lee S-I (2004) A generalized significance testing method for global measures of spatial association: an extension of the Mantel test. Environ Plan A 36:1687–1703
Lee S-I (2009) A generalized randomization approach to local measures of spatial association. Geogr Anal 41:221–248
Lee S-I (2012) Exploring bivariate spatial dependence and heterogeneity: a comparison of bivariate measures of spatial association. Paper presented at the annual meeting of the association of American geographers, New York, 24–28 Feb
Lee S-I (2015) Some elaborations on spatial principal components analysis. Paper presented at the annual meeting of the association of American geographers, Chicago, 21–25 Apr
Lee S-I, Cho D (2013) Delineating the bivariate spatial clusters: a bivariate AMOEBA technique. Paper presented at the annual meeting of the association of American geographers, Los Angeles, 9–13 Apr
Lee S-I, Cho D (2014) Developing a spatial principal components analysis. Paper presented at the annual meeting of the association of American geographers, Tampa, 8–12 Apr
Richardson S, Hémon D (1981) On the variance of the sample correlation between two independent lattice processes. J Appl Probab 18:943–948
Tiefelsdorf M (2001) Specification and distributional properties of the spatial cross-correlation coefficient $C_{\varepsilon_1,\varepsilon_2}$. Paper presented at the Western Regional Science Conference, Palm Springs, 26 Feb
Vallejos R, Osorio F, Cuevas F (2013) SpatialPack: an R package for computing spatial association between two stochastic processes defined on the plane. Available via DIALOG. http://rvallejos.mat.utfsm.cl/Time%20Series%20I%202013/paper3.pdf. Accessed 12 Feb 2016
Wartenberg D (1985) Multivariate spatial correlation: a method for exploratory geographical analysis. Geogr Anal 17:263–283


Correlation Queries

▶ Correlation Queries in Spatial Time Series Data


Correlation Queries in Spatial Time Series Data

Pusheng Zhang
Microsoft Corporation, Redmond, WA, USA

Synonyms

Correlation Queries; Spatial Cone Tree; Spatial Time Series

Definition

A spatial framework consists of a collection of locations and a neighbor relationship. A time series is a sequence of observations taken sequentially in time. A spatial time series dataset is a collection of time series, each referencing a location in a common spatial framework. For example, the collection of global daily temperature measurements for the last 10 years is a spatial time series dataset over a degree-by-degree latitude-longitude grid spatial framework on the surface of the Earth.

Correlation queries are the queries used for finding collections, e.g., pairs, of highly correlated time series in spatial time series data, which may reveal potential interactions and patterns. A strongly correlated pair of time series indicates potential movement in one series when the other time series moves.

Historical Background

The massive amounts of data generated by advanced data collecting tools, such as satellites, sensors, mobile devices, and medical instruments, offer an unprecedented opportunity for researchers to discover these potential nuggets of valuable information. However, correlation queries are computationally expensive due to large spatio-temporal frameworks containing many locations and long time sequences. Therefore, the development of efficient query processing techniques is crucial for exploring these datasets.

Previous work on query processing for time series data has focused on dimensionality reduction followed by the use of low dimensional indexing techniques in the transformed space. Unfortunately, the efficiency of these approaches deteriorates substantially when a small set of dimensions cannot represent enough information in the time series data. Many spatial time series datasets fall in this category. For example, finding anomalies is more desirable than finding well-known seasonal patterns in many applications. Therefore, the data used in anomaly detection is usually data whose seasonality has been removed. However, after transformations (e.g., Fourier transformation) are applied to deseasonalize the data, the power spectrum spreads out over almost all dimensions. Furthermore, in most spatial time series datasets, the number of spatial locations is much greater than the length of the time series. This makes it possible to improve the performance of query processing of spatial time series data by exploiting spatial proximity in the design of access methods.

In this chapter, the spatial cone tree, a spatial data structure for spatial time series data, is discussed to illustrate how correlation queries are efficiently supported. The spatial cone tree groups similar time series together based on spatial proximity, and correlation queries are facilitated using spatial cone trees. This approach is orthogonal to dimensionality reduction solutions. The spatial cone tree preserves the full length of time series, and therefore it is insensitive to the distribution of the power spectrum after data transformations.

Scientific Fundamentals

Let $x = \langle x_1, x_2, \ldots, x_m \rangle$ and $y = \langle y_1, y_2, \ldots, y_m \rangle$ be two time series of length $m$. The correlation coefficient of the two time series is defined as:

$$
\mathrm{corr}(x, y) = \frac{1}{m-1} \sum_{i=1}^{m} \left(\frac{x_i - \bar{x}}{\sigma_x}\right)\left(\frac{y_i - \bar{y}}{\sigma_y}\right) = \hat{x} \cdot \hat{y},
$$

where

$$
\bar{x} = \frac{\sum_{i=1}^{m} x_i}{m}, \quad
\sigma_x = \sqrt{\frac{\sum_{i=1}^{m} (x_i - \bar{x})^2}{m-1}}, \quad
\bar{y} = \frac{\sum_{i=1}^{m} y_i}{m}, \quad
\sigma_y = \sqrt{\frac{\sum_{i=1}^{m} (y_i - \bar{y})^2}{m-1}},
$$

$$
\hat{x}_i = \frac{1}{\sqrt{m-1}} \cdot \frac{x_i - \bar{x}}{\sigma_x}, \quad
\hat{y}_i = \frac{1}{\sqrt{m-1}} \cdot \frac{y_i - \bar{y}}{\sigma_y},
$$

$\hat{x} = \langle \hat{x}_1, \hat{x}_2, \ldots, \hat{x}_m \rangle$, and $\hat{y} = \langle \hat{y}_1, \hat{y}_2, \ldots, \hat{y}_m \rangle$.

Because the sum of the $\hat{x}_i^2$ is equal to 1:

$$
\sum_{i=1}^{m} \hat{x}_i^2 = \sum_{i=1}^{m} \left( \frac{1}{\sqrt{m-1}} \cdot \frac{x_i - \bar{x}}{\sqrt{\frac{\sum_{i=1}^{m} (x_i - \bar{x})^2}{m-1}}} \right)^2 = 1,
$$

$\hat{x}$ is located on a multi-dimensional unit sphere. Similarly, $\hat{y}$ is also located on a multi-dimensional unit sphere. Based on the definition of $\mathrm{corr}(x, y)$, $\mathrm{corr}(x, y) = \hat{x} \cdot \hat{y} = \cos(\angle(\hat{x}, \hat{y}))$. The correlation of two time series is directly related to the angle between the two time series in the multi-dimensional unit sphere. Finding pairs of time series with an absolute value of correlation above the user given minimal correlation threshold $\theta$ is equivalent to finding pairs of time series $\hat{x}$ and $\hat{y}$ on the unit multi-dimensional sphere with an angle in the range of $(0, \arccos(\theta))$ or $(180^\circ - \arccos(\theta), 180^\circ)$.

A cone is a set of time series in a multi-dimensional unit sphere and is characterized by two parameters, the center and the span of the cone. The center of the cone is the mean of all the time series in the cone. The span of the cone is the maximal angle between any time series in the cone and the cone center.

A spatial cone tree is a spatial data structure for correlation queries on spatial time series data. The spatial cone tree uses a tree data structure, and it is formed of nodes. Each node in the spatial cone tree, except for the root, has one parent node and several (zero or more) child nodes. The root node has no parent. A node that does not have any child node is called a leaf node and a non-leaf node is called an internal node.

A leaf node contains a cone and a data pointer $p_d$ to a disk page containing data entries, and is of the form $\langle (\text{cone.span}, \text{cone.center}), p_d \rangle$. The cone contains one or multiple normalized time series, which are contained in the disk page referred to by the pointer $p_d$. The cone.span and cone.center are the characteristic parameters for the cone. The data pointer is a block address. An internal node contains a cone and a pointer $p_i$ to an index page containing the pointers to children nodes, and is of the form $\langle (\text{cone.span}, \text{cone.center}), p_i \rangle$. The cone.span and cone.center are the characteristic parameters for the cone, which contains all normalized time series in the subtree rooted at this internal node. Multiple nodes are organized in a disk page, and the number of nodes per disk page is defined as the blocking factor for a spatial cone tree. Notice that the blocking factor depends on the sizes of cone span, cone center, and data pointer.

Given a minimal correlation threshold $\theta$ ($0 < \theta < 1$), the possible relationships between a cone $C$ and the query time series, $T_q$, consist of all-true, all-false, or some-true. All-true means that every time series in the cone has a correlation with the query over the correlation threshold; all-false means that every time series in the cone has a correlation less than the correlation threshold; some-true means that only part of the time series have a correlation over the correlation threshold. The upper bound and lower bound of the angles between the query time series and a cone, denoted $\gamma_{max}$ and $\gamma_{min}$, are illustrated in Fig. 1a. Let $T$ be any normalized time series in the cone $C$ and let $\angle(\vec{T_q}, \vec{T})$ denote the angle between the query time series vector $\vec{T_q}$ and the time series vector $\vec{T}$ in the multi-dimensional sphere. The following properties are satisfied:
[Figure 1, two panels: (a) the query time series and a cone, with the lower bound and upper bound of the angle between them; (b) the all-true, some-true, and all-false regions of the angle, delimited by $0$, $\arccos(\theta)$, and $180^\circ - \arccos(\theta)$, where $\theta$ is the correlation threshold]

Correlation Queries in Spatial Time Series Data, Fig. 1 (a) Upper bound and lower bound, (b) properties of spatial cone tree

1. If $\gamma_{max} \in (0, \arccos(\theta))$, then $\angle(\vec{T_q}, \vec{T}) \in (0, \arccos(\theta))$;
2. If $\gamma_{min} \in (180^\circ - \arccos(\theta), 180^\circ)$, then $\angle(\vec{T_q}, \vec{T}) \in (180^\circ - \arccos(\theta), 180^\circ)$;
3. If $\gamma_{min} \in (\arccos(\theta), 180^\circ)$ and $\gamma_{max} \in (\gamma_{min}, 180^\circ - \arccos(\theta))$, then $\angle(\vec{T_q}, \vec{T}) \in (\arccos(\theta), 180^\circ - \arccos(\theta))$.

If either of the first two conditions is satisfied, the cone $C$ is called an all-true cone (all-true lemma). If the third condition is satisfied, the cone $C$ is called an all-false cone (all-false lemma). If none of the conditions is satisfied, the cone $C$ is called a some-true cone (some-true lemma). These lemmas are developed to eliminate cones in which all time series satisfy, or all fail to satisfy, the correlation threshold in query processing.

The key idea of query processing is to process a correlation query in a filter-and-refine style on the cone level, instead of on the individual time series level. The filtering step traverses the spatial cone tree, applying the all-true and all-false lemmas on the cones. Therefore, the cones satisfying all-true or all-false conditions are filtered out. The cones satisfying some-true are traversed recursively until all-true or all-false is satisfied or a leaf cone is reached. The refinement step exhaustively checks the some-true leaf cones.

Key Applications

The explosive growth of spatial data and widespread use of spatial databases emphasize the need for the automated discovery of spatial knowledge. The complexity of spatial data and intrinsic spatial relationships limits the usefulness of conventional data mining techniques for extracting spatial patterns. Efficient tools for extracting information from geo-spatial data are crucial to organizations which make decisions based on large spatial datasets, including the National Aeronautics and Space Administration (NASA), the National Geospatial-Intelligence Agency (NGA), the National Cancer Institute (NCI), and the United States Department of Transportation (USDOT). These organizations are spread across many application domains including Earth science, ecology and environmental management, public safety, transportation, epidemiology, and climatology. An application of correlation queries in Earth science is described in detail below.
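Before turning to that application, the filter-and-refine procedure from the Scientific Fundamentals section can be made concrete with a small in-memory sketch. It is not the implementation from the cited papers. The names `ConeNode`, `make_cone`, `classify_cone`, and `correlation_query` are hypothetical, leaf nodes hold their series in a Python list instead of a disk page, the cone center is taken as the mean of the member vectors rescaled to unit length, and the angle bounds are obtained from the triangle inequality on the sphere (angle to center minus or plus the span). Angles are in radians, so $180^\circ$ becomes $\pi$.

```python
from dataclasses import dataclass, field

import numpy as np


def normalize(ts):
    """Map a (non-constant) series to the unit sphere: corr(x, y) = x_hat . y_hat."""
    ts = np.asarray(ts, dtype=float)
    centered = ts - ts.mean()
    return centered / np.linalg.norm(centered)


def angle(u, v):
    """Angle in radians between two unit vectors."""
    return float(np.arccos(np.clip(u @ v, -1.0, 1.0)))


@dataclass
class ConeNode:
    center: np.ndarray                              # unit vector (assumed)
    span: float                                     # max angle from center to a member
    children: list = field(default_factory=list)    # non-empty for internal nodes
    series: list = field(default_factory=list)      # leaf data, stands in for a disk page


def make_cone(series, children=()):
    """Build a cone over the given series; pass children to make an internal node."""
    members = [normalize(s) for s in series]
    center = np.mean(members, axis=0)
    center = center / np.linalg.norm(center)
    span = max(angle(center, m) for m in members)
    return ConeNode(center, span, list(children), [] if children else members)


def classify_cone(q_hat, cone, theta):
    """Apply the all-true / all-false tests to one cone."""
    delta = angle(q_hat, cone.center)
    g_min = max(0.0, delta - cone.span)     # lower bound on the angle to any member
    g_max = min(np.pi, delta + cone.span)   # upper bound on the angle to any member
    a = np.arccos(theta)
    if g_max < a or g_min > np.pi - a:
        return "all-true"
    if g_min > a and g_max < np.pi - a:
        return "all-false"
    return "some-true"


def collect(node):
    """All normalized series stored in the subtree rooted at node."""
    if not node.children:
        return list(node.series)
    return [s for child in node.children for s in collect(child)]


def correlation_query(node, q_hat, theta):
    """Return the normalized series whose |correlation| with the query exceeds theta."""
    relation = classify_cone(q_hat, node, theta)
    if relation == "all-false":
        return []                           # filtered out without touching the data
    if relation == "all-true":
        return collect(node)                # accepted without per-series checks
    if node.children:                       # some-true internal node: recurse
        return [s for c in node.children for s in correlation_query(c, q_hat, theta)]
    # some-true leaf: refinement step, check every series individually
    return [s for s in node.series if abs(float(q_hat @ s)) > theta]
```

A two-level tree can be assembled with `make_cone(all_series, children=[make_cone(group_a), make_cone(group_b)])`. Grouping series from neighboring locations into the same leaf is what keeps the spans small in practice, and small spans are what allow whole cones to be accepted or rejected in the filtering step.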
NASA Earth observation systems currently generate a large sequence of global snapshots of the Earth, including various atmospheric, land, and ocean measurements such as sea surface temperature (SST), pressure, and precipitation. These data are spatial time series data in nature. The climate of the Earth's land surface is strongly influenced by the behavior of the oceans. Simultaneous variations in climate and related processes over widely separated points on the Earth are called teleconnections. For instance, every three to seven years, an El Niño event, i.e., the anomalous warming of the eastern tropical region of the Pacific Ocean, may last for months, having significant economic and atmospheric consequences worldwide. El Niño has been linked to climate phenomena such as droughts in Australia and heavy rainfall along the eastern coast of South America. To investigate such land-sea teleconnections, time series correlation queries across the land and ocean are often used to reveal the relationships among the measurements.

Future Directions

In this chapter, the spatial cone tree on spatial time series data was discussed, and how correlation queries can be efficiently supported using the spatial cone tree was illustrated. In future work, more design issues on the spatial cone tree should be further investigated, e.g., the blocking factor and balancing of the tree. The spatial cone tree should be investigated to support complex correlation relationships, such as time lagged correlation. The generalization of spatial cone trees to non-spatial index structures using spherical k-means to construct cone trees is also an interesting research topic.

Recommended Reading

Agrawal R, Faloutsos C, Swami A (1993) Efficient similarity search in sequence databases. In: Lecture notes in computer science, vol 730. Berlin/Heidelberg
Box G, Jenkins G, Reinsel G (1994) Time series analysis: forecasting and control. Prentice Hall, Upper Saddle River
Dhillon I, Fan J, Guan Y (2001) Efficient clustering of very large document collections. In: Grossman R, Kamath C, Kegelmeyer P, Kumar V, Namburu R (eds) Data mining for scientific and engineering applications. Kluwer Academic, Dordrecht
Chan FK, Fu AW (2003) Haar wavelets for efficient similarity search of time-series: with and without time warping. IEEE Trans Knowl Data Eng 15(3):678–705
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. ACM, pp 47–57
Kahveci T, Singh A, Gurel A (2002) Similarity searching for multi-attribute sequences. IEEE, p 175
Keogh E, Pazzani M (1999) An indexing scheme for fast similarity search in large time series databases. IEEE, pp 56–67
National Oceanic and Atmospheric Administration. El Nino Web Page. www.elnino.noaa.gov/
Rafiei D, Mendelzon A (2000) Querying time series data based on similarity. IEEE Trans Knowl Data Eng 12(5):675–693
Rigaux P, Scholl M, Voisard A (2001) Spatial databases: with application to GIS. Morgan Kaufmann Publishers, Reading
Samet H (1990) The design and analysis of spatial data structures. Addison-Wesley Publishing Company, San Francisco
Shekhar S, Chawla S (2003) Spatial databases: a tour. Prentice Hall, Upper Saddle River. ISBN:0130174807
Zhang P, Huang Y, Shekhar S, Kumar V (2003a) Correlation analysis of spatial time series datasets: a filter-and-refine approach. In: Lecture notes in computer science, vol 2637. Springer, Berlin/Heidelberg
Zhang P, Huang Y, Shekhar S, Kumar V (2003b) Exploiting spatial autocorrelation to efficiently process correlation-based similarity queries. In: Lecture notes in computer science, vol 2750. Springer, Berlin/Heidelberg


COSP

▶ Error Propagation in Spatial Prediction


Counterflow

▶ Contraflow in Transportation Network


Coverage Standards and Services, Geographic

Wenli Yang¹ and Liping Di²
¹ Center for Spatial Information Science and Systems, College of Science, George Mason University, Fairfax, VA, USA
² Center for Spatial Information Science and Systems (CSISS), George Mason University, Fairfax, VA, USA

Definition

A geographic coverage is a representation of a phenomenon or phenomena within a bounded spatiotemporal region by assigning a value or a set of values to each position within the spatiotemporal domain. Geographic coverage standards specify schema and frameworks for geographic coverage or coverage components. Geographic coverage services are those having standard interfaces defined by widely recognized standardization bodies.

Main Text

Geographic phenomena can be observed in two forms: discrete and continuous. Discrete phenomena are usually objects that can be directly recognized due to the existence of their geometrical boundaries with other objects. Continuous phenomena usually do not have observable boundaries and vary continuously over space.

The information on discrete phenomena and that on continuous phenomena are often used differently, and the operations performed on the data recording these two categories of information are usually also different. Thus, there are often differences in data structure designs, data encoding approaches, and data accessing and processing methods for these two types of geographic phenomena. Geographic coverage is a concept for continuous phenomena. Geographic coverage standards define conceptual schema for coverage and analyze coverage types and components, e.g., ISO: ISO/TC211 (2005). These include characteristics of spatiotemporal domain coverage and attribute range, major coverage types, and operations on coverages. Geographic coverage standards provide a common technology language and guide the development of interoperable services on coverage data. Geographic coverage services perform various functionalities for coverage including collecting, archiving, cataloging, publishing, distributing, and processing of coverage data. Geographic coverage services compliant with standard schema and interfaces are interoperable. They can be described, published and found in standard service catalogues, be accessed by all compliant clients, and be connected in order to construct service chains to accomplish complex geospatial modeling tasks.

Cross-References

▶ Geographic Coverage Standards and Services

References

ISO: ISO/TC211 (2005) ISO 19123 geographic information - schema for coverage geometry and functions


CPU-GPU

▶ Medical Image Dataset Processing over Cloud/MapReduce with Heterogeneous Architectures


Crime Mapping

▶ CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents
▶ Hotspot Detection, Prioritization, and Security


Crime Mapping and Analysis

Ronald E. Wilson and Katie M. Filbert
Mapping and Analysis for Public Safety Program & Data Resources, National Institute of Justice, Washington, D.C, MD, USA

Synonyms

Environmental criminology; First law of geography; Geographical analysis; Rational choice; Route activity; Social disorganization; Spatial analysis of crime; Spatial aspects of crime; Statistical techniques

Definition

The term "crime mapping" is inaccurate as it is overly simplistic. Crime mapping is often associated with the simple display and querying of crime data using a Geographic Information System (GIS). Instead, it is a general term that encompasses the technical aspects of visualization and statistical techniques, as well as practical aspects of geographic principles and criminological theories.

From a technical standpoint, the term is a combination of visualization and statistical techniques manifested as software. This combination of techniques is shared between mapping, spatial analysis and spatial data analysis. Mapping is simply a visualization tool that is used to display raw geographic data and output from analysis, which is done through a GIS. Spatial analysis is the statistical testing of geographic features in relation to other geographic features for patterns, or lack thereof. Spatial data analysis is the combination of spatial analysis with associated attribute data of the features to uncover spatial interactions between features.

From a practical standpoint, crime mapping is a hybrid of several social sciences, which are geography, sociology and criminology. It combines the basic principles of geographic analysis, sociological and criminological theory and makes it possible to test conjectures from these disciplines in order to confirm or refute them as actual. In essence it has developed into an applied science, with its own tools, that examines a range of issues about society and its relationship with the elements that contribute to crime. Thus, crime mapping is interdisciplinary, involving other disciplines that incorporate the spatial perspectives of social phenomena related to crime, such as inequality, residential stability, unemployment, resource deprivation, economic opportunities, housing availability, migration, segregation, and the effects of policy. Using a geographic framework often leads to a more comprehensive understanding of the factors that contribute to or suppress crime.

Even though the term "crime mapping" is a misnomer, it will continue to be widely used as a general term with regard to the study of the spatial aspects of crime. Users of the term need to let the context of their work dictate which standpoint is being referred to.

Historical Background

Starting in the 1930s, crime mapping was used with limited success in the United States due to a lack of data and of the means to analyze that data, namely computational capacity. Thus, its value was limited to simple depictions on paper of where crimes were occurring. For social science these depictions were not important until researchers from the Chicago School of Sociology combined criminological theory with geographic theory on a map. The result was the theory of social disorganization. Using a map, Shaw and McKay (1942) overlaid residences of juvenile offenders with the concentric zone model of urban land uses (Park et al. 1925), including demographic characteristics. They discovered a geographic correlation between impoverished and blighted places and those that had most of the juvenile offender residences. This fostered a new line of research that examined the spatial aspects of crime that spanned from 1950 to the late 1970s. Despite the
impact this had on furthering spatial theories of crime, there was not much more that could be done because the basic principles of geographic analysis had not yet been operationalized into what geographer Jerome Dobson (1983) would call "Automated Geography." Personal computers soon came afterwards, but software permitting empirical testing of these theories did not come until much later. It was at this point that crime mapping became useful to law enforcement, primarily to depict where crimes were occurring in order to focus resources (Weisburd and McEwen 1997). However, there was not yet a relationship between academic institutions and law enforcement agencies to couple theories with actual observations from the street.

Crime mapping with computers made an entrance in the mid 1960s, allowing the production of maps of crime by city blocks shaded by volume of incidents. This was still of little interest to researchers studying crime. Even though criminologists were becoming interested in the spatial analysis of crime, they were not looking to other disciplines, including geography, for help in analyzing data using a spatial framework. Many software programs from geography were available that could have been used, but there is little evidence in any of the social science literature that these programs were being used. Also neglected were the principles of geographic analysis for analyzing the spatial aspects of data. With practitioners, the struggle was different. Producing maps of crime required serious computing infrastructure that, at the time, was only available within larger city government agencies, which did not give crime maps a high priority (Weisburd and McEwen 1997).

The growth of environmental criminology in the 1980s, spearheaded by Paul and Patricia Brantingham, allowed the discipline of geography to make inroads into criminological theory (La Vigne and Groff 2001). Environmental criminology fused geographical principles and criminological theory together with GIS and provided opportunities to empirically test the theories it was purporting. Significant contributions by George Rengert (1989) (Rengert and Simon 1981), Jim LeBeau (1987, 1992) and Keith Harries (1974, 1980) to environmental criminology using GIS and spatial statistics software continued, thereafter, to strengthen the role of geography in the study of crime. As a result, criminology now has several geographic theories of crime, including rational choice (Cornish and Clarke 1986), routine activity (Cohen and Felson 1979), and crime pattern theory (Brantingham and Brantingham 1981). Social disorganization theory was also extended with geographical principles through the incorporation of simultaneous social interactions between adjacent neighborhoods. For a brief and succinct listing of these theories see Paulsen and Robinson (2004). At this point, crime mapping branched out to become useful in a new practitioner-based area beyond law enforcement, the criminal justice agency. The confluence of geographic principles, criminological theory and advancing technology led to the development of crime prevention programs based on empirical evidence, such as "Hot Spot" Policing.

In the late 1980s the Federal government played a role in advancing the use of crime mapping. The National Institute of Justice (NIJ) funded several efforts under the Drug Market Analysis Program (DMAP) that brought together academic institutions with law enforcement agencies in five cities in the United States (La Vigne and Groff 2001). The purpose was to identify drug markets and activities associated with them by tracking movement of dealers and users in and out of them. These grants were the first to promote working relationships between practitioners and researchers in the area of crime mapping, moving them beyond the limitations each faced without the other as a partner.

Continuing improvements in GIS throughout the 1990s, and into the 2000s, made it possible to better assemble, integrate, and create new data. This is probably the greatest impact that GIS has had on crime mapping. Not only could a GIS assemble multiple and disparate sets of demographic, economic and social data with crime data, it could also create new units of analysis that better modeled human behavior. This capability
afforded a more accurate understanding of the spatial interactions among offenders, victims and their environments, which could be captured and analyzed in ways that more accurately represented human settlement and activity. This freed criminologists from being confined to the standard units of analysis, such as administrative boundaries from the US Census Bureau or other local governmental agencies. GIS provided the unique opportunity to represent boundaries of human activity more accurately through the creation of more distinct partitions, such as police beats or land use, as well as asymmetrical boundaries of human interaction created with buffers or density surfaces. In this regard, there is nothing else like GIS in the study of crime. For practitioners, this freed them from having to depend on other government agencies to produce crime maps for law enforcement purposes, and provided opportunities to produce custom, on-demand maps for specific purposes, including search warrants or patrol deployment.

The late 1990s saw the advancement of crime mapping not only in academic departments that study crime and in law enforcement agencies, but also in the Federal government. NIJ established the Crime Mapping Research Center in 1997, now the Mapping and Analysis for Public Safety (MAPS) Program, for the purpose of conducting research and evaluation of the spatial aspects of crime. One year later NIJ provided money to the National Law Enforcement and Corrections Technology Center (NLECTC) Rocky Mountain Division to establish the Crime Mapping and Analysis Program (CMAP). This program was to provide assistance to law enforcement and criminal justice agencies specifically in the use of crime mapping. Into the 2000s, all large agencies and most medium-sized agencies are using GIS as part of their analysis and operations efforts and are using crime mapping far beyond the simple mapping of where crime is occurring. Research has continued to refine spatial theories of crime based on better coordination with practitioners, funding from Federal agencies and the development of software for the further understanding of crime through geography.

Scientific Fundamentals

Crime mapping, as an applied science, is ultimately about where. As a result, contributions from primarily two social science disciplines make up the foundations of crime mapping. The first, geography, provides a set of principles that sets the stage for the study of crime within a spatial framework. The second, criminology, provides a set of specific spatial theories about criminal activity and environmental conditions that form the foundation of the spatial aspects of crime.

Geographic Principles

A complete understanding of crime is facilitated by two sets of factors: individual and contextual. Crime mapping deals with the contextual. Therefore, geographic principles are necessary to understand that context. These principles provide a framework for measuring the interactions between places. Analysis in that framework is possible by combining long-standing geographic principles that have been implemented through GIS and spatial data analysis software. GIS facilitates the visualization of raw data and the results from statistical analysis. Spatial statistical techniques extend traditional statistics to form a more complete approach toward understanding social problems, including crime. The following are the three basic geographic principles that are the foundation for the contextual analysis of crime.

Place
Criminology has a long history of looking at the geographical influences on crime. Some of the most significant pieces of work concerned the study of crime in neighborhoods, communities, cities, regions and even across the United States (Brantingham and Brantingham 1981; Reiss et al. 1986; Bursik and Grasmick 1993; Weisburd et al. 1995). These studies identify places in which criminology seeks to understand criminal activity. The focus of
studying crime in place demonstrates the use of geography as a framework for contextual analysis that no other discipline can offer. Place becomes the cornerstone because it allows for the categorizing of space by defining a geographic unit of analysis for the systematic measurement of human and environmental characteristics in relation to neighboring places.

Tobler's First Law of Geography
Places are not isolated islands of activity. Interactions, such as social, demographic, or economic ones, occur within and between places. These interactions form spatial relationships based on the concept that things closer together in space are more related. That is, human activity and physical environments change slowly across space, with abrupt changes being out of the ordinary. Named after Waldo Tobler (1970), this law forms the theoretical foundation for the concept of distance decay, which is used to analyze these spatial interactions and relationships and allows the strength of interactions between places to be measured.

Spatial Processes
Human interactions that occur within, and between, geographic places form two concepts: spatial heterogeneity and spatial dependence. Spatial heterogeneity is the variability of human and environmental conditions across space. At the local level this is change across a defined space where conditions, such as racial composition, economic stability, housing conditions, land use, or migration, vary. These things are not evenly distributed across space and form various patterns, at different scales, and in multiple directions, all of which are asymmetric. Spatial dependence represents the strength of a relationship of some phenomenon between places that have influence on each other, a concept known as spatial autocorrelation. These patterns range from clustered to randomly distributed to dispersed to uniform. These are indications that human activity and the environments in which it develops have a wide range of variability, one that usually follows systemic patterns.

Criminological Theories

Criminology has developed a set of spatial theories of crime that have utilized all three of the geographic principles listed.

Rational Choice
Rational choice theory is based on classical ideas that originated in the 1700s, with the work of Cesare Beccaria and others who took a utilitarian view of crime (Beccaria 1764). This perspective suggests that criminals think rationally and make calculated decisions, weighing the costs and risks of committing a crime against its potential benefits, while being constrained by time, cognitive ability and available information, resulting in a limited rather than normal rationality (Cornish and Clarke 1986). In this sense, rational choice theory also brings economic ideas and theories into criminology.

Routine Activities
Routine activities theory helps explain why crime occurs at particular places and times. The theory suggests that crime opportunities are a function of three factors that converge in time and place: a motivated offender, a suitable target or victim, and the lack of a capable guardian (Cohen and Felson 1979). A fourth aspect of routine activities theory, suggested by John Eck, is place management. Rental property managers are one example of place managers (Eck and Wartell 1997). They have the ability to take nuisance abatement and other measures to influence behavior at particular places. Criminals choose or find their targets within the context of their routine activities, such as traveling to and from work, or other activities such as shopping, and tend not to go far out of their way to commit crimes (Felson 1994).

Crime Pattern
Crime pattern theory looks at the opportunities for crime within the context of geographic space, and makes a distinction between crime events and criminality, that is, the propensity to commit crime (Brantingham and Brantingham 1981). Crime pattern theory integrates rational choice
and routine activities theories, with a geographic framework, place. The theory works at various geographic scales, from the macro-level with spatial aggregation at the census tract or other level, to the micro-scale with a focus on specific crime events and places. Crime pattern theory focuses on situations or places where there is a lack of social control or guardianship over either the suspect or victim, combined with a concentration of targets. For example, a suburban neighborhood can become a hot spot for burglaries because some homes have inadequate protection and nobody home to guard the property.

Social Disorganization
Social disorganization theory emphasizes the importance of social controls in neighborhoods for controlling behavior, particularly for individuals with low self-control or a propensity to commit crime. Social controls can include family, as well as neighborhood institutions such as schools and religious places. When identifying places with social disorganization, the focus is on the ability of local residents to control social deviancy (Bursik and Grasmick 1993). Important factors include poverty, as well as turnover of residents and outmigration, which hinder the development of social networks and neighborhood institutions that lead to collective efficacy (Sampson et al. 1997).

Key Applications

There are five key applications in crime mapping. These applications are thematic mapping, non-graphical indicators, hot spots, spatial regression and geographic profiling. They make up a full complement of techniques from elementary to advanced.

Thematic Mapping
Thematic maps are color-coded maps that depict the geographic distribution of numeric or descriptive values of some variable. They reveal the geographic patterns of the underlying data. A variable can be quantitative or qualitative. Quantitative maps provide multiple techniques for categorizing the distribution of a variable. Qualitative maps provide a mechanism for classification of some description, or label, of a value. They are often shaded administrative or statistical boundaries, such as census blocks, police beats or neighborhoods. For example, robbery rates based on population can be derived for neighborhood boundaries, giving an indication of the neighborhoods that pose the highest risk. Locations can also be symbolized to show quantities based on the size or color of the symbol. For example, multiple crime events at a particular location give an indication of repeat victimization, as is common in burglary. However, simple visualization of values and rates can be misleading, especially since the method of classification can change the meaning of a map. Spatial statistics are then used to provide more rigorous and objective analysis of spatial patterns in the data.

Non-graphical Indicators
Non-graphical statistical tests produce a single number that represents the presence or absence of clustering of crime incidents. These are global-level statistics indicating the strength of spatial autocorrelation, but not its location. They compare actual distributions of crime incidents with random distributions. Positive spatial autocorrelation indicates that incidents are clustered, while negative spatial autocorrelation indicates that incidents are uniform. Tests for global spatial autocorrelation within a set of points include Moran's I (Chakravorty 1995), Geary's C statistic, and the Nearest Neighbor Index (Levine 2005). After visualizing data in thematic maps, these are the first statistical tests conducive to determining whether any local-level relationships between crime and place exist.

Hot Spots
Hot spots are places with concentrations of high crime or a greater than average level of crime. The converse of a hot spot is a cold spot, a place that is completely, or almost, devoid of crime. Identification and analysis of hot spots is often done by police agencies, to provide guidance as to where to place resources and target crime reduction efforts. Hot spot analysis
can work at different geographic levels, from the macro-scale, looking at high crime neighborhoods, to the micro-scale, finding specific places such as particular bars or street segments that are experiencing high levels of crime (Eck et al. 2005). Depending on the level of analysis, police can respond with specific actions such as issuing a warrant or focusing at a neighborhood level to address neighborhood characteristics that make the place more criminogenic. A variety of spatial statistical techniques are used for creating hot spots, such as density surfaces (Levine 2005), location quotients (Isserman 1977; Brantingham and Brantingham 1995; Ratcliffe 2004), local indicators of spatial autocorrelation (LISA) (Anselin 1995; Getis and Ord 1996; Ratcliffe and McCullagh 1998), and nearest neighbor hierarchical clustering (Levine 2005).

Spatial Regression
Regression techniques, such as Ordinary Least Squares (OLS), have been used for quite some time in criminology as explanatory models. This technique has a major limitation, in that it does not account for the spatial dependence inherent in almost all data. Holding to geographic principles, a place with high crime is most likely surrounded by neighbors that also experience high crime, thereby displaying spatial autocorrelation, i.e. a spatial effect. Spatial regression techniques, developed by Luc Anselin (2002), take into account spatial dependence in data. Not factoring these spatial effects into models makes them biased and less efficient. Tests have been created for identifying spatial effects in the dependent variable (spatial lag) and in the model's error term (spatial error). If tests detect the presence of spatial lag or error, this form of regression adjusts the model so that spatial effects do not unduly affect the explanatory power of the model.

Geographic Profiling
Geographic profiling is a technique for identifying the likely area where a serial offender resides, or another place, such as their place of work, that serves as an anchor point. Geographic profiling techniques draw upon crime place theory and routine activities theory, with the assumption that criminals do not go far out of their daily routines to commit crimes. Geographic profiling takes into account a series of crime locations that have been linked to a particular serial criminal and creates a probability surface that identifies the area where the offender's anchor point may be (Rossmo 2000; Canter 2003). Geographic profiling was originally developed for use in serial murder, rapes, and other rare but serious crimes. However, geographic profiling is being expanded to high-volume crimes such as serial burglary (Chainey and Ratcliffe 2005).

Future Directions

The advancement of research and practice in crime mapping rests on continuing efforts in three areas: efforts by research and technology centers, software development, and expansion into law enforcement and criminal justice.

Crime mapping research and technology centers, such as the MAPS Program, the CMAP and the Crime Mapping Center at the Jill Dando Institute (JDI), are primary resources for research, development and application of GIS, spatial data analysis methodologies and geographic technologies. These three centers serve as conduits for much of the work conducted in both the academic and practitioner communities. The MAPS Program is a grant funding and applied research center that serves as a resource in the use of GIS and spatial statistics used in crime studies. The program awards numerous grants for research and development in the technical, applied and theoretical aspects of using GIS and spatial data analysis to study crime, and also conducts research itself. As a counterpart to the MAPS Program, CMAP's mission is to serve practitioners in law enforcement and criminal justice agencies by developing tools and training materials for the next generation of crime analysts and applied researchers in the use of GIS and spatial analysis. In the UK the Jill Dando Institute of Crime Science has a Crime Mapping Center that contributes to the advancement in understanding the spatial aspects of crime with an approach
called crime science. This approach utilizes theories and principles from many scientific disciplines to examine every place as a unique environment for an explanation of the presence or absence of crime. The Center conducts applied research and provides training in this approach on a regular basis. The MAPS Program and the Crime Mapping Center at the JDI hold conferences on a regular basis. These events form the nexus for practitioners and researchers to work together in the exchange of ideas, data, experiences and results from analysis that create a more robust applied science.

Software programs are vital to the progression of the spatial analysis of crime. These programs become the scientific instruments that researchers and practitioners need in understanding human behavior and environmental conditions as they relate to crime. Software such as CrimeStat, GeoDa and spatial routines for R is being written to include greater visualization capabilities, more sophisticated modeling and mechanisms for seamless operation with other software. For example, in version three of CrimeStat the theory of travel demand was operationalized as a set of routines that apply to criminals as mobile agents in everyday life. GeoDa continues to generate robust tools for visualization based on the principles of Exploratory Data Analysis (EDA). New and cutting-edge tools for geographic visualization, spatial statistics and spatial data analysis are being added to the open statistical development environment R on a regular basis. All of these programs provide a rich set of tools for testing theories and discovering new patterns that reciprocally help refine what is known about patterns of crime. The emergence of spatial statistics has proven important enough that even the major statistical software packages, such as SAS, SPSS, and Stata, are all incorporating full sets of spatial statistics routines.

The application of crime mapping is expanding into broader areas of law enforcement and criminal justice. In law enforcement, mapping is taking agencies in new directions toward crime prevention. For example, the Computer Mapping, Planning and Analysis of Safety Strategies (COMPASS) Program, funded by the NIJ, combines crime data with community data where crime is a characteristic of populations rather than a product. That is to say, crime is, at times, a cause of conditions rather than the result of conditions. It is an indicator of the well-being of neighborhoods, communities or cities. Shared with local-level policy makers, COMPASS provides a view into the well-being of their communities. Resources can be directed to those places that are not well, and the program helps in understanding what makes other places well. Combined with problem-oriented policing, a strategy that addresses specific crime problems, this approach can be effective in reducing crime incidents and in bringing about a general reduction in social disorder (Braga et al. 1999). Coupled with applications in criminal justice, mapping can be utilized to understand the results of policy and the outcomes. This includes topics important to community corrections in monitoring or helping returning offenders, including registered sex offenders. Mapping can also be of use in allocating probation and parole officers to particular geographic areas, directing probationers and parolees to community services, and selecting sites for new community services and facilities (Karuppannan 2005). Finally, mapping can even help to understand the geographic patterns of responses to jury summons to determine whether racial biases are occurring in some systematic way across a jurisdiction (Ratcliffe 2004).

These three elements will persist and intertwine to evermore incorporate the geographic aspects of basic and applied research of crime through technology. The advancement of knowledge that crime mapping can provide will require continued reciprocation of results between research and practice through technology (Stokes 1997). The hope is that researchers will continue to create new techniques and methods that fuse classical and spatial statistics together to further operationalize geographic principles and criminological theory to aid in the understanding of crime. Practitioners will implement new tools that are developed for analyzing crime with geographic perspectives. They will also continue to take these tools in new directions as improvers of technology (Stokes 1997) and discover new
patterns as those tools become more complete in modeling places.

Acknowledgements We would like to thank Keith Harries, Dan Helms, Chris Maxwell and Susan Wernicke-Smith for providing comments on this entry in a very short time. They provided valuable comments that were used toward crafting this entry.
The views expressed in this paper are those of the authors, and do not represent the official positions or policies of the National Institute of Justice or the US Department of Justice.

Cross-References

▶ Autocorrelation, Spatial
▶ Constraint Data, Visualizing
▶ CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents
▶ Data Analysis, Spatial
▶ Exploratory Visualization
▶ Hotspot Detection, Prioritization, and Security
▶ Patterns, Complex
▶ Spatial Econometric Models, Prediction
▶ Spatial Regression Models
▶ Statistical Descriptions of Spatial Patterns
▶ Time Geography

References

Anselin L (1995) Local indicators of spatial association – LISA. Geogr Anal 27:93–115
Anselin L (2002) Under the hood: issues in the specification and interpretation of spatial regression models. http://sal.uiuc.edu/users/anselin/papers.html
Braga AA, Weisburd DL, et al (1999) Problem-oriented policing in violent crime places: a randomized controlled experiment. Criminology 37:541–580
Brantingham P, Brantingham P (1981) Environmental criminology. Waveland Press, Prospect Heights
Brantingham P, Brantingham P (1995) Location quotients and crime hotspots in the city. In: Block C, Dabdoub M, Fregly S (eds) Crime analysis through computer mapping. Police Executive Research Forum, Washington, DC
Beccaria C (1764) Richard Davies, translator: on crimes and punishments, and other writings. Cambridge University Press
Bursik RJ, Grasmick HG (1993) Neighborhoods and crime: the dimensions of effective community control. Lexington Books, New York
Canter D (2003) Mapping murder: the secrets of geographic profiling. Virgin Publishing, London
Chainey S, Ratcliffe J (2005) GIS and crime mapping. Wiley, Hoboken
Chakravorty S (1995) Identifying crime clusters: the spatial principles. Middle States Geogr 28:53–58
Cohen L, Felson M (1979) Social change and crime rate trends. Am Soc Rev 44(4):588–608
Cornish D, Clarke RV (1986) The reasoning criminal. Springer
Dobson JE (1983) Automated geography. Prof Geogr 35(2):135–143
Eck J, Chainey S, Cameron J, Leitner M, Wilson RE (2005) Mapping crime: understanding hot spots. National Institute of Justice, Washington, DC
Eck J, Wartell J (1997) Reducing crime and drug dealing by improving place management: a randomized experiment. National Institute of Justice
Felson M (1994) Crime and everyday life. Pine Forge
Getis A, Ord JK (1996) Local spatial statistics: an overview. In: Longley P, Batty M (eds) Spatial analysis: modelling in a GIS environment. Geoinformation International, Cambridge, pp 261–277
Harries KD (1974) The geography of crime and justice. McGraw-Hill, New York
Harries KD (1980) Crime and the environment. Charles C Thomas Press, Springfield
Isserman AM (1977) The location quotient approach for estimating regional economic impacts. J Am Inst Plan 43:33–41
Karuppannan J (2005) Mapping and corrections: management of offenders with geographic information systems. Corrections Compendium. http://www.iaca.net/Articles/drjaishankarmaparticle.pdf
La Vigne NG, Groff ER (2001) The evolution of crime mapping in the United States: from the descriptive to the analytic. In: Hirschfield A, Bowers K (eds) Mapping and analyzing crime data. University of Liverpool Press, Liverpool, pp 203–221
LeBeau JL (1987) Patterns of stranger and serial rape offending: factors distinguishing apprehended and at large offenders. J Crim Law Criminol 78(2):309–326
LeBeau JL (1992) Four case studies illustrating the spatial-temporal analysis of serial rapists. Police Stud 15:124–145
Levine N (2005) CrimeStat III version 3.0, a spatial statistics program for the analysis of crime incident locations
Park RE, Burgess EW, McKenzie RD (1925) The city: suggestions for investigation of human behavior in the urban environment. University of Chicago Press, Chicago
Paulsen DJ, Robinson MB (2004) Spatial aspects of crime: theory and practice. Allyn and Bacon, Boston
Ratcliffe JH, McCullagh MJ (1998) The perception of crime hotspots: a spatial study in Nottingham, UK. In: Crime mapping case studies: successes in the field. National Institute of Justice, Washington, DC
Ratcliffe JH (2004) Location quotients and force-field analysis. In: 7th annual international crime mapping research conference, Boston
Reiss AJ, Tonry M (eds) (1986) Communities and crime, vol 8. University of Chicago Press, Chicago
Rengert GF, Simon H (1981) Crime spillover. Sage Publications, Beverly Hills
Rengert GF (1989) Behavioral geography and criminal behavior. In: Evans DJ, Herbert DT (eds) The geography of crime. Routledge, London
Rossmo DK (2000) Geographic profiling. CRC Press, Boca Raton
Sampson RJ, Raudenbush SR, Earls F (1997) Neighborhoods and violent crime: a multilevel study of collective efficacy. Science 277:918–924
Shaw CR, McKay HD (1942) Juvenile delinquency and urban areas. University of Chicago Press, Chicago
Stokes DE (1997) Pasteur's quadrant: basic science and technological innovation. Brookings Institution Press, Washington, DC
Tobler WR (1970) A computer movie simulating urban growth in the Detroit region. Econ Geogr 46:234–240
Weisburd D, Eck JE (eds) (1995) Crime and place: crime prevention studies, vol 4. Police Executive Research Forum/Willow Tree Press, Washington, DC
Weisburd D, McEwen T (eds) (1997) Introduction: crime mapping and crime prevention. In: Crime mapping and crime prevention. Criminal Justice Press, Monsey, pp 1–23

Recommended Reading

Clarke RV (1992) Situational crime prevention: successful case studies. Harrow and Heston, New York
Cresswell T (2004) Place: a short introduction. Blackwell Publishing Ltd, Malden
Haining R (2003) Spatial data analysis: theory and practice. Cambridge University Press, Cambridge/New York
Ray JC (1977) Crime prevention through environmental design. Sage Publications, Beverly Hills
Ronald CV (1992) Situational crime prevention. Harrow and Heston, New York
Weisburd D, Green L (1995) Policing drug hot spots: the Jersey city drug market analysis experiment. Justice Q 12(4):711–736

Crime Travel Demand

▶ CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents

CrimeStat

▶ CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents

CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents

Ned Levine
Ned Levine & Associates, Houston, TX, USA

Synonyms

Centrographic measures; Correlated walk; Crime mapping; CrimeStat; Crime travel demand; Geographic profiling; Hotspot; Interpolation; Journey to crime analysis; Knox test; Mantel test; Space-time interaction; Spatial statistics program

Definition

CrimeStat is a spatial statistics and visualization program that interfaces with desktop GIS packages. It is a stand-alone Windows program for the analysis of crime incident locations and can interface with most desktop GIS programs. Its aim is to provide statistical tools to help law enforcement agencies and criminal justice researchers in their crime mapping efforts. The program has many statistical tools, including centrographic, distance analysis, hot spot analysis, space-time analysis, interpolation, Journey-to-Crime estimation, and crime travel demand modeling routines. The program writes calculated objects to GIS files that can be imported into a GIS program, in formats including shape, MIF/MID, BNA, and ASCII. The National Institute of Justice is the distributor of CrimeStat and makes it available for free to analysts, researchers, educators, and students (the program is available at http://www.icpsr.umich.edu/crimestat). The program is distributed along
with a manual that describes each of the statistics and gives examples of their use (Levine 2007a).

Historical Background

CrimeStat has been developed by Ned Levine and Associates since the late 1990s under grants from the National Institute of Justice. It is an outgrowth of the Hawaii Pointstat program, which was UNIX-based (Levine 1996). CrimeStat, on the other hand, is a Windows-based program. It is written in C++ and is multi-threaded. To date, there have been three major versions with two updates. The first was in 1999 (version 1.0) with an update in 2000 (version 1.1). The second was in 2002 (CrimeStat II) and the third was in 2004 (CrimeStat III). The current version is 3.1 and was released in March 2007.

Scientific Fundamentals

The current version of CrimeStat covers seven main areas of spatial analysis: centrographic measures, spatial autocorrelation, hot spot analysis, interpolation, space-time analysis, Journey-to-Crime modeling, and crime travel demand modeling.

Centrographic Measures
There are a number of statistics for describing the general properties of a distribution. These include the central tendency of the overall spatial pattern, dispersion and directionality. Among the statistics are the mean center, the center of minimum distance, the standard distance deviation, the standard deviational ellipse, the harmonic mean, the geometric mean, and the directional mean (Ebdon 1988).

Spatial Autocorrelation
There are several statistics for describing spatial autocorrelation, including Moran's I, Geary's C, and a Moran Correlogram (Moran 1948; Geary 1954; Ebdon 1988). There are also several statistics that describe spatial autocorrelation through the properties of distances between incidents, including the nearest neighbor statistic (Clark and Evans 1954), the linear nearest neighbor statistic, the K-order nearest neighbor distribution (Cressie 1991), and Ripley's K statistic (Ripley 1981). The testing of significance for Ripley's K is done through a Monte Carlo simulation that estimates approximate confidence intervals.

Hot Spot Analysis
An extreme form of spatial autocorrelation is a hot spot. While there is no absolute definition of a "hot spot", police are aware that many crime incidents tend to be concentrated in a limited number of locations. The Mapping and Analysis for Public Safety Program at the National Institute of Justice has sponsored several major studies on crime hot spot analysis (Harries 1999; LaVigne and Wartell 1998; Eck et al. 2005).

CrimeStat includes seven distinct hot spot analysis routines: the mode, the fuzzy mode, nearest neighbor hierarchical clustering (Everitt et al. 2001), risk-adjusted nearest neighbor hierarchical clustering (Levine 2004), the Spatial and Temporal Analysis of Crime routine (STAC) (Block 1994), K-means clustering, and Anselin's Moran statistic (Anselin 1995).

The mode counts the number of incidents at each location. The fuzzy mode counts the number of incidents at each location within a specified search circle; it is useful for detecting concentrations of incidents within a short distance of each other (e.g., at multiple parking lots around a stadium; at the shared parking lot of multiple apartment buildings).

The nearest neighbor hierarchical clustering routine defines a search circle that is tied to the random nearest neighbor distance. First, the algorithm groups incidents that are closer than the search circle and then searches for a concentration of multiple incidents within those selected. The center of each concentration is identified and all incidents within the search circle of the center of each concentration are assigned to the cluster. Thus, incidents can belong to one-and-only-one cluster, but not all incidents belong to a cluster. The process is repeated until the distribution is stable (first-order clusters). The user can specify a minimum size for the cluster to eliminate very small clusters (e.g., 2 or 3 incidents at the
CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents 383

same location). Once clustered, the routine then clusters the first-order clusters to produce second-order clusters. The process is continued until the grouping algorithm fails. The risk-adjusted nearest neighbor hierarchical clustering routine follows the same logic but compares the distribution of incidents to a baseline variable. The clustering is done with respect to a baseline variable by calculating a cell-specific grouping distance that would be expected on the basis of the baseline variable, rather than a single grouping distance for all parts of the study area.

The Spatial and Temporal Analysis of Crime hot spot routine (STAC) is linked to a grid and groups on the basis of a minimum size. It is useful for identifying medium-sized clusters. The K-means clustering algorithm divides the points into K distinct groupings where K is defined by the user. Since the routine will frequently create clusters of vastly unequal size due to the concentration of incidents in the central part of most metropolitan areas, the user can adjust them through a separation factor. Also, the user can define specific starting points (seeds) for the clusters as opposed to allowing the routine to find its own.

Statistical significance of these latter routines is tested with a Monte Carlo simulation. The nearest neighbor hierarchical clustering, the risk-adjusted nearest neighbor hierarchical clustering, and the STAC routines each have a Monte Carlo simulation that allows the estimation of approximate confidence intervals or test thresholds for these statistics.

Finally, unlike the other hot spot routines, Anselin's Local Moran statistic is applied to aggregates of incidents in zones. It calculates the similarity and dissimilarity of zones relative to nearby zones by applying the Moran's I statistic to each zone. An approximate significance test can be calculated using an estimated variance.

Interpolation
Interpolation involves extrapolating a density estimate from individual data points. A fine-mesh grid is placed over the study area. For each grid cell, the distance from the center of the cell to each data point is calculated and is converted into a density using a mathematical function (a kernel). The densities are summed over all incidents to produce an estimate for the cell. This process is then repeated for each grid cell (Bailey and Gatrell 1995). CrimeStat allows five different mathematical functions to be used to estimate the density. The particular dispersion of the function is controlled through a bandwidth parameter and the user can select a fixed or an adaptive bandwidth. It is a type of hot spot analysis in that it can illustrate where there are concentrations of incidents. However, it lacks the precision of the hot spot routines since it is smoothed. The hot spot routines will show exactly which points are included in a cluster.

CrimeStat has two different kernel functions, a single-variable kernel density estimation routine for producing a surface or contour estimate of the density of incidents (e.g., the density of burglaries) and a dual-variable kernel density estimation routine for comparing the density of incidents to the density of an underlying baseline (e.g., the density of burglaries relative to the density of households).

As an example, Fig. 1 shows motor vehicle crash risk along Kirby Drive in Houston for 1999–2001. Crash risk is defined as the annual number of motor vehicle crashes per 100 million vehicle miles traveled (VMT) and is a standard measure of motor vehicle safety. The dual-variable kernel density routine was used to estimate the densities with the number of crashes being the incident variable and VMT being the baseline variable. In the map, higher crash risk is shown as darker. As a comparison, hot spots with 15 or more incidents were identified with the nearest neighbor hierarchical clustering routine and are overlaid on the map as are the crash locations.

Space-Time Analysis
There are several routines for analyzing clustering in time and in space. Two are global measures, the Knox and Mantel indices, which specify whether there is a relationship between time and space. Each has a Monte Carlo simulation to estimate confidence intervals around the calculated statistic.
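
As a rough illustration of the Knox index mentioned under Space-Time Analysis, the following Python sketch counts pairs of incidents that are close in both space and time, then uses a Monte Carlo permutation of the event times to judge how unusual that count is. It is not CrimeStat's implementation: the function names, the user-supplied closeness thresholds (d_max, t_max), and the sample data are assumptions made purely for illustration.

```python
import math
import random


def knox_count(points, times, d_max, t_max):
    """Count incident pairs that are close in BOTH space and time."""
    close_pairs = 0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            near_in_space = math.dist(points[i], points[j]) <= d_max
            near_in_time = abs(times[i] - times[j]) <= t_max
            if near_in_space and near_in_time:
                close_pairs += 1
    return close_pairs


def knox_monte_carlo(points, times, d_max, t_max, n_sim=999, seed=0):
    """Return the observed Knox count and a permutation-based p-value.

    Under the null hypothesis of no space-time interaction, the event
    times can be shuffled across locations without changing the
    distribution of the statistic.
    """
    rng = random.Random(seed)
    observed = knox_count(points, times, d_max, t_max)
    shuffled = list(times)
    at_least_as_large = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if knox_count(points, shuffled, d_max, t_max) >= observed:
            at_least_as_large += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (at_least_as_large + 1) / (n_sim + 1)
    return observed, p_value


# Hypothetical data: two small groups of incidents, each tight in
# space (x, y in arbitrary units) and in time (days).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
times = [1, 2, 3, 50, 51, 52]

observed, p_value = knox_monte_carlo(points, times, d_max=0.5, t_max=5)
print("close space-time pairs:", observed)
print("permutation p-value:", p_value)
```

In this toy data set every pair that is close in space is also close in time, which is the pattern the index is designed to flag. With real incident data the closeness thresholds would need to be chosen with care, and more simulations would normally be run.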
CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents, Fig. 1 Safety on Houston's Kirby Drive: 1998–2001

The third space-time routine is a specific tool for predicting the behavior of a serial offender called the Correlated Walk Analysis module. This module analyzes periodicity in the sequence of events committed by the serial offender by distance, direction, and time interval. It does this by analyzing the sequence of lagged incidents. A diagnostic correlogram allows the user to analyze
periodicity by different lags. The user can then specify one of several methods for predicting the next incident that the serial offender will commit, by location and by time interval. Error is, of course, quite sizeable with this methodology because serial offenders don't follow strict mathematical rules. But the method can be useful for police because it can indicate whether there are any repeating patterns that the offender is following.

Journey-to-Crime Analysis
A useful tool for police departments seeking to apprehend a serial offender is Journey-to-Crime analysis (sometimes known as Geographic Profiling). This is a method for estimating the likely residence location of a serial offender given the distribution of incidents and a model for travel distance (Brantingham and Brantingham 1981; Canter and Gregory 1994; Rossmo 1995; Levine 2007b). The method depends on building a typical travel distance function, either based on empirical distances traveled by known offenders or on an a priori mathematical function that approximates travel behavior (e.g., a negative exponential function, a negative exponential function with a low use buffer zone around the offender's residence).

CrimeStat has a Journey-to-Crime routine that uses the travel distance function and a Bayesian Journey-to-Crime routine that utilizes additional information about the likely origins of offenders who committed crimes in the same locations. With both types, the traditional distance-based and the Bayesian, there are both calibration and estimation routines. In the calibration routine for the Journey-to-Crime routine, the user can create an empirical travel distance function based on the records of known offenders where both the crime location and the residence location were known (typically from arrest records). This function can then be applied in estimating the likely location of a single serial offender whose residence location is not known.

The Bayesian Journey-to-Crime routine utilizes information about the origins of other offenders who committed crimes in the same locations as a single serial offender. Again, based on a large set of records of known offenders, the routine estimates the distribution of origins of these offenders. This information can then be combined with the travel distance function to make estimates of the likely location of a serial offender where the residence location is not known. Early tests of this method suggest that it is 10–15% more accurate than the traditional travel distance only method in terms of estimating the distance between the highest probability location and the location where the offender lived.

As an example, Fig. 2 shows a Bayesian probability model of the likely residence location of a serial offender who committed five incidents between 1993 and 1997 in Baltimore County, Maryland (two burglaries and three larceny thefts). The grid cell with the highest probability is outlined. The location of the incidents is indicated as is the actual residence location of the offender when arrested. As seen, the predicted highest probability location is very close to the actual location (0.14 of a mile error).

Crime Travel Demand Modeling
CrimeStat has several routines that examine travel patterns by offenders. There is a module for modeling crime travel behavior over a metropolitan area called Crime Travel Demand modeling. It is an application of travel demand modeling that is widely used in transportation planning (Ortuzar and Willumsen 2001). There are four separate stages to the model. First, predictive models of crimes occurring in a series of zones (crime destinations) and originating in a series of zones (crime origins) are estimated using a non-linear (Poisson) regression model with a correction for over-dispersion (Cameron and Trivedi 1998). Second, the predicted origins and destinations are linked to yield a model of crime trips from each origin zone to each destination zone using a gravity-type spatial interaction model. To estimate the coefficients, the calibrated model is compared with an actual distribution of crime trips.

In the third stage, the predicted crime trips are separated into different travel modes using an approximate multinomial utility function (Domencich and McFadden 1975). The aim is
to examine possible strategies used by offenders in targeting their victims. Finally, the predicted crime trips by travel mode are assigned to particular routes, either on a street network or a transit network. The cost of travel along the network can be estimated using distance, travel time, or a generalized cost using the A* shortest path algorithm (Sedgewick 2002).

Once calibrated, the model can be used to examine possible interventions or policy scenarios. For example, one study examined the travel behavior of individuals who were involved in Driving-while-Intoxicated (DWI) motor vehicle crashes in Baltimore County. Neighborhoods with a higher proportion of DWI drivers involved in crashes were identified, as were locations where many DWI crashes had occurred. Interventions in both high DWI driver neighborhoods and the high DWI crash locations were simulated using the model to estimate the likely reduction in DWI crashes that would be expected to occur if the interventions were actually implemented.

CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents, Fig. 2 Estimating the residence location of a serial offender in Baltimore County (MD)

Key Applications

CrimeStat is oriented mostly toward the law enforcement and criminal justice fields, but it has been used widely by researchers in other fields including geography, traffic safety, urban planning, sociology, and even fields like botany and forestry. The tools reflect a range of applications that criminal justice researchers and crime analysts might find useful, some describing the spatial distribution and others being targeted to particular offenders.

For example, hot spot analysis is particularly useful for police departments. Police officers, crime analysts and researchers are very familiar with the concentration of crime or other incidents that occur in small areas. Further, they are aware that many offenders live in certain neighborhoods that are particularly poor and lacking in social amenities. There is a large literature on high crime areas so that the phenomenon is very well known (e.g., see Cohen and Felson 1979; Wilson and Kelling 1982). The hot spot tools can be
useful to help police systematically identify the high crime areas as well as the areas where there are concentrations of offenders (which are not necessarily the same as the high crime locations). For example, the hot spot tools were used to identify locations with many red light running crashes in Houston as a prelude for introducing photo-enforcement. The Massachusetts State Police used the nearest neighbor hierarchical clustering algorithm to compare heroin and marijuana arrest locations with drug seizures in one small city (Bibel 2004).

Another criminal justice application is the desire to catch serial offenders, particularly high visibility ones. The Journey-to-Crime and Bayesian Journey-to-Crime routines can be useful for police departments in that they can narrow the search that police have to make to identify likely suspects. Police will routinely search through their database of known offenders; the spatial narrowing can reduce that search substantially. The CrimeStat manual has several examples of the Journey-to-Crime tool being used to identify a serial offender. As an example, the Glendale (Arizona) Police Department used the Journey-to-Crime routine to catch a felon who had committed many auto thefts (Hill 2004).

Many of the other tools are more relevant for applied researchers such as the tools for describing the overall spatial distribution or for calculating risk in incidents (police typically are interested in the volume of incidents) or for modeling the travel behavior of offenders. Two examples from the CrimeStat manual are given. First, the spatial distribution of "Man With A Gun" calls for service during Hurricane Hugo in Charlotte, North Carolina was compared with a typical weekend (LeBeau 2004). Second, the single-variable kernel density routine was used to model urbanization changes in the Amazon between 1996 and 2000 (Amaral et al. 2004).

Future Directions

Version 4 of CrimeStat is currently being developed (CrimeStat IV). The new version will have a complete restructuring to modernize it consistent with trends in computer science. First, there will be a new GUI that will be more Windows Vista-oriented. Second, the code is being revised to be consistent with the .NET framework and selected routines will be compiled as objects in a library that will be available for programmers and third-party applications. Third, additional statistics relevant for crime prediction are being developed. These include a spatial regression module using Markov Chain Monte Carlo methods and an incident detection module for identifying emerging crime hot spots early in their sequence. Version 4 is expected to be released early in 2009.

Cross-References

▶ Autocorrelation, Spatial
▶ Crime Mapping and Analysis
▶ Data Analysis, Spatial
▶ Emergency Evacuation, Dynamic Transportation Models
▶ Hotspot Detection, Prioritization, and Security
▶ Movement Patterns in Spatio-temporal Data
▶ Nearest Neighbor Problem
▶ Public Health and Spatial Modeling
▶ Routing Vehicles, Algorithms
▶ Statistical Descriptions of Spatial Patterns

References

Amaral S, Monteiro AMV, Câmara G, Quintanilha JA (2004) Evolution of the urbanization process in the Brazilian Amazonia. In: Levine N (ed) CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.0), Chapter 8. Ned Levine & Associates, Houston; National Institute of Justice, Washington, DC
Anselin L (1995) Local indicators of spatial association – LISA. Geogr Anal 27(2):93–115
Bailey TC, Gatrell AC (1995) Interactive spatial data analysis. Longman Scientific & Technical/Burnt Mill, Essex
Bibel B (2004) Arrest locations as a means for directing resources. In: Levine N (ed) CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.0), Chapter 6. Ned Levine & Associates, Houston; National Institute of Justice, Washington, DC
Block CR (1994) STAC hot spot areas: a statistical tool for law enforcement decisions. In: Proceedings of the workshop on crime analysis through computer mapping. Criminal Justice Information Authority, Chicago
Brantingham PL, Brantingham PJ (1981) Notes on the geometry of crime. In: Brantingham PJ, Brantingham PL (eds) Environmental criminology. Waveland Press, Inc., Prospect Heights, pp 27–54
Cameron AC, Trivedi PK (1998) Regression analysis of count data. Cambridge University Press, Cambridge
Canter D, Gregory A (1994) Identifying the residential location of rapists. J Forens Sci Soc 34(3):169–175
Clark PJ, Evans FC (1954) Distance to nearest neighbor as a measure of spatial relationships in populations. Ecology 35:445–453
Cohen LE, Felson M (1979) Social change and crime rate trends: a routine activity approach. Am Soc Rev 44:588–608
Cressie N (1991) Statistics for spatial data. Wiley, New York
Domencich T, McFadden DL (1975) Urban travel demand: a behavioral analysis. North-Holland Publishing Co. Reprinted 1996. Available at: http://emlab.berkeley.edu/users/mcfadden/travel.html
Ebdon D (1988) Statistics in geography, 2nd edn. (with corrections) Blackwell, Oxford
Eck J, Chainey S, Cameron J, Leitner M, Wilson RE (2005) Mapping crime: understanding hot spots. Mapping and Analysis for Public Safety/National Institute of Justice, Washington, DC
Everitt BS, Landau S, Leese M (2001) Cluster analysis, 4th edn. Oxford University Press, New York
Geary R (1954) The contiguity ratio and statistical mapping. Inc Stat 5:115–145
Harries K (1999) Mapping crime: principle and practice. NCJ 178919, National Institute of Justice/US Department of Justice, Washington, DC. Available at http://www.ncjrs.org/html/nij/mapping/pdf.html
Hill B (2004) Catching the bad guy. In: Levine N (ed) CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.0), Chapter 10. Ned Levine & Associates, Houston; National Institute of Justice, Washington, DC
LaVigne N, Wartell J (1998) Crime mapping case studies: success in the field, vol 1. Police Executive Research Forum and National Institute of Justice/US Department of Justice, Washington, DC
LeBeau JL (2004) Distance analysis: man with a gun calls for service in Charlotte, N.C., 1989. In: Levine N (ed) CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.0), Chapter 4. Ned Levine & Associates, Houston; National Institute of Justice, Washington, DC
Levine N (1996) Spatial statistics and GIS: software tools to quantify spatial patterns. J Am Plan Assoc 62(3):381–392
Levine N (2004) Risk-adjusted nearest neighbor hierarchical clustering. In: Levine N (ed) CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.0), Chapter 6. Ned Levine & Associates, Houston; National Institute of Justice, Washington, DC
Levine N (2007a) CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.1). Ned Levine & Associates, Houston; National Institute of Justice, Washington, DC
Levine N (2007b) Bayesian journey to crime estimation (update chapter). In: Levine N (ed) CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.1). Ned Levine & Associates, Houston; National Institute of Justice, Washington, DC. Available at http://www.icpsr.umich.edu/crimestat
Moran PAP (1948) The interpretation of statistical maps. J R Stat Soc B 10:243–251
Ortuzar JD, Willumsen LG (2001) Modeling transport, 3rd edn. Wiley, New York
Ripley BD (1981) Spatial statistics. Wiley, New York
Rossmo DK (1995) Overview: multivariate spatial profiles as a tool in crime investigation. In: Block CR, Dabdoub M, Fregly S (eds) Crime analysis through computer mapping. Police Executive Research Forum, Washington, DC, pp 65–97
Sedgewick R (2002) Algorithms in C++: part 5 graph algorithms, 3rd edn. Addison-Wesley, Boston
Wilson JQ, Kelling G (1982) Broken windows: the police and neighborhood safety. Atl Mon 29(3):29–38

Cross-Covariance Models

▶ Hurricane Wind Fields, Multivariate Modeling

CSCW

▶ Geocollaboration

Cuda/GPU

Cheng-Zhi Qin
State Key Laboratory of Resources & Environmental Information System, Institute of Geographic Sciences & Natural Resources Research, Chinese Academy of Sciences, Beijing, P.R. China

Synonyms

General-purpose computing on graphics processing units (GPGPUs)

Definition

A graphics processing unit (GPU) is an electronic circuit originally designed to accelerate real-time computation for computer graphics. As one component of the basic hardware inside a modern personal computer, the GPU is connected to the central processing unit (CPU) through a system bus. For the purpose of fast image rendering, which requires that the whole process of image rendering should be completed within one frame (typically 1/30 s), the GPU has been inherently designed as a highly parallelized processor containing many cores, high memory bandwidth, and single-instruction multiple-data (SIMD) execution (Lindholm et al. 2008; Garland and Kirk 2010).

In recent years, the high performance of modern GPUs has motivated researchers to explore general-purpose computing on GPUs (GPGPUs). This has resulted in GPUs taking over the computational tasks traditionally performed by CPUs, especially compute-intensive, data-parallel tasks (Owens et al. 2007). The successful and wide use of GPGPUs is due to several attractive features of GPUs that have evolved rapidly in recent years. First, modern GPUs permit data to be bidirectionally transferred between GPU and CPU, where previously data could only be transferred from the CPU to the GPU. Second, state-of-the-art GPUs can achieve one to several orders of magnitude higher floating-point operations per second (FLOP(s)) than CPUs. Third, the programmability of modern GPUs allows programmers to develop GPU-available algorithms with the aid of GPU programming models, such as the compute unified device architecture (CUDA) for NVIDIA Corporation's GPUs, platform-independent OpenCL, and so on. Furthermore, their comparatively low cost and good cost-performance ratios make GPGPUs popular in current parallel computation.

Scientific computing benefited from the use of GPGPUs, in which matrix operation was one of the first successful cases. During the first 10 years of the twenty-first century, developers explored the significant acceleration performance of GPUs in a wide range of application domains, such as physical simulation, computational chemistry, medical image processing, as well as geocomputation.

Cuda/GPU, Fig. 1 Evolution of GPUs

Historical Background

The GPU was first proposed by NVIDIA through its release of GeForce 256 in 1999 (Fig. 1). A GPU is also referred to as a visual processing unit (VPU) by ATI Technologies, e.g., with the release of its Radeon 9700 in 2002. Although there are many GPU producers, most GPUs are produced by NVIDIA and ATI (ATI was acquired by AMD Inc. in 2006). Driven by demand for 3D graphics from the game industry, GPU producers
have continually improved their performance and capacities.

In the early 2000s, both NVIDIA and ATI added programmable capacity and floating-point support to GPUs. These improvements made it possible to off-load non-graphical calculations from CPUs to GPUs. One of the first attempts to use GPUs in scientific computing was the matrix multiplication function developed in 2001 (Larsen and McAllister 2001). This new trend was represented by the term GPGPU, which was proposed by Dr. Mark Harris in 2002 (See http://GPGPU.org/about. Accessed on 8 Jan 2015.).

Without easy-to-use GPGPU programming tools, the use of GPGPUs would not be practical, nor would it have received such widespread acceptance. Although graphics application programming interfaces (APIs) such as OpenGL and DirectX were released in the 1990s to aid in the development of graphics applications, these graphics APIs were inconvenient for developing non-graphical applications. Instead, CUDA, which was first released by NVIDIA in 2007, brought GPGPU widespread popularity. CUDA is a C-language extension for general-purpose programming used exclusively for recent NVIDIA GPUs (NVIDIA Corp. 2012). Other main programming models for GPGPU include DirectCompute from Microsoft, which is specific to newer Windows operating systems, and OpenCL, which was designed by Apple Inc. and is maintained by Khronos Group. Since first being released in 2009, OpenCL has been an industry standard for GPGPUs because it not only provides capabilities similar to CUDA but also has programming portability across GPUs, multicore processors, and operating systems (Stone et al. 2010; Munshi 2012).

With the many applications of GPGPU, GPU producers continually enhance their computational performance. The computing capacity of GPUs has been doubled every 12–18 months and is several times higher than that of contemporary CPUs (Lindholm et al. 2008) (Fig. 2). NVIDIA's GPUs with Fermi

Cuda/GPU, Fig. 2 Comparison of computational capacity (unit: 10 billion FLOP(s) or GFLOP(s)) between NVIDIA GPU and Intel CPU (Adapted from NVIDIA Corp. 2012)
X

XML

▶ deegree Free Software
▶ Extensible Markup Language

XML Based Vector Graphics

▶ Scalable Vector Graphics (SVG)

XML Triple

▶ Knowledge Representation, Spatial

© Springer International Publishing AG 2017


S. Shekhar et al. (eds.), Encyclopedia of GIS,
DOI 10.1007/978-3-319-17885-1
Z

Z/I Imaging

▶ Intergraph: Real-Time Operational Geospatial Applications

Zernike

▶ Biomedical Data Mining, Spatial

Zernike Polynomials

▶ Biomedical Data Mining, Spatial
