Encyclopedia of GIS
Second Edition

Editors
Shashi Shekhar, University of Minnesota, Minneapolis, MN, USA
Hui Xiong, Management Science and Information Systems Department, Rutgers Business School, Rutgers, The State University of New Jersey, Newark, NJ, USA
Xun Zhou, Department of Management Sciences, Tippie College of Business, University of Iowa, Iowa City, IA, USA
Foreword
Geographic information systems date from the 1960s, when computers were
mostly seen as devices for massive computation. Very significant technical
problems had to be solved in those early days: how did one convert the
contents of a paper map to digital form (by building an optical scanner from
scratch); how did one store the result on magnetic tape (in the form of a
linear sequence of records representing the geometry of each boundary line as
sequences of vertices); and how did one compute the areas of patches (using
an elegant algorithm involving trapezia). Most of the early research was about
algorithms, data structures, and indexing schemes and, thus, had strong links
to emerging research agendas in computer science.
Over the years, however, the research agenda of GIS expanded away
from computer science. Many of the technical problems of computation were
solved, and attention shifted to issues of data quality and uncertainty, the
cognitive principles of user interface design, the costs and benefits of GIS, and
the social impacts of the technology. Academic computer scientists interested
in GIS wondered if their research would be regarded by their colleagues as
peripheral – a marginally interesting application – threatening their chances
of getting tenure. Repeated efforts were made to have GIS recognized as
an ACM Special Interest Group, without success, though the ACM GIS
conferences continue to attract excellent research.
The entries in this encyclopedia should finally lay any lingering doubts to
rest about the central role of computer science in GIS. Some research areas,
such as spatiotemporal databases, have continued to grow in importance be-
cause of the fundamental problems of computer science that they address and
are the subject of several respected conference series. Geospatial data mining
has attracted significant attention from computer scientists as well as spatial
statisticians, and it is clear that the acquisition, storage, manipulation, and
visualization of geospatial data are special, requiring substantially different
approaches and assumptions from those in other domains.
At the same time, GIS has grown to become a very significant application
of computing. Sometime around 1995, the earlier view of GIS as an assistant,
performing tasks that the user found too difficult, complex, tedious, or
expensive to do by hand, was replaced by one in which GIS became the
means by which humans communicate what they know about the surface of
the Earth, with which they collectively make decisions about the management
of land, and by which they explore the effects of alternative plans. A host
of new issues suddenly became important: how to support processes of
Preface
It has been over 7 years since the publication of the first edition of the Ency-
clopedia of GIS. During this period of time, we have witnessed numerous
significant advances in mobile technology and disruptive development in
business that are transforming the world: the widespread use of smartphones,
the increasing popularity of mobile apps, the wide deployment of location-
based services (LBSs), the fast-growing taxi-hailing services like Uber, the
evolution of mobile social networks, and, more recently, the global interest
in big data, unmanned aerial vehicles, and self-driving vehicles to improve
people’s lives. While various disciplines have been contributing to these new
advances, spatial computing and GIS techniques no doubt are playing a key
role here. For instance, localization is a fundamental issue for smartphones,
connected and self-driving vehicles, unmanned aerial vehicles, taxi-hailing
services, etc. Location information and location privacy are the essentials of
LBSs. Check-in recommendation is a key function of mobile social networks.
The study of spatial big data, such as Global Positioning System (GPS) traces
of vehicles and global climate data, helps people better understand human
mobility patterns as well as Earth climate change. Consequently, an influential
2011 report on big data from McKinsey included a chapter on location-based
big data.
To acknowledge this growth, the Association for Computing Machinery
(ACM) formed a special interest group, namely, SIGSPATIAL, and its annual
meeting attracts over 300 attendees. In addition, the Computing Research
Association’s Computing Community Consortium organized a multi-sector
multidisciplinary workshop titled “From GPS and Virtual Globes to Spatial
Computing 2020” at the National Academies in 2012 to assess the state of
the art and catalyze new research visions. A summary of the workshop
report appeared in the Communications of the ACM in January 2016 as the
cover article titled “Spatial Computing.” In summary, experts in GIS-related
fields and researchers from other disciplines have shown strong interests in
understanding these new spatial technologies and developments. Therefore,
we believe it is time to develop the second edition of the encyclopedia
and include entries on the new emerging topics.
The second edition of the Encyclopedia of GIS also provides us with an
opportunity to enhance the topic coverage and content timeliness of the first
edition. While over 200 entries across 50 different fields were included in
the first edition, there are still a few important topics left out, such as basic
concepts in GIS and GPS. As suggested by GIS colleagues, we have included
some of these topics in the second edition. Moreover, some existing fields of
the first edition have been updated to reflect new research advances, either
by adding new entries or by revising existing ones.
The second edition inherited all the key features from the previous edition.
Typical entries are 3000 words with sections such as definition, scientific
fundamentals, application domains, and future trends. Regular entries include
key citations and a list of recommended reading materials regarding the
literature. The encyclopedia is also simultaneously available as an HTML
online reference with hyperlinked citations, cross-references, four-color art,
links to web-based maps, and other interactive features.
It is worth noting that the first edition of the Encyclopedia of GIS has been
well received by a broad audience in industry and academia. It is available
at thousands of libraries worldwide as well as on third-party websites such
as Google Books. By March 2016, the cumulative downloads via Springer
had exceeded 133,000, not counting additional downloads via other
websites such as Google Books. Furthermore, it has received numerous
recognitions such as the CHOICE Outstanding Title Award. At the University
of Minnesota, the encyclopedia has been used as teaching materials in
two graduate-level spatial computing and spatial database research courses.
Its articles were used in the Fall 2014 Coursera massive open online
course titled “From GPS and Google Maps to Spatial Computing,” with
over 21,800 students from 182 countries. We hope that the second edition
will continue to serve the research community and the general public as a
helpful introduction to GIS, a comprehensive research reference, and an
illustrative GIS textbook.
The second edition includes 25 additional fields that were either
absent from the first edition or have recently emerged as new research topics. Each
field has typically 3–10 articles. These fields include spatial computing infras-
tructure, spatial cognitive assistance, volunteered geographic information
(VGI), GPS-denied environment, statistically significant spatiotemporal pat-
tern mining, mobile economy, mobile recommender systems, spatial network
routing, spatial optimization, web-based GIS (industry perspective), location-
based recommendation systems, linear anomaly window detection, intelligent
transportation, GPU-based spatial computing, spatiotemporal analysis of
climate data, geospatial weather and climate nexus, spatial statistics, concepts
in spatial statistics, data science for GIS applications, 3D modeling and
analysis, geometric nearest-neighbor queries, modeling of spatial relations,
concepts in statistics for spatial and spatiotemporal data, high-performance
computing in GIS, and trends. Furthermore, there are two fields, road network
databases and constraint databases and data mining, which have been updated
by the original editors with new concepts added or existing articles revised to
accommodate more recent research results and technical advances.
August 2016 Shashi Shekhar
Hui Xiong
Xun Zhou
3D Crisp Clustering of Geo-Urban Data
Crisp clustering is a technique that clusters objects into groups without overlapping partitions: each data point either belongs to a group or it does not. Most clustering algorithms are categorized as crisp clustering.

The crisp clustering algorithm has been used ubiquitously in many fields and areas such as web mining, spatial data analysis, business, and prediction based on groups. In the past few years, a number of algorithms have been invented and proposed for various applications. These algorithms can be grouped into the following categories.

Partitional Algorithms
• k-means
k-means is the most widely used crisp clustering algorithm in applications such as machine learning, statistical analysis, and computer visualization. k-means was introduced by MacQueen in 1967 to deal with the problem of data clustering (MacQueen 1967). The aim of this clustering technique is to optimize the objective function

$E = \sum_{i=1}^{c} \sum_{x \in C_i} d(x, m_i)$   (1)

where $m_i$ is the cluster center of $C_i$ and $d$ is the distance from point $x$ to $m_i$. The criterion $E$ minimizes the distance between each point and its cluster center. A set of $c$ cluster centers is chosen at the initial step; each object is then assigned to the nearest cluster center, the centers are recomputed, and the process continues until the cluster centers stop changing. (A minimal code sketch of this iteration is given after this list.)
• PAM (Partitioning Around Medoids)
This algorithm attempts to find a medoid for each cluster. It starts by searching for the nearest objects located in the cluster. PAM first computes $k$ representative objects, the medoids; a medoid is an object with a very minimal average dissimilarity to the other objects. After the medoids are found, each object is grouped with the nearest medoid: object $i$ is assigned to cluster $p_i$ when medoid $m_{p_i}$ is nearer than any other medoid,

$d(i, m_{p_i}) \le d(i, m_j) \quad \text{for all } j = 1, \ldots, k$   (2)

The $k$ medoids are expected to minimize the objective function of PAM, which is

$\sum_i d(i, m_{p_i})$   (3)

According to Ng and Han (1994), PAM is an expensive algorithm for finding medoids, because it keeps exchanging medoids with other objects until all of the objects meet the requirement of a medoid.

• CLARA (Clustering Large Applications)
CLARA uses PAM as part of its technique. From a dataset, it draws multiple samples and applies PAM to each sample.

• CLARANS (Clustering Large Applications based on Randomized Search)
Combining its technique with PAM, CLARANS starts the process by searching a graph in which each node is a potential solution, i.e., a set of $k$ medoids. Medoids are replaced after this process and clusters are produced; the produced clusters are neighboring clusters of the existing clustering. A node is selected and compared with a user-defined number of neighbors. When a better candidate is found, CLARANS moves to that neighboring node and restarts the process; if not, a local optimum has been found, and a node is selected randomly to search for a new local optimum.
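The sketch below is a minimal Python illustration, not a reference implementation: it shows the crisp k-means iteration behind Eq. (1) and the medoid assignment and cost of Eqs. (2) and (3). All function and variable names are illustrative.

```python
import numpy as np

def kmeans_crisp(points, c, n_iter=100, seed=0):
    """Crisp k-means: every point is assigned to exactly one cluster (Eq. 1)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=c, replace=False)]
    for _ in range(n_iter):
        # Hard (crisp) assignment to the nearest center.
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each center as the mean of its members.
        new_centers = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(c)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped changing
        centers = new_centers
    return labels, centers

def assign_to_medoids(points, medoid_idx):
    """Nearest-medoid assignment (Eq. 2) and the PAM cost of Eq. (3)."""
    medoids = points[medoid_idx]
    dist = np.linalg.norm(points[:, None, :] - medoids[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    cost = dist[np.arange(len(points)), labels].sum()
    return labels, cost

# Example: 200 random 3D points, three crisp clusters, and a PAM-style cost.
pts = np.random.default_rng(1).random((200, 3))
km_labels, km_centers = kmeans_crisp(pts, c=3)
pam_labels, pam_cost = assign_to_medoids(pts, medoid_idx=np.array([0, 50, 100]))
# PAM itself would now try swapping medoids with non-medoids and keep any swap
# that lowers this cost, which is what makes the search expensive (Ng and Han 1994).
```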
Hierarchical Algorithms
A hierarchical algorithm is a method that produces a hierarchy of clusterings. Applications of these clustering approaches can be found in fields such as modern biology, biological taxonomy, and computer science and engineering. According to Theodoridis and Koutroumbas (2009), hierarchical algorithms can be divided into two categories:

• Agglomerative algorithms: These algorithms produce a decreasing number of clusters in each step. The two nearest clusters are merged to produce a sequence of clustering schemes.
• Divisive algorithms: Contrary to the agglomerative algorithms, these algorithms produce an increasing number of clusters in each step. A group is split into two clusters to produce a sequence of clustering schemes.

Some hierarchical-based algorithms can be described as follows:

• BIRCH
BIRCH by Zhang et al. (1996) uses a CF-tree as a hierarchical structure to partition a point dataset. BIRCH is also the first algorithm of this kind that can handle noise efficiently.
• CURE
CURE by Guha et al. (1998) selects points from a dataset and then pulls them toward the cluster center. To cater to large-volume applications such as large databases, CURE combines random sampling with partition clustering.

Density-Based Algorithms
This type of algorithm considers a cluster as a region in n-dimensional space. Most of these algorithms do not enforce any restriction on the produced result and are able to handle outliers. The time complexity is O(N^2), which is suitable for large data processing.

Grid-Based Algorithms
• STING (Statistical Information Grid-based method)
STING was proposed by Wang et al. (1997). It divides the space or region into several rectangular cells based on a hierarchical structure. Statistical parameters (e.g., min, max, mean) are used to calculate a numerical feature of the objects in each cell, and clustering information is then represented on the hierarchical structure of the grid cells. This clustering approach offers efficient search queries.
• WaveCluster
WaveCluster was introduced by Sheikholeslami et al. (2000). The algorithm originates from signal processing and the frequency domain. The process starts by imposing a multidimensional grid structure onto the space; the information represented by the grid cells is then transformed using a wavelet transformation. To find the clusters, dense regions in the transformed domain need to be identified.
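To make the grid-based idea concrete, the short sketch below bins points into rectangular cells and keeps simple statistical parameters per cell (count, min, max, mean), in the spirit of STING but without its hierarchical levels; the function name and parameters are illustrative.

```python
import numpy as np

def grid_cell_statistics(points, cell_size=0.25):
    """Summarize points per rectangular grid cell (count, min, max, mean)."""
    cell_ids = np.floor(points / cell_size).astype(int)
    cells = {}
    for cid, p in zip(map(tuple, cell_ids), points):
        cells.setdefault(cid, []).append(p)
    summary = {}
    for cid, members in cells.items():
        arr = np.asarray(members)
        summary[cid] = {
            "count": len(arr),
            "min": arr.min(axis=0),
            "max": arr.max(axis=0),
            "mean": arr.mean(axis=0),
        }
    return summary

# Dense cells (high count) indicate candidate clusters, as in grid-based methods.
stats = grid_cell_statistics(np.random.default_rng(3).random((500, 2)))
```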
Massive 3D geospatial datasets are very complex to organize in a database system. Thus, a data model is used as a guideline to manage all these data. By using a data model, geospatial data are transformed into a set of rows and records in the database. This dataset is then retrieved, processed, and analyzed to turn it into valuable information. However, due to the large volume of geospatial data, the performance of data retrieval easily deteriorates during query operations, because each row and record in the database has to be inspected and examined. In some applications, the performance of data retrieval is critical. For example, in a business service application, retrieving customer information at a specific time is important for an efficient delivery service, and for a service-based business punctuality is very important for the company's reputation. Fast data retrieval is also important for emergency response facilities such as hospitals and fire stations, where time management matters because every second is meaningful.

Since time is very important for data retrieval, a specific technique is required to boost performance during query operations. In a spatial database, a spatial access method is used to support efficient spatial selection, especially for range queries, map overlay, spatial analysis, and spatial joins. Without spatial indexing, full table scans need to be performed in order to evaluate a spatial selection criterion. Therefore, spatial indexing is required to address objects efficiently without examining every row and record. In spatial databases, the development of 2D spatial indexing is well established compared to its 3D counterpart. 2D spatial index structures are not the best-fit solution for 3D geospatial data, since the data types and the relationships between objects are defined differently than in 2D. Until now, a well-established index structure for 3D spatial information has remained an open research problem. Thus, a dedicated index structure for 3D geospatial information is significant for efficient data retrieval.

The effort of developing 3D spatial indexing can be seen in several researches and studies; see Wang and Guo (2012), Gong et al. (2009), Zhu et al. (2007), Deren et al. (2004), and Zlatanova (2000). Based on those studies and reviews, most researchers agree that the transition of the 2D R-tree structure to a 3D R-tree would be a starting point toward a promising 3D spatial index structure. The R-tree index structure was invented by Guttman (1984). It is a simple data structure that bounds objects with minimum bounding rectangles (MBRs). The structure of the 3D R-tree and the original R-tree is not much different even after the transition of dimensionality. However, when the R-tree is extended to the third dimension (3D R-tree), the minimum bounding volumes (MBVs) of the nodes frequently overlap; in certain cases, the MBV of a node may even be covered by another MBV. Overlapping nodes are the main reason for the low efficiency of query performance, because they lead to multipath queries and replicated data entries.

In several urban applications, such as real-time applications, geospatial data or urban objects are frequently updated. Thus, rows or records in the database are modified through data updating processes such as insert, delete, and update. This process affects the index structure of the 3D R-tree. In certain cases, nodes in the tree structure overflow with M + 1 entries or underflow with fewer than the minimum number of entries, m. In these cases, nodes may need to be merged with other nodes or split using a splitting operation. The splitting operation is the most critical process for the R-tree index structure (Fu et al. 2002; Liu et al. 2009; Korotkov 2012; Sleit and Al-Nsour 2014). At this phase the tree structure is altered and, at the same time, it should produce minimal node overlap, minimal coverage area, and minimal tree height. These issues become critical in 3D: the minimization of the overlapping coverage of MBVs is more complex, and the splitting operation requires a different approach than in 2D.

Crisp clustering considers non-overlapping partitions in its approach; each object either belongs to a class or it does not. This characteristic suits the aim of the R-tree (Guttman 1984) structure, in which an object appears only once in an index node. The idea is therefore to cluster 3D geospatial data into classes, where each class represents a node, or MBV, of the 3D R-tree.
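The mapping from crisp classes to R-tree nodes can be illustrated as follows: derive one axis-aligned minimum bounding volume per class and measure how much the resulting MBVs overlap, the quantity this entry uses to judge index quality. This is a sketch with synthetic data, not the entry's actual construction procedure, and all names are illustrative.

```python
import numpy as np

def mbv(points):
    """Axis-aligned minimum bounding volume (box) of a set of 3D points."""
    return points.min(axis=0), points.max(axis=0)

def overlap_volume(box_a, box_b):
    """Volume of the intersection of two axis-aligned boxes (0 if disjoint)."""
    lo = np.maximum(box_a[0], box_b[0])
    hi = np.minimum(box_a[1], box_b[1])
    return float(np.prod(np.clip(hi - lo, 0.0, None)))

# One MBV per crisp class; the total pairwise overlap volume indicates how much
# a query may have to follow multiple paths in the resulting 3D R-tree.
rng = np.random.default_rng(4)
pts = rng.random((300, 3))
labels = rng.integers(0, 8, size=len(pts))          # stand-in for crisp class labels
boxes = [mbv(pts[labels == c]) for c in range(8) if np.any(labels == c)]
total_overlap = sum(
    overlap_volume(boxes[i], boxes[j])
    for i in range(len(boxes)) for j in range(i + 1, len(boxes))
)
```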
3D Crisp Clustering of Geo-Urban Data, Fig. 1: 3D R-tree workflow
3D Crisp Clustering of Geo-Urban Data, Fig. 2: Clustered objects using crisp clustering (first cycle: number of clusters = 8, maximum entry = 25; nodes selected for the second cycle: R, V, and W)
3D Crisp Clustering of Geo-Urban Data, Fig. 3: Comparison of overlap percentage and coverage percentage for the original R-tree, k-means clustering, and the proposed crisp clustering
Thus, MBVs R, V, and W qualify for the next cycle. In the second cycle, each of these nodes is split into two subgroups: R (Sub-R1 and Sub-R2), V (Sub-V1 and Sub-V2), and W (Sub-W1 and Sub-W2).

To evaluate the efficiency of the proposed approach in constructing an efficient 3D R-tree structure, a set of 3D vector data is tested in this experiment. The datasets are 3D volumetric objects (i.e., 3D buildings). For the first experiment, a set of 500 buildings in an urban area, as represented in Fig. 3, is clustered based on the proposed approach. The input data of these 3D buildings are based on LoD 2 (Level of Detail 2) of the CityGML format. The number of cluster classes for this dataset is set to 20, with a maximum entry M of 25 for each class. The classes are then formed into MBVs of the 3D R-tree. The result of this experiment is compared with the original R-tree and the original k-means crisp clustering. Figure 3 shows the comparison of the overlap percentage and coverage percentage of the proposed approach with the other approaches.
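In this experiment each class is bounded by the maximum entry M (here 25) before it becomes an MBV of the 3D R-tree. The published approach is not spelled out line by line in this entry, so the following is only a hedged sketch of one simple way to enforce such a cap, splitting any over-full class along its longest axis; the names and the synthetic data are illustrative.

```python
import numpy as np

def cap_class_size(points, labels, max_entry=25):
    """Split any class with more than max_entry members along its longest axis."""
    labels = labels.copy()
    next_label = labels.max() + 1
    changed = True
    while changed:
        changed = False
        for c in np.unique(labels):
            idx = np.where(labels == c)[0]
            if len(idx) <= max_entry:
                continue
            members = points[idx]
            axis = int(np.argmax(members.max(axis=0) - members.min(axis=0)))
            cut = np.median(members[:, axis])
            upper = idx[members[:, axis] > cut]
            if len(upper) == 0 or len(upper) == len(idx):
                upper = idx[max_entry:]      # degenerate case: split by position
            labels[upper] = next_label       # members above the cut form a new class
            next_label += 1
            changed = True
    return labels

# Example: 500 synthetic building centroids, 20 initial classes, max entry M = 25.
rng = np.random.default_rng(5)
centroids = rng.random((500, 3))
initial = rng.integers(0, 20, size=500)
capped = cap_class_size(centroids, initial, max_entry=25)
```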
3D Crisp Clustering of Geo-Urban Data, Fig. 4: Percentage of overlap using different splitting approaches (exhaustive R-tree, new linear, and crisp clustering)
The same dataset as in Fig. 3 is also tested with the node splitting operation. As mentioned in the previous section, the splitting operation of a 3D R-tree should preserve minimal overlap among nodes, minimal coverage area, as well as minimal tree height. In this test, three different splitting approaches are compared: new linear (Ang and Tan 1997), exhaustive R-tree (Guttman 1984), and crisp clustering. As shown in Fig. 4, the percentage of total overlap between nodes indicates that crisp clustering offers the lowest value, 20%, while the original exhaustive R-tree reaches 97% and new linear 88%.

Key Applications

Collision Detection
Collision detection is important in many computer graphics and visualization applications. Usually, a classical hierarchical traversal scheme is used for collision detection. However, problems arise when utilizing this approach, such as visiting nodes more than once and having to transform nodes into a local coordinate system (Figueiredo et al. 2010). As a consequence, the performance of the collision detection process deteriorates. By using a 3D R-tree based on the crisp clustering approach, collision detection can be performed very efficiently without visiting nodes repeatedly.

Real-Time Application
Real-time applications such as in-vehicle satellite navigation or web-based systems are exposed to active data updating operations, such as updated coordinate information and information on the number of online users. To retrieve a set of data within a specific time, a performance booster such as 3D R-tree spatial indexing can be used for these applications. A frequent data updating process needs an efficient index structure with minimal overlap.
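Related to the collision detection application above, the broad-phase idea can be sketched as follows: before any expensive geometric test, candidate pairs are filtered with a cheap axis-aligned box intersection test, which is exactly where low node overlap in the index pays off. This is only an illustrative sketch; the objects and boxes are invented.

```python
def boxes_intersect(box_a, box_b):
    """True if two axis-aligned 3D boxes ((min, max) corner pairs) overlap."""
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    return all(a_min[i] <= b_max[i] and b_min[i] <= a_max[i] for i in range(3))

# Broad phase: only pairs whose bounding volumes intersect are passed on to the
# exact (and expensive) collision test.
objects = {
    "car":      ((0.0, 0.0, 0.0), (2.0, 1.0, 1.5)),
    "barrier":  ((1.5, 0.5, 0.0), (3.0, 1.5, 1.2)),
    "building": ((10.0, 10.0, 0.0), (20.0, 20.0, 30.0)),
}
candidates = [
    (a, b)
    for i, a in enumerate(objects)
    for b in list(objects)[i + 1:]
    if boxes_intersect(objects[a], objects[b])
]
# candidates -> [('car', 'barrier')]
```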
Point Cloud Data Management
Dealing with millions of point cloud records collected from airborne sensors or terrestrial laser scanners often creates problems in data management and visualization. In this case, spatial indexing is used to retrieve points efficiently from a huge and massive dataset. One of the most popular spatial indexing techniques for this application is the R-tree index structure. However, the R-tree suffers from serious overlap among nodes, which can cause multipath queries and deteriorate the performance of data retrieval. By using the crisp clustering algorithm, the risk of multipath queries can be reduced and the efficiency of search and query operations over a massive point cloud collection increased.

Future Directions
Based on our observation, the 3D R-tree has its own limitations during data updating operations. Whenever an updating process occurs, such as an insert or delete operation, the tree structure needs to be revised and all nodes, including the root node, need to be modified. This cost may be significant for frequently updated applications or moving objects. Avoiding such revisions could also reduce processing time and improve performance. Thus, a special technique for handling data updates in an R-tree without revising its structure would be a very interesting topic for future directions of this study.

Cross-References
Access Method
R-tree
Spatial Indexing

References
Ang CH, Tan TC (1997) New linear node splitting algorithm for R-trees. In: Scholl M, Voisard A (eds) Advances in spatial databases. Lecture notes in computer science, vol 1262. Springer, Berlin/Heidelberg, pp 337–349. doi:10.1007/3-540-63238-7_38
Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, New Orleans. Society for Industrial and Applied Mathematics, pp 1027–1035
Deren L, Qing Z, Qiang L, Peng X (2004) From 2D and 3D GIS for CyberCity. Geo-Spat Inf Sci 7(1):1–5. doi:10.1007/bf02826668
Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. Paper presented at the proceedings of the 2nd international conference on knowledge discovery and data mining, Portland
Figueiredo M, Oliveira J, Araújo B, Pereira J (2010) An efficient collision detection algorithm for point cloud models. In: 20th international conference on computer graphics and vision, Warsaw. Citeseer, p 44
Fu Y, Teng J-C, Subramanya S (2002) Node splitting algorithms in tree-structured high-dimensional indexes for similarity search. In: Proceedings of the 2002 ACM symposium on applied computing, Madrid. ACM, pp 766–770
Gong J, Ke S, Li X, Qi S (2009) A hybrid 3D spatial access method based on quadtrees and R-trees for globe data. 74920R–74920R. doi:10.1117/12.837594
Guha S, Rastogi R, Shim K (1998) CURE: an efficient clustering algorithm for large databases. SIGMOD Rec 27(2):73–84. doi:10.1145/276305.276312
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. SIGMOD Rec 14(2):47–57. doi:10.1145/971697.602266
Hinneburg A, Keim DA (1998) An efficient approach to clustering in large multimedia databases with noise. Paper presented at the proceedings of the 4th ACM SIGKDD, New York
Korotkov A (2012) A new double sorting-based node splitting algorithm for R-tree. Programm Comput Softw 38(3):109–118
Kovács F, Legány C, Babos A (2005) Cluster validity measurement techniques. In: Proceedings of the sixth international symposium of Hungarian researchers on computational intelligence (CINTI), Barcelona. Citeseer
Liu Y, Fang J, Han C (2009) A new R-tree node splitting algorithm using MBR partition policy. In: 2009 17th international conference on geoinformatics, Fairfax. IEEE, pp 1–6
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Berkeley, p 14
Ng RT, Han J (1994) Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 20th VLDB conference, Santiago
Sheikholeslami G, Chatterjee S, Zhang A (2000) WaveCluster: a wavelet-based clustering approach for spatial data in very large databases. VLDB J 8(3–4):289–304. doi:10.1007/s007780050009
Sleit A, Al-Nsour E (2014) Corner-based splitting: an improved node splitting algorithm for R-tree. J Inf Sci. doi:10.1177/0165551513516709
3-D GIS
Validation of Three-Dimensional Geometries

3D Indoor Models and Their Applications

Historical Background
Indoor mapping and modeling has received an increased level of attention during the last decade (Worboys 2011; Zlatanova et al. 2014). Indoor
space differs from outdoor space in many aspects: the space is smaller and closed; there are many constraints such as walls, doors, stairs, and furniture; the structure is multilayered, frequently containing intermediate and irregular spaces; the lighting is largely artificial; and so forth (Figs. 1 and 2). To be able to represent indoor spaces in a proper manner, many data acquisition concepts, data models, and ISO/OGC standards have to be defined or redefined to meet the requirements of indoor spatial applications (Figs. 3 and 4).

Indoor models representing 3D information can be generated by using various manual, semiautomatic, and automatic methods. The global research trends are focused on finding methods for automatic generation. Many of them are based on model transformation, such as the generation of application-specific indoor models from general digital indoor models such as Industry Foundation Classes (IFC) or CityGML LOD4. Although this is a valuable approach, it is often insufficient: the existing models might be outdated, incomplete, or even nonexistent. In such cases, new measurements are required using a range of sensors and processing techniques. The processed raw data are then organized into 3D geometry representations such as 3D vector (B-reps, CSG, BIM) and 3D raster (or dense colored point clouds). Some of these representations have semantics and topology. The tendency is to identify semantics and topology at a very early stage of data processing, to avoid post-processing and the so-called semantic enrichment of geometric models (Billen et al. 2014). Azri et al. (2012) have identified several possible approaches for the automatic generation of 3D indoor models (Table 1).

3D Indoor Models and Their Applications, Fig. 1: Example of obstacles (left) and intermediate floors (right)

3D Indoor Models and Their Applications, Fig. 2: Examples of "rooms inside rooms" (left) and complex layered structures (right)
3D Indoor Models and Their Applications, Fig. 4: Conceptual difference between IFC (left) and CityGML LOD4 (right) for modeling interiors (Courtesy Filippo Mortari)
Computer-aided design (CAD) and, lately, architecture, engineering, and construction (AEC) are the oldest domains offering 3D tools for the representation of indoors. CAD was primarily developed for engineers responsible for designing and building facilities (Azri et al. 2012). It is easy to compute and design with CAD tools due to their friendly environment and dynamic interaction. CAD tools, which dealt with large-scale and detailed models, did not focus on the maintenance of attributes and lack support for geodetic reference systems. Although CAD models offer convenience in representing indoor information, several drawbacks of CAD models have been revealed. For instance, CAD is only a platform to design and model geometries; thus, information such as attributes and topology can only be tagged externally during the design process.
3D Indoor Models and Their Applications, Table 1 Approaches and methodologies of automatic indoor model generation

Generation approach | Method(s) to be utilized | Enriching semantics | Enriching geometry
Document analysis | Text analysis; speech analysis; video analysis | Documents, recordings | Documents, recordings
Data fusion | Data processing; model integration | ID tags, CAD files, documents | CAD/GIS files, point clouds, videos/images
Model transformation | Transformation | BIM, city models | BIM, city models
User-based | SLAM | GUI/software | GUI/software
Some new extensions of CAD/AEC (Bentley Systems, Autodesk products) do allow the maintenance of topology and semantics, but in a quite vendor-dependent way. Therefore, the topology and semantics are lost when the model is exported to another software tool. If the information attached to the model is not transferred together with the model, users can only interpret what they see in the model. In addition, if the building model was developed with a low level of detail, there may not be much geometric and semantic information that can be extracted and used.

The building information model (BIM) is the next stage in the digital representation of building interiors and facilities. BIMs can be used to model building information in 3D with the support of an intelligent database that contains information for design decision making, production of accurate construction documents, prediction of performance factors, cost estimating, design scenario planning, and construction planning. BIM is an object-oriented, semantically rich model. The spatial relationships between building elements are maintained in a hierarchical manner, and many geometric primitives are supported, ranging from simple B-reps to free-form curves and surfaces. Today, the most prominent BIM standard is the Industry Foundation Classes (IFC).

3D indoor models are investigated by researchers in the GIS domain as well. Digital city models have become widely used for the digital representation of major cities. With the advent of 3D city models such as those in Google Earth, CityGML, and others, indoor modeling became a priority topic of research in the GIS community. Today, CityGML is the best-known model for 3D indoor modeling. CityGML was developed for representing 3D city geometry, (a kind of) topology, and thematic-semantic modeling. CityGML can be used to represent buildings, building parts, and their properties at different levels of detail (LOD), from LOD0 up to LOD4. CityGML LOD4 provides a semantic-thematic model for representing indoors. The indoor objects are far fewer than the objects that can be represented in IFC; however, this simplicity seems quite sufficient for a large group of outdoor and indoor applications (Billen et al. 2014).

Which of the two most prominent standards will be used for 3D indoor modeling depends very much on the application. The CAD/BIM domain has traditionally dealt with very large-scale representations, while GIS has dealt with very small scales (up to kilometers). In the last decade a fusion and overlap between the two domains has been observed (Fig. 3). However, there are fundamental differences between the two models related to the conceptual definition of indoor objects: IFC objects are defined from the view of the constructor and CityGML LOD4 objects from the view of the user (Fig. 3). IFC is very appropriate for maintaining information about the construction parts of a building, such as concrete walls, slabs, and columns. CityGML focuses on modeling the visible environment, such as the surfaces of walls as part of one room or as part of the façade of a building. This poses numerous challenges to the transformations between the two models (Fig. 4).
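As a toy illustration of this conceptual difference, the sketch below represents the same interior wall twice: once as a single construction element (the IFC-like, constructor-oriented view) and once as the visible surfaces bounding two adjacent rooms (the CityGML LOD4-like, user-oriented view). The class and attribute names are invented for illustration and do not follow the actual IFC or CityGML schemas.

```python
from dataclasses import dataclass, field
from typing import List

# Constructor-oriented view: one wall element with construction attributes.
@dataclass
class WallElement:
    wall_id: str
    thickness_m: float
    material: str

# User-oriented view: the wall is seen as surfaces bounding each room.
@dataclass
class WallSurface:
    surface_id: str
    room_id: str

@dataclass
class Room:
    room_id: str
    bounded_by: List[WallSurface] = field(default_factory=list)

def to_room_surfaces(wall: WallElement, room_a: Room, room_b: Room) -> None:
    """Derive two room-bounding surfaces from one construction element."""
    for room in (room_a, room_b):
        room.bounded_by.append(
            WallSurface(surface_id=f"{wall.wall_id}-{room.room_id}", room_id=room.room_id)
        )

wall = WallElement("wall_7", thickness_m=0.2, material="concrete")
room_101, room_102 = Room("room_101"), Room("room_102")
to_room_surfaces(wall, room_101, room_102)
# Each room now holds one visible surface of wall_7; the thickness and material
# live only in the construction-oriented view, which is the source of the
# transformation challenges mentioned above.
```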
The interest, research, and developments in modeling indoors resulted in the first standard dedicated to indoor navigation, IndoorGML. Like all OGC standards, IndoorGML is designed to represent, and allow the exchange of, geo-information intended to support indoor navigation applications. As mentioned previously, the characteristics of CityGML and IFC might not be sufficient (either too complex or lacking information) for all kinds of indoor applications. Indoor navigation requires specific semantics and a topological (connectivity) model that allows user-oriented path computation. IndoorGML semantics, geometry, and connectivity can be derived from other 3D indoor models such as IFC and CityGML following the rules of the model. In contrast to CityGML and IFC, IndoorGML requires a complete subdivision of the space into cellular units. The subdivision can be done with respect to different themes: a topographic theme (representing the internal structure of the building), a sensor theme (representing the coverage of Wi-Fi access points), or a security theme (representing areas accessible under security restrictions) (Becker et al. 2008). The semantics is therefore quite general; it indicates whether a cell can be used for navigation or not (Fig. 5). The topology can then be derived automatically from the semantics following the duality-graph principle.

Key Applications

Indoor applications have traditionally not been a topic of research in the GIS community. Designers, constructors, and engineers have worked with and used 3D indoor representations for modeling airflow simulation, smoke modeling, interior design, and facility management. However, the two prominent indoor applications are indoor navigation and facility management.

Indoor Navigation
Generally speaking, a navigation system consists of the following components: positioning of a user, calculation of a best path (cheapest, fastest, safest, etc.) to some destination(s), and guidance along the path. Indoor navigation is a very prominent and active research area. It originated from robot navigation and has moved to human navigation in the last two decades. However,
it remains a challenging topic for several reasons: indoor positioning is not very accurate, users can move freely within the building, the topology model (or path network) construction process may not be straightforward due to the complexity of indoor space, and humans need appropriate guidance. Many papers have provided extended overviews of navigation systems and models (2D and 3D) to support indoor navigation (Afyouni et al. 2012; Montello 1993; Fallah et al. 2013; Bandi and Thalmann 1998; Zlatanova et al. 2014). The majority of the indoor models found in the current literature are still mostly 2D. They very often ignore architectural characteristics such as the number of doors, openings, and windows. The granularity of the models is still very low; i.e., they do not take into consideration moveable obstacles (such as furniture) or functional spaces such as a "coffee corner," "reception area," etc. Most of the topological models used for navigation are predefined and precomputed and cannot reflect dynamic changes such as closures because of renovations. There is a vast amount of research in the area of indoor navigation and localization, and several conferences are organized annually by various international organizations (ACM SIGSPATIAL, ISPRS, LBS, ICA, etc.). For example, the Indoor3D conference organized in December 2013 discussed topics related to indoor model definition, model generation, indoor localization, and indoor navigation applications.

Agreeing on standards for indoor models is one of the most investigated topics. It is well understood that standards will speed up application development. Some researchers take into consideration not only the internal structure of a building but also the manner in which people can be localized indoors so that directions can be given. Commonly used geographical coordinates do not make sense to humans. Humans, however, understand expressions such as "10 m left from the door" and "in front of the restaurant." Xiong et al. (2013) presented work on a multidimensional indoor location and information model, which aims to define absolute, relative, semantic, and metric expressions of location. The model is complementary to 3D concepts such as CityGML and IndoorGML and has been accepted as a Chinese standard for coding location. Research on the semantic expression of spatial relationships, directions, and locations, such as "in room 321," "on the second floor," "two meters from the second window," and "12 steps from the door," has been discussed by a number of researchers, e.g., Billen et al. (2014).

As mentioned previously, 3D indoor models can be generated in various ways. Becker et al. (2013) presented an approach based on shape grammars applied to point clouds. Shape grammars have been proven to be successful and efficient at delivering volumetric LOD2 and LOD3 models, and the next challenge is their application to indoor modeling, i.e., LOD4 models. In building interiors, where the available observation data may be inaccurate, shape grammars can be used to make the reconstruction process robust and to verify the reconstructed geometries. The potential benefit of using the grammar as a support for indoor modeling was evaluated in a study in which the grammar was applied to automatically generate an indoor model from erroneous and incomplete traces gathered by foot-mounted MEMS/IMU positioning systems.

Point clouds are widely used for the generation of 3D indoor models. They can be created using different range techniques or from images and videos. Obtaining a vector model can also be done using many different approaches and algorithms. El Meouche et al. (2013) investigated the automatic reconstruction of 3D building models from terrestrial laser-scanned data, proposing a surface reconstruction technique for buildings by processing data from a 3D laser scanner. Funk et al. (2013) presented a paper on implicit scene modeling from imprecise point clouds. The authors stated that when optical methods are applied for automated 3D indoor modeling, the 3D reconstruction of objects and surfaces is very sensitive to both lighting conditions and the observed surface properties, which ultimately compromises the utility of the acquired 3D point clouds. The authors presented a reconstruction method based on the observation that most objects contain only a small set of primitives. The approach
combined sparse approximation techniques from the compressive sensing domain with surface rendering approaches from computer graphics. The amalgamation of these techniques allows a scene to be represented by a small set of geometric primitives while generating perceptually appealing results. The resulting surface models are defined as implicit functions and may be processed using conventional rendering algorithms, such as marching cubes, to deliver polygonal models of arbitrary resolution.

Wohlfeil et al. (2013) expressed the importance of using multi-scale sensor systems and photogrammetric approaches in 3D reconstruction. The authors discussed that 3D surface models with high resolution and high accuracy are of great importance in many applications, especially if these models are true to scale. As a promising alternative to active scanners (e.g., light section, structured light, laser scanners, etc.), the authors believe that new photogrammetric approaches are attracting more attention. They use modern structure-from-motion (SfM) techniques, with the camera as the main sensor. Their research combined the strengths of novel surface reconstruction techniques from the remote sensing sector with novel SfM technologies, resulting in accurate 3D models of indoor and outdoor scenes. Starting with image acquisition, all particular steps toward a final 3D model were explained in their study.

The most prominent topic in indoor navigation is indoor localization. Indoor localization is in demand for a variety of applications within the built environment, and an overall solution based on a single technology has not been determined yet. This research is developed rather independently from indoor modeling: the focus is on the technology that would allow localizing a person in a building, and the indoor model is therefore used mostly for visualization of the location. In the context of localization, 3D indoor models have been used to improve localization accuracy (Girard et al. 2011; Liu et al. 2015). Many different localization technologies are investigated indoors as well (Fallah et al. 2013). Much attention is given to WLAN applications, which do not require a person to carry specialized devices. Two research papers presented at the workshop focused on the use of Wi-Fi technologies in indoor positioning. Verbree et al. (2013) investigated how Wi-Fi-based indoor positioning can be used in a museum environment to navigate three categories of users: visitors, employees, and emergency services. They compared two different Wi-Fi-based localization techniques: the first is based on Wi-Fi scanners, i.e., the Libelium Meshlium Wi-Fi scanner, and the second is traditional Wi-Fi fingerprinting. In similar research, Chan et al. (2013) worked on improving Wi-Fi fingerprinting by applying a probabilistic approach based on a previously recorded Wi-Fi fingerprint database. In addition, the authors developed a 3D modeling module that allows efficient reconstruction of outdoor building models to be integrated with indoor building models. The architecture consisted of a sensor module for receiving, distributing, and visualizing real-time sensor data and a web-based visualization module for users to explore dynamic urban life in a virtual world.

Research on algorithms for indoor navigation is also very intensive, with the aim of adapting them to human perception and understanding. Particularly indoors, well-known outdoor strategies such as the shortest or the fastest path might not be relevant, while the safest or least crowded path might be. Applications that support indoor navigation and wayfinding have become one of the booming industries in the last couple of years. In spite of this, the algorithmic support for indoor navigation has been left mostly untouched so far, and most applications mainly rely on adapting Dijkstra's shortest path algorithm to an indoor network. In outdoor spaces, several alternative algorithms have been proposed that add a more cognitive notion to the calculated paths and adhere to natural wayfinding behavior (e.g., simplest paths, least risk paths). The need for indoor cognitive algorithms is highlighted by more challenging navigation and orientation requirements due to the specific indoor structure (e.g., fragmentation, less visibility, confined areas) (Vanclooster et al. 2013).

Today, various indoor applications are available on the market. Google Maps, OpenStreetMap (the 3D indoor project), airports, museums,
and shopping malls have their own indoor navigation applications (Fig. 6, left). Real 3D applications are, however, still very sparse. One of the reasons is that 3D visualization of enclosed indoor spaces is usually more disturbing than guiding; the other reason is that the calculations are performed on 2D plans, and 3D models are therefore not maintained. Xu et al. (2013) presented a 3D model-based indoor navigation system for a museum in Wuhan, China. The system was based on a 3D model organized in a DBMS on a server and a game engine for visualization on an Android device. The authors argue that 3D models are more powerful because they can provide accurate descriptions of the locations of indoor objects (e.g., doors, windows, tables) exhibited on walls and shelves. The experimental system is an example of a flexible, client-server, user-oriented application. The system is composed of three layers: a mobile app, web services, and a database (PostGIS). There were three main strengths of this system:

• It stores all the data needed in one database and processes most calculations on the web server, which makes the mobile client very lightweight.
• The network used for navigation is extracted semiautomatically and is renewable.
• The graphical user interface (GUI), which is based on a game engine, visualizes the 3D model on a mobile display with high performance (Fig. 6, right).

3D Indoor Models and Their Applications, Fig. 6: Visualization of a navigation path in 3D environment: Paris airport (left) and Hubei Museum (right) (Xu et al. 2013)

Facility Management
Facility management is an area of research that is increasingly gaining attention. Building owners are actively seeking models that can answer questions such as "how much paint do I need for the renovation of floor x," "what is the area of the window frames that have to be painted," and "how many square meters of carpet do I need for room y." Facility managers need information about pipes and cables for regular checks and/or failures. Local governments, institutions performing taxation, and so forth are also becoming interested in systems that can easily compute net areas and volumes of apartments and offices. All these questions usually require information about vertical elements, the internal
structure of buildings, and even "invisible" information about pipes and cables integrated in walls and floors/ceilings. IFC and CityGML are very often compared and discussed, but there is still no agreement on which model is more appropriate. For daily building and facility management, IFC appears to be too heavy and complex, and numerous solutions based on CityGML are being investigated.

Several 3D indoor models have been developed with the ultimate goal of finding an intermediate solution between IFC and CityGML. Hijazi et al. (2012) present a model that integrates the building structure concepts of CityGML with the IFC concepts to provide a simplified 3D model for the maintenance of utility networks. The model is accessed by a simple application, which allows facility managers to explore and query their electricity and water facilities (Fig. 7).

3D Indoor Models and Their Applications, Fig. 7: Google Earth-based prototype of the 3D facility management application (Hijazi et al. 2012)

Boeters et al. (2015) argue that CityGML should be extended with more indoor LODs to be able to deal with building taxation issues such as area and volume computation. The authors propose a new LOD2+, which enriches LOD2 with indoor floor information (Fig. 8). The floors are volumes, and the thickness of the exterior walls is taken into consideration. The LOD2+ is created automatically from LOD2 and additional information about the floors and the year of building, which is used to estimate the thickness of the walls.

3D Indoor Models and Their Applications, Fig. 8: Example of a LOD2+ building with indoor information about floors (left) and the same building in reality (right)

Monitoring of Indoor Environments
The Internet of Things (IoT) will be a key concept in the monitoring of indoor environments. The IoT concentrates on making every physical and virtual "thing" a publisher of information. The IoT approach enables "things" to publish information once a state change occurs in them or at predetermined intervals. For instance, in a building that implements the IoT concepts, a door will publish information such as "I am closed now!" or a light bulb will indicate "I am on at the moment." In addition, the "things" will become capable of taking actions based on messages coming from other "things" or from humans. A building will be considered a living entity, and applications will require information from the "things" (i.e., real and virtual) and the "models" (such as CityGML/IndoorGML) in real time. In essence, applications such as smart buildings require the fusion of information acquired from multiple resources, such as things, models, virtual objects, and real objects. The efficient monitoring of indoor environments will be directly proportional to the effectiveness of the provision and fusion of real-time information related to
indoors. Through ubiquitous monitoring of indoors, the information regarding building elements would be available 24/7 regardless of the situation (i.e., emergency or nonemergency). Building and city dashboard applications would be the main consumers of this ubiquitous information. Combining semantic information coming from the indoor models with IoT data provides advantages in answering emergency-scene questions such as "would you provide the average CO2 level in the rooms that are not affected by the fire?" and "would you provide the number of doors that are open on the floors affected by the flood?" As another example, in a fire response operation, an emergency responder can acquire information from the sensors located on each floor regarding the spread of the fire; in response, he can then invoke web services to interact with IoT nodes, which in turn invoke the actuators to close the doors on certain floors to prevent the fire from spreading to other floors. Furthermore, machine-to-machine (M2M) autonomous interaction is also possible: a sensor can collect information regarding the emergency situation and interact with another IoT node to perform a preventive action. As another example, sensors in the building can interact with the actuators to close doors to prevent some parts of the building from being flooded by water; however, if there are people in those parts of the building, they could be trapped because they cannot get out. In this situation, the people in the rooms can interact with the IoT nodes (to control sensors and actuators) to let them out of that part of the building. IoT provides unique opportunities for indoor monitoring.

Future Directions

3D indoor models are going to be further explored, adjusted, and exploited as the demand for indoor information increases. Research in support of indoor mapping and modeling has been an active field for over 30 years. 3D indoor modeling research is related to all aspects of creating digital models of the real world: data acquisition, data structuring, visualization techniques, applications, and legal issues and standards. The research topics are investigated by a large group of scientists coming from photogrammetry, computer vision and image analysis, computer graphics, robotics, laser scanning, and many other technologies. 3D indoor models are no longer a research area only of engineers, planners, constructors, and designers. GIS specialists as well as governments, commercial enterprises, and individuals are also beginning to seek and apply 3D indoor models in their business applications. This reshaping of the user base poses higher requirements on the models and on the tools that use them. There are many problems to solve before 3D indoor models become commonly available, standardized, and used for the development of flexible
3D Indoor Models and Their Applications, Fig. 9: Challenges in indoor mapping and modeling
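To tie together the navigation thread discussed under Key Applications (a cells-and-connections topology derived by the duality principle, and Dijkstra-style path computation on the resulting indoor network), here is a minimal, self-contained sketch; the rooms, doors, and edge costs are invented for illustration and do not come from any model in this entry.

```python
import heapq

# Dual graph of an invented building: cells are nodes, doors/stairs are weighted edges.
indoor_network = {
    "room_101":    [("corridor_1", 3.0)],
    "room_102":    [("corridor_1", 4.0)],
    "corridor_1":  [("room_101", 3.0), ("room_102", 4.0), ("stairs_A", 10.0)],
    "stairs_A":    [("corridor_1", 10.0), ("exit_ground", 8.0)],
    "exit_ground": [("stairs_A", 8.0)],
}

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm over the cell-connectivity graph."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []

cost, path = shortest_path(indoor_network, "room_102", "exit_ground")
# path -> ['room_102', 'corridor_1', 'stairs_A', 'exit_ground'], cost -> 22.0
```

Cognitive variants (simplest path, least risk path) would keep the same graph but change the edge weights or the expansion criterion.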
3D Models

Photogrammetric Products

3D Network Analysis for User Centric Evacuation Systems

Umit Atila, Ismail Rakip Karas, and Yasin Ortakci
Department of Computer Engineering, Karabuk University, Karabuk, Turkey

Introduction

Research on evacuation of high-rise buildings in case of disasters such as fire, terrorist attacks, indoor air pollution incidents, etc., has become popular in the last decade. In case of such disasters, people inside the buildings should be evacuated out of the area as soon as possible. However, organizing a quick and safe evacuation is a difficult procedure due to the complexity of high-rise buildings and the huge number of people occupying such buildings. Besides, problems such as smoke inhalation, confluence,
panic, and inaccessibility of some exits may arise during the evacuation procedure. Therefore, an efficient user-centric evacuation system should be developed for quick and safe evacuation from high and complex buildings.
Routing someone to an appropriate exit in safety is only possible with a system that can manage the 3D topological transportation network of a building. Realizing the evacuation of a building with such systems (also called navigation systems) by guiding people in real time requires complex analysis of 3D spatial data.
Interest in 3D navigation systems has increased especially after the 9/11 attacks, and many researchers have concentrated on how a safe and quick evacuation could be realized in case of such disasters (Lee 2007). Most navigation systems operate on 2D data, which lacks the building environment, to find and simulate the shortest path (Musliman and Rahman 2008). Therefore, there is a need for different approaches that use the 3D objects and eliminate the limitations of network analysis on multilevel structures (Cutter et al. 2003; Pu and Zlatanova 2005; Kwan and Lee 2005; Zlatanova et al. 2004).
In a study conducted by Kwan and Lee (2005), the relative accessibility of the emergency response between a disaster site and an emergency station in a building was measured. Their results showed that extending 2D-GIS to 3D-GIS representations of the interiors of high-rise buildings can improve the overall speed of the rescue process.
Most GIS researchers use graph networks for indoor routing and evacuation analysis (Karas et al. 2006; Jun et al. 2009). While most of the 3D visualization problems have been solved by CityGML, initial requirements, concepts, frameworks, and applications from a wide point of view have been presented by other research such as Pu and Zlatanova (2005) and Musliman et al. (2006). However, there is still a lack of implementations of 3D network analysis and navigation specifically for evacuation purposes.
The objective of this study is to investigate and implement 3D visualization and navigation techniques and solutions for indoor spaces within 3D GIS. We explain how to perform 3D network analysis using Oracle Spatial and Graph within a Java-based 3D-GIS implementation. As an initial implementation, a GUI provides a 3D visualization of a building. The system stores a network model based on CityGML data in an Oracle database and then performs network analysis under different constraints, such as avoiding nodes or links in the network model. All experiments highlighted in this chapter are performed on the 3D model of the Corporation Complex in Putrajaya, Malaysia.
Sections "Evacuation Process" and "Evacuation Systems" summarize the evacuation process and evacuation systems, respectively. Section "Visualization of 3D Network Models for Evacuation" gives examples of visualization of 3D building and network models from the CityGML format. Section "Representing Network Model in Geo-DBMS" gives some information on storing spatial data and explains how to create network models in Oracle Spatial and Graph. Section "Network Analysis Tool" introduces a 3D network analysis tool and gives visualized results of 3D network analysis performed by our proposed 3D-GIS implementation. Section "Simulation of User Centric Evacuation" elaborates the routing engine integrated in the simulation module and presents a visualization sample.

Evacuation Process

One of the most dangerous disasters threatening high-rise and complex buildings is fire, in which most of the people may lose their lives due to smothering rather than burning. In case of fire disasters, extraordinary indoor air pollution (EIAP) incidents happen suddenly and cause fatal consequences such as airlessness, excessive temperature, explosions, and smoke and toxic gas leakages. Table 1 indicates the number of people who died due to various reasons after a residential fire incident (Holborn et al. 2003). As can be deduced from Table 1, the major cause of death was breathing in smoke, followed by the combination of burning and smothering.
3D Network Analysis for User Centric Evacuation Systems, Table 1 Number of people who died due to various reasons after a residential fire incident (Holborn et al. 2003)

Reason of death                                    Number of people who lost their lives   Percentage
Inhalation                                         101                                      36
Smothering                                         8                                        3
Burned bronchus                                    8                                        3
Burning                                            53                                       19
Combination of burning and smothering              69                                       25
Others                                             20                                       7
Injuries due to heart attack, stroke, and falling  20                                       7

3D Network Analysis for User Centric Evacuation Systems, Table 2 The main factors that triggered occupant evacuation in buildings (Wood 1972; Bryan 1977)

The main factors that triggered occupant evacuation   England %   USA %
Smoke                                                  34.0        35.1
Shouting and voices                                    33.0        34.7
Flames                                                 15.0        8.1
Noise                                                  9.0         11.2
Alarm                                                  7.0         7.4
Others                                                 2.0         2.8

There are three main stages in extraordinary indoor air pollution incidents. In the first stage, occupants are not affected by smoke, gas, or temperature; therefore, this stage is the most appropriate stage for evacuation. In the second stage, the occupants are heavily exposed to smoke, toxic gas, and excessive temperature.
In previous studies, the behaviors of the occupants during a disaster are analyzed in the two main stages discussed in the previous paragraph (Purser and Bensilum 2001). The first stage is the premovement time or response time, and the second stage is the movement time or action time. Premovement time is defined as the period between the time the alarm system activates and the time people react to escape from the building. Table 2 compares the main factors that triggered occupant evacuation in buildings in England and the USA (Wood 1972; Bryan 1977). It indicates that the effect of alarm systems in initiating people to react is unexpectedly low.
A study conducted by Purser and Bensilum (2001) in a shopping mall indicated that when occupants are informed by an announcement system, most of the evacuation time is spent realizing the need to evacuate, rather than on movement. Figure 1 indicates that the percentages of realization, response, and reaction times were 65%, 16%, and 19%, respectively. Therefore, premovement time is 81% of the total evacuation time.
The studies also indicated that when an alarm system sounds, occupants spend the most critical time period trying to understand the reason for the alarm rather than evacuating the building. Studies also indicated that occupants give different responses based on the type and method of the alarm system or the content and timing of the announcement (Bryan 2002). Uncertainty and insufficient information during the event may delay the evacuation procedure.
The second stage of evacuation is movement time or action time. Movement time is the period between the time people react to escape from the building and the time they get out of the building or reach some safe place in the building (Purser and Bensilum 2001). Movement time varies based on two main factors: exit preferences and smoke problems.
Current evacuation systems assume that occupants use the closest exit in a time of emergency evacuation. Table 3 indicates the results from a study where the preferences of the occupants were investigated in a building with one emergency exit door and one entrance door located opposite each other. As seen in Table 3, most of the guests used the entrance with which they were more familiar (Mawson 1980), while almost all of the occupants used the emergency exit door. People use the closest exit only if they know the building well (Gwynne et al. 1999). When the guidance of the evacuation systems is insufficient, people consider various factors in choosing the evacuation path.
3D Network Analysis for User Centric Evacuation Systems, Fig. 1 Time (min) for occupants 1–11
3D Network Analysis for User Centric Evacuation Systems, Table 3 Exit preference rates of people (Sime 1985)

Exit preference       Guest   Occupant   Total
Entrance door         37      1          38
Emergency exit door   24      13         37

3D Network Analysis for User Centric Evacuation Systems, Table 4 Percentages of occupants returning back due to low sight distance

Visibility (meter)   England (%)   USA (%)
0–2                  29.0          31.8
3–6                  37.0          22.3
7–12                 25.0          22.3
13–30                6.0           17.6
31–36                0.5           1.3
37–45                1.0           0
46–60                0.5           4.7
> 60                 1.0           0

Previous studies reported that when occupants encounter a smoke problem, they keep moving through the smoke if the sight distance is more than 20 m; however, they hesitate and do not take the risk when the sight distance is less than 20 m (Bryan 1995). Thus, smoke is a serious problem which affects the movement time in the evacuation process. People slow down in smoke, and they cannot determine an optimum evacuation path or cannot follow a straight route due to the diminished sight distance (Jin 1976). However, it can sometimes be necessary to pass through a smoke area for survival. Based on a previous study, Table 4 indicates the percentages of occupants returning back due to low sight distance in smoked zones (Bryan 1995).

Evacuation Systems

Traditional evacuation systems can be divided into three main groups: sensors to detect heat, smoke, or radiation; alarm systems to alert people at the early stages of a disaster; and evacuation lighting to allow occupants to continue to navigate (Fig. 2). Traditional evacuation systems are not sufficient for the safe and quick evacuation of today's high-rise and complex buildings (Pu and Zlatanova 2005). These evacuation systems are not flexible due to their static predefined scenarios. This may guide people to blocked exits or to places where there are gas leakages. Also, traditional evacuation systems become useless when the sight distance is very low due to smoke and electricity cuts. They also provide insufficient evacuation information, especially for people who are not familiar with the building.
Emergency incidents are not static; they are dynamic and variable events. However, traditional evacuation instructions are generally
3D Network Analysis for User Centric Evacuation Systems, Fig. 2 The components of current evacuation systems (Pu and Zlatanova 2005): detectors and sensors, control room, alarm devices, evacuation lights, and evacuating people
facilitating work with the CityGML and JOGL Java bindings for the OpenGL graphics library to carry out visualization of 3D spatial objects.
CityGML is introduced as one of the international standards for representing and exchanging spatial data, making it easier to visualize, store, and manage 3D city model data efficiently. CityGML is able to represent 3D city models in five well-defined Levels of Detail (LOD), namely, LOD0 to LOD4. The accuracy and structural complexity of the 3D objects increase with the LOD level, where LOD0 is the simplest LOD, with a two-and-a-half-dimensional Digital Terrain Model, while LOD4 is the most complex LOD, including architectural details with interior structures. LOD1 is the well-known blocks model comprising prismatic buildings with flat roofs. Differentiated from LOD1, LOD2 has roof structures. LOD3 denotes architectural models with detailed wall and roof structures and balconies (Gröger et al. 2008).
The implemented system reads CityGML datasets from LOD0 to LOD2. 3D building models are represented in LOD2, described by polygons using the Building Module of CityGML (Fig. 3). Network models are represented as linear networks in LOD0 using CityGML's Transportation Module (Fig. 4).

3D Network Analysis for User Centric Evacuation Systems, Fig. 3 Building model (textured viewing mode)

3D Network Analysis for User Centric Evacuation Systems, Fig. 4 Network model

Representing Network Model in Geo-DBMS

While CityGML is used to store and visualize 3D spatial objects, a graph model managed in a geo-database management system (DBMS) is used to perform 3D network analysis. Oracle Spatial and Graph is one of the most powerful geo-DBMSs, offering a combination of geometry models and graph models (Murray 2009).
Oracle Spatial and Graph maintains a combination of geometry and graph models within the Network Data Model. A spatial network consists of nodes and links, which are SDO_GEOMETRY objects representing points and lines, respectively (Kothuri et al. 2010).
Network support in the Oracle database is composed of the following elements:

• A data model to store networks inside the database as a set of network tables: this is the persistent copy of a network.
• SQL functions to define and maintain networks (i.e., the SDO_NET package).
• Network analysis functions in the Java programming language: the Java API works on a copy of the network loaded from the database. This is the volatile copy of the network.
• Network analysis functions in PL/SQL (the SDO_NET_MEM package).

Figure 5 illustrates the relationship between the elements of the Oracle Network Model (Kothuri et al. 2010).
To define a network in Oracle Spatial and Graph, at least two tables should be created: the node and link tables. These tables should be provided with the proper structure and content to model the network. A node table (see Table 5) describes all nodes in the network. Each node has a unique numeric identifier (the NODE_ID column). A link table (see Table 6) describes all links in the network. Each link has a unique numeric identifier (the LINK_ID column) and contains the identifiers of the two nodes it connects (Kothuri et al. 2010).
3D Network Analysis for User Centric Evacuation Systems, Table 5 Example entry in the node table of the network model

NODE_ID     230
NODE_NAME   NODE-230
GEOMETRY    MDSYS.SDO_GEOMETRY(3001, NULL, MDSYS.SDO_POINT_TYPE(42.2019449799705, 100.382921548946, 3.7), NULL, NULL)
ACTIVE      Y

3D Network Analysis for User Centric Evacuation Systems, Table 6 Example entry in the link table of the network model

LINK_ID        15
START_NODE_ID  452
END_NODE_ID    455
LINK_NAME      Link-452-455-Corridor
GEOMETRY       MDSYS.SDO_GEOMETRY(3002, NULL, NULL, MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1), MDSYS.SDO_ORDINATE_ARRAY(115.306027729301, 85.9775129777152, 1.8, 115.306027729301, 82.9483382781573, 1.8))
LINK_LENGTH    3.029174699557899
ACTIVE         Y
LINK_TYPE      Corridor

In this study, as we define a spatial network containing both connectivity and geometric information, we use SDO_GEOMETRY for representing points and lines.
To complete the network creation process, Oracle Spatial and Graph needs a metadata table called USER_SDO_NETWORK_METADATA (see Table 7) to ensure that the table structures are consistent with the metadata. The metadata table USER_SDO_NETWORK_METADATA describes the elements that compose a network, such as the names of the tables and the names of the optional columns.
There are two choices for creating a network. One can either create the network automatically using the CREATE_SDO_NETWORK procedure of the SDO_NET package or create the network manually. The CREATE_SDO_NETWORK procedure creates all the structures of a network, but it is not flexible, as it gives very little control over the actual structuring of the tables. The sample code given below illustrates the creation of the CORPORATION_PUTRAJAYA network with explicit table and column names.
SQL> BEGIN
       SDO_NET.CREATE_SDO_NETWORK(
         NETWORK                => 'CORPORATION_PUTRAJAYA',
         NO_OF_HIERARCHY_LEVELS => 1,
         IS_DIRECTED            => FALSE,
         NODE_TABLE_NAME        => 'CORP_NETWORK_NODE',
         NODE_GEOM_COLUMN       => 'GEOMETRY',
         NODE_COST_COLUMN       => NULL,
         LINK_TABLE_NAME        => 'CORP_NETWORK_LINK',
         LINK_GEOM_COLUMN       => 'GEOMETRY',
         LINK_COST_COLUMN       => 'LINK_LENGTH'
       );
     END;
The alternative way is to create the network tables manually. When defining the network manually, one has to create all needed tables and insert the proper data into the tables using SQL statements. Manual creation gives total flexibility over the table structures, but one must ensure that the table structures are consistent with the metadata.
28 3D Network Analysis for User Centric Evacuation Systems
If we want to define a set of constraint for and MustAvoidLinks. Once we define the
any of analysis methods to limit the search SystemConstraint object, we can pass it as last
space, we simply define a SystemConstraint parameter to any of the analysis methods of the
object. The SystemConstraint class allows to NetworkManager class. The following sets a
define constraints such as MaxCost, MaxDepth, constraint to avoid use of link identified by 6012,
MaxDistance, MaxMBR, MustAvoidNodes, 6013, and 6014 in the network.
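The snippet below is a minimal sketch of that step, assuming the oracle.spatial.network Java API referred to in the text (Kothuri et al. 2010). The connection details, the SystemConstraint constructor, and the setMustAvoidLinks setter signature are assumptions and may differ between Oracle versions; the node identifiers 3059 and 3368 are the start and destination nodes used in Figs. 7 and 8.

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Vector;

import oracle.spatial.network.Network;
import oracle.spatial.network.NetworkManager;
import oracle.spatial.network.Path;
import oracle.spatial.network.SystemConstraint;

public class AvoidLinksExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical connection to the database holding the CORPORATION_PUTRAJAYA network
    Connection conn = DriverManager.getConnection(
        "jdbc:oracle:thin:@//localhost:1521/orcl", "user", "password");

    // Load the volatile (in-memory) copy of the persistent network
    Network net = NetworkManager.readNetwork(conn, "CORPORATION_PUTRAJAYA");

    // Links 6012, 6013, and 6014 (e.g., an elevator that is out of use) must be avoided
    Vector<Integer> avoidLinks = new Vector<Integer>();
    avoidLinks.add(6012);
    avoidLinks.add(6013);
    avoidLinks.add(6014);

    SystemConstraint constraint = new SystemConstraint(net); // assumed constructor
    constraint.setMustAvoidLinks(avoidLinks);                // setter name assumed from "MustAvoidLinks"

    // The constraint is passed as the last parameter of the analysis method
    Path path = NetworkManager.shortestPath(net, 3059, 3368, constraint);
    System.out.println("Recomputed path: " + path);
  }
}

The same constraint object could, in the same way, be handed to the other analysis methods of the NetworkManager class mentioned above.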
3D Network Analysis for User Centric Evacuation Systems, Fig. 6 UML diagram summarizing network analysis
process
3D Network Analysis for User Centric Evacuation Systems, Fig. 7 Shortest path between two nodes without any
constraint
3D Network Analysis for User Centric Evacuation Systems, Fig. 8 Recalculated shortest path considering avoided
elevators in a part of building
Figure 7 shows the shortest path analysis result on a graphical screen without any constraint. The found path follows the nodes 3059-3067-3066-3366-3367-3359-3365-3368. Figure 8 shows how the shortest path is updated after the links associated with one of the elevators are avoided, shown by red lines, which means that this elevator is not in use any more. The updated path to the destination follows the nodes 3059-3065-3070-3069-3369-3370-3365-3368.
3D Network Analysis for User Centric Evacuation Systems, Fig. 9 Routing simulation process of the instruction engine, Scene 1 (the red point is the user)
3D Network Analysis for User Centric Evacuation Systems, Fig. 10 Routing simulation process of the instruction engine, Scene 2
3D Network Analysis for User Centric Evacuation Systems, Fig. 11 Routing simulation process of the instruction engine, Scene 3
work currently in progress. In our future study, we intend to design an intelligent user-centric evacuation model based on neural networks for high-rise building fires, in which we will consider the physical conditions of the environment and the properties of the person who requests to be evacuated and produce the personalized instructions in real time.

Acknowledgements This study was supported by TUBITAK, the Scientific and Technological Research Council of Turkey, research grant [grant number: 112Y050]. We are indebted for its financial support.
refers to both the thematic diversity and the geometric complexity of the objects. Among the well-known map generalization operations, the following subset is used for model generalization: selection, (re-)classification, aggregation, and area collapse. Sometimes, the reduction in the number of points used to represent a geometric feature is also applied in the model generalization process, although this is mostly considered a problem of cartographic generalization. It is achieved by line generalization operations.

Historical Background

Generalization is a process that has been applied by human cartographers to generate small-scale maps from detailed ones. The process is composed of a number of elementary operations that have to be applied in accordance with each other in order to achieve optimal results. The difficulty is the correct interplay and sequencing of the operations, which depends on the target scale, the type of objects involved, as well as the constraints these objects are embedded in (e.g., topological constraints, geometric and semantic context, ...). Generalization is always subjective and requires the expertise of a human cartographer (Spiess 1995). In the digital era, attempts to automate generalization have led to the differentiation between model generalization and cartographic generalization, where the operations of model generalization are considered to be easier to automate than those of cartographic generalization.
After model generalization has been applied, the thematic and geometric granularity of the data set corresponds appropriately to the target scale. However, there might be some geometric conflicts remaining that are caused by applying signatures to the features as well as by imposing minimum distances between adjacent objects. These conflicts have to be solved by cartographic generalization procedures, among which typification and displacement are the most important (for a comprehensive overview, see Mackaness et al. 2007). As opposed to cartographic generalization, model generalization processes have already achieved a high degree of automation. Fully automatic processes are available that are able to generalize large data sets, e.g., the whole of Germany (Urbanke and Dieckhoff 2006).

Scientific Fundamentals

Operations of model generalization are selection, reclassification, aggregation, area collapse, and line simplification.

Selection
According to a given thematic and/or geometric property, objects are selected which are to be preserved at the target scale. Typical selection criteria are object type, size, or length. Objects fulfilling these criteria are preserved, whereas the others are discarded. In some cases, when an area partitioning of the whole data set has to be preserved, the deleted objects have to be replaced appropriately by neighboring objects.

Re-Classification
Often, the thematic granularity of the target scale is also reduced when reducing the geometric scale. This is realized by reclassification or new classification of object types. For example, in the German ATKIS system, when going from scale 1:25.000 to 1:50.000, the variation of settlement structures is reduced by merging two different settlement types into one class in the target scale.

Area Collapse
When going to smaller scales, higher-dimensional objects may be reduced to lower-dimensional ones. For instance, a city represented as an area is reduced to a point; an areal river is reduced to a linear river object. These reductions can be achieved using skeleton operations. For the area-to-line reduction, the use of the Medial Axis is popular, which is defined as the locus of points that have more than one closest neighbor on the polygon boundary. There are several approximations and special forms of axes (e.g., the Straight Skeleton (David and Erickson 1998)). Depending on the object and the task at hand, there are forms that may be more favorable than others (e.g., Chin et al. 1995 and Haunert and Sester 2007).
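The medial axis described in the preceding paragraph can also be stated formally; the following compact definition uses notation introduced here only for illustration (P is the polygon and ∂P its boundary):

\[
\mathrm{MA}(P) \;=\; \bigl\{\, x \in P \;:\; \bigl|\{\, y \in \partial P \;:\; \lVert x - y\rVert = d(x,\partial P) \,\}\bigr| \;\ge\; 2 \,\bigr\},
\qquad
d(x,\partial P) \;=\; \min_{y \in \partial P} \lVert x - y\rVert .
\]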
(Figure: reassignment of a deleted object's area to neighboring objects, with the options max_neighbors, biggest neighbor, and equal distribution to all neighbors)
Visualization on Small Displays
The size of mobile display devices requires the presentation of a reduced number of features. To this end, the data can be reduced using data abstraction processes.

... existing data sets (Hampe et al. 2004). Although different approaches already exist, there is still research needed to fully exploit this data structure (Sheeren et al. 2004).
resulting in major savings in time and fuel, the two important commodities of the twenty-first century. According to the FASANA Motion report (Report 2012), approximately 50% of freeway congestion is caused by nonrecurring issues, such as traffic accidents, weather hazards, special events, and construction zone closures. Hence, it is fairly important to quantify and predict the impact of traffic incidents on the surrounding traffic. This quantification can alleviate the significant financial and time losses attributed to traffic incidents; for example, it can be used by city transportation agencies for providing evacuation plans to eliminate potential congested gridlocks, for effective dispatching of emergency vehicles, or even for long-term policy-making. Moreover, the predictive information can be either used by a driver directly to avoid potential gridlocks or consumed by a predictive route-planning algorithm (e.g., Demiryurek et al. 2011) to ensure that a driver selects the best route from the start.
The McKinsey report (McK 2011) predicts a worldwide consumer saving of more than $600 billion annually by 2020 for location-based services, where the biggest single consumer benefit will be from time and fuel savings from navigation services tapping into real-time traffic data. Therefore, let us consider a navigation system utilizing a predictive route-planning algorithm as a next-generation consumer navigation system (in-car or on a smartphone). We notate such a system as ClearPath, a motivating application which can help drivers to effectively plan their routes in real time by avoiding the incidents' impact areas. That is, suppose an accident is reported in real time (by crowdsourcing (WAZE 2014) or through agency reports or SIGALERTS (2013)) in front of a driver, but the accident is 20 min away. If we can effectively predict the impact of the accident, ClearPath would know that this accident will be cleared in the next 10 min. Thereby, ClearPath would guide the driver directly toward the accident, because it knows that by the time the driver arrives in the area, there will be no accident.
Given the two datasets, (1) traffic accident reports and (2) traffic sensor data collected from a historical time stamp until t0, the following three sets of parameters must be predicted:
(a) The set of road segments that are impacted by the incident: {ri}.
(b) For each impacted road segment ri, the significance of the impact (i.e., the scale of the speed decrease): vi.
(c) For each impacted road segment ri, the time stamp when the impact starts: ti.
In this definition, a sensor refers to a loop detector or any other sensing device built on a road segment. It continuously (e.g., every 30 s) reports readings (e.g., speed) to reflect the traffic situation on road segments. In this problem setting, to quantify the traffic situation on a road segment (e.g., impacted or not), the readings collected from the sensors located on this segment are utilized. Other terms that are seen frequently in this entry are defined as follows:
Impacted Road Segment: For a road segment ri equipped with a sensor s and a time stamp t (e.g., 8:30 AM), if the speed readings reported by s present an anomalous decrease (e.g., a 40% drop) compared with the historical daily readings at time t (i.e., the average of all readings collected at 8:30 AM in the dataset), we consider ri as impacted by traffic events.
Backlog: For a particular accident ev, its backlog (b) refers to the total length of all impacted road segments between ev's location and the last impacted road segment, along the opposite direction of vehicle flow.
Propagation Behavior: Given a traffic accident (ev) that occurred at time t0, ev's propagation behavior is defined as a time series of backlog (b) after t0 and until it propagates to the maximum backlog. Assuming ev reaches the maximum backlog after t time units, its propagation behavior is represented as a vector b or {b0, b1, ..., bt}, where the subscript i of bi represents the time unit after t0.
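The "impacted road segment" test above can be sketched in a few lines. The 40% threshold matches the example in the definition, while the class and method names below are illustrative assumptions rather than part of the original entry:

import java.util.List;

/** Illustrative check for the "impacted road segment" definition. */
public class ImpactDetector {

  /** Fraction below the historical mean that counts as an anomalous drop (assumed 40%). */
  private static final double DROP_THRESHOLD = 0.40;

  /**
   * Returns true if the current speed reported by the segment's sensor at this
   * time of day is anomalously low compared with the historical daily readings.
   *
   * @param currentSpeed     latest speed reading from the sensor on segment ri
   * @param historicalSpeeds all readings collected at the same time of day (e.g., 8:30 AM)
   */
  public static boolean isImpacted(double currentSpeed, List<Double> historicalSpeeds) {
    if (historicalSpeeds.isEmpty()) {
      return false; // no baseline available; the segment cannot be flagged
    }
    double sum = 0.0;
    for (double s : historicalSpeeds) {
      sum += s;
    }
    double historicalMean = sum / historicalSpeeds.size();
    // Impacted if the speed dropped by at least DROP_THRESHOLD relative to the historical mean
    return currentSpeed <= (1.0 - DROP_THRESHOLD) * historicalMean;
  }
}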
Historical Background

operations research have studied the traffic congestion problem through mathematical models, simulation studies, and field surveys. However, due to the recent sensor instrumentation of road networks in major cities as well as the vast availability of auxiliary commodity sensors from which traffic information can be derived (e.g., CCTV cameras, GPS devices), for the first time a large volume of real-time traffic data at very high spatial and temporal resolutions has become available. While this is a gold mine of data, the most popular utilization of this data is to simply visualize and utilize the current real-time traffic congestion on online maps, car navigation systems, sig-alerts, or mobile applications. However, the most useful application of this data is to predict the traffic ahead of you during the course of a commute in order to avoid traffic congestion, especially in the presence of traffic accidents.
In the last decade, most of the studies on accident impact prediction have been based on theoretical modeling and simulations, which can be classified into three groups: (1) deterministic queuing theory or shock wave theory (e.g., Lawson et al. (1997) and Wirasinghe (1978)), (2) heuristic methods and simulations (e.g., Pal and Sinha 2002), and (3) microscopic modeling of driver's behavior (e.g., Daganzo (1994) and Wang and Murray-Tuite (2010)). However, the outcome of these studies relies on theoretical simulations of road network traffic instead of real-world collected traffic data. Also, none of these studies uses a source of incident data with description variables and reporting techniques, and their spatial transferability is limited.
When working with real-world data, it is important to identify certain characteristics of traffic data, such as the temporal patterns of rush hours or the spatial impacts of accidents, which need to be incorporated into a data-mining technique to make the prediction much more accurate. For example, for generic time series, the observations made in the immediate past are usually a good indication of the short-term future. However, for traffic time series, this is not true at the beginning of a traffic accident. Specifically, for accident impact prediction, it is necessary to predict the sudden speed changes caused by traffic accidents in a faraway future (e.g., the next 30 min). In fact, the occurrence of most accidents involves two phenomena: (1) abrupt speed changes; for example, it is very common for the traffic speed to drop 60% when an accident occurs on freeways in LA; and (2) long-lasting propagation of the speed changes; for example, a closer sensor to the accident may report a speed decrease in the 3rd min after its occurrence, and a farther sensor may report a similar decrease in the 30th min. Since traditional prediction approaches rely on the immediate past data to predict the future, they cannot effectively predict the abrupt speed changes and how they propagate over the long term. Hence, navigation systems that rely on these approaches can hardly navigate drivers around the accident impact area.

Scientific Fundamentals

For the motivating navigation application, ClearPath, to be effective, it is essential to predict specific values of speed changes and backlog lengths over the lifetime (i.e., temporal) and impact area (i.e., spatial) of an accident. In particular, the following three aspects need to be considered:
First, the numeric values of speed changes and backlog lengths. There are two major approaches to measure the impact of accidents: (1) qualitative approaches (i.e., classify an accident's impact into conceptual categories such as "severe" or "non-severe" and "significant delay" or "slight delay") and (2) quantitative approaches (i.e., providing numeric measurements such as a 45% speed decrease and 3.2 miles of congested backlog). In the past, most studies focused on qualitative approaches for measuring impact, which makes the impact harder to quantify (e.g., Ozbay and Kachroo 1999). The qualitative measurement may be sufficient for general decision-making or response analysis; however, it is not precise and informative enough for ClearPath. In section "Impact Parameters", the prediction of quantitative information, which provides numeric measurements of the impact on the surrounding areas, is introduced.
Second, the spatiotemporal behavior of the impact. In previous studies, it was sufficient to predict the impact of an accident as a single value or a set of aggregate values. For example, in the literature by Pan et al. (2012), the impact is predicted as the average speed decrease or the average backlog length. Since the impact region of an accident evolves over time and space, the outcome of a prediction approach should be the exact length of the time-varying backlogs (i.e., the evolution of the congested spatial span) with different scales of speed changes. The section "Impact Propagation" will explain the prediction strategy for the propagation behavior of a traffic accident.
Third, the comprehensive area impacted by a traffic accident. Most existing research has focused on predicting the impact with respect to the set of upstream road segments impacted by a traffic accident (Kwon et al. 2006). In reality, traffic incidents may cause surges in traffic demand that overwhelm the system in their vicinity with a radically different flow from typical patterns. Section "Impact on Other Streets" explains the algorithms to forecast the impact of incidents on the nearby streets and intersecting freeways, which can (1) identify a set of road segments that will be impacted given a new incident and (2), for each impacted road segment, predict the spatiotemporal performance decrease, i.e., determine when and how the impact will occur in time and space.

Impact Parameters

In this entry, we utilize two real-world transportation datasets, (1) accident reports and (2) traffic sensor data, and we address the problem of predicting and quantifying the impact of traffic accidents. By analyzing historic accident data, the main idea is to classify accidents based on their features (e.g., time, location, type of accident). Subsequently, we model the impact of each accident class on its surrounding traffic by analyzing the archived traffic data at the time and location of the accidents. Consequently, if a similar accident (from real-time accident data) is observed, its impact can be predicted and quantified on the surrounding traffic in real time using the information from past accidents.
The impact of a traffic accident can be characterized in multiple ways. Three typical quantification impact parameters are (1) the impact backlog, (2) the speed decrease caused by the accident, and (3) the congestion duration.
Based on the analysis of real-world data, it is observed that the impact parameters vary across accidents with different attributes. The accident reports normally contain (but are not limited to) the following metadata: (1) accident date, (2) accident start time, (3) accident location (i.e., street name, latitude, longitude), (4) accident type (the accident type usually refers to one of the following: traffic collision with no/minor injuries, traffic collision with major injuries/ambulance, traffic collision with no details, signal alert, natural weather hazard, lane closure, fire, etc.), (5) type of vehicles involved if the incident is an accident, and (6) number of affected lanes. Let us consider one of the attributes, start time, as an example. The impact backlog of accidents that happen during daytime may be large compared with accidents happening at midnight, due to the higher traffic flow during the daytime. Thereby, the key to predicting impact parameters (e.g., the impact backlog) is to investigate which accident attributes are correlated with them. It is likely that some accident attributes are irrelevant or redundant for inferring the impact backlog. In order to identify the most correlated subset, we first process the accident attributes as normalized features and the impact backlog as numerical classes. Then we apply the Correlation-based Feature Selection (CFS) algorithm (Hall and Smith 1998) on top of this normalized data to select the correlated features. From the result of this procedure, the following accident attributes are selected as the most relevant: {start time, location, direction, type, # of affected lanes}. We use the selected attributes to categorize the traffic accidents according to the values of their attributes and utilize the average value of the impact parameters in each category to predict the impact of an accident with corresponding attributes.
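A minimal sketch of this categorize-and-average step is given below. The attribute set mirrors the one selected above, but the class, its method names, and the use of a string category key are illustrative assumptions, not the entry's actual implementation:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative categorize-and-average predictor for an impact parameter (e.g., backlog). */
public class CategoryAveragePredictor {

  /** Category key built from the attributes selected by CFS in the entry. */
  private static String key(String startTimeBucket, String location, String direction,
                            String type, int affectedLanes) {
    return startTimeBucket + "|" + location + "|" + direction + "|" + type + "|" + affectedLanes;
  }

  private final Map<String, List<Double>> backlogByCategory = new HashMap<>();

  /** Add one historical accident with its observed impact backlog (in miles). */
  public void addHistoricalAccident(String startTimeBucket, String location, String direction,
                                    String type, int affectedLanes, double observedBacklog) {
    backlogByCategory
        .computeIfAbsent(key(startTimeBucket, location, direction, type, affectedLanes),
                         k -> new ArrayList<>())
        .add(observedBacklog);
  }

  /** Predict the backlog of a new accident as the average of its category; NaN if unseen. */
  public double predictBacklog(String startTimeBucket, String location, String direction,
                               String type, int affectedLanes) {
    List<Double> values =
        backlogByCategory.get(key(startTimeBucket, location, direction, type, affectedLanes));
    if (values == null || values.isEmpty()) {
      return Double.NaN;
    }
    double sum = 0.0;
    for (double v : values) {
      sum += v;
    }
    return sum / values.size();
  }
}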
Accident Impact Prediction, Fig. 1 Sample propagation (impact backlog in miles plotted against distance in miles)
Accident Impact Prediction, Fig. 2 Impact of a traffic incident (traffic sensors s0–s4, the traffic incident, and the potential impact direction)
important to incorporate more information such as traffic density measures (e.g., volume and occupancy) to improve the prediction accuracy. Moreover, the consideration of a multistep prediction approach that takes into account the initial behavior (i.e., the sub-pattern of the propagation behavior) of an incident may further improve the prediction accuracy.

Impact on Other Streets
As illustrated in Fig. 2, the impact caused by a traffic accident on a freeway may affect the traffic flow in the following three types of locations:

(1) the upstream stretch of the occurrence freeway,
(2) adjacent arterial streets, and
(3) other surrounding freeways.

This section focuses on how to forecast the impact of incidents on the nearby streets and freeways (i.e., locations (2) and (3)).
The intuitive way to predict the impact of accidents on the nearby streets and freeways is to identify the causal interactions among traffic at different road segments to address the aforementioned challenges. To identify the causality relationship, the main idea is to utilize archived traffic sensor datasets to train causality models to determine whether the time series data (e.g., collected from s0 in Fig. 2) are useful for predicting other time series data (e.g., collected from s1). If a change in traffic performance (e.g., a decrease or increase in traffic speed) at s0 leads to a change in traffic performance at another location s1, in the presence of a traffic accident near s0, then s1 could be identified as part of the impacted area. Consequently, given a traffic accident and its attributes, the detected causality can be utilized to predict the impact in the vicinity of the traffic accident.
Given the strategy above, the challenge is how to detect the causality between the traffic speed time series. One straightforward idea is to use a traditional causality test (e.g., Granger 1969) to detect the causality. However, with real-world traffic data, it is observed that hardly any Granger causality exists between any pair of traffic speed time series. This was a surprising and counterintuitive observation, as a strong causality relationship among traffic time series was expected. With further investigation regarding the unique characteristics of traffic speed time series, two types of time-sensitive causalities that are unique to traffic speed time series were discovered. Specifically, for two traffic speed time series with correlated historical patterns, it is observed that sometimes the causality only exists during the beginning of rush hours when the traffic starts to become congested, named slowdown causality. Such causality only exists between two road segments that have strong connectivity in the road network. Conversely, in other connectivity scenarios, especially when the two time series are not correlated, another type of causality is observed that only exists in the presence of traffic accidents and during non-rush hours, named intervention causality. Consequently, the detected causalities can be utilized for predicting the impact of traffic accidents, with the procedure illustrated in Fig. 3.
Accident Impact Prediction, Fig. 3 Pipeline for causality-based impact prediction: for a new incident e, its nearest sensor s0 and the adjacent sensors si are located; slowdown or intervention causality between each pair <s0, si> is checked against the real-time and archived traffic dataset; important lag(s) are selected with lasso-Granger and the regressive model is retrained for impact prediction

Given that a new incident e has just occurred, its closest upstream sensor s0 is sent to the archived database to retrieve the relevant time intervals for causality detection. It is also utilized to search among the nearby sensors and retrieve a
potential candidate sensor to be impacted. Then, the sensor pair <s0, si>, together with the corresponding dataset and the causality detection model, is used to identify whether the slowdown causality or the intervention causality exists from s0 to si. If the slowdown causality exists, there is no need to examine the intervention causality, because the impact of significant speed drops from traffic incidents is already covered by the definition of the slowdown causality. At the end of the causality detection, the sensor pairs (<s0, si>) holding the causality relationship can proceed to the next step, and the sensor pairs (<s0, si>) holding neither slowdown nor intervention causality are disregarded. In the former case, si is considered one of the impacted sensors that contribute to the spatial impact range. In the latter case, si is excluded from the spatial impact range caused by incident e. For sensor pairs (<s0, si>) holding the causality relationship, the next step is to select the most important time stamps (i.e., t + h, given that the accident occurs at t) to identify when si starts to become impacted. In the pipeline illustrated in Fig. 3, we resort to the lasso-Granger approach (Arnold et al. 2007) to achieve this step. Note that after lasso, we need to retrain the regressive model for predicting si based on the selected lag in s0. Finally, the real-time traffic speed data collected from s0 and the learned regressive model are utilized to predict the speed of si.
To enable real-time impact prediction, in Fig. 3 the causality detection and important variable selection steps need to be implemented off-line for every sensor pair on the road network. Because the training step of the regressive model and the lasso approach requires access to large amounts of archived traffic time series data, causality detection and important variable selection would significantly delay the online prediction process due to the training time involved. Hence, when a new incident occurs, the system searches within the off-line training results to identify whether causality exists between the corresponding sensor pairs and further retrieves the learned regressive model for the online traffic speed prediction for the sensor to be impacted.
Note that in the domains of social science and economics, causality models have already been widely applied (Pearl 1988; Glymour et al. 1987; Spirtes et al. 2001), many of which are superior to Granger causality in multivariate causality inference. However, for the impact problem, the Granger causality model is a better candidate for causality detection for the following reasons. First, in this study, the ultimate goal
is to enable the better prediction of the traffic time series in the presence of traffic incidents by taking advantage of the detected causality relationship. Revealing the complete causality relationship among all traffic data on the road network is not our focus. Compared with other causality inference models, the regressive model of Granger causality serves as a fairly effective predictor for time series data, such as traffic sensor data. Second, the accident impact prediction problem requires not only the identification of the causality but also the time lag of the causality (i.e., how much time needs to pass until a road segment's traffic gets impacted by a traffic incident). For the time series-based Granger causality model, such a time lag can be effectively learned through the model's learning process. However, for the graph-based causal inference models (e.g., Pearl (1988) and Glymour et al. (1987)), it is fairly difficult to learn such a temporal dependency in the detected causality. Finally, the existing literature focuses on predicting the impact of one traffic incident at a time. In particular, the one-to-one causality relationship detection between the traffic at the incident location and one other location is the major focus in transportation networks. It is entirely possible that traffic at different locations is causally dependent, and it may have more than one cause. However, such cases are barely useful for this problem, which is predicting the impact of a single cause (i.e., a particular traffic incident). Thus, Granger causality, even though it ignores multivariate dependencies, is particularly effective for this purpose.

Key Applications

Navigation Systems
The result of impact prediction can be applied in smart-routing applications in real time to help users avoid unexpected congestion. Specifically, when there is a traffic accident, the prediction result of the event impact, including the backlogs and the speed decrease caused by the traffic events, can also be utilized for the purpose of avoiding traffic congestion. To be more specific, consider another example illustrated in Fig. 4. In this figure, the caution mark, the directed solid red lines, and the dashed blue lines represent the incident
Accident Impact Prediction, Fig. 4 (a) Route calculated based on the current incident's impact. (b) Time-varying expansion of the impacted region as the driver approaches the incident location. (c) Route calculated based on accurate prediction of the impact
Aggregate Data: Geostatistical Solutions for Reconstructing Attribute Surfaces

Definition
data reported over arbitrary-shaped polygons or regular pixels. Attribute surfaces have appealing processing and interoperability characteristics, as they are amenable to spatial operations in GIS and they can be aggregated at arbitrary spatial resolutions for subsequent data integration purposes. This contribution provides an overview of geostatistical methods developed for the purpose of reconstructing attribute surfaces from aggregate (areal) data.

Historical Background

In the spatial analysis literature, the task of changing an attribute's geographical unit frame falls in the realm of areal interpolation (Haining 2003). It is customary in the literature to designate the known data and their corresponding measurement units as source data and source zones and, similarly, the unknown attribute values and measurement units as target values and target zones. When the target zones are infinitesimally small, i.e., points, areal interpolation is tantamount to surface creation. Surface reconstruction can be based either on point source data (the classical punctual spatial interpolation case) or on source data defined as aggregate values over regular pixels or irregular polygons, the problem of surface reconstruction addressed in this contribution (a particular case of downscaling).
The simplest (and earliest) form of surface reconstruction from aggregate data is the choropleth map, whereby all point attribute values within the same polygon receive the same value; see, for example, Haining (2003). Tobler's celebrated mass-preserving or pycnophylactic interpolation method aims at smoothing the patchy attribute surface corresponding to the choropleth map, by explicitly invoking a smoothness criterion for that surface subject to constraints of aggregation consistency or mass preservation (Tobler 1979). Ancillary data, e.g., land cover information in a population interpolation context, have also been accounted for in surface reconstruction, via dasymetric mapping or regression models, to better guide the redistribution of aggregate attribute values to finer resolutions while maintaining consistency with the aggregate data (Haining 2003). It should be noted here that, apart from statistical (regression-based) models for surface reconstruction, the reliability of the resulting target predictions is rarely reported, since most surface reconstruction methods are cast in a deterministic framework.
Geostatistics is a branch of spatial statistics, with origins in mining applications, that deals with the analysis of spatially distributed data (Journel and Huijbregts 1978). Geostatistical analytical methods appear in numerous and diverse scientific disciplines, ranging from geoinformatics, to earth sciences, to environmental and atmospheric sciences, as well as to socioeconomic applications. Geostatistical interpolation methods, i.e., Kriging and its variants, have historically addressed the exact same problem as areal interpolation. In particular, the concept of predicting attribute values at arbitrary blocks (in 3D) from known measurements defined over points or blocks, termed change of support, was one of the early selling points of geostatistics, particularly in mining applications, along with the assessment of uncertainty in the reported predictions (Journel and Huijbregts 1978). The problem of change of support has close connections with two celebrated issues in spatial analysis, namely, the modifiable areal unit problem (MAUP), which pertains to the effects of aggregation on the statistics of spatial attributes, and the ecological inference problem (EIP), which pertains to the inference of the statistics of disaggregate attribute values from aggregate data (Haining 2003).
The connection between geostatistical methods and areal interpolation, however, was until recently limited mostly to the application of punctual (point-to-point) Kriging and (point-to-)block Kriging (Haining 2003). It was only recently that several commonly used areal interpolation methods for surface reconstruction were formulated within a geostatistical (area-to-point Kriging) framework (Kyriakidis 2004). The remainder of this contribution provides an overview of geostatistical surface reconstruction methods from aggregate attribute data, with and
without ancillary information, highlights recent extensions, and discusses open problems and future directions.

Scientific Fundamentals

In its discrete approximation, surface reconstruction can be formulated within a general spatial prediction framework as the task of predicting the unknown entries of the (M x 1) target attribute vector y_t = [y(c_m), m = 1, ..., M]^T at a set of M point locations from the known entries of the (N x 1) source data vector y_s = [y(C_n), n = 1, ..., N]^T available at N source supports. Here, y(c_m) denotes the unknown attribute value at a target location with coordinate vector c_m, assumed representative of an elemental region around c_m for discrete integration purposes, and y(C_n) denotes the known attribute value pertaining to a source support defined as a polygon with vertex coordinates stored in matrix C_n; the superscript T denotes transposition. For simplicity and without loss of generality, it is assumed that the union of the N source polygons identifies the study region A; that is, the source polygons do not overlap and cover the study region completely. In addition, it is assumed that the M point locations provide an adequate approximation to a continuous surface, that is, an adequate discretization of the study region A, implying that M ≫ N.

Links Between Attribute Surface, Aggregate Data, and Their Statistics
Source data are defined via the aggregation of point attribute values within their respective supports. In particular, the aggregation procedure is specified as a weighted linear averaging of point values:

\[ y(C_n) \;=\; \sum_{m=1}^{M} g_n(c_m)\, y(c_m) \tag{1} \]

where g_n(c_m) denotes the known contribution of the point attribute value y(c_m) to the aggregate (source) data value y(C_n); that contribution is termed the sampling function or sampling kernel, and Eq. (1) constitutes a discrete convolution of point attribute values with the sampling kernel.
In its simplest form, the sampling kernel g_n(c_m) can attain a binary (0/1) value, indicating whether a particular target point c_m lies within a given source support C_n or not, accounting for the representative region of the target point. That indicator value can be divided by the measure (length, area, volume) of the source support, depending on whether the geospatial attributes undergoing transformation pertain to area averages (spatially intensive variables), e.g., population density or average income, or to area totals (spatially extensive variables), e.g., population counts or total income. More elaborate weighting schemes or sampling kernels can be defined, e.g., based on buffers or distance to roads or other geographical features in population density estimation applications, or based on a sensor's point-spread function in remote sensing applications.
In the above formulation, the source data vector y_s and the target attribute vector y_t, i.e., the discrete approximation of the sought-after attribute surface, are linked as

\[ \mathbf{y}_s \;=\; \mathbf{G}\,\mathbf{y}_t \tag{2} \]

where G = [g_n(c_m), n = 1, ..., N; m = 1, ..., M] denotes a (N x M) matrix of sampling function values; the n-th row of G consists of the M sampling function values for all point values within the source polygon C_n. Equation (2) contains the N measurement equations defining the N known source data, and matrix G can be regarded as a linear spatial aggregation operator. Note that the aggregation matrix G can accommodate both point and aggregate data. In other words, some elements of the source data vector y_s could pertain to point-support attribute values, known, for example, from fine-resolution surveys. In this case, some rows of the aggregation matrix G contain only one nonzero entry, corresponding to the locations of the point-level source attribute data.
In geostatistics, the spatial distribution of an attribute surface y is regarded as a realization of a random field model {Y(c), c ∈ A}, or its discrete counterpart, a random vector (Journel and Huijbregts 1978). In the second-order stationary case, that random field is parameterized by a constant mean μ_Y and a positive-definite covariogram model σ_Y(h; θ), specified as a decreasing parametric function of distance; here h denotes a lag vector between any two locations, and θ denotes a vector with the covariogram model parameters (range, sill, nugget). This implies that, in the discrete case, the target attribute surface y_t is characterized by a (M x 1) constant expectation (mean) vector μ_t = 1_t μ_Y = μ and a (M x M) covariance matrix Σ_tt = [σ_Y(c_m - c_m'; θ), m = 1, ..., M; m' = 1, ..., M] = Σ(θ), where σ_Y(c_m - c_m'; θ) denotes the covariance value pertaining to a location pair c_m and c_m', built from the point covariogram model σ_Y(h; θ), and 1_t denotes a (M x 1) vector of ones.
Being functionally linked to the unobserved attribute surface, the source (aggregate) data vector y_s is also a realization of a random vector, characterized by a (N x 1) expectation vector μ_s = Gμ and a (N x N) covariance matrix Σ_ss = [σ_Y(C_n, C_n'), n = 1, ..., N; n' = 1, ..., N] = G Σ(θ) G^T, where σ_Y(C_n, C_n') denotes the covariance value pertaining to a pair of supports C_n and C_n'. The two random vectors y_s and y_t are also correlated, with (N x M) (cross)covariance matrix Σ_st = [σ_Y(C_n, c_m), n = 1, ..., N; m = 1, ..., M] = G Σ(θ), where σ_Y(C_n, c_m) denotes the covariance value pertaining to a polygon-point pair C_n and c_m. When the entries of the point-level covariance matrix Σ(θ) are computed using a positive-definite covariogram model σ_Y(h; θ), both covariance matrices Σ_st and Σ_ss are positive definite. Note that in the case of irregular supports, e.g., polygons, second-order stationarity cannot be assumed for the statistics of the source data, even if that assumption is reasonable for the statistics of the underlying attribute surface; this is a consequence of the spatially varying characteristics of aggregation.
In practical applications of surface reconstruction, one has access to the aggregate data and not to the underlying attribute surface. This implies that the point-level covariogram model must be inferred from the aggregate data. In particular, the inference of a point covariogram model σ_Y(h; θ) from aggregate data is termed covariogram deconvolution (or deregularization) and constitutes an ill-posed (under-determined) inverse problem, as is the ecological inference problem; some proposals for possible solutions to such an inference objective are offered by Kyriakidis (2004) and Goovaerts (2008). In what follows, it is assumed that the functional form and parameters of such a point-level covariogram model σ_Y(h; θ) have been inferred, and the resulting model is used to construct all necessary covariance matrices Σ_tt, Σ_st, and Σ_ss.

Surface Reconstruction Using Aggregate Data Only
When the expectation vector μ_t of the point attribute values (hence the expectation vector μ_s of the source data) is known, surface reconstruction can be performed via simple Kriging (SK). In particular, the (M x 1) vector ŷ_t = [ŷ(c_m), m = 1, ..., M]^T of SK predictions for the unknown target attribute values is expressed as

\[ \hat{\mathbf{y}}_t \;=\; \boldsymbol{\mu}_t \;+\; \mathbf{W}^{T}\,\bigl(\mathbf{y}_s - \boldsymbol{\mu}_s\bigr) \tag{3} \]

where W is a (N x M) matrix of SK weights; the m-th column of matrix W contains the N weights applied to the N source data for computing the target prediction ŷ(c_m) at location c_m.
In the formulation above, all N source data are considered for predicting any target attribute value y(c_m), a procedure termed global interpolation. Local variants of spatial interpolation amount to considering only a subset N' < N of the source data for prediction. In the isotropic case, this subset is typically limited to a circular neighborhood centered at the target location c_m; the neighborhood radius is linked to the range of the point-level covariogram model σ_Y(h; θ). In what follows, the discussion pertains to the global interpolation case, unless otherwise noted.
The SK weights of Eq. (3) are determined by
that the statistics and . / of the underly- solving a (N N ) system of (normal) equations,
ing surface must be inferred from those of the termed the simple Kriging (SK) system:
Aggregate Data: Geostatistical Solutions for Reconstructing Attribute Surfaces 53
a constant but unknown local attribute mean at small distances, particularly the nugget effect
Y .cm / is assumed for the target location cm and contribution, can only be indirectly (if at all)
all locations within the N 0 < N source supports estimated, since any information at resolutions
considered within the search neighborhood smaller than the source supports is lost due to
around cm . This amounts to the assumption of aggregation. This implies that surface reconstruc-
intrinsic stationarity, a weaker assumption than tion from aggregate data can only be achieved
second-order stationarity, whereby (i) the point- in this case after invoking, explicitly or implic-
level attribute mean is assumed locally (within itly, assumptions regarding the point semivari-
each search neighborhood) constant, and (ii) a ogram model corresponding to the underlying
more general distance-based metric of spatial attribute surface. The work of Kyriakidis (2004)
association (dissimilarity), the semivariogram demonstrated that several commonly used areal
function, can be de ned even in cases (in nite interpolation methods for surface reconstruction
attribute variance) where the covariogram cannot from aggregate data can be actually formulated
(Journel and Huijbregts 1978). That point- as particular cases of (area-to-point) Kriging un-
level local attribute mean Y .cm / is implicitly der very particular point semivariogram mod-
estimated, in conjunction with the OK weights, els. In particular, it was demonstrated that (a)
using a local version of the OK system of Eq. (7) the choropleth map corresponds to area-to-point
from the N 0 source data within each search Kriging with a white-noise (pure-nugget effect)
neighborhood. point semivariogram model, (b) kernel smoothing
No matter the formulation (SK or OK) methods often do not explicitly account for the
adopted, it should be stressed that surface aggregate nature of source data, and (c) Tobler s
reconstruction from aggregate data is an under- pycnophylactic interpolation (Tobler 1979) cor-
determined (ill-posed) inverse problem, as is the responds to area-to-point Kriging with a logarith-
classical problem of surface construction from mic point semivariogram model (in 2D); this was
point measurements via spatial interpolation. also shown in practice by Yoo et al. (2010).
In other words, there are multiple alternative Several extensions and improvements of the
surfaces that could be de ned at the point level, original formulation of geostatistical surface re-
all of which could be consistent with (reproduce) construction from aggregate data have been pro-
the available source data; such surfaces constitute posed in the literature. In particular, Yoo and Kyr-
solutions to the inverse problem of surface iakidis (2006) incorporated nonnegativity con-
reconstruction. In both cases (aggregate or not straints in the formulation of area-to-point Krig-
source data), what is required is a (prior) model ing, Guan et al. (2011) proposed ef cient numer-
of attribute spatial structure at the ne (target) ical methods based on the fast Fourier transform
resolution to resolve the inherent ambiguity of the for evaluating the source-to-source G . /GT
ill-posed inverse problem and render it solvable. and source-to-target G st . / covariance inte-
In geostatistics, that prior structural information grals involved in all Kriging systems, whereas
is explicitly speci ed in terms of a (typically Nagle (2010) incorporated measurement error
parametric) semivariogram (or covariogram) in the source data by via area-to-point factorial
model that characterizes the spatial variability or Kriging. In this latter case, factorial Kriging pre-
smoothness of the unobserved attribute surface. dictions do not reproduce the aggregate source
Such semivariogram models can range from data, since such data are deemed error prone,
pure-nugget effect models, indicative of an and the resulting surface is smoother than the
ultimately rough (random) attribute surface, to one computed via area-to-point simple or ordi-
models with extremely large range and no nugget nary Kriging. Last, Goovaerts (2006) developed
contribution, indicative of an extremely smooth a variant of area-to-point Kriging, termed Pois-
surface (Journel and Huijbregts 1978). son Kriging, capable of accounting for aggregate
When only aggregate source data are avail- source data following a non-Gaussian distribu-
able, the shape of the point semivariogram model tion.
Aggregate Data: Geostatistical Solutions for Reconstructing Attribute Surfaces 55
No matter the effort put into ameliorating sur- errors. Equation (9) implies that the area-level
face reconstruction with better or more realistic regression coef cients, e.g., s , are the same with
predictors, however, the nal attribute surface those of the point level t ; the reason behind this A
re ects the information content of the aggregate resolution invariance is the fact that the point-
source data. For a given attribute surface, the level values of the dependent variable Y and of
larger the aggregation extent, the less informative the predictors X are subjected to the same linear
the aggregate source data are. Surface reconstruc- aggregation encapsulated in matrix G.
tion thus becomes more realistic and more accu- Under the above linear model, surface recon-
rate as long as reconstruction methods are able struction is achieved via Kriging with external
to incorporate auxiliary geospatial information drift (KED) (Sales et al. 2013). In particular,
available at ne spatial resolutions, particularly the (M 1) vector yO KD t of KED predictions is
at the point support level. expressed as:
coherence property of Eq. (5) applies also to the attribute value. That uncertainty is quanti ed by
vector of KED-derived predictions yO KEDt (Sales the prediction error variance at each target loca-
et al. 2013). tion cm , taking into account (conditional on) the
Surface reconstruction via KED can also be con guration of the source supports, the point-
performed in a local interpolation mode, whereby level covariogram model Y .hI /, as well as the
a new linear regression model similar to Eq. (8) aggregate nature of the source data encapsulated
is postulated for the N 0 < N source supports in the aggregation matrix G.
considered within the search neighborhood cen- For the case of SK, the prediction error vari-
tered at a target location cm . The vector t of ance O Y .cm / at location cm can be derived from
point-level regression coef cients is implicitly the (M M ) SK prediction error covariance ma-
estimated, in conjunction with the KED weights, trix O t t D O Y .cm ; cm0 /, m D 1; : : : ; M; m0 D
using a local version of the KED system of 1; : : : ; M , with O Y .cm ; cm0 / denoting the condi-
Eq. (11) from the N 0 source data within each tional covariance value for a location pair cm and
search neighborhood. cm0 , expressed as:
A rather restrictive requirement of area-to-
point KED is that aggregate data of both the O tt D tt WT st
dependent variable Y and the K 1 independent
variables X pertaining to the same support D . / . /G G . /GT 1
G . /
Cn be all de ned using the same aggregation (12)
mechanism, since the sampling function gn .cm /
does not depend on any particular variable. where the M entries on the diagonal of matrix
When this requirement is not satis ed, surface O t t correspond to the SK prediction error vari-
reconstruction can be achieved via area-to- ances at the M target locations; such conditional
point coKriging and its variants accounting variance values represent the uncertainty in the
for a spatial varying attribute mean (Atkinson target predictions and are typically mapped along
et al. 2008). CoKriging weights furnish the with the SK-derived attribute surface of Eq. (3).
contribution of each source datum value, be it When the attribute expectation vector t is un-
of the dependent variable Y or of an auxiliary known and linked to auxiliary data via the regres-
variable Xk , to the target prediction as a function sion model of Eq. (8), the corresponding predic-
of both point-level and regularized (aggregate- tion error variance O YKD .cm / at location cm can
level) auto- and cross-covariogram values. The be derived from the (M M ) SK prediction error
solution of the corresponding coKriging system covariance matrix O KD tt D O YKD .cm ; cm0 /; m D
of equations calls for a permissible (positive- 1; : : : ; M; m D 1; : : : ; M , with O YKD .cm ; cm0 /
0
de nite) joint model, e.g., the linear model of denoting the KED-derived conditional covariance
coregionalization (Journel and Huijbregts 1978), value for a location pair cm and cm0 , expressed as
for all point-level auto- and cross-covariograms (Sales et al. 2013):
de ned between all pairs of variables involved.
Although surface reconstruction based on area- O KD
tt D .Q/ . Q /G G . Q /GT 1
G .Q/
to-point coKriging is more exible than KED-
T
based reconstruction, it requires parameter C KD Xt (13)
estimation for a signi cantly larger number of
point-level covariogram models, thus increasing where term TKD Xt represents the increase (with
considerably the required inference effort. respect to SK) in prediction uncertainty brought
by the fact that the attribute expectation vector t
Uncertainty in Surface Reconstruction (hence s ) assumed known in the SK formulation
Kriging is a stochastic surface reconstruction of Eq. (3) is now implicitly estimated by KED.
method and as such provides an estimate of In the multivariate Gaussian case, the area-
uncertainty or reliability for each predicted point to-point Kriging prediction and the associated
Aggregate Data: Geostatistical Solutions for Reconstructing Attribute Surfaces 57
prediction error variance furnish the parameters ing and its variants, as well as ATP stochastic
of a local Gaussian probability distribution of simulation, has been employed in several elds,
possible attribute values given the aggregate ranging from remote sensing and geoinformation A
source data, possibly including data on to environmental science, population mapping, as
relevant auxiliary variables used for spatial well as public health. In terms of remote sens-
prediction (in the case of KED), as well as ing applications, ATP Kriging and ATP coKrig-
the particular point-level covariogram model ing have been extensively used for downscaling
adopted. That probability distribution can be moderate resolution imaging spectroradiometer
used for propagating (either analytically or (MODIS) data to ner spatial resolutions; see,
numerically through statistical simulation) the for example, Atkinson et al. (2008), Sales et al.
local uncertainty in interpolated attribute values (2013), and Wang et al. (2015), as well as Truong
to quantify uncertainty in the results of local GIS et al. (2014) who extended ATP Kriging and
operations involving one location at a time. simulation to account for expert knowledge when
Reconstructed attribute surfaces, however, of- downscaling MODIS temperature pro le data.
ten undergo spatial operations involving multiple These applications showcase the great poten-
locations at a time, e.g., gradient computations or tial of geostatistical surface reconstruction when
other focal or zonal operations in a GIS. In such combined with MODIS data for a wide variety
cases, however, knowledge of the local Kriging of environmental monitoring purposes, such as
attribute prediction and variance at a set of target global deforestation mapping.
locations, considered one at a time, is not ade- In terms of soil science applications, Kerry
quate for such a multiple-point uncertainty anal- et al. (2012) employed ATP Kriging to disaggre-
ysis task. The preferred means for uncertainty gate legacy soil data for mapping soil organic
propagation in this case is surface reconstruction carbon, and Horta et al. (2014) applied ATP
via geostatistical simulation (Kyriakidis and Yoo stochastic simulation for mapping soil hydraulic
2005). As stated before, attribute surface recon- properties integrating measurements of different
struction from (aggregate or point) source data is spatial resolutions. Last, geostatistical surface re-
an under-determined inverse problem, which can construction has been also employed in socioe-
be rendered solvable once a covariogram model is conomic applications. In particular, Goovaerts
postulated or inferred for the underlying attribute (2012) employed ATP binomial Kriging for map-
surface (when using SK and OK) or for the ping cancer mortality risk while accounting for
regression error surface (when using KED). Even different levels of spatial aggregation and for non-
when such point-level statistics related to surface Gaussian distribution of the aggregate data, Liu
smoothness have been inferred, multiple plausi- et al. (2008) applied ATP Kriging to the residuals
ble solutions exist to the stochastic surface recon- of a regression model linking urban population
struction inverse problem, all sharing the same data from census units to land-use zones, Yoo and
covariogram model and being consistent with Kyriakidis (2009) applied ATP Kriging for down-
the aggregate source data. Surface reconstruction scaling housing prices within a hedonic pricing
via geostatistical simulation can be regarded as model framework, and Nagle (2010) applied ATP
the procedure of generating or exploring such factorial Kriging to predict employment density
alternative attribute surfaces, thus furnishing mul- from aggregate data.
tiple solutions to the stochastic disaggregation
inverse problem (Kyriakidis and Yoo 2005).
Future Directions
data are available. Although this requirement multiple-point geostatistics are increasingly used
might seem problematic at rst sight, it explicates as geostatistical downscaling methods.
the subjective decisions made at the surface It is expected that as more of these develop-
(point) level by existing methods for surface ments nd their way into commercial or open-
reconstruction. Explicit model speci cation at source GIS software, geostatistical surface re-
the point level is more exible and creates construction from aggregate data will become an
more opportunities for interdisciplinary problem- even more popular downscaling method across
solving than downscaling relying on somewhat multiple disciplines.
arbitrary decisions invoked implicitly by
traditional surface reconstruction methods. More
research is thus required to develop guidelines Recommended Reading
for selecting appropriate models of point-
level spatial correlation for selected classes of Atkinson PM, Pardo-Igœzquiza E, Chica-Olmo M (2008)
downscaling problems, e.g., depending on the Downscaling cokriging for super-resolution mapping
of continua in remotely sensed images. IEEE Trans
particular attribute surface being reconstructed Geosci Remote Sens 46:573 580
and/or the particular region or environment where Goovaerts P (2006) Geostatistical analysis of disease data:
the aggregate source data are available. accounting for spatial support and population density
The inclusion of time as an additional data and in the isopleth mapping of cancer mortality risk using
area-to-point Poisson kriging. Int J Health Geogr 5:52
modeling component has been one of the major Goovaerts P (2008) Kriging and semivariogram deconvo-
areas of development in geostatistics during the lution in the presence of irregular geographical units.
last decade. Several space-time semivariogram Math Geosci 40:101 128
functions have been proposed in the literature for Goovaerts P (2012) Geostatistical analysis of health data
with different levels of spatial aggregation. Spat Spa-
modeling joint attribute variation in a spatiotem- tiotemporal Epidemiol 3:83 92
poral context. In addition, space-time semivari- Guan Q, Kyriakidis PC, Goodchild MF (2011) A parallel
ogram functions derived from analytical solutions computing approach to fast geostatistical areal inter-
of partial differential equations have also been polation. Int J Geogr Inf Sci 25:1241 1267
Haining R (2003) Spatial data analysis: theory and prac-
developed to account for the dynamic evolution tice. Cambridge University Press, Cambridge
of spatiotemporal processes. Such models can Horta A, Pereira MJ, Gon alves M, Ramos T, Soares A
furnish the much sought-after, yet often elusive, (2014) Spatial modelling of soil hydraulic properties
point-level semivariogram function required for integrating different supports. J Hydrol 51:1 9
Journel AG, Huijbregts CJ (1978) Mining geostatistics.
geostatistical surface reconstruction, as well as Academic, London
infuse process-based expert or prior knowledge Kerry R, Goovaerts P, Rawlings BG, Marchant BP (2012)
in the downscaling procedure. Disaggregation of legacy soil data using area to point
Another recent development in geostatistics kriging for mapping soil organic carbon at the regional
scale. Geoderma 170:347 358
is that of multiple-point geostatistics, whereby Kyriakidis PC (2004) A geostatistical framework for area-
attribute spatial patterns involving more than two to-point spatial interpolation. Geogr Anal 36:259 389
points at a time (a semivariogram is a two-point Kyriakidis PC, Yoo E-H (2005) Geostatistical prediction
statistic) are learned from training images (Ma- and simulation of point values from areal data. Geogr
Anal 37:124 151
riethoz and Caers 2014). Such images could be Liu X, Kyriakidis PC, Goodchild MF (2008) Population
constructed from remotely sensed images or even density estimation using regression and area-to-point
attribute surfaces stemming from numerical mod- residual Kriging. Int J Geogr Inf Sci 22:431 447
els of physical or social processes and represent Mariethoz G, Caers J (2014) Multiple-point geostatis-
tics: stochastic modeling with training images. Wiley-
prior (before aggregate data acquisition) repos- Blackwell, Chichester
itories of spatial patterns. These learned spatial Nagle NN (2010) Geostatistical smoothing of areal data:
patterns provide more realistic models of spa- mapping employment density with factorial Kriging.
tial heterogeneity and complexity than paramet- Geogr Anal 42:99 117
Sales MHR, Sousa CM Jr, Kyriakidis PC (2013) Fusion
ric (or nonparametric) semivariogram functions. of MODIS images using Kriging with external drift.
Fine spatial resolution training images along with IEEE Trans Geosci Remote Sens 51:2250 2259
Aggregate Queries, Progressive Approximate 59
Tobler WR (1979) Smooth pycnophylactic interpolation and in reasonable time. Alternatively, the precise
for geographical regions. J Am Stat Assoc 74:519 530 value of the aggregate may not even be needed
Truong PN, Heuvelink GBM, Pebezma E (2014) Bayesian
area-to-point kriging using expert knowledgeas infor-
by the application submitting the query, e.g., if A
mative priors. Int J Appl Earth Obs Geoinf 30:128 138 the aggregate value is to be mapped to an 8-bit
Wang Q, Shi W, Atkinson PM, Zhao Y (2015) Down- color code for visualization. Hence, this moti-
scaling MODIS images with area-to-point regression vates the use of approximate aggregate queries,
kriging. Remote Sens Environ 166:191 204
Yoo E-H, Kyriakidis PC (2006) Area-to-point Kriging
which return a value close to the exact one, but at
with inequality-type data. J Geogr Syst 8:357 390 a fraction of the time.
Yoo E-H, Kyriakidis PC (2009) Area-to-point Kriging in Progressive approximate aggregate queries go
spatial hedonic pricing models. J Geogr Syst 11:381 one step further. They do not produce a single
406
Yoo E-H, Kyriakidis PC, Tobler W (2010) Reconstructing
approximate answer, but continuously re ne the
population density surfaces from areal data: a compar- answer as time goes on, progressively improving
ison of Tobler s pycnophilactic interpolation method its quality. Thus, if the user has a xed deadline,
and area-to-point Kriging. Geogr Anal 42:78 98 he can obtain the best answer within the allotted
time; conversely, if he has a xed answer accu-
racy requirement, the system will use the least
amount of time to produce an answer of suf cient
Aggregate Nearest Neighbor
accuracy. Thus, progressive approximate aggre-
Queries
gate queries are a exible way of implementing
aggregate query answering.
Variations of Nearest Neighbor Queries in Eu-
Multi-Resolution Aggregate trees (MRA-
clidean Space
trees) are spatial or in general multi-
dimensional indexing data structures, whose
nodes are augmented with aggregate values for
Aggregate Queries, Progressive all the indexed subsets of data. They can be used
Approximate very ef ciently to provide an implementation of
progressive approximate query answering.
Iosif Lazaridis and Sharad Mehrotra
Department of Computer Science, University of
California, Irvine, CA, USA Historical Background
temperature is 34:12 C, but 34 0:5 C will on the subset of interest without having to process
suf ce; second, the dataset may be so large a great number of tuples individually. Moreover,
that exhaustive computation may be infeasible. MRA-trees provide deterministic answer quality
These observations motivated researchers to guarantees to the user that are easy for him
devise approximate aggregate query answering to prescribe (when he poses his query) and to
mechanisms. interpret (when he receives the results).
Off-line synopsis based strategies, such as his-
tograms (Ioannidis and Poosala 1999), samples
(Acharya et al. 1999), and wavelets (Chakrabarti Scientific Fundamentals
et al. 2000) have been proposed for approx-
imate query processing. These use small data Multi-dimensional index trees such as R-trees,
summaries that can be processed very easily to quad-trees, etc., are used to index data exist-
answer a query at a small cost. Unfortunately, ing in a multi-dimensional domain. Consider a
summaries are inherently unable to adapt to the d-dimensional space Rd and a nite set of points
query requirements. The user usually has no way (input relation) S Rd . Typically, for spatial
of knowing how good an approximate answer is applications, d 2 f2; 3g. The aggregate query
and, even if he does, it may not suf ce for his is de ned as a pair (agg, RQ ) where agg is
goals. Early synopsis based techniques did not an aggregate function (e.g., MIN, MAX, SUM,
provide any guarantees about the quality of the AVG, COUNT) and RQ Rd is the query
answer, although this has been incorporated more region. The query asks for the evaluation of agg
recently (Garofalakis and Kumar 2005). over all tuples in S that are in region RQ . Multi-
Online aggregation (Hellerstein et al. 1997) dimensional index trees organize this data via a
was proposed to deal with this problem. In online hierarchical decomposition of the space Rd or
aggregation, the input set is sampled continu- grouping of the data in S . In either case, each
ously, a process which can, in principle, continue node N indexes a set of data tuples contained in
until this set is exhausted, thus providing an an- its subtree which are guaranteed to have values
swer of arbitrarily good quality; the goal is, how- within the node s region RN .
ever, to use a sample of small size, thus saving MRA-trees (Lazaridis and Mehrotra 2001) are
on performance while giving a good enough generic data techniques that can be applied over
answer. In online aggregation, a running aggre- any standard multi-dimensional index method;
gate is updated progressively, nally converging they are not yet another indexing technique. They
to the exact answer if the input is exhausted. The modify the underlying index by adding the value
sampling usually occurs by sampling either the of the agg over all data tuples indexed by (i.e.,
entire data table or a subset of interest one tuple in the sub-tree of) N to each tree node N . Only
at a time; this may be expensive, depending on a single such value, e.g., MIN, may be stored,
the size of the table, and also its organization: but in general, all aggregate types can be used
if tuples are physically ordered in some way, without much loss of performance. An example
then sampling may need to be performed with of an MRA-quad-tree is seen in Fig. 1.
random disk accesses, which are costiercompared The key observation behind the use of MRA-
to sequential accesses. trees is that the aggregate value of all the tuples
Multi-resolution trees (Lazaridis and Mehrotra indexed by a node N is known by just visiting N .
2001) were designed to deal with the limita- Thus, in addition to the performance bene t of
tions of established synopsis-based techniques a standard spatial index (visiting only a fraction
and sampling-based online aggregation. Unlike of selected tuples, rather than the entire set), the
off-line synopses, MRA-trees are exible and MRA-tree also avoids traversing the entire sub-
can adapt to the characteristics of the user s tree of nodes contained within the query region.
quality/time requirements. Their advantage over Nodes that partially overlap the region may or
sampling is that they help queries quickly zero in may not contribute to the aggregate, depending
Aggregate Queries, Progressive Approximate 61
on the spatial distribution of points within them. The progressive approximation algorithm
Such nodes can be further explored to improve (Fig. 3) has three major components:
performance. This situation is seen in Fig. 2:
nodes at the perimeter of the query (set Np ) can Computation of a deterministic interval of
be further explored, whereas nodes at the interior confidence guaranteed to contain the aggre-
(Nc ) need not be. gate value, e.g., [30, 40].
62 Aggregate Queries, Progressive Approximate
Estimation of the aggregate value, e.g., 36.2. example, if the SUM of all contained nodes is 50
A traversal policy which determines which and the SUM of all partially overlapping nodes is
node to explore next by visiting its children 15, then the interval is [50, 65] since all the tuples
nodes. in the overlapping nodes could either be outside
or inside the query region.
The interval of con dence can be calculated There is no single best way for aggregate value
by taking the set of nodes partially overlap- estimation. For example, taking the middle of
ping/contained in the query into account (Fig. 2). the interval has the advantage of minimizing the
The details of this for all the aggregate types can worst-case error. On the other hand, intuitively, if
be found in Lazaridis and Mehrotra (2001). For a node barely overlaps with the query, then it is
Aggregate Queries, Progressive Approximate 63
0.8
0.6
0.4
0.2
0
0 100 200 300 400 500 600
# MRA-tree Nodes Visited
expected that its overall contribution to the query lectivity affects processing speed; like all multi-
will be slight. Thus, if in the previous example dimensional indexes, performance degrades as a
there are two partially overlapping nodes, A and higher fraction of the input table S is selected.
B, with SUM(A) D 5 and SUM(B) D 15, and However, unlike traditional indexes, the degrada-
30% of A and 50% of B overlaps with the query tion is more gradual since the interior area of
respectively, then a good estimate of the SUM the query region is not explored. A typical pro le
aggregate will be 50 C 5 0:3 C 15 0:5 D 59. of answer error as a function of the number of
Finally, the traversal policy should aim to nodes visited can be seen in Fig. 4.
shrink the interval of con dence by the great- MRA-trees use extra space (to store the
est amount, thus improving the accuracy of the aggregates) in exchange for time. If the
answer as fast as possible. This is achieved by underlying data structure is an R-tree, then
organizing the partially overlapping nodes using storage of aggregates in tree nodes results in
a priority queue. The queue is initialized with the decreased fanout since fewer bounding rectangles
root node and subsequently the front node of the and their accompanying aggregate values
queue is repeatedly picked, its children examined, can be stored within a disk page. Decreased
the con dence interval and aggregate estimate is fanout may imply increased height of the tree.
updated, and the partially overlapping children Fortunately, the overhead of aggregate storage
are placed in the queue. Our example may show does not negatively affect performance since it
the preference to explore node B before A since it is counter-balanced by the bene ts of partial
contributed more (15) to the uncertainty inherent tree exploration. Thus, even for computing the
in the interval of con dence than B (5). Detailed exact answer, MRA-trees are usually faster than
descriptions of the priority used for the different regular R-trees and the difference grows even if a
aggregate types can be found in Lazaridis and small error, e.g., in the order of 10%, is allowed
Mehrotra (2001). (Fig. 5).
Performance of MRA-trees depends on both
the underlying data structure used as well as
the aggregate type and query selectivity. MIN Key Applications
and MAX queries are typically evaluated very
ef ciently since the query processing system uses Progressive approximate aggregate queries using
the node aggregates to quickly zero in on a a multi-resolution tree structure can be used in
few candidate nodes that contain the minimum many application domains when data is either
value; very rarely is the entire perimeter needed large, dif cult to process, or the exact answer is
to compute even the exact answer. Query se- not needed.
64 Aggregate Queries, Progressive Approximate
15
10
0
0 2 4 6 8 10 12
Spatial Query Selectivity (% space)
Recommended Reading
Aggregation
Acharya S, Gibbons P, Poosala V, Ramaswamy S (1999)
Joint synopses for approximate query answering. In:
SIGMOD 99: proceedings of the 1999 ACM SIG- Hierarchies and Level of Detail
MOD international conference on management of
data. ACM Press, New York, pp 275 286
Chakrabarti K, Garofalakis MN, Rastogi R, Shim K
(2000) Approximate query processing using wavelets. Aggregation Query, Spatial
In: VLDB 00: proceedings of the 26th international
conference on very large data bases. Morgan Kauf-
mann, San Francisco, pp 111 122 Donghui Zhang
Garofalakis M, Kumar A (2005) Wavelet synopses for College of Computer and Information Science,
general error metrics. ACM Trans Database Syst Northeastern University, Boston, MA, USA
30(4):888 928
Hellerstein JM, Haas PJ, Wang HJ (1997) Online aggrega-
tion. In: SIGMOD 97: proceedings of the 1997 ACM
SIGMOD international conference on management of
data. ACM Press, New York, pp 171 182 Synonyms
Ioannidis YE, Poosala V (1999) Histogram-based approx-
imation of set-valued query-answers. In: VLDB 99: Spatial Aggregate Computation
proceedings of the 25th international conference on
very large data bases. Morgan Kaufmann, San Fran-
cisco, pp 174 185
Lazaridis I, Mehrotra S (2001) Progressive approximate Definition
aggregate queries with a multi-resolution tree struc-
ture. In: SIGMOD 01: proceedings of the 2001 ACM Given a set O of weighted point objects and a
SIGMOD international conference on management of
data. ACM Press, New York, pp 401 412 rectangular query region r in the d-dimensional
Lazaridis I, Mehrotra S (2004) Approximate selection space, the spatial aggregation query asks the to-
queries over imprecise data. In: ICDE 04: proceedings tal weight of all objects in O which are contained
of the 20th international conference on data engineer-
in r.
ing, Washington, DC. IEEE Computer Society
Porkaew K, Lazaridis I, Mehrotra S, Winkler R (2001) This query corresponds to the SUM aggrega-
Database support for situational awareness. In: Vassil- tion. The COUNT aggregation, which asks for
iop MS, Huang TS (eds) Computer-science handbook the number of objects in the query region, is a
for displays summary of ndings from the Army Re-
special case when every object has equal weight.
search Lab s advanced displays & interactive displays
federated laboratory. Rockwell Scienti c Company The problem can actually be reduced to a
special case, called the dominance-sum query. An
object o1 dominates another object o2 if o1 has
larger value in all dimensions. The dominance-
Recommended Reading sum query asks for the total weight of objects
Karras P, Mamoulis N (2005) One-pass wavelet synopses
dominated by a given point p. It is a special case
for maximum-error metrics. In: VLDB 05: proceed- of the spatial aggregation query, when the query
ings of the 31st international conference on very large region is described by two extreme points: the
data bases, Trondheim. VLDB Endowment, pp 421 lower-left corner of space and p.
432
Lenz HJ, Jurgens M (1998) The Ra -tree: an improved
The spatial aggregation query can be reduced
r-tree with materialized data for supporting range to the dominance-sum query in the 2D space, as
queries on olap-data. In: DEXA workshop, Vienna illustrated below. Given a query region r (a 2D
66 Aggregation Query, Spatial
rectangle), let the four corners of r be low- To externalize an internal memory data structure,
erleft, upperleft, lowerright, and upperright. It a widely used method is to augment it with block-
is not hard to verify that the spatial aggregate access capabilities (Vitter 2001). Unfortunately,
regarding to r is equal to this approach is either very expensive in query
cost, or very expensive in index size and update
d omi nancesum.upperright / cost.
d omi nancesum.lowerright / Another approach to solve the spatial aggre-
gation query is to index the data objects with
d omi nancesum.upperlef t /
a multidimensional access method like the R -
C d omi nancesum.lowerlef t /
tree (Beckmann et al. 1990). The R -tree (and
the other variations of the R-tree) clusters nearby
objects into the same disk page. An index entry
Historical Background is used to reference each disk page. Each index
entry stores the minimum bounding rectangle
In computational geometry, to answer the (MBR) of objects in the corresponding disk page.
dominance-sum query, an in-memory and static The index entries are then recursively clustered
data structure called the ECDF-tree (Bentley based on proximity as well. Such multidimen-
1980) can be used. The ECDF-tree is a multi-level sional access methods provide ef cient range
data structure, where each level corresponds to query performance in that subtrees whose MBRs
a different dimension. At the rst level (also do not intersect the query region can be pruned.
called main branch), the d -dimensional ECDF- The spatial aggregation query can be reduced to
tree is a full binary search tree whose leaves the range search: retrieve the objects in the query
store the data points, ordered by their position region and aggregate their weights on the y.
in the rst dimension. Each internal node of this Unfortunately, when the query region is large, the
binary search tree stores a border for all the query performance is poor.
points in the left subtree. The border is itself An optimization proposed by Lazaridis and
a (d -1)-dimensional ECDF-tree; here points Mehrotra (2001) and Papadias et al. (2001) is
are ordered by their positions in the second to store, along with each index entry, the total
dimension. The collection of all these border weight of objects in the referenced subtree. The
trees forms the second level of the structure. index is called the aggregate R-tree, or aR-tree
Their respective borders are (d -2)-dimensional in short. Such aggregate information can improve
ECDF-trees (using the third dimension and so the aggregation query performance in that if the
on). To answer a dominance-sum query for point query region fully contains the MBR of some
p D .p1 ; : : : ; pd /, the search starts with the index entry, the total weight stored along with
root of the rst level ECDF-tree. If p1 is in the index entry contributes to the answer, while
the left subtree, the search continues recursively the subtree itself does not need to be examined.
on the left subtree. Otherwise, two queries are However, even with this optimization, the query
performed, one on the right subtree and the other effort is still affected by the size of the query
on the border; the respective results are then region.
added together.
In the elds of GIS and spatial databases, one Scientific Fundamentals
seeks for disk-based and dynamically updateable
index structures. An approach is to externalize This section presents a better index for the
and dynamize the ECDE-tree. To dynamize a dominance-sum query (and in turn the spatial
static data structure, some standard techniques aggregation query) called the Box-Aggregation
can be used (Chiang and Tamassia 1992), for ex- Tree, or BA-tree in short.
ample, the global rebuilding (Overmars 1983) or The BA-tree is an augmented k-d-B-tree
the logarithmic method (Bentley and Saxe 1980). (Robinson 1981). The k-d-B-tree is a disk-based
Aggregation Query, Spatial 67
index structure for multidimensional point points contained in F.box; (2) the points domi-
objects. Unlike the R-tree, the k-d-B-tree indexes nated by the low point of F (in the shadowed
the whole space. Initially, when there are only region of Fig. 1a); (3) the points below the lower A
a few objects, the k-d-B-tree uses a single disk edge of F.box (Fig. 1b); and (4) the points to the
page to store them. The page is responsible for left of the left edge of F.box (Fig. 1c).
the whole space in the sense that any new object, To compute the dominance-sum for points in
wherever it is located in space, should be inserted the rst group, a recursive traversal of subtree(F )
to this page. When the page over ows, it is split is performed. For points in the second group,
into two using a hyperplane corresponding to a in record F a single value (called subtotal) is
single dimension. For instance, order all objects kept, which is the total value of all these points.
based on dimension one and move the half of For computing the dominance-sum in the third
the objects with larger dimension-one values to group, an x-border is kept in F which contains
a new page. Each of these two disk pages is the x positions and values of all these points.
referenced by an index entry, which contains a This dominance-sum is then reduced to a 1D
box: the space the page is responsible for. The dominance-sum query for the border. It is then
two index entries are stored in a newly created suf cient to maintain these x positions in a 1D
index page. As more split happens, the index BA-tree. Similarly, for the points in the fourth
page contains more index entries. group, a y-border is kept which is a 1D BA-tree
For ease of understanding, let s focus the dis- for the y positions of the group s points.
cussion on the 2D space. Figure 1 shows an To summarize, the 2D BA-tree is a k-d-B-
exemplary index page of a BA-tree in the 2D tree where each index record is augmented with
space. As in the k-d-B-tree, each index record a single value subtotal and two 1D BA-trees
is associated with a box and a child pointer. The called x-border and y-border, respectively. The
boxes of records in a page do not intersect and computation for a dominance-sum query at point
their union creates the box of the page. p starts at the root page R. If R is an index node,
As done in the ECDE-tree, each index record it locates the record r in R whose box contains p.
in the k-d-B-tree can be augmented with some A 1D dominance-sum query is performed on the
border information. The goal is that a dominance- x-border of r regarding p:x. A 1D dominance-
sum query can be answered by following a sin- sum query is performed on the y-border of r
gle subtree (in the main branch). Suppose in regarding p:y. A 2D dominance-sum query is
Fig. 1a, there is a query point contained in the performed recursively on page(r.child). The nal
box of record F . The points that may affect the query result is the sum of these three query results
dominance-sum query of a query point in F.box plus r.subtotal.
are those dominated by the upper-right point of The insertion of a point p with value v starts
F.box. Such points belong in four groups: (1) the at the root R. For each record r where r.lowpoint
a b c
B B B
F G F G F G
A A A
C C C
D E H D E H D E H
Aggregation Query, Spatial, Fig. 1 The BA-tree is a k-d-B-tree with augmented border information. (a) Points
affecting the subtotal of F. (b) Points affecting the x-border of F. (c) Points affecting the y-border of F
68 Aggregation Query, Spatial
dominates p, v is added to r.subtotal. For each for data cube range-sum appear in Chung et al.
r where p is below the x-border of r, position (2001) and Geffner et al. (2000). When applied
p:x and value v are added to the x-border. For to this problem, the BA-tree differs from Geffner
each record r where p is to the left of the y- et al. (2000) in two ways. First, it is disk based,
border of r, position p:y and value v are added while (Geffner et al. 2000) presents a main-
to the y-border. Finally, for the record r whose memory structure. Second, the BA-tree partitions
box contains p, p and v are inserted in the the space based on the data distribution, while
subtree(r.child). When the insertion reaches a leaf (Geffner et al. 2000) does partitioning based on
page L, a leaf record that contains point p and a uniform grid.
value v is stored in L.
Since the BA-tree aims at storing only the ag-
Future Directions
gregate information, not the objects themselves,
there are chances where the points inserted are
The update algorithm for the BA-tree is omitted
not actually stored in the index, thus saving stor-
from here, but can be found in Zhang et al.
age space. For instance, if a point to be inserted
(2002). Also discussed in Zhang et al. (2002) are
falls on some border of an index record, there is
more general queries, such as spatial aggregation
no need to insert the point into the subtree at all.
over objects with extent.
Instead, it is simply kept in the border that it falls
The BA-tree assumes that the query region is
on. If the point to be inserted falls on the low
an axis-parallel box. One practical direction of
point of an internal node, there is even no need
extending the solution is to handle arbitrary query
to insert it in the border; rather, the subtotal value
regions, in particular, polygonal query regions.
of the record is updated.
The BA-tree extends to higher dimensions
in a straightforward manner: a d -dimensional Cross-References
BA-tree is a k-d-B-tree where each index record
is augmented with one subtotal value and d Aggregate Queries, Progressive Approximate
borders, each of which is a (d -1)-dimensional OLAP, Spatial
BA-tree.
References
Key Applications
Beckmann N, Kriegel HP, Schneider R, Seeger B (1990)
The R -tree: an ef cient and robust access method
One key application of ef cient algorithms for for points and rectangles. In: SIGMOD, Atlantic City,
the spatial aggregation query is interactive GIS pp 322 331
systems. Imagine a user interacting with such a Bentley JL (1980) Multidimensional divide-and-conquer.
Commun ACM 23(4):214 229
system. She sees a map on the computer screen. Bentley JL, Saxe NB (1980) Decomposable searching
Using the mouse, she can select a rectangular problems I: static-to-dynamic transformations. J Algo-
region on the map. The screen zooms in to the rithms 1(4):301 358
selected region. Besides, some statistics about the Chiang Y, Tamassia R (1992) Dynamic algorithms in com-
putational geometry. Proc IEEE Spec Issue Comput
selected region, e.g., the total number of hotels, Geom 80(9):1412 1434
total number of residents, and so on, can be Chung C, Chun S, Lee J, Lee S (2001) Dynamic up-
quickly computed and displayed on the side. date cube for range-sum queries. In: VLDB, Roma,
Another key application is in data mining, pp 521 530
Geffner S, Agrawal D, El Abbadi A (2000) The dynamic
in particular, to compute range sums over data data cube. In: EDBT, Konstanz, pp 237 253
cubes. Given a d -dimensional array A and a Lazaridis I, Mehrotra S (2001) Progressive approximate
query range q, the range-sum query asks for the aggregate queries with a multi-resolution tree struc-
total value of all cells of A in range q. It is ture. In: SIGMOD, Santa Barbara, pp 401 412
Overmars MH (1983) The design of dynamic data struc-
a crucial query for online analytical processing tures. Lecture notes in computer science, vol 156.
(OLAP). The best known in-memory solutions Springer, Heidelberg
Anomaly Detection 69
Analysis, Robustness
Air Borne Sensors
Multicriteria Decision-Making, Spatial
Photogrammetric Sensors
Analysis, Sensitivity
akNN
Multicriteria Decision-Making, Spatial
Nearest Neighbor Problem
Anamolies
Algorithm
Data Analysis, Spatial
Data Structure
Anchor Points
All-k-Nearest Neighbors
Way nding, Landmarks
Nearest Neighbor Problem
Anchors, Space-Time
All-Lanes-Out
Time Geography
Contra ow for Evacuation Traf c Management
Anomaly Detection
All-Nearest-Neighbors
Homeland Security and Spatial Data Mining
Nearest Neighbor Problem Outlier Detection
70 Anonymity
Definition
Anonymity
In the context of geographic information and
Cloaking Algorithms for Location Privacy
ISO/TC 211 vocabulary, an application schema
consists in an application level conceptual
schema rendering to a certain level of detail a
universe of discourse described as data. Such
Anonymity in Location-Based
data is typically required by one or more
Services
applications (ISO/TC211 ISO19109:2005 2005).
Typically, additional information not found in
Privacy Threats in Location-Based Services
the schema is included in a feature catalogue to
semantically enrich the schema. Levels of details
regarding schemata (models) and catalogues
(data dictionaries) are described in the cross-
Anonymization of GPS Traces
references.
Synonyms
References
Conceptual model; Conceptual schema; Data
Brodeur J, BØdard Y, Proulx MJ (2000) Modelling geospa-
models; Data schema; ISO/TC 211; Object tial application databases using UML-based repos-
model; Object schema itories aligned with international standards in geo-
Approximation 71
sister2
front
you
Approximate Aggregate Query back
left right
Approximation
Example
Historical Background
At every moment in time, your body axes create
a partition of space consisting of the cells front- Rough set theory, the formal basis of the theory
left ( ), back-left (bl), front-right (fr), and back- of approximations as reviewed in this entry,
right (br) as depicted in Fig. 1. Every object, was introduced by Pawlak (1982; 1991) as a
72 Approximation
1 2 3 4 5 6
Regional Approximations
x z In spatial representation and reasoning, it is of-
y u ten not necessary to approximate subsets of an
arbitrary set, but subsets of a set with topolog-
Approximation, Fig. 2 Rough approximations of spa- ical or geometric structure. Thus, rather than
tial regions (Bittner and Stell 2002b) considering arbitrary sets and subsets thereof,
regular closed subsets of the plane are considered.
The cells (elements) of the partitions are regular
formal tool for data analysis. The main areas of closed sets which may overlap on their bound-
application are still data mining and data analysis aries, but not their interiors.
(Duentsch and Gediga 2000; Or owska 1998; Consider Fig. 2. Let X D f.x; y/ j 0 < x <
Slezak et al. 2005); however there are successful 7&0 < y < 7g be a regular closed subset of
applications in GIScience (Bittner and Stell the plane and c.0;0/ D f.x; y/ j 0 < x <
2002b; Worboys 1998a, b) and in other areas. 1&0 < y < 1g, c.0;1/ D f.x; y/ j 0 < x <
Ongoing research in rough set theory includes 1&1 < y < 2g, : : : c.7;7/ D f.x; y/ j 6 <
research on rough mereology (Polkowski and x < 7&6 < y < t g a partition of X formed
Skowron 1996) and its application to spatial by the regular closed sets c.0;0/ ; : : : ; c.6;6/ (cells),
reasoning (Polkowski 2004). Rough mereology i.e., I D fc.0;0/ ; : : : ; c.6;6/ g. Two members of X
is a generalization of rough set theory and of the are equivalent if and only if they are part of the
research presented here. interior of the same cell c.i;j / .
The subsets x, y, ·, and u now can be ap-
proximated in terms of their relations to the cells
c.i;j / of I which is represented by the mappings
Scientific Fundamentals
x , y , · , u of signature I ! with D
fo; po; no):
Rough Set Theory
Rough set theory (Or owska 1998; Pawlak 1982,
1991) provides a formalism for approximating I : : : c.2;6/ c.3;6/ : : : c.3;5/ : : :
X D ’x D
subsets of a set when the set is equipped with an : : : po po : : : fo : : :
equivalence relation. An equivalence relation is a
I : : : c.5;4/ c.6;4/ c.5;3/ c.6;3/ : : :
binary relation which is re exive, symmetric, and Y D ’y D
: : : po po po po : : :
transitive. Given a set X , an equivalence relation
on X creates a partition I of X into a set of jointly I : : : c.0;1/ c.1;2/ c.2;2/ c.3;2/ : : :
Z D ’· D
exhaustive and pairwise disjoint subsets. Let [x] : : : po fo po no : : :
be the set of all members of X that are equivalent
to x with respect to, i.e., x D fy 2 X j x yg. I : : : c.0;1/ c.1;1/ c.1;2/ c.1;3/ : : :
U D ’u D
Then, I D f x j x 2 X g is a partition of X : the : : : po no no no : : :
Approximation 73
where the Boolean values are ordered by F < T. I. Each of the above triples provides an RCC5
The resulting ordering (which is similar to the relation; thus the relation between X and Y
conceptual neighborhood graph (Goodday and can be measured by a pair of RCC5 relations.
Cohn 1994)) is indicated by the arrows in Fig. 3. These relations will be denoted by Rmin .X; Y /
There are two approaches one can take to and Rmax .X ,Y /. One then can prove that the pairs
generalize the RCC5 classi cation from precise (Rmin .X; Y /, R max .X; Y //, which can occur, are
regions to approximations of regions. These two all pairs (a,b) where a b with the exception of
may be called the semantic and the syntactic. (PP,EQ) and (PPi, EQ).
Semantic generalization. One can de ne the Let the syntactic generalization of RCC5 de-
RCC5 relationship between approximations X ned by
and Y to be the set of relationships which occur
between any pair of precise regions having the S Y N .X; Y / D .Rmin .X; Y /; Rmax .X; Y // ;
approximations X and Y . That is, one can de ne
where Rmin and Rmax are de ned as described
SEM .X; Y / D fRC C 5.x; y/ j x 2 X in the previous paragraph. It then follows that
and y 2 Y g : for any approximations X and Y , the two ways
of measuring the relationship of X to Y are
Syntactic generalization. One can take a for- equivalent in the sense that
mal de nition of RCC5 in the precise case which
uses meet operations between regions and gener- SEM .X; Y /
alize this to work with approximations of regions D f 2 RCC5 j Rmin .X; Y / < q
by replacing the meet operations on regions by < qRmax .X; Y /g ;
analogous ones for approximations.
If X and Y are approximations of regions (i.e., where RCC5 is the set {EQ, PP, PPi, PO, DR}
functions from I to ), one can consider the two and is the ordering as indicated by the arrows
triples of Boolean values: in Fig. 3.
including spatial representation of objects with indeterminate boundaries, representation of spatial data at multiple levels of resolution, representation of attribute data at multiple levels of resolution, and the representation of temporal data.

Objects with Indeterminate Boundaries
Geographic information is often concerned with natural phenomena and cultural and human resources. These domains are often formed by objects with indeterminate boundaries (Burrough and Frank 1995) such as The Ruhr, The Alps, etc. Natural phenomena and cultural and human resources are not studied in isolation; they are studied in certain contexts. In the spatial domain, context is often provided by regional partitions forming frames of reference. Consider, for example, the location of the spatial object The Alps. It is impossible to draw exact boundaries for this object. However, in order to specify its location, it is often sufficient to say that parts of The Alps are located in South Eastern France, Northern Italy, Southern Germany, and so on. This means that one can specify the rough approximation of The Alps with respect to the regional partition created by the regions of the European states. This regional partition can be refined by distinguishing northern, southern, eastern, and western parts of countries. It provides a frame of reference and an ordering structure which is used to specify the location of The Alps and which can be exploited in the representation and reasoning process as demonstrated above.

The utilization of rough approximations in the above context allows one to separate two aspects: (a) the exact representation of the location of well-defined objects using crisp regions and (b) the finite approximation of the location of objects with indeterminate boundaries in terms of their relations to the regions of the well-defined ones. The approximation absorbs the indeterminacy (Bittner and Stell 2002b) and allows for determinate representation and reasoning techniques as demonstrated above.

An important special case is the approximation of objects with indeterminate boundaries with respect to so-called egg-yolk partitions (Cohn and Gotts 1996). Here the partition consists of three concentric disks, called the central core, the broad boundary, and the exterior. An egg-yolk partition is chosen such that an object with indeterminate boundaries has the relation fo to the central core, the relation po to the broad boundary, and the relation no to the exterior cell of the partition.

Processing Approximate Geographic Information at Multiple Levels of Detail
Partitions that form frames of reference for rough approximations can be organized hierarchically. In Fig. 4, three partitions which partition the region A at different levels of resolution are depicted: {A}, {B, C}, and {D, E, F, G, H}. Obviously, parts/subsets of A can be approximated at different levels of granularity with respect to {A}, {B, C}, or {D, E, F, G, H}. Various approaches for processing approximations at and across different levels of granularity in such hierarchical subdivisions have been proposed (Bittner and Stell 2003; Stell and Worboys 1998; Worboys 1998a, b).

Approximation, Fig. 4 Partitions at multiple levels of resolution (Bittner and Stell 2003)

Processing Attribute Data
From the formal development of rough approximations, it should be clear that its application
(SOAP/XML, KML) and as component interfaces (.Net and Java). Developers can personalize and customize the existing software applications, build whole new applications, embed parts of ArcGIS in other software, and interface to other software systems.

Historical Background

ArcGIS is developed by a company called Environmental Systems Research Institute, Inc. (ESRI; each letter is pronounced, it is not an acronym). Headquartered in Redlands, California, and with offices throughout the world, ESRI was founded in 1969 by Jack and Laura Dangermond (who to this day are president and vice-president) as a privately held consulting firm that specialized in land use analysis projects. The early mission of ESRI focused on the principles of organizing and analyzing geographic information; projects included developing plans for rebuilding the City of Baltimore, Maryland, and assisting Mobil Oil in selecting a site for the new town of Reston, Virginia.

During the 1980s ESRI devoted its resources to developing and applying a core set of application tools that could be applied in a computer environment to create a geographic information system. In 1982 ESRI launched its first commercial GIS software, called ARC/INFO. It combined computer display of geographic features, such as points, lines, and polygons (the ARC software), with a database management tool for assigning attributes to these features (the Henco, Inc., INFO DBMS). Originally designed to run on minicomputers, ARC/INFO was the first modern GIS software. As the technology shifted operating system, first to UNIX and later to Windows, ESRI evolved software tools that took advantage of these new platforms. This shift enabled users of ESRI software to apply the principles of distributed processing and data management.

The 1990s brought more change and evolution. The global presence of ESRI grew with the release of ArcView, an affordable, relatively easy-to-learn desktop mapping tool, which shipped 10,000 copies in the first 6 months of 1992. In the mid-1990s, ESRI released the
first of a series of Internet-based map servers that published maps, data, and metadata on the web. These laid the foundation for today's server-based GIS, called ArcGIS Server, and a suite of online web services called ArcWeb Services.

In 1997 ESRI embarked on an ambitious research project to reengineer all of its GIS software as a series of reusable software objects. Several hundred person-years of development later, ArcInfo 8 was released in December 1999. In April 2001, ESRI began shipping ArcGIS 8.1, a family of software products that formed a complete GIS built on industry standards and that provides powerful, yet easy-to-use, capabilities right out of the box. ArcGIS 9 followed in 2003 and saw the addition of ArcGIS Server and ArcGIS Online, a part of ArcGIS that ESRI hosts on its own servers and makes accessible to users over the web.

Although developed as a complete system, ArcGIS 9 is a portfolio of products and is available in individual parts. The major product groups are desktop, server, online, and mobile (Fig. 1). ArcGIS desktop has a scalable set of products, in increasing order of functionality, ArcReader, ArcView, ArcEditor, and ArcInfo (Fig. 2). ESRI has built plug-in extensions (3D Analyst, Spatial Analyst, Network Analyst, etc.) which add new functional capabilities to the main desktop products. There is a desktop runtime called ArcGIS Engine, which is a set of software components that developers can use to build custom applications and embed GIS functions in other applications. ArcGIS Server is also a scalable set of products, namely ArcGIS Server Basic, Standard, and Advanced (each available in either workgroup or enterprise editions). The mobile products include ArcPad and ArcGIS Mobile, and to complete the picture, there is a suite of ArcGIS Online web services which provide data and applications to desktop, server, and mobile clients.

Today, ESRI employs more than 4,000 staff worldwide, over 1,900 of which are based at the worldwide headquarters in California. With 27 international offices, a network of more than 50 other international distributors, and over 2,000 business partners, ESRI is a major force in the GIS industry. ESRI's lead software architect, Scott Morehouse, remains the driving force
behind ArcGIS development, and he works closely with Clint Brown, product development director; David Maguire, product director; and, of course, Jack Dangermond, president.

Scientific Fundamentals

Fundamental Functional Capabilities
ArcGIS is a very big software system with literally thousands of functional capabilities and tens of millions of lines of software code. It is impossible, and in any case worthless, to try to describe each piece of functionality here. Instead, the approach will be to present some of the core foundational concepts and capabilities.

The best way to understand ArcGIS is to start with the core information (some people use the term data) model, since it is this which defines what aspects of the world can be represented in the software and is the push-off point for understanding how things can be manipulated. ArcGIS's core information model is called the geographic database, or geodatabase for short. The geodatabase defines the conceptual and physical model for representing geographic objects and relationships within the system. Geodatabases work with maps, models, globes, data, and metadata. Instantiated geodatabases comprise information describing geographic objects and relationships that are stored in files or a DBMS. These are bound together at runtime with software component logic that defines and controls the applicable processes. It is this combination of data (form) and software (process) which makes the geodatabase object oriented and so powerful and useful. For example, a geodatabase can represent a linear network such as an electricity or road network. The data for each link and node in a network is stored as a separate record. Functions (tools or operators), such as tracing and editing, that work with networks access all the data together and organize it into a network data structure prior to manipulation. Geodatabases can represent many types of geographic objects and associated rules and relationships, including vector features (points, lines, polygons, annotations [map text], and 3D multipatches), rasters, addresses, CAD entities, topologies, terrains, networks, and surveys. In ArcGIS, geographic objects of the same type (primarily the same spatial base dimensionality, projection, etc.) are conventionally organized into a data structure called a layer. Several layers can be integrated together using functions such as overlay processing, merge, and map algebra. Geodatabases can be physically stored in both file system files and DBMS tables (e.g., in DB2, Oracle, and SQL Server).

It is convenient to discuss the functional capabilities of ArcGIS in three main categories: geovisualization, geoprocessing, and geodata management.

Geovisualization, as the name suggests, is concerned with the visual portrayal of geographic information. It should come as no surprise that many people frequently want to visualize geographic information in map or chart form. Indeed, many people's primary use for a GIS is to create digital and/or paper maps. ArcGIS has literally hundreds of functions for controlling the cartographic appearance of maps. These include specifying the layout of grids, graticules, legends, scale bars, north arrows, titles, etc., the type of symbolization (classification, color, style, etc.) to be used, and also the data content that will appear on the final map. Once authored, maps can be printed or published in softcopy formats such as PDF or served up over the web as live map services. Additionally, many geographic workflows are best carried out using a map-centric interface. For example, editing object geometries, examining the results of spatial queries, and verifying the results of many spatial analysis operations can only really be performed satisfactorily using a map-based interface. ArcGIS supports multiple dynamic geovisualization display options such as 2D geographic (a continuous view of many geodatabase layers), 2D layout (geodatabase layers presented in paper space), 3D local scenes (strictly a 2.5D scene graph view of local and regional data), and 3D global (whole-Earth view with continuous scaling of data).

The term geoprocessing is used to describe the spatial analysis and modeling capabilities
of ArcGIS. ArcGIS adopts a data transformation framework approach to analysis and modeling: data + operator = data. For example, streets data + buffer operator = streets_with_buffers data. ArcGIS has both a framework for organizing geoprocessing and an extensive set of hundreds of operators that can be used to transform data. The framework is used to organize operators (also called functions or tools), to compile and execute geoprocessing tasks or models (collections of tools and data organized as a workflow), and to interface to the other parts of ArcGIS that deal with geodata management and geovisualization. The set of operators includes tools for classic GIS analysis (overlay, proximity, etc.), projection/coordinate transformation, data management and conversion, domain-specific analysis (3D, surfaces, networks, rasters, geostatistics, linear referencing, cartography, etc.), and simulation modeling. Geoprocessing is widely used to automate repetitive tasks (e.g., load 50 CAD files into a geodatabase); to integrate data (e.g., join major_streets and minor_streets data layers to create a single complete_streets layer); as part of quality assurance workflows (e.g., find all buildings that overlap); and to create process models (e.g., simulate the spread of fire through a forested landscape).

Geodata management is a very important part of GIS, not least because geodata is a very valuable and critical component of the most well-established operational GIS. It is especially important in large enterprise GIS implementations because the data volumes tend to be enormous, and multiple users often want to share access. ArcGIS has responded to these challenges by developing advanced technology to store and manage geodata in databases and files. An efficient storage schema and well-tuned spatial and attribute indexing mechanisms support rapid retrieval of data record sets. Coordinating multiuser updates to continuous geographic databases has been a thorny problem for GIS developers for many years. ArcGIS addresses this using an optimistic concurrency strategy based on versioning. The versioning data management software, data schema, and application business logic are a core part of ArcGIS. The data in ArcGIS can be imported/exported in many standard formats (e.g., dxf and mif) and is accessible via standards-based interfaces (e.g., OGC WMS and WFS) and open APIs (application programming interfaces, e.g., SQL, .Net, and Java), and the key data structure formats are openly published (e.g., shapefile and geodatabase).

Fundamental Design Philosophy
The ArcGIS software has evolved considerably over the two and a half decades of its existence as the underlying computer technologies and the concepts and methods of GIS have advanced. Nevertheless, many of the original design philosophies are still cornerstones of each new release. Not surprisingly, the original design goals have been supplemented by more recent additions which today drive the software development process. This section discusses the fundamental design philosophies of ArcGIS in no particular order of significance.

Commercial off-the-shelf (COTS) hardware. ArcGIS has always run on industry standard COTS hardware platforms (including computers and associated peripherals, such as digitizers, scanners, and printers). Today, hardware is insulated by a layer of operating system software (Windows, Linux, Solaris, etc.), and this constitutes much of the underlying computing platform on which the GIS software runs. The operating system affords a degree of hardware neutrality. ArcGIS runs on well-established mainstream operating systems and hardware platforms.

Multiple computer architectures. Parts of the ArcGIS software system can run on desktop, server, and mobile hardware. There is also a portion of ArcGIS that is available online for use over the web. The software can be configured to run stand alone on desktop and mobile machines. It can also be configured for workgroup and enterprise use so that it runs as a client-server and/or distributed server-based implementation. This offers considerable flexibility for end-use deployment. The newest release of the software is adept at exploiting the web as a platform for distributed solutions.
GIS professionals. The target user for the core of ArcGIS is the GIS professional (loosely defined as a career GIS staff person). GIS professionals often build and deploy professional GIS applications for end users (e.g., planners, utility engineers, military intelligence analysts, and marketing staff). The software is also frequently incorporated in enterprise IT systems by IT professionals and is increasingly being used by consumers (members of the general public with very limited GIS skills).

Generic toolbox with customization. From the outset ArcGIS was designed as a toolbox of generic GIS tools. This means that functional GIS capabilities are engineered as self-contained software components or tools that can be applied to many different data sets and application workflows. This makes the software very flexible and easily adaptable to many problem domains. The downside is that the tools need to be combined into application solutions that solve problems, and this adds a degree of complexity. In recent releases of the software, this issue has been ameliorated by the development of menu-driven applications for key geographic workflows (editing, map production, 3D visualization, business analysis, utility asset management and design, etc.).

Strong release control. ArcGIS is a software product, which means that it has well-defined capabilities, extensive online help and printed documentation, add-on materials (third-party scripts, application plug-ins, etc.), and a license agreement that controls usage, and that it is released under carefully managed version control. This means that additions and updates to the product are added only at a new release (about two to three times a year).

Internationalized and localized. The core software is developed in English and is internationalized so that it can be localized into multiple locales (local languages, data types, documentation, data, etc.). The latest release of ArcGIS has been localized into more than 25 local languages, including Farsi, French, German, Hebrew, Japanese, Italian, Mandarin, Spanish, and Thai.

Key Applications

ArcGIS has been applied to thousands of different application arenas over the years. It is a testament to the software's flexibility and adaptability that it has been employed in so many different application areas. By way of illustration, this section describes some example application areas in which ArcGIS has been widely adopted.

Business
Businesses use many types of information (geographic locations, addresses, service boundaries, sales territories, delivery routes, and more) that can be viewed and analyzed in map form. ArcGIS software integrated with business, demographic, geographic, and customer data produces applications that can be shared across an entire organization. Typical applications include selecting the best sites, profiling customers, analyzing market areas, updating and managing assets in real time, and providing location-based services (LBSs) to users. These applications are used extensively in the banking and financial services, retailing, insurance, media and press, and real estate sectors.

Education
In the education sector, ArcGIS is applied daily in administration, research, and teaching at the primary, secondary, and tertiary levels. In recent years, ArcGIS use has grown tremendously, becoming one of the hottest new research and education tools. At the primary and secondary level, GIS provides a set of life skills and a stimulating learning environment. More than 100 higher education academic disciplines have discovered the power of spatial analysis with GIS. Researchers are using GIS to find patterns in drug arrests, study forest rehabilitation, improve crop production, define urban empowerment zones, facilitate historic preservation, develop plans to control toxic waste spills, and much more. GIS is also a useful tool for the business of education. It is used to manage large campuses, plan campus
aR-Tree
 Multi-resolution Aggregate Tree

Asset Pricing
 Financial Asset Analysis with Mobile GIS

Association
 Co-location Patterns, Algorithms

Association Measures
 Co-location Patterns, Interestingness Measures

Association Rules, Spatiotemporal
 Movement Patterns in Spatio-Temporal Data

Association Rules: Image Indexing and Retrieval
 Image Mining, Spatial

Atlas Information Systems

Synonyms

Atlas information system; Atlas, electronic; Atlas, interactive; Atlas, multimedia; Atlas, virtual; Atlas, web; Cartographic information system; Earth, digital; Globe, virtual; Google Map/Earth

Definition

Atlas information systems (AIS) are systematic, targeted collections of spatially related knowledge in electronic form, allowing user-oriented communication for information and decision-making purposes. As in a conventional atlas, an AIS mainly consists of a harmonized collection of maps with different topics, scales, and/or from different regions. The maps usually come in standardized scales or degrees of generalization. The different map types have a common legend and symbolization. Access to the maps is granted through thematic or geographic indexes. AIS provide special interactive functions for geographic and thematic navigation, querying, analysis, and visualization in 2D and 3D mode. Unlike in many geographic information systems (GIS) applications, the data in AIS is cartographically edited, and the functionality is intentionally limited in order to provide a user-targeted set of data as well as adapted analysis and visualization functions. In multimedia atlases, additional related multimedia information, like graphics, diagrams, tables, text, images, videos, animations, and audio documents, is linked to the geographic entities. Access to data and functions is provided through a graphical user interface (GUI). Efficient management of the increasing amount of information led to the development of database-driven AIS. Whereas some years ago most AIS were based on CD-ROM and DVD, currently they are increasingly based on Internet and WWW technologies.

Historical Background

The technological leap which caused the transition from analog to digital cartography in the 1980s has also stimulated the development of interactive atlases. GIS, computer-aided design (CAD) systems, desktop publishing (DTP) systems, and the thereby evoked releases of geometric and thematic cartographic data were the catalysts of both digital and interactive cartographies. It is disputed which atlas was the first digital one: some authors claim that an early version of the Electronic Atlas of Canada was the first digital atlas (Ormeling 1995); others consider that it was the Electronic Atlas of Arkansas (Siekierska and Williams 1996). Early digital atlases had a rather limited functionality, like name search, zoom, and layer selection. Other atlases, like the PC version of the National Atlas of Sweden, were based on commercial GIS software. In the following years, interactive atlases evolved with respect to content, data, and technology. In several countries national atlases on CD-ROM were produced, either as a digital version of a conventional paper atlas (such as the National Atlas of Germany) or as an entirely interactive version (such as the Atlas of Switzerland). In the late 1990s, national mapping authorities began to publish their topographical map series on CD-ROM/DVD. A third group of atlases are counterparts to conventional world or school atlases, such as the Swiss World Atlas – interactive. Technologically, the first atlases were based on raster data maps, like most of the electronic national map series. Modern interactive atlases make use of vector data sets and/or statistical data which are symbolized and visualized on the fly (e.g., the Tirol Atlas). The atlases evolved from CD-ROM, then DVD, to Internet-based or combined interactive atlases.

Scientific Fundamentals

For the case of interactive maps on new media, the classical graphical variables and their expressions are extended as shown in Table 1 (Buziek 2001).

Atlas Information Systems, Table 1 Aspects of cartographic expression forms (After Buziek 2001)
Display media: print, screen, projection
Dimension of representation: 2D, pseudo-3D, 3D
Degree of dynamics: static, cinematographic, dynamic
Degree of interaction: noninteractive, partially interactive, interactive
Channels of representation: visual, acoustic, haptic
User-map relation: separating, integrative, amplification of reality

The added values and advantages of AIS compared to paper atlases can be summed up as follows: interactivity, navigation, maps as interface, exploration, customized/customizable to the user's needs, updatable, dynamics/animation, and multimedia integration (Ormeling 1996; Borchert 1999).

The degree of interactivity, a very significant element of the usability of a cartographic application, is mainly based on the richness of available cartographic functions. Table 2 shows the most important functions, arranged in five main groups (Cron 2006).

Complementarily, AIS can be characterized according to the basic concepts shown in Table 3. Today, most atlases still consist of raster and vector-based data, but a transition to relational or
object-oriented vector databases can be observed. While most atlases are still bound to classic computer interfaces like keyboards, mice, and screens, many have adapted to the touch user interface found in tablets and mobile phones. Furthermore, Internet and mobile technologies increase the degree of system distribution. With respect to interactivity, atlases are arranged into three groups: view-only atlases, interactive atlases, and analytical atlases (Siekierska and Taylor 1991). The latter can be subdivided into simple, constructive, and automatic analytical atlas types (Hurni 2006). Furthermore, many atlases no longer serve as the main interface, but as one out of several possible interfaces to the data, e.g., in the Google search engine.

Today's AIS comprise basic topographic and thematic data and software allowing the creation of maps on demand, as in GIS (Da Silva Ramos and Cartwright 2006). However, the differences between AIS and GIS can be perceived when comparing three approaches for applying GIS to the development of AIS (Bär and Sieber 1999; Schneider 2001). The concept "multimedia in GIS" proposes the integration of multimedia functionality in GIS, mainly at the cost of user friendliness. "GIS in multimedia" incorporates explicitly defined and developed GIS functions in a cartographic multimedia environment. The third concept, "GIS analysis for multimedia atlases," combines a GIS, the authoring system, and a multimedia map extension (GIS data converter) in one common multimedia atlas development environment.

Atlas Information Systems, Table 2 Main functions in a multimedia atlas information system (MAIS) (After Cron 2006)
General functions: mode selection, language selection, file import/export, printing, placing bookmarks, hot spots, forward/backward, settings (preferences), tooltips, display of system state, help, imprint, home, exit
Navigation functions / Spatial navigation: spatial unit selection, enlarge/reduce map extent (zoom in, zoom out, magnifier), move map (pan, scroll), reference map/globe, map rotation, determination of location (coordinates, altitude), line of sight and angle, placement of pins, spatial/geographical index, spatial/geographical search, tracking
Navigation functions / Thematic navigation: theme selection and change, index of themes, search by theme, theme favorites
Navigation functions / Temporal navigation: time selection (positioning of time line, selection of time period), animation (start/stop, etc.)
Didactic functions / Explanatory functions: guided tours, preview, explanatory texts, graphics, images, sounds, films
Didactic functions / Self-control functions: quizzes, games
Cartographic and visualization functions / Map manipulation: switch on/off layers, switch on/off legend categories, modification of symbolization, change of projection
Cartographic and visualization functions / Redlining: addition of user-defined map elements, addition of labels (labeling)
Cartographic and visualization functions / Explorative data analysis: modification of classification, modification of appearance/state (brightness, position of sun), map comparison, selection of data
GIS functions / Space- and object-oriented query functions: spatial query/position query (coordinates query/query of altitude), measurement/query of distance and area, creating profiles
GIS functions / Thematic query functions: thematic queries (data/attribute queries), access to statistical table data
GIS functions / Analysis functions: buffering, intersection, aggregation and overlapping (transparent overlapping/fading), terrain analysis (exposition, slope, etc.)
Atlas Information Systems, Table 3 Main characteristics and concepts of AIS (After Hurni 2006)
Data type and modeling / Raster: raster GIS, map layers in raster format
Data type and modeling / Vector, sequentially attributive (file oriented): DTP files with attributes
Data type and modeling / Vector, relational-topological (geometry and thematic data): database-based systems
Data type and modeling / Vector, object-oriented-topological: OO geo-databases
Medium, communication channel / Text: keyboard, alphanumeric output
Medium, communication channel / Language: voice output in car navigation systems
Medium, communication channel / Screen, stationary screens: computer screen
Medium, communication channel / Screen, portable screens: tablet, mobile phone
Degree of system distribution / Off-line, local system (client based): AIS on storage media
Degree of system distribution / Online (1:1), client/server based: Swiss World Atlas interactive
Degree of system distribution / Distributed (1:n), one client/several servers: WMS
Degree of system distribution / Multiple distributed (n:m), several clients/several servers: sensor networks coupled with distributed real-time information systems
Degree of interactivity / View only, display of prepared maps: information maps on the Internet
Degree of interactivity / Interactive, queries by criteria, adjustment of output/display: AIS like the Atlas of Switzerland V1
Degree of interactivity / Simple analytical, combined queries, more complex (GIS-like) analysis functions: AIS with GIS functions, data however prepared (Atlas of Switzerland 2 partially)
Degree of interactivity / Constructive analytical, direct processing of user data, design possibilities: Web-GIS, projection web services
Degree of interactivity / Automatic analytical, automatic data analysis and rule-based processing: cartographic real-time web information systems, e.g., online avalanche maps, radar precipitation maps, egocentric real-time information display on LBS, online generalization
Priority of cartographic functionality / Map information systems, map functions as main interaction tools: AIS, web map information systems
Priority of cartographic functionality / General information systems, map functions as further query and export/display possibility: digital encyclopediae (e.g., Encarta), environmental information systems, real estate portals, etc.
Table 4 shows the main differences between GIS and AIS (Schneider 1999).

Atlas Information Systems, Table 4 Differences between GIS and multimedia atlas information systems (MAIS) (Adapted after Schneider 1999)
Target users: GIS, experts; AIS, nonexperts (and experts)
Use of interface: GIS, complex; AIS, easy
Control of functions and data: GIS, by users; AIS, by authors
Guidance: GIS, minimal; AIS, distinct
Flow of information: GIS, unstructured; AIS, structured (narrative)
Main focus: GIS, handling, analysis, and presentation of data; AIS, visualization of themes
Data: GIS, raw, not integrated; AIS, edited, integrated
Data model: GIS, primary model; AIS, secondary model
Covered area: GIS, open; AIS, usually predefined (regional, national)
Computation time: GIS, short to long; AIS, short
Purpose: GIS, open for any kind of data and analysis; AIS, specific purpose

Key Applications

World Atlases and School Atlases
Interactive world atlases mainly consist of physical (and some thematic) maps of the world with search and index functions. An example is Google Map/Earth, which provides detailed base map information with additional services such as routing functions. User-generated information may be included, however, with only limited quality control by the publisher. School atlases are a special type of world atlases which also include more thematic maps and numerous carefully edited exemplary maps for didactic purposes. An example is the Swiss World Atlas – interactive.

National Atlases and Regional Atlases
National and regional atlases depict a country or a region in a broad variety of mainly thematic maps. Today many national atlases have been converted from the printed to the interactive form. There also exist mixed versions like the National Atlas of Germany, which consists of a series of theme books and accompanying CD-ROMs with the full text and the maps plus some additional interactive maps (Hanewinkel and Tzschaschel 2005). An example of an entirely digital atlas is the DVD-based Atlas of Switzerland 3, which consists of 1,700 interactive maps derived on the fly from digital topographic, environmental, and statistical base data, combined with multimedia elements (Fig. 1) (Sieber et al. 2009).

Topographic Atlases
Many national or state mapping authorities publish their topographic map series on web-based portals. In most cases, the maps are stored in raster format but enriched with place names and vector line data for routes and trails. Some products offer the possibility of importing own data like GPS tracks or drawing map overlays. Simple analyses like measurement functions, profiles, and 3D displays are possible. Examples are the USGS TOPO! map layers available on various web portals and the online version of the Swiss National Map Series. Many atlases also display geo-referenced satellite or aerial images as an additional layer.

Thematic Atlases and Statistical Atlases
Numerous atlases cover specific thematic topics like geology, hydrology, climate, planning, history, etc., both in 2D and 3D (Fig. 2). Statistical
atlases allow the visualization of statistical data as choropleth or diagram maps, usually on the basis of administrative boundaries (e.g., the Interactive Statistical Atlas of Switzerland).

Atlas Information Systems, Fig. 1 Example of a multimedia national atlas: soil map in the Atlas of Switzerland, combined with legend, text, and image information (© Atlas of Switzerland)

Future Directions

A major focus will be the further development of spatial data models and structures. Such data are usually managed and processed in relatively specialized systems like GIS. Data are enriched with graphical attributes for cartographic visualization and with thematic attributes. Often, this attributing is already handled the other way round: thematic data is stored in standardized, distributed databases and is additionally annotated with spatial information, i.e., geo-referenced. In the future the functionality of GIS and general information systems will therefore converge. Search engines will, for instance, be equipped with more sophisticated geographical search functions.

Specific cartographic functions will be developed further, e.g., automatic generalization functions, rule-based display functions, or analysis functions. Real-time data, for instance, will be analyzed automatically and visualized on the fly. The integration of user-generated data will be simplified, and an AIS will become a collaborative platform that can constantly be maintained and updated by crowdsourcing (shared editing). Furthermore, the quality of atlas maps generated on the fly by web services has increased. Coupled with data stored in distributed geospatial databases, AIS will evolve toward a service-oriented architecture (Iosifescu Enescu 2011).

Cross-References

ArcGIS: General-Purpose GIS Software
Constraint Data, Visualizing
Cyberinfrastructure for Spatial Data Integration
Data Analysis, Spatial
Data Infrastructure, Spatial
Distributed Geospatial Computing (DGC)
Exploratory Visualization
Generalization and Symbolization
Generalization, On-the-Fly

Atlas Information Systems, Fig. 2 Example of a 3D atlas: 3D display of geological data as overlay on a digital elevation model; combined with legend, automatically labeled mountain names, and star constellations (© Atlas of Switzerland)
Atlas, Electronic
 Atlas Information Systems

Atlas, Interactive
 Atlas Information Systems

Atlas, Multimedia
 Atlas Information Systems

Atlas, Virtual
 Atlas Information Systems

Atlas, Web
 Atlas Information Systems

Autocorrelation, Spatial

Synonyms

Spatial correlation; Spatial dependence; Spatial interdependence

Definition

In many spatial data applications, the events at a location are highly influenced by the events at neighboring locations. In fact, this natural inclination of a variable to exhibit similar values as a function of the distance between the spatial locations at which it is being measured is known as spatial dependence. Spatial autocorrelation is used to measure this spatial dependence. If the variable exhibits a systematic pattern in its spatial distribution, it is said to be spatially autocorrelated. The existence and strength of such interdependence among values of a specific variable with reference to a spatial location can be quantified as positive, zero, or negative spatial autocorrelation. Positive spatial autocorrelation indicates that similar values or properties tend to be collocated, while negative spatial autocorrelation indicates that dissimilar values or properties tend to be near each other. Random patterns indicate zero spatial
autocorrelation, since independent, identically distributed random data are invariant with regard to their spatial location.

Historical Background

The idea of spatial autocorrelation is not new in the literature and was conceptualized as early as 1854, when nebula-like spatial clusters with distance decay effects were readily apparent in mapped cholera cases in the city of London (Moore and Carpenter 1999). This led to the hypothesis that the systematic spatial pattern of the cholera outbreak decayed smoothly with distance from a particular water supply which acted as the source for the disease. This concept of spatial autocorrelation was also documented in the first law of geography in 1970, which states: "Everything is related to everything else, but near things are more related than distant things" (Tobler 1970).

Scientific Fundamentals

Spatial autocorrelation is a property of a variable that is often distributed over space (Shekhar and Chawla 2003). For example, land-surface elevation values of adjacent locations are generally quite similar. Similarly, temperature, pressure, slopes, and rainfall vary gradually over space, thus forming a smooth gradient of a variable between two locations in space. The propensity of a variable to show a smooth gradient across space aggregates similar values or properties adjacent to each other.

In classical statistics, the observed samples are assumed to be independent and identically distributed (iid). This assumption is no longer valid for inherently spatially autocorrelated data. This fact suggests that classical statistical tools like linear regression are inappropriate for spatial data analysis. The inferences made from such analyses are either biased, indicating that the observations are spatially aggregated and clustered, or overly precise, indicating that the number of real independent variables is less than the sample size. When the number of real independent variables is less than the sample size, the degree of freedom of the observed data is lower than that assumed in the model.

Scale Dependence of Spatial Autocorrelation
The strength of spatial autocorrelation is often a function of scale or spatial resolution, as illustrated in Fig. 1 using black and white cells. High negative spatial autocorrelation is exhibited in Fig. 1a since each cell has a different color from its neighboring cells. Each cell can be subdivided into four half-size cells (Fig. 1b), assuming the cell's homogeneity. Then, the strength of spatial autocorrelation among the black and white cells increases, while maintaining the same cell arrangement. This illustrates that spatial autocorrelation varies with the study scale.

Autocorrelation, Spatial, Fig. 1 The strength of spatial autocorrelation as a function of scale using: (a) 4-by-4 raster and (b) 8-by-8 raster
Autocorrelation, Spatial, Fig. 2 Three different data distributions. (a) Binary distributed data in space. (b) Random uniformly distributed lattice data. (c) Random normally distributed lattice data in space

Differentiating Random Data from Spatial Data
Consider three different random distributions and three lattice grids of 64-by-64 cells (see Fig. 2), one for each distribution: the first lattice data set (Fig. 2a) is generated from a binary distribution, the second data set (Fig. 2b) is generated from a uniform distribution, and the third data set is generated from a normal distribution. The value at pixel (i, j), P(i, j) ∈ {P(i, j); i = 1, ..., 64; j = 1, ..., 64}, is assumed to be independent and identically distributed. As shown in Fig. 2, the non-clustering or spatial segregation of the data suggests that the value P(i, j), where i, j ∈ R, has no correlation (zero correlation) with itself in space.

Each pixel (i, j) has eight neighbors, and each neighbor also has its own eight adjacent neighbors, except the cells located on the boundary. The variability of P(i, j) in one direction will not be the same as in other directions, thus forming an anisotropic system, indicating that the spatial autocorrelation varies in all directions. The quantification of this directional spatial autocorrelation is computationally expensive; thus, the average of each direction at distance k is used to quantify the spatial autocorrelation. The distance k (e.g., a k-pixel separation of (i, j) in any direction) is called lag distance k. The spatial autocorrelation from each spatial entity to all other entities can be calculated. The average value over all entities of the same lag distance is expressed as a measure of spatial autocorrelation. The above three data sets are illustrative examples, demonstrating the nonexistence of spatial autocorrelation in randomly generated data sets.

Consider a digital elevation model (DEM) that shows an array of elevations of the land surface at each spatial location (i, j) as shown in Fig. 3a. The values of this data set do not change abruptly, whereas in Fig. 3b, the difference of the elevations between the location (i, j) and its neighborhoods changes abruptly, as shown in its corresponding color scheme.

The variogram, a plot of the dissimilarity against the spatial separation (i.e., the lag distance) (Wackernagel 2003) in spatial data, quantifies spatial autocorrelation and represents how spatial variability changes with lag distance (Devary and Rice 1982). In Fig. 4a, the semivariogram value of the DEM surface is zero at the zero lag distance and increases with the lag distance, whereas in Fig. 4b, the semivariogram value of the random surface varies erratically with the increasing lag distance. Contrary to spatial autocorrelation, the semivariogram has higher values in the absence of spatial correlation and lower values in the presence of spatial correlation. This indicates that spatial autocorrelation gradually disappears as the separation distance increases (Spatial Autocorrelation 2006) (Fig. 4a). These variogram figures are generated at a point (x_i) by comparing the values at its four adjacent neighbors such that

γ(h) = (1/N(h)) Σ_{i=1}^{n} (z(x_i) − z(x_i + h))²,   (1)

where z(x_i) and z(x_i + h) are the values of the function z located at x_i and (x_i + h), respectively. The four-adjacent averages of the squared difference values along the X and Y axes at lag distance h are used in these variogram clouds. The semivariogram values in Fig. 3a (generated from lattice data) increase with increasing lag distance, whereas the semivariogram values generated from point data reach a steady state with increasing lag distance.
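The empirical calculation of Eq. 1 is easy to reproduce on a regular grid. The following Python sketch (an illustration only; function and variable names are not from the entry) averages the squared differences between cells separated by a lag of h cells along the row and column axes, i.e., the four-adjacent-neighbor comparison described above.

import numpy as np

def empirical_variogram(z, max_lag):
    """gamma(h) of Eq. 1 for h = 1..max_lag cells on a 2D raster z."""
    z = np.asarray(z, dtype=float)
    gamma = []
    for h in range(1, max_lag + 1):
        # squared differences between pairs of cells h apart along x and along y
        dx = z[:, h:] - z[:, :-h]
        dy = z[h:, :] - z[:-h, :]
        sq = np.concatenate([dx.ravel(), dy.ravel()]) ** 2
        gamma.append(sq.mean())          # (1/N(h)) * sum of squared differences
    return np.array(gamma)

# A spatially uncorrelated surface gives an erratic, roughly flat curve,
# mirroring the contrast between Fig. 4a and Fig. 4b.
rng = np.random.default_rng(0)
print(empirical_variogram(rng.normal(size=(64, 64)), max_lag=5))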
Autocorrelation, Spatial, Fig. 3 (a) One meter spatial resolution LIDAR DEM for South Fork Eel, California. (b) One meter normally distributed DEM reconstructed for the same statistics (i.e., mean and variance) as the LIDAR DEM in (a)

How to Quantify Spatial Autocorrelation
Several indices can be used to quantify spatial autocorrelation. The most common techniques are Moran's I, Geary's C, and spatial autoregression. These techniques are described in the following sections.

Moran's I Method
Moran's I index is one of the oldest methods in spatial autocorrelation and is still the de facto standard method of quantifying spatial autocorrelation (Moran 1950). This method is applied for points or zones with continuous variables associated with them. The value obtained at a location is compared with the values at other locations. Moran's I method can be defined as

I = [N Σ_i Σ_j W_{i,j} (X_i − X̄)(X_j − X̄)] / [(Σ_i Σ_j W_{i,j}) (Σ_i (X_i − X̄)²)],   (2)

where N is the number of cases, X̄ is the mean value of the variable X, X_i and X_j are the values of the variable X at locations i and j, respectively, and W_{i,j} is the weight applied to the comparison between the values at i and j.

The same equation in matrix notation can also be represented as (Shekhar and Chawla 2003)

I = (z W z^t) / (z z^t),   (3)

where z = (x_1 − x̄, x_2 − x̄, ..., x_n − x̄), z^t is the transpose of z, and W is the same n-by-n contiguity matrix that was introduced in Eq. 2.

An important property of Moran's I is that the index I depends not only on the variable X, but also on the data's spatial arrangement. The spatial arrangement is quantified by the contiguity matrix, W. If a location i is adjacent to location j, then this spatial arrangement receives the weight of 1; otherwise the value of the weight is 0. Another option is to define W based on the squared inverse distance (1/d²_{ij}) between the locations i and j (Lembo 2006). There are also other methods to quantify this contiguity matrix. For example, the sum of the products of the variable x can be compared at locations i and j and then weighted by the inverse distance between i and j.
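Equation 2 translates directly into a few lines of code. Below is a minimal Python/NumPy sketch (names are illustrative, not from the entry) that computes Moran's I for a vector of observations and a binary contiguity matrix W; replacing W with squared-inverse-distance weights, as mentioned above, requires no change to the function itself. The usage example applies it to a checkerboard raster, echoing the scale-dependence illustration of Fig. 1.

import numpy as np

def morans_i(x, W):
    """Moran's I of Eq. 2: x is a length-n vector, W an n-by-n spatial weight matrix."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()                          # deviations from the mean
    num = x.size * (W * np.outer(z, z)).sum()
    den = W.sum() * (z ** 2).sum()
    return num / den

def rook_weights(nrows, ncols):
    """Binary rook-contiguity matrix for an nrows-by-ncols grid, cells in row-major order."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr, dc in ((1, 0), (0, 1)):
                rr, cc = r + dr, c + dc
                if rr < nrows and cc < ncols:
                    j = rr * ncols + cc
                    W[i, j] = W[j, i] = 1
    return W

# A black-and-white checkerboard (cf. Fig. 1a) has strong negative spatial autocorrelation.
board = np.indices((4, 4)).sum(axis=0) % 2
print(morans_i(board.ravel(), rook_weights(4, 4)))   # -1.0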
Autocorrelation, Spatial, Fig. 4 (a) Variogram for the spatial data in Fig. 3a. (b) Variogram for the random data in Fig. 3b (vertical axis: semivariogram; horizontal axis: lag distance in m)
Here Q = (I − ρW)^(-1), and:
Y is the observation or dependent variable,
ρ is the spatial autoregressive parameter,
W is the contiguity matrix,
β is the regressive coefficient,
ε is the unobservable error term (N(0, I)), and
X contains the feature values or independent variables.

When ρ = 0, this model is reduced to the ordinary least squares regression equation. The solution for Eq. 5 is not straightforward, and the contiguity matrix W grows quadratically with the size of the data sample. However, most of the elements of W are zero; thus, sparse matrix techniques are used to speed up the solution process (Shekhar and Chawla 2003).

Illustration of SAR Using a Sample Data Set
Consider the following 2-by-2 DEM grid data set:
100 101
102 103

Demonstration Using Mathworks Matlab Software
Matlab software (MATLAB 1997) is used to demonstrate this example. The five quantities W, ρ, ε, β, and X are defined as
W = [0 0.5 0.5 0; 0.5 0 0 0.5; 0.5 0 0 0.5; 0 0.5 0.5 0]; rho = 0.1; epsilon = 0.01*rand(4,1); beta = 1.0; X = [100; 101; 102; 103].
Substituting the above values into Eq. 6 can be written in Matlab notation as
y = inv(eye(4,4) - rho*W)*(beta*X + epsilon).
The solution provides an estimate of y = [111.2874; 112.2859; 113.2829; 114.2786].
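The sparse-matrix strategy mentioned above can be illustrated with the same numbers in Python (a sketch using NumPy/SciPy; not part of the original entry): instead of forming the dense inverse, the linear system (I − ρW)y = Xβ + ε is solved directly with a sparse solver.

import numpy as np
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import spsolve

W = csr_matrix(np.array([[0, 0.5, 0.5, 0],
                         [0.5, 0, 0, 0.5],
                         [0.5, 0, 0, 0.5],
                         [0, 0.5, 0.5, 0]]))
rho, beta = 0.1, 1.0
X = np.array([100.0, 101.0, 102.0, 103.0])
eps = 0.01 * np.random.rand(4)

# Solve (I - rho*W) y = beta*X + eps without building the dense inverse.
y = spsolve(identity(4, format="csr") - rho * W, beta * X + eps)
print(y)   # close to the Matlab result [111.29, 112.29, 113.28, 114.28]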
Key Applications

The key application of spatial autocorrelation is to quantify the spatial dependence of spatial variables. The following are examples from various disciplines where spatial autocorrelation is used.
Biology
Patterns and processes of genetic divergence among local populations have been investigated using spatial autocorrelation statistics to describe the autocorrelation of gene frequencies for increasing classes of spatial distance. Spatial autocorrelation analysis has also been used to study a variety of phenomena, such as the genetic structure of plant and animal populations and the distribution of mortality patterns.

Economics
Because of the heterogeneity across regions and the large number of regions strongly interacting with each other, economic policy measures are targeted at the regional level. Superimposed spatial structures from spatial autocorrelation analysis improve the forecasting performance of nonspatial forecasting models. The spatial dependence and spatial heterogeneity can be used to investigate the effect of income and human capital
Bayesian Inference
 Hurricane Wind Fields, Multivariate Modeling

Bayesian Maximum Entropy
 Uncertainty, Modeling with Spatial and Temporal

Bayesian Network Integration with GIS

Definition

A Bayesian network (BN) is a graphical-mathematical construct used to probabilistically model processes which include interdependent variables, decisions affecting those variables, and costs associated with the decisions and states of the variables. BNs are inherently system representations and, as such, are often used to model environmental processes. Because of this, there is a natural connection between certain BNs and GIS. BNs are represented as a directed acyclic graph structure with nodes (representing variables, costs, and decisions) and arcs (directed lines representing conditionally probabilistic dependencies between the nodes). A BN can be used for prediction or analysis of real-world problems and complex natural systems where statistical correlations can be found between variables or approximated using expert opinion. BNs have a vast array of applications for aiding decision-making in areas such as medicine, engineering, natural resources, and decision management. BNs can be used to model geospatially interdependent variables as well as conditional dependencies between geospatial layers. Additionally, BNs have been found to be useful and highly efficient in performing image classification on remotely sensed data.

Historical Background

Originally described by Pearl (1988), BNs have been used extensively in medicine and computer science (Heckerman 1997). In recent years, BNs have been applied in spatially explicit environmental management studies. Examples include the Neuse Estuary Bayesian ecological response network (Borsuk and Reckhow 2000), Baltic salmon management (Varis and Kuikka 1996), climate change impacts on Finnish watersheds (Kuikka and Varis 1997), the Interior Columbia Basin Ecosystem Management Project (Lee and Bradshaw 1998), and waterbody eutrophication (Haas 1998). As illustrated in these studies, a BN graph structures a problem such that it is visually interpretable by stakeholders and decision-makers while serving as an efficient means for evaluating the probable outcomes of management decisions on selected variables.

Both BNs and GIS can be used to represent spatially explicit, probabilistically connected environmental and other systems; however, the integration of the two techniques has only been explored relatively recently. BN integration with GIS typically takes one of four distinct forms: (1) BN-based layer combination (i.e., probabilistic map algebra) as demonstrated in Taylor (2003); (2) BN-based classification as demonstrated in Stassopoulou et al. (1998); (3) using BNs for intelligent, spatially oriented data retrieval, as demonstrated in Walker et al. (2004) and Walker et al. (2005); and (4) GIS-based BN decision support system (DSS) frameworks where BN nodes are spatially represented in a GIS framework, as presented by Ames et al. (2005).

Scientific Fundamentals

Bayesian Network Integration with GIS, Fig. 1 Umbrella Bayesian decision network structure. A (Forecast) and B (Weather) are nature nodes, C (Take Umbrella) is a decision node, and D (Satisfaction) is a utility node

As noted above, BNs are used to model reality by representing conditional probabilistic dependencies between interdependent variables, decisions, and outcomes. This section provides an in-depth explanation of BN analysis using an example BN model called the "Umbrella" BN (Fig. 1), an augmented version of the well-known "Weather" influence diagram presented by Shachter and Peot (1992). This simple BN attempts to model the variables and outcomes associated with the decision to take or not take an umbrella on a given outing. This problem is represented in the
BN by four nodes. "Weather" and "Forecast" are nature or chance nodes, where "Forecast" is conditioned on the state of "Weather" and "Weather" is treated as a random variable with a prior probability distribution based on historical conditions. "Take Umbrella" is a decision variable that, together with the "Weather" variable, defines the status of "Satisfaction." The "Satisfaction" node is known as a "Utility" or "Value" node. This node associates a resultant outcome value (monetary or otherwise) to represent the satisfaction of the individual based on the decision to take the umbrella and whether or not there is rain. Each of these BN nodes contains discrete states, where each variable state represents abstract events, conditions, or numeric ranges of each variable.

The Umbrella model can be interpreted as follows: if it is raining, there is a higher probability that the forecast will predict it will rain. In reverse, through the Bayesian network "backward propagation of evidence," if the forecast predicts rain, it can be inferred that there is a higher chance that rain will actually occur. The link between "Forecast" and "Take Umbrella" indicates that the "Take Umbrella" decision is based largely on the observed forecast. Finally, the link to the "Satisfaction" utility node from both "Take Umbrella" and "Weather" captures the relative gains in satisfaction derived from every combination of states of the BN variables.

Bayesian networks are governed by two mathematical techniques: conditional probability and Bayes' theorem.

Conditional probability is defined as the probability of one event given the occurrence of another event and can be calculated as the joint probability of the two events occurring divided by the probability of the second event:

P(A|B) = P(A, B) / P(B).   (1)

From Eq. 1, the fundamental rule for probability calculus and the downward propagation of evidence in a BN can be derived. Specifically, it is seen that the joint probability of A and B equals the conditional probability of event A given B, multiplied by the probability of event B (Eq. 2):

P(A, B) = P(A|B) P(B).   (2)

Equation 2 is used to compute the probability of any state in the Bayesian network given the states of the parent node events. In Eq. 3, the probability of state Ax occurring given parent B is the sum of the probabilities of the state Ax given state Bi, with i being an index to the states of B, multiplied by the probability of that state of B:

P(Ax, B) = Σ_i P(Ax | Bi) P(Bi).   (3)

Similarly, for calculating states with multiple parent nodes, the equation is modified to make the summation of the conditional probability of the state Ax given states Bi and Cj multiplied by the individual probabilities of Bi and Cj:

P(Ax, B, C) = Σ_{i,j} P(Ax | Bi, Cj) P(Bi) P(Cj).   (4)

Finally, though similar in form, utility nodes do not calculate probability, but instead calculate the utility value as a metric or index given the states of its parent or parents, as shown in Eqs. 5 and 6:

U(A, B) = Σ_i U(A | Bi) P(Bi),   (5)

U(A, B, C) = Σ_{i,j} U(A | Bi, Cj) P(Bi) P(Cj).   (6)

The second equation that is critical to BN modeling is Bayes' theorem:

P(A|B) = P(B|A) P(A) / P(B).   (7)
The conditional probability inversion represented here allows for the powerful technique of Bayesian inference, for which BNs are particularly well suited. In the Umbrella model, inferring a higher probability of rain given a rainy forecast is an example application of Bayes' theorem.

Connecting each node in the BN is a conditional probability table (CPT). Each nature node (state variable) includes a CPT that stores the probability distribution for the possible states of the variable given every combination of the states of its parent nodes (if any). These probability distributions can be assigned by frequency analysis of the variables and expert opinion based on observation or experience, or they can be set to some "prior" distribution based on observations of equivalent systems.

Tables 1 and 2 show CPTs for the Umbrella BN. In Table 1, the probability distribution of rain is represented as a 70% chance of no rain and a 30% chance of rain. This CPT can be assumed to be derived from historical observations of the frequency of rain in the given locale. Table 2 represents the probability distribution of the possible weather forecasts ("Sunny," "Cloudy," or "Rainy") conditioned on the actual weather event. For example, when it actually rained, the prior forecast called for "Rainy" 60% of the time, "Cloudy" 25% of the time, and "Sunny" 15% of the time. Again, these probabilities can be derived from historical observations of prediction accuracies or from expert judgment.

Bayesian Network Integration with GIS, Table 1 Probability of rain

Weather:  No rain 70%  |  Rain 30%

Bayesian Network Integration with GIS, Table 3 Satisfaction utility conditioned on rain and the "Take Umbrella" decision

Weather   | Take Umbrella | Satisfaction
No Rain   | Take          | 20 units
No Rain   | Do not Take   | 100 units
Rain      | Take          | 70 units
Rain      | Do not Take   | 0 units

Table 3 is a utility table defining the relative gains in utility (in terms of generic "units" of satisfaction) under all of the possible states of the BN. Here, satisfaction is highest when there is no rain and the umbrella is not taken and lowest when the umbrella is not taken but it does rain. Satisfaction "units" are in this case assigned as arbitrary ratings from 0 to 100, but in more complex systems, utility can be used to represent monetary or other measures.

Following is a brief explanation of the implementation and use of the Umbrella BN. First it is useful to compute P(Forecast = Sunny) given unknown Weather conditions as follows:

P(\text{Forecast} = \text{Sunny}) = \sum_{i \in \{\text{NoRain}, \text{Rain}\}} P(\text{Forecast} = \text{Sunny} \mid \text{Weather}_i)\, P(\text{Weather}_i) = 0.7 \times 0.7 + 0.15 \times 0.3 = 0.535 \ (54\%).

Next, P(Forecast = Cloudy) and P(Forecast = Rainy) can be computed as

P(\text{Forecast} = \text{Cloudy}) = 0.2 \times 0.7 + 0.25 \times 0.3 = 0.215 \ (22\%).
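The calculations above can be reproduced directly from Tables 1–3. The following Python sketch is illustrative only and is not tied to any particular BN package; the "No Rain" row of Table 2, which is not quoted in the text, is inferred from the worked computations (0.7 for "Sunny," 0.2 for "Cloudy," and 0.1 for "Rainy" by complement).

```python
# Umbrella BN as plain dictionaries (probabilities from Tables 1-2, utilities from Table 3).
p_weather = {"NoRain": 0.7, "Rain": 0.3}                        # Table 1
p_forecast_given_weather = {                                     # Table 2 (NoRain row inferred)
    "NoRain": {"Sunny": 0.70, "Cloudy": 0.20, "Rainy": 0.10},
    "Rain":   {"Sunny": 0.15, "Cloudy": 0.25, "Rainy": 0.60},
}
utility = {("NoRain", "Take"): 20, ("NoRain", "DoNotTake"): 100,  # Table 3
           ("Rain", "Take"): 70, ("Rain", "DoNotTake"): 0}

def p_forecast(f):
    # Marginal forecast probability: the Eq. 4-style sum over the parent's states.
    return sum(p_forecast_given_weather[w][f] * p_weather[w] for w in p_weather)

def p_weather_given_forecast(w, f):
    # Bayes' theorem (Eq. 7): invert the forecast CPT to update belief in rain.
    return p_forecast_given_weather[w][f] * p_weather[w] / p_forecast(f)

def expected_utility(decision, f):
    # Eq. 5-style utility: weight Table 3 by the posterior weather probabilities.
    return sum(utility[(w, decision)] * p_weather_given_forecast(w, f) for w in p_weather)

print(round(p_forecast("Sunny"), 3))                       # 0.535, as in the worked example
print(round(p_weather_given_forecast("Rain", "Rainy"), 3)) # posterior rain probability after a rainy forecast
print(expected_utility("Take", "Rainy"), expected_utility("DoNotTake", "Rainy"))
```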
Similarly, the expected utility of taking or of not taking the umbrella is computed by weighting the satisfaction values in Table 3 by the corresponding weather probabilities.

A brief explanation of the scientific fundamentals of each of these uses is presented here.
Bayesian Network Integration with GIS, Fig. 2 The East Canyon Creek BDN from Ames et al. (2005), as seen in
the GeNIe (Decision Systems Laboratory 2006) graphical node editor application
be more informative for decision-makers than an indicator model that simply displays the sum of some number of reclassified indicators.

Image Classification
In the previous examples, BN CPTs are derived from historical data or information from experts. However, many BN applications make use of the concept of Bayesian learning as a means of automatically estimating probabilities from existing data. BN learning involves a formal automated process of "creating" and "pruning" the BN node-arc structure based on rules intended to maximize the amount of unique information represented by the BN CPTs. In a GIS context, BN learning algorithms have been extensively applied to image classification problems. Image classification using a BN requires the identification of a set of input layers (typically multispectral or hyperspectral bands) from which a known set of objects or classifications are to be identified.

Learning data sets include both input and output layers, where output layers clearly indicate features of the required classes (e.g., polygons indicating known land cover types). A BN learning algorithm applied to such a data set will produce an optimal (in BN terms) model for predicting land cover or other classification schemes at a given raster cell based on the input layers. The application of the final BN model to predict land cover or other classifications at an unknown point is similar to the probabilistic map algebra described previously.

Automated Data Query and Retrieval
In the case of application of BNs to automated query and retrieval of geospatial data sets, the goal is typically to use expert knowledge to define the CPTs that govern which data layers are loaded for visualization and analysis. Using this approach in a dynamic web-based mapping system, one could develop a BN for the display of layers using a CPT that indicates the probability that the layer is important, given the presence or absence of other layers or features within layers at the current view extents. Such a tool would supplant the typical approach, which is to activate or deactivate layers based strictly on "zoom level." For example, consider a military GIS mapping system used to identify proposed targets. A BN-based data retrieval system could significantly optimize data transfer and bandwidth usage by only showing specific high-resolution imagery when the probability of needing that data is raised due to the presence of other features which
Bayesian Network Integration with GIS, Fig. 3 (a) East Canyon displayed with the East Canyon BN overlain on it.
(b) Same, but with the DEM layer turned off and the BN network lines displayed
indicate a higher likelihood of the presence of the specific target.

BN-based data query and retrieval systems can also benefit from Bayesian learning capabilities by updating CPTs with new information or evidence observed during the use of the BN. For example, if a user continually views several data sets simultaneously at a particular zoom level or in a specific zone, this increases the probability that those data sets are interrelated and should result in modified CPTs representing those conditional relationships.

Spatial Representation of BN Nodes
Many BN problems and analyses, though not completely based on geospatial data, have a clear geospatial component and as such can be mapped on the landscape. This combined BN-GIS methodology is relatively new but has significant potential for helping improve the use and understanding of a BN. For example, consider the East Canyon Creek BN (Ames et al. 2005) represented in Fig. 2. This BN is a model of streamflow (FL_TP and FL_HW) at both a wastewater treatment plant and in the stream headwaters, conditional on the current season (SEASON). The model also includes estimates of phosphorus concentrations at the treatment plant and in the headwaters (PH_TP and PH_HW) conditional on the season and also on operations at both the treatment plant (OP_TP) and in the headwaters (OP_HW). Each of these variables affects phosphorus concentrations in the stream (PH_ST) and ultimately reservoir visitation (VIS_RS). Costs of operations (CO_TP and CO_HW) as well as revenue at the reservoir (REV_RS) are represented as utility nodes in the BN.

Most of the nodes in this BN (except for SEASON) have an explicit spatial location (i.e., they represent conditions at a specific place). Because of this intrinsic spatiality, the East Canyon BN can be represented in a GIS with points indicating nodes and arrows indicating the BN arcs (i.e., Fig. 3). Such a representation of a BN
within a GIS can give the end users a greater understanding of the context and meaning of the BN nodes. Additionally, in many cases, it may be that the BN nodes correspond to specific geospatial features (e.g., a particular weather station), in which case spatial representation of the BN nodes in a GIS can be particularly meaningful.

Future Directions

It is expected that research and development of tools for the combined integration of GIS and BNs will continue in both academia and commercial entities. New advancements in each of the application areas described are occurring on a regular basis and represent an active and interesting study area for many GIS analysts and users.

References

Ames DP, Neilson BT, Stevens DK, Lall U (2005) Using Bayesian networks to model watershed management decisions: an East Canyon Creek case study. J Hydroinform 7:267–282. IWA Publishing
Borsuk ME, Reckhow KH (2000) Summary description of the Neuse estuary Bayesian ecological response network (Neu-BERN). http://www2.ncsu.edu/ncsu/CIL/WRRI/neuseltm.html. 26 Dec 2001
Haas TC (1998) Modeling waterbody eutrophication with a Bayesian belief network. Working paper, School of Business Administration, University of Wisconsin, Milwaukee
Heckerman D (1997) Bayesian networks for data mining. Data Mining Knowl Discov 1:79–119
Kuikka S, Varis O (1997) Uncertainties of climate change impacts in Finnish watersheds: a Bayesian network analysis of expert knowledge. Boreal Environ Res 2:109–128
Lee DC, Bradshaw GA (1998) Making monitoring work for managers: thoughts on a conceptual framework for improved monitoring within broad-scale ecosystem management. http://icebmp.gov/spatial/lee_monitor/preface.html (26 Dec 2001)
MapWindow Open Source Team (2007) MapWindow GIS 4.3 Open Source Software. Accessed 06 Feb 2007 at the MapWindow Website: http://www.mapwindow.org/
Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Francisco
Shachter R, Peot M (1992) Decision making using probabilistic inference methods. In: Proceedings of the eighth conference on uncertainty in artificial intelligence, Stanford, pp 275–283
Stassopoulou A, Petrou M, Kittler J (1998) Application of a Bayesian network in a GIS based decision making system. Int J Geograph Inf Sci 12(1):23–45
Taylor KJ (2003) Bayesian belief networks: a conceptual approach to assessing risk to habitat. Utah State University, Logan
Varis O, Kuikka S (1996) An influence diagram approach to Baltic salmon management. In: Proceedings of the conference on decision analysis for public policy in Europe, INFORMS decision analysis society, Atlanta
Walker A, Pham B, Maeder A (2004) A Bayesian framework for automated dataset retrieval. In: Geographic information systems. 10th International Multimedia Modelling Conference (MMM), Brisbane, p 138
Walker A, Pham B, Moody M (2005) Spatial Bayesian learning algorithms for geographic information retrieval. In: Proceedings 13th annual ACM international workshop on geographic information systems, Bremen, pp 105–114

Recommended Reading

Ames DP (2002) Bayesian decision networks for watershed management. Utah State University, Logan
Norsys Software Corp (2006) Netica Bayesian belief network software. Acquired from http://www.norsys.com/
Stassopoulou A, Caelli T (2000) Building detection using Bayesian networks. Int J Pattern Recognit Artif Intell 14(6):715–733

Bayesian Spatial Regression

Bayesian Spatial Regression for Multisource Predictive Mapping

Bayesian Spatial Regression for Multisource Predictive Mapping

Andrew O. Finley¹ and Sudipto Banerjee²
¹Department of Forestry and Department of Geography, Michigan State University, East Lansing, MI, USA
²Biostatistics, School of Public Health, The University of Minnesota, A460 Mayo Bldg. MMC303, Minneapolis, MN, USA

Synonyms

Bayesian spatial regression; Pixel-based prediction; Spatial regression
distribution of the parameters conditional upon the data a posteriori. By modeling both the observed data and any unknown regressor or covariate effects as random variables, the hierarchical Bayesian approach to statistical analysis provides a cohesive framework for combining complex data models and external knowledge or expert opinion. A theoretical foundation for contemporary Bayesian modeling can be found in several key texts, including Banerjee et al. (2004), Carlin and Louis (2000), Gelman et al. (2004), and Robert (2001).

Scientific Fundamentals

This entry focuses on predictive models that use covariates derived from digital imagery captured by sensors mounted on orbiting satellites. These modern spaceborne sensors are categorized as either passive or active. Passive sensors detect the reflected or emitted electromagnetic radiation from natural sources (typically solar energy), while active sensors emit energy that travels to the surface feature and is reflected back toward the sensor, such as radar or light detection and ranging (LIDAR). The discussion and illustration covered here focus on data from passive sensors, but can be extended to imagery obtained from active sensors.

Resolution and scale are additional sensor characteristics. There are three components to resolution: (1) spatial resolution refers to the size of the image pixel, with high spatial resolution corresponding to small pixel size; (2) radiometric resolution is the sensor's ability to resolve levels of brightness, and a sensor with high radiometric resolution can distinguish between many levels of brightness; and (3) spectral resolution describes the sensor's ability to define wavelength intervals, and a sensor with high spectral resolution can record many narrow wavelength intervals. These three components are related. Specifically, higher spatial resolution (i.e., smaller pixel size) results in lower radiometric and/or spectral resolution. In general terms, if pixel size is large, the sensor receives a more robust signal and can then distinguish between a smaller degree of radiometric and spectral change. As the pixel size decreases, the signal is reduced and so too is the sensor's ability to detect changes in brightness. Scale refers to the geographic extent of the image or scene recorded by the sensor. Scale and spatial resolution hold an inverse relationship; that is, the greater the spatial resolution, the smaller the extent of the image.

In addition to the academic publications noted above, numerous texts (see, e.g., Campbell 2006; Mather 2004; Richards and Xiuping 2005) provide detail on acquiring and processing remotely sensed imagery for use in prediction models. The modeling illustrations offered in this entry use imagery acquired from the Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) sensors mounted on the Landsat 5 and Landsat 7 satellites, respectively (see, e.g., http://landsat.gsfc.nasa.gov for more details). These are considered mid-resolution sensors because the imagery has moderate spatial, radiometric, and spectral resolution. Specifically, the sensors record reflected or emitted radiation in blue-green (band 1), green (band 2), red (band 3), near-infrared (band 4), mid-infrared (bands 5 and 7), and far-infrared (band 6) portions of the electromagnetic spectrum. Their radiometric resolution within the bands records brightness at 256 levels (i.e., 8 bits) with a spatial resolution of 30 × 30 m pixels (with the exception of band 6, which is 120 × 120 m). The scale of these images is typically 185 km wide by 170 km long, which is ideal for large-area moderate-resolution mapping.

In addition to the remotely sensed covariates, predictive models require georeferenced measurements of the response variables of interest. Two base units of measure and mapping are commonly encountered: locations that are areas or regions with well-defined neighbors (such as pixels in a lattice, counties in a map, etc.), whence they are called areally referenced data, or locations that are points with coordinates (latitude-longitude, easting-northing, etc.), in which case they are called point referenced or geostatistical. Statistical theory and methods play a crucial role in the modeling and analysis of such data by developing spatial process models, also known
as stochastic process or random function models, that help in predicting and estimating physical phenomena. This entry deals with the latter – modeling of point-referenced data.

The methods and accompanying illustration presented here provide pixel-level prediction at the lowest spatial resolution offered in the set of remotely sensed covariates. In the simplest setting, it is assumed that the remotely sensed covariates cover the entire area of interest, referred to as the domain, D. Further, all covariates share a common spatial resolution (not necessarily common radiometric or spectral resolution). Finally, each point-referenced location s in the set S = {s_1, ..., s_n} where a response variable is measured must coincide with a covariate pixel. In this way, the elements in the n × 1 response vector, y = [y(s_i)]_{i=1}^{n}, are uniquely associated with the rows of the n × p covariate matrix, X = [x^T(s_i)]_{i=1}^{n}. This statement suggests that, given the N pixels which define D, n of them are associated with a known response value, and n* = N − n require prediction. This is the typical setup for model-based predictive mapping.

The univariate spatial regression model for point-referenced data is written as

y(s) = x^{T}(s)\beta + w(s) + \epsilon(s),    (1)

where {w(s) : s ∈ D} is a spatial random field, with D an open subset of R^d of dimension d; in most practical settings, d = 2 or d = 3. A random field is said to be a valid spatial process if for any finite collection of sites S of arbitrary size, the vector w = [w(s_i)]_{i=1}^{n} follows a well-defined joint probability distribution. Also, ε(s) ~ iid N(0, τ²) is a white-noise process, often called the nugget effect, modeling measurement error or microscale variation (see, e.g., Chilès and Delfiner 1999).

A popular modeling choice for a spatial random field is the Gaussian process, w(s) ~ GP(0, K(·,·)), specified by a valid covariance function K(s, s′; θ) = Cov(w(s), w(s′)) that models the covariance corresponding to a pair of sites s and s′. This specifies the joint distribution for w as MVN(0, Σ_w), where Σ_w = [K(s_i, s_j; θ)]_{i,j=1}^{n} is the n × n covariance matrix with (i, j)-th element given by K(s_i, s_j; θ). Clearly K(s, s′; θ) cannot be just any function; it must ensure that the resulting Σ_w matrix is symmetric and positive definite. Such functions are known as positive definite functions and are characterized as the characteristic function of a symmetric random variable (by a famous theorem of Bochner). Further technical details about positive definite functions can be found in Banerjee et al. (2004), Chilès and Delfiner (1999), and Cressie (1993).

For valid inference on model parameters and subsequent prediction, model (1) requires that the underlying spatial random field be stationary and isotropic. Stationarity, in spatial modeling contexts, refers to the setting when K(s, s′; θ) = K(s − s′; θ); that is, the covariance function depends upon the separation of the sites. Isotropy goes further and specifies that the covariance depends only on ‖s − s′‖, the distance between the sites. Usually one further specifies K(s, s′; θ) = σ²ρ(s, s′; φ), where ρ(·; φ) is a correlation function and φ includes parameters quantifying the rate of correlation decay and the smoothness of the surface w(s). Then Var(w(s)) = σ² represents a spatial variance component in the model in (1). A very versatile class of correlation functions is the Matérn correlation function given by

\rho(\lVert s - s' \rVert; \phi) = \frac{1}{2^{\nu - 1}\,\Gamma(\nu)} \left(\lVert s - s' \rVert\,\phi\right)^{\nu} K_{\nu}\!\left(\lVert s - s' \rVert\,\phi\right); \quad \phi > 0, \ \nu > 0,    (2)

where φ = (φ, ν), with φ controlling the decay in spatial correlation and ν yielding smoother process realizations for higher values. Also, Γ is the usual gamma function, while K_ν is a modified Bessel function of the third kind with order ν, and ‖s − s′‖ is the Euclidean distance between the sites s and s′.

With observations y from n locations, the data likelihood is written in the marginalized form y ~ MVN(Xβ, Σ_y), with Σ_y = σ²R(φ) + τ²I_n and R(φ) = [ρ(s_i, s_j; φ)]_{i,j=1}^{n}, which is the
spatial correlation matrix corresponding to w(s). For hierarchical models, one assigns prior (hyperprior) distributions to the model parameters (hyperparameters), and inference proceeds by sampling from the posterior distribution of the parameters (see, e.g., Banerjee et al. 2004). Generically denoting by Ω = (β, σ², φ, τ²) the set of parameters that are to be updated in the marginalized model, one samples from the posterior distribution of Ω; the resulting posterior samples of the spatial process can then be interpolated and drawn as contours to produce image and contour plots of the spatial processes.

For predictions, if {s_{0i}}_{i=1}^{n_0} is a collection of n_0 locations, one can compute the posterior predictive distribution p(w* | y), where w* = [w(s_{0k})]_{k=1}^{n_0}. Note that

p(w^{*} \mid y) \propto \int p(w^{*} \mid w, \Omega, y)\, p(w \mid \Omega, y)\, p(\Omega \mid y)\, d\Omega\, dw.
Bayesian Spatial Regression for Multisource Predictive Mapping, Fig. 1 Remotely sensed variables georectified to a common coordinate system (North American Datum 1983) and projection (Albers Conical Equal Area) and resampled to a common pixel resolution and alignment. The images cover the US Forest Service Bartlett Experimental Forest near Bartlett, New Hampshire, USA. (a) is the elevation measured in meters above sea level derived from the 1 arc-second (approximately 30 × 30 m) US Geological Survey national elevation dataset DEM data (Gesch et al. 2002). (b)–(d) are the tasseled cap components of brightness, greenness, and wetness derived from bands 1 to 5 and 7 of a spring 2002 date of Landsat 7 ETM+ sensor imagery (Huang et al. 2002). This Landsat imagery was acquired from the National Land Cover Database for the USA (Homer et al. 2004)
For this illustration, the focus is on predicting the spatial distribution of total tree biomass per hectare across the BEF. Tree biomass is measured as the weight of all above ground portions of the tree, expressed here as metric tons per hectare. Within the data set, biomass per hectare is recorded at 437 forest inventory plots across the BEF (Fig. 2). Satellite imagery and other remotely sensed variables have proved useful regressors for predicting forest biomass. A spring, summer, and fall 2002 date of 30 × 30 m Landsat 7 ETM+ satellite imagery was acquired for the BEF. Following Huang et al. (2002), the image was transformed to tasseled cap components of brightness (1), greenness (2), and wetness (3) using data reduction techniques. Three of the nine resulting spectral variables, labeled TC1, TC2, and TC3, are depicted in Fig. 1b–d.
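Because every inventory plot must coincide with a covariate pixel, assembling the n × p design matrix X reduces to reading the pixel under each plot from the co-registered rasters. The following sketch is a minimal illustration under the assumption of a shared, north-up grid; the function and variable names are hypothetical rather than taken from the original analysis.

```python
import numpy as np

def build_design_matrix(rasters, origin, cell_size, plot_xy):
    """Stack the pixel values under each plot into the n x p covariate matrix X.

    rasters   : list of 2-D arrays on one common grid (e.g., elevation, TC1, TC2, TC3)
    origin    : (x_min, y_max) of the grid in projected coordinates (meters)
    cell_size : pixel size in meters (e.g., 30.0)
    plot_xy   : (n, 2) array of plot easting/northing in the same projection
    """
    x_min, y_max = origin
    cols = ((plot_xy[:, 0] - x_min) // cell_size).astype(int)
    rows = ((y_max - plot_xy[:, 1]) // cell_size).astype(int)
    covariates = [band[rows, cols] for band in rasters]
    return np.column_stack([np.ones(len(plot_xy))] + covariates)   # intercept plus regressors

# Tiny synthetic illustration: two 4 x 4 rasters and three plots.
elev = np.arange(16, dtype=float).reshape(4, 4)
tc1 = np.ones((4, 4))
plots = np.array([[15.0, 105.0], [45.0, 75.0], [105.0, 15.0]])
X = build_design_matrix([elev, tc1], origin=(0.0, 120.0), cell_size=30.0, plot_xy=plots)
```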
Bayesian Spatial Regression for Multisource Predictive Mapping, Fig. 2 The circle symbols in (a) represent georeferenced forest inventory plots on the US Forest Service Bartlett Experimental Forest near Bartlett, New Hampshire, USA. (b) is an interpolated surface of residual values from an ordinary least squares regression of total tree biomass per hectare measured at forest inventory plots depicted in (a) and remotely sensed regressors, some of which are depicted in Fig. 1. Note that the spatial trends in (b) suggest that observations of total tree biomass per hectare are not conditionally independent (i.e., conditional on the regressors)
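The residual surface in Fig. 2b is built from an ordinary least squares fit; binning the residuals by separation distance is a quick way to see the leftover spatial dependence that motivates the random field w(s). A minimal sketch follows (illustrative names; assumes NumPy/SciPy).

```python
import numpy as np
from scipy.spatial.distance import pdist

def ols_residuals(X, y):
    """Residuals from the non-spatial regression of biomass on the remotely sensed covariates."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def empirical_semivariogram(coords, resid, bin_edges):
    """Average 0.5*(e_i - e_j)^2 by distance bin; an increase at short lags indicates
    residual spatial dependence of the kind visible in Fig. 2b."""
    d = pdist(coords)                                       # pairwise distances
    g = 0.5 * pdist(resid[:, None], metric="sqeuclidean")   # pairwise half squared differences
    bins = np.digitize(d, bin_edges)
    return np.array([g[bins == k].mean() if np.any(bins == k) else np.nan
                     for k in range(1, len(bin_edges))])
```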
Bayesian Spatial Regression for Multisource Predictive Mapping, Fig. 4 Results of pixel-level prediction from a spatial regression model of the response variable total tree biomass per hectare, depicted in Fig. 2, and remotely sensed regressors, some of which are depicted in Fig. 1
likelihood. Replicated data sets from the above distribution are easily obtained by drawing, for each posterior realization Ω^{(l)}, a replicated data set y_rep^{(l)} from p(y_rep | Ω^{(l)}). Preferred models will perform well under a decision-theoretic balanced loss function that penalizes both departure from the corresponding observed value (lack of fit) and departure from what the replicate is expected to be (variation in replicates). Motivated by a squared error loss function, the measures for these two criteria are evaluated as G = (y − μ_rep)^T (y − μ_rep) and P = tr(Var(y_rep | y)), where μ_rep = E[y_rep | y] is the posterior predictive mean for the replicated data points and P is the trace of the posterior predictive dispersion matrix for the replicated data; both of these are easily computed from the samples y_rep^{(j)}. Gelfand and Ghosh (1998) suggest using the score D = G + P as a model selection criterion, with lower values of D indicating better models.

Another measure of model choice that has gained much popularity in recent times, especially due to computational convenience, is the deviance information criterion (DIC) (Spiegelhalter et al. 2002). This criterion is the sum of the Bayesian deviance (a measure of model fit) and the (effective) number of parameters (a penalty for model complexity). The deviance, up to an additive quantity not depending upon Ω, is D(Ω) = −2 log L(y | Ω), where L(y | Ω) is the first-stage Gaussian likelihood as in (1). The Bayesian deviance is the posterior mean of D(Ω), while the effective number of parameters, p_D, is the posterior mean of D(Ω) minus D evaluated at the posterior mean of Ω. The DIC is then the Bayesian deviance plus p_D and is easily computed from the posterior samples. It rewards better fitting models through the first term and penalizes more complex models through the second term, with lower values indicating favorable models for the data.

Key Applications

Risk Assessment
Spatial and/or temporal risk mapping and automatic zonation of geohazards have been modeled using traditional geostatistical techniques that incorporate both raster and vector data. These investigations attempt to predict the spatial and temporal distribution of risk to humans or components of an ecosystem. For example, Thayer et al. (2003) explore the utility of geostatistics for human risk assessments of hazardous waste sites. Another example is from Kooistra et al. (2005), who investigate the uncertainty of ecological risk estimates concerning important wildlife species. As noted, the majority of the multisource risk prediction literature is based on non-Bayesian kriging models; however, as investigators begin to recognize the need to estimate the uncertainty of prediction, they will likely embrace the basic Bayesian methods reviewed here and extend them to fit their specific domain. For example, Kneib and Fahrmeir (2007) have proposed one such Bayesian extension to spatially explicit hazard regression.

Agricultural and Ecological Assessment
Spatial processes, such as predicting agricultural crop yield and environmental conditions (e.g., deforestation, soil or water pollution, or forest species change in response to changing climates), are often modeled using multisource spatial regression (see, e.g., Atkinson et al. 1994; Berterretche et al. 2005; Bhatti et al. 1991). Only recently have Bayesian models been used for predicting agricultural and forest variables of interest within a multisource setting. For example, in an effort to quantify forest carbon reserves, Banerjee and Finley (2007) used single and multiple resolution Bayesian spatial regression to predict the distribution of forest biomass. An application of such models to capture spatial variation in growth patterns of weeds is discussed in Banerjee and Johnson (2006).

Atmospheric and Weather Modeling
Arrays of weather monitoring stations provide a rich source of spatial and temporal data on atmospheric conditions and precipitation. These data are often coupled with a host of topographic and satellite derived variables through a spatial regression model to predict short- and long-term weather conditions. Recently, several investigators used these data to illustrate the virtues of a Bayesian approach to spatial prediction (see
Gelfand AE, Ghosh SK (1998) Model choice: a minimum posterior predictive loss approach. Biometrika 85:1–11
Gelman A (2006) Prior distributions for variance parameters in hierarchical models. Bayesian Anal 3:515–533
Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Bayesian data analysis, 2nd edn. Chapman and Hall/CRC Press, Boca Raton
Gesch D, Oimoen M, Greenlee S, Nelson C, Steuck M, Tyler D (2002) The national elevation dataset. Photogramm Eng Remote Sens 68(1):5–12
Homer C, Huang C, Yang L, Wylie B, Coan M (2004) Development of a 2001 national land-cover database for the United States. Photogramm Eng Remote Sens 70:829–840
Huang C, Wylie B, Homer C, Yang L, Zylstra G (2002) Derivation of a tasseled cap transformation based on landsat 7 at-satellite reflectance. Int J Remote Sens 8:1741–1748
Jones CB (1997) Geographical information systems and computer cartography. Addison Wesley Longman, Harlow
Kneib T, Fahrmeir L (2007) A mixed model approach for geoadditive hazard regression. Scand J Stat 34:207–228
Kooistra L, Huijbregts MAJ, Ragas AMJ, Wehrens R, Leuven RSEW (2005) Spatial variability and uncertainty in ecological risk assessment: a case study on the potential risk of cadmium for the little owl in a Dutch River Flood Plain. Environ Sci Technol 39:2177–2187
Mather PM (2004) Computer processing of remotely-sensed images, 3rd edn. Wiley, Hoboken, p 442
Möller J (2003) Spatial statistics and computational method. Springer, New York
National Academy of Sciences (1970) Remote Sensing with Special Reference to Agriculture and Forestry. National Academy of Sciences, Washington, DC, p 424
Paciorek CJ, Schervish MJ (2006) Spatial modelling using a new class of nonstationary covariance functions. Environmetrics 17:483–506
Riccio A, Barone G, Chianese E, Giunta G (2006) A hierarchical Bayesian approach to the spatio-temporal modeling of air quality data. Atmosph Environ 40:554–566
Richards JA, Xiuping J (2005) Remote sensing digital image analysis, 4th edn. Springer, Heidelberg, p 439
Robert C (2001) The Bayesian choice, 2nd edn. Springer, New York
Santner TJ, Williams BJ, Notz WI (2003) The design and analysis of computer experiments. Springer, New York
Schabenberger O, Gotway CA (2004) Statistical methods for spatial data analysis. Texts in statistical science series. Chapman and Hall/CRC, Boca Raton
Scheiner SM, Gurevitch J (2001) Design and analysis of ecological experiments, 2nd edn. Oxford University Press, Oxford
Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A (2002) Bayesian measures of model complexity and fit (with discussion and rejoinder). J R Stat Soc Ser B 64:583–639
Thayer WC, Griffith DA, Goodrum PE, Diamond GL, Hassett JM (2003) Application of geostatistics to risk assessment. Risk Anal Int J 23(5):945–960
Wackernagel H (2006) Multivariate geostatistics: an introduction with applications, 3rd edn. Springer, New York
Webster R, Oliver MA (2001) Geostatistics for environmental scientists. Wiley, New York

Recommended Reading

Handcock MS, Stein ML (1993) A Bayesian analysis of kriging. Technometrics 35:403–410
Wang Y, Zheng T (2005) Comparison of light detection and ranging and national elevation dataset digital elevation model on floodplains of North Carolina. Natl Hazards Rev 6(1):34–40

Bead

Space-Time Prism Model

Best Linear Unbiased Prediction

Spatial Econometric Models, Prediction

Big Data

Informing Climate Adaptation with Earth System Models and Big Data

Big Data and Spatial Constraint Databases

Peter Z. Revesz
Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA

Synonyms

Spatial big data; Spatial constraint database
plished by the appropriate type of constraint solving, which generalizes the aggregate operators on relational data. Some of the aggregation can be done by spatial indexing algorithms. Another type of aggregation calls for finding the intersection (or union) of various spatial areas. The intersection operator is commutative and associative and therefore can be computed using a tree structure where the leaves are the input constraint databases describing various areas, each internal node is the intersection of all the leaves below it, and the root is the intersection of all the areas (a sketch of this tree-structured intersection is given after the references below).

Key Applications

There are many possible applications of spatial data, ranging from astronomy, to geography, and urban planning. As an example from astronomy, suppose that we have sky survey data with a distributed storage where each location records observations at midnight on different days. The location of one particular galaxy is identified in the records at each location. Then a query may be to find any star that is always near the galaxy on a specific set of days.

Another application is in integrating data that is stored at each location and classified using machine-learning techniques such as support vector machines (SVMs) or decision trees. The local classifications can be represented using constraint databases. Classification integration (Revesz and Triplet 2010, 2011) combines the classifications at each location into a single classification. The classification integration can also be performed using a tree structure similar to the intersection query described above.

Weather forecasting and climate change modeling is another application where a distributed sensor network continuously collects a vast amount of data to be mined for both local and global trends (Revesz and Woodward 2014).

Finally, the explosion of genomic data when spatial allele variations are also considered, as in the case of personalized medicine, such as predicting a person's chances of developing specific types of cancer (Revesz and Assi 2013) or other diseases, provides another growing application area.

Future Directions

The following are some open problems or barely addressed topics in the context of big data and spatial constraint databases:

• Constraint relations for communications in a map-reduce architecture need to be facilitated by the development of efficient algorithms that aggregate the constraint relations that are locally produced at each data store (triangular irregular networks, spatiotemporal functions describing the trajectories of moving objects, and other constraint relations). How much computational time and space are required by the aggregation? How can conflicts in the constraint data be handled?
• Good methods need to be developed to estimate the error in the spatial constraint database approximation. This is similar to numerical analysis methods that find an error term when estimating the integral of a function using a particular numerical integration method, for example, composite Simpson's rule (Burden and Faires 2014). We can reduce the size of the error by reducing the error tolerance value. That is similar to numerical integration methods that can reduce the size of the error by considering smaller interval sizes.

Cross-References

MLPQ Spatial Constraint Database System
Spatial
Spatial Join with Hadoop

References

Burden RL, Faires JD (2014) Numerical analysis, 9th edn. Springer, New York
Chomicki J, Revesz PZ (1999) Constraint-based interoperability of spatiotemporal databases. Geoinformatica 3(3):211–243
Li L, Revesz PZ (2004) Interpolation methods for spatio-temporal geographic data. Comput Env Urban Syst 28(3):201–227
Bioinformatics, Spatial Aspects

Sudhanshu Panda
GIS/Environmental Science, Gainsville State College, Gainsville, GA, USA

Synonyms

Biological data mining; Epidemiological mapping; Genome mapping

Historical Background

According to Kotch (2005), spatial epidemiology mapping and analysis has been driven by software developments in geographic information systems (GIS) since the early 1980s. GIS mapping and analysis of spatial disease patterns and geographic variations of health risks has been helping researchers understand spatial epidemiology, a practice whose roots go back centuries (Jacquez 2000). Researchers in bioinformatics also deal with similar pattern recognition and analysis regarding very small patterns,
Bioinformatics, Spatial Aspects, Fig. 1 Images of DNA and proteins in a cell in 2-D and 3-D form

Bioinformatics, Spatial Aspects, Fig. 2 High-resolution (4000 × 4000 pixels) images of genome maps showing the spatial nature of the data
such as those in DNA structure that might predispose an organism to developing cancer (Anonymous 2007). As both bioinformatics and GIS are based on common mapping and analytical approaches, there is a good possibility of gaining an important mechanistic link between individual-level processes tracked by genomics and proteomics and population-level outcomes tracked by GIS and epidemiology (Anonymous 2007). Thus, the scope of bioinformatics in health research can be enhanced by collaborating with GIS.

Scientific Fundamentals

As discussed earlier, data in bioinformatics are of a spatial nature and could be well understood if represented, analyzed, and comprehended just like other geospatial data. GIS can interactively be used in bioinformatics projects for better dynamism, versatility, and efficiency. Figure 3 shows mapping of genome data using ArcGIS software. This helps in managing the genome data interactively with the application of superior GIS functionality. Below is a description of GIS applications in bioinformatics for different aspects of management and analysis.

Use of GIS for Interactive Mapping of Genome Data
In bioinformatics applications, genome browsers are developed for easy access to the data. They use only simple keyword searches and limit the display of detailed annotations to one chromosomal region of the genome at a time (Dolan et al. 2006). Spatial data browsing and management could be done with efficiency using
Bioinformatics, Spatial Aspects, Fig. 3 Display of a mouse genome in ArcGIS (Adapted from Dolan et al. 2006)
ArcGIS software (ESRI). Dolan et al. (2006) have employed concepts, methodologies, and tools that were developed for the display of geographic data to develop a Genome Spatial Information System (GenoSIS) for spatial display of genomes (Fig. 4). GenoSIS helps users to dynamically interact with genome annotations and related attribute data using query tools of ArcGIS, such as query by attributes, query by location, query by graphics, and developed definition queries. The project also helps in producing dynamically generated genome maps for users. Thus, the application of GIS in bioinformatics projects helps genome browsing become more versatile and dynamic.

GIS Application as a Database Tool for Bioinformatics
GIS can be applied for efficient biological database management. While developing a database for the dynamic representation of marine microbial biodiversity, the GIS option was provided (Pushker et al. 2005) with an interface for selecting a particular sampling location on the world map and getting all the genome sequences from that location and their details. Geodatabase management ability helped them obtain the following information: (i) taxonomy report, taxonomic details at different levels (domain, phylum, class, order, family, and genus); (ii) depth report, a plot showing the number of sequences vs. depth; (iii) biodiversity report, a list of organisms found; (iv) collection of all entries; and (v) advanced search for a selected region on the map. Using GIS tools, they retrieved sequences corresponding to a particular taxonomy, depth, or biodiversity (Pushker et al. 2005). This means that bioinformatics data with a spatial scale can be well managed through GIS database development and management.

While developing the "Global Register of Migratory Species (GROMS)," Riede (2000) developed a geodatabase of bird features, including their genomes. It was efficient in accessing the
Bioinformatics, Spatial Aspects, Fig. 4 Comparative use of the GIS paradigm of map layers (buildings, roads, rivers, land parcels, and elevation data over an underlying geographic space) for the integration and visualization of genome data (conserved regions and expression levels over a chromosome space) (Adapted from Dolan et al. 2006; http://www.gis.com/whatisgis/whyusegis.html)

Bioinformatics, Spatial Aspects, Fig. 5 Image analysis of a barley leaf for cell transformation analysis (Adapted from Schweizer 2007)
information about the migratory birds, including their spatial location, through the geodatabase.

Genome Mapping and Pattern Recognition and Analysis
While studying the genome structure, it is essential to understand the spatial extent of its structure. As genome mapping is done through imaging, image processing tools are used to analyze the pattern. GIS technology could be used for pattern recognition and analysis. Figure 5 is an example of an image of a microscopic top view of a barley leaf with a transformed β-glucuronidase-expressing cell (blue-green). From the image analysis, it is observed that two fungal spores of Blumeria graminis f.sp. hordei (fungus, dark blue) are interacting with the transformed cell and that the spore at the left-hand side successfully penetrated into the transformed cell and started to grow out on the leaf surface, illustrated by the elongating secondary hyphae. This study shows the spatial aspect of bioinformatics (Schweizer 2007).

GIS Software in Bioinformatics as Spatial Analysis Tool
Software has been developed as tools of bioinformatics to analyze nucleotide or amino acid sequence data and extract biological information. "Gene prediction software (Pavy et al. 1999)" and "sequence alignment software (Anonymous: Sequence alignment software 2007)" are examples of some of the software developed for bioinformatics.

Gene prediction software is used to identify a gene within a long gene sequence. As described
by Dolan et al. (2006), if genome databases can be presented through ArcGIS, they can be visualized, analyzed, and queried better than with the presently available techniques. Thus, GIS function development techniques can be replicated to make the available software more efficient. Programs like GenoSIS (Dolan et al. 2006) are a step in that direction. Sequence alignment software is a compilation of bioinformatics software tools and web portals which are used in sequence alignment, multiple sequence alignment, and structural alignment (Anonymous: Sequence alignment software 2007). They are also used for database searching. Thus, GIS can be used for bringing dynamism to the database search. Matching of DNA structures could be efficiently done with the GIS application.

Molecular modeling and 3-D visualization is another aspect of genome research. To understand the function of proteins in cells, it is essential to determine a protein's structure (BSCS 2003). The process of determining a protein's exact structure is labor intensive and time consuming. Traditionally, X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy techniques are used for protein structure determination (BSCS 2003). The maps developed by these instruments are preserved in the form of the Protein Data Bank (PDB). The PDB is the first bioinformatics resource to store three-dimensional protein structures. Currently, it is possible to visualize the utility of GIS regarding molecular modeling and the 3-D visualization process. ArcScene™ is the 3-D visualization and modeling software for spatial data. It could very well be the best tool for PDB data visualization, modeling, and analysis.

Key Applications

Bioinformatics is playing important roles in many areas such as agriculture, medicine, biotechnology, environmental science, animal husbandry, etc., as a genome is not only a principal component of the human body but also of plants. GIS or geotechnology has been successfully used in these areas for a long time. Therefore, using bioinformatics in these areas could be associated with the spatial technology of GIS. A study by Nielson and Panda (2006) was conducted on predictive modeling and mapping of fasting blood glucose level in Hispanics in southeastern Idaho. The levels were mapped according to the racial genes and their spatial aspect of representation. This study shows how bioinformatics can be used in several other areas, including epidemiology.

Future Directions

According to Virginia Bioinformatics Institute (VBI) Director Bruno Sobral, "The notion of a map goes all the way from the level of a genome to a map of the United States," he said; "bioinformatics has focused on modeling from the level of the molecules up to the whole organism, while GIS has created tools to model from the level of the ecosystem down." This indicates that there is great potential for bioinformatics and geospatial technology to be combined in a mutually enhancing fashion for important applications.

Cross-References

Biomedical Data Mining, Spatial
Public Health and Spatial Modeling

References

Anonymous (2007) VT conference puts new research area on the map; GIS expert Michael Goodchild echoes its value. https://www.vbi.vt.edu/article/articleview/48/1/15/. Accessed 05 Mar 2007
Anonymous (2007) Sequence alignment software. http://en.wikipedia.org/wiki/Sequence_alignment_software. Accessed
BSCS (2003) Bioinformatics and the human genome project. A curriculum supplement for high school biology. Developed by BSCS under a contract from the Department of Energy
Dolan ME, Holden CC, Beard MK, Bult CJ (2006) Genomes as geography: using GIS technology to build interactive genome feature maps. BMC Bioinf 7:416
Biomedical Data Mining, Spatial, Fig. 1 Graphical representation of the modes that constitute a 4th order Zernike polynomial. Each mode is labeled using the single indexing convention created by the Optical Society of America (Thibos et al. 1999)

medical community for a number of different applications, most recently for modeling the shape of the cornea (Schwiegerling et al. 1995). Studies have shown that these models can effectively characterize aberrations that may exist on the corneal surface (Iskander et al. 2001a, b). The use of wavelets is natural in applications that require a high degree of compression without a corresponding loss of detail, or where the detection of subtle distortions and discontinuities is crucial (Mallat 1999). Wavelets have been used in a number of applications, ranging from signal processing, to image compression, to numerical analysis (Daubechies 1992). They play a large role in the processing of biomedical instrument data obtained through techniques such as ultrasound, magnetic resonance imaging (MRI), and digital mammography (Laine 2000).

Scientific Fundamentals

Zernike Polynomials
Zernike polynomials are a series of circular polynomials defined within the unit circle. They are orthogonal by the following condition:

\int_{0}^{2\pi}\!\!\int_{0}^{1} Z_{n}^{m}(\rho, \theta)\, Z_{n'}^{m'}(\rho, \theta)\, \rho\, d\rho\, d\theta = \frac{\pi\, \delta_{nn'}\, \delta_{mm'}}{2(n+1)}.    (1)

The general form of each mode is

Z_{n}^{m}(\rho, \theta) = \begin{cases} \sqrt{2(n+1)}\, R_{n}^{m}(\rho)\cos(m\theta) & \text{for } m > 0 \\ \sqrt{2(n+1)}\, R_{n}^{m}(\rho)\sin(|m|\theta) & \text{for } m < 0 \\ \sqrt{n+1}\, R_{n}^{m}(\rho) & \text{for } m = 0 \end{cases}    (2)

where n is the radial polynomial order and m represents azimuthal frequency. The normalization coefficient is given by the square root term preceding the radial and azimuthal components. The radial component of the Zernike polynomial, the second portion of the general formula, is defined as:

R_{n}^{m}(\rho) = \sum_{s=0}^{(n-|m|)/2} \frac{(-1)^{s}\,(n-s)!}{s!\,\left(\tfrac{n+|m|}{2}-s\right)!\,\left(\tfrac{n-|m|}{2}-s\right)!}\, \rho^{\,n-2s}.    (3)

Note that the value of n is a positive integer or zero. For a given n, m can only take the values −n, −n + 2, −n + 4, ..., n. In other words, n − |m| must be even and |m| ≤ n. Thus, only certain combinations of n and m will yield valid Zernike polynomials. Any combination that is not valid simply results in a radial polynomial component of zero.

Polynomials that result from fitting raw data with these functions are a collection of approximately orthogonal circular geometric modes. The coefficients of each mode are proportional to its contribution to the overall topography of the original image data. As a result, one can effectively reduce the dimensionality of the data to a subset of polynomial coefficients that represent spatial
2.n C 1/ of polynomial coefficients that represent spatial
Biomedical Data Mining, Spatial 129
a b
2 4
1 2
0 0
0 1 2 0 1 2 3 4
c
8
7
6
5
4
3
2
1
0
0 1 2 3 4 5 6 7 8
Biomedical Data Mining, Spatial, Fig. 2 Graphical representation of first (a), second (b), and third-order Hilbert
curves (c). Curves used to sample matrices of size 4, 16, and 81, respectively
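To make the Hilbert-curve sampling order used later in this entry concrete, the sketch below generates the cell-visiting sequence for a 2^k × 2^k matrix and flattens the matrix along it. It follows the standard index-to-coordinate conversion for Hilbert curves and is illustrative only; the function names are not from the original work.

```python
def hilbert_order(order):
    """(row, col) visiting sequence of a 2**order x 2**order Hilbert curve."""
    n = 2 ** order
    def d2xy(d):
        x = y = 0
        t = d
        s = 1
        while s < n:
            rx = 1 & (t // 2)
            ry = 1 & (t ^ rx)
            if ry == 0:                      # rotate the quadrant when needed
                if rx == 1:
                    x, y = s - 1 - x, s - 1 - y
                x, y = y, x
            x += s * rx
            y += s * ry
            t //= 4
            s *= 2
        return x, y
    return [d2xy(d) for d in range(n * n)]

def hilbert_sample(matrix):
    """Flatten a 2**k x 2**k matrix into a 1D signal by visiting cells in Hilbert order."""
    order = (len(matrix) - 1).bit_length()
    return [matrix[r][c] for r, c in hilbert_order(order)]

# The first-order curve visits the four cells of a 2 x 2 matrix in a U shape.
print(hilbert_order(1))                 # [(0, 0), (0, 1), (1, 1), (1, 0)]
print(hilbert_sample([[1, 2], [3, 4]])) # [1, 2, 4, 3]
```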
Figure 3 shows an example of the output produced by a corneal topographer for a patient suffering from Keratoconus, a patient with a normal cornea, and an individual who has undergone LASIK eye surgery. The methods presented here are intended to differentiate between these patient groups. These images represent illustrative examples of each class, i.e., they are designed to be easily distinguishable by simple visual inspection. The top portion of the figure shows the imaged concentric rings. The bottom portion of the image shows a false color map representing the surface curvature of the cornea. This color map is intended to aid clinicians and is largely instrument-dependent.

Key Applications

This section covers techniques employed to represent the topography data with the above modeling methods, to use those representations to classify corneal shape, and finally, to present the results of those decisions to allow clinicians to visually inspect the classification criteria.

Data Transformation
The data from a corneal topographer are largely instrument-specific but typically consist of a 3D point cloud of approximately 7,000 spatial coordinates arrayed in a polar grid. The height of each point relative to the corneal apex is given by a relation of the form f(ρ, θ), that is, as a function of the radial distance from the origin (ρ) and the counter-clockwise angular deviation from the horizontal meridian (θ). The inner and outer borders of each concentric ring consist of a discrete set of 256 data points taken at a known angle θ, but a variable distance ρ, from the origin.

Zernike
The method described here is based on techniques detailed by Schwiegerling et al. (1995) and Iskander et al. (2001a). In summary, the
Biomedical Data Mining, Spatial, Fig. 3 Characteristic corneal shapes for three patient groups. The top image shows a picture of the cornea and reflected concentric rings. The bottom image shows the false color topographical map representing corneal curvature, with an increased curvature given a color in the red spectrum, decreased curvature in blue. From left to right: Keratoconus, Normal, and post-refractive surgery
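As a concrete illustration of Eqs. (2) and (3) and of the least-squares fit described in the next paragraph, the following sketch evaluates the normalized Zernike modes and fits them to sampled corneal heights. It assumes NumPy; the function names and the choice of solver are illustrative rather than taken from the cited implementations.

```python
import numpy as np
from math import factorial

def zernike_radial(n, m, rho):
    """Radial component R_n^m(rho) of Eq. 3; invalid (n, m) pairs give a zero component."""
    m = abs(m)
    if (n - m) % 2:
        return np.zeros_like(rho)
    return sum((-1) ** s * factorial(n - s)
               / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
               * rho ** (n - 2 * s)
               for s in range((n - m) // 2 + 1))

def zernike(n, m, rho, theta):
    """Normalized Zernike mode Z_n^m(rho, theta) of Eq. 2."""
    if m > 0:
        return np.sqrt(2 * (n + 1)) * zernike_radial(n, m, rho) * np.cos(m * theta)
    if m < 0:
        return np.sqrt(2 * (n + 1)) * zernike_radial(n, m, rho) * np.sin(-m * theta)
    return np.sqrt(n + 1) * zernike_radial(n, 0, rho)

def fit_zernike(rho, theta, height, max_order):
    """Least-squares fit of measured heights to all modes up to max_order
    (a sketch of the matrix-inversion approach of Schwiegerling et al. 1995)."""
    modes = [(n, m) for n in range(max_order + 1) for m in range(-n, n + 1, 2)]
    design = np.column_stack([zernike(n, m, rho, theta) for n, m in modes])
    coeffs, *_ = np.linalg.lstsq(design, height, rcond=None)
    return dict(zip(modes, coeffs))
```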
data is modeled over a user-selected circular region of variable diameter, centered on the axis of measurement. The generation of the Zernike model surface proceeds in an iterative fashion, computing a point-by-point representation of the original data at each radial and angular location up to a user-specified limit of polynomial complexity. The polynomial coefficients of the surface that will later be used to represent the proportional magnitude of specific geometric features are computed by performing a least-squares fit of the model to the original data, using standard matrix inversion methods (Schwiegerling et al. 1995).

2D Wavelets
To create a model using the 2D wavelet decomposition, the data must first be transformed from polar to Cartesian coordinates. Once this step has been completed, the matrix is normalized to a power of 2, typically 64 × 64 or 128 × 128. After normalization, the 2D wavelet decomposition is applied, with the final level of approximation coefficients serving as the feature vector.

1D Wavelets
This section details two methods for transforming the topography data into a 1D signal. The first method is to simply trace along each ring in a counter-clockwise fashion, adding each of the 256 points to the end of the signal. Upon reaching the end of a ring, one moves to the next larger ring and repeats the process. Next, a 1D decomposition is applied and the final level approximation coefficients are taken to serve as a feature vector for classification. The second method involves
transforming the data into a distance matrix as with the 2D wavelet decomposition. Then, one can sample the data using a space-filling curve, which will result in a 1D representation of the 2D matrix.

Classification
Given a dataset of normal, diseased, and post-operative LASIK corneas, the above representations were tested using a number of different classification strategies, including decision trees, Naïve Bayes, and neural networks. The data was modeled with Zernike polynomials and several polynomial orders were tested, ranging from 4 to 10. The experiments showed that the low-order Zernike polynomials, coupled with decision trees, provided the best classification performance, yielding an accuracy of roughly 90% (Marsolo et al. 2006; Twa et al. 2003). The 2D, 1D Hilbert, and 1D ring-based wavelet representations were tested as well. For the 2D representation, a normalized matrix of size 128 × 128 and a 3rd level decomposition yielded the highest accuracy. For the Hilbert-based approach, it was a normalized matrix of size 64 × 64 and a 6th level transformation. With the ring-based model, the 7th level decomposition yielded the highest accuracy. The accuracy of the different wavelet-based models was roughly the same, hovering around 80%, approximately 10% lower than the best Zernike-based approach.

Visualization
While accuracy is an important factor in choosing a classification strategy, another attribute that should not be ignored is the interpretability of the final results. Classifiers like decision trees are often favored over "black-box" classifiers such as neural networks because they provide more understandable results. One can manually inspect the tree produced by the classifier to examine the features used in classification. In medical image interpretation, decision support for a domain expert is preferable to an automated classification made by an expert system. While it is important for a system to provide a decision, it is often equally important for clinicians to know the basis for an assignment.

Part of the rationale behind using Zernike polynomials as a transformation method over other alternatives is that there is a direct correlation between the geometric modes of the polynomials and the surface features of the cornea. An added benefit is that the orthogonality of the series allows each term to be considered independently. Since the polynomial coefficients used as splitting attributes represent the proportional contributions of specific geometric modes, one can create a surface representation that reflects the spatial features deemed "important" in classification. These features discriminate between the patient classes and give an indication as to the specific reasons for a decision.

The section below discusses a method that has been designed and developed to visualize decision tree results and aid in clinical decision support (Marsolo et al. 2005). The first step of the process is to partition the dataset based on the path taken through the decision tree. Next, a surface is created using the mean values of the polynomial coefficients of all the patients falling into each partition. For each patient, an individual polynomial surface is created from the patient's coefficient values that correspond to the splitting attributes of the decision tree. This surface is contrasted against a similar surface that contains the mean partition values for the same splitting coefficients, providing a measure to quantify how "close" a patient lies to the mean.

Figure 4 shows a partial decision tree for a 4th-order Zernike model. The circles correspond to the Zernike coefficients used as splitting criteria, while the squares represent the leaf nodes. In this tree, three example leaf nodes are shown, one for each of the patient classes considered in this experiment (K – keratoconus, N – normal, L – post-operative LASIK). The black triangles are simply meant to represent subtrees that were omitted from the figure in order to improve readability. While each node of the tree represents one of the Zernike modes, the numbers on each branch represent the relation that must be true in order to proceed down that path. (If no relation is provided, it is simply the negation of the relation(s) on the opposing branch(es).) An object can be traced through the tree until a leaf node is
Biomedical Data Mining, Spatial, Fig. 4 Partial decision tree for the 4th-order Zernike model described in the text; circles are Zernike coefficient splitting nodes with branch threshold values (e.g., −4.04, 9.34, −401.83, 1.42, −0.31), and squares are leaf nodes labeled by patient class (K, N, L)
reached and the object is then assigned the label of that leaf. Thus, given the tree in Fig. 4, if an object had a Z_3^1 coefficient with a value ≤ 2.88, one would proceed down the left branch. If the value was > 2.88, the right branch would be taken and the object would be labeled as keratoconus. In this manner, there is a path from the root node to each leaf.

As a result, one can treat each possible path through a decision tree as a rule for classifying an object. For a given dataset, a certain number of patients will be classified by each rule. These patients will share similar surface features. Thus, one can compare a patient against the mean attribute values of all the other patients who were classified using the same rule. This comparison will give clinicians some indication of how "close" a patient is to the rule average. To compute these "rule mean" coefficients, the training data is partitioned and the average of each coefficient is calculated using all the records in that particular partition.

For a new patient, a Zernike transformation is computed and the record is classified using the decision tree to determine the rule for that patient. Once this step has been completed, the visualization algorithm is applied to produce five separate images (illustrated in Fig. 5). The first panel is a topographical surface representing the Zernike model for the patient. It is constructed by plotting the 3-D transformation surface as a 2-D topographical map, with elevation denoted
Biomedical Data Mining, Spatial, Fig. 5 Example surfaces for each patient class: (a) represents a keratoconic eye, (b) a normal cornea, and (c) a post-operative LASIK eye. The top panel contains the Zernike representation of the patient. The next panel illustrates the Zernike representation using the rule values. The third and fourth panels show the rule surfaces, using the patient and rule coefficients, respectively. The bottom panel consists of a bar chart showing the deviation between the patient's rule surface coefficients and the rule values
For a new patient, a Zernike transformation is computed and the record is classified using the decision tree to determine the rule for that patient. Once this step has been completed, the visualization algorithm is applied to produce five separate images (illustrated in Fig. 5). The first panel is a topographical surface representing the Zernike model for the patient. It is constructed by plotting the 3-D transformation surface as a 2-D topographical map, with elevation denoted by color. The second section contains a topographical surface created in a similar manner by using the rule coefficients. These surfaces are intended to give an overall picture of how the patient's cornea compares to the average cornea of all similarly classified patients.

The next two panels in Fig. 5 (rows 3 and 4) are intended to highlight the features used in classification, i.e., the distinguishing surface details. These surfaces are denoted as rule surfaces. They are constructed from the values of the coefficients that were part of the classification rule (the rest are zero). The first rule surface (third panel of Fig. 5) is created by using the relevant coefficients, but instead of the patient-specific values, the values of the rule coefficients are used. This surface will represent the mean values of the distinguishing features for that rule. The second rule surface (row 4, Fig. 5) is created in the same fashion, but with the coefficient values from the patient transformation, not the average values.

Finally, a measure is provided to illustrate how close a patient lies to those falling in the same rule partition. For each of the distinguishing coefficients, the relative error between the patient and the rule is computed. The absolute value of the difference between the coefficient value of the patient and the value of the rule is taken and divided by the rule value. A bar chart of these error values is provided for each coefficient (the error values of the coefficients not used in classification are set to zero). This plot is intended to provide a clinician with an idea of the influence of the specific geometric modes in classification and the degree to which the patient deviates from the mean.

The surfaces in Fig. 5 correspond to the three example rules shown in the partial tree found in Fig. 4. For each of the patient classes, the majority of the objects were classified by the example rule. Since these are the most discriminating rules for each class, one would expect that the rule surfaces would exhibit surface features commonly associated with corneas of that type. These results are in agreement with the expectations of domain experts.
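Continuing the hypothetical sketch above, the deviation values behind a Fig. 5-style bar chart could be derived as relative errors between a new record's coefficients and the rule means, leaving coefficients that the rule does not test at zero; again, every name here is illustrative.

# Hypothetical new patient record (same 15-coefficient layout as above).
patient = rng.normal(size=(1, 15))

leaf = tree.apply(patient)[0]
rule = rule_means[leaf]                      # rule-mean coefficients for this patient

# Coefficients actually tested on the root-to-leaf path define the rule.
path_nodes = tree.decision_path(patient).indices
used = {tree.tree_.feature[n] for n in path_nodes if tree.tree_.feature[n] >= 0}

errors = np.zeros(patient.shape[1])
for j in used:
    errors[j] = abs(patient[0, j] - rule[j]) / abs(rule[j])   # relative error
# "errors" holds the bar heights; coefficients not used by the rule stay at zero.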
References
Born M, Wolf E (1980) Principles of optics: electromagnetic theory of propagation, interference and diffraction of light, 6th edn. Pergamon Press, Oxford/New York
Daubechies I (1992) Ten lectures on wavelets. Society for Industrial and Applied Mathematics, Philadelphia
Hoekman DH, Varekamp C (2001) Observation of tropical rain forest trees by airborne high-resolution interferometric radar. IEEE Trans Geosci Remote Sens 39(3):584–594
Iskander DR, Collins MJ, Davis B (2001a) Optimal modeling of corneal surfaces with Zernike polynomials. IEEE Trans Biomed Eng 48(1):87–95
Iskander DR, Collins MJ, Davis B, Franklin R (2001b) Corneal surface characterization: how many Zernike terms should be used? (ARVO abstract). Invest Ophthalmol Vis Sci 42(4):896
Kiely PM, Smith G, Carney LG (1982) The mean shape of the human cornea. J Modern Opt 29(8):1027–1040
Laine AF (2000) Wavelets in temporal and spatial processing of biomedical images. Annu Rev Biomed Eng 2:511–550
Mallat S (1999) A wavelet tour of signal processing, 2nd edn. Academic, New York
Mandell RB (1996) A guide to videokeratography. Int Contact Lens Clin 23(6):205–228
Marsolo K, Parthasarathy S, Twa MD, Bullimore MA (2005) A model-based approach to visualizing classification decisions for patient diagnosis. In: Proceedings of the conference on artificial intelligence in medicine (AIME), Aberdeen, 23–27 July 2005
Marsolo K, Twa M, Bullimore MA, Parthasarathy S (2006) Spatial modeling and classification of corneal shape. IEEE Trans Inf Technol Biomed
Platzman L, Bartholdi J (1989) Spacefilling curves and the planar travelling salesman problem. J Assoc Comput Mach 46:719–737
Schwiegerling J, Greinvenkamp JE, Miller JM (1995) Representation of videokeratoscopic height data with Zernike polynomials. J Opt Soc Am A 12(10):2105–2113

Bi-temporal
Spatiotemporal Query Languages

Bitmap
Raster Data

Bivariate Median
Geometric Median

Black Hole Detection
Ring-Shaped Hotspot Detection

BLUP
Spatial Econometric Models, Prediction

Branch and Bound
Skyline Queries

B-tree, Versioned
Smallworld Software Suite
Bubble Estimation
Financial Asset Analysis with Mobile GIS

Decision-Making Effectiveness with GIS

Caching
OLAP Results, Distributed Caching

CAD and GIS Platforms
Computer Environments for GIS and CAD

Cadaster
Cadastre

Cadastre
Erik Stubkjær
Department of Development and Planning, Aalborg University, Aalborg, Denmark

Synonyms
Cadaster; Land administration system; Land information system; Land policy; Land registry; Property register; Spatial reference frames

Definition
A cadastre may be defined as an official geographic information system (GIS) which identifies geographical objects within a country, or more precisely, within a jurisdiction. Just like a land registry, it records attributes concerning pieces of land, but while the recordings of a land registry are based on deeds of conveyance and other rights in land, the cadastre is based on measurements and other renderings of the location, size, and value of units of property. The cadastre and the land registry in some countries, e.g., the Netherlands and New Zealand, are managed within the same governmental organization. From the 1990s, the term land administration system came into use, referring to a vision of a complete and consistent national information system, comprising the cadastre and the land registry.

The above definition of cadastre accommodates the various practices within continental Europe, the British Commonwealth, and elsewhere. Scientific scrutiny emerged from the 1970s, where the notions of a land information system or property register provided a frame for comparison of cadastres across countries. However, GIS and related sciences emerged as the main approach of research in the more technical aspects of the cadastre.
In fact, the above definitional exercise largely disregards the organizational, legal, and other social science aspects of the cadastre. These are more adequately addressed when the cadastre is conceived of as a sociotechnical system, comprising technical, intentional, and social elements.

Historical Background

The notion of the cadastre has been related to Byzantine ledgers, called katastichon in Greek, literally "line by line". A Roman law of 111 BC required that land surveyors (agrimensores) should complete maps and registers of certain tracts of Italy. Also, an archive, a tabularium, was established in Rome for the deposit of the documents. Unfortunately, no remains of the tabularium seem to have survived the end of the Roman Empire.

The cadastre reemerged in the fifteenth century in some Italian principalities as a means of recording tax liabilities. This seems part of a more general trend of systematic recording of assets and liabilities, e.g., through double-entry bookkeeping, and spread to other parts of Europe. In order to compensate for the lack of mapping skills, landed assets and their boundaries were described through a kind of written maps, or cartes parlantes. During the sixteenth century, landscape paintings or so-called picture maps were prepared, e.g., for the court in Speyer, Germany, for clarifying argumentation on disputed land. During the same period in the Netherlands, the need for dike protection from the sea called for measurements and the organization of work and society; the practice of commissioning surveys for tax collection became increasingly common there.

A new phase was marked by new technology, the plane table with the related methods for distance measurement, mapping, and area calculation. The technology was introduced in 1720 in Austrian Lombardy through a formal trial against alternative mapping methods. The resulting Milanese census, the Censimento, with its integrated recordings in ledgers and maps became a model for other European principalities and kingdoms. The uneven diffusion of cadastral technology reveals a power struggle between the ruling elite and the landed gentry and clerics, who insisted on their tax exemption privileges. In the early modern states, the cadastre was motivated by reference to a God-given principle of equality (German: gottgefällige Gerechtigkeit or gottgefällige Gleichheit (Kain and Baigent 1993)). Generally, absolutist monarchs were eager to establish accounts of the assets of their realm, as the basis for decisions concerning their use in wars and for the general benefit of the realm. A continental European version of mercantilism, "cameralism", was lectured at universities, seeking a quasirational exploitation of assets and fair taxation, for which the cadastre was needed, as well as regulations and educational programs, for example in agriculture, forestry, and mining. Cadastral technology, the related professions, and the centralized administration together became an instrument of unification of the country, providing the technical rationale for greater equality in taxation. Taxation thus gradually became controlled by the central authority, rather than mediated through local magnates. This change was recognized by Adam Smith, by physiocrats, and by political writers in France. The administrative technology was complemented by codification, that is, a systematic rewriting of laws and by-laws that paved the way for the modern state where individual citizens are facing the state, basically on equal terms.

The reorganization of institutions after the French revolution of 1789 also changed the role of the cadastre as introduced by Enlightenment monarchs. In part, this was due to university reforms, e.g., by Wilhelm von Humboldt in Berlin. Cameralism was split into economics, which increasingly became a mathematically based discipline, and a variety of disciplines lectured at agricultural and technical universities. Cadastral expertise was largely considered a subfield of geodesy, the new and rational way of measuring and recording the surface of the Earth. From the end of the eighteenth century, cadastral maps were increasingly related to or based on geodetic
triangulations, as was the case for the Napoleonic cadastre of France, Belgium, and the Netherlands. The same cadastral reform included the intention of using the measured boundaries and areas and the parcel identification for legal purposes, primarily by letting the cadastral documentation with its fixed boundaries prove title to land and become the final arbiter in case of boundary disputes. However, the courts that were in charge of land registries, generally for more than a century, and the legal profession were reluctant to adopt what might be considered an encroachment on their professional territory. During the nineteenth century, most countries improved their deeds recording system, and German-speaking countries managed to develop them into title systems backed by legal guaranties. Similarly, from South Australia the so-called Torrens system, adopted in 1858, influenced the English-speaking world. However, with few exceptions, the integration of cadastral and legal affairs into one information system had to await the introduction of computer technology and the adoption of business approaches in government.

The above historical account is Eurocentric and bypasses possible scientific exchange with South-Eastern neighbors during the twelfth to fifteenth centuries. Also, it leaves out the development in England and its colonies worldwide. The account describes how the notion of the cadastre emerged and varied across time and place. This calls for special care in scientific communications, since no standardized and theory-based terminology has been established.

Scientific Fundamentals

Spatial Reference Frames
The center and rotational axis of the Earth, together with the Greenwich meridian, provide a reference frame for the location of objects on the surface of the Earth. Furthermore, a map projection relates the three-dimensional positions to coordinates on a two-dimensional map plane. The skills of the geodesist and land surveyor are applied in the cadastral field to record agreed legal boundaries between property units, the assumption being that such recordings substantially reduce the number of boundary disputes.

Application of the above-mentioned fixed-boundary solution raises serious problems, even if the assumption may be largely confirmed. In the cases of landslides and land drifting due to streams, the solution is insensitive to the owner's cost of getting access to all parts of the property. Furthermore, the solution does not accommodate for later and better measurements of the boundary. Moreover, the boundary may have been shifted years ago for reasons that have become too costly to elicit, relative to the value of the disputed area and even acknowledging the fact that justice may not be served. Some jurisdictions hence allow for "adverse possession", that is: an official recognition of neighbors' agreement on the present occupational boundary, even if it differs from previous cadastral recordings. Likewise, legal emphasis on merestones and other boundary marks, as well as the recording of terrain features which determine permanent and rather well defined boundary points, may supplement the pure fixed-boundary approach.

The geodetic surveyors' reference frames locate points in terms of coordinates, but the naming and identification of parcels relates to names, which is a subfield of linguistics. The cadastral identifier is a technical place name, related to the place names of towns, roads, parishes, and topographic features. Hierarchically structured administrative units or jurisdictions and their names provide a means of location of property units. Even if such ordinal structuring of a jurisdiction through place names is coarse, relative to the metric precision of measured boundary points, it provides in many cases for a sufficient localization of economic activities and it reduces the dependency on specialized and costly competence.

The linguistic approach to localization refers to another spatial reference frame than the Earth, namely the human body. The everyday expressions of "left" and "right", "up" and "down" all refer to the body of the speaker or the listener, as practiced when giving directions to tourists or explaining where to buy the best offers of the day. Important research areas include the
balancing of nominal, ordinal, and metric means of localization and the consideration of relations amongst various spatial reference frames.

Communication and Standardization
The national information systems (cadastre, land registry, or whatever may be the name) and organizational structure of real property information systems depend on databases and related archives and need updating if they are to render trustworthy minutes of the spatial and legal situation of the property units. Computer and communication sciences provide the theoretical background for these structures and tasks. However, until recently the methods provided in terms of systems analysis and design, data modeling, etc. addressed the information system within a single decision body, while the situation pertaining to real property information depends on an interplay between ministries, local government, and the private sector. The notion of a geospatial data infrastructure is used to describe this scope. The modeling of this infrastructure compares to the modeling of an industrial sector for e-business, an emergent research issue that includes the development of vocabulary and ontology resources of the domain.

The specification of the property unit is a fundamental issue. Often, the unit is supposed to be in individual ownership, but state or other public ownership is common enough to deserve consideration, as are various forms of collective ownership. Collective ownership is realized by attributing rights and obligations to a so-called legal person, which may be an association, a limited company, or another social construct endorsed by legislation or custom. Comparative studies reveal that the property unit itself can be specified in a host of variations: Is the unit a single continuous piece of land, or is it defined as made up of one or more of such entities? Relations among pieces of land can be specified in other ways: In Northern and Central Europe, a construct exists where a certain share of a property unit is owned by the current owners of a number of other property units. In cadastral parlance, such a unit is said to be owned by the share-holding property units. Furthermore, land and the building erected on it may establish one unit, yet alternatively the building always or under certain conditions constitutes a unit in its own right. Variations also occur as to whether parts of buildings can become units, for example in terms of condominiums, which may depend on conditions related to use for housing or business purposes. Research efforts under the heading of standardization of the core cadastral unit have contributed substantially to the understanding of the complexity of the property unit.

Updating the information in property registers is as essential as the specification of units. From an informatics point of view, a survey of the information flows in what may be called the geodata network may reveal uncoordinated, and perhaps duplicated, efforts to acquire information and other suboptimal practices. However, from the end users' point of view, what takes place is a transaction of property rights and related processes, for example subdivision of property units. The updating of property registers is from this point of view a by-product of the transaction.

The end-user point of view is taken also by economists, who offer a theoretical basis for investigations of the mentioned processes, a field known as "institutional economics". New institutional economics (NIE) introduces transaction costs as an expense in addition to the cost of producing a commodity to the market. In the present context, the transaction costs are the fees and honoraria, etc. to be paid by the buyer and seller of a property unit, besides the cost of the property itself. Buyers' efforts to make sure that the seller is in fact entitled to dispose of the property unit concerned can be drastically reduced, that is: transaction costs are lowered, where reliable title information from land registries is available.

The NIE approach, as advocated by Nobel laureate Douglass C North, was applied in recent, comparative research. His notion of "institution": the norms which restrict and enable human behavior, suggested research focused on practices rather than legal texts. Methods were developed and applied for the systematic description and comparison of property transactions, including a formal, ontology-based approach. The methods developed were feasible and
initial national accounts of transaction costs were drafted. However, advice for optimal solutions is not to be expected, partly because many of the agents involved, both in the private and the public sector, have other preferences than the minimizing of transaction costs. Moreover, property transactions are, for various political reasons, often regulated, for example through municipal preemption rights or spatial planning measures. The NIE approach does however offer an adequate theoretical basis for analyses of the infrastructure of real property rights, analyses which assist in the identification and remedy of the most inappropriate practices.

Cadastral Development Through Institutional Transactions
Institutional economics divides into two strands: NIE and institutional political economy, respectively. The former may be applied to the cadastre and its related processes, conceived as a quasirational, smooth-running machine that dispatches information packets between agents to achieve a certain outcome, e.g., the exchange of money against title to a specific property unit. However, this approach does not account for the fact that the various governmental units, professional associations, etc. involved in the cadastral processes have diverse mandates and objectives. Development projects pay attention to these conflicting interests through stakeholder analyses. Research in organizational sociology suggests identification of policy issue networks and investigation of the exchange of resources among the actors involved, for example during the preparation and passing of a new act. The resources are generally supposed not to be money, although bribery occurs, but more abstract entities such as legal-technical knowledge, access to decision centers, organizational skills, reputation, and mobilizing power.

Institutional economics provides a frame for relating this power game to the routine transactions of conveyance of real estate. It is done by introducing two layers of social analysis: the layer of routine transactions, and the layer of change of the rules which determine the routine transactions (Williamson 2000). In economic terms, conveyance is an example of a transaction in commodities or other assets. These transactions are performed according to a set of rules: acts, by-laws, professional codes of conduct, etc. which are, in the terminology of Douglass North, a set of institutions. The process of change of these institutions is the object of analysis on the second layer and, following Daniel Bromley, called "institutional transactions". Institutional transactions may reduce transaction costs within the jurisdiction concerned, but Bromley shows at length that this need not be so; generally, the initiator of an institutional transaction cannot be sure whether the intended outcome is realized. Among other things, this is because the transaction is open to unplanned interference from actors on the periphery and also because the various resources of the actors are only partly known at the outset.

The strand of institutional political economy is researching such institutional transactions in order to explain why some countries grow rich, while others fail to develop their economy. Here we have the theoretical basis for explaining the emergence and diffusion of the cadastre in early modern Europe, cf. "Historical Background" above. The pious and enlightened absolutist monarchs and their advisors established a set of norms that framed institutional transactions in a way that encouraged a growing number of the population to strive for the "common weal".

Key Applications

The definition of cadastre specifies the key application: the official identification and recording of information on geographical objects: pieces of land, buildings, pipes, etc. as well as documents and extracts from these on rights and restrictions pertaining to the geographical objects. Cadastral knowledge is applied not only in the public sector, but also in private companies, as we shall see from the following overview of services:

Facilitation of financial inflow to central and local government by specification of property
References
… 2001. http://www.wider.unu.edu/publications/pb3.pdf. Accessed 14 Aug 2007
Frank AU (2001) Tiers of ontology and consistency constraints in geographic information systems. Int J Geogr Inf Sci 15:667–678
Kain RJP, Baigent E (1993) The cadastral map in the service of the state: a history of property mapping. The University of Chicago Press, Chicago
Kaufmann J, Steudler D (1998) Cadastre 2014. http://www.fig.net/cadastre2014/index.htm. Accessed 14 Aug 2007
Kuhn W (2001) Ontologies in support of activities in geographical space. Int J Geogr Inf Sci 15:613–631
North DC (1990) Institutions, institutional change and economic performance. Cambridge University Press, Cambridge
Palacio A, Legal empowerment of the poor: an action agenda for the world bank. Available via http://rru.worldbank.org/Documents/PSDForum/2006/background/legal_empowerment_of_poor.pdf. Accessed 14 Aug 2007
Stubkjær E (2003) Modelling units of real property rights. In: Virrantaus K, Tveite H (eds) ScanGIS'03 proceedings, 9th Scandinavian research conference on geographical information sciences, Espoo, June 2003
van Oosterom P, Schlieder C, Zevenbergen J, Hess C, Lemmen C, Fendel E (eds) (2005) Standardization in the cadastral domain. In: Proceedings, standardization in the cadastral domain, Bamberg, Dec 2004. The International Federation of Surveyors, Frederiksberg
Williamson OE (2000) The new institutional economics: taking stock, looking ahead. J Econ Literat 38:595–613
Zevenbergen J, Frank A, Stubkjær E (eds) (2007) Real property transactions: procedures, transaction costs and models. IOS Press, Amsterdam

Recommended Reading
Chang H-J, Understanding the relationship between institutions and economic development: some key theoretical issues. UNU/WIDER Discussion Paper 2006/05, July 2006. http://www.wider.unu.edu/publications/dps/dps2006/dp2006-05.pdf. Accessed 14 Aug 2007
de Janvry A, Gordillo G, Platteau J-P, Sadoulet E (eds) (2001) Access to land, rural poverty and public action. UNU/WIDER studies in development economics. Oxford University Press, Oxford/New York

Carbon Emissions
Climate Risk Analysis for Financial Institutions

Carbon Finance
Climate Risk Analysis for Financial Institutions

Carbon Trading
Climate Risk Analysis for Financial Institutions

Cardinal Direction Relations
Directional Relations

Cartographic Data
Photogrammetric Products

Cartographic Generalization
Abstraction of Geodatabases

Cartographic Information System
Atlas Information Systems
Catalogue Information Model

Synonyms
Catalogue information schema; Catalogue metadata schema; Registry information model

Definition
The catalogue information model is a conceptual model that specifies how metadata is organized within the catalogue. It defines a formal structure representing catalogued resources and their interrelationships, thereby providing a logical schema for browsing and searching the contents in a catalogue.

There are multiple and slightly different definitions of the catalogue information model used by various communities. The Open Geospatial Consortium (OGC) defines the catalogue information model in the OGC Catalogue Services Specification (OGC) as an abstract information model that specifies a BNF grammar for a minimal query language, a set of core queryable attributes (names, definitions, conceptual data types), and a common record format that defines the minimal set of elements that should be returned in the brief and summary element sets. The Organization for the Advancement of Structured Information Standards (OASIS) defines the registry information model in the ebXML Registry Information Model (ebRIM) specification (OASIS) as the information model which provides a blueprint or high-level schema for the ebXML registry. It provides the implementers of the ebXML registry with information on the type of metadata that is

Historical Background
The first catalogues were introduced by publishers serving their own business of selling the books they printed. At the end of the fifteenth century, they made lists of the available titles and distributed them to those who frequented the book markets. Later on, with the increasing volume of books and other inventories, the library became one of the earliest domains providing a detailed catalogue to serve their users. These library catalogues hold much of the reference information (e.g., author, title, subject, publication date, etc.) of bibliographic items found in a particular library or a group of libraries.

People began to use the term metadata in the late 1960s and early 1970s to identify this kind of reference information. The term "meta" comes from a Greek word that denotes "alongside, with, after, next." More recent Latin and English usage would employ "meta" to denote something transcendental or beyond nature (Using Dublin Core).

The card catalogue was a familiar sight to users for generations, but it has been effectively replaced by the computerized online catalogue which provides more advanced information tools helping to collect, register, browse, and search digitized metadata information.

Scientific Fundamentals

Metadata can be thought of as data about other data. It is generally used to describe the characteristics of information-bearing entities to aid in the identification, discovery, assessment, management, and utilization of the described
entities. Metadata standards have been developed to standardize the description of information-bearing entities for specific disciplines or communities. For interoperability and sharing purposes, a catalogue system usually adopts a metadata standard used in the community the system intends to serve as its catalogue information model.

A metadata record in a catalogue system consists of a set of attributes or elements necessary to describe the resource in question. It is an example of the catalogue information model being used by the catalogue system. A library catalogue, for example, usually consists of the author, title, date of creation or publication, subject coverage, and call number specifying the location of the item on the shelf. The structures, relationships, and definitions for these queryable attributes, known as conceptual schemas, exist for multiple information communities. For the purposes of interchange of information within an information community, a metadata schema may be created that provides a common vocabulary which supports search, retrieval, display, and association between the description and the object being described.

A catalogue system needs to reference an information model for collecting and manipulating the metadata of the referenced entities catalogued in the system. The information model provides specific ways for users to browse and search them. Besides the metadata information that directly describes those referenced entities themselves, a catalogue might hold another type of metadata information that describes the relationship between these entities.

Some catalogue services may only support one catalogue information model, each with the conceptual schema clearly defined, while others can support more than one catalogue information model. For example, in the US Geospatial Data Clearinghouse, the affiliated Z39.50 catalogue servers only supported the US Content Standard for Digital Geospatial Metadata (CSDGM) in their initial developing stage. In the OGC Catalogue Service base specification, on the other hand, the catalogue information model that can be used is undefined. Developers are encouraged to propose their own catalogue information model as profiles. However, to facilitate the interoperability between diverse OGC-compliant catalogue service instances, a set of core queryable parameters originating from Dublin Core is proposed in the base specification and is desirable to be supported in each catalogue service instance. OGC further endorsed the OASIS ebRIM (e-Business Registry Information Model) as the preferred basis for future profiles of OGC Catalogue (OGC).

How a catalogue information model can be formally discovered and described in a catalogue service is another issue. Some catalogue services do not provide specific operations for automatic discovery of the underlying catalogue information model, while others support particular operations to fulfill this task. In the OGC Catalogue Services Specification, the names of supported information model elements can be listed in the capabilities files, and a mandatory DescribeRecord operation allows the client to discover elements of the information model supported by the target catalogue service. This operation allows some of or the entire information model to be described.
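As a hedged illustration of this discovery step, a client could issue OGC CSW key-value-pair requests along the following lines; the endpoint URL is a placeholder, and real services differ in version, profile, and supported parameters.

# Sketch: building CSW discovery requests (illustrative endpoint, Python).
from urllib.parse import urlencode

endpoint = "http://example.org/csw"   # hypothetical CSW endpoint

capabilities_url = endpoint + "?" + urlencode(
    {"service": "CSW", "version": "2.0.2", "request": "GetCapabilities"})

# DescribeRecord asks the service to describe elements of its information model.
describe_url = endpoint + "?" + urlencode(
    {"service": "CSW", "version": "2.0.2", "request": "DescribeRecord"})

print(capabilities_url)
print(describe_url)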
Key Applications

The concept of the catalogue information model has been widely applied in many disciplines for information management and retrieval. Common metadata standards are widely adopted as the catalogue information model. Among them, the Dublin Core is one of the most referenced and commonly used metadata information models for scientific catalogues. In the area of geographic information science, ISO 19115 is being widely adopted as the catalogue information model for facilitating the sharing of a large volume of geospatial datasets.

Dublin Core
The Dublin Core metadata standard is a simple yet effective element set for describing a wide range of networked resources (Dublin Core). The "Dublin" in the name refers to Dublin, Ohio, USA, where the work originated from
a workshop hosted by the Online Computer Library Center (OCLC), a library consortium which is based there. The "Core" refers to the fact that the metadata element set is a basic but expandable core list (Using Dublin Core).

The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, and rights. Each element is optional and may be repeated.

The Dublin Core Metadata Initiative (DCMI) continues the development of exemplary terms or qualifiers that extend or refine these original 15 elements. Currently, the DCMI recognizes two broad classes of qualifiers: element refinement and encoding scheme. Element refinement makes the meaning of an element narrower or more specific. Encoding scheme identifies schemes that aid in the interpretation of an element value.

There are many syntax choices for Dublin Core metadata, such as SGML, HTML, RDF/XML, and key-value pair TXT files. In fact, the concepts and semantics of Dublin Core metadata are designed to be syntax independent and are equally applicable in a variety of contexts.
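For illustration only, a fictitious resource might be described with a subset of the DCMES elements in the simple key-value text form mentioned above; all values here are invented.

Title: Land cover of Example County, 2007
Creator: Example Mapping Agency
Subject: land cover; remote sensing
Description: Classified Landsat scene of Example County (hypothetical)
Publisher: Example Mapping Agency
Date: 2007-08-14
Type: Dataset
Format: GeoTIFF
Identifier: http://example.org/data/landcover-2007
Language: en
Coverage: Example County
Rights: Public domain (illustrative)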
Earth Science
With the advances in sensor and platform technologies, the Earth science community has collected a huge volume of geospatial data in the past 30 years via remote sensing methods. To facilitate the archival, management, and sharing of these massive geospatial data, the Earth science community has been one of the pioneers in defining metadata standards and using them as information models in building catalogue systems.

FGDC Content Standard for Digital Geospatial Metadata
The Federal Geographic Data Committee (FGDC) of the USA is a pioneer in setting geospatial metadata standards for the US federal government. To provide a common set of terminology and definitions for documenting digital geospatial data, FGDC initiated work on setting the Content Standard for Digital Geospatial Metadata (CSDGM) in June of 1992 through a forum on geospatial metadata. The first version of the standard was approved on June 8, 1994, by the FGDC.

Since the issue of Executive Order 12906, "Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure," by President William J. Clinton on April 11, 1994, this metadata standard has been adopted as the catalogue information model in numerous geospatial catalogue systems operated by US federal, state, and local agencies as well as companies and groups. It has also been used by other nations as they develop their own national metadata standards.

In June of 1998, the FGDC approved CSDGM version 2, which is fully backward compatible with and supersedes the June 8, 1994, version. This version provides for the definition of profiles (Appendix E) and extensibility through user-defined metadata extensions (Appendix D). The June 1998 version also modifies some production rules to ease implementation.

The Content Standard for Digital Geospatial Metadata (CSDGM) (FGDC, CSDGM) identifies and defines the metadata elements used to document digital geospatial data sets for many purposes, which include (1) preservation of the meaning and value of a dataset, (2) contribution to a catalogue or clearinghouse, and (3) aid in data transfer. CSDGM groups the metadata information into the following seven types:
Identification_Information
Data_Quality_Information
Spatial_Data_Organization_Information
Spatial_Reference_Information
Entity_and_Attribute_Information
Distribution_Information
Metadata_Reference_Information
For each type, it further defines composed elements and their type, short name, and/or domain information.

To provide a common terminology and set of definitions for documenting geospatial data obtained by remote sensing, the FGDC defined the Extensions for Remote Sensing Metadata within the framework of the June 1998 version of the
CSDGM (FGDC, Content Standard for Digital Geospatial Metadata). These remote sensing extensions provide additional information particularly relevant to remote sensing: the geometry of the measurement process, the properties of the measuring instrument, the processing of raw readings into geospatial information, and the distinction between metadata applicable to an entire collection of data and those applicable only to component parts. For that purpose, these remote sensing extensions establish the names, definitions, and permissible values for new data elements and the compound elements of which they are the components. These new elements are placed within the structure of the base standard, allowing the combination of the original standard and the new extensions to be treated as a single entity (FGDC, Content Standard for Digital Geospatial Metadata).

ISO 19115
In May 2003, ISO published ISO 19115: Geographic Information – Metadata (ISO/TC 211). The international standard was developed by ISO Technical Committee (TC) 211 as a result of consensus among TC national members as well as its liaison organizations on geospatial metadata. ISO 19115, rooted at FGDC CSDGM, provides a structure for describing digital geographic data. Actual clauses of 19115 cover properties of the metadata: identification, constraints, quality, maintenance, spatial representation (grid and vector), reference systems, content (feature catalogue and coverage), portrayal, distribution, extensions, and application schemas. Complex data types used to describe these properties include extent and citations. ISO 19115 has been adopted by OGC as a catalogue information model in its Catalogue Service for Web – ISO 19115 Profile (OGC). Figure 1 depicts the top-level UML model of the metadata standard.

ISO 19115 defines more than 300 metadata elements (86 classes, 282 attributes, 56 relations). The complex, hierarchical nested structure and relationships between the components are shown using 16 UML diagrams.

To address the issue whether a metadata entity or metadata element shall always be documented in the metadata or sometimes be documented, ISO 19115 defines a descriptor for each package and each element. This descriptor may have the following values:
M (mandatory)
C (conditional)
O (optional)

Mandatory (M) means that the metadata entity or metadata element shall be documented.

Conditional (C) specifies an electronically manageable condition under which at least one metadata entity or a metadata element is mandatory. Conditions are defined in the following three possibilities:
Expressing a choice between two or more options. At least one option is mandatory and must be documented.
Documenting a metadata entity or a metadata element if another element has been documented.
Documenting a metadata element if a specific value for another metadata element has been documented. To facilitate reading by humans, plain text is used for the specific value. However, the code shall be used to verify the condition in an electronic user interface.
In short, if the answer to the condition is positive, then the metadata entity or the metadata element shall be mandatory.

Optional (O) means that the metadata entity or the metadata element may be documented or may not be documented. Optional metadata entities and optional metadata elements provide a guide to those looking to fully document their data. If an optional entity is not used, the elements contained within that entity (including mandatory elements) will also not be used. Optional entities may have mandatory elements; those elements only become mandatory if the optional entity is used.

ISO 19115 defines the core metadata that consists of a minimum set of metadata required to serve the full range of metadata applications. All the core elements must be available in
(Catalogue Information Model, Fig. 1: top-level UML model of ISO 19115, showing the <<Leaf>> packages Metadata entity set information, Identification information, Constraint information, Data quality information, Maintenance information, Content information, Portrayal catalogue information, Distribution information, Metadata extension information, Application schema information, Reference system information, and Citation and responsible party information.)
a given metadata system. The optional ones need not be instantiated in a particular dataset. These 22 metadata elements are shown in Table 1.

Currently, ISO is developing ISO 19115-2, which extends ISO 19115 for imagery and gridded data. Similar to the FGDC efforts and using FGDC CSDGM Extensions for Remote Sensing Metadata as its basis, ISO 19115-2 will define metadata elements particularly for imagery and gridded data within the framework of ISO 19115. According to the ISO TC 211 program of work, the final CD was posted in March 2007; barring major objection, it will be published as DIS in June 2007.

US NASA ECS Core Metadata Standard
To enable an improved understanding of the Earth as an integrated system, in 1992, the National Aeronautics and Space Administration (NASA) of the USA started the Earth Observing System (EOS) program, which coordinates efforts to study the Earth as an integrated system. This program, using spacecraft, aircraft, and ground instruments, allows humans to better understand climate and environmental changes and to distinguish between natural and human-induced changes. The EOS program includes a series of satellites, a science component, and a data system for long-term global observations of the land surface, biosphere, solid Earth, atmosphere,
and oceans. The program aims at accumulating 15 years of Earth observation data at a rate of over 2 terabytes per day. To support data archival, distribution, and management, NASA has developed an EOS Data and Information System (EOSDIS) and its core system (ECS), the largest data and information system for Earth observation in the world.

In order to standardize the descriptions of data collected by the EOS program, NASA has developed the ECS Core Metadata Standard. The standard defines metadata in several areas: algorithm and processing packages, data sources, references, data collections, spatial and temporal extent, and content. The ECS Core Metadata Standard has been used as the catalogue information model for the EOSDIS Data Gateway (EDG) and the EOS ClearingHOuse (ECHO). The ECS Core Metadata Standard was the basis for the development of FGDC CSDGM Extensions for Remote Sensing Metadata.

With new satellites being launched and instruments becoming operational, this standard will incorporate new keywords from them into the new version. The current version is 6B, released in October 2002. The 6B version is logically segmented into eight modules for the purpose of readability, including data originator, ECS collection, ECS
Catalogue Information Model, Fig. 2 The high-level UML model of ebRIM (OASIS): RegistryObject and RegistryEntry together with associated classes such as RegistryPackage, ExternalLink, ExternalIdentifier, Slot, Association, Classification, ClassificationScheme, ClassificationNode, SpecificationLink, ServiceBinding, AuditableEvent, User, and Organization
data granule, locality spatial, locality temporal, contact, delivered algorithm package, and document (ECS).

… Profiles in the OGC technical meeting in December 2007 (OGC).
References
Dublin Core. http://en.wikipedia.org/wiki/Dublin_Core. Accessed 13 Sept 2007
ECS, Release 6B Implementation Earth Science Data Model for the ECS Project. http://spg.gsfc.nasa.gov/standards/heritage/eosdis-core-system-data-model. Accessed 13 Sept 2007
FGDC, CSDGM. http://www.fgdc.gov/standards/standards_publications/index_html. Accessed 13 Sept 2007
FGDC, Content Standard for Digital Geospatial Metadata, Extensions for Remote Sensing Metadata. http://www.fgdc.gov/standards/standards_publications/index_html. Accessed 13 Sept 2007
ISO/TC 211, ISO 19115 geographic information: metadata
OASIS, ebXML Registry Information Model (ebRIM). http://www.oasis-open.org/committees/regrep/documents/2.0/specs/ebrim.pdf. Accessed 13 Sept 2007
OGC, OGC Catalogue Services Specification, OGC 04-021r3. http://portal.opengeospatial.org/files/?artifact_id=5929&Version=2. Accessed 13 Sept 2007
OGC, OpenGIS® Catalogue Services Specification 2.0 – ISO19115/ISO19119 Application Profile for CSW 2.0. OGC 04-038r2. https://portal.opengeospatial.org/files/?artifact_id=8305. Accessed 13 Sept 2007
OGC, EO Products Extension Package for ebRIM (ISO/TS 15000-3) Profile of CSW 2.0. OGC 06-131. http://portal.opengeospatial.org/files/?artifact_id=17689. Accessed 13 Sept 2007
OGC, OGC Adopts ebRIM for Catalogues. http://www.opengeospatial.org/pressroom/pressreleases/655. Accessed 13 Sept 2007
Using Dublin Core. http://www.dublincore.org/documents/2001/04/12/usageguide/. Accessed 13 Sept 2007

Central Perspective
Photogrammetric Methods

Central Projection
Photogrammetric Methods

Centrographic Measures
CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents

CGI
Web Mapping and Web Cartography

CGIS
Geocollaboration
Change Detection

… also very heterogeneous and may vary from local events (e.g., road construction) to global changes (e.g., ocean water temperature). Due to this very large spatio-temporal range, the nature and extent of changes are complex to determine because they are interrelated and interdependent at different scales (spatial, temporal). Change detection is, therefore, a challenging task.

Imagery Characteristics Regarding Changes
Since the development of civilian remote sensing, the earth benefits from a continuous and increasing coverage by imagery such as aerial photography or satellite imagery. This coverage is ensured by various sensors with various properties. First, in terms of the time scale, various temporal resolutions (i.e., revisit time) and mission continuities allow coverage of every point of the earth from days to decades. Secondly, in terms of the spatial scale, various spatial resolutions (i.e., pixel size, scene size) allow coverage of every point of the earth at a sub-meter to a kilometer resolution. Thirdly, sensors are designed to observe the earth surface using various parts of the electromagnetic spectrum (i.e., spectral domain) at different resolutions (i.e., spectral resolution). This diversity allows the characterization of a large spectrum of earth surface elements and change processes. However, change detection is still limited by data availability and data consistency (i.e., multisource data).

Changes in Imagery
Changes in imagery between two dates translate into changes in radiance. Various factors can induce changes in radiance between two dates, such as changes in sensor calibration, solar angle, atmospheric conditions, seasons, or earth surface. The first premise of using imagery for change detection of the earth surface is that change in the earth surface must result in a change in radiance values. Secondly, the change in radiance due to earth surface changes must be large compared to the change in radiance due to other factors. A major challenge in change detection of the earth surface using imagery is to minimize these other factors. This is usually performed by carefully selecting relevant multidate imagery and by applying pre-processing treatments.

Data Selection and Pre-processing
Data selection is a critical step in change detection studies. The acquisition period (i.e., season, month) of multidate imagery is an important parameter to consider in image selection because it is directly related to phenology, climatic conditions, and solar angle. A careful selection of multidate images is therefore needed in order to minimize the effects of these factors. In vegetation change studies (i.e., over different years), for example, summer is usually used as the target period because of the relative stability of phenology, solar angle, and climatic conditions. The acquisition interval between multidate imagery is also important to consider. As mentioned before, earth surface changes must cause enough radiance changes to be detectable. However, the data selection is often limited by data availability, and the choice is usually a compromise between the targeted period, interval of acquisition, and availability. The cost of imagery is also a limiting factor in data selection.

However, a careful data selection is usually not enough to minimize radiometric heterogeneity between multidate images. First, atmospheric conditions and solar angle differences usually need additional corrections, and secondly other factors such as sensor calibration or geometric distortions need to be considered. In change detection analysis, multidate images are usually compared on a pixel basis. Then, very accurate registrations need to be performed between images in order to compare pixels at the same locations. Misregistration between multidate images can cause significant errors in change interpretation. The sensitivity of change detection approaches to misregistration is variable though. The minimization of radiometric heterogeneity (due to sources other than earth surface change) can be performed using different approaches depending on the level of correction required and the availability of atmospheric data. Techniques such as dark object subtraction, relative
radiometric normalization, or radiative transfer code can be used.

… threshold selection using the standard deviation of resulting pixels.
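The standard-deviation thresholding referred to in the fragment above can be sketched as follows; the arrays are synthetic stand-ins for two co-registered, radiometrically normalized bands, and the threshold factor is an analyst's choice rather than a prescribed value.

import numpy as np

rng = np.random.default_rng(1)
band_t1 = rng.normal(100, 10, size=(500, 500))          # date 1
band_t2 = band_t1 + rng.normal(0, 2, size=(500, 500))   # date 2, mostly unchanged
band_t2[200:250, 200:250] += 30                         # simulated change patch

diff = band_t2 - band_t1
k = 2.0                                                 # threshold factor (analyst-chosen)
change_mask = np.abs(diff - diff.mean()) > k * diff.std()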
… theoretical basis. The Principal Component Analysis (PCA) and the Tasseled-Cap transformations are the most common ones. Linear transformations are often used to reduce spectral data dimensionality by creating fewer new components. The first components contain most of the variance in the data and are uncorrelated. When used for change detection purposes, linear transformations are performed on multidate images that are combined as a single dataset (Fig. 3).

After performing a PCA, unchanged areas are mapped in the first component (i.e., information common to multidate images) whereas areas of changes are mapped in the last components (i.e., information unique to either one of the different dates). Usually the PCA is calculated from a variance/co-variance matrix. However, a standardized matrix (i.e., correlation matrix) is also used. The PCA is scene dependent and results can be hard to interpret. The challenging steps are to label changes from principal components and to select thresholds between change and no-change areas. A good knowledge of the study area is required.
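A minimal sketch of the PCA step, assuming two co-registered single-band images stacked as one dataset; the simulated data and the choice of component to inspect are illustrative, not part of the original entry.

import numpy as np

rng = np.random.default_rng(2)
rows, cols = 200, 200
date1 = rng.normal(80, 15, size=(rows, cols))
date2 = 1.05 * date1 + rng.normal(0, 3, size=(rows, cols))
date2[50:80, 50:80] += 40                         # simulated change

stack = np.column_stack([date1.ravel(), date2.ravel()])   # pixels x dates
stack = stack - stack.mean(axis=0)
cov = np.cov(stack, rowvar=False)                 # variance/co-variance matrix
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]                 # sort components by variance
pcs = stack @ eigvecs[:, order]                   # PC scores per pixel
change_pc = pcs[:, -1].reshape(rows, cols)        # last component tends to isolate change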
The Tasseled-Cap is also a linear transformation. However, unlike PCA, it is independent of the scene. The new component directions are selected according to pre-defined spectral properties of vegetation. Four new components are computed and oriented to enhance brightness, greenness, wetness, and yellowness. Results are
also difficult to interpret and change labeling is challenging. Unlike PCA, the Tasseled-Cap transformation for change detection requires accurate atmospheric calibration of multidate imagery.

Other transformations such as multivariate alteration detection or the Gram-Schmidt transformation were also developed but are used to a lesser extent.

Change vector analysis: This approach is based on the spatial representation of change in a spectral space. When a pixel undergoes a change between two dates, its position in n-dimensional spectral space is expected to change. This change is represented by a vector (Fig. 4) which is defined by two factors: the direction, which provides information about the nature of change, and the magnitude, which provides information about the level of change. This approach has the advantage of processing concurrently any number of spectral bands. It also provides detailed information about change. The challenging steps are to define thresholds of magnitude, discriminating between change and no change, and to interpret vector direction in
relation with the nature of change. This approach is often performed on transformed data using methods such as Tasseled-Cap.
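A minimal two-band sketch of change vector analysis under the assumptions above (synthetic data; the magnitude threshold is only an example):

import numpy as np

rng = np.random.default_rng(3)
shape = (300, 300)
red_t1, nir_t1 = rng.normal(60, 8, shape), rng.normal(120, 12, shape)
red_t2 = red_t1 + rng.normal(0, 2, shape)
nir_t2 = nir_t1 + rng.normal(0, 2, shape)
nir_t2[100:140, 100:140] -= 50                     # simulated vegetation loss

d_red, d_nir = red_t2 - red_t1, nir_t2 - nir_t1
magnitude = np.hypot(d_red, d_nir)                 # level of change
direction = np.degrees(np.arctan2(d_nir, d_red))   # nature of change (angle)
change_mask = magnitude > magnitude.mean() + 2 * magnitude.std()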
Image regression: This approach assumes that there is a linear relationship between pixel values of the same area at two different times. This implies that a majority of the pixels did not encounter changes between the two dates (Fig. 5). A regression function that best describes the relationship between pixel values of each spectral band at two dates is developed. The residuals of the regression are considered to represent the areas of changes. This method has the advantage of reducing the impact of radiometric heterogeneity (i.e., atmosphere, sun angle, sensor calibration) between multidate images. However, the challenging steps are to select an appropriate regression function and to define thresholds between change and no-change areas.
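The regression approach can be sketched for a single band as follows; the data are synthetic and the residual threshold is an arbitrary illustrative choice.

import numpy as np

rng = np.random.default_rng(4)
t1 = rng.normal(100, 20, size=(400, 400))
t2 = 0.9 * t1 + 5 + rng.normal(0, 3, size=(400, 400))
t2[300:330, 40:80] += 35                           # simulated change

slope, intercept = np.polyfit(t1.ravel(), t2.ravel(), deg=1)
predicted = slope * t1 + intercept
residuals = t2 - predicted                         # large residuals suggest change
change_mask = np.abs(residuals) > 2 * residuals.std()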
Multitemporal spectral mixture analysis: The spectral mixture analysis is based on the premise that a pixel reflectance value can be computed from the individual values of its composing elements (i.e., end-members) weighted by their respective proportions. This case assumes a linear mixing of these components. This method allows retrieving sub-pixel information (i.e., surface proportions of end-members) and can be used for change detection purposes by performing separate analyses and comparing results at different dates (Fig. 6). The advantage of this method is to provide precise and repeatable results. The challenging step of this approach is to select suitable end-members.
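As a hedged, single-pixel sketch of linear unmixing with non-negative least squares (the end-member spectra below are invented, not the values used in the case study discussed later):

import numpy as np
from scipy.optimize import nnls

# Invented end-member spectra: columns are lichen, shadow, canopy; rows are bands.
endmembers = np.array([[0.30, 0.10, 0.45],
                       [0.35, 0.08, 0.50],
                       [0.40, 0.05, 0.25],
                       [0.45, 0.04, 0.60]])

pixel = np.array([0.36, 0.30, 0.28, 0.41])     # observed reflectance of one pixel
fractions, _ = nnls(endmembers, pixel)         # proportion of each end-member
fractions = fractions / fractions.sum()        # normalize to sum to one
# Repeating this per pixel and per date yields fraction maps (e.g., lichen) whose
# difference between dates is the change product illustrated in Fig. 6.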
Combined approaches: The previous techniques represent the most common approaches used for change detection. They can be used individually but are often combined with each other or with other image processing techniques to provide more accurate results. Numerous combinations can be used and will not be described here; some of them include the combination of vegetation indices and image differencing, change vector analysis and principal component analysis, direct multidate classification and principal component analysis, multitemporal spectral analysis and image differencing, or image enhancement and post-classification.

Example of change detection analysis. Mapping changes in caribou habitat using multitemporal spectral mixture analysis: The George River Caribou Herd (GRCH), located in northeastern Canada, increased from about 5,000 head in the 1950s to about 700,000 head in the 1990s. This has led to an over-utilization of summer habitat, resulting in degradation of the vegetation cover. This degradation has had a direct impact on health problems observed in the caribou (Rangifer tarandus) population over the last few years and may also have contributed to the recent decline of the GRCH (404,000 head in 2000-2001). Lichen habitats are good indicators of caribou herd activity because of their sensitivity to overgrazing and overtrampling, their widespread distribution over northern territories, and their influence on herd nutrition. The herd range covers a very large territory which is not easily accessible; as a result, field studies over the whole territory are limited and aerial surveys cannot be conducted frequently. Satellite imagery offers the synoptic view and temporal resolution necessary for mapping and monitoring caribou habitat. In this example, a change detection approach using Landsat imagery was used. The procedure was based on spectral mixture analysis and produced maps showing the lichen proportion inside each pixel. The procedure was applied to multidate imagery to monitor the spatio-temporal evolution of the lichen resource over the past three decades.
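The sketch below illustrates the multitemporal spectral mixture analysis workflow in miniature: each pixel spectrum is unmixed into end-member fractions at two dates and the fractions are differenced. The end-member spectra, the synthetic data, and the use of an unconstrained least-squares solver (rather than the constrained solvers normally used for unmixing) are simplifying assumptions, not details from the case study.

    # Multitemporal spectral mixture analysis with image differencing of fractions.
    import numpy as np

    def unmix(image, endmembers):
        """image: (bands, rows, cols); endmembers: (bands, n_endmembers).
        Returns fractional abundances of shape (n_endmembers, rows, cols)."""
        bands, rows, cols = image.shape
        pixels = image.reshape(bands, -1)
        fractions, *_ = np.linalg.lstsq(endmembers, pixels, rcond=None)
        return fractions.reshape(endmembers.shape[1], rows, cols)

    # Synthetic example: 4 bands, 3 end-members (e.g., lichen, canopy, shadow)
    rng = np.random.default_rng(2)
    E = rng.random((4, 3))
    frac_t1 = rng.dirichlet(np.ones(3), size=(20, 20)).transpose(2, 0, 1)
    frac_t2 = np.clip(frac_t1 + rng.normal(0, 0.05, frac_t1.shape), 0, 1)
    img_1978 = np.einsum('be,exy->bxy', E, frac_t1)
    img_1998 = np.einsum('be,exy->bxy', E, frac_t2)
    # Difference of the first end-member's fraction map between the two dates
    lichen_change = unmix(img_1998, E)[0] - unmix(img_1978, E)[0]
    print(lichen_change.mean())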
Change Detection, Fig. 5 Example and principle of the image regression procedure: raster data covering the exact same location (e.g., digital number, band a) at time x and time y
It gave new information about the habitat used by the herd in the past, which was very useful to better understand population dynamics. Figure 6 summarizes the approach used in this study and illustrates the steps typical of a change detection procedure.

Key Applications

The earth's surface is changing constantly in many ways. Changes occur at various spatial and temporal scales in numerous environments. Change detection techniques are employed for different purposes such as research, management, or business (Lunetta and Elvidge 1998; Canada Centre for Remote Sensing; Diversitas; ESA; Global Change Master Directory; IGBP; IHDP; WCRP). Monitoring changes using GIS and remote sensing is therefore used in a wide field of applications. A non-exhaustive list of key applications is presented here.

Forestry
- Deforestation (e.g., clear cut mapping, regeneration assessment)
- Fire monitoring (e.g., delineation, severity, detection, regeneration)
- Logging planning (e.g., infrastructures, inventory, biomass)
- Herbivory (e.g., insect defoliation, grazing)
- Habitat fragmentation (e.g., landcover changes, heterogeneity)
Change Detection, Fig. 6 Example of a change detection procedure: case study of mapping changes in caribou habitat using multitemporal spectral mixture analysis. The workflow shows image correction (before/after), spectral mixture analysis providing lichen, canopy, and shadow fractions for each pixel, image differencing, and change detection results for lichen fractions between 1978 and 1998 (percent decreased lichen, 0-100; scale bar 0-20 km). For more details see: Théau and Duguay (2004) Mapping lichen habitat changes inside the summer range of the George River Caribou Herd (Québec-Labrador, Canada) using Landsat imagery (1976-1998). Rangifer 24:31-50.
- Transportation and infrastructure planning (e.g., landcover use)

Ice and Snow
- Navigation route (e.g., sea ice motion)
- Infrastructure protection (e.g., flooding monitoring)
- Glacier and ice sheet monitoring (e.g., motion, melting)
- Permafrost monitoring (e.g., surface temperature, tree line)

Ocean and Coastal
- Water quality (e.g., temperature, productivity)
- Aquaculture (e.g., productivity)
- Intertidal zone monitoring (e.g., erosion, vegetation mapping)
- Oil spill (e.g., detection, oil movement)

Future Directions

In the past decades, a constant increase in the availability of remotely sensed data has been observed. The launch of numerous satellite sensors as well as the reduction of product costs explains this trend, and the same evolution is expected in the future. Access to constantly growing archive contents also represents a potential for the development of more change detection studies. Long-term missions such as Landsat, SPOT (Satellite pour l'Observation de la Terre), and AVHRR (Advanced Very High Resolution Radiometer) have now provided continuous data for more than 20-30 years. Although radiometric heterogeneity between sensors represents a serious limitation in time series analysis, these data are still very useful for long-term change studies. They are particularly suitable for the development of temporal trajectory analysis, which usually involves the temporal study of indicators (e.g., vegetation indices, surface temperature) on a global scale.

Moreover, as mentioned in the Historical Background section, the development of change detection techniques is closely linked with the development of computer technologies and data processing capacities. These fields will continue to evolve in parallel, and new developments in change detection are expected alongside advances in computer technologies. Developments and applications of new image processing methods and geospatial analysis are also expected in the next decades. Artificial intelligence systems as well as knowledge-based expert systems and machine learning algorithms represent new alternatives in change detection studies (Coppin et al. 2004). These techniques have gained considerable attention in the past few years and are expected to be used increasingly in change detection approaches. One of their main advantages is that they allow the integration of existing knowledge and non-spectral information about the scene content (e.g., socio-economic data, shape, and size data). With the increasing interest in integrated approaches such as coupled human-environment systems, these developments look promising.

The recent integration of change detection and spatial analysis modules in most GIS software also represents a big step towards integrated tools for studying changes on the earth's surface. This integration also includes improved compatibility between image processing software and GIS software. More developments are expected in the future, providing new tools for integrating multisource data more easily (e.g., digital imagery, hard maps, historical information, vector data).

Cross-References
Co-location Pattern Discovery
Correlation Queries in Spatial Time Series Data
Spatiotemporal Change Footprint Pattern Discovery

References
Canada Centre for Remote Sensing, http://ccrs.nrcan.gc.ca/index_e.php. Accessed Nov 2006
Channel Modeling and Algorithms for Indoor Positioning, Fig. 1 General structure of an indoor geolocation system: reference points RP #1 through RP #N each report a location metric on the sensor's signal to the positioning algorithm. RP reference point
in indoor areas, owing to the large amount of signal attenuation caused by building walls. In addition, the behavior of the indoor radio channel is very different from the outdoor case, in that it exhibits much stronger multipath characteristics. Therefore, new methods of position estimation need to be developed for the indoor setting. In addition, the accuracy requirements of indoor positioning systems are typically a lot higher. For an application such as E-911, an accuracy of 125 m for 67 % of the time is considered acceptable (FCC 1996), while a similar indoor application typically requires an accuracy level on the order of only a few meters (Sayed et al.). In the next few sections, an overview of positioning techniques is provided for the indoor environment.

Scientific Fundamentals

Structure of a Positioning System
The basic structure of a positioning system is illustrated in Fig. 1, where a sensor (whose location is to be determined) is shown. The system consists of two parts: reference points (RPs) and the positioning algorithm. The RPs are radio transceivers whose locations are assumed to be known with respect to some coordinate system.
Each RP measures various characteristics of the signal received from the sensor, which are referred to in this entry as location metrics. These location metrics are then fed into the positioning algorithm, which produces an estimate of the location of the sensor.
The location metrics are of three main types:

- Angle of arrival (AOA)
- Time of arrival (TOA)
- Received signal strength (RSS)

This section is organized in four subsections; in the first three, each of these location metrics is discussed in greater detail, while the last is devoted to a nonexhaustive survey of position estimation techniques using these metrics.

Angle of Arrival
As its name implies, AOA gives an indication of the direction the received signal is coming from. In order to estimate the AOA, the RPs need to be equipped with special antenna arrays. Figure 2 shows an example of AOA estimation in an ideal nonmultipath environment. The two RPs measure the AOAs from the sensor as 78.3° and 45°, respectively. These measurements are then used to form lines of position, the intersection of which is the position estimate.
In real-world indoor environments, however, multipath effects will generally result in AOA estimation error. This error can be expressed as

θ̂ = θ_true + ψ   (1)

where θ_true is the true AOA value, generally obtained when the sensor is in the line-of-sight (LOS) path from the RP, θ̂ represents the estimated AOA, and ψ is the AOA estimation error. As a result of this error, the sensor position is restricted to an area defined by an angular spread of 2ψ, as illustrated in Fig. 3 below for the two-RP scenario. This clearly illustrates that in order to use AOA for indoor positioning, the sensor has to be in the LOS path to the RP, which is generally not possible.
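In the ideal case, the position fix is simply the intersection of the two lines of position. The sketch below solves that intersection; the RP coordinates are made up, and the second bearing is taken as 135° so the two lines cross above the baseline (the entry's Fig. 2 example uses 78.3° and 45° in its own reference frame).

    # Intersection of two AOA lines of position (ideal, no multipath).
    import numpy as np

    def aoa_fix(rp1, theta1, rp2, theta2):
        """rp1, rp2: (x, y) reference points; theta1, theta2: bearings in radians
        measured counterclockwise from the +x axis.
        Solves rp1 + t1*u1 = rp2 + t2*u2 for the intersection point."""
        u1 = np.array([np.cos(theta1), np.sin(theta1)])
        u2 = np.array([np.cos(theta2), np.sin(theta2)])
        A = np.column_stack([u1, -u2])
        t = np.linalg.solve(A, np.asarray(rp2, float) - np.asarray(rp1, float))
        return np.asarray(rp1, float) + t[0] * u1

    estimate = aoa_fix((0.0, 0.0), np.radians(78.3), (10.0, 0.0), np.radians(135.0))
    print(estimate)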
[Fig. 4: sample channel profile, amplitude versus time (nsec)]
Time of Arrival (TOA)
TOA gives an indication of the range (i.e., the distance between a transmitter and a receiver). The basic concept can be illustrated with reference to the channel profile of Fig. 4 below. Since the speed of light in free space, c, is constant, the TOA of the direct path (DP) between the transmitter and the receiver, τ, will give the true range between the transmitter and receiver as defined by the equation:

d = c·τ   (2)

In practice, the TOA of the DP cannot be estimated perfectly, as illustrated in Fig. 4. The result is ranging error [also referred to as the distance measurement error (DME) in the literature], given as

ε = d̂ − d   (3)

where d̂ is the estimated distance and d is the true distance.
There are two main sources of ranging error: multipath effects and undetected direct path (UDP) conditions. Multipath effects cause the DP, as well as reflected and transmitted paths, to be received. It has been shown empirically that multipath ranging error can be reduced by increasing the bandwidth of the system used for the TOA estimation (Alavi and Pahlavan 2005). UDP conditions, on the other hand, refer to cases where the DP cannot be detected at all, as shown in Fig. 5 below. UDP conditions generally occur at the edge of coverage areas, or in cases where there are large metallic objects in the path between the transmitter and the receiver. As a result, the difference between the first detected path (FDP) and the DP is beyond the dynamic range of the receiver, and the DP cannot be detected, as shown in Fig. 5. Unlike multipath-based ranging error, UDP-based ranging error typically cannot be reduced by increasing the bandwidth. In addition, the occurrence of UDP-based ranging error is itself random in nature (Alavi and Pahlavan 2005).
Through UWB measurements in typical indoor areas, it has been shown that both multipath ranging error and UDP-based ranging error follow a Gaussian distribution, with mean and variance that depend on the bandwidth of operation (Alavi and Pahlavan 2005). The overall model can be expressed as follows:

d̂ = d + G(m_w, σ_w)·log(1 + d) + G(m_UDP,w, σ_UDP,w)   (4)
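A hedged simulation of these relations follows: the true range comes from Eq. (2), a distance-dependent Gaussian multipath term is added, and a UDP term is added only with some probability. All numeric parameters are made-up placeholders, not the measurement-derived values of Alavi and Pahlavan (2005).

    # Toy simulation of the TOA ranging error model of Eqs. (2)-(4).
    import numpy as np

    C = 3.0e8  # speed of light in free space (m/s)

    def simulate_toa_range(d_true, p_udp=0.1, m_w=0.0, s_w=0.5,
                           m_udp=2.0, s_udp=1.0, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        tau = d_true / C                                        # Eq. (2): d = c*tau
        d_hat = C * tau + rng.normal(m_w, s_w) * np.log(1 + d_true)  # multipath term
        if rng.random() < p_udp:                                # UDP occurs randomly
            d_hat += rng.normal(m_udp, s_udp)                   # UDP-based error term
        return d_hat, d_hat - d_true                            # estimate and DME, Eq. (3)

    rng = np.random.default_rng(3)
    errors = [simulate_toa_range(15.0, rng=rng)[1] for _ in range(1000)]
    print(np.mean(errors), np.std(errors))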
Channel Modeling and Algorithms for Indoor Positioning, Fig. 5 (caption fragment) ... of 200 MHz. FDP stands for first detected path. [The plot shows amplitude (mU) versus time (nsec), with the detection threshold, the receiver dynamic range, the DP, and the resulting ranging error marked.]
[Figure graphic: reference points RP-1 and RP-2 with ranges d_1 and d_2 and grid dimension D]
are sent to one central node, which then carries out the computations. In contrast, the term distributed algorithms refers to a class of algorithms where the computational load for the position calculations is spread out over all the nodes in the network. In the next few sections, some examples of centralized and distributed positioning algorithms for both fixed positioning and ad hoc scenarios will be discussed. Owing to space limitations, the treatment is by no means exhaustive; the interested reader is referred to Hightower and Borriello (2001) and Niculescu, as well as any associated references contained therein.

Centralized Algorithms
In this section, two algorithms for fixed position estimation and one algorithm for ad hoc positioning are discussed. For fixed location estimation, the closest neighbor with TOA grid (CN-TOAG) (Kanaan and Pahlavan 2004) and ray-tracing assisted closest neighbor (RT-CN) algorithms (Hatami and Pahlavan 2006) are discussed. For ad hoc positioning, a distributed version of the least-squares (LS) algorithm is presented (Di Stefano et al. 2003).

CN-TOAG Algorithm
The CN-TOAG algorithm leverages the fact that at any given point in an indoor area covered by a number of RPs, the exact value of the TOA is known (Kanaan and Pahlavan 2004). Consider the grid arrangement of RPs in an indoor setting, as shown in Fig. 6. Each of these RPs would perform a range measurement, d_i (1 ≤ i ≤ N, where N is the number of RPs in the grid), to the user to be located.
Let D represent the vector of range measurements that are reported by the RPs, and let Z represent the vector of expected TOA-based range measurements at a certain point, r = (x, y). For the purposes of this algorithm, Z is known as the range signature associated with the point r. An estimate of the user's location, r̂, can be obtained by finding the point r where Z most closely approximates D. The error function, e(r) = e(x, y), is defined as

e(r) = e(x, y) = ||D − Z(r)|| = ||D − Z(x, y)||   (7)

where ||·|| represents the vector norm. Equation (7) can also be written as
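Independently of how the error function is expanded, the CN-TOAG estimate can be obtained by evaluating e(x, y) over a grid of candidate points and keeping the minimum. The sketch below assumes the four-RP, D = 20 m arrangement used in the example later in this entry; the grid step and noise level are arbitrary choices.

    # Grid-search sketch of the CN-TOAG idea in Eq. (7).
    import numpy as np

    rp_xy = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0], [20.0, 20.0]])  # N = 4 RPs

    def range_signature(r, rps=rp_xy):
        """Z(r): expected TOA-based ranges from point r = (x, y) to every RP."""
        return np.linalg.norm(rps - np.asarray(r, float), axis=1)

    def cn_toag(D_measured, step=0.25, size=20.0):
        xs = np.arange(0.0, size + step, step)
        best, best_err = None, np.inf
        for x in xs:
            for y in xs:
                e = np.linalg.norm(D_measured - range_signature((x, y)))  # e(x, y)
                if e < best_err:
                    best, best_err = (x, y), e
        return best, best_err

    true_pos = (7.0, 12.5)
    D = range_signature(true_pos) + np.random.default_rng(4).normal(0, 0.3, size=4)
    print(cn_toag(D))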
time, and this results in large DME values. These will then translate to large values of estimation error; in other words, the quality of estimation (QoE) will be degraded (Kanaan et al. 2006).
Owing to the site-specific nature of indoor radio propagation, the very occurrence of UDP conditions is random and is best described statistically (Alavi and Pahlavan 2005). That being the case, the QoE (i.e., location estimation accuracy) will also need to be characterized in the same manner. Different location-based applications will have different requirements for QoE. In a military or public safety application (such as keeping track of the locations of firefighters or soldiers inside a building), high QoE is desired. In contrast, lower QoE might be acceptable for a commercial application (such as inventory control in a warehouse). In such cases, it is essential to be able to answer questions like: What is the probability of being able to obtain a mean square error (MSE) of 1 m² from an algorithm x over different building environments that give rise to different amounts of UDP? Or: What algorithm should be used to obtain an MSE of 0.1 cm² over different building environments? Answers to such questions will heavily influence the design, operation, and performance of indoor geolocation systems.
Given the variability of the indoor propagation conditions, it is possible that the distance measurements performed by some of the RPs will be subject to DDP errors, while some will be subject to UDP-based errors. Various combinations of DDP and UDP errors can be observed. To illustrate, consider the example system scenario shown in Fig. 6. For example, the distance measurements performed by RP-1 may be subject to UDP-based DME, while the measurements performed by the other RPs may be subject to DDP-based DME; this combination can be denoted as UDDD. Other combinations can be considered in a similar manner.
Since the occurrence of UDP conditions is random, the performance metric used for the location estimate (such as the MSE) will also vary stochastically and depends on the particular combination observed. For the four-RP case shown in Fig. 6, it is clear that the following distinct combinations will have to be used: UUUU, UUUD, UUDD, UDDD, and DDDD. Each of these combinations can be used to characterize a different QoL class. The occurrence of each of these combinations will give rise to a certain MSE value in the location estimate. This MSE value will also depend on the specific algorithm used. There may be more than one way to obtain each DDP/UDP combination. If UDP conditions occur with probability P_udp, then the overall probability of occurrence of the ith combination, P_i, can generally be expressed as

P_i = (N choose N_udp,i) · P_udp^(N_udp,i) · (1 − P_udp)^(N − N_udp,i)   (11)

where N is the total number of RPs (in this case four) and N_udp,i is the number of RPs where UDP-based DME is observed. Combining the probabilities P_i with the associated MSE values for each QoL class, a discrete cumulative distribution function (CDF) of the MSE can be obtained. This discrete CDF is known as the MSE profile (Kanaan et al. 2006). The use of the MSE profile will now be illustrated with examples, focusing on the CN-TOAG algorithm.
The system scenario in Fig. 6 is considered with D = 20 m. A total of 1,000 uniformly distributed random sensor locations are simulated for different bandwidth values. In line with the FCC's formal definition of UWB signal bandwidth as being equal to or more than 500 MHz (US Federal Communications Commission 2004), the results are presented for bandwidths of 500, 1,000, 2,000, and 3,000 MHz. For each bandwidth value, different QoL classes are simulated, specifically UUUU, UUUD, UUDD, UDDD, and DDDD. Once a sensor is randomly placed in the simulation area, each RP calculates TOA-based distances to it. The calculated distances are then corrupted with UDP- and DDP-based DMEs in accordance with the DME model based on UWB measurements as given in Alavi and Pahlavan (2005). The positioning algorithm is then applied to estimate the sensor location. Based on 1,000 random trials, the MSE is calculated for each bandwidth value and the corresponding combinations of UDP- and DDP-based DMEs.
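A small sketch of how Eq. (11) and per-class MSE values combine into an MSE profile follows. The binomial form of Eq. (11) is taken from the text; the per-class MSE numbers are placeholders, not the simulation results reported in this entry.

    # Combination probabilities (Eq. 11) and the resulting discrete MSE profile.
    from math import comb

    def combination_probs(N, p_udp):
        """P_i for i = 0..N RPs in UDP, per the binomial form of Eq. (11)."""
        return [comb(N, k) * p_udp**k * (1 - p_udp)**(N - k) for k in range(N + 1)]

    def mse_profile(mse_per_class, probs):
        """Discrete CDF of MSE over the QoL classes (here ordered DDDD ... UUUU)."""
        pairs = sorted(zip(mse_per_class, probs))
        cdf, total = [], 0.0
        for mse, p in pairs:
            total += p
            cdf.append((mse, total))
        return cdf

    probs = combination_probs(N=4, p_udp=0.2)       # DDDD, UDDD, UUDD, UUUD, UUUU
    mse_per_class = [0.3, 0.6, 1.1, 1.8, 2.6]        # hypothetical MSE values (m^2)
    print(mse_profile(mse_per_class, probs))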
[Fig. 7: MSE profiles, P(MSE ≤ abscissa) versus MSE, for bandwidths of 500, 1,000, 2,000, and 3,000 MHz]
Channel Modeling and Algorithms for Indoor Positioning, Fig. 8 Quality of estimation (QoE) variation across the various QoL classes (MSE with different QoL classes, CN-TOAG algorithm, for w = 500, 1,000, 2,000, and 3,000 MHz)
The probability of each combination is also calculated in accordance with (11).
The results are shown in Figs. 7 and 8. Figure 7 shows the MSE profiles for the CN-TOAG algorithm. From this plot, it is observed that as the bandwidth increases from 500 to 2,000 MHz, the range of MSE profile values gets smaller. This correlates with the findings of Alavi and Pahlavan (2005), where it was observed that the overall DME goes down over this specific range of bandwidths. Above 2,000 MHz, however, the MSE profile becomes wider as a result of the increased probability of UDP conditions (Alavi and Pahlavan 2005), which increases the overall DME. This, in turn, translates into an increase in the position estimation error. In order to gain further insight into the variation of the QoE across the different QoL classes, again considering bandwidth as a parameter, just the MSE is plotted, as seen in Fig. 8.
Check-Out
Smallworld Software Suite

Chronological
Time-Aware Personalized Location Recommendation

Synonyms
Climate Change

Climate change is defined as changes in the state of the climate variables that can be identified (by using statistical tests) by changes in the mean and/or the variability of its properties and that persist for an extended period. An extended period in the climate context implies decades or an even longer time scale (IPCC 2014). Climate change may be due to natural internal processes or external forcings and persistent anthropogenic changes in the composition of the atmosphere or in land use (Stocker et al. 2013).
in the face of a variable climate include the development of floodplain regulations, insurance, wildlife reserves, drinking water reservoirs, and building codes. However, these actions have been taken in response to a climate that has been relatively stable for many centuries.

While climate mitigation deals with energy and economic policies to avoid the unmanageable, climate adaptation is about engineering the coupled natural-built human system to manage the unavoidable. A survey at the World Economic Forum ranked failure of climate adaptation and mitigation, as well as greater incidences of extreme weather, as two of the top ten global risks of highest concern. Two other concerns among the top ten were water and food crises, which in turn are strongly influenced by climate. The United States Department of Defense, an agency with a broad mission space encompassing the political, military, economic, social, informational, and infrastructural sectors, has called climate change a threat multiplier in its Quadrennial Defense Review Report.

Climate mitigation relies on the perceived urgency of the climate challenge by the nations of the world. Developing nations may find the trade-offs particularly difficult to justify, given the immediate impacts on the poor and on the aspirations of their middle class, as well as the disparities in per capita and in historical emissions when compared to developed nations. However, the poorer sections of the developing nations are expected to bear the brunt of climate-related hazards and resource scarcity. Conversely, developed nations do not consistently rank climate change among the highest of policy concerns but expect the developing nations to bear the burden of emission reductions by appealing not to per capita but to total emissions per country. Belief systems, ranging from political ideologies and the ability of technological innovations to solve society's problems to humanity's manifest destiny, in addition to a host of cultural and historical experiences of individuals and nations, color the perceptions around climate mitigation. What adds to the complexity is that the costs of mitigation have to be borne by the current generation, but the perceived benefits accrue to future generations and to the planet at large.

However, if mitigation is not an easy problem to address, climate adaptation is perhaps relatively better poised. In addition, adaptation becomes an absolute necessity if mitigation pathways fail to fully materialize and/or if historical emissions are already causing significant damage. The impacts of natural hazards such as hurricanes, floods, and heat or cold waves are felt immediately, and the scarcity of water, food, and energy resources affects lives and well-being. Thus, the need to adapt is often acceptable even to those who may not perceive climate change as a major threat or even as a driver of weather extremes or resource constraints. This is especially true if adaptation measures are low regret (e.g., reinforce what needed to be done anyway irrespective of the nature of climate change in any given region), even though there may be occasions when transformational adaptations may be the only strategy. The importance and near centrality of water to adaptation have been well recognized, both directly through water security and through its impacts on energy and food security. The resilience of interdependent critical lifelines and infrastructures, in sectors such as water (including wastewater), transportation, energy (or power and fuel), and communications, has been recognized as an urgent societal need. Natural ecosystems, which may act as soft infrastructures (e.g., in coastal and/or urban regions where marine ecosystems could help slow down the impacts of sea level rise or reduce the strength of storm surges), may have interdependencies with the built, or hard, infrastructures. One step toward adaptation is what has sometimes been called translational climate science, or the ability to develop actionable yet credible insights from climate science through computational modeling and data sciences.
doing slightly more of what is already being done to deal with natural variation in climate and with extreme events. Transformative adaptation measures, however, seek to change the basic attributes of the systems that are affected or likely to be affected by the variations. Kates et al. classify transformational adaptation into three broad categories, which are discussed in further detail in this section (Kates et al. 2012):

1. Adaptation new to resources/location: Examples include the introduction of technologies into places where they have not been used before, either through technology transfer or through technological inventions relevant for the location. Environmental and human-induced changes and mass migrations have posed a serious challenge to urban coastal megacities. As a consequence, these cities are facing an ever-growing challenge of food and water security. For instance, the population of Chennai, an urban coastal metropolitan city on the eastern coast of India, has increased fourfold in the last five decades (Chennai City Population Census 2011). When the city was facing an unprecedented crisis of drinking water, the introduction of seawater desalination plants proved to be a transformative adaptation to the drinking water problem.
2. Enlarged scale or intensity: Incremental adaptations can become transformative when they are used at a greater scale with much larger effects. This kind of adaptation measure generally requires a system-level view with an underlying philosophy that the whole is greater than the sum of its parts.
3. Different places and locations: Some adaptations collectively transform place-based
Climate Adaptation, Introduction, Table 1 Summary of sectorial risks in changing climate and potential adaptation strategies

Sector: Water resources
Key risks: Drought frequency likely to increase by the end of the twenty-first century; raw water quality likely to reduce; increased concentration of pollutants during droughts
Potential adaptation strategies: Adaptive water management strategies; scenario planning; low regret solutions (IPCC 2014)

Sector: Ecosystems (terrestrial, marine, inland)
Key risks: Increasing ocean acidification in medium to high emission scenarios impacting population dynamics, physiology, and behavior of marine species; carbon stored in the terrestrial biosphere susceptible to loss to the atmosphere as a consequence of deforestation and climate change; increased risk of species extinction and habitat migrations
Potential adaptation strategies: Promote genetic diversity; assisted migration and dispersal from severely impacted ecosystems; manipulation of disturbance regimes (such as forest fires, coastal flooding)

Sector: Urban areas
Key risks: Heat stress; inland and coastal flooding; drought and water scarcity; risks amplified by lack of resilient infrastructure systems
Potential adaptation strategies: Multilevel urban risk governance; including the voice of low-income groups in informing policy; building resilient infrastructure systems

Sector: Rural areas
Key risks: Moderate to severe impact on food security, agriculture income, shifts in production area of crops, and freshwater availability
Potential adaptation strategies: Trade reforms and investments in rural areas; adaptations for agriculture and water through policies taking account of rural decision-making contexts
human-environment systems or shift such systems to other locations. Resettlement associated with climate variability, and, by some accounts, climate change per se, is already under way in a few locations. This category of transformations becomes imperative when risks have increased beyond the threshold where incremental transformations and even technology transformations may not result in a significant positive effect.

Future Directions: Risks and Opportunities for Adaptation

In the context of climate change, key risks refer to dangerous human-induced interferences with the changing climate. Identification of key risks is based on the following criteria:

(a) Large magnitude
(b) High probability or irreversibility of impacts
(c) Timing of impact
(d) Persistent vulnerability
(e) Limited potential to reduce risks through mitigation or adaptation

Some examples of key risks in a changing climate include:

(a) Risk of disruption of livelihoods in low-lying coastal zones and small islands due to sea level rise, storms, and coastal flooding (Aerts et al. 2014).
(b) Risk of deaths and mass migrations of large urban populations as a consequence of inland flooding.
(c) Systemic risks due to extremes leading to breakdown of infrastructure networks (Bhatia et al. 2015).
(d) Increased risk of mortality and illness during periods of extreme heat, particularly for vulnerable urban populations (Meehl and Tebaldi 2004).
(e) Risk of loss of biodiversity in terrestrial, marine, and/or inland water ecosystems.
(f) Risk of food insecurity linked to warming, drought, flooding, and extreme precipitation events, particularly for poorer populations in urban and rural settings.
Climate Adaptation, Introduction, Table 2 Summary of regional risks in changing climate, climate drivers, and potential adaptation strategies. Climate drivers and adaptation strategies for a given key risk are identified by the same number (Adapted from IPCC 2014)

Region: Australasia
Key risks: I. Increased risk in riverine, coastal, and urban floods; II. Heat waves: increased risk of heat-wave-related mortality, forest fires, decreasing crop output per hectare; III. Significant change in community composition and structure of coral reef systems in Australia; IV. Increased risk of drought-related water and food shortage in South Asia and the Indian subcontinent
Climate drivers: I. Extreme precipitation, cyclones, and sea level rise; II. Warming trends and extreme temperature events; III. Warming trends and cyclones; IV. Extreme temperature, drying trends, and warming trends
Potential adaptation strategies: I. Exposure reduction and protecting natural barriers (e.g., mangroves); II. Heat health warning systems, urban planning to reduce heat islands, and new work practices to avoid heat stress among outdoor populations; III. Direct interventions such as assisted colonization and shading, and reducing stresses such as fishing and tourism; IV. Integrated water resource management, water infrastructure and reservoir development, water reuse, and desalinated seawater usage in coastal areas
Region: North America
Key risks: I. Wildfire induces loss of ecosystem integrity and human mortality; II. Heat-related human mortality; III. Urban floods in riverine and coastal areas resulting in ecosystem damage, human mortality, mass migrations, and infrastructure damage
Climate drivers: I. Warming trends and drying trend; II. Extreme temperature and warming trends; III. Extreme precipitation and cyclones
Potential adaptation strategies: I. Introducing resilient vegetation and prescribed burning; II. Early heat warning systems, cooling centers, residential air conditioning, and community and household scale adaptations through family support; III. Low impervious surface pavement designs, updating old rainfall-based infrastructure design to reflect current and changing climate conditions, and protecting natural flood barriers (e.g., mangroves)

Region: Europe
Key risks: I. Significant reduction in water availability from river abstraction and from groundwater resources, combined with increased water demand (e.g., for irrigation, energy and industry, domestic use) and with reduced water drainage and runoff as a result of increased evaporative demand; II. Increased economic losses and people affected by extreme heat events: impacts on health and well-being, labor productivity, crop production, air quality, and increasing risk of wildfires in southern Europe and in the Russian boreal region
Climate drivers: I. Drying trend, warming trend, and extreme temperature; II. Extreme temperature
Potential adaptation strategies: I. Implementation of best practices and governance instruments in river basin management plans and integrated water management; II. Implementation of warning systems; III. Adaptation of dwellings and workplaces and of transport and energy infrastructure; IV. Reductions in emissions to improve air quality; V. Improved wildfire management; VI. Development of insurance products against weather-related yield variations
Sectors that are likely to be impacted by the key risks include freshwater resources, ecosystems (terrestrial, marine, and inland), the food sector, the infrastructure sector, urban and rural areas, and human health. Table 1 summarizes the selected key risks and possible adaptation scenarios for these risks in a changing climate.
Risks will vary through time across regions and populations, dependent on innumerable factors including the extent of adaptation and mitigation. Moreover, adaptation is region and context specific, and with no universal strategy to reduce risk, characterizing the risks and understanding context and place-based adaptation strategies are critical to inform adaptation. Table 2 summarizes the selected regional risks and feasible adaptation scenarios for Australasia
- Converting surface irrigation into high-efficiency pressurized irrigation may save water but may also result in higher energy use.

Water, energy, and food (WEF) represent the greatest global risks because they are expected to be highly impacted by climate change, demographic shifts including mass migrations, aging infrastructure, global trade networks, and other challenges of the twenty-first century (Andrews-Speed et al. 2012). The nexus approach considers the different dimensions of water, energy, and food equally and recognizes the interdependencies of different resource uses in order to develop sustainably (Bazilian et al. 2011).

Historical Background

In 2011, Texas experienced its most extreme drought on record, stressing the ability of the power grid to meet demand. A 2013 study discussed the water-energy nexus in the context of Texas droughts. The electricity supply grid in Texas is unique in that it is almost entirely separated from the rest of the power infrastructure in the USA. As such, there is limited capacity to purchase power from other geographic regions in case of a generation deficit. In the event of a statewide drought, this isolation presents a vulnerability. Texas also encompasses a range of climates. In the subhumid eastern half of the state, most power plants (70 %) use once-through cooling and draw from surface water, most often reservoirs. In the semiarid west, power plants use wet cooling systems to minimize water demand, which is met mostly with groundwater. During 2011 Texas experienced 100 days of above 100 °F temperatures and a record low level of precipitation. The combination of high demand, low rainfall, and higher temperatures increased evaporation, lowering the state's reservoirs by 30 % compared to the previous year. At one point 88 % of the state was experiencing exceptional drought. The drought was accompanied by greater electricity demand for air conditioning, leading to a 6 % rise in electricity use. For three days in August, peak demand was so high that utilities shut off 1.5 GW of nonessential industrial loads to avoid initiating rolling blackouts (Texas).

Given the severity of the drought, the Texas power system demonstrated exceptional resiliency. As a state prone to such dry weather, most power plants have either been built or retrofitted with equipment to ensure operability with restricted water use. Natural gas, for example, has become a major source of electricity in Texas and requires no cooling water if used in a combustion turbine. The construction of efficient combined-cycle power plants reduces water use per unit of generation. Many plants that rely primarily on once-through cooling have supplementary cooling towers for use during drought conditions. One plant even has an 8.5-mile pipeline to bring cooling water from a secondary source. Lastly, wind power has seen enormous growth in Texas during the past decade, with 10 GW of capacity now installed; wind power requires no cooling water. As evidence of the effectiveness of these alternatives, not a single power plant was cited for water discharges above the allowable level during the drought.

The population of Texas is projected to increase dramatically in the coming decades, and infrastructure planners are working on new ways to ensure that electricity demand is met even under extreme drought. Some have suggested adding supplemental cooling towers to all plants, but critics argue that this option is too costly. Rather, those critics recommend wisely choosing what type of new generation and cooling systems to build. These include dry cooling systems that, while expensive, use air rather than water for cooling. To meet the rising demand for cooling water, the Texas State Water Plan calls for the construction of 26 new reservoirs. Some are advocating the increased utilization of groundwater resources, specifically using aquifers to store water and eliminate evaporative losses; this is a common practice in California, Arizona, and Florida but has yet to be implemented in Texas. Another option is drawing water from non-freshwater sources, including treated wastewater, brackish water, and seawater.
Climate and Human Stresses on the Water-Energy-Food Nexus, Fig. 1 A simplified schematic view of the relationship between water, food, and energy. Water is used for cooling thermoelectric power plants, which produce electricity for other activities. Water is also essential for agriculture and food. Some agricultural products are refined into fuels in a process that connects water, energy, and food
than once-through systems, it has a higher rate of consumption. A third type of cooling method is known as dry cooling. These systems circulate water in a closed loop and dissipate the heat to the air through a large heat exchanger, similar to the radiator in a car. Dry cooling technology has the benefit of consuming virtually no water but is expensive, large, and dependent on the local climate. Some power plants use a combination of wet and dry cooling; these are called hybrid systems. Dammed hydroelectric power production relies on an adequate supply of water; lower water levels in reservoirs correlate with reduced generating capacity. Extracting a single gallon of oil requires between 2 and 350 gallons of water. Growing biofuel crops has a higher water intensity than any method of fossil fuel extraction: corn and soy irrigation each consume upward of 10,000 gallons of water for each MMBTU of fuel produced. The refining of petroleum and plant-based fuels also requires large amounts of water, in the range of 1-2 billion gallons every day. While this chapter focuses on water for cooling, water plays a vital role throughout the extraction, refining, and transport of fuels as well (DoE US 2006).

Climate Impacts on Water and Consequences on Power Generation
As mentioned at the outset of this chapter, climate change will have wide-reaching effects on the water cycle, altering the temperature and quantity of water available in all regions of the country. Also as discussed, electricity generation is highly
Climate and Human Stresses on the Water-Energy-Food Nexus, Fig. 2 Power plant information from the Energy Information Agency (EIA) overlaid with county-level water availability projections for 2040. Power plants are broken down by the type of cooling system employed and the rate of water intake. Water availabilities are median values taken from an ensemble of precipitation-minus-evaporation models. Values are less domestic demand, taken as per capita demand times the projected 2040 population (Adapted from Ganguly et al. 2015)
dependent on abundant sources of water for cooling. The confluence of these two relationships forms a water-energy nexus that will stress water supplies and potentially limit power production capacity in the future (Förster and Lilliestam 2009).
The relationship between energy production and water for once-through cooling systems can be broken down into the following sub-relations:

- The temperature of the water discharged by a plant cannot exceed a specified level
- The mixed river and discharged water temperature must not exceed a specified level
- The temperature difference of discharged versus river water must not exceed a specified level
- The plant can only withdraw up to a certain fraction of the available streamflow
- The cooling system must not exceed its pumping capacity

With the exception of the last criterion, which is a mechanical limit, these regulations are put in place at the local, state, and federal levels to protect the aquatic environment. The first three criteria are temperature dependent. As water temperatures increase due to climate change, it becomes increasingly difficult for power plants to meet these discharge requirements. If the intake rate of cooling water is kept constant, higher intake temperatures equate to higher discharge temperatures. In some cases, the higher discharge temperature will exceed criteria 1 or 2. Alternatively, a cooling system may compensate for higher intake temperatures by increasing the intake rate. In this case criterion 4 limits plant operation, especially if the availability of water is reduced, as in drought conditions. Additionally, criterion 5 prevents the plant from drawing in more cooling water than the capability of the plant's equipment (Kimmell 2009). Water availability can also become an issue in light of criteria 3 and 4.
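The five constraints above can be thought of as a simple feasibility check on a plant's operating point. The sketch below encodes them with hypothetical limit values and variable names; none of the numbers come from the regulations or studies cited in this entry.

    # Illustrative feasibility check of the five once-through cooling constraints.
    def cooling_constraints_ok(t_discharge, t_river_mixed, t_river, intake_rate,
                               streamflow, pump_capacity,
                               t_discharge_max=32.0, t_mixed_max=28.0,
                               delta_t_max=3.0, withdraw_fraction_max=0.25):
        return all([
            t_discharge <= t_discharge_max,                      # criterion 1
            t_river_mixed <= t_mixed_max,                        # criterion 2
            t_discharge - t_river <= delta_t_max,                # criterion 3
            intake_rate <= withdraw_fraction_max * streamflow,   # criterion 4
            intake_rate <= pump_capacity,                        # criterion 5
        ])

    # Example: temperatures in °C, flows in arbitrary volumetric units
    print(cooling_constraints_ok(31.0, 27.0, 29.0, 20.0, 100.0, 40.0))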
Climate and Human Stresses on the Water-Energy-Food Nexus, Fig. 3 A flowchart of water availability with both its drivers and the responses (Source: IPCC WG-II 2014 report, Chapter 3; IPCC 2014)
In the event of low water levels, a plant may be forced to withdraw less water so as to satisfy criterion 4. As a result of the lower intake rate, the temperature of the discharged water will be higher than usual; in this case, criterion 1 limits the plant's generating capacity. In addition to the above five criteria, in certain cases reduced water availability may lower the water level in the bodies of water from which power plants withdraw cooling water. If the water level falls below the intake level, the plant will be unable to take in sufficient quantities of cooling water. Many systems have intakes at depths shallower than 10 feet and may run dry under drought conditions. In any of these situations, power plants are forced to either reduce generation or shut down entirely. As a result, the availability of electricity is reduced. In regions at high risk of increased water temperatures and/or reduced water availability induced by climate change, power production is especially vulnerable (Fig. 3).

Climate Change Consequences on the Marine Food Web
Agriculture and water systems have an obvious connection: plants need water to grow and yield food. Aside from this main connection, climate influences the agriculture sector with changes in temperature, carbon dioxide levels, and water levels. Climate disruptions will cause a significant decrease in the yield of most crops and livestock because of changes in the atmosphere and in water availability. Changes in water availability will affect which crops will grow in certain areas and the amount of yield, and hence influence food production around the world.

In a world with a growing population, the demand for food is growing, and the agriculture sector needs to increase production to match this projected growth. Without the necessary increase in agricultural lands, there would either need to be fundamental improvements in yield and management, changes in consumption patterns, or both. However, consumption is expected to increase, with large middle-class populations in developing nations aspiring to the standards of living available in First World countries. Water requirements vary by crop type, and scarcity or variability in water availability may reduce yield substantially. In a study on wheat, rice, maize, and soybean conducted by Parry et al., carbon dioxide levels, temperature, and water availability were projected to future levels to determine how yield would be affected (Parry et al. 2004). The results indicate a slight to moderate negative impact worldwide, with the most
[Fig. 4 graphic: a process flow from Surface Water/Ground Water, Climate Change and Variability, Multi-sector Demand, Population Change, and EPA Regulation through Changes in Regional Hydro-climatology, Changes in Water Supply, and Changes in Water Demand to Vulnerable Power Plants, Impacted Ecosystems, Technologies, Proximity to Water Bodies, and Energy Security Challenges, alongside a map legend of Stream Temperature (°C)]
Climate and Human Stresses on the Water-Energy-Food Nexus, Fig. 4 A proof of concept on aspects of the water-energy nexus. The process flow (left) considered water stress resulting from scarcer and warmer freshwater on thermoelectric power production based on climate model and population projections, which, combined with stream temperature sensor and GIS data on energy infrastructures, resulted in new insights (right) (Adapted from Ganguly et al. 2015)
potential changes in yield occurring in Asia and Africa. The yields, according to this study, will fall by 10 % combined overall throughout all of the regions, which has the potential to affect the food security of the growing population, especially in developing economies.

Climate change is severely impacting marine food webs. Marine life exists within an intricate network: currents make nutrient-rich waters available to marine life, sustaining single-celled plants called phytoplankton, the zooplankton that feed on them, and the larger fish that eat the zooplankton and may be consumed as food by humans.

Climate change and associated changes in ocean circulation, sea level rise, and coastal winds are altering the patterns of nutrient upwelling and thereby changing the timing of fish spawning and the yield from fisheries (Wang et al. 2015). Rising sea temperatures are threatening the habitats of sea life by straining the ability of these creatures to cope with temperature changes. Marine mammals are being forced to travel longer distances to find food or to rely on less nutritious, energy-expending substrate for survival. Changing marine patterns are also affecting human food production and the fishing industry.

Climate, Humans, and WEF
Changes in population and lifestyles, changes in climate, the resilience of infrastructures and societies, and regional land use and urbanization are the main stressors of the water-food-energy nexus. They impact water resources (both quantity and quality), water-related hazards (floods and droughts), built and natural infrastructures, and coupled natural-engineered human systems across many scales. The influence and impact of these stressors are critically dependent on time horizons, spatiotemporal resolutions, and the resilience of the coupled systems. As climate change influences water availability, additional stress will be placed on the balance between food and biofuel crops. As discussed earlier, biofuel crops are much more water intensive than traditional fuels. As such, the energy and food needs of people must be carefully considered in light of changes in the geographic distribution of water availability (Fig. 4).

Planning Horizons and Spatial Resolutions
The spectrum of water challenges operates at multiple space and time scales, and across disparate planning horizons, over which scientific insights, projections, and policy insights need
to be generated. These include near-real time to weeks, seasonal to interannual, and decadal to mid-century.

High-resolution predictions are possible over near-real-time to weekly time scales. Beyond this range, chaotic characteristics and/or extreme sensitivity to initial conditions limit the predictability of hydrological and meteorological systems. Within these time frames, short-term monitoring and predictions may inform urgent and immediate events and emergency management. Specific examples include flood and flash flood warnings, monitoring the water quality of source lakes to generate advance warning of possible pathogens in drinking water, urgent warnings about crop destruction, and stoppage of power plant operations owing to lack of water during a drought or excess water because of floods.

Seasonal to interannual time scales encompass changes in climate patterns such as El Niño, seasonal changes in monsoons in the Southwest USA, seasonal floods in Iowa, the severity of winter in the Northeast USA, the number of tropical cyclones striking the Gulf Coast during the hurricane season, seasonal droughts in California, and regional energy production. Specific weather events are not predictable at these time scales, but average seasonal and annual patterns can be characterized.

Decadal to mid-century time scales, ranging from about 5-30 years in the future, are not predictable on a seasonal or even annual basis. However, physics-based climate models can project climate trends and variability based on assumed emission scenarios. Trends in global warming and changes in weather patterns start becoming prominent at these horizons. Also at this scale, variability in mean and extreme climate is expected to be large. While climate change and the associated uncertainty are expected to be a major factor in water challenges at these time horizons, demographic changes may dominate the impact on water resources. The intensity and severity of floods and droughts may not be predictable on a single-event, seasonal, or annual basis at these time horizons, but duration and frequency changes may be predictable. The nexus of water with food and energy can be examined and characterized together with its uncertainties, and appropriate changes can be made in emergency preparedness. Reinforcements and remediation measures to build the resilience of communities, engineered systems, and natural ecosystems may be revived or made more resilient based on this knowledge.

Beyond decadal to mid-century time scales, climate trends are expected to dominate over uncertainties. Projections for population and human systems in this range are not available and are difficult to create. Similarly, infrastructure and technology changes are nearly impossible to foresee. Most stakeholder decision horizons do not extend to these scales. The climate change adaptation community, as well as the related integrated assessments community, has been working at these ranges with a variety of simplified models. Data and geographic information science may be useful at these scales primarily to develop predictability and uncertainty studies in climate, evaluate the performance of models of water, energy, and food systems, develop enhanced projections with uncertainty, and examine aggregate impacts in terms relevant for adaptation and mitigation. Water in the atmosphere, oceans, land, and biosphere significantly impacts climate variability yet remains among the largest sources of uncertainty. Climate impacts regional hydrometeorology and the water balance, including the availability and quality of surface and groundwater, and influences natural hazards such as floods and droughts. Water availability and temperature affect terrestrial and marine ecosystems, as well as energy and food security. Climate influences the probability of hazards such as floods and droughts at multiple space-time resolutions. Adapting to changing hazards requires the resilient design of critical infrastructures and effective management of key resources.

Key Applications

The challenge of understanding energy, water, and food policy interactions, and addressing them in an integrated manner, appears daunting. A comprehensive understanding of the WEF nexus and the subsequent impacts of humans and climate is required to:
- Assess the current state and pressures on natural and human resources systems
- Forecast expected demands, trends, and drivers on resources systems and interactions between water, energy, and food systems
- Delineate different sectorial goals, policies, and strategies in regard to water, energy, and food, including an analysis of the degree of coordination and coherence of policies, as well as the extent of regulation of uses
- Plan needed investments, acquisitions, reforms, and large-scale infrastructure
- Inform key stakeholders, decision-makers, and user groups

Future Directions

The key challenges and questions for future directions are fivefold:

1. What are the relationships among the interlinked stressors and the stressed systems across the components of the water-climate-energy-food-ecosystem nexus at different time and spatial scales?
2. Can remotely sensed and other information about lakes or rivers, including water levels and quality, the capacity, location, and water use of power production, the resilience of energy infrastructures, food crops and biofuels, as well as freshwater or marine ecosystems, be related to each other through graphical dependencies to form interconnected network structures across the disparate systems of the nexus, with a view to understanding their systemic dependencies, feedback, and resilience?
3. What are the characteristics of the stressors, especially the attributes of their extremes, how do they impact the nexus as well as the individual components of the nexus, and how do failures or loss of functionality propagate along the tightly interconnected system of systems?
4. Can the future flows, feedback, and vulnerability along the nexus network, as well as the perturbations of the nexus owing to possible non-stationary behavior of the extreme stressors, be predicted across multiple time scales to enable short-term recovery and long-term preparedness?
5. How would uncertainties along the interconnected networks of the nexus be quantified, including how the impacts of changes in the stressors and their extremes propagate along the nexus?

Cross-References
Climate Extremes and Informing Adaptation
Climate Hazards and Critical Infrastructures
Resilience

References
Andrews-Speed P, Bleischwitz R, Boersma T, Johnson C, Kemp G, VanDeveer SD (2012) The global resource nexus: the struggles for land, energy, food, water, and minerals. Transatlantic Academy, Washington, DC
Bazilian M, Rogner H, Howells M, Hermann S, Arent D, Gielen D et al (2011) Considering the energy, water and food nexus: towards an integrated modelling approach. Energy Policy 39:7896-7906. doi:10.1016/j.enpol.2011.09.039
DoE US (2006) Energy demands on water resources: report to Congress on the interdependency of energy and water, vol 1. U.S. Department of Energy, Washington, DC
Förster H, Lilliestam J (2009) Modeling thermoelectric power generation in view of climate change. Reg Environ Change 10:327-338. doi:10.1007/s10113-009-0104-x
IPCC (2014) Climate change 2014: impacts, adaptation, and vulnerability. Part B: regional aspects. Contribution of working group II to the fifth assessment report of the intergovernmental panel on climate change [Barros VR, Field CB, Dokken DJ, Mastrandrea MD, Mach KJ, Bilir TE, Chatterjee M, Ebi KL, Estrada YO, Genova RC, Girma B, Kissel ES, Levy AN, MacCracken S, Mastrandrea PR, White LL (eds)]. Cambridge University Press, Cambridge/New York
Ganguly AR, Kumar D, Ganguli P, Short G, Klausner J (2015) Climate adaptation informatics: water stress on power production. Comput Sci Eng 17:53-60. doi:10.1109/MCSE.2015.106
Kimmell TA, Veil JA, Division ES (2009) Impact of drought on U.S. steam electric power plant cooling water intakes and related water resource management issues. Argonne National Laboratory (ANL), Washington, DC
Parry ML, Rosenzweig C, Iglesias A, Livermore M, Fischer G (2004) Effects of climate change on global food production under SRES emissions and socio-economic scenarios. Glob Environ Change 14:53-67. doi:10.1016/j.gloenvcha.2003.10.008
Common Future, a report released by the Brundtland Commission. The commission was established in the 1980s as a response to the inadequate response of the 1970s environmental movement. Sustainable development, according to the commission, is development that meets the needs of the present without compromising the ability of future generations to meet their own needs (Brundtland 1985). The commission further specified this definition by adding three pillars: economic growth, environmental protection, and social equality. Although many focus on the second pillar, the commission emphasized the interconnectivity of all three. Only when each pillar is achieved can sustainable development truly be realized.

Historical Background

Climate change is expected to worsen the scarcity of water, food, and energy resources and exacerbate hazards such as heat waves and heavy precipitation. Developing economies are characterized by growing and vulnerable populations, ever increasing income and wealth inequality, deteriorating or inadequate infrastructures, as well as the inability to mobilize resources for relief, rescue, or recovery efforts. Thus, while no region is immune to climate change, developing economies have been hit the hardest and will continue to be disproportionately impacted in the coming decades. However, climate adaptation and mitigation remain hotly debated, especially in situations where economic inequalities are severe and the disparity between future societal aspirations and the current status is large. Reduced reliance on fossil fuels may be viewed as less urgent compared to industrialization and economic growth, while land use and urban planning may be viewed as hindrances to improving quality of life. Adapting to projected disasters or resource scarcity may seem a case of misplaced priorities amidst a lack of adequate investments in education, health, or food security. However, it is within these developing economies that low regrets or transformational adaptation may significantly reduce loss of lives and economic devastation of the underprivileged. In the longer term, renewable energy or other low-carbon investments may make good business sense while lessening the more severe effects of climate change. The Intergovernmental Panel on Climate Change (IPCC) findings on mitigation are outlined in its Working Group III report (Summary for Policymakers 2014). According to the IPCC, mitigation efforts, by limiting the impacts of climate change, can enhance sustainable development, allow for more equitable distribution of resources, and help alleviate poverty. IPCC Working Group III describes the challenge of distributive justice, or the division of mitigation efforts equitably, to account for the greater impact of climate change on developing nations with historically lower emissions. For the 1.3 billion people in the world without access to electricity (Summary for Policymakers 2014), sustainable development will be required if this population's economic growth is not to simultaneously expand fossil fuel usage. The report presents developing, urbanizing cities as a significant mitigation target to reduce emissions, as many developed nations remain locked into excessive emissions by existing infrastructure. Mitigation policy will require a systemic approach, innovation, and difficult decisions: effective greenhouse gas emission reductions cannot be achieved unless all nations act. The systemic change required will necessitate significant public, private, and institutional spending at a global level that takes into consideration local and historical practices and the potential for injustice between developing and developed nations.
The IPCC's WGII report (IPCC 2014) identifies different areas of society that are at risk and the impact of climate change on different populations in the face of remaining uncertainty on the exact timing and extent of its impacts. The IPCC report finds the most at-risk, vulnerable populations to be those that are already the most disadvantaged in society: socially, economically, politically, culturally, institutionally, or otherwise. The WGII report recommends first evaluating levels of vulnerability and risk. Once risks are defined, policy makers must evaluate resilience: the ability of a society to recover from disasters and hazardous events efficiently and effectively. Climate resilience, specifically, is the ability to manage the impacts of climate change,
Climate Change and Developmental Economies, Fig. 1 Risk of climate-related impacts results from the interaction of climate-related hazards with the vulnerability and exposure of human and natural systems. Changes in both the climate system (left) and socioeconomic processes including adaptation and mitigation (right) are drivers of hazards, exposure, and vulnerability (Source: IPCC 2014, WGII AR5 SPM, p. 26)
reducing disruptions and expanding opportunities. In Fig. 1, the IPCC's iterative risk management process is illustrated. Risk, according to the IPCC, is the intersection of hazards, vulnerability, and exposure. While climate causes hazards, risk arises out of vulnerability and exposure due to socioeconomic processes.
Amid the uncertainty of climate change and its impacts, it is widely accepted that effective adaptation requires localized and targeted approaches. The IPCC recommends investing in infrastructures, development assistance, and existing disaster risk management institutions. However, the report emphasizes that a one-size-fits-all solution will be ineffective globally due to varying social values, interests, expectations, and circumstances (Smith et al. 2014).

Scientific Fundamentals

The relationship between economic growth and environmental degradation is a fundamental one for sustainable development. Many hope that this relationship can be decoupled, that is, that economic growth should not be conditional on environmental degradation. Two ideas, the environmental Kuznets curve (EKC) and the more recent Ecomodernist Manifesto, explore the topic of decoupling further.

The Environmental Kuznets Curve

A country's level of economic development is a key component in the debate on climate change mitigation. Do developed countries need to take on the majority of emissions reductions? Is there such a thing as a fair share carbon space? Does a country's GDP growth inherently lead to a worse off environment? Questions like these are at the heart of the climate change-development conundrum. The environmental Kuznets curve, developed in 1991 by Grossman and Krueger, shows a graphical relationship between development and environmental degradation. Based on the economic Kuznets curve, which described the relationship between a country's gross domestic
(Figure: the environmental Kuznets curve, with environmental degradation plotted against economic development and a turning point beyond which degradation declines.)
political freedoms, rather than economic development (Barrett et al. 2000).
Some ecological economists argue that a better measure of human well-being is reflected in human development rather than in the measures of GDP growth alone. Steinberger and Roberts looked at the relationships between four different measures of human development (life expectancy, literacy, GDP per capita, and the Human Development Index) and two measures of resource use (primary energy use and carbon emissions). They concluded that human well-being over time is becoming steadily more efficient. This challenges the perception that increased energy usage and increased emissions are necessary for better living conditions.
Ultimately, Steinberger and Roberts' research reflects new possibilities of dissociation between raising the standard of living and degrading the environment. In their words, high human development can be generated at lower and lower energy and carbon emissions costs, and the quality of life is steadily decoupling from its material underpinnings. They found that different measures of development can be achieved at different rates of energy usage. Literacy, for instance, requires far less energy output than GDP (Steinberger and Roberts 2010). This new paradigm of development is useful. The justification for decoupling is reflected in other literature as well, including the Ecomodernist Manifesto.

Ecomodernist Manifesto

The Ecomodernist Manifesto rejects the idea that humanity will run out of resources and that increased human development is a problem. According to the Manifesto, the real issues are the misuse of energy sources, inefficient technology, and excessive carbon emissions. Predicted outcomes of the current path of resource usage, including ocean acidification, the loss of ozone in the earth's stratosphere, and climate shifts, could result in economic, population, and ecological loss. Along with long-term effects, the Manifesto points out well-known immediate impacts on populations, including water and air pollution. The way for humanity to develop further while preserving the surface of the earth is for society to decouple from nature and decrease resource dependency. Simply put, nature unused is nature spared (Asafu-Adjaye et al. 2015).
The Manifesto defines decoupling as the decrease in the rate of environmental impact of a process as economic output rates increase. The authors set a goal of absolute decoupling, when the rate of human consumption of resources and energy peaks and declines. The Manifesto suggests humanity is on this path to peak environmental impact by the end of this century, due to trends of increased urbanization, agricultural technology expansion, and the introduction of efficient technology. The Manifesto's recommended strategy to reduce the human dependency on natural resources and strive for rapid decarbonization includes urbanization, aquaculture, agricultural intensification, nuclear power, and desalination.
A strength of the manifesto is that it recommends innovation, not a return to earlier practices. It rejects nostalgia in environmentalism, and the idea that humanity has previously lived lighter on the land, since three-fourths of global deforestation occurred before the industrial revolution. It calls for the expansion of electricity in the developing world, in contrast to environmentalists who theorize that resources are not available for such development. It presents the idea that global poverty is an environmental problem, one that cannot be ignored.
However, the Manifesto lacks a concrete strategy for more efficient use of resources. It presents scalable, power-dense technologies as an alternative to carbon energy sources, even though present technologies are not yet capable of achieving that transition. It relies on the discovery of such technologies, but even admits that such progress is not inevitable.
documented both in theoretical and empirical literature. Though the start was made in developed countries and most of the empirical studies are still in the context of high-income industrialized countries, developing countries are slowly rising up to analytical research that explores the link between economic activities, environmental impacts, climate change, and sustainability. Several such studies, as well as the reports brought out by the UNFCCC, have put developing countries, especially India and China, in a dilemma. On one hand lie the aspirations of their people to achieve a decent standard of living and come out of poverty, which requires these economies to achieve a high macroeconomic growth rate. On the other is the global responsibility of these countries to check GHG emissions and adopt mitigation measures to delay climate change. Matters are made more complex by the domestic heterogeneity of these countries, with one part having a lifestyle and values akin to the first world and another whose troubles, struggles, and aspirations resemble the least developed countries. While the terms of and solutions to challenges at the macro level remain gray, there is some evidence that at the microlevel policy makers prefer development at any cost. We shall refer to some case studies to understand how some of these activities are degrading the environment in developing countries.
The Asansol-Durgapur region in the eastern part of India came to be known as the Ruhr of India because of the large number of large industrial units set up in the region post-independence. These units were set up with adequate attention to environmental standards and impact on local pollution. However, since the early 1990s the region faced an economic downturn and industries shuttered. By the early 2000s, the government started a new industrialization drive in the region by providing several concessions to entrepreneurs including fiscal benefits, a map of deregulated land use, and fast licensing. A large number of industrial units came up over the next five years, most of them in the earlier green belts and close to the residential areas. Most of them were also polluting in nature (sponge iron, carbon products, and smelting units, among others). The industrial units were belching fumes containing substantial amounts of harmful gases, toxic chemicals, and Suspended Particulate Matter (SPM). This has resulted in pollution of air and water in the surrounding areas, health- and other-related damages to both workers and residents, and another impetus to global climate change. The SPM10 level at Durgapur stood at 350 µg/m³ in August 2014, more than five times the permissible standard level of 60 µg/m³. The nitrogen dioxide level stood at 51.5 µg/m³ in 2014 and continues to increase. It has been observed by researchers that the neo-industrialization drive has raised pollution levels and that there are substantial costs involved, as is evident from Willingness To Pay and Willingness To Accept estimates. More than one-third of the respondents report the pollution level as unbearable, while half of them say that it has increased over the last decade. The estimated value of environmental damage is about 2 % of the gross output of the new industries, clearly pointing to the substantial cost of development pursued without regard for environmental impacts.
It is not that only industrialization in developing countries is to be blamed. Agriculture has its own way of affecting the environment and bringing about climate change. Biswas (2010) looked at the impact of extensive agricultural expansion on water availability and LULC changes in the rice bowl of Eastern India. It was observed that over a period of three decades [1971–2001, roughly coinciding with the period of agricultural revolution in the region that turned mono-crop land to three-crop land through intensive irrigation, mechanization, improved seeds, pesticides, and chemical fertilizer], land use shifted strongly in favor of agriculture. However, the strategy was water intensive and resulted in lowering of the groundwater table from 8 m to 15 m in just 10 years. The number of surface water bodies decreased drastically, as did their total surface area. Markov chain modeling predicts a 50 % decline in water bodies over the next 25 years along with a corresponding rise in cultivated area and settlements. However, this situation is not
and his team were able to put together a solar lighting system that was suitable for the harsh environment of rural Karnataka, at about INR 5000 per light, it was not affordable for those at the base of the economic pyramid. They could only buy the product if they were provided credit. However, the banks were not ready to lend to the poor, especially because lighting systems were viewed as a consumer durable product and banks were instructed to provide loans to the poor only for income-generating activities. Thus, Harish realized that it was necessary to link the purchase of solar lights to a stream of income. That was not too difficult to do because solar lights could increase the number of business hours for those who had to close shop after sundown because of lack of electricity. Moreover, there were others who were purchasing kerosene to do business after sunset. Selco could structure a financing plan for them such that the money they saved from not having to buy kerosene was more than the money they would have to pay for loan repayment. Finally, after a lot of convincing, banks started to provide credit for purchasing solar lights, and Harish's dream of selling solar lighting systems to the poor took concrete shape. Harish also realized that apart from financing, Selco also needed to provide prompt service to its customers. Since customers would depend on its lights to run their business, any downtime would imply loss of livelihood opportunity and thereby loss of credibility. Selco therefore established a wide network of service centers all across Karnataka so that service engineers could reach even the most remotely located customer within a reasonable amount of time.
Selco's journey, apart from being inspirational, holds a lot of lessons for social entrepreneurs and others who engage with the problem of seeking market-based solutions for poverty alleviation. However, changes need to be systemic in order to have any perceptible impact on resilience or mitigation. Therefore, such efforts need to be scaled in multiple domains such as healthcare, education, and livelihood generation. This, on one hand, will reduce the economic vulnerability of a large section of the population, giving them opportunity for self-determination. On the other hand, it will make them resilient to deal with the adverse impact of climate change.

Future Directions

First and foremost, the Ecomodernist Manifesto is a call to action and discussion. Though the Manifesto identifies the challenges of global climate change to be technological ones, it recognizes the need to adopt certain values in society to fully address them, including democracy, tolerance, and pluralism. However, in order to reach the great Anthropocene era that the Manifesto strives for, private businesses and state institutions must invest in technological research and embrace regulations to mitigate emissions. Technologies including nuclear power, wind power, solar power, and desalination remain either unsustainable and carbon intensive or economically inefficient. Scalable, power-dense alternatives to carbon energy must be developed in order to both urbanize and intensify agriculture and simultaneously reduce human impacts on the environment. The environmental Kuznets curve, too, reflects a certain sense of optimism in its own modeling. One could argue that either the Ecomodernist Manifesto or the EKC simplifies the idea of decoupling too much. However, the concept that economic growth does not necessarily need to be the cause of environmental degradation may be a positive framework to encourage action on sustainable development.

Cross-References

Climate Adaptation, Introduction
Climate and Human Stresses on the Water-Energy-Food Nexus

References

Asafu-Adjaye J, Blomquist L, Brand S, Brook BW, DeFries R, Ellis E, Foreman C, Keith D, Lewis M, Lynas M, Nordhaus T, Pielke R, Pritzker R, Roy J, Sagoff M, Shellenberger M, Stone R, Teague P (2015) An ecomodernist manifesto, 32 pp
Barrett S, Graddy K (2000) Freedom, growth, and the environment. Environ Dev Econ. core/journals/environment-and-development-economics/article/freedom-growth-and-the-environment/393DCC0CAB23F8A9837DCC892B3CB90A. Accessed 6 Sept 2016
Biswas B (2010) Changing water resources study using GIS and spatial model: a case study of Bhatar Block, district Burdwan, West Bengal, India. J Indian Soc Remote Sens 37:705–717. doi:10.1007/s12524-009-0049-z
Brundtland GH (1985) World commission on environment and development. Env Policy Law 14:26–30
Chakraborti D, Rahman MM, Das B, Murrill M, Dey S, Chandra Mukherjee S et al (2010) Status of groundwater arsenic contamination in Bangladesh: a 14-year study report. Water Res 44:5789–5802. doi:10.1016/j.watres.2010.06.051
Field CB (2012) Managing the risks of extreme events and disasters to advance climate change adaptation: special report of the intergovernmental panel on climate change. Cambridge University Press, Cambridge
IPCC (2014) Climate change 2014: impacts, adaptation, and vulnerability. Part B: regional aspects. Contribution of Working Group II to the fifth assessment report of the Intergovernmental Panel on Climate Change [Barros VR, Field CB, Dokken DJ, Mastrandrea MD, Mach KJ, Bilir TE, Chatterjee M, Ebi KL, Estrada YO, Genova RC, Girma B, Kissel ES, Levy AN, MacCracken S, Mastrandrea PR, White LL (eds)]. Cambridge University Press, Cambridge/New York
IPCC (2001) Third assessment report of the intergovernmental panel on climate change. Cambridge University Press, New York
Martinez-Alier J (1995) The environment as a luxury good or "too poor to be green"? Ecol Econ 13:1–10
Smith KR, Woodward A, Campbell-Lendrum D, Chadee DD, Honda Y, Liu Q et al (2014) Human health: impacts, adaptation, and co-benefits. In: Field CB, Barros VR, Dokken DJ, Mach KJ, Mastrandrea MD, Bilir TE et al (eds) Climate change 2014: impacts, adaptation, and vulnerability. Part A: global and sectoral aspects. Contribution of Working Group II to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge/New York, pp 709–754
Steinberger JK, Roberts JT (2010) From constraint to sufficiency: the decoupling of energy and carbon from human needs, 1975–2005. Ecol Econ 70:425–433. doi:10.1016/j.ecolecon.2010.09.014
Stern DI (2004) The rise and fall of the environmental Kuznets curve. World Dev 32:1419–1439. doi:10.1016/j.worlddev.2004.03.004
Stocker TF, Qin D, Plattner GK, Tignor M, Allen SK, Boschung J et al (2013) Climate change 2013: the physical science basis. Intergovernmental panel on climate change, working group I contribution to the IPCC fifth assessment report (AR5), New York
Summary for Policymakers (2014) In: Edenhofer O, Pichs-Madruga R, Sokona Y, Farahani E, Kadner S, Seyboth K, Adler A, Baum I, Brunner S, Eickemeier P, Kriemann B, Savolainen J, Schlömer S, von Stechow C, Zwickel T, Minx JC (eds) Climate change 2014: mitigation of climate change. Contribution of Working Group III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge/New York
van Ginneken W (2003) Extending social security: policies for developing countries. Int Labour Rev 142:277–294. doi:10.1111/j.1564-913X.2003.tb00263.x

Climate Extremes

Climate Extremes and Informing Adaptation

Climate Extremes and Informing Adaptation

Hayden Henderson1,2, Laura Blumenfeld1, Allison Traylor1,3, Udit Bhatia1, Devashish Kumar1, Evan Kodra1,4, and Auroop R. Ganguly1
1 Sustainability and Data Sciences Laboratory (SDS Lab), Department of Civil and Environmental Engineering, Northeastern University, Boston, MA, USA
2 Department of Mechanical and Industrial Engineering, Northeastern University, Boston, MA, USA
3 Department of Political Science, Northeastern University, Boston, MA, USA
4 risQ Incorporated, Cambridge, MA, USA

Synonyms

Climate adaptation; Climate change; Climate impacts; Climate resilience; Climate risks; Climate variability; Disaster risks; Floods and droughts; Weather extremes

Definitions

Climate Extremes
Climate extremes may be defined inclusively as severe hydrological or weather events, as well as significant regional changes in hydrometeorology, which are caused or exacerbated by climate change and which may in turn cause
anticipatory planning. Weather and hydrologic hazards may be caused or exacerbated by natural climate variability and climate change. However, the hazards turn into disasters, and indeed catastrophic events, when infrastructures and lifelines are vulnerable and when exposure to hazards is high.
For example, in 2005 during Hurricane Katrina, the eye of the hurricane passed east of the city of New Orleans without causing catastrophic damage to buildings and structures. However, flood walls and levees designed to protect the city from floods were breached at more than 50 locations, leaving approximately 80 % of New Orleans flooded. Hence, how much of the human population is affected by changes in extreme weather also depends on the level of adaptability and preparedness, in addition to exposure and vulnerability.
The major constraints in translating climate extreme science to adaptation-relevant insights are the uncertainties in our understanding and in projections at (local to regional) scales and (decadal) planning horizons relevant to stakeholders. At regional and decadal scales process understanding and model projections are less accurate, while at decadal scales the uncertainties are dominated by natural variability and hence difficult to translate to risk-based design principles. While there is strong evidence of human influence in the warming of the atmosphere and the ocean and in changes in the global water cycle and in climatic extremes (Qin et al. 2013), the low confidence in the presence of trends in certain extreme events, such as intensification of hurricanes and droughts, and in their subsequent attribution to human activities makes adaptation and planning for these extreme events a daunting task (Table 1).

Scientific Methods

IPCC's fifth assessment calls for more attention to how adaptation is implemented in response to climate risks, with special focus on the role of extremes in the adaptation process (Change IP on C 2014). However, future climate simulations display large uncertainty in mean changes. As a result, the uncertainty in future changes of extreme events, especially at the local and larger scale, is great. The uncertainty created by a changing climate and dynamic development trajectories poses challenges for decision-making. This section outlines methods that can be used to quantify, characterize, and attribute extremes to inform adaptation and policies. In the context of climate, while there are different types of extremes, heat waves and cold snaps are the most difficult to quantify, and hence we focus on the methods related to these. Methods to quantify extremes are classified into three broad categories: (a) impact relevant metrics, (b) methods to quantify trends in time and space, and (c) extreme attribution.

(a) Impact relevant metrics
Impact relevant metrics include heat waves (defined as a prolonged period of excessively hot weather; while definitions vary, a heat wave is measured relative to the usual weather in the area and relative to normal temperatures for the season) and cold spells (defined as a rapid fall in temperature within a 24-h period requiring substantially increased protection to agriculture, industry, commerce, and social activities; the precise criterion for a cold wave is determined by the rate at which the temperature falls and the minimum to which it falls, and this minimum temperature is dependent on the geographical region and time of year).

(b) Methods to quantify trends in time and space
A few examples of these methods include (but are not limited to) generalized extreme value (GEV) theory, trend analysis, and covariates in extremes.
Climate Extremes and Informing Adaptation, Table 1 Summary of global-scale assessment of recent observed changes and human contribution to the extremes, both in terms of detection of change and attribution to humans for the changes. Note the increase in uncertainty as we move down the table (Source: IPCC AR5 (Field 2012), Working Group I (WGI) Summary for Policy Makers, Table SPM)

Phenomenon and direction of trend | Assessment that changes occurred (typically since 1950 unless otherwise indicated) | Assessment of a human contribution to observed changes
Warmer and/or fewer cold days and nights over most land areas | Very likely | Very likely
 | Very likely | Likely
 | Very likely | Likely
Warmer and/or more frequent hot days and nights over most land areas | Very likely | Very likely
 | Very likely | Likely
 | Very likely | Likely (nights only)
Warm spells/heat waves: frequency and/or duration increases over most land areas | Medium confidence on a global scale; likely in large parts of Europe, Asia and Australia | Likely (a)
 | Medium confidence in many (but not all) regions | Not formally assessed
 | Likely | More likely than not
Heavy precipitation events: increase in the frequency, intensity, and/or amount of heavy precipitation | Likely more land areas with increases than decreases (c) | Medium confidence
Generalized Extreme Value

Generalized extreme value (GEV) theory is a family of continuous distributions that combines the type I (Gumbel), type II (Fréchet), and type III (Weibull) extreme value distributions. The GEV is the only possible limit distribution of properly normalized maxima of a sequence of independent and identically distributed random variables. The GEV has cumulative distribution function:

$F(x;\mu,\sigma,\xi) = \exp\left\{-\left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}$   (1)

It is a three-parameter distribution where $\mu$, $\sigma$, and $\xi$ represent the location parameter, the scale parameter, and the shape parameter, respectively. In statistics, the location parameter determines the shift of the distribution, the scale parameter quantifies the spread (or variability) of the distribution, and the shape parameter controls the symmetry of the distribution (Coles 2001). In the context of climate, Fig. 1 (top row) shows how changes in the location parameter would impact the distribution of extremes, and similarly the middle and bottom rows show the corresponding changes in extremes (or tails) when the scale and shape factors are changed (Kodra and Ganguly 2014).
To model series of extremes, a series of independent observations $X_1, X_2, \ldots, X_n$ is considered for some large value of $n$. Data is blocked into such sequences, and the GEV is fitted to the resulting series of block maxima $M_{n,1}, M_{n,2}, \ldots, M_{n,m}$. For example, if $n$ corresponds to the number of observations in each year and $m$ years are considered, the block maxima correspond to annual maxima.
Estimates of extreme quantiles of the annual maximum distribution are then obtained by inverting (1):

$x_p = \mu - \frac{\sigma}{\xi}\left[1 - y_p^{-\xi}\right], \qquad y_p = -\log(1-p)$   (2)

where $x_p$ is the return level associated with the return period $1/p$. In other words, $x_p$ is exceeded by the annual maximum in a given year with probability $p$.

Analysis of Trends
The detection, estimation, and prediction of trends, and of their associated statistical significance, are important aspects of the analysis of climate extremes. For example, given a time series of temperature, the trend is the rate at which temperature changes over a given period of time, which may be linear or nonlinear. To test for the presence of trends, simple linear regression is most commonly used to estimate the slope, in combination with significance tests such as the parametric Student's t-test or the nonparametric Mann-Kendall test (to test both linear and nonlinear significance), with the underlying null hypothesis that no trend is present.
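The block maxima workflow described above (fit a GEV to annual maxima, then invert the fitted distribution for a return level) and the simple trend tests can be illustrated with a short sketch. This is a minimal illustration, not code from the original entry: the synthetic daily record, the 100-year return period, and the use of NumPy/SciPy are assumptions made for the example.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic "daily temperature" record: 60 years x 365 days with a weak
# warming trend (illustrative data only).
years = 60
daily = 15 + 10 * rng.standard_normal((years, 365)) + 0.02 * np.arange(years)[:, None]

# Block maxima: one annual maximum per year (blocks of size n in the text).
annual_max = daily.max(axis=1)

# Fit the GEV.  SciPy's genextreme uses shape c = -xi relative to the
# parameterization in Eq. (1).
c, loc, scale = stats.genextreme.fit(annual_max)

# Return level for return period T = 100 years: the quantile exceeded with
# probability p = 1/T in any one year, i.e. the (1 - p) quantile of Eq. (1).
T = 100.0
p = 1.0 / T
return_level = stats.genextreme.ppf(1.0 - p, c, loc=loc, scale=scale)
print(f"xi = {-c:.3f}, mu = {loc:.2f}, sigma = {scale:.2f}")
print(f"{T:.0f}-year return level: {return_level:.2f}")

# Simple trend tests on the annual maxima: a least-squares slope with the
# parametric t-test, and a nonparametric Mann-Kendall-type test via
# Kendall's tau.
t = np.arange(years)
slope, intercept, r, p_lin, stderr = stats.linregress(t, annual_max)
tau, p_mk = stats.kendalltau(t, annual_max)
print(f"linear slope = {slope:.3f} per year (p = {p_lin:.3f})")
print(f"Kendall tau = {tau:.3f} (p = {p_mk:.3f})")

In practice the same fit would be applied to observed or downscaled series, and a nonstationary variant would let the location or scale parameter depend on covariates such as time.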
Climate Extremes and Informing Adaptation, Fig. 1 The IPCC SREX discussed the potential consequences of global warming on (temperature, in this case) extremes through three representational images, assuming a Gaussian (normal, bell-shaped, symmetrical) distribution. The first image (top) depicts a shift in the mean without any other change in the temperature distribution, leading to more hot extremes but fewer cold extremes. The middle and bottom images depict increased variability and changed symmetry of the distribution, respectively, each contrasting the probability of occurrence of cold and hot weather without and with climate change

(c) Extreme attribution
Weather and climate extremes occur all the time, with or without climate change. However, as shown in Table 1, there is a justifiable and strong sense that some of these extremes are evolving and becoming more frequent, and that the primary reason can be attributed to human-induced changes in climate. However, given the small signal-to-noise ratio and the uncertain nature of forced changes, attributing changes solely to human-induced changes or to natural variability can be misleading (Trenberth et al. 2015). Extreme attribution studies aim to determine to what extent human-induced climate change has altered the probability or magnitude of particular events with significant confidence levels (Stott et al. 2016). This section discusses some of the methods used for extreme attribution.

Fractional Attributable Risk
If A is the probability of a climatic event occurring in the presence of human-induced forcing, and B is the probability of it occurring if the same forcing had not been present, then the fraction of the current risk that is attributable to past greenhouse gas emissions (fraction of attributable risk, FAR) is given by FAR = 1 - B/A.

Model Approaches
General circulation models (GCMs), which often include biological, chemical, geological, atmospheric, and oceanic processes, provide the most comprehensive simulations of the climate system. Data from model experiments with different climate forcing combinations are available from the World Climate Research Programme's Coupled Model Intercomparison Project Phase 5 (CMIP5) (Taylor et al. 2012). This typically involves pooling data from multimodel ensembles of simulations with and without anthropogenic influences to generate large samples of the relevant variables such as temperature, precipitation, and humidity. The distributions of variables in the world with human influences and in the world without these influences can thus be constructed, from which estimates of FAR can be obtained (Stott et al. 2016).
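As a concrete illustration of the fraction of attributable risk computed from pooled ensembles such as those described above, the sketch below counts threshold exceedances in a hypothetical "with human influences" sample and a hypothetical "without human influences" sample and applies FAR = 1 - B/A. The ensemble values, event threshold, and sample sizes are invented for illustration; this is not CMIP5 output.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled ensembles of a summer-mean temperature index (deg C):
# "forced" stands in for simulations with anthropogenic forcing, "natural"
# for counterfactual simulations without it.
forced = rng.normal(loc=27.0, scale=1.5, size=5000)
natural = rng.normal(loc=26.0, scale=1.5, size=5000)

threshold = 29.0  # event definition: seasonal mean exceeding 29 deg C

# A: probability of the event in the world with human-induced forcing.
# B: probability of the same event had that forcing been absent.
A = np.mean(forced > threshold)
B = np.mean(natural > threshold)

far = 1.0 - B / A  # fraction of attributable risk
print(f"P(event | forced)  A = {A:.3f}")
print(f"P(event | natural) B = {B:.3f}")
print(f"FAR = {far:.2f}")

With real ensembles, A and B would be estimated from bias-corrected model output and reported together with confidence intervals.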
Climate Extremes and Informing Adaptation, Fig. 2 The 2012 IPCC SREX depicts the connection between climate-related hazards and vulnerability or exposure. While hazards have traditionally been considered acts of God, disasters are caused by the very human failure to be prepared. In the current era of greenhouse gas-driven climate change, even hazards are not immune to human influences (Figure source: IPCC SREX (1.1.2, figure 1-1), Lavell et al. 2012)
Informing Adaptation: Risk Management

Adaptation to climate extremes and preparedness to disaster seek to reduce factors and modify environmental and human contexts that contribute to climate-related risk, to promote sustainability in social and economic development (Lavell et al. 2012). The promotion of adequate preparedness for disaster is also a function of disaster risk management and adaptation to climate change.
One of the many ways in which climate change is likely to affect societies and ecosystems around the world is through extremes and changes in extreme events (Fig. 2). As a result, regularly updated appraisals of evolving climate conditions and extreme weather would be immensely beneficial for adaptation planning. In fact, in a conventional risk framework, one of the components is the probability of occurrence of a hazard. Risk analysis methods identify the vulnerabilities of specific components of a system to an adverse event and quantify the loss of functionality of the system as a consequence of that event. In the context of climate and weather extremes, a hazard (H) can be visualized as an outcome of an extreme event. For example, in the context of planning for transportation systems, H may represent the severe snowstorm or hurricane that can potentially deviate the system from its normal functionality. In addition to the hazard (H) and its probability of occurrence p(H), risk to the system also depends on the likelihood of vulnerability, p(V), and the chance of the system being exposed to the hazard, p(E). Mathematically, risk can be quantified as:

Risk = p(H) × p(E) × p(V)   (3)

Risk in a system is interpreted as the total reduction in functionality and is related to the temporal effect of an extreme event on the system (Linkov et al. 2014).

Resilience Framework

As discussed in the previous section, adaptation to climate extremes seeks to reduce factors
stakeholders and policy makers is the inability to produce credible assessments of local to regional climate extremes. Results from the latest generation of global climate model runs do not suggest the possibility of significant improvements in the near future, while regional climate models remain promising. However, ultrahigh-resolution models and physical understanding continue to improve process models. On the other hand, climate-related data, from archived model simulations and remote or in situ sensors, have already moved into the petabyte scale and are projected to reach 350 PB by 2030. Thus, data-driven hypothesis examination and hypothesis generation need to leverage methods for handling massive and complex data. Geographical information science, comprising both geospatial process models and data science developments, can help address these challenges.

Role of Big Data in Extreme Event Mining

The generalized extreme value distribution is the only possible limit distribution of properly normalized maxima of a sequence of independent and identically distributed random variables. However, climate extreme events that can be correlated in space and time may deviate from the assumption of proper normalization. Hence, statistical approaches have not been well developed for a majority of climate extremes. Nonlinear dynamical approaches are better at characterizing the climate system than at generating projections and, even so, are not well developed for predictability assessment in climate. Traditional spatial and spatiotemporal data mining in computer science, while well suited to certain kinds of geographic data, cannot handle the complex dependence structures, low-frequency variability, and nonlinear data generation processes relevant for predicting climate extremes. The barriers are particularly challenging given the so-called deep uncertainties in climate arising from both natural variability in the climate system, such as from oceanic oscillators, combined with our lack of understanding of the relevant processes. The so-called Big Data methods can succeed in the context of climate extremes if, in addition to handling massive data volumes, nonlinear data generation processes, complex proximity-based as well as long-memory and long-range dependence in time and space, and extreme events or change can be directly addressed.

Cross-References

Climate Adaptation, Introduction
Informing Climate Adaptation with Earth System Models and Big Data

References

Aerts JCJH, Botzen WJW, Emanuel K, Lin N, de Moel H, Michel-Kerjan EO (2014) Evaluating flood resilience strategies for coastal megacities. Science 344:473–475. doi:10.1126/science.1248222
Bhatia U, Kumar D, Kodra E, Ganguly AR (2015) Network science based quantification of resilience demonstrated on the Indian Railways network. PLoS ONE 10:e0141890. doi:10.1371/journal.pone.0141890
Change IP on C (2014) Climate change 2014 impacts, adaptation and vulnerability: regional aspects. Cambridge University Press, New York
Coles S (2001) An introduction to statistical modeling of extreme values. Springer, London
Disaster Resilience: A National Imperative (n.d.) http://www.nap.edu/openbook.php?record_id=13457. Accessed 1 July 2015
Field CB (2012) Managing the risks of extreme events and disasters to advance climate change adaptation: special report of the intergovernmental panel on climate change. Cambridge University Press, New York
Ganguly AR, Kodra EA, Agrawal A, Banerjee A, Boriah S, Chatterjee S et al (2014) Toward enhanced understanding and projections of climate extremes using physics-guided data mining techniques. Nonlinear Process Geophys 21:777–795. doi:10.5194/npg-21-777-2014
Gao J, Barzel B, Barabási A-L (2016) Universal resilience patterns in complex networks. Nature 530:307–312. doi:10.1038/nature16948
Kao S-C, Ganguly AR (2011) Intensity, duration, and frequency of precipitation extremes under 21st-century warming scenarios. J Geophys Res Atmos 116:D16119. doi:10.1029/2010JD015529
Karl TR, Melillo JT, Peterson TC (2009) Global climate change impacts in the United States. Cambridge University Press, New York
be maintained while allowing for a systematic failure of components. The methods have been successfully applied to earthquake engineering. However, the challenges and the opportunities become significantly different when concepts from engineering design need to be generalized to embedding resilience in critical infrastructures, especially in the context of adapting to threats resulting from climate change. The current state of practice and research considers three related issues, specifically, the nature of the climate and related stressors, the definition of the stressed systems under consideration, as well as the evolving concept of resilience. Resilience in this context goes beyond robustness to the immediate effects of a hazard as well as the ability to gracefully recover from the aftermath in a timely, cost-effective, and efficient manner. In other words, resilience is defined as the ability of the entire system to maintain essential functionality despite acute or chronic stressors and, in the event of failure or loss of functions, to get back to normalcy quickly and easily. The stressed systems of primary concern are what have been called critical infrastructures and lifeline infrastructure networks. The United States Department of Homeland Security defines 16 critical infrastructure sectors, specifically, chemical, commercial facilities, communications, critical manufacturing, dams, defense industrial base, emergency services, energy, financial services, food and agriculture, government facilities, healthcare and public health, information technology, nuclear reactors and materials and waste, transportation, and water and wastewater. The National Infrastructure Advisory Council lists four critical lifeline infrastructure networks: transportation, electricity and power, communications, and water and wastewater. Developing resilience across these lifelines and sectors requires an understanding of the cascading interdependencies across infrastructure elements and networks, the ability to design systems for effective response and recovery, the ability to design for greater resilience, the availability of appropriate metrics and financial instruments or economic incentives, as well as the ability to effectively govern across organizational and jurisdictional barriers. Adaptation to climate change and climate-related weather or hydrologic extremes, especially over the lifetime of infrastructure sectors and lifelines, requires an understanding of both the nonstationary nature of climate stressors and the deep uncertainties. The earth's climate system is fundamentally changing in ways such that the past is no longer an effective guide to the future in terms of design parameters. Uncertainties resulting from both our lack of understanding and the intrinsic variability of the climate system cannot be assigned likelihoods. The situation calls for flexible design principles, which remain risk informed and resilience centric. Case studies discuss urban heat islands, sea level rise and land subsidence, hurricanes and storm surge in coastal megacities, and severe droughts with consequences for the nexus of food-energy-water.

Probabilistic Risk Assessments and Climate Hazards

The Special Report on Extremes (IPCC 2012) as well as the Intergovernmental Panel on Climate Change's Fifth Assessment Report (AR5) (IPCC 2014a, b), published in 2013–2014, depict how climate extremes may turn into disasters depending on vulnerability and exposure. The framework relies on risk computations, where three aspects are considered: hazards, or the probability of threats; vulnerability, or the probability of damage conditional on hazards; and consequences, or economic damages and/or losses of human lives. Climate hazards may be broadly defined to include either extreme weather or hydrological events or changes in regional hydrometeorology, which may be caused or exacerbated by climate variability or change and which could stress all or parts of the coupled natural-engineered-built systems (Fig. 1). Recent climate hazards in the United States include hurricanes Katrina in New Orleans in 2005 and Sandy in New York/New Jersey in 2012, floods in Iowa in 2013, the 2010 (ongoing) droughts in California, the 2014 cold snaps in the Northeast,
Climate Hazards and Critical Infrastructures Resilience, Fig. 1 Schematic representation of probabilistic risk assessment (PRA) methods in the context of climate-related hazards. PRA methods such as this can be used to identify the vulnerabilities of both natural and built environments to an expected climate-related hazard and quantify the losses as a result of consequences of these events
and 2012 summer heat waves across the United States. Hurricane Katrina was a Category 5 over the Gulf of Mexico but had weakened to a Category 3 by the time it made landfall on the Gulf Coast. However, the natural phenomenon, the hurricane hazard itself in this case, was not the sole reason why Hurricane Katrina was the costliest natural disaster in US history. In fact, post-landfall news for a while appeared to suggest that the storm was moving northward over land, and no major destruction was reported. However, it was then that the levee, which was known to be highly vulnerable to start with, broke from the weight of the water. The resulting floodwater devastated New Orleans, where to start with the human settlement patterns were susceptible. This is where the hazard (Hurricane Katrina) interacted with the vulnerability of a critical infrastructure (the levee) as well as with exposure (e.g., human settlements in this case) to result in levels of losses that were historically unprecedented and thus far unsurpassed within the United States. Probabilistic risk assessments (PRA) thus remain important to extract a comprehensive characterization of climate hazards, understand how and when the hazards may turn into catastrophic disasters, and perhaps even be used to examine the impacts of strategic policy and tactical interventions. Figure 1 shows a comprehensive depiction of PRA and PRA-inspired methods, which have been or could be used in the context of climate hazards. In the context of climate-change impacts, risk is often represented as the probability of occurrence of hazards, including but not limited to extremes such as heat waves, droughts, floods, and cold snaps, multiplied by the impacts these events may cause on natural and human systems. Climate observations from in situ and remote sensors such as satellites, reanalysis data (Kalnay et al. 1996), and data from general circulation models (GCMs) are assimilated together with greenhouse emission scenarios, multi-model ensembles, and multiple initial conditions of GCMs (see next sections) to project the changes and variability in climate and climate-related extremes. However, general circulation models are run at a coarse spatial resolution, typically of the order of 100 km, and are unable to resolve information at the local to regional scales relevant to policymakers and stakeholders. As a result, GCM output cannot be directly used for impact assessment at regional or local scales. To overcome this problem, downscaling is often
used to obtain local-scale climate projections at finer resolution from atmospheric variables provided by GCMs (Ghosh and Mujumdar 2008). Interaction of the climate-related hazards with the exposure and vulnerabilities of critical infrastructures and population puts these systems at risk, resulting in economic losses and/or the loss of human lives. However, quantification of climate-related hazards and the related risk is associated with uncertainties arising out of natural variability, anthropogenic climate change, or a combination of both. Hence, uncertainty quantification and characterization forms a crucial part of PRA-inspired methods before they can be deployed to motivate strategic policy changes and resilient design practices.

Resilience Paradigm: Beyond Probabilistic Risks

While critical infrastructure systems and lifelines were built as isolated entities, in actuality they are functionally interdependent. Disasters ranging from hurricanes to large-scale power outages have shown how failure in one system may trigger a cascade of failures in interdependent infrastructure systems. Although investigation of resilience in infrastructure systems has triggered enormous interest, most of the research endeavors have focused on isolated systems. However, critical infrastructures including lifelines exhibit a large number of interdependencies. These interdependencies could be cyber or cyber-physical (Buldyrev et al. 2010), geographical (Solé et al. 2008), or political, and so on. Traditional risk analysis methods focus on identification of vulnerabilities of specific system components. Subsequent risk management frameworks, hence, focus on strengthening these specific components to prevent overall system failure (Linkov et al. 2014). However, the factors which make traditional risk assessment tools unviable are as follows: (1) the complexity and interconnectedness of infrastructure networks including lifelines and (2) the nonstationarity and deep uncertainty associated with climate hazards. However, the development of resilience at the system level faces the following challenges: the lack of consensus over defining and quantifying resilience, lack of preparedness for foreseeable and unforeseeable risks under a changing climate, the absence of an incentive structure for public and private infrastructure owners to create resilience, and organizational barriers to creating resilience. Figure 2 sums up the barriers and plausible solutions to overcome these in order to translate resilience from
Climate Hazards and Critical Infrastructures Resilience, Fig. 2 Deficiencies in critical infrastructure resilience arise from four broad challenges. The four pillars outline the elements of the solution to overcome these challenges to embed resilience in the functioning of critical infrastructure and lifeline systems. Visualizing and understanding resilience is an obligatory part of the framework to enforce resilient engineering and policy practices, which, in turn, requires exhaustive understanding of interdependencies of various infrastructure systems
Climate Hazards and Critical Infrastructures Resilience, Fig. 3 Resilience management framework adapted from the commentary in Nature by Linkov et al. While probabilistic risk assessment-enabled methods give the probability of the system hitting the lowest point of its essential functionality and thus help the system prepare and plan for adverse events, resilience management goes beyond and integrates the capacity of a system to absorb and recover from adverse events, and then adapt. The dashed line suggests that the state of the system after recovery may be better or worse with respect to the initial performance, depending upon the system resilience
a mere buzzword to an operational paradigm for system management (Linkov et al. 2014). As highlighted in the correspondence piece (Fisher 2015), resilience has been defined in more than 70 ways in the literature. While the National Academy of Sciences (Disaster Resilience 2015) defines resilience as the ability to prepare and plan for, absorb, recover from, and more successfully adapt to adverse events, many scientists have just focused on the recovery part (Fig. 3a) to define resilience as the system's ability to bounce back after stress. Long-term policies based on the two extreme ends of these definitions are likely to be very different and would be associated with different costs, depending on the definitions and metrics we adopt to measure resilience. At the regional scale, the structure and function of infrastructure systems, particularly in the lifeline sectors, are appropriately represented using network models and network science tools (Solé et al. 2008; Albert et al. 2000; Sen et al. 2003; Guimerà et al. 2005). A key issue for assessing and improving the resilience of infrastructure systems is to understand the behavior of the lifeline sectors during normal operating conditions, as well as in the presence of both nondeliberate hazards (e.g., natural hazards, human accidents, technology failures) and deliberate threats (e.g., terrorism, sabotage). Over the last decade, there have been considerable advances in the understanding of cascading interdependencies of the lifeline networks (Buldyrev et al. 2010; Ko et al. 2013; Hernandez-Fajardo and Dueñas-Osorio 2013). However, they have had relatively little impact on the design of resilient interconnected infrastructures to mitigate the risk of cascading failures, because the applicability of these frameworks to real-life networked infrastructures is not a trivial task: the oversimplified assumptions on which these models are based may not be valid for the inextricably interdependent systems (Vespignani 2010).

Climate Hazards: Variability and Deep Uncertainty

As discussed in previous sections, both PRAs and the resilience management framework include risk analysis as a central component. However, climate change might produce extreme events that cannot be predicted precisely, particularly at the spatial resolutions and time horizons relevant to the infrastructure owners and managers. Time horizons to be considered for emergency management (Aerts et al. 2014)
Climate Hazards and Critical Infrastructures Resilience, Fig. 4 Flowchart showing events resulting in the 2012 blackouts and the resulting consequences on other interdependent lifeline services, including water distribution and wastewater distribution networks, transportation networks, and healthcare services
locally and nationally. This study addresses air traffic delays, diversions, and flight cancellations caused by extreme winter weather events, and system recovery. An airport that is better prepared to respond to weather hazards operates more efficiently for passengers and airlines and can avoid significant negative impact to the NAS as a whole.
As of August 21, 2014, there were 19,453 airports in the United States (IPCC 2014a). Five of the busiest are located in the Eastern Service Area (ESA) of the National Airspace System (NAS): Atlanta, New York's JFK, Boston, Philadelphia, and Washington DC. Numerous studies (Jarrah et al. 1993; Abdelghany et al. 2004) have shown that convective weather in and around airports is a major cause of flight delays and a significant causal factor in aircraft accidents. In 2012, FlightStats.com (FlightStats 2015) issued a report stating that from October 27 to November 1, in North America alone, 20,254 flights were canceled due to Hurricane Sandy. Roughly 9,978 flights were canceled at New York area airports alone. United stands as the airline with the most cancellations due to Sandy (2,149), followed by JetBlue (1,469), US Airways (1,454), Southwest (1,436), Delta (1,293), and American (759). In an examination of weather events over the past 7 years, Sandy comes in second in terms of total number of cancelled flights, behind the North American Blizzard of February 2010 (22,441 flights), to which the Blizzard of January 2015, designated Juno, is compared in this report. Airport system capacity directly relates to NAS capacity, and Juno adversely affected airports and air traffic in the system.
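Following up on the network-science view of lifeline resilience discussed earlier in this entry, the sketch below represents airports as nodes and routes as edges and tracks the fraction of airports that remain mutually reachable when storm-closed hubs are removed. The airports, routes, closure set, and the connectivity fraction used as a functionality proxy are hypothetical choices made for illustration; this is not the methodology of the studies cited above.

import networkx as nx

# Hypothetical mini air-traffic network: nodes are airports, edges are routes.
routes = [
    ("ATL", "JFK"), ("ATL", "BOS"), ("ATL", "PHL"), ("ATL", "DCA"),
    ("JFK", "BOS"), ("JFK", "PHL"), ("BOS", "PHL"), ("PHL", "DCA"),
    ("DCA", "ORD"), ("ORD", "DEN"), ("DEN", "SEA"), ("ORD", "MSP"),
]
G = nx.Graph(routes)

def connectivity_fraction(graph, closed):
    """Fraction of all airports that remain in the largest connected
    component after the 'closed' airports are removed (a simple
    functionality proxy)."""
    H = graph.copy()
    H.remove_nodes_from(closed)
    if H.number_of_nodes() == 0:
        return 0.0
    giant = max(nx.connected_components(H), key=len)
    return len(giant) / graph.number_of_nodes()

# Baseline vs. a hypothetical winter-storm scenario closing East Coast hubs.
print("baseline:", connectivity_fraction(G, set()))
print("storm closes JFK, BOS, PHL:", connectivity_fraction(G, {"JFK", "BOS", "PHL"}))

Richer resilience metrics would weight nodes by traffic volume and track recovery over time, in the spirit of the resilience management framework described above.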
Climate Hazards and Critical Infrastructures Resilience, Fig. 5 Massachusetts Bay Transit System: light rail routes (Green, Orange, Blue, Red lines) and bus route to the international airport (Adapted from Massachusetts Bay Transportation Authority, Boston)
Case Study 2: 2012 India Blackouts grid for running water pumps to irrigate the paddy
On July 30 31, 2012, two severe blackouts hit elds in Kharif season.
northern and eastern India, which impacted over On July 30, circuit breakers on a 400 kV line
620 million people, across 22 out of 29 states between cities of Bina and Gwalior got tripped.
of the nation. Given the population size affected, As this line fed into another transmission section
this has been recorded as the largest power outage (Agra-Bareilly), circuit breakers at that section
in the history. Figure 1 shows how both non- also tripped. As a result of this sequential trip-
intentional manmade and natural events resulted ping, power failure cascaded through the grid.
in the collapse of the power grid. In the sum- The system failed again on the afternoon of July
mer of 2012, extreme heat caused record power 31 due to relay problem. As a result, power
consumption in northern India. The situation was stations across the affected parts went of ine,
further exacerbated by delayed monsoons, which resulting in the shortage of 32 GW of power.
resulted in drawing of increased power from the The failure cascaded through other dependent
infrastructures, hence severely affecting the functioning of lifeline systems including transportation, water distribution and wastewater treatment units, and health care services.

Climate Hazards and Critical Infrastructures Resilience, Fig. 6 Illustrative representation of interdependencies between the power grid and the Indian Railways network. The 2012 power blackout brought more than 300 trains in northern and eastern India to a standstill, leaving people confined in the trains

Several hospitals faced interruptions in providing health services. Water treatment plants in affected regions were shut down for several hours. More than 300 trains, including both long-distance and local trains, were stalled, leaving passengers stuck midway. An illustration of the cascading interdependencies between the power grid and the Indian Railways network is shown in Fig. 3. This case study highlights the imperative need to address and model the complexities of integrated systems in order to embed resilient design practices into large-scale lifeline infrastructure networks (Linkov et al. 2014). Also, the role of geographic information systems is implicit and ubiquitous to model and visualize complex systems such as these, operating at spatial scales ranging from local to regional to global (Fig. 4).

Case Study 3: Blizzard 2015 and the Massachusetts Bay Transit System
In 2015, Boston confronted the snowiest winter ever in the history of recorded climate events. In February alone, four storms had brought Boston record-breaking snowfall of over 100 in. Thousands of citizens' lives were affected. Boston's transportation system underwent an unprecedented test: highways blocked, flights canceled, and train service shut down. A thorough analysis of dwell time and boarding data for the northern stations of the Orange Line (shown in Fig. 5), provided by the MBTA Overhead Contact System center, showed that ridership decreased dramatically by nearly 30 % on the first day after the blizzard and recovered rapidly on the next
one. Meanwhile, the travel time and dwell time among, between, and within stations increased almost 50 %, which means the Orange Line train system lost one-third of its capacity. According to the boarding record and peak hour statistics, the remaining capacity can just meet the highest demand of current ridership. However, with growing population and transit use, the capacity limit might become a bottleneck in the face of extreme weather or an emergency event, let alone worse weather conditions. The subsequent snowstorms during February 2015 proved this hypothesis and resulted in system shutdowns at times (Fig. 6).

Given that the capacity one train can carry is equivalent to almost 15 buses, it is almost impossible to completely replace the train service with shuttle buses. As a result, passengers have to turn away from transit and resort to driving cars as their commuting mode, which brought even more congestion on the highways. The transition from SOV (single-occupancy vehicle) to HOV (high-occupancy vehicle) usage cannot be widely accepted if robust and reliable transit service is not being provided. Given the fragility and unreliability of the current rail service that the northern part of the MBTA Orange Line presented, a more comprehensive evaluation of the whole MBTA transit system, including its capacity, resilience, and future evolution, is recommended.

Cross-References

Internet-Based Spatial Information Retrieval

References

Abdelghany KF, Shah SS, Raina S, Abdelghany AF (2004) A model for projecting flight delays during irregular operation conditions. J Air Transp Manag 10:385-394. doi:10.1016/j.jairtraman.2004.06.008
Aerts JCJH, Botzen WJW, Emanuel K, Lin N, de Moel H, Michel-Kerjan EO (2014) Evaluating flood resilience strategies for coastal megacities. Science 344:473-475. doi:10.1126/science.1248222
Albert R, Jeong H, Barabási A-L (2000) The Internet's Achilles' heel: error and attack tolerance of complex networks. Nature 406:378-382
Buldyrev SV, Parshani R, Paul G, Stanley HE, Havlin S (2010) Catastrophic cascade of failures in interdependent networks. Nature 464:1025-1028. doi:10.1038/nature08932
Disaster Resilience: A National Imperative (2015) [Internet]. [cited 1 Jul 2015]. Available: http://www.nap.edu/openbook.php?record_id=13457
Fisher L (2015) Disaster responses: more than 70 ways to show resilience. Nature 518:35. doi:10.1038/518035a
FlightAware Flight Tracker/Flight Status/Flight Tracking. In: FlightAware [Internet]. [cited 1 Jul 2015]. Available: http://flightaware.com/
FlightStats Global Flight Tracker, Status Tracking and Airport Information [Internet]. [cited 1 Jul 2015]. Available: http://www.flightstats.com/go/Home/home.do
Ganguly AR, Steinhaeuser K, Erickson DJ, Branstetter M, Parish ES, Singh N et al (2009) Higher trends but larger uncertainty and geographic variability in 21st century temperature and heat waves. Proc Natl Acad Sci 106:15555-15559. doi:10.1073/pnas.0904495106
Ghosh S, Mujumdar PP (2008) Statistical downscaling of GCM simulations to streamflow using relevance vector machine. Adv Water Resour 31:132-146. doi:10.1016/j.advwatres.2007.07.005
Ghosh S, Das D, Kao S-C, Ganguly AR (2012) Lack of uniform trends but increasing spatial variability in observed Indian rainfall extremes. Nat Clim Change 2:86-91. doi:10.1038/nclimate1327
Guimerà R, Mossa S, Turtschi A, Amaral LAN (2005) The worldwide air transportation network: anomalous centrality, community structure, and cities' global roles. Proc Natl Acad Sci USA 102:7794-7799. doi:10.1073/pnas.0407994102
Hawkins E, Sutton R (2009) The potential to narrow uncertainty in regional climate predictions. Bull Am Meteorol Soc 90:1095-1107. doi:10.1175/2009BAMS2607.1
Hernandez-Fajardo I, Dueñas-Osorio L (2013) Probabilistic study of cascading failures in complex interdependent lifeline systems. Reliab Eng Syst Saf 111:260-272. doi:10.1016/j.ress.2012.10.012
IPCC (2012) Managing the risks of extreme events and disasters to advance climate change adaptation: special report of the intergovernmental panel on climate change [Internet]. Available: https://www.ipcc.ch/pdf/special-reports/srex/SREX_Full_Report.pdf
IPCC (2014a) Climate change 2014 impacts, adaptation and vulnerability: part A: global and sectoral aspects [Internet]. Cambridge University Press. Available: http://www.cambridge.org/us/academic/subjects/earth-and-environmental-science/climatology-and-climate-change/climate-change-2014-impacts-adaptation-and-vulnerability-part-global-and-sectoral-aspects-working-group-ii-contribution-ipcc-fifth-assessment-report-volume-1?format=PB
IPCC (2014b) Climate change 2014 impacts, adaptation and vulnerability: part B: regional aspects [Internet]. Cambridge University Press. Available:
http://www.cambridge.org/us/academic/subjects/earth-and-environmental-science/climatology-and-climate-change/climate-change-2014-impacts-adaptation-and-vulnerability-part-b-regional-aspects-working-group-ii-contribution-ipcc-fifth-assessment-report-volume-2?format=PB#contentsTabAnchor
Jarrah AIZ, Yu G, Krishnamurthy N, Rakshit A (1993) A decision support framework for airline flight cancellations and delays. Transp Sci 27:266-280. doi:10.1287/trsc.27.3.266
Kalnay E, Kanamitsu M, Kistler R, Collins W, Deaven D, Gandin L et al (1996) The NCEP/NCAR 40-year reanalysis project. Bull Am Meteorol Soc 77:437-471. doi:10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2
Ko Y, Warnier M, Kooij RE, Brazier FMT (2013) An entropy-based metric to quantify the robustness of power grids against cascading failures. Saf Sci 59:126-134. doi:10.1016/j.ssci.2013.05.006
Linkov I, Bridges T, Creutzig F, Decker J, Fox-Lent C, Kröger W et al (2014) Changing the resilience paradigm. Nat Clim Change 4:407-409. doi:10.1038/nclimate2227
Palmer TN, Shutts GJ, Hagedorn R, Doblas-Reyes FJ, Jung T, Leutbecher M (2005) Representing model uncertainty in weather and climate prediction. Annu Rev Earth Planet Sci 33:163-193. doi:10.1146/annurev.earth.33.092203.122552
Salvi K, Ghosh S, Ganguly AR (2015) Credibility of statistical downscaling under nonstationary climate. Clim Dyn 1-33. doi:10.1007/s00382-015-2688-9
Sen P, Dasgupta S, Chatterjee A, Sreeram PA, Mukherjee G, Manna SS (2003) Small-world properties of the Indian railway network. Phys Rev E 67:036106. doi:10.1103/PhysRevE.67.036106
Solé RV, Rosas-Casals M, Corominas-Murtra B, Valverde S (2008) Robustness of the European power grids under intentional attack. Phys Rev E 77:026102. doi:10.1103/PhysRevE.77.026102
Tebaldi C, Knutti R (2007) The use of the multi-model ensemble in probabilistic climate projections. Philos Trans R Soc Lond Math Phys Eng Sci 365:2053-2075. doi:10.1098/rsta.2007.2076
Vespignani A (2010) Complex networks: the fragility of interdependency. Nature 464:984-985. doi:10.1038/464984a

Climate Impacts

Climate Extremes and Informing Adaptation

Climate Resilience

Climate Extremes and Informing Adaptation

Climate Risk Analysis for Financial Institutions

Farid Razzak
Rutgers Business School, Rutgers University, New Brunswick, NJ, USA

Synonyms

Carbon Emissions; Carbon Finance; Carbon Trading; Climate Change; Climate Finance; Climate Trend Analysis; Emissions Trading; GHG; GIS Mobile Remote Sensors; MRV; REDD+; Sequestration; Sustainability Risk

Definition

The climate change phenomenon is widely understood to be magnified by harmful greenhouse gases (GHGs) that are by-products of emissions yielded from advances in human engineering in the energy, technology, transportation, and land development industries. Effectively, the pollution that is being generated from human activities is actively contributing to the imbalance in the planet's climate, therefore creating the scenario where human prosperity may be severely hindered in the near future. Global industrial incentives, regulations, and policies have been formed to mitigate the climate change phenomenon in the form of monetized financial instruments that can help manage the amount of global pollution permitted, financial climate risk disclosures that keep investors informed about climate-related impacts to investments, and environmental sustainability analysis that validates the business continuity of an investment impacted by environmental risks.

The management of future pollution that may contribute to furthering climate change, by financially incentivizing more prudent business practices and climate-friendly organizational strategies, has created opportunities for climate change investment research. Geographical Information Systems that can provide insight into different aspects of global climate change facilitate data-driven financial investment
decisions which can create a dynamic and robust relationship between the financial and scientific aspects of leveraging climate change mitigation. This chapter will explore the historical background of global climate change policies and legislation over recent decades; the financial instruments, markets, and risk disclosures that resulted from these policies; the relevant scientific and investment approaches regarding climate change mitigation; how Geographical Information Systems can serve as a crucial tool in the financial applications of climate change mitigation; and the future prospects of Geographical Information Systems in this domain.

Historical Background

United Nations Climate Mitigation Policies
The environmental impacts of climate change were not clearly understood by the nations of the world in the early 1980s. The United States was the first government to lead an exploratory study of international environmental risks which included a thorough analysis on climate change effects. This study brought significant awareness to the potential impacts of climate change, warranting a more specified scientific study of climate change to illuminate the future risks that nations of the world may have to encounter (Moore 2012).

In 1988, the United Nations World Meteorological Organization (WMO) and the United Nations Environment Program (UNEP) established the Intergovernmental Panel on Climate Change (IPCC) to provide research on the science of climate change, analyze the societal and economical risks due to climate change, and produce strategies to mitigate the impacts that climate change presents for further discussion on the international topic (Moore 2012).

The first assessment report from the IPCC was delivered in 1990 and provided ample evidence to suggest that climate change would be of crucial importance for the near future of environmental risks and policy planning. With subsequent reports from the IPCC echoing similar sentiments and additional evidence supporting the analysis, it was decided at the 1992 United Nations Conference on Environment and Development (UNCED) to formally begin action to create policies for climate change mitigation by commissioning the United Nations Framework Convention on Climate Change (UNFCCC) (Moore 2012; Raufer and Iyer 2012).

The purpose of the UNFCCC was to establish a voluntary commitment from the United States and 153 other nations to reduce harmful greenhouse gas (GHG) emissions to environmentally acceptable levels within the next few decades, to find strategies to reduce the global warming epidemic, and to assess viable options to address inevitable climate change effects on the environment (Moore 2012; Raufer and Iyer 2012). Annual meetings of the parties involved with the UNFCCC have been conducted since the inception of the convention onward, formally referred to as the UNFCCC Conference of the Parties (COP), yielding progressive legislation and policies toward the mitigation of climate change (Moore 2012).

The most significant meetings, considered progressive milestones for climate change mitigation policies, have been the COP of 1997 in Kyoto, Japan, and the COP of 2009 in Copenhagen, Denmark (Moore 2012; Raufer and Iyer 2012; Alexander 2013).

Kyoto Protocol
On December 11, 1997, during an annual UNFCCC COP in Kyoto, Japan, the Kyoto Protocol was adopted and given an effective date of February 16, 2005. The Kyoto Protocol is widely seen as the first significant step toward an internationally standardized GHG emissions reduction plan that seeks to manage harmful emissions and provide a formalized scalability platform to continuously improve on climate change mitigation strategies (Moore 2012; Raufer and Iyer 2012; Baranzini and Carattini 2014).

The Kyoto Protocol facilitated the reduction of emissions by the establishment of binding agreements among 37 industrialized nations and European nations which committed the nations to reduce their GHG emissions output by an average
of about 5-8 % from the year 1990 emissions output over the 5-year span of 2008-2012 (Moore 2012; Raufer and Iyer 2012; Baranzini and Carattini 2014). More importantly, the Kyoto Protocol placed a larger responsibility and burden on the developed countries due to the accepted notion that they were the primary contributors to the current amount of GHG emissions in the atmosphere (Moore 2012).

Enforcement of the Kyoto Protocol was generally conducted through industrial policies and regulations at the federal and local government levels of each respective participatory nation (Moore 2012). However, the Kyoto Protocol also offered market-based financial and economic incentives to promote environment-friendly investments, business practices, and technologies, as well as to meet GHG reduction targets via economic and efficient options. The market-based options that the Kyoto Protocol introduced were GHG Emissions Trading, the Clean Development Mechanism (CDM), and Joint Implementation (JI). Each of the options followed a cap and trade framework in which there was a cap or quota on the allowed amount of commodities (emissions allowed to be produced) that were in the market. The trade aspect refers to the ability and platform to trade the commodities as an instrument with other market participants (Moore 2012; Raufer and Iyer 2012; Baranzini and Carattini 2014; Henríquez 2013; Kossoy and Guigon 2012).

Greenhouse Gas Emissions Trading
As previously mentioned, the Kyoto Protocol received commitments from participatory nations to reduce GHG emission levels over the 5-year span of 2008-2012. Some countries may be able to meet these targets while remaining well under their target emissions levels, but some may require additional allowances of emissions to meet practical industrial and economic demands. To address this potential issue, Article 17 of the Kyoto Protocol allows for different market-based financial instruments that can allow for trade of excess emissions allowances to countries that may exceed emissions targets (Raufer and Iyer 2012; Baranzini and Carattini 2014; Henríquez 2013; Kossoy and Guigon 2012).

The provision in the Kyoto Protocol also allows the trade of other equally important environmental reduction targets, such as removal units (RMUs) based on land use, land use change, and forestry (LULUCF) activities that help mitigate deforestation and directly contribute to the natural mitigation of climate change (Moore 2012). Additionally, the Kyoto Protocol offers the global Clean Development Mechanism, which acts as the authority for GHG emissions offset programs and allows industrialized or developing countries that engage in qualified local projects designed to help reduce GHG emissions or to provide environmental sustainability to earn certified emission reduction (CER) credits. A CER is the equivalent of 1 ton of carbon dioxide (CO2) allowed to be emitted into the atmosphere. CO2 is one of the harmful GHG emissions that contributes to climate change. These earned CER credits can be traded, sold, or purchased on international markets for the benefit of nations seeking to meet or exceed their GHG emissions reduction targets (Moore 2012; Raufer and Iyer 2012). It is also important to note that 2 % of the income proceeds from CDM projects goes toward the Kyoto Protocol Adaptation Fund, which financially backs projects and programs for countries that are impacted most adversely by climate change effects without the ability to mitigate them (Moore 2012). Lastly, the Joint Implementation provision in the Kyoto Protocol under Article 6 allows participating nations to engage in qualified projects that reduce GHG emissions in other countries to earn emissions reduction credits which can be used toward the participatory nations' GHG emissions reduction targets. Joint Implementation allows for mutually beneficial partnerships that help to foster prosperity in developing nations while also keeping a focus on the mitigation of climate change (Moore 2012). It is important to note that all of these market-based mechanisms are heavily dependent on accurate analysis, measurement, and forecasting of GHG emissions to be considered a viable climate change mitigation strategy (Raufer and Iyer 2012; Rosenqvist et al. 2003).
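To make the cap and trade bookkeeping concrete, the following short Python sketch shows how surplus allowances from one party can offset another party's shortfall; the caps, emissions figures, and party names are entirely hypothetical and are not taken from the Kyoto Protocol texts.

```python
# Illustrative sketch of cap-and-trade accounting (hypothetical numbers).
# Each party has an assigned cap (allowed emissions, in Mt CO2-equivalent)
# and a measured emissions figure; surplus allowances can be sold to
# parties that would otherwise exceed their cap.

caps = {"CountryA": 500.0, "CountryB": 300.0}        # assigned amounts
emissions = {"CountryA": 460.0, "CountryB": 330.0}   # measured emissions

def surplus(party):
    """Positive value: unused allowances; negative value: shortfall."""
    return caps[party] - emissions[party]

# CountryB buys allowances from CountryA to cover its shortfall.
traded = min(surplus("CountryA"), -surplus("CountryB"))
caps["CountryA"] -= traded
caps["CountryB"] += traded

for party in caps:
    status = "compliant" if emissions[party] <= caps[party] else "non-compliant"
    print(party, status, "margin:", round(caps[party] - emissions[party], 1))
```

In this toy run the 30 Mt shortfall of the buyer is exactly covered by the seller's surplus; in practice the same arithmetic depends on the accurate monitoring and registry data emphasized in the next paragraph.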
GHG emissions trading relies on the overall calculated emissions quotas for each respective nation to determine the appropriate amount of commoditized GHG emissions to be allowed into the market. For this market-based platform to be successful, accurate monitoring and measurement of actual GHG emissions from each nation is required through regulated carbon registries and authorities (Moore 2012; Rosenqvist et al. 2003).

Once regulated appropriately and accurately, national and regional marketplaces are allowed to be established so long as they follow the Kyoto Protocol's fundamental stipulations. This has allowed for emissions marketplaces such as the then Chicago Climate Exchange (CCX) and European Climate Exchange (ECX), both of which operated as trading platforms similar to those of other financial commodities exchanges, and now the Intercontinental Exchange Futures Europe, which is currently the leading market in emissions trading. All of these followed the European Union's emissions trading scheme (EU ETS) (Raufer and Iyer 2012; Baranzini and Carattini 2014; Rosenqvist et al. 2003; Kossoy and Guigon 2012).

Reducing Emissions from Deforestation and Forest Degradation
At the 11th Conference of the Parties (COP11) of the UNFCCC in the year 2005, the Reducing Emissions from Deforestation and Forest Degradation (REDD) program was established to assist with the reduction of carbon emissions and the preservation of forests (Alexander 2013; Tänzler and Ries 2012). The program was initially developed to support the Clean Development Mechanism (CDM) policies under the Kyoto Protocol to allow developing countries to gain funds for projects focused around conservation, afforestation, and reforestation leading to the reduction of GHG emissions. The IPCC had earlier concluded that the continual degradation of terrestrial and wetland forests has direct impacts on the mitigation of climate change (Alexander 2013; Tänzler and Ries 2012; Plugge et al. 2011; Nzunda and Mahuve 2011; Wertz-Kanounnikoff et al. 2008). At the 13th Conference of the Parties (COP13) of the UNFCCC in the year 2007, the Bali Action Plan was ratified, yielding the REDD+ program (REDDplus). REDD+ included all of the original REDD stipulations but also incorporated a focus on funding projects that created sustainable management of forests and further enhancement of forest carbon stocks in developing countries (Alexander 2013). REDD+ programs are based on the science that terrestrial forests, wetland forests, and biodiversity are capable of natural carbon sequestration, where GHG emissions such as carbon dioxide (CO2) are captured by plant life and where carbon is stored in the soil beneath the plant life (Alexander 2013; Tänzler and Ries 2012; Plugge et al. 2011; Nzunda and Mahuve 2011; Wertz-Kanounnikoff et al. 2008).

Copenhagen Accord
As the Kyoto Protocol's framework approached its expiration date of 2012, a new framework that could extend and/or enhance the Kyoto Protocol's principles for climate change mitigation was direly needed. The December 2009 United Nations Climate Change Conference in Copenhagen, Denmark, addressed the concerns over the expiration of the Kyoto Protocol by developing, negotiating, and ratifying the Copenhagen Accord. The Copenhagen Accord committed 186 nations (including the United States) to reduce GHG emissions levels, engage in clean energy projects, and put focus on adaptation projects due to the impacts of climate change. The Copenhagen Accord also requests a technical analysis, due in 2015, to determine the need for a new potential CO2 atmospheric concentration level to maintain in order to achieve the underlying goals behind climate change mitigation (Moore 2012). The main highlights of the Copenhagen Accord included continued action by countries to manage global temperature increases to under 2 °C, submission of GHG emissions reduction goals by January 2010 from each participatory country, reports from developing countries about climate mitigation actions, and financial funding for environmental conservation projects in developing countries (Moore 2012). The Copenhagen Accord also stipulates that the UNFCCC will continue its
role for financial governance, GHG emissions reporting and monitoring, and scientific climate analysis for the years beyond the expiration of the Kyoto Protocol and will conduct meetings as necessary to achieve appropriate mitigation of climate change (Moore 2012).

Scientific Fundamentals

Given the importance of the climate change phenomenon evidenced by the international climate change mitigation policies mentioned in the previous section, a substantial focus on accurately assessing, measuring, forecasting, and validating the variables of climate change emerges. All the policies and strategies to mitigate climate change fundamentally require measurement and validation methodologies in order to succeed. The fundamental scientific approaches to analyzing climate change and its mitigation provide a key perspective of the future and of how to maneuver accordingly to adapt to the potential impacts from climate change.

Proper scientific analysis can benefit all stakeholders within the climate change mitigation framework by providing relative perspective and data interpretation that can potentially drive strategic decisions. This section will briefly review the popular scientific methods that examine aspects of climate change and its mitigation.

Climate Trend Analysis
Climate can be defined as the weather conditions revealed over an arbitrary period of time, usually characterized through conventional statistical analysis or statistical diagnostics. Trend, relative to climate, can be defined as the gradual differences of certain climate-related variables over some period of time (Shea 2014).

Traditional statistical time series analyses, conducted on temperature changes, rainfall measurements, snow patterns, flooding, and other climate change indicators to detect, estimate, and predict possible emerging climate trends, are significant scientific tools for better understanding climate change (Shea 2014). More advanced statistical techniques can be applied to derive more specific data analysis, such as Taylor diagrams, to graphically compare statistical correlation summaries between individual climate patterns (observed or modeled), and empirical orthogonal function (EOF) and rotated EOF analysis to interpret potential spatial modes or patterns of variability changes over time (Shea 2014).

All the fundamental climate trend analysis techniques are important for statistical analysis and modeling that can help produce climate change projections for the near future. These projections directly impact climate change mitigation policies, adaptation projects, and business decisions of respective stakeholders.

Surface and Air Temperature Analysis for Land and Sea
Measurements of land air temperature and sea surface temperature (SST) are of significant importance to understand climate conditions in respective regions. This is evidenced by the many decades of available measurement data that predate the climate change mitigation conversation. These measurements can provide the data necessary to corroborate findings from climate data models by serving as the ground truth validation source (Hansen et al. 2006). More importantly, the temperature measurements over land and sea can be coordinated in a spatiotemporal plane for pattern analysis, data modeling, and statistical analysis.

Land surface air temperature weather stations are usually stationed in strategic locations throughout a specified region to collect appropriate data and summarize the highest and lowest temperature recorded for a particular day, which is then reported to a central station which may collect the raw data and combine it with data from other regional surface temperature weather stations for further analysis. Appropriate standards are followed in the placement of the temperature sensors which ensure they are impartial to influences that may be in close proximity (Hansen et al. 2006).

Similarly, sea surface temperatures (SST) can be collected by remote stations on ships or buoys
equipped with sensors that take measurements of the water surface and summarize the highs and lows of daily water temperature and levels, which can be later polled and combined at a central station (Reynolds et al. 2007). Climate scientists rely on statistical anomaly analysis of the water temperature and levels to assess potential inclement weather in the form of cyclones, hurricanes, and tropical storms. With the mentioned techniques, climatologists can develop statistical models that can help estimate, detect, and project future weather patterns (Reynolds et al. 2007).

Satellite resolution imaging may give a broader, less granular depiction of the overall temperature ranges worldwide to help focus on particular patterns or regions of interest, but it is unable to produce the amount of detail that surface-level temperature sensors can provide (Kungvalchokechai and Sawada 2013).

The monitoring and analysis of land surface temperature is scientifically linked to the planet's weather and climate patterns, which can be a direct result of increasing atmospheric GHGs. The temperature increases in certain regions can have effects on global glaciers, arctic ice sheets, and vegetation on the planet. Accurately understanding the aspects of the surface temperatures can give scientists a clearer picture about adaptation needs and climate impact projections.

Emissions Analysis
The term greenhouse gas emissions refers directly to the emissions produced from industrial processes, transportation by-products, agricultural by-products, and societal waste products. The gases in question are the following: carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O), perfluorocarbons (PFCs), hydrofluorocarbons (HFCs), and sulfur hexafluoride (SF6), as well as the indirect gases that will not be mentioned here (Raufer and Iyer 2012). As mentioned in the previous section, the success of climate change mitigation policies relies directly on the accurate measurement of past, present, and future GHG emissions that could reach the atmosphere, thereby increasing global temperatures.

GHG emissions control stipulated by climate change mitigation policies requires monitoring sensors that can accurately audit the amount of GHG emissions produced. These policies are driven by the science that each GHG has a direct impact on the climate change of the planet. The greenhouse gases that are emitted to the atmosphere create a barrier which does not allow solar heat received from the sun to escape the planet's atmosphere once it has reached surface level, thereby warming the climate (Myhre et al. 2013).

Scientific methods can derive the atmospheric lifetime, which is the amount of time a gas may stay in the atmosphere; GHG concentrations, which are the estimated values of current GHG emissions measured in the atmosphere; radiative forcing, which is the amount of heat energy the gases absorb and keep in the earth's atmosphere rather than allow it to leave back to space; and global warming potential (GWP), which is a ratio derived from the atmospheric lifetime and radiative forcing over a specified timescale to determine the impact of the gas on global warming relative to carbon dioxide (CO2). These methodologies give climate scientists quantifiable metrics to weigh and assess the different intensities and impacts of each GHG emission within appropriate mathematical models and climate data models (Myhre et al. 2013). An important note is that the emissions that impact climate change include both natural (water vapor) and anthropogenic (pollution or pollutants from human activity) sources, which both need to be accurately quantified and analyzed (Myhre et al. 2013).

Carbon Capture and Sequestration Analysis
The term carbon sequestration refers to the natural or synthetic process of capturing and/or storing carbon dioxide (CO2) emissions, thereby mitigating climate change by reducing the amount of the GHG emission that reaches or remains in the atmosphere. The natural process of achieving a balance of CO2 emissions and climate change comes in the form of forested wetlands, terrestrial
forests, and plant life, all of which have the capability to capture CO2 emissions for consumption and store carbon in the soil in which their roots are deeply entrenched (Freedman 2014; Alexander 2013). The synthetic process captures carbon-based emissions at the point of production from industrial facilities that produce the emissions and transports them deep underneath land or sea, where they may dissolve or be stored indefinitely (Katzer et al. 2007).

Both the natural and synthetic carbon sequestration processes require accurate calculations and depictions of the amount of CO2 being captured and/or stored to determine the effectiveness of the mitigation (Freedman 2014; Alexander 2013; Katzer et al. 2007). To achieve this feat synthetically, scientists need to mathematically calculate the amount of CO2, in units of metric tons, that can be properly captured and stored under the planet's land and sea without causing adverse effects to the environment. The terrestrial or natural approach would require scientists to determine the amount of CO2 emissions that plant life from forested areas can capture and store to achieve a substantial mitigation of climate change (Freedman 2014; Katzer et al. 2007). This is evidenced by the ratification of the REDD+ policy mentioned in the previous section.

Geographical Information Systems
The scientific analysis techniques, data sources, and respective stakeholder interests in climate change mitigation have created a demand for platforms that can dynamically bring together the different aspects that are required to perform effective climate change mitigation analysis. Advances in information technology, accessibility to data sources, and the economics of data storage have allowed for Geographical Information Systems (GIS) platforms to be developed for the robust analysis requirements of climate change mitigation research. GIS serves as a tool for scientific-based climate research by practically combining the many different scientific analysis techniques with appropriate data streams and visualizations to provide data-driven insights for climate change stakeholders.

Geographical Information Systems can be developed and customized to successfully achieve the feature requirements for different climate analysis purposes, but some of the conceptual fundamentals of a GIS system developed to analyze climate change usually revolve around the following abilities.

Mapping
A GIS system for climate change analysis should have the ability to render a data canvas of the geographical region of interest or a global map where data overlays can be produced based on appropriate data streams to represent appropriate depictions of the said data.

Gridding and Regridding
Gridded data can be high-resolution images of a certain geographical region that do not give the total perspective of surrounding regions due to computational or storage limitations. Segments or fragments of a larger overall high-resolution image are provided as part of a sequenced grid of neighboring images that can be examined individually. Due to the nature of the high-resolution image, data overlays, points of interest, and data streams can still be integrated using GIS technologies, but only specific to the gridded image provided (Shea 2014; Reynolds and Smith 1994).

Regridding refers to the interpolation of one grid resolution image to a different grid resolution image, usually that of a sequence that depicts the immediate neighboring resolutions of a specific geographical region. Different methods such as temporal, vertical, or horizontal interpolation are used to combine the resolutions, but most commonly spatial (horizontal) interpolation is utilized (Shea 2014; Reynolds and Smith 1994). Depending on the type of analysis and data, appropriate interpolation techniques are required. To perform quantitative analysis on data points across many gridded resolutions, regridding onto a common grid is required to avoid misleading numerical calculations among the data from different grid images. GIS applications and platforms provide many different interpolation techniques for regridding
which allows for more accurate data analysis (Shea 2014; Reynolds and Smith 1994). This is a crucial tool that can ensure the accuracy of computationally very large amounts of geographical data.

Monitoring and Measurement
GIS applications and systems can be configured to dynamically operate with real-time data streams from third-party data vendors or remote sensors that may provide climate-based or emissions-based information. A platform that can actively receive the data streams from the sensors and spatially visualize and overlay the data on a geographical plane relative to the sensor's logistical location can provide an automated monitoring system to detect potentially interesting climate or emissions patterns, which can be practically interpreted depending on stakeholder interests (Rosenqvist et al. 2003; Reynolds and Smith 1994; Gibbs et al. 2007; Palmer Fry 2011). The monitoring aspect of GIS indicates the ability to process large amounts of data, store the data, and visualize the data in minimal amounts of time to provide insight to the stakeholder. Without this aspect or ability of a GIS platform or system, climate change analysis techniques would not benefit greatly from GIS technologies.

Reporting
The ability to retrieve information and analysis dynamically in an easy-to-interpret format is a key fundamental for a GIS system that may be developed for the purposes of climate analysis. The reporting mechanism allows the user of the system to gather important data and intelligence that could lead to insight-driven decisions. Both monitoring and reporting are crucial aspects of a GIS system designed for climate analysis due to the fact that reporting is based on the data derived from monitoring, and the insights from reporting are the primary output on which analysis will be conducted. Inaccuracies or inconsistencies in reporting may render the GIS system obsolete, but accurate reporting could mean a substantial increase in productivity, efficiency, and progress in conducting relevant analysis and research on climate change (Rosenqvist et al. 2003; Reynolds and Smith 1994; Gibbs et al. 2007; Palmer Fry 2011).

Verification
The ability to monitor and report on different aspects of climate change based on statistical models or projections derived from historical data may not always accurately portray the actual observational data. Scientific analysis requires corroborated ground truth data to validate whether the data models developed from historical data or data from a different region are statistically significant enough to be accurate. Verification is a critical factor in climate change mitigation policies due to the reliance on the ability to correctly determine climate change and emissions levels to properly incentivize global participants to achieve the common goal of slowing atmospheric temperature increases (Moore 2012). To achieve ground truth validations, climate change mitigation policies emphasize the requirement of approved sensors that can accurately verify the integrity of measurements taken at the point of production. This can be interpreted as remote sensors that are capable of measuring the ground truth data required in climate-based analysis scenarios (Rosenqvist et al. 2003; Reynolds and Smith 1994; Gibbs et al. 2007; Palmer Fry 2011). GIS systems need to be scalable and adaptable to incorporate regulatory ground truth data or to provide the appropriate information technology that meets the standards of climate mitigation policies.

Key Applications

Some of the aspects of climate change mitigation policies discussed offer financial instruments, incentives, and platforms for interested investors, impacted industrial stakeholders, and participating nations to explore opportunities and strategies that can directly, indirectly, or residually impede the global temperature increase. Stakeholders who may decide to participate in the incentives offered by climate mitigation policies are aware that proper knowledge and analysis of climate
change aspects that may be related to respective interests may provide a competitive edge for potential investment decisions (Kossoy and Guigon 2012). This section will explore some practical examples of Geographical Information Systems that perform climate science-related analysis and their application to different financial investment research.

Climate Finance
The term climate finance represents the financial mechanisms set in place by climate change mitigation policies, such as the Kyoto Protocol and the Copenhagen Accord, which allow national, regional, and international parties to have access to financing channels specifically for climate change mitigation and adaptation projects and programs (Kossoy and Guigon 2012; Buchner et al. 2011). These projects and programs are developed based on achieving minimal carbon-based emissions footprints and resiliency to climate change through appropriate research and economic development. The term had been originally coined to refer to the obligations that developed countries committed to developing countries under the ratified UNFCCC policies; however, the term is now more synonymous with all financial procedures and flows relating to climate change mitigation and adaptation projects and programs (Kossoy and Guigon 2012; Buchner et al. 2011). Financial funding can be provided from government budgets, domestic budgets, capital markets, and public and/or private sectors mediated through bilateral financial institutions, multilateral financial institutions, and development cooperation agencies, or directly from the UNFCCC itself via the Green Climate Fund, NGOs (nongovernmental organizations), and/or the private sector. Investment decisions and strategies in renewable energy can potentially be considered climate finance if the renewable energy projects and programs qualify under the UNFCCC guidelines (Kossoy and Guigon 2012; Buchner et al. 2011).

The financing projects and programs designed to mitigate or adapt to the effects of climate change, such as the previously discussed Carbon Offset Programs, Clean Development Mechanisms, and Joint Implementation programs, have reached billions of dollars a year on average, which is forecasted to grow into the trillions in the near future (Buchner et al. 2011; Moore 2012). To effectively monitor the funding needs, progress, success, and completion of projects and programs, the intermediaries of the climate finance framework rely on GIS-based tools and analysis to make informed data-driven decisions.

Carbon Finance
The UNFCCC stipulations of pollution and emissions control create a realm in which carbon footprints and greenhouse gases are constrained to limit the potential increase in global climate change (Moore 2012). This constraint creates a commodity out of the amount of carbon-based or GHG emissions permitted for industrial and national interests (Raufer and Iyer 2012). Climate mitigation policy frameworks have promoted investments in projects and programs that reduce the previously mentioned GHG emissions and have provided a platform where the commodity of allowed emissions amounts is monetized into financial instruments that are tradable in a market-based cap and trade framework. The platform where the commoditized emissions allowances are exchanged is typically referred to as the carbon market, while the overall concept of investing and trading these commodities can be represented by the term carbon finance (Moore 2012; Raufer and Iyer 2012; Kossoy and Guigon 2012).

Carbon finance leverages the Kyoto Protocol's Clean Development Mechanism and Joint Implementation framework to help facilitate investments into emissions reduction projects to earn or trade emissions allowances or credits (Kossoy and Guigon 2012; Henríquez 2013). The World Bank facilitates carbon finance through its own carbon finance unit, which purchases carbon credits or GHG emissions reductions generated from projects or programs in developing countries or transitioning economies for its fund contributors that employ its services, usually in the form of governments or companies with an interest in attaining or trading the carbon credits (Lewis 2010). The World Bank can achieve this by providing carbon funds
continuity and prosperity which can be translated into long-term confidence for investors. To achieve such assessments, GIS tools and analysis can be employed to analyze environmental risk factors to business operations, supply chains, and other applicable business assets. Forecasting and simulation models of risk factors are considered, as well as financial burdens that may be experienced by the impacted business (Zu 2013). After each business process and the potential environmental risks that may impact it are examined and analyzed, strategies are developed to minimize the risks (Zu 2013). Integrated technologies utilizing GIS-based analysis and data management tools can be employed to conduct automated monitoring, auditing, and reporting on sustainability models and goals to achieve compliance. The identification of potential risks and issues impacting business interests early on can help the business maneuver its directional strategy to avoid costly regulatory failures or reputational damage (Zu 2013; Schmiedeknecht 2013).

Sustainability Risk Management extends into the financial markets by allowing organizations that satisfy corporate sustainability assessments to be held in Sustainability Indices (Schmiedeknecht 2013). Sustainability indices represent an index of organizations considered to be socially responsible, environmentally friendly, and sustainable in the event of environmental risks. Investment firms that offer Sustainability Indices may market them as safer and resilient to climate change to potential investors who seek investment confidence relative to environmental risks (Schmiedeknecht 2013). Organizations that are able to reach Sustainability Indices may be considered a safer investment option compared to organizations that cannot achieve the same qualifications.

Future Directions

Climate change analysis and Geographical Information Systems share a relationship that will only evolve as the awareness and applications of climate change mitigation become more prevalent. GIS systems and tools are used to provide meaningful insight and intelligence for monitoring, reporting, and verification applications of climate change analysis. GIS technologies and systems may combine climate-related research and analysis data sources on geographical planes that can help perform traditional analytical techniques to yield data models that can be used to make strategic decisions. The continued emergence of, and demand for, GIS and geospatial analysis skills in the climate science and investment research markets can be expected to grow as the relevance of climate-related applications, such as climate finance, carbon finance, and sustainability management, increases.

Important trends in climate change research, GIS, and financial applications are briefly discussed in the following sections.

Mobile GIS Remote Sensor Networks
Mobile GIS remote sensor networks are considered an important topic in both climate research and GIS. Optimally and efficiently designed monitoring, reporting, and verification GIS systems that can potentially be incorporated into REDD+ projects and programs or other climate finance-funded projects are crucial to attain accurate data at the highest integrity standards (Samek et al. 2013; Rosenqvist et al. 2003; Patenaude et al. 2004). Much research is being conducted to optimize and propose equipment and techniques to achieve an economically and practically feasible approach to GIS remote sensor networks that can potentially become a standardized method to collect and validate data such as emissions, carbon storage, carbon sequestration rates, air temperatures, and other climate change-related datapoints.

Data-Driven GIS Decision-Making Tools
GIS systems that can properly collect data from multiple data sources and conduct application-specific analysis on the said data with potential business logic are an area that climate change stakeholders are seeking to expand (Sizo et al. 2014; Benz et al. 2004; Ganguly et al. 2005). Climate-based financial and regulatory applications to automate business-intelligent GIS
systems that can perform dynamic analytical observations to yield insights to assist in decision-making scenarios can directly provide value-added services for climate change stakeholders. Automated sustainability assessments for organizations, or GIS-based systems that can signal important investment research analysis, are some of the many applications that data-driven geospatial analysis and technology is making available to the climate research and finance-based industries (Tomlinson 2007; Zu 2013).

Cross-References

ArcGIS: General-Purpose GIS Software
Climate Adaptation, Introduction
Climate Change and Developmental Economies
Climate Extremes and Informing Adaptation
Climate Hazards and Critical Infrastructures Resilience
Data Models in Commercial GIS Systems
Financial Asset Analysis with Mobile GIS
Geosensor Networks, Formal Foundations
GPS Data Processing for Scientific Studies of the Earth's Atmosphere and Near-Space Environment

References

Alexander S (2013) Reducing emissions from deforestation and forest degradation. In: Finlayson M, McInnes R, Everard M (eds) Encyclopedia of wetlands: wetland management, vol 2. Springer, Berlin/Heidelberg
Baranzini A, Carattini S (2014) Taxation of emissions of greenhouse gases. In: Freedman B (ed) Global environmental change. Handbook of global environmental pollution, vol 1. Springer, Berlin/Heidelberg, pp 543-560
Baumast A (2013) Carbon disclosure project. In: Idowu SO, Capaldi N, Zu L, Gupta AD (eds) Encyclopedia of corporate social responsibility. Springer, Berlin/Heidelberg, pp 302-309
Benz UC, Hofmann P, Willhauck G, Lingenfelder I, Heynen M (2004) Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information. ISPRS J Photogramm Remote Sens 58(3):239-258
Buchner B, Falconer A, Hervé-Mignucci M, Trabacchi C, Brinkman M (2011) The landscape of climate finance. Climate Policy Initiative, Venice, p 27
Freedman B (2014) Maintaining and enhancing ecological carbon sequestration. In: Freedman B (ed) Global environmental change. Handbook of global environmental pollution, vol 1. Springer, Berlin/Heidelberg, pp 783-801
Ganguly AR, Gupta A, Khan S (2005) Data mining technologies and decision support systems for business and scientific applications. In: Encyclopedia of data warehousing and mining. Idea Group Publishing
Gibbs HK, Brown S, Niles JO, Foley JA (2007) Monitoring and estimating tropical forest carbon stocks: making REDD a reality. Environ Res Lett 2(4):045023
Hansen JE, Ruedy R, Sato M, Lo K (2006) NASA GISS surface temperature (GISTEMP) analysis. Trends: a compendium of data on global change
Henríquez BLP (2013) Environmental commodities markets and emissions trading, towards a low carbon future. Routledge
Katzer J, Ansolabehere S, Beer J, Deutch J, Ellerman AD, Friedmann SJ, Herzog H, Jacoby HD, Joskow PL, McRae G et al (2007) The future of coal: options for a carbon-constrained world. Massachusetts Institute of Technology, Boston
Kossoy A, Guigon P (2012) State and trends of the carbon market. World Bank, Washington DC
Kungvalchokechai S, Sawada H (2013) The filtering of satellite imagery application using meteorological data aiming to the measuring, reporting and verification (MRV) for REDD. Asian J Geoinf 13(3)
Lewis JI (2010) The evolving role of carbon finance in promoting renewable energy development in China. Energy Policy 38(6):2875-2886
Litterman B (2013) What is the right price for carbon emissions. Regulation 36:38
Moore C (2012) Climate change legislation: current developments and emerging trends. In: Chen W-Y, Seiner J, Suzuki T, Lackner M (eds) Handbook of climate change mitigation. Springer, Berlin/Heidelberg, pp 43-87
Myhre G, Shindell D, Bréon F-M, Collins W, Fuglestvedt J, Huang J, Koch D, Lamarque J-F, Lee D, Mendoza B, Nakajima T, Robock A, Stephens G, Takemura T, Zhang H (2013) Climate change 2013: the physical science basis. Contribution of working group I to the fifth assessment report of the intergovernmental panel on climate change, book section 8. Cambridge University Press, Cambridge/New York, pp 659-740
Nzunda EF, Mahuve TG (2011) A SWOT analysis of mitigation of climate change through REDD. In: Filho WL (ed) Experiences of climate change adaptation in Africa. Climate change management. Springer, Berlin/Heidelberg, pp 201-216
Palmer Fry BP (2011) Community forest monitoring in REDD+: the 'M' in MRV? Environ Sci Policy 14(2):181-187
Patenaude G, Hill RA, Milne R, Gaveau DLA, Briggs BBJ, Dawson TP (2004) Quantifying forest above ground carbon content using LiDAR remote sensing. Remote Sens Environ 93(3):368-380
This article surveys existing spatial cloaking techniques for preserving users' location privacy in location-based services (LBS), where users have to continuously report their locations to the database server in order to obtain the service. For example, a user asking about the nearest gas station has to report her exact location. With untrustworthy servers, reporting the location information may lead to several privacy threats. For example, an adversary may check a user's habits and interests by knowing the places she visits and the time of each visit. The key idea of a spatial cloaking algorithm is to perturb an exact user location into a spatial region that satisfies user-specified privacy requirements, e.g., a k-anonymity requirement guarantees that a user is indistinguishable among k users.

Cross-References

Location-Based Services: Practices and Products
Privacy Preservation of GPS Traces

Cloaking Algorithms for Location Privacy

Chi-Yin Chow
Department of Computer Science, City University of Hong Kong, Hong Kong, China

Synonyms

Anonymity; Location anonymization; Location blurring; Location perturbation; Location-based services; Location-privacy; Nearest neighbor; Peer to peer; Privacy

Spatial cloaking is a technique to blur a user's exact location into a spatial region in order to preserve her location privacy. The blurred spatial region must satisfy the user's specified privacy requirement. The most widely used privacy requirements are k-anonymity and minimum spatial area. The k-anonymity requirement guarantees that a user location is indistinguishable among k users. On the other hand, the minimum spatial area requirement guarantees that a user's exact location must be blurred into a spatial region with an area of at least A, such that the probability of the user being located at any point within the spatial region is 1/A. A user location must be blurred by a spatial cloaking algorithm either on the client side or at a trusted third party before it is submitted to a location-based database server.

Historical Background

The emergence of state-of-the-art location-detection devices, e.g., cellular phones, global positioning system (GPS) devices, and radio-frequency identification (RFID) chips, has resulted in a location-dependent information access paradigm known as location-based services (LBS). In LBS, mobile users have the ability to issue snapshot or continuous queries to the location-based database server. Examples of snapshot queries include "where is the nearest gas station" and "what are the restaurants within one mile of my location", while examples of continuous queries include "where is the nearest police car for the next one hour" and "continuously report the taxis within one mile of my car location". To obtain the precise answers to these queries, the user has to continuously provide her exact location information to a database server. With untrustworthy database servers, an adversary may access sensitive information about
(Figures: an example table of users A-F with their required anonymity levels k_u = 6, 2, 2, 3, 3, 3 and the derived Rank(u), Start(u), and End(u) values; and a pyramid structure in which a hash table maps user IDs (UID) to cell IDs (CID) over the entire system area (level 0) and 2 x 2, 4 x 4, and 8 x 8 grid structures (levels 1-3).)
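The multilevel grid sketched above suggests a simple bottom-up cloaking strategy. The following is a minimal sketch, not the algorithm of any specific system: it assumes a square system area divided into 2^level x 2^level cells per pyramid level, and it climbs the pyramid from the finest level until the user's cell contains at least k users (k-anonymity) and has an area of at least A_min (minimum spatial area).

```python
# Minimal sketch of bottom-up grid cloaking over a pyramid of regular grids.
# Assumptions (not from the source): square system area [0, extent)^2, level L
# has 2^L x 2^L cells, and both k-anonymity and a minimum area must hold.

def cell_of(x, y, level, extent=100.0):
    """Return the (column, row) cell index of point (x, y) at a pyramid level."""
    n = 2 ** level                      # number of cells per axis at this level
    size = extent / n
    return (min(int(x // size), n - 1), min(int(y // size), n - 1))

def cloak(user, users, k, min_area, max_level=3, extent=100.0):
    """Blur `user` into the lowest-level cell holding >= k users with area >= min_area."""
    for level in range(max_level, -1, -1):          # finest grid first, then coarser
        n = 2 ** level
        size = extent / n
        cell = cell_of(*user, level, extent)
        occupants = [u for u in users if cell_of(*u, level, extent) == cell]
        if len(occupants) >= k and size * size >= min_area:
            cx, cy = cell
            return (cx * size, cy * size, (cx + 1) * size, (cy + 1) * size)
    return (0.0, 0.0, extent, extent)               # fall back to the entire area

users = [(10, 12), (11, 14), (40, 80), (42, 81), (43, 79), (90, 5)]
print(cloak((10, 12), users, k=2, min_area=100.0))
```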
spatial cloaked area is computed as the region that covers the entire group of peers. Figure 4 gives an illustrative example of peer-to-peer spatial cloaking. The mobile user A wants to find her nearest gas station while being five-anonymous, i.e., the user is indistinguishable among five users. Thus, the mobile user A has to look around and find four other peers to collaborate as a group. In this example, the four peers are B, C, D, and E. Then, the mobile user A cloaks her exact location into a spatial region that covers the entire group of mobile users A, B, C, D, and E. The mobile user A randomly selects one of the mobile users within the group as an agent. In the example given in Fig. 4, the mobile user D is selected as an agent. Then, the mobile user A sends her query (i.e., what is the nearest gas station) along with her cloaked spatial region to the agent. The agent forwards the query to the location-based database server through a base station. Since the location-based database server processes the query based on the cloaked spatial region, it can only give a list of candidate answers that includes the actual answers and some false positives. After the agent receives the candidate answers, it forwards the candidate answers to the mobile user A. Finally, the mobile user A gets the actual answer by filtering out all the false positives.

Key Applications

Spatial cloaking techniques are mainly used to preserve location privacy, but they can be used in a variety of applications.

Location-Based Services
Spatial cloaking techniques have been widely adopted to blur user location information before it is submitted to the location-based database server, in order to preserve user location privacy in LBS.

Spatial Database
Spatial cloaking techniques can be used to deal with some specific spatial queries. For example, given an object location, find the minimum area which covers the object and other k-1 objects.
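A minimal sketch of the peer-to-peer cloaking step described above: the querying user gathers k-1 peers and blurs her location into the minimum bounding rectangle of the whole group, and later filters the candidate answers on the client side. The coordinates and the nearest-gas-station refinement are illustrative assumptions, not part of the original entry.

```python
# Minimal sketch of peer-to-peer spatial cloaking (k-anonymity over a peer group).
import math

def mbr(points):
    """Minimum bounding rectangle covering all points: (min_x, min_y, max_x, max_y)."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def cloak_with_peers(me, peers, k):
    """Pick the k-1 nearest peers and return the cloaked region covering the group."""
    nearest = sorted(peers, key=lambda p: math.dist(me, p))[:k - 1]
    return mbr([me] + nearest)

def filter_false_positives(me, candidates):
    """Client-side refinement: keep only the true nearest point of interest."""
    return min(candidates, key=lambda c: math.dist(me, c))

me = (5.0, 5.0)
peers = [(4.0, 6.0), (6.5, 4.5), (5.5, 7.0), (9.0, 9.0)]   # users B, C, D, E (illustrative)
region = cloak_with_peers(me, peers, k=5)
candidate_stations = [(3.0, 3.0), (8.0, 8.0)]               # candidates returned for `region`
print(region, filter_false_positives(me, candidate_stations))
```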
(Figure: mobile users A, B, D, and E and a base station, as in the peer-to-peer cloaking example of Fig. 4.)
the accuracy of user locations to provide their valuable services. In order to convince users to participate in these systems, certain privacy guarantees should be imposed on their behavior through guaranteeing the privacy of their location-based queries even though their locations will be revealed.

Cross-References

Location-Based Services: Practices and Products
Privacy and Security Challenges in GIS
Privacy Preservation of GPS Traces
Close Range
Raster models are best suited to represent data that vary continuously, for example, aerial and satellite imagery or elevation surfaces. The spatial resolution of raster data depends on the resolution of the grid and is determined at the acquisition phase. Data in raster format is basically a matrix of data points on a regular grid. It comes in various forms depending on the source: satellite imagery, Digital Elevation Models, or grid data in meteorology. Data volumes are generally significant, as the spatial coverage can be extensive, sometimes combined with a high resolution (Miller 2010). In addition, the number of raster dimensions can be fairly high, as in hyperspectral imagery.

The vectorial data model relies on geometric shapes such as points, lines, or polygons that can be defined by mathematical functions: points are defined by their coordinates, latitude and longitude, typically in the 2D space. Altitude or depth may also be used to define coordinates in the 3D space. Points can be joined together in a specific order to define a line. A closed line, where the last point corresponds to the first point of the line, defines a polygon. The vector model is most useful to represent data with discrete and well-defined boundaries such as country borders, parcels, or streets. Various data structures can be used to store vector data, in particular the spaghetti model, which simply describes the objects independently of the others, and more sophisticated topological models, where each object includes information about the elements it is related to. For example, using the spaghetti model, a polygon is defined by the coordinates of its boundary points; using a topological approach, a polygon can be described as a series of connected lines, each line of the polygon having previously been defined in the model as a series of points.

The vector data model was standardized jointly by the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO 19125). This standard, called Simple Features, defines the specifications of vector data (coordinates, points, lines, and polygons) (ISO 2004), as well as a number of spatial operators, including an extension of the SQL for traditional relational databases (ISO 2008).

Recently, with the rapid increase of geospatial archives and the availability of large temporal data stacks, research efforts have focused on spatiotemporal clustering in order to extract meaningful spatiotemporal patterns (Kisilevich et al. 2010b). The direct approach is to simply add the time dimension to the distance metric between points. With the addition of a time dimension on a single spatial entity, the notion of trajectory appears. The time measurements can be regular or irregular. Geo-referenced time series are common in meteorology, such as sea surface temperature. Moving spatial objects and trajectories can also be derived from such measurements, for example, to represent moving spatial clouds. Kisilevich et al. (2010a) distinguish three kinds of spatiotemporal data according to the way they are collected: movement, cellular networks, and environmental. Movement datasets are typically associated with location-based services or sometimes video surveillance applications (Kuijpers et al. 2008); patterns will be formed by grouping similar trajectories (Andrienko 2008). Environmental data, collected either from a network of sensors or from satellite imagery, are used in many applications (seismology, meteorology, remote sensing, etc.).

Distributed Systems for Geospatial Big Data
Many geospatial applications (see next section) now rely on massive amounts of data that may require processing in real time. It is possible to scale a single machine vertically to some extent by adding extra storage, more memory, or a faster CPU. This approach is however usually costly, is sensitive to failures, and does not scale well with the number of users.

CAP Theorem
To overcome those limitations, it is possible to scale a system horizontally, that is, to combine several machines to form a cluster. The cluster is leveraged by distributing data and processing algorithms across the different machines. Distributed systems are typically used to process amounts of data that are too large for a
single machine. Besides the sheer amount of data, distributed systems have many benefits compared to a monolithic system, including greater reliability, higher availability, and better performance, depending on the use cases. However, distributed systems also present a number of challenges, network latency and hardware failure to name a few.

In general, distributed systems should feature the following characteristics:

Consistency (C): all nodes in the cluster see the same data;
Availability (A): all requests get a success or error notification, even if one or several nodes are unavailable (failure or planned maintenance);
Partition tolerance (P): the system remains fully functional, even if one or several nodes are unavailable.

However, Brewer's CAP theorem (Brewer 2012; Gilbert and Lynch 2002) stipulates that a distributed system can present at most two of the three traits above. It is therefore possible to design CA, CP, or AP systems, and the choice of an architecture over another largely depends on the intended usage of the system.

Distributed Geospatial Databases
The fundamental principle to allow the processing of spatial big data in a reasonable time is parallelism, which is often not trivial and requires new algorithms that can be distributed across the different machines of the cluster. For example, while many geospatial systems rely on PostgreSQL (http://www.postgresql.org/) and PostGIS (http://postgis.net/) to store geospatial data, the traditional relational model of databases is difficult to distribute, and parallelizing complex SQL queries is challenging. New paradigms such as Not-Only-SQL (NoSQL) (Cattell 2011) have thus emerged. Most NoSQL systems relax the transactional properties of traditional databases, which guarantee consistency, to favor high availability and partition tolerance (type AP). They usually implement a Shared-Nothing architecture (Stonebraker et al. 1986) where all the nodes are independent, which facilitates horizontal scalability.

NoSQL systems can be broadly classified into five families:

Key-Value (KV): the simplest form of NoSQL system, where data are represented as a list of pairs <key, value>, similar to a hash table. In many systems, this list is stored in memory for better performance. This family includes in particular MemcacheDB (http://memcachedb.org/), Redis (http://redis.io/), and Amazon DynamoDB (DeCandia et al. 2007).
Column: those systems are used to logically organize <key, value> pairs into tables, conceptually similar to tables in the relational model. Notable column-based systems include Google's proprietary Bigtable (Chang et al. 2006), used in particular to index massive amounts of geospatial data from Google Earth (https://www.google.com/earth/), and its open-source derivatives from the Apache foundation: HBase (http://hbase.apache.org/), Cassandra (http://cassandra.apache.org/), and Accumulo (https://accumulo.apache.org/).
Document: this family is the most common among NoSQL systems and is used to store semi-structured data, typically in XML or JSON format. The main systems of this type are MongoDB (http://www.mongodb.org/), Apache CouchDB (http://couchdb.apache.org/), and ElasticSearch (http://www.elasticsearch.org/).
Graph: graph systems can efficiently represent strongly connected data. The most popular graph database is Neo4J (Webber 2012).
Constraint: constraint databases (Kanellakis et al. 1995) rely on constraint programming to represent geospatial data and reason about them. While they can represent raster data, their capabilities are best leveraged with vectorial data sets. Unlike the other NoSQL families, the constraint programming paradigm is inherently difficult to parallelize and distribute efficiently. As a result, current systems do not adopt a Shared-Nothing architecture and do
GeoMesa (Fox et al. 2013) to process vectorial data and GeoTrellis (http://geotrellis.io/), more adequate for raster data. Both libraries provide a geospatial extension of the standard Spark RDDs. This key feature allows programmers to leverage other Spark features and apply them to geospatial data. For example, an important component of Spark is MLlib, a library for data analysis similar to Mahout. It implements several key machine learning algorithms, including the k-means clustering algorithm and its streaming variant to build data clusters in real time as new data feed the system.

Distributed Clustering Algorithms
This section details properties of the algorithms that are important for their parallelization in a distributed environment. It is however beyond the scope of this article to extensively review all characteristics and use cases of each clustering algorithm. This article also does not aim at comparing the accuracy of those algorithms: all clustering algorithms rely on some assumption on the distribution of the data (clusters of similar shapes, similar densities, etc.), and the best method to cluster some data depends on the actual distribution of the data.

Clustering techniques can be categorized into two broad categories: density-based and distance-based algorithms.

Distance-based algorithms rely on the distance between data points in the feature space to establish the clusters. Distance-based algorithms assume that the clusters to find are of similar shapes and will perform well if this hypothesis is verified by the actual data. Those algorithms are also well suited to cluster raster data types: while we can look at a raster dataset simply as a collection of points, clustering techniques specific to raster will try to take into account or enforce a certain notion of spatial homogeneity between neighboring points. Spatial homogeneity, which states that nearby points are more similar than far-apart points, is often captured via the estimation of the local autocorrelation function (Hagenauer and Helbich 2013). High spatial correlation values between spatially close points often limit the number of independent points in a local neighborhood, which can clash with the independence assumptions of some clustering methods.

Density-based algorithms, on the other hand, make use of the density of data points within a region to discover the clusters. Unlike distance-based techniques, density-based algorithms can uncover clusters of various shapes but assume that they are of similar density. The choice of a clustering algorithm therefore depends on the distribution of the data. Density-based algorithms are easier to parallelize and more scalable, as they usually rely on local search techniques to identify dense regions in the feature space. One of the most popular density-based algorithms is DBSCAN (Ester et al. 1996), and many distributed variants (He et al. 2014; Noticewala and Vaghela 2014; Kisilevich et al. 2010a; Patwary et al. 2012) have been implemented using MapReduce and show significant running-time improvements, even when handling billions of data points. Another popular example of a density-based algorithm is DenClue (Hinneburg and Keim 1998) and its recent improvements (Hinneburg and Gabriel 2007), which is also suitable for segmenting and clustering raster data. More recently, Cludoop (Yu et al. 2015) was implemented using MapReduce, and experiments on geospatial data showed significant improvements in terms of performance and scalability over MR-DBSCAN (He et al. 2014).

In addition, multiple dense regions can be explored simultaneously by discretizing the input feature space into a finite number of grid cells and applying the clustering method within each cell. Existing algorithms include STING (Wang et al. 1997), WaveCluster (Sheikholeslami et al. 1998; Jestes et al. 2011), and Clique (Agrawal et al. 1998). Parallel grid-based clustering further divides cells into sub-cells, processes each sub-cell, and combines the individual results to build the final clusters (Xiaoyun et al. 2009; Zhang et al. 2010). More recently, PatchWork (Gouineau et al. 2016) was implemented using Apache Spark to distribute local density computations and showed significant performance
improvements over MapReduce implementations of DBSCAN. This approach is particularly useful to mine geospatial data, as the cell grid can be defined using Hilbert or Z-order space-filling curves (Dai and Su 2003; Hong-bo et al. 2009), which are implemented in the distributed GeoMesa framework (see the section above).

It is also possible to further categorize clustering techniques depending on the output of the algorithm (Hruschka et al. 2009): hierarchical or partitioning.

Partitioning algorithms may define mutually exclusive hard clusters or soft clusters that allow a certain degree of overlap measured by a membership function. Fuzzy clustering techniques are typical of this kind of soft partitioning approach (Ehrlich et al. 1984). The most popular partitioning algorithm is k-means clustering (MacQueen 1967; Lloyd 1982): given k clusters to find, the technique determines the centers of the clusters and updates the membership of each cluster iteratively using the distance to the center of the cluster. The approach can be easily parallelized and distributed, given that computing the distance to the cluster centers has no dependencies. Several distributed implementations are available, in particular for MapReduce and Spark as part of the Mahout and MLlib libraries, respectively. For the same reason, distributed implementations of the streaming variant of k-means are available to cluster spatiotemporal data in real time. Distributed implementations are also available for related algorithms such as CLARANS (Ng et al. 2005).

The linkage criterion is one of the main factors that differentiate hierarchical agglomerative clustering (HAC) methods. In single-linkage clustering, the link between two clusters is made by the single element pair of the two elements (one in each cluster) that are closest to each other. In complete-linkage clustering, the link between two clusters considers all element pairs, and the distance between clusters equals the distance between those two elements (one in each cluster) that are farthest away from each other. Other linkage methods such as UPGMA have been proposed and are often used in bioinformatics for phylogenetic studies. However, HAC algorithms rely on a global distance matrix and are notoriously difficult to parallelize. Furthermore, most of those algorithms have a computational complexity in O(N² log N) or O(N²), with N the number of data points, and do not scale well. Clustering of geospatial big data using naive HAC algorithms will therefore quickly become problematic, and more sophisticated methods have been proposed, including DISC (Jin et al. 2013), MR-VPSOM (Gao et al. 2010), etc. Those HAC methods can be efficiently distributed and were implemented using the MapReduce framework for batch computations and NoSQL databases for the storage of large distance matrices.

Key Applications

This section presents various key use cases of clustering algorithms that facilitate the analysis of spatial and spatiotemporal big data.
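Before turning to those use cases, the following minimal sketch illustrates the distributed k-means implementation mentioned above, using Spark MLlib through the PySpark DataFrame API; the column names and sample coordinates are illustrative assumptions, not part of any particular deployment.

```python
# Minimal PySpark sketch: distributed k-means over point coordinates with Spark MLlib.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("geo-kmeans-sketch").getOrCreate()

# Illustrative (longitude, latitude) points; in practice these would be loaded
# from a distributed store such as HDFS or a NoSQL database.
points = spark.createDataFrame(
    [(0.1, 0.2), (0.2, 0.1), (5.1, 5.0), (5.2, 5.3), (9.9, 0.1)],
    ["lon", "lat"],
)

features = VectorAssembler(inputCols=["lon", "lat"], outputCol="features").transform(points)
model = KMeans(k=3, seed=42, featuresCol="features").fit(features)
model.transform(features).select("lon", "lat", "prediction").show()

spark.stop()
```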
Internet of Things
Those devices are usually equipped with a variety of sensors that can collect data in real time or several times per minute. They also often include a geolocation tracker or can be paired with a device that has a geolocation tracker. Those connected devices thus generate very large amounts of data with a strong spatiotemporal component. Those devices rely on the integration of many spatiotemporal data sources, including the geolocation of the user (from their connected smartphone) and meteorological data.

One of the key benefits of the IoT is to enable machine-to-machine communication, thereby facilitating the automation of various tasks. Automation relies on the spatiotemporal data collected by the devices, as well as on rules to define triggers and actions. Web services to facilitate the implementation of those rules have emerged and are gaining popularity as new devices become connected. Some devices now rely on machine learning algorithms to learn how they are being used. Connected home thermostats, for instance, are now capable of learning from the habits of the home owners to automatically define rules and triggers to adjust the temperature.

Smart Cities
Another key application of large-scale geospatial information systems is the modeling of public infrastructures for the development of smarter cities. Smart cities such as Barcelona, Stockholm, or Montreal heavily rely on digital technologies to reduce resource consumption and to engage more effectively with their citizens.

An example of smart city projects, which also relies on the Internet of Things, is public bicycle sharing systems such as Bixi (http://www.publicbikesystem.com/) that are implanted in many large cities. In such a system, bikes are equipped with GPS trackers, allowing the operator to monitor the usage and adjust the service accordingly (Wood et al. 2011). Studies of the usage of public bicycle sharing systems using spatial clustering algorithms (Austwick et al. 2013) were also conducted to reveal structures of social communities in major cities.

As another example, distributed computing systems could also be leveraged to optimize the engineering and planning of new infrastructures. For example, the city of Riyadh, Saudi Arabia, modeled the entire transportation network, including constraints for transit time between major activity centers. The goal was to analyze the network to highlight infrastructure deficiencies where usage exceeds capacity and to predict future travel demand and potential congestion areas under different scenarios and network topologies. Similar studies were conducted in Jaipur, India (Gahlot et al. 2012), and Vancouver, Canada (Foth 2010), for the design of their public transit networks.

Remote Sensing
Remote sensing is the science of obtaining information about objects or areas from a distance, typically from aircraft, boats, or satellites, without making physical contact with the object and thus in contrast to on-site observation. It refers to the use of aerial sensor technologies to detect and classify objects on Earth (both on the surface and in the atmosphere and oceans) by means of propagated signals, including electromagnetic (RADAR, LiDAR, etc.), acoustic (SONAR, seismograms, etc.), and geodetic (gravitational field measurement). The remote sensor can collect the signal passively emitted from a surface of interest (e.g., a photometer measuring sunlight) or actively transmit a signal and collect its reflection (e.g., RADARs in airplanes).

Remote sensing has an immense range of applications: agriculture (e.g., crop monitoring), geology (terrain analysis, topography, etc.), hydrology (flood monitoring), environment (sea ice coverage, biomass mapping, forestry, land usage, etc.), and oceanography (oil spill detection, tsunami detection, phytoplankton concentration, etc.), to name a few.

For example, many oceanographic characteristics (such as currents) vary over both time and space. At a fixed location, an important spatial coordinate is the vertical axis through the water column, or profile, from the surface to the ocean bottom. An individual profile can be viewed as a vertical-line plot. A time series of profiles is best viewed by stacking sequential profiles next to each other to form an image.
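The profile-stacking view just described can be reproduced in a few lines. The sketch below is a generic illustration with synthetic profiles (not real sensor data); it stacks a time series of vertical profiles column by column into a 2-D array that can then be color coded or clustered.

```python
# Minimal sketch: stack sequential vertical profiles into an image-like 2-D array.
import numpy as np

depths = np.linspace(0, 200, 50)            # vertical axis: 50 depth levels (metres)
times = np.arange(24)                       # one synthetic profile per hour

# Each column is one profile (e.g., temperature vs. depth at a given time).
image = np.empty((depths.size, times.size))
for t in times:
    image[:, t] = 15.0 - 0.05 * depths + 0.5 * np.sin(0.3 * t)   # synthetic values

print(image.shape)                          # (50, 24): depth x time, ready for plotting
```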
Remote sensors thus usually produce massive amounts of raster data, which may also be combined with data from other on-site sensors. Distributed clustering algorithms (Lv et al. 2010) and color coding then reveal both the vertical and temporal structure of the measured quantity, which depends on the sensor.

Ocean Networks Canada has developed several types of sensors, including an active zooplankton acoustic profiler (ZAP). This sensor emits an acoustic pulse through water; when it encounters fishes, suspended particulate, or zooplankton floating in the water, a part of the sound is reflected back. By gating the reflected signals in time, the vertical distribution of scatterers is recorded and provides useful information about marine life.

Medical Area and Disaster Monitoring
As transportation means have made it possible to travel faster around the globe, more effective infection monitoring tools are needed to help in the control of disease outbreaks, as illustrated by the recent H1N1 flu and Ebola pandemics. The ability to quickly analyze the evolution of the disease, and to discover patterns in the data, is critical to understand the root cause of the pandemics and take appropriate measures to control an emerging disease situation and prevent its further spread.

Epidemiological data consist of spatiotemporal data describing the evolution of the disease in both space and time. Key challenges of epidemiological data are the ability to analyze new trends and patterns in pseudo-real time as the disease spreads, as well as the recursive nature of those patterns, that is, patterns from previous pandemics are likely to give important clues for the prediction of the evolution of the current pandemics. Note that those challenges are not unique to pandemics, and other disasters that have strong geospatial and temporal dimensions, such as tornadoes, water flooding, or oil spills, share the same characteristics.

Crisis detection and management can also be facilitated by integrating traditional geospatial data sources with alternative sources, in particular from social media. The underlying idea is that social sensing, which is the set of information obtained from a group of people, can provide information similar to that obtainable from a sensor network. Twitter (https://twitter.com/), one of the most popular social networks, allows people to share short messages in real time, and many of them are now associated with a geolocation. Using geospatial data mining and natural language processing techniques, it is thus possible to leverage Twitter as an effective data source for social sensing. This approach has been successfully applied to the identification of outbreaking seismic events (Avvenuti et al. 2014): the system is able to detect earthquakes within seconds of the event and to notify people far earlier than official channels.

Future Directions

While data clustering has been extensively studied and has proved to be tremendously useful in data mining and knowledge discovery, clustering of geospatial big data presents several challenges to be addressed in the future.

First, an increasing number of geospatial applications are now generating very large volumes of data. Those applications include remote sensing, drones, and the Internet of Things. The number of sensors that collect geocoded data is increasing exponentially, from 500 million devices in 2003 to an anticipated 50 billion sensors in the next 5 years, resulting in volumes of data far exceeding the computing power of a single machine. However, many clustering algorithms which were developed in the past two decades rely on an iterative approach, which is inherently difficult to parallelize and distribute efficiently. While a few parallelized variants of popular algorithms, such as k-means, have been proposed and implemented using the MapReduce paradigm, the variety of geospatial clustering algorithms that can be efficiently distributed is limited. As a result, scaling a system horizontally to accommodate larger amounts of data or to reduce the running time of the algorithms remains challenging.
can collect data several times per second or per minute: geospatial datasets thus often present a strong temporal dimension. A growing number of applications require real-time or near real-time processing of those spatiotemporal data, for example, traffic optimization or crime prevention in smart cities. For those applications, a batch-oriented approach to distributed computing such as the popular MapReduce paradigm is not suitable because of latency issues. Alternative distributed computing frameworks such as Apache Storm or Spark can handle continuous streams of data and significantly reduce the latency of the system. However, very few clustering algorithms suitable for geospatial applications have been implemented for those frameworks so far.

Last, the most popular distributed computing frameworks, Hadoop and Spark in particular, were developed only recently and are still under active development. The sets of features of those systems are often not stable or mature yet. In addition, with the notable exception of Accumulo, most distributed systems have not yet emphasized development on data access control and privacy concerns, which can be critical for geospatial applications.

Cross-References

Big Data and Spatial Constraint Databases
Distributed Geospatial Computing (DGC)
Irregular Shaped Spatial Clusters: Detection and Inference
k-NN Search in Time-dependent Road Networks
Movement Patterns in Spatio-Temporal Data
Outlier Detection
Outlier Detection, Spatial
Patterns, Complex

References

Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data (SIGMOD 98), New York. ACM, pp 94-105
Alam S, Dobbie G, Koh YS, Riddle P, Rehman SU (2014) Research on particle swarm optimization based clustering: a systematic review of literature and techniques. Swarm Evol Comput 17(0):1-13
Andrienko G (2008) Spatio-temporal aggregation for visual analysis of movements. In: Proceedings of IEEE symposium on visual analytics science and technology (VAST 2008), Columbus, pp 51-58
Austwick MZ, O'Brien O, Strano E, Viana M (2013) The structure of spatial networks and communities in bicycle sharing systems. PLoS ONE 8(9):e74685
Avvenuti M, Cresci S, Marchetti A, Meletti C, Tesconi M (2014) EARS (earthquake alert and report system): a real time decision support system for earthquake crisis management. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining (KDD 14), New York. ACM, pp 1749-1758
Brewer E (2012) CAP twelve years later: how the rules have changed. Computer 45(2):23-29
Cattell R (2011) Scalable SQL and NoSQL data stores. SIGMOD Rec 39(4):12-27
Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE (2006) Bigtable: a distributed storage system for structured data. In: Proceedings of the 7th symposium on operating systems design and implementation (OSDI 06), Berkeley. USENIX Association, pp 205-218
Chen X, Vo H, Aji A, Wang F (2014) High performance integrated spatial big data analytics. In: Proceedings of the 3rd ACM SIGSPATIAL international workshop on analytics for big geospatial data (BigSpatial 14), New York. ACM, pp 11-14
Dai H-K, Su H-C (2003) Approximation and analytical studies of inter-clustering performances of space-filling curves. In: Banderier C, Krattenthaler C (eds) Discrete random walks (DRW 03), Paris, Sept 1-5 2003. Discrete mathematics and theoretical computer science proceedings, vol AC. DMTCS, pp 53-68
Daschiel H, Datcu M (2005) Information mining in remote sensing image archives: system evaluation. IEEE Trans Geosci Remote Sens 43(1):188-199
Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th conference on symposium on operating systems design & implementation (OSDI 04), vol 6, Berkeley. USENIX Association, pp 10-10
DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, Sivasubramanian S, Vosshall P, Vogels W (2007) Dynamo: Amazon's highly available key-value store. In: Proceedings of twenty-first ACM SIGOPS symposium on operating systems principles (SOSP 07), New York. ACM, pp 205-220
Ehrlich R, Bezdek JC, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10(2-3):191-203
Eldawy A, Mokbel MF (2015) SpatialHadoop: a MapReduce framework for spatial data. In: Proceedings of the
31st IEEE international conference on data engineering (ICDE), Seoul
Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Simoudis E, Han J, Fayyad UM (eds) Second international conference on knowledge discovery and data mining. AAAI Press, Palo Alto, pp 226-231
Foth N (2010) Long-term change around SkyTrain stations in Vancouver, Canada: a demographic shift-share analysis. Geograph Bull 51:37-52
Fox A, Eichelberger C, Hughes J, Lyon S (2013) Spatio-temporal indexing in non-relational distributed databases. In: 2013 IEEE international conference on big data, Santa Clara, pp 291-299
Gahlot V, Swami BL, Parida M, Kalla P (2012) User oriented planning of bus rapid transit corridor in GIS environment. Int J Sustain Built Environ 1:102-109
Gao H, Jiang J, She L, Fu Y (2010) A new agglomerative hierarchical clustering algorithm implementation based on the map reduce framework. J Digit Content Technol Appl 4(3):95-100
Ghemawat S, Gobioff H, Leung S-T (2003) The Google file system. In: Proceedings of the 19th ACM symposium on operating systems principles (SOSP 03), New York. ACM, pp 29-43
Gilbert S, Lynch N (2002) Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 33(2):51-59
Gouineau F, Landry T, Triplet T (2016) PatchWork: a scalable density-grid clustering algorithm. In: Proceedings of the 31st ACM symposium on applied computing, data mining track, Pisa
Hagenauer J, Helbich M (2013) Contextual neural gas for spatial clustering and analysis. Int J Geograph Inf Sci 27:251-266
He Y, Tan H, Luo W, Feng S, Fan J (2014) MR-DBSCAN: a scalable mapreduce-based DBSCAN algorithm for heavily skewed data. Front Comput Sci 8(1):83-99
Hinneburg A, Gabriel H-H (2007) DENCLUE 2.0: fast clustering based on kernel density estimation. In: Proceedings of the 7th international conference on intelligent data analysis (IDA 07). Springer, Berlin/Heidelberg, pp 70-80
Hinneburg A, Keim DA (1998) An efficient approach to clustering in large multimedia databases with noise. In: Agrawal R, Stolorz PE, Piatetsky-Shapiro G (eds) Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98), New York, 27-31 Aug 1998. AAAI Press, pp 58-65
Hong-bo X, Zhong-xiao H, Qi-Long H (2009) A clustering algorithm based on grid partition of space-filling curve. In: 2009 fourth international conference on internet computing for science and engineering (ICICSE), Harbin, pp 260-265
Hruschka ER, Campello RJGB, Freitas AA, de Carvalho ACPLF (2009) A survey of evolutionary algorithms for clustering. IEEE Trans Syst Man Cybern Part C Appl Rev 39(2):133-155
ISO (2004) Geographic information - simple feature access - Part 1: common architecture. ISO 19125-1:2004, International Organization for Standardization, Geneva
ISO (2008) Geographic information - simple feature access - Part 2: SQL option. ISO 19125-2:2004, International Organization for Standardization, Geneva
Jestes J, Yi K, Li F (2011) Building wavelet histograms on large data in mapreduce. Proc VLDB Endow 5(2):109-120
Jin C, Patwary MMA, Agrawal A, Hendrix W, Liao W-k, Choudhary A (2013) DiSC: a distributed single-linkage hierarchical clustering algorithm using mapreduce. In: Proceedings of the 4th international SC workshop on data intensive computing in the clouds, Denver (http://datasys.cs.iit.edu/events/DataCloud2013/)
Jin C, Liu R, Chen Z, Hendrix W, Agrawal A, Choudhary A (2015) A scalable hierarchical clustering algorithm using Spark. In: IEEE first international conference on big data computing service and applications, Redwood City, pp 418-426
Kanellakis PC, Kuper GM, Revesz P (1995) Constraint query languages. J Comput Syst Sci 51(1):26-52
Kisilevich S, Mansmann F, Keim D (2010a) P-DBSCAN: a density based clustering algorithm for exploration and analysis of attractive areas using collections of geo-tagged photos. In: Proceedings of the 1st international conference and exhibition on computing for geospatial research & application (COM.Geo 10), Washington, DC. ACM, pp 1-4
Kisilevich S, Mansmann F, Nanni M, Rinzivillo S (2010b) Spatio-temporal clustering. In: Maimon O, Rokach L (eds) Data mining and knowledge discovery handbook. Springer, pp 855-874. http://www.springer.com/us/book/9780387098227
Kuijpers B, Alvares LO, Palma AT, Bogorny V (2008) A clustering-based approach for discovering interesting places in trajectories. In: Proceedings of the 2008 ACM symposium on applied computing, Fortaleza, pp 863-868
Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129-137
Lv Z, Hu Y, Zhong H, Wu J, Li B, Zhao H (2010) Parallel k-means clustering of remote sensing images based on MapReduce. In: Proceedings of the 2010 international conference on web information systems and mining (WISM 10). Springer, Berlin/Heidelberg, pp 162-170
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, Berkeley/Los Angeles
Miller HJ (2010) The data avalanche is here. Shouldn't we be digging? J Reg Sci 50:181-201
Ng RT, Han J (2005) CLARANS: a method for clustering objects for spatial data mining. IEEE Trans Knowl Data Eng, pp 1003-1017
Noticewala M, Vaghela D (2014) MR-IDBSCAN: efficient parallel incremental DBSCAN algorithm using mapreduce. Int J Comput Appl 93(4):13-18
Cognition

Cross-References

Patterns, Complex
Retrieval Algorithms, Spatial

Co-location

Synonyms

Collocation pattern; Spatial association pattern
Definition

A (spatial) co-location pattern P can be modeled by an undirected connected graph where each node corresponds to a nonspatial feature and each edge corresponds to a neighborhood relationship between the corresponding features. For example, consider a pattern with three nodes labeled timetabling, weather, and ticketing and two edges connecting timetabling with weather and timetabling with ticketing. An instance of a pattern P is a set of objects that satisfy the unary (feature) and binary (neighborhood) constraints specified by the pattern's graph. An instance of the example pattern is a set {o1, o2, o3} of three spatial locations where label(o1) = timetabling, label(o2) = weather, label(o3) = ticketing (unary constraints), and dist(o1, o2) ≤ ε, dist(o1, o3) ≤ ε (spatial binary constraints). In general, there may be an arbitrary spatial (or spatiotemporal) constraint specified at each edge of a pattern graph (e.g., topological, distance, direction, and time-difference constraints).

Main Text

Co-location patterns are used to derive co-location rules that associate the existence of nonspatial features in the same spatial neighborhood.

Co-location Pattern Discovery

Wei Hu
International Business Machines Corp., Rochester, MN, USA

Synonyms

Co-location mining; Co-location rule discovery; Co-location rule finding; Co-location rule mining; Co-occurrence; Spatial association; Spatial association analysis

Definition

Spatial co-location rule discovery or spatial co-location pattern discovery is the process that identifies spatial co-location patterns from large spatial datasets with a large number of Boolean spatial features.

Historical Background

The co-location pattern and rule discovery are part of the spatial data mining process. The
differences between spatial data mining and classical data mining are mainly related to data input, statistical foundation, output patterns, and computational process. The research accomplishments in this field are primarily focused on the output pattern category, specifically predictive models, spatial outliers, spatial co-location rules, and clusters (Shekhar et al. 2003).

The spatial pattern recognition research presented here, which is focused on co-location, is also most commonly referred to as spatial co-location pattern discovery and co-location rule discovery. To understand the concepts of spatial co-location pattern discovery and rule discovery, we first have to examine a few basic concepts in spatial data mining.

The first term to be defined is Boolean spatial features. Boolean spatial features are geographic object types. They are either absent or present at different locations within the domain of a two-dimensional or higher (three)-dimensional metric space such as the surface of the earth (Shekhar et al. 2003). Some examples of Boolean spatial features are categorizations such as plant species, animal species, and types of roads, cancers, crimes, and businesses.

The next concept relates to co-location patterns and rules. Spatial co-location patterns represent the subsets of Boolean spatial features whose instances are often located in close geographic proximity (Shekhar et al. 2003). They resemble frequent patterns in many aspects. Good examples are symbiotic species. The Nile crocodile and Egyptian plover in ecology prediction (Fig. 1) are one good illustration of a point spatial co-location pattern representation. Frontage roads and highways (Fig. 2) in specified metropolitan road maps could be used to demonstrate line-string co-location patterns. Examples of various categories of spatial co-location patterns are given in Table 1. We can see that the domains of co-location patterns are distributed across many interesting fields of science research and daily services, which proves their great usefulness and importance.

Spatial co-location rules are models that associate the presence of known Boolean spatial features with the existence of instances of other Boolean spatial features in the neighborhood. Figure 1 also provides good examples of spatial co-location rules. As can be seen, the rule "Nile crocodiles → Egyptian plover" can predict the presence of Egyptian plover birds in the same areas where Nile crocodiles live. A dataset consisting of several different Boolean spatial feature instances is marked on the space. Each type of Boolean spatial feature is distinguished by a distinct representation shape. A careful examination reveals two co-location patterns: (+, x) and (o, *) (Shekhar et al. 2003). Spatial co-location rules can be further classified into popular rules and confident rules, according to the frequency of cases appearing in the dataset. The major concern here is the difference between dealing with rare events and popular events. Usually, rare events are ignored, and only the popular co-location rules are mined. So if there is a need to identify the confident co-location rules, then special handling and a different approach must be taken to reach them (Huang et al. 2003).

Spatial co-location rule discovery is the process that identifies spatial co-location patterns from large spatial datasets with a large number of Boolean spatial features (Shekhar et al. 2003). The problem of spatial co-location rule discovery is similar to the spatial association rule mining problem, which identifies the interrelationships or associations among a number of spatial datasets. The difference between the two has to do with the concept of transactions.

An example of association rule discovery can be seen with market basket datasets, in which transactions represent sets of merchandise item categories purchased together by customers (Shekhar et al. 2003). The association rules are derived from all the associations in the data with support values that exceed a user-defined threshold. In this example, we can define the process of mining association rules as identifying frequent item sets in order to plan store layouts or marketing campaigns as part of related business intelligence analysis.

On the other hand, in a spatial co-location rule discovery problem, we usually see that the transactions are not explicit (Shekhar et al. 2003). There are no dependencies among the
Co-location Pattern Discovery, Fig. 2 Illustration of line-string co-location patterns. Highways, e.g., Hwy100, and frontage roads, e.g., Normandale Road, are co-located (Shekhar et al. 2003)
transactions analyzed in market basket data, because the transaction data do not share instances of merchandise item categories but rather instances of Boolean spatial features instead. These Boolean spatial features are distributed into a continuous space domain and thus share varied spatial types of relationships, such as overlap, neighbor, etc., with each other. Although spatial co-location patterns and co-location rules differ slightly, according to the
Co-location Pattern Discovery, Table 1 Examples of co-location patterns (Xiong et al. 2004)

Domains | Example features | Example co-location patterns
Ecology | Species | Nile crocodile, Egyptian plover
Earth science | Climate and disturbance events | Wildfire, hot, dry, lightning
Economics | Industry types | Suppliers, producers, consultants
Epidemiology | Disease types and environmental events | West Nile disease, stagnant water sources, dead birds, mosquitoes
Location-based service | Service type requests | Tow, police, ambulance
Weather | Fronts, precipitation | Cold front, warm front, snowfall
Transportation | Delivery service trucks | US Postal Service, UPS, newspaper delivery
previous definitions, it can be said that spatial co-location pattern discovery is merely another phrasing for spatial co-location rule finding. Basically, the two processes are the same and can be used in place of each other. Both are used to find the frequent co-occurrences among Boolean spatial features from given datasets.

Co-location Pattern Discovery, Table 2 Boolean feature A and the defined transactions related to B and C

Instance of A | Transaction
(0,0) | ∅
(2,3) | {B, C}
(3,1) | {C}
(5,5) | ∅

Co-location Pattern Discovery, Fig. 3 Transactions are defined around instances of feature A, relevant to B and C (Shekhar and Huang 2001)

Co-location Pattern Discovery, Fig. 4 Example of window-centric model (Shekhar and Huang 2001)
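The reference-feature-centric transactions of Table 2 can be computed directly from the instance locations. The sketch below is a minimal illustration: the B and C coordinates and the distance threshold are hypothetical values chosen so that the output matches Table 2, since the actual layout comes from Fig. 3, which is not reproduced here.

```python
# Minimal sketch of the reference-feature-centric model: one transaction is built
# around each instance of the reference feature A, containing the feature types
# (B, C) that have at least one instance within a distance threshold.
import math

instances = {
    "A": [(0, 0), (2, 3), (3, 1), (5, 5)],   # reference feature instances (as in Table 2)
    "B": [(2, 4)],                            # hypothetical positions
    "C": [(3, 3), (3, 0)],                    # hypothetical positions
}
EPS = 1.5                                     # hypothetical neighborhood radius

def transaction(a):
    """Feature types with at least one instance within EPS of reference instance a."""
    return {f for f in ("B", "C")
            if any(math.dist(a, p) <= EPS for p in instances[f])}

for a in instances["A"]:
    print(a, transaction(a) or "{}")          # reproduces the transactions of Table 2
```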
is a transaction, and the process tries to find which features appear together the most number of times in these transactions, alias windows, i.e., using support and confidence measurements (Shekhar and Huang 2001).

Figure 4 shows the processing with window partitions on data similar to that shown in Fig. 3. As this is a local model, even though here A and C could have been a pattern, these features are completely ignored since they are not within a single window.

The third modeling method is the event-centric model. This model is mostly related to ecology-specific domains where scientists want to investigate specific events such as drought, El Nino, etc. The goal of this model is to find the subsets of spatial features likely to occur in the neighborhood of a given event type. One of the assumptions of this algorithm is that the neighbor relation is reflexive, that is, interchangeable. For example, if A is a neighbor of B, then B is also a neighbor of A.

The event-centric model defines key concepts as follows: A neighborhood of l is a set of locations L = {l1, l2, l3, ..., lk} such that li is a neighbor of l (Shekhar and Huang 2001). I = {I1, ..., Ik} is a row instance of a co-location C = {f1, ..., fk} if Ij is an instance of feature fj (Shekhar and Huang 2001). The participation ratio and participation index are two measures which replace support and confidence here. The participation ratio is the number of row instances of co-location C divided by the number of instances of Fi. Figure 5 shows an example of this model.

Table 3 shows a summary of the interest measures for the three different models.

With different models to investigate different problems of various application domains, there are also multiple algorithms used in the discovery process. Approaches to discover co-location rules can be categorized into two classes: spatial statistics and data mining approaches.

Spatial statistics-based approaches use measures of spatial correlation to characterize the relationship between different types of spatial features. Measures of spatial correlation include the cross-K function with Monte Carlo simulation, mean nearest-neighbor distance, and spatial regression models. Computing spatial correlation measures for all possible co-location patterns can be computationally expensive due to the exponential number of candidate subsets extracted from a large collection of spatial Boolean features that we are interested in (Huang et al. 2004).
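The participation-based measures just introduced can be computed directly from the row instances of a candidate co-location. The sketch below is a minimal illustration with made-up instances of two features (not the data of Fig. 5); following Huang et al. (2004), it counts the distinct instances of a feature that appear in at least one row instance, and it takes the participation index of the co-location as the minimum participation ratio over its features.

```python
# Minimal sketch: participation ratio and participation index for a size-2 co-location.
import math

instances = {                                  # made-up feature instances
    "A": [(0, 0), (1, 1), (5, 5), (9, 0)],
    "B": [(1, 0), (5, 6), (9, 9)],
}
EPS = 1.5                                      # neighbor threshold

# Row instances of the candidate co-location {A, B}: neighboring (A, B) pairs.
rows = [(a, b) for a in instances["A"] for b in instances["B"]
        if math.dist(a, b) <= EPS]

def participation_ratio(feature, column):
    """Fraction of the feature's instances that appear in at least one row instance."""
    used = {row[column] for row in rows}
    return len(used) / len(instances[feature])

pr_a = participation_ratio("A", 0)
pr_b = participation_ratio("B", 1)
print(rows)                                    # three neighboring (A, B) pairs
print(pr_a, pr_b, min(pr_a, pr_b))             # 0.75, 0.666..., participation index 0.666...
```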
Co-location Pattern Discovery, Fig. 5 Event-centric model example (Huang et al. 2004). In the figure, T.i represents instance i with feature type T, and lines between instances represent neighbor relationships; the example shows candidate co-locations of size k = 1 and k = 2 together with their table instances and participation indexes, and a candidate will be pruned if the min-prevalence threshold is set to 0.5, at which point the algorithm stops.
Co-location Pattern Discovery, Table 3 Interest measures for different models (Shekhar et al. 2003)

Interest measures for C1 → C2
Model | Items | Transactions defined by | Prevalence | Conditional probability
Reference feature centric | Predicates on reference and relevant features | Instances of reference feature C1 and C2 involved with | Fraction of instances of reference feature with C1 ∪ C2 | Pr(C2 is true for an instance of reference features given C1 is true for that instance of reference feature)
Data partitioning | Boolean feature types | A partitioning of spatial dataset | Fraction of partitions with C1 ∪ C2 | Pr(C2 in a partition given C1 in that partition)
Event centric | Boolean feature types | Neighborhoods of instances of feature types | Participation index of C1 ∪ C2 | Pr(C2 in a neighborhood of C1)
Data mining approaches can be further divided into two categories: the clustering-based map overlay approach and the association rule-based approaches.

The clustering-based map overlay approach regards every spatial attribute as a map layer and considers spatial clusters (regions) of point data in each layer as candidates for mining the associations among them. Association rule-based approaches can again be further divided into two categories: the transaction-based approaches and the distance-based approaches.

Transaction-based approaches aim to define transactions over space such that an a priori-like algorithm can be used just as in the association rule discovery process. Transactions over space can be defined by a reference-centric model as discussed previously, which enables the derivation of association rules using the a priori algorithm. There are a few major shortcomings of this approach: generalization of this paradigm is nontrivial in the case where no reference feature is specified, and duplicate counts for many candidate associations may result when defining transactions around locations of instances of all features.

Distance-based approaches are relatively novel. A couple of different approaches have
been presented by different research groups. One proposes the participation index as the prevalence measure, which possesses a desirable anti-monotone property (Huang et al. 2003). Thus, a unique subset of co-location patterns can be specified with a threshold on the participation index, without consideration of the detailed algorithm applied, such as the order of examination of instances of a co-location. Another advantage of using the participation index is that it can define the correctness and completeness of co-location mining algorithms.

Key Applications

A final example in our list of applications is traffic control or transportation management. With the knowledge of co-location rules discovered from existing datasets, better supervision and management could be carried out to make transportation systems run in the most efficient way, as well as to gain clearer foresight into future road network development and expansion.

There are many more interesting fields related to the spatial co-location application domain, such as disease research, economics, earth science, etc. (Shekhar et al. 2002). With the availability of more spatial data from different areas, we can expect more research and studies to benefit from this technology.
References

Huang Y, Xiong H, Shekhar S, Pei J (2003) Mining confident colocation rules without a support threshold. In: Proceedings of the 18th ACM symposium on applied computing (ACM SAC), Melbourne
Huang Y, Shekhar S, Xiong H (2004) Discovering co-location patterns from spatial datasets: a general approach. IEEE Trans Knowl Data Eng (TKDE) 16(12)
Shekhar S, Huang Y (2001) Discovering spatial co-location patterns: a summary of results. In: Proceedings of the 7th international symposium on spatial and temporal databases (SSTD), Redondo Beach
Shekhar S, Schrater P, Raju W, Wu W (2002) Spatial contextual classification and prediction models for mining geospatial data. IEEE Trans Multimed
Shekhar S, Zhang P, Huang Y, Vatsavai RR (2003) Trends in spatial data mining. In: Kargupta H, Joshi A, Sivakumar K, Yesha Y (eds) Data mining: next generation challenges and future directions. AAAI/MIT Press, Cambridge, MA
Xiong H, Shekhar S, Huang Y, Kumar V, Ma X, Yoo J (2004) A framework for discovering co-location patterns in data sets with extended spatial objects. In: Proceedings of SIAM international conference on data mining (SDM)

Co-location Patterns

Co-location Patterns, Interestingness Measures

Co-location Patterns, Algorithms

Nikos Mamoulis
Department of Computer Science, University of Hong Kong, Hong Kong, China

Synonyms

Association; Co-occurrence; Mining collocation patterns; Mining spatial association patterns; Participation index; Participation ratio; Reference-feature centric

Definition

A spatial co-location pattern associates the co-existence of a set of non-spatial features in a spatial neighborhood. For example, a co-location pattern can associate contaminated water reservoirs with a certain disease within 5 km distance from them. For a concrete definition of the problem, consider a number n of spatial datasets R1, R2, ..., Rn, such that each Ri contains objects that have a common non-spatial feature fi. For instance, R1 may store locations of water sources, R2 may store locations of appearing disease symptoms, etc. Given a distance threshold ε, two objects on the map (independent of their feature labels) are neighbors if their distance is at most ε. We can define a co-location pattern P by an undirected connected graph where each node corresponds to a feature and each edge corresponds to a neighborhood relationship between the corresponding features. Figure 1 shows examples of a star pattern, a clique pattern, and a generic one. A variable labeled with feature fi is only allowed to take instances of that feature as values. Variable pairs that should satisfy a spatial relationship (i.e., constraint) in a valid pattern instance are linked by an edge. In the representations of Fig. 1, we assume that there is a single constraint type (e.g., close to); however, in the general case, any spatial relationship could label each edge. Moreover, in the general case, a feature can label more than two variables. Patterns with more than one variable of the same label can be used to describe spatial autocorrelations on a map.

Interestingness measures (Huang et al. 2003; Shekhar and Huang 2001) for co-location patterns express the statistical significance of their instances. They can assist the derivation of useful rules that associate the instances of the features. The problem of mining association rules based on spatial relationships (e.g., adjacency, proximity, etc.) of events or objects was first discussed in Koperski and Han (1995). The spatial data are
Co-location Patterns, Algorithms, Fig. 1 Three pattern representations. (a) Star. (b) Clique. (c) Generic
with confidence 100%. For simplicity, in the rest of the discussion, fi ⇒ I will be used to denote rules that associate instances of feature fi with instances of feature sets I, fi ∉ I, within its proximity. For example, the rule above can be expressed by a ⇒ {b}. The mining process for feature a can be repeated for the other features (e.g., b and c) to discover rules having them on their left side (e.g., one can discover rule b ⇒ {a, c} with conf. 100%). Note that the features on the right-hand side of the rules are not required to be close to each other. For example, rule b ⇒ {a, c} does not imply that for each b the nearby instances of a and c are close to each other. In Fig. 2, observe that although b2 is close to instances a1 and a2 of a and instance c2 of c, c2 is neither close to a1 nor to a2.
A co-location clique pattern P of length k is described by a set of features {f1, f2, ..., fk}. A valid instance of P is a set of objects {o1, o2, ..., ok} : (∀ 1 ≤ i ≤ k, oi ∈ Ri) ∧ (∀ 1 ≤ i < j ≤ k, dist(oi, oj) ≤ ε). In other words, all pairs of objects in a valid pattern instance should be close to each other; equivalently, the closeness relationships between the objects should form a clique graph. Consider again Fig. 2 and the pattern P = {a, b, c}. {a1, b1, c1} is an instance of P, but {a1, b2, c2} is not.
Huang et al. (2003) and Shekhar and Huang (2001) define some useful measures that characterize the interestingness of co-location patterns. The first is the participation ratio pr(fi, P) of a feature fi in pattern P, which is defined by the following equation:

pr(fi, P) = (number of instances of fi that participate in some instance of P) / (total number of instances of fi).   (1)

Using this measure, one can define co-location rules that associate features with the existence of other features in their neighborhood. In other words, one can define rules of the form (label(o) = fi) ⇒ (o participates in an instance of P with confidence pr(fi, P)). These rules are similar to the ones defined in Koperski and Han (1995); the difference here is that there should be neighborhood relationships between all pairs of features on the right-hand side of the rule. For example, pr(b, {a, b, c}) = 0.5 implies that 50% of the instances of b (i.e., only b1) participate in some instance of pattern {a, b, c} (i.e., {a1, b1, c1}).
The prevalence prev(P) of a pattern P is defined by the following equation:

prev(P) = min{pr(fi, P), fi ∈ P}.   (2)

For example, prev({b, c}) = 2/3 since pr(b, {b, c}) = 1 and pr(c, {b, c}) = 2/3. The prevalence captures the minimum probability that whenever an instance of some fi ∈ P appears on the map, it will then participate in an instance of P. Thus, it can be used to characterize the strength of the pattern in implying co-locations of features. In addition, prevalence is monotonic; if P ⊆ P′, then prev(P) ≥ prev(P′). For example, since prev({b, c}) = 2/3, we know that prev({a, b, c}) ≤ 2/3. This implies that the a priori property holds for the prevalence of patterns, and apriori-like algorithms (Agrawal and Srikant 1994) can be generalized to mine them in a level-wise manner (Shekhar and Huang 2001).
Finally, the confidence conf(P) of a pattern P is defined by the following equation:

conf(P) = max{pr(fi, P), fi ∈ P}.   (3)

For example, conf({b, c}) = 1 since pr(b, {b, c}) = 1 and pr(c, {b, c}) = 2/3. The confidence captures the ability of the pattern to derive co-location rules using the participation ratio. If P is confident with respect to a minimum confidence threshold, then it can derive at least one co-location rule (for the attribute fi with pr(fi, P) = conf(P)). In Fig. 2, conf({b, c}) = 1 implies that we can find one feature in {b, c} (i.e., b), every instance of which participates in an instance of {b, c}.
Given a collection of spatial
Co-location Patterns, Algorithms, Fig. 3 A regular grid and some objects
Co-location Patterns, Algorithms, Fig. 4 An algorithm for reference feature co-locations
if there is any instance (i.e., object) within ε distance from oi; if there is, we add the corresponding feature to L. Finally, L will contain the maximal pattern that includes fi; for each subset of it we increase the support of the corresponding co-location rule. For more details about this process, the reader can refer to Zhang et al. (2004).
Overall, the mining algorithm requires two database scans, one for hashing and one for reading the partitions, performing the spatial joins, and counting the pattern supports, provided that the powerset of all features but fi can fit in memory. This is a realistic assumption for typical applications (with 10 or fewer feature types). Furthermore, it can be easily extended for arbitrary pattern graphs like those of Fig. 1b and c.
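The grid-hashing idea behind this process can be illustrated with the rough Python sketch below; it is only a plausible in-memory rendering of the description above, not the actual algorithm of Zhang et al. (2004), and the helper names, the 3 x 3 cell search, and the data layout are assumptions:

from collections import defaultdict
from itertools import combinations
from math import dist, floor

def grid_key(point, eps):
    # Cells of side eps: neighbors of a point can only lie in the
    # 3 x 3 block of cells around its own cell.
    return (floor(point[0] / eps), floor(point[1] / eps))

def reference_feature_supports(objects, ref_feature, eps):
    """objects: list of (feature_label, (x, y)).
    Returns support counts of rules ref_feature -> feature subset."""
    # Scan 1: hash every object into a grid cell (the "partitions").
    grid = defaultdict(list)
    for label, pt in objects:
        grid[grid_key(pt, eps)].append((label, pt))

    support = defaultdict(int)
    # Scan 2: for each instance of the reference feature, collect the set L
    # of other features that have some instance within eps.
    for label, pt in objects:
        if label != ref_feature:
            continue
        cx, cy = grid_key(pt, eps)
        L = set()
        for nx in (cx - 1, cx, cx + 1):
            for ny in (cy - 1, cy, cy + 1):
                for other_label, other_pt in grid.get((nx, ny), []):
                    if other_label != ref_feature and dist(pt, other_pt) <= eps:
                        L.add(other_label)
        # Every subset of the maximal pattern L gains one unit of support.
        for k in range(1, len(L) + 1):
            for subset in combinations(sorted(L), k):
                support[subset] += 1
    return support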
Key Applications

Sciences
Scientific data analysis can benefit from mining spatial co-location patterns (Salmenkivi 2004; Yang 2005). Co-location patterns in census data may indicate features that appear frequently in spatial neighborhoods. For example, residents of high income status may live close to areas of low pollution. As another example from geographical data analysis, a co-location pattern can associate contaminated water reservoirs with a certain disease in their spatial neighborhood. Astronomers may use spatial analysis to identify features that commonly appear in the same constellation (e.g., low brightness, similar colors). Biologists may identify interesting feature combinations appearing frequently in close components of protein or chemical structures.

Decision Support
Co-location pattern analysis can also be used for decision support in marketing applications. For example, consider an E-commerce company that provides different types of services such as weather, timetabling, and ticketing queries (Morimoto 2001). The requests for those services may be sent from different locations by (mobile or fixed-line) users. The company may be interested in discovering types of services that are requested by geographically neighboring users in order to provide location-sensitive recommendations to them for alternative products. For example, having known that ticketing requests are frequently asked close to timetabling requests, the company may choose to advertise the ticketing service to all customers that ask for a timetabling service.

Future Directions

Co-location patterns can be extended to include the temporal dimension. Consider, for instance, a database of moving objects, such that each object
is characterized by a feature class (e.g., private cars, taxis, buses, police cars, etc.). The movements of the objects (trajectories) are stored in the database as sequences of timestamped spatial locations. The objective of spatio-temporal co-location mining is to derive patterns composed by combinations of features like the ones seen in Fig. 1. In this case, each edge in the graph of a pattern corresponds to features that are close to each other (i.e., within distance ε) for a large percentage (i.e., large enough support) of their locations during their movement. An exemplary pattern is "ambulances are found close to police cars with a high probability". Such extended spatial co-location patterns including the temporal aspect can be discovered by a direct application of the existing algorithms. Each temporal snapshot of the moving objects database can be viewed as a segment of a huge map (that includes all frames) such that no two segments are closer to each other than ε. Then, the spatio-temporal co-location patterns mining problem is converted to the spatial co-locations mining problem we have seen thus far.
A more interesting (and more challenging) type of spatio-temporal collocation requires that the closeness relationship has a duration of at least τ time units, where τ is another mining parameter. For example, we may consider, as a co-location instance, a combination of feature instances (i.e., moving objects), which move closely to each other for τ continuous time units. To count the support of such durable spatio-temporal patterns, we need to slide a window of length τ along the time dimension and, for each position of the window, find combinations of moving objects that qualify the pattern. Formally, given a durable pattern P, specified by a feature-relationship graph (like the ones of Fig. 1) which has a node fi and distance/duration constraints ε and τ, the participation ratio of feature fi in P is defined as the ratio of window positions that define a sub-trajectory of at least one object of type fi which also defines an instance of the pattern. Prevalence and confidence in this context are defined by (2) and (3), as for spatial co-location patterns. The efficient detection of such patterns from historical data as well as their on-line identification from streaming spatio-temporal data are interesting problems for future research.
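As a hedged illustration of the sliding-window counting just described, the sketch below computes the durable participation ratio for a simple two-feature pattern; equally sampled trajectories, the dictionary layout, and the function name are assumptions made only for this example:

from itertools import product
from math import dist

def durable_participation_ratio(traj_a, traj_b, eps, tau):
    """Illustrative only: trajectories are dicts object_id -> list of (x, y)
    positions sampled at the same timestamps.  Returns the participation
    ratio of feature A in the durable pattern {A, B}: the fraction of
    window positions (length tau) at which at least one object of A stays
    within eps of some object of B for the whole window."""
    n = len(next(iter(traj_a.values())))          # number of timestamps
    positions = range(n - tau + 1)                # window start positions
    hits = 0
    for start in positions:
        window = range(start, start + tau)
        found = any(
            all(dist(a_pts[t], b_pts[t]) <= eps for t in window)
            for a_pts, b_pts in product(traj_a.values(), traj_b.values())
        )
        hits += found
    return hits / len(positions) if positions else 0.0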
Cross-References

Co-location Pattern
Patterns, Complex
Retrieval Algorithms, Spatial

References

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference on very large data bases, pp 487-499
Brinkhoff T, Kriegel HP, Seeger B (1993) Efficient processing of spatial joins using R-trees. In: Proceedings of the ACM SIGMOD international conference
Huang Y, Xiong H, Shekhar S, Pei J (2003) Mining confident co-location rules without a support threshold. In: Proceedings of the 18th ACM symposium on applied computing (ACM SAC)
Koperski K, Han J (1995) Discovery of spatial association rules in geographic information databases. In: Proceedings of the 4th international symposium on advances in spatial databases (SSD), vol 951, pp 47-66
Mamoulis N, Papadias D (2001) Multiway spatial joins. ACM Trans Database Syst 26(4):424-475
Morimoto Y (2001) Mining frequent neighboring class sets in spatial databases. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining
Munro R, Chawla S, Sun P (2003) Complex spatial relationships. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM)
Preparata FP, Shamos MI (1985) Computational geometry: an introduction. Springer, New York
Salmenkivi M (2004) Evaluating attraction in spatial point patterns with an application in the field of cultural history. In: Proceedings of the 4th IEEE international conference on data mining
Shekhar S, Huang Y (2001) Discovering spatial co-location patterns: a summary of results. In: Proceedings of the 7th international symposium on advances in spatial and temporal databases (SSTD)
Wang J, Hsu W, Lee ML (2005) A framework for mining topological patterns in spatio-temporal databases. In: Proceedings of the 14th ACM international conference on information and knowledge management. Full paper in IEEE Trans Knowl Data Eng 16(12), 2004
Yang H, Parthasarathy S, Mehta S (2005) Mining spatial object associations for scientific data. In: Proceedings of the 19th international joint conference on artificial intelligence
Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: Proceedings of the ACM SIGKDD conference
Zhang X, Mamoulis N, Cheung DWL, Shou Y (2004) Fast mining of spatial collocations. In: Proceedings of the ACM SIGKDD conference

Co-location Patterns, Interestingness Measures

Marko Salmenkivi
HIIT Basic Research Unit, Department of Computer Science, University of Helsinki, Helsinki, Finland

Synonyms

Association Measures; Co-location Patterns; Interestingness Measures; Selection Criteria; Significance Measures

Definition

Interestingness measures for spatial co-location patterns are needed to select from the set of all possible patterns those that are in some (quantitatively measurable) way characteristic for the data under investigation and, thus, possibly provide useful information.
Ultimately, interestingness is a subjective matter, and it depends on the user's interests, the application area, and the final goal of the spatial data analysis. However, there are properties that can be objectively defined, such that they can often be assumed as desirable. Typically, these properties are based on the frequencies of pattern instances in the data.
Spatial association rules, co-location patterns, and co-location rules were introduced to address the problem of finding associations in spatial data, and on a more general level, they are applications of the problem of finding frequent patterns in a spatial domain. Interestingness of a pattern in data is often related to its frequency, and that is the reason for the name of the problem.
In practice, a pattern is considered as interesting if the values of the interestingness measures (possibly only one) of the pattern exceed the thresholds given by the user.

Historical Background

Finding patterns in data and evaluating their interestingness has traditionally been an essential task in statistics. Statistical data analysis methods cannot always be applied to large data masses. For a more detailed discussion of the problems, see Scientific Fundamentals. Data mining, or knowledge discovery from databases, is a branch of computer science that arose in the late 1980s, when classical statistical methods could no longer meet the requirements of analysis of the enormously increasing amount of digital data. Data mining develops methods for finding trends, regularities, or patterns in very large datasets. One of the first significant contributions of data mining research was the notion of the association rule, and algorithms, e.g., Apriori (Agrawal and Srikant 1994), for finding all interesting association rules from transaction databases. Those algorithms were based on first solving the subproblem of frequent itemset discovery. The interesting association rules could easily be deduced from the frequent itemsets.
When applying association rules in the spatial domain, the key problem is that there is no natural notion of transactions, due to the continuous two-dimensional space. Spatial association rules were first introduced in Koperski and Han (1995). They were analogous to association rules with the exception that at least one of the predicates
in a spatial association rule expresses a spatial relationship (e.g., adjacent_to, within, close_to). The rules always contain a reference feature. Support and confidence were used as interestingness measures similarly to transaction-based association rule mining. Another transaction-based approach was proposed in Morimoto (2001): spatial objects were grouped into disjoint partitions. One of the drawbacks of the method is that different partitions may result in different sets of transactions and, thus, different values for the interestingness measures of the patterns. As a solution to the problem, co-location patterns in the context of the event-centric model were introduced in Shekhar and Huang (2001).

Scientific Fundamentals

Different models can be employed to model the spatial dimension, and the interpretation of co-location patterns as well as the interestingness measures are related to the selected model. The set of proposed models includes at least the window-centric model, the reference feature-centric model, the event-centric model, and the buffer-based model (Xiong et al. 2004).
Co-location patterns and co-location rules can be considered in the general framework of frequent pattern mining as pattern classes. Other examples of pattern classes are itemsets and association rules (in relational databases), episodes (in event sequences), strings, trees, and graphs (Mannila et al. 1995; Zaki 2002).
In the window-centric model the space is discretized by a uniform grid, and the set of all the possible windows of size k × k form the set of transactions. The items of the transaction are the features present in the corresponding window. Thus, support can be used as the interestingness measure. The interpretation of the confidence of the rule A → B is the conditional probability of observing an instance of B in an arbitrary k × k window, given that an instance of feature A occurs in the window.
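A minimal sketch of window-centric support and confidence is given below; for simplicity it treats each grid cell as one window (an exhaustive enumeration of all k × k window placements would use the same counting logic), and the names and data layout are assumptions:

from math import floor

def window_transactions(points, k):
    """points: list of (feature_label, (x, y)).  Each k-sized cell of a
    uniform grid becomes one 'transaction' holding the features present."""
    cells = {}
    for label, (x, y) in points:
        cells.setdefault((floor(x / k), floor(y / k)), set()).add(label)
    return list(cells.values())

def support_and_confidence(transactions, A, B):
    """Support of {A, B} and confidence of the rule A -> B over the windows."""
    n = len(transactions)
    with_a = sum(1 for t in transactions if A in t)
    with_ab = sum(1 for t in transactions if A in t and B in t)
    support = with_ab / n if n else 0.0
    confidence = with_ab / with_a if with_a else 0.0
    return support, confidence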
The reference feature-centric model focuses on a specific Boolean spatial feature, and all the discovered patterns express relationships of the reference feature and other features. The spatial association rules introduced in Koperski and Han (1995) are based on selecting a reference feature and then creating transactions over space. Transactions make it possible to employ the interestingness measures introduced for transaction databases in the context of frequent itemset discovery: support of a feature set (analogously to the support of an itemset in transaction databases), and confidence (or conditional probability) of an association rule.
In the event-centric model introduced in Shekhar and Huang (2001), the spatial proximity of objects is modeled by using the notion of neighborhood. The neighborhood relation R(x, y), x, y ∈ O, where O is the set of spatial objects, is assumed to be given as input. The objects and the neighborhood relation can be represented as an undirected graph, where nodes correspond to objects, and an edge between nodes indicates that the objects are neighbors (see Fig. 1). A limitation of the event-centric model is that it can be used only when the objects are points.

Co-location Patterns, Interestingness Measures, Fig. 1 Examples of (row) instances of co-location patterns in the event-centric model
An advantage is that the pattern discovery is not restricted to patterns with a reference feature. Furthermore, no explicit transactions need to be formed. This fact also has consequences as to the choice of relevant interestingness measures. In a transaction-based model a single object can only take part in one transaction, whereas in the event-centric model it is often the case that a single object participates in several instances of a particular pattern.
Figure 1 shows an example. There are nine spatial point objects. The set of features consists of three features indicated by a triangle (denote it by A), circle (B), and rectangle (C). In this example only one feature is assigned to each object; in general there may be several of them. There are three instances of feature A, two instances of B, and four instances of C. The solid lines connect the objects that are neighbors. Cliques of the graph indicate the instances of co-location patterns. Hence, there is only one instance of pattern {ABC} containing all the features.
The participation ratio of a feature f in a co-location pattern P is the number of instances of the feature that participate in an instance of P divided by the number of all instances of f. For instance, in the example data on the left panel of Fig. 1 the participation ratio of feature A in pattern {AB}, pr(A, {AB}) = 2/3, since two out of three instances of feature A also participate in instances of {AB}. Correspondingly pr(B, {AB}) = 2/2 = 1, since there is no instance of B that is not participating in {AB}. The objects on the right panel of Fig. 1 are equal to those of the left panel, except for an additional point with feature B. Now, there are two different points with feature B such that they both are neighbors of the same instance of A. The instances of pattern {A, B} have been indicated by the dashed lines. Thus, one instance of A participates in two instances of {A, B}. The participation ratios are equal to the left-side case: pr(A, {AB}) = 2/3 and pr(B, {AB}) = 3/3 = 1.
Prevalence of a co-location pattern is defined as prev(P) = min{pr(f, P), f ∈ P}. A co-location pattern is prevalent if its prevalence exceeds the user-specified threshold value. Prevalence is a monotonous interestingness measure with respect to the pattern class of co-location patterns, since adding features to P can clearly only decrease prev(P).
Let P and Q be co-location patterns, and P ∩ Q = ∅. Then P → Q is a co-location rule. The confidence (or conditional probability) of P → Q (in a given dataset) is the fraction of instances of P such that they are also instances of P ∪ Q. A co-location rule is confident if the confidence of the rule exceeds the user-specified threshold value. A sufficiently high prevalence of a co-location pattern indicates that the pattern can be used to generate confident co-location rules. Namely, assume that the user-specified confidence threshold for interesting co-location rules is min_conf. Then, if prev(P) ≥ min_conf, the rule f → P \ {f} is confident for all f ∈ P.
In the example of Fig. 1 the prevalence prev(AB) = min(2/3, 1) = 2/3. Thus, one can generate rules A → B, the confidence of the rule being 2/3, and B → A (confidence 1).
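The rule confidence used above can be evaluated by brute force on the event-centric neighbor graph, as in the hedged sketch below; object identifiers, the encoding of the neighborhood relation as a set of pairs, and the helper names are assumptions for illustration:

from itertools import product

def clique_instances(pattern, objects_by_feature, neighbors):
    """Instances of a pattern in the event-centric model: one object per
    feature, every pair connected in the neighborhood relation."""
    feats = sorted(pattern)
    rows = []
    for combo in product(*(objects_by_feature[f] for f in feats)):
        if all(frozenset((a, b)) in neighbors
               for i, a in enumerate(combo) for b in combo[i + 1:]):
            rows.append(frozenset(combo))
    return rows

def rule_confidence(P, Q, objects_by_feature, neighbors):
    """conf(P -> Q): fraction of instances of P that extend to P union Q."""
    inst_p = clique_instances(P, objects_by_feature, neighbors)
    inst_pq = clique_instances(set(P) | set(Q), objects_by_feature, neighbors)
    if not inst_p:
        return 0.0
    extended = sum(1 for ip in inst_p if any(ip <= ipq for ipq in inst_pq))
    return extended / len(inst_p)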
Another interestingness measure proposed for co-location patterns is the maximum participation ratio (MPR). Prevalence of a pattern is the minimum of the participation ratios of its features, whereas MPR is defined as the maximum of them. Correspondingly, a sufficiently high MPR implies that at least one of the features, denote it by T, rarely occurs outside P. Hence, the co-location rule {T} → P \ {T} is confident (Huang et al. 2003). The motivation of using the MPR is that rare features can more easily be included in the set of interesting patterns.
A drawback of MPR is that it is not monotonous. However, a weaker property ("weak monotonicity") can be proved for MPR. This property is utilized in Huang et al. (2003) to develop a level-wise search algorithm for mining confident co-location rules.
The buffer-based model extends the co-location patterns to polygons and line strings (Xiong et al. 2004). The basic idea is to introduce a buffer, which is a zone of a specified distance, around each spatial object. The boundary of the buffer is the isoline of equal distance to the edge of the objects (see Fig. 2).
Co-location Patterns, Interestingness Measures, Fig. 2 Examples of neighborhoods in the buffer-based model
The (Euclidean) neighborhood N(o) of an object o is the area covered by its buffer. The (Euclidean) neighborhood of a feature f is the union of N(oi), where oi ∈ Of, and Of is the set of instances of f. Further, the (Euclidean) neighborhood N(C) for a feature set C = {f1, f2, ..., fn} is defined as the intersection of N(fi), fi ∈ C.
The coverage ratio Pr(C), where C = {f1, f2, ..., fn} is a feature set, is defined as N(C)/Z, where Z is the total size of the investigation area. Intuitively, the coverage ratio of a set of features measures the fraction of the investigation area that is influenced by the instances of the features.
The coverage ratio is a monotonous interestingness measure in the pattern class of co-location patterns in the buffer-based model, with respect to the size of the co-location pattern (Xiong et al. 2004). Now in the buffer-based model the conditional probability (confidence) of a co-location rule P → Q expresses the probability of finding the neighborhood of Q in the neighborhood of P. Due to the monotonicity of the coverage ratio, it can be computed as N(P ∪ Q)/N(P). Xiong et al. also demonstrate that the definition of conditional probability (confidence) of a co-location rule in the event-centric model does not satisfy the law of compound probability: it is possible that Prob(BC|A) ≠ Prob(C|AB)·Prob(B|A), where Prob(BC|A) is equal to the confidence of the rule A → BC. They show, however, that in the buffer-based model this law holds.
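A rough sketch of the buffer-based computations, assuming the shapely package and illustrative geometries, is shown below; it is not the implementation of Xiong et al. (2004), only one way the coverage ratio and the rule confidence could be evaluated:

from shapely.ops import unary_union

def feature_neighborhood(geometries, buffer_dist):
    """N(f): union of the buffers of all instances of a feature."""
    return unary_union([g.buffer(buffer_dist) for g in geometries])

def coverage_ratio(feature_sets, buffer_dist, study_area):
    """Pr(C) = area(N(C)) / Z: intersect the per-feature neighborhoods and
    divide by the area Z of the study region."""
    hoods = [feature_neighborhood(geoms, buffer_dist) for geoms in feature_sets]
    n_c = hoods[0]
    for h in hoods[1:]:
        n_c = n_c.intersection(h)
    return n_c.area / study_area.area

def rule_conditional_probability(P_sets, Q_sets, buffer_dist, study_area):
    """Confidence of P -> Q in the buffer-based model: area(N(P u Q)) / area(N(P))."""
    n_p = coverage_ratio(P_sets, buffer_dist, study_area) * study_area.area
    n_pq = coverage_ratio(P_sets + Q_sets, buffer_dist, study_area) * study_area.area
    return n_pq / n_p if n_p else 0.0

if __name__ == "__main__":
    from shapely.geometry import Point, box
    area = box(0, 0, 10, 10)
    A = [Point(2, 2), Point(8, 8)]
    B = [Point(2.5, 2.5)]
    print(coverage_ratio([A, B], 1.0, area))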
Statistical Approaches
An essential difference in the viewpoints of spatial statistics and co-location pattern mining is that in statistics the dataset is considered as a sample. The aim in statistics is typically to infer, based on the sample, knowledge of properties of the "reality", that is, the phenomenon that generated the data. The goal of co-location pattern mining is to find descriptions of the data; that is, only the content of the available database is the object of investigation. In a sense, statistical analysis is more ambitious. However, sophisticated statistical data analysis methods cannot always be applied to large data masses. This may be due to the lack of computational resources, expert knowledge, or other human resources needed to preprocess the data before statistical analysis is possible.
Furthermore, depending on the application, treating the content of a spatial database as a sample may be relevant, or not. Consider, for instance, roads represented in a spatial database. Clearly, it is usually the case that (practically) all of them are included in the database, not only a sample. On the other hand, in an ecological database that includes the known locations of nests of different bird species, it is obvious that not all the nests have been observed, and thus a part of the information is missing from the database. Another example is a linguistic database that contains dialect variants of words in different regions. Such variants cannot in practice be exhaustively recorded everywhere, and, thus, the data in the database is a sample.
Statistical analysis of spatial point patterns is closely related to the problem of finding interesting co-location patterns (see, e.g., Bailey and Gatrell 1995; Diggle 1983). In statistics, features are called event types, and their instances are events. The set of events in the investigation area form a spatial point pattern. Point patterns
of several event types (called marked point patterns) may be studied, for instance, to evaluate spatial correlation (either positive, i.e., clustering of events, or negative, i.e., repulsion of events). Analogously, the point pattern of a single event type can be studied for evaluating possible spatial autocorrelation, that is, clustering or repulsion of the events of the event type.
In order to evaluate spatial (auto)correlation, point patterns, that is the data, are modeled as realizations (samples) generated by spatial point processes. A spatial point process defines a joint probability distribution over all point patterns. The most common measures of spatial correlation in point patterns are the G(h)- and K(h)-functions. For a single event type, the value of the G(h)-function in the data is the number of events such that the closest other event is within a distance less than h, divided by the number of all events. For two event types, instead of the closest event of the same type, the closest event of the other event type is considered. Thus, the confidence of the co-location rule A → B, where A and B are single features in the event-centric model, is equal to the value of the G_{A,B}(h)-function in the data, when the neighborhood relation is defined as the maximum distance of h between objects.
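An empirical version of the bivariate G-function can be computed directly from two event lists, as in the short sketch below (a brute-force illustration; the function name and the strict inequality follow the wording above):

from math import dist

def g_ab(events_a, events_b, h):
    """Empirical G_{A,B}(h): fraction of events of type A whose nearest
    event of type B lies within distance h (toy, brute-force version)."""
    if not events_a or not events_b:
        return 0.0
    close = sum(
        1 for a in events_a
        if min(dist(a, b) for b in events_b) < h
    )
    return close / len(events_a)

# Under the event-centric model with neighborhood "distance at most h",
# g_ab(...) corresponds to the confidence of the co-location rule A -> B.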
The statistical framework implies that the relationship of the phenomenon and the data, which is a sample, has to be modeled in some way. In spatial statistics, the interestingness measures can be viewed from several perspectives, depending on the statistical framework and the methods used in the data analysis. One of the most common frameworks is hypothesis testing.
Hypothesis testing sets up a null hypothesis, typically assuming no correlation between features, and an alternative hypothesis that assumes spatial correlation. A test statistic, e.g., the G(h)- or K(h)-function, for measuring spatial correlation is selected; denote it by T. The value of the test statistic in the data, denote it by t, is compared against the theoretical distribution of the test statistic, assuming that the null hypothesis holds. Then, a natural interestingness measure of the observed spatial correlation is based on the so-called p-value, which is defined as Pr(T > t | H0). The smaller the p-value, the smaller the probability that the observed degree of spatial correlation could have occurred by chance. Thus, the correlation can be interpreted as interesting if the p-value is small. If the p-value is not greater than a predefined α, the deviation is defined to be statistically significant with the significance level α.
The correlation patterns introduced in Salmenkivi (2006) represent an intermediate approach between spatial point pattern analysis and co-location pattern mining. Correlation patterns are defined as interesting co-location patterns (in the event-centric model) of the form A → B, where A and B are single features. The interestingness is determined by the statistical significance of the deviation of the observed G(h)-value from a null hypothesis assuming no spatial correlation between features A and B.

Key Applications

Large spatial databases and spatial datasets. Examples: digital road maps (Shekhar and Ma), census data (Malerba et al. 2001), place name data (Leino et al. 2003; Salmenkivi 2006).

Future Directions

A collection of interesting patterns can be regarded as a summary of the data. However, the pattern collections may be very large. Thus, condensation of the pattern collections and pattern ordering are important challenges for research on spatial co-location patterns.
Co-location patterns and rules are local in the sense that, given a pattern, only the instances of the features that appear in the pattern are taken into account when evaluating the interestingness of the pattern. However, the overall distribution and density of spatial objects and features may, in practice, provide significant information as to the interestingness of a pattern. This challenge is to some extent related to the challenge of integrating statistical and data mining approaches.
Complex Event Processing

Data Stream Systems, Empowering with Spatiotemporal Capabilities

Components; Reuse

Smallworld Software Suite

Computer Environments for GIS and CAD

Definition

In the last few decades, computing environments have evolved to accommodate the need for integrating the separate, and often incompatible, processes of Geographic Information Systems (GIS) and Computer Assisted Design (CAD). This chapter will explore the evolution of GIS and CAD computing environments, from desktop to Web and finally to wireless, along with the industry requirements that prompted these changes.
and maintained by importing data from CAD drawings. Graphic representations of layers of a formation, such as water, sewer, roads, and parcels, are imported into the GIS using the file-based method.
To better merge the CAD world with the GIS world, a partnership was formed between Autodesk, Inc. and ESRI, leading to the creation of ArcCAD®. ArcCAD was built on AutoCAD and enabled users to create GIS layers and to convert GIS layers into CAD objects. This tool also facilitated data cleanup and the attachment of attributes. Because ArcCAD enabled GIS data to be shared with a greater number of people, the data itself became more valuable.
Although ArcCAD solved some of the integration problems between CAD and GIS, it still did not provide full GIS or CAD functionality. For example, overlay analysis still had to be performed in ArcInfo®, and arcs and splines were not available in the themes created by ArcCAD.
In order to provide a fully functional GIS built on the AutoCAD platform, Autodesk developed AutoCAD Map® (now called Autodesk Map®), which made it simple for a CAD designer to integrate with external databases, build topology, perform spatial analysis, and utilize data cleaning, without file translation or lost data. In AutoCAD Map, lines and polygons were topologically intelligent with regard to abstract properties such as contiguity and adjacency. Since DWG files were already file-based packets of information, they became GIS-friendly when assigned topology and connected to databases. Precision was enforced instantly, since the DWG files could now store coordinate systems and perform projections and transformations. AutoCAD Map represented the first time a holistic CAD and GIS product was available for the PC Workstation environment.
Although AutoCAD Map could import and export the standard GIS file types (circa 1995: ESRI SHP, ESRI Coverage, ESRI E00, Microstation DGN, MapInfo MID/MIF, Atlas BNA), users began to request real-time editing of layers from third-party GIS files. To meet this demand, Autodesk created a new desktop GIS/CAD product called Autodesk World®. World was designed for users who were not GIS professionals or AutoCAD engineers, and offered the basic tools of both systems: precision drafting and the capability to query large geospatial data and perform rudimentary analysis and reports.
World used a Microsoft Office interface to access and integrate different data types, including geographic, database, raster, spreadsheet, and images, and supported Autodesk DWG as a native file format, increasing the value of maps created in AutoCAD and AutoCAD Map. World enabled users to open disparate GIS data files simultaneously and perform analysis regardless of file type. Autodesk World could access, analyze, edit, and save data in all the standard formats without import or export.
Although Autodesk World represented a real breakthrough in integrating GIS and CAD files, it lacked an extensive CAD design environment. AutoCAD was still the CAD environment of choice, and AutoCAD Map continued to offer better integration of GIS within a full CAD environment. Autodesk World filled a need, much like other desktop GIS solutions at the time, but there was still a gap between the CAD design process and analysis and mapping within the GIS environment.
In the same time period, AutoCAD Map continued to evolve its GIS capabilities for directly connecting, analyzing, displaying, and theming existing GIS data (in SDE, SHP, DGN, DEM, and raster formats, for example) without import or export. In support of the Open GIS data standard, AutoCAD Map could read OpenGIS information natively. GIS and CAD integration continues to be one of the key features of AutoCAD Map.

CAD and GIS During the Web Phase
The next significant inflection point in technology was the World Wide Web, which increased the number of users of spatial data by an order of magnitude. With the advent of this new technology and communication environment, more people had access to information than ever before.
Initially, CAD and GIS software vendors responded to the development of the Web by Web-enabling existing PC applications. These Web-enabled applications offered the ability to assign
Universal Resource Locators (URLs) to graphic objects or geographic features, such as points, lines, and polygons, and enabled users to publish their content for viewing in a browser as an HTML (Hypertext Markup Language) page and a series of images representing maps or designs. Software developers also Web-enabled CAD and GIS software by providing a thin client or browser plug-in, which offered rich functionality similar to the original application.

CAD for the Web
In the early Web era, slow data transfer rates required thin clients and plug-ins to be small (less than one megabyte) and powerful enough to provide tools such as pan and zoom. In light of this, Autodesk developed a CAD plug-in called Whip!, which was based on AutoCAD's ADI video driver.
Although the Whip! viewer today has evolved into the Autodesk DWF Viewer, the file format, DWF (Design Web Format), remains the same. DWF files can be created with any AutoCAD-based product, including AutoCAD Map, and the DWF format displays the map or design on the Web as it appears on paper. DWF files are usually much smaller than the original DWGs, speeding their transfer across the Web. With the development of DWF, Internet users had access to terabytes of information previously available only in DWG format. This was a milestone in information access.
From a GIS perspective, 2D DWF files were useful strictly for design and did not represent true coordinate systems or offer GIS functionality. Although Whip!-based DWF was extremely effective for publishing digital versions of maps and designs, GIS required a more comprehensive solution.
Note: Today, DWF is a 3D format that supports coordinate systems and object attributes.

GIS for the Web
As the Web era progressed, it became clear that a simple retrofit of existing applications would not be sufficient for Web-enabled GIS. In 1996, Autodesk purchased MapGuide® from Argus Technologies. MapGuide viewer was a browser plug-in that could display full vector-format GIS data streamed from an enormous repository using very little bandwidth. Each layer in MapGuide viewer could render streamed data from different MapGuide Servers around the Internet. For example, road layers could be streamed directly from a server in Washington, DC, while the real-time location of cars could be streamed directly from a server in Dallas, Texas. MapGuide managed its performance primarily with scale-dependent authoring techniques that limited the amount of data based on the current scale of the client map.
MapGuide could perform basic GIS functions such as buffer and selection analysis, as well as address-matching navigation with zoom-goto. One of the more powerful aspects of MapGuide was the generic reporting functionality, in which MapGuide could send a series of unique IDs of selected objects to any generic Web page for reporting. Parcels, for example, could be selected in the viewer and the Parcel IDs could be sent to a server at City Hall that had the assessment values. A report was returned, as a Web page, containing all the information about the selected parcels. Again, the report could reside on any server, anywhere. The maps in MapGuide were just stylized pointers to all the potential servers around the Internet, containing spatial and attribute data. MapGuide was revolutionary at the time, and represented, in the true sense, applications taking advantage of the distributed network called the Web.
MapGuide continued to evolve, using ActiveX controls for Microsoft Internet Explorer, a plug-in for Netscape, and a Java applet that could run on any Java-enabled browser. Initially, MapGuide used only its own file format, SDF, for geographic features. Later, MapGuide could natively support DWG, DWF, SHP, Oracle Spatial, and ArcSDE.
Although MapGuide was an extremely effective solution, it could run only on Microsoft Windows servers. The development of MapGuide OpenSource and Autodesk MapGuide Enterprise was inspired by the need to move toward a neutral server architecture and a plug-in-free client experience. MapGuide could now be used either without a plug-in or with the newest DWF Viewer as a thin client.
Within AutoCAD Map, users could now publish directly to the MapGuide Server and maintain the data dynamically, further closing the GIS-CAD gap.

CAD and GIS During the Wireless Phase
Wireless CAD and GIS marked the beginning of the next inflection point on the information technology curve, presenting a new challenge for GIS and CAD integration. Since early wireless Internet connection speeds were quite slow (approximately one quarter of wired LAN speed), Autodesk initially decided that the best method for delivering data to handheld devices was "sync and go", which required physically connecting a handheld to a PC and using synchronization software to transfer map and attribute data to the device. GIS consumers could view this data on their mobile devices in the field without being connected to a server or desktop computer. Since handheld devices were much less expensive than PCs, mobile CAD and GIS further increased the number of people who had access to geospatial information.

Wireless CAD
Autodesk OnSite View (circa 2000) allowed users to transfer a DWG file to a Palm-OS handheld and view it on the device. When synchronized, the DWG file was converted to an OnSite Design file (OSD), and when viewed, allowed users to pan, zoom, and select features on the screen. With the advent of Windows CE support, OnSite View allowed redlining, enabling users to mark up a design without modifying the original. Redlines were saved as XML (Extensible Markup Language) files on the handheld and were transferred to the PC on the next synchronization or docking. These redline files could be imported into AutoCAD, where modifications to the design could be made.
Autodesk OnSite View could be considered more mobile than wireless, since no direct access to the data was available without connecting the mobile device to the PC. OnSite View filled a temporary niche before broadband wireless connections became available.

Wireless GIS and Location-Based Services
Initially, the mobile GIS solution at Autodesk was OnSite Enterprise, which leveraged the mobility of OnSite and the dynamism of MapGuide. OnSite Enterprise created handheld MapGuide maps in the form of OSD files that users could simply copy off the network and view on their mobile devices with OnSite.
In 2001, when true broadband wireless came on the horizon, Autodesk created a new corporate division focused solely on Location-Based Services (LBS). The burgeoning Wireless Web required a new type of software, designed specifically to meet the high transaction volume, performance (40+ transactions per second), and privacy requirements of wireless network operators (WNOs). The next technological inflection point had arrived, where maps and location-based services were developed for mass-market mobile phones and handheld devices.
Autodesk Location Services created LocationLogic, a middleware platform that provides infrastructure, application services, content provisioning, and integration services for deploying and maintaining location-based services. The LocationLogic platform was built by the same strong technical leadership and experienced subject matter experts that worked on the first Autodesk GIS products. The initial version of LocationLogic was a core Geoserver specifically targeted for wireless and telecom operators that required scalability and high-volume transaction throughput without performance degradation.
The LocationLogic Geoserver was able to provide:
Point of Interest (POI) queries
Geocoding and reverse geocoding
Route planning
Maps
Integrated user profile and location triggers
Points of Interest (POIs) usually comprise a set of businesses that are arranged in different categories. POI directories, which can include hundreds of categories, are similar to Telecom Yellow Pages, but with added location intelligence.
Common POI categories include Gas Stations, Hotels, Restaurants, and ATMs, and can be customized for each customer. Each listing in the POI tables is spatially indexed so users can search for relevant information based on a given area or the mobile user's current location.
Geocoding refers to the representation of a feature's location or address in coordinates (x, y) so that it can be indexed spatially, enabling proximity and POI searches within a given area. Reverse geocoding converts x, y coordinates to a valid street address. This capability allows the address of a mobile user to be displayed once their phone has been located via GPS or cell tower triangulation. Applications such as "Where am I?" and friend or family finders utilize reverse geocoding.
Route planning finds the best route between two or more geographical locations. Users can specify route preferences, such as the shortest path based on distance, the fastest path based on speed limits, and routes that avoid highways, bridges, tollways, and so on. Other attributes of route planning include modes of transportation (such as walking, subway, car), which are useful for European and Asian countries.
The maps produced by the LocationLogic Geoserver are actually authored in Autodesk MapGuide. Although the Geoserver was built from the ground up, LocationLogic was able to take advantage of MapGuide's effective mapping software.
LocationLogic also supports user profiles for storing favorite routes or POIs. Early versions of LocationLogic also allowed applications to trigger notifications if the mobile user came close to a restaurant or any other point of interest. This capability is now used for location-based advertising, child zone notifications, and so on.

Key Applications

Early LBS applications built on LocationLogic included traffic alerts and friend finder utilities. For example, Verizon Wireless subscribers could receive TXT alerts about traffic conditions at certain times of day and on their preferred routes. Friend finder utilities alerted the phone user that people on their list of friends were within a certain distance of the phone.
More recently, Autodesk Location Services has offered two applications built on LocationLogic that can be accessed on the cell phone and via a Web browser: Autodesk Insight and Autodesk Family Minder.
Autodesk Insight is a service that enables any business with a PC and Web browser to track and manage field workers who carry mobile phones. Unlike traditional fleet tracking services, Insight requires no special investment in GPS hardware. Managers and dispatchers can view the locations of their staff, determine the resource closest to a customer site or job request, and generate turn-by-turn travel instructions from the Web interface. Managers can also receive alerts when a worker arrives at a given location or enters or leaves a particular zone. Reports on travel, route histories, and communications for groups or individuals, over the last 12 or more months, can be generated from the Web interface.
Family Minder allows parents and guardians to view the real-time location of family members from a Web interface or their handset. Parents and guardians can also receive notifications indicating that a family member has arrived at or left a location. The recent advances in mobile phone technology, such as sharper displays, increased battery life, and strong processing power, make it possible for users to view attractive map displays on regular handsets.

Enterprise GIS: Workstation, Web and Wireless Synergy
In 1999, Autodesk acquired VISION*®, along with its expertise in Oracle and enterprise GIS integration. This was a turning point for Autodesk GIS. File-based storage of information (such as DWG) was replaced with enterprise database storage of spatial data. Currently, Autodesk has integrated VISION* into its development, as seen in Autodesk GIS Design Server. Autodesk Topobase, which also stores its data in Oracle, connects to AutoCAD Map and MapGuide to provide enterprise GIS Public Works and Municipal solutions.
Computer Environments for GIS and CAD, Fig. 1 Technological inflection points along the information technology curve: exponential jumps in access to geospatial information
MapGuide and AutoCAD Map support Oracle Spatial and Locator, which allow all spatial data to be stored in a central repository. All applications can view the data without duplication and reliance on file conversion. AutoCAD Map users can query as-built information from the central repository for help in designs, and any modifications are saved and passed to GIS users. The central GIS database can also be published and modified from Web-based interfaces, such as MapGuide. Real-time wireless applications, such as Autodesk Insight, can use the repository for routing and mobile resource management.

Summary
At each technological inflection point (workstation, Web, and wireless), Autodesk has leveraged infrastructural changes to exponentially increase the universe of potential consumers of geospatial information. The shift from minicomputer to PC saw Autodesk create AutoCAD and AutoCAD Map to enable sharing of geographic and design information. The next inflection point, workstation to Web, spurred another jump in the number of spatial data consumers, and the CAD and GIS gap continued to close. The most recent inflection point, Web to wireless, saw the number of spatial data users reach a new high, as GIS applications were embedded in the users' daily tools, such as cell phones (see Fig. 1). At this point in the technology curve, the need for synergy between CAD and GIS is apparent more than ever. Since the value of spatial data increases exponentially with the number of users who have access to it, Autodesk's enterprise GIS solution, with its centralized spatial database, provides significant value to a wide variety of spatial data consumers.
Autodesk has a history of leveraging inflection points along the computing and communication technology curve to create exciting and innovative solutions. For over two decades, Autodesk's mission has been to spearhead the democratization of technology by dramatically increasing the accessibility of heretofore complex and expensive software. This philosophy has been pervasive in the GIS and LBS solutions that it has brought to a rapidly growing geospatial user community.
Computer Environments for GIS and CAD, Fig. 2 Future technological inflection point (continued from Fig. 1): SOA for GIS and application mashups
Recommended Reading

Autodesk Geospatial. http://images.autodesk.com/adsk/files/autodesk_geospatial_white_paper.pdf
Autodesk Inc. (2007) Map 3D 2007 essentials. Autodesk Press, San Rafael
Barry D (2003) Web services and service-orientated architectures, the Savvy Manager's guide, your road map to emerging IT. Morgan Kaufmann Publishers, San Francisco
Best practices for managing geospatial data. http://images.autodesk.com/adsk/files/%7B574931BD-8C29-4A18-B77C-A60691A06A11%7D_Best_Practices.pdf
CAD and GIS critical tools, critical links: removing obstacles between CAD and GIS professionals. http://images.autodesk.com/adsk/files/3582317_CriticalTools0.pdf
Hjelm J (2002) Creating location services for the wireless Web. Professional developer's guide series. Wiley Computer Publishing, New York
Jagoe A (2003) Mobile location services, the definitive guide. Prentice Hall, Upper Saddle River
Kolodziej K, Hjelm J (2006) Local positioning systems: LBS applications and services. Taylor and Francis Group, CRC Press, Boca Raton
Laurini R, Thompson D (1992) Fundamentals of spatial information systems. The APIC series. Academic, London/San Diego
Longley P, Goodchild M, Maguire D, Rhind D (1999) Geographical information systems, 2nd edn. Principles and technical issues, vol 1; Management issues and applications, vol 2. Wiley, New York
Sharma C (2001) Wireless internet enterprise applications. Wiley tech brief series. Wiley Computer Publishing, New York
Schiller J, Voisard A (2004) Location-based services. Morgan Kaufmann, San Francisco
Plewe B (1997) GIS online: information retrieval, mapping and the internet. OnWord Press, Santa Fe
Vance D, Smith C, Appell J (1998) Inside Autodesk World. OnWord Press, Santa Fe
Vance D, Walsh D, Eisenberg R (2000) Inside AutoCAD Map 2000. Autodesk Press, San Rafael
Vance D, Eisenberg R, Walsh D (2000) Inside AutoCAD Map 2000, the ultimate how-to resource and desktop reference for AutoCAD Map. OnWord Press, Florence

Computer Supported Cooperative Work

Geocollaboration

Computer Vision Augmented Geospatial Localization

Ashish Gupta
Department of Civil, Environmental, and Geodetic Engineering, Ohio State University, Columbus, OH, USA

Synonyms

Autonomous navigation; Global navigation satellite systems; GPS-denied geo-localization; Simultaneous localization and mapping; Unmanned aerial vehicles; Visual odometry

Definition

Geospatial localization is the estimation of geographic location using, in part, geospatial analysis. Geospatial analysis uses statistical and other analytic techniques for data that has a geographic or spatial context to it, typically available in geographic information systems (GIS). Geographic location is typically ascertained using Global Navigation Satellite Systems (GNSS) like GPS and GLONASS, which require simultaneous line-of-sight connection with multiple satellites to estimate location within an error margin of a few meters. These constraints limit the use of GNSS-based localization to outdoor settings with few obstructing structures in close proximity and a tolerance for uncertainty in exact location. In addition to these constraints, in many environments such as indoors, urban canyons, under dense foliage, underwater, and underground, there is limited or no GPS access. Besides these naturally occurring constraints, GPS access can be easily blocked by jamming, spoofing, and other GPS-denial threats in adversarial environments. Consequently, for positioning, navigation, and timing (PNT) applications, GPS must be augmented or supplanted by other sensors and systems. In such cases GPS is used for an approximate localization within a geographic region, which can range from tens to thousands of square
meters based on the environment. Alternate techniques are used to ascertain exact location within this geographic region. Simultaneous localization and mapping (SLAM) techniques are popularly used in robotics to estimate the location of a robot in real time based on information acquired from the environment using sensors on board the robot (Lategahn et al. 2011). It is typical to use multiple sensors like inertial measurement units (IMU), single or double video cameras for monocular or stereo vision, light detection and ranging (LIDAR), and sound navigation and ranging (SONAR). The choice of sensor suite is based on the environment (aerial, ground, underwater, indoor), the type of mobile platform, and the performance, processing, and cost budget. Vision-based sensors are among the most popular in SLAM techniques since they are informative and cost-effective. In addition to acquiring sensor information of its vicinity and estimating its position, SLAM also builds a map of the geographic region in real time. There are different types of maps. However, with navigation being the principal objective, topological maps are most relevant. A topological map focuses on the connectivity between important entities in the environment with disregard to their exact location (Paul and Newman 2010). Metric mapping can be used in conjunction with topological maps to compute a topometric map which is used to compute exact localization within that geographic region (Badino et al. 2011). This technique records sensor information with its registered location in a map database. Subsequently, while moving through that geographic region, sensor information can be used as a query to the recorded map database to retrieve geospatial location in real time.

Historical Background

vessels while cruising oceans. Such systems are too inaccurate for the precision demands of present-day PNT-dependent systems. To overcome the accuracy issues of GNSS for precision aviation, a Local Area Augmentation System (LAAS) is used for precision aircraft landing in all weather conditions (Enge 1999). A VHF signal link from airport transmitters is used by aircraft to correct the GPS signal for precise localization. Cellular-capable devices can use Assisted GPS (A-GPS) for improved localization using information provided by the cellular network in conjunction with the satellite signal for a quicker estimation of location. Localization using cellular tower triangulation is another alternative with a reasonable error of several tens of meters, but it is only feasible outdoors in urban areas. For indoor navigation, the IEEE 802.11 wireless LAN (WLAN) location tracking system is an option (Emery and Denko 2007). It uses received signal strength indication (RSSI) on mobile devices and estimates location by comparison with a precomputed database of RSSI measurements in that indoor environment. It accounts for signal propagation loss and can provide accuracy of a few meters.
These PNT systems are currently operational but are encumbered with high infrastructure cost and have other limitations like availability exclusively in urban environments. Moreover, they have a natural limitation of localization accuracy. In comparison, computer vision-based pose estimation techniques used in robotics for SLAM have a comparatively high localization accuracy, but have historically been used for mapping small-sized environments. However, success in the DARPA Grand Challenges for autonomous navigation of driverless vehicles over large distances using SLAM-based techniques established the viability of computer vision augmented geospatial localization as a viable PNT alternative in GPS-denied or GPS-degraded environments (Thrun et al. 2006).
(Fig. 1 shows feature points P_j tracked across successive camera images k-1, k, and k+1, and compares the vehicle trajectory recovered from wheel odometry with that recovered from visual odometry (SURF); axes in mm, scale bar 10 m.)
Computer Vision Augmented Geospatial Localization, Fig. 1 Estimation of the pose of sensors onboard a moving vehicle. The trajectory of the moving vehicle is estimated based on sensor locations in current and previous frames. Localization accuracy depends on the type of sensor and the type of features computed from the data stream. The graph is illustrative of the difference between trajectories recovered using different types of odometry techniques

Computer Vision Augmented Geospatial Localization, Fig. 2 Map data is abstracted as a graph data structure. The transport network layer from OpenStreetMap of Washington, DC, is abstracted as a graph, where edges typically represent roads and nodes represent intersections and other relevant points in the map
In metric localization, the location is estimated by computing the coordinates of the location of the sensor on board the vehicle. These could be geographic coordinates of latitude and longitude. The coordinates of the vehicle pose are typically computed by triangulation, using methods like structure from motion (SfM) (Koenderink and van Doorn 1991) or Visual Odometry (VO) (Alonso et al. 2012). An illustrative example is shown in Fig. 1. The pose of the sensor is estimated based on matching features across different frames in the data stream of a moving vehicle. The sequence of poses is used to estimate a 3D trajectory of this moving vehicle. In topological localization, the position of the sensor on board the vehicle is retrieved from a finite set of possible locations. Topological localization provides a coarse location estimate. Topological maps are typically stored as graph structures, where nodes indicate possible locations and edges are connections between locations. An example is shown in Fig. 2 for the city of Washington, DC, where the transport network layer acquired from OpenStreetMap for the county area has been abstracted as a graph. The weight of an edge can indicate the similarity or proximity between locations. The size of the finite set of locations is typically kept small so that efficient retrieval in real-time applications is a tractable problem.
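As a concrete illustration of the graph representation described above, the following sketch stores a topological map as a simple adjacency structure. The class name, node identifiers, and coordinates are hypothetical and chosen only for readability; they do not come from any specific system.

```python
# A minimal sketch of a topological map stored as a weighted graph:
# nodes are possible locations (e.g., road intersections) and edge
# weights encode proximity or similarity between them.

class TopologicalMap:
    def __init__(self):
        self.coords = {}        # node id -> (lat, lon), optional metric anchor
        self.adjacency = {}     # node id -> {neighbor id: weight}

    def add_node(self, node_id, lat=None, lon=None):
        self.coords[node_id] = (lat, lon)
        self.adjacency.setdefault(node_id, {})

    def add_edge(self, a, b, weight):
        # Undirected edge; the weight may encode road length or similarity.
        self.adjacency[a][b] = weight
        self.adjacency[b][a] = weight

    def neighbors(self, node_id):
        return self.adjacency.get(node_id, {})

# Example: a tiny fragment of a transport network abstracted as a graph
# (coordinates and weights are made up for illustration).
osm_graph = TopologicalMap()
osm_graph.add_node("n1", 38.8895, -77.0353)
osm_graph.add_node("n2", 38.8893, -77.0502)
osm_graph.add_edge("n1", "n2", weight=1.3)   # e.g., road length in km
```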
While metric approaches provide accurate localization results, they tend to fail and drift over time as the vehicle traverses big distances in its geographic region. On the other hand, due to its finite state space, topological approaches provide a robust localization but only rough position estimates. A fusion of the metric and topological approaches achieves accurate metric results while maintaining the robustness of topological matching, a technique typically referred to as topometric localization. It uses a fine-grained topological map, where each node has an associated coordinate of its real metric location. Such topological maps can be acquired from sources like GIS databases for outdoor navigation. Finding the node of the current location translates to finding the metric coordinate of the vehicle.

A generic topometric localization algorithm involves the two stages of map creation and then localization. A vehicle equipped with cameras, IMU, and a GNSS-capable device first traverses the routes to be mapped. GPS and inertial sensors are used to create a graph of this environment. The graph is metric in the sense that the nodes contain the exact location of the vehicle. From the images acquired using an onboard camera in monocular or stereo configuration, visual local features are extracted. These features are processed and stored in a database with a reference to the node corresponding to its real location. At runtime, the vehicle drives over the routes included in the a priori map. Video imagery is processed online to obtain features. As the vehicle moves, these visual features are matched with those in the database. Since there are potentially multiple feature matches from different parts of the mapped region, a method like Bayesian filtering is utilized to estimate the probability density function of the position of the vehicle. This facilitates pruning of false-positive matches and provides accurate localization and a smooth estimated trajectory of the moving vehicle.
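The two-stage procedure above can be summarized in a small sketch. The fragment below is illustrative only: map_db, transition_prob, and match_score are hypothetical placeholders for the per-node map database, the topological adjacency model, and the visual feature matcher; they are not functions from any published system.

```python
# Sketch of one topometric localization step: a discrete Bayes filter over
# the nodes of the topological map, updated by visual feature-match scores.

def normalize(belief):
    total = sum(belief.values())
    return {n: p / total for n, p in belief.items()} if total > 0 else belief

def bayes_filter_step(belief, map_db, query_features,
                      transition_prob, match_score):
    # 1. Predict: propagate the belief along the topological graph.
    predicted = {n: 0.0 for n in map_db}
    for prev_node, p in belief.items():
        for node in map_db:
            predicted[node] += p * transition_prob(prev_node, node)
    # 2. Update: weight each node by how well its stored visual features
    #    match the features extracted from the current camera frame.
    posterior = {n: predicted[n] * match_score(query_features,
                                               map_db[n]["features"])
                 for n in map_db}
    return normalize(posterior)

def localize(belief, map_db):
    # Report the metric coordinate attached to the most probable node
    # (each node of the topometric map stores its real location).
    best = max(belief, key=belief.get)
    return map_db[best]["coordinate"], belief[best]
```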
Key Applications

Accurate geospatial localization in a GPS-denied environment has several applications in numerous scenarios. Since it provides very accurate location information in real time, it can be used for autonomous navigation for self-driving cars, unmanned aerial vehicles (UAV), and unmanned ground vehicles (UGV). A vision-based system provides rich real-time information that allows an autonomous navigation system to tackle a dynamic environment, such as the appearance of unexpected objects in the vicinity of the vehicle that were absent during the mapping phase, which is not available in other PNT systems.

A robust GPS alternative is particularly important for military applications since GPS signals can easily be denied to mission-critical navigation systems on several assets, especially in contested territory. Relatively cost-effective vision-based geo-localization can alternatively be used by guidance systems on weapons platforms like missiles, drones, and UGVs.

Community-driven map generation projects like OpenStreetMap are extremely popular (Floros et al. 2013). Accurate and information-rich vision-based localization can simultaneously correct registration errors in these maps and also annotate the maps with geo-referenced objects like buildings, road signs, vegetation, and other geographic entities.

Vision-based localization is typically unhampered by its environment, unlike radio signals, which suffer issues of multipath and propagation path losses by absorption. Since it can be used in most environments, it can also be used ubiquitously with disregard to changes in environments, like transitioning from outdoors to indoors, driving through tunnels, etc., which otherwise typically require a hand-off between different PNT systems operational in their respective environments.

Future Directions

Different localization techniques can be efficiently combined for ubiquitous navigation while traveling across different environments without degradation in localization accuracy. The quality of information in GIS databases and the accuracy of geospatial localization are synergistic, where one improves the other and vice versa. Cross-referencing and registration of visual information from different mobile platforms, including UAV and UGV, can improve GIS databases and provide a ground and aerial map of a geographic region for accurate 3D geospatial localization.

Cross-References

Bayesian Network Integration with GIS
Feature Detection and Tracking in Support of GIS
Indoor Localization
OpenStreetMap
Optimal Location Queries on Road Networks
Road Network Data Model
Spatial Analysis along Networks

References

Paul R, Newman P (2010) FAB-MAP 3D: topological mapping with spatial and visual appearance. In: 2010 IEEE international conference on robotics and automation, Anchorage, pp 2649-2656
Thrun S, Montemerlo M, Dahlkamp H, Stavens D, Aron A, Diebel J, Fong P, Gale J, Halpenny M, Hoffmann G, Lau K, Oakley C, Palatucci M, Pratt V, Stang P, Strohband S, Dupont C, Jendrossek LE, Koelen C, Markey C, Rummel C, van Niekerk J, Jensen E, Alessandrini P, Bradski G, Davies B, Ettinger S, Kaehler A, Nefian A, Mahoney P (2006) Stanley: the robot that won the DARPA grand challenge: research articles. J Robot Syst 23(9):661-692


Computing Fitness of Use of Geospatial Datasets

Leen-Kiat Soh and Ashok Samal
Department of Computer Science and Engineering, The University of Nebraska at Lincoln, Lincoln, NE, USA

Synonyms
Evidence from different approaches is combined in order to compute the FoU of a dataset.

Historical Background

In most applications, it is assumed that the datasets are perfect and without any blemish. This assumption is, of course, not true. The data is merely a representation of a continuous reality, both in space and time. It is difficult to measure the values of a continuous space and time variable with infinite precision. Limitations are also the result of inadequate human capacity, sensor capabilities, and budgetary constraints. Therefore, a discrepancy exists between the reality and the datasets that are derived to represent it. It is especially critical to capture the degree of this discrepancy when decisions are made based on the information derived from the data. Thus, this measure of quality of a dataset is a function of the purpose for which it is used; hence it is called its fitness of use (FoU). For a given application, this value varies among the datasets. Information derived from high-FoU datasets is more useful and accurate for the users of the application than that from low-FoU datasets. The challenge is to develop appropriate methods to fuse derived information of varying degrees of FoU as well as derived information from datasets of varying degrees of FoU. This will give insights as to how the dataset can be used or how appropriate the dataset is for a particular application (Yao 2003).

An information theoretic approach is used to compute the FoU of a dataset. The Dempster-Shafer belief theory (Shafer 1976) is used as the basis for this approach, in which the FoU is represented as a range of possibilities and integrated into one value based on the information from multiple sources. There are several advantages of the Dempster-Shafer belief theory. First, it does not require that the individual elements follow a certain probability. In other words, Bayes theory considers an event to be either true or untrue, whereas Dempster-Shafer allows for unknown states (Konks and Challa 2005). This characteristic makes the Dempster-Shafer belief theory a powerful tool for the evaluation of risk and reliability in many real applications when it is impossible to obtain precise measurements and results from real experiments. In addition, the Dempster-Shafer belief theory provides a framework to combine the evidence from multiple sources and does not assume disjoint outcomes (Sentz and Ferson 2002). Additionally, Dempster-Shafer's measures are not less accurate than Bayesian methods, and in fact reports have shown that it can sometimes outperform Bayes theory (Cremer et al. 1998; Braun 2000).

Scientific Fundamentals

Assume that there is a set of geospatial datasets, $S = \{S_1, S_2, \ldots, S_n\}$. A dataset $S_i$ may consist of many types of information, including (and not limited to) spatial coordinates, metadata about the dataset, denoted by $aux_i$, and the actual time series data, denoted by $ts_i$.

The metadata for a dataset may include the type of information being recorded (e.g., precipitation or volume of water in a stream), the period of record, and the frequency of measurement. Thus,

$aux_i = \langle type_i,\ tb_i,\ te_i,\ int_i \rangle,$

where $tb_i$ and $te_i$ denote the beginning and the ending time stamps for the measurements, and $int_i$ is the interval at which the measurements are made. Other metadata such as the type and age of the recording device can also be added.

The time series data in a dataset may consist of a sequence of measurements,

$ts_i = (m_{i,1}, m_{i,2}, \ldots, m_{i,p}).$

Each measurement stores both the time the measurement was taken and the actual value recorded by the sensor. Thus, each measurement is given by

$m_{i,j} = (t_{i,j}, v_{i,j}).$

It is assumed that the measurements in the dataset are kept in chronological order. Therefore,

$t_{i,j} < t_{i,k}, \quad \text{for } j < k.$
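The dataset model above can be expressed directly as simple data structures. The sketch below is illustrative only; the class and field names are chosen for readability and are not part of the original formulation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Metadata:                 # aux_i = <type_i, tb_i, te_i, int_i>
    data_type: str              # e.g., "precipitation"
    t_begin: float              # tb_i, beginning time stamp
    t_end: float                # te_i, ending time stamp
    interval: float             # int_i, sampling interval

@dataclass
class GeospatialDataset:        # S_i
    aux: Metadata
    ts: List[Tuple[float, float]]   # [(t_{i,j}, v_{i,j}), ...]

    def is_chronological(self) -> bool:
        # t_{i,j} < t_{i,k} for j < k
        times = [t for t, _ in self.ts]
        return all(t1 < t2 for t1, t2 in zip(times, times[1:]))

    def matches_period_of_record(self) -> bool:
        # tb_i = t_{i,1} and te_i = t_{i,p}
        return bool(self.ts) and self.ts[0][0] == self.aux.t_begin \
               and self.ts[-1][0] == self.aux.t_end
```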
Furthermore, the first and last measurement times should match the period of record stored in the metadata,

$tb_i = t_{i,1} \quad \text{and} \quad te_i = t_{i,p}.$

The problem of finding the suitability of a dataset for a given application is to define a function for the FoU that computes the fitness of use of a dataset described above. The function FoU maps $S_i$ to a normalized value between 0 and 1:

$FoU(S_i, A) \in [0, 1],$

where $S_i$ is a single dataset and $A$ is the intended application of the data. The application $A$ is represented in the form of domain knowledge that describes how the goodness of a dataset is viewed. A set of rules may be used to specify this information. Thus,

$A = \{R_1, R_2, \ldots, R_d\},$

where $R_i$ is a domain rule that describes the goodness of a dataset and $d$ is the number of rules. Therefore, the FoU function is defined with respect to an application domain. Different applications can use different rules for goodness and derive different FoU values for the same dataset.

Dempster-Shafer Belief Theory
The two central ideas of the Dempster-Shafer belief theory are: (a) obtaining degrees of belief from subjective probabilities for a related question, and (b) Dempster's rule for combining such degrees of belief when they are based on independent items of evidence. The belief in a hypothesis is the total mass supporting it, while its plausibility measures the extent to which the evidence is not inconsistent with it. The values of both belief and plausibility range from 0 to 1. The belief function (bel) and the plausibility function (pl) are related by

$pl(P) = 1 - bel(\neg P),$

where $\neg P$ is the negation of the proposition $P$. Thus, $bel(P)$ is the extent to which evidence is in favor of $P$.

The Frame of Discernment (FOD) consists of all hypotheses for which the information sources can provide evidence. This set is finite and consists of mutually exclusive propositions that span the hypotheses space. For a finite set of mutually exclusive propositions ($\Theta$), the set of possible hypotheses is its power set ($2^\Theta$), i.e., the set of all possible subsets including $\Theta$ itself and the null set. Each of these subsets is called a focal element and is assigned a confidence interval (belief, plausibility).

Based on the evidence, a probability mass is first assigned to each focal element. The masses are probability-like in that they are in the range [0, 1] and sum to 1 over all hypotheses. However, they represent the belief assigned to a focal element. In most cases, this basic probability assignment is derived from the experience and the rules provided by experts in the application domain.

Given a hypothesis $H$, its belief is computed as the sum of the probability masses of all the subsets of $H$:

$bel(H) = \sum_{e \subseteq H} m(e).$
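A minimal sketch of these two functions follows, assuming masses are stored in a dictionary keyed by focal elements represented as frozensets of propositions; the propositions and mass values are illustrative.

```python
# Belief and plausibility computed from a basic probability assignment
# (mass function) over subsets of the frame of discernment.

def bel(mass, hypothesis):
    # bel(H) = sum of masses of all subsets of H
    return sum(m for focal, m in mass.items() if focal <= hypothesis)

def pl(mass, hypothesis):
    # pl(H) = sum of masses of all focal elements intersecting H,
    # equivalently 1 - bel(not H)
    return sum(m for focal, m in mass.items() if focal & hypothesis)

# Example mass assignment over an illustrative frame {p, q, r}:
mass = {
    frozenset({"p"}): 0.5,
    frozenset({"p", "q"}): 0.3,
    frozenset({"p", "q", "r"}): 0.2,   # mass left on the whole frame
}
print(bel(mass, frozenset({"p"})))     # 0.5
print(pl(mass, frozenset({"p"})))      # 1.0
```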
The evidence values are then combined using Dempster's Combination Rule to derive joint evidence in order to support a hypothesis from multiple sources. Given two basic probability assignments, $m_A$ and $m_B$, for two independent sources ($A$ and $B$) of evidence in the same frame of discernment, the joint probability mass, $m_{AB}$, can be computed according to Dempster's Combination Rule:

$m_{AB}(C) = \dfrac{\sum_{A \cap B = C} m(A)\, m(B)}{1 - \sum_{A \cap B = \emptyset} m(A)\, m(B)}.$

Furthermore, the rule can be repeatedly applied for more than two sources sequentially, and the results are order-independent. That is, combining different pieces of evidence in different sequences yields the same results.

Finally, to determine the confidence in a hypothesis $H$ being true, belief and plausibility are multiplied together:

$confidence(H) = bel(H) \cdot pl(H).$

Thus, the system is highly confident regarding a hypothesis being true if it has high belief and plausibility for that hypothesis being true.

Suppose that there are three discrete FoU outcomes for the datasets: suitable (s), marginal (m), and unsuitable (u), and $\Theta = \{s, m, u\}$. Then, the frame of discernment consists of the subsets of $\Theta$.

Heuristic criteria to compute the FoU of the datasets are used. The heuristics can be based on common sense knowledge or on expert feedback. The following criteria are used:

Consistency: A dataset is consistent if it does not have any gaps. A consistent dataset has a higher fitness value.
Length: The period of record for the dataset is also an important factor in the quality. Longer periods of record generally imply a higher fitness value.
Recency: Datasets that record more recent observations are considered to be of a higher fitness value.
Temporal Resolution: Data are recorded at different time scales (sampling periods). For example, the datasets can be recorded daily, weekly, or monthly. Depending on the application, higher or lower resolution may be better. This is also called the granularity (Mihaila et al. 1999).
Completeness: A data record may have many attributes, e.g., time, location, and one or more measurements. A dataset is complete if all the relevant attributes are recorded. Incomplete datasets are considered to be inferior (Mihaila et al. 1999).
Noise: All datasets have some noise due to many different factors. All these factors may lead to data not being as good for use in applications.
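The combination rule and the confidence measure can be sketched as follows, assuming the same dictionary-of-frozensets representation used earlier; the example masses for the two sources are made up for illustration and do not correspond to any particular rule set.

```python
# Dempster's Combination Rule for two independent sources over the frame
# Theta = {s, m, u} (suitable, marginal, unsuitable), followed by the
# confidence measure confidence(H) = bel(H) * pl(H).

def combine(m_a, m_b):
    joint, conflict = {}, 0.0
    for fa, pa in m_a.items():
        for fb, pb in m_b.items():
            inter = fa & fb
            if inter:
                joint[inter] = joint.get(inter, 0.0) + pa * pb
            else:
                conflict += pa * pb
    # Normalize by 1 - K, where K is the total conflicting mass
    # (assumes the sources are not in total conflict, K < 1).
    return {f: v / (1.0 - conflict) for f, v in joint.items()}

def bel(mass, h):
    return sum(v for f, v in mass.items() if f <= h)

def pl(mass, h):
    return sum(v for f, v in mass.items() if f & h)

def confidence(mass, h):
    return bel(mass, h) * pl(mass, h)

theta = frozenset({"s", "m", "u"})
source1 = {frozenset({"s"}): 0.6, theta: 0.4}        # e.g., from a Length rule
source2 = {frozenset({"s", "m"}): 0.7, theta: 0.3}   # e.g., from a Recency rule
joint = combine(source1, source2)
print(confidence(joint, frozenset({"s"})))
```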
For each dataset, the rule is evaluated, which assigns a value $m$ to the probability mass for a given set of outcome types, which in the example is $\{qtype\} \subseteq \{\text{suitable}, \text{marginal}, \text{unsuitable}\}$. Applying a set of rules as defined above to dataset $S_i$ thus yields a set of masses for different combinations of outcome types. These masses are then combined as described above.

Likewise, the periodic variance can be derived for the time marks $j$ as

$\mathrm{var}_{i,j} = \dfrac{1}{k}\sum_{p=0}^{k} m^{2}_{i,\,p \cdot period + j} \;-\; \left(\dfrac{1}{k}\sum_{p=0}^{k} m_{i,\,p \cdot period + j}\right)^{2}.$
References
Ahonen-Rainio P, Kraak MJ (2005) Deciding on fitness for use: evaluating the utility of sample maps as an element of geospatial metadata. Cartogr Geogr Inf Sci 32(2):101-112
Braun J (2000) Dempster-Shafer theory and Bayesian reasoning in multisensor data fusion. In: Sensor fusion: architectures, algorithms and applications IV. Proceedings of SPIE, vol 4051, Orlando, pp 255-266
Cremer F, den Breejen E, Schutte K (1998) Sensor data fusion for antipersonnel land mine detection. In: Proceedings of EuroFusion98, Great Malvern, pp 55-60
De Bruin S, Bregt A, Van De Ven M (2001) Assessing fitness for use: the expected value of spatial data sets. Int J Geogr Inf Sci 15(5):457-471
Konks D, Challa S (2005) An introduction to Bayesian and Dempster-Shafer data fusion. Available via DSTO-TR-1436, Edinburgh, Nov 2005. http://www.dsto.defence.gov.au/publications/2563/DSTO-TR-1436.pdf
Mihaila G, Raschid L, Vidal ME (1999) Querying, quality of data metadata. In: Proceedings of the third IEEE meta-data conference, Bethesda, Apr 1999
Sentz K, Ferson S (2002) Combination of evidence in Dempster-Shafer belief theory. Available via SANDIA technical report SAND2002-0835. http://www.sandia.gov/epistemic/Reports/SAND2002-0835.pdf
Shafer G (1976) A mathematical theory of evidence. Princeton University Press, Princeton
Vasseur B, Devillers R, Jeansoulin R (2003) Ontological approach of the fitness of use of geospatial datasets. In: Proceedings of 6th AGILE conference on geographic information science, Lyon, pp 497-504
Yao X (2003) Research issues in spatiotemporal data mining. A white paper submitted to the University Consortium for Geographic Information Science (UCGIS) workshop on geospatial visualization and knowledge discovery, Lansdowne, 18-20 Nov 2003


Conceptual Modeling

Spatiotemporal Database Modeling with an Extended Entity-Relationship Model


Conceptual Modeling of Geospatial Databases

Modeling with ISO 191xx Standards


Conceptual Neighborhood

Anthony G. Cohn
School of Computing, University of Leeds, Leeds, UK

Synonyms

Closest topological distance; Continuity network; Qualitative similarity

Definition

A standard assumption concerning reasoning about spatial entities over time is that change is continuous. In qualitative spatial calculi, such
Concurrency Control for Spatial Access Method, Fig. 1 Representative spatial access methods
Concurrency Control for Spatial Access Method

... into one-dimensional access methods, e.g., the B-tree family.

In order to apply the widely studied spatial access methods to real applications, particular concurrency control protocols are required for the multi-user environment. The simultaneous operations on spatial databases need to be treated as exclusive operations without interfering with each other. In other words, the results of any operation have to reflect the current stable snapshot of the spatial database at the commit time.

The concurrency control techniques for spatial databases have to be integrated with spatial access methods to process simultaneous operations. Most of the concurrency control techniques were developed for one-dimensional databases. However, the existing spatial data access methods, such as the R-tree family and grid files, are quite different from the one-dimensional data access methods (e.g., overlaps among data objects and among index nodes are allowed). Therefore, the existing concurrency control methods are not suitable for these spatial databases. Furthermore, because the spatial data set usually is not globally ordered, some traditional concurrency control techniques such as link-based protocols are difficult to adapt to spatial databases.

Spatial Concurrency Control Techniques
Since the last decade of the twentieth century, concurrency control protocols on spatial access methods have been proposed to meet the requirements of multi-user applications. The existing concurrency control protocols mainly focus on the R-tree family, and most of them were developed based on the concurrency protocols of the B-tree family. Based on the locking strategy, these protocols can be classified into two categories, namely, link-based methods and lock-coupling methods.

The link-based methods rely on a pseudo global order of the spatial objects to isolate each concurrent operation. These approaches process update operations by temporarily disabling the links to the indexing node being updated so that the corresponding search operations will not retrieve any inconsistent data.
Concurrency Control for Spatial Access Method, Fig. 2 Example of node split in link-based protocol
For instance, to split node A into A1 and A2 in Fig. 2, a lock will be requested to disable the link from A to its right sibling node B (step a) before the actual split is performed. Then, a new node A2 will be created in step b by using the second half of A, and linked to node B. In step c, A will be modified to be A1 (by removing the second half), and then unlocked. Node F will be locked before adding a link from F to A2 in step d. Finally, F will be unlocked in step e, and thus the split is completed. Following this split process, no search operations can access A2, and no update operations can access A (A1) before step c. Therefore, the potential conflict caused by concurrent update operations on node A can be prevented. As one example of the link-based approach, the R-link tree, a right-link style algorithm (Kornacker and Banks 1995), has been proposed to protect concurrent operations by assigning logical sequence numbers (LSNs) to the nodes of R-trees. This approach assures each operation has at most one lock at a time. However, when propagating node splits and MBR updates, this algorithm uses lock coupling. Also, in this approach, additional storage is required to maintain additional information, e.g., the LSNs of associated child nodes. Concurrency on the Generalized Search Tree (CGiST) (Kornacker et al. 1997) protects concurrent operations by applying a global sequence number, the Node Sequence Number (NSN). The counter of the NSN is incremented in a node split and a new value is assigned to the original node, with the new sibling node receiving the original node's prior NSN and right-link pointer. In order for an insert operation to execute correctly in this algorithm, multiple locks on two or more levels must be held. Partial lock coupling (PLC) (Song et al. 2004) has been proposed to apply a link-based technique to reduce query delays due to MBR updates for multi-dimensional index structures. The PLC technique provides high concurrency by using lock coupling only in MBR shrinking operations, which are less frequent than expansion operations.

The lock-coupling-based algorithms (Chen et al. 1997; Ng and Kamada 1993) release the lock on the current node only when the lock on the next node to be visited has been granted while processing search operations. As shown in Fig. 3, using the R-tree in Fig. 1a, suppose objects C, E, D, and F are indexed by an R-tree with two leaf nodes A and B. A search window WS can be processed using the lock-coupling approach. The locking sequence in Fig. 3 can protect this search operation from reading the intermediate results of update operations as well as the results of update operations submitted after WS.
Concurrency Control for Spatial Access Method, Fig. 3 Example of locking sequence using lock-coupling for WS: (1) Lock(Root, S); (2) Lock(A, S); (3) Unlock(Root, S); (4) GetObject(A); (5) Unlock(A, S)
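The following sketch illustrates the lock-coupling idea for a range search, assuming a simplified in-memory R-tree node with one shared lock per node; it is an illustration of the general technique, not the algorithm of any particular cited system.

```python
import threading

class Node:
    def __init__(self, mbr, children=None, objects=None):
        self.mbr = mbr                    # (xmin, ymin, xmax, ymax)
        self.children = children or []    # internal-node entries
        self.objects = objects or []      # leaf entries: (mbr, payload)
        self.lock = threading.Lock()      # stands in for a shared (S) lock

def intersects(a, b):
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def range_search(root, window):
    results = []
    root.lock.acquire()                   # Lock(Root, S)
    stack = [root]
    while stack:
        node = stack.pop()
        if node.objects:                  # leaf: read objects under the lock
            results.extend(o for o in node.objects if intersects(o[0], window))
        for child in node.children:
            if intersects(child.mbr, window):
                child.lock.acquire()      # lock the next node first ...
                stack.append(child)
        node.lock.release()               # ... then release the current one
    return results

# Tiny usage example with two leaf nodes A and B (coordinates illustrative):
leaf_a = Node((0, 0, 5, 5), objects=[((1, 1, 2, 2), "C"), ((3, 3, 4, 4), "E")])
leaf_b = Node((5, 0, 10, 5), objects=[((6, 1, 7, 2), "D")])
root = Node((0, 0, 10, 5), children=[leaf_a, leaf_b])
print(range_search(root, (0, 0, 4, 4)))
```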
Concurrency Control for Spatial Access Method, Fig. 4 Example of phantom update. The object deleted by WU is D; the objects selected by WS are C, E, and D
During node splitting and MBR updating, this scheme holds multiple locks on several nodes simultaneously. The dynamic granular locking approach (DGL) has been proposed to provide phantom update protection (discussed later) in R-trees (Chakrabarti and Mehrotra 1998) and GiST (Chakrabarti and Mehrotra 1999). The DGL method dynamically partitions the embedded space into lockable granules that can adapt to the distribution of the objects. The lockable granules are defined as the leaf nodes and external granules. External granules are additional structures that partition the non-covered space in each internal node to provide protection. Following the design principles of DGL, each operation requests locks only on sufficient granules to guarantee that any two conflicting operations will request locks on at least one common granule.

Scientific Fundamentals

ACID Rules
Concurrency control for spatial access methods should assure that the spatial operations are processed following the ACID rules (Ramakrishnan and Gehrke 2001). These rules are defined as follows.

Atomicity: Either all or no operations are completed.
Consistency: All operations must leave the database in a consistent state.
Isolation: Operations cannot interfere with each other.
Durability: Successful operations must survive system crashes.

The approaches to guarantee Atomicity and Durability in traditional databases can be applied in spatial databases. Current research on spatial concurrency control approaches mainly focuses on the Consistency and Isolation rules. For example, in order to retrieve the valid records, spatial queries should not be allowed to access the intermediate results of location updates. Similarly, concurrent location updates with common coverage have to be isolated as sequential execution; otherwise, they may not be processed correctly.

Phantom Update Protection
In addition to the ACID rules, phantom update protection is used to measure the effectiveness of a concurrency control. An example of a phantom update is illustrated in Fig. 4, where C, E, D, and F are objects indexed in an R-tree, and leaf nodes A and B are their parents, respectively. A deletion with the window WU is completed before the commitment of the range query WS. The range query returns the set {C, E, D}, even though object D
should have been deleted by WU. A solution to prevent the phantom update in this example is to lock the area affected by WU (which is D ∪ WU) in order to prevent the execution of WS.

Measurement
The efficiency of concurrency control for spatial access methods is measured by the throughput of concurrent spatial operations. The key to providing high throughput is to reduce the number of unnecessary conflicts among locks. For the example shown in Fig. 5, even if the update operation with window WU and the range query with window WS intersect with the same leaf node A, they will not affect each other's results. Therefore, they should be allowed to access A simultaneously. Obviously, the smaller the lockable granules, the more concurrent operations will be allowed. However, this may significantly increase the number of locks in the database, and therefore generate additional overhead on lock maintenance. This is a tradeoff that should be considered when designing concurrency control protocols.

Key Applications

Concurrency control for spatial access methods is generally required in commercial multi-dimensional database systems. These systems are designed to provide efficient and reliable data access. Usually, they are required to reliably handle a large number of simultaneous queries and updates. Therefore, sound concurrency control protocols are required in these systems. In addition, concurrency control methods are required in many specific spatial applications which have frequent updates or need fresh query results. For instance, a mobile advertisement/alarm system needs to periodically broadcast time-sensitive messages to cell phone users within a certain range. Concurrency control methods should be employed to protect the search process from frequent location updates, because the updates are not supposed to reveal their intermediate or expired results to the search process. Another example is a taxi management system that needs to assign the nearest available taxi based on a client's request. Concurrency control methods need to be applied to isolate the taxi location updating and queries so that the query results are consistent with the up-to-date snapshot of the taxi locations.
Future Directions

The study of spatial concurrency control is far behind the research on spatial query processing approaches. There are two interesting and emergent directions in this field. One is to apply concurrency control methods to complex spatial operations; the other is to design concurrency control protocols for moving object applications.

Complex spatial operations, such as spatial join, k-nearest neighbor search, range nearest neighbor search, and reverse nearest neighbor search, require special concern for concurrency control to be applied in multi-user applications. For example, how to protect the changing search range and how to protect the large overall search range have to be carefully designed. Furthermore, the processing methods of those complex operations may need to be redesigned based on the concurrency control protocol in order to improve the throughput.

Spatial applications with moving objects have attracted significant research efforts. Even though many of these applications assume that the query processing is based on main memory, their frequent data updates require appropriate concurrency control as well.
Cross-References

Concurrent Spatial Operations
Indexing, Hilbert R-tree, Spatial Indexing, Multimedia Indexing

References


Concurrency Control for Spatial Access

Concurrency Control for Spatial Access Method


Conflation of Features
Conflation of Features, Fig. 1 Two GIS data layers for Washington DC and their overlay
Scientific Fundamentals

A prototype of a feature conflation system is shown in Fig. 2. Such a system would typically form the back end of a geographic information and decision-support system used to respond to user queries for matched features. Some systems may not implement all three steps, while others may further refine some of the steps; e.g., like-feature detection may be split into two steps that either use or ignore the geographical context of the features during comparison. Further details of the basic steps appear in the following sections.

Conflation of Features, Fig. 2 Feature conflation steps: like-feature detection (producing similarity sets), feature matching, and consolidated data

Registration and Rectification
Registration refers to a basic problem in remote sensing and cartography of realigning a recorded digital image with known ground truth or another image. An early survey in geographic data processing (Nagy and Wagle 1979) formulates the registration problem in remote sensing as follows: The scene under observation is considered to be a 2D intensity distribution $f(x, y)$. The recorded digital image, another 2D distribution $g(u, v)$, is related to the true scene $f(x, y)$ through an unknown transformation $T$:

$g(u, v) = T(f(x, y)).$

Thus, in order to recover the original information from the recorded observations, we must first determine the nature of the transformation $T$, and then execute the inverse operation $T^{-1}$ on this image. Often, because only indirect information is available about $T$, in the form of another image or map of the scene in question, the goal of registration becomes finding a mathematical transformation on one image that would bring it into concurrence with the other image.
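As a small numeric illustration of this model, the sketch below assumes T is (approximated by) an affine transformation and uses NumPy; the particular matrix values are arbitrary examples and not taken from the cited survey.

```python
import numpy as np

# Affine T in homogeneous coordinates: rotation + scale followed by translation.
theta = np.deg2rad(5.0)
scale = 1.02
tx, ty = 12.0, -7.5
T = np.array([
    [scale * np.cos(theta), -scale * np.sin(theta), tx],
    [scale * np.sin(theta),  scale * np.cos(theta), ty],
    [0.0,                    0.0,                   1.0],
])

def to_recorded(xy):
    """Map true scene coordinates (x, y) to recorded coordinates (u, v)."""
    x, y = xy
    u, v, _ = T @ np.array([x, y, 1.0])
    return u, v

def rectify(uv):
    """Apply T^{-1} to bring recorded coordinates back into the scene frame."""
    u, v = uv
    x, y, _ = np.linalg.inv(T) @ np.array([u, v, 1.0])
    return x, y

print(rectify(to_recorded((100.0, 200.0))))   # approximately (100.0, 200.0)
```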
Geometric distortions in the recorded image, which affect only the position and not the magnitude, can be corrected by a rectification step that only transforms the coordinates.

Like-Feature Detection
The notion of similarity is fundamental to matching features, as it is to many other fields, including pattern recognition, artificial intelligence, information retrieval, and psychology. While the human view of similarity may be subjective, automation requires objective (quantitative) measures.

Similarity and distance are complementary concepts. It is often intuitively appealing to define a distance function $d(A, B)$ between objects $A$ and $B$ in order to capture their dissimilarity and convert it to a normalized similarity measure by its complement:

$s(A, B) = 1 - \dfrac{d(A, B)}{U}, \qquad (1)$

where the normalization factor $U$ may be chosen as the maximum distance between any two objects that can occur in the data set. The normalization makes the value of similarity a real number that lies between zero and one.

Mathematically, any distance function must satisfy the properties of minimality ($d(a, b) \geq d(a, a) = 0$), symmetry ($d(a, b) = d(b, a)$), and the triangular inequality ($d(a, b) + d(b, c) \geq d(a, c)$). However, in human perception studies, the distance function must be replaced by the judged distance, for which all of these mathematical axioms have been questioned (Santini and Jain 1999). Tversky (1977) follows a set-theoretic approach in defining similarities between two objects as a function of the attributes that are shared by the two or by one but not the other. His definition is not required to follow any of the metric axioms. It is particularly well suited to fuzzy attributes with discrete overlapping ranges of values.

In GIS, the two objects being compared often have multiple attributes, such as name, location, shape, and area. The name attribute is often treated as a character string for comparison using the well-known Hamming or Levenshtein metric aimed at transcription errors. An alternative measure of string comparison, based on their phonetic representation (Hall and Dowling 1980), may be better suited to transcription errors. However, the names are often word phrases that may look very different as character strings but connote the same object, e.g., "National Gallery of Art" and "National Art Gallery". Table 1 (Samal et al. 2004) shows the types of string errors that string matching should accommodate.

For locations or points, the Euclidean distance is commonly used for proximity comparison. A generalization to linear features, such as streets or streams, is the Hausdorff distance, which denotes the largest minimum distance between the two linear objects. Goodchild and Hunter (1997) describe a less computer-intensive and robust method that relies on comparing two representations with varying accuracy. It estimates the percentage of the total length of the low-accuracy representation that is within a specified distance of the high-accuracy representation.

The shape is an important attribute of polygonal features in GIS, such as building outlines and region boundaries. As polygons can be regarded as linear features, the Goodchild and Hunter approach may be adapted to define shape comparison. A two-step process is described for this purpose by Samal et al. (2004). First, a veto is imposed if the aspect ratios are significantly different. Otherwise, the shapes are scaled to match the lengths of their major axes and overlaid by aligning their center points. The similarity of a less accurate shape A to a more accurate shape B is the percentage of A within the buffer zone of B (see Fig. 3). When the accuracy of the two sources is comparable, the measure could be taken as the average value of the measure computed both ways.

Comparing scalars (reals or integers) seems to be straightforward: take the difference as their distance and convert to similarity by using Eq. (1). The normalization factor $U$, however, must be chosen carefully to match intuition. For example, one would say that the pair of numbers 10 and 20 is less similar than the pair 123,010 and 123,020, even though the difference is the same.
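A minimal sketch of the conversion in Eq. (1), paired with a brute-force Hausdorff distance for polylines sampled as point lists; the coordinates and the choice of U below are illustrative only.

```python
from math import dist  # Python 3.8+: Euclidean distance between two points

def similarity(d_ab, U):
    # Eq. (1): s(A, B) = 1 - d(A, B) / U, with U the normalization factor.
    return 1.0 - d_ab / U

def hausdorff(line_a, line_b):
    # Largest minimum distance between two linear features.
    def directed(p_list, q_list):
        return max(min(dist(p, q) for q in q_list) for p in p_list)
    return max(directed(line_a, line_b), directed(line_b, line_a))

street_1 = [(0, 0), (10, 0), (20, 1)]
street_2 = [(0, 1), (10, 2), (21, 2)]
d = hausdorff(street_1, street_2)
print(similarity(d, U=50.0))
```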
Conflation of Features, Table 1 Typical string errors and differences that matching should accommodate

Error type: Sample 1 / Sample 2
Word omission: Abraham Lincoln Memorial / Lincoln Memorial
Word substitution: Reagan National Airport / Washington National Airport
Word transposition: National Art Gallery / National Gallery of Art
Word abbreviation: National Archives / Nat'l Archives
Character omission: Washington Monument / Washington Monument
Character substitution: Frear Gallery / Freer Gallery
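A minimal Levenshtein (edit) distance, the character-level metric mentioned above; word-level and phonetic comparisons would be layered on top of it in practice.

```python
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

print(levenshtein("Frear Gallery", "Freer Gallery"))   # 1 (character substitution)
```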
Conflation of Features, Fig. 4 Two similar features and their geographic contexts
The geographic context is defined as the spatial relationships between objects in an area (Samal et al. 2004). Examples of such relationships include topologies, distances, and directions. Topological relationships, such as disjoint, meet, overlap, and covering, are used by the researchers at the University of Maine to model the context of areal features in applications involving query by sketch and similarity of spatial scenes (Bruns and Eggenhofer 1996). Distances and angles of a feature to other features have also been used to represent the geographic context (Samal et al. 2004). Figure 4 shows the geographic contexts of two nearby features with similar shapes. The contexts can be seen to be different enough to disambiguate these two features when they are compared with a candidate feature in another source. Further, to keep the cost of context-dependent matching under control, it may be enough to define the geographic context with respect to only a small number of well-chosen landmark features.

Feature Matching

The similarity measures discussed above for individual attributes of a feature must be combined in some fashion to provide overall criteria for feature matching. According to Cobb et al. (1998), "The assessment of feature match criteria is a process in which evidence must be evaluated and weighed and a conclusion drawn, not one in which equivalence can be unambiguously determined . . . after all, if all feature pairs matched exactly, or deviated uniformly according to precise processes, there would be no need to conflate the maps!"

The problem can be approached as a restricted form of the classification problem in pattern recognition: given the evidence provided by the similarity scores of different attributes of two features, determine the likelihood of one feature belonging to the same class as the other feature. Because of this connection, it is not surprising that researchers have used well-known techniques from pattern recognition to solve the feature matching problem. These techniques include clustering (Baraldi and Blonda 1999) and fuzzy logic (Zadeh 1965).

For example, the similarity of two buildings appearing in different GIS data layers (as in Fig. 1a, b) could be established by comparing their individual attributes, such as shape and coordinates. These context-independent measures, however, may not be sufficient, and it may become necessary to use the geographical context to resolve ambiguities or correct errors.
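One simple way to combine the per-attribute similarity scores is a weighted average with a match threshold, as sketched below. The attribute weights and threshold are illustrative; as noted above, published systems typically use classifiers such as clustering or fuzzy logic rather than a fixed weighting.

```python
def combined_similarity(scores, weights):
    # scores and weights are dicts keyed by attribute (name, location, shape, ...)
    total_w = sum(weights[a] for a in scores)
    return sum(weights[a] * scores[a] for a in scores) / total_w

candidate = {"name": 0.82, "location": 0.95, "shape": 0.70}
weights = {"name": 0.3, "location": 0.5, "shape": 0.2}

score = combined_similarity(candidate, weights)
is_match = score >= 0.8          # threshold chosen for illustration
print(score, is_match)
```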
Key Applications

Coverage consolidation: Data gathering is the most expensive part of building a geographical information system (GIS). In traditional data gathering, this expense is directly related to the standards of rigor used in data collection and data entry. Feature conflation can reduce the cost of GIS data acquisition by combining inexpensive sources into a superior source. With the widespread use of the Web and GPS, the challenge in consolidation is shifting from improving accuracy to integrating an abundance of widely distributed sources by automated means.
Spatial data update: By identifying common and missing features between two sources through feature conflation, new features can be added to an old source or their attributes updated from a newer map.
Coverage registration: Non-georeferenced spatial data must be registered before it can be stored in a GIS. Good registration requires choosing a number of features for which accurate geo-positional information is available and which are also spatially accurate on the source. Spatial data update can help in identifying good candidate features for registration.
Error detection: Feature conflation can not only tell which features in two sources are alike, but also provide a degree of confidence for these assertions. The pairs with low confidence can be checked manually for possible errors.

Future Directions

Conflation in GIS can be thought of as part of the broader problems in the information age of searching, updating, and integration of data.
Because the sense of place plays such an important role in our lives, all kinds of non-geographical data related to history and culture can be tied to a place and thus become a candidate for conflation. In this view, geographical reference becomes a primary key used by search engines and database applications to consolidate, filter, and access the vast amount of relevant data distributed among many data sources. The beginnings of this development can already be seen in the many applications already in place or envisaged for Google Earth and other similar resources. If the consolidated data remains relevant over a period of time and finds widespread use, it might be stored and used as a new data source, much in the same fashion as the results of conflation are used today.

The traditional concern in conflation for positional accuracy will diminish in time with the increasing penetration of GPS in consumer devices and the ready availability of the accurate position of all points on the earth. The need for updating old data sources and integrating them with new information, however, will remain an invariant.

Recommended Reading

Hall P, Dowling G (1980) Approximate string matching. ACM Comput Surv 12(4):381-402
Longley PA, Goodchild MF, Maguire DJ, Rhind DW (2001) Geographic information systems and science. Wiley, Chichester
Lynch MP, Saalfeld A (1985) Conflation: automated map compilation, a video game approach. In: Proceedings of the auto-carto 7 ACSM/ASP, Falls Church, 11 Mar 1985
Nagy G, Wagle S (1979) Geographic data processing. ACM Comput Surv 11(2):139-181
Rodriguez A, Egenhofer M (2003) Determining semantic similarity among entity classes from different ontologies. IEEE Trans Knowl Data Eng 15:442-456
Saalfeld A (1988) Conflation: automated map compilation. Int J GIS 2(3):217-228
Samal A, Seth S, Cueto K (2004) A feature-based approach to conflation of geospatial sources. Int J GIS 18(5):459-589
Santini S, Jain R (1999) Similarity measures. IEEE Trans Pattern Anal Mach Intell 21(9):871-883
Tversky A (1977) Features of similarity. Psychol Rev 84:327-352
White M (1981) The theory of geographical data conflation. Internal Census Bureau draft document
Zadeh LA (1965) Fuzzy sets. Inf Control 8:338-353
Conflation of Geospatial Data

Definition

Geospatial data conflation is the compilation or reconciliation of two different geospatial datasets covering overlapping regions (Saalfeld 1988). In general, the goal of conflation is to combine the best quality elements of both datasets to create a composite dataset that is better than either of them. The consolidated dataset can then provide additional information that cannot be gathered from any single dataset.

Based on the types of geospatial datasets dealt with, the conflation technologies can be categorized into the following three groups:

Vector to vector data conflation: A typical example is the conflation of two road networks of different accuracy levels. Figure 1 shows a concrete example of producing a superior dataset by integrating two road vector datasets: the road network from US Census TIGER/Line files, and the road network from the department of transportation, St. Louis, MO (MO-DOT data).
Vector to raster data conflation: Fig. 2 is an example of conflating a road vector dataset with a USGS 0.3 m per pixel color image. Using the imagery as the base dataset for position, the conflation technique can correct the vector locations and also annotate the image with appropriate vector attributes (as in Fig. 2b).
Raster to raster data conflation: Fig. 3 is an example of conflating a raster street map (from MapQuest) with a USGS image. Using the imagery as the base dataset for position, the conflation technique can create intelligent images that combine the visual appeal and accuracy of imagery with the detailed attributes often contained in maps (as in Fig. 3b).

Also note that although the examples shown in Figs. 1, 2, and 3 are the conflation of datasets covering the same region (called vertical conflation), the conflation technologies can also be applied to merge adjacent datasets (called horizontal conflation).

Historical Background

For a number of years, significant manual effort has been required to conflate two geospatial datasets by identifying features in the two datasets that represent the same real-world features, then aligning the spatial and non-spatial attributes of both datasets. Automated vector to vector conflation was first proposed by Saalfeld (1988), and the initial focus of conflation was using geometrical similarities between spatial attributes (e.g., location, shape, etc.) to eliminate the spatial inconsistency between two overlapping vector maps. In particular, in Saalfeld (1988), Saalfeld discussed mathematical theories to support the automatic process. Since then, various vector to vector conflation techniques have been proposed (Walter and Fritsch 1999; Ware and Jones 1998) and many GIS systems (such as Conflex (http://www.digitalcorp.com/conflex.htm)) have been implemented to achieve the alignment of geospatial datasets. More recently, with the proliferation of attributed vector data, attribute information (i.e., non-spatial information) has become another prominent feature used in conflation systems, such as ESEA MapMerger (http://www.esea.com/products/) and the system developed by Cobb et al. (1998).

Most of the approaches mentioned above focus on vector to vector conflation by adapting different techniques to perform the matching. However, due to the rapid advances in remote sensing technology from the 1990s to capture high resolution imagery and the ready accessibility of imagery over the Internet, such as Google Maps (http://maps.google.com/) and Microsoft TerraService (http://terraservice.net/), conflation with imagery (such as vector to imagery conflation, imagery to imagery conflation, and raster map to imagery conflation) has become one of the central issues in GIS. The objectives of these imagery-related conflations are, of course, to take full advantage of updated high resolution imagery to improve out-of-date GIS data and to display the ground truth in depth with attributes inferred from other data sources (as in the examples shown in Figs. 2b and 3b).
Conflation of Geospatial Data, Fig. 1 An example of vector to vector conflation
Due to the natural characteristics of imagery (or, more generally, geospatial raster data), the matching strategies used in conflation involve more image-processing or pattern recognition technologies. Some proposed approaches (Cobb et al. 1998; Flavie et al. 2000) rely on edge detection or interest-point detection to extract and convert features from imagery to vector formats, and then apply vector to vector conflation to align them. Other approaches (Agouris et al. 2001; Chen et al. 2006a; Eidenbenz et al. 2000), however, utilize the existing vector data as prior knowledge to perform vector-guided image processing. Conceptually, the spatial information in the vector data represents the existing knowledge about the approximate location and shape of the counterpart elements in the image, thus improving the accuracy and running time to detect matched features from the image. Meanwhile, there are also numerous research activities (Chen et al. 2004a; Seedahmed and Martucci 2002; Dare and Dowman 2000) focusing on conflating different geospatial raster datasets. Again, these approaches perform diverse image-processing techniques to detect and match counterpart elements, and then geometrically align these raster datasets so that the respective pixels or their derivatives (edges, corner points, etc.) representing the same underlying spatial structure are fused.
Conflation of Geospatial Data, Fig. 2 An example of vector to raster data conflation (Modified figure from Chen et al. 2006a)

Conflation of Geospatial Data, Fig. 3 An example of raster map to imagery conflation (Modified figure from Chen et al. 2004a)
Scientific Fundamentals

In general, the conflation process involves three steps: (1) Feature matching: Find a set of conjugate point pairs, termed control point pairs, in two datasets, (2) Match checking: Filter inaccurate control point pairs from the set of control point pairs for quality control, and (3) Spatial attribute alignment: Use the accurate control points to align the rest of the geospatial objects (e.g., points or lines) in both datasets by using space partitioning techniques (e.g., triangulation) and geometric interpolation techniques.

During the late 1980s, Saalfeld (1988) initiated the study to automate the conflation process. He provided a broad mathematical context for conflation theory. In addition, he proposed an iterative conflation paradigm based on the above-mentioned conflation framework by repeating the matching and alignment until no further new matches are identified. In particular, he investigated techniques to automatically construct the influence regions around the control points to reposition other features into alignment by appropriate local interpolation (i.e., to automate the third step in the above-mentioned conflation framework). The conclusion of Saalfeld's work is that Delaunay triangulation is an effective strategy to partition the domain space into triangles (influence regions) to define local adjustments (see the example in Fig. 4). A Delaunay triangulation is a triangulation of the point set with the property that no point falls in the interior of the circumcircle of any triangle (the circle passing through the three triangle vertices). The Delaunay triangulation maximizes the minimum angle of all the angles in the triangulation, thus avoiding elongated, acute-angled triangles. The triangle vertices (i.e., control points) of each triangle define the local transformation within each triangle to reposition other features. The local transformation used for positional interpolation is often the affine transformation, which consists of a linear transformation (e.g., rotation and scaling) followed by a translation.
An affine transformation can preserve collinearity and topology. The well-known technique of rubber-sheeting (imagine stretching a dataset as if it were made of rubber) typically refers to the process comprising triangle-based space partitioning and the transformation of features within each triangle.

What Saalfeld discovered had a profound impact upon conflation techniques. From then on, the rubber-sheeting technique (with some variants) has been widely used in conflation algorithms, because of its sound mathematical theory and because of its success in many practical examples. In fact, these days, most commercial conflation products support piecewise rubber-sheeting. Because rubber-sheeting has become a commonly known strategy to geometrically merge datasets based on the control points, many algorithms have been invented around this conflation paradigm, with a major focus on solving the matching (correspondence) problem to find accurate control point pairs (i.e., to automate the first two steps in the above-mentioned conflation framework). However, feature matching algorithms differ with the types of datasets undergoing the match operation. In the following, we discuss existing conflation (matching) technologies based on the types of geospatial datasets dealt with.
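The triangle-based rubber-sheeting described above can be sketched as follows, assuming SciPy's Delaunay triangulation is available; the control-point coordinates below are made up for illustration, and each triangle's mapping is the barycentric (local affine) interpolation between source and target control points.

```python
import numpy as np
from scipy.spatial import Delaunay

# Source and target control points (illustrative values only).
src_cp = np.array([[0, 0], [100, 0], [100, 100], [0, 100], [50, 55]], float)
dst_cp = np.array([[2, 1], [103, -1], [101, 102], [-1, 99], [53, 56]], float)

tri = Delaunay(src_cp)   # triangulation over the source control points

def rubber_sheet(points):
    points = np.asarray(points, float)
    simplex = tri.find_simplex(points)            # containing triangle per point
    out = np.full_like(points, np.nan)
    for i, s in enumerate(simplex):
        if s == -1:
            continue                              # outside the triangulated area
        verts = tri.simplices[s]
        # Barycentric coordinates of the point w.r.t. the source triangle ...
        T = tri.transform[s]
        b = T[:2].dot(points[i] - T[2])
        bary = np.append(b, 1.0 - b.sum())
        # ... applied to the corresponding target triangle (local affine map).
        out[i] = bary.dot(dst_cp[verts])
    return out

print(rubber_sheet([[10.0, 10.0], [60.0, 40.0]]))
```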
Conflation of Geospatial Data, Fig. 4 An example of Delaunay triangulation based on control points (Modified figure from Chen et al. 2006a)
Vector to vector conflation: There have been a number of efforts to automatically or semi-automatically accomplish vector to vector conflation. Most of the existing vector-to-vector conflation algorithms focus on road vector data. These approaches differ in the methods utilized for locating the counterpart elements from both vector datasets. The major approaches include:
Matching vector data based on the similarities of geometric information (such as nodes and lines) (Saalfeld 1988; Walter and Fritsch 1999; Ware and Jones 1998).
Matching attribute-annotated vector data based on the similarities of vector shapes as well as the semantic similarities of vector attributes (Cobb et al. 1998).
Matching vector data with unknown coordinates based on the feature point (e.g., road intersection) distributions (Chen et al. 2006b).

Vector to imagery conflation: Vector to imagery (and vector to raster) conflation, on the other hand, mainly focuses on developing effective and efficient image processing techniques to resolve the correspondence problem. The major approaches include:
Detecting all salient edges from imagery and then comparing them with vector data (Filin and Doytsher 2000).
Utilizing vector data to identify corresponding image edges based on a (modified) Snakes algorithm (Agouris et al. 2001; Kass et al. 1987).
Utilizing stereo images, elevation data, and knowledge about the roads (e.g., parallel lines and road marks) to compare vector and imagery (Eidenbenz et al. 2000).
Exploiting auxiliary spatial information (e.g., the coordinates of imagery and vector, the shape of roads around intersections, etc.) and non-spatial information (e.g., the image color/resolution and road widths) to perform localized image processing to compute the correspondence (Chen et al. 2004b, 2006a). Figure 2b is an example result based on this technology.

Raster to raster conflation: In general, raster to raster conflation (e.g., imagery to imagery conflation and map to imagery conflation) requires more data-specific image processing techniques to identify the corresponding features from raster data. Some existing approaches, for example, include:
Conflating two images by extracting and matching various features (e.g., edges and feature points) across images (Seedahmed and Martucci 2002; Dare and Dowman 2000).
Conflating a raster map and imagery by computing the relationship between two feature point sets detected from the datasets (Chen et al. 2004a). In this approach, especially, these feature points are generated by exploiting auxiliary spatial information (e.g., the coordinates of imagery, the orientations of road segments around intersections from the raster map, etc.) and non-spatial information (e.g., the image resolution and the scale of raster maps). Figure 3b is an example result based on this technology.

Key Applications

Conflation technologies are used in many application domains, most notably the sciences and domains using high quality spatial data such as GIS.

Cartography
It is well known that computers and mathematical methods have had a profound impact upon cartography. There has been a massive proliferation of geospatial data, and no longer is the traditional paper map the final product. In fact, the focus of cartography has shifted from map production to the presentation, management, and combination of geospatial data. Maps can be produced on demand for specialized purposes. Unfortunately, the data used to produce maps may not always be consistent. Geospatial data conflation can be used to address this issue. For example, out-of-date maps can be conflated with up-to-date imagery to identify inconsistencies.
304 Conflation of Geospatial Data
Conflation of Geospatial Data, Fig. 5 An example of parcel vector data to imagery conflation
... verify and update road vector data. The ability to automatically conflate the original road vector data with images supports more efficient and accurate updates of road vectors.

Real Estate
With the growth of the real estate market, there are many online services providing real estate records by superimposing the parcel boundaries on top of high-resolution imagery to show the location of parcels on the imagery. However, as is typically the case in integrating different geospatial datasets, a general problem in combining parcel vector data with imagery from different sources is that they rarely align (as shown in Fig. 5a). These displacements can mislead the interpretation of parcel and land use data. As the example in Fig. 5 shows, parcel data are often represented as polygons and include various attributes such as ownership information, mailing address, acreage, market value, and tax information. Cities and counties use this information for watershed and flood plain modelling and for neighborhood and transportation planning. Furthermore, various GIS applications rely on parcel data for more accurate geocoding. By conflating parcel vector data and imagery, the detailed attribute information provided by the parcel data (as shown in the example of Fig. 5b) can be combined with the visible information provided by the imagery. Therefore, the conflation of these datasets can provide cost savings for many applications, such as county, city, and state planning, or the integration of diverse datasets for more accurate address geocoding or emergency response.

Future Directions

With the rapid improvement of geospatial data collection techniques, the growth of the Internet, and the implementation of Open GIS standards, a large amount of geospatial data is now readily available. There is a pressing need to combine these datasets using conflation technology. Although there has been significant progress on automatic conflation technology in the last few years, there is still much work to be done. Important research problems include, but are not limited to, the following: (1) resolving discrepancies between datasets with very different levels of resolution and thematic focus, (2) extending existing technologies to handle a broad range of datasets (in addition to road networks), such as
elevation data and hydrographic data, (3) allowing for uncertainty in the feature matching stage, and (4) improving the processing time (especially for raster data) to achieve conflation on the fly.

Cross-References

Change Detection
Intergraph: Real-Time Operational Geospatial Applications
Photogrammetric Applications
Uncertain Environmental Variables in GIS
Voronoi Diagram

References

Agouris P, Stefanidis A, Gyftakis S (2001) Differential snakes for change detection in road segments. Photogramm Eng Remote Sens 67(12):1391–1399
Chen C-C, Knoblock CA, Shahabi C, Chiang Y-Y, Thakkar S (2004a) Automatically and accurately conflating orthoimagery and street maps. In: Proceedings of the 12th ACM international symposium on advances in geographic information systems, Washington, DC
Chen C-C, Shahabi C, Knoblock CA (2004b) Utilizing road network data for automatic identification of road intersections from high resolution color orthoimagery. In: Proceedings of the second workshop on spatiotemporal database management (co-located with VLDB2004), Toronto
Chen C-C, Knoblock CA, Shahabi C (2006a) Automatically conflating road vector data with orthoimagery. Geoinformatica 10(4):495–530
Chen C-C, Shahabi C, Knoblock CA, Kolahdouzan M (2006b) Automatically and efficiently matching road networks with spatial attributes in unknown geometry systems. In: Proceedings of the third workshop on spatiotemporal database management (co-located with VLDB2006), Seoul
Cobb M, Chung MJ, Miller V, Foley H III, Petry FE, Shaw KB (1998) A rule-based approach for the conflation of attributed vector data. GeoInformatica 2(1):7–35
Dare P, Dowman I (2000) A new approach to automatic feature based registration of SAR and SPOT images. Int Arch Photogramm Remote Sens 33(B2):125–130
Eidenbenz C, Kaser C, Baltsavias E (2000) ATOMI – automated reconstruction of topographic objects from aerial images using vectorized map information. Int Arch Photogramm Remote Sens 33(Part 3/1):462–471
Filin S, Doytsher Y (2000) A linear conflation approach for the integration of photogrammetric information and GIS data. Int Arch Photogramm Remote Sens 33:282–288
Flavie M, Fortier A, Ziou D, Armenakis C, Wang S (2000) Automated updating of road information from aerial images. In: Proceedings of American society photogrammetry and remote sensing conference, Amsterdam
Kass M, Witkin A, Terzopoulos D (1987) Snakes: active contour models. Int J Comput Vis 1(4):321–331
Saalfeld A (1988) Conflation: automated map compilation. Int J Geogr Inf Sci 2(3):217–228
Seedahmed G, Martucci L (2002) Automated image registration using geometrical invariant parameter space clustering (GIPSC). In: Proceedings of the photogrammetric computer vision, Graz
Walter V, Fritsch D (1999) Matching spatial data sets: a statistical approach. Int J Geogr Inf Sci 5(1):445–473
Ware JM, Jones CB (1998) Matching and aligning features in overlayed coverages. In: Proceedings of the 6th ACM symposium on geographic information systems, Washington, DC

Conflict Resolution

Computing Fitness of Use of Geospatial Datasets
Smallworld Software Suite

Consequence Management

Emergency Evacuations, Transportation Networks

Conservation Medicine

Exploratory Spatial Analysis in Disease Ecology

Constrained Nearest Neighbor Queries

Variations of Nearest Neighbor Queries in Euclidean Space
PointA(x, y) :- x = 0, y = 5.
LineAB(x, y) :- x >= 0, y >= 0, x + 2y = 10.
PolygonC(i, x, y) :- i = 1, x - 2y <= 5, x + y <= 15, x >= 5.
PolygonC(i, x, y) :- i = 2, x + 2y >= 5, x + y <= 25, x - 3y <= 5, x >= 5.
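One simple way to visualize such constraint tuples is to evaluate the conjunction of linear constraints over a grid of candidate (x, y) points and display the points that satisfy it. The following is a minimal sketch under that assumption (the tuple below reuses the illustrative PolygonC constraints from above and is not the output of any particular constraint database system):

import numpy as np
import matplotlib.pyplot as plt

# A constraint tuple is a conjunction of linear constraints a*x + b*y (op) c.
# Each constraint is stored as (a, b, op, c) with op in {"<=", ">=", "=="}.
polygon_c_piece1 = [(1, -2, "<=", 5), (1, 1, "<=", 15), (1, 0, ">=", 5)]

def satisfies(constraints, x, y, tol=1e-9):
    """Boolean mask of grid points satisfying every constraint in the tuple."""
    mask = np.ones_like(x, dtype=bool)
    for a, b, op, c in constraints:
        lhs = a * x + b * y
        if op == "<=":
            mask &= lhs <= c + tol
        elif op == ">=":
            mask &= lhs >= c - tol
        else:  # equality, evaluated with a tolerance so lines remain visible
            mask &= np.abs(lhs - c) <= 0.1
    return mask

# Rasterize the region on a regular grid and plot it.
xs, ys = np.meshgrid(np.linspace(0, 20, 400), np.linspace(0, 10, 200))
inside = satisfies(polygon_c_piece1, xs, ys)
plt.scatter(xs[inside], ys[inside], s=1)
plt.show()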
Constraint databases are well suited for animation because they allow any granularity for the animation without requiring much data storage (Revesz 2002). Beyond that, the ability to represent spatiotemporal and non-spatiotemporal data in an identical format and the support of recursive queries make constraint databases a good approach for many difficult visualization problems, such as the visualization of the recursively defined spatiotemporal concepts discussed in Revesz and Wu (2004, 2006).

Constraint Data, Visualizing, Fig. 1 Visualization of point, polyline, and polygon in constraint databases

Although most existing constraint database systems can only visualize 2-D spatiotemporal objects, they can be extended to visualize three- or even higher-dimensional spatiotemporal objects. By introducing new variables into the linear constraints, constraint databases can represent higher-dimensional objects in the same way as 2-D objects. The visualization of those objects then reduces to visualizing the union of basic higher-dimensional blocks.

Historical Background

Constraint databases, including spatial constraint databases, were proposed by Kanellakis, Kuper, and Revesz in 1990. They showed in Kanellakis et al. (1995) that efficient, declarative database programming can be combined with efficient constraint solving and suggested that the constraint database framework can be applied to manage spatial data. A few years later, several spatial constraint database systems, such as the MLPQ system (Revesz and Li 1997), the CCUBE system (Brodsky et al. 1997), the DEDALE system (Grumbach et al. 1998), and the CQA/CDB system (Goldin et al. 2003), were developed. During the development of those systems, convex polygons were the major visualization blocks presenting the outputs of the spatial constraint database systems. Extreme point data models like the rectangles data model (Revesz 2002) and the Worboys data model are also implemented in some constraint database systems. For example, the MLPQ system implements both the regular polygon visualization and the parametric rectangle visualization. The last one, named the PReSTO system, implements several special animation features like Collide and Block. With the increasing number of applications developed on spatial constraint database systems, the aim of efficiently and naturally visualizing sophisticated spatial or spatiotemporal constraint data attracts more and more attention.

Scientific Fundamentals

Static Displays
Any 2-D static display can be reduced to the visualization of points, polylines, and polygons. In constraint databases, a point can be directly represented by linear equations over two variables (x, y). For example, point A(1,1) in Fig. 2 can be represented as

A(x, y) :- x = 1, y = 1.

It is a trivial problem to visualize a point given its x and y coordinates. Things are a little more complex for polylines and polygons. The line segment between points B(1,3) and C(3,1) can be represented as

BC(x, y) :- x + y = 4, x >= 1, x <= 3, y >= 1, y <= 3,
Constraint Data, Visualizing, Fig. 3 Naive and parametric animation methods (see Fig. 16.11 in Revesz 2002)
The parametric animation method has a preprocessing step and a display step to speed up the animation. The preprocessing step is executed at the time the constraint relation is loaded or constructed. It first computes the extreme points of each polygon based on its constraint tuple. Then, each polygon is describable by a sequence of extreme points. Finally, each extreme point is represented by parametric functions x = x(t) and y = y(t), which are kept in memory until the constraint relation is closed. The display step is executed every time the user requests an animation display. After the user specifies the range and the granularity of time for the animation and sends the request to the system, the extreme point parametric functions are loaded and the time variable t is instantiated several times based on the required granularity. This generates a sequence of polygon outputs, which are drawn sequentially and smoothly on the monitor. In this way, the spatiotemporal data are visualized as an animation display.

Key Applications

The visualization of spatial constraint databases is similar to the visualization in other GIS systems, such as the ARC/GIS system. However, the power of efficiently describing infinite spatial and spatiotemporal data and the support of recursive queries make the visualization of spatial constraint databases more attractive for complex problems like the visualization of recursively defined spatiotemporal concepts (Revesz and Wu 2004). These applications typically include problems where various kinds of spatial and spatiotemporal information such as maps, population, meteorological phenomena, and moving objects are represented and visualized. The following are some examples of such applications.
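A minimal sketch of the parametric idea described above, assuming each polygon has already been reduced to a list of extreme points whose coordinates are linear functions of time (the coefficients below are made up for illustration):

import numpy as np

# Each extreme point is stored as parametric functions x(t) = x0 + vx*t, y(t) = y0 + vy*t.
# One hypothetical triangle whose vertices drift to the right over time:
extreme_points = [
    {"x0": 0.0, "vx": 1.0, "y0": 0.0, "vy": 0.0},
    {"x0": 4.0, "vx": 1.0, "y0": 0.0, "vy": 0.5},
    {"x0": 2.0, "vx": 1.0, "y0": 3.0, "vy": 0.0},
]

def polygon_at(t):
    """Instantiate the time variable to obtain one displayable polygon snapshot."""
    return [(p["x0"] + p["vx"] * t, p["y0"] + p["vy"] * t) for p in extreme_points]

# Display step: the user picks a time range and granularity; each instantiation
# of t yields one frame of the animation.
t_start, t_end, step = 0.0, 10.0, 2.0
for t in np.arange(t_start, t_end + step, step):
    print(f"t = {t:4.1f}:", polygon_at(t))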
Visualization Functions of the MLPQ Constraint Database System
The MLPQ constraint database system implements many visualization operators. For example, the Complement operator returns the complement of a given spatial object, and the Difference operator generates the difference between two spatial objects. Three commonly used visualization operators are described as follows.

2D Animation
The MLPQ constraint database system can display spatiotemporal relations as animations. It provides a set of buttons for the user to control the display of the animation. The animation button allows the user to set the start and end time, the time interval between two frames, and the speed of the animation. The play and playback buttons play the animation forward and backward, respectively. The first, forward, next, and last buttons allow the user to navigate between frames.

... are traveling at uniform speed defined by the transition function in the constraint database. The Collide operator generates a new relation that expresses the motion of the objects before and after the collision. This operator can be used to visualize applications like the crash of two cars or the contact of two billiard balls.

Applications Based on Recursively Defined Concepts
Visualization of recursively defined concepts is a general problem that appears in many areas. For example, drought areas based on the Standardized Precipitation Index (SPI) and long-term air pollution areas based on safe and critical level standards are recursively defined concepts. In Revesz and Wu (2004), a general and efficient representation and visualization method was proposed to display recursively defined spatiotemporal concepts. Sample applications such as the visualization of drought and pollution areas were implemented to illustrate the method.
... the performance of visualizing spatial constraint databases over the Internet are being developed.

By adding one more parameter to the database, spatial constraint databases can easily represent 3-D data. However, at the time of writing this entry, only primitive visualization methods like isometric color bands are implemented in some existing constraint database systems. For example, a map can be visualized by discrete color zones according to the values of a variable z, which can be used to represent elevation, precipitation, or temperature. Using 2-D images to visualize 3-D objects is a temporary solution to this problem. Compared to the 3-D visualization of other commercial GIS systems, this method places many restrictions on the 3-D objects to be visualized, and the result is not impressive. Implementations of real 3-D visualization of constraint databases are being developed.

Cross-References

Constraint Database Queries
Constraint Databases and Data Interpolation
Constraint Databases and Moving Objects
Constraint Databases, Spatial
MLPQ Spatial Constraint Database System
Raster Data
Vector Data

References

Brodsky A, Segal V, Chen J, Exarkhopoulo P (1997) The CCUBE constraint object-oriented database system. Constraints 2(3–4):245–277
Chazelle B, Dobkin D (1979) Decomposing a polygon into its convex parts. In: Proceedings of 11th annual ACM symposium on theory of computing, Atlanta, pp 38–48
Goldin D, Kutlu A, Song M, Yang F (2003) The constraint database framework: lessons learned from CQA/CDB. In: Proceedings of international conference on data engineering, Bangalore, pp 735–737
Grumbach S, Rigaux P, Segoufin L (1998) The DEDALE system for complex spatial queries. In: Proceedings of ACM SIGMOD international conference on management of data, Seattle, pp 213–224
Kanellakis PC, Kuper GM, Revesz P (1990) Constraint query languages. In: Proceedings of ACM symposium on principles of database systems, Nashville, pp 299–313
Kanellakis PC, Kuper GM, Revesz P (1995) Constraint query languages. J Comput Syst Sci 51(1):26–52
Keil JM (1985) Decomposing a polygon into simpler components. SIAM J Comput 14(4):799–817
O'Rourke J, Supowit KJ (1983) Some NP-hard polygon decomposition problems. IEEE Trans Inf Theory 29(2):181–190
Revesz P (2002) Introduction to constraint databases. Springer, New York
Revesz P, Li Y (1997) MLPQ: a linear constraint database system with aggregate operators. In: Proceedings of 1st international database engineering and applications symposium. IEEE Press, Washington, DC, pp 132–137
Revesz P, Wu S (2004) Visualization of recursively defined concepts. In: Proceedings of the 8th international conference on information visualization. IEEE Press, Washington, DC, pp 613–621
Revesz P, Wu S (2006) Spatiotemporal reasoning about epidemiological data. Artif Intell Med 38(2):157–170
Rigaux P, Scholl M, Segoufin L, Grumbach S (2003) Building a constraint-based spatial database system: model, languages, and implementation. Inf Syst 28(6):563–595
Schachter I (1978) Decomposition of polygons into convex sets. IEEE Trans Comput 27(11):1078–1082

Constraint Database Queries

Lixin Li
Department of Computer Sciences, Georgia Southern University, Statesboro, GA, USA

Synonyms

Constraint query languages; Datalog, SQL; Logic programming language

Definition

A database query language is a special-purpose programming language designed for retrieving information stored in a database. Structured query language (SQL) is a very widely used commercially marketed query language for relational databases. Different from conventional
programming languages such as C, C++, or Java, a SQL programmer only needs to specify the properties of the information to be retrieved, but not the detailed algorithm required for retrieval. Because of this property, SQL is said to be declarative. In contrast, conventional programming languages are said to be procedural.

To query spatial constraint databases, any query language can be used, including SQL. However, Datalog is probably the most popular rule-based query language for spatial constraint databases because of its power of recursion. Datalog is also declarative.

... Interpolation", find the amount of ultraviolet radiation for each ground location (x, y) at time t.

Since the input relations in Tables 1 and 2 in the entry "Constraint Databases and Data Interpolation" only record the incoming ultraviolet radiation u and the filter ratio r at a few sample points, these cannot be used directly to answer the query. Therefore, to answer this query, the interpolation results INCOMING(y, t, u) and FILTER(x, y, r) are needed. To write queries, it is not necessary to know precisely what kind of interpolation method is used and what constraints are used in the interpolation representation. The above query can be expressed in Datalog as follows (Li 2003):
• Ozone_orig(x, y, t, w), which records the original measured ozone value w at monitoring site location (x, y) and time t; and

The leave-one-out cross-validation is a process that removes one of the n observation points and uses the remaining n - 1 points to estimate its value; this process is repeated at each observation point (Hjorth 1994). The observation points are the points with measured original values. For the experimental ozone data, the observation points are the spatiotemporal points (x, y, t), where (x, y) is the location of a monitoring site and t is the year when the ozone measurement was taken. After the leave-one-out cross-validation, each observation point has not only its original value but also an interpolated value. The original and interpolated values at each observation point can be compared for the purpose of an error analysis. The interpolation error at each data point, obtained by comparing its original and interpolated values, is defined as follows:

E_i = |I_i - O_i| / O_i    (1)

where E_i is the interpolation error at observation point i, I_i is the interpolated value at point i, and O_i is the original value at point i.

Query 2 For a given location with longitude x and latitude y, find the ozone concentration level in year t.
This can be expressed in Datalog as follows:

Ozone_value(w) :- Ozone_interp(x, y, t, w).

Query 3 Suppose that in future years there will be a budget increase so that new ozone monitoring sites can be added. Find the best areas where new monitoring sites should be installed.

In order to decide the best locations for new monitoring sites, it is necessary to first find those monitoring sites that have large average interpolation errors according to equation (1), for example, over 20 %. Then, a buffer operation on the set of monitoring sites with big errors finds the areas within a certain distance of each such site, for example, 50 miles. Since the buffered areas are the areas with poor interpolation results, these areas can be considered possible areas where new monitoring sites should be built. To find the monitoring sites with more than 20 % interpolation errors, perform the following Datalog queries:

Error(x, y, t, r) :- Ozone_orig(x, y, t, w1), Ozone_loocv(x, y, t, w2), r = |w1 - w2| / w1.
Avg_error(x, y, avg(r)) :- Error(x, y, t, r).
Sites_Chosen(x, y) :- Avg_error(x, y, ae), ae >= 0.2.

To find the areas within 50 miles of the sites with more than 20 % interpolation errors, a GIS buffer operation on the relation Sites_Chosen should be performed. The buffer operation is provided by many GIS software packages and by the MLPQ constraint database system. After performing the buffer operation, an output relation is created which contains a 50-mile buffer around the locations stored in the Sites_Chosen relation. Similarly, if there is a budget cut, analogous queries can be designed to find out and shut down the monitoring sites with small interpolation errors.
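The leave-one-out error analysis and site selection above can also be prototyped outside a constraint database. The sketch below assumes the observations are plain NumPy arrays and uses inverse distance weighting as a stand-in interpolation method (the entry itself does not prescribe which interpolation method the cross-validation uses):

import numpy as np

def idw_estimate(p, pts, vals, power=2):
    """Inverse-distance-weighted estimate of the value at point p from (pts, vals)."""
    d = np.linalg.norm(pts - p, axis=1)
    w = 1.0 / np.maximum(d, 1e-12) ** power
    return np.sum(w * vals) / np.sum(w)

def loocv_errors(pts, vals):
    """E_i = |I_i - O_i| / O_i, where I_i is estimated from all points except i."""
    errors = np.empty(len(pts))
    for i in range(len(pts)):
        mask = np.arange(len(pts)) != i
        interp = idw_estimate(pts[i], pts[mask], vals[mask])
        errors[i] = abs(interp - vals[i]) / vals[i]
    return errors

# Hypothetical (x, y, t) observation points and measured ozone values.
pts = np.array([[1.0, 2.0, 1994], [1.5, 2.5, 1995], [3.0, 1.0, 1994], [2.0, 3.0, 1996]])
vals = np.array([0.063, 0.087, 0.096, 0.074])

errors = loocv_errors(pts, vals)
# Observation points with 20 % error or more flag sites where interpolation is poor.
sites_chosen = pts[errors >= 0.2]
print(errors, sites_chosen, sep="\n")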
House Price Data Example

The house price data consist of a set of real estate data obtained from the Lancaster County assessor's office in Lincoln, Nebraska. House sale histories since 1990 are recorded in the real estate data set and include sale prices and times. In the experiment, 126 residential houses were randomly selected from a quarter of a section of a township, which covers an area of 160 acres. Furthermore, from these 126 houses, 76 houses were randomly selected as sample data, and the remaining 50 houses were used as test data. Figure 1 shows the 76 sample houses as circles and the 50 remaining houses as stars.

Tables 2 and 3 show instances of these two data sets. Based on the fact that the earliest sale of the houses in this neighborhood was in 1990, the time is encoded in such a way that 1 represents January 1990, 2 represents February 1990, . . . , and 148 represents April 2002. Note that some houses were sold more than once in the past, so they have more than one tuple in Table 2. For example, the house at location (888, 115) was sold three
times in the past, at times 4 and 76 (which represent 4/1990 and 4/1996) (Li and Revesz 2002).

Constraint Database Queries, Table 3 Test (x, y, t)
X Y T
115 1525 16
115 1525 58
115 1525 81
115 1610 63
120 1110 30
615 780 59

Assume that the input constraint relations are House(x, y, t, p) and Built(x, y, t). House(x, y, t, p) represents the interpolation result of the house price data, and Built(x, y, t) records the time t (in months) when the house at location (x, y) was built. The Built relation can usually be obtained easily from real estate or city planning agencies.

Start(x, y, p) :- Built(x, y, t), House(x, y, t, p).

Query 5 Suppose it is known that house prices in general decline for some time after the first sale. For each house, find the first month when it becomes profitable, that is, the first month when its price exceeds its initial sale price.
This can be expressed as follows:

not_Profitable(x, y, t) :- Built(x, y, t).
not_Profitable(x, y, t2) :- not_Profitable(x, y, t1), House(x, y, t2, p2), Start(x, y, p), t2 = t1 + 1, p2 <= p.
Profitable(x, y, t2) :- not_Profitable(x, y, t1), House(x, y, t2, p2), Start(x, y, p), t2 = t1 + 1, p2 > p.
Time_to_Profit(x, y, t3) :- Built(x, y, t1), Profitable(x, y, t2), t3 = t2 - t1.
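The recursion in Query 5 can be mirrored procedurally: starting from the month a house was built, scan forward month by month until the interpolated price first exceeds the initial sale price. A minimal sketch, assuming the interpolated monthly prices are available as a plain Python dictionary (the numbers are made up):

# Hypothetical interpolated monthly prices for one house keyed by encoded month
# (1 = January 1990, 2 = February 1990, ...), mirroring the House relation.
house_prices = {4: 100000, 5: 98000, 6: 97500, 7: 99000, 8: 101500, 9: 103000}
built_month = 4                          # the Built relation
start_price = house_prices[built_month]  # the Start rule: price at the built month

def time_to_profit(prices, built, start):
    """First month whose price exceeds the initial price, as an offset from built."""
    t = built
    while t + 1 in prices:               # mirrors the not_Profitable recursion, t2 = t1 + 1
        t += 1
        if prices[t] > start:            # the Profitable rule: p2 > p
            return t - built             # Time_to_Profit: t3 = t2 - t1
    return None                          # never became profitable within the data

print(time_to_profit(house_prices, built_month, start_price))  # -> 4 (month 8)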
All of the above queries could be part of a more complex data mining or decision support task. For example, a buyer may want to find out which builders tend to build houses that become profitable in a short time or keep their values best.

Future Directions

An interesting direction for future work is to continue to design more interesting queries in spatial constraint databases which can be a valuable part of decision support systems.

References

Revesz P (2002) Introduction to constraint databases. Springer, New York
Silberschatz A, Korth H, Sudarshan S (2006) Database system concepts, 5th edn. McGraw-Hill, New York
Ullman JD (1989) Principles of database and knowledge-base systems. Computer Science Press, New York

Constraint Database Systems

Linear Versus Polynomial Constraint Databases
Polynomial Spatial Constraint Databases
2002). It is the constraint level that makes it possible for computers to use a finite number of tuples to represent an infinite number of tuples at the logical level.

It is very common in GIS that sample measurements are taken only at a set of points. Interpolation is based on the assumption that things that are close to one another are more alike than those that are farther apart. Interpolation is needed in order to estimate the values at unsampled points.

Constraint databases are very suitable for representing spatial/spatiotemporal interpolation results. In this entry, several spatial and spatiotemporal interpolation methods are discussed, and the representation of their spatiotemporal interpolation results in constraint databases is illustrated by some examples. The performance analysis and comparison of different interpolation methods in GIS applications can be found in Li and Revesz (2002), Li and Revesz (2004), and Li et al. (2006).

Historical Background

There exist a number of spatial interpolation algorithms, such as inverse distance weighting (IDW), Kriging, splines, trend surfaces, and Fourier series. Spatiotemporal interpolation is a growing research area. With the additional time attribute, the above traditional spatial interpolation algorithms are insufficient for spatiotemporal data, and new spatiotemporal interpolation methods must be developed. There have been some papers addressing the issue of spatiotemporal interpolation in GIS. Gao (2006), Li et al. (2003), and Revesz and Wu (2006) deal with the use of spatiotemporal interpolation for different applications. Li et al. (2004) and Li and Revesz (2004) discuss several newly developed shape function based spatial/spatiotemporal interpolation methods. There have been some applications of the shape function-based methods. For example, Li et al. (2006) applies a shape function interpolation method to a set of ozone data in the conterminous USA, and Li and Revesz (2004) compares shape function-, IDW-, and Kriging-based spatiotemporal interpolation methods by using an actual real estate data set with house prices. Revesz and Wu (2006) also use a shape function-based interpolation method to represent West Nile virus data in constraint databases and implement a particular epidemiological system called WeNiVIS that enables the visual tracking of and reasoning about the spread of the West Nile virus epidemic in Pennsylvania.

Scientific Fundamentals

Suppose that the following two sets of sensory data are available in the database (Revesz and Li 2002):

• Incoming (y, t, u) records the amount of incoming ultraviolet radiation u for each pair of latitude degree y and time t, where time is measured in days.
• Filter (x, y, r) records the ratio r of ultraviolet radiation that is usually filtered out by the atmosphere above location (x, y) before reaching the earth.

Suppose that Fig. 1 shows the locations of the (y, t) and (x, y) pairs where the measurements for u and r, respectively, are recorded. Then Tables 1 and 2 could be instances of these two relations in a relational database.

The above relational database can be translated into a constraint database with the two constraint relations shown in Tables 3 and 4. Although any relational relation can be translated into a constraint relation as above, not all constraint relations can be converted back to relational databases. This is because a constraint relation can store an infinite number of solutions. For example, the infinite number of interpolation results of u and r for all the points in the domains of Incoming (y, t, u) and Filter (x, y, r) can be represented in a constraint database by a finite number of tuples. The representation of interpolation results in constraint databases by different methods for Incoming and Filter will be given in Key Applications.
Constraint Databases and Data Interpolation, Fig. 1 The spatial sample points for Incoming (left) and Filter (right)
Constraint Databases and Data Interpolation, Table 1 Relational incoming (y, t, u)
ID Y T U
1 0 1 60
2 13 22 20
3 33 18 70
4 29 0 40

Constraint Databases and Data Interpolation, Table 2 Relational filter (x, y, r)
ID X Y R
1 2 1 0.9
2 2 14 0.5
3 25 14 0.3
4 25 1 0.8

Constraint Databases and Data Interpolation, Table 3 Constraint incoming (y, t, u)
ID Y T U
id y t u : id = 1, y = 0, t = 1, u = 60
id y t u : id = 2, y = 13, t = 22, u = 20
id y t u : id = 3, y = 33, t = 18, u = 70
id y t u : id = 4, y = 29, t = 0, u = 40

Constraint Databases and Data Interpolation, Table 4 Constraint filter (x, y, r)
ID X Y R
id x y r : id = 1, x = 2, y = 1, r = 0.9
id x y r : id = 2, x = 2, y = 14, r = 0.5
id x y r : id = 3, x = 25, y = 14, r = 0.3
id x y r : id = 4, x = 25, y = 1, r = 0.8

Key Applications

Applications Based on Shape Function Spatial Interpolation
Shape functions, which can be viewed as a spatial interpolation method, are popular in engineering applications, for example, in finite element algorithms (Zienkiewics and Taylor 2000). There are various types of 2-D and 3-D shape functions. 2-D shape functions for triangles and 3-D shape functions for tetrahedra are of special interest; both are linear approximation methods. Shape functions have recently been found to be a good interpolation method for GIS applications, and the interpolation results are very suitable to be represented in linear constraint databases (Li et al. 2004; Li and Revesz 2002, 2004; Li et al. 2006; Revesz and Li 2002).

2-D Shape Function for Triangles
When dealing with complex two-dimensional geometric domains, it is convenient to divide the total domain into a finite number of simple sub-domains which can have triangular or quadrilateral shapes. Mesh generation using triangular or quadrilateral domains is important in finite element discretization of engineering problems. For the generation of triangular meshes, quite successful algorithms have been developed. A popular method for the generation of triangular meshes is the Delaunay triangulation (Preparata and Shamos 1985).

A linear interpolation function for a triangular area can be written in terms of three shape functions N1, N2, N3 and the corner values w1, w2, w3. In Fig. 2, two triangular finite elements, I and II, are combined to cover the whole domain considered (Li and Revesz 2004). In this example, the function in the whole domain is interpolated using four discrete values w1, w2, w3, and w4 at four locations. A particular feature of the chosen interpolation method is that the function values inside sub-domain I can
Constraint Databases and Data Interpolation, Fig. 2 Linear interpolation in space for triangular elements

Constraint Databases and Data Interpolation, Fig. 3 Computing shape functions by area divisions

Constraint Databases and Data Interpolation, Fig. 4 Linear interpolation in space for tetrahedral elements

Constraint Databases and Data Interpolation, Fig. 5 Computing shape functions by volume divisions
values for element II can be constructed using the corner values w1, w3, w4, and w5. Suppose V is the volume of the tetrahedral element I. The linear interpolation function for element I can be written as:

w(x, y, z) = N1(x, y, z) w1 + N2(x, y, z) w2 + N3(x, y, z) w3 + N4(x, y, z) w4
           = [N1 N2 N3 N4] [w1 w2 w3 w4]^T    (4)

where N1, N2, N3, and N4 are the following shape functions:

N1(x, y, z) = (a1 + b1 x + c1 y + d1 z) / (6V)
N2(x, y, z) = (a2 + b2 x + c2 y + d2 z) / (6V)
N3(x, y, z) = (a3 + b3 x + c3 y + d3 z) / (6V)    (5)
N4(x, y, z) = (a4 + b4 x + c4 y + d4 z) / (6V)

By expanding the relevant determinants into their cofactors,

a1 = det[x2 y2 z2; x3 y3 z3; x4 y4 z4],  b1 = det[1 y2 z2; 1 y3 z3; 1 y4 z4],
c1 = det[x2 1 z2; x3 1 z3; x4 1 z4],  d1 = det[x2 y2 1; x3 y3 1; x4 y4 1],

with the other constants defined by cyclic interchange of the subscripts in the order 4, 1, 2, 3 (Zienkiewics and Taylor 2000).

Alternatively, considering only the tetrahedral element I, the 3-D shape functions (5) can also be expressed as follows (Li and Revesz 2004):

N1(x, y, z) = V1 / V,  N2(x, y, z) = V2 / V,  N3(x, y, z) = V3 / V,  N4(x, y, z) = V4 / V    (6)

where V1, V2, V3, and V4 are the volumes of the four sub-tetrahedra w w2 w3 w4, w1 w w3 w4, w1 w2 w w4, and w1 w2 w3 w, respectively, as shown in Fig. 5, and V is the volume of the outside tetrahedron w1 w2 w3 w4.
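The volume form (6) translates directly into code. A minimal sketch (not from the cited papers), assuming the four tetrahedron corners and their values are given as NumPy arrays:

import numpy as np

def tet_volume(a, b, c, d):
    """Volume of tetrahedron (a, b, c, d): |det| / 6 of the edge-vector matrix."""
    return abs(np.linalg.det(np.column_stack((b - a, c - a, d - a)))) / 6.0

def shape_function_interpolate(p, corners, values):
    """w(p) = sum_i N_i(p) * w_i with N_i = V_i / V as in Eq. (6)."""
    p = np.asarray(p, dtype=float)
    corners = np.asarray(corners, dtype=float)
    V = tet_volume(*corners)
    w = 0.0
    for i in range(4):
        sub = corners.copy()
        sub[i] = p                      # replace corner i by the query point
        w += (tet_volume(*sub) / V) * values[i]
    return w

# Hypothetical tetrahedron corners (x, y, z) and corner values.
corners = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
values = [10.0, 20.0, 30.0, 40.0]
print(shape_function_interpolate([0.25, 0.25, 0.25], corners, values))  # -> 25.0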
Representing Interpolation Results in Constraint Databases
In traditional GIS, spatial data are represented in the relational data model, which is the most popular data model. Many database systems, such as Oracle and MySQL, are based on the relational model. However, the relational model has disadvantages for some applications, which may lead to infinite relational databases (Revesz 2002). An infinite relational database means the database has relations with an infinite number of tuples. In reality, only a finite set of tuples can be stored in a relation. Therefore, a finite set of tuples has to be extracted, which leads to data incompleteness. Using constraint databases can solve this infinity problem.

The sensory data of the ultraviolet radiation example in Scientific Fundamentals will be used to illustrate how to represent 2-D shape function spatial interpolation results in constraint databases. In this example, Incoming(y, t, u) is treated as if it contains a set of 2-D spatial data. Let INCOMING(y, t, u) be the constraint relation that represents the shape function interpolation result of the Incoming relation. Similarly, let FILTER(x, y, r) be the constraint relation that represents the shape function interpolation result of the Filter relation.

Triangulation of the set of sampled points is the first step in using 2-D shape functions. Figure 6 shows the Delaunay triangulations for the sample points in Incoming(y, t, u) and Filter(x, y, r) illustrated in Fig. 1.

Constraint Databases and Data Interpolation, Fig. 6 Delaunay triangulations for Incoming (left) and Filter (right)

The domain of a triangle can be represented by a conjunction C of three linear inequalities corresponding to the three sides of the triangle. Then, by the shape function (2), the value w of any point (x, y) inside a triangle can be represented by the following linear constraint tuple:

R(x, y, w) :- C,
w = ((y2 - y3) w1 + (y3 - y1) w2 + (y1 - y2) w3) / (2A) * x
  + ((x3 - x2) w1 + (x1 - x3) w2 + (x2 - x1) w3) / (2A) * y
  + ((x2 y3 - x3 y2) w1 + (x3 y1 - x1 y3) w2 + (x1 y2 - x2 y1) w3) / (2A)

where A is a constant equal to the area of the triangle. By representing the interpolation in each triangle by one constraint tuple, a constraint relation representing the interpolation over the whole domain can be found in linear time.

Table 5 illustrates the constraint representation of the interpolation result for FILTER using 2-D shape functions. The result for INCOMING is similar, and the details can be found in Revesz and Li (2002).

Constraint Databases and Data Interpolation, Table 5 FILTER (x, y, r) using 2-D shape functions
X Y R
x y r : 13x - 23y + 296 >= 0, x >= 2, y >= 1, r = 0.0004x - 0.0031y + 0.1168
x y r : 13x - 23y + 296 <= 0, x <= 25, y <= 14, r = 0.0013x - 0.0038y + 0.1056

Applications Based on Shape Function Spatiotemporal Interpolation
There are two fundamentally different ways to do spatiotemporal interpolation: reduction and extension (Li and Revesz 2002). These methods can be described briefly as follows:

Reduction This approach reduces the spatiotemporal interpolation problem to a regular spatial interpolation case. First, interpolate (using any 1-D interpolation in time) the measured value over time at each sample point. Then get spatiotemporal interpolation results by substituting the desired time instant into some regular spatial interpolation function.

Extension This approach deals with time as another dimension in space and extends the spatiotemporal interpolation problem into a one-higher-dimensional spatial interpolation problem.

... approximation in space and time. The second step, interpolation in space and time, can be implemented by combining a time shape function with the space approximation function (1). Assume the value at node i at time t1 is wi1, and at time t2 the value is wi2. The value at node i at any time between t1 and t2 can be interpolated using a 1-D time shape function in the following way:

wi(t) = (t2 - t)/(t2 - t1) wi1 + (t - t1)/(t2 - t1) wi2    (7)

w(x, y, t) = N1(x, y) [(t2 - t)/(t2 - t1) w11 + (t - t1)/(t2 - t1) w12]
           + N2(x, y) [(t2 - t)/(t2 - t1) w21 + (t - t1)/(t2 - t1) w22]
           + N3(x, y) [(t2 - t)/(t2 - t1) w31 + (t - t1)/(t2 - t1) w32]
           = (t2 - t)/(t2 - t1) [N1(x, y) w11 + N2(x, y) w21 + N3(x, y) w31]
           + (t - t1)/(t2 - t1) [N1(x, y) w12 + N2(x, y) w22 + N3(x, y) w32].
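A small numerical check of the reduction formula above, assuming one triangle with known corner values at two time instants (all numbers are hypothetical):

import numpy as np

# Triangle corners and their values at times t1 and t2 (hypothetical numbers).
corners = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
w_t1 = np.array([1.0, 2.0, 3.0])   # w11, w21, w31
w_t2 = np.array([2.0, 4.0, 6.0])   # w12, w22, w32
t1, t2 = 1990.0, 2000.0

def shape_functions(p):
    """Barycentric coordinates of p, i.e., the 2-D shape functions N1, N2, N3."""
    # Solve [x1 x2 x3; y1 y2 y3; 1 1 1] * N = [x, y, 1].
    M = np.vstack((corners.T, np.ones(3)))
    return np.linalg.solve(M, np.array([p[0], p[1], 1.0]))

def w_xyt(x, y, t):
    """Reduction approach: blend the corner values in time, then interpolate in space."""
    N = shape_functions((x, y))
    w_nodes = (t2 - t) / (t2 - t1) * w_t1 + (t - t1) / (t2 - t1) * w_t2
    return N.dot(w_nodes)

print(w_xyt(2.0, 3.0, 1995.0))  # -> 2.7 for these made-up values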
The reduction approach for 3-D space and 1-D time problems can be developed in a similar way by combining the 3-D interpolation formula (4) and the 1-D time shape function (7). Using the example shown in Fig. 4, the interpolation function for any point constrained to sub-domain I at any time between t1 and t2 can be expressed as follows (Li and Revesz 2004):

w(x, y, z, t) = N1(x, y, z) [(t2 - t)/(t2 - t1) w11 + (t - t1)/(t2 - t1) w12]
             + N2(x, y, z) [(t2 - t)/(t2 - t1) w21 + (t - t1)/(t2 - t1) w22]
             + N3(x, y, z) [(t2 - t)/(t2 - t1) w31 + (t - t1)/(t2 - t1) w32]
             + N4(x, y, z) [(t2 - t)/(t2 - t1) w41 + (t - t1)/(t2 - t1) w42]
             = (t2 - t)/(t2 - t1) [N1(x, y, z) w11 + N2(x, y, z) w21 + N3(x, y, z) w31 + N4(x, y, z) w41]    (8)
             + (t - t1)/(t2 - t1) [N1(x, y, z) w12 + N2(x, y, z) w22 + N3(x, y, z) w32 + N4(x, y, z) w42].

Since the 2-D/3-D space shape functions and the 1-D time shape function are linear, the resulting spatiotemporal interpolation function is not linear but quadratic.
Extension Approach
For 2-D space and 1-D time problems, this method treats time as a regular third dimension. Since it extends 2-D problems to 3-D problems, the method is very similar to the linear approximation by 3-D shape functions for tetrahedra. The only modification is to substitute the variable z in Eqs. (4), (5), and (6) by the time variable t.

For 3-D space and 1-D time problems, this method treats time as a regular fourth dimension. New linear 4-D shape functions based on a 4-D Delaunay tessellation can be developed to solve this problem. See Li (2003) for details on the 4-D shape functions.

Representing Interpolation Results in Constraint Databases
The previous section pointed out the infinity problem that relational databases face when representing spatial data. The relational data model shows even more disadvantages when handling spatiotemporal data. For example, using the relational model, the current contents of a database (the database instance) is a snapshot of the data at a given instant in time. When representing spatiotemporal data, frequent updates have to be performed in order to keep the database instance up to date, and each update erases the previous database instance. Therefore, the information about the past is lost. This irrecoverable problem makes the relational data model impractical for handling spatiotemporal data. Using the constraint data model can solve this problem. A set of Aerometric Information Retrieval System (AIRS) data will be used to illustrate how spatiotemporal interpolation data can be represented accurately and efficiently in constraint databases.

The experimental AIRS data is a set of data with annual ozone concentration measurements in the conterminous USA (website www.epa.gov/airmarkets/cmap/data/category1.html). AIRS is a computer-based repository of information about airborne pollution in the US and various World Health Organization (WHO) member countries. The system is administered by the US Environmental Protection Agency (EPA). The data coverage contains point locations of the monitoring sites for which AIRS data are collected, the annual concentration level measurements of ozone (O3), and the years of the measurements. Several datasets from the US EPA (website http://cfpub.epa.gov/gdm) were obtained and reorganized into a dataset with schema (x, y, t, w), where the x and y attributes are the longitude and latitude coordinates of the monitoring site locations, t is the year of the ozone measurement, and w is the O34MAX (4th Max of 1-h Values for O3) value of the ozone measurement. The original dataset has many zero entries for ozone values, which means no measurements were available at a particular site. After filtering out all the zero entries from the original dataset, there are 1209 sites left with measurements. Figure 7 shows the locations of the 1209 monitoring sites (Li et al. 2006).

Among the 1209 monitoring sites with measurements, some sites have complete measurements of yearly ozone values from 1994 to 1999, while the other sites have only partial records. For example, some sites only have measurements of ozone values in 1998 and 1999. In total, there are 6135 ozone value measurements recorded. Each measurement corresponds to the ozone value at a spatiotemporal point (x, y, t), where (x, y) is the location of one of the 1209 monitoring sites and t is a year between 1994 and 1999.

The spatiotemporal interpolation extension method based on 3-D shape functions is implemented in a Matlab program and applied to the AIRS ozone data. The Matlab function delaunayn is used to compute the tetrahedral mesh with the 6135 spatiotemporal points as corner vertices. There are 30,897 tetrahedra in the resulting mesh. Using the mesh and the 6135 original ozone values measured at its corner vertices, the annual ozone value at any location and year can be interpolated, as long as the spatiotemporal point is located inside the domain of the tetrahedral mesh.

Since the 3-D shape function based spatiotemporal interpolation Eq. (4) is linear, the interpolation results can be stored in a linear constraint database. Suppose the constraint relation Ozone_interp is used to store the interpolation results.

Constraint Databases and Data Interpolation, Fig. 7 1209 AIRS monitoring sites with measurements in the conterminous US
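The same extension workflow can be sketched outside Matlab. The snippet below uses SciPy's Delaunay class (an analogue of delaunayn, not the implementation used in the cited work) on a handful of hypothetical (x, y, t) points and evaluates the linear interpolation at a query point via barycentric coordinates:

import numpy as np
from scipy.spatial import Delaunay

# Hypothetical (longitude, latitude, year) measurement points and ozone values.
pts = np.array([[-68.709, 45.217, 1996], [-68.672, 44.736, 1999],
                [-67.594, 44.534, 1995], [-69.214, 45.164, 1999],
                [-68.000, 44.900, 1994]])
vals = np.array([0.063, 0.087, 0.096, 0.074, 0.081])

mesh = Delaunay(pts)  # tetrahedral mesh over the spatiotemporal points

def interpolate(q):
    """Linear (3-D shape function) interpolation at spatiotemporal point q."""
    q = np.asarray(q, dtype=float)
    s = mesh.find_simplex(q)
    if s < 0:
        return None  # outside the convex hull of the mesh: no interpolation
    T = mesh.transform[s]
    b = T[:3].dot(q - T[3])
    bary = np.append(b, 1.0 - b.sum())      # the four shape function values N_i
    return bary.dot(vals[mesh.simplices[s]])

print(interpolate([-68.5, 44.9, 1997]))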
Constraint Databases and Data Interpolation, Table 6 The constraint relation Ozone_interp(x, y, t, w), which stores the 3-D shape function interpolation results of the ozone data
X Y T W
x y t w : 0.002532x + 0.003385y + 0.000511t >= 1,
          0.002709x + 0.003430y + 0.000517t >= 1,
          0.002659x + 0.003593y + 0.000511t <= 1,
          0.002507x + 0.003175y + 0.000515t <= 1,
          v = 0.0127,
          v1 = 1/6 |1.71x + 2.17y + 0.35t - 682.87|,
          v2 = 1/6 |2.10x + 2.84y + 0.40t - 790.39|,
          v3 = 1/6 |1.28x + 1.63y + 0.24t - 474.05|,
          v4 = 1/6 |2.53x + 3.38y + 0.51t - 999.13|,
          w * v = 0.063 v1 + 0.087 v2 + 0.096 v3 + 0.074 v4
x y t w : ...

Table 6 shows one sample tuple of Ozone_interp; the other omitted tuples are of similar format. Since there are 30,897 tetrahedra generated in the tetrahedral mesh, there should be 30,897 tuples in Ozone_interp. The tuple shown in Table 6 corresponds to the interpolation results for all the points located in the tetrahedron with corner vertices (-68.709, 45.217, 1996), (-68.672, 44.736, 1999), (-67.594, 44.534, 1995), and (-69.214, 45.164, 1999). The ozone values measured at these four points are 0.063, 0.087, 0.096, and 0.074, respectively. This constraint tuple contains 10 constraints connected by conjunction (AND). The first four constraints define the four facets of the tetrahedron, the next five constraints give the volume values, and the last constraint is the interpolation function.
326 Constraint Databases and Data Interpolation
Applications Based on IDW Spatial in each region have the same closest members
Interpolation of S . As in an ordinary Voronoi diagram, each
Inverse distance weighting (IDW) interpolation Voronoi region is still convex in a higher-order
(Shepard 1968) assumes that each measured Voronoi diagram. From the de nition of higher-
point has a local in uence that diminishes with order Voronoi diagrams, it is obvious to see that
distance. Thus, points in the near neighborhood the problem of nding the k closest neighbors
are given high weights, whereas points at a far for a given point in the whole domain, which is
distance are given small weights. Reference closely related to the IDW interpolation method
Revesz and Li (2003) uses IDW to visualize with N D k, is equivalent to constructing kth
spatial interpolation data. order Voronoi diagrams.
The general formula of IDW interpolation for Although higher-order Voronoi diagrams are
2-D problems is the following: very dif cult to create by imperative languages,
such as C, C++, and Java, they can be easily
N
X constructed by declarative languages, such as
w.x; y/ D i wi Datalog. For example, a second-order Voronoi re-
iD1 gion for points (x1 ; y1 ), (x2 ; y2 ) can be expressed
(9)
. d1i /p in Datalog as follows.
i D PN 1 p
At rst, let P .x; y/ be a relation that stores
kD1 . dk / all the points in the whole domain. Also let
Di st .x; y; x1 ; y1 ; d1 / be a Euclidean distance
where w.x; y/ is the predicted value at location relation where d1 is the distance between .x; y/
.x; y/, N is the number of nearest known points and .x1 ; y1 /. It can be expressed in Datalog as:
surrounding .x; y/; i are the weights assigned to
each known point value wi at location .xi ; yi /; di
are the 2-D Euclidean distances between each Di st .x; y; x1 ; y1 ; d1 / W
p
.xi ; yi / and .x; y/, and p is the exponent, which d1 D .x x1 /2 C .y y1 /2 :
in uences the weighting of wi on w.
For 3-D problems, the IDW interpolation
function is similar as formula (9), by measuring Note that any point .x; y/ in the plane does not
3-D Euclidean distances for di . belong to the second-order Voronoi region of the
sample points .x1 ; y1 / and .x2 ; y2 / if there exists
another sample point .x3 ; y3 / such that .x; y/
Representing Interpolation Results in
is closer to .x3 ; y3 / than to either .x1 ; y1 / or
Constraint Databases
.x2 ; y2 /. Using this idea, the complement can be
To represent the IDW interpolation, the nearest
expressed as follows:
neighbors for a given point should be found. The
idea of higher-order Voronoi diagrams (or kth
order Voronoi diagrams) can be borrowed from Not _2Vor.x; y; x1 ; y1 ; x2 ; y2 / W P .x3 ; y3 /;
computational geometry to help nd the nearest Di st .x; y; x1 ; y1 ; d1 /;
neighbors. Higher-order Voronoi diagrams gener- Di st .x; y; x3 ; y3 ; d3 /;
alize ordinary Voronoi diagrams by dealing with d1 > d3 :
k closest points. The ordinary Voronoi diagram of
a nite set S of points in the plane is a partition of
the plane so that each region of the partition is the Not _2Vor.x; y; x1 ; y1 ; x2 ; y2 / W P .x3 ; y3 /;
locus of points which are closer to one member Di st .x; y; x2 ; y2 ; d2 /;
of S than to any other member (Preparata and Di st .x; y; x3 ; y3 ; d3 /;
Shamos 1985). The higher-order Voronoi dia- d2 > d3 :
gram of a nite set S of points in the plane is a Finally, the negation of the above can be taken
partition of the plane into regions such that points to get the second-order Voronoi region as follows:
2Vor(x, y, x1, y1, x2, y2) :- not Not_2Vor(x, y, x1, y1, x2, y2).    (10)

The second-order Voronoi diagram is the union of all the nonempty second-order Voronoi regions. Similarly to the second order, any kth-order Voronoi diagram can be constructed.
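For a small point set, the second-order Voronoi regions and the corresponding IDW values (the construction behind Table 7) can also be computed by brute force: for every query location, find the two nearest sample points and apply formula (9) with N = 2. A minimal sketch under that assumption, reusing the Filter sample points from Table 2:

import numpy as np

# Filter sample points and their measured ratios r (Table 2).
pts = np.array([[2.0, 1.0], [2.0, 14.0], [25.0, 14.0], [25.0, 1.0]])
vals = np.array([0.9, 0.5, 0.3, 0.8])

def two_nearest_idw(q, p=2):
    """Return the index pair of the two nearest sample points (the second-order
    Voronoi region that q falls into) and the IDW value computed from them."""
    q = np.asarray(q, dtype=float)
    d = np.linalg.norm(pts - q, axis=1)
    nearest = sorted(np.argsort(d)[:2])                  # region label, e.g., [0, 1]
    w = 1.0 / np.maximum(d[nearest], 1e-12) ** p          # IDW weights, formula (9)
    r = np.sum(w * vals[nearest]) / np.sum(w)
    return tuple(nearest), r

print(two_nearest_idw([5.0, 7.0]))
print(two_nearest_idw([13.0, 2.0]))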
Constraint Databases and Data Interpolation, Table 7 FILTER (x, y, r) using IDW
X Y R
x y r : 2x - y - 20 <= 0, 12x + 7y - 216 <= 0,
  ((x - 2)^2 + (y - 14)^2) 0.9 + ((x - 2)^2 + (y - 1)^2) 0.5 = (2(x - 2)^2 + (y - 14)^2 + (y - 1)^2) r
x y r : 2x - y - 20 >= 0, 12x + 7y - 216 <= 0,
  ((x - 25)^2 + (y - 1)^2) 0.9 + ((x - 2)^2 + (y - 1)^2) 0.8 = (2(y - 1)^2 + (x - 25)^2 + (x - 2)^2) r
x y r : 2x - y - 20 >= 0, 12x + 7y - 216 >= 0,
  ((x - 25)^2 + (y - 14)^2) 0.8 + ((x - 25)^2 + (y - 1)^2) 0.3 = (2(x - 25)^2 + (y - 14)^2 + (y - 1)^2) r
x y r : 2x - y - 20 <= 0, 12x + 7y - 216 >= 0,
  ((x - 25)^2 + (y - 14)^2) 0.5 + ((x - 2)^2 + (y - 14)^2) 0.3 = (2(y - 14)^2 + (x - 25)^2 + (x - 2)^2) r
The tuples in Table 7 represent the four second-order Voronoi regions in Fig. 9. The result for INCOMING is similar, and the details can be found in Li (2003).

Applications Based on IDW Spatiotemporal Interpolation
As with shape functions, IDW is originally a spatial interpolation method, and it can be extended by the reduction and extension approaches to solve spatiotemporal interpolation problems (Li 2003).

... where d_i = sqrt((x_i - x)^2 + (y_i - y)^2) and w_i(t) = (t_i2 - t)/(t_i2 - t_i1) w_i1 + (t - t_i1)/(t_i2 - t_i1) w_i2.

Extension Approach
Since this method treats time as a third dimension, the IDW-based spatiotemporal formula is of the form of (9) with

d_i = sqrt((x_i - x)^2 + (y_i - y)^2 + (t_i - t)^2).

Future Directions
Constraint Databases and Moving Objects, Fig. 1 A city street network Strada and its adjacency-list representation
such as people, animals, stars, cars, planes, ships, and missiles (Erwig et al. 1997). In Saglio and Moreira (1999), Saglio and Moreira argued that moving points are possibly the simplest class of continuously changing spatial objects and that there are many systems, including those dealing with the position of cars, ships, or planes, which only need to keep the position of the objects.

Continuously changing maps are special cases of moving regions. There are many applications that need to be visualized by continuously changing maps, which will be illustrated in Key Applications.

Scientific Fundamentals

... city. The adjacency-list representation (Cormen et al. 1999) of directed weighted graphs can be applied to model such networks. The city street network Strada is shown in Fig. 1a and its adjacency-list representation is shown in Fig. 1b. Each street has the following attributes: slope, speed limit, and snow clearance priority (the lower the value, the higher the priority). These three attributes are shown as labels of each edge in Fig. 1a. They are also displayed in the property fields of each node in Fig. 1b. For example, for the street segment s_bc, the slope is 15°, the speed limit is 35 mph, and the clearance priority value is 1. The movements of snow removal vehicles in Strada can be represented, as shown in Fig. 2, by eight Datalog rules in constraint databases.

Constraint Databases and Moving Objects, Fig. 2 Constraint database representation of the snow removal vehicles for Strada
... (Management of Linear Programming Queries) system is a good example. The MLPQ system is a constraint database system for linear constraint databases (Kanjamala et al. 1998; Revesz 2002; Revesz and Li 1997). This system has a graphical user interface (GUI) which supports Datalog-based and icon-based queries as well as visualization and animation. The MLPQ system can outdo the popular ArcGIS system through powerful queries (such as recursive queries) and the ability to display continuously changing maps. A few examples are given below.

SPI Spatiotemporal Data

The point-based spatiotemporal relation Drought_Point (x, y, year, SPI) stores the average yearly SPI (Standardized Precipitation Index) values sampled by 48 major weather stations in Nebraska from 1992 to 2002. SPI is a common and simple measure of drought which is based solely on the probability of precipitation for a given time period. Values of SPI range from 2.00 and above (extremely wet) to -2.00 and less (extremely dry), with near-normal conditions ranging from -0.99 to 0.99. A drought event is defined when the SPI is continuously negative and reaches a value of -1.0 or less, and it continues until the SPI becomes positive. The Drought_Point relation, as shown in Table 1, was obtained from the Unified Climate Access Network (UCAN) (Li 2003).

Constraint Databases and Moving Objects, Table 1 A point-based spatiotemporal relation Drought_Point
x (easting) y (northing) Year SPI
315515.56 2178768.67 1992 0.27
315515.56 2178768.67 1993 0.17
... ... ... ...

Assume that in the point-based spatiotemporal relation Drought_Point, the 48 weather stations have not changed their locations for the last 10 years and measured SPI values every year. The spatial and temporal parts of the 2nd-order Voronoi region-based relation of Drought_Point are shown in Table 2.

Continuously changing maps in MLPQ can be used to visualize the 2nd-order Voronoi diagrams. The user pushes the color animation button in the MLPQ GUI and inputs the following three parameters: the beginning time instance, the ending time instance, and the step size. Then, the color of each region of the map is animated according to its value at the specific time instance. Figure 3 shows the 2nd-order Voronoi diagram for the 48 weather stations in Nebraska at the snapshot when t = 1992 (Li 2003).
Constraint Databases and Moving Objects, Table 2 A 2nd-order Voronoi region-based database

Drought_Vo2_Space
(x1, y1), (x2, y2) | Boundary
(-9820.18, 1929867.40), (-42164.88, 1915035.54) | (-17122.48, 2203344.58), (3014.51, 2227674.50), (33051.50, 2227674.50), (33051.5, 2140801.51)
... | ...

Drought_Vo2_Time
(x1, y1), (x2, y2) | Year | avgSPI
(-9820.18, 1929867.4), (-42164.88, 1915035.54) | 1992 | -0.47
(-9820.18, 1929867.4), (-42164.88, 1915035.54) | 1993 | 0.71
... | ... | ...
(-507929.66, 2216998.17), (-247864.81, 1946777.44) | 2002 | -0.03
Constraint Databases and Moving Objects, Fig. 3 The 2nd-order Voronoi diagram for the 48 weather stations in Nebraska, which consists of 116 regions
Constraint Databases and Moving Objects, Table 3 A region-based spatiotemporal database with separate spatial and temporal relations

Nebraska_Corn_Space_Region
County | Boundary
1 | (-656160.3, 600676.8), (-652484.0, 643920.3), (-607691.1, 639747.6), (-608934.8, 615649.0), (-607875.6, 615485.8), (-610542.0, 576509.1), (-607662.7, 576138.5), (-611226.9, 537468.5), (-607807.7, 536762.1), (-608521.1, 527084.0), (-660885.4, 531441.2), (-661759.8, 532153.1)
... | ...

Nebraska_Corn_Time_Region
County | Year | Practice | Acres | Yield | Production
1 | 1947 | Irrigated | 2700 | 49 | 132300
1 | 1947 | Non-irrigated | 81670 | 18 | 1470060
1 | 1947 | Total | 84370 | 19 | 1602360
... | ... | ... | ... | ... | ...

Constraint Databases and Moving Objects, Fig. 4 A snapshot of continuously changing maps for county-based corn yield in Nebraska when t = 1998
... polygon needs to be represented in MLPQ. Although such county vector data in the US are usually available in ArcView shapefile format, a program can be implemented to convert ArcView shapefiles to MLPQ input text files. The conversion from MLPQ files to shapefiles can also be implemented. Figure 4 shows the snapshot of the color map animation when t = 1998 (Li 2003).

Future Directions

Cross-References

References

... the third national conference on digital government research, Boston
Revesz P (2002) Introduction to constraint databases. Springer, New York
Revesz P, Cai M (2002) Efficient querying of periodic spatiotemporal objects. Ann Math Artif Intell 36(4):437–457
Revesz P, Li Y (1997) MLPQ: a linear constraint database system with aggregate operators. In: Proceedings of the 1st international database engineering and applications symposium. IEEE Press, Washington, DC, pp 132–137
Saglio J-M, Moreira J (1999) Oporto: a realistic scenario generator for moving objects. In: Proceedings of the DEXA'99 workshop on spatio-temporal data models and languages (STDML), Florence, pp 426–432

Synonyms

Constraint Databases, Spatial, Fig. 1 A map of Lincoln, Nebraska
In the above the attribute variables x and y represent the longitude and latitude, respectively, as measured in units from the (0, 0) point in the above map. In general, any polygonal shape can be represented by first dividing it into a set of convex polygons and then representing each convex polygon with n sides by a conjunction of n linear inequality constraints. Note that the above town map is a concave polygon, but it can be divided along the line y = x + 8 into two convex polygons. The convex pentagon above the line y = x + 8 is represented by the first row of the constraint table, while the convex hexagon below the line y = x + 8 is represented by the second row of the constraint table. Within any row, the atomic constraints are connected by commas, which simply mean conjunction. While the atomic constraints can be given in any order in each row, it is customary to present them in an order that corresponds to a clockwise ordering of the sides of the convex polygon that they together represent.

Historical Background

Constraint databases, including spatial constraint databases, were proposed by Kanellakis et al. in 1990. A much-delayed journal version of their original conference paper appeared in Kanellakis et al. (1995). These papers considered a number of constraint database query languages and challenged researchers to investigate further their properties. Benedikt et al. (1998) showed that relational calculus queries of constraint databases when the constraint database contains polynomial
constraints over the reals cannot express even simple Datalog-expressible queries. On the other hand, Datalog queries with linear constraints can already express some computationally hard or even undecidable problems. Only in special cases, such as with gap-order constraints of the form x − y ≥ c, where x and y are integer or rational variables and c is a nonnegative constant, can an algorithm be given for evaluating Datalog queries (Revesz 1993).

The above results influenced researchers to implement several spatial constraint database systems with non-recursive query languages, usually some variation of non-recursive SQL and linear equality and linear inequality constraints. These systems include, in historical order, the MLPQ system (Brodsky et al. 1997), the CCUBE system (Brodsky et al. 1997), the DEDALE system (Grumbach et al. 1998), and the CQA/CDB system (Goldin et al. 2003). The MLPQ system implements both SQL and Datalog queries.

Constraint databases are reviewed in a number of books. Chapter 5.6 of Abiteboul et al. (1995), a standard reference in database theory, is a compact description of the main ideas of constraint databases. Kuper et al. (2000) is a collection of research articles devoted to constraint databases. It is a good introduction for already advanced researchers. Revesz (2002) is the standard textbook for the subject. It is used at many universities. Chapter 4 of Rigaux et al. (2002), which is an excellent source on all aspects of spatial databases, is devoted exclusively to constraint databases. Chapter 6 of Güting and Schneider (2005), which is a sourcebook on moving object databases, is also devoted exclusively to constraint databases.

Scientific Fundamentals

The semantics of the logical model of a spatial constraint database table is a relational database that contains all the rows that can be obtained by substituting values into the attribute variables of any constraint row such that the conjunction of constraints in that row is true. For example, it is easy to see that the semantics of the spatial constraint database table Lincoln is a relation that contains all (x, y) points that belong to the town map. Since there are an infinite number of such (x, y) points when x and y are real numbers, spatial constraint databases are also called finitely representable infinite relational databases.

Spatial constraint databases can represent not only areas but also boundaries by using linear equality constraints. Representing the boundary of an n-ary (concave or convex) polygonal area requires n rows in a constraint table. For example, the boundary of the town of Lincoln, Nebraska, can be represented as shown in Table 2.

Constraint Databases, Spatial, Table 2
Lincoln_Boundary
X Y
x y x ≥ 2, x ≤ 6, y = 18
x y x ≥ 6, x ≤ 8, y = −x + 24
x y x ≥ 8, x ≤ 12, y = 0.5x + 12
x y x ≥ 12, x ≤ 14, y = 18
x y y ≥ 8, y ≤ 18, x = 14
x y x ≥ 8, x ≤ 14, y = 8
x y x ≥ 6, x ≤ 8, y = −3x + 32
x y x ≥ 2, x ≤ 6, y = 14
x y y ≥ 14, y ≤ 18, x = 2

In the above each range constraint of the form a ≤ x ≤ b is an abbreviation of a ≤ x, x ≤ b, where x is any variable and a and b are constants.

Spatial constraint databases can be extended to higher dimensions. For example, a Z attribute for height or a T attribute for time can be added. As an example, suppose that Fig. 1 shows the map of Lincoln, Nebraska, in year 2000, and since then the town has expanded to the east continuously at the rate of one unit per year. Then the growing town area between years 2000 and 2007 can be represented as shown in Table 3.

Spatial constraint databases with polynomial constraints are also possible. With the increased complexity of constraints, more complex spatial and spatiotemporal objects can be represented. For example, suppose that an airplane flies over Lincoln, Nebraska. Its shadow can be represented as a spatial constraint database
relation Airplane_Shadow using polynomial constraints over the variables x, y, and t. (Here the time unit t will be measured in seconds and not years as in the Lincoln_Growing example.)

Spatial constraint databases can be queried by the same query languages that are used for relational databases. For example, the popular Structured Query Language (SQL) for relational databases is also applicable to spatial constraint databases. For example, the following query finds when the towns of Lincoln, Nebraska, and Omaha, Nebraska, will grow into each other.
SELECT Min(Lincoln_Growing.T)
FROM Lincoln_Growing, Omaha_Growing
WHERE Lincoln_Growing.X=Omaha_Growing.X AND
Lincoln_Growing.Y=Omaha_Growing.Y AND
Lincoln_Growing.T=Omaha_Growing.T
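Conceptually, this aggregate query minimizes T over the conjunction of the two constraint relations, which for linear constraints amounts to a small linear program. The sketch below illustrates that reading with scipy.optimize.linprog; the growth constraints for both towns are invented for illustration and are not the actual Lincoln_Growing and Omaha_Growing rows, nor is this how MLPQ itself evaluates the query.

# Variables: (x, y, t). Constraints are written as A_ub @ [x, y, t] <= b_ub.
# Hypothetical Lincoln_Growing: 8 <= y <= 18, x >= 2, x <= 14 + (t - 2000)
# Hypothetical Omaha_Growing:   8 <= y <= 18, x <= 40, x >= 23 - 0.5*(t - 2000)
from scipy.optimize import linprog

A_ub = [
    [0, -1, 0],     # -y <= -8
    [0, 1, 0],      #  y <= 18
    [-1, 0, 0],     # -x <= -2
    [1, 0, -1],     #  x - t <= 14 - 2000
    [1, 0, 0],      #  x <= 40
    [-1, 0, -0.5],  # -x - 0.5 t <= -23 - 0.5*2000
]
b_ub = [-8, 18, -2, 14 - 2000, 40, -23 - 0.5 * 2000]

# Minimize t (coefficient 1 on the third variable) over the joint constraints.
res = linprog(c=[0, 0, 1], A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 3)
print(round(res.x[2], 2))   # earliest year the two hypothetical towns touch (2006.0)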
Suppose that the airplane flies over Lincoln, Nebraska. The next query finds when the shadow of the airplane will leave the town.
SELECT Max(Airplane_Shadow.T)
FROM Lincoln, Airplane_Shadow
WHERE Lincoln.X = Airplane_Shadow.X AND
Lincoln.Y = Airplane_Shadow.Y
in a spatial constraint database like MLPQ, then it becomes easy to use the data for applications like estimating price and tax payments for particular houses, estimating the total taxes received by the town from the sale of houses in any subdivision of the town, etc.

hurricane. Endangered airplanes can be given a warning and rerouted if need be. Another application is checking whether the airspace surrounding an airport ever gets too crowded by the arriving and departing airplanes.
Constraint Programming
Integration of Spatial Constraint Databases

Context-Aware Role-Based Access Control

Contingency Management System
Emergency Evacuation Plan Maintenance

Continuity Matrix
Spatial Weights Matrix

Continuity Network
Conceptual Neighborhood

Continuous Location-Based Queries
Continuous Queries in Spatio-Temporal Databases

Continuous Queries
Indexing, Query and Velocity-Constrained
Queries in Spatiotemporal Databases, Time Parameterized

Continuous Queries in Spatio-Temporal Databases

Xiaopeng Xiong1, Mohamed F. Mokbel2, and Walid G. Aref1
1 Department of Computer Science, Purdue University, West Lafayette, IN, USA
2 Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA

Synonyms

Continuous location-based queries; Continuous query processing; Long-running spatiotemporal queries; Moving queries

Definition

A continuous query is a new query type that is issued once and is evaluated continuously in a database server until the query is explicitly terminated. The most important characteristic of continuous queries is that their query result does not only depend on the present data in the databases but also on continuously arriving data. During the execution of a continuous query, the query result is updated continuously when new data arrives. Continuous queries are essential to applications that are interested in transient and frequently updated objects and require monitoring query results continuously. Potential applications of continuous queries include but are not limited to real-time location-aware services, network flow monitoring, online data analysis, and sensor networks.

Continuous queries are particularly important in Spatiotemporal Databases. Continuous spatiotemporal queries are evaluated continuously against spatiotemporal objects, and their results are updated when interested objects change spatial locations or spatial extents over time. Figure 1 gives an example of a continuous query in a spatiotemporal database. In Fig. 1, o1 to o8 are objects moving in the data space and Q is a continuous spatiotemporal query that tracks moving objects within the shaded query region. As plotted in Fig. 1a, the query answer of Q with respect to time t1 consists of three objects: {o2, o3, o4}. Assume that at a later time t2, the objects change their locations as shown in Fig. 1b. In particular, o2 and o3 move out of the query region while o5 moves inside the query region. o4 also moves; however, it remains inside the query region. Due to the continuous evaluation of Q, the query answer of Q will be updated to {o4, o5} at time t2.

Historical Background

The study of continuous spatiotemporal queries started in the 1990s as an important part of the study of Spatiotemporal Databases. Since then, continuous spatiotemporal queries have received
Continuous Queries in Spatio-Temporal Databases, Fig. 1 An example of continuous query. (a) At time t1; (b) at time t2
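The incremental update between the two snapshots of Fig. 1 can be sketched as follows. This is a minimal illustration, assuming a rectangular query region and per-object location reports; the object identifiers and coordinates are invented.

# The answer of a continuous range query is patched, not recomputed, whenever
# an object reports a new position; only the changes are emitted.

query_region = (2.0, 2.0, 6.0, 6.0)          # (xmin, ymin, xmax, ymax)
answer = set()                               # current result of query Q

def inside(region, x, y):
    xmin, ymin, xmax, ymax = region
    return xmin <= x <= xmax and ymin <= y <= ymax

def on_location_update(obj_id, x, y):
    """Called for every incoming location report."""
    if inside(query_region, x, y):
        if obj_id not in answer:
            answer.add(obj_id)
            print(f"+ {obj_id} entered the query region")
    elif obj_id in answer:
        answer.remove(obj_id)
        print(f"- {obj_id} left the query region")

# At time t1: o2, o3, o4 are inside the region.
for oid, (x, y) in {"o2": (3, 3), "o3": (4, 5), "o4": (5, 4), "o5": (9, 9)}.items():
    on_location_update(oid, x, y)
# At time t2: o2 and o3 move out, o5 moves in, o4 stays inside.
for oid, (x, y) in {"o2": (8, 3), "o3": (1, 7), "o4": (4, 4), "o5": (5, 5)}.items():
    on_location_update(oid, x, y)
print(sorted(answer))                        # ['o4', 'o5']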
Continuous Queries in Spatio-Temporal Databases, Fig. 2 Continuous spatiotemporal query types based on mobility. (a) At time t1; (b) at time t2
static while the query region or query point of the continuous spatiotemporal query may change over time. This query type is abstracted in Fig. 2a. In Fig. 2a, the query Q moves along with time (e.g., at t1, t2 and t3) and the objects (represented by black dots) are stationary.

Examples:

interest (i.e., the buses and the taxis) continuously move.

Moving queries over moving objects. In this query category, both the query region/point of the continuous spatiotemporal query and the objects of interest are capable of moving. This query type is abstracted in Fig. 2c. As shown in Fig. 2c, the query Q and the objects are both moving over time (e.g., at t1, t2 and t3).
Continuous Queries in Spatio-Temporal Databases, Fig. 3 Continuous spatiotemporal query types based on the time of interest. (a) At time t1; (b) at time t2
when kids step out of the home backyard or when an unidentified flight is predicted to fly over a military base in 5 min.

Digital Battlefield
In the digital battlefield, continuous queries can help commanders make decisions by continuously monitoring the context of friendly units such as soldiers, tanks, and flights.

phenomena such as forest fires or polluted water domains. Sensors in a sensor network continuously detect environmental events and feed the data into spatiotemporal databases. Then, the properties of the environmental phenomena (e.g., the shape and the movement of the fire or the polluted water area) can be continuously monitored.
Cross-References

Queries in Spatiotemporal Databases, Time Parameterized
Spatiotemporal Database Modeling with an Extended Entity-Relationship Model

Continuously Changing Maps
Constraint Databases and Moving Objects
References

Kalashnikov DV, Prabhakar S, Hambrusch SE, Aref WA (2002) Efficient evaluation of continuous range queries on moving objects. In: DEXA '02: proceedings of the 13th international conference on database and expert systems applications. Springer, Heidelberg, pp 731–740
Kang JM, Mokbel MF, Shekhar S, Xia T, Zhang D (2007) Continuous evaluation of monochromatic and bichromatic reverse nearest neighbors. In: ICDE, Istanbul
Li JYY, Han J (2004) Continuous k-nearest neighbor search for moving objects. In: SSDBM '04: proceedings of the 16th international conference on scientific and statistical database management (SSDBM '04). IEEE Computer Society, Washington, DC, p 123
Mokbel MF, Aref WA, Hambrusch SE, Prabhakar S (2003) Towards scalable location-aware services: requirements and research issues. In: GIS, New Orleans
Mokbel MF, Xiong X, Aref WA (2004) SINA: scalable incremental processing of continuous queries in spatiotemporal databases. In: SIGMOD, Paris
Roddick JF, Hoel E, Egenhofer ME, Papadias D, Salzberg B (2004) Spatial, temporal and spatiotemporal databases - hot issues and directions for PhD research. SIGMOD Rec 33(2):126–131
Sellis T (1999) Chorochronos - research on spatiotemporal database systems. In: DEXA '99: proceedings of the 10th international workshop on database & expert systems applications. IEEE Computer Society, Washington, DC, p 452
Sellis TK (1999) Research issues in spatiotemporal database systems. In: SSD '99: proceedings of the 6th international symposium on advances in spatial databases. Springer, London, pp 5–11
Tao Y, Papadias D, Shen Q (2002) Continuous nearest neighbor search. In: VLDB, Hong Kong
Xia T, Zhang D (2006) Continuous reverse nearest neighbor monitoring. In: ICDE '06: proceedings of the 22nd international conference on data engineering (ICDE '06), Washington, DC, p 77
Xiong X, Mokbel MF, Aref WG (2005) SEA-CNN: scalable processing of continuous K-nearest neighbor queries in spatiotemporal databases. In: ICDE, Tokyo

Continuous Query Processing
Continuous Queries in Spatio-Temporal Databases

Contraflow for Evacuation Traffic Management

Brian Wolshon
Department of Civil and Environmental Engineering, Louisiana State University, Baton Rouge, LA, USA

Synonyms

All-lanes-out; Emergency preparedness; Evacuation planning; Merge designs; One-way-out evacuation; Reversible and convertible lanes; Split designs

Definition

Contraflow is a form of reversible traffic operation in which one or more travel lanes of a divided highway are used for the movement of traffic in the opposing direction. (The common definition of contraflow for evacuations has been broadened over the past several years by emergency management officials, the news media, and the public to include the reversal of flow on any roadway during an evacuation.) (American Association of State Highway and Transportation Officials 2001). It is a highly effective strategy because it can both immediately and significantly increase the directional capacity of a roadway without the time or cost required to plan, design, and construct additional lanes. Since 1999, contraflow has been widely applied to evacuate regions of the southeastern United States (US) when under threat from hurricanes. As a result of its recent demonstrated effectiveness during Hurricane Katrina (Wolshon 2006), it is also now looked upon as a potential preparedness measure for other mass-scale hazards.
Contraflow segments are most common and logical on freeways because they are the highest capacity roadways and are designed to facilitate high-speed operation. Contraflow is also more practical on freeways because these routes do not incorporate at-grade intersections that interrupt flow or permit unrestricted access into the reversed segment. Freeway contraflow can also be implemented and controlled with fewer manpower resources than unrestricted highways.

Nearly all of the contraflow strategies currently planned on US freeways have been designed for the reversal of all inbound lanes. This configuration, shown schematically in Inset 1d of Fig. 1, is commonly referred to as a One-Way-Out or All-Lanes-Out evacuation. Though not as popular, some contraflow plans also include options for the reversal of only one of the inbound lanes (Inset 1b) with another option to use one or more of the outbound shoulders (Inset 1c) (Wolshon 2001). Inbound lanes in these plans are maintained for entry into the threat area by emergency and service vehicles to provide assistance to evacuees in need along the contraflow segment.

Historical Background

Although evacuation-specific contraflow is a relatively recent development, its application for other types of traffic problems is not new (Wolshon and Lambert 2004). In fact, various forms of reversible traffic operation have been used throughout the world for decades to address many types of directionally unbalanced traffic conditions. They have been most common around major urban centers where commuter traffic is heavy in one direction while traffic is light in the other. Reverse and contraflow operations have also been popular for managing the infrequent, but periodic and predictable, directionally imbalanced traffic patterns associated with major events like concerts, sporting events, and other public gatherings. Reversible lanes have also been cost effective on bridges and in tunnels where additional directional capacity is needed, but where additional lanes cannot be easily added.

While the date of the first use of contraflow for an evacuation is not known with certainty, interest in its potential began to be explored after Hurricane Andrew struck Florida in 1992. By 1998, transportation and emergency management officials in both Florida and Georgia had plans in place to use contraflow on segments of Interstate freeways. Ultimately, the watershed event for evacuation contraflow in the United States was Hurricane Floyd in 1999. Since then, every coastal state threatened by hurricanes has developed and maintains plans for the use of evacuation contraflow.

Hurricane Floyd triggered the first two major implementations of contraflow, one on a segment of Interstate (I) 16 from Savannah to Dublin, Georgia, and the other on I-26 from Charleston to Columbia, South Carolina. The results of both of these applications were generally positive, although numerous areas for improvement were also identified. The contraflow application in South Carolina was particularly interesting because it was not pre-planned. Rather, it was implemented on an improvisational basis after a strong public outcry came from evacuees trapped for hours in congested lanes of westbound I-26 seeking ways to use the near-empty eastbound lanes.

The first post-Floyd contraflow implementations occurred in Alabama for the evacuation of Mobile and Louisiana for the evacuation of New Orleans. Once again, many lessons were learned and numerous improvements in both physical and operational aspects of the plans were suggested. The timing of these events was quite fortuitous for New Orleans. Within 3 months of the major changes that were implemented to the Louisiana contraflow plan after Hurricane Ivan, they were put into operation for Hurricane Katrina. The changes, so far the most aggressive and far-ranging of any developed until that time (Wolshon et al. 2006), involved the closure of lengthy segments of interstate freeway, forced traffic onto alternative routes, established contraflow segments across the state boundary into Mississippi, coordinated parallel non-freeway routes, and reconfigured several interchanges to more effectively load traffic from
Contraflow for Evacuation Traffic Management, Fig. 1 Freeway contraflow lane use configurations for evacuations (Wolshon 2001)
surface streets. The results of these changes were reflected in a clearance time for the city that was about half of the previous prediction (Wolshon and McArdle).

Scientific Fundamentals

Although the basic concept of contraflow is simple, it can be complex to implement and operate in actual practice. If not carefully designed and managed, contraflow segments also have the potential to be confusing to drivers. To insure safe operation, improper access and egress movements must be prohibited at all times during its operation. Segments must also be fully cleared of opposing traffic prior to initiating contraflow operations. These are not necessarily easy to accomplish, particularly in locations where segments are in excess of 100 miles and where interchanges are frequent. For these reasons some transportation officials regard them to be risky and only for use during daylight hours and under the most dire situations. They are also the reason why contraflow for evacuation has been planned nearly exclusively for freeways, where access and egress can be tightly controlled.

To now, contraflow evacuations have also been used only for hurricane hazards and wildfires and no other type of natural or manmade hazard. The first reason for this is that these two hazards affect much greater geographic areas and tend to be slower moving relative to other hazards. Because of their scope they also create the need to move larger numbers of people over greater distances than other types of hazards. The second reason is that contraflow requires considerable manpower and materiel resources as well as time to mobilize and implement. Experiences in Alabama and Louisiana showed that the positioning of traffic control devices and enforcement personnel takes at least 6 h, not including the time to plan and preposition equipment for the event. In Florida, where needs are great and manpower resources are stretched thin, evacuation contraflow requires involvement from the Florida National Guard. For this reason (among others), Florida officials require a minimum of 49 h of advanced mobilization time for contraflow to be implemented (Wolshon et al. 2005).

Operational Effects of Contraflow
As the goal of an evacuation is to move as many people as quickly out of the hazard threat zone as possible, the primary goal of contraflow is to increase the rate of flow and decrease the travel time from evacuation origins and destinations. Prior to field measurement, it was hypothesized
Contraflow for Evacuation Traffic Management, Fig. 2 Northbound traffic volume on I-55 at Fluker, Louisiana, by hour for the Hurricane Ivan evacuation (9/14 and 9/15, 2004, without contraflow) and the Hurricane Katrina evacuation (8/26 through 8/29, 2005, with contraflow), showing total northbound volume and northbound volume in the "normal" lanes (Data source: LA DOTD)
that the flow benefits of contraflow would be substantial, but less than that of an equivalent normally flowing lane (Wolshon 2001). These opinions were based on measurements of flow on I-26 during the Hurricane Floyd evacuation and the theory that drivers would drive at slower speeds and with larger spacing in contraflow lanes.

The highest flow rates measured by the South Carolina Department of Transportation (DOT) during the Floyd evacuation were between 1500 and 1600 vehicles per hour per lane (vphpl) (United States Army Corps of Engineers 2000). Traffic flows measured during the evacuations for Hurricanes Ivan and Katrina on I-55 in Louisiana were somewhat less than the South Carolina rates. Flows in the normal-flow lanes of I-55 averaged about 1230 vphpl during the peak 10 h of the evacuation. Flow rates in the contraflow lanes during the same period averaged about 820 vphpl. These volumes compare to daily peaks of about 400 vphpl during routine periods and a theoretical capacity of 1800–2000 vphpl for this segment.

The graph of Fig. 2 illustrates the hourly traffic flow on I-55 during the evacuations for Hurricanes Ivan (when contraflow was not used) and Katrina (when contraflow was used). During the 48 h period of the Ivan evacuation (shown on the left side of the graph) a total of 60,721 vehicles traveled northbound through this location. During the Katrina evacuation, the total volume was 84,660 vehicles during a corresponding 48 h period. It is also worthy to note that the duration of the peak portion of the evacuation (i.e., when the volumes were noticeably above the prior 3-week average) was about the same for both storms.

The data in Fig. 2 are also of interest because they are consistent with prior analytical models of evacuation that have estimated maximum evacuation flow on freeways with contraflow to be about 5000 vph. One of the difficulties in making full analyses of evacuation volume in general, and of contraflow volume in specific, has been a lack of speed data. Although the flow rates recorded during the two recent Louisiana hurricane evacuations are considerably below the theoretical capacity of this section of freeway, it cannot be determined with certainty if the conditions were congested with low operating speeds and small headways or relatively free flowing at more moderate levels of demand. It is also interesting to note that empirical observation of speed at a point toward the end of the segment did not appear to support the popular theory of elevated driver caution during contraflow. In fact, traffic enforcement personnel in Mississippi measured
speeds well in excess of posted speed limits as the initial group of drivers moved through the newly opened lanes.

Elements of Contraflow Segments
Reversible roadways have a number of physical and operational attributes that are common among all applications. The principal physical attributes are related to spatial characteristics of the design, including its overall length and number of lanes, as well as the configuration and length of the inbound and outbound transition areas. The primary operational attributes are associated with the way in which the segment will be used and include the temporal control of traffic movements. The temporal components of all reversible lane segments include the frequency and duration of a particular configuration and the time required to transition traffic from one direction to another. The duration of peak-period commuter reversible applications, for example, typically lasts about 2 h (not including set-up, removal, and transition time) with a twice-daily frequency. Evacuation contraflow, however, may only be implemented once in several years, and its duration of operation may last several days.

Like all reversible flow roadways, contraflow lanes need to achieve and maintain full utilization to be effective. Although this sounds like an obvious fact, it can be challenging to achieve in practice. The most common reason for underutilization has been inadequate transitions into and out of the contraflow segment. Contraflow requires a transition section at the inflow and outflow ends to allow drivers to maneuver into and out of the reversible lanes from the unidirectional lanes on the approach roadways leading into it. Since these termini regulate the ingress and egress of traffic entering and exiting the segment and they are locations of concentrated lane changing as drivers weave and merge into the desired lane of travel, they effectively dictate the capacity of the entire segment.

Through field observation and simulation studies (Theodoulou 2003; Williams et al. 2007) it has been shown that contraflow entry points with inadequate inflow transitions result in traffic congestion and delay prior to the contraflow segment and prohibit the segment from carrying capacity-level demand. This was illustrated by the I-10 contraflow segment in New Orleans during the Hurricane Ivan evacuation. At that time, evacuating vehicles in the left and center outbound lanes of I-10 were transitioned across the median and into the contraflow lanes using a paved crossover. However, the combination of the crossover design, temporary traffic control devices, presence of enforcement personnel, and weaving vehicles created a flow bottleneck that restricted inflow into the contraflow lanes. This caused two problems. First, it limited the number of vehicles that could enter the contraflow lanes, limiting flow beyond the entry point significantly below its vehicle-carrying capability. The other was that it caused traffic queues upstream of the crossover that extended back for distances in excess of 14 miles. This plan was significantly improved prior to the Katrina evacuation 1 year later by permitting vehicles to enter the contraflow lanes at multiple points, spatially spreading the demand over a longer distance and reducing the length and duration of the congested conditions (Wolshon et al. 2006).

Inadequate designs at the downstream end of contraflow segments can also greatly limit their effectiveness. Prior experience and simulation modeling (Lim 2003) have shown that an inability to move traffic from contraflow lanes back into normally flowing lanes will result in congestion backing up from the termination transition point in the contraflow lanes. Under demand conditions associated with evacuations, queue formation can occur quite rapidly and extend upstream for many miles within hours. To limit the potential for such scenarios, configurations that require merging of the normal and contraflowing lanes are discouraged, particularly if they also incorporate lane drops. Two popular methods that are used to terminate contraflow include routing the two traffic streams at the termination on to separate routes and reducing the level of outflow demand at the termination by including egress points along the intermediate segment. Several of the more common configurations are discussed in the following section.
Contraflow Plans and Designs
The primary physical characteristics of contraflow segments are the number of lanes and the length. A 2003 study (Urbina and Wolshon 2003) of hurricane evacuation plans revealed that 18 controlled-access evacuation contraflow segments and three additional arterial reversible roadway segments have been planned for use in the US. Currently, all of the contraflow segments are planned for a full One-Way-Out operation. The shortest of the contraflow freeway segments was the I-10 segment out of New Orleans at about 25 miles long. The longest were two 180-mile segments of I-10 in Florida, one eastbound from Pensacola to Tallahassee and the other westbound from Jacksonville to Tallahassee. Most of the others were between 85 and 120 miles.

In the earliest versions of contraflow, nearly all of the planned segments that were identified in the study were initiated via median crossovers. Now that single-point loading strategies have been shown to be less effective, many locations are changing to multi-point loading. Most popular of these are median crossovers, with supplemental loading via nearby reversed interchange ramps.

The termination configurations for the reviewed contraflow segments were broadly classified into one of two groups. The first were split designs, in which traffic in the normal and contraflowing lanes was routed onto separate roadways at the terminus. The second group were the merge designs, in which the separate lane groups are reunited into the normal-flow lanes using various geometric and control schemes.

The selection of one or the other of these termination configurations at a particular location by an agency has been a function of several factors, most importantly the level of traffic volume and the configuration and availability of routing options at the end of the segment. In general, split designs offer the higher level of operational efficiency of the two designs. The obvious benefit of a split is that it reduces the potential for bottleneck congestion resulting from merging four lanes into two. Its most significant drawback is that it requires one of the two lane groups to exit to a different route, thereby eliminating route options at the end of the segment. In some older designs, the contraflow traffic stream was planned to be routed onto an intersecting arterial roadway. One of the needs for this type of split design is adequate capacity on the receiving roadway.

Merge termination designs also have pros and cons. Not surprisingly, however, these costs and benefits are nearly the exact opposite of split designs in their end effect. For example, most merge designs preserve routing options for evacuees because they do not force vehicles on to adjacent roadways and exits. Unfortunately, the negative side to this is that they also have a greater potential to cause congestion since they merge traffic into a lesser number of lanes. At first glance it would appear illogical to merge two high-volume roadways into one. However, in most locations where they are planned, exit opportunities along the intermediate segment will be maintained to decrease the volumes at the end of the segment.

Key Applications

The list of applications for contraflow continues to grow as transportation and emergency preparedness agencies recognize its benefits. As a result, the number of locations that are contemplating contraflow for evacuations is not known. However, a comprehensive study of contraflow plans (Urbina and Wolshon 2003) in 2003 included 21 reverse flow and contraflow sections. The locations and distances of these locations are detailed in Table 1.

Future Directions

As experiences with contraflow increase and its effectiveness becomes more widely recognized, it is likely that contraflow will be accepted as a standard component of emergency preparedness planning and its usage will grow. Several recent high-profile negative evacuation experiences have prompted more states to add contraflow options to their response plans. The most notable of
Contraflow for Evacuation Traffic Management, Table 1 Planned contraflow/reverse flow evacuation routes (Urbina and Wolshon 2003)

State | Route(s) | Approx. distance (miles) | Origin location | Termination location
New Jersey | NJ-47/NJ-347 (a); Atlantic City Expressway; NJ-72/NJ-70 (a); NJ-35 (a); NJ-138/I-195 | 19; 44; 29.5; 3.5; 26 | Dennis Twp; Atlantic City; Ship Bottom Boro; Mantoloking Boro; Wall Twp | Maurice River Twp; Washington Twp; Southampton; Pt. Pleasant Beach; Upper Freehold
Maryland | MD-90 | 11 | Ocean City | US 50
Virginia | I-64 (a) | 80 | Hampton Roads Bridge | Richmond
North Carolina | I-40 | 90 | Wilmington | Benson (I-95)
South Carolina | I-26 | 95 | Charleston | Columbia
Georgia | I-16 | 120 | Savannah | Dublin
Florida | I-10 Westbound; I-10 Eastbound; SR 528 (Beeline); I-4 Eastbound; I-75 Northbound; FL Turnpike; I-75 (Alligator Alley) | 180; 180; 20; 110; 85; 75; 100 | Jacksonville; Pensacola; SR 520; Tampa; Charlotte County; Ft. Pierce; Coast | Tallahassee; Tallahassee; SR 417; Orange County; I-275; Orlando; Coast
Alabama | I-65 | 135 | Mobile | Montgomery
Louisiana | I-10 Westbound; I-10/I-59 (east/north) | 25; 115 (a) | New Orleans; New Orleans | I-55; Hattiesburg (a)
Texas | I-37 | 90 | Corpus Christi | San Antonio

Notes: (a) Delaware and Virginia contraflow plans are still under development. The actual length of the New Orleans, LA, to Hattiesburg, MS, contraflow segment will vary based on storm conditions and traffic demand. Since they are undivided highways, operations on NJ-47/NJ-347, NJ-72/NJ-70, and NJ-35 are reverse flow rather than contraflow
these was in Houston, where scores of evacuees (including 23 in a single tragic incident) reportedly perished during the highly criticized evacuation for Hurricane Rita in 2005 (Senior Citizens From Houston Die When Bus Catches Fire 2005). Plans for contraflow are currently under development and should be ready for implementation by the 2007 storm season. Contraflow is also being evaluated for use in some of the larger coastal cities of northeast Australia.

In other locations where hurricanes are not a likely threat, contraflow is also being studied. Some of these examples include wildfires in the western United States (Wolshon and Marchive 2007) and tsunamis and volcanoes in New Zealand. Greater emphasis on terrorism response has also resulted in cities with few natural hazards beginning to examine contraflow for various accidental and purposeful manmade hazards (Sorensen and Vogt 2006).

It is also expected that as contraflow gains in popularity, the application of other developing technologies will be integrated into this strategy. Such has already been the case in South Carolina, Florida, and Louisiana, where various intelligent transportation systems (ITS) and other remote sensing technologies have been applied to monitor the state and progression of traffic on contraflow sections during an evacuation. In Washington DC, where reversible flow has been evaluated for use on primary arterial roadways during emergencies, advanced control systems for modifying traffic signal timings have also been studied (Chen et al.).

Cross-References

Contraflow in Transportation Network
Dynamic Travel Time Maps
Contraflow in Transportation Network, Fig. 2 Hurricane Rita evacuation required contraflow on Interstate Highway 45. Notice that traffic on both sides of I-45 is going north (Source: dallasnews.com)
with traffic immobile on 290, the plan was dropped, stranding many and prompting others to reverse course. "We need that route so resources can still get into the city," explained an agency spokeswoman.

Scientific Fundamentals

Why is Planning Contraflow Difficult?: Figuring out an optimal contraflow network configuration is very challenging due to the combinatorial nature of the problem. Figure 3 shows examples of contraflow network configurations. Suppose that people (e.g., evacuees) in a source node S want to escape to destination node D on the network. Figure 3a is a road network with all edges in two-way directions. In other words, no edge is reversed in the network. Figure 3b is an example of a so-called infeasible contraflow configuration because no evacuee can reach destination node D due to the ill-flipped road segments. The network in Fig. 3c allows only two types of flippings (i.e., ↑, ↓). A network in Fig. 3d allows three types of flippings (i.e., ↑, ↓, ↕).

Each network used in these examples has 17 edges. If two types of flippings are allowed as shown in Fig. 3c, the number of possible network configurations is 2^17, that is, 131,072. Among them, 89,032 configurations are feasible. An experiment was conducted by assigning some number of evacuees on node S and travel time/capacity attributes on edges. If evacuation time is measured for all feasible configurations (89,023), only 346 configurations have minimum (i.e., optimal) evacuation time, which corresponds to 0.26 % of the total possible configurations. For the same network with three types of flippings as shown in Fig. 3d, the number of possible networks is 3^17, which is more than 100 million. It is impossible to handle such an exponentially large number of configurations even with the most advanced computing system. These examples with such a small-size network show why it is difficult to find an optimal contraflow network. The problem is classified as an NP-hard problem in the computer science domain.

Modeling Contraflow using Graph: It is often necessary to model a contraflow problem using a mathematical graph. S. Kim et al. (Kim and Shekhar 2005) presented a modeling approach for the contraflow problem based on graph and flow network. Figure 4 shows a simple evacuation situation on a transportation network. Suppose that each node represents a city with initial occupancy and its capacity, as shown in Fig. 4a. City A has 40 people and also capacity 40. Nodes A and C are modeled as source nodes, while node E is modeled as a destination node (e.g., shelter). Each edge represents a road between two cities with travel time and its capacity. For example, a highway segment between cities A and B has travel time 1 and capacity 3. If a time unit is 5 min, it takes 5 min for evacuees to travel from A to B and a maximum of 3 evacuees can simultaneously travel through the edge. Nodes B and D have no initial occupancy and only serve as transshipment nodes.
Contraflow in Transportation Network, Fig. 3 Examples of infeasible contraflow network and 2 or 3 types of flippings. (a) All two-way; (b) infeasible configuration; (c) two types of flippings; (d) three types of flippings
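The combinatorial blow-up described above can be reproduced on a toy network. The sketch below enumerates every orientation of a small hypothetical 6-edge graph and counts the configurations in which the destination remains reachable; with the 17-edge example of Fig. 3 the same loop would already face 2^17 (or 3^17) configurations.

# Brute-force enumeration of edge-flipping configurations on a tiny
# hypothetical network; only configurations in which D stays reachable
# from S are feasible.
from itertools import product

edges = [("S", "A"), ("S", "B"), ("A", "B"), ("A", "D"), ("B", "D"), ("A", "C")]

def reachable(directed_edges, src, dst):
    adj = {}
    for u, v in directed_edges:
        adj.setdefault(u, []).append(v)
    stack, seen = [src], {src}
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

feasible = 0
for choice in product((0, 1), repeat=len(edges)):      # two flipping types
    oriented = [(u, v) if c == 0 else (v, u) for (u, v), c in zip(edges, choice)]
    feasible += reachable(oriented, "S", "D")

print(2 ** len(edges), feasible)   # all configurations vs. feasible ones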
Contraflow in Transportation Network, Fig. 4 Graph representation of a simple evacuation situation and two following contraflow configuration candidates. Nodes are labeled {initial occupancy, capacity} and edges (travel time, capacity)
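One way to obtain an evacuation time such as the one reported below for Fig. 4a is to unroll the network into a time-expanded graph and search for the smallest horizon at which a maximum flow can deliver every evacuee to the destination. The sketch below follows that idea on a tiny hypothetical network (the occupancies, travel times, and capacities are not the Fig. 4 data), using networkx for the flow computation.

# Minimum evacuation time via a time-expanded flow network (hypothetical data).
import networkx as nx

occupancy = {"A": 10}                       # evacuees waiting at each source
roads = [("A", "B", 1, 3), ("B", "E", 1, 2), ("A", "E", 3, 1)]  # (u, v, time, cap)
destination, total = "E", sum(occupancy.values())

def can_clear_by(T):
    G = nx.DiGraph()
    for v in {"A", "B", "E"}:
        for t in range(T):                  # holdover arcs: wait in place
            G.add_edge((v, t), (v, t + 1), capacity=total)
    for u, v, tt, cap in roads:             # travel arcs repeated over time
        for t in range(T - tt + 1):
            G.add_edge((u, t), (v, t + tt), capacity=cap)
    for s, occ in occupancy.items():        # super source / super sink
        G.add_edge("src", (s, 0), capacity=occ)
    for t in range(T + 1):
        G.add_edge((destination, t), "snk", capacity=total)
    value, _ = nx.maximum_flow(G, "src", "snk")
    return value >= total

T = 0
while not can_clear_by(T):
    T += 1
print(T)   # smallest number of time units needed to evacuate everyone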
The evacuation time of the original network in Fig. 4a is 22, which can be measured using a minimum cost flow algorithm.

Figure 4b and c illustrate two possible contraflow configurations based on the original graph. All the two-way edges used in the original configuration are merged by capacity and directed in favor of increasing outbound evacuation capacity. There are two candidate configurations that differ in the direction of the edges between nodes B and D. If the evacuation times of both configurations are measured, the configuration in Fig. 4b has evacuation time 11, while the configuration in Fig. 4c has evacuation time 14. Both configurations not only reduce but also differ in evacuation time. Even though the time difference is just 3 in this example, the difference may be significantly different in the case of a complicated real network. This example illustrates the importance of choice among possible network configurations. In addition, there are critical edges affecting the evacuation time, such as edge (B, D) in Fig. 4.

Solutions for Contraflow Planning: S. Kim et al. (Kim and Shekhar 2005) presented heuristic approaches to find a sub-optimal contraflow network configuration from a given network. Their approaches used the congestion status of a road network to select the most effective target
Contraflow in Transportation Network, Fig. 5 Day-time population distribution in the Twin Cities, Minnesota (population classes from 0–1,000 up to about 40,000; highways shown)
road segments. The experimental results showed that reversing less than 20 % of all road segments was enough to reduce evacuation time by more than 40 %. Tuydes and Ziliaskopoulos (2004) proposed a mesoscopic contraflow network model based on a dynamic traffic assignment method. They formulated capacity reversibility using a mathematical programming method. Theodoulou and Wolshon (2004) used CORSIM microscopic traffic simulation to model the freeway contraflow evacuation around New Orleans. With the help of a micro-scale traffic simulator, they were able to suggest alternative contraflow configurations at the level of entry and termination points.

Datasets for Contraflow Planning: When emergency managers plan contraflow schemes, the following datasets may be considered. First, population distribution is important to predict congested road segments and to prepare resources accordingly. Figure 5 shows a day-time population distribution in the Twin Cities, Minnesota. The dataset is based on Census 2000
Contraflow in Transportation Network, Fig. 6 Monticello nuclear power plant located near the Twin Cities, Minnesota
Contraflow in Transportation Network, Fig. 7 A possible contraflow scheme for the Monticello nuclear power plant scenario
Contraflow in Transportation Network, Table 2 Different types of disasters present different types of evacuation properties (Source: Litman 2006)

Type of disaster | Geographic scale | Warning | Contraflow before | Contraflow after
Hurricane | Very large | Days | Yes | Yes
Flooding | Large | Days | Yes | Yes
Earthquake | Large | None | – | Yes
Tsunami | Very large | Short | – | Yes
Radiation/toxic release | Small to large | Sometimes | – | Yes
of disasters and their properties. According to Litman (2006), evacuation route plans should take into account the geographic scale and the length of warning. Contraflow preparedness is most appropriate for disasters with large geographic scale and long warning time, which gives responders time to dispatch resources and establish reversed lanes. Thus, hurricane and flooding are the most appropriate candidates for applying contraflow plans before a disaster. Other types of disasters with relatively short warning time may consider contraflow only after the disaster, to resolve traffic congestion for back-home traffic.

has proven to provide substantial savings in travel time. Reversible lanes are also commonly found in tunnels and on bridges. The Golden Gate Bridge in San Francisco has 2 reversible lanes. Figure 8 shows a controlled access of a reversible lane system.

Others: Contraflow programs are used for events with high-density population such as football games, concerts, and fireworks on the Fourth of July. Highway construction sometimes requires contraflow. Figure 9 shows an example of contraflow use for highway construction.
Contraflow in Transportation Network, Fig. 8 Controlled access of an automated reversible lane system in Adelaide (Source: wikipedia.org)

Contraflow in Transportation Network, Fig. 9 Use of contraflow for highway construction on I-10 in Arizona (Source: map.google.com)
Convergence of GIS and CAD
Computer Environments for GIS and CAD

Correlated Walk
CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents
its negative counterpart). The presence of spatial autocorrelation or dependence means that a certain amount of information is shared and duplicated among neighboring locations, and thus, an entire data set possesses a certain amount of redundant information. This feature violates the assumption of independent observations upon which many standard statistical treatments are predicated. This entry revolves around what happens to the nature and statistical significance of correlation coefficients (e.g., Pearson's r) when spatial autocorrelation is present in both or either of the two variables under investigation.

Historical Background

A lack of independence results in reduced degrees of freedom or effective sample size; the greater the level of spatial autocorrelation, the smaller the number of degrees of freedom or effective sample size. This means that any type of statistical test based on an original sample size could be flawed in the presence of spatial autocorrelation, thus heightening the probability of committing a Type I error. Suppose that n different map patterns are generated from n observations. Because the n different map patterns are identical in terms of sample mean and variance, any statistical inferences based on these values are identical. However, all of the map patterns possess different degrees of freedom or effective sample sizes, and thus n different statistical estimations should be obtained.

This type of problem occurs in situations dealing with the correlation between two variables, which has long been known (Bivand 1980; Griffith 1980; Haining 1980; Richardson and Hémon 1981). The presence of spatial autocorrelation in both or either of two variables under investigation (i.e., bivariate spatial dependence) means that when the nature of a bivariate association at a location is known, one can guess the nature of bivariate associations at nearby locations. For example, if a location has a pair of higher-than-average values for two variables, there is a more-than-random chance to observe similar pairs in nearby locations. This feature again violates the assumption of independent observations and reduces the number of degrees of freedom or effective sample size. In this context, standard inferential tests tend to underestimate the true sampling variance of the Pearson's correlation coefficient when positive spatial autocorrelation is present in two variables under investigation, resulting in a heightened chance of committing a Type I error. One can generate n different pairs of spatial patterns from the original variables; all of the pairs are identical in terms of Pearson's correlation coefficient, but they are different in terms of the number of degrees of freedom or effective sample size (Clifford and Richardson 1985; Clifford et al. 1989; Haining 1991; Dutilleul 1993). These notions can extend to situations dealing with a pair of regression residuals (Tiefelsdorf 2001).

Two different approaches exist addressing the problem of spatial autocorrelation in bivariate correlation. One is to seek to remedy the problem by providing modified hypothesis testing procedures taking the degree of spatial autocorrelation into account (for a comprehensive review and discussion, see Griffith and Paelinck 2011). The other is to develop bivariate spatial autocorrelation statistics to capture the degree of spatial co-patterning between two map patterns and, further, to propose some techniques for exploratory spatial data analysis (ESDA) that allow the detecting of bivariate spatial clusters (among others, Lee 2001; Anselin et al. 2002; Lee 2012).

Scientific Fundamentals

For this section, I seek to conceptualize and illustrate the concept of bivariate spatial dependence, with which the problems of correlation in the presence of spatial autocorrelation are better captured and tackled. For simplicity, subsequent discussions about spatial autocorrelation tend to refer to its positive component.

Nearly all studies about spatial autocorrelation focus on univariate cases, i.e., on the similarity/dissimilarity in nearby locations in a single map pattern in terms of their values. However, correlation could be a legitimate statistical
concept endemic to bivariate situations. A correlation coefficient should gauge the nature (direction and magnitude) of the relationship between two variables under investigation. Interestingly, spatial autocorrelation can be viewed as a particular case of correlation, although only a single variable is involved, which is why it is known as autocorrelation. Because any type of correlation should entail two vectors, another vector should be spatially derived for spatial autocorrelation to be a type of correlation. One of the most commonly used concepts for this case is a spatial lag vector, each element of which represents a weighted mean of a location's neighbors. In this sense, spatial autocorrelation could be rephrased as the correlation between one variable and its spatial lag vector (Lee 2001).

But what kinds of issues can arise when we combine the two concepts, correlation and spatial autocorrelation? This question might be better captured by a rather new concept known as bivariate spatial dependence, which is a simple extension of the general concept of spatial dependence, and can be defined as a particular relationship between the spatial proximity among observational units and the numeric similarity of their bivariate associations (Lee 2001, 2012). In a bivariate situation, each observational unit contains a pair of values, and the nature of the bivariate association is assumed to be conceptually defined and numerically evaluated. If the distribution of bivariate associations is not spatially random, then we might legitimately state that bivariate spatial dependence exists.

Before attempting to illustrate the concept of bivariate spatial dependence, we begin with univariate spatial dependence. Any local set composed of a reference observational unit and its neighbors takes on one of the following four types of univariate spatial association:

H∼H̃   H∼L̃   L∼H̃   L∼L̃      (1)

Here, H denotes a value at a reference unit that is greater than or equal to a threshold value (usually the average) or a positive z-score (original values having the mean subtracted and then divided by the standard deviation), and L denotes the opposite. H̃ denotes a spatial lag that is greater than or equal to the global average, and L̃ denotes the opposite. The symbol ∼ denotes a univariate horizontal relationship. The symbol ˜ is introduced here to make a clear distinction between an original value at a location and a derived value from a set of locations. This conceptualization is the basis for the Moran's I statistic.

If another concept (i.e., the spatial moving average) is introduced, the situation changes substantively. Unlike the spatial lag, this concept treats the reference unit itself as one of its neighbors. Consider

H̃   L̃      (2)

Here, H̃ and L̃ denote the spatial moving averages at each location. This conceptualization forms the foundation for the Getis-Ord's Gi statistic. The four types of univariate spatial association listed in (1) reduce to the two values in (2); H∼H̃ and L∼L̃ respectively are linked to H̃ and L̃, but H∼L̃ and L∼H̃ can point either way, depending on the differences in values and/or spatial weights. These two values can be conceptualized as two different types of univariate spatial clusters (Lee and Cho 2013). This distinction between spatial association types and spatial cluster types is critical because it can represent the two contrasting perspectives of spatial modeling and spatial exploration. This distinction plays a pivotal role in addressing various issues about multivariate spatial dependence, a particular case of which is bivariate spatial dependence.

We now move to bivariate situations in which two variables, denoted by X and Y, are under investigation. Each observational unit should take on one of the following four types of bivariate association (Lee 2012):

H|H   H|L   L|H   L|L      (3)

In this work, the symbol | denotes a bivariate vertical relationship at a location. Pearson's correlation coefficient is predicated upon this conceptualization and is aspatial in nature in the
sense that it does not consider the spatial distribution of the pair-wise local bivariate associations. Suppose that a location has only one neighbor at which the four different types of bivariate association are possible, resulting in the following 16 different types of bivariate spatial association:

(H|H)∼(H|H)   (H|H)∼(H|L)   (H|H)∼(L|H)   (H|H)∼(L|L)
(H|L)∼(H|H)   (H|L)∼(H|L)   (H|L)∼(L|H)   (H|L)∼(L|L)      (4)
(L|H)∼(H|H)   (L|H)∼(H|L)   (L|H)∼(L|H)   (L|H)∼(L|L)
(L|L)∼(H|H)   (L|L)∼(H|L)   (L|L)∼(L|H)   (L|L)∼(L|L)

ists of identifying those showing typical positive bivariate spatial dependence.

We might be able to simplify the situation by applying the notion of spatial lag as seen in (1). Because each variable has four different types of univariate spatial association at a location, we always have only 16 different types of bivariate spatial association (Lee 2012; Lee and Cho 2013), no matter how many neighbors are involved. Consider

(H∼H̃)|(H∼H̃)   (H∼H̃)|(H∼L̃)   (H∼H̃)|(L∼H̃)   (H∼H̃)|(L∼L̃)
(H∼L̃)|(H∼H̃)   (H∼L̃)|(H∼L̃)   (H∼L̃)|(L∼H̃)   (H∼L̃)|(L∼L̃)      (5)
(L∼H̃)|(H∼H̃)   (L∼H̃)|(H∼L̃)   (L∼H̃)|(L∼H̃)   (L∼H̃)|(L∼L̃)
(L∼L̃)|(H∼H̃)   (L∼L̃)|(H∼L̃)   (L∼L̃)|(L∼H̃)   (L∼L̃)|(L∼L̃)
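In practice, the 16 types in (5) can be assigned by marking each variable H or L at the location itself and at its spatial lag. The sketch below is a minimal illustration; the six observations and the binary contiguity matrix are invented, and the thresholding simply uses each variable's mean.

# Assigning bivariate spatial association types: for each location, mark X
# and Y as H or L both at the location and at its (row-standardised) lag.
import numpy as np

x = np.array([8.0, 7.5, 6.0, 2.0, 1.5, 2.5])
y = np.array([9.0, 8.0, 7.0, 3.0, 2.0, 2.5])
W = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
W = W / W.sum(axis=1, keepdims=True)         # row-standardised weights

def mark(values, threshold):
    return np.where(values >= threshold, "H", "L")

lag_x, lag_y = W @ x, W @ y                   # spatial lag vectors
types = zip(mark(x, x.mean()), mark(lag_x, x.mean()),
            mark(y, y.mean()), mark(lag_y, y.mean()))
for i, (a, b, c, d) in enumerate(types):
    print(f"location {i}: X: {a}~{b}  |  Y: {c}~{d}")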
cases are expected to be more observable than the anti-diagonal cases. If the first and last cases prevail for a local set, a positive bivariate spatial dependence can be said to exist; if the second and third cases prevail, a negative bivariate spatial dependence can be said to exist. Second, with a decent level of negative Pearson's r, the anti-diagonal cases are expected to be more observable than the main diagonal counterparts. If the first and last cases prevail for a local set, a positive bivariate spatial dependence can be said to exist; if the second and third cases prevail, a negative bivariate spatial dependence can be said to exist. In an overall sense, if no bivariate spatial autocorrelation exists, the 16 different types of bivariate spatial association (occurrences of which are subordinate to the nature of the global aspatial correlation) must be randomly distributed; otherwise, they should show a certain degree of spatial clustering.

These situations are further simplified by incorporating the notion of the spatial moving average. Because the four different types of univariate spatial association defined in (1) reduce to the two different values seen in (2), the 16 different types of bivariate spatial association defined in (5) can reduce to the following four:

H̃|H̃   H̃|L̃   L̃|H̃   L̃|L̃      (6)

These classifications can be referred to as four different types of bivariate spatial clusters (Lee and Cho 2013). The cases in the four corners in (5) represent typical examples of the four types; the others are classified into one of the four cases, depending on differences in values and/or spatial weights.

Key Applications

For this section, I focus on two strands of endeavors that have been undertaken in this particular field: one is to develop a means to remedy the problem of correlation in the presence of bivariate spatial dependence; the other is to devise bivariate spatial autocorrelation statistics for the bivariate counterparts of Moran's I and Getis-Ord's Gi statistics.

The test statistic for Pearson's r is given by

$t = r\sqrt{n-2}\,/\sqrt{1-r^{2}}$      (7)

with n − 2 degrees of freedom when the following two assumptions are satisfied: pairs of observations are drawn from the same, approximately bivariate normal, distribution with constant expectation and finite variance (Haining 1991), and observations of each variable are mutually independent. This standard hypothesis testing procedure for the correlation coefficient might not hold for spatial data. The first assumption of a constant mean structure cannot be assumed because of the potential presence of a global trend. More importantly, the second assumption cannot be sustained because of the usual presence of univariate spatial autocorrelation for both or either of the variables under investigation, which alludes to bivariate spatial dependence.

The standard error of Pearson's r, which is also a part of (7), is given by

$\hat{\sigma}_r = \sqrt{\dfrac{1-r^{2}}{n-2}},$      (8)

where the denominator is associated with the number of degrees of freedom. This standard error should be adjusted according to the degree of spatial autocorrelation in the variables; it should be larger when positive spatial autocorrelation prevails (and vice versa for negative spatial autocorrelation) (Haining 1991). This outcome can be shown in (8); the lack of independence among pairs of observations due to positive bivariate spatial dependence reduces the number of degrees of freedom or effective sample size, thus making the standard error larger.

Several approaches have been proposed in order to remedy or at least alleviate the problem of underestimation of the true sampling variance that the standard inferential test commits (Clifford and Richardson 1985; Dutilleul 1993). In this entry, we focus solely
Correlation and Spatial Autocorrelation 365
P P
n i j wij .xi x/
N yj yN 1X
CM D P P q qP D · ·Q ; and
wij P n i X i Yi
i j .xii N 2
x/ i .yi N 2
y/
P h P P i
n i j wij xj xN j wij yj yN 1X
L D qP qP D ·Q ·Q : (13)
P P 2 n i X i Yi
i j wij i .xi N 2
x/ i .yi N 2
y/
Here, wij and wij are elements from a zero spatial modeling perspective, whereas Lee s
diagonal and nonzero diagonal spatial weights statistic is more strongly associated with the
matrix, respectively. The former statistic is one spatial exploration perspective. For example,
derived from a multivariate spatial correlation many situations might exist in which one should
matrix proposed by Wartenberg (1985) and is a postulate that a dependent variable at a given
simple extension of univariate Moran s I , thus set of locations is in uenced by independent
gauging the correlation between one variable at variables in the neighboring locations. However,
original locations and the other variable at the if the main interest lies in measuring the spatial
neighboring locations (a spatial lag vector). In similarity between the two map patterns, and
contrast, the latter, which was proposed by Lee exploring and detecting possible bivariate spatial
(2001, 2004, 2009), is de ned as the correlation clusters, L might be the better option. In
between one variable and the other variable s addition, L is much more congruent with what
spatial moving average vectors. In comparison, is documented in (6). The higher the Pearson s
cross-Moran is more congruent with the con- aspatial correlation coef cient, and at the same
cept of cross-correlation, whereas Lee s L deals time the higher the level of spatial clustering of
more directly with the concept of co-patterning bivariate association, the higher the L statistic.
by considering not only bivariate association at Certain exploratory spatial data analysis (ESDA)
the original locations but also their spatial associ- techniques using Lee s local Li (see Eq. 14)
ation with neighboring locations. can be developed like ones using cross-Moran
In examining the different advantages and (Anselin et al. 2002), which is beyond the
weaknesses, one can conclude that the bivariate scope of this entry (see Lee 2012; Lee and Cho
Moran s statistic is more congruent with the 2013):
P P
n2 jwij xj xN j wij yj yN
Li D qP qP D ·Q Xi ·Q Yi (14)
P P 2
i j wij i .xi N 2
x/ i .yi N 2
y/
The distributional properties for all bivariate information in terms of bivariate association,
spatial autocorrelation statistics have been estab- thus violating the assumption of independent
lished with the randomization assumption (Lee sampling, and the shared information spuriously
2004, 2009), which might be crucial to develop strengthens (or weakens) the nature of correlation
certain kinds of ESDA techniques, such as bivari- between two variables under investigation,
ate cluster maps. making any conventional statistical inferences
or judgments considerably questionable.
Future Directions The notion and procedure of correlation coef-
cient decomposition based on the eigenvector
Bivariate spatial dependence points to situations spatial ltering (ESF) technique (Grif th and
in which nearby observational units carry shared Paelinck 2011; Chun and Grif th 2013) provides
Correlation and Spatial Autocorrelation 367
an invaluable insight into our understanding of Bivand R (1980) A Monte Carlo study of correlation
correlation with spatial autocorrelation. It allows coef cient estimation with spatially autocorrelated ob-
servations. Quaest Geogr 6:5 10
an aspatial correlation coef cient to be decom- Clifford P, Richardson S (1985) Testing the association
posed into ve sub-correlations between spatially between two spatial processes. Stat Decis Suppl Issue
ltered variables, common spatial autocorrelation 2:155 160
components, unique spatial autocorrelation com- Clifford P, Richardson S, HØmon D (1989) Assessing
ponents, one s spatially ltered variable and the
the signi cance of the correlation between two spatial C
processes. Biometrics 45:123 134
other s unique spatial autocorrelation component, Chun Y, Grif th DA (2013) Spatial statistics & geostatis-
and one s unique spatial autocorrelation compo- tics: theory and applications for geographic informa-
nent and the other s spatially ltered variable. tion science & technology. Sage, Los Angeles
Dray S, Sonia S, Fran ois D (2008) Spatial ordination of
Bivariate spatial dependence or autocorrela- vegetation data using a generalization of Wartenberg s
tion is a special case of multivariate spatial de- multivariate spatial correlation. J Veg Sci 19:45 56
pendence or autocorrelation (Wartenberg 1985). Dutilleul P (1993) Modifying the t test for assessing the
For example, trivariate spatial dependence is correlation between two spatial processes. Biometrics
49:305 314
simply de ned as a particular relationship be- Getis A (1991) Spatial interaction and spatial autocor-
tween the spatial proximity among observational relation: a cross-product approach. Environ Plan A
units and the numeric similarity of their trivariate 23:1269 1277
associations. Thus, we have 43 D 64 different Grif th DA (1980) Towards a theory of spatial statistics.
Geogr Anal 12:325 339
types of trivariate spatial association, similar to Grif th DA (1988) Advanced spatial statistics: special
(5), and 23 D 8 different types of trivariate topics in the exploration of quantitative spatial data
spatial clusters, similar to (6). series. Kluwer, Dordrecht
Because each pair of variables in a multivari- Grif th DA, Amrhein CG (1991) Statistical analysis for
geographers. Prentice-Hall, Englewood Cliffs
ate data set can be viewed as a building block for Grif th DA, Paelinck, JH (2011) Non-standard spatial
statistical treatments, the notion of bivariate spa- statistics and spatial econometrics. Springer, New York
tial dependence should have certain implications Haining RP (1980) Spatial autocorrelation problems. In:
in spatializing any form of multivariate statisti- Herbert DT, Johnston RJ (eds) Geography and the
urban environment, vol 3. Wiley, New York, pp 1 44
cal techniques, e.g., spatial principal components Haining RP (1991) Bivariate correlation with spatial data.
analysis (e.g., Grif th 1988; Dray et al. 2008; Lee Geogr Anal 23:210 227
and Cho 2014; Lee 2015) and spatial canonical Lee S-I (2001) Developing a bivariate spatial association
correlation analysis. measure: an integration of Pearson s r and Moran s I .
J Geogr Syst 3:369 385
Lee S-I (2004) A generalized signi cance testing method
for global measures of spatial association: an extension
of the Mantel test. Environ Plan A 36:1687 1703
Cross-References Lee S-I (2009) A generalized randomization approach
to local measures of spatial association. Geogr Anal
41:221 248
Spatial Autocorrelation and Spatial Interaction
Lee S-I (2012) Exploring bivariate spatial dependence
Spatial Autocorrelation Measures and heterogeneity: a comparison of bivariate measures
Spatial Filtering of spatial association. Paper presented at the annual
Spatial Statistics and Geostatistics: Basic Con- meeting of the association of American geographers,
New York, 24 28 Feb
cepts
Lee S-I (2015) Some elaborations on spatial principal
components analysis. Paper presented at the annual
meeting of the association of American geographers,
Chicago, 21 25 Apr
References Lee S-I, Cho D (2013) Delineating the bivariate spatial
clusters: a bivariate AMOEBA technique. Paper pre-
Anselin L, Syabri I, Smirnov O (2002) Visualizing mul- sented at the annual meeting of the association of
tivariate spatial correlation with dynamically linked American geographers, Los Angeles, 9 13 Apr
windows. In: Anselin L, Rey S (eds) New tools for Lee S-I, Cho D (2014) Developing a spatial principal
spatial data analysis: proceedings of the specialist components analysis. Paper presented at the annual
meeting, Center for Spatially Integrated Social Science meeting of the association of American geographers,
(CSISS), University of California, Santa Barbara Tampa, 8 12 Apr
368 Correlation Queries
Richardson S, HØmon D (1981) On the variance of the Correlation queries are the queries used for
sample correlation between two independent lattice nding collections, e.g. pairs, of highly correlated
processes. J Appl Probab 18:943 948
Tiefelsdorf M (2001) Speci cation and distributional
time series in spatial time series data, which
properties of the spatial cross-correlation coef cient might lead to nd potential interactions and pat-
C"1 ;"2 . Paper presented at the Western Regional Sci- terns. A strongly correlated pair of time series
ence Conference, Palm Springs, 26 Feb indicates potential movement in one series when
Vallejos R, Osorio F, Cuevas F (2013) SpatialPack
an R package for computing spatial association be-
the other time series moves.
tween two stochastic processes de ned on the plane.
Available via DIALOG. http://rvallejos.mat.utfsm.cl/
Time%20Series%20I%202013/paper3.pdf. Accessed Historical Background
12 Feb 2016
Wartenberg D (1985) Multivariate spatial correlation: a
method for exploratory geographical analysis. Geogr The massive amounts of data generated
Anal 17:263 283 by advanced data collecting tools, such as
satellites, sensors, mobile devices, and medical
instruments, offer an unprecedented opportunity
for researchers to discover these potential nuggets
Correlation Queries of valuable information. However, correlation
queries are computationally expensive due to
Correlation Queries in Spatial Time Series large spatio-temporal frameworks containing
Data many locations and long time sequences.
Therefore, the development of ef cient query
processing techniques is crucial for exploring
these datasets.
Correlation Queries in Spatial Time
Previous work on query processing for time
Series Data
series data has focused on dimensionality re-
duction followed by the use of low dimensional
Pusheng Zhang
indexing techniques in the transformed space.
Microsoft Corporation, Redmond, WA, USA
Unfortunately, the ef ciency of these approaches
deteriorates substantially when a small set of
dimensions cannot represent enough information
Synonyms
in the time series data. Many spatial time se-
ries datasets fall in this category. For example,
Correlation Queries; Spatial Cone Tree; Spatial
nding anomalies is more desirable than nding
Time Series
well-known seasonal patterns in many applica-
tions. Therefore, the data used in anomaly detec-
Definition tion is usually data whose seasonality has been
removed. However, after transformations (e.g.,
A spatial framework consists of a collection Fourier transformation) are applied to deseason-
of locations and a neighbor relationship. A time alize the data, the power spectrum spreads out
series is a sequence of observations taken se- over almost all dimensions. Furthermore, in most
quentially in time. A spatial time series dataset spatial time series datasets, the number of spatial
is a collection of time series, each referencing locations is much greater than the length of the
a location in a common spatial framework. For time series. This makes it possible to improve the
example, the collection of global daily tempera- performance of query processing of spatial time
ture measurements for the last 10 years is a spa- series data by exploiting spatial proximity in the
tial time series dataset over a degree-by-degree design of access methods.
latitude-longitude grid spatial framework on the In this chapter, the spatial cone tree, an spatial
surface of the Earth. data structure for spatial time series data, is dis-
Correlation Queries in Spatial Time Series Data 369
cussed to illustrate how correlation queries are ef- A spatial cone tree is a spatial data structure
ciently supported. The spatial cone tree groups for correlation queries on spatial time series data.
similar time series together based on spatial prox- The spatial cone tree uses a tree data structure,
imity, and correlation queries are facilitated using and it is formed of nodes. Each node in the spatial
spatial cone trees. This approach is orthogonal cone tree, except for the root, has one parent node
to dimensionality reduction solutions. The spatial and several-zero or more-child nodes. The root
cone tree preserves the full length of time series, node has no parent. A node that does not have C
and therefore it is insensitive to the distribution of any child node is called a leaf node and a non-
the power spectrum after data transformations. leaf node is called an internal node.
A leaf node contains a cone and a data pointer
pd to a disk page containing data entries, and
is of the form h(cone.span, cone.center), pd i.
The cone contains one or multiple normalized
Scientific Fundamentals
time series, which are contained in the disk page
referred by the pointer pd . The cone.span and
Let xD hx1 ; x2 ; : : : ; xm i and yD hy1 ; y2 ; : : : ; ym i
cone.center are made up of the characteristic
be two time series of length m. The correlation
parameters for the cone. The data pointer is
coef cient of the two time series is de ned as:
P a block address. An internal node contains a
corr.x; y/ D m1 1 m xi x yi y
iD1 . x / . y / D x O y,
O
Pm q Pm cone and a pointer pi to an index page con-
xi 2
i D1 .xi x/ taining the pointers to children nodes, and is
where x D i D1 , x D ,y D
Pm q
m
Pm
m 1
i D1 yi i D1 .yi x/
2 of the form h (cone.span, cone.center), pi i. The
m
, yD m 1
, xbi D p 1 xi x x , cone.span and cone.center are the characteris-
m 1
ybi D p 1 yi y y , xO D hxO 1 ; xO 2 , : : : ; xc
m i, and tic parameters for the cone, which contains all
m 1
yO D hyO1 ; yb2 ; : : : ; yOm i. normalized times series in the subtree rooted at
Because the sum of the xbi 2 is equal to 1: this internal node. Multiple nodes are organized
0 12
in a disk page, and the number of nodes per
Pm P
bi 2 D iD1 @ p 1 r Pm i
iD1 x
m x x A D disk page is de ned as the blocking factor for a
m 1 .x 2
x/
i D1 i
m 1
spatial cone tree. Notice that the blocking factor,
1; xO is located in a multi-dimensional unit the number of nodes per disk page, depends on
sphere. Similarly, yO is also located in a the sizes of cone span, cone center, and data
multi-dimensional unit sphere. Based on the pointer.
de nition of corr.x; y/, corr.x; y/ D xO yO D Given a minimal correlation threshold .0 <
cos. .x;O y//.
O The correlation of two time series < 1), the possible relationships between a cone
is directly related to the angle between the two C and the query time series, Tq , consist of all-
time series in the multi-dimensional unit sphere. true, all-false, or some-true. All-true means that
Finding pairs of time series with an absolute all times series with a correlation over the correla-
value of correlation above the user given minimal tion threshold; all-false means all time series with
correlation threshold is equivalent to nding a correlation less than the correlation threshold;
pairs of time series xO and yO on the unit multi- some-true means only part of time series with
dimensional sphere with an angle in the range of a correlation over the correlation threshold. The
.0; arccos. // or .180 arccos. /; 180 /. upper bound and lower bound of angles between
A cone is a set of time series in a multi- the query time series and a cone is illustrated
dimensional unit sphere and is characterized by in Fig. 1a. Let T is any normalized time series
!!
two parameters, the center and the span of the in the cone C and .Tq ; T / is denoted for the
cone. The center of the cone is the mean of all the !
angle between the query time series vector Tq and
time series in the cone. The span of the cone is !
the time series vector T in the multi-dimensional
the maximal angle between any time series in the
sphere. The following properties are satis ed:
cone and the cone center.
370 Correlation Queries in Spatial Time Series Data
a b
Lower Bound
Upper bound
Lower bound
All-false Some-true
o acrcos( )
All-true Some-true
Correlation Queries in Spatial Time Series Data, Fig. 1 (a) Upper bound and lower bound, (b) properties of spatial
cone tree
NASA Earth observation systems currently Box G, Jenkins G, Reinsel G (1994) Time series analysis:
generate a large sequence of global snapshots of forecasting and control. Prentice Hall, Upper Saddle
River
the Earth, including various atmospheric, land, Dhillon I, Fan J, Guan Y (2001) Ef cient clustering
and ocean measurements such as sea surface of very large document collections. In: Grossman R,
temperature (SST), pressure, and precipitation. Kamath C, Kegelmeyer P, Kumar V, Namburu R (eds)
These data are spatial time series data in na- Data mining for scienti c and engineering applica-
ture. The climate of the Earth s land surface is
tions. Kluwer Academic, Dordrecht C
Chan FK, Fu AW (2003) Haar wavelets for ef cient
strongly in uenced by the behavior of the oceans. similarity search of time-series: with and without time
Simultaneous variations in climate and related warping. IEEE Trans Knowl Data Eng 15(3):678
processes over widely separated points on the 705
Guttman A (1984) R-trees: a dynamic index structure for
Earth are called teleconnections. For instance, ev- spatial searching. ACM, pp 47 57
ery three to seven years, an El Nino event, i.e., the Kahveci T, Singh A, Gurel A (2002) Similarity searching
anomalous warming of the eastern tropical region for multi-attribute sequences. IEEE, p 175
of the Paci c Ocean, may last for months, hav- Keogh E, Pazzani M (1999) An indexing scheme for fast
similarity search in large time series databases. IEEE,
ing signi cant economic and atmospheric con- pp 56 67
sequences worldwide. El Nino has been linked National Oceanic and Atmospheric Administration. El
to climate phenomena such as droughts in Aus- Nino Web Page. www.elnino.noaa.gov/
tralia and heavy rainfall along the eastern coast Ra ei D, Mendelzon A (2000) Querying time series data
based on similarity. IEEE Trans Knowl Data Eng
of South America. To investigate such land-sea 12(5):675 693
teleconnections, time series correlation queries Rigaux P, Scholl M, Voisard A (2001) Spatial databases:
across the land and ocean is often used to reveal with application to GIS. Morgan Kaufmann Publish-
the relationship of measurements of observations. ers, Reading
Samet H (1990) The design and analysis of spatial data
structures. Addison-Wesley Publishing Company, San
Francisco
Shekhar S, Chawla S (2003) Spatial databases: a tour.
Prentice Hall, Upper Saddle River. ISBN:0130174807
Future Directions Zhang P, Huang Y, Shekhar S, Kumar V (2003a) Correla-
tion analysis of spatial time series datasets: a lter-and-
In this chapter, the spatial cone tree on spa- re ne approach. In: Lecture notes in computer science,
vol 2637. Springer, Berlin/Heidelberg
tial time series data was discussed, and how Zhang P, Huang Y, Shekhar S, Kumar V (2003b) Ex-
correlation queries can be ef ciently supported ploiting spatial autocorrelation to ef ciently pro-
using the spatial cone tree was illustrated. In cess correlation-based similarity queries. In: Lec-
future work, more design issues on the spatial ture notes in computer science, vol 2750. Springer,
Berlin/Heidelberg
cone tree should be further investigated, e.g., the
blocking factor and balancing of the tree. The
spatial cone tree should be investigated to support
complex correlation relationships, such as time
lagged correlation. The generalization of spatial
cone trees to non-spatial index structures using COSP
spherical k-means to construct cone trees is also
an interesting research topic. Error Propagation in Spatial Prediction
Recommended Reading
Counterflow
Agrawal R, Faloutsos C, Swami A (1993) Ef cient simi-
larity search in sequence databases. In: Lecture notes
in computer science, vol 730. Berlin/Heidelberg Contra ow in Transportation Network
372 Coverage Standards and Services, Geographic
Main Text
impact this had on furthering spatial theories of Keith Harries (1974, 1980), to environmental
crime there was not much more that could be criminology using GIS and spatial statistics
done because the basic principles of geographic software continued, thereafter, to strengthen
analysis had not yet been operationalized into the role of geography in the study of crime.
what geographer Jerome (Dobson 1983) would As a result, criminology now has several
call, Automated Geography. Personal comput- geographic theories of crime, including rational
ers soon came afterwards, but software permitting choice (Cornish and Clarke 1986), routine
empirical testing of these theories did not come activity (Cohen and Felson 1979), and crime
until much later. It was at this point that crime pattern theory (Brantingham and Brantingham
mapping became useful to law enforcement, pri- 1981). Social disorganization theory was
marily to depict where crimes were occurring in also extended with geographical principles
order to focus resources (Weisburd and McEwen through the incorporation of simultaneous social
1997). However, there was not yet a relationship interactions between adjacent neighborhoods.
between academic institutions and law enforce- For a brief and succinct listing of these theories
ment agencies to couple theories with actual see Paulsen and Robinson (2004). At this point,
observations from the street. crime mapping branched out to become useful
Crime mapping with computers made an en- in a new practitioner-based area beyond law
trance in the mid 1960s allowing the production enforcement, the criminal justice agency. The
of maps of crime by city blocks shaded by vol- con uence of geographic principles, crimino-
ume of incidents. This was still of little inter- logical theory and advancing technology led to
est to researchers studying crime. Even though the development of crime prevention programs
criminologists were becoming interested in the based on empirical evidence, such as Hot Spot
spatial analysis of crime they were not looking to Policing.
other disciplines, including geography, for help In the late 1980s the Federal government
in analyzing data using a spatial framework. A played a role in advancing the use of the crime
manifold of software programs from geography mapping. The National Institute of Justice (NIJ)
were available that could have been used, but funded several efforts under the Drug Market
there is little evidence in any of the social science Analysis Program (DMAP) that brought together
literature that demonstrate that these programs academic institutions with law enforcement
were being used. Also neglected were principles agencies in ve cities in the United States (La
of geographic analysis, to analyze the spatial Vigne and Groff 2001). The purpose was to
aspects of data. With practitioners, their struggle identify drug markets and activities associated
was different. To produce maps of crime required with them by tracking movement of dealers and
serious computing infrastructure that, at the time, users in and out of them. These grants were the
was only available within larger city government rst to promote working relationships between
agencies, which did not hold making crime maps practitioners and researchers in the area of crime
in high priority (Weisburd and McEwen 1997). mapping to move them beyond the limitations
The growth of environmental criminology each was facing not having the other as a
in the 1980s, spearheaded by Paul and partner.
Patricia Brantingham, allowed the discipline of Continuing improvements in GIS throughout
geography to make inroads into criminological the 1990s, and into the 2000s, made it possible to
theory (La Vigne and Groff 2001). Environmental better assemble, integrate, and create new data.
criminology fused geographical principles This is probably the greatest impact that GIS
and criminological theory together with GIS has had on crime mapping. Not only could a
and provided opportunities to empirically test GIS assemble multiple and disparate sets of de-
the theories it was purporting. Signi cant mographic, economic and social data with crime
contributions by George Rengert (1989) (Rengert data, it could also create new units of analysis that
and Simon 1981), Jim LeBeau (1987, 1992) and better modeled human behavior. This capability
Crime Mapping and Analysis 375
and routine activities theories, with a geographic for categorizing the distribution of a variable.
framework, place. The theory works at various Qualitative maps provide a mechanism for classi-
geographic scales, from the macro-level with spa- cation of some description, or label, of a value.
tial aggregation at the census tract or other level, They are often shaded administrative or statistical
to the micro-scale with focus on speci c crime boundaries, such as census blocks, police beats or
events and places. Crime pattern theory focuses neighborhoods. For example, robbery rates based
on situations or places where there is lack of on population can be derived for neighborhood C
social control or guardianship over either the boundaries giving an indication of the neighbor-
suspect or victim, combined with a concentration hoods that pose the highest risk. However, loca-
of targets. For example, a suburban neighborhood tions can be symbolized to show quantities based
can become a hot spot for burglaries because on size or color of the symbol. For example, mul-
some homes have inadequate protection and no- tiple crime events at a particular location give an
body home to guard the property. indication of repeat victimization, such as com-
mon in burglary. However, simple visualization
Social Disorganization of values and rates can be misleading, especially
Social disorganization theory emphasizes the im- since the method of classi cation can change the
portance of social controls in neighborhoods on meaning of a map. Spatial statistics are then used
controlling behavior, particularly for individuals to provide more rigorous and objective analysis
with low self-control or a propensity to commit of spatial patterns in the data.
crime. Social controls can include family, as well
as neighborhood institutions such as schools and Non-graphical Indicators
religious places. When identifying places with Non-graphical statistical tests produce a single
social disorganization, the focus is on ability of number that represents the presence of the clus-
local residents to control social deviancy (Bursik tering of crime incidents or not. These are global
and Grasmick 1993). Important factors include level statistics indicating the strength of spa-
poverty, as well as turnover of residents and tial autocorrelation, but not its location. They
outmigration, which hinder the development of compare actual distributions of crime incidents
social networks and neighborhood institutions with random distributions. Positive spatial au-
that lead to collective ef cacy (Sampson et al. tocorrelation indicates that incidents are clus-
1997). tered, while negative indicates that incidents are
uniform. Tests for global spatial autocorrelation
within a set of points include Moran s I, (Chakra-
Key Applications vorty 1995), Geary s C statistic, and Nearest
Neighbor Index (Levine 2005). After visualizing
There are ve key applications in crime mapping. data in thematic maps these are the rst statistical
These applications are thematic mapping, non- tests conducive to determining whether there are
graphical indicators, hot spots, spatial regression any local level relationships between crime and
and geographic pro ling. They make up a full place exist.
compliment of techniques from elementary to
advanced. Hot Spots
Hot spots are places with concentrations of high
Thematic Mapping crime or a greater than average level of crime.
Thematic maps are color coded maps that de- The converse of a hot spot is a cold spot, which
pict the geographic distribution of numeric or are places that are completely, or almost, devoid
descriptive values of some variable. They reveal of crime. Identi cation and analysis of hot spots
the geographic patterns of the underlying data. is often done by police agencies, to provide
A variable can be quantitative or qualitative. guidance as to where to place resources and
Quantitative maps provide multiple techniques target crime reduction efforts. Hot spot analysis
378 Crime Mapping and Analysis
can work at different geographic levels, from criminals do not go far out of their daily routines
the macro-scale, looking at high crime neigh- to commit crimes. Geographic pro ling takes into
borhoods, or at the micro-scale to nd speci c account a series of crime locations that have been
places such as particular bars or street segments linked to a particular serial criminal and creates a
that are experiencing high levels of crime (Eck probability surface that identi es the area where
et al. 2005). Depending on the level of analysis, the offender s anchor point may be (Rossmo
police can respond with speci c actions such as 2000; Canter 2003). Geographic pro ling was
issuing a warrant or focusing at a neighborhood originally developed for use in serial murder,
level to address neighborhood characteristics that rapes, and other rare but serious crimes. However,
make the place more criminogenic. A variety of geographic pro ling is being expanded to high-
spatial statistical techniques are used for creat- volume crimes such as serial burglary (Chainey
ing hot spots, such as density surfaces (Levine and Ratcliffe 2005).
2005), location quotients (Isserman 1977; Brant-
ingham and Brantingham 1995; Ratcliffe 2004),
local indicators of spatial autocorrelation (LISA) Future Directions
(Anselin 1995; Getis and Ord 1996; Ratcliffe
and McCullagh 1998), and nearest neighborhood The advancement of research and practice in
hierarchical clustering (Levine 2005). crime mapping rests on continuing efforts in three
areas: efforts by research and technology centers,
Spatial Regression software development, and expansion into law
Regression techniques, such as Ordinary Least enforcement and criminal justice.
Squares (OLS), have been used for quite some Crime mapping research and technology
time in criminology as explanatory models. This centers, such as the MAPS Program, the CMAP
technique has a major limitation, in that it does and the Crime Mapping Center at the Jill
not account for spatial dependence inherent in Dando Institute (JDI), are primary resources
almost all data. Holding to geographic principles, for research, development and application of
a place with high crime is most likely surrounded GIS, spatial data analysis methodologies and
by neighbors that also experience high crime, geographic technologies. These three centers
thereby displaying spatial autocorrelation, i.e. a serves as conduits for much of the work
spatial effect. Spatial regression techniques, de- conducted in both the academic and practitioner
veloped by Luc (Anselin 2002), take into account communities. The MAPS Program is a grant
spatial dependence in data. Not factoring these funding and applied research center that serves as
spatial effects into models makes them biased and a resource in the use of GIS and spatial statistics
less ef cient. Tests have been created for iden- used in crime studies. The program awards
tifying spatial effects in the dependent variable numerous grants for research and development
(spatial lag) and among the independent variables in the technical, applied and theoretical aspects
(spatial error). If tests detect the presence of of using GIS and spatial data analysis to study
spatial lag or error, this form of regression adjusts crime, as well as conduct research themselves.
the model so that spatial effects do not unduly As a counterpart to the MAPS Program,
affect the explanatory power of the model. CMAP s mission is to serve practitioners in
law enforcement and criminal justice agencies
Geographic Pro ling by developing tools and training materials for
Geographic pro ling is a technique for identify- the next generation and crime analysts and
ing the likely area where a serial offender resides applied researchers in the use of GIS and spatial
or other place such as their place of work, that analysis. In the UK the Jill Dando Institute of
serves as an anchor point. Geographic pro ling Crime Science has a Crime Mapping Center that
techniques draw upon crime place theory and contributes to the advancement in understanding
routine activities theory, with the assumption that the spatial aspects of crime with an approach
Crime Mapping and Analysis 379
called crime science. This approach utilizes combines crime data with community data where
theories and principles from many scienti c crime is a characteristic of populations rather than
disciplines to examine every place as a unique a product. That is to say crime is, at times, a cause
environment for an explanation of the presence or of conditions rather than the result of conditions.
absence of crime. They conduct applied research It is an indicator of the well being of neigh-
and provide training with their unique approach borhoods, communities or cities. Shared with
on a regular basis. The MAPS Program and local level policy makers, COMPASS provides a C
the Crime Mapping Center at the JDI hold view into this well being of their communities.
conferences on a regular basis. These events Resources can be directed to those places that are
form the nexus for practitioners and researchers not well and helps to understand what makes
to work together in the exchange of ideas, data, other places well. Combined with problem-
experiences and results from analysis that create oriented policing, a strategy that addresses spe-
a more robust applied science. ci c crime problems, this approach can be ef-
Software programs are vital to the progression fective in reducing crime incidents and a general
of the spatial analysis of crime. These programs reduction in social disorder (Braga et al. 1999).
become the scienti c instruments that researchers Coupled with applications in criminal justice,
and practitioners need in understanding human mapping can be utilized to understand the results
behavior and environmental conditions as they of policy and the outcomes. This includes topics
relate to crime. Software, such as CrimeStat, important to community corrections in moni-
GeoDa and spatial routines for R are being writ- toring or helping returning offenders, including
ten to include greater visualization capabilities, registered sex offenders. Or, mapping can be of
more sophisticated modeling and mechanisms use in allocating probation and parole of cers
for seamless operation with other software. For to particular geographic areas, directing proba-
example, in version three of CrimeStat the theory tioners and parolees to community services, and
of travel demand was operationalized as a set of selecting sites for new community services and
routines that apply to criminals as mobile agents facilities (Karuppannan 2005). Finally, mapping
in everyday life. GeoDa continues to generate can even help to understand the geographic pat-
robust tools for visualization based on the prin- terns of responses to jury summons to determine
ciples of Exploratory Data Analysis (EDA). New if there are racial biases are occurring in some
and cutting edge tools for geographic visualiza- systematic way across a jurisdiction (Ratcliffe
tion, spatial statistics and spatial data analysis are 2004).
being added to the open statistical development These three elements will persist and inter-
environment R on a regular basis. All of these twine to evermore incorporate the geographic
programs provide a rich set of tools for testing aspects of basic and applied research of crime
theories and discovering new patterns that recip- through technology. The advancement of knowl-
rocally help re ne what is known about patterns edge that crime mapping can provide will re-
of crime. The emergence of spatial statistics has quire continued reciprocation of results between
proven important enough that even the major research and practice through technology (Stokes
statistical software packages, such as SAS, SPSS, 1997). The hope is that researchers will continue
and Stata are all incorporating full sets of spatial to create new techniques and methods that fuse
statistics routines. classical and spatial statistics together to further
The application of crime mapping is expand- operationalize geographic principles and crimi-
ing into broader areas of law enforcement and nological theory to aid in the understanding of
criminal justice. In law enforcement mapping is crime. Practitioners will implement new tools
taking agencies in new directions toward crime that are developed for analyzing crime with ge-
prevention. For example, the Computer Map- ographic perspectives. They will also continue to
ping, Planning and Analysis of Safety Strate- take these tools in new directions as improvers
gies (COMPASS) Program, funded by the NIJ, of technology (Stokes 1997) and discover new
380 Crime Mapping and Analysis
patterns as those tools become more complete in Canter D (2003) Mapping murder: the secrets of geo-
modeling places. graphic pro ling. Virgin Publishing, London
Chainey S, Ratcliffe J (2005) GIS and crime mapping.
Wiley, Hoboken
Acknowledgements We would like to thank Keith Har-
Chakravorty S (1995) Identifying crime clusters: the spa-
ries, Dan Helms, Chris Maxwell and Susan Wernicke-
tial principles. Middle States Geogr 28:53 58
Smith for providing comments on this entry in a very short
Cohen L, Felson M (1979) Social change and crime rate
time. They provided valuable comments that were used
trends. Am Soc Rev 44(4):588 608
toward crafting this entry.
Cornish D, Clarke RV (1986) The reasoning criminal.
The views expressed in this paper are those of the
Springer
authors, and do not represent the of cial positions or
Dobson JE (1983) Automated geography. Prof Geogr
policies of the National Institute of Justice or the US
35(2):135 143
Department of Justice.
Eck J, Chainey S, Cameron J, Leitner M, Wilson
RE (2005) Mapping crime: understanding hot spots.
National Institute of Justice, Washington, DC
Eck J, Wartell J (1997) Reducing crime and drug dealing
Cross-References by improving place management: a randomized exper-
iment. National Institute of Justice
Felson M (1994) Crime and everyday life. Pine
Autocorrelation, Spatial
Forge
Constraint Data, Visualizing Getis A, Ord JK (1996) Local spatial statistics: an
CrimeStat: A Spatial Statistical Program for the overview. In: Longley P, Batty M (eds) Spatial analy-
Analysis of Crime Incidents sis: modelling in a GIS environment. Geoinformation
International, Cambridge, pp 261 277
Data Analysis, Spatial
Harries KD (1974) The geography of crime and justice.
Exploratory Visualization McGraw-Hill, New York
Hotspot Detection, Prioritization, and Security Harries KD (1980) Crime and the environment. Charles C
Patterns, Complex Thomas Press, Sping eld
Isserman AM (1977) The location quotient approach for
Spatial Econometric Models, Prediction
estimating regional economic impacts. J Am Inst Plan
Spatial Regression Models 43:33 41
Statistical Descriptions of Spatial Patterns Karuppannan J (2005) Mapping and corrections: man-
Time Geography agement of offenders with geographic information
systems. Corrections Compendium. http://www.iaca.
net/Articles/drjaishankarmaparticle.pdf
La Vigne NG, Groff ER (2001) The evolution of crime
mapping in the United States: from the descriptive to
References the analytic. In: Hirsch eld A, Bowers K (eds) Map-
ping and analyzing crime data. University of Liverpool
Anselin L (1995) Local indicators of spatial association Press, Liverpool, pp 203 221
LISA. Geogr Anal 27:93 115 LeBeau JL (1987) Patterns of stranger and serial rape
Anselin L (2002) Under the hood: issues in the speci - offending: factors distinguishing apprehended and at
cation and interpretation of spatial regression models. large offenders. J Crim Law Criminol 78(2):309
http://sal.uiuc.edu/users/anselin/papers.html 326
Braga AA, Weisburd DL, et al (1999) Problemoriented LeBeau JL (1992) Four case studies illustrating the spa-
policing in violent crime places: a randomized con- tialtemporal analysis of serial rapists. Police Stud
trolled experiment. Criminology 7:541 580 15:124 145
Brantingham P, Brantingham P (1981) Environmental Levine N (2005) CrimeStat III version 3.0, a spatial
criminology. Waverland Press, Prospect Heights statistics program for the analysis of crime incident
Brantingham P, Brantingham P (1995) Location quotients locations
and crime hotspots in the city. In: Block C, Dabdoub Park RE, Burgess EW, McKenzie RD (1925) The city:
M, Fregly S (eds) Crime analysis through computer suggestions for investigation of human behavior in
mapping. Police Executive Research Forum, Washing- the urban environment. University of Chicago Press,
ton, DC Chicago
Beccaria C (1764) Richard Davies, translator: on crimes Paulsen DJ, Robinson MB (2004) Spatial aspects of crime:
and punishments, and other writings. Cambridge theory and practice. Allyn and Bacon, Boston
University Press Ratcliffe JH, McCullagh MJ (1998) The perception of
Bursik RJ, Grasmick HG (1993) Neighborhoods and crime hotspots: a spatial study in Nottingham, UK.
crime: the dimensions of effective community control. In: Crime mapping case studies: successes in the eld.
Lexington Books, New York National Institute of Justice, Washington, DC
CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents 381
Definition
Recommended Reading
CrimeStat is a spatial statistics and visualization
Clarke RV (1992) Situational crime prevention: successful program that interfaces with desktop GIS pack-
case studies. Harrow and Heston, New York ages. It is a stand-alone Windows program for the
Cresswell T (2004) Place: a short introduction. Blackwell
Publishing Ltd, Malden analysis of crime incident locations and can in-
Haining R (2003) Spatial data analysis: theory and prac- terface with most desktop GIS programs. Its aim
tice. Cambridge University Press, Cambridge/New is to provide statistical tools to help law enforce-
York ment agencies and criminal justice researchers
Ray JC (1977) Crime prevention through environmental
design. Sage Publications, Beverly Hills in their crime mapping efforts. The program has
Ronald CV (1992) Situational crime prevention. Harrow many statistical tools, including centrographic,
and Heston, New York distance analysis, hot spot analysis, space-time
Weisburd D, Green L (1995) Policing drug hot spots: the analysis, interpolation, Journey-to-Crime estima-
Jersey city drug market analysis experiment. Justice Q
12(4):711 736 tion, and crime travel demand modeling routines.
The program writes calculated objects to GIS
les that can be imported into a GIS program,
including shape, MIF/MID, BNA, and ASCII.
Crime Travel Demand The National Institute of Justice is the distributor
of CrimeStat and makes it available for free to an-
alysts, researchers, educators, and students (The
CrimeStat: A Spatial Statistical Program for the program is available at http://www.icpsr.umich.
Analysis of Crime Incidents edu/crimestat). The program is distributed along
382 CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents
with a manual that describes each of the statistics Evans 1954), the linear nearest neighbor statistic,
and gives examples of their use (Levine 2007a). the K-order nearest neighbor distribution (Cressie
1991), and Ripley s K statistic (Ripley 1981).
The testing of signi cance for Ripley s K is done
Historical Background through a Monte Carlo simulation that estimates
approximate con dence intervals.
CrimeStat has been developed by Ned Levine
and Associates since the late 1990s under grants Hot Spot Analysis
from the National Institute of Justice. It is an An extreme form of spatial autocorrelation is a
outgrowth of the Hawaii Pointstat program that hot spot. While there is no absolute de nition of
was UNIX-based (Levine 1996). CrimeStat, on a hot spot , police are aware that many crime
the other hand, is a Windows-based program. It incidents tend to be concentrated in a limited
is written in C++ and is multi-threading. To date, number of locations. The Mapping and Analy-
there have been three major versions with two sis for Public Safety Program at the National
updates. The rst was in 1999 (version 1.0) with Institute of Justice has sponsored several major
an update in 2000 (version 1.1). The second was studies on crime hot spot analysis (Harries 1999;
in 2002 (CrimeStat II) and the third was in 2004 LaVigne and Wartell 1998; Eck et al. 2005).
(CrimeStat III). The current version is 3.1 and CrimeStat includes seven distinct hot spot
was released in March 2007. analysis routines: the mode, the fuzzy mode,
nearest neighbor hierarchical clustering (Everitt
et al. 2001), risk-adjusted nearest neighbor hi-
Scientific Fundamentals erarchical clustering (Levine 2004), the Spatial
and Temporal Analysis of Crime routine (STAC)
The current version of CrimeStat covers seven (Block 1994), K-means clustering, and Anselin s
main areas of spatial analysis: centrographic; Moran statistic (Anselin 1995).
spatial autocorrelation, hot spot analysis, inter- The mode counts the number of incidents at
polation, space-time analysis, Journey-to-Crime each location. The fuzzy mode counts the number
modeling, and crime travel demand modeling. of incidents at each location within a speci ed
search circle; it is useful for detecting concen-
Centrographic Measures trations of incidents within a short distance of
There are a number of statistics for describing each other (e.g., at multiple parking lots around
the general properties of a distribution. These a stadium; at the shared parking lot of multiple
include central tendency of the overall spatial apartment buildings).
pattern, dispersion and directionality. Among the The nearest neighbor hierarchical clustering
statistics are the mean center, the center of min- routine de nes a search circle that is tied to
imum distance, the standard distance deviation, the random nearest neighbor distance. First, the
the standard deviational ellipse, the harmonic algorithm groups incidents that are closer than the
mean, the geometric mean, and the directional search circle and then searches for a concentra-
mean (Ebdon 1988). tion of multiple incidents within those selected.
The center of each concentration is identi ed and
Spatial Autocorrelation all incidents within the search circle of the center
There are several statistics for describing spatial of each concentration are assigned to the cluster.
autocorrelation, including Moran s I, Geary s C, Thus, incidents can belong to one-and-only-one
and a Moran Correlogram (Moran 1948; Geary cluster, but not all incidents belong to a cluster.
1954; Ebdon 1988). There are also several statis- The process is repeated until the distribution is
tics that describe spatial autocorrelation through stable ( rst-order clusters). The user can spec-
the properties of distances between incidents in- ify a minimum size for the cluster to eliminate
cluding the nearest neighbor statistic (Clark and very small clusters (e.g., 2 or 3 incidents at the
CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents 383
same location). Once clustered, the routine then (a kernel). The densities are summed over all
clusters the rst-order clusters to produce second- incidents to produce an estimate for the cell. This
order clusters. The process is continued until the process is then repeated for each grid cell (Bailey
grouping algorithm fails. The risk-adjusted near- and Gatrell 1995). CrimeStat allows ve different
est neighbor hierarchical clustering routine fol- mathematical functions to be used to estimate the
lows the same logic but compares the distribution density. The particular dispersion of the function
of incidents to a baseline variable. The clustering is controlled through a bandwidth parameter and C
is done with respect to a baseline variable by the user can select a xed or an adaptive band-
calculating a cell-speci c grouping distance that width. It is a type of hot spot analysis in that it
would be expected on the basis of the baseline can illustrate where there are concentrations of
variable, rather than a single grouping distance incidents. However it lacks the precision of the
for all parts of the study area. hot spot routines since it is smoothed. The hot
The Spatial and Temporal Analysis of Crime spot routines will show exactly which points are
hot spot routine (STAC) is linked to a grid and included in a cluster.
groups on the basis of a minimum size. It is CrimeStat has two different kernel function, a
useful for identifying medium-sized clusters. The single-variable kernel density estimation routine
K-means clustering algorithm divides the points for producing a surface or contour estimate of the
into K distinct groupings where K is de ned density of incidents (e.g., the density of burglar-
by the user. Since the routine will frequently ies) and a dual-variable kernel density estimation
create clusters of vastly unequal size due to the routine for comparing the density of incidents to
concentration of incidents in the central part of the density of an underlying baseline (e.g., the
most metropolitan areas, the user can adjust them density of burglaries relative to the density of
through a separation factor. Also, the user can de- households).
ne speci c starting points (seeds) for the clusters As an example, Fig. 1 shows motor vehicle
as opposed to allowing the routine to nd its own. crash risk along Kirby Drive in Houston for
Statistical signi cance of these latter routines 1999 2001. Crash risk is de ned as the an-
is tested with a Monte Carlo simulation. The nual number of motor vehicle crashes per 100
nearest neighbor hierarchical clustering, the risk- million vehicle miles traveled (VMT) and is a
adjusted nearest neighbor hierarchical clustering, standard measure of motor vehicle safety. The
and the STAC routines each have a Monte Carlo duel-variable kernel density routine was used to
simulation that allows the estimation of approx- estimate the densities with the number of crashes
imate con dence intervals or test thresholds for being the incident variable and VMT being the
these statistics. baseline variable. In the map, higher crash risk
Finally, unlike the other hot spot routines, is shown as darker. As a comparison, hot spots
Anselin s Local Moran statistic is applied to ag- with 15 or more incidents were identi ed with the
gregates of incidents in zones. It calculates the nearest neighbor hierarchical clustering routine
similarity and dissimilarity of zones relative to and are overlaid on the map as are the crash
nearby zones by applying the Moran s I statistic locations.
to each zone. An approximate signi cance test
can be calculated using an estimated variance.
Space-Time Analysis
Interpolation There are several routines for analyzing cluster-
Interpolation involves extrapolating a density es- ing in time and in space. Two are global measures
timate from individual data points. A ne-mesh the Knox and Mantel indices, which specify
grid is placed over the study area. For each grid whether there is a relationship between time and
cell, the distance from the center of the cell to space. Each has a Monte Carlo simulation to es-
each data point is calculated and is converted timate con dence intervals around the calculated
into a density using a mathematical function statistic.
384 CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents
CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents, Fig. 1 Safety on Houston s Kirby
Drive: 1998 2001
The third space-time routine is a speci c tool events committed by the serial offender by dis-
for predicting the behavior of a serial offender tance, direction, and time interval. It does this
called the Correlated Walk Analysis module. This by analyzing the sequence of lagged incidents. A
module analyzes periodicity in the sequence of diagnostic correlogram allows the user to analyze
CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents 385
periodicity by different lags. The user can then on a large set of records of known offenders,
specify one of several methods for predicting the routine estimates the distribution of origins
the next incident that the serial offender will of these offenders. This information can then
commit, by location and by time interval. Error be combined with the travel distance function
is, of course, quite sizeable with this methodol- to make estimates of the likely location of a
ogy because serial offenders don t follow strict serial offender where the residence location is not
mathematical rules. But the method can be useful known. Early tests of this method suggest that it is C
for police because it can indicate whether there 10 15% more accurate than the traditional travel
are any repeating patterns that the offender is distance only method in terms of estimating the
following. distance between the highest probability location
and the location where the offender lived.
Journey-to-Crime Analysis As an example, Fig. 2 shows a Bayesian prob-
A useful tool for police departments seeking ability model of the likely residence location of a
to apprehend a serial offender is Journey-to- serial offender who committed ve incidents be-
crime analysis (sometimes known as Geographic tween 1993 and 1997 in Baltimore County, Mary-
Pro ling). This is a method for estimating the land (two burglaries and three larceny thefts). The
likely residence location of a serial offender given grid cell with the highest probability is outlined.
the distribution of incidents and a model for The location of the incidents is indicated as is the
travel distance (Brantingham and Brantingham actual residence location of the offender when ar-
1981; Canter and Gregory 1994; Rossmo 1995; rested. As seen, the predicted highest probability
Levine 2007b). The method depends on building location is very close to the actual location (0.14
a typical travel distance function, either based on of a mile error).
empirical distances traveled by known offenders
or on an a priori mathematical function that ap- Crime Travel Demand Modeling
proximates travel behavior (e.g., a negative expo- CrimeStat has several routines that examine
nential function, a negative exponential function travel patterns by offenders. There is a module
with a low use buffer zone around the offender s for modeling crime travel behavior over a
CrimeStat has a Journey-to-Crime routine that uses the travel distance function and a Bayesian Journey-to-Crime routine that utilizes additional information about the likely origins of offenders who committed crimes in the same locations. With both types, the traditional distance-based and the Bayesian, there are both calibration and estimation routines. In the calibration routine for the Journey-to-Crime routine, the user can create an empirical travel distance function based on the records of known offenders where both the crime location and the residence location were known (typically from arrest records). This function can then be applied in estimating the likely location of a single serial offender whose residence location is not known.
The Bayesian Journey-to-Crime routine utilizes information about the origins of other offenders who committed crimes in the same locations as a single serial offender. Again, based on a large set of records of known offenders, the routine estimates the distribution of origins of these offenders. This information can then be combined with the travel distance function to make estimates of the likely location of a serial offender where the residence location is not known. Early tests of this method suggest that it is 10-15% more accurate than the traditional travel-distance-only method in terms of estimating the distance between the highest probability location and the location where the offender lived.
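Conceptually, the Bayesian version can be sketched as the product of the distance-decay evidence and an origin prior (a sketch of the idea rather than the routine's exact implementation). If O(x) denotes the empirical distribution of origins of prior offenders who offended at the same locations, and S(x) is the distance-decay score described above, the estimated surface is proportional to

    P(x) \propto O(x) \, S(x),

evaluated over the grid of candidate cells, and the cell with the maximum value is reported as the most probable residence location.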
As an example, Fig. 2 shows a Bayesian probability model of the likely residence location of a serial offender who committed five incidents between 1993 and 1997 in Baltimore County, Maryland (two burglaries and three larceny thefts). The grid cell with the highest probability is outlined. The location of the incidents is indicated, as is the actual residence location of the offender when arrested. As seen, the predicted highest probability location is very close to the actual location (0.14 of a mile error).
Crime Travel Demand Modeling

CrimeStat has several routines that examine travel patterns by offenders. There is a module for modeling crime travel behavior over a metropolitan area called Crime Travel Demand modeling. It is an application of travel demand modeling that is widely used in transportation planning (Ortuzar and Willumsen 2001). There are four separate stages to the model. First, predictive models of crimes occurring in a series of zones (crime destinations) and originating in a series of zones (crime origins) are estimated using a non-linear (Poisson) regression model with a correction for over-dispersion (Cameron and Trivedi 1998). Second, the predicted origins and destinations are linked to yield a model of crime trips from each origin zone to each destination zone using a gravity-type spatial interaction model. To estimate the coefficients, the calibrated model is compared with an actual distribution of crime trips.
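The first two stages can be summarized in standard travel-demand notation (an illustrative sketch; the covariates and functional forms used in CrimeStat are described in its manual). Zonal origin and destination counts are modeled as Poisson outcomes, for example

    E[O_i] = \exp(x_i' \beta),   E[D_j] = \exp(z_j' \gamma),

with an over-dispersion correction, and the predicted totals are then distributed between zones with a gravity-type model such as

    T_{ij} = \alpha \, O_i^{\rho} D_j^{\sigma} \exp(-\lambda c_{ij}),

where c_{ij} is the travel cost (distance or time) from zone i to zone j and the coefficients are estimated by comparing predicted trips with an observed origin-destination distribution of crime trips.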
In the third stage, the predicted crime trips are separated into different travel modes using an approximate multinomial utility function (Domencich and McFadden 1975). The aim is

CrimeStat: A Spatial Statistical Program for the Analysis of Crime Incidents, Fig. 2 Estimating the residence location of a serial offender in Baltimore County (MD)
useful to help police systematically identify the high crime areas as well as the areas where there are concentrations of offenders (which are not necessarily the same as the high crime locations). For example, the hot spot tools were used to identify locations with many red light running crashes in Houston as a prelude for introducing photo-enforcement. The Massachusetts State Police used the nearest neighbor hierarchical clustering algorithm to compare heroin and marijuana arrest locations with drug seizures in one small city (Bibel 2004).
Another criminal justice application is the desire to catch serial offenders, particularly high-visibility ones. The Journey-to-Crime and Bayesian Journey-to-Crime routines can be useful for police departments in that they can narrow the search that police have to make to identify likely suspects. Police will routinely search through their database of known offenders; the spatial narrowing can reduce that search substantially. The CrimeStat manual has several examples of the Journey-to-Crime tool being used to identify a serial offender. As an example, the Glendale (Arizona) Police Department used the Journey-to-Crime routine to catch a felon who had committed many auto thefts (Hill 2004).
Many of the other tools are more relevant for applied researchers, such as the tools for describing the overall spatial distribution, for calculating risk in incidents (police typically are interested in the volume of incidents), or for modeling the travel behavior of offenders. Two examples from the CrimeStat manual are given. First, the spatial distribution of Man With A Gun calls for service during Hurricane Hugo in Charlotte, North Carolina was compared with that of a typical weekend (LeBeau 2004). Second, the single-variable kernel density routine was used to model urbanization changes in the Amazon between 1996 and 2000 (Amaral et al. 2004).
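The single-variable kernel density routine mentioned here interpolates a smooth intensity surface from point locations. In generic form (a sketch; the program's specific kernel choices and bandwidth defaults are documented in its manual), the estimate at a grid cell x is

    \hat{\lambda}(x) = \sum_{i=1}^{n} \frac{1}{h^2} K\left( \frac{d(x, c_i)}{h} \right),

where the c_i are the observed point locations, h is the bandwidth, and K is a kernel function such as the quartic or normal kernel.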
Future Directions

Version 4 of CrimeStat is currently being developed (CrimeStat IV). The new version will have a complete restructuring to modernize it consistent with trends in computer science. First, there will be a new GUI that will be more Windows Vista-oriented. Second, the code is being revised to be consistent with the .NET framework, and selected routines will be compiled as objects in a library that will be available for programmers and third-party applications. Third, additional statistics relevant for crime prediction are being developed. These include a spatial regression module using Markov Chain Monte Carlo methods and an incident detection module for identifying emerging crime hot spots early in their sequence. Version 4 is expected to be released early in 2009.

Cross-References

Autocorrelation, Spatial
Crime Mapping and Analysis
Data Analysis, Spatial
Emergency Evacuation, Dynamic Transportation Models
Hotspot Detection, Prioritization, and Security
Movement Patterns in Spatio-temporal Data
Nearest Neighbor Problem
Public Health and Spatial Modeling
Routing Vehicles, Algorithms
Statistical Descriptions of Spatial Patterns

References

Amaral S, Monteiro AMV, Câmara G, Quintanilha JA (2004) Evolution of the urbanization process in the Brazilian Amazonia. In: Levine N (ed) CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.0), Chapter 8. Ned Levine & Associates, Houston; National Institute of Justice, Washington, DC
Anselin L (1995) Local indicators of spatial association - LISA. Geogr Anal 27(2):93-115
Bailey TC, Gatrell AC (1995) Interactive spatial data analysis. Longman Scientific & Technical/Burnt Mill, Essex
Bibel B (2004) Arrest locations as a means for directing resources. In: Levine N (ed) CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.0), Chapter 6. Ned Levine & Associates, Houston; National Institute of Justice, Washington, DC
Block CR (1994) STAC hot spot areas: a statistical tool for law enforcement decisions. In: Proceedings of the workshop on crime analysis through computer mapping. Criminal Justice Information Authority, Chicago
Brantingham PL, Brantingham PJ (1981) Notes on the geometry of crime. In: Brantingham PJ, Brantingham PL (eds) Environmental criminology. Waveland Press, Inc., Prospect Heights, pp 27-54
Cameron AC, Trivedi PK (1998) Regression analysis of count data. Cambridge University Press, Cambridge
Canter D, Gregory A (1994) Identifying the residential location of rapists. J Forens Sci Soc 34(3):169-175
Clark PJ, Evans FC (1954) Distance to nearest neighbor as a measure of spatial relationships in populations. Ecology 35:445-453
Cohen LE, Felson M (1979) Social change and crime rate trends: a routine activity approach. Am Soc Rev 44:588-608
Cressie N (1991) Statistics for spatial data. Wiley, New York
Domencich T, McFadden DL (1975) Urban travel demand: a behavioral analysis. North-Holland Publishing Co. Reprinted 1996. Available at: http://emlab.berkeley.edu/users/mcfadden/travel.html
Ebdon D (1988) Statistics in geography, 2nd edn. (with corrections) Blackwell, Oxford
Eck J, Chainey S, Cameron J, Leitner M, Wilson RE (2005) Mapping crime: understanding hot spots. Mapping and Analysis for Public Safety/National Institute of Justice, Washington, DC
Everitt BS, Landau S, Leese M (2001) Cluster analysis, 4th edn. Oxford University Press, New York
Geary R (1954) The contiguity ratio and statistical mapping. Inc Stat 5:115-145
Harries K (1999) Mapping crime: principle and practice. NCJ 178919, National Institute of Justice/US Department of Justice, Washington, DC. Available at http://www.ncjrs.org/html/nij/mapping/pdf.html
Hill B (2004) Catching the bad guy. In: Levine N (ed) CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.0), Chapter 10. Ned Levine & Associates, Houston; National Institute of Justice, Washington, DC
LaVigne N, Wartell J (1998) Crime mapping case studies: success in the field, vol 1. Police Executive Research Forum and National Institute of Justice/US Department of Justice, Washington, DC
LeBeau JL (2004) Distance analysis: man with a gun calls for service in Charlotte, N.C., 1989. In: Levine N (ed) CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.0), Chapter 4. Ned Levine & Associates, Houston; National Institute of Justice, Washington, DC
Levine N (1996) Spatial statistics and GIS: software tools to quantify spatial patterns. J Am Plan Assoc 62(3):381-392
Levine N (2004) Risk-adjusted nearest neighbor hierarchical clustering. In: Levine N (ed) CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.0), Chapter 6. Ned Levine & Associates, Houston; National Institute of Justice, Washington, DC
Levine N (2007a) CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.1). Ned Levine & Associates, Houston; National Institute of Justice, Washington, DC
Levine N (2007b) Bayesian journey to crime estimation (update chapter). In: Levine N (ed) CrimeStat III: a spatial statistics program for the analysis of crime incident locations (version 3.1). Ned Levine & Associates, Houston; National Institute of Justice, Washington, DC. Available at http://www.icpsr.umich.edu/crimestat
Moran PAP (1948) The interpretation of statistical maps. J R Stat Soc B 10:243-251
Ortuzar JD, Willumsen LG (2001) Modeling transport, 3rd edn. Wiley, New York
Ripley BD (1981) Spatial statistics. Wiley, New York
Rossmo DK (1995) Overview: multivariate spatial profiles as a tool in crime investigation. In: Block CR, Dabdoub M, Fregly S (eds) Crime analysis through computer mapping. Police Executive Research Forum, Washington, DC, pp 65-97
Sedgewick R (2002) Algorithms in C++: part 5 graph algorithms, 3rd edn. Addison-Wesley, Boston
Wilson JQ, Kelling G (1982) Broken windows: the police and neighborhood safety. Atl Mon 29(3):29-38
Cross-Covariance Models

Hurricane Wind Fields, Multivariate Modeling

CSCW

Geocollaboration

Cuda/GPU

Cheng-Zhi Qin
State Key Laboratory of Resources & Environmental Information System, Institute of Geographic Sciences & Natural Resources Research, Chinese Academy of Sciences, Beijing, P.R. China

Synonyms

General-purpose computing on graphics processing units (GPGPUs)
have continually improved their performance and capacities.

In the early 2000s, both NVIDIA and ATI added programmable capacity and floating-point support to GPUs. These improvements made it possible to off-load non-graphical calculations from CPUs to GPUs. One of the first attempts to use GPUs in scientific computing was the matrix multiplication function developed in 2001 (Larsen and McAllister 2001). This new trend was represented by the term GPGPU, which was proposed by Dr. Mark Harris in 2002 (see http://GPGPU.org/about, accessed on 8 Jan 2015).

Without easy-to-use GPGPU programming tools, the use of GPGPUs would not be practical, nor would it have received such widespread acceptance. Although graphics application programming interfaces (APIs) such as OpenGL and DirectX were released in the 1990s to aid in the development of graphics applications, these graphics APIs were inconvenient for developing non-graphical applications. Instead, CUDA, which was first released by NVIDIA in 2007, brought GPGPU widespread popularity. CUDA is a C-language extension for general-purpose programming used exclusively for recent NVIDIA GPUs (NVIDIA Corp. 2012). Other main programming models for GPGPU include DirectCompute from Microsoft, which is specific to newer Windows operating systems, and OpenCL, which is designed by Apple Inc. and maintained by the Khronos Group. Since first being released in 2009, OpenCL has become an industry standard for GPGPU because it not only provides capabilities similar to CUDA but also offers programming portability across GPUs, multicore processors, and operating systems (Stone et al. 2010; Munshi 2012).
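To make the phrase "C-language extension" concrete, the following is a minimal, generic vector-addition sketch (illustrative only, not code from any of the systems cited here). The __global__ qualifier marks a function that executes on the GPU, and the <<<blocks, threads>>> syntax launches it over a grid of threads:

#include <stdio.h>
#include <cuda_runtime.h>

/* Kernel: executed on the GPU; each thread adds one pair of elements. */
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   /* global thread index */
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;                 /* one million elements */
    size_t bytes = n * sizeof(float);

    /* Host (CPU) arrays */
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    /* Device (GPU) arrays */
    float *da, *db, *dc;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    /* Launch enough 256-thread blocks to cover all n elements */
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    /* Copy the result back; this call also waits for the kernel to finish */
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("hc[0] = %f\n", hc[0]);         /* expected: 3.000000 */

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}

Compiled with the nvcc compiler, the host code is ordinary C/C++; the kernel qualifier, the launch syntax, and the runtime calls (cudaMalloc, cudaMemcpy, cudaFree) are the CUDA-specific additions.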
With the many applications of GPGPU, GPU producers continually enhance their computational performance. The computing capacity of GPUs has doubled every 12-18 months and is several times higher than that of contemporary CPUs (Lindholm et al. 2008) (Fig. 2). NVIDIA's GPUs with Fermi
Cuda/GPU, Fig. 2 Comparison of computational capacity (unit: GFLOPS, i.e., 10^9 floating-point operations per second) between NVIDIA GPU and Intel CPU (Adapted from NVIDIA Corp. 2012)