Introduction to GIS Programming and Fundamentals with Python and ArcGIS®
Chaowei Yang
With the collaboration of
Manzhu Yu
Qunying Huang
Zhenlong Li
Min Sun
Kai Liu
Yongyao Jiang
Jizhe Xia
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors
and publishers have attempted to trace the copyright holders of all material reproduced in this
publication and apologize to copyright holders if permission to publish in this form has not been
obtained. If any copyright material has not been acknowledged please write and let us know so we
may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC),
222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Contents
Preface
Acknowledgments
Editor
Contributors
Section I Overview
1. Introduction
1.1 Computer Hardware and Software
1.2 GIS and Programming
1.3 Python
1.4 Class and Object
1.5 GIS Data Models
1.6 UML
1.7 Hands-On Experience with Python
1.8 Chapter Summary
Problems
2. Object-Oriented Programming
2.1 Programming Language and Python
2.2 Class and Object
2.2.1 Defining Classes
2.2.2 Object Generation
2.2.3 Attributes
2.2.4 Inheritance
2.2.5 Composition
2.3 Point, Polyline, and Polygon
2.4 Hands-On Experience with Python
2.5 Chapter Summary
Problems
3. Introduction to Python
3.1 Object-Oriented Support
3.2 Syntax
3.2.1 Case Sensitivity
3.2.2 Special Characters
3.2.3 Indentation
3.2.4 Keywords
3.2.5 Multiple Assignments
3.2.6 Namespace
3.2.7 Scope
3.3 Data Types
3.3.1 Basic Data Types
3.3.2 Composite Data Types
3.4 Miscellaneous
3.4.1 Variables
3.4.2 Code Style
3.5 Operators
3.6 Statements
3.7 Functions
3.8 Hands-On Experience with Python
3.9 Chapter Summary
Problems
6. Shapefile Handling
6.1 Binary Data Manipulation
6.2 Shapefile Introduction
6.3 Shapefile Structure and Interpretation
6.3.1 Main File Structure of a Shapefile
6.3.1.1 Main File Header
6.3.1.2 Feature Record
6.3.2 Index File Structure (.shx)
6.3.3 The .dbf File
6.4 General Programming Sequence for Handling Shapefiles
6.5 Hands-On Experience with Mini-GIS
6.5.1 Visualize Polylines and Polygons
6.5.2 Interpret Polyline Shapefiles
6.6 Chapter Summary
Problems
References
Index
Preface
Hands-On Experience
As a practical text for developing programming skills, this book makes
every effort to ensure the content is as functional as possible. For every
GIS fundamental principle, algorithm, and element introduced, an example
is explored as a hands-on experience using Mini-GIS and/or ArcGIS with
Python. This learning workflow helps build a thorough understanding of
the fundamentals and naturally maps them to programming skills.
For system and open-source development, a step-by-step development of
a Python-based Mini-GIS is presented. For application development, ArcGIS
is adopted for illustration.
The Mini-GIS is open-source software developed for this text and can be
adopted for building other GIS applications. ArcGIS, a commercial product
from ESRI, is used to experience state-of-the-art commercial software.
For learning purposes, ArcGIS is available for free from ESRI.
Online Materials
This book comes with the following online materials:
The intent of the authors for such a broad audience is based on the desire to
cultivate a competitive professional workforce in GIS development, enhance
the literature of GIS, and serve as a practical introduction to GIS research.
• Dr. Min Sun, Ms. Manzhu Yu, Mr. Yongyao Jiang, and Mr. Jizhe Xia
developed Section III in collaboration with Professor Yang.
• Professor Yang edited and revised all chapters to assure a common
structure and composition.
• Ms. Manzhu Yu and Professor Yang edited the course slides.
• Assistant Professor Li, Mr. Kai Liu, Mrs. Joseph George, and
Ms. Zifu Wang edited Mini-GIS as the software for the text.
• After the above text and course materials were completed, four
professors and two developers were invited to review the text’s
content.
• The assembled materials for the text were finally reviewed by
several professionals, including Ms. Alena Deveau, Mr. Rob
Culbertson, and Professor George Taylor.
• The text was formatted by Ms. Minni Song.
• Ms. Manzhu Yu and Professor Yang completed a final review of the
chapters, slides, codes, data, and all relevant materials.
Acknowledgments
This text is a long-term project evolving from the course “Introduction to GIS
Programming” developed and refined over the past decade at George Mason
University. Many students and professors provided constructive suggestions
about what to include, how best to communicate and challenge the students,
and who should be considered as the audience of the text.
The outcome reflects Professor Yang’s programming career since his
undergraduate theses at China’s Northeastern University under the
mentoring of Professor Jinxing Wang. Professor Yang was further mentored
in programming in the GIS domain by Professors Qi Li and Jicheng Chen.
His academic mentors in the United States, Professors David Wong and
Menas Kafatos, provided support over many decades, giving him the chance
to teach the course that eventually led to this text.
Professor Yang thanks the brilliant and enthusiastic students in his
classes at George Mason University. Their questions and critiques honed
his teaching skills, improved the content, and prompted this effort of
developing a text.
Professor Yang thanks his beloved wife, Yan Xiang, and children—Andrew,
Christopher, and Hannah—for accommodating him when stealing valuable
family time to complete the text.
Ms. Manzhu Yu extends her gratitude to the many colleagues who
provided support, and read, wrote, commented, and assisted in the editing,
proofreading, and formatting of the text.
Assistant Professor Huang thanks her wonderful husband, Yunfeng Jiang,
and lovely daughter, Alica Jiang.
Dr. Min Sun thanks her PhD supervisor, Professor David Wong, for
educating her. She also thanks David Wynne, her supervisor in ESRI where
she worked as an intern, and her other coworkers who collectively helped
her gain a more complete understanding of programming with ESRI
products. Last but not least, she thanks her parents and lovely dog who
accompanied her when she was writing the text.
Yongyao Jiang thanks his wife Rui Dong, his daughter Laura, and his parents
Lixia Yao and Yanqing Jiang.
Editor
Contributors
Overview
1
Introduction
FIGURE 1.1
(a) NASA supercomputer. (From NASA supercomputer at http://www.nas.nasa.gov/hecc/resources/pleiades.html.)
(b) Other computers: personal computer (PC), laptop, pad. (From different computers at
http://www.computerdoc.com.au/what-are-the-different-types-of-computers.)
[Figure: software layers, in the order shown: application software (Word, Web browser, ArcGIS); system software (Windows, Linux, ...); embedded software; hardware.]
FIGURE 1.2
Different types of software.
complete set of GIS functionalities for professionals in the GIS domain. Less
intensive, but popular, GIS software for viewing the geographic environment
includes online mapping applications, such as Google Maps and Google Earth.
1.3 Python
Python was originally developed by a Dutch programmer, Guido van
Rossum, in 1990. Van Rossum was reportedly a fan of the British comedy
series, Monty Python’s Flying Circus, and upon developing the open-source
programming language, he borrowed the name “Python” for the language
and his nonprofit institution, the Python Software Foundation.
Similar to programming languages C++ and Java, Python is an object-oriented
and interactive language. Python is dynamic in that it uses an automatic
memory management mechanism to allocate and release memory for
data (variables). Python and ArcGIS regularly release new versions of their
programs; this book is based on Python release 2.7 and ArcGIS 10.1.
There are many reasons for choosing Python, including the following:*
• Get familiar with the concept of class and object (Chapters 1 and 2).
• Learn the syntax of Python, including variables, data types, structures,
controls, statements, and other programming structures
(Chapters 1 through 4).
• Build Python programs from scratch and integrate open-source
libraries to facilitate programming (Chapter 5).
• Become comfortable with the Python programming environment
(Python interpreter or Python Text editor, Chapter 6).
• Solve GIS problems by writing code for GIS algorithms (Chapters 7
through 13).
* http://pythoncard.sourceforge.net/what_is_python.html.
FIGURE 1.3
An example of representing students with the Student class.
FIGURE 1.4
A UML diagram for the City class.
FIGURE 1.5
The River class includes three parts.
FIGURE 1.6
The County class includes three parts.
Polygons are another class of vector data that are also represented by a list
of points; however, with polygons, the first and last points are the same. For
example, on the map of the state of Virginia, a specific county, like Fairfax
County, can be represented as a polygon. The county is a type of polygon
class, which includes a list of points, relevant attributes, and a set of methods.
Countries on a world map may also be represented as polygons. In
either case, both the county and country are types of polygons. As shown
in Figure 1.6, the first row is the subject name: County; the second row is the
subject’s attributes: name and population; and the third row refers to the
methods: getName, setPopulation, and setName.
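The three-part layout of Figure 1.6 can be sketched in Python as follows. This is a minimal illustration, not the book's own listing: the constructor signature, the use of coordinate tuples for points, the ring-closing step, and the sample values are all assumptions made here. The book targets Python 2.7; the sketch is written so that it also runs under Python 3.

```python
class County(object):
    """Hypothetical County class: name, attributes, and methods as in Figure 1.6."""

    def __init__(self, name, population, points):
        self.name = name
        self.population = population
        # A polygon is a closed ring, so the first and last points must match.
        # (Closing the ring automatically is an assumption of this sketch.)
        points = list(points)
        if points and points[0] != points[-1]:
            points.append(points[0])
        self.points = points

    def getName(self):
        return self.name

    def setName(self, name):
        self.name = name

    def setPopulation(self, population):
        self.population = population


# Placeholder coordinates and population, not real data.
fairfax = County('Fairfax', 0, [(0, 0), (4, 0), (4, 3)])
fairfax.setPopulation(1000)
print(fairfax.getName())
print(fairfax.points[0] == fairfax.points[-1])
```

The last line prints True because the constructor appends the first point to close the ring.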
Developing more functionality will require adding more methods and attributes
to each class to capture the evolution of the data models and the
functionality of software; UML diagrams are used to standardize their
representation. This section uses class diagrams and relevant UML standards
for the point, polyline, and polygon classes.
1.6 UML
In 1997, the Object Management Group (OMG)* created the UML to record
the software design for programming. Software designers and programmers
use UML to communicate and share the design. Just as we use the English
language to share our ideas through talking or
writing, UML is used for modeling an application or problem in an
object-oriented fashion. UML modeling can be used to facilitate the entire design
and development of software.
The UML diagram is used to capture the programming logic. There are
two types of diagrams that we will specifically discuss: class diagrams and
object diagrams (Figure 1.7).
The UML class diagram can represent a class using three parts: name,
attributes, and methods. The attributes and methods have three different
accessibilities: public (+), private (-), and protected (#). Attributes and
methods are normally represented in the following format:
[Figure: hierarchy of UML diagram types. Diagrams divide into structure diagrams and behavior diagrams. Diagram types shown include profile, deployment, package, composite structure, state machine, and interaction diagrams; interaction diagrams include sequence, communication, timing, and interaction overview diagrams.]
FIGURE 1.7
The class diagram and object diagram used in this book.
FIGURE 1.8
Inheritance and dependency.
[Figure: (a) aggregation between Counties and State, multiplicity 0..*, drawn with a hollow diamond; (b) composition between Point and MultiPoint, multiplicity 2..*, drawn with a filled diamond.]
FIGURE 1.9
(a) Aggregation and (b) composition are two polar relationships among classes.
FIGURE 1.10
Multiplicity relationship among classes.
of an object can be changed. In Figure 1.11, worldMap is an object of the Map
class, and the state that changes is its coordinate system, which goes from
WGS 1972 to WGS 1984 after reprojection.
FIGURE 1.11
worldMap is an object of the Map class and the state is changing with different operations.
distances between points. You will learn how to create point objects from
the Point class.
FIGURE 1.12
Launch the Python programming window (GUI).
>>> p1 = Point()
>>> p2 = Point()
>>> p1.setXY(1,2)
>>> p2.setXY(2,3)
>>> p1.calDis(p2)
1.4142135623730951
>>>
CODE 1.1
Creating a point class and generating two points, then calculating the distance between the
two points.
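The session in Code 1.1 calls setXY() and calDis() on a Point class whose definition is not reproduced here. A minimal sketch of a class that would support those calls is given below; the attribute names x and y and the default coordinates are assumptions, and the distance is the ordinary Euclidean distance.

```python
import math


class Point(object):
    """Hypothetical Point class supporting the calls used in Code 1.1."""

    def __init__(self):
        # Default coordinates are an assumption of this sketch.
        self.x = 0.0
        self.y = 0.0

    def setXY(self, x, y):
        self.x = x
        self.y = y

    def calDis(self, other):
        # Euclidean distance between this point and another point.
        return math.sqrt((self.x - other.x) ** 2 + (self.y - other.y) ** 2)


p1 = Point()
p2 = Point()
p1.setXY(1, 2)
p2.setXY(2, 3)
print(p1.calDis(p2))  # square root of 2, as in Code 1.1
```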
Programming tips:
PROBLEMS
• Define computer, programming, software, and GIS.
• What are the different methods to categorize software?
• What are the three GIS data models found on the UML diagram?
• Explain why we need to learn GIS programming.
• Use the UML diagram to model the relationship between polylines.
• Use the UML diagram to model the relationship between polygons.
• Practice Python’s Chapter 3 tutorial:
https://docs.python.org/3/tutorial/introduction.html.
• Use Python to calculate the distance between Point (1, 2) and Point
(2, 2).
• Discuss how to identify classes used on a world map and how to use
UML to capture those classes.
2
Object-Oriented Programming
FIGURE 2.1
Print ‘A’ 1000 times using different types of languages.
* http://grass.osgeo.org/.
* http://www.esri.com/software/arcgis.
[Figure: annotated Python listing of the Point class; callouts mark the method name, the value assignment, and the dot used to call an attribute.]
FIGURE 2.2
An example of defining a Point class with Python.
(such as Point class here). The __init__ method has four underscores (two
before and two after ‘init’), which makes it the constructor that is called when
creating an object. For all methods defined by a class, the first parameter is
always ‘self’, which refers to the object itself. This can be used to refer to the
attributes and methods of the object. For example, the __init__ method will
create a point object with self as the first parameter and x, y, and name as
initial values for the object. By default (without specifying the values), the
values for x, y, and name will be 0, 0, and a blank string, respectively. The
first two statements of Code 2.1 create two point objects (point0 and point1).
The object point0 is created with default values, and point1 is created with the
arguments 1, 1, and ‘first point’. Because no arguments are given when creating
point0, the default values 0, 0, and ’ ’ are used. When the values (1, 1, ’first
point’) are passed as arguments, the __init__ method assigns them to the
attributes of point1.
CODE 2.1
Creating a point may pass in values to the object through parameters.
objectName = className(value1,value2,…)
In Code 2.1, we generated two objects, point0 and point1. When creating
object point0, no argument is passed, whereas three values (1, 1, ’first point’)
are used to generate point1.
To refer to an object’s attribute or method, we start with the objectName,
followed by a period and then end with the attribute name or method name.
objectName.attributeName
objectName.methodName()
Code 2.1 uses .x, .y, and .name following the objects point0 and point1
to refer to the attributes x, y, and name. The instruction point1.setName() is
called to change the name of point1 to ‘second point’.
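Since the listing for Code 2.1 is not reproduced here, the behavior described above can be sketched as follows. The class body is an assumption reconstructed from the description (default values 0, 0, and an empty string, plus a setName method); it is not the book's exact code.

```python
class Point(object):
    """Hypothetical Point class matching the description of Code 2.1."""

    def __init__(self, x=0, y=0, name=''):
        # Default values are used when no arguments are passed.
        self.x = x
        self.y = y
        self.name = name

    def setName(self, name):
        self.name = name


point0 = Point()                     # created with default values
point1 = Point(1, 1, 'first point')  # created with explicit arguments

# objectName.attributeName reads an attribute.
print((point0.x, point0.y, point0.name))
print((point1.x, point1.y, point1.name))

# objectName.methodName() calls a method.
point1.setName('second point')
print(point1.name)
```

Only point1 is affected by the setName() call; point0 keeps its default name.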
2.2.3 Attributes
Each class may have one or more attributes. Section 1.4 explains how attributes
can be public, private, or protected to indicate different accessibility by
other objects. How do you explicitly specify the public and private attributes
while declaring a class?
CODE 2.2
Declare public, private, and protected attributes.
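As a hedged illustration of the question above: Python has no access keywords, and accessibility is expressed through naming. A name with no leading underscore is public, a single leading underscore marks an attribute as protected by convention only, and two leading underscores trigger name mangling, which makes the attribute hard to reach from outside the class. The Layer class and its attribute names below are hypothetical.

```python
class Layer(object):
    """Hypothetical class showing Python's naming-based accessibility."""

    def __init__(self):
        self.name = 'roads'          # public
        self._scale = 1000           # protected by convention only
        self.__source = 'roads.shp'  # private: stored as _Layer__source

    def getSource(self):
        # Inside the class, the private attribute is reachable as usual.
        return self.__source


layer = Layer()
print(layer.name)
print(layer._scale)       # allowed, but discouraged outside the class
print(layer.getSource())

try:
    layer.__source
except AttributeError:
    print('layer.__source is not accessible from outside the class')
```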
(Code 2.3). For example, we can create a map including different layers, and
the layer scale can be static and the same for all layer objects.
A class (and instantiated object) can have special built-in attributes.
The special class attributes include a class name and description of the class
(Code 2.4).
>>> class Test:
        version = 1.0

>>> Test.version
1.0
>>> t1 = Test()
>>> t2 = Test()
>>> t1.version
1.0
>>> t2.version
1.0
>>> Test.version = 2.0
>>> t1.version
2.0
>>> t2.version
2.0
>>> t1.version = 3.0
>>> t1.version
3.0
>>> Test.version
2.0
>>> t2.version
2.0
>>>
CODE 2.3
Declare static attributes.
>>> Point.__name__
'Point'
>>> Point.__doc__
'Point Class Definition'
>>> Point.__module__
'__main__'
>>>
CODE 2.4
Special class attributes.
>>> p1 = Point()
>>> p1.__class__
<class __main__.Point at 0x02A100D8>
>>> p1.__dict__
{'y': 0.0, 'x': 0.0}
>>>
CODE 2.5
Special object attributes.
The special object attributes include a class name and an object’s attributes
(Code 2.5).
2.2.4 Inheritance
Chapter 1 introduces three important relationships among objects in
object-oriented programming: inheritance, encapsulation, and polymorphism.
Inheritance is an efficient way to help reuse a developed class. While private
attributes and methods cannot be inherited, all other public and protected
attributes and methods can be automatically inherited by subclasses.
FIGURE 2.3
An example of inheritance (ParkingLot class inherits from class Polygon, and Polygon inherits
from Feature).
To inherit a super class in Python, include the super class name in a pair of
parentheses after the class name.
class DerivedClassName(SuperClass1):
We can also inherit multiple classes in Python by entering more than one
class name in the parentheses.
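The inheritance chain of Figure 2.3 and the multiple-inheritance syntax can be sketched as follows. The attributes (name, points, capacity), the describe() method, and the Labeled mix-in class are hypothetical names introduced only for this illustration.

```python
class Feature(object):
    def __init__(self, name=''):
        self.name = name

    def describe(self):
        return 'Feature: ' + self.name


class Polygon(Feature):
    # Polygon inherits name and describe() from Feature.
    def __init__(self, name='', points=None):
        Feature.__init__(self, name)
        self.points = points or []


class ParkingLot(Polygon):
    # ParkingLot inherits from Polygon, and therefore also from Feature.
    def __init__(self, name='', points=None, capacity=0):
        Polygon.__init__(self, name, points)
        self.capacity = capacity


class Labeled(object):
    # A second, unrelated super class used to show multiple inheritance.
    def label(self):
        return '[' + self.name + ']'


class LabeledParkingLot(ParkingLot, Labeled):
    # More than one class name in the parentheses.
    pass


lot = LabeledParkingLot('Lot A', [(0, 0), (1, 0), (1, 1), (0, 0)], 50)
print(lot.describe())           # method inherited from Feature
print(lot.label())              # method inherited from Labeled
print(isinstance(lot, Feature))
```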
2.2.5 Composition
Composition is an efficient way to help us reuse created objects, and to
maintain the part-to-whole relationship between objects. To maintain the
composition relationship, you must define a class with an attribute that can
include a number of other class objects.
FIGURE 2.4
Composition example (a Polygon class includes attribute points as objects generated from class
Point).
Figure 2.4 shows an example of composition. The class Point and the class
Polygon inherit from the class Feature. The border of a Polygon is defined by
a sequence of points forming a ring and is captured by the points attribute.
The points’ coordinates are kept in the Point objects. This shows not only
that a Polygon object requires a number of Point objects, but also the
composition relationship between Point and Polygon.
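A minimal sketch of this composition relationship is shown below. To keep it short, the Feature super class is omitted, and the isClosed() helper is a hypothetical addition used only to check that the points form a ring.

```python
class Point(object):
    def __init__(self, x=0.0, y=0.0):
        self.x = x
        self.y = y


class Polygon(object):
    """A Polygon is composed of Point objects held in its points attribute."""

    def __init__(self, points=None):
        self.points = list(points) if points else []

    def isClosed(self):
        # Hypothetical helper: a ring needs at least four points, and the
        # first and last points must share the same coordinates.
        if len(self.points) < 4:
            return False
        first, last = self.points[0], self.points[-1]
        return (first.x, first.y) == (last.x, last.y)


ring = [Point(0, 0), Point(4, 0), Point(4, 3), Point(0, 0)]
polygon = Polygon(ring)
print(polygon.isClosed())
print([(p.x, p.y) for p in polygon.points])
```

The coordinates live in the Point objects; the Polygon only holds references to them.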
• Feature: Normally, a Feature class (Figure 2.5) has a name attribute to keep
the feature name and a method draw() to draw the feature on a map.
The draw method should include at least two parameters, self and
map. Self refers to the feature object whose data are accessed while
drawing, whereas map refers to the background on which the feature
will be drawn.
FIGURE 2.5
UML design for Feature class to define common attributes and methods of Point, Polyline, and
Polygon.
>>>
CODE 2.6
Define a Feature class as a super class for Point, Polyline, and Polygon.
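The listing for Code 2.6 is not reproduced here. A minimal sketch of a Feature super class with a name attribute and a draw(self, map) method might look like the following; using a plain list as the map and recording a text entry instead of drawing graphics are assumptions made to keep the example self-contained.

```python
class Feature(object):
    """Hypothetical super class for Point, Polyline, and Polygon."""

    def __init__(self, name=''):
        self.name = name

    def draw(self, map):
        # 'map' follows the parameter name used in the text; it shadows the
        # built-in map() inside this method only. Here it is a plain list
        # standing in for a real drawing surface.
        map.append('drew ' + self.name)


canvas = []
Feature('river').draw(canvas)
print(canvas)
```

Subclasses would typically override draw() with geometry-specific drawing logic.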
FIGURE 2.6
UML design for Point class to keep point vector data.
>>>
CODE 2.7
Define a Point class in Python.
>>> p1 = Point(1,2)
>>> p2 = Point(2,2)
>>> p1.calDis(p2)
1.0
>>>
CODE 2.8
Calculate the distance between (1, 2) and (2, 2).
FIGURE 2.7
(a) UML Polyline class uses a point object list to keep coordinates for polylines. (b) UML Polyline
class uses x and y lists to keep coordinate data for polylines.
FIGURE 2.8
UML design for Polygon class to keep polygon vector data.
>>>
CODE 2.9
A Polyline class has one attribute (points) and two methods (setPoints() and getLength()).
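Because the listing for Code 2.9 is not reproduced here, the following is a minimal sketch of a Polyline class with the attribute and methods named in the caption. The method bodies and the helper Point class are assumptions; the length is computed as the sum of the distances between consecutive points.

```python
import math


class Point(object):
    def __init__(self, x=0.0, y=0.0):
        self.x = x
        self.y = y

    def calDis(self, other):
        return math.sqrt((self.x - other.x) ** 2 + (self.y - other.y) ** 2)


class Polyline(object):
    """Hypothetical Polyline with one attribute and two methods."""

    def __init__(self):
        self.points = []

    def setPoints(self, points):
        self.points = list(points)

    def getLength(self):
        # Sum the segment lengths between each pair of consecutive points.
        length = 0.0
        for i in range(len(self.points) - 1):
            length += self.points[i].calDis(self.points[i + 1])
        return length


line = Polyline()
line.setPoints([Point(0, 0), Point(3, 4), Point(3, 10)])
print(line.getLength())  # 5.0 + 6.0 = 11.0
```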
length of the Polygon without arguments. The return value for the
border length of polygon is designated as float.
PROBLEMS
1. Pick three points, for example, (1, 100), (25, 60), and (1, 1). Could you
form a polyline or polygon using these three points?
2. Create an algorithm to calculate the distance between two points, for
example, (x1, y1), (x2, y2).
3. Read Python Tutorial 6.2 and 6.3. (Use Python command line
window for 6.2).
4. Define and program three classes for Point, Polyline, and Polygon.
5. Add distance calculation in-between every two points, and program
to calculate the distance among the three points given.
6. Add the getLength() method in Polyline and Polygon; create a polyline
and polygon using the three points given; calculate the length of the
polyline and perimeter of the polygon.
Section II
Python Programming
3
Introduction to Python
GIS data classes can be put into one big module. Submodules include classes/
methods for vector, raster, and special data types. If other classes need to
access the classes/methods in a module, the module must be imported first.
For example, the math module is imported in Code 2.5 and Code 2.6 to
access all math methods (e.g., math.sqrt).
3.2 Syntax
3.2.1 Case Sensitivity
Python is case sensitive, meaning capital and lowercase letters represent different
identifiers. You can define a variable myList with an uppercase L, and
store the list of items “1, 2, 3, and 4.” If you get the first value of the list using
mylist[0] with a lowercase l, you will see a NameError, which shows that mylist
is not defined because you defined myList using a capital L (Code 3.1).
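The case-sensitivity behavior described above can be reproduced with a short script. Catching the exception and storing its text in a variable named message are choices made for this sketch so that the script runs to completion.

```python
myList = [1, 2, 3, 4]   # defined with an uppercase L
print(myList[0])

try:
    print(mylist[0])    # lowercase l: a different, undefined name
except NameError as error:
    message = str(error)
    print('NameError: ' + message)
```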
3.2.3 Indentation
In Python, indentation is important for grouping code. Statements at different
levels start at different positions (columns); inconsistent indentation within
a block is not allowed and will trigger an IndentationError. You may use a
different number of spaces or columns to indent different levels of statements;
however, 4 or 8 spaces are recommended. Therefore, spaces and tabs play
significant roles in organizing code.
Different program editors (e.g., command line and Python GUI) use “tab” in
different manners. Depending on your text editor, it may represent different
numbers of spaces.
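As a small illustration of these rules, the sketch below compiles three code strings with the built-in compile() function and records which ones raise an IndentationError. The sample strings and labels are made up for this example.

```python
samples = {
    'consistent': "if True:\n    x = 1\n    y = 2\n",
    'missing indent': "if True:\nx = 1\n",
    'mismatched dedent': "if True:\n        x = 1\n    y = 2\n",
}

results = {}
for label in sorted(samples):
    try:
        # compile() parses the source without running it.
        compile(samples[label], '<example>', 'exec')
        results[label] = 'ok'
    except IndentationError:
        results[label] = 'IndentationError'
    print(label + ': ' + results[label])
```

Only the first sample compiles; the other two break the grouping rules and are rejected before any statement runs.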
CODE 3.1
Case sensitive.
TABLE 3.1
Special Characters in Python
Symbol: \
Function: Escape characters that have a special meaning
Example:
>>> print 'test'
test
>>> print '\'test\''
'test'
>>> print '\\test'
\test

Symbol: \n
Function: New line
Example:
>>> print 'first line\nsecond line'
first line
second line

Symbol: \t
Function: Tab
Example:
>>> print 'str1\tstr2'
str1    str2

Symbol: :
Function: Go to next level of statements
Example:
>>> class Polyline:
        def getLength(self):
            pass

Symbol: #
Function: Indicate Python comments
Example:
>>> # this is a comment

Symbol: ;
Function: Join multiple statements on a single line
Example:
>>> import math; x = math.pow(2,3)
>>> import math y = math.pow(2,3)
SyntaxError: invalid syntax
3.2.4 Keywords
Keywords, such as def and del, are reserved words and cannot be used for
any other purpose (e.g., as the names of variables, classes, and objects);
otherwise, a SyntaxError will occur (Code 3.2). Table 3.2 lists the keywords in
Python.
>>> x =and
SyntaxError: invalid syntax
>>>
CODE 3.2
Keywords SyntaxError example.
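Whether a name is reserved can also be checked programmatically with the standard keyword module, as in the sketch below. Note that the keyword list depends on the Python version; for example, print and exec are keywords in Python 2 but not in Python 3. The sample names are chosen for illustration.

```python
import keyword

# iskeyword() reports whether a string is a reserved word.
for name in ['def', 'del', 'and', 'myList']:
    print(name + ' -> ' + str(keyword.iskeyword(name)))

# Using a keyword where a value is expected fails at compile time.
try:
    compile('x = and', '<example>', 'exec')
    outcome = 'compiled'
except SyntaxError:
    outcome = 'SyntaxError'
print(outcome)
```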
TABLE 3.2
Python Keywords
and elif global or
assert else if pass
break except import print
class exec in raise
continue finally is return
def for lambda try
del from not while
>>> a = b = c = 0
>>> a1,b1,c1= 1, 1.0, 'c1'
>>> (a1,b1,c1)=(2, 2.0, 'c2')
>>> (a2,b2,c2)=(2, 2.0, 'c2')
>>> a, b, c, a1, b1, c1, a2, b2, c2
(0, 0, 0, 2, 2.0, 'c2', 2, 2.0, 'c2')
>>>
CODE 3.3
Multiple assignments.
• The first line of code assigns the same value to multiple variables by
using “a = b = c = value.”
• The second and third lines of code assign different values to different
variables by using “a1, b1, c1 = v1, v2, v3,” or “(a2, b2, c2) =
(v1, v2, v3).”
3.2.6 Namespace
A namespace is a place in which a name resides. Variables within a namespace
are distinct from variables having the same names but located outside of
the namespace. It is very easy to confuse names in different namespaces.
Namespace layering is called scope. A name is placed within a namespace when
that name is given a value. Use dir to show the available names within an
indicated namespace. For example, dir() can find current namespace names, dir(sys)
will find all names available from sys, and dir(math) will find all names available
from math. A program typically includes three layers of scope (Figure 3.1):
[Figure: nested namespaces, from outermost to innermost: built-in namespace (built-in names, such as int()), outermost scope; global namespace (global names), global scope; local namespace (local names), local scope.]
FIGURE 3.1
Hierarchy of namespaces.
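The use of dir() described above can be tried with a short script. The function name show_local and the variable inside are hypothetical; called without arguments inside a function, dir() lists the names in that function's local namespace.

```python
import math


def show_local():
    inside = 1             # a name in the local namespace
    return sorted(dir())   # no argument: names in the current local scope


print('sqrt' in dir(math))  # names available from the math module
print(show_local())         # only the local name is listed
print('math' in dir())      # the importing namespace now contains 'math'
```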
3.2.7 Scope
Scope refers to a portion of code and is used to identify the effectiveness
of variables. Scope is important in functions, modules, classes, objects,
and returned data. Modules with functions and data act similarly to objects.
For example, when defining the Point class, use p1 as a variable within the
calDis() function of the class; or use p1 to refer to an object later when creating
a point object. The first p1 is only effective in the scope of the Point class’
calDis() function. The second p1 is only effective at the same level as the
overall program without indentation.
Variables can be local or global:
FIGURE 3.2
Local and global variables.
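Since the listing behind Figure 3.2 is not reproduced here, the difference between local and global variables can be sketched as follows. The variable and function names are hypothetical.

```python
count = 0  # global variable, defined at the top level of the program


def useLocal():
    count = 100        # local variable: shadows the global inside the function
    return count


def useGlobal():
    global count       # refer to the global variable instead of a new local
    count = count + 1
    return count


localResult = useLocal()
afterLocal = count          # the global is unchanged by useLocal()
globalResult = useGlobal()
afterGlobal = count         # the global was updated by useGlobal()

print((localResult, afterLocal, globalResult, afterGlobal))
```

Assigning to a name inside a function creates a local variable unless the name is declared with the global statement.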
• Integers
Integers are equivalent to integers in C programming language. The
range of an integer is limited as follows:
$-2^{31} \sim 2^{32}$ (−2147483648 ∼ 4294967296)
An integer value can be written in decimal, octal, or
hexadecimal format. For example, 010 is an octal representation of 8
and 0x80 is a hexadecimal representation of 128.
• Long integers of nonlimited length
The range of a long integer is limited only by computer memory. A
long integer is denoted by appending an upper- or lowercase “L.” It
can also be represented in decimal, octal, and hexadecimal formats.
• Float
Floating numbers are equivalent to doubles in C language. A float
value is denoted by a decimal point (.) in the appropriate place and
TABLE 3.3
Basic Data Types
| Basic Variable Types | Range | Description | Examples | Conversion (Typecast) |
|---|---|---|---|---|
| Number: Integer | −2^31 ∼ 2^31 − 1 (32-bit platform) | Decimal, octal, and hexadecimal format | 20, −20, 010, 0x80 | int(), e.g., int(2.0), int('2'), int(2L) |
| Number: Long integer | Limited only by memory | Denoted by (L) or (l) | 20L, −20L, 010L, 0x80L | long(), e.g., long(2), long('2') |
| Number: Float | Depends on machine architecture and Python interpreter | Denoted by a decimal point (.) | 0.0, −77.0, 1.6, 2.3e25, 4.3e-2 | float(), e.g., float(2) |
| String | N/A | Denoted by ('') or ("") | 'test', "test" | str(), e.g., str(2.0) |
CODE 3.4
Data type conversion.
Tips
Typecast: Convert data from one type to another, for example, float('3.14'),
which casts a string data type to float.
An assignment that involves type conversion may lose precision, for example
y = 3.14
x = int(y)
where x loses the fractional part and has a value of 3 as a result.
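A few typecasts can be tried directly, as in the short sketch below. The variable names are assumptions for illustration, and only conversions that behave the same in Python 2 and Python 3 are used.

```python
y = 3.14
x = int(y)                 # fractional part is dropped
negative = int(-3.99)      # int() truncates toward zero, it does not round

pi_text = "3.14"
pi_value = float(pi_text)  # string to float

label = str(2.0)           # float to string
count = int("2")           # string to integer
```

Note that int() truncates toward zero, so int(-3.99) yields -3 rather than -4.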
• Strings
String data are denoted by single quotes ‘’ or double quotes “”.
• Other built-in types
There are several other built-in data types, such as type, None, func-
tion, and file. Code 3.5 illustrates the following types:
The function type() takes an object as an argument and returns the
data type of the object.
None is the null object and has no attributes.
A bool object has two potential values: True and False. Conditional
expressions result in a Boolean value of either True or False.
Tips
Unlike C or Java, Python does not support Byte, Char, or Pointer data
types.
>>> type(None)
<type 'NoneType'>
>>>
CODE 3.5
Function type, None type, and bool type.
• List
The most commonly used and important composite data type is list,
which can be used to group different values together. Use list to keep
a series of points, polylines, and polygons in GIS programs.
Define: A list object can be created from [v1, v2, v3, …], where
elements are surrounded by square brackets (e.g., as in Code 3.6).
Operators: Composite data types are good for expressing complex
operations in a single statement. Table 3.4 lists common operators
shared by complex data types.
• seq[index]: gets a value for a specific element. The starting index
of all sequence data is 0, and the end index is one fewer than the
number of elements n in the sequence (i.e., n-1) (Code 3.6a).
TABLE 3.4
Composite Data Types and Common Operators
| Container | Define | Feature | Examples |
|---|---|---|---|
| List | Delimited by [ ] | mutable | ['a', 'b', 'c'] |
| Tuple | Denoted by parenthesis () | immutable | ('a', 'b', 'c') |
| Dictionary | {key: value, key: value, …} | mutable | {'Alice': '7039931234', 'Beth': '7033801235'} |
Common operators: seq[index], seq[index1: index2], seq * expr, seq1 + seq2, obj in seq, obj not in seq, len(seq), etc.
>>> a = [1,2,3,4]
>>> a
[1, 2, 3, 4]
>>> b = [x*3 for x in a]
>>> b
[3, 6, 9, 12]
>>> a[0]
1
>>> a[3]
4
>>> a[4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>>
CODE 3.6
List example.
>>> a = [1,2,3,4]
>>> a[0]
1
>>> a[3]
4
CODE 3.6a
A List operation.
>>> a = [1,2,3,4]
>>> a[4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
CODE 3.6b
List operation out of range.
>>> a = [1,2,3,4]
>>> len(a)
4
CODE 3.6c
List length.
>>> a = [1,2,3,4]
>>> a[1:3]
[2, 3]
CODE 3.6d
Subset a List.
>>> a = [1,2,3,4]
>>> a
[1, 2, 3, 4]
>>> del a[0]
>>> a
[2, 3, 4]
CODE 3.6e
Delete an element from a List.
>>> a = [1,2,3,4]
>>> a*3
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
CODE 3.6f
List multiplies an integer.
>>> a = [1,2,3,4]
>>> b = [11,12,13,14]
>>> a + b
[1, 2, 3, 4, 11, 12, 13, 14]
CODE 3.6g
Concatenate two sequence objects.
>>> a = [1,4,7,9]
>>> sum = 0
>>> for i in a:
        sum+=i
>>> sum
21
CODE 3.6h
Loop each object in the complex data.
• obj in seq (obj not in seq): loops through each object in the complex
data and performs an operation with each element. The example
goes through each object in the list a, and adds the value of each
object to the sum obj (Code 3.6h).
String data type also belongs to the sequence data type, and those opera-
tors can be applied to a string object.
Methods: As seen in the classes created in the previous chapters, a list is a
system built-in class. The objects created from list have many methods. The
most important methods include append(), insert(), remove(), pop(), sort(), and
reverse() (Code 3.7).
>>> a = [1,2,3,4]
>>> a.append(10)     # Append item 10 at the last position
>>> a
[1, 2, 3, 4, 10]
>>> a.insert(2,15)   # Insert item 15 at index position 2
>>> a
[1, 2, 15, 3, 4, 10]
>>> a.pop()          # Pop (remove and return) the last item
10
>>> a
[1, 2, 15, 3, 4]
>>> a.sort()         # Sort the list
>>> a
[1, 2, 3, 4, 15]
>>>
CODE 3.7
List methods.
• filter(func, list): Filter is used to extract a target list from the original
list. For example, you can use the filter function to select all cities
within Virginia, or select the restaurants and hotels within Fairfax
city.
• map(func, list): Map is used to convert the original list to a new list
using the function. For example, you can convert it from degrees to
meters.
• reduce(func, list): Reduce is another method that is useful for real-
world GIS problems. To calculate the total street or road length of
Fairfax County, reduce can invoke a function func iteratively over
each element of the list, returning a single cumulative value.
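The three functions above can be sketched on small, made-up data. The city list and the segment lengths are hypothetical, and a kilometer-to-meter conversion stands in for the degrees-to-meters case, which would need a projection. In Python 3, reduce lives in functools and filter and map return iterators, hence the import and the list() calls; the same code also runs in Python 2.

```python
from functools import reduce  # a built-in in Python 2, imported in Python 3

# Hypothetical sample data: (city name, state) pairs.
cities = [("Fairfax", "VA"), ("Richmond", "VA"), ("Baltimore", "MD")]

# filter: keep only the elements for which the function returns True.
virginia_cities = list(filter(lambda city: city[1] == "VA", cities))

# map: apply the function to every element and collect the results.
lengths_km = [1.5, 0.25, 2.0]  # hypothetical road segment lengths
lengths_m = list(map(lambda km: km * 1000.0, lengths_km))

# reduce: fold the list into one cumulative value (here, the total length).
total_m = reduce(lambda running, length: running + length, lengths_m, 0.0)
```

Here virginia_cities keeps the two Virginia entries, lengths_m holds the converted lengths, and total_m is their sum.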
• Tuple
Similar to lists, tuple is another complex data type. One obvious
difference between tuple and list is that it is denoted by the use of
parentheses. Another difference is that tuple data type is immuta-
ble (Table 3.4), meaning that the element cannot be altered once it is
defined. An error will occur if the value of a tuple element is altered
(Code 3.8).
• Dictionary
A dictionary is mutable and a container data type that can store any
Python objects, including other container types. A dictionary differs
from sequence type containers (lists and tuples) in how the data are
stored and accessed.
>>> a = [1,2,3,4]
>>> a
[1, 2, 3, 4]
>>> a[0]=5
>>> a
[5, 2, 3, 4]
>>> x = (1,2,3,4)
>>> x
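(1, 2, 3, 4)
>>> x[0]=5     # Immutable, cannot be changed
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment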
CODE 3.8
Tuple operation.
CODE 3.9
Dictionary operation.
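Typical dictionary operations can be sketched as follows. This is an illustrative example, not necessarily the content of Code 3.9; the first two entries come from Table 3.4, while the 'Cecil' entry and the replacement number are made up.

```python
phone_book = {"Alice": "7039931234", "Beth": "7033801235"}

alice_number = phone_book["Alice"]    # look up a value by its key

phone_book["Cecil"] = "7035550100"    # add a new key/value pair (made-up number)
phone_book["Beth"] = "7033800000"     # overwrite the value of an existing key

has_dave = "Dave" in phone_book       # membership test on keys
dave_number = phone_book.get("Dave", "unknown")  # lookup with a default value

del phone_book["Alice"]               # remove an entry
names = sorted(phone_book.keys())     # keys carry no guaranteed order, so sort
```

Values are accessed by key rather than by position, which is the main difference from lists and tuples.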
• Set
Set is used to construct and manipulate unordered collections of
unique elements. A set object can be created either from {v1, v2,
v3, …}, where elements are surrounded by braces, or from set(list),
where the argument is a list (Code 3.10).
Operations: Set supports several operations (Table 3.5), including union (|),
intersection (&), difference (-), and symmetric difference (^) (Linuxtopia 2016).
>>> s = {1,3}
>>> s
set([1, 3])
>>> a = [2,4,3,1]
>>> s = set(a)
>>> s
set([1, 2, 3, 4])
>>>
CODE 3.10
Set operation.
TABLE 3.5
Operations between Two Set Objects, s and t
| Operation | Operator | Function | Usage |
|---|---|---|---|
| difference | − | Create a new set with elements in s but not in t | |
| intersection | & | Create a new set with elements common to s and t | Get spatial objects with both conditions matched, such as finding the restaurants within Fairfax, as well as with French style |
| symmetric difference | ^ | Create a new set with elements in either s or t but not both | |
| union | \| | Create a new set with elements in both s and t | Combine two states' cities to get collection |
>>> s = set(['A','B','C','D'])
>>> t = set(['A','B','E','F'])
>>> s-t
set(['C', 'D'])
>>> s&t
set(['A', 'B'])
>>> s^t
set(['C', 'E', 'D', 'F'])
>>> s|t
set(['A', 'C', 'B', 'E', 'D', 'F'])
>>> s.difference(t)
set(['C', 'D'])
CODE 3.11
Set operations.
3.4 Miscellaneous
3.4.1 Variables
A variable is a memory space reserved for storing data and referred to by
its name. Before use, a variable should be assigned a value. Variables have
different types of values. The basic variable value types include byte, short,
int, long, text, float, and double, as introduced in Section 3.4.1. Some types of
data can be converted using typecast. For example, float("3.14") will convert
the text into the floating number 3.14.
FIGURE 3.3
Dynamic data type.
In Python, variable types are dynamic, with the type only defined when
its value is assigned. This means a variable can change to a different data
type. For example (Figure 3.3), x = float(1) will assign x as float, but x = ‘x has
a dynamic type’ will change the variable x from a float type to a string type.
Here, we use ‘p1’ as an object variable name.
A name is required for each variable. The variable's name must be a
legal identifier, which is a combination of alphabet letters,
digits, and underscores. The name must begin with a letter
or an underscore; it may not start with a digit. Therefore, 'point1' is a
legal name, but '1point' is illegal. In addition, blanks are not allowed
in a variable name. Python reserved words (Table 3.2) cannot be used as
variable names.
FIGURE 3.4
Coding style.
The compiler ignores everything from the # to the end of the line.
3.5 Operators
Operators include basic characters, division and type conversion, modulo,
negation, augmented assignment, and Boolean operations. These operators
are categorized into several types:
TABLE 3.6
Arithmetic Operators (Assume Variable a Holds 5 and Variable b Holds 2)
| Arithmetic Operators | Description | Example |
|---|---|---|
| + | Addition | a + b gives 7 |
| − | Subtraction | a − b gives 3 |
| * | Multiplication | a * b gives 10 |
| / | Division | a / b gives 2 (integer operands in Python 2; 5.0 / 2 gives 2.5) |
| ** | Exponentiation: Performs exponential (power) calculation on operands | a ** b gives 25 |
| % | Modulus: Divides left-hand operand by right-hand operand and returns remainder | a % b gives 1 |
| // | Floor Division: The division of operands where the result is the quotient in which the digits after the decimal point are removed | a // b gives 2; 5.0 // 2.0 gives 2.0 |
TABLE 3.7
Bitwise and Shift Operators (Assume Variable a Holds 5 and Variable b Holds 2)
| Item | Description | Example |
|---|---|---|
| >> | Binary Right Shift Operator. The left operand's value is moved right by the number of bits specified by the right operand. | a >> b will give 1, which is 0000 0001 |
| << | Binary Left Shift Operator. The left operand's value is moved left by the number of bits specified by the right operand. | a << b will give 20, which is 0001 0100 |
| & | Binary AND Operator copies a bit to the result if it exists in both operands. | a & b will give 0, which is 0000 0000 |
| \| | Binary OR Operator copies a bit if it exists in either operand. | a \| b will give 7, which is 0000 0111 |
| ^ | Binary XOR Operator copies the bit if it is set in one operand but not both. | a ^ b will give 7, which is 0000 0111 |
TABLE 3.8
Assignment Operators (Assume Variable a Holds 5 and Variable b Holds 2)
| Assignment Operators | Name | How to Use | Equivalent | Result |
|---|---|---|---|---|
| = | Equal assignment | a = b | Set a as b | 2 |
| += | Add AND | a += b | a = a + b | 7 |
| -= | Subtract AND | a -= b | a = a - b | 3 |
| *= | Multiply AND | a *= b | a = a * b | 10 |
| /= | Divide AND | a /= b | a = a / b | 2 |
| %= | Modulus AND | a %= b | a = a % b | 1 |
| **= | Exponent AND | a **= b | a = a ** b | 25 |
| \|= | Binary OR AND | a \|= b | a = a \| b | 7 |
| ^= | Binary XOR AND | a ^= b | a = a ^ b | 7 |
| <<= | Left shift AND | a <<= b | a = a << b | 20 |
| >>= | Right shift AND | a >>= b | a = a >> b | 1 |
>>> b=3
>>> a=b
>>> a
3
>>> a+=b
>>> a
6
CODE 3.12
Add AND example.
TABLE 3.9
Comparison Operators (Assume Variable a Holds 5 and Variable b Holds 2)
| Comparison Operators | How to Use | Compare (or Check) | Results |
|---|---|---|---|
| == | a == b | a equals b | (a == b) is not true. |
| < | a < b | a is less than b | (a < b) is not true. |
| > | a > b | a is greater than b | (a > b) is true. |
| >= | a >= b | a is greater than or equal to b | (a >= b) is true. |
| <= | a <= b | a is less than or equal to b | (a <= b) is not true. |
| != | a != b | a is not equal to b | (a != b) is true. |
| is | a is b | a and b are the same object | (a is b) is not true. |
| is not | a is not b | a and b are different objects | (a is not b) is true. |
| in | a in range(b) | a is a member of [0, 1, …, b−1] | (a in range(b)) is not true. |
| not in | a not in range(b) | a is not a member of [0, 1, …, b−1] | (a not in range(b)) is true. |
TABLE 3.10
Logic Operators (Assume Variable a Holds True and Variable b Holds True)
| Logic Operators | How to Use | Results |
|---|---|---|
| and | Logical AND operator. If both the operands are true, then condition becomes true. | (a and b) is true. |
| or | Logical OR operator. If any of the two operands are nonzero, then condition becomes true. | (a or b) is true. |
| not | Logical NOT operator. Use to reverse the logical state of its operand. If a condition is true, then Logical NOT operator will make false. | not (a and b) is false. |
CODE 3.13
Logic operations.
3.6 Statements
A statement is a combination of variables and operators. The statement
should comply with the operator’s usage. If you assign a value, you should
use assignment operator. If you accidentally use comparison operators, you
should expect an error. Pay attention to the statement’s precision. For exam-
ple (Code 3.14), the first and second i seem to be assigned with similar values
using identical division and addition operations. However, they generate dif-
ferent results.
>>> i = 1/2+1/2
>>> i
0
>>> i = 1.0/2 + 1.0/2
>>> i
1.0
>>>
CODE 3.14
Statement examples.
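The different results in Code 3.14 come from integer division: in the Python 2 interpreter used here, / between two integers drops the fractional part. A short sketch that behaves the same in Python 2 and Python 3 uses // for floor division and a float operand for true division (in Python 3, / between integers already gives 0.5). The variable names are assumptions.

```python
# Floor division drops the fractional part, so each term is 0.
floor_sum = 1 // 2 + 1 // 2

# With at least one float operand, the fractional part is kept.
float_sum = 1.0 / 2 + 1.0 / 2

# Converting one operand explicitly has the same effect.
half = float(1) / 2
```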
3.7 Functions
Functions are defined by the keyword def followed by the function name
with various input arguments. Similar to the methods defined within a class,
the number of arguments can be zero to many. Class methods are special
functions with the first argument as ‘self.’
Tips
Use the keyword lambda to declare one line version of a function. Usually
such functions are anonymous because they are not defined in a standard
manner. The body of the lambda function statement should be given on the
same line, like in the add() function (Code 3.17).
>>> hi = hello()
Hello, World!
>>> print hi
None
>>> #Return a value
>>> def add(x,y):
        return x+y
>>> z = add(1,2)
>>> print z
3
>>>
CODE 3.15
Return value from a function.
>>> calCost(100)
105.0
>>> calCost(100,0.075)
107.5
>>>
CODE 3.16
Default arguments.
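A function with a default argument that is consistent with the two calls above could look like the following sketch. The parameter names price and taxRate, and the default rate of 0.05, are assumptions inferred from the outputs 105.0 and 107.5.

```python
def calCost(price, taxRate=0.05):
    # taxRate is optional: when the caller omits it, 0.05 is used.
    return price + price * taxRate


default_cost = calCost(100)         # uses the default tax rate
custom_cost = calCost(100, 0.075)   # overrides the default tax rate
```

Arguments with default values must come after the arguments without defaults in the def statement.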
>>> add(1,2)
3
>>> a = lambda x,y:x+y
>>> b = a(1,2)
>>> print b
3
CODE 3.17
Lambda example.
The Python interpreter has built-in functions that are always available.
They are listed in alphabetical order in Figure 3.5. The functions written
in red have already been introduced. The functions written in blue are
important, and will be introduced in later chapters.
FIGURE 3.5
System built-in functions. (From Python. 2001a. Built-In Functions. https://docs.python.org/3/
library/index.html (accessed September 3, 2016).)
CODE 3.18
Sum-up calculations.
• Python syntax
• Python data types
• Operators
• What a function is and how to declare a function
PROBLEMS
1. Keywords: Check the following keywords (Table 3.2.) and briefly
explain them (concepts, when/how to use keywords [use function
help() to get help about each keyword, e.g., help(’if’)], and design/
program an example of how to use the keywords). For example,
if, elif, else keywords.
a. Explain
if, elif, else are the keywords used to make decisions……
b. Examples (Code 3.19)
2. Operators: The five categories of operators include arithmetic
operators, shift & bitwise operators, assignment operators, compari-
son operators, and logic operators. For each pair indicated below,
explain the differences between the two operators and then design
and enter an example in Python interpreter to demonstrate the
difference.
a. “+” vs. “+=”
b. “%” vs. “/”
c. “*” vs. “**”
d. “==” vs. “is”
e. “!=” vs. “is not”
f. “ in ” vs. “not in”
g. “ and ” vs. “or”
h. “ not in ” vs. “ not”
3. Class and Object Concepts
a. Briefly describe the argument “self” in class method and provide
an example.
>>> x = [1,2,3,4,5]
>>> y = 8
>>> z = [6,7,8,9]
>>> if y in x:
        print y, 'is in', x
    elif y in z:
        print y, 'is in z', z
    else:
        print y, 'is not in either x or z', x, z
8 is in z [6, 7, 8, 9]
>>>
CODE 3.19
If…elif…else example.
Description: A Point class has two attributes, x and y, to represent the coordinates
of a point. A Point class also has a method calDis() to calculate the
distance between two points. The arguments of the calDis method are the current
point object 'self' and another point object 'point.' The return value, the
distance between the two points, is designed as float.
The UML design for Point class to keep point vector data is as follows:
Point(Feature)
+x: float
+y: float
+ calDis(p: Point): float
The following Code 3.20 exemplifies how to implement the Point class:
        return math.sqrt((self.x-point.x)**2+(self.y-point.y)**2)
>>>
CODE 3.20
Point class definition.
Description: A Polygon class has one attribute "points" to represent the list
of coordinates. A Polygon class also has a method getLength() to calculate
the perimeter of the Polygon. The argument of the getLength method
is the current Polygon object 'self.' The return value, the border length of the
polygon, is designed as float.
The UML design for a Polygon class keeping polygon vector data is as
follows:
Polygon(Feature)
+points: list<Points>
+ getLength(): float
Assign a polygon with the following data: [(1.0, 2.0), (3.0, 5.0), (5.0, 6.0),
(1.0, 2.0)] and calculate the border length of the polygon.
4
Python Language Control Structure, File
Input/Output, and Exception Handling
• Any number with a value of zero (e.g., 0, 0.0, 0L, 0j, Code 4.1 right)
• An empty string (‘’ or “”)
• An empty container, such as list (Code 4.1 left), tuple, set, and
dictionary
• False and None
• Comparison Operators: >, >=, <, <=, ==, !=, is, is not, in, not in
• Logic Operators: and, or, not
False False
>>>
CODE 4.1
False conditional expressions: empty list (left) and zero (right).
Statement syntax:
if (conditional expression):
    Statement block
Statement example:
if a > b:
    print "a is bigger than b"
Statement syntax:
if (conditional expression):
    Statement block
else:
    Statement block
Statement example:
if a > b:
    print "a is bigger than b"
else:
    print "a is not bigger than b"
elif syntax:
if (conditional expression 1):
    Statement block 1
elif (conditional expression 2):
    Statement block 2
…
else:
    Statement block n
If the conditional expression 1 (or 2, 3,….) is true, then the statement block
1 (or 2, 3….) will be executed and the other statement block will be skipped.
However, if all above conditions (1, 2, 3, …., n−1) are not true, the blocks under
else (statement block n) will be executed.
Tips: pass statement (Code 4.2)
pass statement is unique in that it does not perform any function. It is used
in the decision-making process, telling the interpreter not to do anything
under certain conditions.
In the software development process, it can serve as a place holder, to be
replaced later with written code (Code 4.3).
a=b=0
if a>b:
    pass
else:
    pass
CODE 4.2
Pass statement in if … else… structure.
def draw():
    pass
CODE 4.3
Pass is used as a place-holder statement written in method.
4.2 Loops
Another type of control structure is the loop. Usually, a loop executes a block
until its condition becomes false or until it has used up all the sequence
elements in a container (e.g., list). You can either interrupt a loop to start a
new iteration (using continue) or end the loop (using break). Both while and for
can be used to loop through a block of statements.
Syntax:
• while statement: while statement is very flexible, and can repeat a block
of code while the condition is true. The while statement is usually
applied when the number of iterations is not known before the loop
starts. Code 4.5 shows an example of using while loop to calculate
the sum of 1 to 100.
• range() function and len() function: The range (Pythoncentral 2011) and
len functions are often used in for and while loops. Using range(start,
end, step) generates a list where for any k, start <= k < end, and k
iterates from start to end with increments of step. For example,
range(0,4,1) produces a list of [0,1,2,3]; range(0,50,10) produces a list
of [0,10,20,30,40]. range function takes 0 as default starting value and
1 as default step. For example, range(4) produces a list of [0,1,2,3].
Code 4.4 is an example using range(4) to produce a list, and using for
loop structure to print every element within the list.
>>> for i in range(4):
        print i
0
1
2
3
CODE 4.4
Use range function with default start and step values.
>>> i=0
>>> total=0
>>> while i<101:
        total +=i
        i+=1
CODE 4.5
Calculating the sum of 1 to 100 using while loop.
The following example illustrates how to use the range function in a for
loop to calculate the sum of 1 to 100, using the total variable to hold the
running result (Code 4.6).
The function len() returns the total number of elements in composite data.
For instance, len(polyline.points) can return the number of points within a
polyline. The following example uses while and len() to calculate the length
of a polyline (Code 4.7).
>>> total = 0
>>> for i in range(1,101,1):
        total+=i
CODE 4.6
Calculate the sum of 1 to 100 using range and for loop.
CODE 4.7
Calculate the length of a polyline using while loop and len() method.
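One way to combine while and len() for this task is sketched below. It is an illustration under assumptions, not necessarily the book's listing: the Point and Polyline classes, the getLength method name, and the sample coordinates are introduced for the example.

```python
import math


class Point:
    def __init__(self, x=0.0, y=0.0):
        self.x = x
        self.y = y


class Polyline:
    def __init__(self, points):
        self.points = points  # list of Point objects, in drawing order

    def getLength(self):
        length = 0.0
        i = 0
        # Stop one element early: each step uses points i and i + 1.
        while i < len(self.points) - 1:
            p1 = self.points[i]
            p2 = self.points[i + 1]
            length += math.sqrt((p2.x - p1.x) ** 2 + (p2.y - p1.y) ** 2)
            i += 1
        return length


polyline = Polyline([Point(0.0, 0.0), Point(3.0, 4.0), Point(3.0, 10.0)])
polyline_length = polyline.getLength()  # segments of length 5.0 and 6.0
```

The loop condition uses len(self.points) - 1 because a polyline with n points has n - 1 segments.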
CODE 4.8
Test if two lines intersect with each other using break and for loop.
CODE 4.9
Draw a map without drawing the image layers.
• continue: The continue statement, which is used less often than the
break statement, skips the rest of a loop body under a certain
condition and starts the next iteration. This can eliminate executions
of specific loop values or categories of values. For example, when a
map is drawn, you may uncheck the layers with an image. The code
(Code 4.9) skips all image layers:
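The idea can be sketched with a plain list standing in for the map layers. The layer names and the (name, kind) tuple layout are assumptions for illustration; a real program would call a drawing routine where the sketch appends to a list.

```python
# Hypothetical layers: (layer name, layer kind) pairs.
layers = [("roads", "vector"), ("satellite", "image"),
          ("parcels", "vector"), ("elevation", "image")]

drawn = []
for name, kind in layers:
    if kind == "image":
        continue        # skip the rest of the body for image layers
    drawn.append(name)  # stands in for the actual drawing call
```

Only the vector layers reach the statement after the continue check.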
The following example (Code 4.10) identifies the points shared between two
polygons (this is helpful to understand the data structure, although tuple is
not normally used to hold Point coordinates):
CODE 4.10
Loop and decisions combination example.
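A combination of nested loops and a decision for finding shared points might look like the sketch below. It is an assumption-based illustration rather than the book's listing; the coordinates of the second polygon are made up, and tuples hold the coordinates as the text describes.

```python
polygon1 = [(1.0, 2.0), (3.0, 5.0), (5.0, 6.0), (1.0, 2.0)]
polygon2 = [(3.0, 5.0), (7.0, 1.0), (5.0, 6.0), (3.0, 5.0)]  # made-up data

shared = []
for p in polygon1:          # outer loop over the first polygon
    for q in polygon2:      # inner loop over the second polygon
        # Keep a point once, even if a closed ring repeats it.
        if p == q and p not in shared:
            shared.append(p)
```

The extra "not in" check matters because a closed polygon repeats its first point as its last point.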
CODE 4.11
Calculate distance between points using double loops.
• ‘a’: open the file for appending; any data written to the file are
automatically added to the end of file.
• ‘r+’: open the file for both reading and writing.
• Reading a file: The open() function will load the file from disk storage
to the memory, and return it as a file object. This file object has three
file read methods:
• read(size): returns the entire file when no size parameter is passed
(by default the size is equal to −1), or content as a string in the
byte size specified.
• readline(): reads and returns one line as a string (including
trailing ‘\n’).
• readlines(): reads and returns all lines from file as a list of strings
(including trailing ‘\n’).
• Writing a file: A file object has two written methods:
• write(str): writes string ‘str’ to file.
• writelines(list): writes a list of strings to file; each string element
is one line in the file.
The write function writes the contents of a string to a file. However, there
are other data types such as float and integer. How are such data written into a
file? Since write() takes a string as its input argument, the str() and repr()
functions can be used to convert a nonstring object to a string object. For
example, to write the float number 3.14 into a file, you can use
write(str(3.14)). Typically, a text file is organized line by line. The write()
function does not start a new line by itself; a new line begins only where the
string contains the special character "\n". Thus,
write("first line\nsecond line") will output two lines in the file as shown
below:
first line
second line
The return value is also a string when reading the data from a file with
read() and readline(); therefore, we need to convert those string values into the
data type we prefer. We can use float(str), for example, float('3.14'), int(str), for
example, int('2'), and long(str), for example, long('2'), to convert strings to
numbers.
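The round trip from numbers to text and back can be sketched as follows. The file name values.txt and the sample values are assumptions; a temporary directory is used so the sketch does not depend on an existing path.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "values.txt")  # hypothetical file
values = [3.14, 2.0, 10.5]

# Writing: convert each number to a string and end each line with "\n".
f = open(path, "w")
for value in values:
    f.write(str(value) + "\n")
f.close()

# Reading: every line comes back as a string with a trailing "\n".
f = open(path, "r")
lines = f.readlines()
f.close()

# strip() removes the trailing newline before float() converts the text.
numbers = [float(line.strip()) for line in lines]
```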
• Change file object’s pointer position: While reading a file, we may need
to skip several lines and read out specific information in the file.
Under such circumstances, we can locate specific lines and words of
a file with the following two methods:
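Python file objects provide tell(), which reports the current position, and seek(offset), which moves to a position. The sketch below shows both on a small file; the file name and its contents are assumptions for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "points.txt")  # hypothetical file
f = open(path, "w")
f.write("point:\np1: 1.0, 1.0\np2: 2.0, 2.0\n")
f.close()

f = open(path, "r")
header = f.readline()      # read past the header line
position = f.tell()        # remember where the first data line starts
first = f.readline()
f.seek(position)           # jump back to the remembered position
first_again = f.readline()
f.seek(0)                  # jump back to the beginning of the file
top = f.readline()
f.close()
```

Saving the value returned by tell() and passing it back to seek() is a simple way to reread a part of the file.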
4.5 Exceptions
Exceptions (Python 2001b) are the errors encountered when executing
Python programs, for example, the errors raised when opening a nonexistent
file, dividing by zero, or concatenating 'str' and 'int' objects. These
exceptions can be handled in Python programs using the try…except… statement.
There can be one or more except clauses in the statement. Each except clause
is used to catch one exception. If an except clause matches the exception, the
program will execute that except clause. If no except clause matches, the
exception is passed on to outer try statements; if none of them handles it,
the program stops and reports the exception error. Code 4.12 handles the
ZeroDivisionError using the try…except… statement.
>>> slope(1,2,3,4)
1
>>> slope(1,4,1,5)
Error: x1 equals x2
>>>
CODE 4.12
Handle ZeroDivisionError using try…except… statement.
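A slope function that matches the two calls above could be written as in the following sketch. The argument order (x1, y1, x2, y2) is an assumption inferred from the outputs, and float() is added so that the division behaves the same in Python 2 and Python 3, which makes the first call return 1.0 rather than 1.

```python
def slope(x1, y1, x2, y2):
    try:
        # Raises ZeroDivisionError when the two x values are equal.
        return float(y2 - y1) / (x2 - x1)
    except ZeroDivisionError:
        print("Error: x1 equals x2")
        return None


regular = slope(1, 2, 3, 4)    # (4 - 2) / (3 - 1)
vertical = slope(1, 4, 1, 5)   # vertical line, handled by the except clause
```

Returning None for a vertical line is one possible design choice; the caller then has to check for None before using the result.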
In this method, a double loop is used to calculate the distances between each
pair of four points (p0, p1, p2, p3). As shown in Figure 4.1, during the first
iteration of the outer loop, when i=0, the inner loop iterates j from 1 to 2 to 3.
After the outer loop advances to i=1, j is iterated from 2 to 3, and so forth.
CODE 4.13
Find the longest distance between any two points of 3 points using single loop.
CODE 4.14
Calculate the longest distance between any two points of a 4 point set using double for loop
with i and j as incremental variables.
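The double loop with i and j can be sketched as follows. The four sample points and the variable names are assumptions; the inner loop starts at i + 1 so that each pair is visited exactly once.

```python
import math

# Four hypothetical points p0..p3 as (x, y) tuples.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]

longest = 0.0
longest_pair = None
pairs_checked = 0
n = len(points)
for i in range(n - 1):             # outer loop: first point of the pair
    for j in range(i + 1, n):      # inner loop: only points after i
        dx = points[j][0] - points[i][0]
        dy = points[j][1] - points[i][1]
        distance = math.sqrt(dx * dx + dy * dy)
        pairs_checked += 1
        if distance > longest:
            longest = distance
            longest_pair = (i, j)
```

For n points the loops visit n(n − 1)/2 pairs, which is 6 pairs for the four points used here.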
Note: (1) Make sure the directory ‘c:/code’ exists, or you will gener-
ate an error such as: “IOError: [Errno 2] No such file or directory:
‘c:/code/points.txt’”. makedirs() function in os module could help to
create directory; (2) Make sure you have the permission to create a
file under ‘c:/code’ directory, otherwise you will generate an error
such as “IOError: [Errno 13] Permission denied: ‘c:/code/points.txt’”.
FIGURE 4.1
Use double loop to calculate the distance between each pair of four points p0, p1, p2, p3.
(Panels show the inner-loop steps, e.g., Loop 11: j = 1; Loop 12: j = 2; Loop 21: j = 2.)
CODE 4.15
Write a text file.
CODE 4.16
Read from a text file.
"""
GGS 650 Lecture 4 Practice
readPointFile() is the function to parse the following format data:
point:
p1: 1.0, 1.0
p2: 2.0, 2.0\n
readPolylineFile() is the function to parse the polyline format as:
polyline;
1: 1.0, 1.0; 2.0, 2.0;....
2: 2.0, 2.0; 3.0, 3.0;....
"""
>>> import math
>>> class Point: ## define a point class
        def __init__(self, x=0.0, y=0.0): ## init method for point class
            self.x = x
            self.y = y
        ## Declare getDistance as method of Point
        def getDistance (self, other):
            return math.sqrt((other.x - self.x)**2 +
                             (other.y - self.y)**2)
>>> def readPointFile(fileName):
        file = open(fileName,'r')
        #declare empty list for keeping points, and index for line Num
        points,index = [],0
        for line in file: ## Read each line iteratively
            index += 1 ## Increase index after reading one line
            if index == 1:
                continue ## "Ignore the first line 'point\n'"
            # split the line and get the coordinate,e.g,1.0, 1.0
            coords = line.split(':')[1]
            ## Get the point x, y value
            xCoord = coords.split(',')[0]
            yCoord = coords.split(',')[1]
            point = Point(float(xCoord),float(yCoord))
            points.append(point)
        file.close() # remember to close file after reading
        return points
## Call the function for parsing the point file
>>> points = readPointFile('points.txt')#get all points
#print points
>>> length = len(points) # get the length of points list
>>> for i in range(length):
        point = points[i]
        print point.x, point.y ##print the x, y value of each point
1.0 1.0
2.0 2.0
10.0 11.0
11.2 13.4
CODE 4.17
Read a formatted GIS point data file.
Which portion of the code should we revise? How about the following
format?
point:
1.0, 1.0; 2.0, 2.0
4.6.4 Hands-On Experience: Input GIS Point Data from Text File
Code 4.18 reads a point text file and parses the data by generating point objects.
The maximum distance between any two points is then found and written
to a result file. Please try it on your computer and interpret the example line
by line.
CODE 4.18
Read and write the content of a point data file.
Point:
1: 1, 2
2: 100, 300
3: 4, 5
4: 0, 500
5: 10, 400
6: 600, 20
7: 500, 400
8: 500, 500
PROBLEMS
1. Review the Python tutorial “Input and Output,” which came with
Chapter 7 of the Python software help document.
2. Analyze the patterns of the following text string and save it to a text
file, for example, polylines.txt.
Polyline:
1. 1603714.835939442,142625.48838266544; 1603749.4678153452,142620.21243656706; 1603780.3769339535,142607.37201781105; 1603801.475846678,142582.27024446055; 1603830.4767344964,142536.14692804776;
2. 1602514.2066492266,142330.66992144473; 1602521.4127475217,142414.92978276964; 1602520.1146955898,142433.93817959353; 1602501.3840010355,142439.12358761206; 1602371.6780588734,142417.84858870413; 1602351.6610373354,142408.02716448065; 1602334.5180692307,142388.58748627454; 1602331.6999511716,142376.66073128115; 1602334.8067251327,142348.965322732; 1602338.308919772,142323.6111663878; 1602349.0226452332,142314.50124930218; 1602363.9090971674,142310.79584660195; 1602514.2066492266,142330.66992144473;
3. Write a Python program to parse the text file and use list to hold
the two polylines. Please refer to Section 5.6.1 in Python Library
Reference (from Python help document) String methods for split(),
strip(), and Built-in Functions float(x).
4. Generate two polyline objects.
5. Calculate the length of the two polylines.
6. Review the class materials on handling exceptions and Python
tutorial “Errors and Exceptions” (Section 8.3 in Python help
document).
7. While reading the file and converting the string data to another data
type, for example, float, please add “try…except…finally…” to catch
the Exceptions, for example, IOError and ValueError.
5
Programming Thinking and
Vector Data Visualization
FIGURE 5.1
A map using ArcGIS software, in which polygons are illustrated in light green, polylines in
dark red, and points in green.
FIGURE 5.2
The process for visualizing GIS data (flowchart: Start, Prepare canvas, Coordinate transfer,
Draw features, Finalize drawing, End).
In Python, Tkinter and related classes and modules are commonly used
for drawing and developing GUIs. Aside from the Tkinter, there are other
toolkits available, like PyGTK, PyQt, and wxPython. Code 5.1 will bring up
a GUI:
In the sample code:
In this example, a window was created with one label showing "Hello
World." Tkinter supports a variety of widget objects; the most common are
described in Table 5.1.
In addition to the widgets in Table 5.1, Tkinter has other widgets, which
include Entry, Frame, LabelFrame, Menubutton, OptionMenu, PanedWindow,
Scale, Spinbox, and Toplevel.
When using other widgets, replace the third line and fourth line (in
Code 5.1) by creating an object of a specific widget and passing it in rele-
vant arguments specified on the online reference,* for example, when using
Canvas for map, replace the third line and fourth line with
Among all the widgets, Canvas is the most widely used. It supports many
methods, like drawing points, lines, polylines, and polygons. A typical
Canvas preparing code is shown in Code 5.2.
In Code 5.2, the third line creates a Canvas with the pixel size dimen-
sions of 800 by 600. Although the size is based on the computer monitor
size, the actual size can be adjusted on the display settings tab. The second
to last line ensures that Canvas is visible on the window and the last line
CODE 5.1
Create a GUI using Tkinter.
* http://www.pythonware.com/library/tkinter/introduction/tkinter-reference.htm.
TABLE 5.1
Popular Tkinter Widgets
Syntax: use (master, options) as parameters to initialize a widget; root is the
parent widget, and options are the widget options such as command, back.
Widget        Example
Button        Button(root, text="OK", command=callback)
Canvas        Canvas(root, width=800, height=600)
Label         Label(root, text="Hello, world!")
Listbox       Listbox(root, height=8)
Menu          Menu(root, borderwidth=2)
Message       Message(root, text="this is a message")
Radiobutton   Radiobutton(root, text="Grayscale", value="L")
Scrollbar     scrollbar = Scrollbar(root)
Text          Text(root, font=("Helvetica", 16))
CODE 5.2
Preparing a Canvas for drawing.
will ensure the window shows up. The Canvas prepared can be used as
the map cloth, which can be used to draw points, polylines, and polygons.
The next step is to prepare GIS data so that it can serve as visualization
on Canvas.
(154.23, 85.78)
(0, 0)
GIS Data area (–179.00, –89.00, Window within monitor
154.23, 85.78) (800 × 600)
FIGURE 5.3
An example of geographic area and window monitor size with 800 × 600.
Points marked in the figure: P1 (–179.00, –89.00), P3 (154.23, 85.78), the other
two corners P2 and P4, and the center point P0.
RatioX: (maxX–minX)/(800–0) = (154.23+179.00)/800 = 0.4165375
RatioY: (maxY–minY)/(600–0) = (85.78+89.00)/600 = 0.2913
FIGURE 5.4
Calculation of ratioX and ratioY.
dataset (the point with the lowest x coordinate and highest y coordinate) as
the origin of a computer monitor (0, 0, upper left).
In the coordinate transformation process, keep the map’s orientation: the
map on the monitor should have the same top-to-bottom and left-to-right
directions as the geographic data. Along the x-axis, both the geographic
coordinate system and the monitor coordinate system increase from left to
right. However, along the y-axis, the monitor coordinate system increases
from top to bottom while the geographic coordinate system increases from
bottom to top. Therefore, flip the direction when converting the coordinates
from a geographic system to the monitor system. This is done by flipping the
positive/negative sign of the length along the y direction. The following
illustrates how to calculate the monitor coordinates (winX, winY).
Both the geographic extent and monitor screen areas have five important
points (Figure 5.4) used to calculate length ratios: the center point (p0) and its
four corners (p1, p2, p3, and p4). Length ratios include both ratioX, which is
the length conversion ratio from geographic coordinates to monitor coordi-
nates along the X direction, and ratioY along the Y direction. The final ratio
used to convert geographic coordinates to monitor coordinates is selected
from either ratioX or ratioY.
TIPS: Using different ratios for the x- and y-axis will distort the
visualized map.
A reference point is another important element when converting geographic
coordinates to monitor coordinates. Reference points could be centers of
both systems, which are (−12.385, −1.61) and (400, 300), or upper left of both
systems: (−179.00, 85.78) and (0, 0).
Points marked in the figure: using ratioX (left), p1 (0, 419.60); using ratioY
(right), p3 (1143.94, 0) and p2 (1143.94, 600).
FIGURE 5.5
The monitor coordinates of four corner points are based on ratioX (left) and ratioY (right), using
the upper left corner as the reference point.
control point and ratioX as the ratio to convert it from the geographic coordi-
nates to monitor coordinates:
When using ratioY as the ratio, to convert it from the geographic coordinates
to monitor coordinates:
As shown in Figure 5.5, using ratioX will not use all 600 pixels of window
height; however, not all features will show up while using ratioY (window
coordinates are out of boundary). Typically, the larger one should be selected
as the ratio value to ensure that all features are displayed on the screen at the
initialization stage.
Finally, transform the geographic coordinates (x, y) to the screen pixel
coordinates (winx, winy) after both the ratio and the reference point (X0, Y0)
are determined, using the following formula:
winx = (x – X0)/ratioX
winy = –(y – Y0)/ratioY (add a negative sign to flip the y-axis direction)
For example, if ratioX is used as the ratio and the upper left point
(−179.00, 85.78) is mapped to (0, 0), any point (x, y) from the GIS data is
converted to monitor window coordinates as follows (Figure 5.6):
winx = (x – (−179.00))/ratioX
winy = –(y – 85.78)/ratioX
(0,0)
FIGURE 5.6
Coordinate conversion using ratioX as ratio and the upper left corner as the reference point.
ratioX = (maxx-minx)/width
ratioY = (maxy-miny)/height
ratio = ratioX if ratioX > ratioY else ratioY
winx = (x-minx)/ratio
winy = -(y-maxy)/ratio
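The conversion above can be sketched as a small function. This is a minimal illustration rather than the book's listing: the function name geo_to_window and its parameter names are assumptions, the code is written for Python 3, and the extent and window size are the ones shown in Figure 5.3.

```python
def geo_to_window(x, y, minx, miny, maxx, maxy, width, height):
    """Convert geographic (x, y) to window pixel coordinates.

    Uses the upper left corner (minx, maxy) as the reference point and
    the larger of ratioX and ratioY so that the whole extent fits.
    """
    ratio_x = (maxx - minx) / float(width)
    ratio_y = (maxy - miny) / float(height)
    ratio = max(ratio_x, ratio_y)
    win_x = (x - minx) / ratio
    win_y = -(y - maxy) / ratio  # negative sign flips the y-axis
    return win_x, win_y


# Extent and window size taken from Figure 5.3
extent = (-179.00, -89.00, 154.23, 85.78)
upper_left = geo_to_window(-179.00, 85.78, *extent, width=800, height=600)
lower_right = geo_to_window(154.23, -89.00, *extent, width=800, height=600)
print(upper_left)
print(lower_right)
```

With these values the lower right corner of the extent lands near (800, 419.60), which matches the ratioX case in Figure 5.5.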
TABLE 5.2
Canvas Widgets Can Be Created Using Different Create Methods
Each entry lists the method, the graph it draws, its parameters, and its usage in GIS.
create_arc(x0, y0, x1, y1, options)
    Graph: A slice out of an ellipse.
    Parameters: (x0, y0, x1, y1) is the rectangle into which the ellipse fits; (x0, y0) and (x1, y1) are the two diagonal points.
    Usage in GIS: Some symbols.
create_bitmap(x, y, options)
    Graph: An image as a bitmap.
    Parameters: (x, y) is the point location where the bitmap is placed.
    Usage in GIS: Show raster images.
create_image(x, y, options)
    Graph: A graphic image.
    Parameters: (x, y) is the point location where the image is placed.
    Usage in GIS: Show raster images.
create_line(x0, y0, x1, y1,…, options)
    Graph: One or more line segments.
    Parameters: (x0, y0, x1, y1,…) is the list of the points in the polyline.
    Usage in GIS: Some polyline features such as rivers, roads.
create_oval(x0, y0, x1, y1, options)
    Graph: An ellipse; use this also for drawing circles, which are a special case of an ellipse.
    Parameters: (x0, y0, x1, y1) is the rectangle into which the ellipse fits; (x0, y0) and (x1, y1) are the two diagonal points.
    Usage in GIS: Some ellipse features or symbols.
create_polygon(x0, y0, x1, y1,…, options)
    Graph: A polygon.
    Parameters: (x0, y0, x1, y1,…) is the list of the points in the polygon.
    Usage in GIS: Some polygon features such as lakes, sea, cities.
create_rectangle(x0, y0, x1, y1, options)
    Graph: A rectangle.
    Parameters: (x0, y0, x1, y1) is the rectangle; (x0, y0) and (x1, y1) are the two diagonal points.
    Usage in GIS: A map border etc.
create_text(x, y, options)
    Graph: Text annotation.
    Parameters: (x, y) is the point location where the text is placed.
    Usage in GIS: Text caption.
create_window(x, y, options)
    Graph: A rectangular window.
    Parameters: (x, y) is the point location where the window is placed.
    Usage in GIS: A Canvas to draw the map.
define the lower right point of the rectangle (in which the arc will be drawn).
There are many options, such as start (the beginning angle of an arc), extent
(the width of the arc in degrees), and fill (the color used to fill in the arc).
Figure 5.7 shows how to create a Canvas object and create three arcs using
the same rectangle points, but with different colors and extents. As illus-
trated, the angle starts from a positive x-axis and goes counterclockwise.
As shown in Figure 5.7a, the source code creates a window in which a Canvas
is drawn with three arcs that together cover 360 degrees (270 degrees red,
60 degrees blue, and 30 degrees green).
from Tkinter import *   # Python 2 name; the module is called tkinter in Python 3
root = Tk()
can = Canvas(root, width=800, height=600)
can.pack()
xy = 20, 20, 300, 180
can.create_arc(xy, start=0, extent=270, fill="red")
can.create_arc(xy, start=270, extent=60, fill="blue")
can.create_arc(xy, start=330, extent=30, fill="green")
root.mainloop()
FIGURE 5.7
Create an arc with Tkinter. (a) Source code, (b) Draw arc, (c) Draw line.
can.create_line(1,2,35,46,5,6,76,280,390,400)
Then run the code and check the GUI to see whether it is the same as
Figure 5.7c.
FIGURE 5.8
A simple Python GIS map with point, polyline, and polygon visualized.
• xy can be a list of [x0, y0, x1, y1, x2, y2, …., x0, y0]. Note: the first and
last points are the same
• options include
• fill: to specify the fill color of the polygon
• outline: specify the border line color
• width: to specify the pixel width of the border line
Using these methods, the point, polyline, and polygon data drawn in
Figure 5.1 can be visualized in Python Tkinter Canvas as Figure 5.8.
Labels in the figure: polyline points p1 to p8; the point list p1, p2, p3, p4,
p5, p6, ...; and the step “Calculate the length.”
FIGURE 5.9
Programming components/steps for the Chapter 4 problem.
"""
Chapter#4
Read the following data:
Polyline;
1: 1603714.835939442,142625.48838266544; 1603749.4678153452,142620.212
43656706; 1603780.3769339535,142607.37201781105; 1603801.475846678,142
582.27024446055; 1603830.4767344964,142536.14692804776;
2: 1602514.2066492266,142330.66992144473; 1602521.4127475217,142414.92
978276964; 1602520.1146955898,142433.93817959353; 1602501.3840010355,1
42439.12358761206; 1602371.6780588734,142417.84858870413; 1602351.6610
373354,142408.02716448065; 1602334.5180692307,142388.58748627454; 160
2331.6999511716,142376.66073128115; 1602334.8067251327,142348.9653227
32; 1602338.308919772,142323.6111663878; 1602349.0226452332,142314.50
124930218; 1602363.9090971674,142310.79584660195; 1602514.2066492266,
142330.66992144473;
Code 5.3 defines the function ‘readPolylineFile’ to read data line by line.
The readPolylineFile function will return two values: polylines and
## function to read the data line by line and
## get all points from both lines
## return two objects: the points list and
## the number of points in the first line
def readPolylineFile(fileName):
    f = open(fileName, 'r')
    polylines, points, index = [], [], 0
    firstPolyLineNum = 0
    for line in f:
        index += 1
        if index == 1:      # skip the 'Polyline;' header line
            continue
        coords = line.split(':')[1]
        eachcoords = coords.split(';')
        coordsLen = len(eachcoords)
        if index == 2:
            firstPolyLineNum = coordsLen-1
            print 'The first polyline number is : ', firstPolyLineNum
        for i in range(coordsLen-1):
            singlecoords = eachcoords[i]
            #print 'singlecoords,', singlecoords
            xCoord = singlecoords.split(',')[0]
            yCoord = singlecoords.split(',')[1]
            # assumed step: store the parsed point so the returned list is not empty
            points.append((float(xCoord), float(yCoord)))
    f.close()
    return points, firstPolyLineNum
CODE 5.3
Read from text file and create a polyline to hold data and analyze data. (Continued )
## call the function to read data and put into points list
>>> results = readPolylineFile('polylinesHw4.txt')
>>> points = results[0]
>>> firstPolylinePointNum = results[1]
>>> length = len(points)
>>> print 'The total points and the number of points for the first polyline are', \
    length, firstPolylinePointNum
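Parsing and length calculation can be combined in a short sketch. It is an illustration under assumptions rather than the book's solution: the helper names parse_polyline and polyline_length are hypothetical, the input is a single record string in the "id: x,y; x,y;" layout shown above, the sample coordinates are made up, and the code targets Python 3.

```python
import math


def parse_polyline(record):
    """Parse one record such as '1: 0,0; 3,4;' into a list of (x, y) tuples."""
    coords = record.split(':')[1]
    points = []
    for pair in coords.split(';'):
        pair = pair.strip()
        if not pair:          # skip the empty piece after the last ';'
            continue
        try:
            x_text, y_text = pair.split(',')
            points.append((float(x_text), float(y_text)))
        except ValueError:
            print('Skipping malformed coordinate pair:', pair)
    return points


def polyline_length(points):
    """Sum the straight-line distances between consecutive points."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        total += math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
    return total


sample = '1: 0,0; 3,4; 3,10;'   # made-up coordinates for illustration
pts = parse_polyline(sample)
print(pts, polyline_length(pts))
```

Catching ValueError covers both a pair with the wrong number of values and text that float() cannot convert, so one bad pair does not stop the whole parse.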
FIGURE 5.10
Two classes to be created for the point and rectangle problem.
The problem requires creating four random points and rectangles, which
indicate that the following two steps need to be included in the program:
To check the points, loop through the four points and, for each point, loop
through the four rectangles to see whether the point is in any of them. This
requires a double loop.
There are two components in the last check process: (a) how to check if a
point is within a rectangle (the contains() method in the rectangle class), and
(b) how to write the results to a file (the file open, write, and close pattern).
Since the file needs to be written as you progress through the double loops,
CODE 5.4
Generating four points, rectangles, checking contains() relationships, and outputting results
to a text file.
open it before entering the loop and close after exiting the loop. Write the
file when executing the loops.
The random.random() method may generate the same values for x1 & x2
or y1 & y2 (which means a line instead of a rectangle). This can be handled
by adding a method to check whether they are the same in order to prevent
an invalid rectangle.
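The double loop, the contains() check, the validity check, and the open, write, close pattern described above can be sketched as follows. This is a hedged illustration, not the book's Code 5.4: the class layout, the method name is_valid, the helper names random_rectangle and check_all, and the output wording are all assumptions, and an in-memory io.StringIO stands in for a real text file. The code targets Python 3.

```python
import io
import random


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


class Rectangle:
    def __init__(self, x1, y1, x2, y2):
        # Normalize so that (minx, miny) is the lower left corner
        self.minx, self.maxx = min(x1, x2), max(x1, x2)
        self.miny, self.maxy = min(y1, y2), max(y1, y2)

    def is_valid(self):
        """A rectangle needs nonzero width and height."""
        return self.minx < self.maxx and self.miny < self.maxy

    def contains(self, point):
        return (self.minx <= point.x <= self.maxx and
                self.miny <= point.y <= self.maxy)


def random_rectangle():
    """Keep drawing until the rectangle is valid (not a line or a point)."""
    while True:
        rect = Rectangle(random.random(), random.random(),
                         random.random(), random.random())
        if rect.is_valid():
            return rect


def check_all(points, rectangles, out):
    """Double loop: write one line per (point, rectangle) pair to out."""
    for i, pt in enumerate(points):
        for j, rect in enumerate(rectangles):
            out.write('point %d in rectangle %d: %s\n' % (i, j, rect.contains(pt)))


points = [Point(random.random(), random.random()) for _ in range(4)]
rectangles = [random_rectangle() for _ in range(4)]
out = io.StringIO()          # stands in for open('results.txt', 'w')
try:
    check_all(points, rectangles, out)   # write while looping
finally:
    results = out.getvalue()
    out.close()                          # close after the loops finish
print(results)
```

The output object is opened once before the loops and closed once afterward, so every pair is written as it is checked without reopening the file.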
Based on this programming thinking process, the programming codes can
be developed in the flow in Code 5.4:
• Problem analyses
• Pattern matching
• Coordinate transformation
• Drawing vector data on Canvas
• Two coding examples are used to demonstrate the programming
thinking process: (a) reading, parsing, and calculating length for
polylines, and (b) generating random points and rectangles and
checking the containment relationship between every point and rectangle
PROBLEMS
1. There is a module named random in Python; import it and use its
method random() to generate a random number from 0 to 1.
2. There is a popular algorithm in GIS to find whether a point is inside
a rectangle based on their respective point coordinates (x, y, and
minx, miny, maxx, maxy). Describe the algorithm in a mathematical
algorithm using (x, y, and minx, miny, maxx, maxy).
3. Write a program to (a) generate m number of points and n number
of rectangles (m and n can be changed through user input), (b) check
which points are in which rectangles.
4. Program to write the point coordinates and rectangles point coordi-
nates to a text file, and then write the result of (2) to the text file.
5. In a Word document, explain the “point in rectangle” algorithm
and program created, and code the program in a .py file to find
which point generated in (3) is within which rectangle generated in
(3). Then check the text file output.
6
Shapefile Handling
One of the most important functions of GIS software is to read popular GIS
data file formats, such as shapefiles. A shapefile is a binary data file format
originally developed by ESRI, and has been widely used for exchanging
vector data among different GIS professionals and communities (ESRI 1998).
This chapter introduces how shapefiles are formatted and how to read them
with Python, that is, reading binary data, reading a shapefile header, reading
a point shapefile, and reading polyline and polygon shapefiles.
struct.unpack(fmt, binarydata)
The struct module must be imported before using (the first statement of
Code 6.1). The code also demonstrates how to pack two integers (100, 200)
represented by variables (x, y) into a binary string. String ‘ii’ is used to rep-
resent two integer values with each ‘i’ representing one integer. The fifth
import struct
x,y = 100,200
s = struct.pack('ii',x,y)
print s
result = struct.unpack('ii',s)
print result
CODE 6.1
Examples of pack and unpack methods of the struct module.
TABLE 6.1
Format Characters
Format Character   C Type               Python Type          Standard Size
c                  char                 string of length 1   1
b                  signed char          Integer              1
B                  unsigned char        Integer              1
?                  _Bool                Bool                 1
h                  short                Integer              2
H                  unsigned short       Integer              2
i                  int                  Integer              4
I                  unsigned int         Integer              4
l                  long                 Integer              4
L                  unsigned long        Integer              4
q                  long long            Integer              8
Q                  unsigned long long   Integer              8
f                  float                Float                4
d                  double               Float                8
CODE 6.2
Packing and unpacking different data types using proper format.
statement unpacks the binary string into its original data value (100, 200).
The string ‘ii’ is important and referred to as the string format (denoted
as fmt), which is used to specify the expected format, and is required to
call both pack and unpack methods. Table 6.1 details the format charac-
ters used in the string fmt to specify format for packing and unpacking
binary data.
Code 6.2 shows how to pack different formats using the format characters.
Five variables representing five values in data types of integer, Boolean,
TABLE 6.2
Struct Starting Character
Character Byte Order Size Alignment
@ native native native
= native standard none
< little-endian standard none
> big-endian standard none
! network (= big-endian) standard none
double, float, and double are packed. The total length of the packed string
is 4(i) + 1(b) + 8(d) + 4(f) + 8(d) = 25. Because the struct module follows
the C standard, the C types are used. Python has fewer data types; however,
the standard size of each data type is kept when the first character of
the format string specifies the byte order, size, and alignment of the
packed data.
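The effect of the first character on size can be checked with a short sketch. The format string 'i?dfd' and the sample values are assumptions chosen to match the five data types described above, and the code is written for Python 3, where the packed result is a bytes object.

```python
import struct

values = (100, True, 3.14159, 2.5, 1.0e10)   # integer, Boolean, double, float, double
fmt_body = 'i?dfd'

# '=' selects standard sizes with no padding between the values
standard = struct.pack('=' + fmt_body, *values)
print(len(standard))                          # 4 + 1 + 8 + 4 + 8 bytes

# '@' (the default) uses native sizes and alignment, which may add padding bytes
print(struct.calcsize('@' + fmt_body))

unpacked = struct.unpack('=' + fmt_body, standard)
print(unpacked)
```

The native size depends on the platform, so it is printed here rather than assumed; only the standard size is fixed by the format characters.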
By default, the @ will be used if the first character is not one of the
characters given in Table 6.2 below:
The byte order* concept is also used in the ESRI Shapefile format. Big-endian
and little-endian byte orders are two ways to organize multibyte words
in computer memory or on a storage disk. When using big-endian order,
the first byte is the most significant part of the data, whereas the first
byte is the least significant part of the data when using little-endian order.
For example, when storing a hexadecimal representation of a four-byte integer
0x44532011
TABLE 6.3
Four-Byte Integer 0x44532011 in Storage
Big-Endian      44 53 20 11
Little-Endian   11 20 53 44
import struct
x,y,z = 100,200,300
s = struct.pack('>iii',x,y,z)
print s
result = struct.unpack('>iii',s)
print result
result = struct.unpack('<iii',s)
print result
CODE 6.3
Pack and unpack must use the same byte order.
CODE 6.4
The default byte order is little-endian for our computers.
(Table 6.3) using the big-endian order binary format, the byte sequence
would be “44 53 20 11.”
• For big-endian byte order, use '>' while packing or unpacking the
bytes, for example, struct.unpack('>iiii', s).
• For little-endian byte order, use '<', for example, struct.unpack('<iiii', s).
• If these two are mixed, for example, packing using '>' and unpacking
using '<', an unexpected result will be generated, as shown in the
last statement of Code 6.3.
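The byte sequences in Table 6.3 and the effect of mixing byte orders can be reproduced with a few lines. This is a small Python 3 sketch; the variable names are illustrative only.

```python
import struct

value = 0x44532011
big = struct.pack('>i', value)       # most significant byte first
little = struct.pack('<i', value)    # least significant byte first
print(' '.join('%02x' % b for b in bytearray(big)))
print(' '.join('%02x' % b for b in bytearray(little)))

# Packing with '>' and unpacking with '<' raises no error,
# it silently yields a different number
mixed = struct.unpack('<i', struct.pack('>i', 100))[0]
print(mixed)
```

Because no exception is raised when the byte orders are mixed, a wrong order usually shows up only as implausible values, which is why the order must be taken from the file format description.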
• .shp: The .shp file contains the vertices of all entities (Figure 6.1).
The vertices are organized hierarchically in features/records, parts,
and points. The .shp file also contains information on how to read
the vertices (i.e., as points, lines, or polygons). An important
attribute can also be treated as a third dimension (measurement)
and stored in the .shp file.
• .shx: An index is kept for each record, and is beneficial for finding
the records more quickly.
• .dbf: Attribute information is stored in the .dbf file associated with
each .shp file. The .dbf file contains dBASE tables and stores addi-
tional attributes that cannot be kept in a shapefile’s features. It
contains exactly the same number of records as there are features
in the .shp file (otherwise the data could not be interpreted). The
records belong to the shapes sequentially, meaning that the first,
second, and third records belong, respectively, to the first, second,
and third features in the .shp file. If we edit the .dbf using a
third-party tool and alter the records, the order may be lost. More
information can be found in the ESRI shapefile format white paper
(ESRI 1998).
FIGURE 6.1
Shapefile structure. (Adapted from ESRI. 1998. ESRI Shapefile Technical Description. An ESRI
White Paper, 34.)
FIGURE 6.2
Shapefile main file structure.
FIGURE 6.3
File header of shape main file.
in big-endian byte order, and '>' is required to unpack the bytes, for example,
struct.unpack('>iiiiiii', s). The unit for the total length of the shapefile is
the 16-bit word, that is, the total file length in bytes is double the value of
the interpreted number.
The rest of the file header is in little-endian byte order, and '<' is required
to unpack it, for example, struct.unpack('<iiii', s). The '<' can be omitted on
most PCs, where little-endian is the native byte order used by default for pack
and unpack. Starting at byte 28, a 4-byte integer (value of 1000) gives the
version of the shapefile. Starting at byte 32, a 4-byte integer indicates the
feature shape type (e.g., 1 means the file is for Point features, and 3
indicates a Polyline file, Figure 6.3 right). Bytes 36 through 99 hold the
bounding box of the entire dataset in the shapefile. The bounding box includes
four dimensions: x, y, z, and m. Each dimension includes minimum and maximum
values in the sequence of minx, miny, maxx, maxy, minz, maxz, minm, and maxm.
The bounding box should be unpacked with fmt '<dddddddd' because all values are
of the double data type.
Hands-on practice: Interpret the shapefile header (Code 6.5)
The first three statements read 28 bytes from the shapefile. The fourth
statement unpacks the data in big-endian order into seven integers. The first
integer is 9994, the file code for shapefiles. The next five are zero and are
reserved for future use. The last one is the file length, which is the total
length of the shape main file in 16-bit (two-byte) units. Therefore, the actual
total length of the file is double the indicated value (i.e., 288 * 2 = 576 bytes). The
next statement reads out 72 bytes and unpacks them using little-endian byte
order to obtain two integers and eight double values. The first one, with
a value of 1000, is the version of shapefile. The second one, with value 1,
indicates that the shapefile feature type is Point. The following four refer to
CODE 6.5
Interpreting the shapefile main file header.
minx, miny, maxx, maxy, respectively, and minz, maxz, minm, maxm are all
zeros since these two dimensions are not used in this shapefile.
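The header interpretation described above can be sketched against a synthetic header, so it runs without a shapefile on disk. This is an illustration under assumptions, not the book's Code 6.5: the function name read_shp_header, the dictionary keys, and the bounding box values are made up, while the file code, length, version, and shape type follow the numbers quoted in the text. The code targets Python 3.

```python
import struct


def read_shp_header(data):
    """Interpret the first 100 bytes of a shapefile main (.shp) file."""
    # Bytes 0-27: seven big-endian integers
    file_code, _, _, _, _, _, length_words = struct.unpack('>iiiiiii', data[0:28])
    # Bytes 28-99: two little-endian integers and eight doubles
    rest = struct.unpack('<iidddddddd', data[28:100])
    return {
        'file_code': file_code,
        'length_bytes': length_words * 2,      # stored in 16-bit words
        'version': rest[0],
        'shape_type': rest[1],
        'bbox': rest[2:6],                     # minx, miny, maxx, maxy
    }


# Synthetic 100-byte header for illustration (bounding box values are made up)
fake = struct.pack('>iiiiiii', 9994, 0, 0, 0, 0, 0, 288)
fake += struct.pack('<iidddddddd', 1000, 1, -10.0, -5.0, 10.0, 5.0, 0, 0, 0, 0)
header = read_shp_header(fake)
print(header)
```

To try it on a real file, the first 100 bytes read from a file opened in binary mode ('rb') can be passed in place of the synthetic bytes.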
FIGURE 6.4
Point record header and content.
FIGURE 6.5
File structure for Point .shp file.
CODE 6.6
Read the point shapefile and write the results to a text file.
Code 6.6 reads the shapefile to get the file length (bytes 24–27 in file header)
and uses the file length to calculate the number of points in this shapefile in
the following steps:
1. Doubling the size to convert from 16-bit (two bytes) unit to 8-bit
(one byte) unit
2. Subtracting 100 bytes for a file header
3. Dividing by 28 (each record header and record content takes 28 bytes
in point shapefile) to get the feature number
The file length and number of point features are then printed out and a
text file is opened to write the results. A for loop is used to cycle through
each record/feature to read out the x, y values and print out and write to
the text file. Lastly, the two files are closed to conclude the file read/write
process. In the for loop, the first line moves the file pointer to the position
where the ith record’s x value starts (100 + 12 + i*28, 12 refer to the record
header [8 bytes] and the shape type integer 1 [4 bytes]), then reads 16 bytes
for x, y and unpacks them into x, y variables.
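The steps above can be sketched with an in-memory file, which avoids depending on a particular dataset. This is a minimal sketch under assumptions, not the book's Code 6.6: the function name read_points and the sample coordinates are hypothetical, and io.BytesIO stands in for a point .shp file opened in binary mode. The code targets Python 3.

```python
import io
import struct


def read_points(shp):
    """Read all (x, y) pairs from an open point .shp file object."""
    shp.seek(24)
    length_words = struct.unpack('>i', shp.read(4))[0]
    # Double to get bytes, drop the 100-byte header, 28 bytes per record
    point_count = (length_words * 2 - 100) // 28
    points = []
    for i in range(point_count):
        shp.seek(100 + 12 + i * 28)          # skip record header and shape type
        x, y = struct.unpack('<dd', shp.read(16))
        points.append((x, y))
    return points


# Build a small in-memory point file (made-up coordinates) to try it out
coords = [(1.5, 2.5), (-3.0, 4.0), (10.0, -20.0)]
body = b''
for number, (x, y) in enumerate(coords, start=1):
    body += struct.pack('>ii', number, 10)    # record number, content length (words)
    body += struct.pack('<idd', 1, x, y)      # shape type 1 (Point), x, y
total_words = (100 + len(body)) // 2
header = struct.pack('>iiiiiii', 9994, 0, 0, 0, 0, 0, total_words)
header += struct.pack('<iidddddddd', 1000, 1, -3.0, -20.0, 10.0, 4.0, 0, 0, 0, 0)
print(read_points(io.BytesIO(header + body)))
```

Integer division (//) is used for the record count because the result must be a whole number that can be passed to range().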
FIGURE 6.6
File structure for .shx file.
CODE 6.7
Interpreting the shape index file to get the number of records, and the offset, content length of
each record.
record header for the record. Thus, the offset for the first record in the main
file is 50, given the 100-byte header.
Hands-on practice: Interpret the point .shx file
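Code 6.7 first reads the integer of index file length from bytes 24–27.
This number, in 16-bit (two-byte) units, is then used to calculate the feature
number by doubling it, subtracting the 100-byte header, and dividing by 8, the
size of each index record: (fileLength*2 – 100)/8.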
The feature number and file length are printed and a text file is opened
to keep each feature offset and content length value in the main file. The for
loop reads each record and writes it in the text file. Again, both files are
closed at the end of the program. The for loop cycles through each record
to (a) move to the start of the ith record (100 + i*8), (b) read out 8 bytes for
two integers, and (c) unpack the two integers as offset and contentLength,
which are printed and written to the text file following the sequence of the
record. The text file and the output on the interactive window can be used to
verify the content.
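The index file loop can be sketched in the same way. This is a hedged illustration rather than the book's Code 6.7: the function name read_shx and the in-memory index are assumptions, and the index records are unpacked as big-endian integers, which is how the ESRI shapefile description defines the offset and content length fields. The code targets Python 3.

```python
import io
import struct


def read_shx(shx):
    """Return (offset, content_length) pairs, both in 16-bit words."""
    shx.seek(24)
    length_words = struct.unpack('>i', shx.read(4))[0]
    record_count = (length_words * 2 - 100) // 8
    records = []
    for i in range(record_count):
        shx.seek(100 + i * 8)                # start of the ith index record
        offset, content_length = struct.unpack('>ii', shx.read(8))
        records.append((offset, content_length))
    return records


# In-memory index for three point records (each 28 bytes = 14 words in the main file)
index_body = b''.join(struct.pack('>ii', 50 + i * 14, 10) for i in range(3))
index_header = struct.pack('>iiiiiii', 9994, 0, 0, 0, 0, 0, (100 + len(index_body)) // 2)
index_header += b'\x00' * 72                 # remaining header bytes, unused here
print(read_shx(io.BytesIO(index_header + index_body)))
```

The first offset is 50 words because the main file starts with a 100-byte header, and each following point record begins 14 words (28 bytes) later.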
* “Data File Header Structure for the dBASE Version 7 Table File.” dBase.
http://www.dbase.com/KnowledgeBase/int/db7_file_fmt.htm.
Steps 7 and 8 are for shapefile output. Unless converting each feature’s
spatial attributes (coordinates) or nonspatial attributes from a different
format (e.g., a text file) to generate new shapefiles, these two steps are not
required. This conversion could also be easily accomplished using ArcGIS
Python scripting, which will be introduced later in this book.
FIGURE 6.7
(a) Select the ‘countries’ folder to add data, and (b) the map window displays countries as
polygons and with country borders as polylines.
FIGURE 6.8
The shapefile polyline format with the file header and first polyline shown. The rest of the
shapefile repeats the polyline record format.
##2. read index file header and interpret the meta information, e.g.,
## bounding box, and # of records
# read first 28 bytes
s = shxFile.read(28)
# convert into 7 integers
header = struct.unpack(">iiiiiii",s)
# get file length
fileLength = header[len(header)-1]
# calculate polyline numbers in the shape file based on index file length
polylineNum = (fileLength*2-100)//8
print 'fileLength, polylineNum:',fileLength, polylineNum
# read other 72 bytes in header
s = shxFile.read(72)
# convert into values
header = struct.unpack("<iidddddddd",s)
# get boundingbox for the shape file
minX, minY, maxX, maxY = header[2],header[3],header[4],header[5]
##3. read records' meta information, such as offset,
## and content length for each record,
# define an empty list for holding offset of each feature in main file
recordsOffset = []
# loop through each feature
for i in range(0,polylineNum):
    # jump to beginning of each record
    shxFile.seek(100+i*8)
CODE 6.8
Reading and visualize polyline shapefiles. (Continued)
1. Type/copy the code into the programming window, save the python
file.
2. Copy the data Partial_Streets.shp and Partial_Streets.shx to the
same folder where you save the python .py file.
3. Run the python file.
4. Explore and analyze the code to understand each section of
the code.
PROBLEMS
1. Pick a polyline shapefile with 10–100 features and one polygon file
with 10–100 features.
FIGURE 7.1
Python Command-line GUI.
FIGURE 7.2
Python Interactive GUI.
is closed; therefore, the interactive GUI is not appropriate for writing com-
plex programs.
FIGURE 7.3
Python file-based programming window.
from the ‘File→New Window’ of the Python IDLE. Within this window,
there are three ways to execute the code: (a) press F5, (b) click on Run→Run
Module, or outside the window (c) double click on the .py file in Windows
explorer (the Python IDE must be installed for this to work).
7.1.2.1 Highlighting
Coloring the code can help you better understand, capture, communicate,
and interact with peer programmers. In Python IDLE, code (including com-
ments) can be highlighted with different colors based on the types of the
FIGURE 7.4
Color of different parts of a program can be highlighted for better formatting, communication,
interaction, and programming.
words input. For example, keywords can be set as yellow, variables as black,
and functions as blue (Figure 7.4). These settings can be customized in the
“Highlighting” tab.
7.1.3 Debugging
Testing the code to fix errors and improve the robustness is called debugging.
It is a must-have process in programming because it is almost impossible to
Python Programming Environment 119
FIGURE 7.5
General setting for the initial status of Python IDLE.
FIGURE 7.6
Customize font size, style, and indentation for Python IDLE.
write bug-free code. This section goes through basic skills for
error/exception handling and debugging.
7.1.3.1 SyntaxError
Syntax errors occur when syntax requirements are not met, and are detected
in the interpreting process before the program is executed, that is, when the
source code is translated into byte code. When a syntax error is detected, the
Python interpreter outputs the message “SyntaxError: invalid syntax.” Such
errors occur frequently, especially when you are unfamiliar with Python’s
syntax. Code 7.1 shows a syntax error with a missing colon (‘:’) after True.
Unfortunately, error messages are often not informative. Described below
are four common mistakes that result in syntax errors; this list can be used
to help you detect problems:
CODE 7.1
While invalid syntax problem.
>>> if (i>0):
print 'i is bigger than 0'
elif: print 'i is smaller than 0'
CODE 7.2
If invalid syntax because of indentation.
KeyboardInterrupt
>>> while True
SyntaxError: invalid syntax
>>> while True:
(a)
>>> def add()
SyntaxError: invalid syntax
>>> def add():
(b)
CODE 7.3
Missing parts of a statement syntax error.
>>> float([1,2,3])
>>> 1/0
Traceback (most recent call last):
File "<pyshell#18>", line 1, in <module>
1/0
ZeroDivisionError: integer division or modulo by zero
(d)
CODE 7.4
Examples of run-time exceptions.
TABLE 7.1
Built-In Exceptions
Class Name Reasons for Having Exceptions
Exception The root class for all exceptions
AttributeError Attempt to access an undefined object attribute
IOError Attempt to open a nonexistent file
IndexError Request for a nonexistent index of a sequence, for example, list
KeyError Request for a nonexistent dictionary key
NameError Attempt to access an undeclared variable
SyntaxError Code is ill-formed
TypeError Pass function an argument with wrong type object
ValueError Pass function an argument with correct type object but with an
inappropriate value
ZeroDivisionError Division (/) or modulo (%) by a numeric zero
• Check the exception type and review reasons causing the exceptions
(Table 7.1).
• Look into the code, especially the line (indicated in the exception
message) that throws errors, and analyze what resulted in the excep-
tion. Sometimes, you may need to go up/down a few lines to identify
the real problem.
• If still not sure about the causes, use ‘print’ to output the values of the
relevant variables to check whether they are right.
• Revise code and run again.
FIGURE 7.7
Raise and catch exceptions.
>>> try:
        pass #try block
    except:
        pass #except block
    finally:
        pass #finally block; executes regardless of exceptions
CODE 7.5
Uses “try…except…finally” to capture the ZeroDivisionError exception and remove result
from memory if an exception happened.
>>> divide(3,1)
Cleaning up ....
3
>>> divide(3,0)
Division by zero
Cleaning up ....
>>>
CODE 7.6
Try…except…finally to clean up variables if an exception occurred.
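A divide function that behaves like the session above might look like the following. This is a reconstruction under assumptions, not the book's listing: only the function name and the printed messages come from the output shown, and the body is a guess written for Python 3, where 3/1 evaluates to 3.0 rather than 3.

```python
def divide(x, y):
    """Divide x by y, reporting division by zero and always cleaning up."""
    result = None
    try:
        result = x / y
    except ZeroDivisionError:
        print("Division by zero")
    finally:
        # Runs whether or not an exception occurred
        print("Cleaning up ....")
    return result


print(divide(3, 1))
print(divide(3, 0))
```

The finally block runs before the function returns, which is why "Cleaning up ...." appears ahead of the result in the interactive session.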
>>> f = None
>>> try:
        f = open('sample.txt', 'r+')
        f.readline()
        f.readlines()
        f.seek(0)
        f.read()
        f.write('This is a test!')
    except IOError:
        print 'The file does not exist!'
    finally:
        ## Close the file if the file opened
        if f:
            f.close()
CODE 7.7
Uses try…except…finally to handle file operation exceptions.
FIGURE 7.8
System modules.
CODE 7.8
Adding the path to the module through the sys.path.append() method appends the path to the end
of the sys.path list.
import sys, os
if os.getcwd() not in sys.path:
sys.path.append(os.getcwd)
The first statement imports the sys and os modules. The second statement
fetches the current working directory using os.getcwd(), checks whether it is
already in the system path, and appends it to the system path if not.
TABLE 7.2
System Built-In Modules
Module                                Description                                              Examples
os                                    Interact with the operating system                       os.system('dir')
sys                                   System-specific parameters and functions                 sys.path, sys.exit()
math                                  Floating point math functions                            math.pow()
shutil                                High-level file operations                               shutil.copyfile(), shutil.move()
glob                                  Get file lists from directory wildcard searches          glob.glob('*.shp')
re                                    Regular expression tools for advanced string processing  re.match('c', 'abcdef')
datetime                              Manipulate dates and times                               date.today()
zlib, gzip, bz2, zipfile and tarfile  Data archiving and compression                           ZipFile('spam.zip', 'w')
math (Figure 7.9) is the most popular module used in this book.
We can always use the built-in function dir() to find what is supported in a
new module. Figure 7.10 shows that the math module includes many methods,
such as acos, sin, fabs, etc., and several private attributes, such as
'__doc__', '__name__', and '__package__'. These three attributes are typically
included in all modules. The built-in function help() can be used to check the
description of the module and functions.
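The inspection described above can be sketched in a few lines. The variable names (names, public_names) are chosen here for illustration only.

```python
import math

# dir() returns every name defined in the module as a list of strings.
names = dir(math)

# Names without a leading underscore are the public functions and constants.
public_names = [n for n in names if not n.startswith("_")]
print(len(public_names))

# The double-underscore attributes describe the module itself.
print(math.__name__)

# help(math.fabs) would print the documentation interactively;
# the same text is available as a string through __doc__.
print(math.fabs.__doc__)
```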
FIGURE 7.9
math methods.
FIGURE 7.10
Check a module.
map display area (Figure 7.12). The composite layers of a map show the street,
park, house (Parcels), and river information about the city of Westerville,
Ohio. Upon loading a dataset to the memory, each layer includes a series of
objects (e.g., each house can be treated as one object).
FIGURE 7.11
ArcMap user interface.
FIGURE 7.12
Map data organization hierarchy. (A map contains layers, Layer 1 to Layer n; each layer contains
objects, Object 1 to Object k; the map, each layer, and each object carry their own attributes.)
Feature class (Figure 7.13b). Therefore, the visualization does not have to be
defined many times.
Putting these together creates a package for reading and displaying ESRI
shapefiles, including simple functions (e.g., calculating the distance and
centroid). As shown in Figure 7.14, Tkinter is a built-in module for data
visualization in a GUI, and struct is used for reading binary data. Feature
is a simple class with only two methods, __init__() and vis() (Figure
7.14). The Polyline and Point classes inherit from the Feature class and
also include the two methods __init__() and vis(). In addition to these two
methods, Polyline includes a length() method. Polygon inherits
from the Polyline class and overrides the vis() method.
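The inheritance relationships described above can be sketched as follows. This is a simplified illustration, not the Mini-GIS source: vis() returns a text description instead of drawing on a Tkinter canvas, and the constructor arguments are assumptions.

```python
import math


class Feature(object):
    """Base class: stores coordinates and supplies a default vis()."""

    def __init__(self, coords):
        self.coords = list(coords)

    def vis(self):
        # Stand-in for drawing on a canvas: describe the feature instead.
        return "%s with %d vertex(es)" % (type(self).__name__, len(self.coords))


class Point(Feature):
    def __init__(self, x, y):
        Feature.__init__(self, [(x, y)])


class Polyline(Feature):
    def length(self):
        # Sum the straight-line distances between consecutive vertices.
        total = 0.0
        for (x1, y1), (x2, y2) in zip(self.coords, self.coords[1:]):
            total += math.hypot(x2 - x1, y2 - y1)
        return total


class Polygon(Polyline):
    def vis(self):
        # Override: a polygon is presented as a filled shape.
        return "filled " + Feature.vis(self)


road = Polyline([(0, 0), (3, 4)])
print(road.vis())
print(road.length())
```

Because Polygon derives from Polyline, it reuses length() without redefining it, which is the point made by Figure 7.13b.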
FIGURE 7.13
Developed code. (a) Visualization is explicitly defined, separate from Point, Polyline, and Polygon. (b)
Visualization is inherited from the Feature class; therefore, there is no need to define visualization explicitly.
FIGURE 7.14
Hierarchy of primary Mini-GIS modules (Tkinter and struct are Python built-in modules). The diagram
links the Feature, Point, Polyline, Polygon, Features, ReadShapeFile, Layer, Map, and Init modules by
inherit and import relationships.
FIGURE 7.15
Mini-GIS code organization.
FIGURE 7.16
Primary functions are feature modules and map modules.
Normally, this is the only part you need to understand fully if you get the
package from elsewhere; otherwise, the interface description is enough to
use a package.
1. Add the path to the sys path; or just double click the MiniGIS.py; or
use IDLE to run the package. Here, use the third method to open
MiniGIS.py.
2. Click Run→Run module menu or the ‘F5’ button to run the package.
It will bring up a window, which has the title ‘MiniGIS.’
3. Import shapefile in two ways: ‘Import shp’ and ‘Add shp layer’ in
the File menu. Figure 7.17a illustrates the map of imported Fairfax
shapefile folder data.
FIGURE 7.17
(a) Using Mini-GIS to import Fairfax shapefile folder. (b) Zoom to extent. (c) Python Shell
outputs.
4. View the map using the commands under the ‘View’ menu: zoom in, zoom
out, zoom to extent, zoom to full window, and close layer. Figure
7.17b shows zooming the map to the left boundary. Figure 7.17c
shows the outputs in the Python Shell, which also outputs the map
ratio.
FIGURE 7.18
Draw features.
FIGURE 7.19
Intersections between HIGHWAY and UTILITY_LINES layers.
PROBLEMS
1. Take the data used in Chapter 4.
2. Based on the solution to problems in Chapter 4, how can you find the
bounding box (minx, miny, maxx, maxy) of the polylines?
3. Prepare the data to visualize (coordinates transform, etc.) based on
the window size you want to draw.
4. Program to draw the data (using Canvas and related objects/
functions/widgets). Organize your program in a file or several files
to set up a module in the system; import the module and execute the
module that was developed and configured.
5. Develop a word document explaining the program.
6. Provide comments for all lines in the code to explain the logic.
8
Vector Data Algorithms
GIS is different from other software technologies due to its algorithms and
analyses for spatial relationships in spatial data. This chapter introduces
a similar category of algorithms for vector data processing, including the
calculations of centroid (8.1), area (8.2), length (8.3), line intersection (8.4),
and point in polygon (8.5), which are the fundamental algorithms based on
geometry and the spatial relationship.
8.1 Centroid
Centroid can be considered as the gravity center of a feature (wiki.gis
2011). One application example of centroid is to set up the annotation
location, such as labeling a building on a map. This section discusses the
centroid calculation for three geometry types: triangle, rectangle, and
polygon.
\[
\begin{aligned}
x_{centroid} &= \frac{x_1 + x_2 + x_3}{3} \\
y_{centroid} &= \frac{y_1 + y_2 + y_3}{3}
\end{aligned}
\tag{8.1}
\]
FIGURE 8.1
A triangle’s centroid (x, y) is the intersection point of the three medians. (Refer to
http://jwilson.coe.uga.edu/EMAT6680Su09/Park/As4dspark/As4dspark.html for the proof of concurrency
of the three medians.)
FIGURE 8.2
A rectangle’s centroid is the intersection of two diagonals. (The rectangle is defined by its corners
(xmin, ymin) and (xmax, ymax); the centroid is (x, y).)
\[
\begin{aligned}
x_{centroid} &= \frac{x_{min} + x_{max}}{2} \\
y_{centroid} &= \frac{y_{min} + y_{max}}{2}
\end{aligned}
\tag{8.2}
\]
\[
\begin{aligned}
x_{centroid} &= \frac{\sum_{i=0}^{n-1} (x_{i+1} + x_i)(x_i y_{i+1} - x_{i+1} y_i)}{6A} \\
y_{centroid} &= \frac{\sum_{i=0}^{n-1} (y_{i+1} + y_i)(x_i y_{i+1} - x_{i+1} y_i)}{6A}
\end{aligned}
\tag{8.3}
\]
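Equations 8.1 through 8.3 translate almost directly into code. The following is a minimal sketch; the function names are chosen here for illustration, and the polygon function assumes a simple (non-self-intersecting) ring given as a list of (x, y) tuples.

```python
def triangle_centroid(p1, p2, p3):
    """Equation 8.1: the mean of the three vertices."""
    return ((p1[0] + p2[0] + p3[0]) / 3.0, (p1[1] + p2[1] + p3[1]) / 3.0)


def rectangle_centroid(xmin, ymin, xmax, ymax):
    """Equation 8.2: the midpoint of the bounding corners."""
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)


def polygon_centroid(points):
    """Equation 8.3 for a simple polygon; the ring is closed if needed."""
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])          # enforce point0 == pointn
    twice_area = 0.0                  # accumulates 2A (signed)
    sum_x = sum_y = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0     # the shared term x_i*y_(i+1) - x_(i+1)*y_i
        twice_area += cross
        sum_x += (x1 + x0) * cross
        sum_y += (y1 + y0) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon: area is zero")
    # 6A equals 3 * (2A); the sign of A cancels between numerator and denominator.
    return (sum_x / (3.0 * twice_area), sum_y / (3.0 * twice_area))


print(polygon_centroid([(0, 0), (4, 0), (4, 4), (0, 4)]))
```

Because the signed area appears in both the numerator and the denominator, the result does not depend on whether the vertices are listed clockwise or counterclockwise.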
8.2 Area
This section introduces how to calculate the area of a polygon feature.
Two types of polygons are discussed: a simple polygon and a polygon with
hole(s).
SP = SA + SB − SC − SD − SE (8.4)
where SA, SB, SC, SD, and SE denote the corresponding areas of the five
trapezoids.
FIGURE 8.3
Area calculation of a simple polygon.
Since the coordinates for each point are known, we can calculate the area
for each trapezoid with Equation 8.5.
\[
\begin{aligned}
S_A &= \frac{(x_2 - x_1)(y_2 + y_1)}{2} \\
S_B &= \frac{(x_3 - x_2)(y_3 + y_2)}{2} \\
S_C &= \frac{(x_4 - x_3)(y_4 + y_3)}{2} \\
S_D &= \frac{(x_5 - x_4)(y_5 + y_4)}{2} \\
S_E &= \frac{(x_6 - x_5)(y_6 + y_5)}{2}
\end{aligned}
\tag{8.5}
\]
By plugging Equation 8.5 into Equation 8.4, we derive Equation 8.6 to cal-
culate the area of the polygon.
\[
S_P = \sum_{i=1}^{5} \frac{(x_{i+1} - x_i)(y_{i+1} + y_i)}{2}
    = \frac{1}{2} \sum_{i=1}^{5} (x_{i+1} y_i - x_i y_{i+1})
\tag{8.6}
\]
By generalizing Equation 8.6, we have Equation 8.7 to calculate the area for
any simple polygon with n points (point0 = pointn).
\[
A = \frac{1}{2} \sum_{i=0}^{n-1} (x_i y_{i+1} - x_{i+1} y_i)
\tag{8.7}
\]
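Equation 8.7 can be sketched as a short function. The name polygon_area is chosen here for illustration. The function returns the signed value exactly as the equation is written, so reversing the vertex order flips the sign of the result.

```python
def polygon_area(points):
    """Signed area of a simple polygon by Equation 8.7."""
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])              # enforce point0 == pointn
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        total += x0 * y1 - x1 * y0        # x_i*y_(i+1) - x_(i+1)*y_i
    return total / 2.0


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(polygon_area(square))               # one vertex order
print(polygon_area(square[::-1]))         # reversed order, opposite sign
```

When only the magnitude matters, taking abs() of the result removes the dependence on vertex order.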
FIGURE 8.4
A polygon with two holes. (The outer ring A contributes positively; the holes B and C contribute
negatively.)
polygon (A) with two holes (B and C). In GIS, the vertices of a hole are recorded
in counterclockwise order. Hence, the areas of the holes, SB and SC, calculated
using Equation 8.7 will be negative numbers (SB < 0, SC < 0). Therefore,
the actual area of polygon A can be calculated as S = SA − (−SB) − (−SC) = SA +
SB + SC.
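The hole adjustment can be sketched as below. To stay independent of the vertex-ordering convention of any particular data format, this sketch takes the absolute value of each ring area and subtracts the holes explicitly; that choice, and the function names, are assumptions made for illustration rather than the book's implementation.

```python
def ring_area(points):
    """Signed ring area by Equation 8.7."""
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


def polygon_area_with_holes(outer, holes):
    """Area of the outer ring minus the area of every hole."""
    hole_area = sum(abs(ring_area(h)) for h in holes)
    return abs(ring_area(outer)) - hole_area


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole_b = [(1, 1), (1, 3), (3, 3), (3, 1)]      # 2 x 2 hole
hole_c = [(5, 5), (5, 8), (6, 8), (6, 5)]      # 1 x 3 hole
print(polygon_area_with_holes(outer, [hole_b, hole_c]))
```

If the data are known to store outer rings and holes in opposite orders, summing the signed ring areas directly gives the same magnitude, as the text describes.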
8.3 Length
Length calculation finds the length of a line feature and the perimeter of a
polygon feature.
\[
P_1P_2 = c = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
\tag{8.8}
\]
FIGURE 8.5
Length calculation of a straight line segment. (The segment c from P1(x1, y1) to P2(x2, y2) is the
hypotenuse of a right triangle with legs a and b.)
FIGURE 8.6
A polyline with n + 1 points.
\[
L_{polyline} = \sum_{i=1}^{n} P_iP_{i+1}
             = \sum_{i=1}^{n} \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2}
\tag{8.9}
\]
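Equations 8.8 and 8.9 can be sketched as follows. The function names are illustrative, and the perimeter helper simply closes the ring before summing the segment lengths.

```python
import math


def segment_length(p1, p2):
    """Equation 8.8: Euclidean distance between two points."""
    return math.sqrt((p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2)


def polyline_length(points):
    """Equation 8.9: sum of the lengths of consecutive segments."""
    return sum(segment_length(a, b) for a, b in zip(points, points[1:]))


def polygon_perimeter(points):
    """Perimeter of a polygon: close the ring, then measure it as a polyline."""
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return polyline_length(ring)


print(polyline_length([(0, 0), (3, 4), (3, 10)]))
print(polygon_perimeter([(0, 0), (4, 0), (4, 4), (0, 4)]))
```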
FIGURE 8.7
Illustration of line segments intersection.
FIGURE 8.8
Mathematical representation of a line in Cartesian coordinate system. (The line y = ax + b crosses the
y-axis at y = b and the x-axis at x = –b/a.)
Calculating a12, b12, a34, b34: Since the two points (x1,y1), (x2,y2) are on Line1, we
have the following equation group (Equation 8.10):
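\[
\begin{aligned}
y_1 &= a_{12} x_1 + b_{12} \\
y_2 &= a_{12} x_2 + b_{12}
\end{aligned}
\tag{8.10}
\]
Solving Equation 8.10 gives a12 and b12 (Equation 8.11):
\[
\begin{aligned}
a_{12} &= \frac{y_2 - y_1}{x_2 - x_1} \\
b_{12} &= y_1 - a_{12} x_1
\end{aligned}
\tag{8.11}
\]
Similarly, the two points (x3,y3), (x4,y4) are on Line2 (Equation 8.12):
\[
\begin{aligned}
y_3 &= a_{34} x_3 + b_{34} \\
y_4 &= a_{34} x_4 + b_{34}
\end{aligned}
\tag{8.12}
\]
which gives a34 and b34 (Equation 8.13):
\[
\begin{aligned}
a_{34} &= \frac{y_4 - y_3}{x_4 - x_3} \\
b_{34} &= y_3 - a_{34} x_3
\end{aligned}
\tag{8.13}
\]
The intersection point lies on both lines (Equation 8.14):
\[
\begin{aligned}
y &= a_{12} x + b_{12} \\
y &= a_{34} x + b_{34}
\end{aligned}
\tag{8.14}
\]
Solving Equation 8.14 gives the intersection point (x0,y0) (Equation 8.15):
\[
\begin{aligned}
x_0 &= \frac{b_{12} - b_{34}}{a_{34} - a_{12}} \\
y_0 &= a_{12} x_0 + b_{12}
\end{aligned}
\tag{8.15}
\]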
where a34 is not equal to a12, and a12, b12, a34, b34 can be calculated based on the
given four points. It should be noted that (x0,y0) is the solution for two infinite
lines. To check whether (x0,y0) falls on both Line1 ((x1,y1), (x2,y2)) and Line2
((x3,y3), (x4,y4)), the following test is required:
\[
\begin{aligned}
x_1 &\le x_0 \le x_2 \\
x_3 &\le x_0 \le x_4 \\
y_1 &\le y_0 \le y_2 \\
y_3 &\le y_0 \le y_4
\end{aligned}
\]
If the four conditions are all true, Line1 and Line2 intersect at point (x0,y0).
Otherwise, they do not intersect.
The above method for checking the intersection does not work for two
special scenarios: parallel lines and vertical lines.
Once (x0,y0) is calculated, the same test is required to check whether the
intersection point falls on both line segments. The same method can be
applied if Line2 ((x3,y3), (x4,y4)) is vertical.
If both lines are vertical, they are parallel. This can be handled the same
way as parallel lines.
FIGURE 8.9
Line1 ((x1,y1), (x2,y2)) is vertical.
FIGURE 8.10
Intersection checking of two polylines.
Since a polyline consists of two or more line segments (Figure 8.10), the
following procedure can be used to check whether the two polylines inter-
sect: for each line segment in polyline1, check whether it intersects with any
of the line segments in polyline2. Once a single intersection point is found,
we can conclude that the two polylines intersect. We may also find out all
the intersection points of two polylines by checking each of the line segment
pairs between the two polylines.
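The slope-intercept method of Equations 8.10 through 8.15, the range test, the vertical-line case, and the pairwise polyline check can be combined in one sketch. Several choices here are assumptions made for illustration: the function names, the small tolerance EPS for floating-point comparison, the use of min/max so that endpoints may be given in any order, and the simplification that parallel segments (including collinear, overlapping ones) are reported as not intersecting.

```python
EPS = 1e-9  # tolerance for floating-point range tests (an assumption)


def line_params(p, q):
    """Return (a, b) of y = a*x + b through p and q, or None if vertical."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    a = (y2 - y1) / float(x2 - x1)        # Equation 8.11
    return a, y1 - a * x1


def within(v, end1, end2):
    """True if v lies between the two end values, in either order."""
    return min(end1, end2) - EPS <= v <= max(end1, end2) + EPS


def segment_intersection(p1, p2, p3, p4):
    """Intersection point of segments p1p2 and p3p4, or None."""
    l12, l34 = line_params(p1, p2), line_params(p3, p4)
    if l12 is None and l34 is None:
        return None                        # both vertical, hence parallel
    if l12 is None:                        # Line1 vertical: x0 is known
        x0 = float(p1[0])
        y0 = l34[0] * x0 + l34[1]
    elif l34 is None:                      # Line2 vertical: x0 is known
        x0 = float(p3[0])
        y0 = l12[0] * x0 + l12[1]
    else:
        (a12, b12), (a34, b34) = l12, l34
        if a12 == a34:
            return None                    # parallel lines
        x0 = (b12 - b34) / (a34 - a12)     # Equation 8.15
        y0 = a12 * x0 + b12
    on_both = (within(x0, p1[0], p2[0]) and within(x0, p3[0], p4[0]) and
               within(y0, p1[1], p2[1]) and within(y0, p3[1], p4[1]))
    return (x0, y0) if on_both else None


def polyline_intersections(line1, line2):
    """All intersection points between the segments of two polylines."""
    hits = []
    for a, b in zip(line1, line1[1:]):
        for c, d in zip(line2, line2[1:]):
            pt = segment_intersection(a, b, c, d)
            if pt is not None:
                hits.append(pt)
    return hits


print(segment_intersection((50, 200), (400, 200), (300, 100), (300, 400)))
```

To answer only whether two polylines intersect, the nested loop can stop at the first hit instead of collecting every point.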
FIGURE 8.11
Illustration of ray casting algorithm. (Rays are cast from points 1 to 7.)
TABLE 8.1
Number of Intersection Points for the Seven Rays
Point Id    Number of Intersection Points    Point in Polygon?
1           0                                No
2           2                                No
3           1 (odd)                          Yes
4           4                                No
5           5 (odd)                          Yes
6           2                                No
7           6                                No
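The odd/even rule summarized in Table 8.1 can be sketched as below. The sketch casts a horizontal ray toward increasing x, which is an assumption (any fixed direction works), and it uses a half-open comparison on y so that a ray passing exactly through a vertex is counted once for the two edges that share it. The result for points lying exactly on the boundary is not defined by this sketch.

```python
def point_in_polygon(point, polygon):
    """Ray casting: True if an odd number of edges cross the ray."""
    px, py = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]      # wrap around to close the ring
        # The edge crosses the ray's y level only if its ends are on
        # opposite sides; '>' on both ends makes the test half-open.
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge meets the horizontal line y = py
            x_cross = x1 + (py - y1) * (x2 - x1) / float(y2 - y1)
            if x_cross > px:
                inside = not inside        # each crossing flips the state
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square))
print(point_in_polygon((5, 2), square))
```

Flipping a Boolean on each crossing is equivalent to counting the intersections and testing whether the count is odd.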
FIGURE 8.12
Special scenarios: ray passes the polygon vertex.
FIGURE 8.13
“Centroid Practice” window.
FIGURE 8.14
“Area Practice” window.
FIGURE 8.15
“Line Intersection Practice” window.
PROBLEMS
• Review the class material.
• Review the class practice code: chap8_intersection_pract.py
• Create the following four line segments: [(50,200),(400,200)],[(60,450),
(400,450)],[(100,600),(350,250)],[(300,100),(300,400)]
• Write a program to determine whether they intersect with each
other.
• Display the line segments on the monitor and draw the intersection
point in a different color and size.
Section III
Advanced GIS
Algorithms and Their
Programming in ArcGIS
9
ArcGIS Programming
FIGURE 9.1
Geoprocessing with (a) ArcToolbox, ModelBuilder, and (b) ArcMap Python window.
• Open Python window from ArcMap and type in (not copy) the
following command:
FIGURE 9.2
ArcMap Python window and the opening button.
1. Open the Python editor window and enter Code 9.1. Save the code to
a .py file, for example, samplecode.py.
2. Open the windows command console, navigate to the folder where
the *.py file was saved, and run the script to perform buffer analysis
with the following command:
samplecode.py C:\Default.gdb\out2 C:\Default.gdb\output 5000 Meters
The first parameter is the absolute path of the input data, the second
parameter is the absolute path of the output data, and the third describes the
buffer size (Figure 9.4).
FIGURE 9.3
Sample code of using ArcPy outside ArcMap.
import arcpy
import sys
script_name = sys.argv[0]
fc=sys.argv[1]
output=sys.argv[2]
bufferSize=sys.argv[3]
arcpy.Buffer_analysis(fc, output, bufferSize)
CODE 9.1
Python code using ArcPy and allowing flexible parameter inputs.
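When a script such as Code 9.1 is run from the command console, an unquoted value like 5000 Meters arrives as two separate entries in sys.argv, so sys.argv[3] alone holds only the number. The following sketch shows one way to validate and collect the arguments before calling the geoprocessing tool. The helper name parse_buffer_args is hypothetical, and the sketch deliberately leaves out the arcpy call so that it can run without ArcGIS.

```python
def parse_buffer_args(argv):
    """Return (input_fc, output_fc, buffer_size) from an argv-style list."""
    if len(argv) < 4:
        raise ValueError("usage: %s <input> <output> <buffer size>" % argv[0])
    input_fc, output_fc = argv[1], argv[2]
    # Re-join the trailing tokens so that an unquoted "5000 Meters"
    # still reaches the tool as a single linear-unit string.
    buffer_size = " ".join(argv[3:])
    return input_fc, output_fc, buffer_size


example = ["samplecode.py", "C:\\Default.gdb\\out2",
           "C:\\Default.gdb\\output", "5000", "Meters"]
print(parse_buffer_args(example)[2])
```

In a real script the list passed in would be sys.argv, and the three returned values would be handed to arcpy.Buffer_analysis as in Code 9.1.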
FIGURE 9.4
Screenshot of the windows command to execute a .py file and the output. (If there is no error
reported in the window, and the output is generated, then the execution is successful.)
FIGURE 9.5
Desktop and online help document.
FIGURE 9.6
Syntax and sample code of Python script in the online help document for the corresponding
geoprocessing tool.
where the first three parameters are required and the last parameter with
braces is optional. Scripting, rather than operating the tool through the
ArcToolBox interface, is very useful when the process involves a loop (for/
while statement) or conditional expression (if statement) to execute the geo-
processing function repeatedly under certain conditions. For example, to cal-
culate the line length within each polygon in a data layer, the process needs
to include a for loop to enable the repeated process for each polygon. Inside
the for loop, there will be a clip function that can cut off the line inside that
polygon and a statistics function, which sums up the total length of the lines
within the polygon.
1. Type Code 9.2 in the Python window in ArcMap. Change the paths
according to your workspace.
Note that arcpy.env.workspace is executed at the beginning to
set the workspace, which is the path for accessing the input data
and saving the results. With the environment workspace set up, the
input and output parameters can be set using relative paths. If
the inputs are feature classes, images, etc., stored in a geodatabase,
we can set the geodatabase as the workspace. If the inputs are
shapefiles, TINs, etc., we can set the workspace to the folder
where the input files are stored. Conversely, if the workspace
is not set, the input and output parameters need to be set using
absolute paths. Even when the workspace is set, we can still input or
output any dataset outside the workspace by directly using the
absolute path, for example:
arcpy.MakeFeatureLayer_management("I:\\sampledata\\data.gdb\\bearMove", "inferLy")
2. Open the output tables, and list the total lengths of roads in each
polygon (Figure 9.7)
9.4.1 SearchCursor
The SearchCursor function establishes a read-only cursor on a feature
class, such as a shapefile or a table. The SearchCursor can be used to iterate
"""
Set the path of the input data roads and source data. You need to change to your own path.
In this sample, the workspace is a geodatabase.
"bearMove" and "roads" are two feature classes in the geodatabase.
"""
arcpy.env.workspace = "O:\\Book\\Code\\9\\chp9Data\\bookSampleData.gdb"
# ensure bearMove is in workspace first
arcpy.MakeFeatureLayer_management("bearMove","inferLy")
arcpy.MakeFeatureLayer_management("roads","targetLy")
"""
"MakeFeatureLayer_management" can create a feature layer object from the path of the input,
which is a string. "SelectLayerByAttribute", "Clip_analysis", and "Statistics_analysis" are
then conducted on the feature layer.
"""
for i in range(0,9):
    # select the polygon with OBJECTID = i
    arcpy.SelectLayerByAttribute_management("inferLy","NEW_SELECTION","\"OBJECTID\" = "+str(i))
    # execute clip analysis and output intermediate data "out" + str(i) in workspace
    fc = arcpy.Clip_analysis("targetLy","inferLy","out"+str(i))
    # execute sum statistical analysis and output result "sum" + str(i) in workspace
    arcpy.Statistics_analysis(fc, "sum" + str(i), [["Shape_Length","SUM"]])
CODE 9.2
Automate calculating the road length within each polygon.
FIGURE 9.7
Example of output summary table generated by Code 9.2.
through row objects and extract field values. The syntax of SearchCursor is
as follows:
arcpy.da.SearchCursor(in_table, field_names, {where_clause}, {spatial_reference},
{explode_to_points}, {sql_clause})
The first argument (input feature table) and second argument (the queried
fields) are required while others are optional (e.g., limited by a where clause
or by field, and optionally sorted).
"""
Open a SearchCursor and include a list of attribute(s) that you want to
access (e.g. Shape_Leng, NAME, TYPE) in the parameter(s).
"""
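inputdata = "O:\\Book\\Code\\9\\chp9Data\\Partial_Streets.shp"
rows = arcpy.da.SearchCursor(inputdata, ["Shape_Leng", "NAME", "TYPE"])
outputFile = open("C:\\ArcGISdata\\results.txt", "w")
# write the field values of each row to the output file
for row in rows:
    outputFile.write("{}, {}, {}\n".format(row[0], row[1], row[2]))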
"""
The cursor will place a lock on the data until either the script completes or
the cursor object is deleted. Therefore, we need to delete the row and
cursor objects to remove read locks on the data source.
"""
del row
del rows
outputFile.close()
CODE 9.3
SearchCursor with for statement.
inputdata = "O:\\Book\\Code\\9\\chp9Data\\Partial_Streets.shp"
with arcpy.da.SearchCursor(inputdata, ["Shape_Leng", "NAME", "TYPE"]) as rows:
    for row in rows:
        print "{}, {}, {}\n".format(row[0], row[1], row[2])
CODE 9.4
SearchCursor using with statement.
inputdata = "O:\\Book\\Code\\9\\chp9Data\\Partial_Streets.shp"
CODE 9.5
SearchCursor with where clause ("FID < 10").
2. Check the results.txt file. What is included in the file? How many
lines can you find in the file?
3. Search cursors also support the with statement. Using a with state-
ment will guarantee that the iterator is closed and the database lock
is released. By applying the with statement, the above code can be
changed to Code 9.4.
4. A where clause may be used to limit the records returned by the cur-
sor. Run Code 9.5 and check the result again. How many lines are
included in the result file?
5. SearchCursor can also access the feature geometry. Run Code 9.6
and Code 9.7, and then check the result again:
CODE 9.6
Accessing geometry using SearchCursor example 1.
CODE 9.7
Accessing geometry using SearchCursor example 2.
inputdata = "O:\\Book\\Code\\9\\chp9Data\\Partial_Streets.shp"
CODE 9.8
UpdateCursor example.
inputdata = "O:\\Book\\Code\\9\\chp9Data\\railway.shp"
CODE 9.9
Using UpdateCursor to delete rows/records.
9.4.2 UpdateCursor
The UpdateCursor object can be used to update or delete specific rows in
a feature class, shapefile, or table. The syntax of UpdateCursor is similar to
that of SearchCursor:
Hands-On Practice 9.6: Update the Attributes of Each Feature for a Shapefile
Using the UpdateCursor
9.4.3 InsertCursor
InsertCursor is used to insert new records into a feature class, shapefile, or
table. The InsertCursor returns an enumeration object, that is, a row in a table.
inputdata = "O:\\Book\\Code\\9\\chp9Data\\bookSampleData.gdb\\school"
# create the insert cursor and list the attributes that need
# to be filled up with values
cursor = arcpy.da.InsertCursor(inputdata, ["SCHOOL_NAM", "SHAPE@XY"])
"""
Create a new record with field "SCHOOL_NAM" filled up
with value "NewSchool" and xy coordinates filled up
with (1847395.83394, 772277.97643)
"""
new_row = ["NewSchool", (1847395.83394, 772277.97643)]
cursor.insertRow(new_row)
# delete the cursor to release the lock on the data
del cursor
CODE 9.10
Insert Cursor example.
Hands-On Practice 9.7: Inserts Rows into a Shapefile Using the InsertCursor
9.4.4 NumPy
As a new feature in ArcMap 10.1+, the data access module offers functions
to enable the transformation between data arrays and feature classes or
tables. Since the NumPy library has powerful capabilities for handling arrays,
the related functions of ArcPy are developed based on NumPy. With the
arcpy.da.NumPyArrayToFeatureClass, arcpy.da.NumPyArrayToTable, and
arcpy.da.TableToNumPyArray functions, users can quickly transform values that are
organized in an array into a feature class or table, and vice versa.
Without the NumPy functions in the data access module, creating a point
feature class involves many more steps, so the performance is much
lower (Code 9.11).
In contrast, creating the feature class with NumPy requires only one step.
The arcpy.da.NumPyArrayToFeatureClass function actually creates a fea-
ture class and inserts two records inside.
1. Open the Python window in ArcMap, and run Code 9.12 to cre-
ate a feature class using the NumPy functions in the data access
module.
import arcpy
import numpy
# input array
array = numpy.array([(1, (471316.3835861763, 5000488.782036674)),
                     (2, (470402.49348005146, 5000049.216449278))],
                    numpy.dtype([('idfield', numpy.int32), ('XY', '<f8', 2)]))
for i in array:
    new_row = [i[1]]
    cursor.insertRow(new_row)
del cursor
CODE 9.11
Add multiple new records to feature class using InsertCursor.
import arcpy
import numpy
output = "O:\\Book\\Code\\9\\chp9Data\\Default.gdb\\out"
# input array
array = numpy.array([(1, (471316.3835861763, 5000488.782036674)),
                     (2, (470402.49348005146, 5000049.216449278))],
                    numpy.dtype([('idfield', numpy.int32), ('XY', '<f8', 2)]))
# create the feature class and insert the two records in one step
arcpy.da.NumPyArrayToFeatureClass(array, output, ['XY'])
CODE 9.12
Add multiple new records to feature class using NumPyArrayToFeatureClass.
2. Also run Code 9.11 in the Python window in ArcMap, and compare the
execution time with and without NumPy: arcpy.da.NumPyArrayToFeatureClass
has much better performance.
data = "O:\\Book\\Code\\9\\chp9Data\\bookSampleData.gdb\\railway"
dscb = arcpy.Describe(data)
if dscb.shapeType == "Polygon":
    print "I am polygon"
elif dscb.shapeType == "Polyline":
    print "I am polyline"
else:
    print "I am neither polyline nor polygon"
CODE 9.13
Describe function example.
• Spatial reference
• Extent of features
• Path and so on
1. Run Code 9.13 in ArcMap Python window and check the output.
What is the shape type of the input feature class?
2. Replace the Describe parameter with another shapefile that has dif-
ferent geometric types and see how the results change when run-
ning it again.
9.5.2 List
ArcPy provides functions to list all data under a particular workspace or to list
corresponding information in data. ListFields is a frequently used function
in ArcPy that lists all the fields and associated properties of a feature class,
shapefile, or table. Code 9.14 is an example of the ListFields function. The
code restricts operations to specific fields only: those that
are of “Double” type or that include “Flag” in the name.
Functions listing data under a workspace (e.g., ListDatasets,
ListFeatureClasses, ListFiles, ListRasters, ListTables) are very useful to help
batch processing. For example, to perform a buffer analysis for multiple
import arcpy
# list the fields of the feature class "roads" in the geodatabase "bookSampleData.gdb"
fieldlists = arcpy.ListFields("O:\\Book\\Code\\9\\chp9Data\\bookSampleData.gdb\\roads")
CODE 9.14
ListFields example.
polyline shapefiles in a workspace, list all the polyline shapefiles with the
ListFeatureClasses method, then use a loop to go through each shapefile,
performing a buffer analysis using Buffer_analysis method.
Hands-On Practice 9.10: Perform Buffer Analysis for Every Polyline Feature
Class in a Workspace
The Walk function in the data access module can help list all data under
the workspace or under its sub-workspace in hierarchy. For example, there
is a shapefile, a .png file, a geodatabase “Default.gdb,” and a folder “temp”
under a specific workspace “sampledata.” This function will list all the files
under “sampledata,” the feature classes under “Default.gdb,” and the files
under “temp.” The Walk function is much faster than traditional List func-
tions. Code 9.16 is a sample code of the Walk function and Figure 9.8 shows
its results.
import arcpy
arcpy.env.workspace = "O:\\Book\\Code\\9\\chp9Data"
# get the list of all of the polyline feature classes
fcList = arcpy.ListFeatureClasses('*','Polyline')
CODE 9.15
List all polyline feature class and conduct buffer analysis on them.
workspace = "O:\\Book\\Code\\9\\chp9Data"
CODE 9.16
List files using arcpy.da.Walk function.
FIGURE 9.8
Results of arcpy.da.Walk function.
FIGURE 9.9
Perform union with ArcMap and python scripting.
import arcpy
fc = "O:\\Book\\Code\\9\\chp9Data\\bookSampleData.gdb\\roads"
desc = arcpy.Describe(fc)
# get a list of field objects from the describe object
fields = desc.fields
CODE 9.17
Accessing the properties of a field object.
• Point: a single point, which is the basic unit for all the other geometry
types, cannot be directly used as input in geoprocessing functions
• Geometry: a general type
• Multipoint: a geometry entity with multiple points, consisting of
Point(s)
• PointGeometry: a geometry entity with a single point, consisting of
Point
• Polyline: a line geometry entity, consisting of Point(s)
• Polygon: a polygon geometry entity, consisting of Point(s)
# create a point
point = arcpy.Point(471316.38358618, 5000448)
# create the geometry interface of the point
pointgeom = arcpy.PointGeometry(point)
# create output geometry
outgeom = arcpy.Geometry()
# calculate the buffer of the create point geometry
arcpy.Buffer_analysis(pointgeom,outgeom,"5000 Meters")
CODE 9.18
Example of using a geometry object to conduct buffer analysis.
CODE 9.19
Create SpatialReference object.
1. Create the object using a string as the path to a .prj file or using a
string with the name of spatial reference (Code 9.19). SpatialReference
is an ArcPy class.
2. Access the property of the spatial reference object (Code 9.20).
3. Create a Feature class in geodatabase using the spatial reference cre-
ated (Code 9.21).
4. Create a Feature class in geodatabase using the spatial reference of
another dataset (Code 9.22). The Describe function will be used to
obtain the spatial reference information of the data.
print spatialRef.name
print spatialRef.XYTolerance
print spatialRef.metersPerUnit
print spatialRef.GCS
CODE 9.20
Access the properties of a SpatialReference object.
import arcpy
arcpy.env.workspace = ("O:\\Book\\Code\\9\\chp9Data")
# use the name of the coordinate system
spatialRef = arcpy.SpatialReference("Hawaii Albers Equal Area Conic")
# create the FDS using the spatialRef created from the arcpy.SpatialReference() method
arcpy.CreateFeatureDataset_management(
    'O:\\Book\\Code\\9\\chp9Data\\Default.gdb', 'results', spatialRef)
CODE 9.21
Create a feature class with a spatial reference.
import arcpy
arcpy.env.workspace = ("O:\\Book\\Code\\9\\chp9Data")
desc = arcpy.Describe('school.shp')
spatialRef = desc.SpatialReference
CODE 9.22
Create a feature class with the spatial reference from another data.
FIGURE 9.10
Functions in arcpy.mapping module.
# list the data frame in the map document - dfs is the first data frame in the document
dfs = arcpy.mapping.ListDataFrames(mxd)[0]
# create a layer from the dataset (e.g. a feature class) which will be styled and mapped
lyr = arcpy.mapping.Layer(featureclass)
# get the first layer with name containing the string "test" inside the *.lyr file
symbollyr = arcpy.mapping.ListLayers(symbollyrs, ('*test*'))[0]
# change the symbol style of the feature class lyr in the dfs data frame into the pre-defined style symbollyr
arcpy.mapping.UpdateLayer(dfs, lyr, symbollyr, True)
# add the feature class with the updated symbol style into dfs data frame
arcpy.mapping.AddLayer(dfs, lyr)
# set the location and content of the first map element (e.g. text box) inside the mxd map document
elm = arcpy.mapping.ListLayoutElements(mxd, 'TEXT_ELEMENT', 'testelm')[0]
elm.elementPositionY = -1
elm.text = "this is a test map element"
elm.elementPositionX = 15
CODE 9.23
Making maps with predefined symbol style automatically.
FIGURE 9.11
General steps to add scripts as tools: (1) create a toolbox; (2) create a toolset; (3) add script;
(4) input basic info, such as name, label, etc.; (5) specify script file path; (6) specify parameters.
1. Before creating the ArcTool, copy Code 9.24 and save as a *.py file.
import arcpy
"""
The following is the buffer tool script, where the first
argument is the input feature, the second argument is the
output feature, and the third argument is the buffer
distance
"""
inputFC = arcpy.GetParameterAsText(0)
outputFC = arcpy.GetParameterAsText(1)
bufferDist = arcpy.GetParameterAsText(2)
CODE 9.24
A script with GetParameter functions to obtain input from ArcTool interface.
FIGURE 9.12
Add ArcToolBox.
FIGURE 9.13
Add ArcToolset.
FIGURE 9.14
Add Python script as ArcTool.
FIGURE 9.15
Configure the property of the tool, the path of script, and the input parameter.
FIGURE 9.16
Results of running script tool with messages added.
Hands-On Practice 9.13: Add Message into the Custom Script Tool
1. Make a Python script with Code 9.25 and then add as an ArcTool.
2. Use the tool to test the buffer analysis and check the output.
import arcpy
"""
The following is the buffer tool script, where the first
argument is the input feature, the second argument is the
output feature, and the third argument is the buffer distance
"""
inputFC = arcpy.GetParameterAsText(0)
arcpy.AddMessage('-------Input Feature: ' + inputFC)
outputFC = arcpy.GetParameterAsText(1)
arcpy.AddMessage('-------Output Feature: ' + outputFC)
bufferDist = arcpy.GetParameterAsText(2)
arcpy.AddMessage('-------Buffer Distance: ' + bufferDist)
CODE 9.25
AddMessage examples.
The following resources could also provide the information from another
perspective:
Find the data “states.shp,” and run Code 9.26 in the ArcMap Python
window. Note that new fields must be added into the attribute table
before the calculation in order to record the results (Figure 9.17).
# set the workspace
arcpy.env.workspace = "O:\\Book\\Code\\9\\chp9Data"
"""
Add fields "centX", "centY", "polyArea", and "polyPeri" to record the calculated results.
"DOUBLE" is the value type, 20 is the precision of the double type, and 10 is the scale.
"""
arcpy.AddField_management("states.shp","centX","DOUBLE",20,10)
arcpy.AddField_management("states.shp","centY","DOUBLE",20,10)
arcpy.AddField_management("states.shp","polyArea","DOUBLE",20,6)
arcpy.AddField_management("states.shp","polyPeri","DOUBLE",20,6)
"""
Calculate the centroid, area, and perimeter using the CalculateField_management tool.
"PYTHON_9.3" means the calculation expression "!SHAPE.CENTROID.X!" is in Python 9.3 syntax.
"""
arcpy.CalculateField_management("states.shp","centX","!SHAPE.CENTROID.X!","PYTHON_9.3")
arcpy.CalculateField_management("states.shp","centY","!SHAPE.CENTROID.Y!","PYTHON_9.3")
arcpy.CalculateField_management("states.shp","polyArea","!SHAPE.AREA!","PYTHON_9.3")
arcpy.CalculateField_management("states.shp","polyPeri","!SHAPE.LENGTH!","PYTHON_9.3")
CODE 9.26
Calculate the centroid, perimeter, and area of polygons using arcpy.
FIGURE 9.17
Result of Code 9.26.
1. Find the two shapefiles “interstates” and “railway” on the data disk, and
select the interstate roads that intersect with railways. The selection
is conducted on the “interstates” features based on their spatial
relationship with the “railway” layer. Run Code 9.27 in the ArcMap Python
window; Figure 9.18 shows the result.
2. Select the railway stations (in “amtk_sta.shp”) in Virginia. Run Code
9.28 in the ArcMap Python window and see the result (Figure 9.19).
9.12 Summary
This chapter introduces programming within ArcGIS using Python scripts
and ArcPy package. This chapter introduces
arcpy.env.workspace = "O:\\Book\\Code\\9\\chp9Data"
"""
"MakeFeatureLayer_management" can create a feature layer object from the path of the input
data, which is a string. Selection will be conducted on the feature layer.
"""
arcpy.MakeFeatureLayer_management("interstates.shp", "roadLy")
arcpy.MakeFeatureLayer_management("railway.shp", "railLy")
# select the features in the interstates layer, which intersect with the features in the railway layer
arcpy.SelectLayerByLocation_management("roadLy","INTERSECT","railLy",selection_type="NEW_SELECTION")
CODE 9.27
Calculate line intersection.
FIGURE 9.18
Line intersection result.
9.13 Assignment
• Read the ArcPy section in ArcGIS desktop help or the online version.
• Find road data, or download the data from the corresponding course
material package.
• Select highways from the road data.
• Generate a buffer with a radius of 300 meters around the highways.
• Output the buffer as a transportation pollution zone.
• Add a field named “buildings” of type Long to the buffer zone data.
• Count the number of buildings within each buffer zone and store the
count in the new field.
• Write a report to explain how you conducted the analysis and
programming.
• Compare the differences of implementing spatial calculations using
ArcGIS and pure Python.
arcpy.env.workspace = "O:\\Book\\Code\\9\\chp9Data"
arcpy.MakeFeatureLayer_management("states.shp", "stateLy")
arcpy.MakeFeatureLayer_management("amtk_sta.shp", "stationLy")
# then select the railway stations (points) completely within Virginia (polygon)
arcpy.SelectLayerByLocation_management("stationLy","COMPLETELY_WITHIN","stateLy",selection_type="NEW_SELECTION")
CODE 9.28
Select all railway stations in Virginia.
FIGURE 9.19
Point in polygon result.
FIGURE 10.1
Raster data structure. (Diagram labels: column number, row number, pixel, cell value, different layers.)
(a) (b)
3 3 3 3 1 1 1 1
3 3 3 3 2 2 2 2
3 3 3 3 2 1 1 1
3 3 3 3 2 2 2 2
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 2 2
FIGURE 10.2
Raster image. (a) Each pixel of the raster is color coded and (b) value of each pixel and order of
pixel storage.
84 84 90 90
88 93 93 93
93 93 93 93
87 87 87 94
Array 84 84 90 90 88 93 93 93 93 93 93 93 87 87 87 94
Run Length 84 2 90 2 88 1 93 7 87 3 94 1
FIGURE 10.3
Example of Run Length Coding.
4 matrix. The computer will scan it starting from the top left and move right,
working its way down, while keeping the data in an array. Then Run Length
Coding will process the string of pixels into a string of pairs (the identical
pixel value, and times of pixel repetition). The length of the initial string is
16 and after Run Length Coding the length is 12. Therefore, Run Length
Coding effectively reduces the storage volume.
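The encoding described above can be sketched in a few lines of Python. This is a minimal illustration and not the book's implementation: the function names are hypothetical, and the input is the 4 × 4 grid from Figure 10.3 scanned row by row.

```python
def run_length_encode(values):
    """Collapse consecutive repeats into (pixel value, repetition count) pairs."""
    pairs = []
    for value in values:
        if pairs and pairs[-1][0] == value:
            pairs[-1][1] += 1          # same value as the previous pixel: extend the run
        else:
            pairs.append([value, 1])   # new value: start a new run
    return [tuple(pair) for pair in pairs]


def run_length_decode(pairs):
    """Rebuild the original pixel array from (value, count) pairs."""
    pixels = []
    for value, count in pairs:
        pixels.extend([value] * count)
    return pixels


# Grid from Figure 10.3, scanned from the top left, moving right and then down.
grid = [[84, 84, 90, 90],
        [88, 93, 93, 93],
        [93, 93, 93, 93],
        [87, 87, 87, 94]]
array = [value for row in grid for value in row]

encoded = run_length_encode(array)
stored_numbers = 2 * len(encoded)      # each pair stores two numbers
print(encoded)
print(len(array), "->", stored_numbers)
```

For this grid the 16 pixel values collapse into six pairs, or 12 stored numbers. Passing an array with no repeated neighbors (for example, `list(range(16))`) doubles the stored length, which is the limitation discussed next.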
Run Length Coding has its limitations. For example, Run Length will not
save storage in cases where pixel values do not repeat frequently. In some
cases, such as DEM data in a mountainous area, neighboring pixels always
have different values, and Run Length Coding may actually increase the
length of the initial storage. However, Run Length Coding is very successful
when dealing with black and white images, such as a fax. In this case, it is
relatively efficient because most faxed documents are predominantly white
space, with only occasional interruptions of black.
FIGURE 10.4
Quad tree process. (a) Pixel values of the raster, (b) search order of the four quadrants, (c) continued
division when non-equal values are found inside a quadrant of (b), and (d) final division.
10.3.1 TIFF
TIFF (Tagged Image File Format) is an image format recognized by many
computer systems. The TIFF imagery file format is used to store and transfer
digital satellite imagery, scanned aerial photos, elevation models, scanned
maps, or the results of many types of geographic analysis. TIFF supports
various compression and tiling options to increase the efficiency of image
transfer and utilization. The data inside TIFF files are categorized as lossless
compressed or lossy compressed.
10.3.2 GeoTIFF
GeoTIFF files are TIFF files that have geographic (or cartographic) data embedded
as tags within the TIFF file (Ritter and Ruth 1997). The geographic data
can then be used to position the image in the correct location and geometry
on the screen of a geographic information display. The potential additional
information includes map projection, coordinate systems, ellipsoids, datums,
and everything else necessary to establish the exact spatial reference for the
file. Any Geographic Information System (GIS), Computer Aided Design
(CAD), Image Processing, Desktop Mapping, or other type of systems using
geographic images can read GeoTIFF files created on any system following
the GeoTIFF specification.
10.3.3 IMG*
IMG files are produced using the IMAGINE image processing software
created by ERDAS. IMG files can store both continuous and discrete, single-
band and multiband data. These files use the ERDAS IMAGINE Hierarchical
File Format (HFA) structure. An IMG file stores basic information including
file information, ground control points, sensor information, and raster layers.
Each raster layer in the image file contains information in addition to its
data values. Information contained in the raster layer includes layer informa-
tion, compression, attributes, and statistics. An IMG file can be compressed
when imported into ERDAS IMAGINE, which normally uses the run length
compression method (described in Section 10.2.1).
10.3.4 NetCDF
NetCDF (Network Common Data Form) is a set of software libraries and
self-describing, machine-independent data formats that support the cre-
ation, access, and sharing of array-oriented scientific data (Rew and Davis
1990). It is commonly used in climatology, meteorology, and oceanography
applications (e.g., weather forecasting, climate change) and GIS applications.
It is an input/output format for many GIS applications, as well as for general
scientific data exchange. NetCDF is stored in binary in open format with
optional compression.
10.3.5 BMP
BMP (Windows Bitmap) is a graphics file format supported natively by the Microsoft Windows
operating system. Typically, BMP file data are not compressed, which
can result in overly large files. The main advantages of this format are its
simplicity and broad acceptance.
10.3.6 SVG
Scalable Vector Graphics (SVG) is an XML-based file format for
2D vector graphics. SVG files can be compressed with a lossless algorithm (gzip), which
typically reduces the data to 20%–50% of the original size.
10.3.7 JPEG
JPEG (Joint Photographic Experts Group) files store data with lossy
compression (in most cases). Almost all digital cameras can save images in
JPEG format, which supports eight bits per color channel for a total of 24 bits, usually
producing small files. When the compression level is not high, image quality
is not noticeably affected; however, JPEG files can suffer from noticeable
degradation when edited and saved repeatedly. For digital photos
that need repeated editing or when small artifacts are unacceptable, lossless
formats other than JPEG should be used. JPEG is also used as the
compression algorithm for many PDF files that include images.
10.3.8 GIF
GIF (Graphics Interchange Format) was the first image format used on the World
Wide Web. This format is limited to an 8-bit palette, or 256 colors. It utilizes
lossless Lempel–Ziv–Welch (LZW) compression, which is based on patented
compression technology.
10.3.9 PNG
PNG (Portable Network Graphics) is an open-format successor to GIF.
In contrast to the 256 colors supported by GIF, this format supports true color
(16 million colors). PNG outperforms other formats when large uniformly
colored areas form an image. The lossless PNG format is more appropriate
for editing figures, while lossy formats such as JPEG are better for the final
distribution of photos, because JPEG files are smaller than PNG files.
FIGURE 10.5
Pixel value to grayscale mapping. (Diagram: pixel values index colorcells 0–19 of the colormap; colorcell 9 holds the gray value 200.)
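$$
v' = \begin{cases}
\dfrac{\max(R, G, B) + \min(R, G, B)}{2}, & \text{Luster method} \\[4pt]
\dfrac{R + G + B}{3}, & \text{Intensity method} \\[4pt]
0.21 \times R + 0.72 \times G + 0.07 \times B, & \text{Luma method}
\end{cases}
\tag{10.1}
$$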
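The three conversions in Equation 10.1 can be tried on a single pixel with a short Python sketch. The function name `to_gray` and the sample pixel are illustrative assumptions; the coefficients are those given in the equation.

```python
def to_gray(r, g, b, method="luma"):
    """Convert one RGB pixel to a gray value using a method from Equation 10.1."""
    if method == "luster":
        return (max(r, g, b) + min(r, g, b)) / 2
    if method == "intensity":
        return (r + g + b) / 3
    if method == "luma":
        return 0.21 * r + 0.72 * g + 0.07 * b
    raise ValueError("unknown method: " + method)


# A pure blue pixel (0, 0, 255) gives a different gray value under each method.
pixel = (0, 0, 255)
for name in ("luster", "intensity", "luma"):
    print(name, to_gray(*pixel, method=name))
```

The luma method weights green most heavily and blue least, so a pure blue pixel comes out much darker than with the other two methods.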
FIGURE 10.6
Pixel value to RGB mapping with the colormap. (Diagram: pixel values index colorcells 0–19 of the colormap; colorcell 9 holds the RGB value (0, 0, 255).)
FIGURE 10.7
RGB to grayscale. (Original image from http://www.coloringpages456.com/color-pictures/.)
The color depth measures the amount of color information available to display
or print each pixel of a digital image. Owing to the finite nature of storage
capacity, a digital number is stored with a finite number of bits (binary digits).
The number of bits determines the radiometric resolution of the image. A high
color depth leads to more available colors, and consequently to a more accurate
color representation. For example, a pixel with a depth of one bit has only two
possible colors. A pixel with a depth of 8 bits has 256 possible color values, ranging
from 0 to 255 (i.e., 2^8 − 1), and a pixel with a depth of 24 bits has more than 16 million
possible color values, ranging from 0 to 16,777,215 (i.e., 2^24 − 1). Usually, the
color depths vary between 1 and 64 bits per pixel in digital images.
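The relationship between bit depth and the number of representable values is a power of two, which a short Python sketch makes concrete. The function name `color_count` is an illustrative assumption.

```python
def color_count(bits):
    """Number of distinct values a pixel with the given bit depth can store."""
    return 2 ** bits


for bits in (1, 8, 24):
    count = color_count(bits)
    # The largest storable value is one less than the count because values start at 0.
    print(bits, "bits:", count, "values, ranging from 0 to", count - 1)
```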
FIGURE 10.8
Stretch renderer. (a) Original figure, (b) Histogram of original figure, (c) Stretched figure,
(d) Histogram of stretched figure. (Histogram axes: pixel value, 0–250, against count.)
values could be stretched to utilize this range (Figure 10.8d). In the case of
eight bit planes, values are calculated in Equation 10.2.
$$
\begin{aligned}
v' &= m \times v + c \\
m &= \frac{2^8 - 1}{\max(v) - \min(v)} \\
c &= 2^8 - 1 - m \times \max(v)
\end{aligned}
\tag{10.2}
$$
where v′ refers to the stretched pixel value and v refers to the original pixel
value. This may result in a crisper image, and some features may become
easier to distinguish (Figure 10.8c).
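A minimum–maximum stretch following Equation 10.2 can be sketched in plain Python as below. The function name, the rounding to integers, and the handling of a constant-valued input are assumptions made for this illustration; the sample pixel values are hypothetical.

```python
def minmax_stretch(values, bits=8):
    """Linearly stretch pixel values to the full range 0 .. 2**bits - 1 (Equation 10.2)."""
    top = 2 ** bits - 1
    low, high = min(values), max(values)
    if high == low:
        # Assumption: a constant image cannot be stretched, so map everything to 0.
        return [0 for _ in values]
    m = top / (high - low)        # slope
    c = top - m * high            # intercept, chosen so that max(v) maps to top
    return [int(round(m * v + c)) for v in values]


# Hypothetical pixel values that occupy only part of the 0-255 range.
original = [50, 75, 100, 125, 150]
stretched = minmax_stretch(original)
print(stretched)
```

After the stretch, the smallest input value maps to 0 and the largest to 255, so the full display range is used.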
Different stretches will produce different results in the raster display;
standard methods include Standard Deviation, Minimum–Maximum,
Histogram Equalize, and Histogram Specification.
The RGB Composite renderer uses the same methods as the Stretched
renderer, but allows combining bands as composites of red, green, and blue.
FIGURE 10.9
Reclassification of categorical data involves replacing individual values with new values.
For example, land use values (0–9) can be reclassified into preference values of low (1), medium (2),
and high (3).
FIGURE 10.10
Reclassification of continuous data involves replacing a range of values with new values.
For example, a raster depicting distance from roads (0 to 1784.73) can be reclassified into three distance
zones (1, 2, and 3).
When the raster is stored in RGB mode, we will see three sublayers in the
“Table of Contents” (Figure 10.13) and under the “Symbology” tab in the
“Layer Properties” window, there is an “RGB Composite” renderer choice,
but “Unique Value,” “Classified,” and “Discrete Color” are no longer available.
FIGURE 10.11
A raster data rendered using color map in ArcMap. (a) The raster layer rendered using color
map. (b) The label of the layer. (c) The properties window.
FIGURE 10.12
Steps to export raster data in ArcMap. (a) Export the land cover data and (b) “Export Raster
Data” window.
FIGURE 10.13
Raster displayed in RGB.
Note that the “Reclassify” function requires that the ArcMap users
have the “Spatial Analyst” extension; therefore, check the license of
the extension before executing the analysis.
The Reclassify function provides two methods for defining the
classes: RemapRange redefines a range of values into a new class
value; RemapValue defines the one-to-one mapping relationship,
that is, replacing a single original value with a single new value.
2. Land cover dataset stores the land cover types in detail. For exam-
ple, “Forest” is divided into three subtypes: “Deciduous Forest,”
“Evergreen Forest,” and “Mixed Forest.” Run Code 10.2 in the
ArcMap python window to generalize the land cover dataset.
The result is shown in Figure 10.15.
3. Run Code 10.3 in the ArcMap Python window to overlay the classified
DEM and land cover datasets to find the area in forest (land cover
dataset pixel value = 4) with an elevation between 60 and 100 (DEM
dataset pixel value = 100). For this specific dataset, one way to find
the expected area is to add the two layers and find the pixels with
a value of 104, then reclassify all pixels with a value of 104 into
one class (1), and all other pixels into another class (0). Figure 10.16
shows the result of reclassification.
"""
Check the license of the Spatial Analyst extension. The
returned value "Available" means the functions in this
extension are usable.
"""
arcpy.CheckExtension("spatial")
"""
Define the input data. "Dem" is a file geodatabase raster
dataset stored in "chp10data.gdb"
"""
inRaster = "dem"
"""
Define the value classes. The first two elements in the
bracket [0,10,10] are the minimum and maximum values of the
class, and the third element is the new value for the pixels
falling in the class.
"""
ranges = arcpy.sa.RemapRange([[0,10,10],[10,30,30],[30,60,60],
[60,100,100],[100,175,175]])
"""
Execute the Reclassify function based on the "Value" field of
the raster using the RemapRange and set missing values as
"NODATA".
"""
outDEM = arcpy.sa.Reclassify(inRaster, "Value", ranges, "NODATA")
"""
Output the Reclassify function result as a new raster dataset
in the file geodatabase, named "classifiedElevation"
(Figure 10.14).
"""
outDEM.save("classifiedElevation")
CODE 10.1
Classify DEM data.
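The logic behind the two remap styles can be illustrated without ArcGIS. The following pure-Python sketch is not arcpy code: `reclassify_by_range` is a hypothetical helper, and its rule for boundary values (a value on a shared boundary goes to the first matching range) is an assumption of the sketch, so consult the ArcGIS documentation for the tool's exact behavior. The ranges and the value map are copied from Codes 10.1 and 10.2.

```python
def reclassify_by_range(value, ranges, missing=None):
    """Return the new class of a value, in the spirit of RemapRange.

    ranges is a list of (minimum, maximum, new_value) tuples. Values that fall
    in no range get `missing`, playing the role of "NODATA".
    """
    for low, high, new_value in ranges:
        if low <= value <= high:
            return new_value
    return missing


# Ranges from Code 10.1 (elevation classes).
dem_ranges = [(0, 10, 10), (10, 30, 30), (30, 60, 60), (60, 100, 100), (100, 175, 175)]

# One-to-one mapping from Code 10.2 (in the spirit of RemapValue).
landcover_map = {11: 1, 21: 2, 22: 2, 23: 2, 24: 2, 31: 3, 41: 4, 42: 4, 43: 4,
                 52: 5, 71: 7, 81: 8, 82: 8, 90: 9, 95: 9}

# A hypothetical pixel: evergreen forest (42) at an elevation of 75.
elevation_class = reclassify_by_range(75, dem_ranges)    # falls in the 60-100 class
landcover_class = landcover_map[42]                      # generalized to forest
overlay = landcover_class + elevation_class              # the sum used in Code 10.3
is_target = 1 if overlay == 104 else 0
print(elevation_class, landcover_class, overlay, is_target)
```

The sum works as an overlay key for this dataset because each combination of a land cover class (1–9) and an elevation class (10, 30, 60, 100, 175) happens to produce a distinct value.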
2. Using the classified land cover dataset that resulted from the
previous practice, run Code 10.5 in the ArcMap Python window
to calculate the area of each land cover type. The area
of each land cover type can be calculated by taking the
proportion of the pixel count of this type to the total pixel count,
and then multiplying it by the total area of the raster dataset. Accordingly,
use the SearchCursor (see Chapter 8) to capture the pixel
counts.
FIGURE 10.14
Result of reclassifying DEM data in Code 10.1.
"""
Set the mapping relationship between old values and new values.
The first element in the bracket [11,1] means the old value and
the second element means the new value.
"""
values = arcpy.sa.RemapValue([[11,1],[21,2],[22,2],[23,2],[24,2],[31,3],
[41,4],[42,4],[43,4],[52,5],[71,7],[81,8],[82,8],[90,9],
[95,9]])
CODE 10.2
Generalize the land cover dataset.
FIGURE 10.15
Result of reclassifying land cover type data in Code 10.2.
"""
Use the raster calculator to add two layers. Note that the
raster calculator can execute many other algebra calculations
on the raster dataset.
"""
temp = arcpy.gp.RasterCalculator_sa("'outLandcover' + 'outDEM'",
"overlayRaster")
"""
Reclassify the layers into two classes - 1 and 0. 1
represents the area that is in the forest and has expected
elevation.
"""
outReclassify = arcpy.sa.Reclassify("overlayRaster", "Value",
    arcpy.sa.RemapRange([[12,103,0], [104,104,1], [105,184,0]]),
    "NODATA")
CODE 10.3
Reclassify and find specific areas.
FIGURE 10.16
Result of reclassifying land cover type data in Code 10.3.
CODE 10.4
Calculate area.
"""
Use the SearchCursor to access the Value and Count fields.
The Value field is the land cover type value.
"""
# initialize the accumulators before reading the raster attribute table
totalCount = 0
counts = []
with arcpy.da.SearchCursor("classifiedLandcover", ["Value", "Count"]) as cursor:
    for row in cursor:
        totalCount = totalCount + row[1]
        counts.append({'type': row[0], 'count': row[1]})
CODE 10.5
Calculate total area.
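Once the pixel counts are collected, the area of each type follows from its share of the total count. The sketch below shows only that arithmetic in plain Python; the counts and the total area are hypothetical numbers, and `area_by_type` is an illustrative name, not part of arcpy.

```python
def area_by_type(counts, total_area):
    """Split the total raster area among classes in proportion to their pixel counts."""
    total_count = sum(item['count'] for item in counts)
    return {item['type']: total_area * item['count'] / total_count for item in counts}


# Hypothetical pixel counts, in the same structure that Code 10.5 builds.
counts = [{'type': 1, 'count': 200},
          {'type': 2, 'count': 300},
          {'type': 4, 'count': 500}]
total_area = 90.0   # hypothetical total area of the raster, e.g., in square kilometers

areas = area_by_type(counts, total_area)
for land_cover_type, area in sorted(areas.items()):
    print(land_cover_type, area)
```

Because every pixel covers the same ground area, the same result can be obtained by multiplying each count by the area of a single cell.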
PROBLEMS
For raster data of your choice, design a scenario that requires reclassifica-
tion. Explain the reasoning for reclassification and determine the purpose
for the new classes. Calculate the area for each class and use different color
rendering methods to present the result.
NOTE: All code can be successfully executed on ArcGIS for Desktop
versions 10.2.2 to 10.3. There may be problems running the code on more
recent versions of ArcGIS.
11
Network Data Algorithms
• V = {v1,v2,v3,v4}
• E = {(v1,v2), (v2,v4), (v1,v3), (v1,v4), (v3,v4)}
FIGURE 11.1
An example of basic network elements (edges and nodes).
FIGURE 11.2
An example of basic network representation (nodes v1–v4 connected by edges e1–e5).
FIGURE 11.3
Example of directed and undirected network. (a) Directed network, (b) Undirected network.
(Both panels show nodes v1, v2, and v3.)
(a) Directed network
     v1  v2  v3
v1    0   1   0
v2    0   0   0
v3    1   1   0
(b) Undirected network
     v1  v2  v3
v1    0   1   1
v2    1   0   1
v3    1   1   0
FIGURE 11.4
Adjacency matrix of directed and undirected network. (a) Directed network, (b) Undirected
network.
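The two matrices in Figure 11.4 can be generated from an edge list with a short Python sketch. The function name is an illustrative assumption, and the edge list (v1 to v2, v3 to v1, v3 to v2) is read off the directed matrix in the figure.

```python
def adjacency_matrix(nodes, edges, directed):
    """Build an adjacency matrix (list of rows) from a list of (origin, destination) edges."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for origin, destination in edges:
        matrix[index[origin]][index[destination]] = 1
        if not directed:
            # An undirected edge can be traveled both ways, so the matrix is symmetric.
            matrix[index[destination]][index[origin]] = 1
    return matrix


nodes = ["v1", "v2", "v3"]
edges = [("v1", "v2"), ("v3", "v1"), ("v3", "v2")]

directed_matrix = adjacency_matrix(nodes, edges, directed=True)
undirected_matrix = adjacency_matrix(nodes, edges, directed=False)
for row in directed_matrix:
    print(row)
for row in undirected_matrix:
    print(row)
```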
TABLE 11.1
Example of Node Table for the Network in Figure 11.2
ID X Y
v1 23.2643 75.1245
v2 23.1443 74.1242
v3 23.2823 75.1315
v4 23.1442 75.1286
TABLE 11.2
Example of Link Table for the Network in Figure 11.2
ID Origin Destination One-Way
e1 v1 v2 Not
e2 v2 v4 Not
e3 v1 v3 Not
e4 v1 v4 Not
e5 v3 v4 Not
• Node table: This table contains at least three fields: one to store a
unique identifier and the others to store the node’s X and Y coordi-
nates. Although these coordinates can be defined by any Cartesian
reference system, longitudes and latitudes ensure an easy portabil-
ity to a GIS (Rodrigue, 2016).
• Links table: This table also contains at least three fields: one to store a
unique identifier, one to store the node of origin, and one to store the
node of destination. A fourth field can be used to state whether the
link is unidirectional or not.
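The node and link tables map naturally onto simple Python structures. The sketch below stores the rows of Tables 11.1 and 11.2 and derives each node's neighbors from the link table; the variable names and the `build_neighbors` helper are illustrative assumptions, and the "Not" entries of the One-Way column are stored as `False`.

```python
# Node table (Table 11.1): ID -> (X, Y)
nodes = {
    "v1": (23.2643, 75.1245),
    "v2": (23.1443, 74.1242),
    "v3": (23.2823, 75.1315),
    "v4": (23.1442, 75.1286),
}

# Link table (Table 11.2): (ID, origin, destination, one_way)
links = [
    ("e1", "v1", "v2", False),
    ("e2", "v2", "v4", False),
    ("e3", "v1", "v3", False),
    ("e4", "v1", "v4", False),
    ("e5", "v3", "v4", False),
]


def build_neighbors(nodes, links):
    """Derive the neighbors of every node from the link table."""
    neighbors = {node_id: [] for node_id in nodes}
    for link_id, origin, destination, one_way in links:
        neighbors[origin].append(destination)
        if not one_way:
            # A two-way link can also be traveled from destination to origin.
            neighbors[destination].append(origin)
    return neighbors


neighbors = build_neighbors(nodes, links)
for node_id in sorted(neighbors):
    print(node_id, sorted(neighbors[node_id]))
```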
• Step 1: Find all possible paths from the start point to the end point.
• Step 2: Calculate the length of each path.
• Step 3: Choose the shortest path by comparing the lengths of all
different paths.
For example, given the network in Figure 11.5, we would like to find the
shortest path from A to all the other vertices. The number of each edge is the
cost and Table 11.3 shows all the possible paths from A to the other vertices.
Although this method of finding the shortest path is simple and straight-
forward, the complexity of this approach increases exponentially with the
number of vertices and edges. For example, if we connect B and C, there will
be at least two more routes from A to E. In a real network application, we
FIGURE 11.5
A network example used to illustrate the shortest path problem.
TABLE 11.3
Brute Force Approach to Solving the Shortest Path Problem
(Find the Shortest Path from A to E)
Destination Point Possible Paths and Length Shortest Path
B AB: 1; ACDB: 6 AB: 1
C AC: 2; ABDC: 5 AC: 2
D ABD: 3; ACD: 4 ABD: 3
E ABDE: 4; ACDE: 5 ABDE: 4
usually have a large number of both vertices and edges (e.g., a transporta-
tion system), which would be very expensive and time-consuming from a
computational standpoint. Therefore, a computationally efficient algorithm
to calculate the shortest path is needed.
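The brute force approach can be sketched in Python as follows. The edge costs (A–B 1, A–C 2, B–D 2, C–D 2, D–E 1) are an assumption inferred from the path lengths listed in Table 11.3, and the function names are illustrative.

```python
# Undirected example network; each edge cost is stored in both directions.
graph = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "D": 2},
    "C": {"A": 2, "D": 2},
    "D": {"B": 2, "C": 2, "E": 1},
    "E": {"D": 1},
}


def all_simple_paths(graph, start, end, path=None):
    """Step 1: enumerate every path from start to end that visits no vertex twice."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for neighbor in graph[start]:
        if neighbor not in path:
            paths.extend(all_simple_paths(graph, neighbor, end, path))
    return paths


def path_length(graph, path):
    """Step 2: sum the edge costs along a path."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


def brute_force_shortest(graph, start, end):
    """Step 3: compare the lengths of all paths and keep the shortest."""
    candidates = all_simple_paths(graph, start, end)
    return min((path_length(graph, p), p) for p in candidates)


for destination in "BCDE":
    length, path = brute_force_shortest(graph, "A", destination)
    print(destination, "".join(path), length)
```

Every additional edge multiplies the number of candidate paths that must be enumerated, which is why this approach does not scale to real networks.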
• Assign to every node a tentative distance value: set it to zero for the
initial node and to infinity for all other nodes.
• Set the initial node as current. Mark all other nodes unvisited.
Create a set of all the unvisited nodes called the unvisited set.
• For the current node, consider all of its unvisited neighbors and
calculate their tentative distances. Compare the newly calculated
tentative distance to the current assigned value and assign the
smaller one. For example, if the current node A is marked with
a distance of 6, and the edge connecting it with a neighbor B has
length 2, then the distance to B (through A) will be 6 + 2 = 8. If B was
previously marked with a distance greater than 8, then change it to
8. Otherwise, keep the current value.
• After considering all of the neighbors of the current node, mark
the current node as visited and remove it from the unvisited set.
A visited node will never be checked again.
• If the destination node has been marked visited (when planning
a route between two specific nodes) or if the smallest tentative
distance among the nodes in the unvisited set is infinity (when
planning a complete traversal; occurs when there is no connection
between the initial node and remaining unvisited nodes), then stop.
The algorithm has finished.
• Otherwise, select the unvisited node that is marked with the
smallest tentative distance, set it as the new “current node,” and go
back to the third step (considering the unvisited neighbors of the current node).
TABLE 11.4
Process of Using Dijkstra Algorithm to Solve the Shortest Path
from A to Other Vertices (CX Means the Cost from A to X)
Step  Selected Point  CA  CB  CC  CD  CE  Path  M
1     A               0   ∞   ∞   ∞   ∞   A     A
2     B               –   1   2   ∞   ∞   AB    AB
3     C               –   –   2   3   ∞   AC    ABC
4     D               –   –   –   3   ∞   ABD   ABCD
5     E               –   1   2   3   4   ABDE  ABCDE
TABLE 11.5
Pseudo-Code for the Dijkstra Algorithm
Input: Network dataset (G), starting vertex (A)
{
DK1:  for each vertex v in G:
DK2:      dist[v] = ∞
DK3:  dist[A] = 0
DK4:  T = the set of all vertices in G
DK5:  while T is not empty:
DK6:      s = the vertex in T with the smallest dist[ ]
DK7:      delete s from T
DK8:      for each connected (neighbor) v of s:
DK9:          temp_Distance = dist[s] + dist_between(s, v)
DK10:         if temp_Distance < dist[v]:
DK11:             dist[v] = temp_Distance
DK12:             shortest_Distance[v] = temp_Distance
DK13: return shortest_Distance[ ]
}
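The pseudo-code translates into Python fairly directly. The sketch below is one possible rendering, not the book's code: it uses a priority queue (`heapq`) to pick the vertex with the smallest tentative distance instead of scanning the set T, and it also records each vertex's predecessor so that the path can be rebuilt. The edge costs are an assumption inferred from the path lengths in Table 11.3.

```python
import heapq

# Example network; each undirected edge cost is stored in both directions.
graph = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "D": 2},
    "C": {"A": 2, "D": 2},
    "D": {"B": 2, "C": 2, "E": 1},
    "E": {"D": 1},
}


def dijkstra(graph, source):
    """Return (dist, previous) with the shortest distance from source to every vertex."""
    dist = {v: float("inf") for v in graph}     # DK1-DK2
    previous = {v: None for v in graph}
    dist[source] = 0                            # DK3
    heap = [(0, source)]
    visited = set()
    while heap:                                 # DK5
        d, s = heapq.heappop(heap)              # DK6-DK7: closest unvisited vertex
        if s in visited:
            continue                            # stale queue entry, already finalized
        visited.add(s)
        for v, cost in graph[s].items():        # DK8
            temp_distance = d + cost            # DK9
            if temp_distance < dist[v]:         # DK10
                dist[v] = temp_distance         # DK11-DK12
                previous[v] = s
                heapq.heappush(heap, (temp_distance, v))
    return dist, previous                       # DK13


def path_to(previous, target):
    """Rebuild the path to target by walking the predecessor links backward."""
    path = []
    while target is not None:
        path.append(target)
        target = previous[target]
    return path[::-1]


dist, previous = dijkstra(graph, "A")
print(dist)
print("".join(path_to(previous, "E")))
```

Each vertex is finalized once, so the work grows roughly with the number of edges rather than with the number of possible paths.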
11.3.1 Routing
Whether finding a simple route between two locations or one that vis-
its several locations, people usually try to take the best route. But the
“best route” can mean different things in different situations. The best
route can be the quickest, shortest, or most scenic route, depending on
the impedance chosen. The best route can be defined as the route that has
the lowest impedance, where the impedance is chosen by the user. If the
impedance is time, then the best route is the quickest route. Any valid
network cost attribute can be used as the impedance when determining
the best route.
FIGURE 11.6
Example of closest facility.
using a simple circle. Considering that people travel by road, however, this
method will not reflect the actual accessibility to the site. Service networks
computed by Network Analyst can overcome this limitation by identify-
ing the accessible streets within 5 kilometers of a site via the road network
(Figure 11.7).
FIGURE 11.7
Example of service areas.
FIGURE 11.8
Example of OD cost matrix.
FIGURE 11.9
Example of vehicle routing problem.
limits imposed by driver work shifts, driving speeds, and customer com-
mitments (Figure 11.9).
11.3.6 Location-Allocation
Given a set of candidate facilities, location-allocation helps choose
which specific facilities to operate, based on their potential interaction with demand
points. The objective may be to minimize the overall distance between
demand points and facilities, maximize the number of demand points
covered within a certain distance of facilities, maximize an apportioned
amount of demand that decays with increasing distance from a facility, or
maximize the amount of demand captured in an environment of friendly
and competing facilities.
Figure 11.10 shows the results of a location-allocation analysis meant to
determine which fire stations are redundant. The following information was
provided to the solver: an array of fire stations (facilities), street midpoints
(demand points), and a maximum allowable response time. The response
time is the time it takes firefighters to reach a given location. In this exam-
ple, the location-allocation solver determined that the fire department could
close several fire stations and still maintain a 3-minute response time.
FIGURE 11.10
Example of location-allocation.
1. The network dataset and the stops of the salesman are available in
the data disk under the folder “chp11data.” The first step to conduct
network analysis is to find the shortest path by using the arcpy.
na.MakeRouteLayer function to create a route layer from the net-
work dataset (Code 11.1).
The input parameter is the ND layer of the network dataset. Give
a name to your output route layer, such as myRoute. In this example,
the impedance attribute is “Length.” “FIND_BEST_ORDER” and
“PRESERVE_BOTH” mean that the order of the salesman’s stops can
be changed when analyzing the shortest path (to approach an opti-
mal result), but the first and end stops are preserved as his fixed start
and end locations. The total length of the resulting path is calculated
for reference (accumulate_attribute_name = "Length"). Figure 11.11
shows the route data layer in ArcMap created by Code 11.1.
arcpy.env.workspace = 'C:\\ArcGISdata\\chp11data'
CODE 11.1
Script to create a route layer from the network dataset.
FIGURE 11.11
Network dataset (left) and the route layer generated using the dataset (right).
CODE 11.2
Script to get all input sublayer in the route layer.
2. Add the stops of the salesman in the “stops.shp” to the route layer.
Code 11.2 is provided to obtain all the subclasses in the route
layer structure. The subclasses of a route layer include Barriers,
PolygonBarriers, PolylineBarriers, Stops, and Routes. Except the
“Routes” class, all the other classes are the input classes that allow
users to input stops and barriers to restrict the network analysis.
FIGURE 11.12
Returned result of Code 11.2: all input sublayer in the route layer.
"""
Add stops (the points in the "stops.shp") to the "Stops"
class in the route layer.
Fieldmapping is used to input the attribute in the stops.shp
to the "Stops" subclass to constrain the network analysis
"""
fieldMappings = arcpy.na.NAClassFieldMappings(routeLy,
naClasses["Stops"])
fieldMappings["Attr_Length"].defaultValue = 0
fieldMappings["Attr_speed"].defaultValue = 0
# add the points in the stops feature class into the "Stops" sublayer of the route layer with field mapping
CODE 11.3
Script to add the salesman’s stops to the “Stops” sublayer in the route layer.
FIGURE 11.13
Returned result of Code 11.3: the route layer with stops added in.
CODE 11.4
Script to execute shortest path algorithm in the route layer.
FIGURE 11.14
Routing result (upper: the route line; lower: the attribute table of the route layer).
3. Adjacency matrix
4. Links Table and Node Table
5. Shortest path algorithms (the Dijkstra algorithm)
6. Hands-on experience with ArcGIS through arcpy scripting
PROBLEMS
Review the class material and practice code, and develop a network for
your University Campus for routing:
FIGURE 12.1
Contour lines.
values, the lines are spaced farther apart; where the values rise or fall rapidly,
the lines are closer together. Contour lines can, therefore, be used not only
to identify locations that have the same value, but also gradient of values.
For topographic maps, contours are a useful surface representation, because
they can simultaneously depict flat and steep areas (distance between con-
tours) and ridges and valleys (converging and diverging polylines).
The elements needed to create a contour map include a base contour and a
contour interval from values for a specific feature. For example, we can create
a contour every 15 meters, starting at 10 meters. In this case, 10 meters would
be considered the base contour and the contour interval would be 15 meters;
the values to be contoured would be 10 m, 25 m, 40 m, 55 m, etc.
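The contour values follow from the base contour and the contour interval by simple arithmetic, as the Python sketch below shows. The function name and the elevation range (0 m to 60 m) are illustrative assumptions; the sketch also assumes that contours may fall below the base contour when the surface does.

```python
import math


def contour_levels(base, interval, z_min, z_max):
    """List the contour values base + k * interval that fall within [z_min, z_max]."""
    k = math.ceil((z_min - base) / interval)   # first multiple at or above z_min
    levels = []
    while base + k * interval <= z_max:
        levels.append(base + k * interval)
        k += 1
    return levels


# Base contour of 10 m and contour interval of 15 m on a surface from 0 m to 60 m.
levels = contour_levels(base=10, interval=15, z_min=0, z_max=60)
print(levels)
```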
12.1.2.2.1 Grid
Grid surface refers to a surface map plotted as a grid of surface values with
uniformly spaced cells. This grid is in the same data structure as raster data,
consisting of a rectangular matrix of cells represented in rows and columns.
Each cell represents a defined square area on the Earth’s surface and holds
a value that is static across the entire cell (Figure 12.2). Elevation models are
one such example of Grid surface models.
FIGURE 12.2
Grid surface model.
12.1.2.2.2 TIN
TINs are a form of vector-based digital geographic data and are constructed
by triangulating a set of vertices, each with its own x, y coordinate and z
value (Figure 12.3). The vertices are connected with a series of edges to form
a network of triangles. Triangles are nonoverlapping and satisfy the Delaunay
criterion: no vertex lies within the interior of any of the circumcircles of the
triangles in the network (Figure 12.4). Each triangle comprises three vertices
in a sequence, and is adjacent to at least one neighboring triangle.
FIGURE 12.3
TIN.
Node ID  X  Y  Z
1        1  3  3
2        1  5  6
3        4  6  7
4        6  5  8
5        5  3  5
6        3  3  4
Triangle ID  Node sequence  Neighbors
A            1, 2, 6        –, B, –
B            2, 3, 6        –, C, A
C            6, 3, 5        B, D, –
D            3, 4, 5        –, –, C
FIGURE 12.4
TIN triangles.
$$
z_0 = \frac{\sum_{i=1}^{s} z_i}{s} \tag{12.1}
$$
$$
z_0 = \frac{\sum_{i=1}^{s} \mathrm{weight}_i \times z_i}{\sum_{i=1}^{s} \mathrm{weight}_i} \tag{12.2}
$$
Different methods exist for calculating the grid’s elevation based on the
weights’ scheme and the number of nearest points used. Commonly used
interpolation methods include inverse distance weighting (IDW), spline, krig-
ing, and natural neighbors. IDW weights the points closer to the target cell
more heavily than those farther away. Spline fits a minimum curvature sur-
face through the input points. Kriging is a geostatistical interpolation tech-
nique in which the surrounding measured values are weighted to derive a
predicted value for an unmeasured location. These weights are based on the
distance between the measured points, the prediction locations, and the over-
all spatial arrangement among the measured points. Natural neighbors create
a Delaunay triangulation of the input points, selecting the closest nodes that
form a convex hull around the interpolation point, and then weighting their
values proportional to their area. In ArcGIS, Grid surface is phrased as Raster.
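As an illustration of the weighted form in Equation 12.2, the sketch below implements a basic IDW estimate in Python. The weight 1/d^p with power p = 2 is a common choice but an assumption here, as are the function name and the sample points.

```python
import math


def idw(x, y, samples, power=2):
    """Estimate z at (x, y) as the inverse-distance-weighted mean of the samples.

    samples is a list of (x, y, z) tuples; closer samples receive larger weights.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, sz in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0:
            return sz                     # the target coincides with a sample point
        weight = 1.0 / distance ** power
        weighted_sum += weight * sz
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical sample points with elevations 10 and 20.
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(1.0, 0.0, samples))   # midway between the samples
print(idw(0.5, 0.0, samples))   # closer to the first sample
```

Midway between the two samples the weights are equal and the estimate is their mean; moving toward one sample pulls the estimate toward that sample's value.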
1. Pick sample points. In many cases, sample points must be selected
   from control points, such as an existing dense digital elevation model
   (DEM) or digitized contours, to ensure accuracy of representation.
   There are several existing algorithms for selecting points from a DEM:
   the Fowler and Little (1979) algorithm, the VIP (Very Important Points)
   algorithm (Chen and Guevara 1987), and the Drop heuristic algorithm
   (Lee 1991). In essence, the intent of these methods is to select
   points at significant breaks of the surface.
2. Connect points into triangles. The selected TIN points will then
   become the vertices of the triangle network. Triangles with angles
   close to 60 degrees are preferred since this ensures that any point
   on the surface is as close as possible to a vertex. There are
   different methods for forming the triangles, such as Delaunay
   triangulation or distance ordering. Delaunay triangulation, the method
   most commonly used in practice, ensures that no vertex lies within the
   interior of any of the circumcircles of the triangles in the network
   (Figure 12.5). Delaunay triangulation is accomplished either by
   starting from the edges of the convex hull and working inward until
   the network is complete, or by connecting the closest pair of points,
   which must form a Delaunay edge, searching for a third point such that
   no other point falls in the circle through the three, and then working
   outward from these edges for the next closest point.
3. Model the surface within each triangle. Normally, the surface within
   each triangle is modeled as a plane.
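The empty-circumcircle criterion in step 2 can be sketched in a few lines of Python. This is only an illustration of the test itself, not a full triangulation routine; the function name in_circumcircle and the sample coordinates are assumptions made for this example.

```python
def in_circumcircle(a, b, c, p):
    """Return True if point p lies strictly inside the circle through a, b, c.

    Points are (x, y) tuples. The sign of the determinant depends on the
    orientation of the triangle, so it is multiplied by the orientation sign.
    """
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - by * cx)
           - (bx * bx + by * by) * (ax * cy - ay * cx)
           + (cx * cx + cy * cy) * (ax * by - ay * bx))
    # Positive for counterclockwise triangles, negative for clockwise ones.
    orientation = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return det * orientation > 0


# A triangle is acceptable in a Delaunay network only if this test is
# False for every other vertex in the point set.
triangle = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
print(in_circumcircle(*triangle, p=(0.5, 0.5)))
print(in_circumcircle(*triangle, p=(2.0, 2.0)))
```

A triangulation routine would call such a test repeatedly, rejecting or flipping any triangle whose circumcircle contains another vertex.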
FIGURE 12.5
Delaunay triangulation.
12.3.1 Elevation
Elevation of a certain point can be calculated based on the interpolation
methods introduced in creating the surface. In a Grid model, elevation of a
certain point can be calculated using the Grid cells close to the point. Taking
IDW as an example, a general form of finding an interpolated elevation z at
a given point x based on samples zi = z (xi) for i = 1, 2, …, N using IDW is an
interpolating function (Equation 12.3).
$$z(x) = \begin{cases} \dfrac{\sum_{i=1}^{N} w_i(x)\, z_i}{\sum_{i=1}^{N} w_i(x)}, & \text{if } d(x, x_i) \neq 0 \text{ for all } i \\[2ex] z_i, & \text{if } d(x, x_i) = 0 \text{ for some } i \end{cases} \tag{12.3}$$
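A minimal sketch of Equation 12.3 follows. The weight function is not defined in this excerpt, so the sketch assumes the common choice w_i(x) = 1 / d(x, x_i)^p with p = 2; the function name idw and the sample values are hypothetical.

```python
import math


def idw(x, samples, power=2.0):
    """Interpolate an elevation at point x from (location, elevation) samples.

    x is an (x, y) tuple; samples is a list of ((x, y), z) pairs.
    Assumes the weight 1 / distance**power, which is one common choice.
    """
    numerator = 0.0
    denominator = 0.0
    for location, z in samples:
        d = math.hypot(x[0] - location[0], x[1] - location[1])
        if d == 0.0:
            # Second case of Equation 12.3: x coincides with a sample.
            return z
        weight = 1.0 / d ** power
        numerator += weight * z
        denominator += weight
    return numerator / denominator


samples = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
print(idw((1.0, 0.0), samples))   # midway between the two samples
print(idw((0.0, 0.0), samples))   # exactly on the first sample
```

Raising the power gives nearby samples more influence, while restricting samples to the nearest few points limits the search cost.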
12.3.2 Slope
Slope identifies the steepest downhill gradient at a location on a surface.
Slope is calculated for each triangle in a TIN and for each cell in a raster.
For a TIN, slope is the maximum rate of change in elevation across each
triangle, and the output polygon feature class contains polygons that
classify the input TIN by slope. For a raster, slope is determined as the
greatest of the differences in elevation between a cell and each of its
eight neighbors, and the output is a raster.
The slope is the angle of inclination between the surface and a horizontal
plane, and may be expressed in degrees or percent. Slope in degrees is given
by calculating the arctangent of the ratio of the change in height (dZ) to the
change in horizontal distance (dS) (Equation 12.4).
a b c
d e f
g h i
FIGURE 12.6
Surface scanning window.
The rate of change in the x direction for cell e is calculated with Equation 12.6.
The rate of change in the y direction for cell e is calculated with Equation 12.7.
Based on the above Equations 12.4 through 12.7, the summarized algo-
rithm used to calculate the slope is demonstrated in Equation 12.8.
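$$\text{Slope}_{\text{radians}} = \operatorname{atan}\left(\sqrt{\left(\frac{dz}{dx}\right)^{2} + \left(\frac{dz}{dy}\right)^{2}}\right) \tag{12.8}$$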
Slope can also be measured in units of degrees, which uses Equation 12.9.
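$$\text{Slope}_{\text{degrees}} = \operatorname{atan}\left(\sqrt{\left(\frac{dz}{dx}\right)^{2} + \left(\frac{dz}{dy}\right)^{2}}\right) \times 57.29578 \tag{12.9}$$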
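The slope calculation for the center cell e of the 3 × 3 window in Figure 12.6 can be sketched as follows. Equations 12.6 and 12.7 are not reproduced in this excerpt, so the finite-difference weights below are an assumption: they follow the third-order scheme documented for the ArcGIS Slope tool. The function name slope_degrees is hypothetical.

```python
import math


def slope_degrees(window, cell_size=1.0):
    """Slope in degrees for the center cell of a 3 x 3 elevation window.

    window is [[a, b, c], [d, e, f], [g, h, i]], with the top row first.
    The rate-of-change weights are assumed, not taken from the text.
    """
    (a, b, c), (d, e, f), (g, h, i) = window
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cell_size)
    # Equation 12.8, converted to degrees as in Equation 12.9.
    return math.degrees(math.atan(math.sqrt(dz_dx ** 2 + dz_dy ** 2)))


# A surface rising one unit per cell toward the east.
ramp = [[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]
print(slope_degrees(ramp))
```

Note that the center value e does not enter the calculation; only the eight neighbors contribute to the two rates of change.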
12.3.3 Aspect
Aspect is the direction that a slope faces. It identifies the steepest downslope
direction at a location on a surface. It can be thought of as slope direction or
the compass direction a hill faces. Aspect is calculated for each triangle in
TINs and for each cell in raster. Figure 12.8 shows an example of the aspect
results of a surface using ArcMap 3D Analyst.
Aspect is measured clockwise in degrees from 0 (due north) to 360 (again
due north, coming full circle). The value of each cell in an aspect grid indi-
cates the direction in which the cell’s slope faces (Figure 12.7).
Aspect is calculated using a moving 3 × 3 window visiting each cell in
the input raster. For each cell in the center of the window (Figure 12.6), an
N (0/360)   NE (45)   E (90)   SE (135)   S (180)   SW (225)   W (270)   NW (315)
FIGURE 12.7
Clockwise in calculation aspect.
Taking the rate of change in both the x and y directions for cell e, aspect is
calculated using Equation 12.11.
if aspect < 0
    cell = 90.0 - aspect
else if aspect > 90.0
    cell = 360.0 - aspect + 90.0
else
    cell = 90.0 - aspect
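The conversion rule above can be turned into a runnable sketch. Equation 12.11 is not reproduced in this excerpt, so the sketch assumes the form documented for the ArcGIS Aspect tool, aspect = atan2(dz/dy, -dz/dx) in degrees, together with the same assumed 3 × 3 rate-of-change weights; the function name aspect_degrees and the value -1 for flat cells are likewise assumptions.

```python
import math


def aspect_degrees(window):
    """Compass aspect (0 = north, clockwise) for the center of a 3 x 3 window.

    window is [[a, b, c], [d, e, f], [g, h, i]], with the top (north) row first.
    Returns -1.0 for a flat window, which has no downslope direction.
    """
    (a, b, c), (d, e, f), (g, h, i) = window
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / 8.0
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / 8.0
    if dz_dx == 0 and dz_dy == 0:
        return -1.0
    aspect = math.degrees(math.atan2(dz_dy, -dz_dx))
    # Convert the mathematical angle to a compass direction.
    if aspect < 0:
        return 90.0 - aspect
    elif aspect > 90.0:
        return 360.0 - aspect + 90.0
    else:
        return 90.0 - aspect


# Elevation drops toward the east, so the slope faces east.
east_facing = [[2, 1, 0],
               [2, 1, 0],
               [2, 1, 0]]
print(aspect_degrees(east_facing))
```

The first and last branches of the rule are identical, so the three-way test could be collapsed into two cases; it is kept here to mirror the rule as stated.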
[Figure legend: aspect codes –1 and 1 to 9; elevation classes from 193 to 494.]
FIGURE 12.8
Surface aspect (3D Analyst).
For raster surface dataset, the distance is calculated between cell centers.
Therefore, if the cell size is 1, the distance between two orthogonal cells is 1,
and the distance between two diagonal cells is 1.414 (the square root of 2).
If the maximum descent to several cells is the same, the neighborhood is
enlarged until the steepest descent is found. When a direction of steepest
descent is found, the output cell is coded with the value representing that
direction (Figures 12.9 and 12.10). Taking the 3 by 3 square (in red rectangle)
as an example, the center cell (row 3, column 3) has a value of 44, surrounded
by 8 neighboring cells. The steepest descent can be found at southeastern
cell, which has the largest change from the center cell. Since the steepest
descent direction is found to be the southeast, based on the direction coding
(Figure 12.9), the flow direction of the center cell is 2.
If all neighboring cells are higher than the processing cell or when two
cells flow into each other, creating a two-cell loop, then the processing cell is
a sink, whose flow direction cannot be assigned one of the eight valid values.
Sinks in elevation data are most commonly due to errors in the data. These
errors are often caused by sampling effects and the rounding of elevations
to integer numbers.
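The steepest-descent rule can be sketched with the elevation grid of Figure 12.10 and the direction codes of Figure 12.9. This is a simplified illustration: ties are resolved by taking the first steepest neighbor found rather than by enlarging the neighborhood, and cells with no lower neighbor simply return None. The function name flow_direction is hypothetical.

```python
import math

# Direction codes from Figure 12.9, keyed by (row offset, column offset).
D8_CODES = {(0, 1): 1, (1, 1): 2, (1, 0): 4, (1, -1): 8,
            (0, -1): 16, (-1, -1): 32, (-1, 0): 64, (-1, 1): 128}

# Elevation surface from Figure 12.10.
ELEVATION = [[78, 72, 69, 71, 58, 49],
             [74, 67, 56, 49, 46, 50],
             [69, 53, 44, 37, 38, 48],
             [64, 58, 55, 22, 31, 24],
             [68, 61, 47, 21, 16, 19],
             [74, 53, 34, 12, 11, 12]]


def flow_direction(dem, row, col, cell_size=1.0):
    """Return the D8 code of the steepest descent from dem[row][col].

    Returns None when no neighbor is lower (a sink or flat cell).
    """
    center = dem[row][col]
    best_code, best_drop = None, 0.0
    for (dr, dc), code in D8_CODES.items():
        r, c = row + dr, col + dc
        if not (0 <= r < len(dem) and 0 <= c < len(dem[0])):
            continue
        # Diagonal neighbors are farther away by a factor of sqrt(2).
        diagonal = dr != 0 and dc != 0
        distance = cell_size * (math.sqrt(2.0) if diagonal else 1.0)
        drop = (center - dem[r][c]) / distance
        if drop > best_drop:
            best_code, best_drop = code, drop
    return best_code


# Center cell of the example (row 3, column 3, value 44).
print(flow_direction(ELEVATION, 2, 2))  # prints 2 (southeast)
```

For the center cell, the drop to the southeast neighbor is (44 - 22) / 1.414, about 15.6, which exceeds the drop of 7 to the east neighbor, so the cell is coded 2.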
32 64 128
16 1
8 4 2
FIGURE 12.9
Direction coding.
Elevation surface
78 72 69 71 58 49
74 67 56 49 46 50
69 53 44 37 38 48
64 58 55 22 31 24
68 61 47 21 16 19
74 53 34 12 11 12
Flow direction
2 2 2 4 4 8
2 2 2 4 4 8
1 1 2 4 8 4
128 128 1 2 4 8
2 2 1 4 4 4
1 1 1 1 4 16
FIGURE 12.10
Flow direction example.
FIGURE 12.11
Profile view of a sink before and after fill.
FIGURE 12.12
Flow accumulation calculations. (a) Flow directions. (b) Incremental flow. (c) Total flow.
from northwest cell, 3 from west cell, and 0 from southwest. Therefore, the
accumulation flow of this cell is 3 + 1 + 3 + 0 = 7.
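The accumulation bookkeeping can be sketched as below, under the assumption that a cell's accumulation is the number of cells flowing directly into it plus the accumulated totals of those cells. The sketch also assumes that sinks have been filled, so the flow directions contain no loops; cells caught in a loop would never be processed. The function name flow_accumulation and the tiny test grid are hypothetical.

```python
# D8 codes from Figure 12.9 mapped to (row offset, column offset).
D8_OFFSETS = {1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
              16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1)}


def flow_accumulation(flow_dir):
    """Count, for every cell, how many upstream cells drain into it."""
    rows, cols = len(flow_dir), len(flow_dir[0])

    def downstream(r, c):
        offset = D8_OFFSETS.get(flow_dir[r][c])
        if offset is None:           # sink or unassigned direction
            return None
        r2, c2 = r + offset[0], c + offset[1]
        if 0 <= r2 < rows and 0 <= c2 < cols:
            return r2, c2
        return None                  # flow leaves the grid

    # Number of neighbors that flow directly into each cell.
    pending = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            target = downstream(r, c)
            if target is not None:
                pending[target[0]][target[1]] += 1

    acc = [[0] * cols for _ in range(rows)]
    # Start from cells that receive no flow and push totals downstream.
    ready = [(r, c) for r in range(rows) for c in range(cols)
             if pending[r][c] == 0]
    while ready:
        r, c = ready.pop()
        target = downstream(r, c)
        if target is None:
            continue
        tr, tc = target
        acc[tr][tc] += acc[r][c] + 1
        pending[tr][tc] -= 1
        if pending[tr][tc] == 0:
            ready.append((tr, tc))
    return acc


# Two cells drain into the lower-right cell, one of them via the lower-left.
print(flow_accumulation([[2, 8],
                         [1, 0]]))
```

Processing cells only after all of their inflowing neighbors are finished avoids revisiting any cell, so the whole grid is handled in a single pass.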
Step 1. Run Code 12.1 in the ArcMap Python window to create a TIN
from a DEM surface raster dataset, and then generate a DEM from
the TIN (Figure 12.13).
import arcpy
# The 3D Analyst extension must be available for the *_3d tools.
arcpy.CheckOutExtension("3D")
arcpy.env.workspace = r"C:\ArcGISdata\chp12data.gdb"
# DEM to TIN
# A TIN cannot be saved in a geodatabase, so the output should be
# written to a plain folder (example path: C:\ArcGISdata\tin).
arcpy.RasterTin_3d("dem", r"C:\ArcGISdata\tin")
# TIN to DEM
# The cell size of the new DEM is 50 meters, values are in the
# float type, and the method used to rasterize the TIN is linear
# interpolation.
arcpy.TinRaster_3d(in_tin=r"C:\ArcGISdata\tin",
                   out_raster="demFromTIN",
                   data_type="FLOAT", method="LINEAR",
                   sample_distance="CELLSIZE 50", z_factor="1")
CODE 12.1
Conversion between DEM and TIN.
FIGURE 12.13
Result of Code 12.1. (a) TIN created from DEM. (b) DEM created from the TIN.
Step 2. Compare the original DEM and the new DEM that was regener-
ated from the TIN. Observe the areas with dramatic difference and
consider the reasons (Figure 12.14).
Step 3. Run Code 12.2 in ArcMap Python window to create contours
through a DEM surface raster dataset (Figure 12.15).
Step 4. Run Code 12.2 again, using the new DEM generated from the
TIN as input, to create another contour layer. Compare the two con-
tour layers (Figure 12.16).
FIGURE 12.14
DEM comparison. (a) Original DEM. (b) DEM created from TIN, which is generated from the
original DEM.
"""
The input is "dem" and the output is "contour".
The contour is in 10 meter intervals and starts from 330 meters.
"""
arcpy.Contour_3d(in_raster="dem", out_polyline_features="contour",
                 contour_interval="10", base_contour="330",
                 z_factor="1")
CODE 12.2
Create contour from DEM.
FIGURE 12.15
Result of Code 12.2—the contour created from DEM.
FIGURE 12.16
Contour comparison. (a) Contour generated from the new DEM. (b) Pink line is the contour
generated from original DEM and green line is the one from the new DEM.
1. Run Code 12.3 in ArcMap Python window to create slope from DEM
(Figure 12.17).
2. Run Code 12.4 in ArcMap Python window to create aspect from
DEM (Figure 12.18).
CODE 12.3
Create slope from DEM.
FIGURE 12.17
Result of Code 12.3.
CODE 12.4
Create aspect from DEM.
FIGURE 12.18
Result of Code 12.4.
"""
Set the workspace. All new raster layers generated will be stored
in the workspace.
"""
arcpy.env.workspace = r'C:\ArcGISdata\chp12data.gdb'
"""
Create flow direction with "dem" as input. The "NORMAL" argument means
edge cells are not forced outward, but follow normal flow rules.
"""
fd = arcpy.sa.FlowDirection("dem","NORMAL")
CODE 12.5
Create flow direction from DEM.
FIGURE 12.19
Result of Code 12.5.
# calculate sink
sinks = arcpy.sa.Sink("fd")
# fill the sinks on dem
dem_sinkfilled = arcpy.sa.Fill("dem")
CODE 12.6
Check and fill sink.
CODE 12.7
Recreate flow direction and calculate flow accumulation.
FIGURE 12.20
Flow accumulation layer generated by running Codes 12.3 through 12.7.
PROBLEMS
1. Review the chapter and 3D Analyst extension of ArcGIS.
2. Pick a problem related to 3D surface analysis.
3. Design a solution for the problem, which should include the trans-
formation to TIN and Grid, slope analysis, and aspect or flow direc-
tion analysis.
Advanced Topics
13
Performance-Improving Techniques
13.1 Problems
The performance of a tool or software package is critical to its success,
because end users will not tolerate long waits for processing. Many GIS
algorithms are time-consuming. For example, given 10,000 GIS river features
of the United States, finding the features within Washington, D.C., can be
accomplished by evaluating each river to see whether or not it intersects
with the D.C. boundary. Supposing that one such evaluation takes 0.1 second,
the entire process could take 0.1 × 10,000 = 1000 seconds, or approximately
17 minutes. Such a long wait is not tolerable to end users and is not typical
of commercial software or optimized open-source software. As another example,
if we have 10,000 roads and 10,000 rivers within the United States, and we
want to find all intersecting points, we need to check for possible
intersections between each road and each river. If an average intersection
check takes 0.01 second, the entire process would take
10,000 × 10,000 × 0.01 seconds = 1 million seconds, or approximately
278 hours, which is not tolerable. Many techniques can be utilized to improve
processing performance. We will take a closer look at three different
techniques to examine how and to what extent they can improve performance,
as well as their limitations and potential problems.
integrated inside the CPU, and the computer motherboard data bus inte-
grates the CPU with other on-board components, such as RAM and others
built on the motherboard. Many other components, such as printers, massive
storage devices, and networks, are connected through an extension (e.g., a
cable), from the motherboard. Hard drive (HDD) and massive storage are
among the most frequently used external devices to maintain data/system
files. Recent advances in computer engineering have enabled more tightly
coupled storage on the motherboard using bigger RAM and ROM sizes. In
general, a hard drive is slower than RAM in terms of data access; therefore,
we can speed up a file access process by reading only once from HDD into
RAM and then operating in the memory multiple times instead of reaccess-
ing HDD many times (Yang et al. 2011c).
import os
import struct

# a: read the whole file into memory once, then unpack from the buffer
if bFileBuffer:
    size = os.path.getsize(fileName)
    shpFile = open(fileName, 'rb')
    s = shpFile.read(size)
    shpFile.close()
    # File length is stored big-endian, in 16-bit words, at bytes 24-27
    b = struct.unpack('>i', s[24:28])
    b = b[0] * 2
    # 100-byte file header, then 28 bytes per point record
    featNum = (b - 100) // 28
    layer.minx, layer.miny, layer.maxx, layer.maxy = struct.unpack('<dddd', s[36:68])
    pointer = 100 + 12  # skip file header, record header, and shape type
    for i in range(0, featNum):
        b = struct.unpack('<dd', s[pointer:pointer + 16])
        point = FTPoint(b[0], b[1])
        layer.features.append(point)
        pointer += 28
# b: seek to and read each element from the hard drive as needed
else:
    shpFile = open(fileName, 'rb')
    shpFile.seek(24)
    s = shpFile.read(4)
    b = struct.unpack('>i', s)
    b = b[0] * 2
    featNum = (b - 100) // 28
    s = shpFile.read(72)
    header = struct.unpack('<iidddddddd', s)
    layer.minx, layer.miny, layer.maxx, layer.maxy = header[2], header[3], header[4], header[5]
    for i in range(0, featNum):
        shpFile.seek(100 + 12 + i * 28)
        s = shpFile.read(16)
        b = struct.unpack('<dd', s)
        point = FTPoint(b[0], b[1])
        layer.features.append(point)
    shpFile.close()
CODE 13.1
Reading point data from a shapefile using a single- or multiple-access process: Code
(a) reads all data from the file at once using shpFile.read (size), and unpacks the data from the
content read to s, which is kept as a variable in memory. Code (b) reads data from the hard
drive by jumping through the elements needed and unpacking the data while moving ahead.
method. A laptop will have a less drastic speedup than a desktop using this
method, because the speed difference between accessing RAM and HDD on
a desktop is greater than that of a laptop.
The code for both polyline and polygon shapefiles has been added to the
performance.zip package. Different data layers can be added to the main.py
file, which can then be run to test the time needed for reading the
shapefiles with and without the buffer option.
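The two access strategies can also be tried without the MiniGIS package. The following self-contained sketch writes a minimal point shapefile main file and reads it back both ways; the helper names (write_point_shapefile, read_points_buffered, read_points_seeking) are hypothetical, and the file it writes holds only the fields the readers need. On modern systems the operating system's file cache may hide much of the difference, so measured timings should be treated as indicative only.

```python
import struct


def write_point_shapefile(path, points):
    """Write a minimal point .shp main file: 100-byte header + 28 bytes/point."""
    length_in_words = (100 + 28 * len(points)) // 2
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    with open(path, 'wb') as f:
        # Big-endian part: file code, five unused integers, file length.
        f.write(struct.pack('>7i', 9994, 0, 0, 0, 0, 0, length_in_words))
        # Little-endian part: version, shape type 1 (point), bounding box,
        # and unused z and m ranges.
        f.write(struct.pack('<2i8d', 1000, 1, min(xs), min(ys),
                            max(xs), max(ys), 0.0, 0.0, 0.0, 0.0))
        for number, (x, y) in enumerate(points, start=1):
            f.write(struct.pack('>2i', number, 10))   # record header
            f.write(struct.pack('<i2d', 1, x, y))     # shape type, x, y


def read_points_buffered(path):
    """Read the whole file once, then unpack every point from memory."""
    with open(path, 'rb') as f:
        data = f.read()
    count = (struct.unpack('>i', data[24:28])[0] * 2 - 100) // 28
    return [struct.unpack('<2d', data[112 + 28 * i:128 + 28 * i])
            for i in range(count)]


def read_points_seeking(path):
    """Seek to each point and read it from the file individually."""
    points = []
    with open(path, 'rb') as f:
        f.seek(24)
        count = (struct.unpack('>i', f.read(4))[0] * 2 - 100) // 28
        for i in range(count):
            f.seek(112 + 28 * i)
            points.append(struct.unpack('<2d', f.read(16)))
    return points
```

Wrapping each reader in a timer, for example with time.time() before and after the call, and running it on a file with many points gives a rough comparison of the two strategies on a given machine.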
The more external devices used, the slower the program. For example, if
“print” is added to the FTPoint class initialization function def __init__(self),
the process will become much slower if the same main file runs without
changing anything else.
13.3.2 Multithreading
The performance package integrates a sample multithreading framework.
The logic workflow repeats a 0.01-second process 1000 times. Processed
sequentially, this will require roughly 10 seconds to finish all processes.
import time

# a: ten threads, each handling 100 of the 1000 steps
threads = []
for i in range(10):
    threads.append(SummingThread(i*100, (i+1)*100))
# time.time() is used because time.clock() was removed in Python 3.8
starttime = time.time()
for i in range(10):
    threads[i].start()  # This actually causes the thread to run
for i in range(10):
    threads[i].join()  # This waits until the thread has completed
# At this point, all ten threads have completed
result = 0
for i in range(10):
    result += threads[i].total
# b: one thread handling all 1000 steps
thread = SummingThread(0, 1000)
starttime = time.time()
thread.start()
thread.join()
print('single thread')
print(thread.total)
print(str(time.time() - starttime) + ' seconds\n')
CODE 13.2
Execute the same process (takes 0.01 second) 1000 times in 10 threads (a) versus in one
single thread (b).
However, you can break the required processes into 10 groups, with each
group processed by a thread, so all 10 groups can be executed concurrently.
Ideally, this approach could reduce the processing time from 10 seconds
to 1 second. Code 13.2a creates a 10-thread list and Code 13.2b creates one
thread and executes the same number of processes. The SummingThread is
inherited from the Python module threading class Thread.
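The body of SummingThread is not shown in the text, so the following is a hypothetical reconstruction that makes the comparison runnable: each step sleeps for a fixed interval to stand in for the 0.01-second process and then adds a number to a running total. The helper run_threads and the step_seconds parameter are assumptions introduced for this sketch.

```python
import threading
import time


class SummingThread(threading.Thread):
    """Sum the integers in [low, high), pausing briefly at every step."""

    def __init__(self, low, high, step_seconds=0.01):
        super().__init__()
        self.low = low
        self.high = high
        self.step_seconds = step_seconds
        self.total = 0

    def run(self):
        for i in range(self.low, self.high):
            time.sleep(self.step_seconds)  # stands in for the real work
            self.total += i


def run_threads(thread_count, step_count, step_seconds=0.01):
    """Split step_count steps evenly over thread_count threads.

    Returns the combined total and the elapsed wall-clock time in seconds.
    """
    chunk = step_count // thread_count
    threads = [SummingThread(i * chunk, (i + 1) * chunk, step_seconds)
               for i in range(thread_count)]
    start = time.time()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(thread.total for thread in threads), time.time() - start
```

Calling run_threads(10, 1000) and run_threads(1, 1000) reproduces the comparison in Code 13.2. The near tenfold gain appears because the threads spend their time waiting; work that keeps the Python interpreter busy with computation would benefit far less from threads.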
The code is included in the multithreading.py file, and running it outputs
the time spent by each method. Running the Python file shows that the
10-thread approach is about 10 times faster than the single-thread approach.
The code can be run on one or more computers, and the time outputs compared,
to observe the improvements.
# a: load the three layers concurrently, one thread per layer
if multithreading:
    starttime = time.time()
    lr1 = AddMapLayer(map, 'amtk_sta', 'yellow')
    lr2 = AddMapLayer(map, 'amtk_sta', 'red')
    lr3 = AddMapLayer(map, 'amtk_sta', 'pink')
    lr1.start()
    lr2.start()
    lr3.start()
    lr1.join()
    lr2.join()
    lr3.join()
    print(str(time.time() - starttime) + ' seconds')
# b: load the three layers one after another
else:
    starttime = time.time()
    map.addLayer('amtk_sta', 'yellow')
    map.addLayer('amtk_sta', 'blue')
    map.addLayer('amtk_sta', 'red')
    print(str(time.time() - starttime) + ' seconds')
CODE 13.3
Loading three data layers concurrently (a) or in sequence (b).
to read data (a) versus reading data sequentially (b). The AddMapLayer class
is a multithreading class derived from the Thread class of the threading
module.
The main Python file can be run with the multithreading Boolean variable
switched on or off, recording the time spent with each approach. As an
exercise, run the code 5 times with each approach and record the average
values. Also try this on different computers, or compare your results with
peers if they used different computers.
The example in Section 13.3.3 reads multiple shapefiles. When several threads
access the hard drive intensively, they compete for the same I/O resource,
which degrades the performance gain. If the file buffer strategy discussed in
Section 13.3.1 is used, each file is read from the hard drive only once, so
more threads can be executed concurrently. The actual performance gain can
be determined by testing a combination of different computing techniques.
Many twenty-first century challenges require GIS data, processes, and
applications, and users are widely distributed across the globe (Ostrom et al.
1999). For example, when a tsunami hits the Indian Ocean, people along the
coast are impacted and need GIS to guide them in making safe decisions.
An application built on sequential strategy will not be able to handle this
problem in a way that satisfies end users or provides timely decision support
information. Therefore, supercomputers are adopted to provide information
in near real time by processing the data much more rapidly using closely
coupled CPUs and computers (Zhang 2010). Grid and cloud computing infra-
structure are utilized to share data, information, and processing among users
(Yang et al. 2013). Using such infrastructure would help improve the appli-
cation performance. But the detailed improvements have to be tested and
optimized to gain the best performance in a computing infrastructure. In
Xie et al. (2010), a high performance computing (HPC) example is illustrated,
which has a dust model parallelized and running on an HPC environment
to speed up the process. Results show that performance improves rapidly as
the first CPUs/servers are added to the geospatial simulation. The time spent
is reduced by nearly half when increasing from one core to two cores, and
from two cores to four cores; however, increasing beyond eight CPU cores
will not further improve performance.
Although concurrent processing can help increase processing speed, the dust
simulation is a spatiotemporal phenomenon, meaning that the dust concen-
tration moves across different simulation subdomains to maintain its natural
continuity. This process of sharing dust data among different subdomains,
known as model synchronization, is critical. The running time may actually
increase if the domain is divided among subdomains on different computers.
The benefit of running in parallel and the cost of synchronizing is observed to
reach a balance at a certain point—in this case, when using eight CPU cores.
spatial principles will optimize the process by filtering out complex calcula-
tions. An important component is the feature bounding box, defined by the
four coordinates of minx, miny, maxx, and maxy for a minimized rectangle
enclosing a feature. The spatial pattern or principle is that when the
bounding boxes of two features are disjoint, the two features themselves must
be disjoint. If you are to calculate the intersection of a river data layer
and a road data layer (Section 13.1), the bounding box of a river in
Washington, D.C., will not intersect with the bounding box of a road in
California. This spatial pattern can be utilized to filter out most features
before starting the complex computation of the intersection as introduced in
Chapter 8. Section 13.4.1 uses the data layer intersection as an example to
introduce how to build the algorithm into MiniGIS and how to optimize the
algorithm using a bounding box.
Another simplified spatial pattern, applied to one-dimensional data, is to
sort features (such as points) according to a spatial dimension and to conduct
filtering according to a tree structure, such as binary tree. By expanding this
idea to two dimensions, many different types of spatial indices can be built
to speed up spatial relationship calculations (detailed in Section 13.4.2).
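The one-dimensional case can be sketched with Python's standard bisect module. A binary search over a sorted list is used here as a stand-in for the binary tree mentioned above, since both locate the range boundaries in logarithmic time; the function name range_query and the sample coordinates are hypothetical.

```python
import bisect


def range_query(sorted_values, low, high):
    """Return the values in [low, high] from an already sorted list.

    Two binary searches find the slice boundaries, so values outside the
    range are never examined one by one.
    """
    start = bisect.bisect_left(sorted_values, low)
    end = bisect.bisect_right(sorted_values, high)
    return sorted_values[start:end]


# Sort the x coordinates once, then answer many range queries cheaply.
x_coordinates = sorted([5.0, 1.0, 9.0, 3.0, 7.0])
print(range_query(x_coordinates, 3.0, 7.0))  # prints [3.0, 5.0, 7.0]
```

The cost of sorting is paid once; each query afterward avoids scanning the features that lie outside the requested interval.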
1. Check the intersection of two data layers in the map (can be selected
from existing layers in GUI).
2. Check the intersection of two features in the data layers, using one
from each layer (rivers and roads), and keep all intersecting points
(Layer class).
3. Check the intersection of every line segment pair, using one from
each layer (rivers and roads), and keep all intersecting points
(Polyline class).
4. Check the intersection and calculate the intersecting point (LineSeg
class).
To speed up the process, add the bounding box check to the first three
steps to:
1. Check whether two data layers are intersecting with each other and
return false if not.
2. Check whether two features’ bounding boxes are intersecting with
each other and return false if not.
3. Check whether two line segments’ bounding boxes intersect with
each other and return false if not.
If this statement is true, then the two bounding boxes cannot intersect with
each other. If this is not true, the two bounding boxes intersect with each other
and we can proceed with the following, more time-consuming, calculations.
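The bounding box filter can be sketched independently of the MiniGIS classes. Boxes are represented here as (minx, miny, maxx, maxy) tuples; the function names and the sample rectangles are hypothetical.

```python
def bboxes_disjoint(a, b):
    """True if boxes a and b, each (minx, miny, maxx, maxy), cannot intersect."""
    return a[0] > b[2] or a[1] > b[3] or a[2] < b[0] or a[3] < b[1]


def candidate_pairs(boxes_a, boxes_b):
    """Index pairs whose boxes overlap and so need the detailed calculation."""
    return [(i, j)
            for i, a in enumerate(boxes_a)
            for j, b in enumerate(boxes_b)
            if not bboxes_disjoint(a, b)]


rivers = [(0, 0, 2, 2), (10, 10, 12, 12)]
roads = [(1, 1, 3, 3), (20, 20, 21, 21)]
# Only one of the four river-road pairs survives the filter.
print(candidate_pairs(rivers, roads))  # prints [(0, 0)]
```

Each box test costs four comparisons, far less than a segment-by-segment intersection, so discarding pairs this way pays off whenever most features are far apart.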
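[Figure: polygons grouped into nested rectangles (left) and the resulting tree (right), with rectangles A and B at the top level, C to F at the second level, and G to N at the leaf level.]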
FIGURE 13.1
R-Tree example.
four polygons. Each rectangle is further divided into two smaller rectangles
that are, again, divided into smaller rectangles, each containing one polygon.
In dividing, three types of criteria were applied: (a) the closer polygons are
put in the same rectangle, (b) each rectangle is the minimum rectangle that
includes the bounding boxes of all polygons within that rectangle, and (c) the
process of division yields two rectangles from the original one. These criteria
ensure that the closer polygons are inside the same branch of the R-Tree and
that the R-Tree is a balanced binary tree, which provides better performance.
There could be other criteria applied in the division process, such as minimiz-
ing the overlap of rectangles from the same level of division. The left side of
the figure shows mapping polygons and R-Tree division; the right side shows
the resulting R-Tree. When searching for features, only the branch that inter-
sects with the searching rectangle will be further searched. This will greatly
improve the performance when there are many features included.
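The search behavior can be illustrated with a much-simplified tree of nested bounding rectangles. This is not a full R-Tree: instead of clustering nearby polygons, the sketch simply sorts features by the x center of their bounding boxes and splits the list in half, which is an assumption made to keep the example short. It expects at least one feature, and all names are hypothetical.

```python
def intersects(a, b):
    """True if boxes a and b, each (minx, miny, maxx, maxy), overlap."""
    return not (a[0] > b[2] or a[1] > b[3] or a[2] < b[0] or a[3] < b[1])


def enclose(boxes):
    """Minimum rectangle that contains every box in the list."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def build_tree(features):
    """Build a balanced binary tree from a list of (name, bbox) pairs.

    Each node is a tuple (bbox, name, children); leaves have no children.
    """
    if len(features) == 1:
        name, bbox = features[0]
        return (bbox, name, [])
    ordered = sorted(features, key=lambda f: f[1][0] + f[1][2])
    half = len(ordered) // 2
    children = [build_tree(ordered[:half]), build_tree(ordered[half:])]
    return (enclose([child[0] for child in children]), None, children)


def search(node, query):
    """Names of all leaf features whose boxes intersect the query box."""
    bbox, name, children = node
    if not intersects(bbox, query):
        return []            # the whole branch is skipped
    if not children:
        return [name]
    found = []
    for child in children:
        found.extend(search(child, query))
    return found


features = [('G', (0, 0, 1, 1)), ('H', (2, 0, 3, 1)),
            ('I', (10, 10, 11, 11)), ('J', (12, 10, 13, 11))]
tree = build_tree(features)
print(search(tree, (0.5, 0.5, 2.5, 0.5)))  # prints ['G', 'H']
```

In this query the branch holding I and J is rejected with a single rectangle test, which is where the savings come from when a layer contains many features.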
line segments, and (d) the line segment intersection algorithm. We can add
a bounding box check for the first three steps, which will reduce the num-
ber of time-consuming calculations of (d) and improve the program perfor-
mance. The three bboxcheck functions are defined in the Layer, Polyline, and
LineSeg classes (Code 13.4). Because each layer has many polylines and each
polyline has many line segments, this process could significantly improve
the performance. Code 13.4 illustrates how the intersect and bboxcheck
functions are defined in the Layer class. The Layer's bboxcheck is called by
the Map class. The Layer's intersect function calls the Feature's bboxcheck
to avoid calculating intersections for two features whose bounding boxes do
not intersect. The Polyline class has similar definitions, with its bboxcheck
called by Layer class objects, and its intersect function calling the
bboxcheck function of LineSeg class objects.
Once the calculations are completed, you can display the resulting points
on the GUI. This can be handled in several ways, including (a) adding a data
layer to store the points as the “intersecting point layer,” (b) maintaining
a data structure, for example, a list of intersecting points of map object to
keep all points, and (c) associating the points with one of the two layers. To
simplify the process, choose the second method by maintaining a list data
structure. After the intersection calculations, obtain a list of points defined
as self.intersectPoints in the Map class of map.py file. When displaying the
points after the calculation, you need to go through a process similar to what
you used to display points on Tkinter (Code 13.5).
def bboxcheck(self, layer):
    # The boxes are disjoint if one lies entirely to one side of the other
    if (self.minx > layer.maxx or self.miny > layer.maxy or
            self.maxx < layer.minx or self.maxy < layer.miny):
        return False
    else:
        return True

def intersect(self, layer):
    intPoints = []
    for feature1 in self.features:
        for feature2 in layer.features:
            # noBoundingboxCheck is assumed to be a flag defined elsewhere
            # that disables the filter for timing comparisons
            if feature1.bboxcheck(feature2) or noBoundingboxCheck:
                retPts = feature1.intersect(feature2)
                if retPts:
                    for point in retPts:
                        intPoints.append(point)
    return intPoints
CODE 13.4
Using bounding box check to filter out most intersection calculations in Layer class.
def vis(self):
    self.can.delete('all')
    self.calculate()
    for layer in self.layers:
        for feature in layer.features:
            feature.vis(self, layer.color)
    for point in self.intersectPoints:
        xy = self.transform(point)
        self.can.create_rectangle(xy[0]-4, xy[1]-4, xy[0]+4,
                                  xy[1]+4, fill='brown')
    self.can.pack()
CODE 13.5
Visualize the intersecting point as rectangles when displaying the data layers.
FIGURE 13.2
Visualization result of Code 13.5.
PROBLEMS
The objective of this homework is to understand and design a comprehen-
sive performance tuning and management process.
1. Please select four different datasets or use the four datasets provided.
2. Design a comprehensive performance testing experiment.
3. Conduct the tests using the MiniGIS package.
4. Compare the performance improvements before and after adopting
the techniques.
5. Explain the performance differences and discuss the trade-off when
using different techniques.
NOTES: The results may differ according to the datasets selected and the
computers used to run the MiniGIS. The applicability of the three different
categories of techniques will determine the final performance of the software.
14
Advanced Topics
GIS algorithms and programming are critical to the research and develop-
ment in advancing geographical information sciences (GIScience), because
most mature GIS software packages are not flexible enough to be revised for
the purpose of testing new ideas, models, and systems. This chapter intro-
duces how GIS programming and algorithms are utilized in advancing sev-
eral GIScience frontiers.
FIGURE 14.1
(a) The architecture of classic data model. (b) The architecture of the improved data model.
(From http://www.unidata.ucar.edu/software/netcdf/papers/nc4_conventions.html.)
altitude) and one temporal dimension (time). Each layer in the array has the
same latitude and longitude dimension, and is called the “grid,” which is
used to store the values for each of the layer-specific variables. These grids
are further grouped by altitude. The variables may have additional attributes
to explain more properties, such as the variable’s full name and the unit for
the variable’s value. However, the classic data model has two obvious limita-
tions: (1) lack of support for nested structure, ragged arrays, unsigned data
types, and user-defined types; and (2) limited scalability due to the flat name
space for dimensions and variables.
To address the limitations of the classic data model, a new data model
is proposed and implemented (Figure 14.1b). It adds a top-level, unnamed
group to contain additional named variables, dimensions, attributes, groups,
and types. Groups are like directories in a file system, each with its own set
of named dimensions, variables, attributes, and types. Every file contains at
least the root group. A group may also contain subgroups to organize hier-
archical datasets. The variables depicted in Figure 14.2 can be divided into
different groups by a certain characteristic, such as the model groups that
generated these data. When storing these variables in a physical file, each
2D grid will be decomposed into a one-dimensional byte stream and stored
separately, one by one, in a data file.
[Figure: variables 1 to n on latitude-longitude grids, stacked by vertical layer (1 to m) and time, and written out as a byte stream.]
FIGURE 14.2
FIGURE 14.3
The structure of spatiotemporal index.
HDF4, all variables at a certain time point can be stored in a single file instead
of multiple files.
To enable MERRA data to be analyzed in parallel without requiring
preprocessing, Li et al. (2016) utilized the netCDF library for Java to
extract the data structure information of MERRA files and build a
spatiotemporal index, which is implemented as a hash index. Figure 14.3
depicts the structure of the spatiotemporal index. The index consists of five
components: gridId, startByte, endByte, fileId, and nodeList. The first of
these, gridId, records
the variable name, time, and altitude information for each grid; startByte
and endByte indicate the exact byte location of the grid in a file; fileId records
the file location where the grid is stored; and nodeList records the node loca-
tion where grids are physically stored in a Hadoop Distributed File System
(HDFS). The key for the hash index is gridId, and the others are treated as the
values. When querying the MERRA data, the spatiotemporal index will be
traversed first. If the gridId has the same variable name as one of the queried
variables, then the time and altitude information in gridId will be further
compared with the input spatiotemporal boundary. If they are within the
spatiotemporal boundary, the values (startByte, endByte, nodeList, and fileId)
will be fetched out and utilized to read the real data out from physical disks
using HDFS I/O API. The spatiotemporal index enables users to access data
by reading only the specific data constrained by the input spatiotemporal
boundary, eliminating the need to examine all of the data.
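The structure of such an index can be sketched with a Python dictionary. This is an illustration of the idea only, not the implementation of Li et al. (2016), which is built with the netCDF library for Java on HDFS; the function names, variable names, byte offsets, file names, and node names below are all made up for the example.

```python
def add_grid(index, variable, time, altitude,
             start_byte, end_byte, file_id, node_list):
    """Register one grid; the key plays the role of gridId."""
    index[(variable, time, altitude)] = (start_byte, end_byte,
                                         file_id, node_list)


def query(index, variables, time_range, altitude_range):
    """Return the index entries that fall inside the spatiotemporal boundary."""
    hits = []
    for (variable, time, altitude), value in index.items():
        if variable not in variables:
            continue
        if not time_range[0] <= time <= time_range[1]:
            continue
        if not altitude_range[0] <= altitude <= altitude_range[1]:
            continue
        hits.append(((variable, time, altitude), value))
    return hits


index = {}
add_grid(index, 'T', 0, 1, 0, 4095, 'file_a', ['node1', 'node2'])
add_grid(index, 'T', 1, 1, 4096, 8191, 'file_a', ['node1', 'node2'])
add_grid(index, 'U', 0, 1, 8192, 12287, 'file_a', ['node3'])

# Only the byte ranges of the matching grids would then be read from disk.
for grid_id, (start, end, file_id, nodes) in query(index, {'T'}, (0, 0), (1, 1)):
    print(grid_id, start, end, file_id, nodes)
```

Because each matching entry carries exact byte offsets, a reader can jump straight to the requested grids instead of scanning whole files.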
One month (January 2015) of the MAT1NXINT product (45.29 GB) was
used as experimental data. MapReduce was adopted to compute the daily
mean in parallel for a specified climate variable over a specified spatiotem-
poral range. Two scenarios were evaluated: the first scenario, as the baseline,
was performed without using the spatiotemporal index; the second was per-
formed using the spatiotemporal index. The experiments were conducted
on a Hadoop cluster (version 2.6.0) consisting of seven computer nodes (one
master node and six slave nodes) connected via 1 Gigabit Ethernet (Gbps).
[Bar chart: run time in seconds (vertical axis, 0 to 1000) against the number of variables processed (horizontal axis, 11 to 111).]
FIGURE 14.4
Run time for computing the daily global mean for January 2015 for different numbers of
variables.
Each node was configured with eight CPU cores (2.35 GHz), 16 GB RAM,
and CentOS 6.5.
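The daily-mean computation used in this experiment follows the usual map/reduce pattern, which can be imitated in plain Python without a Hadoop cluster. The sketch below is an assumption-laden illustration, not the code used in the experiment: the grid dictionaries, the variable names, and the values are invented, and the mean is unweighted (a true global mean on a latitude-longitude grid would normally weight cells by area).

```python
from collections import defaultdict


def map_phase(grids, variable):
    """Emit (day, (partial_sum, count)) for every grid of the requested variable."""
    for grid in grids:
        if grid["variable"] != variable:
            continue
        day = grid["time"][:10]  # 'YYYY-MM-DD' prefix of an ISO-style timestamp
        values = grid["values"]
        yield day, (sum(values), len(values))


def reduce_phase(pairs):
    """Combine the partial sums per day and return {day: mean}."""
    totals = defaultdict(lambda: [0.0, 0])
    for day, (partial_sum, count) in pairs:
        totals[day][0] += partial_sum
        totals[day][1] += count
    return {day: s / n for day, (s, n) in totals.items()}


# Made-up hourly grids; each "values" list stands for the cells of one grid.
grids = [
    {"variable": "VAR_A", "time": "2015-01-01T00:30", "values": [1.0, 2.0, 3.0]},
    {"variable": "VAR_A", "time": "2015-01-01T01:30", "values": [3.0, 4.0, 5.0]},
    {"variable": "VAR_A", "time": "2015-01-02T00:30", "values": [10.0, 20.0]},
    {"variable": "VAR_B", "time": "2015-01-01T00:30", "values": [99.0]},
]
daily_mean = reduce_phase(map_phase(grids, "VAR_A"))
print(daily_mean)
```

Emitting (sum, count) pairs rather than per-grid means keeps the reduce step correct when grids contain different numbers of cells.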
Figure 14.4 shows the run times of the baseline and the indexing approach for
different numbers of variables. As the number of variables in the query
increased, the run time for the baseline increased from 55 to 1042 seconds,
nearly 19 times longer. In contrast, when the spatiotemporal index was
employed, the run time increased from 35 seconds to 85 seconds, only
2.4 times longer. This comparison demonstrates that the spatiotemporal index
can significantly improve the data access rate for the MERRA data stored in
HDFS. Two factors lead to this improvement:
14.2.1 Data
Three datasets are used in this example: tweets in NYC during Chinese New
Year, the Asian population of NYC at the census tract level, and the number of
Asian restaurants in NYC at the census tract level. The Twitter data were
collected using the Twitter API; the Asian population data were retrieved from
the U.S. Census Bureau; and the number of Asian restaurants was acquired
using the Google Places API (Figure 14.5).
To collect the Twitter data, a geographical bounding box of NYC is first
specified to collect geotagged tweets from 1 week before to 1 week after Chinese
New Year. Second, a number of keywords (including “spring festival,”
“Chinese new year,” “dumplings,” “china lantern,” and “red envelope”) are used
to filter the collected tweets, yielding 2453 tweets related to
Spring Festival in New York City. Third, the filtered tweets are saved
into a PostgreSQL table containing detailed information such as “Tweet_ID,”
“UserName,” “TimeCreated,” “Lat,” “Lon,” “Hashtag,” “Retweet,”
“ReplyTo,” and “Text.” Figure 14.6 gives an example of our Twitter data output.
Note that, although the number of geotagged tweets is still very small
FIGURE 14.5
Asian population in New York City at census tract level (left) and the number of Asian restau-
rants in New York City at census tract level (right).
FIGURE 14.6
An example of Twitter data output.
FIGURE 14.7
Geographical distribution of collected Twitter data in New York City. (From CartoDB.)
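The first two collection steps in Section 14.2.1 (bounding box, then keyword filter) can be sketched as follows. This is a minimal illustration, not the collection code used for the study: the bounding box coordinates are only an approximate, assumed extent for NYC, the sample tweets are invented, and only the field names (Tweet_ID, Lat, Lon, Text) and the keywords come from the text.

```python
KEYWORDS = ("spring festival", "chinese new year", "dumplings",
            "china lantern", "red envelope")

# Assumed, approximate extent of NYC: (min_lon, min_lat, max_lon, max_lat).
NYC_BBOX = (-74.26, 40.49, -73.69, 40.92)


def in_bbox(lon, lat, bbox=NYC_BBOX):
    """Return True if the point falls inside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def matches_keywords(text, keywords=KEYWORDS):
    """Case-insensitive test for any keyword appearing in the tweet text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def filter_tweets(tweets):
    """Keep geotagged tweets inside the box whose text mentions a keyword."""
    return [t for t in tweets
            if in_bbox(t["Lon"], t["Lat"]) and matches_keywords(t["Text"])]


# Invented sample records using the column names listed in the text.
tweets = [
    {"Tweet_ID": 1, "Lat": 40.7158, "Lon": -73.9970,
     "Text": "Happy Chinese New Year from Chinatown!"},
    {"Tweet_ID": 2, "Lat": 40.7580, "Lon": -73.9855,
     "Text": "Stuck in traffic again"},
    {"Tweet_ID": 3, "Lat": 34.0522, "Lon": -118.2437,
     "Text": "Dumplings for the spring festival"},
]
selected = filter_tweets(tweets)
print([t["Tweet_ID"] for t in selected])
```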
$$\text{Bandwidth} = 0.9 \times \min\left(SD,\ \sqrt{\frac{1}{\ln(2)}} \times D_m\right) \times n^{-0.2} \qquad (14.1)$$
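Equation 14.1 can be evaluated directly from a list of point coordinates. The sketch below rests on assumptions, because the definitions of the symbols are not given in the text available here: SD is taken to be the standard distance of the points, D_m the median distance from the mean center, and n the number of points (the convention used for the default search radius in kernel density tools), with the factor before D_m read as the square root of 1/ln(2). The sample coordinates are invented.

```python
import math
import statistics


def default_bandwidth(points):
    """Evaluate Equation 14.1 for a list of (x, y) points in projected units.

    Assumed meanings: SD = standard distance, Dm = median distance from the
    mean center, n = number of points.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    distances = [math.hypot(x - mean_x, y - mean_y) for x, y in points]
    sd = math.sqrt(sum(d * d for d in distances) / n)
    dm = statistics.median(distances)
    return 0.9 * min(sd, math.sqrt(1.0 / math.log(2)) * dm) * n ** -0.2


# Four invented points at the corners of a 2 x 2 square.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
bandwidth = default_bandwidth(points)
print(round(bandwidth, 4))
```

For these four points every distance from the mean center equals the square root of 2, so SD is the smaller term inside the minimum and the result reduces to 0.9 × √2 × 4^(-0.2).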
FIGURE 14.8
Heat map of tweets in New York City (red represents high density, while blue represents low
density).
provides a global model of the variable or process you are trying to
understand or predict (early death/rainfall), creating a single regression
equation to represent that process. Geographically weighted regression (GWR)
is one of several spatial regression techniques and is increasingly used in
geography and other disciplines. GWR provides a local model of the variable
or process to be understood or predicted by fitting a regression equation
to every feature in the dataset. When used properly, these methods provide
powerful and reliable statistics for examining and estimating linear
relationships.
In this case, OLS is applied as a starting point to explore the relationship
between the number of involved Twitter users and demographic variables
such as Asian population. To achieve this, data can be summarized by each
census tract so that a regression analysis can be conducted at the census tract
level. Two variables are selected to explain the number of tweets in each
census tract: one is Asian population, and the other is the number of Asian
restaurants, which is regarded as a proxy for Asians’ real-world activities. A
regular OLS model is performed initially, with the decision of whether or not
to employ the GWR model depending on the OLS results.
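A two-variable OLS fit of this kind can be sketched without any GIS software by solving the normal equations. The code below is an illustration under stated assumptions, not the analysis that produced the results in this chapter: the tract values are invented and were constructed to follow an exact linear relationship (tweets = 1 + 3 × restaurants + 0.002 × population) so that the fitted coefficients are easy to check; real data would not fit perfectly.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """Fit y = b0 + b1*x1 + b2*x2 + ... and return (coefficients, R squared)."""
    design = [[1.0] + list(row) for row in x_rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Invented census tracts: (number of Asian restaurants, Asian population).
tracts = [(0, 500), (2, 1500), (5, 1000), (1, 3000), (8, 4000), (3, 200)]
tweets = [1 + 3 * r + 0.002 * p for r, p in tracts]
beta, r_squared = ols(tracts, tweets)
print([round(b, 4) for b in beta], round(r_squared, 4))
```

Solving the normal equations directly is fine for a small illustration; statistical packages usually rely on more numerically stable decompositions and also report standard errors and t statistics.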
TABLE 14.1
Result of OLS Analysis
Variable               Coef      StdError  t_Stat     Prob
Number of restaurants  2.906641  0.093339  31.140589  0
Asian population       0.001557  0.000218  7.151176   0
Table 14.1 shows the result of an OLS analysis. As can be seen, the R² value
for this OLS model is only 0.35, which means that only 35% of the variation in
the dependent variable can be explained by these two independent variables.
In addition, strong spatial autocorrelation is found in the spatial distribution
of the standard residual for each census tract (Figure 14.9); thus, conducting
a GWR analysis is deemed worthwhile. The global R² value for the GWR
model is 0.699, which is much higher than that for the OLS model (Figure
14.10), implying that these relationships behave differently in different parts
of the study area. Figure 14.10 shows the local R² value for GWR for each
census tract. As can be seen, the highest local R² values are clustered in
Manhattan and part of Queens.
FIGURE 14.9
The standard residual of OLS of each census tract. (Legend: standard residual, in classes from
below -2.5 to above 2.5 standard deviations.)
FIGURE 14.10
The local R² of GWR of each census tract. (Legend: local R², in classes from 0.000015 to
0.870157.)
FIGURE 14.11
System architecture.
the Web, and using the standard REST ensures that data access remains
interoperable.
Several analysis functions used by data scientists, such as the Taylor diagram
for model validation, are executed on the application server, which
leverages high-performance computing resources. This allows large climate
datasets to be processed much faster than on a stand-alone analysis system.
Data analysis requests are sent to the application server as HTTP requests.
The server then executes the analytical models and returns the results
(figures, file paths, or values) to the client for rendering. On the client
side, the system provides a user-friendly environment with geovisual
analytical tools, including interactive tools, dynamic graphs/maps, and
live-linked views of data representation.
This system has been implemented and is able to support several types of
climate data, including the MERRA data introduced in Section 14.1, Climate
Forecast System Reanalysis (CFSR) data, ECMWF Interim Reanalysis (ERA-
Interim) data, CPC Merged Analysis of Precipitation (CMAP) data, Global
Precipitation Climatology Project (GPCP) data, and ModelE simulation
data, all of which are raster data. CFSR provides a global reanalysis of past
weather from January 1979 through March 2011 at a horizontal resolution
of 0.5°, and can effectively estimate the observed state of the atmosphere.
ERA-Interim, which provides reanalysis data from 1979 to the present, is
an atmospheric model and assimilation system featuring improved low-frequency
variability and stratospheric circulation analysis compared with its previous
generation, ERA-40. CMAP merges five kinds of satellite estimates (GPI, OPI,
SSM/I scattering, SSM/I emission, and MSU) to provide the global gridded
precipitation data from 1979 to near the present with a 2.5° spatial resolution.
GPCP combines the data from rain gauge stations, satellites, and sounding
observations to estimate monthly rainfall on a 2.5° global grid from 1979
to the present. ModelE is a general circulation model (GCM) developed by
NASA GISS that simulates more than 300 variables on a global scale at a spa-
tial resolution of 4° along parallels and 5° along meridians. The outputs are
monthly binary data with a size of 16 MB.
• Time series plotting (Figure 14.12): Users can select multiple data
variables in multiple areas of interest (AOIs) for the same time period
and plot the time series for better comparison.
• Correlation analyses for two variables with the same AOI (Figure
14.13): Display the relationships of any two variables or two AOIs
using scatter plots.
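The relationship shown in such a scatter plot is commonly summarized with a Pearson correlation coefficient. The sketch below is an illustration only: the text does not say which statistic the system computes, and the two short series are invented values standing in for two variables extracted over the same AOI and time period.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length and at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented values of two variables for the same AOI at five time steps.
variable_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
variable_2 = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(variable_1, variable_2)
print(round(r, 3))
```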
FIGURE 14.12
Time series plotting for two variables and two AOIs.
FIGURE 14.13
Correlation analyses for two variables.
FIGURE 14.14
Four variables displayed in four windows.
FIGURE 14.15
GUI for Taylor diagram service.
FIGURE 14.16
WRF-NMM domain decomposition for parallel running (NCAR, MMM Division, 2004).
FIGURE 14.17
Two scheduling methods for dispatching eight subdomains to two computing nodes (yellow and
blue): (a) non-cluster scheduling and (b) cluster scheduling.
FIGURE 14.18
Different decomposition methods and their corresponding performance.
FIGURE 14.19
Low-resolution model domain area and subregions (AOI, area of interest) identified for
high-resolution model execution.
PROBLEMS
Requirement:
Deliverables:
• Project Report
• Explain the paper problem.
• Explain the solution.
• Explain how the algorithms learned in class can be adopted to
develop the solution.
• Keep it as simple but clear as possible, and try to use a diagram
or picture in your report.
References
Agarwal, D., Puri, S., He, X., and Prasad, S.K. 2012. A system for GIS polygonal overlay
computation on Linux cluster-an experience and performance report. In Parallel
and Distributed Processing Symposium Workshops and PhD Forum (IPDPSW), 2012
IEEE 26th International, 1433–9. IEEE, Shanghai, China.
Aho, A.V. and Ullman, J.D. 1972. The Theory of Parsing, Translation, and Compiling.
Upper Saddle River, NJ: Prentice Hall, Inc.
Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., and Saltz, J. 2013. Hadoop GIS:
A high performance spatial data warehousing system over MapReduce.
Proceedings of the VLDB Endowment 6(11):1009–20.
Arnold, K., Gosling, J., and Holmes, D. 2000. The Java Programming
Language, Vol. 2. Reading, MA: Addison-Wesley.
Benedetti, A., Baldasano, J.M., Basart, S. et al. 2014. Operational dust prediction. In
Mineral Dust: A Key Player in the Earth System, eds. P. Knippertz, and W.J.-B.
Stuut, 223–65. Dordrecht: Springer Netherlands.
Bondy, J.A. and Murty, U.S.R. 1976. Graph Theory with Applications (290). New York:
Citeseer.
Bosch, A., Zisserman, A., and Munoz, X. 2007. Image classification using random for-
ests and ferns. International Conference on Computer Vision, Rio de Janeiro, Brazil.
Bourke, P. 1988. Calculating the Area and Centroid of a Polygon. Swinburne University of
Technology, Melbourne, Australia.
Chang, K.T. 2006. Introduction to Geographic Information Systems. Boston, MA: McGraw-
Hill Higher Education.
Chen, Z. and Guevara, J.A. 1987. Systematic selection of very important points (VIP)
from digital terrain models for construction triangular irregular networks.
Proceedings, AutoCarto 8, ASPRS/ACSM, Falls Church, VA, 50–6.
Crooks, A., Croitoru, A., Stefanidis, A., and Radzikowski, J. 2013. Earthquake: Twitter
as a distributed sensor system. Transactions in GIS 17:124–47.
Dale, M.R., Dixon, P., Fortin, M.J., Legendre, P., Myers, D.E., and Rosenberg, M.S.
2002. Conceptual and mathematical relationships among methods for spatial
analysis. Ecography 25(5):558–77.
Dee, D. and National Center for Atmospheric Research Staff. eds. The Climate Data
Guide: ERA-Interim. https://climatedataguide.ucar.edu/climate-data/era-interim
(accessed June 9, 2016).
Dijkstra, E.W. 1959. A note on two problems in connexion with graphs. Numerische
Mathematik 1:269–71.
Eckerdal, A., Thuné, M., and Berglund, A. 2005. What does it take to learn ‘program-
ming thinking’? In Proceedings of the First International Workshop on Computing
Education Research, 135–42. ACM, Seattle, WA.
ESRI. 1998. ESRI Shapefile Technical Description. An ESRI White Paper, 34.
ESRI. 2016a. What Is ModelBuilder?,
http://pro.arcgis.com/en/pro-app/help/analysis/geoprocessing/modelbuilder/what-is-modelbuilder-.htm
(accessed September 9, 2016).
ESRI. 2016b. What Is ArcPy?,
http://pro.arcgis.com/en/pro-app/arcpy/get-started/what-is-arcpy-.htm
(accessed September 9, 2016).
Fowler, R.J. and Little, J.J. 1979. Automatic extraction of irregular network digital ter-
rain models. Computer Graphics 13:199–207.
Fowler, M. 2004. UML Distilled: A Brief Guide to the Standard Object Modeling Language.
Boston, MA: Addison-Wesley Professional.
Gittings, B.M., Sloan, T.M., Healey, R.G., Dowers, S., and Waugh, T.C. 1993. Meeting
expectations: A review of GIS performance issues. In Geographical Information
Handling–Research and Applications, ed. P.M. Mather, 33–45. Chichester: John
Wiley & Sons.
Goodchild, M.F. 1992. Geographical information science. International Journal of
Geographical Information Systems 6(1):31–45.
Gosselin, T.N., Georgiadis, G., and Digital Accelerator Corporation. 2000. Digital
data compression with quad-tree coding of header file. U.S. Patent 6,094,453.
Grafarend, E. 1995. The optimal universal transverse Mercator projection. In Geodetic
Theory Today, 51–51. Berlin, Heidelberg: Springer.
Guttman, A. 1984. R-trees: A dynamic index structure for spatial searching.
Proceedings of the 1984 ACM SIGMOD International Conference on Management of
Data 14(2):47–57. ACM.
Healey, R., Dowers, S., Gittings, B., and Mineter, M.J. eds. 1997. Parallel Processing
Algorithms for GIS. Bristol, PA: CRC Press.
Hearnshaw, H.M. and Unwin, D.J. 1994. Visualization in Geographical Information
Systems. Hoboken, NJ: John Wiley & Sons Ltd.
Hoeffding, W. 1963. Probability inequalities for sums of bounded random variables.
Journal of American Statistical Association 58:13–30.
Huang, Q., Yang, C., Benedict, K., Chen, S., Rezgui, A., and Xie, J. 2013b. Utilize cloud
computing to support dust storm forecasting. International Journal of Digital
Earth 6(4):338–55.
Huang, Q., Yang, C., Benedict, K. et al. 2013a. Using adaptively coupled models and
high-performance computing for enabling the computability of dust storm
forecasting. International Journal of Geographical Information Science 27(4):765–84.
Hutchins, W.J. 1986. Machine Translation: Past, Present, Future, 66. Chichester: Ellis Horwood.
Hwang, K. and Faye, A. 1984. Computer Architecture and Parallel Processing, Columbus,
OH: McGraw-Hill.
Jack, K. 2011. Video Demystified: A Handbook for the Digital Engineer. Burlington, MA:
Elsevier.
Johnson, R.A. 1929. Modern Geometry: An Elementary Treatise on the Geometry of the
Triangle and the Circle, 173–6, 249–50, and 268–9. Boston, MA: Houghton Mifflin.
Kanan, C. and Cottrell, G.W. 2012. Color-to-grayscale: Does the method matter in
image recognition? PloS One 7(1):e29740.
Kernighan, B.W. and Ritchie, D.M. 2006. The C Programming Language. Upper Saddle
River, NJ: Prentice Hall.
Khalid, M. 2016. Map, Filter and Reduce.
http://book.pythontips.com/en/latest/map_filter.html (accessed September 3, 2016).
Knippertz, P. and Stuut, J.B.W. 2014. Mineral Dust. Dordrecht, Netherlands: Springer.
Lee, J. 1991. Comparison of existing methods for building triangular irregular net-
work, models of terrain from grid digital elevation models. International Journal
of Geographical Information System 5(3):267–85.
Li, Z., Hu, F., Schnase, J.L. et al. 2016. A spatiotemporal indexing approach for effi-
cient processing of big array-based climate data with MapReduce. International
Journal of Geographical Information Science 1–19.
Lien, D.A. 1981. The Basic Handbook: Encyclopedia of the Basic Computer Language.
San Diego, CA: Compusoft Pub.
Linuxtopia. 2016. Set Operations.
http://www.linuxtopia.org/online_books/programming_books/python_programming/python_ch16s03.html
(accessed September 3, 2016).
Longley, P.A., Goodchild, M.F., Maguire, D.J., and Rhind, D.W. 2001. Geographic
Information System and Science. England: John Wiley & Sons, Ltd., 327–9.
McCoy, J., Johnston, K., and Environmental Systems Research Institute. 2001. Using
ArcGIS Spatial Analyst: GIS by ESRI. Redlands, CA: Environmental Systems
Research Institute.
Mitchell, J.C. 1996. Foundations for Programming Languages (1). Cambridge: MIT press.
Misra, P. and Enge, P. 2006. Global Positioning System: Signals, Measurements and
Performance Second Edition. Lincoln, MA: Ganga-Jamuna Press.
Nanjundiah, R.S. 1998. Strategies for parallel implementation of a global spectral
atmospheric general circulation model. High Performance Computing, 1998.
HIPC’98. 5th International Conference On 452–8. IEEE.
Neteler, M. and Mitasova, H. 2013. Open Source GIS: A GRASS GIS Approach (689).
New York: Springer Science and Business Media.
NOAA. 2011. Dust Storm Database. https://www.ncdc.noaa.gov/stormevents/
(accessed August 30, 2016).
Ostrom, E., Burger, J., Field, C.B., Norgaard, R.B., and Policansky, D. 1999. Revisiting
the commons: Local lessons, global challenges. Science 284(5412):278–82.
Peng, Z.R. 1999. An assessment framework for the development of Internet GIS.
Environment and Planning B: Planning and Design 26(1):117–32.
Pick, M. and Šimon, Z. 1985. Closed formulae for transformation of the Cartesian
coordinate system into a system of geodetic coordinates. Studia geophysica et
geodaetica 29(2):112–9.
Pountain, D. 1987. Run-length encoding. Byte 12(6):317–9.
Proulx, V.K. 2000. Programming patterns and design patterns in the introductory
computer science course. ACM SIGCSE Bulletin 32(1):80–4. ACM.
Python. 2001a. Built-in Functions. https://docs.python.org/3/library/index.html
(accessed September 3, 2016).
Python. 2001b. Errors and Exceptions. https://docs.python.org/2/tutorial/errors.html
(accessed September 3, 2016).
Pythoncentral. 2011. Python’s range() Function Explained.
http://pythoncentral.io/pythons-range-function-explained/ (accessed September 3, 2016).
PythonForBeginers. 2012. Reading and Writing Files in Python.
http://www.pythonforbeginners.com/files/reading-and-writing-files-in-python
(accessed September 3, 2016).
Raschka, S. 2014. A Beginner’s Guide to Python’s Namespaces, Scope Resolution, and
the LEGB Rule. http://sebastianraschka.com/Articles/2014_python_scope_and_namespaces.html
(accessed September 3, 2016).
Rawen, M. 2016. Programming: Learn the Fundamentals of Computer Programming
Languages (Swift, C++, C#, Java, Coding, Python, Hacking, Programming Tutorials).
Seattle, WA: Amazon Digital Services LLC, 50.
Rew, R. and Davis G. 1990. NetCDF: An interface for scientific data access. IEEE
Computer Graphics and Applications 10(4):76–82.
Ritter, N. and Ruth, M. 1997. The GeoTiff data interchange standard for raster geo-
graphic images. International Journal of Remote Sensing 18(7):1637–47.