PGIS Unit 4

Unit 4

Chapter 06
Spatial data analysis

6.1 Classification of analytical GIS capabilities

Classification is the procedure of identifying a set of features as belonging to a group and defining patterns. Some form of classification function is provided in every GIS. In a raster-based GIS, numerical values are often used to indicate classes. Classification is important because it defines patterns.

There are many ways to classify the analytical functions of a GIS. Here we make the following distinctions:

1. Classification, retrieval, and measurement functions. All functions in this category are performed on a single (vector or raster) data layer, often using the associated attribute data.

• Classification allows the assignment of features to a class based on attribute values or attribute ranges (definition of data patterns). Based on reflectance characteristics found in a raster, pixels may be classified as representing different crops, such as potato and maize.

• Retrieval functions allow the selective search of data. We might thus retrieve all
agricultural fields where potato is grown.

• Generalization is a function that joins different classes of objects with common characteristics to a higher-level (generalized) class. For example, we might generalize fields where potato or maize, and possibly other crops, are grown as ‘food produce fields’.

• Measurement functions allow the calculation of distances, lengths, or areas.

2. Overlay functions. These belong to the most frequently used functions in a GIS application. They allow the combination of two (or more) spatial data layers, comparing them position by position and treating areas of overlap—and of non-overlap—in distinct ways. Many GISs support overlays through an algebraic language, expressing an overlay function as a formula in which the data layers are the arguments. In this way, we can find:

• The potato fields on clay soils (select the ‘potato’ cover in the crop data layer and the ‘clay’ cover in the soil data layer and perform an intersection of the two areas found),

• The fields where potato or maize is the crop (select the areas of ‘potato’ and ‘maize’ cover in the crop data layer and take their union),

• The potato fields not on clay soils (apply a difference operator to the areas with ‘potato’ cover and the areas having clay soil),

• The fields that do not have potato as crop (take the complement of the potato areas).

3. Neighbourhood functions. Whereas overlays combine features at the same location, neighbourhood functions evaluate the characteristics of an area surrounding a feature’s location. A neighbourhood function ‘scans’ the neighbourhood of the given feature(s), and performs a computation on it.

• Search functions allow the retrieval of features that fall within a given search window.
This window may be a rectangle, circle, or polygon.

• Buffer zone generation (or buffering) is one of the best known neighbourhood
functions. It determines a spatial envelope (buffer) around (a) given feature(s). The
created buffer may have a fixed width, or a variable width that depends on
characteristics of the area.

• Interpolation functions predict unknown values using the known values at nearby
locations. This typically occurs for continuous fields, like elevation, when the data
stored does not provide the direct answer for the location(s) of interest.

• Topographic functions determine characteristics of an area by looking at the immediate neighbourhood as well. Typical examples are slope computations on digital terrain models (i.e. continuous spatial fields). The slope in a location is defined as the plane tangent to the topography in that location. Various computations can be performed, such as:

– determination of slope angle,
– determination of slope aspect,
– determination of slope length,
– determination of contour lines. These are lines that connect points with the same value (for elevation, depth, temperature, barometric pressure, water salinity, etc.).

4. Connectivity functions. These functions work on the basis of networks, including road networks, water courses in coastal zones, and communication lines in mobile telephony. These networks represent spatial linkages between features. Main functions of this type include:
• Contiguity functions evaluate a characteristic of a set of connected spatial
units. One can think of the search for a contiguous area of forest of certain size
and shape in a satellite image.
• Network analytic functions are used to compute over connected line features
that make up a network. The network may consist of roads, public transport
routes, high voltage lines or other forms of transportation infrastructure.
Analysis of such networks may entail shortest path computations between two
points in a network for routing purposes.

• Visibility functions also fit in this list as they are used to compute the points
visible from a given location (view shed modelling or view shed mapping) using
a digital terrain model.

6.2 Retrieval, classification, and measurement:

Classification is the procedure of identifying a set of features as belonging to a group and defining patterns. Some form of classification function is provided in every GIS. In a raster-based GIS, numerical values are often used to indicate classes. Classification is important because it defines patterns. One of the important functions of a GIS is to assist in recognizing new patterns. Classification is done using single data layers, as well as with multiple data layers as part of an overlay operation.

Generalization, also called map dissolve, is the process of making a classification less
detailed by combining classes. Generalization is often used to reduce the level of
classification detail to make an underlying pattern more apparent.

6.2.1 Measurement

Geometric measurement on spatial features includes counting, distance and area size computations. In general, measurements on vector data are more advanced, and thus also more complex, than those on raster data.

Measurements on vector data:

The primitives of vector data sets are the point, (poly)line and polygon. Related geometric measurements are location, length, distance and area size. Some of these are geometric properties of a feature in isolation (location, length, area size); others (distance) require two features to be identified. The location property of a vector feature is always stored by the GIS: a single coordinate pair for a point, or a list of pairs for a polyline or polygon boundary. Occasionally, there is a need to obtain the location of the centroid of a polygon; some GISs store these also, others compute them ‘on-the-fly’.
Length is a geometric property associated with polylines, by themselves, or in their
function as polygon boundary. It can obviously be computed by the GIS—as the sum
of lengths of the constituent line segments—but it quite often is also stored with the
polyline.

Area size is associated with polygon features. Again, it can be computed, but usually
is stored with the polygon as an extra attribute value. This speeds up the computation
of other functions that require area size values.

Another geometric measurement used by the GIS is the minimal bounding box
computation. It applies to polylines and polygons, and determines the minimal
rectangle—with sides parallel to the axes of the spatial reference system—that covers
the feature. This is illustrated in Figure

A common use of area size measurements is when one wants to sum up the area sizes
of all polygons belonging to some class. This class could be crop type: What is the size
of the area covered by potatoes? If our crop classification is in a stored data layer, the
computation would include (a) selecting the potato areas, and (b) summing up their
(stored) area sizes. Clearly, little geometric computation is required in the case of
stored features.
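The two-step computation above can be sketched in a few lines; the field records and attribute names here are made up for illustration.

```python
# Sketch: summing stored area sizes per class (hypothetical field names).
# Each feature record carries its crop class and a pre-stored area size,
# so answering "what area is covered by potatoes?" needs no geometry:
# (a) select the potato areas, (b) sum their stored sizes.
fields = [
    {"crop": "potato", "area_m2": 120_000},
    {"crop": "maize",  "area_m2": 80_000},
    {"crop": "potato", "area_m2": 45_000},
]

potato_area = sum(f["area_m2"] for f in fields if f["crop"] == "potato")
print(potato_area)  # 165000
```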

Measurements on raster data:

Spatial resolution refers to the dimension of the cell size representing the area covered
on the ground. Therefore, if the area covered by a cell is 5 x 5 meters, the resolution is
5 meters. The higher the resolution of a raster, the smaller the cell size and, thus, the
greater the detail.

Measurements on raster data layers are simpler because of the regularity of the cells.
The area size of a cell is constant, and is determined by the cell resolution. Horizontal
and vertical resolution may differ, but typically do not.
The location of an individual cell derives from the raster’s anchor point, the cell resolution, and the position of the cell in the raster. Again, there are two conventions: the cell’s location can be its lower left corner or the cell’s midpoint. These conventions are set by the software in use, and they become more important to be aware of in the case of low-resolution data.

The area size of a selected part of the raster (a group of cells) is calculated as the
number of cells multiplied by the cell area size.

The distance between two raster cells is the standard distance function applied to the
locations of their respective mid-points, obviously considering the cell resolution.
Where a raster is used to represent line features as strings of cells through the raster,
the length of a line feature is computed as the sum of distances between consecutive
cells
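A minimal sketch of these raster measurements, assuming a 5 m cell resolution and the midpoint convention for cell locations:

```python
import math

# Sketch of raster measurements, assuming a 5 m cell resolution and
# cell locations taken as cell mid-points (one of the two conventions).
resolution = 5.0                      # metres per cell side
cell_area = resolution * resolution   # constant area per cell

# Area of a selected group of cells = cell count x cell area.
selected_cells = [(0, 0), (0, 1), (1, 1)]
area = len(selected_cells) * cell_area

def cell_midpoint(row, col, anchor=(0.0, 0.0)):
    """Mid-point of a cell, derived from the raster anchor and resolution."""
    x0, y0 = anchor
    return (x0 + (col + 0.5) * resolution, y0 + (row + 0.5) * resolution)

def cell_distance(c1, c2):
    """Standard Euclidean distance between two cells' mid-points."""
    (x1, y1), (x2, y2) = cell_midpoint(*c1), cell_midpoint(*c2)
    return math.hypot(x2 - x1, y2 - y1)

print(area)                           # 75.0
print(cell_distance((0, 0), (0, 3)))  # 15.0 (three cells apart E-W)
```

The length of a raster line feature would then follow by summing `cell_distance` over consecutive cells in the string.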

6.2.2 Spatial selection queries

When exploring a spatial data set, the first thing one usually wants is to select certain
features, to (temporarily) restrict the exploration.

Interactive spatial selection

In interactive spatial selection, one defines the selection condition by pointing at or drawing spatial objects on the screen display, after having indicated the spatial data layer(s) from which to select features. The interactively defined objects are called the selection objects; they can be points, lines, or polygons. The GIS then selects the features in the indicated data layer(s) that overlap (i.e. intersect, meet, contain, or are contained in; see Figure) with the selection objects. These become the selected objects.

Spatial selection by attribute conditions:

It is also possible to select features by using selection conditions on feature attributes. These conditions are formulated in SQL if the attribute data reside in a geodatabase. The Figure shows an example of selection by attribute condition. The query expression is Area < 400000, which can be interpreted as “select all the land use areas of which the size is less than 400,000.” The polygons in red are the selected areas; their associated records are also highlighted in red. We can use this selected set of features as the basis of further selection.

Combining attribute conditions:

When multiple criteria have to be used for selection, we need to carefully express all of these in a single composite condition. The tools for this come from a field of mathematical logic known as propositional calculus. Above, we have seen simple, atomic conditions such as Area < 400000 and LandUse = 80. Atomic conditions use a predicate symbol, such as < (less than) or = (equals). Other possibilities are <= (less than or equal), > (greater than), >= (greater than or equal) and <> (does not equal). Any of these symbols is combined with an expression on the left and one on the right.
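As an illustration, a composite condition can be evaluated over a small attribute table; the field names (Area, LandUse) follow the text’s examples, but the records themselves are made up:

```python
# Sketch: combining atomic attribute conditions with a logical connective.
# Field names (Area, LandUse) follow the examples in the text; the
# records are invented for illustration.
records = [
    {"id": 1, "Area": 350_000, "LandUse": 80},
    {"id": 2, "Area": 500_000, "LandUse": 80},
    {"id": 3, "Area": 120_000, "LandUse": 30},
]

# Atomic conditions joined by AND: Area < 400000 AND LandUse = 80
selected = [r for r in records if r["Area"] < 400_000 and r["LandUse"] == 80]
print([r["id"] for r in selected])  # [1]
```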

Spatial selection using topological relationships:

Various forms of topological relationship exist between spatial objects. These relationships can be useful for selecting features as well. The steps carried out are:

1. to select one or more features as the selection objects, and

2. to apply a chosen spatial relationship function to determine the selected features that have that relationship with the selection objects.

Selecting features that are inside selection objects: This type of query uses the
containment relationship between spatial objects. Obviously, polygons can contain
polygons, lines or points, and lines can contain lines or points, but no other
containment relationships are possible.

Figure illustrates a containment query. Here, we are interested in finding the location
of medical clinics in the area of Ilala District. We first selected all areas of Ilala District,
using the technique of selection by attribute condition District=“Ilala”. Then, these
selected areas were used as selection objects to determine which medical clinics (as
point objects) were within them.

Selecting features adjacent to selection objects:

Adjacency is the meet relationship. It expresses that features share boundaries, and therefore it applies only to line and polygon features. Suppose we want to select all parcels adjacent to an industrial area. The first step is to select that area (in dark green) and then apply the adjacency function to select all land use areas (in red) that are adjacent to it.

Selecting features based on their distance: One may also want to use the distance function of the GIS as a tool in selecting features. Such selections can be searches within a given distance from the selection objects, at a given distance, or even beyond a given distance. There is a whole range of applications of this type of selection, e.g.:

• Which clinics are within 2 kilometers of a selected school? (Information needed for the school emergency plan.)

• Which roads are within 200 meters of a medical clinic? (These roads must have a high road maintenance priority.)

Afterthought on selecting features: Any set of selected features can be used as the
input for a subsequent selection procedure. This means, for instance, that we can
select all medical clinics first, then identify the roads within 200 meters, then select
from them only the major roads, then select the nearest clinics to these remaining
roads, as the ones that should receive our financial support. In this way, we are
combining various techniques of selection.

6.2.3 Classification

Classification is a technique of purposefully removing detail from an input data set, in the hope of revealing important patterns (of spatial distribution). In the process, we produce an output data set, so that the input set can be left intact. We do so by assigning a characteristic value to each element in the input set, which is usually a collection of spatial features that can be raster cells or points, lines or polygons. If the number of characteristic values is small in comparison to the size of the input set, we have classified the input set. The pattern that we look for may be the distribution of household income in a city. Household income is called the classification parameter.

The input data set may itself have been the result of a classification, and in such a case we call it a reclassification. For example, we may have a soil map that shows different soil type units and we would like to show the suitability of units for a specific crop. In this case, it is better to assign to the soil units an attribute of suitability for the crop. A second type of output is obtained when adjacent features with the same category are merged into one bigger feature. Such post-processing functions are called spatial merging, aggregation or dissolving.

User-controlled classification:

In user-controlled classification, a user selects the attribute(s) that will be used as the
classification parameter(s) and defines the classification method. The latter involves
declaring the number of classes as well as the correspondence between the old
attribute values and the new classes. This is usually done via a classification table.

Sometimes, one may want to perform classification only on a selection of features. In such cases, there are two options for the features that are not selected. One option is to keep their original values, while the other is to assign a null value to them in the output data set. A null value is a special value that means that no applicable value is present. Care must be taken to deal with these values correctly, both in computation and in visualization.

Automatic classification:

User-controlled classifications require a classification table or user interaction. GIS software can also perform automatic classification, in which a user only specifies the number of classes in the output data set. The system automatically determines the class break points. Two main techniques of determining break points are in use.

1. Equal interval technique: The minimum and maximum values vmin and vmax of the classification parameter are determined and the (constant) interval size for each category is calculated as (vmax − vmin)/n, where n is the number of classes chosen by the user. This classification is useful in revealing distribution patterns, as it shows how many features fall in each category.

2. Equal frequency technique: This technique is also known as quantile classification. The objective is to create categories with roughly equal numbers of features per category. The total number of features is determined first, and from the required number of categories the number of features per category is calculated. The class break points are then determined by counting off the features in order of classification parameter value. Both techniques are illustrated on a small 5×5 raster in the Figure.
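Both break-point techniques can be sketched as follows; a simple list of classification-parameter values stands in for the raster, and the quantile counting-off rule shown is one common convention:

```python
# Sketch of the two automatic classification techniques, applied to a
# flat list of classification-parameter values (n = number of classes).
def equal_interval_breaks(values, n):
    """Break points at constant steps of (vmax - vmin) / n."""
    vmin, vmax = min(values), max(values)
    step = (vmax - vmin) / n
    return [vmin + i * step for i in range(1, n)]

def equal_frequency_breaks(values, n):
    """Break points chosen so categories hold roughly equal counts
    (one common counting-off convention; implementations vary)."""
    ordered = sorted(values)
    per_class = len(ordered) / n
    return [ordered[int(i * per_class)] for i in range(1, n)]

values = [1, 2, 2, 3, 5, 7, 8, 9, 10, 20]
print(equal_interval_breaks(values, 4))   # [5.75, 10.5, 15.25]
print(equal_frequency_breaks(values, 4))  # [2, 7, 9]
```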

6.3 Overlay functions

Standard overlay operators take two input data layers and assume that they are georeferenced in the same system and overlap in their study areas. If either of these requirements is not met, the use of an overlay operator is meaningless. The principle of
spatial overlay is to compare the characteristics of the same location in both data
layers, and to produce a result for each location in the output data layer. The specific
result to produce is determined by the user. It might involve a calculation, or some
other logical function to be applied to every area or location. In raster data, these
comparisons are carried out between pairs of cells, one from each input raster. In
vector data, the same principle of comparing locations applies, but the underlying
computations rely on determining the spatial intersections of features from each input
layer.

6.3.1 Vector overlay operators:

In the vector domain, overlay is computationally more demanding than in the raster
domain. Here we will only discuss overlays from polygon data layers, but we note that
most of the ideas also apply to overlay operations with point or line data layers.
The standard overlay operator for two layers of polygons is the polygon intersection operator. It is fundamental, as many other overlay operators proposed in the literature or implemented in systems can be defined in terms of it. The principles are illustrated in the Figure above. The result of this operator is the collection of all possible polygon intersections; the resulting attribute table is a join of the two input tables.

A second overlay operator is polygon overwrite. The result of this binary operator is defined as a polygon layer containing the polygons of the first layer, except where polygons existed in the second layer, as these take priority. The principle is illustrated in the Figure.

Most GISs do not force the user to apply overlay operators to the full polygon data
set. One is allowed to first select relevant polygons in the data layer, and then use the
selected set of polygons as an operator argument. The fundamental operator of all
these is polygon intersection. The others can be defined in terms of it, usually in
combination with polygon selection and/or classification. For instance, the polygon
overwrite of A by B can be defined as polygon intersection between A and B, followed
by a classification that prioritizes polygons in B, followed by a merge.

6.3.2 Raster overlay operators:

GISs that support raster processing usually have a language to express operations on rasters. These languages are generally referred to as map algebra, or sometimes raster calculus. They allow a GIS to compute new rasters from existing ones, using a range of functions and operators. Unfortunately, not all implementations of map algebra offer the same functionality. When producing a new raster we must provide a name for it, and define how it is computed. This is done in an assignment statement of the following format:

Output_raster_name := Map_algebra_expression

The expression on the right is evaluated by the GIS, and the raster in which it results is
then stored under the name on the left. The expression may contain references to
existing rasters, operators and functions; the format is made clear below. The raster
names and constants that are used in the expression are called its operands.

Arithmetic operators:

Various arithmetic operators are supported. The standard ones are multiplication (×), division (/), subtraction (−) and addition (+). Obviously, these arithmetic operators should only be used on appropriate data values, and, for instance, not on classification values. Other arithmetic operators may include modulo division (MOD) and integer division (DIV). Modulo division returns the remainder of division: for instance, 10 MOD 3 will return 1, as 10 − 3 × 3 = 1. Similarly, 10 DIV 3 will return 3. More operators are goniometric: sine (sin), cosine (cos), tangent (tan), and their inverse functions asin, acos, and atan, which return radian angles as real values.

Comparison and logical operators:

Map algebra also allows the comparison of rasters cell by cell. To this end, we may use the standard comparison operators (<, <=, =, >=, > and <>) that we introduced before. A simple raster comparison assignment is:

C := A <> B

It will store truth values—either true or false—in the output raster C. A cell value in C will be true if the cell’s value in A differs from that cell’s value in B. It will be false if they are the same. Logical connectives are also supported in most implementations of map algebra.
Conditional expressions:

The above comparison and logical operators produce rasters with the truth values true and false. In practice, we often need a conditional expression that allows us to test whether a condition is fulfilled. The general format is:

Output_raster := CON(condition, then_expression, else_expression)

Here, condition is the tested condition, then_expression is evaluated if the condition holds, and else_expression is evaluated if it does not hold.

6.3.3 Overlays using a decision table:

Conditional expressions are powerful tools in cases where multiple criteria must be taken into account. A small example may illustrate this. Consider a suitability study in which a land use classification and a geological classification must be used. The respective rasters are illustrated in the Figure. Domain expertise dictates that some combinations of land use and geology result in suitable areas, whereas other combinations do not. In our example, forests on alluvial terrain and grassland on shale are considered suitable combinations, while the others are not. We could produce the output raster of the Figure with a map algebra expression such as:

Suitability := CON((Landuse = “Forest” AND Geology = “Alluvial”) OR (Landuse = “Grass” AND Geology = “Shale”), “Suitable”, “Unsuitable”)
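The decision-table overlay above can be mimicked cell by cell; the two small rasters are illustrative, and CON is written here as a plain Python function rather than any particular GIS’s map algebra:

```python
# Sketch: the suitability overlay from the text, written as a
# cell-by-cell CON over two small rasters (values are illustrative).
landuse = [["Forest", "Grass"],
           ["Grass",  "Forest"]]
geology = [["Alluvial", "Shale"],
           ["Alluvial", "Shale"]]

def con(condition, then_value, else_value):
    """Plain-Python stand-in for the map algebra CON operator."""
    return then_value if condition else else_value

suitability = [
    [con((lu == "Forest" and ge == "Alluvial") or
         (lu == "Grass" and ge == "Shale"),
         "Suitable", "Unsuitable")
     for lu, ge in zip(lu_row, ge_row)]
    for lu_row, ge_row in zip(landuse, geology)
]
print(suitability)
```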

6.4 Neighbourhood functions:

There is another guiding principle in spatial analysis that can be equally useful. The
principle here is to find out the characteristics of the vicinity, here called
neighbourhood, of a location. After all, many suitability questions, for instance,
depend not only on what is at the location, but also on what is near the location. Thus,
the GIS must allow us ‘to look around locally’. To perform neighbourhood analysis, we
must:

1. State which target locations are of interest to us, and define their spatial extent,

2. Define how to determine the neighbourhood for each target,

3. Define which characteristic(s) must be computed for each neighbourhood.

Then, in the third step we indicate what it is we want to discover about the
phenomena that exist or occur in the neighbourhood. This might simply be its spatial
extent, but it might also be statistical information like:

• The total population of the area,

• Average household income, or

• The distribution of high-risk industries located in the neighbourhood.


Determining neighbourhood extent:

To select target locations, one can use the selection techniques discussed earlier. To obtain characteristics from an identified neighbourhood, the same techniques apply. So what remains to be discussed here is the proper determination of a neighbourhood.

6.4.1 Proximity computations:

In proximity computations, we use geometric distance to define the neighbourhood of one or more target locations. The most common and useful technique is buffer zone generation. Thiessen polygon generation is another technique based on geometric distance.

Buffer zone generation:

The principle of buffer zone generation is simple: we select one or more target
locations, and then determine the area around them, within a certain distance. In
Figure (a), a number of main and minor roads were selected as targets, and a 75 m
(resp., 25 m) buffer was computed from them. In some case studies, zonated buffers
must be determined, for instance in assessments of traffic noise effects. Most GISs
support this type of zonated buffer computation. An illustration is provided in Figure
(b). In vector-based buffer generation, the buffers themselves become polygon
features, usually in a separate data layer, that can be used in further spatial analysis.
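A raster flavour of fixed-width buffering can be sketched by marking every cell whose mid-point lies within the buffer distance of a target cell; the cell size, targets and distance here are assumptions for illustration:

```python
import math

# Sketch: fixed-width buffer generation on a raster, marking every cell
# whose mid-point lies within the buffer distance of a target cell
# (cell size, targets and buffer width are illustrative).
rows, cols, cell_size = 5, 5, 25.0   # 25 m cells
targets = [(2, 2)]                    # the selected feature's cells
buffer_dist = 50.0                    # buffer width in metres

def in_buffer(r, c):
    return any(
        math.hypot((r - tr) * cell_size, (c - tc) * cell_size) <= buffer_dist
        for tr, tc in targets)

buffer_raster = [[1 if in_buffer(r, c) else 0 for c in range(cols)]
                 for r in range(rows)]
for row in buffer_raster:
    print(row)
```

A variable-width buffer would simply make `buffer_dist` a function of local characteristics instead of a constant.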

Thiessen polygon generation:

Thiessen polygon partitions make use of geometric distance for determining neighbourhoods. This is useful if we have a spatially distributed set of points as target locations, and we want to know, for each location in the study area, to which target it is closest. This technique will generate a polygon around each target location that identifies all those locations that ‘belong to’ that target. We have already seen the use of Thiessen polygons in the context of interpolation of point data. Given an input point set that will be the polygons’ midpoints, it is not difficult to construct such a partition. It is even easier to construct if we already have a Delaunay triangulation for the same input point set.
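A raster approximation of a Thiessen partition simply assigns each cell to its nearest target point; the point set here is invented, and ties are resolved arbitrarily (first target wins):

```python
import math

# Sketch: a raster approximation of Thiessen polygons — each cell is
# assigned to the nearest of a set of target points (coordinates in
# cell units; the point set is made up for illustration).
targets = {"A": (0.0, 0.0), "B": (4.0, 4.0)}

def nearest_target(x, y):
    # min() keeps the first key on ties, so boundary cells go to "A".
    return min(targets, key=lambda t: math.hypot(x - targets[t][0],
                                                 y - targets[t][1]))

partition = [[nearest_target(float(c), float(r)) for c in range(5)]
             for r in range(5)]
for row in partition:
    print("".join(row))
```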
6.4.2 Computation of diffusion:

The determination of the neighbourhood of one or more target locations may depend not only on distance, but also on direction and differences in the terrain in different directions. This typically is the case when the target location contains a ‘source material’ that spreads over time, referred to as diffusion. This ‘source material’ may be air, water or soil pollution, commuters exiting a train station, people from an opened-up refugee camp, a water spring uphill, or the radio waves emitted from a radio relay station. In all these cases, one will not expect the spread to occur evenly in all directions.

Diffusion computation involves one or more target locations, which are better called source locations in this context. They are the locations of the source of whatever spreads. The computation also involves a local resistance raster, which for each cell provides a value that indicates how difficult it is for the ‘source material’ to pass by that cell.

Since ‘source material’ has the habit of taking the easiest route to spread, we must
determine at what minimal cost (i.e. at what minimal resistance) it may have
arrived in a cell. Therefore, we are interested in the minimal cost path. To determine
the minimal total resistance along a path from the source location csrc to an arbitrary
cell cx, the GIS determines all possible paths from csrc to cx, and then determines which
one has the lowest total resistance.
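The minimal-cost computation can be sketched with Dijkstra’s algorithm over a small resistance raster; the 4-connected moves and the averaged-resistance cost model are assumptions here, as implementations differ:

```python
import heapq

# Sketch: minimal total resistance from a source cell to every cell,
# computed with Dijkstra's algorithm over a local resistance raster.
# Moves are 4-connected and the cost of a move is the average
# resistance of the two cells — one common convention, assumed here.
resistance = [[1, 1, 4],
              [4, 1, 4],
              [4, 1, 1]]

def min_cost_surface(res, src):
    rows, cols = len(res), len(res[0])
    cost = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > cost.get((r, c), float("inf")):
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + (res[r][c] + res[nr][nc]) / 2.0
                if nd < cost.get((nr, nc), float("inf")):
                    cost[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return cost

cost = min_cost_surface(resistance, (0, 0))
print(cost[(2, 2)])  # 4.0 — cheapest path runs along the low-resistance cells
```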

6.4.3 Flow computation:

Flow computations determine how a phenomenon spreads over the area, in principle in all directions, though with varying difficulty or resistance. There are also cases where a phenomenon does not spread in all directions, but moves or ‘flows’ along a given, least-cost path, determined again by local terrain characteristics. The typical case arises when we want to determine the drainage patterns in a catchment: the rainfall water ‘chooses’ a way to leave the area. This principle is illustrated with a simple elevation raster in Figure (a). For each cell in that raster, the steepest downward slope to a neighbour cell is computed, and its direction is stored in a new raster (Figure (b)). This computation determines the elevation difference between the cell and a neighbour cell, and takes into account cell distance—1 for neighbour cells in N–S or W–E direction, √2 for cells in NE–SW or NW–SE direction.
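The steepest-descent step can be sketched as follows; the elevation values are illustrative, and only the direction of the steepest downward neighbour is returned:

```python
import math

# Sketch of the steepest-descent computation: for each cell, find the
# neighbour with the steepest downward slope, accounting for the
# longer (sqrt 2) distance to diagonal neighbours.
elevation = [[9, 8, 7],
             [8, 6, 5],
             [7, 5, 3]]

def flow_direction(elev, r, c):
    rows, cols = len(elev), len(elev[0])
    best, best_drop = None, 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == dc == 0:
                continue
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                dist = math.sqrt(2) if dr and dc else 1.0
                drop = (elev[r][c] - elev[nr][nc]) / dist
                if drop > best_drop:
                    best, best_drop = (dr, dc), drop
    return best  # None for cells with no downward neighbour (pits)

print(flow_direction(elevation, 0, 0))  # (1, 1): steepest descent is SE
```

Applying `flow_direction` to every cell yields the drainage-direction raster of Figure (b).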

6.4.4 Raster based surface analysis:

Continuous fields have a number of characteristics not shared by discrete fields. Since
the field changes continuously, we can talk about slope angle, slope aspect and
concavity/ convexity of the slope. These notions are not applicable to discrete fields.

Applications:

There are numerous examples where more advanced computations on continuous field representations are needed. A short list is provided below.

• Slope angle calculation: The calculation of the slope steepness, expressed as an angle in degrees or percentages, for any or all locations.

• Slope aspect calculation: The calculation of the aspect (or orientation) of the slope in degrees (between 0 and 360 degrees), for any or all locations.

• Slope convexity/concavity calculation: Slope convexity—defined as the change of the slope (negative when the slope is concave and positive when the slope is convex)—can be derived as the second derivative of the field.

• Slope length calculation: With the use of neighbourhood operations, it is possible to calculate for each cell the nearest distance to a watershed boundary (the upslope length) and to the nearest stream (the downslope length). This information is useful for hydrological modelling.

• Hillshading is used to portray relief difference and terrain morphology in hilly and mountainous areas.

• Three-dimensional map display: With GIS software, three-dimensional views of a DEM can be constructed, in which the location of the viewer, the angle under which s/he is looking, the zoom angle, and the amplification factor of relief exaggeration can be specified.

• Determination of change in elevation through time: The cut-and-fill volume of soil to be removed or to be brought in to make a site ready for construction can be computed by overlaying the DEM of the site before the work begins with the DEM of the expected modified topography.

•Automatic catchment delineation: Catchment boundaries or drainage lines can be automatically generated from a good quality DEM with the use of neighbourhood functions.

•Dynamic modeling: Apart from the applications mentioned above, DEMs are
increasingly used in GIS-based dynamic modelling, such as the computation of surface
run-off and erosion.

•Visibility analysis: A view shed is the area that can be ‘seen’—i.e. is in the direct line-
of-sight—from a specified target location. Visibility analysis determines the area visible
from a scenic lookout.
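The visibility idea can be sketched for a single line-of-sight profile (illustrative Python, not from the text; a full viewshed repeats this along every ray from the target location). A cell is visible when its sight angle from the viewer exceeds the steepest angle encountered so far along the profile:

```python
def viewshed_profile(elev, viewer_height=2.0):
    """Line-of-sight along a 1-D terrain profile with equal cell spacing.
    Cell i is visible from cell 0 if its sight angle from the viewer's eye
    exceeds every intermediate sight angle seen so far."""
    eye = elev[0] + viewer_height
    visible = [True]                 # the viewer's own cell is always visible
    max_tan = float('-inf')          # steepest tangent encountered so far
    for i in range(1, len(elev)):
        tan = (elev[i] - eye) / i    # tangent of the sight angle to cell i
        visible.append(tan > max_tan)
        max_tan = max(max_tan, tan)
    return visible

# cells 1 and 3 are visible; cells 2 and 4 are hidden behind higher terrain
viewshed_profile([10, 12, 11, 15, 13])
```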

Filtering: The principle of filtering is quite similar to that of moving window averaging. Again, we define a window and let the GIS move it over the raster cell by cell. For each cell, the system performs some computation, and assigns the result of this computation to the cell in the output raster.
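The window computation can be sketched in plain Python (an illustrative 3×3 mean filter; the function name and the toy raster are assumptions, not from the text):

```python
def mean_filter(raster, size=3):
    """Slide a size x size window over a raster (a list of rows); each
    interior cell of the output gets the mean of its window, while border
    cells are simply copied from the input."""
    half = size // 2
    rows, cols = len(raster), len(raster[0])
    out = [row[:] for row in raster]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            window = [raster[i][j]
                      for i in range(r - half, r + half + 1)
                      for j in range(c - half, c + half + 1)]
            out[r][c] = sum(window) / len(window)
    return out

dem = [[10, 10, 12],
       [10, 14, 12],
       [12, 12, 16]]
mean_filter(dem)   # the centre cell becomes the mean of all nine values, 12.0
```

Other filters only change what is computed inside the window, e.g. a weighted sum for edge detection or slope estimation.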

Computation of slope angle and slope aspect: A different choice of weight factors
may provide other information. Special filters exist to perform computations on the
slope of the terrain. Slope angle, which is also known as slope gradient, is the angle α,
illustrated in Figure, between a path p in the horizontal plane and the sloping terrain.
The path p must be chosen such that the angle α is maximal. A slope angle can be
expressed as elevation gain in a percentage or as a geometric angle, in degrees or
radians.

The two respective formulas are:

slope_perc = 100 · (δf/δp) and slope_angle = arctan(δf/δp).

The path p must be chosen to provide the highest slope angle value, and thus it can
lie in any direction. The compass direction, converted to an angle with the North, of
this maximal down-slope path p is what we call the slope aspect.
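Under these definitions, slope angle and aspect for a single cell can be approximated from its 3×3 neighbourhood. The sketch below is illustrative Python using simple central differences, not the exact filter weights of any particular GIS:

```python
import math

def slope_and_aspect(window, cellsize):
    """Slope angle (degrees) and aspect (compass degrees from North) for the
    centre cell of a 3x3 elevation window, where row 0 is the northern row.
    Uses central differences for the W-E and S-N elevation gradients."""
    dz_dx = (window[1][2] - window[1][0]) / (2 * cellsize)  # eastward gradient
    dz_dy = (window[0][1] - window[2][1]) / (2 * cellsize)  # northward gradient
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    # aspect: compass direction of the steepest downslope path (the negative
    # gradient), measured clockwise from North
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360
    return slope, aspect

# a plane dipping eastward by 1 m per metre: slope 45 degrees, aspect 90 (east)
slope_and_aspect([[2, 1, 0],
                  [2, 1, 0],
                  [2, 1, 0]], cellsize=1.0)
```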

6.5 Network analysis:

A completely different set of analytical functions in GIS consists of computations on networks. A network is a connected set of lines, representing some geographic phenomenon, typically of the transportation type. The ‘goods’ transported can be almost anything: people, cars and other vehicles along a road network, commercial goods along a logistic network, phone calls along a telephone network, or water pollution along a stream/river network. Network analysis can be performed on either raster or vector data layers, but it is more commonly done on the latter, as line features can be associated with a network, and hence can be assigned typical transportation characteristics such as capacity and cost per unit. A fundamental characteristic of any network is whether the network lines are considered directed or not. Directed networks associate with each line a direction of transportation; undirected networks do not.

Various classical spatial analysis functions on networks are supported by GIS software
packages. The most important ones are:

1. Optimal path finding, which generates a least-cost path on a network between a pair of predefined locations using both geometric and attribute data.

2. Network partitioning, which assigns network elements (nodes or line segments) to different locations using predefined criteria.

Optimal path finding: Optimal path finding techniques are used when a least-cost
path between two nodes in a network must be found. The two nodes are called origin
and destination, respectively. The aim is to find a sequence of connected lines to
traverse from the origin to the destination at the lowest possible cost. The cost function
can be simple: for instance, it can be defined as the total length of all lines on the path.
The cost function can also be more elaborate and take into account not only length of
the lines, but also their capacity, maximum transmission (travel) rate and other line
characteristics, for instance to obtain a reasonable approximation of travel time. There
can even be cases in which the nodes visited add to the cost of the path as well. These
may be called turning costs, which are defined in a separate turning cost table for each
node, indicating the cost of turning at the node when entering from one line and
continuing on another. This is illustrated in Figure.

Problems related to optimal path finding are ordered optimal path finding and
unordered optimal path finding. Both have an extra requirement that a number of
additional nodes needs to be visited along the path. In ordered optimal path finding,
the sequence in which these extra nodes are visited matters; in unordered optimal
path finding it does not. An illustration of both types is provided in Figure. Here, a path
is found from node A to node D, visiting nodes B and C.
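The least-cost path itself is classically computed with Dijkstra's algorithm. The sketch below is a minimal, illustrative Python version that uses line costs only (no turning costs); the toy road network is invented:

```python
import heapq

def shortest_path(graph, origin, destination):
    """Dijkstra's algorithm on a directed network.
    graph: {node: [(neighbour, line_cost), ...]}
    Returns (total_cost, node_sequence) or None if unreachable."""
    queue = [(0, origin, [origin])]   # (cost so far, current node, path)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == destination:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, c in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + c, nxt, path + [nxt]))
    return None

roads = {'A': [('B', 4), ('C', 2)],
         'B': [('D', 5)],
         'C': [('B', 1), ('D', 8)],
         'D': []}
# least-cost path from A to D is A, C, B, D with cost 2 + 1 + 5 = 8
shortest_path(roads, 'A', 'D')
```

Turning costs could be added by making the cost of expanding a node depend on the incoming and outgoing line.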

Network partitioning:

In network partitioning, the purpose is to assign lines and/or nodes of the network, in a mutually exclusive way, to a number of target locations. Typically, the target locations play the role of service centre for the network. This may be any type of service: medical treatment, education, water supply. This type of network partitioning is known as a network allocation problem.

Network allocation:

In network allocation, we have a number of target locations that function as resource centres, and the problem is which part of the network to exclusively assign to which service centre. This may sound like a simple allocation problem, in which a service centre is assigned those line (segments) to which it is nearest, but usually the problem statement is more complicated. These further complications stem from the requirements to take into account

• The capacity with which a centre can produce the resources (whether they are
medical operations, school pupil positions, kilowatts, or bottles of milk), and

• The consumption of the resources, which may vary amongst lines or line segments.

Trace analysis: Trace analysis is performed when we want to understand which part of a network is ‘conditionally connected’ to a chosen node on the network, known as the trace origin. For a node or line to be conditionally connected, it means that a path exists from the node/line to the trace origin, and that the connecting path fulfills the conditions set. Tracing thus requires connectivity. What these conditions are depends on the application, and they may involve direction of the path, capacity, length, or resource consumption along it. The condition typically is a logical expression, as we have seen before, for instance:
• The path must be directed from the node/line to the trace origin,

• Its capacity (defined as the minimum capacity of the lines that constitute the path)
must be above a given threshold, and

• The path’s length must not exceed a given maximum length.
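A trace of this kind can be implemented as a breadth-first search that only crosses lines satisfying the condition. The sketch below (illustrative Python; the toy pipe network is invented) checks just the capacity threshold; direction and length conditions are omitted for brevity:

```python
from collections import deque

def trace(network, origin, min_capacity):
    """Return the set of nodes conditionally connected to the trace origin:
    those reachable via a path in which every line's capacity is at least
    min_capacity. network: {node: [(neighbour, line_capacity), ...]}"""
    reached = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for nxt, cap in network.get(node, []):
            if cap >= min_capacity and nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached

pipes = {'O': [('P', 10), ('Q', 3)],
         'P': [('R', 8)],
         'Q': [('S', 9)],
         'R': [], 'S': []}
trace(pipes, 'O', 5)   # Q and S are cut off by the 3-unit line
```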

6.6 GIS and application models:

Here we define application models to include any kind of GIS-based model (including so-called analytical and process models) for a specific real-world application. Such a model, in one way or another, describes as faithfully as possible how the relevant geographic phenomena behave, and it does so in terms of its parameters. The nature
of application models varies enormously. GIS applications for famine relief programs,
for instance, are very different from earthquake risk assessment applications, though
both can make use of GIS to derive a solution. Many kinds of application models exist,
and they can be classified in many different ways. Here we identify five characteristics
of GIS-based application models:

1. The purpose of the model,

2. The methodology underlying the model,

3. The scale at which the model works,

4. Its dimensionality, i.e. whether the model includes spatial, temporal or spatial and temporal dimensions, and

5. Its implementation logic, i.e. the extent to which the model uses existing knowledge about the implementation context.

Purpose of the model refers to whether the model is descriptive, prescriptive or predictive in nature. Descriptive models attempt to answer the “what is” question. Prescriptive models usually answer the “what should be” question by determining the best solution from a given set of conditions. Predictive models address the “what will be” question by forecasting future conditions.

Methodology refers to the operational components of the model. Stochastic models use statistical or probability functions to represent random or semi-random behaviour of phenomena. In contrast, deterministic models are based upon a well-defined cause-and-effect relationship.

Rule-based models attempt to model processes by using local (spatial) rules. Cellular
Automata (CA) are examples of models in this category. These are often used to
understand systems which are generally not well understood, but for which their local
processes are well known.
Agent-based models (ABM) attempt to model the movement and development of multiple interacting agents (which might represent individuals), often using sets of decision rules about what the agent can and cannot do.

Scale refers to whether the components of the model are individual or aggregate in
nature. Essentially this refers to the ‘level’ at which the model operates. Individual-
based models are based on individual entities, such as the agent-based models.

Dimensionality is the term chosen to refer to whether a model is static or dynamic, and spatial or aspatial. Some models are explicitly spatial, meaning they operate in some geographically defined space. Some models are aspatial, meaning they have no direct spatial reference. Models can also be static, meaning they do not incorporate a notion of time or change. In dynamic models, time is an essential parameter. Dynamic models include various types of models referred to as process models or simulations.

Implementation logic refers to how the model uses existing theory or knowledge to create new knowledge. Deductive approaches use knowledge of the overall situation in order to predict outcome conditions. This includes models that have some kind of formalized set of criteria, often with known weightings for the inputs, where existing algorithms are used to derive outcomes. Inductive approaches, on the other hand, are less straightforward, in that they try to generalize from observations in order to derive more general models.

6.7 Error propagation in spatial data processing

6.7.1 How errors propagate

A number of sources of error may be present in source data. It is important to note that the acquisition of base data to a high standard of quality still does not guarantee that the results of further, complex processing can be treated with certainty. As the number of processing steps increases, it becomes difficult to predict the behaviour of error propagation. These various errors may affect the outcome of spatial data manipulations. In addition, further errors may be introduced during the various processing steps.
The table lists common sources of error introduced into GIS analyses.

Consider another example. A land use planning agency is faced with the problem of
identifying areas of agricultural land that are highly susceptible to erosion. Such areas
occur on steep slopes in areas of high rainfall. The spatial data used in a GIS to obtain
this information might include:

•A land use map produced five years previously from 1 : 25,000 scale aerial
photographs,

•A DEM produced by interpolating contours from a 1 : 50,000 scale topographic map, and

•Annual rainfall statistics collected at two rainfall gauges.

6.7.2 Quantifying error propagation

Various perspectives, motives and approaches to dealing with uncertainty have given
rise to a wide range of conceptual models and indices for the description and
measurement of error in spatial data. All these approaches have their origins in
academic research and have strong theoretical bases in mathematics and statistics.
Here we identify two main approaches for assessing the nature and amount of error
propagation:
1. Testing the accuracy of each state by measurement against the real world, and

2. Modelling error propagation, either analytically or by means of simulation techniques.
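The second approach can be illustrated with a small Monte Carlo sketch (illustrative Python; the 0.5 m elevation error and the slope geometry are invented values): the input is perturbed repeatedly with random noise, and the spread of the derived slope values summarizes the propagated error.

```python
import random
import statistics

def monte_carlo_slope(dz=5.0, run=50.0, sigma_z=0.5, n=10_000, seed=1):
    """Propagate a Gaussian elevation error (standard deviation sigma_z, in
    metres) through the slope formula slope_perc = 100 * dz / run by
    simulation: perturb dz repeatedly and summarize the resulting slopes."""
    rng = random.Random(seed)
    samples = [100 * (dz + rng.gauss(0, sigma_z)) / run for _ in range(n)]
    return statistics.mean(samples), statistics.stdev(samples)

mean_slope, spread = monte_carlo_slope()
# the 0.5 m elevation error maps to roughly a 1 percentage-point spread
# around the nominal 10 % slope
```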

Questions
1. What are the Classification, retrieval, and measurement functions of GIS?
2. Write a note on Overlay functions.
3. Explain the Neighbourhood and connectivity functions in GIS.
4. What do you mean by Measurement? How is it done for Vector and Raster data?
5. What do you mean by Spatial selection using topological relationships?
6. Explain Classification. What is User Controlled Classification?
7. What is Automatic Classification? What are its different techniques?
8. Differentiate between Vector and Raster Overlays.
9. Write a note on Neighbourhood function.
10. Explain Computation of diffusion and Flow computation.
11. Explain Raster based surface analysis with suitable example.
12. How will you compute slope angle and slope aspect?
13. Explain Network analysis.
14. What are GIS and application models? Explain with an example.
15. How errors get propagated in Spatial Data Analysis?

Multiple Choice Questions:


1. ________________________ functions allow the calculation of distances, lengths,
or areas.
A. Classification
B. Retrieval
C. Measurement
D. Generalization
ANSWER: C

2. ________________________ allows the assignment of features to a class on the basis of attribute values or attribute ranges. On the basis of reflectance characteristics found in a raster, pixels may be classified.
A. Classification
B. Retrieval
C. Measurement
D. Generalization
ANSWER: A

3. ________________________ allow the combination of two or more spatial data layers comparing them position by position, and treating areas of overlap and of non-overlap in distinct ways.
A. Classification
B. Overlay
C. Measurement
D. Generalization
ANSWER: B

4. Examples of Neighbourhood functions are ______________________.


A. Search functions
B. Buffer zone
C. Interpolation
D. All of these
ANSWER: D

5. Topographic functions determine characteristics of an area by looking at the immediate neighbourhood as well. Various computations that can be performed are ______________________.
A. determination of slope angle
B. determination of slope length
C. determination of contour lines
D. All of these
ANSWER: D

6. ______________________ is a geometric property associated with polylines, by themselves, or in their function as polygon boundary.
A. Length
B. Area
C. Both of these
D. None of these
ANSWER: A
7. Measurements on raster data layers are simpler because of the
________________ of the cells.
A. irregularities
B. regularities
C. polyline
D. poly area
ANSWER: B

8. In ___________________________________, one defines the selection condition by pointing at or drawing spatial objects on the screen display, after having indicated the spatial data layer(s) from which to select features.
A. Simple
B. Non-interactive spatial selection
C. interactive spatial selection
D. None of these
ANSWER: C

9. “Select all the land use areas of which the size is less than 400,000” is an
example of ______________________.
A. Spatial selection by attribute conditions
B. Combining attribute conditions
C. Spatial selection using topological relationships
D. None of these
ANSWER: A

10. Techniques of Automatic classification are ______________________.


A. Equal interval technique
B. Equal frequency technique
C. Both of these
D. None of these
ANSWER: C

11. Standard arithmetic operators supported by Raster overlay operators are __________________________.
A. multiplication (×)
B. division (/)
C. subtraction (−)
D. All of these
ANSWER: D

12. To perform neighbourhood analysis, we must ______________________.


A. State which target locations are of interest to us, and define their spatial
extent
B. Define how to determine the neighbourhood for each target
C. Define which characteristic(s) must be computed for each
neighbourhood
D. All of these
ANSWER: D

13. ______________________ determine how a phenomenon spreads over the area, in principle in all directions, though with varying difficulty or resistance.
A. Area
B. Length
C. Flow computations
D. None of these
ANSWER: C

14. ______________________ is used to portray relief difference and terrain morphology in hilly and mountainous areas.
A. Hillshading
B. Hill climbing
C. Area
D. Elevation
ANSWER: A

15. Various classical spatial analysis functions on networks are supported by GIS software packages. The most important ones are: ______________________.
A. Optimal path finding
B. Network partitioning
C. Both of these
D. None of these
ANSWER: C
