Part II - Spatial Data Models
Introduction
• Reality is so complex that one can never succeed in
representing it in every detail on a map or in a GIS.
• As a result one has to work with a spatial model
(simplification) of reality.
• A representation of part of the real world is a model.
• The model allows us to operate on the model instead of the
real world.
• Using models, it is possible to test what happens under
various conditions.
– What if questions can also be answered.
• Modelling is the process of producing an abstraction of the
‘real world’ so that some part of it can be more easily handled.
• A spatial model represents the most important components or
parts of a geographic phenomenon.
…cont’d
• What is a geographic phenomenon?
• We might define a geographic phenomenon as
something of interest that:
– can be named or described,
– can be georeferenced, and
– can be assigned a time (interval) at which it is/was
present.
…cont’d
• To represent a geographic phenomenon in a GIS, we must
state what it is and where it is.
• We must provide a description or at least a name on the
one hand, and a georeference on the other hand.
• Some phenomena manifest themselves essentially
everywhere in the study area, while others only do so in
certain localities.
– Based on its nature, a geographic phenomenon can be
classified as either a geographic field or a geographic object.
• If we define our study area as the equatorial Pacific
Ocean, we can say that Sea Surface Temperature can be
measured anywhere in the study area. Therefore, it is a
typical example of a (geographic) field.
Geographic objects
Inverse Distance Weighted
Interpolation (IDW)
• Accounts for “vicinity/nearness” by
(1) selecting points within a kernel radius, or
(2) selecting a fixed number of “near” points (known
points).
• The contribution of a point decreases the more distant it
is from the unmeasured location.
• The weight of each sample point is inversely proportional
to its distance from the unmeasured location, raised to a
user-defined power.
• IDW is an exact interpolator: where d = 0, the surface
takes the value of the data point.
IDW Computation
$$ Z_j = \frac{\sum_i Z_i / d_{ij}^{n}}{\sum_i 1 / d_{ij}^{n}} $$
• $Z_j$ - estimated value for the unknown point at
location j
• $d_{ij}$ - distance between known point i and
unknown point j
• $Z_i$ - the value at known point i
• $n$ - user-defined exponent for weighting
• A fixed number of points is normally used.
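The IDW formula can be sketched in a few lines of Python. This is a minimal illustration, not a production interpolator: the function name `idw`, the `power` and `k` parameters and the sample coordinates are all assumptions made for the example.

```python
import math

def idw(known, target, power=2.0, k=None):
    """Estimate the value at `target` from (x, y, z) samples in `known`."""
    tx, ty = target
    # Distance from every known point i to the unknown location j.
    dists = sorted(
        ((math.hypot(x - tx, y - ty), z) for x, y, z in known),
        key=lambda pair: pair[0],
    )
    if k is not None:
        dists = dists[:k]  # keep only the k nearest known points
    num = den = 0.0
    for d, z in dists:
        if d == 0.0:
            return z  # exact interpolator: a sample location keeps its value
        w = 1.0 / d ** power  # weight = 1 / d_ij^n
        num += w * z
        den += w
    return num / den

# Hypothetical sample points (x, y, value).
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 30.0)]
estimate = idw(samples, (1.0, 0.0))
```

Passing `k=2` restricts the estimate to the two nearest samples, which corresponds to the "fixed number of near points" option; a kernel radius could be handled the same way by filtering on distance.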
Characteristics of IDW
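• Exact interpolator
• Interpolated values equal the sample point values at the
sample locations.
• Reduction of the formula at sample point locations: as
$d_{ij} \to 0$ for sample point i, its term dominates both sums, so
$$ Z_j = \frac{Z_i / d_{ij}^{n}}{1 / d_{ij}^{n}} = Z_i $$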
The object approach:
…cont’d
• The method is mostly used to represent phenomena that
occur on the earth’s surface as a collection of objects with
clear boundaries, in other words phenomena that are
characterised by a discrete spatial variation (e.g. parcels,
houses, roads,...).
– In the treatment and interpretation of data one will
often move from a data model, based on a field
approach, to a data model based on objects (model
transformation):
• Production of a soil map based on soil profile analysis
• Interpretation of land use or vegetation
characteristics, based on aerial photographs
• Construction of a DEM based on triangulation (see
further)
Figure. Visual interpretation of land use, based on a SPOT satellite image
Digital representation of a spatial data model
Creation of a raster
…cont’d
Figure. Rasterizing error for central point (c, d) and dominant unit rasterizing (e, f)
Types of Geographic Data
• What kinds of things might we want to store in a GIS?
– We may be interested in the location of houses on
a street,
– We might be interested in storing the boundaries
of a park reserve, or a city.
– We can also use a GIS to store things such as soil
pH variations over an area, or temperature
changes.
Vector Data
• The vector data model is one of the most popular ways to
store geographic data.
• This model is ideally suited for representing discrete
objects.
• It represents discrete objects accurately; continuously
varying phenomena are harder to capture with it.
• It uses points and edges to represent three basic
types of spatial features: points, lines, and polygons.
• All of these types are capable of storing attribute
data about the particular feature they represent.
Point Data
• Point data are data that can be represented as a single
location on a map.
• Point data can be used to represent house locations on
a street.
• This is by far the simplest data type and is very good for
storing data when all we are concerned with is the
location of a feature and not its length or width.
• Point data are zero dimensional and have no width,
length or height.
Line Data
• Features that have a location, a length, but no width are
represented in the vector model by lines.
• Examples of some features well represented by lines are
contours, administrative boundaries, roads, rivers, and
sewers.
• It is important to mention that while some line features
such as rivers and roads, may have an area, we use lines to
represent them at scales where their width cannot be
accurately reflected.
• Line features are stored using a collection of points called
nodes and vertices which each have their own unique
coordinate pair.
• Nodes are the endpoints of a line, while vertices are
intermediate points located between the two endpoints.
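The node and vertex structure of a line feature can be illustrated with a short Python sketch. The coordinates and the names `river` and `line_length` are hypothetical, chosen only for the example.

```python
import math

# A line feature stored as an ordered list of (x, y) coordinate pairs.
river = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0), (6.0, 8.0)]

nodes = (river[0], river[-1])  # the two endpoints of the line
vertices = river[1:-1]         # intermediate points between the endpoints

def line_length(coords):
    """Sum the straight-line lengths of consecutive segments."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

length = line_length(river)
```

The line has a location and a length but no width, which matches the way line features are defined in the vector model.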
Polygon Data
• Many of the features we may wish to represent in a GIS
have a width and an area associated with them.
…cont’d
• When choosing the resolution of the raster a proper
balance has to be achieved between the level of
detail of the terrain description and the amount of
data one will have to treat.
• Each cell in a raster can only have one attribute
value, corresponding to a particular theme
(attribute).
• To combine different themes one has to define
several rasters, one raster for each selected attribute.
• This leads to a typical layer structure, where each
layer corresponds to a single theme. This layer
structure is characteristic for geographical
information systems that are based on the raster
model.
…cont’d
• The raster model makes it computationally easy to
combine several themes that are represented by
different layers (overlay analysis).
• This explains the initial success of the model with
planners and landscape architects, who started
experimenting with it in the early 1960s.
• They invented a technique which is presently known
as map algebra, and which is essential to spatial
analysis in a raster environment (see further).
Basic idea of map algebra
…cont’d
• In the vector approach spatial structures are
represented by means of objects. The objects are
described by three types of data:
– A unique identification code (ID), which allows
each object in the database to be identified (name,
number).
– A set of thematic characteristics (attributes) that
are linked to a specific object class.
– The geometry of the objects.
• The geometry of the objects is defined by means of a
number of so-called graphic primitives (geometric
building elements): points, nodes, segments, chains,
polygons.
…cont’d
• Different vector models are used to represent object
geometry, depending on the GIS software one works
with. Vector data are stored in GIS software in one of two
ways: the spaghetti model (or ring model) and the
topological model:
– Spaghetti model or ring model:
• All point, line and area objects are represented
by separate geometric elements, without explicit
definition of topology.
• This leads to data redundancy and complicates
editing work (risk for inconsistencies).
• All spatial relations need to be analyzed “on the
fly”.
Topological model:
Kinds of topological relationships
• There are four kinds of topological relationships
…cont’d
• The term “Data Analysis” is used here to describe the
collection of methods, techniques and approaches to
extract meaningful information from sets of data,
represented in geospatial form in modern GIS packages.
• In other words, the role of analysis in GIS is to turn data
into information and create new data by manipulating
collected data.
• Spatial Analysis has several levels of sophistication:
manipulation, queries, statistics and modelling.
• Spatial data manipulation is one of the classic GIS
capabilities. This includes spatial queries and
measurements, buffering and map layer overlay.
Vector data properties
• Vector analysis is based on vector data properties:
geometry and structure.
• Vector data models use mathematical primitives
(points and their x- and y-coordinates) to construct
fundamental geometric spatial features such as points,
lines and polygons.
• Polygons are built from point and line geometric
primitives that compose their boundary, which requires at
least three line segments.
…cont’d
• The total length of these lines is the perimeter of the
polygon, and the region they enclose is its area.
• As geospatial features, polygons also have attributes
which allow their identification and manipulation.
• The location of a polygon in any given space is
commonly summarised by its centroid.
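The perimeter, area and centroid of a polygon can all be derived from its boundary coordinates. The sketch below uses the standard shoelace formula and assumes a simple (non-self-intersecting) polygon in planar coordinates; the function name and the `parcel` coordinates are hypothetical.

```python
import math

def polygon_properties(ring):
    """Return (perimeter, area, centroid) of a simple polygon.

    `ring` lists the boundary vertices in order, without repeating the first.
    """
    n = len(ring)
    if n < 3:
        raise ValueError("a polygon needs at least three boundary segments")
    perimeter = 0.0
    cross_sum = cx = cy = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the boundary
        perimeter += math.hypot(x1 - x0, y1 - y0)
        cross = x0 * y1 - x1 * y0   # shoelace term for this segment
        cross_sum += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    signed_area = cross_sum / 2.0
    centroid = (cx / (6.0 * signed_area), cy / (6.0 * signed_area))
    return perimeter, abs(signed_area), centroid

# Hypothetical rectangular parcel, 4 units wide and 3 units high.
parcel = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
perimeter, area, centroid = polygon_properties(parcel)
```

For geographic (latitude/longitude) coordinates the data would first need projecting to a planar system, since the formula assumes flat geometry.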
…cont’d
• Basic vector analysis is primarily based on proximity
operations and tools that are used to implement the
following fundamental spatial concepts:
– Buffering
– Overlay
– Distance measurement
– Pattern analysis
– Map manipulation
Buffering
• Buffering creates new polygons by expanding or
shrinking existing polygons or by creating polygons
from points and lines.
• Buffers are based on the concept of distance from
the neighbouring features.
• Buffers are generated for spatial analysis to address
proximity, connectivity and adjacency of features in a
geospatial place.
• A buffer is a spatial zone around a point, line or
polygon feature.
Figure. Point, line and polygon (area) buffers
…cont’d
• There are many variations of buffers. The shape and
size of buffers can be defined by a variable distance
(distance based on a feature’s attribute); buffers can
also be defined by multiple zones and can have dissolved
or merged boundaries.
• How does a buffer process work? Buffer processes
use mathematical algorithms to identify the space
around a selected landscape feature.
• First, features are selected for buffering through a
variety of selection processes. Then a buffer distance
is specified.
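One simple way to reason about a line buffer is as a distance test: a location lies inside the buffer if its distance to the nearest segment of the line is no greater than the buffer distance. The sketch below illustrates that idea only; GIS packages construct actual buffer polygons, and the names `road`, `houses` and the 3-unit distance are hypothetical.

```python
import math

def distance_to_segment(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(p, line, distance):
    """True if p lies within `distance` of any segment of `line`."""
    return any(
        distance_to_segment(p, a, b) <= distance
        for a, b in zip(line, line[1:])
    )

# Step 1: select the feature to buffer. Step 2: specify the buffer distance.
road = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
houses = {"A": (5.0, 2.0), "B": (5.0, 6.0), "C": (12.0, 11.0)}
near_road = sorted(
    name for name, xy in houses.items() if within_buffer(xy, road, 3.0)
)
```

A variable-distance buffer would replace the constant `3.0` with a value read from each feature's attributes.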
Figure. Variations of buffering
…cont’d
• Map Algebra
– Operand: rasters
– Operations: local, focal, zonal and global
• Image Algebra
– Operand: images
– Operations: crop, zoom, rotate
• There are four types of raster operations.
– Local: only the pixels at the same location, in one or
more input rasters, are used to calculate that pixel’s
value.
– Focal: all pixels in a predetermined neighbourhood
are used to calculate a pixel’s value.
…cont’d
– Zonal: use zones defined in one layer to make
calculations on another (variable shaped and sized
neighbourhoods).
– Global: all cells in a raster are used as inputs to
calculate the value of a single pixel.
Figure. Four types of raster operations
…cont’d
• Local operations:
– Perform a calculation on a single cell at a time
– Surrounding cells do not affect the calculation
– Can be applied to one raster layer or several
• Focal operations:
– Perform calculation on a single cell and its neighbouring
cells.
– Also known as local neighbourhood functions
• Zonal operations:
– Perform a calculation on a zone, which is a set of cells with
a common value
– Cells in a zone can be discontinuous
• Global operations:
– Perform calculations on the raster as a whole
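The difference between local, focal and global operations can be seen on a tiny raster. The sketch below stores the raster as a plain list of lists; the `elevation` values, the factor of 10 and the 3x3 window are assumptions chosen for illustration.

```python
elevation = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
rows, cols = len(elevation), len(elevation[0])

# Local: each output cell depends only on the same cell of the input.
local = [[v * 10 for v in row] for row in elevation]

# Focal: each output cell depends on the cell and its neighbours
# (here the mean of a 3x3 window, clipped at the raster edges).
def focal_mean(grid, r, c):
    window = [
        grid[i][j]
        for i in range(max(0, r - 1), min(rows, r + 2))
        for j in range(max(0, c - 1), min(cols, c + 2))
    ]
    return sum(window) / len(window)

focal = [[focal_mean(elevation, r, c) for c in range(cols)] for r in range(rows)]

# Global: every cell of the raster contributes to each output value
# (here the deviation of each cell from the raster-wide mean).
global_mean = sum(sum(row) for row in elevation) / (rows * cols)
global_dev = [[v - global_mean for v in row] for row in elevation]
```

Zonal operations need a second layer that defines the zones, so they are not shown here.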
…cont’d
• A zone is defined as a collection of cells within one
layer that all have the same attribute value.
• Zonal operations calculate a new value for a location
based on a specific characteristic of the zone to
which the location belongs. Some examples:
– Calculation of the area or the perimeter of a zone
(area, count / perim)
– Calculation of a summary value for a zone for a
specific attribute, based on an extra layer that
contains local values for the attribute (total,
average, standard deviation, minimum,...) (extract,
score)
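A zonal summary can be sketched by grouping the cells of a value layer according to a zone layer. Both grids below are hypothetical; the cell count stands in for area, which would be the count multiplied by the cell size.

```python
from collections import defaultdict

# Zone layer: cells with the same value belong to the same zone.
zones = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 2],
]
# Extra layer holding the local attribute values to summarise.
values = [
    [10, 20, 30],
    [30, 40, 50],
    [5, 15, 60],
]

cells_by_zone = defaultdict(list)
for zone_row, value_row in zip(zones, values):
    for zone, value in zip(zone_row, value_row):
        cells_by_zone[zone].append(value)

zone_count = {z: len(v) for z, v in cells_by_zone.items()}
zone_mean = {z: sum(v) / len(v) for z, v in cells_by_zone.items()}

# Write the zonal mean back to every cell of the zone.
zonal_mean_layer = [[zone_mean[z] for z in row] for row in zones]
```

Other summaries (total, minimum, standard deviation) follow the same pattern by changing the function applied to each zone's list of values.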
Quiz
• Suppose you wish to produce a final product that
shows those areas with slopes greater than 20
degrees.
– What data are necessary to produce such a map?
– Outline the procedure for reaching the final product.
• Suppose you wish to compute the percentage change
in the forest coverage of Ethiopia between 1950 and
2010.
– What data are necessary to execute the task?
– Outline the procedure for reaching the final product.
Cartographic modelling
• By logically combining local, focal and zonal operations,
so that the output of one operation becomes the input
of another, relatively complex spatial problems can be
analyzed.
• A specific sequence of operations which allows one to
solve a particular spatial problem is called a cartographic
model.
• The process itself, which consists of defining a flow
chart, is called cartographic modelling.
• Cartographic modelling is often applied in projects
related to land evaluation and land allocation, where the
objective mostly is to define an optimal use of space,
based on multi-criteria analysis.
Implementing a cartographic model
1. Identify the map layers or spatial data sets which are
required.
2. Use logic and natural language to develop the process
of moving from the available data to a solution.
3. Set up a flow chart with steps to graphically represent
the above process. In the context of map algebra this
flow chart represents a series of equations you must
solve in order to produce the solution.
4. Annotate this flow chart with the commands necessary
to perform these operations within the GIS you are
using.
Cartographic modelling (flow chart)
…cont’d
• To explore cartographic modelling stages let us
consider a supermarket siting example. We can
complete stage one of the cartographic modelling
process by identifying four data layers:
• land_use
• site_status
• river_map
• roads_map
• Stage two is completed by describing, in natural
language, a scheme of spatial operations required to
identify potential sites for the supermarket.
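As a rough sketch of how such a scheme might translate into a Boolean overlay, the example below combines the four layers cell by cell. The siting criteria are not given in this text, so every rule and value here is a hypothetical assumption: urban land use, a site that is for sale, outside a river exclusion zone, and within reach of a main road.

```python
# All layer contents below are invented for illustration.
land_use = [["urban", "farm", "urban"], ["urban", "urban", "forest"]]
site_status = [
    ["for_sale", "for_sale", "sold"],
    ["for_sale", "for_sale", "for_sale"],
]
river_map = [[0, 0, 0], [1, 0, 0]]  # 1 = inside a river exclusion zone
roads_map = [[1, 1, 1], [1, 0, 1]]  # 1 = within reach of a main road

rows, cols = len(land_use), len(land_use[0])

def is_potential_site(r, c):
    """Local overlay: every (assumed) criterion must hold for the cell."""
    return (
        land_use[r][c] == "urban"
        and site_status[r][c] == "for_sale"
        and river_map[r][c] == 0
        and roads_map[r][c] == 1
    )

potential_sites = [
    [is_potential_site(r, c) for c in range(cols)] for r in range(rows)
]
```

In a real model the river and road layers would themselves be outputs of earlier steps, such as buffering, which is what makes the flow chart a chain of operations.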
…cont’d
• For determining the factor weights one often uses
the method of Saaty (Saaty, 1977).
• The method is based on a pairwise estimation of the
relative importance of the different factors based on
a scale with 9 classes.
• From the obtained matrix an optimal set of weights is
derived (the principal eigenvector of the matrix).
• The consistency of pairwise comparisons can be
evaluated by means of an overall consistency ratio,
which according to Saaty should be smaller than 0.10
(Saaty, 1977).
• It is also possible to identify specific inconsistencies
in the matrix.
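The weight derivation can be sketched with a simple power iteration, which converges to the principal eigenvector of a positive pairwise comparison matrix. The matrix below is a hypothetical three-factor example, and the random index values are the ones commonly tabulated for Saaty's method; both are assumptions rather than values taken from this text.

```python
def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector (weights) and eigenvalue."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [v / total for v in nxt]  # normalise so the weights sum to 1
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max

# Commonly tabulated random index values (assumed, keyed by matrix size).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}

def consistency_ratio(lambda_max, n):
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]

# Hypothetical judgements on the 1-9 scale: factor 1 is moderately more
# important than factor 2 (3) and strongly more important than factor 3 (5).
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
weights, lambda_max = ahp_weights(pairwise)
cr = consistency_ratio(lambda_max, len(pairwise))
```

For this matrix the weights come out at roughly 0.64, 0.26 and 0.10, and the consistency ratio stays below the 0.10 limit.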
…cont’d
• A global suitability map, based on a weighted
combination of factor scores, can be recoded into a
map with qualitative suitability classes and/or,
through the definition of a threshold value (often
based on the area that should be allocated), can be
transformed into a Boolean suitability map.
• The method makes it possible to deal with different
priorities (ecological, social, economic,...) and also
allows one to study the impact of assigning more or
less weight to a particular criterion on the outcome
of the allocation process.
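A weighted combination of factor scores, followed by thresholding, might look like the sketch below. The factor names, weights, scores and the 0.6 threshold are all hypothetical; factor scores are assumed to be already standardised to the range 0 to 1.

```python
# Hypothetical factor weights (summing to 1) and standardised factor layers.
weights = {"slope": 0.5, "road_distance": 0.3, "soil": 0.2}
factors = {
    "slope": [[0.9, 0.4], [0.2, 0.8]],
    "road_distance": [[0.5, 0.6], [0.9, 0.7]],
    "soil": [[1.0, 0.3], [0.4, 0.6]],
}
rows, cols = 2, 2

# Global suitability: weighted sum of the factor scores in each cell.
suitability = [
    [sum(weights[name] * factors[name][r][c] for name in weights)
     for c in range(cols)]
    for r in range(rows)
]

# Boolean suitability map: cells at or above the threshold are suitable.
threshold = 0.6
boolean_map = [[score >= threshold for score in row] for row in suitability]
```

Changing a weight and recomputing `suitability` is a direct way to study how sensitive the allocation is to a particular criterion.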
…cont’d
• When dealing with spatially conflicting objectives
(when one location fits several objectives) two
approaches for allocation are possible:
– Hierarchical approach: objective 1 has priority
over objective 2.
• Iterative increase/decrease of the threshold
value for the suitability map of objective 1 until
sufficient area is allocated to objective 1
• The same for objective 2 in the remaining area
– Conflict approach: looking for a compromise
based on a decision heuristic
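The hierarchical approach can be sketched as a threshold that is lowered step by step until enough cells are allocated, first for objective 1 and then for objective 2 in what remains. The cell identifiers, suitability scores, step size and required cell counts are hypothetical.

```python
def allocate(suitability, available, cells_needed, step=0.05):
    """Lower the threshold until at least `cells_needed` cells qualify."""
    threshold = 1.0
    while threshold >= 0.0:
        chosen = {c for c in available if suitability[c] >= threshold}
        if len(chosen) >= cells_needed:
            return chosen, threshold
        threshold = round(threshold - step, 10)  # avoid float drift
    return set(available), 0.0  # not enough cells: allocate everything

# Hypothetical suitability scores per cell for two objectives.
suit1 = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.3}
suit2 = {"a": 0.95, "b": 0.2, "c": 0.7, "d": 0.6}
cells = set(suit1)

# Objective 1 has priority and is allocated first.
obj1, t1 = allocate(suit1, cells, cells_needed=2)
# Objective 2 is allocated only in the remaining area.
obj2, t2 = allocate(suit2, cells - obj1, cells_needed=1)
```

Cell "a" scores highest for objective 2 but still goes to objective 1, which is exactly the trade-off that the conflict approach tries to soften.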
…cont’d
• Conflict approach:
– Identification of the best x hectares for objective 1 and y
hectares for objective 2 based on the two suitability maps
– Partitioning of the conflict areas based on distance in the
decision space to the ideal conditions for both objectives
– Iterative decrease of the threshold values for both
suitability maps and repeating of the allocation process
until the required area for each type of land use is
obtained
• The method allows more weight to be given to one of the two objectives
• Correct application of the procedure requires the “ranking”
(histogram equalization) of both suitability maps
Advantages …cont’d
• Flexible method that allows easy testing of “what if...?”
scenarios (e.g. by modifying the content of one or
more input layers or by changing some of the model
parameters)
• Further refinement/expansion of an existing model is
easy (by adding extra input layers and/or relations)
Disadvantages of cartographic modelling
• Strong deterministic assumptions of the method,
especially if only constraints are used (Boolean overlay)
• Recently a lot of research has been carried out to
define techniques that allow us to quantify the impact
of errors and uncertainties in input data and model
parameters on the outcome of the analysis.