Part II - Spatial Data Models


Spatial data models

1
Introduction
• Reality is so complex that one can never succeed in
representing it in every detail on a map or in a GIS.
• As a result one has to work with a spatial model
(simplification) of reality.
• A representation of part of the real world is a model.
• A model allows us to operate on it instead of on the
real world.
• Using models, it is possible to test what happens under
various conditions.
– “What if” questions can also be answered.
• Modelling is the process of producing an abstraction of the
‘real world’ so that some part of it can be more easily handled.
• A spatial model represents the most important components
(parts) of a geographic phenomenon.
…cont’d
• What is geographic phenomenon?
• We might define a geographic phenomenon as
something of interest that:
– can be named or described,
– can be georeferenced, and
– can be assigned a time (interval) at which it is/was
present.

3
…cont’d
• Representing a geographic phenomenon in a GIS
requires us to state what it is, and where it is.
• We must provide a description or at least a name on the
one hand, and a georeference on the other hand.
• Some phenomena manifest themselves essentially
everywhere in the study area, while others only do so in
certain localities.
– Based on the nature of a geographic phenomenon, we can
classify it as a geographic field or a geographic object.
• If we define our study area as the equatorial Pacific
Ocean, we can say that Sea Surface Temperature can be
measured anywhere in the study area. Therefore, it is a
typical example of a (geographic) field.
…cont’d

• A geographic field is a geographic phenomenon for
which, for every point in the study area, a value can
be determined.

• Some common examples of geographic fields are air
temperature, barometric pressure and elevation.

5
Geographic objects

• Geographic objects populate the study area, and are
usually well-distinguished, discrete, and bounded
entities. The space between them is potentially
‘empty’ or undetermined.
• Many other phenomena do not manifest themselves
everywhere in the study area, but only in certain
localities.
• The array of buoys or markers is a good example:
there is a fixed number of buoys, and for each we
know exactly where it is located.
• The buoys are typical examples of (geographic)
objects.
How are geographic phenomena described?
• A distinction is made between two fundamentally
different methods of describing spatial reality or
geographic phenomena. We can have two approaches,
the field approach and the object approach.
The field approach:
– Terrain characteristics are described in the form of
attributes. Attribute values are assigned to a set of
selected locations.
– The approach is based on the idea that each
attribute has a unique value at each location and
defines a field of attribute values.
7
…cont’d
– It is mostly applied to describe phenomena that
are characterized by a continuous spatial variation
(e.g. temperature, elevation,...).
– Attribute values are often sampled for an irregular
set of locations. The characteristics of the
sampling (sampling scheme, sampling density)
primarily depend on the sampling technique, the
expected spatial correlation between attribute
values and the accuracy required.
– Measured values for an irregular set of locations
are often transformed into a regular grid by means
of an interpolation method.
8
What is interpolation?
• Interpolation is the estimation of surface values at
unsampled points based on known surface values of
surrounding points.
• It is a technique of combining sampled values and
positions to estimate values at unmeasured
locations.
• Interpolation can be used to estimate elevation,
rainfall, temperature, chemical dispersion, or other
spatially-based phenomena.
• Interpolation is commonly a raster operation, but it
can also be done in a vector environment using a TIN
surface model.
9
…cont’d

Transformation of attribute values for an irregular set of
locations into a regular grid

10
Inverse Distance Weighted
Interpolation (IDW)
• Accounts for “vicinity/nearness” by
(1) selecting points within a kernel radius, or
(2) selecting a fixed number of “near” points (known
points)

• The contribution of a point decreases the more
distant it is from the unmeasured location

• The weight of each sample point is inversely
proportional to its distance (raised to a power n)

• This is an exact interpolator: where d = 0 the surface
takes the value of the data point

11
IDW Computation

$$ Z_j = \frac{\sum_i Z_i / d_{ij}^{\,n}}{\sum_i 1 / d_{ij}^{\,n}} $$

• Zj - estimated value for the unknown point at location j
• dij - distance between known point i and
unknown point j
• Zi - is the value at known point i
• n - user-defined exponent for weighting
• Normally a fixed number of known points is used

13
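Characteristics of IDW
• Exact interpolator
• Interpolated values equal the sample point values at the
sample locations
• Reduction of the formula at sample point locations
(the term of sample point i dominates as dij approaches 0):

$$ Z_j = \frac{Z_i / d_{ij}^{\,n}}{1 / d_{ij}^{\,n}} = Z_i $$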

14
The object approach:

• Objects are defined on the terrain. For each object
geometric as well as thematic characteristics are
determined.
– For each object class one has to decide how
objects belonging to that class will be spatially
represented: by means of points, lines or areas.
The choice depends on:
• the nature of the objects;
• the scale one is working on;
• the purpose of the analysis.

15
…cont’d
• The method is mostly used to represent phenomena that
occur on the earth’s surface as a collection of objects with
clear boundaries, in other words phenomena that are
characterised by a discrete spatial variation (e.g. parcels,
houses, roads,...).
– In the treatment and interpretation of data one will
often move from a data model, based on a field
approach, to a data model based on objects (model
transformation):
• Production of a soil map based on soil profile analysis
• Interpretation of land use or vegetation
characteristics, based on aerial photographs
• Construction of a DEM based on triangulation (see
further)
16
Visual interpretation of land use, based on a SPOT satellite image
17
Digital representation of a spatial data model

• Two alternative methods can be used to digitally
represent a spatial data model: the raster method and
the vector method.
• Both methods correspond to fundamentally different
concepts for the digital treatment and analysis of spatial
data.
• In the raster approach spatial structures are described by
means of a regular grid of cells that completely covers
the study area.
…cont’d
• Each cell receives a value that describes the thematic
content of that cell:
– In the case of a field based model each cell receives the
attribute value of its central location.
– In the case of an object based model the value assignment
depends on whether we are dealing with points, lines or
areas:
• Points: each cell that contains a point receives the attribute
value of that point.
• Lines: each cell that is cut by a linear object receives the
attribute value of that object.
• Areas: each cell receives the attribute value of the area that
occurs in the centre of the cell (central point method) or of
the area that covers the largest part of the cell (dominant
unit method).
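The difference between the central point method and the dominant unit method can be illustrated with a small sketch. It assumes a hypothetical callback `area_at(x, y)` that returns the label of the area covering a point, and it approximates the "largest part of the cell" by counting sub-sample points inside each cell; neither detail comes from the slides.

```python
from collections import Counter

def rasterize(area_at, ncols, nrows, cell, method="central", sub=10):
    """Return a grid (list of rows) of area labels.

    area_at(x, y) is a hypothetical callback giving the area label at a
    point. The origin is the lower-left corner; row 0 is the bottom row.
    """
    grid = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            if method == "central":
                # Central point method: label found at the cell centre
                row.append(area_at((c + 0.5) * cell, (r + 0.5) * cell))
            else:
                # Dominant unit method: label covering most sub-sample points
                votes = Counter(
                    area_at((c + (i + 0.5) / sub) * cell,
                            (r + (j + 0.5) / sub) * cell)
                    for i in range(sub) for j in range(sub))
                row.append(votes.most_common(1)[0][0])
        grid.append(row)
    return grid

def parcel(x, y):
    """Illustrative map: a small parcel 'A' surrounded by area 'B'."""
    return "A" if 0.4 <= x < 0.6 and 0.4 <= y < 0.6 else "B"

print(rasterize(parcel, 1, 1, 1.0, "central"))
print(rasterize(parcel, 1, 1, 1.0, "dominant"))
```

With one cell covering the whole map, the central point method picks up the small parcel while the dominant unit method returns the surrounding area, which is the kind of rasterizing error the two methods trade off.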
…cont’d

Creation of a raster
20
…cont’d

Rasterizing error for central point (c, d) and dominant unit rasterizing (e, f)
21
Types of Geographic Data
• What kinds of things might we want to store in a GIS?
– We may be interested in the location of houses on
a street,
– We might be interested in storing the boundaries
of a park reserve, or a city.
– We can also use a GIS to store things such as soil
pH variations over an area, or temperature
changes.

22
…cont’d

• Features stored in a GIS fall into two categories,
discrete and continuous.
• Discrete and continuous data can generally be
differentiated by whether or not the data are
uniform within their boundaries.
• Discrete data stores features that have well defined,
solid, unambiguous boundaries and are uniform
within them.
– A house is a house everywhere within its walls.
– A city is a city everywhere within its boundaries.
– Therefore a house and a city in this context are
discrete features.
23
…cont’d
• Continuous data varies continuously over an area.
– A continuous feature is not uniform within the
data extent.
– Soil pH will vary from one place to another.
– Air temperatures are not consistent throughout an
air mass.
– Both of these examples would be considered
continuous data because there is an infinite
amount of variation within them.

24
Vector Data
• The vector data model is one of the most popular ways to
store geographic data.
• This model is ideally suited for representing discrete
objects.
• The vector data model can accurately represent only
discrete objects; continuous phenomena have to be
approximated (for example by a TIN surface model).
• It uses points and edges to represent three basic
types of spatial features: points, lines, and polygons.
• All of these types are capable of storing attribute
data about the particular feature they represent.

25
Point Data
• Point data are data that can be represented as a single
location on a map.
• Point data can be used to represent house locations on
a street.
• This is by far the simplest data type and is very good for
storing data when all we are concerned with is the
location of a feature and not its length or width.
• Point data are zero dimensional and have no width,
length or height.
Line Data
• Features that have a location, a length, but no width are
represented in the vector model by lines.
• Examples of some features well represented by lines are
contours, administrative boundaries, roads, rivers, and
sewers.
• It is important to mention that while some line features
such as rivers and roads, may have an area, we use lines to
represent them at scales where their width cannot be
accurately reflected.
• Line features are stored using a collection of points called
nodes and vertices which each have their own unique
coordinate pair.
• Nodes are the endpoints of a line while vertices are
intermediate points located between the two end points.
…cont’d

Figure. Lines can be connected at nodes to form networks.

Polygon Data
Many more features we may wish to represent in a GIS are going
to have a width and an area associated with them.

Examples of these are property boundaries, lakes, and political
boundaries. The GIS stores polygon data much the same way it
stores line data. The major difference between polygons and lines,
however, is that a polygon must be composed of at least one closed
line and must enclose an area.
28
Raster Data
• A raster model uses a grid of square cells to
store spatial data.
• The most common rasters show up as images
in web pages, and computer graphics.
• A raster is defined by:
– The co-ordinates of its origin
– The resolution (size) of the cells
– The dimension of the raster: number of columns
(x-direction) and rows (y-direction)
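The three elements that define a raster are enough to convert between cell indices and map coordinates. The sketch below assumes the origin is the upper-left corner and that rows are counted downward; this is a common convention but the slides do not state it, and the function names are made up for the example.

```python
def cell_centre(origin_x, origin_y, cell_size, col, row):
    """Map coordinates of the centre of cell (col, row).

    Assumes the origin is the upper-left corner and row 0 is the top row.
    """
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y

def cell_of(origin_x, origin_y, cell_size, ncols, nrows, x, y):
    """(col, row) of the cell containing point (x, y), or None if outside."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    if 0 <= col < ncols and 0 <= row < nrows:
        return col, row
    return None

# Hypothetical raster: origin (1000, 2000), 10 m cells, 5 columns x 4 rows
print(cell_centre(1000, 2000, 10, 0, 0))
print(cell_of(1000, 2000, 10, 5, 4, 1023, 1981))
```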

29
…cont’d
• When choosing the resolution of the raster a proper
balance has to be achieved between the level of
detail of the terrain description and the amount of
data one will have to treat.
• Each cell in a raster can only have one attribute
value, corresponding to a particular theme
(attribute).
• To combine different themes one has to define
several rasters, one raster for each selected attribute.
• This leads to a typical layer structure, where each
layer corresponds to a single theme. This layer
structure is characteristic for geographical
information systems that are based on the raster
model.
…cont’d
• The raster model makes it computationally easy to
combine several themes that are represented by
different layers (overlay analysis).
• This explains the initial success of the model with
planners and landscape architects, who started
experimenting with it in the early 1960s.
• They invented a technique which is presently known
as map algebra, and which is essential to spatial
analysis in a raster environment (see further).
Basic idea of map algebra
33
…cont’d
• In the vector approach spatial structures are
represented by means of objects. The objects are
described by three types of data:
– A unique identification code (ID), which allows
each object in the database to be identified (name,
number).
– A set of thematic characteristics (attributes) that
are linked to a specific object class.
– The geometry of the objects.
• The geometry of the objects is defined by means of a
number of so-called graphic primitives (geometric
building elements): points, nodes, segments, chains,
polygons.
…cont’d
• Different vector models are used to represent object
geometry, depending on the GIS software one works
with. Vector data are stored in GIS software in one of two
ways: the spaghetti (or ring) model and the
topological model:
– Spaghetti model or ring model:
• All point, line and area objects are represented
by separate geometric elements, without explicit
definition of topology.
• This leads to data redundancy and complicates
editing work (risk for inconsistencies).
• All spatial relations need to be analyzed “on the
fly”.
Topological model:

• Important topological relationships are explicitly
incorporated in the model (through the definition of
“nodes” and “chains”).
• Requires specific functionality for building topology.
• Avoids data redundancy, reduces the risk of
inconsistencies.
• Makes spatial analysis that requires topological information
computationally less complicated.
• To represent thematic characteristics (attributes), terrain
objects are grouped into object classes, each class with its
own typical attribute structure:
– Each object class is represented by a table.
– Each row in the table describes an object of the class
considered.
…cont’d
– Each column corresponds to an attribute, starting
with a unique identification code (ID) (“key” of the
table) which unambiguously identifies each object.
– In the most simple case the database contains one
table, which allows one to link thematic information
to the geometry of an object.
– Very often a set of tables is used which are linked to
each other by means of common attributes
(columns) (relational principle). This allows one:
• To define relationships between different objects.
• To link non-explicitly spatial objects (e.g. a person), to
a spatial object. This process is called geo-referencing.
• To define objects that consist of several geometric
elements.
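The relational principle can be illustrated with two small tables linked by a common attribute. The table contents, the `owner_id` column and the function name are all hypothetical; geometry is left out to keep the sketch short.

```python
# Object class "parcels": one row per object, keyed by a unique ID
parcels = {
    101: {"land_use": "residential", "owner_id": 1},
    102: {"land_use": "agricultural", "owner_id": 2},
    103: {"land_use": "residential", "owner_id": 1},
}

# Non-spatial table "owners", linked to parcels through owner_id
owners = {
    1: {"name": "Owner A"},
    2: {"name": "Owner B"},
}

def parcels_of(owner_name):
    """IDs of the parcels linked to a person via the common attribute."""
    ids = {oid for oid, row in owners.items() if row["name"] == owner_name}
    return sorted(pid for pid, row in parcels.items() if row["owner_id"] in ids)

print(parcels_of("Owner A"))
```

The person has no geometry of their own; the link through `owner_id` is what ties them to a location.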
37
…cont’d
– Most GIS-software makes it possible to store and manage
attribute information in an external, relational database
(e.g. Access, Microsoft SQL-server, Oracle, Informix,...) and
to connect this database to the GIS-software by means of a
table that provides the link between the geometric and the
thematic data.
• In a vector GIS, too, spatial objects are often
thematically grouped in layers. In this case, topological
relationships between objects that are stored in different
layers are not explicitly recorded, irrespective of whether a
spaghetti model or a topological model is used to encode the
data. This implies that topological relationships have to be
analyzed “on the fly” (spaghetti model) or that a topological
overlay of two layers has to be carried out (topological model).

38
Kinds of topological relationships
• There are four kinds of topological relationships

– Adjacency – which polygons are next to which?

– Connectivity – which lines connect to which?

– Containment – which features are within another
feature (e.g., “island polygons”: an island within a
lake)
– Coincidence - which occupy the same space?
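Adjacency and connectivity can be read directly from a chain table when topology is stored explicitly. The sketch below assumes a simple layout in which every chain records its start node, end node, left polygon and right polygon; the identifiers are invented for illustration.

```python
from collections import defaultdict

OUTSIDE = 0  # label used here for the area outside all polygons

# chain ID: (from_node, to_node, left_polygon, right_polygon)
chains = {
    "c1": ("n1", "n2", "A", "B"),      # boundary shared by A and B
    "c2": ("n2", "n1", "A", OUTSIDE),  # outer boundary of A
    "c3": ("n2", "n1", OUTSIDE, "B"),  # outer boundary of B
}

def adjacency(chains):
    """Which polygons are next to which (they share a chain)."""
    adj = defaultdict(set)
    for _, _, left, right in chains.values():
        if left != right and OUTSIDE not in (left, right):
            adj[left].add(right)
            adj[right].add(left)
    return dict(adj)

def connectivity(chains):
    """Which chains meet at which node."""
    conn = defaultdict(set)
    for chain_id, (start, end, _, _) in chains.items():
        conn[start].add(chain_id)
        conn[end].add(chain_id)
    return dict(conn)

print(adjacency(chains))
print(connectivity(chains))
```

In a spaghetti model the same answers would require comparing coordinates of separately stored boundaries "on the fly".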
39
Figure. Ring model
Figure. Topological model
Vector Data Analysis

44
…cont’d
• The term “Data Analysis” is used here to describe the
collection of methods, techniques and approaches to
extract meaningful information from sets of data,
represented in geospatial form in modern GIS packages.
• In other words, the role of analysis in GIS is to turn data
into information and create new data by manipulating
collected data.
• Spatial Analysis has several levels of sophistication:
manipulation, queries, statistics and modelling.
• Spatial data manipulation is one of the classic GIS
capabilities. This includes spatial queries and
measurements, buffering and map layer overlay.

45
Vector data properties
• Vector analysis is based on vector data properties:
geometry and structure.
• Vector data models use mathematical primitives
(points and their x- and y-coordinates) to construct
fundamental geometric spatial features such as points,
lines and polygons.
• Polygons are built from point and line geometric
primitives that compose their boundary, using three line
segments as a minimum.
…cont’d
• The total length of these lines defines the perimeter of
the polygon; its area follows from the coordinates of
the boundary vertices.
• It is important to mention here that as a geospatial
feature, polygons have attributes which allow their
identification and manipulation.
• The location of a polygon in any given space can be
summarised by a single point, its centroid.
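Perimeter, area and centroid can all be derived from the boundary vertices. The sketch below uses the standard shoelace formula for a simple (non-self-intersecting) polygon without holes; the function name and the vertex-list layout are assumptions for this example.

```python
import math

def polygon_metrics(vertices):
    """Return (perimeter, area, centroid) of a simple polygon.

    vertices: list of (x, y) in boundary order, closing vertex not repeated.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    perimeter = 0.0
    twice_area = 0.0
    cx = cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        perimeter += math.hypot(x1 - x0, y1 - y0)
        cross = x0 * y1 - x1 * y0          # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    area = twice_area / 2.0                 # signed; sign depends on orientation
    centroid = (cx / (6.0 * area), cy / (6.0 * area))
    return perimeter, abs(area), centroid

# A 4 x 3 rectangle
print(polygon_metrics([(0, 0), (4, 0), (4, 3), (0, 3)]))
```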

47
…cont’d
• Basic vector analysis is primarily based on proximity
operations and tools that are used to implement the
following fundamental spatial concepts:
– Buffering

– Overlay

– Distance measurement

– Pattern analysis

– Map manipulation
48
Buffering
• Buffering creates new polygons by expanding or
shrinking existing polygons or by creating polygons
from points and lines.
• Buffers are based on the concept of distance from
the neighbouring features.
• Buffers are generated for spatial analysis to address
proximity, connectivity and adjacency of features in a
geospatial place.
• A buffer is a spatial zone around a point, line or
polygon feature.

49
Figure. Point, line and polygon (area) buffers
50
…cont’d
• There are many variations of buffers. The shape and
size of a buffer can be defined by a variable distance
(distance based on a feature’s attribute); buffers can
also be defined with multiple zones and can have
dissolved or merged boundaries.
• How does a buffer process work? Buffer processes
use mathematical algorithms to identify the space
around a selected landscape feature.
• First, features are selected for buffering through a
variety of selection processes. Then a buffer distance
is specified.

51
Figure. Variations of buffering
52
…cont’d

• Point and line buffers are the simplest form of
buffering and offer fewer choices than buffering
for polygons.
• For polygons, users may select whether a buffer is
created that represents:
– only the area outside of the polygon that is being
buffered;
– the area outside of the polygon plus the entire
area of the polygon;
– the buffer area that is created both inside and
outside of the polygon boundary.
• As buffers create separation zones around features,
the interest can be within or outside the buffer zone.
Where and how are buffers useful?
• Some examples:
• Buffering a proposed new road path to determine if
wetlands are within 50 meters of a proposed road.
• Buffering the point of discharge to determine if it is
within 100 m of a shellfish bed.
• Buffering stream systems to delineate the distance
herbicide operations must stay away from water
systems.
• Local buildings (particularly houses), roads,
agricultural fields, and orchards may also require
buffering.
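A question such as "is this wetland within 50 meters of the proposed road?" does not require constructing the buffer polygon; it can be answered with a distance test. The sketch below is one way to do this, assuming planar coordinates in meters and a road given as a list of vertices; the function names are hypothetical.

```python
import math

def dist_point_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:                      # a and b coincide
        return math.hypot(px - ax, py - ay)
    # Position of the projection of p on the segment, clamped to [0, 1]
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(point, line, distance):
    """True if `point` lies inside the buffer of width `distance` around `line`."""
    return any(dist_point_segment(point, a, b) <= distance
               for a, b in zip(line, line[1:]))

# Hypothetical road centre line and wetland sample locations (meters)
road = [(0, 0), (100, 0), (100, 100)]
print(within_buffer((50, 30), road, 50))
print(within_buffer((40, 60), road, 50))
```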
54
Raster Data Analysis
• Raster models are used to represent continuous data
(such as elevation surfaces and fields) in ordinal or
rational form, classes and groups of thematic data (such
as forest species), and, finally, digital photographs and
images.
• Like vector models, raster models can also represent the
fundamental graphic primitives (points, lines and
polygons), using cells to represent the features and
their boundaries.
55
Figure. Representing graphic primitives in raster and vector
data models
Raster operations
• The advantage of a modern GIS is clearly seen in its
ability not only to display spatial information, but also
to analyze and manipulate geospatial data and
information.
• GIS data manipulation uses map algebra and image
algebra.
• An algebra is a mathematical structure consisting of
operands and operations.
• Applied to geospatial data, this definition can be
loosely extended and interpreted as map algebra and
image algebra.

57
…cont’d
• Map Algebra
– Operand: rasters
– Operations: local, focal, zonal and global
• Image Algebra
– Operand: images
– Operations: crop, zoom, rotate
• There are four types of raster operations.
– Local: only the pixels at the same location as a
particular pixel, in one or more input rasters, are
used to calculate that pixel’s value.
– Focal: all pixels in a predetermined neighbourhood
are used to calculate a pixel’s value.
58
…cont’d
– Zonal: use zones defined in one layer to make
calculations on another (variable shaped and sized
neighbourhoods).
– Global: all cells in a raster are used as inputs to
calculate the value of a single pixel.

59
Figure. Four types of raster operations

60
…cont’d

• Local operations:
– Perform calculation on single cell at a time
– Surrounding cells do not affect the calculation
– Can be applied to one raster layer or several
• Focal operations:
– Perform calculation on a single cell and its neighbouring
cells.
– Also known as local neighbourhood functions
• Zonal operations:
– Perform a calculation on a zone, which is a set of cells with
a common value
– Cells in a zone can be discontinuous
• Global operations:
– Perform calculations on the raster as a whole
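The local, focal and global operation types can be sketched on small rasters stored as lists of rows. The function names, the 3 x 3 window and the handling of edge cells are choices made for this illustration only.

```python
def local_add(a, b):
    """Local operation: each output cell depends only on the same cell in a and b."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def focal_mean(grid):
    """Focal operation: mean of each cell and its neighbours in a 3 x 3 window.

    Edge cells use only the neighbours that exist.
    """
    nrows, ncols = len(grid), len(grid[0])
    out = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - 1), min(nrows, r + 2))
                      for cc in range(max(0, c - 1), min(ncols, c + 2))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

def global_max(grid):
    """Global operation: the result depends on every cell in the raster."""
    return max(max(row) for row in grid)

elevation = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(local_add(elevation, elevation))
print(focal_mean(elevation)[1][1])
print(global_max(elevation))
```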
…cont’d

Creating a buffer around an object by combining distance
calculation and recoding
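The two steps named in the caption can be sketched as follows: first a distance value is computed for every cell, then the distances are recoded into a 0/1 buffer map. This brute-force version measures straight-line distance between cell centres and is only meant for small grids; the function name and the 0/1 coding are assumptions.

```python
import math

def raster_buffer(grid, distance, cell_size=1.0):
    """Return a 0/1 grid: 1 where a cell lies within `distance` of an object cell.

    Object cells are the non-zero cells of `grid`.
    """
    objects = [(r, c) for r, row in enumerate(grid)
               for c, value in enumerate(row) if value]
    out = []
    for r, row in enumerate(grid):
        out_row = []
        for c in range(len(row)):
            # Step 1: distance calculation to the nearest object cell
            d = min((math.hypot(r - orow, c - ocol) * cell_size
                     for orow, ocol in objects), default=math.inf)
            # Step 2: recoding of the distance into buffer / no buffer
            out_row.append(1 if d <= distance else 0)
        out.append(out_row)
    return out

well = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
for line in raster_buffer(well, 1.0):
    print(line)
```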

64
…cont’d
• A zone is defined as a collection of cells within one
layer that all have the same attribute value.
• Zonal operations calculate a new value for a location
based on a specific characteristic of the zone to
which the location belongs. Some examples:
– Calculation of the area or the perimeter of a zone
(area, count / perim)
– Calculation of a summary value for a zone for a
specific attribute, based on an extra layer that
contains local values for the attribute (total,
average, standard deviation, minimum,...) (extract,
score)
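A zonal summary can be sketched with two aligned rasters: one holding zone identifiers and one holding the attribute values. The function name, the returned dictionary layout and the sample numbers are illustrative assumptions.

```python
from collections import defaultdict

def zonal_stats(zones, values, cell_area=1.0):
    """Summarise `values` per zone; zones and values are aligned grids."""
    cells = defaultdict(list)
    for zone_row, value_row in zip(zones, values):
        for zone, value in zip(zone_row, value_row):
            cells[zone].append(value)
    return {
        zone: {
            "area": len(vals) * cell_area,   # cell count times area per cell
            "total": sum(vals),
            "mean": sum(vals) / len(vals),
            "min": min(vals),
        }
        for zone, vals in cells.items()
    }

# Hypothetical layers: zone IDs and a local attribute (e.g. yield per cell)
zones = [[1, 1, 2],
         [1, 2, 2]]
values = [[10, 20, 30],
          [30, 50, 70]]
print(zonal_stats(zones, values, cell_area=100.0))
```

Cells of one zone do not need to touch each other; only the shared zone value matters.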
65
Quiz
• Suppose you wish to produce a final product that
shows those areas with slopes greater than 20
degrees.
– What data are necessary to produce such a map?
– Show the procedures to reach the final product.
• Suppose you wish to compute the percentage change
in the forest coverage of Ethiopia between 1950 and
2010.
– What data are necessary to execute the task?
– Show the procedures to reach the final product.
66
Cartographic modelling
• By logically combining local, focal and zonal operations in
such a way that the output of one operation becomes
the input of another, relatively complex spatial
problems can be analyzed.
• A specific sequence of operations which allows one to
solve a particular spatial problem is called a cartographic
model.
• The process itself, which consists of defining a flow
chart, is called cartographic modelling.
• Cartographic modelling is often applied in projects
related to land evaluation and land allocation, where the
objective mostly is to define an optimal use of space,
based on multi-criteria analysis.
Implementing a cartographic model
1. Identify the map layers or spatial data sets which are
required.
2. Use logic and natural language to develop the process
of moving from the available data to a solution.
3. Set up a flow chart with steps to graphically represent
the above process. In the context of map algebra this
flow chart represents a series of equations you must
solve in order to produce the solution.
4. Annotate this flow chart with the commands necessary
to perform these operations within the GIS you are
using.

68
Cartographic modelling (flow chart)
69
…cont’d
• To explore cartographic modelling stages let us
consider a supermarket siting example. We can
complete stage one of the cartographic modelling
process by identifying four data layers:
• land_use
• site_status
• river_map
• roads_map
• Stage two is completed by describing, in natural
language, a scheme of spatial operations required to
identify potential sites for the supermarket.
70
…cont’d

• The following figure shows how stage three is
completed by forming a flow chart to represent the
logic in a GIS project. It is sometimes easier to
visualise this with thumbnails of the data layers.

71
…cont’d

Figure. Flowchart of the operations needed to create a map
identifying suitable locations for a supermarket.
Figure. Flow chart with thumbnails to create a map identifying
suitable locations for a supermarket.
…cont’d
• Multi-criteria analysis is performed based on the following.
• Single-objective multi-criteria decision making, where
locations that are suited for one particular type of land use
are identified
– Example: identify the best location for growing a
particular type of crop, taking into account several
characteristics of the area (soil depth, soil type,
topography,...).
• Multi-objective multi-criteria decision making, where
different alternatives have to be compared, and one has to
deal with conflicting objectives (priorities)
– Example: defining a proper solution for tourist
development in an ecologically valuable area
74
…cont’d

Example of a single-objective multi-criteria analysis: definition
of a suitability map for maize based on FAO evaluation
procedures (1) (Burrough, 1986)
…cont’d
• In cartographic modelling, the criteria on which the
multi-criteria analysis is based are derived from
available thematic maps. A distinction must be made
between two types of criteria:
– Factors: represent one aspect of the degree of
suitability of a location for a certain objective and
can be measured on a discrete scale or on a
continuous scale (fuzzy criterion). The scores for
different factors are defined within the same
range (usually between 0 and 1 or between 0 and
255).
• Example: slope gradient determines the risk for
erosion.
76
…cont’d
– Constraints (restrictions): define the locations where
a certain objective cannot be reached.
• Examples:
– Locations above a certain slope gradient are not
suitable for development (Boolean criterion)
– A zone should have an area of at least 20ha to
be suited for development or exploitation
(goal, target)
• Evaluating the suitability of a location based on a set
of criteria is done based on one or more decision
rules, which rely on the scores of a location for each
of the criteria.
77
…cont’d
• If only constraints are applied, use will be made of a
Boolean “AND” (multiply).
• If factors are used, then the evaluation will mostly be
based on a combined suitability index that measures
the effect of different criteria. Such an index can be
defined in different ways:
– Determination of the maximal or minimal factor
score (worst-case scenario): the most restrictive
factor determines the result
– Calculation of a weighted linear combination of
factor scores (trade-off scenario):
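A weighted linear combination is commonly written as $S = \sum_i w_i x_i$, where $x_i$ is the score of a location for factor $i$ and $w_i$ its weight, with the weights summing to 1. The sketch below shows this trade-off index, the Boolean "AND" of constraints applied by multiplication, and the worst-case alternative. The function names and sample grids are assumptions, not part of the original slides.

```python
def weighted_suitability(factors, weights, constraints=()):
    """Weighted linear combination of factor grids, masked by 0/1 constraints."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    nrows, ncols = len(factors[0]), len(factors[0][0])
    out = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            score = sum(w * f[r][c] for w, f in zip(weights, factors))
            for mask in constraints:
                score *= mask[r][c]       # Boolean AND as multiplication
            row.append(score)
        out.append(row)
    return out

def worst_case(factors):
    """Minimum factor score per cell: the most restrictive factor decides."""
    nrows, ncols = len(factors[0]), len(factors[0][0])
    return [[min(f[r][c] for f in factors) for c in range(ncols)]
            for r in range(nrows)]

# Hypothetical factor scores on a 0-1 scale and one Boolean constraint
slope_score = [[1.0, 0.5]]
soil_score = [[0.0, 1.0]]
allowed = [[1, 0]]
print(weighted_suitability([slope_score, soil_score], [0.5, 0.5], [allowed]))
print(worst_case([slope_score, soil_score]))
```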

78
…cont’d
• For determining the factor weights one often uses
the method of Saaty (Saaty, 1977).
• The method is based on a pairwise estimation of the
relative importance of the different factors based on
a scale with 9 classes.
• From the obtained matrix an optimal set of weights is
derived (principal eigenvector of the matrix).
• The consistency of pairwise comparisons can be
evaluated by means of an overall consistency ratio,
which according to Saaty should be smaller than 0.10
(Saaty, 1977).
• It is also possible to identify specific inconsistencies
in the matrix.
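The derivation of weights from a pairwise comparison matrix can be sketched as follows. This example approximates the principal eigenvector by power iteration, which is one of several possible numerical methods, and uses the random index values commonly tabulated for Saaty's consistency ratio; the function name and the example matrix are assumptions.

```python
# Commonly tabulated random index (RI) values by matrix size (assumed here)
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(matrix, iterations=100):
    """Return (weights, consistency_ratio) for a pairwise comparison matrix.

    matrix[i][j] states how much more important factor i is than factor j.
    Supports up to 9 factors (the sizes listed in RANDOM_INDEX).
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then renormalise to sum 1
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate the largest eigenvalue from A w = lambda w
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri else 0.0
    return w, cr

# Hypothetical, perfectly consistent judgements for three factors
comparisons = [[1.0, 2.0, 6.0],
               [1 / 2, 1.0, 3.0],
               [1 / 6, 1 / 3, 1.0]]
weights, cr = ahp_weights(comparisons)
print(weights, cr)
```

A consistency ratio at or above 0.10 would suggest revisiting the pairwise judgements before using the weights.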
…cont’d
• A global suitability map, based on a weighted
combination of factor scores, can be recoded into a
map with qualitative suitability classes and/or,
through the definition of a threshold value (often
based on the area that should be allocated), can be
transformed into a Boolean suitability map.
• The method makes it possible to deal with different
priorities (ecological, social, economic,...) and also
allows one to study the impact of assigning more or
less weight to a particular criterion on the outcome
of the allocation process.

80
…cont’d
• When dealing with spatially conflicting objectives
(when one location fits several objectives) two
approaches for allocation are possible:
– Hierarchical approach: objective 1 has priority
over objective 2.
• Iterative increase/decrease of the threshold
value for the suitability map of objective 1 until
sufficient area is allocated to objective 1
• Idem for objective 2 in the remaining area
– Conflict approach: looking for a compromise
based on a decision heuristic

82
…cont’d
• Conflict approach:
– Identification of the best x hectares for objective 1 and y
hectares for objective 2 based on the two suitability maps
– Partitioning of the conflict areas based on distance in the
decision space to the ideal conditions for both objectives
– Iterative decrease of the threshold values for both
suitability maps and repeating of the allocation process
until the required area for each type of land use is
obtained
• The method allows more weight to be given to one of the two objectives
• Correct application of the procedure requires the “ranking”
(histogram equalization) of both suitability maps

83
…cont’d

Example of a decision heuristic for solving spatial conflicts in a
multi-objective multicriteria analysis (Idrisi Guide)
Advantages and disadvantages of cartographic
modelling
• Advantages of cartographic modelling:
– Relative simplicity of the technique
– Requires a systematic analysis of the problem and
a clear definition of the data that are needed to
solve the problem
– The use of a flow chart offers insight into the
strategy that is followed and makes it easy for
others to examine (criticize) the approach
– Stimulates active participation of decision makers
in the analytical process

85
Advantages …cont’d
• Flexible method that allows easy testing of “what if...?”
scenarios (e.g. by modifying the content of one or
more input layers or by changing some of the model
parameters)
• Further refinement/expansion of an existing model is
easy (by adding extra input layers and/or relations)
Disadvantages of cartographic modelling
• Strong deterministic assumptions of the method,
especially if only constraints are used (Boolean overlay)
• Recently a lot of research has been carried out to
define techniques that allow us to quantify the impact
of errors and uncertainties in input data and model
parameters on the outcome of the analysis.
86