Spatial Data Types
Learning Objectives
At the end of this lesson, you will be able to:
• Describe how regular and irregular tessellations may represent geographic phenomena;
• Understand how geographic fields and geographic objects may be represented through
tessellations or vector representations.
Computer Representations
of Geographic Information
Some geographic phenomena have the characteristics
of continuous functions over space.
One approach is to store elevation values for
individual locations. It suffers from the fact
that we will never be able to store the elevation
values for all locations; after all, there are
infinitely many locations.
HOW DO WE REPRESENT CONTINUOUS
FUNCTIONS IN GIS?
Store ELEVATION values for a finite set of locations, and use INTERPOLATION
to estimate values at locations that are not stored.
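The idea of stored values plus interpolation can be sketched as follows. This is only an illustration: the sample locations and elevations are hypothetical, and inverse distance weighting is one of several possible interpolation methods, chosen here as an assumption rather than taken from the lesson.

```python
import math

# Hypothetical stored samples: (x, y) location -> elevation in metres.
SAMPLES = {
    (0.0, 0.0): 100.0,
    (10.0, 0.0): 120.0,
    (0.0, 10.0): 140.0,
    (10.0, 10.0): 160.0,
}


def elevation(x, y, samples=SAMPLES, power=2):
    """Return the stored elevation, or an inverse-distance-weighted estimate."""
    if (x, y) in samples:
        return samples[(x, y)]      # stored location: no interpolation needed
    weighted_sum = 0.0
    weight_total = 0.0
    for (sx, sy), z in samples.items():
        distance = math.hypot(x - sx, y - sy)
        weight = 1.0 / distance ** power    # nearer samples weigh more
        weighted_sum += weight * z
        weight_total += weight
    return weighted_sum / weight_total
```

A query at a stored location returns the stored value; any other query returns an estimate that lies between the smallest and largest stored elevations.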
In real life, line objects and the boundaries of area objects are usually not
straight, and are often erratically curved. A famous paradoxical question is whether
one can actually measure the length of Great Britain’s coastline.
From this it becomes clear that phenomena with intrinsic continuous and/or infinite
characteristics have to be represented with finite means (computer memory) for
computer manipulation, and any finite representation scheme is open to errors of
interpretation.
The two representation schemes
commonly used for geographic
phenomena are:
▫ Tessellation Approach
▫ Vector Approach
TESSELLATIONS
A tessellation (or tiling) is a partitioning of space
into mutually exclusive cells that together make up
the complete study space.
REGULAR TESSELLATIONS
In regular tessellations the cells are the same shape and size. The simplest
example is a rectangular raster of unit squares, represented in a computer
in the 2D case as an array of n × m elements.
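A minimal sketch of such an array, assuming a list of rows and hypothetical elevation values (one value per unit-square cell):

```python
# A raster with n = 3 rows and m = 4 columns, stored as a list of rows.
# The numbers are hypothetical elevations.
raster = [
    [12.0, 13.5, 15.0, 14.0],
    [11.0, 12.5, 14.5, 13.0],
    [10.0, 11.5, 13.0, 12.0],
]

n_rows = len(raster)
n_cols = len(raster[0])


def cell_value(row, col):
    """Look up the field value of one cell by its array indices."""
    return raster[row][col]
```

Because every cell has the same shape and size, a cell is fully identified by its row and column index.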
REGULAR TESSELLATIONS
The field value of a cell can be interpreted as holding for the complete
tessellation cell, in which case the field is discrete: it is neither
continuous nor differentiable.
ADVANTAGE
We know how they partition space, and we can make
our computations specific to this partitioning.
DISADVANTAGE
They are not adaptive to the spatial phenomenon we want to represent.
The cell boundaries are both artificial and fixed: they may or may not
coincide with the boundaries of the phenomena of interest.
IRREGULAR TESSELLATIONS
In irregular tessellations the cells may vary in size and shape. A well-known
example is the region quadtree where neighbouring cells that have the same
field value are represented as one bigger cell.
IRREGULAR TESSELLATIONS
Regular tessellations provide simple structures with straightforward
algorithms, which are, however, not adaptive to the phenomena they
represent. Essentially this means they might not represent the phenomena
in the most efficient way.
For this reason, substantial research effort has also been put into irregular
tessellations. Irregular tessellations are partitions of space into mutually
disjoint cells, but now the cells may vary in size and shape, allowing them
to adapt to the spatial phenomena that they represent.
IRREGULAR TESSELLATIONS
Irregular tessellations are more complex than the regular ones, but
they are also more adaptive, which typically leads to a reduction in the
amount of memory used to store the data.
Example
The raster is repeatedly split into four quadrants. This procedure stops when all
the cells in a quadrant have the same field value.
The procedure produces an upside-down, tree-like structure, known as a
quadtree. In main memory, the nodes of a quadtree (both circles and squares in
the figure) are represented as records.
The square nodes at the same level represent equal area sizes, allowing quick
computation of the area associated with some field value. The top node of the tree
represents the complete raster.
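The quadtree construction can be sketched as below. This is an illustrative sketch, not the lesson's implementation: it assumes a square raster whose side is a power of two, uses a plain value for a leaf and a dictionary for an internal node, and the quadrant names and field values are hypothetical.

```python
def build_quadtree(raster, row=0, col=0, size=None):
    """Return a leaf value for a uniform block, else a dict of four subtrees."""
    if size is None:
        size = len(raster)
    first = raster[row][col]
    uniform = all(
        raster[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if uniform:
        return first                    # leaf: one bigger cell
    half = size // 2
    return {
        "NW": build_quadtree(raster, row, col, half),
        "NE": build_quadtree(raster, row, col + half, half),
        "SW": build_quadtree(raster, row + half, col, half),
        "SE": build_quadtree(raster, row + half, col + half, half),
    }


def area_by_value(node, size, totals=None):
    """Sum the number of raster cells covered by each field value."""
    if totals is None:
        totals = {}
    if isinstance(node, dict):
        for child in node.values():
            area_by_value(child, size // 2, totals)
    else:
        totals[node] = totals.get(node, 0) + size * size
    return totals


# Hypothetical 4 x 4 raster with three field values.
raster = [
    ["W", "W", "G", "G"],
    ["W", "W", "G", "B"],
    ["G", "G", "B", "B"],
    ["G", "G", "B", "B"],
]
tree = build_quadtree(raster)
```

Three of the four top-level quadrants are uniform and become single leaves; only the north-east quadrant is split further. The area per field value follows from the level of each leaf, without visiting individual cells.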
Vector Representation
Tessellations do not explicitly store georeferences of the phenomena they represent.
Instead, they provide a georeference of the lower left corner of the raster, for
instance, plus an indicator of the raster’s resolution, thereby implicitly providing
georeferences for all cells in the raster.
A georeference is a coordinate pair from some geographic space, and is also known
as a vector.
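How a cell's georeference follows implicitly from the raster's origin and resolution can be sketched like this. The function name and the convention that row 0 is the top row of the array are assumptions made for the illustration.

```python
def cell_georeference(origin_x, origin_y, cell_size, n_rows, row, col):
    """Return the (x, y) georeference of a cell's lower left corner.

    Assumes (origin_x, origin_y) is the lower left corner of the whole
    raster and that row 0 is the top row of the array.
    """
    x = origin_x + col * cell_size
    y = origin_y + (n_rows - 1 - row) * cell_size
    return (x, y)
```

For a raster of 3 rows with origin (1000, 2000) and a resolution of 10, the bottom left cell (row 2, column 0) is georeferenced at the origin itself, and no coordinates need to be stored per cell.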
Vector Representation
A commonly used data structure in GIS software is the triangulated irregular
network, or TIN, which can be considered a hybrid between tessellations
and vector representations. A TIN is one of the standard implementation
techniques for digital terrain models (DTM), but it can be used to represent
any continuous field.
The principles behind a TIN are simple. It is built from a set of locations for
which we have a measurement, for instance an elevation.
The locations are usually not on a
nice regular grid. Any location
together with its elevation value can
be viewed as a point in a 3D space
How can we construct an irregular tessellation from a
set of locations?
In 3D space, three points uniquely determine a plane, as long as they are not
collinear, i.e. they must not be positioned on the same line. A plane fitted through
these points can be used to approximate the elevation of other locations, but the
approximation can only be trusted for locations near the three points.
It is therefore wise to restrict the use of a plane to the triangular area ‘between’
the three points.
If we restrict the use of a plane to the area between its three anchor points, we
obtain a triangular tessellation of the complete study space. Unfortunately, there
are many different tessellations for a given input set of anchor points.
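Interpolating with a plane that is restricted to its triangle can be sketched with barycentric weights. This is one standard way to evaluate the plane through three anchor points; the function name and the choice to return None outside the triangle are assumptions for the illustration.

```python
def interpolate_in_triangle(p, a, b, c):
    """Estimate z at 2D location p from the plane through 3D anchors a, b, c.

    Returns None when the anchors are collinear or p lies outside the triangle.
    """
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = a, b, c
    px, py = p
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        return None                 # collinear anchors: no unique plane
    # Barycentric weights of p with respect to the three anchors.
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    wc = 1.0 - wa - wb
    if min(wa, wb, wc) < 0:
        return None                 # p is outside the triangle
    return wa * az + wb * bz + wc * cz
```

With anchors (0, 0, 10), (10, 0, 20) and (0, 10, 30), the fitted plane is z = 10 + x + 2y, so a query inside the triangle returns the value of that plane and a query outside it is refused.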
Example
If we base our elevation computation for location P on the left-hand shaded triangle below, we
will get a different value than from the right-hand shaded triangle.
The tessellation on the right will provide a better approximation because the average distance
from P to the three triangle anchors is smaller.
Delaunay Triangulation
The triangulation in this figure happens to be a
Delaunay triangulation, which in a sense is an
optimal triangulation. There are multiple ways of defining such a
triangulation (see, for instance, Preparata and Shamos, 1985); here it suffices
to state two important properties.
However, in a TIN the cells do not have an associated stored value, as is typical of
tessellations, but rather a simple interpolation function that uses the elevation values
of the cell's three anchor points.
Point Representation
Points are defined as single coordinate pairs (x, y) when we work in 2D, or
coordinate triplets (x, y, z) when we work in 3D.
Points are used to represent objects that are best described as shape- and sizeless,
zero-dimensional (0D) features.
Besides the georeference, usually extra data is stored for each point object. This
so-called attribute, or thematic data, can capture anything that is considered
relevant about the object. For phone booth objects, this may include the owning
telephone company or the phone number.
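A point object with its georeference and attribute data might be sketched as follows. The class name, field names and the phone booth values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PointFeature:
    """A point object: a georeference plus free-form attribute data."""
    x: float
    y: float
    z: Optional[float] = None           # only set when working in 3D
    attributes: dict = field(default_factory=dict)

    def georeference(self):
        """Return (x, y) in 2D, or (x, y, z) in 3D."""
        if self.z is None:
            return (self.x, self.y)
        return (self.x, self.y, self.z)


# Hypothetical phone booth with thematic data.
booth = PointFeature(
    x=1250.0,
    y=830.0,
    attributes={"company": "ExampleTel", "phone_number": "555-0100"},
)
```

The georeference and the thematic data are kept apart, so an application can add further attributes without changing how the location is stored.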
Line Representation
Line data are used to represent one-dimensional objects such as
roads, railroads, canals, rivers and power lines. As for points, there is an issue of
relevance for the application and the scale that the application requires. For a tourist
city map, bus, subway and streetcar routes are likely to be relevant line features.
Some cadastral systems, on the other hand, may consider roads to be two-dimensional
features, i.e. having a width as well.
Two end nodes and zero or more internal nodes, or vertices, define a line. Other
terms for 'line' that are commonly used in some GISs are polyline, arc and edge.
Line Representation
A node or vertex is like a point (as defined before) but it only serves to define the
line, and provide shape in order to obtain a better approximation of the actual
feature.
Many GISs store a line as a simple sequence of coordinates of its end nodes and
vertices, assuming that all its segments are straight. This is usually good enough.
When a single straight line segment is considered an unsatisfactory
representation, using multiple (smaller) line segments instead of only one can
help represent more complex features.
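A line stored as a simple coordinate sequence, with straight segments assumed between consecutive coordinates, can be sketched as below. The coordinates are hypothetical.

```python
import math

# First and last pairs are the end nodes; the pairs in between are vertices.
road = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0), (6.0, 8.0)]


def polyline_length(coords):
    """Sum the lengths of the straight segments between consecutive coordinates."""
    return sum(math.dist(p, q) for p, q in zip(coords, coords[1:]))
```

Dropping the two vertices and keeping only the end nodes gives a shorter measured length, which shows how the number of segments affects the approximation of the actual feature.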
Line Representation
Collections of (connected) lines may represent phenomena that are best viewed
as networks.
With networks, specific types of interesting questions arise that have to do with
connectivity and network capacity. These relate to applications such as traffic
monitoring and watershed management.
Extra values are commonly associated with network elements (i.e. the lines that make
up the network), such as distance, quality of the link, or carrying capacity.
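A small network with values attached to its elements, and a connectivity question asked of it, might look like the sketch below. The node names, link values and the assumption that links can be travelled in both directions are all hypothetical.

```python
# Each network element (line) carries extra values: here a length in km
# and a carrying capacity in vehicles per hour.
links = [
    ("A", "B", {"length_km": 2.0, "capacity": 800}),
    ("B", "C", {"length_km": 1.5, "capacity": 600}),
    ("C", "D", {"length_km": 3.0, "capacity": 400}),
    ("E", "F", {"length_km": 1.0, "capacity": 500}),
]


def build_adjacency(links):
    """Map each node to the set of nodes it is directly linked to."""
    adjacency = {}
    for start, end, _values in links:
        adjacency.setdefault(start, set()).add(end)
        adjacency.setdefault(end, set()).add(start)
    return adjacency


def is_connected(adjacency, source, target):
    """Connectivity query: can target be reached from source?"""
    to_visit, seen = [source], {source}
    while to_visit:
        node = to_visit.pop()
        if node == target:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                to_visit.append(neighbour)
    return False
```

In this example A and D are connected through B and C, while F belongs to a separate part of the network.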
Area Representation
When area objects are stored using a vector approach, the usual technique is to
apply a boundary model. This means that each area feature is represented by
some arc/node structure that determines a polygon as the area’s boundary. The
example below illustrates a simple study with three area objects, represented by
polygon boundaries. Clearly, we expect additional data to accompany the area
data. Such information could be stored in database tables.
A simple but naïve representation of area features would be to store, for each
polygon, the list of lines that describes its boundary. Each line in the list
would, as before, be a sequence that starts with a node and ends with one,
possibly with vertices in between.
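The naïve polygon-by-polygon representation can be sketched as follows, with hypothetical coordinates. Each polygon holds its own copy of every boundary line, including a line it shares with another polygon.

```python
# The line shared by the two adjacent polygons below.
shared_line = [(4.0, 0.0), (4.0, 2.0), (4.0, 4.0)]

polygon_left = [
    [(0.0, 0.0), (4.0, 0.0)],
    shared_line,
    [(4.0, 4.0), (0.0, 4.0), (0.0, 0.0)],
]
polygon_right = [
    [(4.0, 0.0), (8.0, 0.0), (8.0, 4.0), (4.0, 4.0)],
    list(reversed(shared_line)),        # the same line, stored a second time
]
polygon_far = [
    [(20.0, 0.0), (24.0, 0.0), (24.0, 4.0), (20.0, 0.0)],
]


def share_boundary(poly_a, poly_b):
    """Compare every line of one polygon with every line of the other."""
    for line_a in poly_a:
        for line_b in poly_b:
            if line_a == line_b or line_a == line_b[::-1]:
                return True
    return False
```

Finding out whether two polygons are neighbours requires comparing coordinate lists line by line, in both directions.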
Polygon Representation
Observe that a polygon representation for an area object is yet another example
of a finite approximation of a phenomenon that inherently may have a curvilinear
boundary. In the case that the object can be perceived as having a fuzzy
boundary, a polygon is an even worse approximation, though potentially the only
one possible.
Boundary Model
The line with vertices (1, 2) that makes up
the boundary between the two polygons is
the same for both. Using a polygon
representation, this line would therefore be
stored twice, once for each polygon. This is
a form of data duplication, known as data
redundancy, which is (at least in theory)
unnecessary, although it remains a feature
of some systems.
Boundary Model
There is another disadvantage to such
polygon-by-polygon representations. If we
want to find out which polygons border the
bottom left polygon, we have to do a rather
complicated and time-consuming analysis
comparing the vertex lists of all boundary
lines with those of the bottom left polygon.
Boundary Model
When our data set has 5,000 polygons, with perhaps a
total of 25,000 boundary lines, even the fastest computers will take
their time in finding neighbouring polygons.
The table illustrates a simple boundary model for the polygons A, B and
C. For each arc, we store the start and end node (as well as a vertex list,
but these have been omitted from the table), and its left and right polygons. The
‘polygon’ W denotes the outside world polygon. Obviously, real
coordinates for nodes (and vertices) will also be stored in another table.
Polygon Representation: The Boundary Model
The boundary model is sometimes also called the topological data model, as it captures
some topological information, such as polygon neighbourhood.
Observe that it is a simple query to find all the polygons that neighbour
some given polygon, unlike in the polygon-by-polygon representation.
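A boundary model and its neighbour query might be sketched as below. The arc table here is hypothetical (it is not the table from the slides): it describes three polygons A, B and C lying side by side, with W as the outside world polygon.

```python
# Each arc: (arc id, start node, end node, left polygon, right polygon).
# Vertex lists and node coordinates would be stored elsewhere.
arcs = [
    ("a1", 1, 2, "A", "B"),
    ("a2", 2, 1, "A", "W"),
    ("a3", 3, 4, "B", "C"),
    ("a4", 1, 3, "B", "W"),
    ("a5", 4, 2, "B", "W"),
    ("a6", 3, 4, "C", "W"),
]


def neighbours(polygon, arcs, outside="W"):
    """Return the polygons that share at least one arc with the given polygon."""
    result = set()
    for _arc_id, _start, _end, left, right in arcs:
        if left == polygon:
            result.add(right)
        elif right == polygon:
            result.add(left)
    result.discard(outside)             # the outside world is not a neighbour
    return result
```

Each shared boundary is stored once, and the neighbour query is a single pass over the arc table with no comparison of coordinates.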
Representing Geographic Fields and Objects
How to represent geographic phenomena?
Geographic objects are usually implemented by vectors. Objects are identified by the
parameters of location, shape, size and orientation and many of these parameters can
be expressed in terms of vectors.
However, tessellations are commonly used for representing geographic objects as well.
Representing Geographic Fields
Different shades of one colour indicate different elevation values,
with darker areas indicating higher elevations.
The choice of a colour spectrum is only to make the illustration aesthetically pleasing;
real elevation values are stored in the raster, so instead we could have printed a real
number value in each cell. This would not have made the figure very legible, however.
A raster can be thought of as a long list of field values; there should be exactly
m × n such values. The list is preceded by some extra information, like a single
georeference as the origin of the whole raster, a cell size indicator, the integer values
for m and n, and a data type indicator for interpreting cell values.
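The list-plus-header view of a raster can be sketched as a small record. The field names, the row-after-row ordering of the list and the sample values are assumptions for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Raster:
    """Header information followed by a flat list of field values."""
    origin: tuple       # single georeference for the whole raster
    cell_size: float
    n_rows: int
    n_cols: int
    data_type: str      # how to interpret the cell values
    values: list        # flat list, assumed to be stored row after row

    def __post_init__(self):
        if len(self.values) != self.n_rows * self.n_cols:
            raise ValueError("expected n_rows * n_cols field values")

    def value_at(self, row, col):
        """Find a cell's value in the flat list from its row and column."""
        return self.values[row * self.n_cols + col]


# Hypothetical 2 x 3 elevation raster.
dem = Raster(
    origin=(1000.0, 2000.0),
    cell_size=10.0,
    n_rows=2,
    n_cols=3,
    data_type="float",
    values=[5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
)
```

The header is stored once, and the position of a value in the list is enough to recover which cell it belongs to.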
Representing Geographic Fields
Rasters and quadtrees do not store the georeference of each cell, but
infer it from the information about the raster.
An isoline is a linear feature that connects the points with equal field value. When the
field is elevation, we also speak of contour lines (as shown in the figure on the left).
Both TINs and isoline representations use vectors.
Tessellations to Represent Geographic
Objects
Geographic objects can be captured using both rasters and vector
representations. Consider, for example, an unprocessed digital image
and a classified raster of an agricultural area.
Unprocessed digital images contain many pixels, with each pixel carrying a
reflectance value.
Tessellations to Represent Geographic Objects
In this example, the classes recognized are urban land use classes.
Area Objects
Area objects are conveniently represented in raster,
although area boundaries may appear as jagged
edges.
Vector Representations for Geographic
Objects
Example
In this figure, a number of geographic objects in the vicinity of a building
have been depicted.
Nodes and vertices of the polylines that make up the objects’ boundaries are
not illustrated, though they are of course stored.