Spatial Data Types

The document discusses different ways of representing geographic information in a computer. It describes tessellations and vector representations as the two main approaches. For tessellations, it covers regular and irregular grids. Irregular grids like quadtrees are more adaptive but also more complex. Vector representations store explicit coordinates for geographic features. Triangulated irregular networks (TINs) are a hybrid approach used for representing terrain through triangles formed between elevation points.

Learning Objectives
At the end of this lesson, you will be able to:

• Describe how regular and irregular tessellations may represent geographic phenomena;

• Recognize several vector representations;

• Understand how geographic fields and geographic objects may be represented through
tessellations or vector representations.

Computer Representations
of Geographic Information
Some geographic phenomena have the characteristics
of continuous functions over space.

Elevation is an example of a continuous function. It
can be measured at many locations, even within one’s
own backyard, and each location may give a different
value.
In order to represent such a geographic phenomenon faithfully
in computer memory, we could either:

▫ try to store as many observation pairs (location: elevation) as
possible,

▫ or try to find a symbolic representation of the elevation field
function, as a formula in x and y, which can be evaluated to give us
the elevation at any given (x, y) location.

The first approach suffers from the fact that
we will never be able to store all elevation
values for all locations; after all, there are
infinitely many locations.

The second approach suffers from the fact
that we do not know just what this function
should look like, and that it would be
extremely difficult to derive such a function
for larger areas.

HOW DO WE REPRESENT CONTINUOUS
FUNCTIONS IN GIS?
We store ELEVATION values for a finite set of locations and use INTERPOLATION to
estimate the values at locations that are not stored.

Interpolation is made possible by a principle called spatial autocorrelation.

▫ Tobler’s First Law of Geography

Spatial autocorrelation refers to the fact that locations that are closer together are more
likely to have similar values than locations that are far apart.
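
As an illustration of how stored values and interpolation can work together, the sketch below uses inverse distance weighting, one common interpolation method that the slides do not name. It gives nearby stored locations more influence than distant ones, in the spirit of spatial autocorrelation. The function name, the `power` parameter and the sample values are assumptions made for this example.

```python
import math

def idw_estimate(samples, x, y, power=2.0):
    """Estimate a field value at (x, y) from (sx, sy, value) samples.

    Inverse distance weighting: closer samples get larger weights.
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # (x, y) is a stored location: no estimate needed
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical elevation observations: (x, y, elevation)
samples = [(0.0, 0.0, 100.0), (10.0, 0.0, 120.0), (0.0, 10.0, 90.0)]
estimate = idw_estimate(samples, 1.0, 1.0)
```

Because (1, 1) is much closer to the first sample than to the other two, the estimate stays close to that sample's elevation.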

Example

▫ An obvious example of a phenomenon which exhibits
the property of spatial autocorrelation is sea surface
temperature, where one might expect a high degree of
correlation between measures taken close together.

Line objects, such as coastlines, are another kind of continuous phenomenon that must be
represented finitely. In real life, these objects are usually not straight, and are
often erratically curved. A famous paradoxical question is whether one can actually
measure the length of Great Britain’s coastline.

In a computer, such random, curvilinear features can never be fully represented,
and usually require some degree of generalization.

From this it becomes clear that phenomena with intrinsic continuous and/or infinite
characteristics have to be represented with finite means (computer memory) for
computer manipulation, and any finite representation scheme is open to errors of
interpretation.
The two representation schemes
commonly used for geographic
phenomena are:

▫ Tessellation Approach
▫ Vector Approach

TESSELLATIONS
A tessellation (or tiling) is a partitioning of space
into mutually exclusive cells that together make up
the complete study space.

REGULAR TESSELLATIONS
In regular tessellations the cells are the same shape and size. The simplest
example is a rectangular raster of unit squares, represented in a computer
in the 2D case as an array of n x m elements.

REGULAR TESSELLATIONS

The field value of a cell can be interpreted as holding for the complete
tessellation cell, in which case the field is discrete: it is not continuous, let alone
differentiable.

A convention is needed to determine the value at the edge of the
cells. Typically, the lower and left boundaries belong to the cell (but
remember that this is just a convention).
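
The edge convention can be made concrete with a small lookup function. The sketch below assumes a raster described by a lower-left origin, a square cell size and a number of columns and rows, with rows counted upwards from the origin. These parameter names and the row ordering are assumptions for illustration. Taking the floor of the scaled coordinate makes each cell own its lower and left boundaries.

```python
import math

def cell_index(x, y, origin_x, origin_y, cell_size, n_cols, n_rows):
    """Return (col, row) of the cell containing (x, y), or None if outside.

    Each cell covers a half-open interval in x and in y, so a location on
    a shared edge belongs to the cell on its right or above it.
    """
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((y - origin_y) / cell_size)
    if 0 <= col < n_cols and 0 <= row < n_rows:
        return col, row
    return None

# x = 10.0 lies exactly on the edge between columns 0 and 1.
on_edge = cell_index(10.0, 5.0, 0.0, 0.0, 10.0, 3, 3)
```

Under this convention the upper and right edges of the whole raster belong to no cell, which is one reason it is only a convention.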
REGULAR TESSELLATIONS

To improve on the continuity of the representation, we can do two things:

- make the cell size smaller, so as to make the ‘continuity gaps’
between the cells smaller, and/or
- assume that a cell value only represents elevation for one specific
location in the cell, and provide a good interpolation function for all
other locations.
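
One possible interpolation function for the second option is bilinear interpolation, sketched below. The slides do not prescribe a method, so this is a swapped-in example. It assumes that `grid[row][col]` holds the value at position (col, row) measured in cell units, and that the grid has at least two rows and two columns.

```python
import math

def bilinear(grid, x, y):
    """Interpolate between the four stored values surrounding (x, y)."""
    n_rows, n_cols = len(grid), len(grid[0])
    if not (0 <= x <= n_cols - 1 and 0 <= y <= n_rows - 1):
        raise ValueError("location outside the sampled area")
    # Clamp so that locations on the last row or column still have
    # a neighbour on each side.
    col0 = min(math.floor(x), n_cols - 2)
    row0 = min(math.floor(y), n_rows - 2)
    tx, ty = x - col0, y - row0
    lower = (1 - tx) * grid[row0][col0] + tx * grid[row0][col0 + 1]
    upper = (1 - tx) * grid[row0 + 1][col0] + tx * grid[row0 + 1][col0 + 1]
    return (1 - ty) * lower + ty * upper

# Hypothetical elevations at four neighbouring sample locations.
grid = [[10.0, 20.0],
        [30.0, 40.0]]
midpoint = bilinear(grid, 0.5, 0.5)
```

The result varies continuously across cell edges, unlike the one-value-per-cell interpretation.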

ADVANTAGE
We know how they partition space, and we can make
our computations specific to this partitioning.

This leads to fast algorithms.

DISADVANTAGE
They are not adaptive to the spatial phenomenon we want to represent.

The cell boundaries are both artificial and fixed: they may or may not
coincide with the boundaries of the phenomena of interest.
IRREGULAR TESSELLATIONS
In irregular tessellations the cells may vary in size and shape. A well-known
example is the region quadtree where neighbouring cells that have the same
field value are represented as one bigger cell.

IRREGULAR TESSELLATIONS
Regular tessellations provide simple structures with straightforward
algorithms, which are, however, not adaptive to the phenomena they
represent. Essentially this means they might not represent the phenomena
in the most efficient way.

For this reason, substantial research effort has also been put into irregular
tessellations. Irregular tessellations are partitions of space into mutually
disjoint cells, but now the cells may vary in size and shape, allowing them
to adapt to the spatial phenomena that they represent.
IRREGULAR TESSELLATIONS

Irregular tessellations are more complex than the regular ones, but
they are also more adaptive, which typically leads to a reduction in the
amount of memory used to store the data.

Example

A well-known data structure in this family is the
region quadtree. It is based on a regular tessellation
of square cells, but takes advantage of cases where
neighbouring cells have the same field value, so
that they can together be represented as one bigger
cell.

The illustration shows a small 8 by 8 raster with
three possible field values: white, green and blue.
IRREGULAR TESSELLATIONS
The quadtree that represents this raster may be constructed
by repeatedly splitting up the area into four quadrants, which are
called NW, NE, SE, SW for obvious reasons.

This procedure stops when all the cells in a quadrant have the same field value.
The procedure produces an upside-down, tree-like structure, known as a
quadtree. In main memory, the nodes of a quadtree (both circles and squares in
the figure) are represented as records.

The links between them are pointers, a programming
technique to address (i.e. to point to) other records.
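
The splitting procedure can be sketched recursively. The example below is a minimal illustration, not the structure used by any particular GIS. It assumes a square raster whose side is a power of two, with row 0 as the northern edge, and it uses Python dictionaries in place of records and pointers. The 4 by 4 raster is invented and is smaller than the 8 by 8 raster of the illustration.

```python
def build_quadtree(raster, row=0, col=0, size=None):
    """Return a leaf value, or a dict with NW, NE, SE and SW subtrees."""
    if size is None:
        size = len(raster)
    first = raster[row][col]
    uniform = all(raster[r][c] == first
                  for r in range(row, row + size)
                  for c in range(col, col + size))
    if uniform:
        return first  # stop: every cell in this quadrant has one value
    half = size // 2
    return {
        "NW": build_quadtree(raster, row, col, half),
        "NE": build_quadtree(raster, row, col + half, half),
        "SE": build_quadtree(raster, row + half, col + half, half),
        "SW": build_quadtree(raster, row + half, col, half),
    }

raster = [
    ["white", "white", "green", "green"],
    ["white", "white", "green", "blue"],
    ["blue", "blue", "white", "white"],
    ["blue", "blue", "white", "white"],
]
tree = build_quadtree(raster)
```

Only the NE quadrant contains more than one value, so it is the only one that is split further.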
IRREGULAR TESSELLATIONS
Quadtrees are adaptive because they apply the spatial autocorrelation
principle, i.e. that locations that are near in space are
likely to have similar field values. When a conglomerate of cells has the
same value, they are represented together in the quadtree, provided their boundaries
coincide with the predefined quadrant boundaries.

This is why we can also state that a quadtree provides a
nested tessellation: quadrants are only split if they have two or more values.

The square nodes at the same level represent equal area sizes, allowing quick
computation of the area associated with some field value. The top node of the tree
represents the complete raster.
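
Because every leaf at a given level covers the same area, the area per field value can be summed without visiting individual cells. The sketch below assumes a quadtree stored as nested dictionaries with keys NW, NE, SE and SW and plain values as leaves. This layout and the small tree are assumptions for illustration.

```python
from collections import Counter

def area_by_value(node, side, totals=None):
    """Sum the area, in raster cells, covered by each field value.

    `side` is the side length in cells of the square that `node` covers.
    """
    if totals is None:
        totals = Counter()
    if isinstance(node, dict):
        for quadrant in ("NW", "NE", "SE", "SW"):
            area_by_value(node[quadrant], side // 2, totals)
    else:
        totals[node] += side * side  # one leaf accounts for a whole block
    return totals

# Hypothetical quadtree for a 4 by 4 raster.
tree = {
    "NW": "white",
    "NE": {"NW": "green", "NE": "green", "SE": "blue", "SW": "green"},
    "SE": "white",
    "SW": "blue",
}
areas = area_by_value(tree, side=4)
```

Here seven leaves are visited instead of sixteen cells, and the saving grows with larger uniform regions.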
Vector Representation
Tessellations do not explicitly store georeferences of the phenomena they represent.
Instead, they provide a georeference of the lower left corner of the raster, for
instance, plus an indicator of the raster’s resolution, thereby implicitly providing
georeferences for all cells in the raster.

In vector representations, an attempt is made to explicitly associate
georeferences with the geographic phenomena. Vector representations are also
called topological: topology refers to the spatial relationships between
geographical elements in a data set that do not change under a continuous
transformation.

A georeference is a coordinate pair from some geographic space, and is also known
as a vector.
Vector Representation
A commonly used data structure in GIS software is the triangulated irregular
network, or TIN, which can be considered a hybrid between tessellations
and vector representations. A TIN is one of the standard implementation
techniques for digital terrain models (DTM), but it can be used to represent
any continuous field.

The principles behind a TIN are simple. It is built from a set of locations for
which we have a measurement, for instance an elevation.

The locations are usually not on a
nice regular grid. Any location
together with its elevation value can
be viewed as a point in a 3D space.

From these 3D points, we can
construct an irregular tessellation
made of triangles.

How can we construct an irregular tessellation from a
set of locations?

In 3D space, three points uniquely determine a plane, as long as they are not
collinear, i.e. they must not be positioned on the same line. A plane fitted through
these points can be used to compute an approximation of elevation of other
locations.

However, many different triples of points can be chosen, each giving its own plane and
therefore its own elevation estimate for the same location. So, it is wise to restrict
the use of a plane to the triangular area ‘between’ the three points.

If we restrict the use of a plane to the area between its three anchor points, we
obtain a triangular tessellation of the complete study space. Unfortunately, there
are many different tessellations for a given input set of anchor points.
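
The plane-based estimate within a single triangle can be sketched with barycentric weights, which is one standard way to evaluate the plane through three points. The function name and the sample anchor points are assumptions for illustration. The function returns `None` outside the triangle, reflecting the restriction described above.

```python
def interpolate_in_triangle(a, b, c, x, y):
    """Elevation at (x, y) on the plane through 3D points a, b and c.

    Returns None when (x, y) lies outside the triangle a-b-c.
    """
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("anchor points are collinear")
    # Barycentric weights of the three anchors; they always sum to 1.
    wa = ((by - cy) * (x - cx) + (cx - bx) * (y - cy)) / det
    wb = ((cy - ay) * (x - cx) + (ax - cx) * (y - cy)) / det
    wc = 1.0 - wa - wb
    if min(wa, wb, wc) < 0:
        return None  # outside the triangle: this plane should not be used
    return wa * az + wb * bz + wc * cz

# Hypothetical anchor points (x, y, elevation).
a, b, c = (0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)
inside = interpolate_in_triangle(a, b, c, 2.0, 2.0)
outside = interpolate_in_triangle(a, b, c, 10.0, 10.0)
```

At an anchor point the estimate equals that anchor's stored elevation, since its weight is 1 and the others are 0.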
Example
If we base our elevation computation for location P on the left-hand shaded triangle below, we
will get a different value than from the right-hand shaded triangle.

The tessellation on the right will provide a better approximation because the average distance
from P to the three triangle anchors is smaller.
Delaunay Triangulation
The triangulation of this figure happens to be a
Delaunay triangulation, which in a sense is an
optimal triangulation. There are multiple ways of defining what such a
triangulation is (see, for instance, Preparata and Shamos, 1985), but it suffices
here to state two important properties.

The first is that the triangles are as equilateral
(‘equal-sided’) as they can be given the set of anchor
points. The second property is that for each triangle, the circumcircle through its
three anchor points does not contain any other anchor point. One such
circumcircle is represented by the dotted circle around the triangle.
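
The second property can be checked with a determinant-based test, sketched below for 2D anchor points. This is a standard geometric predicate rather than something taken from the slides, and the function name and sample points are assumptions. A triangle satisfies the property if the test is false for every other anchor point.

```python
def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circle through a, b and c."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of det flips with the orientation of a, b, c, so multiply
    # by the orientation to accept either vertex order.
    orientation = ((b[0] - a[0]) * (c[1] - a[1])
                   - (c[0] - a[0]) * (b[1] - a[1]))
    return det * orientation > 0

# Hypothetical triangle; its circumcircle is centred at (2, 2).
a, b, c = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
centre_inside = in_circumcircle(a, b, c, (2.0, 2.0))
far_point_inside = in_circumcircle(a, b, c, (5.0, 5.0))
```

With floating-point coordinates, points very close to the circle may be misclassified, so production code typically uses a more robust predicate.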
Delaunay Triangulation
A TIN is clearly a vector representation: each anchor point
has a stored georeference. Yet, we might also call it an
irregular tessellation, as the chosen triangulation provides
a partitioning of the entire study space.

However, in this case, the cells do not have an associated stored value as is typical of
tessellations, but rather a simple interpolation function that uses the elevation values
of its three anchor points.

Point Representation
Points are defined as single coordinate pairs (x, y) when we work in 2D, or
coordinate triplets (x, y, z) when we work in 3D.

Points are used to represent objects that are best described as shape- and sizeless,
zero-dimensional (0D) features.

Besides the georeference, usually extra data are stored for each point object. These
so-called attribute or thematic data can capture anything that is considered
relevant about the object. For phone booth objects, this may include the owning
telephone company or the phone number.
Line Representation
Line data are used to represent one-dimensional objects such as
roads, railroads, canals, rivers and power lines. As for points, there is an issue of
relevance for the application and the scale that the application requires. For a tourist
city map, bus, subway and streetcar routes are likely to be relevant line features.

Some cadastral systems, on the other hand, may consider roads to be two-dimensional
features, i.e. having a width as well.

The two end nodes and zero or more internal nodes or vertices define a line. Other
terms for ‘line’ that are commonly used in some GISs are polyline, arc or edge.

Line Representation
A node or vertex is like a point (as defined before) but it only serves to define the
line, and provide shape in order to obtain a better approximation of the actual
feature.

Many GISs store a line as a simple sequence of coordinates of its end nodes and
vertices, assuming that all its segments are straight. This is usually good enough.
When a single straight line segment is considered an unsatisfactory
representation, using multiple (smaller) line segments instead of only one can
help represent more complex features.
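
Storing a line as a sequence of coordinates with straight segments makes simple measurements easy. The sketch below computes the length of such a line. The function name and the sample coordinates are assumptions for illustration.

```python
import math

def polyline_length(coordinates):
    """Sum the lengths of the straight segments between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(coordinates, coordinates[1:]))

# Hypothetical line: start node, one internal vertex, end node.
road = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
length = polyline_length(road)
```

Adding more vertices along a curved feature usually brings the computed length closer to the real one, at the cost of storing more coordinates.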
Line Representation
Collections of (connected) lines may represent phenomena that are best viewed
as networks.

With networks, specific types of interesting questions arise that have to do with
connectivity and network capacity. These relate to applications such as traffic
monitoring and watershed management.

Network elements (i.e. the lines that make up the network) commonly carry extra
values such as distance, quality of the link, or carrying capacity.
Area Representation
When area objects are stored using a vector approach, the usual technique is to
apply a boundary model. This means that each area feature is represented by
some arc/node structure that determines a polygon as the area’s boundary. The
example below illustrates a simple study with three area objects, represented by
polygon boundaries. Clearly, we expect additional data to accompany the area
data. Such information could be stored in database tables.

A simple but naïve representation of area features would be to list, for each
polygon, the lines that describe its boundary. Each line in the list
would, as before, be a sequence that starts with a node and ends with one,
possibly with vertices in between.
Polygon Representation
Observe that a polygon representation for an area object is yet another example
of a finite approximation of a phenomenon that inherently may have a curvilinear
boundary. In the case that the object can be perceived as having a fuzzy
boundary, a polygon is an even worse approximation, though potentially the only
one possible.

Boundary Model
The line with vertices (1, 2), which makes up
the boundary between the two polygons, is
shared by both. With a polygon-by-polygon
representation, this line would be
stored twice, namely once for each
polygon. This is a form of data duplication,
known as data redundancy, which is (at
least in theory) unnecessary, although it
remains a feature of some systems.
Boundary Model
There is another disadvantage to such
polygon-by-polygon representations. If we
want to find out which polygons border the
bottom left polygon, we have to do a rather
complicated and time-consuming analysis
comparing the vertex lists of all boundary
lines with that of the bottom left polygon.
Boundary Model
When our data set has 5,000 polygons, with perhaps a
total of 25,000 boundary lines, even the fastest computers will take
their time in finding neighbouring polygons.

The boundary model is an improved representation that deals with
these disadvantages. It stores parts of a polygon’s boundary as
non-looping arcs and indicates which polygon is on the left and which is
on the right of each arc. A simple example of the boundary model is
provided below.
Example

The table illustrates a simple boundary model for the polygons A, B and
C. For each arc, we store the start and end node (as well as a vertex list,
but these have been omitted from the table), and its left and right polygon. The
‘polygon’ W denotes the outside world polygon. Obviously, real
coordinates for nodes (and vertices) will also be stored in another table.

Polygon Representation - The Boundary Model
The boundary model is sometimes also called the topological data model, as it captures some topological
information, such as polygon neighbourhood.

Observe that it is a simple query to find all the polygons that are neighbours of
some given polygon, unlike the case of the polygon representation.
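
The neighbour query can be sketched over an arc table. The table below is invented for illustration and is not the table from the original slide: it describes three polygons A, B and C side by side, with W as the outside world. The field order and the function name are also assumptions.

```python
# (arc id, start node, end node, left polygon, right polygon)
arcs = [
    ("a1", 1, 2, "A", "W"),  # outer boundary of A
    ("a2", 1, 2, "B", "A"),  # shared boundary between A and B
    ("a3", 1, 3, "W", "B"),  # top boundary of B
    ("a4", 2, 4, "B", "W"),  # bottom boundary of B
    ("a5", 3, 4, "C", "B"),  # shared boundary between B and C
    ("a6", 3, 4, "W", "C"),  # outer boundary of C
]

def neighbours(arcs, polygon):
    """Return the polygons on the other side of each arc of `polygon`."""
    found = set()
    for _arc_id, _start, _end, left, right in arcs:
        if left == polygon:
            found.add(right)
        elif right == polygon:
            found.add(left)
    return found

neighbours_of_b = neighbours(arcs, "B")
```

The query reads each arc once and never compares vertex lists, which is what makes it simple compared with the polygon-by-polygon representation.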

Representing Geographic Fields and Objects
How to represent geographic phenomena?

Geographic fields can be represented through a tessellation, through a TIN or through a
vector representation. It is more common to use tessellations, notably rasters, for field
representation, but vector representations are in use too.

Geographic objects are usually implemented by vectors. Objects are identified by the
parameters of location, shape, size and orientation and many of these parameters can
be expressed in terms of vectors.

However, tessellations are commonly used for representing geographic objects as well.

Representing Geographic Fields
Different shades of one colour indicate different elevation values,
with darker areas indicating higher elevations.

The choice of a colour spectrum is only to make the illustration aesthetically pleasing;
real elevation values are stored in the raster, so instead we could have printed a real
number value in each cell. This would not have made the figure very legible, however.

A raster can be thought of as a long list of field values: actually, there should be m x n
such values. The list is preceded with some extra information, like a single
georeference as the origin of the whole raster, a cell size indicator, the integer values
for m and n, and a data type indicator for interpreting cell values.
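
The header-plus-list layout can be sketched as a small data class. The field names, the lower-left origin and the bottom-up row order are assumptions for illustration, and the data type indicator is left out because Python values carry their own type. The point of the sketch is that a cell's georeference is computed from the header rather than stored.

```python
from dataclasses import dataclass

@dataclass
class Raster:
    origin_x: float   # georeference of the raster origin (lower-left corner)
    origin_y: float
    cell_size: float
    n_cols: int
    n_rows: int
    values: list      # n_cols * n_rows field values, stored row by row

    def value_at(self, col, row):
        """Look up a field value in the flat list."""
        return self.values[row * self.n_cols + col]

    def cell_centre(self, col, row):
        """Georeference of a cell centre, inferred from the header."""
        x = self.origin_x + (col + 0.5) * self.cell_size
        y = self.origin_y + (row + 0.5) * self.cell_size
        return x, y

# Hypothetical 3 by 2 raster with 10-unit cells.
raster = Raster(1000.0, 2000.0, 10.0, 3, 2, [1, 2, 3, 4, 5, 6])
centre = raster.cell_centre(2, 1)
```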

Representing Geographic Fields
Rasters and quadtrees do not store the georeference of each cell, but
infer it from the information about the raster.

Geographic fields may be represented by a vector representation as well.

This technique uses isolines of the field.

An isoline is a linear feature that connects the points with equal field value. When the
field is elevation, we also speak of contour lines (as shown in the figure on the left).
Both TINs and isoline representations use vectors.
Representing Geographic Fields

Isolines as a representation mechanism are not very common, however. They
are in use as a geoinformation visualization technique (in mapping, for
instance), but a TIN is usually the better choice for representing this type
of field.

Tessellations to Represent Geographic
Objects
Geographic objects can be captured using both rasters and vector
representations. Consider, as an example, an unprocessed digital image
and a classified raster of the same urban area.

Unprocessed digital images contain many pixels, with each pixel carrying a
reflectance value.

Through various techniques digital images are processed into classified
images that can be stored in a GIS as a raster.

Tessellations to Represent Geographic
Objects

Image classification attempts to assign each pixel to one of a finite
list of classes, thereby obtaining an interpretation of the contents of the
image.

An unprocessed digital image and a classified raster of an urban area

Tessellations to Represent Geographic Objects

These figures illustrate the unprocessed image as well as a classified
version of the image.

In this example, the classes recognized are urban land use classes.

Area Objects
Area objects are conveniently represented in a raster,
although area boundaries may appear as jagged
edges.

This is a typical by-product of raster resolution versus
area size, and artificial cell boundaries. One must be
aware, for instance, of the consequences for area size
computations: what is the precision with which the raster
defines the object’s size?
Line and Point Objects
Line and point objects are more awkward to represent using rasters. After
all, we could say that rasters are area-based, and geographic objects that
are perceived as lines or points are perceived to have zero area size.
Standard classification techniques, moreover, may fail to recognize these
objects as points or lines.

Many GISs do offer support for line representations in raster, and
operations on them. Lines can be represented as strings of neighbouring
raster cells with equal value. In the accompanying image, you can view an actual
straight line (in black) and its representation (light green cells) in a raster.
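
One way to obtain such a string of neighbouring cells is Bresenham's line algorithm, sketched below. The slides do not say which method a GIS uses, so this is a swapped-in example. The function name and the (col, row) cell addressing are assumptions.

```python
def rasterize_line(col0, row0, col1, row1):
    """Return the cells approximating a straight line between two cells."""
    cells = []
    d_col, d_row = abs(col1 - col0), abs(row1 - row0)
    step_col = 1 if col0 < col1 else -1
    step_row = 1 if row0 < row1 else -1
    error = d_col - d_row  # tracks the drift from the ideal line
    col, row = col0, row0
    while True:
        cells.append((col, row))
        if (col, row) == (col1, row1):
            return cells
        doubled = 2 * error
        if doubled > -d_row:
            error -= d_row
            col += step_col
        if doubled < d_col:
            error += d_col
            row += step_row

cells = rasterize_line(0, 0, 4, 2)
```

Consecutive cells in the result touch along an edge or at a corner, which produces the staircase look of lines in a raster.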

Vector Representations for Geographic
Objects

Vector representations are the most
natural way to represent geographic
objects.

Example
In this figure, a number of geographic objects in the vicinity of a building
have been depicted.

These objects are represented as area representations in a boundary model.

Nodes and vertices of the polylines that make up the objects’ boundaries are
not illustrated, though they obviously are stored.
