
Principles of

Geographic
Information
Systems
MODULE-3: Spatial
Referencing and Positioning,
Data Entry and Preparation

Compiled by: Ujwala Sav


[email protected]

Vidyalankar School of
Information Technology
Wadala (E), Mumbai
www.vsit.edu.in
Certificate
This is to certify that the e-book titled “Principles of
Geographic Information Systems”
comprises all elementary learning tools for a better understanding of the
relevant concepts. This e-book is comprehensively compiled as per the
predefined eight parameters and guidelines.

Mrs. Ujwala Sav                                         Date: 08-03-2021


Assistant Professor
Department of IT

DISCLAIMER: The information contained in this e-book is compiled and


distributed for educational purposes only. This e-book has been designed to help learners
understand relevant concepts with a more dynamic interface. The compiler of this e-book
and Vidyalankar Institute of Technology give full and due credit to the authors of the
contents, developers and all websites from wherever information has been sourced. We
acknowledge our gratitude towards the websites YouTube, Wikipedia, and Google search
engine. No commercial benefits are being drawn from this project.
T.Y. B.Sc. IT Sem-6

Principles of Geographic Information Systems


Unit III

Contents:

Spatial Referencing and Positioning


Spatial Referencing: Reference surfaces for mapping, Coordinate Systems, Map
Projections, Coordinate Transformations,

Satellite-based Positioning: Absolute positioning, Errors in absolute positioning,


Relative positioning, Network positioning, code versus phase measurements,
Positioning technology.

Data Entry and Preparation


Spatial Data Input: Direct spatial data capture, Indirect spatial data capture,
Obtaining spatial data elsewhere

Data Quality: Accuracy and precision, Positional accuracy, Attribute accuracy,
Temporal accuracy, Lineage, Completeness, Logical consistency

Data Preparation: Data checks and repairs, Combining data from multiple sources
Point Data Transformation: Interpolating discrete data, Interpolating continuous
data
Recommended Books

1. Principles of Geographic Information Systems: An Introductory Textbook, Editors:
Otto Huisman and Rolf A. de By
2. Principles of Geographical Information Systems – Peter A. Burrough,
Rachael A. McDonnell, Christopher D. Lloyd

Prerequisites and Linking:

Unit III: Spatial referencing and positioning
Pre-requisites by semester: Sem. II - DBMS; Sem. III - CGA; Sem. IV - ASQL; Sem. V - Projects; Sem. VI - Projects
UNIT-3

Chapter 4

4. Spatial referencing and positioning


4.1 Spatial referencing
4.1.1 Reference surfaces for mapping
4.1.2 Coordinate systems
4.1.3 Map projections
4.1.4 Coordinate transformations
4.2 Satellite-based positioning
4.2.1 Absolute positioning
4.2.2 Errors in absolute positioning
4.2.3 Relative positioning
4.2.4 Network positioning
4.2.5 Code versus phase measurements
4.2.6 Positioning technology

Chapter 5

5 Data entry and preparation


5.1 Spatial data input
5.1.1 Direct spatial data capture
5.1.2 Indirect spatial data capture
5.1.3 Obtaining spatial data elsewhere

5.2 Data quality


5.2.1 Accuracy and precision
5.2.2 Positional accuracy
5.2.3 Attribute accuracy
5.2.4 Temporal accuracy
5.2.5 Lineage
5.2.6 Completeness
5.2.7 Logical consistency
5.3 Data preparation
5.3.1 Data checks and repairs
5.3.2 Combining data from multiple sources
5.4 Point data transformation
5.4.1 Interpolating discrete data
5.4.2 Interpolating continuous data
UNIT-3

Chapter 4

Spatial referencing and positioning

Objectives:
i) To learn the relevance and actual use of reference surfaces,
Coordinate systems and coordinate transformations.
ii) To know about satellite-based positioning.
iii) Introduction to global positioning techniques.

4. Spatial referencing and positioning

In the early days of GIS, users were mainly handling spatially referenced data from a
single country. This data was usually derived from paper maps published by the
country’s mapping organization. Nowadays, GIS users are combining spatial data
from a given country with global spatial data sets, reconciling spatial data from
published maps with coordinates established with satellite positioning techniques
and integrating their spatial data with that from neighboring countries. To perform
these kinds of tasks successfully, GIS users need to understand basic spatial
referencing concepts.

4.1 Spatial referencing

One of the defining features of GIS is their ability to combine spatially referenced
data. A frequently occurring issue is the need to combine spatial data from
different sources that use different spatial reference systems. This section
introduces concepts relating to the nature of spatial reference systems and the
translation of data from one spatial reference system into another.

4.1.1 Reference surfaces for mapping

The surface of the Earth is anything but uniform. The oceans can be treated as
reasonably uniform, but the surface or topography of the land masses exhibits
large vertical variations between mountains and valleys. These variations make it
impossible to approximate the shape of the Earth with any reasonably simple
mathematical model. Consequently, two main reference surfaces have been
established to approximate the shape of the Earth.

The Geoid and ellipsoid

One reference surface is called the Geoid, the other reference surface is the
ellipsoid.

Figure 1: The Earth's surface, and two reference surfaces used to approximate it: the Geoid and a reference ellipsoid. The Geoid separation (N) is the deviation between the Geoid and a reference ellipsoid.

These are illustrated in Figure 1. Below, we look at and discuss the respective uses of each of these surfaces.

Video on Sea-level
Source: https://www.youtube.com/watch?v=q65O3qA0-n4
The Geoid and the vertical datum
We can simplify matters by imagining that the entire Earth’s surface is covered by
water. If we ignore tidal and current effects on this ‘global ocean’, the resultant
water surface is affected only by gravity. This has an effect on the shape of this
surface, because the direction of gravity (more commonly known as the plumb
line) is dependent on the mass distribution inside the Earth. Due to
irregularities or mass anomalies in this distribution, the 'global ocean' results in an
undulated surface. This surface is called the Geoid (Figure 2). The plumb line
through any surface point is always perpendicular to it.

Figure 2: The Geoid, exaggerated to illustrate the complexity of its surface.

The Geoid is used to describe heights. In order to establish the Geoid as reference
for heights, the ocean’s water level is registered at coastal places over several
years using tide gauges (mareographs). Averaging the registrations largely
eliminates variations of the sea level with time. The resulting water level
represents an approximation to the Geoid and is called the mean sea level. For the
Netherlands and Germany, the local mean sea level is realized through the
Amsterdam tide-gauge (zero height). We can determine the height of a point in
Enschede with respect to the Amsterdam tide gauge using a technique known as
geodetic levelling (Figure 3). The result of this process will be the height above local
mean sea level for the Enschede point. The height determined with respect to a
tide-gauge station is known as the orthometric height (height H above the Geoid).

Obviously, there are several realizations of local mean sea levels (also called local
vertical datums) in the world. They are parallel to the Geoid but offset by up to a
couple of meters. This offset is due to local phenomena such as ocean currents,
tides, coastal winds, water temperature and salinity at the location of the tide
gauge. Care must be taken when using heights from another local vertical datum.
For example, this might be the case in the border area of adjacent nations. Even
within a country, heights may differ depending on the tide gauge (mean sea
level point) to which they are related. As an example, the mean sea level from the Atlantic
to the Pacific coast of the USA increases by 0.6 to 0.7 m. The tide gauge (zero
height) of the Netherlands differs -2.34 meters from the tide gauge (zero height)
of the neighboring country Belgium. The local vertical datum is implemented
through a levelling network (see Figure 3(a)). A levelling network consists of
benchmarks, whose height above mean sea level has been determined through
geodetic levelling. The implementation of the datum enables easy user access.
The surveyors do not need to start from scratch (i.e. from the Amsterdam tide-
gauge) every time they need to determine the height of a new point. They can
use the benchmark of the levelling network that is closest to the point of interest
(Figure 3(b)).

Figure 3: A levelling network implements a local vertical datum:

(a)network of levelling lines starting from the Amsterdam tide-gauge, showing


some of the benchmarks; (b) how the orthometric height (H) is determined for
some point, working from the nearest benchmark.

As a result of satellite gravity missions, it is currently possible to determine the


height (H) above the Geoid with centimetre-level accuracy. It is foreseeable that a
global vertical datum may become ubiquitous in the next 10-15 years. If all
published maps are also using this global vertical datum by that time, heights will
become globally comparable, effectively making local vertical datums redundant
for GIS users.
The ellipsoid

Above, we have defined a physical surface, the Geoid, as a reference surface for
heights. We also need a reference surface for the description of the horizontal
coordinates of points of interest. Since we will later project these horizontal
coordinates onto a mapping plane, the reference surface for horizontal
coordinates requires a mathematical definition and description. The most
convenient geometric reference is the oblate ellipsoid (Figure 4). It provides a
relatively simple figure which fits the Geoid to a first-order approximation,
though for small-scale mapping purposes a sphere may be used. An ellipsoid is
formed when an ellipse is rotated about its minor axis. The ellipse which defines
an ellipsoid or spheroid is called a meridian ellipse.

Figure 4: An oblate ellipse, defined by its semi-major axis a and semi-minor axis b.

The shape of an ellipsoid may be defined in a number of ways, but in geodetic


practice the definition is usually by its semi-major axis and flattening (Figure 4).
Flattening f is dependent on both semi-major axis a and the semi-minor axis b.
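This relation can be written as f = (a - b) / a. The short Python sketch below is not part of the original text; it only illustrates the computation, using the Bessel ellipsoid parameters quoted later for the Potsdam Datum.

def flattening(a: float, b: float) -> float:
    """Flattening of an ellipsoid with semi-major axis a and semi-minor axis b."""
    return (a - b) / a

# Bessel ellipsoid parameters as quoted in the Potsdam Datum example below
a_bessel = 6_377_397.156   # metres
b_bessel = 6_356_079.175   # metres

f = flattening(a_bessel, b_bessel)
print(f"f = {f:.7f}, 1/f = {1/f:.1f}")   # 1/f comes out near 299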

The local horizontal datum


Ellipsoids have varying positions and orientations. An ellipsoid is positioned and
oriented with respect to the local mean sea level by adopting a latitude (φ),
longitude (λ) and ellipsoidal height (h) of a so-called fundamental point, and an
azimuth to an additional point. We say that this defines a local horizontal datum.
Notice that the terms horizontal datum and geodetic datum are treated as
equivalent and interchangeable.
Several hundred local horizontal datums exist in the world. The reason is obvious:
Different local ellipsoids with varying position and orientation had to be adopted to
best fit the local mean sea level in different countries or regions. An example is the
Potsdam Datum, the local horizontal datum used in Germany. The fundamental point
is in Rauenberg and the underlying ellipsoid is the Bessel ellipsoid (a = 6,377,397.156
m, b = 6,356,079.175 m). We can determine the latitude and longitude (φ, λ) of any
other point in Germany with respect to this local horizontal datum using geodetic
positioning techniques, such as triangulation and trilateration. The result of this
process will be the geographic (or horizontal) coordinates (φ, λ) of the new point in the
Potsdam Datum.

Video on datum https://www.youtube.com/watch?v=xKGlMp__jog

Triangulation networks

A local horizontal datum is realized through a triangulation network. Such a


network consists of monument points forming a network of triangular mesh
elements (Figure 6). The angles in each triangle are measured in addition to

at least one side of a triangle; the fundamental point is also a point in the
triangulation network. The angle measurements and the adopted coordinates of
the fundamental point are then used to derive geographic coordinates (φ, λ) for all
monument points of the triangulation network.
Within this framework, users do not need to start from scratch (i.e. from the
fundamental point) in order to determine the geographic coordinates of a new
point. They can use the monument of the triangulation network that is closest to
the new point. The extension and re-measurement of the network is nowadays
done through satellite measurements.

Figure 6: The old primary triangulation network in the Netherlands made up of 77


points (mostly church towers). The extension and re-measurement of the network is
nowadays done through satellite measurements. Adapted from original figure by
‘Dutch Cadastre and Land Registers’ now called het Kadaster.
The global horizontal datum

Local horizontal datums have been established to fit the Geoid well over the area
of local interest, which in the past was never larger than a continent. With
increasing demands for global surveying, activities are underway to establish
global reference surfaces. The motivation is to make geodetic results mutually
comparable and to provide coherent results also to other disciplines like
astronomy and geophysics.

The most important global (geocentric) spatial reference system for the GIS
community is the International Terrestrial Reference System (ITRS). It is a three-
dimensional coordinate system with a well-defined origin (the centre of mass
of the Earth) and three orthogonal coordinate axes (X, Y, Z). The Z-axis points
towards a mean Earth north pole. The X-axis is oriented towards a mean
Greenwich meridian and is orthogonal to the Z-axis. The Y-axis completes the
right-handed reference coordinate system (Figure 7a).

Figure 7: (a) The International Terrestrial Reference System (ITRS), and
(b) the International Terrestrial Reference Frame (ITRF), visualized as a
distributed set of ground control stations (represented by red points).

The ITRS is realized through the International Terrestrial Reference Frame (ITRF),
a distributed set of ground control stations that measure their position
continuously using GPS (Figure 7b).
Constant re-measuring is needed because of the involvement of new control
stations and ongoing geophysical processes (mainly tectonic plate motion) that
deform the Earth's crust at measurable global, regional and local scales. These
deformations cause positional differences in time, and have resulted in more than
one realization of the ITRS. Examples are the ITRF96 or the ITRF2000. The
ITRF96 was established on the 1st of January, 1997.

This means that the measurements use data up to 1996 to fix the geocentric
coordinates (X, Y and Z in metres) and velocities (positional change in X, Y and Z in
metres per year) at the different stations. The velocities are used to propagate
the measurements to other epochs (times). The trend is to use the ITRF
everywhere in the world for reasons of global compatibility.

GPS uses the World Geodetic System 1984 (WGS84) as its reference system. It has
been refined on several occasions and is now aligned with the ITRF to within a
few centimetres worldwide. Global horizontal datums, such as the ITRF2000 or
WGS84, are also called geocentric datums because they are geocentrically
positioned, i.e. with respect to the centre of mass of the Earth. They became
available with the advent of satellite positioning.

Since the size and shape of satellite orbits is directly related to the centre of mass
of the Earth, observations of natural or artificial satellites can be used to pinpoint
the centre of mass of the Earth, and hence the origin of the ITRS. This technique
can also be used for the realization of the global ellipsoids and datums at the
accuracy level required for large-scale mapping.

We can easily transform ITRF coordinates (X, Y and Z in metres) into geographic
coordinates with respect to the GRS80 ellipsoid without loss of accuracy.
However, the ellipsoidal height h, obtained through this straightforward
transformation, has no physical meaning and does not correspond to the intuitive
human perception of height. We therefore use the height H, above the Geoid
(see Figure 8). It is foreseeable that global 3D spatial referencing, in terms of
(φ, λ, H), could become ubiquitous in the next 10-15 years. If all published maps
are also globally referenced by that time, the underlying spatial referencing
concepts will become transparent and hence redundant for GIS users.
Video on Coordinate System Jargon- geoid, datum, projection
Source:https://www.youtube.com/watch?v=Z41Dt7_R180

Figure 8: Height h above the geocentric ellipsoid, and height H above the Geoid. The first is
measured orthogonal to the ellipsoid, the second orthogonal to the Geoid.
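As a hedged illustration of the relation sketched in Figures 1 and 8, the orthometric height can be approximated from the ellipsoidal height and the Geoid separation as H ≈ h - N. The numbers in the small Python fragment below are purely illustrative; in practice N comes from a geoid model.

h_ellipsoidal = 62.4   # metres above the reference ellipsoid (e.g. from satellite positioning)
n_separation  = 44.1   # metres, Geoid separation N at this location (illustrative value)

h_orthometric = h_ellipsoidal - n_separation   # height H above the Geoid
print(f"H = {h_orthometric:.1f} m above the Geoid")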

Hundreds of existing local horizontal and vertical datums are still relevant because
they form the basis of map products all over the world. For the next few years we
will be required to deal with both local and global datums until the former are
eventually phased out. During the transition period, we will require tools to
transform coordinates from local horizontal datums to a global horizontal datum
and vice versa.
Extra-terrestrial positioning techniques include Satellite Laser Ranging (SLR), Lunar Laser
Ranging (LLR), Global Positioning System (GPS), and Very Long Baseline Interferometry
(VLBI), among others.

The organizations that usually develop transformation tools and make them
available to the user community are provincial or National Mapping
Organizations (NMOs) and cadastral authorities.

4.1.2 Coordinate systems


As mentioned before, the special nature of spatial data lies in its being
spatially referenced. Different kinds of coordinate systems are used to position data
in space. Here we distinguish between spatial and planar coordinate systems. Spatial
(or global) coordinate systems are used to locate data either on the Earth’s surface
in a 3D space, or on the Earth’s reference surface (ellipsoid or sphere) in a 2D space.
Below we discuss the geographic coordinate system in a 2D and 3D space and the
geocentric coordinate system, also known as the 3D Cartesian coordinate system.
Planar coordinate systems on the other hand are used to locate data on the flat
surface of the map in a 2D space. We will discuss the 2D Cartesian coordinate system
and the 2D polar coordinate system.

2D Geographic coordinates (φ, λ)


The most widely used global coordinate system consists of lines of geographic
latitude (phi, φ) and longitude (lambda, λ). Lines of equal latitude are called
parallels; they form circles on the surface of the ellipsoid. Lines of equal
longitude are called meridians and they form ellipses (meridian ellipses) on the
ellipsoid (Figure 9).
Figure 9: The latitude (φ) and longitude (λ) angles represent the 2D geographic coordinate system.

The latitude (φ) of a point P′ is the angle between the ellipsoidal normal through P′ and the equatorial plane. Latitude is zero on the equator (φ = 0°) and increases towards the two poles to maximum values of φ = +90° (90°N) at the North Pole and φ = -90° (90°S) at the South Pole.

The longitude (λ) is the angle between the meridian ellipse which passes through
Greenwich and the meridian ellipse containing the point in question. It is
measured in the equatorial plane from the meridian of Greenwich (λ = 0°), either
eastwards through λ = +180° (180°E) or westwards through λ = -180° (180°W).

Latitude and longitude represent the geographic coordinates (φ, λ) of a point P′
(Figure 10) with respect to the selected reference surface. They are always given
in angular units. For example, the coordinates of the City Hall in Enschede are:

φ = 52°13′26.2″N, λ = 6°53′32.1″E

The graticule on a map represents the projected position of the geographic
coordinates (φ, λ) at constant intervals, or in other words the projected position of
selected meridians and parallels (Figure 13). The shape of the graticule depends
largely on the characteristics of the map projection and the scale of the map.
3D Geographic coordinates
3D geographic coordinates are obtained by introducing the ellipsoidal height h
to the system. The ellipsoidal height (h) of a point is the vertical distance of the
point in question above the ellipsoid. It is measured in distance units along the
ellipsoidal normal from the point to the ellipsoid surface. 3D geographic
coordinates can be used to define a position on the surface of the Earth.
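A small Python sketch of the link between 3D geographic coordinates (φ, λ, h) and geocentric Cartesian coordinates (X, Y, Z) is given below. It is an illustration, not part of the original text; the WGS84 ellipsoid parameters and the sample point are assumptions made for the example.

import math

A  = 6378137.0              # WGS84 semi-major axis in metres
F  = 1 / 298.257223563      # WGS84 flattening
E2 = F * (2 - F)            # first eccentricity squared

def geographic_to_geocentric(lat_deg: float, lon_deg: float, h: float):
    """Return geocentric (X, Y, Z) in metres for latitude/longitude in degrees and height h in metres."""
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)
    n = A / math.sqrt(1 - E2 * math.sin(phi) ** 2)   # radius of curvature in the prime vertical
    x = (n + h) * math.cos(phi) * math.cos(lam)
    y = (n + h) * math.cos(phi) * math.sin(lam)
    z = (n * (1 - E2) + h) * math.sin(phi)
    return x, y, z

# Example: a point roughly at Enschede (values are illustrative only)
print(geographic_to_geocentric(52.2239, 6.8925, 80.0))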

Video 1: Coordinate System Jargon- geoid, datum, projection


Source:https://www.youtube.com/watch?v=Z41Dt7_R180
4.1.3 Map projections
Maps are one of the world’s oldest types of document. For quite some time it
was thought that our planet was flat, and during those days, a map simply was a
miniature representation of a part of the world. Now that we know that the
Earth’s surface is curved in a specific way, we know that a map is in fact a flattened
representation of some part of the planet. The field of map projections concerns
itself with the ways of translating the curved surface of the Earth into a flat map.

A map projection is a mathematically described technique of how to represent the Earth's curved surface on a flat map.

To represent parts of the surface of the Earth on a flat paper map or on a computer
screen, the curved horizontal reference surface must be mapped onto the 2D
mapping plane. The reference surface for large-scale mapping is usually an oblate
ellipsoid, and for small-scale mapping a sphere. Mapping onto a 2D mapping
plane means transforming each point on the reference surface with geographic
coordinates (φ, λ) to a set of Cartesian coordinates (x, y) representing positions
on the map plane (Figure 15).
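To make the idea of such mapping equations concrete, the sketch below implements the normal Mercator projection for a sphere, x = Rλ and y = R ln tan(π/4 + φ/2). This particular projection and the sphere radius are chosen only for illustration; they are not prescribed by the text.

import math

R = 6371000.0  # mean Earth radius in metres (illustrative value)

def mercator_forward(lat_deg: float, lon_deg: float):
    """Project geographic coordinates (degrees) to Mercator map coordinates (metres)."""
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)
    x = R * lam
    y = R * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y

print(mercator_forward(52.22, 6.89))   # a point in the eastern Netherlands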

Active Map Projection https://www.youtube.com/watch?v=z1PpoolrMK4

https://www.youtube.com/watch?v=gGumy-9HrSY
Classification of map projections
Hundreds of map projections have been developed, each with its own specific
qualities. These qualities in turn make the resulting maps useful for certain
purposes. By definition, any map projection is associated with scale distortions.
There is simply no way to flatten out a piece of ellipsoidal or spherical surface
without stretching some parts of the surface more than others. The amount and
kind of distortion a map will have depends on the type of map projection that
has been selected.

The simplest approach is to project the reference surface directly onto a flat plane;
such projections are called azimuthal. Alternatively, an intermediate surface that
can be unrolled into a plane is used. Typical choices for such intermediate surfaces
are cones and cylinders, and the resulting map projections are then called conical
and cylindrical, respectively. Figure 16 shows the surfaces involved in these three
classes of projections.

Figure 16: Classes of map projections: cylindrical, conical and azimuthal.

In the geometric depiction of map projections in Figures 16 and 17, the symmetry
axes of the plane, cone and cylinder coincide with the rotation axis of the ellipsoid or
sphere, i.e. a line through N and S pole. In this case, the projection is said to be a
normal projection. The other cases are transverse projections (symmetry axis in the
equator) and oblique projections (symmetry axis is somewhere between the rotation
axis and equator of the ellipsoid or sphere). These cases are illustrated in Figure 18.
Figure 17: Three secant projection classes: cylindrical, conical and azimuthal.


The Universal Transverse Mercator (UTM) uses a transverse cylinder, secant
to the horizontal reference surface. UTM is an important projection used
world-wide. The projection is a derivation from the Transverse Mercator
projection (also known as Gauss-Kruger or Gauss conformal projection). The
UTM divides the world into 60 narrow longitudinal zones of 6 degrees,
numbered from 1 to 60. The narrow zones of 6 degrees (and the secant map
surface) make the distortions small enough for large scale topographic
mapping.
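The 6-degree zone layout just described can be expressed in a couple of lines of Python; this is an illustrative sketch of the zone numbering, not an excerpt from the text.

def utm_zone(lon_deg: float) -> int:
    """UTM zone number (1-60) for a longitude in the range [-180, 180)."""
    return int((lon_deg + 180) // 6) + 1

def central_meridian(zone: int) -> float:
    """Central meridian (degrees) of a UTM zone."""
    return zone * 6 - 183

lon = 6.89                              # longitude of Enschede, roughly
zone = utm_zone(lon)                    # -> 32
print(zone, central_meridian(zone))     # -> 32 9.0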
Normal cylindrical projections are typically used to map the world in its
entirety. Conical projections are often used to map the different continents,
while the normal azimuthal projection may be used to map the polar areas.
Transverse and oblique aspects of many projections can be used for most
parts of the world.

The planar, conical, and cylindrical surfaces in Figure 16 are all
tangent surfaces: they touch the horizontal reference surface in one
point (plane) or along a closed line (cone and cylinder only). Another
class of projections is obtained if the surfaces are chosen to be
secant to (i.e. to intersect with) the horizontal reference surface;
illustrations are in Figure 17. Then, the reference surface is
intersected along one closed line (plane) or two closed lines (cone and
cylinder). Secant map surfaces are used to reduce or average out
scale errors, because the lines of intersection are not distorted on the
map.

4.1.4 Coordinate transformations


Map and GIS users are mostly confronted in their work with transformations from
one two-dimensional coordinate system to another. This includes the
transformation of polar coordinates delivered by the surveyor into Cartesian map
coordinates, or the transformation from the 2D Cartesian (x, y) system of a specific
map projection into the 2D Cartesian (x′, y′) system of another defined map
projection.

Datum transformations are transformations from a 3D coordinate system (i.e.


horizontal datum) into another 3D coordinate system. These kinds of
transformations are also important for map and GIS users. They are usually
collecting spatial data in the field using satellite navigation technology and need
to represent these data on a published map based on a local horizontal datum.
We may relate an unknown coordinate system to a known coordinate system on
the basis of a set of selected points whose coordinates are known in both systems.
These points may be ground control points (GCPs) or common points such as
corners of houses or road intersections, as long as they have known coordinates
in both systems. Image and scanned data are usually transformed by this method.
The transformations may be conformal, affine, polynomial, or of another type,
depending on the geometric errors in the data set.

2D Polar to 2D Cartesian transformations


The transformation of polar coordinates (α, d) into Cartesian map coordinates (x,
y) is done when field measurements, i.e. angular and distance measurements, are
transformed into map coordinates.
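A minimal sketch of this transformation is given below, assuming the angle is an azimuth measured clockwise from the map's y-axis (north), a common surveying convention; the exact convention used in a given survey may differ.

import math

def polar_to_cartesian(azimuth_deg: float, distance: float):
    """Convert an (azimuth, distance) observation into (x, y) map coordinates."""
    a = math.radians(azimuth_deg)
    x = distance * math.sin(a)
    y = distance * math.cos(a)
    return x, y

print(polar_to_cartesian(45.0, 100.0))   # roughly (70.7, 70.7)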

Changing map projection


Forward and inverse mapping equations are normally used to transform data
from one map projection to another. The inverse equation of the source
projection is used first to transform source projection coordinates (x, y) to
geographic coordinates (φ, λ). Next, the forward equation of the target projection
is used to transform the geographic coordinates (φ, λ) into target projection
coordinates (x′, y′). The first equation takes us from projection A into
geographic coordinates. The second takes us from geographic coordinates (φ, λ) to
another map projection B. These principles are illustrated in Figure 22.
Figure 22: The principle of changing from one map projection into another.

Datum transformations

A change of map projection may also include a change of the horizontal datum.
This is the case when the source projection is based upon a different horizontal
datum than the target projection. If the difference in horizontal datums is ignored,
there will not be a perfect match between adjacent maps of neighbouring
countries or between overlaid maps originating from different projections. It may
result in up to several hundred meters difference in the resulting coordinates.
Therefore, spatial data with different underlying horizontal datums may need a
so-called datum transformation.

Suppose we wish to transform spatial data from the UTM projection to the Dutch
RD system, and that the data in the UTM system are related to the European
Datum 1950 (ED50), while the Dutch RD system is based on the Amersfoort
datum. In this example the change of map projection should be combined with a
datum transformation step for a perfect match. This is illustrated in Figure 23.

The inverse equation of projection A is used first to take us from the map
coordinates (x, y) of projection A to the geographic coordinates (φ, λ, h) in datum
A. A height coordinate (h or H) may be added to the (x, y) map coordinates. Next,
the datum transformation takes us from these coordinates to the geographic
coordinates (φ, λ, h) in datum B. Finally, the forward equation of projection B is used
to take us from the geographic coordinates (φ, λ, h) in datum B to the map
coordinates (x′, y′) of projection B.

Mathematically, a datum transformation is feasible via the geocentric coordinates
(X, Y, Z), or directly by relating the geographic coordinates of both datum systems.
The latter relates the ellipsoidal latitude (φ) and longitude (λ), and possibly the
ellipsoidal height (h), of both datum systems directly.
Figure 23: The principle of changing from one projection into another, combined
with a datum transformation from datum A to datum B.
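The UTM/ED50 to Dutch RD workflow of Figure 23 can be scripted with a coordinate transformation library. The sketch below assumes the pyproj library, and assumes that EPSG:23031 denotes ED50 / UTM zone 31N and EPSG:28992 the Amersfoort / RD New system; pyproj then performs the inverse projection, the datum transformation and the forward projection internally.

from pyproj import Transformer

# Build a transformer from ED50 / UTM zone 31N to Amersfoort / RD New (assumed EPSG codes)
ed50_utm_to_rd = Transformer.from_crs("EPSG:23031", "EPSG:28992", always_xy=True)

x_utm, y_utm = 680000.0, 5790000.0          # illustrative UTM (ED50) coordinates in metres
x_rd, y_rd = ed50_utm_to_rd.transform(x_utm, y_utm)
print(f"RD coordinates: x = {x_rd:.1f} m, y = {y_rd:.1f} m")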

4.2 Satellite-based positioning


The previous section has noted the importance of satellites in spatial
referencing. Satellites have allowed us to realize geocentric reference systems
and to increase the level of spatial accuracy substantially. They are critical tools in
geodetic engineering for the maintenance of the ITRF. They also play a key role
in mapping, surveying, and in a growing number of applications requiring
positioning techniques. Nowadays, for fieldwork that includes spatial data
acquisition, the use of satellite-based positioning is considered indispensable.
Satellite-based positioning was developed and implemented to address
military needs, somewhat analogously to the early development of the
internet. The technology is now widely available for civilian use. The
requirements for the development of the positioning system were:
• Suitability for all kinds of military use: ground troops and vehicles, aircraft and missiles, ships;
• Requiring only low-cost equipment with low energy consumption at the receiver end;
• Provision of results in real time for an unlimited number of users concurrently;
• Support for different levels of accuracy (military versus civilian);
• Around-the-clock and weather-proof availability;
• Use of a single geodetic datum;
• Protection against intentional and unintentional disturbance, for instance through a design allowing for redundancy.

A satellite-based positioning system set-up involves implementation of three


hardware segments:

i) The space segment, i.e. the satellites that orbit the Earth, and the radio
signals that they emit,
ii) The control segment, i.e. the ground stations that monitor and maintain the
space segment components, and
iii) The user segment, i.e. the users with their hardware and software to conduct
positioning.

In satellite positioning, the central problem is to determine values (X, Y, Z) of


a receiver that receives satellite signals, i.e. to determine the position of the
receiver with a stated accuracy and precision. The required accuracy and precision
depend on the application; timeliness, i.e. whether the position values are required in
real time or can be determined later during post-processing, also varies
between applications. Finally, some applications like navigation require
kinematic approaches, which take into account the fact that the receiver is not
stationary, but is moving.
Video on Satellite Positioning
https://www.youtube.com/watch?v=ror4P1UAv_g

4.2.1 Absolute positioning


The working principles of absolute, satellite-based positioning are fairly simple:
1. A satellite, equipped with a clock, at a specific moment sends a radio message
that includes the satellite identifier, its position in orbit, and its clock reading.

2. A receiver on or above the planet, also equipped with a clock, receives the
message slightly later, and reads its own clock.

3. From the time delay observed between the two clock readings, and knowing
the speed of radio transmission through the medium between (satellite) sender
and receiver, the receiver can compute the distance to the sender, also known as
the satellite's pseudorange. (The pseudorange of a satellite with respect to a
receiver is its apparent distance to the receiver, computed from the time delay
with which its radio signal is received.)

Such a computation determines the position of the receiver to be on a sphere of
radius equal to the computed pseudorange (refer to Figure 24(a)). If the receiver
would instantaneously do the same with the message of another satellite that is
positioned elsewhere, the position of the receiver is restricted to another sphere.
The intersection of the two spheres, which have different centres, determines a
circle as the set of possible positions of the receiver (refer to Figure 24(b)). If a third
satellite message is taken into consideration, the intersection of three spheres
determines at most two positions, one of which is the actual position of the receiver.
In most, if not all, practical situations where two positions result, one of them is a
highly unlikely position for a signal receiver. The overall procedure is known as
trilateration: the determination of a position based on three distances.

In practice, however, the receiver is equipped with a much cheaper clock that is not
synchronized with the satellite clocks. This brings into play an additional unknown
parameter, namely the synchronization bias of the receiver clock, i.e. the difference
in time reading between it and the satellite clocks.
Our set of unknown variables has now become (X, Y, Z, ∆t), representing a 3D
position and a clock bias. By including the information obtained from a fourth
satellite message, we can solve the problem (see Figure 25).

This will result in the determination of the receiver's actual position (X, Y, Z), as
well as its receiver clock bias ∆t, and if we correct the receiver clock for this bias
we effectively turn it into a high-precision, atomic clock as well!
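To illustrate the principle described above, the following Python sketch solves for (X, Y, Z) and the receiver clock bias from four pseudoranges by iterative least-squares. The satellite positions and pseudoranges are synthetic, and the solver is a simplified illustration of the idea, not a description of how an actual receiver is implemented.

import numpy as np

C = 299_792_458.0  # speed of light, m/s

def solve_position(sat_pos, pseudoranges, iterations=10):
    """sat_pos: (n, 3) array of satellite positions [m]; pseudoranges: (n,) array [m]."""
    sat_pos = np.asarray(sat_pos, dtype=float)
    rho = np.asarray(pseudoranges, dtype=float)
    x = np.zeros(4)                      # initial guess: Earth's centre, zero clock bias
    for _ in range(iterations):
        diff = sat_pos - x[:3]           # vectors from receiver to satellites
        dist = np.linalg.norm(diff, axis=1)
        predicted = dist + x[3]          # modelled pseudorange = geometry + clock bias (in metres)
        # Jacobian: partial derivatives of the pseudorange w.r.t. (X, Y, Z, c*dt)
        J = np.hstack([-diff / dist[:, None], np.ones((len(rho), 1))])
        dx, *_ = np.linalg.lstsq(J, rho - predicted, rcond=None)
        x += dx
    return x[:3], x[3] / C               # position [m] and clock bias [s]

# Synthetic example: four satellites above a receiver at (6378137, 0, 0)
# with a 1 millisecond receiver clock bias. All values are illustrative.
true_receiver = np.array([6378137.0, 0.0, 0.0])
clock_bias_s = 1e-3
sats = np.array([
    [15600e3,  7540e3, 20140e3],
    [18760e3,  2750e3, 18610e3],
    [17610e3, 14630e3, 13480e3],
    [19170e3,   610e3, 18390e3],
])
ranges = np.linalg.norm(sats - true_receiver, axis=1) + C * clock_bias_s
pos, bias = solve_position(sats, ranges)
print(pos, bias)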
Obtaining a high precision clock is a fortunate side-effect of using the receiver, as
it allows the design of experiments distributed in geographic space that demand
high levels of synchrony. One such application is the use of wireless sensor
networks for various natural phenomena like earthquakes, meteorological
patterns or in water management.
Another application is in the positioning of mobile phone users making an
emergency call. Often the caller does not know their location accurately. The
telephone company can trace back the call to the receiving transmitter mast, but
this may be servicing an area with a radius of 300 m to 6 km. That is too inaccurate
a position for an emergency ambulance to go to. However, if all masts in the
telephony network are equipped with a satellite positioning receiver (and thus,
with a very good, synchronized clock) the time of reception of the call at each
mast can be recorded. The time difference of arrival of the call between two nearby
masts determines a hyperbola on the ground of possible positions of the caller; if
the call is received on three masts, we would have two hyperbolas, allowing
intersection, and thus ‘hyperbolic positioning’. With current technology the
(horizontal) accuracy would be better than 30 m. Returning to the subject of
satellite-based positioning, when only three and not four satellites are 'in view',
the receiver is capable of falling back from the above 3D positioning mode to the
inferior 2D positioning mode. With the relative abundance of satellites in orbit
around the Earth, this is a relatively rare situation, but it serves to illustrate the
importance of 3D positioning.
If a 3D fix has already been obtained, the receiver simply assumes that the height
above the ellipsoid has not changed since the last 3D fix. If no fix had yet been
obtained, the receiver assumes that it is positioned on the geocentric ellipsoid
adopted by the positioning system, i.e. at height h = 0. In the receiver
computations, the ellipsoid fills the slot of the missing fourth satellite sphere, and
the unknown variables can therefore still be determined. Clearly, in both of these
cases the assumption for this computation is flawed, and the positioning results
in 2D mode will be unreliable, much more so if no previous fix had been
obtained and the receiver is not at all near the surface of the geocentric ellipsoid.

Time, clocks and world time
Historically, clock time was defined locally, from the position of the Sun in the sky.
When trains became an important means of transportation, these local time
systems became problematic, as the schedules required a single time system. Such
a time system needed the definition of time zones: typically, 24 geographic
strips bounded by longitudes that are multiples of 15°. This all gave rise to Greenwich
Mean Time (GMT), which became the world time standard of choice. It was a system
based on the mean solar time at the meridian of Greenwich, United Kingdom,
which is the conventional 0-meridian in geography.
GMT was later replaced by Universal Time (UT), a system still based on meridian
crossings of stars, but now of far away quasars as this provides more accuracy
than that of the Sun. It is still the case that the rotational velocity of our planet is
not constant and the length of a solar day is increasing. So UT is not a perfect
system either. It continues to be used for civil clock time, but it is officially now
replaced by International Atomic Time (TAI). UT actually has various versions,
amongst which are UT0, UT1 and UTC. UT0 is the Earth rotational time observed in
some location. Because the Earth experiences polar motion as well, UT0 differs
between locations. If we correct for polar motion, we obtain UT1, which is identical
everywhere. It is still a somewhat erratic clock because of the earlier mentioned
varying rotational velocity of the planet. The uncertainty is about 3 msec per day.
Coordinated Universal Time (UTC) is used in satellite positioning, and is
maintained with atomic clocks. By convention, it is always within a margin of 0.9
sec of UT1, and twice annually it may be given a shift to stay within that margin.
This occasional shift of a leap second is applied at the end of June 30 or, preferably,
at the end of December 31. The last minute of such a day is then either 59 or
61 seconds long. So far, adjustments have always been to add a
second. UTC time can only be determined to the highest precision after the fact,
as atomic time is determined by the reconciliation of the observed differences
between a number of atomic clocks maintained by different national time
bureaus.
In recent years we have learned to measure distance, and therefore also position, with
clocks using satellite signals. The conversion factor is the speed of light,
approximately 3 × 10⁸ m/s in vacuum. No longer can multiple seconds of clock bias
be allowed, and this is where atomic clocks come in. They are very accurate
timekeepers, based on the exactly known frequency with which
specific atoms (Cesium, Rubidium and Hydrogen) make discrete energy state
jumps. Positioning satellites usually have multiple clocks on board; ground control
stations have even better quality atomic clocks.

Video on Atomic Clocks


https://www.youtube.com/watch?v=9ikbD7UGzoI

4.2.2 Errors in absolute positioning


Background information on the calculation of positional error (specifically, the
calculation of the root mean square error, RMSE) is given in the discussion of
positional accuracy in Chapter 5.

Errors related to the space segment


As a first source of error, the operators of the control segment may intentionally
deteriorate radio signals of the satellites to the general public, to avoid optimal
use of the system by the enemy, for instance in times of global political tension
and war. This selective availability (meaning that the military forces allied with the
control segment will still have access to undisturbed signals) may cause an error
that is an order of magnitude larger than all other error sources combined.
Secondly, the satellite message may contain incorrect information. Assuming that
it will always know its own identifier, the satellite may make two kinds of error:

1. Incorrect clock reading: Even atomic clocks can be off by a small margin, and
since Einstein, we know that travelling clocks are slower than resident clocks,
due to a so-called relativistic effect. If one understands that a clock that is off
by 0.000001 sec causes a computation error in the satellite's pseudorange of
approximately 300 m, it is clear that these satellite clocks require very strict
monitoring.

2. Incorrect orbit position: The orbit of a satellite around our planet is easy to
describe mathematically if both bodies are considered point masses, but in real
life they are not. For the same reasons that the Geoid is not a simply shaped
surface, the Earth's gravitational field that a satellite experiences is not perfectly
regular, so the actual orbit gradually drifts away from its computed description.

Both types of error are strictly monitored by the ground control segment, which is
responsible for correcting any errors of this nature, but it does so by applying an
agreed-upon tolerance. A control station can obviously compare the results of
positioning computations as discussed above with its accurately known position,
flagging any unacceptable errors, and potentially labelling a satellite as
temporarily ‘unhealthy’ until errors have been corrected, and brought to within
the tolerance. This may be done by uploading a correction on the clock or orbit
settings to the satellite.

Errors related to the medium


Thirdly, the medium between sender and receiver may influence the
radio signals. The middle atmospheric layers of the stratosphere and mesosphere are
relatively harmless and of little hindrance to radio waves, but this is not true
of the lower and upper layers. They are, respectively:

The troposphere:
the approximately 14 km high airspace just above the Earth's surface, which
holds much of the atmosphere's oxygen and which envelopes all phenomena
that we call the weather. It is an obstacle that delays radio waves in a rather
variable way.

The ionosphere:
the outermost part of the atmosphere, starting at an altitude of about 90 km,
holding many electrically charged atoms and thereby forming a protection
against various forms of radiation from space, including, to some extent, radio
waves. The degree of ionization shows a distinct night and day rhythm, and
also depends on solar activity. The ionosphere is a more severe source of delay to
satellite signals, which obviously means that pseudoranges are estimated
larger than they actually are.

When satellites emit radio signals at two or more frequencies, an estimate of the
delay can be computed from the differences in delay incurred by signals of different
frequency, and this allows for the correction of atmospheric delay, leading
to a 10–50% improvement of accuracy. If this is not the case, or if the receiver
is capable of receiving just a single frequency, a model should be applied to
forecast the (especially ionospheric) delay, typically taking into account the
time of day and current latitude of the receiver.
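The dual-frequency correction mentioned above exploits the fact that, to first order, the ionospheric delay scales with the inverse square of the carrier frequency. The sketch below shows the standard ionosphere-free combination of two code pseudoranges; the pseudorange values are illustrative only and not taken from the text.

F1 = 1575.42e6   # L1 carrier frequency, Hz
F2 = 1227.60e6   # L2 carrier frequency, Hz

def ionosphere_free(p1: float, p2: float) -> float:
    """First-order ionosphere-free pseudorange from L1 and L2 code measurements (metres)."""
    return (F1**2 * p1 - F2**2 * p2) / (F1**2 - F2**2)

p1 = 20_000_123.4   # L1 pseudorange, metres (illustrative)
p2 = 20_000_131.9   # L2 pseudorange, metres (illustrative; larger delay on the lower frequency)
print(f"Ionosphere-free pseudorange: {ionosphere_free(p1, p2):.1f} m")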

Errors related to the receiver’s environment


Fourth in this list is the error occurring when a radio signal is received via two or
more paths between sender and receiver, typically because of a bounce
off some nearby surface, like a building or rock face. The term applied to this
phenomenon is multi-path; when it occurs, the multiple
receptions of the same signal may interfere with each other (see Figure 26).
Multi-path is a difficult error source to avoid.

Figure 26: At any point in time, a number of satellites will be above the
receiver’s horizon. But not all of them will be ‘in view’ (like the left and
right satellites), and for others multipath signal reception may occur.

All of the above error sources have an influence on the computation of a


satellite’s pseudo range. In accumulation, they are called the user equivalent
range error (UERE). Some error sources may be at work for all satellites being
used by the receiver, for instance, selective availability and the atmospheric
delay, while others may be specific to one satellite, for instance, incorrect
satellite information and multi-path.
Errors related to the relative geometry of satellites and
receiver

There is one more source of error that is unrelated to individual radio signal
characteristics, but that rather depends on the combination of the satellite
signals used for positioning. Of importance is their constellation in the sky
from the receiver perspective. Referring to Figure 27, one will understand that
the sphere intersection technique of positioning will provide more precise
results when the four satellites are nicely spread over the sky, and thus that
the satellite constellation of Figure 27(b) is preferred over the one of 27(a).

This error source is known as the geometric dilution of precision (GDOP). GDOP
is lower when the satellites are just above the horizon in
mutually opposed compass directions. However, such satellite positions have
bad atmospheric delay characteristics, so in practice it is better if they are at
least 15° above the horizon. When more than four satellites are in view,
modern receivers use a 'least-squares' adjustment to calculate the best
positional fix possible from all of the signals. This gives a better solution than
just using the 'best four', as was done previously.

These errors are not all of similar magnitude. An overview of some typical values
(without selective availability) is provided in Table 4.4.

satellite clock error      2.0 m
satellite position error   2.5 m
ionospheric delay          5.0 m
tropospheric delay         0.5 m
receiver noise             0.3 m
multi-path                 0.5 m
Total RMSE range error:    √(2.0² + 2.5² + 5.0² + 0.5² + 0.3² + 0.5²) ≈ 6.0 m

Table 4.4: Indication of the typical magnitude of errors in absolute satellite-based positioning.

Figure 27: Geometric dilution of precision. The four satellites used for
positioning can be in a bad constellation (a) or in a better constellation (b).

GDOP functions not so much as an independent error source but rather as a
multiplying factor, decreasing the precision of the position and time values
obtained.
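The total range error in Table 4.4 is simply the root of the sum of squares of the individual contributions, as the short sketch below illustrates.

import math

# Error contributions of Table 4.4, in metres
errors_m = {
    "satellite clock":    2.0,
    "satellite position": 2.5,
    "ionospheric delay":  5.0,
    "tropospheric delay": 0.5,
    "receiver noise":     0.3,
    "multi-path":         0.5,
}

rmse = math.sqrt(sum(v**2 for v in errors_m.values()))
print(f"Total RMSE range error: {rmse:.2f} m")   # roughly 6 m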

The procedure that we discussed above is known as absolute, single-point


positioning based on code measurements. It is the fastest and simplest, yet least
accurate, way of determining a position using satellites. It suffices for
recreational purposes and other applications that require a horizontal accuracy
of no better than 5–10 m. Typically, when encrypted military signals can also be used,
on a dual-frequency receiver the achievable horizontal accuracy is 2–5 m.
Below, we discuss other satellite-based positioning techniques with better
accuracies.

4.2.3 Relative positioning

One technique of trying to remove errors from positioning computations is


to perform many position computations, and to determine the average over
the solutions. Many receivers allow the user to do so. It should however be
clear from the above that averaging may address random errors like signal
noise, selective availability (SA) and multi-path to some extent, but not
systematic sources of error, like incorrect satellite data, atmospheric delays,
and GDOP effects.
These sources should be removed before averaging is applied. It has been
shown that averaging over 60 minutes in absolute, single-point positioning
based on code measurements, before systematic error removal, leads only to
a 10–20% improvement of accuracy. In such cases, receiver averaging is
therefore of limited value, and requires long periods under near-optimal
conditions. Averaging is a good technique if systematic errors have been
accounted for.

In relative positioning, also known as differential positioning, one tries to


remove some of the systematic error sources by taking into account
measurements of these errors in a nearby stationary reference receiver with an
accurately known position. By using these systematic error findings at the
reference, the position of the target receiver of interest will become known
much more precisely.

In an optimal setting, reference and target receiver experience identical


conditions and are connected by a direct data link, allowing the target to
receive correctional data from the reference. In practice, relative positioning
allows reference and target receiver to be 70–200 km apart, and they will
still essentially experience similar atmospheric signal error. For each satellite in view,
the reference receiver can determine the error in the measured pseudorange,
because its own position is accurately known, and these corrections can then be
passed on to the target receiver.

Finally, there is also a notion of inverted relative positioning. The principles are
still as above, but in this technique the target receiver does not itself correct for
satellite pseudorange errors; instead, it uses a data link to upload its
positioning/timing information to a central repository, where the corrections
are applied. This can be useful in cases where many target receivers are needed
and budget does not allow them to be expensive.
Video on Absolute Location Versus Relative Location - YouTube

4.2.4 Network positioning

After discussing the advantages of relative positioning, we can move on to


the notion of network positioning: an integrated, systematic network of
reference receivers covering a large area like a continent or even the whole
globe.

The organization of such a network can take different shapes, augmenting an


already existing satellite-based system. Here we discuss a general
architecture, consisting of a network of reference stations, strategically
positioned in the area to be covered, each of which is constantly monitoring
signals and their errors for all positioning satellites in view. One or more
control centres receive the reference station data, verify this for correctness,
and relay (uplink) this information to a geostationary satellite. The satellite
will retransmit the correctional data to the area that it covers, so that target
receivers, using their own approximate position, can determine the satellite signal
error corrections that apply to them and compute more accurate position fixes.

With network positioning, accuracy in the submetre range can be obtained.


Typically, advanced receivers are required, but the technology also lends itself
to solutions in which a single advanced receiver functions as a reference receiver
for simpler ones in its direct neighbourhood.
Video on Network Positioning
What is Galileo? - YouTube

4.2.5 Code versus phase measurements

Up until this point, we have assumed that the receiver determines the
range of a satellite by measuring time delay on the received ranging code.
There exists a more advanced range determination technique known as
carrier phase measurement. This typically requires more advanced receiver
technology, and longer observation sessions. Carrier phase measurement
can currently only be used with relative positioning, as absolute
positioning using this method is not well developed.

The technique aims to determine the number of cycles of the (sine-shaped)


radio signal between sender and receiver. Each cycle corresponds to one
wavelength of the signal, which in the applied L-band frequencies is 19–
24 cm. Since this number of cycles cannot be directly measured, it is
determined, in a long observation session, from the change in carrier phase
with time. This change happens because the satellite is moving along its orbit.
From its orbit parameters and the change in phase over time, the number of
cycles can be derived.
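The 19–24 cm cycle length mentioned above follows directly from wavelength = c / f for the L-band carrier frequencies; the sketch below uses the GPS L1 and L2 frequencies as an example.

C = 299_792_458.0        # speed of light, m/s

for name, freq_hz in [("L1", 1575.42e6), ("L2", 1227.60e6)]:
    wavelength_cm = C / freq_hz * 100
    print(f"{name}: {wavelength_cm:.1f} cm per cycle")   # roughly 19.0 and 24.4 cm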

With relative positioning techniques, a horizontal accuracy of 2 mm–2 cm


can be achieved. This degree of accuracy makes it possible to measure
tectonic plate movements, which can be as big as 10 cm per year in some
locations on the planet.

4.2.6 Positioning technology

We include this section to provide the reader with a little information on


currently available satellite-based positioning technology. It should be
noted that this textbook will easily outlive the currency of the information
contained within it, as our technology is constantly evolving.

At present, two satellite-based positioning systems are operational (GPS


and GLONASS), and a third is in the implementation phase (Galileo).
Respectively, these are American, Russian and European systems. Any of
these, but especially GPS and Galileo, will be improved over time, and will
be augmented with new techniques.
GPS

The NAVSTAR Global Positioning System (GPS) was declared operational


in 1994, providing Precise Positioning Services (PPS) to US and allied
military forces as well as US government agencies, and Standard
Positioning Services (SPS) to civilians throughout the world. Its space
segment nominally consists of 24 satellites, each of which orbits our planet
in 11 h 58 min at an altitude of 20,200 km. There can be any number of
satellites active, typically between 21 and 27. The satellites are organized in
six orbital planes, somewhat irregularly spaced, with an angle of inclination
of 55–63° with the equatorial plane, nominally having four satellites each
(see Figure 28). This means that a receiver on
Earth will have between five and eight (sometimes up to twelve) satellites
in view at any point in time. Software packages exist to help in planning
GPS surveys, identifying expected satellite set-up for any location and
time.

The NAVSTAR satellites transmit two radio signals, namely the L1
frequency at 1575.42 MHz and the L2 frequency at 1227.60 MHz. The
signals consist of:

• the carrier waves at the given frequencies,
• a coarse ranging code, known as C/A, modulated on L1,
• an encrypted precision ranging code, known as P(Y), modulated on L1 and L2, and
• a navigation message modulated on both L1 and L2.

The role of L2 is to provide a second radio signal, thereby allowing (the more
expensive) dual-frequency receivers a way of determining fairly precisely the
actual ionospheric delay on the satellite signals received.

Figure 28: Constellation of satellites in the GPS system, with four shown in
only one orbital plane.

The role of the ranging codes is two-fold:

• to identify the satellite that sent the signal, as each satellite sends unique
codes and the receiver has a look-up table for these codes, and
• to determine the signal transit time, and thus the satellite's pseudorange.

The navigation message contains the satellite orbit and satellite clock error
information, as well as some general system information. GPS also carries
a fifth, encrypted military signal carrying the M-code. GPS uses WGS84 as
its reference system. It has been refined on several occasions and is now
aligned with the ITRF at the level of a few centimetres worldwide. GPS has
adopted UTC as its time system.

In the civil market, GPS receivers of varying quality are available, their
quality depending on the embedded positioning features: supporting
single or dual frequency, supporting only absolute or also relative
positioning, performing code measurements or also carrier phase
measurements. Leica and Trimble are two of the well-known brands in the
high-precision, professional surveying domain; Magellan and Garmin, for
instance, operate in the lower price, higher volume consumer market
range, amongst others for recreational use in outdoor activities. Many of
these are single frequency receivers, doing only code measurements,
though some are capable of relative positioning. This includes the new
generation of GPS-enabled mobile phones.

GLONASS

What GPS is to the US military, GLONASS is to the Russian military. The


GLONASS space segment consists of nominally 24 satellites, organized in
three orbital planes, with an inclination of 64.8◦ with the equator. Orbiting
altitude is 19,130 km, with a period of revolution of 11 hours 16 min.
GLONASS uses the PZ–90 as its reference system, and like GPS uses UTC as
time reference, though with an offset for Russian daylight.

GLONASS radio signals are somewhat similar to that of GPS, but differ in
the details. Satellites use different identifier schemes, and their navigation
message use other parameters. They also use different frequencies:
GLONASS L1 is at approximately 1605 MHz (changes are underway), and
L2 is at approximately 1248 MHz. Otherwise, the GLONASS system
performance is rather comparable to that of GPS.
Galileo

In the 1990’s, the European Union (EU) judged that it needed to have its
own satellite-based positioning system, to become independent of the GPS
monopoly and to support its own economic growth by providing services
of high reliability under civilian control.

Galileo is the name of this EU system. The vision is that satellite-based


positioning will become even bigger due to the emergence of mobile
phones equipped with receivers, perhaps with some 400 million users by
the year 2015. Development of the system has experienced substantial
delays, and at the time of writing European ministers insist that Galileo
should be up and running by the end of 2013. The completed system will
have 27 satellites, with three in reserve, orbiting in one of three, equally
spaced, circular orbits at an elevation of 23,222 km, inclined 56◦ with the
equator. This higher inclination, when compared to that of GPS, has been
chosen to provide better positioning coverage at high latitudes, such as
northern Scandinavia where GPS performs rather poorly.

The Galileo Terrestrial Reference Frame (GTRF) will be a realization of the


ITRS independently set up from that of GPS, so that one system can back-
up for the other. Positional differences between the WGS84 and the GTRF
will be at worst a few centimeters.
The Galileo System Time (GST) will closely follow International Atomic
Time (TAI), with a time offset of less than 50 nsec for 95% of the time over
any period of a year. Information on the actual offset between GST and
TAI, and between GST and UTC (as used in GPS), will be broadcast in the
Galileo satellite signal.

Satellite-based augmentation systems

Satellite-based augmentation systems (SBAS) aim to improve accuracy


and reliability of satellite-based positioning (see the section on network
positioning), in support of safety-critical navigation applications such as
aircraft operations near airfields. The typical technique is to provide an
extra, now geostationary, satellite that has a large service area like a
continent, and which sends differential data about standard positioning
satellites that are currently in view in its service area. If multiple ground
reference stations are used, the quality of the differential data can be quite
good and reliable. Signals typically use the frequency already in use by the
positioning satellites, so that receivers can receive the differential code
without problem.
Not all advantages of satellite augmentation will be useful for all receivers.
For consumer market receivers, the biggest advantage, as compared to
standard relative positioning, is that SBAS provides an ionospheric
correction grid for its service area, from which a correction specific for the
location of the receiver can be retrieved. This is not true in relative
positioning, where the reference station determines the error it
experiences, and simply broadcasts this information for nearby target
receivers to use. With SBAS, the receiver obtains information that is best
viewed as a geostatistical interpolation of errors from multiple reference
stations. More advanced receivers will also be able to use other
differential data, such as corrections to satellite positions and satellite clock
drift.
Chapter 5

Data entry and preparation

Objectives:
1. To understand the collection and use of spatial data.
2. To prepare users of spatial data by drawing attention to issues concerning
data accuracy and quality.
3. To learn a range of procedures for data checking and clean-up.
4. To understand methods for interpolating point data.

Spatial data can be obtained from various sources. It can be


collected from scratch, using direct spatial data acquisition techniques, or
indirectly, by making use of existing spatial data collected by others. Under
the first heading we could include field survey data and remotely sensed
images. Under the second fall paper maps and existing digital data sets.

5.1 Spatial data input


5.1.1 Direct spatial data capture

One way to obtain spatial data is by direct observation of the relevant


geographic phenomena. This can be done through ground-based field
surveys, or by using remote sensors in satellites or airplanes.

Data which is captured directly from the environment is known as


primary data.

With primary data the core concern in knowing its properties is to know
the process by which it was captured, the parameters of any instruments
used and the rigour with which quality requirements were observed.
Remotely sensed imagery is usually not fit for immediate use, as various
sources of error and distortion may have been present, and the imagery
should first be freed from these. This is the domain of remote sensing.

An image refers to raw data produced by an electronic sensor, which


are not pictorial, but arrays of digital numbers related to some property
of an object or scene, such as the amount of reflected light.
For an image, no interpretation of reflectance values as thematic or
geographic characteristics has taken place. When the reflectance values
have been translated into some ‘thematic’ variable, we refer to it as a
raster.
It is interesting to note that we refer to image pixels but to raster cells,
although both are stored in a GIS in the same way.
In practice, it is not always feasible to obtain spatial data by direct spatial
data capture. Factors of cost and available time may be a hindrance, or
previous projects sometimes have acquired data that may fit the current
project’s purpose.

5.1.2 Indirect spatial data capture

In contrast to direct methods of data capture described above, spatial data


can also be sourced indirectly. This includes data derived from existing
paper maps.
Secondary data through scanning, data digitized from a satellite image,
processed data purchased from data capture firms or international
agencies, and so on. This type of data is known as secondary data:

Any data which is not captured directly from the environment is known
as
secondary data.
Below we discuss key sources of secondary data and issues related to their
use in analysis of which the user should be aware.

Digitizing
A traditional method of obtaining spatial data is through digitizing existing paper maps. This
can be done using various techniques. Before adopting this approach, one
must be aware that positional errors already in the paper map will further
accumulate, and one must be willing to accept these errors.
There are two forms of digitizing: i) on-tablet and ii) on-screen manual
digitizing. In on-tablet digitizing, the original map is fitted on a special
surface (the tablet), while in on-screen digitizing, a scanned image of the
map (or some other image) is shown on the computer screen.

In both forms of digitizing, a number of control points with known coordinates
must be digitized first. The function of these points is to ‘lock’ a coordinate system onto the
digitized data: the control points on the map have known coordinates, and
by digitizing them we tell the system implicitly where all other digitized
locations are. At least three control points are needed, but preferably more
should be digitized to allow a check on the positional errors made.
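The control points can be used to estimate the transformation from digitizer (or screen) coordinates to map coordinates. A common choice is an affine transformation estimated by least squares; the sketch below is a minimal illustration of this idea, with purely hypothetical control point coordinates, and is not taken from the textbook itself.

```python
import numpy as np

# Hypothetical control points: digitizer coordinates (u, v) and their known
# map coordinates (x, y). At least three non-collinear points are needed.
digitized = np.array([[10.2, 20.1], [85.7, 22.3], [48.0, 90.5], [12.4, 88.0]])
map_xy = np.array([[1000.0, 5000.0], [1750.0, 5020.0], [1380.0, 5700.0], [1020.0, 5680.0]])

# Affine model: x = a*u + b*v + c  and  y = d*u + e*v + f, fitted by least squares.
A = np.column_stack([digitized, np.ones(len(digitized))])
coef_x, *_ = np.linalg.lstsq(A, map_xy[:, 0], rcond=None)
coef_y, *_ = np.linalg.lstsq(A, map_xy[:, 1], rcond=None)

# Residuals at the control points give a first check on the digitizing errors.
predicted = np.column_stack([A @ coef_x, A @ coef_y])
residuals = map_xy - predicted
print("RMS residual (map units):", np.sqrt((residuals ** 2).mean()))
```

Using more than the minimum three points, as recommended above, is what makes the residual check possible.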
A further approach is to scan the map and trace the lines from the scanned
image. These techniques are known as semi-automatic or automatic digitizing,
depending on how much operator interaction is required. If vector data is
to be distilled from this procedure, a process known as vectorization
follows the scanning process. This procedure is less labour-intensive than
manual digitizing, but can only be applied to relatively simple source documents.

Scanning
An ‘office’ scanner illuminates a document and measures the intensity of
the reflected light with a CCD array. The result is an image as a matrix of
pixels, each of which holds an intensity value. Office scanners have a fixed
maximum resolution, expressed as the highest number of pixels they can
identify per inch;
the unit is dots-per-inch (dpi). For manual on-screen digitizing
of a paper map, a resolution of 200–300 dpi is usually sufficient, depending
on the thickness of the thinnest lines. For manual on-screen digitizing of
aerial photographs, higher resolutions are recommended—typically, at
least 800 dpi.
Semiautomatic digitizing requires a resolution that results in scanned lines
of at least three pixels wide to enable the computer to trace the centre of
the lines and thus avoid displacements. For paper maps, a resolution of
300–600 dpi is usually sufficient. Automatic or semi-automatic tracing from
aerial photographs can only be done in a limited number of cases. Usually,
the information from aerial photos is obtained through visual
interpretation.
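The scanning resolution can also be related to ground resolution for a given map scale, which helps in judging whether a chosen dpi is adequate. The short calculation below is an illustrative sketch (the function name and example values are not from the text).

```python
# Ground size of one scanned pixel for a given scanner resolution and map scale.
def ground_pixel_size_m(dpi: float, scale_denominator: float) -> float:
    inch_in_m = 0.0254
    pixel_on_paper_m = inch_in_m / dpi            # size of one pixel on the map sheet
    return pixel_on_paper_m * scale_denominator   # corresponding size on the ground

print(ground_pixel_size_m(300, 50_000))   # ~4.2 m per pixel on a 1:50,000 map
print(ground_pixel_size_m(800, 50_000))   # ~1.6 m per pixel
```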
After scanning, the resulting image can be improved with various image
processing techniques. It is important to understand that scanning does
not result in a structured data set of classified and coded objects.
Additional work is required to recognize features and to associate
categories and other thematic attributes with them.

Vectorization
The process of distilling points, lines and polygons from a scanned image
is called vectorization. As scanned lines may be several pixels wide, they
are often first thinned to retain only the centreline. The remaining
centreline pixels are converted to series of (x, y) coordinate pairs, defining
a polyline. Subsequently, features are formed and attributes are
attached to them. This process may be entirely automated or performed
semi-automatically, with the assistance of an operator.
The phases of the vectorization process are illustrated in Figure 5.1.

Figure 5.1: Phases of the vectorization process and the various sorts of small
errors caused by it. The post-processing phase makes the final repairs.

Selecting a digitizing technique


The choice of digitizing technique depends on the quality, complexity and
contents of the input document. Complex images are better manually
digitized; simple images are better automatically digitized. Images that are
full of detail and symbols—like topographic maps and aerial
photographs—are therefore better manually digitized.

In practice, the optimal choice may be a combination of methods. For


example, contour line film separations can be automatically digitized and
used to produce a DEM. Existing topographic maps must be digitized
manually, but new, geometrically corrected aerial photographs, with vector
data from the topographic maps displayed directly over it, can be used for
updating existing data files by means of manual on-screen digitizing.
5.1.3 Obtaining spatial data elsewhere
Over the past two decades, spatial data has been collected in digital form
at an increasing rate and stored in various databases by individual producers
for their own use and for commercial purposes. More and more of this
data is being shared among GIS users, for several reasons. Some of
this data is freely available, although other data is only available
commercially, as is the case for most satellite imagery. High quality data
remains costly and time-consuming to collect and verify, and more and
more GIS applications are looking at not just local, but national or even
global processes. As we will see below, new technologies have played a key
role in the increasing availability of geospatial data. As a result of this
increasing availability, we have to be more careful that the data we have
acquired is of sufficient quality to be used in analysis and decision making.

Clearinghouses and web portals


Spatial data can also be acquired from centralized repositories. More often
those repositories are embedded in Spatial Data Infrastructures which
make the data available through what is sometimes called a spatial data
clearinghouse.

This is essentially a marketplace where data users can ‘shop’. It
will be no surprise that such markets for digital data
have an entrance through the world wide web. The first entrance is typically
formed by a web portal which categorizes all available data and provides a
local search engine and links to data documentation (also called metadata).
It often also points to data viewing and processing services. Standards-
based geo-webservices have become the common technology behind
such portal services.
Metadata
Metadata is defined as background information that describes all
necessary information about the data itself. More generally, it is known as
‘data about data’.
This includes:
• Identification information: data source(s), time of acquisition, etc.
• Data quality information: positional, attribute and temporal accuracy,
lineage, etc.
• Entity and attribute information: related attributes, units of measure,
etc.

In essence, metadata answers the who, what, when, where, why, and how
questions about all facets of the data made available. Maintaining
metadata is a key part of maintaining data and information quality in GIS.
This is because it can serve different purposes, from description of the data
itself through to providing instructions for data handling. Depending on
the type and amount of metadata provided, it could be used to determine
the data sets that exist for a geographic location, evaluate whether a given
data set meets a specified need, or to process and use a data set.
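To make the categories above concrete, the snippet below sketches a simplified, hypothetical metadata record as a Python dictionary. The field names and values are purely illustrative; real metadata would follow a formal standard such as those of ISO or OGC mentioned below.

```python
# A hypothetical, simplified metadata record illustrating the categories above.
metadata = {
    "identification": {
        "title": "Land cover map of study area X",      # hypothetical data set
        "source": "Aerial photography",
        "date_of_acquisition": "2019-04-15",
    },
    "data_quality": {
        "positional_accuracy_m": 2.5,
        "attribute_accuracy_pct": 92,
        "lineage": "Digitized from 1:10,000 orthophotos; field-verified June 2019",
    },
    "entity_and_attribute": {
        "land_cover_classes": ["forest", "agriculture", "urban"],
        "units": "class labels (nominal)",
    },
}

print(metadata["data_quality"]["lineage"])
```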

Data formats and standards


An important problem in any environment involved in digital data
exchange is that of data formats and data standards. Different formats
were implemented by different GIS vendors; different standards came
about with different standardization committees. The phrase ‘data
standard’ refers to an agreed-upon way of representing data in a system
in terms of content, type and format. The good news about both formats
and standards is that there are many to choose from; the bad news is that
this can lead to a range of conversion problems. Several metadata
standards for digital spatial data exist, including the International
Organization for Standardization (ISO) and the Open Geospatial
Consortium (OGC) standards.
5.2 Data quality

With the advent of satellite remote sensing, GPS and GIS technology, and the
increasing availability of digital spatial data, resource managers and others
who formerly relied on the surveying and mapping profession to supply high
quality map products are now in a position to produce maps themselves. At
the same time, GISs are being increasingly used for decision support
applications, with increasing reliance on
secondary data sourced through data providers or via the internet, through
geo-webservices. The implications of using low-quality data in important
decisions are potentially severe. There is also a danger that uninformed GIS
users introduce errors by incorrectly applying geometric and other
transformations to the spatial data held in their database.

Below we discuss positional, temporal and attribute accuracy, lineage,
completeness, and logical consistency.

5.2.1 Accuracy and precision

So far we have used the terms error, accuracy and precision without
appropriately defining them. Accuracy is the closeness of measurements or
computed values to their ‘true’ value. It should not be confused with precision,
which is a statement of the smallest unit of measurement to which data can
be recorded. In conventional surveying and mapping practice, accuracy and
precision are closely related. Instruments with an appropriate precision are
employed, and surveying methods chosen, to meet specified accuracy
tolerances.

In GIS, however, the numerical precision of computer processing and storage


usually exceeds the accuracy of the data. This can give rise to so-called
spurious accuracy, for example calculating area sizes to the nearest m² from
coordinates obtained by digitizing a 1:50,000 map.

Figure 5.2: A measurement probability function and the underlying true value T:
(a) bad accuracy and precision, (b) bad accuracy/good precision,
(c) good accuracy/bad precision, and (d) good accuracy and precision.

5.2.2 Positional accuracy


The surveying and mapping profession has a long tradition of determining
and minimizing errors. This applies particularly to land surveying and
photogrammetry, both of which tend to regard positional and height
errors as undesirable. Cartographers also strive to reduce geometric and
attribute errors in their products, and, in addition, define quality in
specifically cartographic terms, for example quality of linework, layout, and
clarity of text. It must be stressed that all measurements made with
surveying and photogrammetric instruments are subject to error. These
include:

• Human errors in measurement (e.g. reading errors) generally referred to as


gross errors or blunders. These are usually large errors resulting from
carelessness which could be avoided through careful observation,
although it is never absolutely certain that all blunders have been avoided
or eliminated.
• Instrumental or systematic errors (e.g. due to misadjustment of
instruments). This leads to errors that vary systematically in sign and/or
magnitude, but can go undetected by repeating the measurement with
the same instrument. Systematic errors are particularly dangerous because
they tend to accumulate.

• So-called random errors caused by natural variations in the quantity
being measured. These are effectively the errors that remain after blunders
and systematic errors have been removed. They are usually small, and dealt
with in least-squares adjustment.

Root mean square error


Location accuracy is normally measured as a root mean square error
(RMSE). The RMSE is similar to, but not to be confused with, the standard
deviation of a statistical sample. The value of the RMSE is normally
calculated from a set of check measurements (coordinate values from an
independent source of higher accuracy for identical points). The
differences at each point can be plotted as error vectors, as is done in
Figure 5.3 for a single measurement. The error vector can be seen as having
constituents in the x- and y-directions, which can be recombined by vector
addition to give the error vector representing the locational error.

Figure 5.3: The positional error of a measurement can be expressed as a
vector, which in turn can be viewed as the vector addition of its
constituents in the x- and y-direction, respectively δx and δy.

For each checkpoint, the error vector has components δx and δy. The
observed errors should be checked for a systematic error component,
which may indicate a (possibly repairable) lapse in the measurement
method. A systematic error has occurred when Σ δxi ≠ 0 or Σ δyi ≠ 0.

The systematic error δx in x is then defined as the average deviation from
the true value:

δx = (1/n) · Σ δxi , summed over i = 1, …, n.

Analogously to the calculation of the variance and standard deviation of a
statistical sample, the root mean square errors mx and my of a series of
coordinate measurements are calculated as the square root of the average
squared deviations:

mx = √( (1/n) · Σ δxi² )  and  my = √( (1/n) · Σ δyi² ),

where δxi² stands for δxi · δxi. The total RMSE is obtained with the formula

mtotal = √( mx² + my² ),

which, by the Pythagorean rule, is the length of the average (root squared)
error vector.
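The following minimal sketch computes these quantities from a small set of hypothetical check measurements (coordinates measured in the data set versus coordinates from an independent, more accurate source); the numeric values are illustrative only.

```python
import numpy as np

# Hypothetical check points: (x, y) from the data set and from an
# independent source of higher accuracy, for identical points.
measured  = np.array([[100.2, 200.5], [150.9, 248.7], [ 99.6, 301.2]])
reference = np.array([[100.0, 200.0], [151.0, 249.0], [100.0, 301.0]])

dx = measured[:, 0] - reference[:, 0]
dy = measured[:, 1] - reference[:, 1]

# Systematic error components (should be close to zero).
print("mean dx:", dx.mean(), "mean dy:", dy.mean())

# Root mean square errors per axis, and the total RMSE.
m_x = np.sqrt((dx ** 2).mean())
m_y = np.sqrt((dy ** 2).mean())
rmse_total = np.sqrt(m_x ** 2 + m_y ** 2)
print("m_x:", m_x, "m_y:", m_y, "total RMSE:", rmse_total)
```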

Accuracy tolerances
Many kinds of measurement can be naturally represented by a bell-
shaped probability density function p, as depicted in Figure 5.4(a). This
function is known as the normal (or Gaussian) distribution of a

continuous, random variable, in the figure indicated as Y. Its shape is
determined by two parameters: µ, which is the mean expected value for Y,
and σ, which is the standard deviation of Y. A small σ leads to a more
attenuated bell shape.

Figure 5.4: (a) Probability density function p of a variable Y, with its mean µ
and standard deviation σ; (b) the probability that Y is in the range [µ−σ, µ+σ].

Any probability density function p has the characteristic that the area
between its curve and the horizontal axis has size 1. Probabilities P can be
inferred from p as the size of an area under p’s curve. Figure 5.4(b), for
instance, depicts P(µ−σ ≤ Y ≤ µ+σ), i.e. the probability that the value for Y
is within distance σ from µ. In a normal distribution this specific probability
for Y is always 0.6826.
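This probability can be verified numerically with the standard error function: for a normal distribution, P within k standard deviations of the mean equals erf(k/√2). The short check below is an illustrative sketch using only the Python standard library.

```python
from math import erf, sqrt

# Probability that a normally distributed variable lies within k standard
# deviations of its mean: P = erf(k / sqrt(2)).
for k in (1, 2, 3):
    print(k, "sigma:", erf(k / sqrt(2)))
# k = 1 gives ~0.6827, matching the 0.6826 quoted above.
```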
The RMSE can be used to assess the probability that a particular set of
measurements does not deviate too much from, i.e. is within a certain
range of, the ‘true’ value. In the case of coordinates, the probability density
function often is considered to be that of a two-dimensional normally
distributed variable (see Figure 5.5). The three standard probability values
associated with this distribution are:

• 0.50 for a circle with a radius of 1.1774 mx around the mean (known as
the circular error probable, CEP);
• 0.6321 for a circle with a radius of 1.412 mx around the mean (known as
the root mean square error, RMSE);
• 0.90 for a circle with a radius of 2.146 mx around the mean (known as the
circular map accuracy standard, CMAS).

Figure 5.5: Probability density p of a normally distributed, two-dimensional
variable (X, Y) (also known as a normal, bivariate distribution).
In the ground plane, from inside out, are indicated the circles respectively
associated with CEP, RMSE and CMAS.

The RMSE provides an estimate of the spread of a series of measurements


around their (assumed) ‘true’ values. It is therefore commonly used to
assess the quality of transformations such as the absolute orientation of
photogrammetric models or the spatial referencing of satellite imagery.
The RMSE also forms the basis of various statements for reporting and
verifying compliance with defined map accuracy tolerances. An example is
the American National Map Accuracy Standard, which states that:

“No more than 10% of well-defined points on maps of 1:20,000 scale or


greater may be in error by more than 1/30 inch.”
Normally, compliance to this tolerance is based on at least 20 well-defined
checkpoints.
The epsilon band
As a line is composed of an infinite number of points, confidence
limits can be described by a so-called epsilon (ε) or Perkal band at a fixed
distance on either side of the line (Figure 5.6). The width of the band is
based on an estimate of the probable location error of the line, for example
to reflect the accuracy of manual digitizing. The epsilon band may be used
as a simple means for assessing the likelihood that a point receives the
correct attribute value (Figure 5.7).

Figure 5.6: The ε or Perkal band is formed by rolling an imaginary circle
of a given radius along a line.

Figure 5.7: The ε-band may be used to assess the likelihood that a point
falls within a particular polygon. Source: [43]. Point 3 is less likely to be
part of the middle polygon than points 1 and 2.
Describing natural uncertainty in spatial data

There are many situations, particularly in surveys of natural


resources, where, according to Burrough, “practical scientists, faced with
the problem of dividing up undividable complex continua have often
imposed their own crisp structures on the raw data”. In practice, the results
of classification are normally combined with other categorical layers and
continuous field data to identify, for example, areas suitable for a particular
land use. In a GIS, this is normally achieved by overlaying the appropriate
layers using logical operators.
Particularly in natural resource maps, the boundaries between units may
not actually exist as lines but only as transition zones, across which one
area continuously merges into another. In these circumstances, rigid
measures of positional accuracy, such as RMSE (Figure 5.3), may be
virtually insignificant in comparison to the uncertainty inherent in
vegetation and soil boundaries, for example.
In conventional applications of the error matrix to assess the quality of
nominal (categorical) data such as land use, individual samples can be
considered in terms of Boolean set theory. The Boolean membership
function is binary, i.e. an element is either member of the set (membership
is true) or it is not member of the set (membership is false). Such a
membership notion is well-suited to the description of spatial features
such as land parcels where no ambiguity is involved and an individual
ground truth sample can be judged to be either correct or incorrect.
In GIS, fuzzy set theory appears to have two particular benefits:

1. The ability to handle logical modelling (map overlay) operations on inexact


data, and

2. The possibility of using a variety of natural language expressions to qualify


uncertainty.
5.2.3 Attribute accuracy
We can identify two types of attribute accuracies. These relate to the type
of data we are dealing with:

• For nominal or categorical data, the accuracy of labelling (for example the
type of land cover, road surface, etc).

• For numerical data, numerical accuracy (such as the concentration of
pollutants in the soil, height of trees in forests, etc).

It follows that depending on the data type, assessment of attribute


accuracy may range from a simple check on the labelling of features—for
example, is a road classified as a metalled road actually surfaced or not?—
to complex statistical procedures for assessing the accuracy of numerical
data, such as the percentage of pollutants present in the soil.
When spatial data are collected in the field, it is relatively easy to check on
the appropriate feature labels. In the case of remotely sensed data,
however, considerable effort may be required to assess the accuracy of the
classification procedures. This is usually done by means of checks at a
number of sample points.

Table 5.1: Example of a simple error matrix for assessing map attribute
accuracy. Rows give the classified image, columns the reference data.

Classified image    Forest   Agriculture   Urban   Total
Forest                 62         5           0      67
Agriculture             2        18           0      20
Urban                   0         1          12      13
Total                  64        24          12     100

The overall accuracy is (62+18+12)/100 = 92%.
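Working with the error matrix is straightforward to automate. The sketch below reproduces the overall accuracy of Table 5.1 and, as an aside not covered in the text, also derives the commonly used per-class user's and producer's accuracies from the same matrix.

```python
import numpy as np

# Error matrix of Table 5.1: rows = classified image, columns = reference data,
# class order: forest, agriculture, urban.
matrix = np.array([
    [62,  5,  0],
    [ 2, 18,  0],
    [ 0,  1, 12],
])

overall_accuracy = np.trace(matrix) / matrix.sum()           # (62+18+12)/100
users_accuracy = np.diag(matrix) / matrix.sum(axis=1)        # per classified row
producers_accuracy = np.diag(matrix) / matrix.sum(axis=0)    # per reference column

print(overall_accuracy)       # 0.92
print(users_accuracy)         # e.g. forest: 62/67
print(producers_accuracy)     # e.g. forest: 62/64
```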


5.2.4 Temporal accuracy
As noted, the amount of spatial data sets and archived remotely sensed
data has increased enormously over the last decade. These data can
provide useful temporal information such as changes in land ownership
and the monitoring of environmental processes such as deforestation.
Analogous to its positional and attribute components, the quality of spatial
data may also be assessed in terms of its temporal accuracy. For a static
feature this refers to the difference in the values of its coordinates at two
different times.
This includes not only the accuracy and precision of time measurements
(for example, the date of a survey), but also the temporal consistency of
different data sets. Because the positional and attribute components of
spatial data may change together or independently, it is also necessary to
consider their temporal validity. For example, the boundaries of a land
parcel may remain fixed over a period of many years whereas the
ownership attribute may change more frequently.

5.2.5 Lineage

Lineage describes the history of a data set. In the case of published maps,
some lineage information may be provided as part of the metadata, in the
form of a note on the data sources and procedures used in the compilation
of the data. Examples include the date and scale of aerial photography, and
the date of field verification. Especially for digital data sets, however,
lineage may be defined more formally as:

“that part of the data quality statement that contains information that
describes the source of observations or materials, data acquisition and
compilation methods, conversions, transformations, analyses and
derivations that the data has been subjected to, and the assumptions and
criteria applied at any stage of its life.” [14]

All of these aspects affect other aspects of quality, such as positional


accuracy. Clearly, if no lineage information is available, it is not possible to
adequately evaluate the quality of a data set in terms of ‘fitness for use’.
5.2.6 Completeness

Completeness refers to whether there are data lacking in the database


compared to what exists in the real world. Essentially, it is important to be
able to assess what does and what does not belong to a complete
dataset as intended by its producer.

It might be incomplete (i.e. it is ‘missing’ features which exist in the


real world), or overcomplete (i.e. it contains ‘extra’ features which do not
belong within the scope of the data set as it is defined).

5.2.7 Logical consistency

For any particular application, (predefined) logical rules concern:

• The compatibility of data with other data in a data set (e.g. in terms of data
format),

• The absence of any contradictions within a data set,

• The topological consistency of the data set, and

• The allowed attribute value ranges, as well as combinations of attributes.
For example, attribute values for population, area, and population density
must agree for all entities in the database.

The absence of any inconsistencies does not necessarily imply that the
data are accurate.
5.3 Data preparation

Spatial data preparation aims to make the acquired spatial data fit for use.
Images may require enhancements and corrections of the classification
scheme of the data. Vector data also may require editing, such as the
trimming of overshoots of lines at intersections, deleting duplicate lines,
closing gaps in lines, and generating polygons. Data may require conversion
to either vector format or raster format to match other data sets which will
be used in the analysis. Additionally, the data preparation process includes
associating attribute data with the spatial features through either manual
input or reading digital attribute files into the GIS/DBMS.

Intended use
The intended use of the acquired spatial data may require only a subset
of the original data set, as only some of the features are relevant for
subsequent analysis or subsequent map production. In these cases, data
and/or cartographic generalization can be performed on the original data set.

5.3.1 Data checks and repairs


Acquired data sets must be checked for quality in terms of the accuracy,
consistency and completeness parameters discussed above. Often, errors
can be identified automatically, after which manual editing methods can be
applied to correct the errors. Alternatively, some software may identify and
automatically correct certain types of errors. Below, we focus on the
geometric, topological, and attribute components of spatial data.
‘Clean-up’ operations are often performed in a standard sequence. For
example, crossing lines are split before dangling lines are erased, and nodes
are created at intersections before polygons are generated. These are
illustrated in Table 5.2.
With polygon data, one usually starts with many polylines, in an unwieldy
format known as spaghetti data, that are combined in the first step (from
Figure 5.9(a) to (b)). This results in fewer polylines with more internal vertices.
Then, polygons can be identified (c). Sometimes, polylines that should
connect to form closed boundaries do not, and therefore must be connected
(either manually or automatically); this step is not indicated in the figure. In
a final step, the elementary topology of the polygons can be derived (d).
Table 5.2: Clean-up operations for vector data

• Erase duplicate or sliver lines
• Erase short objects
• Break crossing objects
• Dissolve polygons
• Extend undershoots
• Snap clustered nodes
• Erase dangling objects or overshoots
• Dissolve (pseudo) nodes into vertices
Figure 5.9: Successive clean-up operations for vector data, turning spaghetti data
into topological structure: (a) spaghetti data; (b) spaghetti data (cleaned);
(c) polygons; (d) topology.

Associating attributes
Attributes may be automatically associated with features that have unique
identifiers. We have already discussed these techniques in Section 3.5. In the
case of vector data, attributes are assigned directly to the features, while in a
raster the attributes are assigned to all cells that represent a feature.

Rasterization or vectorization
Vectorization produces a vector data set from a raster. We have looked at
this in some sense already: namely in the production of a vector set from a
scanned image. Another form of vectorization takes place when we want to
identify features or patterns in remotely sensed imagery.
If much or all of the subsequent spatial data analysis is to be carried out on
raster data, one may want to convert vector data sets to raster data. This
process is known as rasterization.
It involves assigning point, line and polygon attribute values to raster cells
that overlap with the respective point, line or polygon. To avoid information
loss, the raster resolution should be carefully chosen on the basis of the
geometric resolution. A cell size which is too large may result in cells that
cover parts of multiple vector features, and then ambiguity arises as to what
value to assign to the cell. If, on the other hand, the cell size is too small, the
file size of the raster may increase significantly.
Rasterization itself could be seen as a ‘backwards step’: firstly, raster
boundaries are only an approximation of the objects’ original boundary.
Secondly, the original ‘objects’ can no longer be treated as such, as they have
lost their topological properties. Often the reason for rasterization is that it
facilitates easier combination with other data sources that are also in raster
format, and/or because several analytical techniques are easier to perform
on raster data. An alternative is to keep the vector data and generate raster
data from it only when needed. Obviously, the issue of performance trade-offs
must be looked into.
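As a minimal illustration of the principle, the sketch below rasterizes a few hypothetical point features by assigning each point's attribute code to the cell it falls in. It deliberately ignores lines and polygons, and it places row 0 at the minimum y, whereas real raster formats usually store row 0 at the top.

```python
import numpy as np

# Hypothetical point features with a categorical attribute code.
points = np.array([[2.3, 7.1], [5.8, 3.4], [8.9, 8.2]])   # (x, y)
values = np.array([1, 2, 3])                               # attribute codes

x_min, y_min, cell_size = 0.0, 0.0, 1.0
n_cols, n_rows = 10, 10
raster = np.zeros((n_rows, n_cols), dtype=int)              # 0 = no data

# Compute the cell indices each point falls into.
cols = ((points[:, 0] - x_min) // cell_size).astype(int)
rows = ((points[:, 1] - y_min) // cell_size).astype(int)

# Where several points fall in one cell, only one of the values is kept,
# which illustrates the ambiguity caused by too coarse a resolution.
raster[rows, cols] = values
print(raster)
```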

Topology generation

We have already discussed derivation of topology from vectorized data


sources. However, more topological relations may sometimes be needed, for
instance in networks, e.g. questions of line connectivity, flow direction,
and which lines have over- and underpasses. For polygons, a question that
may arise involves polygon inclusion (which polygons lie inside which other polygons).
5.3.2 Combining data from multiple sources
A GIS project usually involves multiple data sets, so the next step addresses
the issue of how these multiple sets relate to each other. There are four
fundamental cases to be considered in the combination of data from
different sources:

1. They may be about the same area, but differ in accuracy,

2. They may be about the same area, but differ in choice of representation,

3. They may be about adjacent areas, and have to be merged into a single
data set.

4. They may be about the same or adjacent areas, but referenced in


different coordinate systems.

We look at these situations below. They are best understood with an


example.
Other data preparation functions

A range of other data preparation functions exist that support conversion or


adjustment of the acquired data to format requirements that have been
defined for data storage purposes. These include:

• Format transformation functions. These convert between data formats of
different systems or representations, e.g. reading a DXF file into a GIS.
Although we will not focus on the technicalities here, the user should be
warned that conversions from one format to another may cause problems.
The reason is that not all formats can capture the same information, and
therefore conversions often mean loss of information. If one obtains a
spatial data set in format F, but needs it in format G (for instance because
the locally preferred GIS package requires it), then usually a conversion
function can be found, often within the same GIS software package. The
key to successful conversion is to also find an inverse conversion, back
from G to F, and to ascertain whether the double conversion back to F
results in the same data set as the original. If this is the case, both
conversions are not causing information loss, and can safely be applied.

• Graphic element editing. Manual editing of digitized features so as to correct
errors, and to prepare a clean data set for topology building.

• Coordinate thinning. A process that is often applied to remove redundant
or excess vertices from line representations, as obtained from digitizing.

5.4 Point data transformation


We may want to transform our points into other representations in order to
facilitate interpretation and/or integration with other data. Examples
include defining homogeneous areas (polygons) from our point data, or
deriving contour lines. This is generally referred to as interpolation, i.e.
the calculation of a value from ‘surrounding’ observations. The principle of spatial
autocorrelation plays a central part in the process of interpolation.
In order to predict the value of a point for a given (x, y) location, we could
simply find the ‘nearest’ known value to the point, and assign that value.
This is the simplest form of interpolation, known as nearest-neighbour
interpolation. We might instead choose to use the distance that points are
away from (x, y) to weight their importance in our calculation.

Video 2 - GIS Interpolation Type Comparison

Source:https://www.youtube.com/watch?v=to6Eufi58hM
In some instances we may be dealing with a data type that limits the type of
interpolation we can do (refer to page 75 for a brief background). A fundamental
issue in this respect is what kind of phenomena we are considering: is it a discrete
field—such as geological units, for instance—in which the values are of a
qualitative nature and the data is categorical, or is it a continuous field—like
elevation, temperature, or salinity— in which the values are of a quantitative
nature, and represented as continuous measurements? This distinction matters
because we are limited to nearest-neighbour interpolation for discrete data.

A simple example is given in Figure 5.13. Our field survey has taken only two
measurements, one at P and one at Q. The values obtained in these two
locations are represented by a dark and light green tint, respectively. If we
are dealing with qualitative data, and we have no further knowledge, the only
assumption we can make for other locations is that those nearer to P
probably have P ’s value, whereas those nearer to Q have Q’s value. This is
illustrated in part (a).
If, on the contrary, our field is quantitative, we can let the values of P and Q
both contribute to values for other locations. This is done in part (b) of the
figure. To what extent the measurements contribute is determined by the
interpolation function. In the figure, the contribution is expressed in terms of
the ratio of distances to P and Q. We will see in the sequel that the choice of
interpolation function is a crucial factor in any method of point data
transformation.
How we represent a field constructed from point measurements in the GIS
also depends on the above distinction. A discrete field can either be
represented as a classified raster or as a polygon data layer, in which each
polygon has been assigned a (constant) field value. A continuous field can be
represented as an unclassified raster, as an isoline (thus, vector) data layer, or
perhaps as a TIN. Some GIS software only provide the option of generating
raster output, requiring an intermediate step of raster to vector conversion.
The choice of representation depends on what will be done with the data in
the analysis phase.
Figure 5.13: A geographic field representation obtained from two point
measurements: (a) for qualitative (categorical), and (b) for quantitative
(continuous) point measurements. The value measured at P is represented as
dark green, at Q as light green.

5.4.1 Interpolating discrete data


If we are dealing with discrete (nominal, categorical or ordinal) data, we are
effectively restricted to using nearest-neighbour interpolation. This is the
situation shown in Figure 5.13(a), though usually we would have many more
points.
In a nearest-neighbour interpolation, each location is assigned the value of
the closest measured point. Effectively, this technique constructs ‘zones’
around the points of measurement, with every location in a zone assigned
the same value. This represents an assignment of an existing value (or
category) to a location.

If the desired output was a polygon layer, we could construct Thiessen


polygons around the points of measurement. The boundaries of such
polygons, by definition, are the locations for which more than one point of
measurement is the closest point. An illustration is provided in Figure 5.14.

Figure 5.14: Generation of Thiessen polygons for qualitative point


measurements. The measured points are indicated in dark green; the
darker area indicates all locations assigned with the measurement value of
the central point
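Nearest-neighbour assignment is easy to express with a spatial index. The sketch below, using hypothetical categorical measurements, assigns each raster cell the class of its nearest measured point; the resulting raster zones are the raster equivalent of the Thiessen polygons described above.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical categorical point measurements.
points = np.array([[1.0, 1.0], [4.0, 5.0], [7.5, 2.5]])
classes = np.array([10, 20, 30])

# Build a grid of cell-centre coordinates.
xs = np.arange(0.25, 8.0, 0.5)
ys = np.arange(0.25, 8.0, 0.5)
gx, gy = np.meshgrid(xs, ys)
cells = np.column_stack([gx.ravel(), gy.ravel()])

# Assign each cell the class of its nearest measured point.
_, nearest = cKDTree(points).query(cells)
raster = classes[nearest].reshape(gx.shape)
print(raster)
```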
5.4.2 Interpolating continuous data

Interpolation of values from continuous measurements is significantly more complex.


This is the situation of Figure 5.13(b), but again, usually with many more point
measurements.
Since the data are continuous, we can make use of measured values for interpolation.
There are many continuous geographic fields—elevation, temperature and ground
water salinity are just a few examples. Commonly, continuous fields are represented as
rasters, and we will almost by default assume that they are. Alternatives exist though,
as we have seen in discussions in Chapter 2. The main alternative for continuous field
representation is a polyline vector layer, in which the lines are isolines. We will also
address these issues of representation below.

The aim is to use measurements to obtain a representation of the entire field using
point samples. In this section we outline four techniques to do so:

1. Trend surface fitting using regression,

2. Triangulation,

3. Spatial moving averages using inverse distance weighting,


4. Kriging.

1. Trend surface fitting

In trend surface fitting, the assumption is that the entire study area can be represented
by a formula f (x, y) that for a given location with coordinates (x, y) will give us the
approximated value of the field in that location.
The key objective in trend surface fitting is to derive a formula that best describes the
field. Various classes of formulæ exist, with the simplest being the one that describes a
flat, but tilted plane:

f (x, y) = c1 · x + c2 · y + c3.

If we believe—and this judgement must be based on domain expertise—that the field


under consideration can be best approximated by a tilted plane, then the problem of
finding the best plane is the problem of determining best values for the coefficients
c1, c2 and c3. This is where the point measurements obtained earlier become important.
Statistical techniques known as regression techniques can be used to determine values for
these coefficients ci that best fit with the measurements. A plane will be fitted through the
measurements that makes the smallest overall error with respect to the original measurements.
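A tilted-plane fit of this kind can be computed directly with least squares. The sketch below uses a small, hypothetical set of (x, y, value) measurements; the coefficient values it prints are not those of Figure 5.15.

```python
import numpy as np

# Hypothetical point measurements (x, y, value) of a continuous field.
x = np.array([0.0, 2.0, 5.0, 7.0, 9.0])
y = np.array([1.0, 6.0, 3.0, 8.0, 2.0])
z = np.array([72.0, 78.0, 66.0, 70.0, 57.0])

# Tilted-plane model f(x, y) = c1*x + c2*y + c3, fitted by least squares.
A = np.column_stack([x, y, np.ones_like(x)])
(c1, c2, c3), *_ = np.linalg.lstsq(A, z, rcond=None)
print(c1, c2, c3)

# Once the coefficients are known, the field can be approximated anywhere.
def f(px, py):
    return c1 * px + c2 * py + c3

print(f(4.0, 4.0))
```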
In Figure 5.15, we have used the same set of point measurements, with four different
approximation functions. Part (a) has been determined under the assumption that the
field can be approximated by a tilted plane, in this case with a downward slope to the
southeast. The values found by regression techniques were: c1 = −1.83934, c2 =
1.61645 and c3 = 70.8782, giving us:

f (x, y) = −1.83934 · x + 1.61645 · y + 70.8782.

Figure 5.15: Trend surfaces fitted to the same set of point measurements:
(a) tilted plane, (b) bilinear saddle, (c) quadratic surface, (d) cubic surface.

Clearly, not all fields are representable as simple, tilted planes.
Sometimes, the theory of the application domain will dictate that the best
approximation of the field is a more complicated, higher-order polynomial function.
Three such functions were the basis for the fields illustrated in Figure 5.15(b)–(d).
The simplest extension from a tilted plane, that of bilinear saddle, expresses some
dependency between the x and y dimensions:

f (x, y) = c1 · x + c2 · y + c3 · xy + c4.

This is illustrated in part (b). A further step up the ladder of complexity is to consider
quadratic surfaces, described by:

f(x, y) = c1 · x² + c2 · x + c3 · y² + c4 · y + c5 · xy + c6.

The objective is to find six values for our coefficients that best match with the
measurements. A bilinear saddle and a quadratic surface have been fitted through our
measurements in Figure 5.15(b) and (c), respectively.
Part (d) of the figure illustrates the most complex formula of the surfaces in Figure
5.15, the cubic surface. It is characterized by the following formula:

f(x, y) = c1 · x³ + c2 · x² + c3 · x + c4 · y³ + c5 · y² + c6 · y + c7 · x²y + c8 · xy² + c9 · xy + c10.

The regression techniques applied for Figure 5.15 determined the following values for
the coefficients ci:
Fig 5.15 c1 c2 c3 c4 c5 c6 c7 c8 c9 c10

(a) -1.83934 1.61645 70.8782

(b) -5.61587 -2.95355 0.993638 89.0418

(c) 0.000921084 -5.02674 -1.34779 7.23557 0.813177 76.9177

(d) -0.473086 6.88096 31.5966 -0.233619 1.48351 -2.52571 -0.115743 -0.052568 2.16927 96.8207

Trend surface fitting is a useful technique of continuous field approximation, though


determining the ‘best fit’ values for the coefficients ci is a time-consuming operation,
especially with many point measurements. Once these best values have been
determined, we know the formula, making it possible to compute an approximated
value for any location in the study area.
It is possible to use trend surfaces for both global and local trends. Global trend surface
fitting is based on the assumption that the entire study area can be approximated by
the same mathematical surface. In many cases, however, this assumption does not hold,
and surfaces may instead be fitted to smaller parts of the area.
Local trend surface fitting is not a popular technique in practical applications, because
it is relatively difficult to implement, and other techniques such as moving
windows are better for the representation and identification of local trends.
If we know the polynomial, it is relatively simple to generate a raster layer, given an
appropriate cell resolution and an approximation function for the cell’s value.
In some cases it is more accurate to assign the average of the computed values for all of
the cell’s corner points. In order to generate a vector layer representing this
data, isolines can be derived for a given set of intervals. The
specific techniques of generating isolines are not discussed here; however,
the triangulation techniques discussed below can play a role.
2. Triangulation
Another way of interpolating point measurements is by triangulation. Essentially, this
technique constructs a triangulation of the study area from the known measurement
points. Preferably, the triangulation should be a Delaunay triangulation. After having
obtained it, we may define for which values of the field we want to construct isolines.
For instance, for elevation, we might want to have the 100 m-isoline,
the 200 m-isoline, and so on. For each edge of a triangle, a geometric computation can
be performed that indicates which isolines intersect it, and at what positions they do
so. A list of computed locations, all at the same field value, is used by the GIS to
construct the isoline. This is illustrated in Figure 5.16.

Figure 5.16: Triangulation as a means of interpolation. (a) known point measurements; (b)
constructed triangulation on known points; (c) isolines constructed from the triangulation.
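The sketch below illustrates this idea with matplotlib's triangulation utilities: a Delaunay triangulation is built from hypothetical elevation points, and isolines are drawn by linear interpolation within the triangles. The point coordinates, elevations and contour levels are invented for the example.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.tri import Triangulation

# Hypothetical elevation measurements at scattered points.
x = np.array([0.0, 10.0, 10.0, 0.0, 5.0, 2.0, 8.0])
y = np.array([0.0,  0.0, 10.0, 10.0, 5.0, 7.0, 3.0])
elev = np.array([100.0, 140.0, 180.0, 120.0, 160.0, 130.0, 150.0])

# Triangulation() computes a Delaunay triangulation when no triangles are given.
tri = Triangulation(x, y)

# Isolines at chosen levels, interpolated linearly within each triangle.
plt.triplot(tri, color="lightgrey", linewidth=0.5)
cs = plt.tricontour(tri, elev, levels=[110, 120, 130, 140, 150, 160, 170])
plt.clabel(cs, inline=True, fontsize=8)
plt.savefig("isolines.png")
```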

3. Moving averages using inverse distance weighting (IDW)


Moving window averaging attempts to directly derive a raster dataset from a set of
sample points. This is why it is sometimes also called ‘gridding’. The principle behind
this technique is illustrated in Figure 5.17. The cell values for the output raster are
computed one by one. To achieve this, a ‘window’ (also known as a kernel) is defined,
and initially placed over the top left raster cell. Measurement points falling inside the
window contribute to the averaging computation, those outside the window do not. This
is why moving window averaging is said to be a local interpolation method. After the
cell value is computed and assigned to the cell, the window is moved one cell to the
right, and the computations are performed for that cell. Successively, all cells of the
raster are visited in this way.

Figure 5.17: An example of moving window averaging. In blue, the measurement points.
A virtual window is moved over the raster cells one by one, and some averaging function
computes a field value for the cell, using measurements within the window.
In part (b) of the figure, the 295th cell value out of the 418 in total, is being computed. This
computation is based on eleven measurements, while that of the first cell had no
measurements available. Where this is the case, the cell should be assigned a value that signals
this ‘non-availability of measurements’.

Suppose there are n measurements selected in a window, and that a measurement is


denoted as mi. The simplest averaging function will compute the arithmetic mean,
treating all measurements equally:
(1/n) · Σ mi , where the sum is taken over the n selected measurements.

The principle of spatial autocorrelation suggests that measurements closer to the cell
centre should have greater influence on the predicted value than those futher away.
In order to account for this, a distance factor can be brought into the averaging
function. Functions that do this are called inverse distance weighting functions (IDW).
This is one of the most commonly used functions in interpolating spatial data.

Let us assume that the distance from measurement point i to the cell centre is denoted
by di. Commonly, the weight factor applied in inverse distance weighting is the distance
squared, but in the general case the formula is:

Σ (mi / di^p) / Σ (1 / di^p) , with both sums running over i = 1, …, n.
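The small function below applies this formula to the measurements selected for one cell. It is a minimal sketch with hypothetical measurement points and values; the distance exponent defaults to 2, the distance-squared weighting mentioned above.

```python
import numpy as np

def idw(cell_centre, points, values, power=2):
    """Inverse distance weighted average of measurements for one raster cell."""
    d = np.hypot(points[:, 0] - cell_centre[0], points[:, 1] - cell_centre[1])
    if np.any(d == 0):                    # a measurement exactly at the cell centre
        return values[d == 0][0]
    w = 1.0 / d ** power
    return np.sum(w * values) / np.sum(w)

# Hypothetical measurements within the window around one cell centre.
pts = np.array([[1.0, 2.0], [3.5, 1.0], [2.0, 4.0]])
vals = np.array([12.0, 18.0, 15.0])
print(idw((2.0, 2.0), pts, vals))
```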
Moving window averaging has many parameters. As experimentation with any GIS package
will demonstrate, picking the right parameter settings may make quite a difference for the
resulting raster. We discuss some key parameters below.

• Raster resolution: Too large a cell size will smooth the function too much, removing local
variations; too small a cell size will result in large clusters of equally valued cells, with
little added value.

• Shape/size of window: Most procedures use square windows, but rectangular, circular
or elliptical windows are also possible. These can be useful in cases where the
measurement points are distributed regularly at fixed distance over the study area, and
the window shape must be chosen to ensure that each raster cell will have its window
include the same number of measurement points. The size of the window is another
important matter. Small windows tend to exaggerate local extreme values, while large
windows have a smoothing effect on the predicted field values.

• Selection criteria: Not necessarily all measurements within the window need to be used
in averaging. We may choose to use at most the five nearest measurements, or we
may choose to only generate a field value if more than three measurements are in the
window.

• Averaging function: A final choice is which function is applied to the selected
measurements within the window. It is possible to use different distance-weighting
functions, each of which will influence the calculation of the resulting value.

4. Kriging

Kriging was originally developed by mining geologists attempting to derive accurate
estimates of mineral deposits in a given area from limited sample measurements. It is
an advanced interpolation technique belonging to the field of geostatistics, which can
deliver good results if applied properly and with enough sample points. Kriging is
usually used when the variation of an attribute and/or the density of sample points is
such that simple methods of interpolation may give unreliable predictions.

Kriging is based on the notion that the spatial change of a variable can be described
as a function of the distance between points. It is similar to IDW interpolation, in that
the surrounding values are weighted to derive a value for an unmeasured location.
However, the kriging method also looks at the overall spatial arrangement of the
measured points and the spatial correlation between their values, to derive values for
an unmeasured location.

The first step in the kriging procedure is to compare successive pairs of point
measurements to generate a semi-variogram. In the second step, the semi-variogram
is used to calculate the weights used in interpolation.
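To illustrate the first step only, the sketch below computes an empirical semi-variogram: half the average squared difference of values, grouped by distance lag. The point set, the lag width and the function name are hypothetical, and fitting a variogram model and deriving kriging weights are deliberately left out.

```python
import numpy as np

def empirical_semivariogram(points, values, lag_width, n_lags):
    """Average semi-variance of value differences, grouped by distance lag."""
    n = len(points)
    dists, gammas = [], []
    for i in range(n):
        for j in range(i + 1, n):
            d = np.hypot(*(points[i] - points[j]))
            dists.append(d)
            gammas.append(0.5 * (values[i] - values[j]) ** 2)
    dists, gammas = np.array(dists), np.array(gammas)

    lags, semivariances = [], []
    for k in range(n_lags):
        mask = (dists >= k * lag_width) & (dists < (k + 1) * lag_width)
        if mask.any():
            lags.append((k + 0.5) * lag_width)
            semivariances.append(gammas[mask].mean())
    return lags, semivariances

# Hypothetical sample points and values (a trend plus noise).
rng = np.random.default_rng(0)
pts = rng.uniform(0, 100, size=(30, 2))
vals = pts[:, 0] * 0.1 + rng.normal(0, 1, 30)
print(empirical_semivariogram(pts, vals, lag_width=10.0, n_lags=8))
```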

Although kriging is a powerful technique, it should not be applied
without a good understanding of geostatistics, including the principle of spatial
autocorrelation. For more detail on the various kriging methods, readers are referred
to [11].
Graded Questions

Chapter 04
Spatial referencing and positioning
1. Explain the spatial referencing surfaces for mapping.
2. Write short note on Geoid and Ellipsoid.
3. Explain local horizontal and vertical datum.
4. Describe Triangulation network with the help of diagram.
5. Explain the 2D and 3D Geographic coordinate system.
6. What is Map projection? Explain the types of map projection with the help of
diagrams in brief.
7. Explain the coordinate transformation with the help of 2D polar to 2D Cartesian.
8. Write short note on datum transformation.
9. What is map projection?
10. Describe the classification of map projection.
11. What is Satellite based positioning? Explain.
12. What is absolute positioning?
13. What are the errors in absolute positioning?
14. Write short notes on the following:
15. Errors related to the space segment
16. Errors related to the medium
17. Errors related to the receiver’s environment
18. Errors related to relative geometry of satellites and receiver
19. Explain positioning technology and describe GPS, GLONASS and Galileo in brief.
20. Assume that the semi-major axis a of an ellipsoid is 6378137 m and the
flattening f is 1:298.257. Using these facts determine the semi-minor axis b
(make use of the given equations).
21. You are required to match GPS data with some map data. The GPS data and the
map layer are based on different horizontal datums. Which steps should you
take to make the GPS data spatially compatible with the map data?
Chapter 05
Data entry and preparation
1. Rasterization of vector data is sometimes required in data preparation. What
reasons may exist for this? If it is needed, the raster resolution must be carefully
selected. Argue why.
2. Write a short note on direct and indirect spatial data capture.
3. Explain the following terms i) digitizing ii) scanning iii) vectorization
4. What is digitizing? What are the types of digitizing? Explain.
5. Explain vectorization in brief.
6. What is scanning?
7. What is metadata? Explain data formats and standards.
8. Write short notes on following terms in concerns with data quality.
Accuracy and Precision
Positional accuracy
Attribute accuracy.
Temporal accuracy
Lineage
Logical consistency
9. Explain Root Mean Square Error (RMSE).
10. Describe natural uncertainty in spatial data.
11. How do we perform data checks and repairs in the data preparation process?
12. Write short note on
13. Explain point data transformation.
14. Explain interpolating discrete data.
15. Explain interpolating continuous data.
16. Write short note on Kriging.
17. Write short note on triangulation.
18. Write short note on Trend surface fitting.
MULTIPLE CHOICE QUESTIONS FOR QUIZ

1. What is the NAVSTAR GPS?


a. Global Positioning System
b. Geographic Projection System
c. General Policy System
d. Geographic Portion Service

2. Which time system is used for satellite positioning?


a. Atomic Clock
b. UT1
c. UTC (Coordinated Universal Time)
d. UTM

3. Errors related to the space segment is due to __________


a. Incorrect clock reading and orbit positioning
b. Ionosphere
c. Incorrect medium
d. Incorrect signal receivers

4. GPS, GLONASS and Galileo are _________


a. Environmental Technology
b. Positional Technology
c. Fundamental Technology
d. Advanced Technology

5. Galileo is the satellite-based positioning system of the _________


a. Environmental unit
b. European Union (EU)
c. Geographic Technology
d. Advanced Technology
6. Data which is captured directly from the environment is known as ______
a. Secondary data
b. Primary data
c. Historical data
d. Important data

7. ________________ digitizing type in which the original map is fitted on a special surface
a. On tablet
b. On screen
c. On scanner
d. On page

8. The process of distilling points, lines and polygons from a scanned image is called _____
a. Rasterization
b. Vectorization
c. Transformation
d. Conversion

9. As a line is composed of an infinite number of points, confidence limits can be described


by a so-called______________ at a fixed distance on either side of the line.
a. epsilon (ε) or Perkal band
b. gipsililon
c. point_line
d. spatial-on

10. ______ describes the history of a data set.


a. Lineage
b. signage
c. oldage
d. Historical data
e. Important data
11. ______________ involves checking for errors and inconsistencies, and simplifying and
merging existing spatial data sets
a. Vetorization and Rasterization
b. Data transformation
c. Data cleaning and preparation
d. Data Conversion

12. Which one is not a technique for interpolating continuous data?


a. Attribute accuracy
b. Kriging
c. Triangulation
d. Trend surface fitting

13. ________ is based on the notion that the spatial change of a variable can be
described as a function of the distance between points
a. Attribute accuracy
b. Kriging
c. Triangulation
d. Trend surface fitting

14. ________ converts vector data sets to raster data.


a. Rasterization
b. Vectorization
c. Transformation
d. Conversion

15. Data which is captured indirectly from the environment is known as ______
a. Secondary data
b. Primary data
c. Historical data
d. Important data
