PGIS Unit 3
Geographic Information Systems
MODULE-3: Spatial Referencing and Positioning, Data Entry and Preparation
Vidyalankar School of
Information Technology
Wadala (E), Mumbai
www.vsit.edu.in
Certificate
This is to certify that the e-book titled “Principles of
Geographic Information Systems”
comprises all elementary learning tools for a better understanding of the
relevant concepts. This e-book is comprehensively compiled as per the
predefined eight parameters and guidelines.
Contents:
Data Preparation: Data checks and repairs, Combining data from multiple sources
Point Data Transformation: Interpolating discrete data, Interpolating continuous
data
Recommended Books
Chapter 4
Chapter 5
Chapter 4
Objectives:
i) To learn the relevance and actual use of reference surfaces,
coordinate systems and coordinate transformations.
ii) To know about satellite-based positioning.
iii) Introduction of global positioning techniques.
In the early days of GIS, users were mainly handling spatially referenced data from a
single country. This data was usually derived from paper maps published by the
country’s mapping organization. Nowadays, GIS users are combining spatial data
from a given country with global spatial data sets, reconciling spatial data from
published maps with coordinates established with satellite positioning techniques
and integrating their spatial data with that from neighboring countries. To perform
these kinds of tasks successfully, GIS users need to understand basic spatial
referencing concepts.
One of the defining features of GIS is their ability to combine spatially referenced
data. A frequently occurring issue is the need to combine spatial data from
different sources that use different spatial reference systems. This section
introduces concepts relating to the nature of spatial reference systems and the
translation of data from one spatial referencing system into another.
The surface of the Earth is anything but uniform. The oceans can be treated as
reasonably uniform, but the surface or topography of the land masses exhibits
large vertical variations between mountains and valleys. These variations make it
impossible to approximate the shape of the Earth with any reasonably simple
mathematical model. Consequently, two main reference surfaces have been
established to approximate the shape of the Earth.
One reference surface is called the Geoid, the other reference surface is the
ellipsoid.
These are illustrated in Figure 1. Below, we look at and discuss the respective uses
of each of these surfaces.
Figure 1: The Geoid and a reference ellipsoid. The Geoid separation (N) is the
deviation between the Geoid and the reference ellipsoid.
Video on Sea-level
Source: https://www.youtube.com/watch?v=q65O3qA0-n4
The Geoid and the vertical datum
We can simplify matters by imagining that the entire Earth’s surface is covered by
water. If we ignore tidal and current effects on this ‘global ocean’, the resultant
water surface is affected only by gravity. This has an effect on the shape of this
surface because the direction of gravity, more commonly known as the plumb
line, is dependent on the mass distribution inside the Earth. Due to
irregularities or mass anomalies in this distribution, the ‘global ocean’ has an
undulated surface. This surface is called the Geoid (Figure 2). The plumb line
through any surface point is always perpendicular to it.
The Geoid is used to describe heights. In order to establish the Geoid as reference
for heights, the ocean’s water level is registered at coastal places over several
years using tide gauges (mareographs). Averaging the registrations largely
eliminates variations of the sea level with time. The resulting water level
represents an approximation to the Geoid and is called the mean sea level. For the
Netherlands and Germany, the local mean sea level is realized through the
Amsterdam tide-gauge (zero height). We can determine the height of a point in
Enschede with respect to the Amsterdam tide gauge using a technique known as
geodetic levelling (Figure 3). The result of this process will be the height above local
mean sea level for the Enschede point. The height determined with respect to a
tide-gauge station is known as the orthometric height (height H above the Geoid).
Obviously, there are several realizations of local mean sea levels (also called local
vertical datums) in the world. They are parallel to the Geoid but offset by up to a
couple of meters. This offset is due to local phenomena such as ocean currents,
tides, coastal winds, water temperature and salinity at the location of the tide
gauge. Care must be taken when using heights from another local vertical datum.
For example, this might be the case in the border area of adjacent nations. Even
within a country, heights may differ depending on the tide gauge (mean sea
level point) to which they are related. As an example, the mean sea level from the Atlantic
to the Pacific coast of the USA increases by 0.6 to 0.7 m. The tide gauge (zero
height) of the Netherlands differs by -2.34 metres from the tide gauge (zero height)
of the neighboring country Belgium. The local vertical datum is implemented
through a levelling network (see Figure 3(a)). A levelling network consists of
benchmarks, whose height above mean sea level has been determined through
geodetic levelling. The implementation of the datum enables easy user access.
The surveyors do not need to start from scratch (i.e. from the Amsterdam
tide-gauge) every time they need to determine the height of a new point. They can
use the benchmark of the levelling network that is closest to the point of interest
(Figure 3(b)).
Above, we have defined a physical surface, the Geoid, as a reference surface for
heights. We also need a reference surface for the description of the horizontal
coordinates of points of interest. Since we will later project these horizontal
coordinates onto a mapping plane, the reference surface for horizontal
coordinates requires a mathematical definition and description. The most
convenient geometric reference is the oblate ellipsoid (Figure 4).
It provides a relatively simple figure which fits the Geoid to a first order
approximation, though for small scale mapping purposes a sphere may be used.
An ellipsoid is formed when an ellipse is rotated about its minor axis. This ellipse
which defines an ellipsoid or spheroid is called a meridian ellipse.
Figure 4: An oblate ellipse, defined by its semi-major axis a and semi-minor axis b.
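As a small worked example (a Python sketch; the numeric values are the WGS84 defining constants, used here purely for illustration), the semi-minor axis b follows from the semi-major axis a and the flattening f through b = a · (1 − f):

```python
# Semi-minor axis b of an oblate ellipsoid from semi-major axis a
# and flattening f, using the relation b = a * (1 - f).
a = 6378137.0              # semi-major axis in metres (WGS84 value)
f = 1 / 298.257223563      # flattening (WGS84 value)
b = a * (1 - f)
print(f"b = {b:.4f} m")    # prints approximately 6356752.3142 m
```

The same relation is what question 20 at the end of this module asks you to apply.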
Triangulation networks
Historically, a local horizontal datum was realized through a triangulation
network, consisting of monumented points forming a network of triangular
mesh elements. The angles in each triangle were measured, together with
at least one side of a triangle; the fundamental point is also a point in the
triangulation network. The angle measurements and the adopted coordinates of
the fundamental point are then used to derive geographic coordinates (φ, λ) for all
monument points of the triangulation network.
Within this framework, users do not need to start from scratch (i.e. from the
fundamental point) in order to determine the geographic coordinates of a new
point. They can use the monument of the triangulation network that is closest to
the new point. The extension and re-measurement of the network is nowadays
done through satellite measurements.
Local horizontal datums have been established to fit the Geoid well over the area
of local interest, which in the past was never larger than a continent. With
increasing demands for global surveying, activities are underway to establish
global reference surfaces. The motivation is to make geodetic results mutually
comparable and to provide coherent results also to other disciplines like
astronomy and geophysics.
The most important global (geocentric) spatial reference system for the GIS
community is the International Terrestrial Reference System (ITRS). It is a three-
dimensional coordinate system with a well-defined origin (the centre of mass
of the Earth) and three orthogonal coordinate axes (X, Y, Z). The Z-axis points
towards a mean Earth north pole. The X-axis is oriented towards a mean
Greenwich meridian and is orthogonal to the Z-axis. The Y-axis completes the
right-handed reference coordinate system (Figure 7a).
Figure 7: (a) The International Terrestrial Reference System (ITRS), and
(b) the International Terrestrial Reference Frame (ITRF) visualized as a
distributed set of ground control stations (represented by red points).
The ITRS is realized through the International Terrestrial Reference Frame (ITRF),
a distributed set of ground control stations that measure their position
continuously using GPS (Figure 7b).
Constant re-measuring is needed because of the involvement of new control
stations and ongoing geophysical processes (mainly tectonic plate motion) that
deform the Earth’s crust at measurable global, regional and local scales. These
deformations cause positional differences in time, and have resulted in more than
one realization of the ITRS. Examples are the ITRF96 or the ITRF2000. The
ITRF96 was established on the 1st of January, 1997.
This means that the measurements use data up to 1996 to fix the geocentric
coordinates (X, Y and Z in metres) and velocities (positional change in X, Y and Z in
metres per year) at the different stations. The velocities are used to propagate
the measurements to other epochs (times). The trend is to use the ITRF
everywhere in the world for reasons of global compatibility.
GPS uses the World Geodetic System 1984 (WGS84) as its reference system. It has
been refined on several occasions and is now aligned with the ITRF to within a
few centimetres worldwide. Global horizontal datums, such as the ITRF2000 or
WGS84, are also called geocentric datums because they are geocentrically
positioned with respect to the centre of mass of the Earth. They became
available with the advent of satellite positioning techniques.
Since the size and shape of satellite orbits is directly related to the centre of mass
of the Earth, observations of natural or artificial satellites can be used to pinpoint
the centre of mass of the Earth, and hence the origin of the ITRS. This technique
can also be used for the realization of the global ellipsoids and datums at the
accuracy level required for large-scale mapping.
We can easily transform ITRF coordinates (X, Y and Z in metres) into geographic
coordinates with respect to the GRS80 ellipsoid without the loss of accuracy.
However, the ellipsoidal height h, obtained through this straightforward
transformation, has no physical meaning and does not correspond to intuitive
human perception of height. We therefore use the height H above
the Geoid (see Figure 8). It is foreseeable that global 3D spatial
referencing, in terms of (φ, λ, H), could become ubiquitous in the next 10–15 years.
If all published maps are also globally referenced by that time, the underlying
spatial referencing concepts will become transparent and hence redundant for
GIS users.
Video on Coordinate System Jargon- geoid, datum, projection
Source: https://www.youtube.com/watch?v=Z41Dt7_R180
Figure 8: Height h above the geocentric ellipsoid, and height H above the Geoid. The first is
measured orthogonal to the ellipsoid, the second orthogonal to the Geoid.
Hundreds of existing local horizontal and vertical datums are still relevant because
they form the basis of map products all over the world. For the next few years we
will be required to deal with both local and global datums until the former are
eventually phased out. During the transition period, we will require tools to
transform coordinates from local horizontal datums to a global horizontal datum
and vice versa.
Extra-terrestrial positioning techniques include Satellite Laser Ranging (SLR), Lunar Laser
Ranging (LLR), Global Positioning System (GPS), and Very Long Baseline Interferometry
(VLBI), among others.
The organizations that usually develop transformation tools and make them
available to the user community are provincial or National Mapping
Organizations (NMOs) and cadastral authorities.
The latitude (φ) of a point P is the angle between the ellipsoidal
normal through P′ and the equatorial plane. Latitude is zero on the equator (φ =
0°), and increases towards the two poles to maximum values of φ = +90° (N 90°) at
the North Pole and φ = -90° (S 90°) at the South Pole.
The longitude (λ) is the angle between the meridian ellipse which passes through
Greenwich and the meridian ellipse containing the point in question. It is
measured in the equatorial plane from the meridian of Greenwich (λ = 0°) either
eastwards through λ = +180° (E 180°) or westwards through λ = -180° (W 180°).
For example: φ = 52°13′26.2″N, λ = 6°53′32.1″E.
The graticule on a map represents the projected position of the geographic
coordinates (φ, λ) at constant intervals, or in other words the projected position of
selected meridians and parallels (Figure 13). The shape of the graticule depends
largely on the characteristics of the map projection and the scale of the map.
3D Geographic coordinates
3D geographic coordinates are obtained by introducing the ellipsoidal height h
to the system. The ellipsoidal height (h) of a point is the vertical distance of the
point in question above the ellipsoid. It is measured in distance units along the
ellipsoidal normal from the point to the ellipsoid surface. 3D geographic
coordinates can be used to define a position on the surface of the Earth.
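To make the link with the geocentric (X, Y, Z) coordinates of the ITRS concrete, the sketch below applies the standard geodetic-to-Cartesian conversion for an oblate ellipsoid (the GRS80 parameters and the sample point are assumptions for illustration only):

```python
import math

def geographic_to_geocentric(lat_deg, lon_deg, h,
                             a=6378137.0, f=1 / 298.257222101):
    """3D geographic coordinates (lat, lon in degrees, ellipsoidal
    height h in metres) to geocentric Cartesian (X, Y, Z),
    here on the GRS80 ellipsoid."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    e2 = f * (2 - f)                                # first eccentricity squared
    N = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)  # prime vertical radius
    X = (N + h) * math.cos(lat) * math.cos(lon)
    Y = (N + h) * math.cos(lat) * math.sin(lon)
    Z = (N * (1 - e2) + h) * math.sin(lat)
    return X, Y, Z

# Illustrative point roughly at Enschede (coordinates assumed):
print(geographic_to_geocentric(52.22, 6.89, 80.0))
```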
To represent parts of the surface of the Earth on a flat paper map or on a computer
screen, the curved horizontal reference surface must be mapped
onto the 2D mapping plane. The reference surface for large scale mapping is usually
an oblate ellipsoid, and for small scale mapping a sphere. Mapping onto a 2D
mapping plane means transforming each point on the reference surface with
geographic coordinates to a set of Cartesian coordinates (x, y) representing a position
on the map plane (Figure 15).
Source: https://www.youtube.com/watch?v=gGumy-9HrSY
Classification of map projections
These qualities in turn make the resulting maps useful for certain purposes. By definition,
any map projection is associated with scale distortions. Hundreds of map projections
have been developed, each with its own specific properties. There is simply no way to flatten out
a piece of ellipsoidal or spherical surface without stretching some parts of the surface
more than others. The amount and kind of distortion a map will have depends
on the type of map projection that has been selected.
Typical choices for such intermediate surfaces are planes, cones and cylinders. Such map
projections are then called azimuthal, conical, and cylindrical, respectively. Figure 16 shows the
surfaces involved in these three classes of projections.
In the geometric depiction of map projections in Figures 16 and 17, the symmetry
axes of the plane, cone and cylinder coincide with the rotation axis of the ellipsoid or
sphere, i.e. a line through N and S pole. In this case, the projection is said to be a
normal projection. The other cases are transverse projections (symmetry axis in the
equator) and oblique projections (symmetry axis is somewhere between the rotation
axis and equator of the ellipsoid or sphere). These cases are illustrated in Figure 18.
Figure 16: The three classes of map projections: cylindrical, conical and azimuthal.
Datum transformations
A change of map projection may also include a change of the horizontal datum.
This is the case when the source projection is based upon a different horizontal
datum than the target projection. If the difference in horizontal datums is ignored,
there will not be a perfect match between adjacent maps of neighbouring
countries or between overlaid maps originating from different projections. It may
result in up to several hundred meters difference in the resulting coordinates.
Therefore, spatial data with different underlying horizontal datums may need a
so-called datum transformation.
Suppose we wish to transform spatial data from the UTM projection to the Dutch
RD system, and that the data in the UTM system are related to the European
Datum 1950 (ED50), while the Dutch RD system is based on the Amersfoort
datum. In this example the change of map projection should be combined with a
datum transformation step for a perfect match. This is illustrated in Figure 23.
The inverse equation of projection A is used first to take us from the map
coordinates (x, y) of projection A to the geographic coordinates (φ, λ, h) in datum
A. A height coordinate (h or H) may be added to the (x, y) map coordinates. Next,
the datum transformation takes us from these coordinates to the geographic
coordinates (φ, λ, h) in datum B. Finally, the forward equation of projection B is used
to take us from the geographic coordinates (φ, λ, h) in datum B to the map
coordinates (x′, y′) of projection B.
Figure 23: The principle of changing from one projection into another, combined
with a datum transformation from datum A to datum B.
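In practice, such a combined change of projection and datum is usually delegated to a coordinate transformation library. A minimal sketch using the pyproj library (an assumption; any tool with EPSG support would do), with EPSG:23031 for ED50 / UTM zone 31N and EPSG:28992 for the Dutch RD system, could look as follows; the transformer internally performs the inverse projection, datum transformation and forward projection of Figure 23 in one call:

```python
from pyproj import Transformer

# ED50 / UTM zone 31N -> Amersfoort / RD New (Dutch RD system)
transformer = Transformer.from_crs("EPSG:23031", "EPSG:28992",
                                   always_xy=True)
x_utm, y_utm = 252000.0, 5804000.0        # illustrative input coordinates
x_rd, y_rd = transformer.transform(x_utm, y_utm)
print(x_rd, y_rd)
```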
ii) Requiring only low-cost equipment with low energy consumption at the
receiver end;
iii) Provision of results in real time for an unlimited number of users
concurrently;
iv) Support for different levels of accuracy (military versus civilian);
v) Around-the-clock and weather-proof availability;
vi) Use of a single geodetic datum.
A satellite positioning system comprises three segments:
i) The space segment, i.e. the satellites that orbit the Earth, and the radio
signals that they emit,
ii) The control segment, i.e. the ground stations that monitor and maintain the
space segment components, and
iii) The user segment, i.e. the users with their hard and software to conduct
positioning.
This will result in the determination of the receiver’s actual position (X, Y, Z), as
well as its receiver clock bias ∆t, and if we correct the receiver clock for this bias
we effectively turn it into a high precision, atomic clock as well!
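Written out in a standard form (general background, not a formulation given in this text), each of the four satellites i contributes one pseudorange equation in the four unknowns X, Y, Z and ∆t:

pi = √( (Xi − X)² + (Yi − Y)² + (Zi − Z)² ) + c · ∆t,  for i = 1, …, 4,

where (Xi, Yi, Zi) is the known position of satellite i, pi the measured pseudorange, and c the speed of light. Solving the four equations simultaneously yields the receiver position and its clock bias.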
Obtaining a high precision clock is a fortunate side-effect of using the receiver, as
it allows the design of experiments distributed in geographic space that demand
high levels of synchrony. One such application is the use of wireless sensor
networks for various natural phenomena like earthquakes, meteorological
patterns or in water management.
Another application is in the positioning of mobile phone users making an
emergency call. Often the caller does not know their location accurately. The
telephone company can trace back the call to the receiving transmitter mast, but
this may be servicing an area with a radius of 300 m to 6 km. That is too inaccurate
a position for an emergency ambulance to go to. However, if all masts in the
telephony network are equipped with a satellite positioning receiver (and thus,
with a very good, synchronized clock) the time of reception of the call at each
mast can be recorded. The time difference of arrival of the call between two nearby
masts determines a hyperbola on the ground of possible positions of the caller; if
the call is received on three masts, we would have two hyperbolas, allowing
intersection, and thus ‘hyperbolic positioning’. With current technology the
(horizontal) accuracy would be better than 30 m. Returning to the subject of
satellite-based positioning, when only three and not four satellites are ‘in view’,
the receiver is capable of falling back from the above 3D positioning mode to the
inferior 2D positioning mode. With the relative abundance of satellites in orbit
around the earth, this is a relatively rare situation, but
it serves to illustrate the importance of 3D positioning.
If a 3D fix has already been obtained, the receiver simply assumes that the height
above the ellipsoid has not changed since the last 3D fix. If no fix had yet been
obtained, the receiver assumes that it is positioned at the geocentric ellipsoid
adopted by the positioning system, i.e. at height h = 0. In the receiver
computations, the ellipsoid fills the slot of the missing fourth satellite sphere, and
the unknown variables can therefore still be determined. Clearly in both of these
cases, the assumption for this computation is flawed and the positioning results
in 2D mode will be unreliable—much more so if no previous fix had been
obtained and one’s receiver is not at all near the surface of the geocentric ellipsoid.
Time, clocks and world time
When trains became an important means of transportation, these local time
systems became problematic as the schedules required a single time system. Such
a time system needed the definition of time zones: typically 24 geographic
strips bounded by longitudes that are multiples of 15°. This all gave rise to Greenwich
Mean Time (GMT). GMT was the world time standard of choice. It was a system
based on the mean solar time at the meridian of Greenwich, United Kingdom,
which is the conventional 0-meridian in geography.
GMT was later replaced by Universal Time (UT), a system still based on meridian
crossings of stars, but now of far away quasars as this provides more accuracy
than that of the Sun. It is still the case that the rotational velocity of our planet is
not constant and the length of a solar day is increasing. So UT is not a perfect
system either. It continues to be used for civil clock time, but it is officially now
replaced by International Atomic Time (TAI). UT actually has various versions,
amongst which are UT0, UT1 and UTC. UT0 is the Earth rotational time observed in
some location. Because the Earth experiences polar motion as well, UT0 differs
between locations. If we correct for polar motion, we obtain UT1, which is identical
everywhere. It is still a somewhat erratic clock because of the earlier mentioned
varying rotational velocity of the planet. The uncertainty is about 3 msec per day.
Coordinated Universal Time (UTC) is used in satellite positioning, and is
maintained with atomic clocks. By convention, it is always within a margin of 0.9
sec of UT1, and twice annually it may be given a shift to stay within that margin.
This occasional shift of a leap second is applied at the end of June 30 or preferably
at the end of December 31. The last minute of such a day is then either 59 or
61 seconds long. So far, adjustments have always been to add a
second. UTC time can only be determined to the highest precision after the fact,
as atomic time is determined by the reconciliation of the observed differences
between a number of atomic clocks maintained by different national time
bureaus.
In recent years we have learned to measure distance, and therefore also position, with
clocks using satellite signals. The conversion factor is the speed of light,
approximately 3 × 10⁸ m/s in vacuum. No longer can multiple seconds of clock bias
be allowed, and this is where atomic clocks come in. They are very accurate
timekeepers, based on the exactly known frequency with which
specific atoms (Cesium, Rubidium and Hydrogen) make discrete energy state
jumps. Positioning satellites usually have multiple clocks on board; ground control
stations have even better quality atomic clocks.
1. Incorrect clock reading: Even atomic clocks can be off by a small margin, and
since Einstein, we know that travelling clocks are slower than resident clocks,
due to a so-called relativistic effect. If one understands that a clock that is off
by 0.000001 sec causes a computation error in the satellite’s pseudorange of
approximately 300 m, it is clear that these satellite clocks require very strict
monitoring.
2. Incorrect orbit position: The orbit of a satellite around our planet is easy to
describe mathematically if both bodies are considered point masses, but in real
life they are not. For the same reasons that the Geoid is not a simply shaped
surface, the Earth’s gravitation field that a satellite experiences is not fully
homogeneous, which perturbs its orbit.
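The 300 m figure mentioned in item 1 follows directly from the speed of light as conversion factor:

error ≈ c · δt = 3 × 10⁸ m/s × 0.000001 s = 300 m.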
Both types of error are strictly monitored by the ground control segment, which is
responsible for correcting any errors of this nature, but it does so by applying an
agreed upon tolerance. A control station can obviously compare the results of
positioning computations, as discussed above, with its accurately known position,
flagging any unacceptable errors, and potentially labelling a satellite as
temporarily ‘unhealthy’ until errors have been corrected, and brought to within
the tolerance. This may be done by uploading a correction on the clock or orbit
settings to the satellite.
The ionosphere is the outermost part of the atmosphere, starting at an altitude
of 90 km. It holds many electrically charged atoms, thereby forming a protection
against various forms of radiation from space, including to some extent radio
waves. The degree of ionization shows a distinct night and day rhythm, and
also depends on solar activity. The latter is the more severe source of delay to
satellite signals, which obviously means that pseudoranges are estimated
larger than they actually are.
When satellites emit radio signals at two or more frequencies, an estimate
can be computed from differences in delay incurred for signals of different
frequency, and this will allow for the correction of atmospheric delay, leading
to a 10–50% improvement of accuracy. If this is not the case, or if the receiver
is capable of receiving just a single frequency, a model should be applied to
forecast the (especially ionospheric) delay, typically taking into account the
time of day and current latitude of the receiver.
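As general background (the exact receiver algorithms are not described in this text), a commonly used dual-frequency technique is the so-called ionosphere-free combination of the two pseudoranges p1 and p2 measured on carrier frequencies f1 and f2:

p = (f1² · p1 − f2² · p2) / (f1² − f2²),

which exploits the fact that the ionospheric delay is, to first order, inversely proportional to the square of the carrier frequency.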
Figure 26: At any point in time, a number of satellites will be above the
receiver’s horizon. But not all of them will be ‘in view’ (like the left and
right satellites), and for others multipath signal reception may occur.
There is one more source of error that is unrelated to individual radio signal
characteristics, but that rather depends on the combination of the satellite
signals used for positioning. Of importance is their constellation in the sky
from the receiver perspective. Referring to Figure 27, one will understand that
the sphere intersection technique of positioning will provide more precise
results when the four satellites are nicely spread over the sky, and thus that
the satellite constellation of Figure 27(b) is preferred over the one of 27(a).
Figure 27: Geometric dilution of precision. The four satellites used for po-
sitioning can be in a bad constellation (a) or in a better constellation (b).
Finally, there is also a notion of inverted relative positioning. The principles are
still as above, but in this technique the target receiver does not correct for
satellite pseudorange error either, but uses a data link to upload its
positioning/timing information to a central repository, where the corrections
are applied. This can be useful in cases where many target receivers are needed
and budget does not allow them to be expensive.
Video on Absolute Location Versus Relative Location - YouTube
Up until this point, we have assumed that the receiver determines the
range of a satellite by measuring time delay on the received ranging code.
There exists a more advanced range determination technique known as
carrier phase measurement. This typically requires more advanced receiver
technology, and longer observation sessions. Carrier phase measurement
can currently only be used with relative positioning, as absolute
positioning using this method is not well developed.
Each GPS satellite broadcasts:
• The carrier waves at the given frequencies;
• A coarse ranging code, known as C/A, modulated on L1;
• An encrypted precision ranging code, known as P(Y), modulated on L1 and L2; and
• A navigation message modulated on both L1 and L2.
The role of L2 is to provide a second radio signal, thereby allowing
(the more expensive) dual-frequency receivers a way of determining fairly
precisely the actual ionospheric delay on satellite signals received.
Figure 28: Constellation of satellites, four shown in only one orbit plane,
in the GPS system.
The ranging codes serve two purposes:
• To identify the satellite that sent the signal, as each satellite sends unique
codes, and the receiver has a look-up table for these codes, and
• To determine the signal transit time, and thus the satellite’s pseudorange.
The navigation message contains the satellite orbit and satellite clock error
information, as well as some general system information. GPS also carries
a fifth, encrypted military signal carrying the M-code. GPS uses WGS84 as
its reference system. It has been refined on several occasions and is now
aligned with the ITRF at the level of a few centimetres
worldwide. GPS has adopted UTC as its time system.
In the civil market, GPS receivers of varying quality are available, their
quality depending on the embedded positioning features: supporting
single or dual frequency, supporting only absolute or also relative
positioning, performing code measurements or also carrier phase
measurements. Leica and Trimble are two of the well-known brands in the
high-precision, professional surveying domain; Magellan and Garmin, for
instance, operate in the lower price, higher volume consumer market
range, amongst others for recreational use in outdoor activities. Many of
these are single frequency receivers, doing only code measurements,
though some are capable of relative positioning. This includes the new
generation of GPS-enabled mobile phones.
GLONASS
GLONASS radio signals are somewhat similar to those of GPS, but differ in
the details. Satellites use different identifier schemes, and their navigation
messages use other parameters. They also use different frequencies:
GLONASS L1 is at approximately 1605 MHz (changes are underway), and
L2 is at approximately 1248 MHz. Otherwise, the GLONASS system
performance is rather comparable to that of GPS.
Galileo
In the 1990’s, the European Union (EU) judged that it needed to have its
own satellite-based positioning system, to become independent of the GPS
monopoly and to support its own economic growth by providing services
of high reliability under civilian control.
Objectives:
1. To know the collection and use of data.
2. To prepare users of spatial data by drawing attention to issues concerning
data accuracy and quality.
3. To introduce a range of procedures for data checking and clean-up.
4. To present methods for interpolating point data.
With primary data the core concern in knowing its properties is to know
the process by which it was captured, the parameters of any instruments
used and the rigour with which quality requirements were observed.
Remotely sensed imagery is usually not fit for immediate use, as various
sources of error and distortion may have been present, and the imagery
should first be freed from these. This is the domain of remote sensing.
Any data which is not captured directly from the environment is known as
secondary data.
Below we discuss key sources of secondary data and issues related to their
use in analysis of which the user should be aware.
Digitizing
A traditional method of obtaining spatial data is by digitizing existing paper maps. This
can be done using various techniques. Before adopting this approach, one
must be aware that positional errors already in the paper map will further
accumulate, and one must be willing to accept these errors.
There are two forms of digitizing: i) on-tablet and ii) on-screen manual
digitizing. In on-tablet digitizing, the original map is fitted on a special
surface (the tablet), while in on-screen digitizing, a scanned image of the
map (or some other image) is shown on the computer screen.
Scanning
An ‘office’ scanner illuminates a document and measures the intensity of
the reflected light with a CCD array. The result is an image as a matrix of
pixels, each of which holds an intensity value. Office scanners have a fixed
maximum resolution, expressed as the highest number of pixels they can
identify per inch;
the unit is dots-per-inch (dpi). For manual on-screen digitizing
of a paper map, a resolution of 200–300 dpi is usually sufficient, depending
on the thickness of the thinnest lines. For manual on-screen digitizing of
aerial photographs, higher resolutions are recommended—typically, at
least 800 dpi.
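These dpi figures can be related to sizes on the ground. A minimal sketch (the scale and dpi values are illustrative assumptions, not prescriptions): at 300 dpi one pixel covers 25.4 / 300 ≈ 0.085 mm on the map, which at a map scale of 1:50,000 corresponds to roughly 4.2 m on the ground.

```python
def pixel_ground_size_m(dpi, scale_denominator):
    """Ground size (in metres) covered by one scanned pixel."""
    pixel_mm_on_map = 25.4 / dpi            # one inch is 25.4 mm
    return pixel_mm_on_map * scale_denominator / 1000.0

print(pixel_ground_size_m(300, 50000))      # ~4.23 m per pixel
```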
Semiautomatic digitizing requires a resolution that results in scanned lines
of at least three pixels wide to enable the computer to trace the centre of
the lines and thus avoid displacements. For paper maps, a resolution of
300–600 dpi is usually sufficient. Automatic or semi-automatic tracing from
aerial photographs can only be done in a limited number of cases. Usually,
the information from aerial photos is obtained through visual
interpretation.
After scanning, the resulting image can be improved with various image
processing techniques. It is important to understand that scanning does
not result in a structured data set of classified and coded objects.
Additional work is required to recognize features and to associate
categories and other thematic attributes with them.
Vectorization
The process of distilling points, lines and polygons from a scanned image
is called vectorization. As scanned lines may be several pixels wide, they
are often first thinned to retain only the centreline. The remaining
centreline pixels are converted to series of (x, y) coordinate pairs, defining
a polyline. Subsequently, features are formed and attributes are
attached to them. This process may be entirely automated or performed
semi-automatically, with the assistance of an operator.
The phases of the vectorization process are illustrated in Figure 5.1.
Figure 5.1: Phases of the vectorization process and the various sorts of small
errors caused by it. The post-processing phase makes the final repairs.
In essence, metadata answer who, what, when, where, why, and how
questions about all facets of the data made available. Maintaining
metadata is a key part of maintaining data and information quality in GIS.
This is because it can serve different purposes, from description of the data
itself through to providing instructions for data handling. Depending on
the type and amount of metadata provided, it could be used to determine
the data sets that exist for a geographic location, evaluate whether a given
data set meets a specified need, or to process and use a data set.
With the advent of satellite remote sensing, GPS and GIS technology, and the
increasing availability of digital spatial data, resource managers and others
who formerly relied on the surveying and mapping profession to supply high
quality map products are now in a position to produce maps themselves. At
the same time, GISs are being increasingly used for decision support
applications, with increasing reliance on
secondary data sourced through data providers or via the internet, through
geo-webservices. The implications of using low-quality data in important
decisions are potentially severe. There is also a danger that uninformed GIS
users introduce errors by incorrectly applying geometric and other
transformations to the spatial data held in their database.
So far we have used the terms error, accuracy and precision without
appropriately defining them. Accuracy should not be confused with precision,
which is a statement of the smallest unit of measurement to which data can
be recorded. In conventional surveying and mapping practice, accuracy and
precision are closely related. Instruments with an appropriate precision are
employed, and surveying methods chosen, to meet specified accuracy
tolerances.
Figure 5.2: A measurement probability function and the underlying true value T:
(c) good accuracy/bad precision, and (d) good accuracy and precision.
For each checkpoint, the error vector has components δx and δy. The
observed errors should be checked for a systematic error component,
which may indicate a (possibly repairable) lapse in the measurement
method. Systematic error has occurred when Σδx ≠ 0 or Σδy ≠ 0.
The systematic error δx in x is then defined as the average deviation from
the true value:
δx = (1/n) · Σ δxi, summing over the checkpoints i = 1, …, n,
and analogously for δy. The root mean square error mx in x is defined as
mx = √( (1/n) · Σ δxi² ),
where δxi² stands for δxi · δxi; my is defined analogously. The total RMSE
is obtained with the formula
RMSE = √( mx² + my² ),
which, by the Pythagorean rule, is the length of the average (root squared)
vector.
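A minimal sketch of these computations in Python (the checkpoint error components are invented for illustration):

```python
import math

# Observed-minus-true error components at four checkpoints (metres).
delta_x = [0.3, -0.2, 0.5, 0.1]
delta_y = [-0.4, 0.2, 0.1, -0.3]

n = len(delta_x)
bias_x = sum(delta_x) / n                          # systematic error in x
bias_y = sum(delta_y) / n                          # systematic error in y
m_x = math.sqrt(sum(d * d for d in delta_x) / n)   # RMSE in x
m_y = math.sqrt(sum(d * d for d in delta_y) / n)   # RMSE in y
rmse = math.sqrt(m_x ** 2 + m_y ** 2)              # total RMSE (Pythagoras)
print(bias_x, bias_y, m_x, m_y, rmse)
```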
Accuracy tolerances
Many kinds of measurement can be naturally represented by a bell-
shaped probability density function p, as depicted in Figure 5.4(a). This
function is known as the normal (or Gaussian) distribution of a
continuous random variable Y. It is defined by two parameters: µ, which
is the mean expected value for Y, and σ, which is the standard deviation of Y.
Figure 5.4: (a) Probability density function p of a variable Y, with its mean µ
and standard deviation σ; (b) the probability that Y is in the range [µ−σ, µ+σ].
Any probability density function p has the characteristic that the area
between its curve and the horizontal axis has size 1. Probabilities P can be
inferred from p as the size of an area under p’s curve. Figure 5.4(b), for
instance, depicts P(µ−σ ≤ Y ≤ µ+σ), i.e. the probability that the value for Y
is within distance σ from µ. In a normal distribution this specific probability
for Y is always 0.6826.
The RMSE can be used to assess the probability that a particular set of
measurements does not deviate too much from, i.e. is within a certain
range of, the ‘true’ value. In the case of coordinates, the probability density
function often is considered to be that of a two-dimensional normally
distributed variable (see Figure 5.5). The three standard probability values
associated with this distribution are:
• 0.50 for a circle with a radius of 1.1774 mx around the mean (known as
the circular error probable, CEP);
• 0.6321 for a circle with a radius of 1.412 mx around the mean (known as
the root mean square error, RMSE);
• 0.90 for a circle with a radius of 2.146 mx around the mean (known as
the circular map accuracy standard, CMAS).
Figure 5.5: The probability density of a two-dimensional normally distributed variable.
In the ground plane, from inside out, are indicated the circles respectively
associated with CEP, RMSE and CMAS.
Figure 5.6: The ε (or Perkal) band is formed by rolling an imaginary circle
of a given radius along a line.
Figure 5.7: The ε-band may be used to assess the likelihood that a point
falls within a particular polygon. Source: [43]. Point 3 is less likely part of the
middle polygon than point 2.
Describing natural uncertainty in spatial data
             Forest  Agriculture  Urban  total
Forest           62            5      0     67
Agriculture       2           18      0     20
Urban             0            1     12     13
total            64           24     12    100
Table 5.1: Example of a simple error matrix for assessing map attribute
accuracy.
5.2.5 Lineage
Lineage describes the history of a data set. In the case of published maps,
some lineage information may be provided as part of the metadata, in the
form of a note on the data sources and procedures used in the compilation
of the data. Examples include the date and scale of aerial photography, and
the date of field verification. Especially for digital data sets, however,
lineage may be defined more formally as:
“that part of the data quality statement that contains information that
describes the source of observations or materials, data acquisition and
compilation methods, conversions, transformations, analyses and
derivations that the data has been subjected to, and the assumptions and
criteria applied at any stage of its life.” [14]
• The compatibility of data with other data in a data set (e.g. in terms of data
format).
The absence of any inconsistencies does not necessarily imply that the
data are accurate.
5.3 Data preparation
Spatial data preparation aims to make the acquired spatial data fit for use.
Images may require enhancements and corrections of the classification
scheme of the data. Vector data also may require editing, such as the
trimming of overshoots of lines at intersections, deleting duplicate lines,
closing gaps in lines, and generating polygons. Data may require conversion
to either vector format or raster format to match other data sets which will
be used in the analysis. Additionally, the data preparation process includes
associating attribute data with the spatial features through either manual
input or reading digital attribute files into the GIS/DBMS.
Intended use
The intended use of the acquired spatial data may require only a subset
of the original data set, as only some of the features are relevant for
subsequent analysis or subsequent map production. In these cases, data
and/or cartographic generalization can be performed on the original data set.
Typical clean-up operations for vector data include: erasing duplicate lines
or slivers; extending undershoots; breaking crossing objects; erasing
dangling objects or overshoots; dissolving polygons; and dissolving
(pseudo-)nodes into vertices.
Figure 5.9: Successive clean-up operations for vector data, turning spaghetti data
into topological structure.
Associating attributes
Attributes may be automatically associated with features that have unique
identifiers. We have already discussed these techniques in Section 3.5. In the
case of vector data, attributes are assigned directly to the features, while in a
raster the attributes are assigned to all cells that represent a feature.
Rasterization or vectorization
Vectorization produces a vector data set from a raster. We have looked at
this in some sense already: namely in the production of a vector set from a
scanned image. Another form of vectorization takes place when we want to
identify features or patterns in remotely sensed imagery.
If much or all of the subsequent spatial data analysis is to be carried out on
raster data, one may want to convert vector data sets to raster data. This
process is known as rasterization.
It involves assigning point, line and polygon attribute values to raster cells
that overlap with the respective point, line or polygon. To avoid information
loss, the raster resolution should be carefully chosen on the basis of the
geometric resolution. A cell size which is too large may result in cells that
cover parts of multiple vector features, and then ambiguity arises as to what
value to assign to the cell. If, on the other hand, the cell size is too small, the
file size of the raster may increase significantly.
Rasterization itself could be seen as a ‘backwards step’: firstly, raster
boundaries are only an approximation of the objects’ original boundary.
Secondly, the original ‘objects’ can no longer be treated as such, as they have
lost their topological properties. Often the reason for rasterization is that
it facilitates easier combination with other data sources that are also in raster
format, and/or because several analytical techniques are easier to perform
on raster data. Obviously, the issue of performance trade-off must be looked into.
Topology generation
Data sets from multiple sources may differ in several ways:
1. They may be about the same area, but differ in accuracy,
2. They may be about the same area, but differ in choice of representation,
3. They may be about adjacent areas, and have to be merged into a single
data set.
• Format transformation functions. These convert between data formats of
different systems or representations, e.g. reading a DXF file into a GIS.
Although we will not focus on the technicalities here, the user should be
warned that conversions from one format to another may cause problems.
The reason is that not all formats can capture the same information, and
therefore conversions often mean loss of information. If one obtains a
spatial data set in format F , but needs it in format G (for instance because
the locally preferred GIS package requires it), then usually a conversion
function can be found, often within the same GIS software package. The
key to successful conversion is to also find an inverse conversion, back
from G to F , and to ascertain whether the double conversion back to F
results in the same data set as the original. If this is the case, both
conversions are not causing information loss, and can safely be applied.
• Graphic element editing. Manual editing of digitized features so as to correct
errors, and to prepare a clean data set for topology building.
• Coordinate thinning. A process that is often applied to remove redundant
or excess vertices from line representations, as obtained from digitizing.
Source: https://www.youtube.com/watch?v=to6Eufi58hM
In some instances we may be dealing with a data type that limits the type of
interpolation we can do (refer to page 75 for a brief background). A fundamental
issue in this respect is what kind of phenomena we are considering: is it a discrete
field—such as geological units, for instance—in which the values are of a
qualitative nature and the data is categorical, or is it a continuous field—like
elevation, temperature, or salinity— in which the values are of a quantitative
nature, and represented as continuous measurements? This distinction matters
because we are limited to nearest-neighbour interpolation for discrete data.
A simple example is given in Figure 5.13. Our field survey has taken only two
measurements, one at P and one at Q. The values obtained in these two
locations are represented by a dark and light green tint, respectively. If we
are dealing with qualitative data, and we have no further knowledge, the only
assumption we can make for other locations is that those nearer to P
probably have P ’s value, whereas those nearer to Q have Q’s value. This is
illustrated in part (a).
If, on the contrary, our field is quantitative, we can let the values of P and Q
both contribute to values for other locations. This is done in part (b) of the
figure. To what extent the measurements contribute is determined by the
interpolation function. In the figure, the contribution is expressed in terms of
the ratio of distances to P and Q. We will see in the sequel that the choice of
interpolation function is a crucial factor in any method of point data
transformation.
How we represent a field constructed from point measurements in the GIS
also depends on the above distinction. A discrete field can either be
represented as a classified raster or as a polygon data layer, in which each
polygon has been assigned a (constant) field value. A continuous field can be
represented as an unclassified raster, as an isoline (thus, vector) data layer, or
perhaps as a TIN. Some GIS software only provide the option of generating
raster output, requiring an intermediate step of raster to vector conversion.
The choice of representation depends on what will be done with the data in
the analysis phase.
Figure 5.13: A geographic field representation obtained from two point
measurements: (a) for qualitative (categorical), and (b) for quantitative
(continuous) point measurements. The value measured at P is represented as
dark green, that at Q as light green.
The aim is to use measurements to obtain a representation of the entire field using
point samples. In this section we outline four techniques to do so:
1. Trend surface fitting,
2. Triangulation,
3. Moving window averaging, and
4. Kriging.
In trend surface fitting, the assumption is that the entire study area can be represented
by a formula f (x, y) that for a given location with coordinates (x, y) will give us the
approximated value of the field in that location.
The key objective in trend surface fitting is to derive a formula that best describes the
field. Various classes of formulæ exist, with the simplest being the one that describes a
flat, but tilted plane:
f (x, y) = c1 · x + c2 · y + c3.
Regression can be used to determine values for these coefficients ci that best fit with
the measurements. A plane will be fitted through the measurements that makes the
smallest overall error with respect to the original measurements.
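A minimal sketch of such a least-squares plane fit, using numpy (the five sample measurements are invented for illustration):

```python
import numpy as np

# Measurement locations (x, y) and observed field values z.
x = np.array([1.0, 3.0, 4.0, 6.0, 8.0])
y = np.array([2.0, 7.0, 1.0, 5.0, 3.0])
z = np.array([68.0, 74.0, 62.0, 70.0, 59.0])

# Design matrix for f(x, y) = c1*x + c2*y + c3.
A = np.column_stack([x, y, np.ones_like(x)])

# Least-squares solution: minimises the overall squared error
# between the fitted plane and the measurements.
coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
c1, c2, c3 = coeffs
print(c1, c2, c3)
```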
In Figure 5.15, we have used the same set of point measurements, with four different
approximation functions. Part (a) has been determined under the assumption that the
field can be approximated by a tilted plane, in this case with a downward slope to the
southeast. The values found by regression techniques were: c1 = −1.83934, c2 =
1.61645 and c3 = 70.8782, giving us:
f (x, y) = −1.83934 · x + 1.61645 · y + 70.8782.
Figure 5.15: Various trend surfaces fitted through the same set of point
measurements: (a) tilted plane; (b) bilinear saddle; (c) quadratic surface;
(d) cubic surface.
Clearly, not all fields are representable as simple, tilted planes.
Sometimes, the theory of the application domain will dictate that the best
approximation of the field is a more complicated, higher-order polynomial function.
Three such functions were the basis for the fields illustrated in Figure 5.15(b)–(d).
The simplest extension from a tilted plane, that of the bilinear saddle, expresses some
dependency between the x and y dimensions:
f (x, y) = c1 · x + c2 · y + c3 · xy + c4.
This is illustrated in part (b). A further step up the ladder of complexity is to consider
quadratic surfaces, described by:
f (x, y) = c1 · x² + c2 · x + c3 · y² + c4 · y + c5 · xy + c6.
The objective is to find six values for our coefficients that best match with the
measurements. A bilinear saddle and a quadratic surface have been fitted through our
measurements in Figure 5.15(b) and (c), respectively.
Part (d) of the figure illustrates the most complex formula of the surfaces in Figure
5.15, the cubic surface. It is characterized by the following formula:
f (x, y) = c1 · x³ + c2 · x² + c3 · x + c4 · y³ + c5 · y² + c6 · y +
c7 · x²y + c8 · xy² + c9 · xy + c10.
The regression techniques applied for Figure 5.15 determined the following values for
the coefficients ci:
Fig 5.15 c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
(d) -0.473086 6.88096 31.5966 -0.233619 1.48351 -2.52571 -0.115743 -0.052568 2.16927 96.8207
Figure 5.16: Triangulation as a means of interpolation. (a) known point measurements; (b)
constructed triangulation on known points; (c) isolines constructed from the triangulation.
Moving window averaging attempts to directly derive a raster dataset from a set of
sample points. This is why it is sometimes also called ‘gridding’. The principle behind
this technique is illustrated in Figure 5.17. The cell values for the output raster are
computed one by one. To achieve this, a ‘window’ (also known as a kernel) is defined,
and initially placed over the top left raster cell. Measurement
points falling inside the window contribute to the averaging computation,
those outside the window do not. This is why moving window averaging is said to be
a local interpolation method. After the cell value is computed and assigned to the cell,
the window is moved one cell to the right, and the computations are performed for
that cell. Successively, all cells of the raster are visited in this way.
Figure 5.17: An example of moving window averaging. In blue, the measurement
points. A virtual window is moved over the raster cells one by one, and some
averaging function computes a field value for the cell, using measurements within the window.
In part (b) of the figure, the 295th cell value out of the 418 in total, is being computed. This
computation is based on eleven measurements, while that of the first cell had no
measurements available. Where this is the case, the cell should be assigned a value that signals
this ‘non-availability of measurements’.
The principle of spatial autocorrelation suggests that measurements closer to the cell
centre should have greater influence on the predicted value than those farther away.
In order to account for this, a distance factor can be brought into the averaging
function. Functions that do this are called inverse distance weighting functions (IDW).
This is one of the most commonly used functions in interpolating spatial data.
Let us assume that the distance from measurement point i to the cell centre is denoted
by di. Commonly, the weight factor applied in inverse distance weighting is the distance
squared (p = 2), but in the general case the formula is:
z = Σ (mi / di^p) / Σ (1 / di^p), with both sums over i = 1, …, n,
where mi is the measured value at point i and n the number of measurements selected.
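A minimal sketch of this computation for a single cell centre (pure Python; the measurement points and the power p = 2 are assumptions for illustration):

```python
import math

def idw(cell_centre, points, p=2):
    """Inverse distance weighted average at cell_centre;
    points is a list of ((x, y), measurement) pairs."""
    cx, cy = cell_centre
    num = den = 0.0
    for (x, y), m in points:
        d = math.hypot(x - cx, y - cy)
        if d == 0.0:            # cell centre coincides with a point
            return m
        num += m / d ** p
        den += 1.0 / d ** p
    return num / den

pts = [((1.0, 1.0), 10.0), ((4.0, 2.0), 14.0), ((2.0, 5.0), 8.0)]
print(idw((2.0, 2.0), pts))
```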
Moving window averaging has many parameters. As experimentation with any GIS package
will demonstrate, picking the right parameter settings may make quite a difference for the
resulting raster. We discuss some key parameters below.
• Raster resolution: Too large a cell size will smooth the function too much, removing
local variations; too small a cell size will result in large clusters of equally valued cells,
with little added value.
• Shape/size of window: Most procedures use square windows, but rectangular, circular
or elliptical windows are also possible. These can be useful in cases where the
measurement points are distributed regularly at fixed distance over the study area, and
the window shape must be chosen to ensure that each raster cell will have its window
include the same number of measurement points. The size of the window is another
important matter. Small windows tend to exaggerate local extreme values, while large
windows have a smoothing effect on the predicted field values.
• Selection criteria: Not necessarily all measurements within the window need to be used
in averaging. We may choose to use at most the five nearest measurements, or we
may choose to only generate a field value if more than three measurements are in the
window.
• Averaging function: A final choice is which function is applied to the selected
measurements within the window. It is possible to use different distance–weighting
functions, each of which will influence the calculation of the resulting value.
4. Kriging
Kriging is based on the notion that the spatial change of a variable can be described
as a function of the distance between points. It is similar to IDW interpolation, in that
the surrounding values are weighted to derive a value for an unmeasured location.
However, the kriging method also looks at the overall spatial arrangement of the
measured points and the spatial correlation between their values, to derive values for
an unmeasured location.
The first step in the kriging procedure is to compare successive pairs of point
measurements to generate a semi-variogram. In the second step, the semi-variogram
is used to calculate the weights used in interpolation.
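For reference (a standard estimator, not spelled out in the text above), the experimental semi-variogram of the first step is commonly computed as

γ(h) = (1 / (2 · n(h))) · Σ ( z(xi) − z(xi + h) )²,

where the sum runs over all n(h) pairs of measurement points separated by (approximately) the distance h, and z(xi) is the value measured at location xi. Plotting γ(h) against h shows how the variable becomes less correlated as the distance between points grows.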
Chapter 04
Spatial referencing and positioning
1. Explain the spatial referencing surfaces for mapping.
2. Write short note on Geoid and Ellipsoid.
3. Explain local horizontal and vertical datum.
4. Describe Triangulation network with the help of diagram.
5. Explain the 2D and 3D Geographic coordinate system.
6. What is Map projection? Explain the types of map projection with the help of
diagrams in brief.
7. Explain the coordinate transformation with the help of 2D polar to 2D Cartesian.
8. Write short note on datum transformation.
9. What is map projection?
10. Describe the classification of map projection.
11. What is Satellite based positioning? Explain.
12. What is absolute positioning?
13. What are the errors in absolute positioning?
14. Write short notes on the following:
15. Errors related to the space segment
16. Errors related to the medium
17. Errors related to the receiver’s environment
18. Errors related to the relative geometry of satellites and receiver
19. Explain positioning technology and describe GPS, GLONASS and Galileo in brief.
20. Assume that the semi-major axis a of an ellipsoid is 6378137 m and the
flattening f is 1:298.257. Using these facts determine the semi-minor axis b
(make use of the given equations).
21. You are required to match GPS data with some map data. The GPS data and the
map layer are based on different horizontal datums. Which steps should you
take to make the GPS data spatially compatible with the map data?
Chapter 05
Data entry and preparation
1. Rasterization of vector data is sometimes required in data preparation. What
reasons may exist for this? If it is needed, the raster resolution must be carefully
selected. Argue why.
2. Write a short note on direct and indirect spatial data capture.
3. Explain the following terms i) digitizing ii) scanning iii) vectorization
4. What is digitizing? What are the types of digitizing? Explain.
5. Explain vectorization in brief.
6. What is scanning?
7. What is metadata? Explain dataformats and standards.
8. Write short notes on following terms in concerns with data quality.
Accuracy and Precision
Positional accuracy
Attribute accuracy.
Temporal accuracy
Lineage
Logical consistency
9. Explain Root Mean Square Error (RMSE).
10. Describe natural uncertainty in spatial data.
11. How do we perform data checks and repairs in the data preparation process?
12. Write short note on
13. Explain point data transformation.
14. Explain interpolating discrete data.
15. Explain interpolating continuous data.
16. Write short note on Kriging.
17. Write short note on triangulation.
18. Write short note on Trend surface fitting.
MULTIPLE CHOICE QUESTIONS FOR QUIZ
7. ________________ is the digitizing type in which the original map is fitted on a special surface
a. On tablet
b. On screen
c. On scanner
d. On page
8. The process of distilling points, lines and polygons from a scanned image is called _____
a. Rasterization
b. Vectorization
c. Transformation
d. Conversion
13. ________ is based on the notion that the spatial change of a variable can be
described as a function of the distance between points
a. Attribute accuracy
b. Kriging
c. Triangulation
d. Trend surface fitting
15. Data which is captured indirectly from the environment is known as ______
a. Secondary data
b. Primary data
c. Historical data
d. Important data