4 - GIS Data Collection - Eng
The aim of this lecture is to introduce the sources and capture methods of GIS data.
GIS data collection is one of the most expensive and time-consuming stages of introducing a GIS.
Some sources state that data collection may account for up to 85% of GIS project costs. There are
many sources of GIS data. Generally, GIS data can be captured directly or derived
from other systems. Sources of geographic data fall into two groups, although the differences
between the groups are often slight:
• Primary geographic data are captured in digital form for direct use in GIS (GPS
measurements, digital aerial and satellite images, digital ground surveying, etc.).
• Secondary geographic data, both digital and analog, are collected within other
projects and converted into formats suitable for GIS (scanned paper maps, digital Earth
surface models generated from contour lines, toponymic databases, etc.).
Both primary and secondary GIS data may be obtained in digital or analog (paper maps, printouts
of aerial or space photos and so on) formats. Before analog data can be stored in a GIS
database, they must be digitized, i.e. scanned or captured using heads-up digitizing. The formats
and some other characteristics of digital data may not suit specific GIS applications either. This
chapter therefore discusses the sources, capture tools and technologies used to collect GIS data.
Even the simplest projects involve several data collection stages (Fig. 4.1). Planning includes
defining users’ needs and the available and needed resources (personnel, hardware and software)
and developing an implementation plan. Preparation involves obtaining data, redrawing
poor-quality maps, editing scanned maps, etc. It may also involve installing the necessary GIS
software and hardware. Digitizing and transfer is the most labour-intensive stage of GIS
data collection. After it is completed, the initial version of the GIS database almost always
has to be edited to identify and eliminate remaining errors. A data set is not considered
complete until it has been evaluated for quality and documented. Large GIS database construction
projects may include a reduced version of the whole project covering all the above stages,
a so-called pilot project, to optimize the implementation of the main project.
[Figure: cycle of stages: Planning, Preparation, Digitizing / Transfer, Editing, Evaluation]
Fig. 4.1 Stages of GIS data collection (source: Paul A. Longley, Michael F. Goodchild, David J.
Maguire, David W. Rhind, Geographic Information Systems and Science, 2nd Edition, Wiley,
2005)
4.2 Collection of primary geographic data
The most popular form of primary geographic data collection is remote sensing. Remote
sensing means deriving information about the physical, chemical and biological
properties of objects without direct physical contact. Information is collected by measuring
reflected, emitted or scattered electromagnetic radiation using a variety of sensors that operate in
the electromagnetic spectrum from visible to microwave wavelengths. Remote sensing systems
are based on (1) passive sensors, which record reflected or emitted electromagnetic radiation
(conventional or digital aerial photography, the main satellite remote sensing systems such as
Landsat, Spot, Ikonos, etc.) and (2) active sensors, which generate electromagnetic waves, direct
them at the Earth surface and record the returned signal (RaDAR: Radio Detection And Ranging,
LiDAR: Light Detection And Ranging). The platforms for remote sensors are usually
satellites or airplanes, but other aircraft are used too: helicopters, balloons and ultra-light
aircraft, including unmanned ones. Remote sensors can also be installed on the ground, e.g.
video cameras mounted on mobile communication masts to monitor and detect forest fires.
From the GIS perspective, the main physical characteristic of remote sensing systems is
resolution, which has three aspects: spatial, spectral and temporal. Spatial
resolution refers to the minimum size of objects that can be resolved in remotely sensed images.
As the majority of images are stored as rasters, the pixel (cell) size is the most convenient
indicator of spatial resolution. The cell size of satellite images may range from 0.5
m up to 1 km, and that of aerial images from 10 cm up to 5 m (depending on the parameters of
the aerial photography).
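The practical effect of cell size can be illustrated with simple arithmetic. The sketch below is only an illustration: the 10 km x 10 km study area and the function name are assumptions, and the cell sizes are taken from the ranges quoted above.

```python
import math

def cells_needed(width_m, height_m, cell_size_m):
    """Number of raster cells needed to cover a rectangle at a given cell size."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return cols * rows

# Hypothetical 10 km x 10 km study area.
for cell in (0.5, 5.0, 1000.0):
    print(f"{cell:>7} m cells: {cells_needed(10_000, 10_000, cell):,}")
```

Halving the cell size quadruples the number of cells, which is why spatial resolution has a direct impact on storage and processing costs.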
Remote sensing is based on the differences in reflectance of various objects on the Earth surface
in different spectral zones (Fig. 4.2). Remote sensors may record electromagnetic radiation in
one or several spectral bands; the properties of these bands define the spectral resolution. E.g.,
the sensors on board the Ikonos satellite system are sensitive to 4 spectral bands: blue, green
and red visible plus near infrared.
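[Figure: reflectance curves for water, green vegetation and bare soil; vertical axis: % of
reflected radiation; horizontal axis: wavelength, μm, with the blue, green, red, near infrared
and middle infrared zones marked]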
Fig. 4.2. Generalized reflectance characteristics of main objects on the Earth surface in different
zones of electromagnetic spectrum
Temporal resolution is the most variable characteristic of remote sensing systems. Some systems
deliver information practically in real time, while others, like satellite remote sensing systems,
have a repeat cycle of several hours, days or weeks. The repeat cycle of aerial photography
depends on the needs of the user (e.g., many countries repeat complete aerial photography
coverage for mapping purposes every 5 years).
Nowadays remote sensing information can quite often be obtained from various archives, so
there is no need to plan and carry out a remote sensing mission. Such information may
be available from governmental (e.g., National Land Service under the Ministry of Agriculture
of the Republic of Lithuania) or commercial institutions. The Internet also plays an increasing
role in searching for, assessing and acquiring remotely sensed information.
There are two main ways to collect primary vector data: ground surveying and the use of global
navigation satellite systems (GNSS). The differences between these two technologies are becoming
blurred today.
Ground surveying is based on the principle that the 3D location of any point can be determined
by measuring angles and distances from points whose locations are known. Measurements
start from a benchmark point. If the coordinates of this point are known in some coordinate
system, all other measurements are made in that coordinate system. If they are not
known, all measurements are carried out in a relative or local coordinate system. Since the
locations of measured points are related to the locations of other points, measurement errors
propagate among all points. E.g., when measuring the boundary of a polygon, the measured
locations of the starting and ending points may differ, even though in theory they are identical.
The resulting error therefore needs to be apportioned among all the points defining the boundary
of the polygon.
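The apportioning of the closing error can be sketched in code. The text does not say which adjustment method is meant; the example below assumes the compass (Bowditch) rule, a common textbook choice in which each point is corrected in proportion to the distance travelled along the traverse. The function name and the sample measurements are hypothetical.

```python
import math

def close_traverse(start, legs):
    """Compute traverse points from (azimuth_deg, distance) legs and distribute
    the closing error with the compass (Bowditch) rule (an assumed method)."""
    # Raw coordinates computed from the measured azimuths and distances.
    pts = [start]
    for az_deg, dist in legs:
        az = math.radians(az_deg)
        x, y = pts[-1]
        pts.append((x + dist * math.sin(az), y + dist * math.cos(az)))

    # Misclosure: in a closed polygon the last point should coincide with the start.
    err_x = pts[-1][0] - start[0]
    err_y = pts[-1][1] - start[1]
    total = sum(dist for _, dist in legs)

    # Each point receives a correction proportional to the distance covered so far.
    adjusted, run = [start], 0.0
    for (x, y), (_, dist) in zip(pts[1:], legs):
        run += dist
        adjusted.append((x - err_x * run / total, y - err_y * run / total))
    return (err_x, err_y), adjusted

# Hypothetical square parcel; the last leg was measured 0.2 m too long.
legs = [(0, 100.0), (90, 100.0), (180, 100.0), (270, 100.2)]
misclosure, points = close_traverse((0.0, 0.0), legs)
print("misclosure:", misclosure)
print("adjusted end point:", points[-1])
```

After the adjustment the end point coincides with the start point again, and the intermediate points have each absorbed a share of the error.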
Traditionally, ground surveying was carried out using equipment such as a compass or theodolite
to measure angles and measuring tapes to measure distances. Today these have been replaced by
electro-optical devices, usually called total stations (Fig. 4.3).
The basic principles of ground surveying have changed very little during the past 100 years,
although advances in technology have considerably improved the accuracy of
measurements and the productivity of surveying projects. Usually two persons carry out the
measurements. One operates the total station; the other carries a reflective prism that is placed at
the objects being measured. Some newer systems enable
measurements to be carried out by one person.
Even though ground surveying is expensive and time-consuming, it remains the best
way to obtain extremely high-precision data about the location of points. High-quality
orthophotos are impossible without surveying of ground control points. Ground surveying is
also important in controlling the accuracy of any GIS database.
Global Navigation Satellite Systems (GNSS) include the following operational or planned
systems:
• GPS (Global Positioning System), developed in the USA, which is the first and most
common system.
• Russian GLONASS (abbreviated from ГЛОбальная НАвигационная Спутниковая
Система, Global Navigation Satellite System).
• Planned European Galileo system.
• Experimental Chinese Beidou system.
GPS is defined as “satellites that emit radio signals to define the location on the Earth surface.
GPS receivers are used to precisely locate their position on or near the Earth surface”¹. GPS
consists of:
• 24 satellites (space segment). There are several backup satellites available, too.
• 5 ground stations (control segment).
• GPS receivers (user segment).
The space segment consists of 24 NAVSTAR satellites orbiting at an altitude of 20200 km and
circling the globe in 12 hours. The same point is covered every 24 hours, 4 minutes earlier than
on the previous day (Fig. 4.4). Each satellite weighs 862 kg and is approx. 5 m in size. The
life-span of each satellite is 7.5 years, after which it is replaced by another satellite. Satellites
broadcast coded radio signals at precise time intervals.
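The relationship between the orbit altitude and the orbital period quoted above can be checked with Kepler's third law. This is only a rough sketch: it assumes a circular orbit and a mean Earth radius of 6371 km, and the constant and function names are chosen for illustration.

```python
import math

MU_EARTH_KM3_S2 = 398_600.4418   # Earth's gravitational parameter
EARTH_RADIUS_KM = 6_371.0        # mean radius (assumption for this sketch)

def orbital_period_hours(altitude_km):
    """Period of a circular orbit at the given altitude: T = 2*pi*sqrt(a^3 / mu)."""
    a = EARTH_RADIUS_KM + altitude_km          # orbit radius from the Earth's centre
    seconds = 2 * math.pi * math.sqrt(a ** 3 / MU_EARTH_KM3_S2)
    return seconds / 3600

# Altitude quoted in the text; the result should be close to 12 hours.
print(f"{orbital_period_hours(20_200):.2f} h")
```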
¹ International GIS dictionary, McDonnell & Kemp
The control segment consists of a system of ground monitoring stations that are distributed
around the world. They track the satellites, register their locations and correct the
orbits if required.
The user segment consists of GPS receivers, whose construction, precision and price may differ
substantially. GPS receivers are roughly grouped as: (1) those used for ground surveying,
achieving accuracies on the order of millimetres or centimetres; (2) those used for
collecting GIS data (accuracies of around 1 m); and (3) amateur (recreational) receivers, with
accuracies below 10 m (Fig. 4.5).
Fig. 4.5. Different types of GPS receivers: a) suited for ground surveying, b) suited for GIS
database development and c) amateur
GPS measurements are based on distances determined from the travel time of the signal between
a satellite (whose location is known) and the receiver, using the so-called method of
“triangulation” (strictly speaking trilateration, since distances rather than angles are measured),
Fig. 4.6. Having measured one distance between the receiver and a satellite, we can place the
receiver on a sphere whose centre is the satellite and whose radius is the distance just measured
(Fig. 4.6a). The search is narrowed to a circle (the intersection of 2 spheres) if distances to 2
satellites are measured (Fig. 4.6b). Measurement of a 3rd distance leaves 2 points, one of which
corresponds to the location of the receiver (the other is usually somewhere above the Earth
surface), Fig. 4.6c. In practice, 4 distances are usually measured to determine the location of a
point, because the fourth measurement allows the error of the receiver clock to be corrected.
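The principle of locating a receiver from measured distances can be sketched as a small least-squares problem. The example below is a simplified illustration, not the algorithm used in real receivers: it assumes error-free travel times, ignores the receiver clock error, and uses made-up satellite coordinates (in metres, in an Earth-centred frame). All function and variable names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def solve_linear(a, b):
    """Solve a small square system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def locate(sats, travel_times, guess=(0.0, 0.0, 0.0), iterations=10):
    """Estimate the receiver position with Gauss-Newton iterations on the ranges."""
    ranges = [C * t for t in travel_times]   # distance = speed of light * travel time
    p = list(guess)
    for _ in range(iterations):
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for s, rho in zip(sats, ranges):
            d = math.dist(p, s)
            unit = [(p[k] - s[k]) / d for k in range(3)]  # gradient of the distance
            res = rho - d                                  # measured minus predicted
            for i in range(3):
                jtr[i] += unit[i] * res
                for j in range(3):
                    jtj[i][j] += unit[i] * unit[j]
        dp = solve_linear(jtj, jtr)
        p = [p[k] + dp[k] for k in range(3)]
    return p

# Made-up satellite positions and a receiver on the Earth surface.
sats = [(26_571_000.0, 0.0, 0.0),
        (18_000_000.0, 19_000_000.0, 0.0),
        (18_000_000.0, -10_000_000.0, 16_000_000.0),
        (15_000_000.0, 5_000_000.0, -21_000_000.0)]
true_pos = (6_371_000.0, 0.0, 0.0)
times = [math.dist(true_pos, s) / C for s in sats]   # simulated travel times
est = locate(sats, times)
print([round(v, 3) for v in est])
```

A real receiver also estimates its own clock offset as a fourth unknown, which is one reason why at least four satellites are needed in practice.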
Fig. 4.6. Locating an object on the Earth surface using the method of triangulation from
satellites (panels a, b and c; panel label: “Third measurement: 2 points are defined”; satellite
altitude: 20200 km)
Fig. 4.7. Factors influencing the accuracy of GPS measurements: a) atmospheric interference, b)
and c) obstacles on the ground, d) bad distribution of satellites, e) good distribution of satellites
GPS can also be used to measure the z coordinate of a point. One should bear in mind that the
accuracy of the z coordinate is lower than that of x and y. Say the x and y coordinates are
measured with an accuracy of 10 m; the accuracy of the z coordinate may then be expected to be
30-50 m. The elevation above mean sea level is also a relative value that depends on the
mathematical model of the ellipsoid used.
Collection of secondary geographic data includes the development of GIS databases from available
paper maps, photographs and other analog documents. Scanning is used to capture raster data;
vector data are collected using manual digitizing, on-screen vectorization, stereo-photogrammetry,
COGO procedures, etc.
A scanner is a device that scans an image from some surface (e.g. a paper sheet) or
photographic film and transfers it to the computer’s memory. There are several reasons to scan
analog documents:
• Various paper maps, plans, etc. are scanned to reduce wear and tear. Paper maps in
digital form are easier to access and administer.
• Scanned and georeferenced paper maps and aerial photographs provide geographic context
for other data.
• Scanned maps, photographs or images are frequently vectorised, as discussed in the
coming chapters.
Manual digitizing used to be the main and cheapest method for developing GIS databases a
decade ago. Manual digitizing uses special equipment, a digitizer (Fig. 4.8). A digitizer
consists of a digitizing table and a special mouse. The surface of the digitizing table is covered
by an extremely fine mesh of electric wires. The mouse has a special cursor used to trace the
shapes of objects and 16 or more keys to control the digitizing process and input some attributes.
The map is attached to the digitizing table, the mouse cursor is moved over the map and objects
are captured by pressing the mouse buttons.
The effectiveness of manual digitizing depends on the type of digitizer, the characteristics of the
software used and the experience of the operators. However, manual digitizing is a difficult and
tedious process. Operator fatigue may significantly reduce the quality of the GIS databases
under development. Checking the captured contours is quite problematic, because the operator
focuses his or her attention on the digitizing table rather than on the computer monitor.
Any scanned map can be georeferenced and displayed on a computer screen (instead of being fixed
on a digitizing table). The shapes of geographic objects can then be captured using a cursor
controlled by a conventional computer mouse (not the digitizer mouse). This procedure, called
on-screen vectorization, is considered to be the most popular method of GIS database
development today.
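Georeferencing a scanned map amounts to defining a transformation from pixel positions to map coordinates. A minimal sketch, assuming a six-parameter affine transformation of the kind commonly stored alongside raster images; the parameter values, the coordinate origin and the function name are hypothetical.

```python
def pixel_to_map(col, row, a, b, c, d, e, f):
    """Affine transformation from pixel (col, row) to map (x, y):
    x = a*col + b*row + c,  y = d*col + e*row + f."""
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y

# Hypothetical scan: 0.5 m pixels, no rotation, north-up image.
# e is negative because row numbers grow downwards while y grows northwards.
params = dict(a=0.5, b=0.0, c=500_000.0, d=0.0, e=-0.5, f=6_100_000.0)

upper_left = pixel_to_map(0, 0, **params)
clicked = pixel_to_map(100, 200, **params)   # a point digitized on screen
print(upper_left, clicked)
```

During on-screen vectorization every vertex clicked on the screen is converted in this way, so the captured shapes are stored directly in map coordinates.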
4.3.3 Photogrammetry
Photogrammetry is the science and technology of making measurements from aerial photographs
or other remotely sensed images. Although it may include 2D measurements, nowadays GIS is
almost exclusively concerned with stereo capture of 2.5D or 3D data. To obtain a stereo image,
aerial photographs or other remotely sensed images must have 60% overlap along each flight
strip and 30% overlap between the strips. The amount of overlap defines the area for which a
3D model can be constructed.
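The overlap requirements translate directly into the spacing of exposures and flight strips. The following sketch shows the arithmetic; the 2000 m ground footprint of a single photograph and the function name are assumptions made for illustration.

```python
def flight_spacing(footprint_along_m, footprint_across_m,
                   forward_overlap=0.60, side_overlap=0.30):
    """Spacing implied by the overlaps (given as fractions of the photo footprint)."""
    air_base = footprint_along_m * (1 - forward_overlap)     # between exposures
    strip_spacing = footprint_across_m * (1 - side_overlap)  # between flight strips
    stereo_length = footprint_along_m * forward_overlap      # covered by a stereo pair
    return air_base, strip_spacing, stereo_length

# Hypothetical photograph covering 2000 m x 2000 m on the ground.
base, spacing, stereo = flight_spacing(2000.0, 2000.0)
print(f"exposure every {base:.0f} m, strips {spacing:.0f} m apart, "
      f"stereo model {stereo:.0f} m long")
```

With 60% forward overlap only the overlapping part of each pair of photographs can be viewed in stereo, which is why the overlap determines the area of the 3D model.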
The majority of GIS databases in Lithuania are developed by identifying geographic objects on
orthophotographic maps. Orthophotographic maps are documents created from aerial
photographs. All distortions present in the original aerial images are eliminated during
orthophoto mapping. In order to eliminate the distortions caused by differences in elevation,
digital surface models must be created. This is a difficult and time-consuming procedure and the
accuracies achieved are rather limited. Developing GIS databases using photogrammetric
methods does not require the separate creation of digital surface models, because the geographic
objects are located correctly with respect to the terrain.
Measurements are captured from overlapping pairs of photographs using equipment called
a stereoplotter. This equipment is used to build a 3D model and to capture and edit GIS data.
There have been three generations of stereoplotters: optical, analytic and digital (Fig. 4.9).
Optical or analog plotters are no longer used. Analytic plotters combine mechanical
measurements and computer-based registration of data. In digital plotters all operations are
computerized. Digital photogrammetry is expected to replace analytic photogrammetry very soon.
Fig. 4.9. Three generations of stereoplotters: a) analog or optical, b) analytic and c) digital
There are many techniques for viewing stereo models, ranging from the simplest
stereoscopes to special glasses that work on the principle of polarization
(a different polarization plane is used for each image of the stereo pair and for each lens of the
glasses) or anaglyph (the left image of the stereo pair is displayed in blue light and the right one
in red light; both images are observed through glasses with one blue and one red lens).
A major decision to be taken before starting any GIS project is whether to build the database or
to purchase it (or part of it). All the above discussion has concerned ways of building a database
from primary and secondary sources. However, many databases with different
contents, accuracies and accessibility exist nowadays, so geographic data are more commonly
obtained from external sources. GIS databases available for different applications
are discussed in other chapters. Most such general-use GIS databases are
administered by the National Land Service under the Ministry of Agriculture of the Republic of
Lithuania (www.nzt.lt) or its authorized institutions (Joint stock company “Institute of Aerial
Geodesy” and State enterprise National Center for Remote Sensing and Geoinformatics
“GIS-Centras”). Geographic information is also becoming increasingly available from the Internet.
One of the biggest problems with data obtained from external sources is that they may be
available in formats different from those required by the user’s software. It is not possible to
design a universal data format that, e.g., supports both fast display of huge amounts of data and
topology at the same time. Most GIS software systems are commercial products: the developers
understand their own data formats best and have no interest in their customers using
competitors' data formats.
Bearing in mind that database development is the most expensive stage of a GIS project, it is
natural that many tools have been developed to migrate data between different systems (Fig.
4.10). Most modern GIS software systems can directly read AutoCAD DWG and DXF,
Microstation DGN, ESRI Shape² and some image file formats.
Fig. 4.10. Solutions to use external source data: a) translation into format acceptable for a GIS
software, b) direct reading of external file formats
Standardization of GIS information is becoming more and more important. At the global
level, this is the responsibility of Technical Committee 211 of the International Organization for
Standardization (ISO). In Europe, Technical Committee 287 of CEN (Comité Européen de
Normalisation) is engaged in the standardization of geographic information. “Geodata
Specification of Integrated Geographic Information System (InGIS)”, which was approved in
2000, is an example of GIS data standardization from Lithuania. It specifies the development of
georeference databases and must be followed by any administrator of such data.
² The most popular GIS software systems are introduced in later chapters
A GIS database stores information on the location of a geographic object as well as the
descriptive characteristics of this object. Even though attributes can be captured while digitizing
the geometry, it has proved more efficient to input them separately. Attribute data collection
is a relatively simple task and it can be undertaken by less qualified and thus cheaper personnel.
Attributes are entered using computer keyboards, transferred from various data loggers (e.g.,
GPS), or captured automatically using text recognition and, today, even voice recognition
techniques. Immediate checking of input data validity is a key requirement for attribute data
entry. E.g., parameters of forest compartments obtained during a field inventory are checked for
valid values during entry using their correlation with other parameters. Large GIS database
development projects require the attribute data to be entered twice and then compared as a
validation check.
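Both checks mentioned above, validation of values during entry and comparison of two independent entries, can be sketched in a few lines. The field names, valid ranges and sample records below are hypothetical, and the validity check is reduced to simple range tests rather than correlations between parameters.

```python
# Hypothetical valid ranges for forest compartment attributes.
VALID_RANGES = {"mean_height_m": (0.0, 60.0), "age_years": (0, 500)}

def invalid_fields(record):
    """Return the names of fields that are missing or outside their valid range."""
    bad = []
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        if value is None or not low <= value <= high:
            bad.append(field)
    return bad

def double_entry_mismatches(first_pass, second_pass):
    """Compare two independent entries keyed by object ID and list disagreements."""
    mismatches = []
    for object_id in sorted(first_pass.keys() | second_pass.keys()):
        a = first_pass.get(object_id, {})
        b = second_pass.get(object_id, {})
        for field in sorted(a.keys() | b.keys()):
            if a.get(field) != b.get(field):
                mismatches.append((object_id, field))
    return mismatches

first = {1: {"mean_height_m": 24.5, "age_years": 80},
         2: {"mean_height_m": 245.0, "age_years": 60}}   # 245.0 is a typing error
second = {1: {"mean_height_m": 24.5, "age_years": 80},
          2: {"mean_height_m": 24.5, "age_years": 60}}

print(invalid_fields(first[2]))
print(double_entry_mismatches(first, second))
```

The range check catches the implausible height immediately, while the double-entry comparison would also catch errors that happen to fall inside the valid range.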
Attributes of geographic features are usually stored separately from the corresponding shapes. A
unique object ID (also called a key) is used to relate the geometry and attributes of an object.
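Relating geometry and attributes through a key can be illustrated with two plain dictionaries. This is a sketch with made-up IDs, shapes and attribute names, not the storage model of any particular GIS package.

```python
def join_by_id(geometries, attributes):
    """Attach attribute records to shapes that share the same object ID."""
    joined = {}
    for object_id, shape in geometries.items():
        joined[object_id] = {"geometry": shape,
                             "attributes": attributes.get(object_id)}
    # Attribute records whose ID matches no shape usually indicate an entry error.
    orphans = sorted(set(attributes) - set(geometries))
    return joined, orphans

# Hypothetical data: two polygons and three attribute records.
geometries = {101: [(0, 0), (10, 0), (10, 10), (0, 10)],
              102: [(10, 0), (20, 0), (20, 10), (10, 10)]}
attributes = {101: {"land_use": "forest"},
              102: {"land_use": "meadow"},
              103: {"land_use": "arable"}}

features, orphans = join_by_id(geometries, attributes)
print(features[101]["attributes"], orphans)
```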
Data about data, or metadata, is another type of non-geometric data. GIS software
generates some metadata automatically, e.g., the coordinate system, the spatial extent of the
collected data, the parameters of attribute fields, etc. Other metadata (the developer of the data,
the contact authority, the parameters of quality assessment, etc.) must be provided separately.
Metadata is entered in the same way as other attributes of geographic objects.
The general principles of any GIS project apply to data collection: a clear plan for
implementation, capacities, financing and timing. In any data collection process there is a
trade-off between quality, speed and price (Fig. 4.11). High-quality data can be collected quickly,
but this is very expensive. If price is the key consideration, then data of lower quality are
collected over a longer period.
Fig. 4.11. Relationship between quality, price and speed in GIS data collection projects
Two strategies are possible in the development of GIS databases:
• Step-by-step. The data collection project is broken into separate stages. This approach
works with limited resources per time unit (e.g. per year); however, the total costs may be
significantly larger. This is a good approach for an inexperienced organization, which can
learn without hurrying. However, there is a large risk that technical solutions that look
optimal now will become outdated, that gradually trained personnel will move to a better
employer, etc.
• “Blitzkrieg”. The approach is to get everything at once. The advantages and disadvantages
of the step-by-step approach are reversed here.
Another important question to be answered before starting a GIS database development project is
whether data collection should use internal or external resources. Nowadays it is increasingly
common to contract data collection out to specialist companies operating in developing
countries (e.g., in India, Indonesia, etc.). However, EU projects restrict the transfer of production
to countries outside the EU with very low labour costs. Such specialist companies usually work
faster, more cheaply and with higher quality, but their services have to be paid for in real money.