Lecture 8 - Accuracy in GIS
Types of Errors
There is no such thing as perfect GIS data. This is a fact in any science, and
cartography is no exception. However, the imperfection of data and its effects
on GIS analysis were not considered in great detail until recent years. In the
last decade, GIS specialists came to accept that error, inaccuracy, and
imprecision can affect the quality of many types of GIS projects, in the sense
that errors that are not accounted for can turn the analysis in a GIS project into a
useless exercise. Understanding the error inherent in GIS data is critical to ensuring
that any spatial analysis performed using those datasets meets a minimum
threshold for accuracy. The saying "Garbage in, garbage out" applies all too well
when data that is inaccurate, imprecise, or full of errors is used during analysis.
The power of GIS resides in its ability to use many types of data related to the
same geographical area to perform the analysis, integrating different datasets
within a single system. But when a new dataset is brought into the GIS, the
software imports not only the data but also any error the data contains.
The first step in dealing with the problem of error is being aware of it and
understanding the limitations of the data being used.
Precision refers to how exactly data is described; accuracy refers to how closely
the recorded values match the true values. Precise data may still be inaccurate,
because it may be exactly described but inaccurately gathered (perhaps the
surveyor made a mistake, or the data was recorded incorrectly in the database).
PRECISION VERSUS ACCURACY
In the series of images above, the concept of precision versus accuracy is
visualized. The crosshair in each image represents the true value of the entity,
and the red dots represent the measured values. Image A is precise and
accurate, image B is precise but not accurate, image C is accurate but imprecise,
and image D is neither accurate nor precise. Understanding both accuracy and
precision is important for assessing the usability of a GIS dataset. When a
dataset is inaccurate but highly precise, corrective measures can be taken to
shift the dataset and make it more accurate.
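To make the distinction concrete, the short sketch below (a hypothetical example, not part of the lecture material) compares repeated measurements of a known value: the offset of their mean from the true value reflects inaccuracy (bias), while their spread reflects imprecision. This corresponds to image B, precise but not accurate, and the constant bias is exactly the kind of systematic error that can be corrected afterwards.

```python
import statistics

# Repeated measurements of a coordinate whose true (surveyed) value is known.
# Values are invented for illustration.
true_value = 100.0
measurements = [100.8, 100.9, 101.0, 100.9, 100.8]  # metres

mean = statistics.mean(measurements)
bias = mean - true_value                  # accuracy: closeness to the true value
spread = statistics.stdev(measurements)  # precision: repeatability of the measurements

print(f"bias (inaccuracy): {bias:.2f} m")      # ~0.88 m off the true value
print(f"spread (imprecision): {spread:.2f} m") # ~0.08 m, tightly clustered
```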
Assessing error involves considering both the imprecision of data and its inaccuracies.
The age of data is another obvious source of error. When data sources are
too old, some, or even most, of the underlying information may have changed. GIS
users should always consider the age of a dataset and its lack of currency
before using it for contemporary analysis.
Some types of errors are created when formatting data for processing.
Changes in scale, reprojections, and conversions between raster and vector formats are all
examples of possible sources of formatting errors.
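As a small illustration of how a format change alone can introduce error, the sketch below (assuming the pyproj library; the CRS codes and the test point are arbitrary choices, not from the lecture) reprojects a coordinate and then projects it back, and measures the round-trip difference. Conversions between raster and vector, or changes in scale, typically introduce much larger differences than this simple example.

```python
from pyproj import Transformer

# Forward and inverse transformations between geographic coordinates and Web Mercator.
fwd = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
inv = Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)

lon, lat = -3.70379, 40.41678        # an arbitrary test point
x, y = fwd.transform(lon, lat)       # project to Web Mercator
lon2, lat2 = inv.transform(x, y)     # project back to geographic coordinates

# Any difference is error introduced purely by the reprojection round trip.
print(abs(lon - lon2), abs(lat - lat2))
```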
Other sources of error may be less obvious; some originate at the moment of
initial measurement, and others are introduced by users at the moment the data
is captured.
We also have to pay attention to what has been defined as positional accuracy,
which depends on the type of data. Cartographers can accurately locate
well-defined features such as roads and boundary lines, but other data with a less
defined position in space, such as soil types, may only have an approximate location
based on the cartographer's estimation. Still other features, such as climate,
lack defined boundaries in nature and are therefore subject to subjective
interpretation.
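One common way to quantify positional accuracy for well-defined features, sketched below with invented coordinates (the method is standard practice, not something described in the lecture itself), is the root-mean-square error (RMSE) between digitized positions and independently surveyed reference positions.

```python
import math

# Digitized point positions and their independently surveyed reference positions.
digitized = [(1000.2, 2000.1), (1500.4, 2500.3), (1800.1, 2700.6)]
reference = [(1000.0, 2000.0), (1500.0, 2500.0), (1800.0, 2700.0)]

# Squared horizontal distance between each digitized point and its reference point.
sq_errors = [
    (xd - xr) ** 2 + (yd - yr) ** 2
    for (xd, yd), (xr, yr) in zip(digitized, reference)
]
rmse = math.sqrt(sum(sq_errors) / len(sq_errors))
print(f"horizontal RMSE: {rmse:.2f} map units")
```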
Topological errors often occur during the digitizing process. Operator errors
may result in polygon knots and loops, and some errors are
associated with damaged source maps as well.
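A quick way to catch knots and loops of this kind, sketched below under the assumption that the shapely library is available (the coordinates are a made-up example, not taken from the lecture's figure), is to run a geometry validity check on each digitized polygon: a self-intersecting "bow-tie" ring is reported as invalid.

```python
from shapely.geometry import Polygon
from shapely.validation import explain_validity

# A "bow-tie" polygon of the kind produced by a digitizing knot or loop:
# the ring crosses over itself.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)])

print(bowtie.is_valid)           # False: the ring self-intersects
print(explain_validity(bowtie))  # reports the self-intersection and its location
```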
EXAMPLES OF TOPOLOGICAL ERRORS IN GIS. SOURCE: TONY ROTONDAS.
We can never forget that inaccuracy, imprecision, and the resulting error may
be compounded in a GIS project when more than one data source is employed.
In these projects, one error leads to another, compounding its
effects on the analysis and affecting the entire project. For that reason, the
best way to avoid the dangers of error propagation
is to always prepare a data quality report for data created by GIS
users, even if they don't plan to share the data with others. The use
of metadata (data about the data) is one of the first tools that any GIS user
should consult in order to learn more about the data they are using and to
avoid adding more error to data that, in any case, will never be perfect. Any
good metadata should always include basic information such as the age of the
data, its origin, the area it covers, scale, projection system, accuracy, and format.
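As a minimal sketch of what such a metadata record might hold, the example below stores the basic fields listed above in a plain Python dictionary. The field names and values are illustrative only; real projects would normally follow a formal metadata standard.

```python
# Illustrative metadata record for a single GIS layer.
layer_metadata = {
    "title": "Example roads layer",
    "date_created": "2015-06-01",                              # age of the data
    "source": "Digitized from 1:50,000 topographic sheets",    # origin
    "extent": (-4.0, 40.0, -3.0, 41.0),                        # area covered (xmin, ymin, xmax, ymax)
    "scale": "1:50,000",
    "projection": "EPSG:4326",
    "positional_accuracy_m": 25.0,
    "format": "vector (line)",
}

# A user consulting the metadata before analysis might check, for example,
# whether the data is recent and accurate enough for the intended purpose.
print(layer_metadata["date_created"], layer_metadata["positional_accuracy_m"])
```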
There are several types of digitizing methods. Manual digitizing involves tracing
geographic features from an external digitizing tablet using a puck (a type of
mouse specialized for tracing and capturing geographic features from the
tablet). Heads up digitizing (also referred to as on-screen digitizing) is the
method of tracing geographic features from another dataset (usually an aerial,
satellite image, or scanned image of a map) directly on the computer screen.
Automated digitizing involves using image processing software that contains
pattern recognition technology to generate vectors. More detail about
creating geographic data can be found in this article: Methods for Creating
Spatial Databases.
During the digitizing process, vectors are connected to other lines by a node,
which marks the point of intersection. Vertices are the points that define the
shape of an unbroken line. Every line has a starting point, known as the starting
node, and an ending node. If the line is not straight, any bends and
curves along it are defined by vertices (a vertex for a single bend). Any
intersection of two lines is denoted by a node at the point of intersection.
Dangles, or dangling nodes, are line endpoints that are not connected but should be. With
dangling nodes, gaps occur in the line work where the two lines should be
connected. Dangling nodes also occur when a digitized polygon does not
connect back to itself, leaving a gap where the two end nodes should have
connected and creating what is called an open polygon.
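A simple dangle check can be sketched directly from these definitions: an end node that does not coincide with any other line's end node within a snap tolerance is flagged as dangling. The example below uses invented coordinates and a hypothetical tolerance value; a real GIS would also test endpoints against line intersections, not just other endpoints.

```python
import math

# Each line is a list of (x, y) vertices; the first and last entries are its end nodes.
lines = [
    [(0.0, 0.0), (5.0, 0.0)],      # line A
    [(5.05, 0.0), (5.0, 5.0)],     # line B: its start almost touches A's end
    [(10.0, 10.0), (12.0, 12.0)],  # line C: isolated, both ends dangle
]
SNAP_TOLERANCE = 0.1               # map units (hypothetical value)

def near(p, q, tol=SNAP_TOLERANCE):
    # True when two points fall within the snap tolerance of each other.
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= tol

for i, line in enumerate(lines):
    for end in (line[0], line[-1]):      # starting node and ending node
        connected = any(
            near(end, other_end)
            for j, other in enumerate(lines) if j != i
            for other_end in (other[0], other[-1])
        )
        if not connected:
            print(f"line {i}: dangling node at {end}")
```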
Slivers
Slivers are small, unintended gaps or overlaps between adjacent polygons in a
digitized polygon layer. Where adjoining polygons fail to meet, thin gaps are left
between them; setting the proper snap tolerance is critical for ensuring that the
edges of adjoining polygons snap together and eliminate those gaps. Where two
adjacent polygons overlap in error, the thin area of overlap is also called a sliver.
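The sketch below, which assumes the shapely library and uses invented coordinates, shows how an overlap sliver between two adjacent polygons can be detected by intersecting them and checking for a small but non-zero shared area.

```python
from shapely.geometry import Polygon

# Two polygons that should share a clean edge at x = 5, but the left one was
# digitized slightly over the right one.
left = Polygon([(0, 0), (5.1, 0), (5.1, 5), (0, 5)])
right = Polygon([(5.0, 0), (10, 0), (10, 5), (5.0, 5)])

sliver = left.intersection(right)   # the thin strip covered by both polygons
print(sliver.area)                  # small, non-zero area indicates an overlap sliver

# A gap sliver could be found the same way: the area of the intended extent
# not covered by the union of the digitized polygons.
```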