Spatial Data Quality
“But there are also unknown unknowns: the ones we don't know we don't
know.” Donald Rumsfeld
[Figure: scale bar, 0 to 2 miles]
Scale Examples
Common Scales
• 1:200 (1”=16.7ft)
• 1:2,000 (1”=56 yards; 1cm=20m)
• 1:20,000 (5cm=1km)
• 1:24,000 (1”=2,000ft)
• 1:25,000 (1cm=.25km)
• 1:50,000 (2cm=1km)
• 1:62,500 (1.6cm=1km; 1”=.986mi)
• 1:63,360 (1”=1mile; 1cm=.634km)
• 1:100,000 (1”=1.58mi; 1cm=1km)
• 1:500,000 (1”=7.9mi; 1cm=5km)
• 1:1,000,000 (1”=15.8mi; 1cm=10km)
• 1:7,500,000 (1”=118mi; 1cm=75km)
Large versus Small
• large: above 1:12,500
• medium: 1:13,000 - 1:126,720
• small: 1:130,000 - 1:1,000,000
• very small: below 1:1,000,000
(really, relative to what’s available for a given area; Maling 1989)
Map sheet examples:
• 1:24,000: 7.5 minute USGS Quads (17 by 22 inches; 6 by 8 miles)
• 1:7,500,000: US wall map (26 by 16 inches)
• 1:20,000,000: US 8.5” X 11”
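The equivalences quoted for each scale follow directly from the representative fraction: one map unit stands for `denominator` ground units. A minimal sketch of that arithmetic, assuming hypothetical helper names that do not come from the slides:

```python
INCHES_PER_FOOT = 12
FEET_PER_MILE = 5280
CM_PER_KM = 100_000


def ground_feet_per_map_inch(denominator):
    """Ground distance in feet covered by one map inch at scale 1:denominator."""
    return denominator / INCHES_PER_FOOT


def ground_miles_per_map_inch(denominator):
    """Ground distance in miles covered by one map inch at scale 1:denominator."""
    return ground_feet_per_map_inch(denominator) / FEET_PER_MILE


def ground_km_per_map_cm(denominator):
    """Ground distance in kilometers covered by one map centimeter."""
    return denominator / CM_PER_KM


for d in (24_000, 63_360, 100_000):
    print(
        f"1:{d:,}: 1 in = {ground_feet_per_map_inch(d):,.0f} ft "
        f"({ground_miles_per_map_inch(d):.2f} mi); "
        f"1 cm = {ground_km_per_map_cm(d):.2f} km"
    )
```

The same three functions can be used to check any entry in a scale list before it goes onto a map legend.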
Scale, Resolution & Accuracy in GIS Systems
• On paper maps, scale is hard to change, thus it generally determines resolution and
accuracy--and consistent decisions are made for these.
• A GIS is scale independent since output can be produced at any scale, irrespective
of the characteristics of the input data— at least in theory
• in practice, an implicit range of scales or maximum scale for anticipated output
should be chosen and used to determine:
• what features to show
• manholes only on large scale maps
• how features will be represented
• manhole a polygon at 1:50; cities a point at 1:1,000,000
• appropriate levels for accuracy and precision
• Larger scale generally requires greater resolution
• Larger scale necessitates a higher level of accuracy
• GIS also helps with the generalization problem implicit in paper maps
• A road drawn with a 0.5 mm wide line (the smallest for decent visibility)
• At 1:24,000 implies the road is 12 meters (about 39 feet) wide
• At 1:250,000 implies the road is 125 meters (about 410 feet) wide
• At least in a GIS you can store the true road width, but be careful with plots!
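The road-width figures come from multiplying the drawn line width by the scale denominator. A small sketch of that calculation, assuming a hypothetical function name and the standard conversion of 3.28084 feet per meter:

```python
MM_PER_M = 1000
FEET_PER_M = 3.28084


def implied_ground_width_m(line_width_mm, denominator):
    """Ground width in meters implied by a line of the given width at 1:denominator."""
    return line_width_mm * denominator / MM_PER_M


for d in (24_000, 250_000):
    width_m = implied_ground_width_m(0.5, d)
    print(f"1:{d:,}: 0.5 mm line implies {width_m:.0f} m ({width_m * FEET_PER_M:.0f} ft)")
```

Running the same check against a plot scale before printing is one way to "be careful with plots": it shows how much wider than the stored road width the symbol will appear.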
Precision or Resolution
it’s not the same as scale or accuracy!
• e.g. coarsening pixel size from 1.6 ft to 3.2 ft needs 1/4 of the storage; to 6.4 ft, 1/16 of the storage
• resolution and scale
• generally, increasing to larger scale allows features to be observed better and requires higher resolution
• but, because of the human eye’s ability to recognize patterns, features in a lower resolution data set can
sometimes be observed better by decreasing the scale
(6.4 ft resolution shown at 1:400 rather than 1:200)
• resolution and positional accuracy
• you can see a feature (resolution), but it may not be in the right place (accuracy)
• higher accuracy generally costs much more to obtain than higher resolution
• accuracy cannot be greater (but may be much less) than resolution (e.g. if pixel size is one meter, then best
accuracy possible is one meter)
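The storage fractions quoted for coarser pixels follow from the fact that the number of cells covering a fixed area falls with the square of the cell size. A minimal sketch, assuming an uncompressed raster over a fixed extent and a hypothetical function name:

```python
def relative_storage(old_cell_size, new_cell_size):
    """Fraction of the original storage needed after resampling to a new cell size.

    Assumes the same extent and the same number of bytes per cell.
    """
    return (old_cell_size / new_cell_size) ** 2


for new_size in (3.2, 6.4):
    fraction = relative_storage(1.6, new_size)
    print(f"1.6 ft -> {new_size} ft cells: {fraction:.4f} of the original storage")
```

Compression and tiling change the absolute numbers in practice, but the square-law relationship between cell size and cell count still holds.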
Accuracy: rests on at least four legs, not one!
Positional Accuracy (sometimes called Quantitative accuracy)
Spatial
• horizontal accuracy: distance from true location
• vertical accuracy: difference from true height
Temporal
• Difference from actual time and/or date
Attribute Accuracy or Consistency-- the validity concept in experimental design/stat. inf.
• a feature is what the GIS/map purports it to be
• a railroad is a railroad, and not a road
• A soil sample agrees with the type mapped
Completeness--the reliability concept from experimental design/stat. inf.
• Are all instances of a feature the GIS/map claims to include, in fact, there?
• Partially a function of the criteria for including features: when does a road become a track?
• Simply put, how much data is missing?
Logical Consistency: the absence of contradictory relationships in the database. Typical violations:
• Non-Spatial
• Some crimes recorded at place of occurrence, others at place where report taken
• Data for one country is for 2000, for another it’s for 2001
• Annual data series not taken on same day/month etc. (sometimes called lineage error)
• Data uses different source or estimation technique for different years (again, lineage)
• Spatial
• Overshoots and gaps in road networks or parcel polygons
Sources of Error
Error is the inverse of accuracy. It is a discrepancy
between the coded and actual values.
Sources
• Inherent instability of the phenomenon itself
• E.g. random variation of most phenomena (e.g. leaf size)
• Measurement
• E.g. surveyor or instrument error
• Model used to represent data
• E.g. choice of spheroid, or classification systems
• Data encoding and entry
• E.g. keying or digitizing errors
• Data processing
• E.g. single versus double precision; algorithms used
• Propagation or cascading from one data set to another
• E.g. using inaccurate layer as source for another layer
Example for Positional Accuracy
• choice of spheroid and datum
• choice of map projection and its parameters
• accuracy of measured locations (surveying) of features on earth
• media stability (stretching, folding, wrinkling of maps, photos)
• human drafting, digitizing or interpretation error
• resolution &/or accuracy of drafting/digitizing equipment
• Thinnest visible line: 0.1-0.2 millimeters
• At scale of 1:20,000 = 6.6 - 13.1 feet (20,000 x 0.2 = 4,000mm = 4m = 13.1 feet)
• registration accuracy of tics
• machine precision: coordinate rounding error in storage and manipulation
• other unknown
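The "single versus double precision" and "machine precision" items can be made concrete. The sketch below round-trips a coordinate through 4-byte single precision using Python's standard `struct` module; the coordinate value is a hypothetical projected northing in meters, chosen only for illustration:

```python
import struct


def as_float32(value):
    """Round-trip a Python float (double precision) through 4-byte single precision."""
    return struct.unpack("f", struct.pack("f", value))[0]


northing_m = 3_600_000.4  # hypothetical projected coordinate, in meters
stored = as_float32(northing_m)
print(f"double: {northing_m!r}")
print(f"single: {stored!r}")
print(f"rounding error: {stored - northing_m:+.3f} m")
```

For values between 2,097,152 and 4,194,304, adjacent single-precision numbers are 0.25 apart, so a coordinate of this size can be off by up to 0.125 m from storage alone, before any measurement or digitizing error is counted.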
Measurement of Positional Accuracy
• usually measured by root mean square error: the square root of the average squared
errors
• Usually expressed as a probability that no more than P% of points will be further than S
distance from their true location.
• Loosely we say that the rmse tells us how far recorded points in the GIS are from their
true location on the ground, on average.
• More correctly, based on the normal distribution of errors, 68% of points will be rmse
distance or less from their true location, 95% will be no more than twice this distance,
providing the errors are random and not systematic (i.e. the mean of the errors is zero)
• e.g. for NTGISC digital orthos RMSE is about 3.3 feet (one meter)
• for USGS Digital Ortho Quads the RMSE spec. is approx. 33 feet or 10 meters (but in reality much better)
• with GPS, height is in practice 2 or 3 times less accurate than horizontal position at high precision (officially the spec is 1.5 times, but data collection errors affect the vertical the most)
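The RMSE definition above can be sketched in a few lines. The check-point errors here are made-up values for illustration only, and `share_within` is a hypothetical helper for comparing a sample against the 68% and 95% rules of thumb:

```python
import math


def rmse(errors):
    """Root mean square error: square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def share_within(errors, distance):
    """Fraction of errors whose absolute value is no more than the given distance."""
    return sum(abs(e) <= distance for e in errors) / len(errors)


# Hypothetical check-point errors in meters (GIS position minus surveyed position).
errors_m = [0.4, -1.1, 0.7, -0.3, 1.5, -0.9, 0.2, -0.6, 1.0, -0.8]

r = rmse(errors_m)
print(f"RMSE = {r:.2f} m")
print(f"within 1 x RMSE: {share_within(errors_m, r):.0%}")
print(f"within 2 x RMSE: {share_within(errors_m, 2 * r):.0%}")
```

With only a handful of check points the observed shares will not match 68% and 95% exactly; those figures assume many points with random, zero-mean, normally distributed errors.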
National Map Accuracy Standards: 1941/47
• established in 1941 by the US Bureau of the Budget (now OMB) for use with US
Geological Survey maps (Maling, 1989, p. 146)
• horizontal accuracy: not more than 10% of tested, ‘well defined’ points shall be more
than the following distances from their true location:
• 1:62,500: 1/50th of an inch (.02”)
• 1:24,000: 1/40th of an inch (amended to 1/50=.02” in 1947)
• 1:12,000: 1/30 of an inch (.033”)
• the dividing line is 1:20,000: smaller scales use 1/50=.02”; larger scales use 1/30=.033”
• Thus, on maps with a scale of 1:63,360 (1”=1 mile) 90% of points should be within 105.6 feet [(63,360 X .02)/12] of their true location.
• on USGS quads with a scale of 1:24,000 (1”=2,000ft) 90% of points should be within
40 feet [(24,000 X .02)/12] of their true location.
• on a map with a scale of 1:12,000 (1”=1,000ft), 90% of points should be within 33
feet (1,000 X .033), approx. 10 meters
• gives rise to the loose, but often used, statement that the “NMAS is 10 meters”
• Inadequate for the computer age
• how many points? how are they selected?
• how is their ‘true’ location determined?
• what about attribute completeness?
• Unfortunately, the “new standard” doesn’t address all these issues either
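The NMAS horizontal rule can be sketched as a small calculation: convert the map tolerance (1/30 inch for scales larger than 1:20,000, 1/50 inch for 1:20,000 and smaller) to ground feet, then check that no more than 10% of tested points exceed it. Function names and the sample errors are hypothetical:

```python
def nmas_tolerance_inches(denominator):
    """Map-distance tolerance in inches: 1/30 for scales larger than 1:20,000, else 1/50."""
    return 1 / 30 if denominator < 20_000 else 1 / 50


def nmas_tolerance_ground_feet(denominator):
    """The same tolerance expressed as ground distance in feet."""
    return nmas_tolerance_inches(denominator) * denominator / 12


def passes_nmas(errors_ft, denominator):
    """True if no more than 10% of tested points exceed the ground tolerance."""
    tolerance = nmas_tolerance_ground_feet(denominator)
    exceeding = sum(abs(e) > tolerance for e in errors_ft)
    return exceeding / len(errors_ft) <= 0.10


for d in (63_360, 24_000, 12_000):
    print(f"1:{d:,}: 90% of points within {nmas_tolerance_ground_feet(d):.1f} ft")

# Hypothetical horizontal errors (feet) for ten well-defined test points.
print(passes_nmas([10, 12, 8, 30, 5, 22, 18, 9, 55, 14], 24_000))
```

The sketch deliberately leaves open the questions raised above: it says nothing about how many points to test, how to pick them, or how their "true" locations are established.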
National Standard for Spatial Data Accuracy (NSSDA), 1998
[Figure: accuracy according to the chart versus in reality]
Summary:
Resolution, Scale, Accuracy & Storage:
illustrating the relationship
Go to quality_graphics.ppt
Lineage
• identifies the original sources from which the data was
derived
• details the processing steps through which the data has
gone to reach its current form
• Both impact its accuracy
• Both should be in the metadata, and are required by the
Content Standard for Metadata (see below)
• Michael Goodchild (the guru of GIS) advocates:
• Measurement-based GIS, in which how the data was collected and how the
measurements were made are part of the record (as in surveying)
• Coordinate-based GIS is the current approach, and it tracks none
of this.
(see Shi, Fisher and Goodchild, Spatial Data Quality, London: Taylor and Francis, 2002)
Currency: Is my data “up-to-date”?
• data is always relative to a specific point in time, which must be
documented.
• there are important applications for historical data (e.g. analyzing trends),
so don’t necessarily trash old data
• “current” data requires a specific plan for on-going maintenance
• may be continuous, or at pre-defined points in time.
• otherwise, data becomes outdated very quickly
• currency is not really an independent quality dimension; it is
simply a factor contributing to lack of accuracy regarding
• consistency: some GIS features do not match those in the real world today
• completeness: some real world features are missing from the GIS database
Many organizations spend substantial amounts acquiring a data set without giving
any thought to how it will be maintained.
Standards: common “agreed-to” ways of doing things
• May exist for:
• Data itself [including process (the way it’s produced) and product (the outcome)]
• Utilities Data Content Standard, FGDC-STD-010-2000
• Accuracy of data
• Geospatial Positioning Accuracy Standard, Part 3, National Standard for Spatial Data Accuracy, FGDC-STD-007.3-1998
• Documentation about the data (metadata)
• Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998
• Transfer of data and its documentation
• Spatial Data Transfer Standard (SDTS), FGDC-STD-002
• For symbology and presentation
• Digital Geologic Map Symbolization
• May address:
• Content (what is recorded)
• Format (how it’s recorded: file format, .tif, shapefile, etc)
• May be a product of:
• An organization’s internal actions [private or organization standards]
• An external government body (Federal Geographic Data Committee) or third sector
body (Open GIS Consortium) [public or de jure standards]
• Laissez-faire market-place-forces leading to one dominant approach e.g. “Wintel
standard” [industry or de facto standards]
http://www.fgdc.gov/standards/
Who Sets Public Standards?
• Federal Geographic Data Committee
• Sets standards for geospatial data which all federal agencies are
required to follow
• Has representatives from most federal agencies
• National Institute of Standards and Technology (NIST) sets federal gov.
standards for other things (e.g. IT in general)
• national standards bodies
• American National Standards Institute (ANSI)
• has the US’s single vote at ISO
• InterNational Committee for Information Technology
Standards (INCITS) handles IT standards for ANSI
• Several FGDC standards have been submitted for approval
• Most countries in the world have their equivalent to ANSI
• international standards bodies
• ISO (International Organization for Standardization)
• other assorted vendor groups, professional associations, trade
associations, and consortia
• Open GIS Consortium (OGC) is the main player in GIS
The Process for Setting de jure standards!
Source: URISA News Issue 197, Sept/Oct. 2003
Go to the following web site for an excellent overview of the standards-making process:
http://www.fgdc.gov/publications/documents/standards/
Adopting Standards: What you should do
• Data quality achieved by adoption and use of standards: Do it!
• Common ways of doing things essential for using & sharing data internally
and externally
• only federal agencies are required to use FGDC standards; it’s optional
for any others (e.g. state, local)
• power of feds often results in adoption by everybody, although there are
some noted failures (e.g. the OSI, GOSIP, & POSIX standards in computing in
the 1980s failed and were withdrawn)
• FGDC or ISO standards provide excellent starting point for local
standards, and should be adopted unless there are compelling
reasons otherwise
• Standards for metadata (“documenting your data”) are the most
important and should be first priority.
• Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998
• ISO Document 19115 Geographic Information-Metadata (content) and 19139, Geographic
Information—Metadata—Implementation Specification, (format for storing ISO 19115
metadata in XML format)
• If you do not adopt one of these metadata standards, adopt some standard!
Content Standards for Digital Geospatial Metadata
What and Why?
By law (Executive Order 12906, 1994), all federal agencies must document their data according to:
Content Standard for Digital Geospatial Metadata (version 2.0), FGDC-STD-001-1998
Traditional Minimum Documentation Requirements for Maps/GIS
• geodetic datum name (e.g. NAD27)--which implies:
• ellipsoid/spheroid name (earth model) e.g. Clarke 1866
• point of origin (ties ellipsoid to earth) e.g. Meades Ranch
• required for all GIS data bases and maps
• projection name and its parameters and its measurement units
(see terrestrial lecture for exact details)
• Required for all maps since 2-D by nature
• Required for GIS if data is in X-Y projected form
(If GIS data in lat/long, must know datum. If GIS data in XY, must know datum and projection info.)
• Source information
• accuracy standard(s) to which built
• author/publisher/creator name and/or data source
• date(s) of data collection/update, and of map/gis creation
• Cartographers demand all maps have
• north arrow
• map scale
• graticule indication
• at least four latitude/longitude tic marks, with values in degrees
• at least four X-Y tic marks, with values and units of measurement (feet, meters, etc.)
(tic marks: points of positional reference used to relate map to ground or other map)
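The minimum documentation list lends itself to a simple automated check. The sketch below is only an illustration: the field names are hypothetical and are not element names from the FGDC Content Standard for Digital Geospatial Metadata. It encodes the rule that a datum is always required, while projection details are required only for X-Y projected data:

```python
# Hypothetical field names, chosen for illustration only.
REQUIRED_ALWAYS = ("datum", "accuracy_standard", "source", "collection_date")
REQUIRED_IF_PROJECTED = ("projection", "projection_parameters", "units")


def missing_documentation(record):
    """Return the required fields that are absent or empty in a metadata record."""
    required = list(REQUIRED_ALWAYS)
    if record.get("coordinate_type") == "projected":
        required += REQUIRED_IF_PROJECTED
    return [field for field in required if not record.get(field)]


layer = {
    "coordinate_type": "projected",
    "datum": "NAD27",
    "accuracy_standard": "NMAS",
    "source": "USGS 7.5 minute quad",
    "collection_date": "1998",
    "projection": "State Plane",
}
print(missing_documentation(layer))
```

A check like this does not replace a full metadata standard, but it catches the omissions that make a data set unusable: coordinates with no datum, or X-Y values with no projection or units.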
Texas Standards
http://www.dir.state.tx.us/tgic/pubs/pubs.htm
• Standards for digital spatial data (raster and vector) for State
agencies in Texas were established in 1992
• http://www.dir.state.tx.us/tgic/pubs/gis-standards.htm
• Currently (2004), being reviewed by the Texas Geographic Information
Council (TGIC) for possible update
• Apply to map scales of 1:24,000 and smaller (e.g., 1:100,000; 1:250,000).
• Cover variety of issues including data layers, datum, projections, accuracy,
metadata, etc.
• Two major planning reports on GIS in state gov. in Texas are:
• Digital Texas: 2002 Biennial Report on Geographic Information Systems
Technology
• http://www.dir.state.tx.us/tgic/pubs/digtex-lowres.pdf
• Geographic Information Framework for Texas (1999)
• http://www.dir.state.tx.us/tgic/pubs/gift99-small.pdf
Importance of Standards
Great Baltimore Fire of 1904 - fire engines from different
regions responded, only to prove useless since their
hose couplings were of different sizes and did not fit Baltimore
hydrants - fire burned over 30 hours and destroyed
1,526 buildings covering 17 city blocks.
Fire of 1923 - Fall River, MA saved when over 20 neighboring
fire departments responded to a town fire, since they had
standardized on hydrant and hose coupling sizes.
9/11: Response in NY and DC severely hampered by
incompatibilities between GIS data sets, and lack of data.
Also, incompatibilities between communications systems.
The most important standard?
Railroad track gauge - adopted by US, UK, Canada, and much of
Europe.
South America still hampered by differing railroad gauges between
countries.
The Best Time to Adopt a Standard?
Now? Before!
Appendix
FGDC Standards
(status as of March 2004)
For latest, go to:
http://www.fgdc.gov/standards/standards.html
FGDC: Metadata Standards
Metadata:
• Content Standard for Digital Geospatial Metadata (version 2.0)
FGDC-STD-001-1998
• Content Standard for Digital Geospatial Metadata, Part 1:
Biological Data Profile FGDC-STD-001.1-1999
• Metadata Profile for Shoreline Data (FGDC-STD-001.2-2001)
• Content Standard for Digital Geospatial Metadata: extension for
remote sensing data (FGDC-STD-012-2002)
• Encoding Standard for Geospatial Metadata (Draft)
• Metadata Profile for Cultural and Demographic Data (dropped)
Current thrust is to integrate FGDC Metadata standards (and other FGDC standards
eventually) into International Organization for Standardization (ISO) standards.
FGDC: Data Accuracy Standard
Geospatial Positioning Accuracy Standard (FGDC-STD-007)
Part 1, Reporting Methodology FGDC-STD-007.1-1998
Part 2, Geodetic Control Networks FGDC-STD-007.2-1998
Part 3, National Standard for Spatial Data Accuracy FGDC-STD-007.3-1998
Part 4: Architecture, Engineering, Construction, and Facilities
Management (FGDC-STD-007.4-2002)
Part 5: Standard for Hydrographic Surveys and Nautical Charts (Review)