The document discusses the importance of acquiring geographic data in cartography, noting that data acquisition can consume a significant portion of project resources. It highlights various sources of errors in geographic data, including inaccuracies in measurements and outdated information, and emphasizes the necessity of understanding these errors to ensure the quality of spatial analysis. Additionally, it outlines different types of attribute data and measurement scales, explaining their relevance in GIS analysis.
Lecture 9 - Data Types and Errors
CGB 213: Principle of Cartography - Lecture 9 - Geographic Data Errors and Attribute Data Types

Introduction

Acquiring geographic data is crucial in any cartographic effort. It has been estimated that data acquisition typically consumes 60 to 80 percent of the time and money spent on any given project. Geographic data may carry errors, and users should be aware of them.

Geographic Data Errors

Cartographic data is not perfect. Like any other data, it can contain errors or inaccuracies that affect the final output. Common sources of error in cartographic data include incomplete or outdated data sources, errors in data entry or conversion, imprecise or inaccurate measurements, and inherent limitations of the data collection method.

The power of GIS and mapping lies in the ability to integrate many types of data about the same geographical area within a single system and to analyze them together.
When a new dataset is loaded into a GIS software application, the software imports not only the data but also the errors that the data contains. The first step in dealing with error is to be aware of it and to understand the limitations of the data being used.

Those who work with cartographic data should understand that error, inaccuracy, and imprecision affect the quality of many types of cartographic projects: errors that are not accounted for can turn an analysis into a pointless exercise. Understanding the errors inherent in cartographic data is critical to ensuring that any spatial analysis performed with those datasets meets a minimum threshold for accuracy. The saying “Garbage in, garbage out” applies all too well when inaccurate, imprecise, or error-ridden data is used in analysis.

Accuracy and Precision

Accuracy and precision are both important aspects of cartographic data quality, but they refer to different things.
To understand the relevance of accuracy and precision, we should start with the difference between the two terms.

Accuracy

Accuracy is the degree to which the information on a map matches the values in the real world. When we refer to accuracy, therefore, we are talking about the quality of the data and the number of errors contained in a dataset. In cartographic data, accuracy can refer to geographic position, but it can also refer to attribute or conceptual accuracy.

Precision

Precision refers to the level of measurement and exactness of description in a GIS database.
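The two definitions can be contrasted with a small numerical sketch. Everything in it is made up for illustration: the reference latitude, the two recorded values, and the helper names are assumptions, not survey data. It treats precision as the number of decimal places recorded and accuracy as the distance from the reference value.

```python
# Hypothetical reference latitude of a survey marker, in decimal degrees.
REFERENCE_LAT = -24.6541

# Two made-up database entries for the same marker, kept as text so the
# number of recorded decimal places (the precision) is preserved.
records = {
    "precise_but_inaccurate": "-24.712345",  # six decimals, far from the reference
    "coarse_but_accurate": "-24.65",         # two decimals, close to the reference
}


def decimal_places(text):
    """Count the digits recorded after the decimal point."""
    return len(text.split(".")[1]) if "." in text else 0


def summarize(text, reference):
    """Return the recorded precision and the absolute error of one entry."""
    return {
        "decimals": decimal_places(text),
        "abs_error": abs(float(text) - reference),
    }


summary = {name: summarize(text, REFERENCE_LAT) for name, text in records.items()}

for name, stats in summary.items():
    print(f"{name}: {stats['decimals']} decimals, error {stats['abs_error']:.4f} degrees")
```

In this sketch the entry with more decimal places is the one further from the reference, which shows that recording more digits does not by itself make a value more correct.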
Precise data may still be inaccurate, because it may be exactly described but inaccurately gathered (perhaps the surveyor made a mistake, or the data was entered into the database incorrectly).

Sources of Inaccuracy and Imprecision

Scale is an inherent source of error in cartography: depending on the scale used, different types of data can be represented in different quantity and quality. Cartographers should adapt the scale of work to the level of detail needed in their projects.

The age of data is another obvious source of error. When data sources are too old, some or much of the underlying information may have changed. Cartographers should always check how current a dataset is before using it for analysis.

Attribute Errors

Label or attribute errors are a common mistake.
For instance, agricultural land may be incorrectly marked as forest, an error that the map user may not notice if they are unfamiliar with the area in question.

Positional Accuracy of GIS Data

Cartographers can accurately locate certain features, such as roads and boundary lines, but data with less clearly defined positions in space, such as soil types, may be given only an approximate location based on the cartographer’s estimate. Other features, such as climate, lack defined boundaries in nature and are therefore subject to interpretation by the data producer or cartographer.

Topological Errors

Topological errors often occur during the digitizing process.

Attribute Data Types
The type of data that we employ to help us understand a given entity is determined by (1) what we are examining, (2) what we want to know about that entity, and (3) our ability to measure that entity at the desired scale. The most common data types available in a GIS are alphanumeric strings, numbers, Boolean values, dates, and binaries.

An alphanumeric string, or text, data type is any simple combination of letters and numbers that may or may not form coherent words. The character (or string) property is for text-based values, such as the name of a street, or descriptive values, such as the condition of a street.

The number data type can be subcategorized as either floating-point or integer. A floating-point value is any data value containing decimal digits, while an integer is any data value not containing decimal digits. Integers can be short or long, depending on the number of significant digits.

Boolean, date, and binary values are less complex. Boolean values are simply true or false; they can be combined and tested with Boolean operators such as AND, OR, and NOT.
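As a rough illustration of these data types, the sketch below defines a hypothetical attribute record for a road segment and selects records with Boolean operators. The class name, field names, and sample values are all invented for this example and do not come from any particular GIS package.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RoadSegment:
    """Hypothetical attribute record; every field name is illustrative."""
    name: str          # alphanumeric string
    condition: str     # descriptive string
    lanes: int         # integer
    length_km: float   # floating point
    is_paved: bool     # Boolean
    surveyed_on: date  # date


# Made-up sample records.
roads = [
    RoadSegment("A1", "good", 4, 12.5, True, date(2021, 3, 14)),
    RoadSegment("Old Mill Rd", "poor", 2, 3.2, False, date(2015, 7, 2)),
    RoadSegment("B112", "fair", 2, 8.0, True, date(2019, 11, 30)),
]

# AND: both tests must be true for a record to be selected.
paved_and_long = [r.name for r in roads if r.is_paved and r.length_km > 10]

# OR combined with NOT: either test being true selects the record.
unpaved_or_poor = [r.name for r in roads if (not r.is_paved) or r.condition == "poor"]

print(paved_and_long)
print(unpaved_or_poor)
```

Each test, such as `r.is_paved` or `r.length_km > 10`, evaluates to true or false on its own; the operators only decide how those results are combined into a single selection rule.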
The date data type is self-explanatory, while the binary data type represents attributes whose values are either 1 or 0.

Measurement Scale

In addition to defining data by type, a measurement scale groups data according to its level of complexity.
For GIS analysis, measurement scales can be grouped into two broad categories: nominal and ordinal data represent categorical data, while interval and ratio data represent numeric data.

Nominal Scale

The most straightforward measurement scale is the nominal, or named, scale. The nominal scale makes statements about what to call data points but does not allow scalar comparisons between one object and another. For example, attributing nominal information to points representing cities will describe whether a given city is “Gaborone” or “Francistown.” However, the name says nothing further about those locales, such as population or voting history. Other examples of nominal data include last name, eye color, land-use type, ethnicity, and gender.

Ordinal Scale

Ordinal data places attribute information into ranks and yields more precisely scaled information than nominal data. Ordinal data describes the position in which data occur, such as first, second, or third. These scales may also take on names such as “very unsatisfied,” “unsatisfied,” “satisfied,” and “very satisfied.” Although this measurement scale indicates the ranking of each data point relative to other data points, it does not explicitly denote the exact quantitative difference between these rankings. For example, if an ordinal attribute represents which runner came in first, second, or third place, it does not state by how much time the winner beat the second-place runner. Therefore, one cannot undertake arithmetic operations with ordinal data; only the sequence is explicit.

Interval Scale
An interval measurement scale allows precise quantitative statements about attributes. Interval data are measured along a scale in which each position is equidistant from the next. Elevation and temperature readings are typical examples of interval data. For example, this scale can determine that 30 degrees Celsius is 5 degrees Celsius warmer than 25 degrees Celsius.

A notable property of the interval scale is that zero is an arbitrary reference point: it does not represent nothingness or the absence of a value. Indeed, 0 degrees Celsius does not indicate that no temperature exists. Similarly, an elevation of 0 meters does not indicate a lack of elevation but corresponds to mean sea level.

Ratio Scale

Ratio data are like interval data but are based on a meaningful zero value.
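One way to see why the zero point matters is to re-express the same two temperatures on the Kelvin scale, whose zero is absolute zero. The sketch below is only an illustration with made-up readings: the difference between the readings is the same on both scales, while the ratio computed from the Celsius values is an artifact of where Celsius places its zero.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to Kelvin."""
    return celsius + 273.15


# Two made-up temperature readings in degrees Celsius.
cool_c, warm_c = 10.0, 20.0
cool_k, warm_k = celsius_to_kelvin(cool_c), celsius_to_kelvin(warm_c)

# Differences are meaningful on an interval scale: both scales agree.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios are not: the Celsius ratio depends on the arbitrary zero point.
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} from Celsius vs {ratio_k:.3f} from Kelvin")
```

The Celsius values suggest that the warmer reading is "twice" the cooler one, but the Kelvin values show only a small proportional increase, so statements of the form "twice as much" are reserved for data with a meaningful zero.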
Population density is an example of ratio data: a population density of 0 indicates that no people live in an area.

Discrete and Continuous Data

Specific to numeric datasets, data values can also be discrete or continuous. Discrete data take one of a finite number of values, each representing a specific category or class, while continuous data represent a measurement that can take on any of the infinitely many values within a range. For example, temperature is a continuous variable because it can take on any value within a range, while land use is a discrete variable because it is made up of distinct, separate categories such as forest, agriculture, or urban areas.

Continuous data is often displayed using a continuous color scale or gradient, which allows patterns and trends to be seen across a range of values. Discrete data, on the other hand, is typically displayed using a set of distinct colors or symbols, one for each category or class.

Discrete Data

Discrete data is geographic data that occurs only in specific locations.
Discrete GIS data can be represented using both vector and raster data models. Examples of discrete data include land use categories, soil types, and vegetation classes. Maps made with discrete GIS data will have areas that contain values from that dataset and areas where that dataset is absent.

Continuous Data

Continuous data has no clearly defined boundaries. Every point on a map made with continuous GIS data will contain a numeric value. Continuous GIS data is represented by a continuous scale, such as a gradient. Examples of continuous data include temperature, elevation, slope, and rainfall.

In GIS, continuous data is often represented by a raster data model, in which data is stored in a grid of cells, with each cell representing a small area of the Earth’s surface.
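The grid-of-cells idea can be sketched with a plain nested list. This is a minimal illustration, not how any particular GIS stores rasters: the elevation values, the origin coordinates, the cell size, and the function names are all assumptions made for the example.

```python
# Hypothetical 3 x 4 elevation raster in metres; row 0 is the northern edge.
elevation = [
    [1012.0, 1015.5, 1020.1, 1026.0],
    [1008.3, 1011.2, 1016.8, 1021.4],
    [1003.9, 1007.0, 1012.5, 1017.7],
]

# Assumed georeferencing: top-left corner of the grid and square cell size,
# both in map units (metres).
ORIGIN_X, ORIGIN_Y = 500000.0, 7300000.0
CELL_SIZE = 30.0


def cell_index(x, y):
    """Return the (row, col) of the cell that contains map coordinate (x, y)."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((ORIGIN_Y - y) // CELL_SIZE)  # rows count downward from the top edge
    if not (0 <= row < len(elevation) and 0 <= col < len(elevation[0])):
        raise ValueError("coordinate outside raster extent")
    return row, col


def value_at(x, y):
    """Look up the elevation stored in the cell covering (x, y)."""
    row, col = cell_index(x, y)
    return elevation[row][col]


print(value_at(500045.0, 7299950.0))
```

Any coordinate inside the grid extent falls in exactly one cell, which is why a map built from such a raster has a value everywhere within that extent.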
The value in each cell is the value of the continuous variable being measured at that location.

Spatial Analysis Focus of Discrete and Continuous Data

In GIS analysis, the type of data being used determines which kinds of analysis and techniques can be applied. For example, continuous data is often used in terrain analysis or environmental modeling to identify areas of high or low elevation, slope, or other variables. Discrete data is often used in land use or demographic analysis to identify patterns or clusters of different types of features or populations.

Questions to ask about a dataset before using it:

What is the age of the data?
Where did it come from?
In what medium was it originally produced?
What is the area coverage of the data?
To what map scale was the data digitized?
What projection, coordinate system, and datum were used in the data?
What was the density of observations used for its compilation?
How accurate are positional and attribute features?
Does the data seem logical and consistent?
Do cartographic representations look "clean"?
Is the data relevant to the project at hand?
In what format is the data kept?
How was the data checked?
Why was the data compiled?
What is the reliability of the provider?
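Some of the questions above lend themselves to being recorded alongside a dataset, so that gaps are visible before analysis starts. The sketch below is one possible way to do that; the class, its field names, and the sample values are hypothetical and cover only a subset of the questions.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DatasetMetadata:
    """Hypothetical record of answers; None means the question is unanswered."""
    source: Optional[str] = None
    collected_year: Optional[int] = None
    original_medium: Optional[str] = None
    coverage: Optional[str] = None
    digitizing_scale: Optional[str] = None
    coordinate_system: Optional[str] = None
    file_format: Optional[str] = None


def unanswered(meta):
    """List the fields that still have no recorded answer."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) is None]


def age_in_years(meta, current_year):
    """Return the age of the data, or None if the collection year is unknown."""
    if meta.collected_year is None:
        return None
    return current_year - meta.collected_year


# Made-up example entry with only three questions answered.
example = DatasetMetadata(
    source="national mapping agency",
    collected_year=2010,
    coordinate_system="WGS 84 / UTM zone 35S",
)

print(unanswered(example))
print(age_in_years(example, 2024))
```

Questions that call for judgment, such as whether the data is relevant to the project or how reliable the provider is, do not reduce to a single field and would still need a written assessment.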