Concept and Definition of Data Quality
Definition of Data Quality
• Data quality refers to the condition of a set of values of qualitative or quantitative
variables.
• When the ISO 9000:2015 definition of quality is applied, data quality can be defined as the
degree to which a set of characteristics of data fulfills requirements.
Examples of characteristics are completeness, validity, accuracy,
consistency, availability, and timeliness. Requirements are defined as a need or
expectation that is stated, generally implied, or obligatory.
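To make "degree to which characteristics fulfill requirements" concrete, the sketch below scores two of the listed characteristics, completeness and validity, on a small table. The records, field names, and the rule that an area must be positive are all hypothetical, chosen only for illustration.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "land_use": "residential", "area_m2": 420.0},
    {"id": 2, "land_use": None, "area_m2": 310.5},
    {"id": 3, "land_use": "industrial", "area_m2": -15.0},
    {"id": 4, "land_use": "commercial", "area_m2": 980.0},
]


def completeness(rows, field):
    """Fraction of rows in which the field has a value."""
    filled = sum(1 for row in rows if row.get(field) is not None)
    return filled / len(rows)


def validity(rows, field, is_valid):
    """Fraction of non-missing values that satisfy the stated requirement."""
    values = [row[field] for row in rows if row.get(field) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


# Assumed requirement: every parcel has a land use, and areas are positive.
land_use_completeness = completeness(records, "land_use")
area_validity = validity(records, "area_m2", lambda a: a > 0)

print(f"land_use completeness: {land_use_completeness:.2f}")
print(f"area_m2 validity:      {area_validity:.2f}")
```

Each score is a number between 0 and 1, so it can be compared directly against whatever threshold the requirement states.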
Accuracy
• Accuracy simply means how closely a measurement corresponds to an actual
value as shown below.
• Accuracy is limited by
• Data collection equipment and technique
• Intended use
Precision
• Exactness of representation
• Numerical data
• Number of significant digits
• Does not imply accuracy
• Need varies with scale
• Categorical data
• Level of detail
• Number of categories
e.g., "residential" vs. specific types of residential use
• Precision takes on a slightly different meaning when it is used to refer to a number
of repeated measurements. In the Figure below, there is less variance among the
nine measurements at left than there is among the nine measurements at right. The
set of measurements at left is said to be more precise.
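The difference between accuracy and precision for repeated measurements can be sketched numerically. In the example below, the reference value and both sets of nine measurements are invented for illustration; the offset of the mean from the reference stands in for accuracy, and the sample standard deviation stands in for precision.

```python
import statistics

true_value = 100.0  # assumed known reference value (hypothetical)

# Two hypothetical sets of nine repeated measurements.
set_a = [103.1, 102.9, 103.0, 103.2, 102.8, 103.0, 103.1, 102.9, 103.0]
set_b = [97.0, 104.0, 99.5, 101.5, 95.0, 105.0, 100.0, 98.0, 100.0]


def summarize(measurements, reference):
    """Return (bias, spread): mean offset from the reference and sample std dev."""
    bias = statistics.mean(measurements) - reference
    spread = statistics.stdev(measurements)
    return bias, spread


bias_a, spread_a = summarize(set_a, true_value)
bias_b, spread_b = summarize(set_b, true_value)

print(f"set A: bias={bias_a:+.2f}, spread={spread_a:.2f}")
print(f"set B: bias={bias_b:+.2f}, spread={spread_b:.2f}")
```

Set A is tightly clustered but offset from the reference (precise, not accurate), while set B is centered on the reference but widely scattered (accurate on average, not precise).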
Error and uncertainty
• Positions are the products of measurements. All measurements contain some
degree of error.
• Errors are introduced in the original act of measuring locations on the Earth
surface. Errors are also introduced when second- and third-generation data is
produced, say, by scanning or digitizing a paper map.
• In general, there are three sources of error in measurement:
• Human beings,
• The environment in which they work, and
• The measurement instruments they use.
• Human errors include mistakes, such as reading an instrument incorrectly, and
judgments. Judgment becomes a factor when the phenomenon that is being
measured is not directly observable (like an aquifer), or has ambiguous boundaries
(like a soil unit).
• Environmental characteristics, such as variations in temperature, gravity, and
magnetic declination, also result in measurement errors.
• Instrument errors follow from the fact that space is continuous. There is no limit
to how precisely a position can be specified. Measurements, however, can be only
so precise. No matter what instrument is used, there is always a limit to how small a
difference is detectable. That limit is called resolution.
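The effect of resolution can be sketched by reporting a continuous coordinate as the nearest value an instrument can distinguish. This assumes a simplified instrument that rounds to the nearest multiple of its resolution; the function name, the coordinate, and both resolutions are hypothetical.

```python
def snap_to_resolution(value, resolution):
    """Report a value as the nearest multiple of the instrument resolution."""
    return round(value / resolution) * resolution


true_x = 12.3456  # hypothetical true coordinate, in metres

coarse = snap_to_resolution(true_x, 1.0)   # instrument resolving 1 m
fine = snap_to_resolution(true_x, 0.01)    # instrument resolving 1 cm

print(f"coarse reading: {coarse:.2f} (error {abs(true_x - coarse):.4f})")
print(f"fine reading:   {fine:.2f} (error {abs(true_x - fine):.4f})")
```

Under this rounding assumption, the reported value can differ from the true one by up to half the resolution, so the higher-resolution instrument leaves a smaller residual error.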
The figure below shows the same position (the point in the center of the bullseye) measured by two
instruments. The two grid patterns represent the smallest objects that can be detected by the
instruments. The pattern at left represents a higher-resolution instrument.
Uncertainty
• Degree of doubt
• Accuracy and precision are not known
• Error is not known (but may be large)
• Greater when data from multiple sources and scales are mixed
Sources of error in geographical data
• Types of error: spatial or attribute
• Sources of error:
• Instruments,
• Human
• Change
• Errors can occur in each of the four components of a GIS:
• Input
• Database management
• Data analysis
• Output
Input:
• Digitizing: human error and the width of a line
• Dangling nodes (connected to only one arc): permissible in arc themes
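A dangling node, as defined above, is a node that touches only one arc, so it can be found by counting how many arc endpoints meet at each node. The sketch below uses a hypothetical arc list with made-up node labels; it is not tied to any particular GIS package.

```python
from collections import Counter

# Hypothetical arcs, each given as (from_node, to_node).
arcs = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]


def dangling_nodes(arc_list):
    """Return the nodes that are an endpoint of exactly one arc."""
    degree = Counter()
    for start, end in arc_list:
        degree[start] += 1
        degree[end] += 1
    return sorted(node for node, count in degree.items() if count == 1)


dangles = dangling_nodes(arcs)
print("dangling nodes:", dangles)
```

Whether a reported dangle is an error depends on the theme: a dead-end street is a legitimate dangle in a line theme, while a dangle in a polygon boundary usually indicates an unclosed shape.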
Database management system
Data analysis
• Interpolation of point data into lines or surfaces, e.g., contours
• Overlay of layers, digitized separately from different sources or scales,
e.g. soils and vegetation.
• The compounding effect of processing and analysis of multiple layers:
for example, if two layers are each 90% accurate and their errors are independent,
the accuracy of the resulting overlay is around 81% (0.9 × 0.9).
• Inappropriate or inadequate inputs for nodes.
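The 90% and 81% figures in the overlay example follow from multiplying the layer accuracies together. The sketch below does that multiplication for any number of layers; it assumes the errors in each layer are independent of one another, which is a simplification, and the function name is hypothetical.

```python
import math


def overlay_accuracy(layer_accuracies):
    """Expected overlay accuracy, assuming independent errors in each layer."""
    return math.prod(layer_accuracies)


two_layers = overlay_accuracy([0.9, 0.9])
three_layers = overlay_accuracy([0.9, 0.9, 0.9])

print(f"two layers:   {two_layers:.3f}")
print(f"three layers: {three_layers:.3f}")
```

Under this assumption every additional layer lowers the expected accuracy, which is why overlays built from many moderately accurate layers can be much less reliable than any single input.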
Output
• The result is displayed on a screen, printed on paper, and so on.
• The measured result may vary from the input data.