2 Data Types and Data Quality
Data Mining
• Data mining is the process of discovering interesting patterns and knowledge from large amounts of data
– Also known as knowledge discovery in data (KDD)
• Example:
Quantitative Attributes :: Continuous
• Has real numbers as attribute values
• Typically represented as floating-point variables
• Examples: temperature, height, and weight
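As a minimal sketch of how continuous attributes are typically stored, the Python snippet below keeps hypothetical temperature, height, and weight readings as `float` values. The record values and the helper names `is_continuous_value` and `mean` are illustrative assumptions, not part of the original material. The helper deliberately accepts only finite `float` values, which is a simplification: an integer-typed column can still hold a continuous measurement. The last two lines show that floating-point numbers only approximate real numbers, so equality checks need a tolerance.

```python
import math

# Hypothetical records: each continuous attribute holds a real-valued measurement.
records = [
    {"temperature": 36.6, "height": 1.72, "weight": 68.4},
    {"temperature": 37.1, "height": 1.65, "weight": 59.0},
    {"temperature": 36.9, "height": 1.80, "weight": 77.25},
]

def is_continuous_value(value):
    """Return True if value is a finite real number stored as a float (simplifying assumption)."""
    return isinstance(value, float) and math.isfinite(value)

def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)

heights = [r["height"] for r in records]
print(all(is_continuous_value(h) for h in heights))  # True
print(round(mean(heights), 4))                       # 1.7233

# Floats only approximate real numbers, so compare them with a tolerance.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True
```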
Data Quality
• A measure of how well suited a data set is for its specific
purpose
• Measures of data quality are based on factors such as
– Accuracy
– Completeness
– Consistency
– Validity
– Uniqueness
– Timeliness
Data Quality
• Accuracy
– The data should reflect actual, real-world scenarios
– Accuracy can be confirmed against a verifiable source.
• Completeness
– The ability of the data to deliver all required values, with no
required values missing
• Consistency
– The uniformity of data as it moves across networks and
applications.
– The same data values stored in different locations should not
conflict with one another.
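The accuracy, completeness, and consistency dimensions above can be turned into simple checks. The sketch below assumes two hypothetical stores (`crm` and `billing`) holding the same customer records, plus a hypothetical `reference` table standing in for the verifiable source. The field names, helper names, and the ratio-based scores are illustrative choices, not a standard metric definition.

```python
# Hypothetical customer records; None marks a missing value.
REQUIRED_FIELDS = ("id", "name", "email", "city")

crm = {
    1: {"id": 1, "name": "Ana", "email": "ana@example.com", "city": "Lisbon"},
    2: {"id": 2, "name": "Ben", "email": None, "city": "Leeds"},
}
billing = {
    1: {"id": 1, "name": "Ana", "email": "ana@example.com", "city": "Porto"},
    2: {"id": 2, "name": "Ben", "email": None, "city": "Leeds"},
}
# Hypothetical verifiable source used as ground truth for accuracy.
reference = {1: {"city": "Lisbon"}, 2: {"city": "Leeds"}}

def completeness(records, fields=REQUIRED_FIELDS):
    """Fraction of required field values that are present (not None)."""
    total = len(records) * len(fields)
    present = sum(1 for r in records for f in fields if r.get(f) is not None)
    return present / total if total else 1.0

def accuracy(records, reference, field):
    """Fraction of records whose field value matches the reference source."""
    checked = [r for r in records if r["id"] in reference]
    matches = sum(1 for r in checked if r.get(field) == reference[r["id"]][field])
    return matches / len(checked) if checked else 1.0

def inconsistencies(store_a, store_b):
    """List (key, field, value_a, value_b) wherever two stores disagree on a shared key."""
    issues = []
    for key in store_a.keys() & store_b.keys():
        a, b = store_a[key], store_b[key]
        for field in a.keys() & b.keys():
            if a[field] != b[field]:
                issues.append((key, field, a[field], b[field]))
    return sorted(issues)

print(completeness(list(crm.values())))                     # 0.875
print(accuracy(list(billing.values()), reference, "city"))  # 0.5
print(inconsistencies(crm, billing))                        # [(1, 'city', 'Lisbon', 'Porto')]
```

Here the `city` of customer 1 differs between the two stores, so it shows up both as an inconsistency and as an accuracy failure in `billing`, while the missing `email` of customer 2 lowers completeness.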
Data Quality
• Validity
– Data should be collected according to defined business rules and
parameters
– Data should conform to the right format and fall within the right
range
• Uniqueness
– Ensures that there are no duplicate or overlapping values
across all data sets
• Timeliness
– Timely data is data that is available when it is required
– Data may be updated in real time to ensure that it is readily
available and accessible.
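Validity, uniqueness, and timeliness can be sketched the same way. The rules below (a simple email pattern, an age range of 0 to 120, and a one-day freshness window) are hypothetical stand-ins for real business rules, and the fixed `now` timestamp is only there to make the output reproducible. The helper names `is_valid`, `duplicate_keys`, and `is_timely` are illustrative, not from any library.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical business rules for validity: a format rule and a range rule.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately simple format check
AGE_RANGE = (0, 120)

def is_valid(record):
    """True if the email matches the expected format and the age lies in the allowed range."""
    email = record.get("email")
    email_ok = isinstance(email, str) and bool(EMAIL_PATTERN.match(email))
    age = record.get("age")
    age_ok = isinstance(age, int) and AGE_RANGE[0] <= age <= AGE_RANGE[1]
    return email_ok and age_ok

def duplicate_keys(records, key):
    """Return the set of key values that appear in more than one record."""
    seen, dupes = set(), set()
    for r in records:
        k = r[key]
        if k in seen:
            dupes.add(k)
        seen.add(k)
    return dupes

def is_timely(updated_at, now, max_age):
    """True if the record was updated no longer than max_age before now."""
    return now - updated_at <= max_age

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # fixed "current time" for reproducibility
records = [
    {"id": 1, "email": "ana@example.com", "age": 34, "updated_at": now - timedelta(minutes=5)},
    {"id": 2, "email": "ben-at-example", "age": 29, "updated_at": now - timedelta(days=3)},
    {"id": 1, "email": "ana@example.com", "age": 150, "updated_at": now - timedelta(hours=1)},
]

print([is_valid(r) for r in records])  # [True, False, False]
print(duplicate_keys(records, "id"))   # {1}
print([is_timely(r["updated_at"], now, timedelta(days=1)) for r in records])  # [True, False, True]
```

The second record fails validity on format (no `@` in the email) and the third fails on range (age 150); the repeated `id` 1 breaks uniqueness, and the record last updated three days ago falls outside the freshness window.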