2 Data Types Quality

The document discusses data mining, which is the process of discovering knowledge in data, and emphasizes the importance of understanding data objects and their attributes. It categorizes data attributes into qualitative (nominal, ordinal, binary) and quantitative (numeric, interval-scaled, ratio-scaled, discrete, continuous) types. Additionally, it outlines key factors that determine data quality, including accuracy, completeness, consistency, validity, uniqueness, and timeliness.

Uploaded by

Ashish Saikia
Copyright
© All Rights Reserved

Types of Data and Data Quality
Data Mining
• Data Mining is the process of discovering knowledge in data

– Also known as knowledge discovery in data (KDD)

• Mining data includes knowing about

• Data

• Finding relations among data


Data
– To know about the data, it is necessary to know about

• Data objects

• Attributes of data

• Different types of data attributes


Data
• Data

– Refers to distinct pieces of information, usually formatted and stored in a way that is efficient for movement or processing

– Collection of data objects and their attributes

• Object

– Defined by a set of attributes (attribute vector or feature vector)

– Also referred to as a record, entity, sample, etc.
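The relationship between a data set, its objects, and their attribute vectors can be sketched in a few lines of Python. This is only an illustration: the `Person` class, its field names, and the sample values are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, astuple


@dataclass
class Person:
    # Each field is one attribute of the data object (hypothetical names).
    name: str
    eye_color: str
    height_cm: float


# A data set is a collection of data objects (records).
dataset = [
    Person("A", "brown", 170.0),
    Person("B", "blue", 182.5),
]

# The attribute (feature) vector of an object is the tuple of its values.
attribute_vectors = [astuple(person) for person in dataset]
print(attribute_vectors[0])
```

Each `Person` instance plays the role of one record, and `attribute_vectors` holds one feature vector per record.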


Attributes and Their Types
• A property or characteristic of an object
• Also referred to as variable, field, characteristic, or feature
• Examples
– Eye color of a person, temperature, etc.
• Different types of attributes

Qualitative
• Nominal
• Ordinal
• Binary

Quantitative
• Numeric
– Interval-scaled
– Ratio-scaled
• Discrete
• Continuous
Qualitative Attributes :: Nominal
• Related to names
• The values of a nominal attribute
– Are names of things or some kind of symbols
– Represent some category or state
• Also referred to as categorical attributes
– No ordering (rank, position) among values
• Example
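As a minimal sketch of what "no ordering" means in practice, the snippet below treats eye color (the attribute mentioned earlier) as nominal. The sample values are made up for illustration. Only equality tests and frequency counts are used, because comparisons such as "greater than" carry no meaning for category names.

```python
from collections import Counter

# Hypothetical sample of a nominal attribute: eye color.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

# Meaningful operations: equality and counting.
same_category = eye_colors[0] == eye_colors[2]
counts = Counter(eye_colors)

# The mode (most frequent category) is a sensible summary;
# a mean or median of category names is not defined.
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)
```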
Qualitative Attributes :: Ordinal
• Provides sufficient information to order the objects

• But the magnitude of the difference between values is not known

• Example
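One way to illustrate an ordinal attribute is a size label with an explicit rank. The labels and the `SIZE_RANK` mapping below are assumptions chosen for this sketch. The ranks support sorting and a median, but the numeric gaps between ranks say nothing about how much larger one size is than another.

```python
# Hypothetical ordinal attribute: the rank encodes order only.
SIZE_RANK = {"small": 0, "medium": 1, "large": 2}

orders = ["large", "small", "medium", "small", "large"]

# Ordering is meaningful, so the values can be sorted by rank.
sorted_orders = sorted(orders, key=SIZE_RANK.__getitem__)

# With an odd number of values, the median is the middle element.
median_size = sorted_orders[len(sorted_orders) // 2]
print(sorted_orders, median_size)
```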
Qualitative Attributes :: Binary
• Has only 2 values or states
• For example
– Yes or no, affected or unaffected, true or false, etc.
• Symmetric:
– Both values are equally important (Gender)
• Asymmetric:
– The two values are not equally important (Result)
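The distinction matters when two objects are compared. A common illustration, not taken from the slides, uses the simple matching coefficient for symmetric binary attributes and the Jaccard coefficient for asymmetric ones, where shared 0 values (for example, two negative test results) are ignored. The function names and sample vectors are hypothetical, and returning 1.0 when neither vector contains a 1 is an assumption of this sketch.

```python
def simple_matching(a, b):
    """Fraction of positions where both vectors agree (0-0 and 1-1 count)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def jaccard(a, b):
    """Agreement on 1s only; positions where both are 0 are ignored."""
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    return both / either if either else 1.0  # assumption for the all-zero case


# Hypothetical test results for two patients (1 = positive, 0 = negative).
patient_a = [1, 0, 0, 0, 0, 0]
patient_b = [1, 1, 0, 0, 0, 0]

smc = simple_matching(patient_a, patient_b)
jac = jaccard(patient_a, patient_b)
print(smc, jac)
```

Here the many shared negatives inflate the simple matching score, while the Jaccard score reflects only the positives.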
Quantitative Attributes :: Numeric
• Quantitative

– It is a measurable quantity

– Represented in integer or real values

– Of two types

• Interval-Scaled

• Ratio-Scaled
Quantitative Attributes :: Numeric
• Interval-Scaled

– Has values whose differences are interpretable

– Data can be added and subtracted but cannot be multiplied or divided
– Examples: Calendar dates, Temperatures in Celsius or Fahrenheit
• Ratio-Scaled

– Both differences and ratios are significant

– The values are ordered, and the difference between values, the mean, median, mode, etc. can be computed
– Examples: length, time, counts, etc.
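A short numeric sketch can make the interval versus ratio distinction concrete. The temperatures and lengths below are made-up values. Celsius has an arbitrary zero point, so differences are meaningful but ratios are not; converting to Kelvin, which has a true zero, shows that 20 °C is not "twice as hot" as 10 °C.

```python
def celsius_to_kelvin(celsius):
    # Kelvin has a true zero, so it is ratio-scaled.
    return celsius + 273.15


t1_c, t2_c = 10.0, 20.0

# Interval-scaled: the difference is interpretable.
diff_c = t2_c - t1_c

# The naive ratio depends on the arbitrary zero of the Celsius scale.
naive_ratio = t2_c / t1_c
kelvin_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

# Ratio-scaled: a 4 m rod really is twice as long as a 2 m rod.
length_ratio = 4.0 / 2.0
print(diff_c, naive_ratio, round(kelvin_ratio, 4), length_ratio)
```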
Quantitative Attributes :: Discrete
• Takes values that can be numerical or categorical
• Has a finite or countably infinite set of values

• Example:
Quantitative Attributes :: Continuous
• Has real numbers as attribute values
• Typically represented as floating point variables
• Examples: temperature, height, weight, etc.
Data Quality
• The measure of how well suited a data set is to serve its specific
purpose

• Measures of data quality are based on factors such as

– Accuracy

– Completeness

– Consistency

– Validity

– Uniqueness

– Timeliness
Data Quality
• Accuracy
– The data should reflect actual, real-world scenarios
– The measure of accuracy can be confirmed with a verifiable
source.
• Completeness
– Ability of the data to effectively deliver all the required values that
are available
• Consistency
– The uniformity of data as it moves across networks and
applications.
– The same data values stored in different locations should not
conflict with one another.
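Completeness and consistency lend themselves to simple automated checks. The sketch below is one possible approach under stated assumptions: the record layout, the required fields, and the two stores (`crm`, `billing`) are hypothetical. Completeness is measured as the fraction of required values that are present, and consistency as the set of keys whose values differ between two copies of the same data.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "name": "A", "email": "a@example.com"},
    {"id": 2, "name": "B", "email": None},
    {"id": 3, "name": None, "email": "c@example.com"},
]
REQUIRED_FIELDS = ("id", "name", "email")


def completeness(rows, fields):
    """Fraction of required values that are actually present."""
    total = len(rows) * len(fields)
    filled = sum(1 for row in rows for f in fields if row.get(f) is not None)
    return filled / total if total else 1.0


# Hypothetical copies of the same email data in two systems, keyed by id.
crm = {1: "a@example.com", 2: "b@example.com"}
billing = {1: "a@example.com", 2: "b@old.example.com"}

# Keys present in both stores whose values conflict.
conflicts = sorted(k for k in crm.keys() & billing.keys() if crm[k] != billing[k])

score = completeness(records, REQUIRED_FIELDS)
print(round(score, 3), conflicts)
```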
Data Quality
• Validity
– Data should be collected according to defined business rules and
parameters
– Data should conform to the right format and fall within the right
range
• Uniqueness
– Ensures that there are no duplicate or overlapping values
across all data sets
• Timeliness
– Timely data is data that is available when it is required
– Data may be updated in real time to ensure that it is readily
available and accessible.
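The three factors above can also be checked programmatically. The following is a minimal sketch, not a definitive implementation: the rows, the age range of 0 to 120, the simplified email pattern, and the 30-day freshness window are all assumptions made for illustration. Freshness is used here only as a rough proxy for timeliness.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical rows with deliberate quality problems.
rows = [
    {"id": 1, "age": 34, "email": "a@example.com", "updated": datetime(2024, 1, 10)},
    {"id": 2, "age": -5, "email": "b@example.com", "updated": datetime(2024, 1, 9)},
    {"id": 2, "age": 51, "email": "not-an-email", "updated": datetime(2023, 6, 1)},
]

# Simplified format rule (assumption), not a full email validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_valid(row):
    """Validity: value within the allowed range and in the right format."""
    return 0 <= row["age"] <= 120 and EMAIL_RE.match(row["email"]) is not None


invalid_rows = [i for i, row in enumerate(rows) if not is_valid(row)]

# Uniqueness: ids that occur more than once.
id_counts = Counter(row["id"] for row in rows)
duplicate_ids = sorted(k for k, n in id_counts.items() if n > 1)

# Timeliness proxy: rows not updated within the assumed window.
as_of = datetime(2024, 1, 11)
max_age = timedelta(days=30)
stale_rows = [i for i, row in enumerate(rows) if as_of - row["updated"] > max_age]

print(invalid_rows, duplicate_ids, stale_rows)
```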
