DS Handout 4
Types of Attributes
In data science, knowing the type of each attribute in a dataset is crucial for accurate
analysis and processing. Attributes describe the characteristics of the data, and they differ
in the kind of values they take, their scale, and their meaning. This handout gives an
overview of the main attribute types, with examples to illustrate how they function in a
dataset.
1. Nominal Attributes
• Definition: Nominal attributes are categorical variables that represent different
categories without any specific order or ranking.
• Key Characteristics:
• No inherent order.
• Arithmetic operations are not meaningful (e.g., for a hair color attribute, you
cannot add "black" to "red").
• Operations:
• Frequency Count: Count how often each category occurs (e.g., how many
people have black hair).
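The frequency count described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `hair_colors` list is made-up sample data, and the most frequent category (the mode) is included as one further summary that stays meaningful for unordered categories.

```python
from collections import Counter

# Hypothetical sample of a nominal attribute (hair color); values are illustrative only.
hair_colors = ["black", "brown", "black", "red", "blond", "black", "brown"]

# Frequency count: how often each category occurs.
counts = Counter(hair_colors)

# The most frequent category (mode) needs only counting, not ordering or arithmetic.
mode_value, mode_count = counts.most_common(1)[0]

print(counts["black"])  # number of people with black hair in the sample
print(mode_value)       # most common hair color in the sample
```

Note that nothing here compares or adds categories; the only operation applied to the values themselves is an equality check.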
2. Binary Attributes
• Definition: A type of nominal attribute with only two possible values.
• Example: Gender (Male, Female), COVID status (Positive, Negative), Smoking status
(Smoker, Non-smoker).
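• Types:
• Symmetric: both values are equally important (e.g., Gender).
• Asymmetric: one value is more significant than the other (e.g., a Positive
COVID status).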
• Operations:
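As a rough sketch of how a binary attribute is often handled in practice, the two categories can be mapped to 0 and 1. The sample records and the choice of which category is coded as 1 are assumptions made for illustration only.

```python
# Hypothetical records for a binary attribute (smoking status).
smoking_status = ["Smoker", "Non-smoker", "Non-smoker", "Smoker", "Non-smoker"]

# Map the two categories to 1/0. Which category gets 1 is a convention chosen here.
encoding = {"Smoker": 1, "Non-smoker": 0}
encoded = [encoding[value] for value in smoking_status]

# With a 0/1 encoding, the sum counts the records coded 1,
# and the mean gives their proportion in the sample.
smoker_count = sum(encoded)
smoker_share = smoker_count / len(encoded)

print(encoded)
print(smoker_count, smoker_share)
```

The 0/1 codes are labels rather than quantities: the sum and mean are useful only because they reduce to a count and a proportion.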
3. Ordinal Attributes
• Definition: Ordinal attributes are categorical variables where the categories have a
meaningful order or ranking.
• Key Characteristics:
• Operations:
• Conversion from Numeric: You can convert a numeric attribute to an ordinal one
by defining ranges (e.g., Temperature as Low, Medium, High).
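The conversion from a numeric attribute to an ordinal one can be sketched as follows. The cut-off values (15 and 25 degrees Celsius) and the sample readings are arbitrary assumptions; real thresholds depend on the application.

```python
# Rank order of the ordinal categories, from lowest to highest.
LEVELS = ["Low", "Medium", "High"]

def temperature_level(celsius: float) -> str:
    """Map a numeric temperature to an ordinal category.

    The thresholds 15 and 25 are assumed for illustration only.
    """
    if celsius < 15:
        return "Low"
    if celsius < 25:
        return "Medium"
    return "High"

# Hypothetical temperature readings in degrees Celsius.
readings = [9.5, 18.0, 31.2, 24.9, 25.0]
levels = [temperature_level(t) for t in readings]

# Because the categories are ordered, ranking operations are meaningful:
# compare by position in LEVELS rather than by the label text.
rank = {name: position for position, name in enumerate(LEVELS)}
sorted_levels = sorted(levels, key=rank.get)
highest = max(levels, key=rank.get)
median_level = sorted_levels[len(sorted_levels) // 2]

print(levels)
print(highest, median_level)
```

Sorting by the label strings themselves would give alphabetical order ("High" before "Low"), which is why the explicit `rank` mapping is used.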
4. Numeric Attributes
Numeric attributes are quantitative and support arithmetic operations. They can be
further divided into two types:
• Interval-Scaled Attributes:
• Key Characteristics:
• No absolute zero.
• Operations:
• Ratio-Scaled Attributes:
• Key Characteristics:
• Operations:
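The practical difference between a numeric attribute with no absolute zero (commonly called interval-scaled) and a ratio-scaled one shows up when values are divided. The sketch below uses temperature in Celsius and weight in kilograms as assumed examples; the specific numbers are illustrative.

```python
# Interval-scaled example: temperature in Celsius (zero is an arbitrary reference point).
temp_a_c, temp_b_c = 10.0, 20.0

# Differences are meaningful: the second reading is 10 degrees warmer.
difference_c = temp_b_c - temp_a_c

# The ratio can be computed, but 20 °C is not "twice as hot" as 10 °C.
naive_ratio = temp_b_c / temp_a_c

def to_kelvin(celsius: float) -> float:
    """Convert Celsius to Kelvin, a scale with a true zero."""
    return celsius + 273.15

# On the Kelvin scale, the same two readings differ by only a few percent.
kelvin_ratio = to_kelvin(temp_b_c) / to_kelvin(temp_a_c)

# Ratio-scaled example: weight in kilograms (zero means no weight at all).
weight_a_kg, weight_b_kg = 40.0, 80.0
weight_ratio = weight_b_kg / weight_a_kg  # "twice as heavy" is a valid statement

print(difference_c, naive_ratio)
print(round(kelvin_ratio, 3), weight_ratio)
```

The Celsius ratio changes when the unit's zero point changes, whereas the weight ratio stays the same in any unit of weight, which is what a true zero provides.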
Final Thoughts
1. Understanding different types of attributes is essential in data science for effective
data analysis and processing.
2. Each type of attribute (nominal, binary, ordinal, or numeric) requires specific
methods for analysis, and recognizing these differences is key to drawing accurate
conclusions from the data.
3. By correctly identifying the types of attributes, data scientists can choose
appropriate methods for data cleaning, exploration, and modeling.