02 Data Categorization
7/4/19
Today’s discussion…
Data in data analytics
NOIR typology
Nominal scale
Binary
Symmetric
Asymmetric
Ordinal scale
Interval and ratio scale
Multidimensional Data Model
Data in Data Analytics
In general, there are many types of data that can be used to
measure the properties of an entity.
Classification of scales of Measurement
NOIR classification
The most commonly recommended scales of measurement are:
N: Nominal
O: Ordinal
I: Interval
R: Ratio
NOIR Classification
[Figure: NOIR classification tree. Labels recoverable from the diagram: Categorical (Qualitative), Numeric (Quantitative), Symmetric, Asymmetric, Literally Ordered, Numerically Ordered, Continuous.]
Properties of data
The following four properties (operations) of data are pertinent.
1. Distinctiveness: = and ≠
2. Order: <, ≤, >, ≥
3. Addition: + and -
4. Multiplication: * and /
Properties 1 and 2 apply to categorical (qualitative) data.
Properties 3 and 4 apply only to numerical (quantitative) data.
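The four properties can be sketched as a small lookup table. This is only an illustration: the names `PROPERTIES`, `SCALE_PROPERTIES` and `supports` are hypothetical, and the table assumes the usual cumulative reading of NOIR, in which each scale keeps the properties of the scale before it and adds one more.

```python
# The four properties, in the order listed on the slide.
PROPERTIES = ("distinctiveness", "order", "addition", "multiplication")

# Assumption: properties accumulate from nominal up to ratio.
SCALE_PROPERTIES = {
    "nominal": PROPERTIES[:1],
    "ordinal": PROPERTIES[:2],
    "interval": PROPERTIES[:3],
    "ratio": PROPERTIES[:4],
}


def supports(scale: str, prop: str) -> bool:
    """Return True if data on the given scale supports the given property."""
    return prop in SCALE_PROPERTIES[scale]


print(supports("ordinal", "order"))
print(supports("ordinal", "addition"))
```

A check such as `supports("interval", "multiplication")` returns `False` under this assumption, which matches the idea that ratios are meaningful only on the ratio scale.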
NOIR summary
Nominal (with distinctiveness property only)
Which of the following employment
classifications best describes your area
of work?
1. Educator
2. Construction worker
3. Manufacturing worker
4. Lawyer
5. Doctor
6. Other
Nominal scale
Definition
A variable that takes a value among a set of mutually exclusive codes that have no logical
order is known as a nominal variable.
Examples
Gender: coded using letters or numbers, e.g. {M, F} or {1, 0}
Country code ??
????
Nominal scale
Note
The nominal scale is used to label categories of data
using a consistent naming convention.
The labels can be numbers, letters, strings,
enumerated constants or other keyboard symbols.
Nominal data thus divide a set of data into categories.
Nominal scale
Note
Nominal data may be numerical in form, but the numerical values
have no mathematical interpretation.
For example, prisoners may be numbered 100, 101, …, 110, but 100 + 110 = 210 is
meaningless. The numbers are simply labels.
Two labels may be identical ( = ) or dissimilar ( ≠ ).
These labels do not have any ordering among themselves.
For example, we cannot say blood group B is better or worse than
group A.
Labels (from two different attributes) can be combined to give
another nominal variable.
For example, blood group with Rh factor ( A+ , A- , AB+, etc.)
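The notes above can be illustrated with a minimal sketch. Storing the prisoner numbers as strings is a design choice assumed here to make accidental arithmetic harder, and `combine` is a hypothetical helper, not part of any library.

```python
# Prisoner numbers are labels, so they are kept as strings rather than integers.
prisoner_a = "100"
prisoner_b = "110"

# Distinctiveness (= and !=) is the only meaningful comparison between labels.
same_prisoner = prisoner_a == prisoner_b


def combine(blood_group: str, rh_factor: str) -> str:
    """Join labels from two nominal attributes into one new nominal label."""
    return blood_group + rh_factor


label = combine("AB", "+")
print(same_prisoner, label)
```

The combined value is again just a label: it can be compared for equality with another label such as `"A-"`, but it has no order and no arithmetic meaning.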
Binary scale
Definition
A nominal variable with exactly two mutually exclusive categories that
have no logical order is known as a binary variable.
Examples
Switch: {ON, OFF}
Attendance: {True, False}
Entry: {Yes, No}
etc.
Note
A Binary variable is a special case of a nominal variable that
takes only two possible values.
Symmetric and Asymmetric Binary Scale
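The two states of a binary variable may have unequal importance.
A binary variable is symmetric when both states are equally important, and asymmetric when one state matters more than the other (e.g., a positive versus a negative result of a medical test).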
Operations on Nominal variables
Summary statistics applicable to nominal data are mode,
contingency correlation, etc.
Arithmetic (+, -, * and /) and relational operations (<, >, ≤, ≥) are not
permitted; only equality tests (= and ≠) are meaningful.
The allowed operations are accessing (read, check, etc.) and re-coding
(into another non-overlapping symbol set, that is, a one-to-one mapping).
Nominal data can be visualized using line charts, bar charts, pie charts, etc.
Two or more nominal variables can be combined to generate
another nominal variable.
Example: Gender (M,F) × Marital status (S, M, D, W)
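As a rough sketch of these operations, the snippet below builds the combined Gender × Marital status categories and finds the mode of a sample. The `observations` list is made-up illustrative data, and the `"-"` separator in the combined labels is an arbitrary choice.

```python
from collections import Counter
from itertools import product

gender = ["M", "F"]
marital_status = ["S", "M", "D", "W"]

# Cartesian product of the two label sets gives the combined nominal variable.
combined_categories = [g + "-" + m for g, m in product(gender, marital_status)]

# Hypothetical sample of combined labels, for illustration only.
observations = ["F-S", "M-M", "F-S", "M-W", "F-D"]

# The mode is the most frequent label; no ordering or arithmetic is needed.
mode_label, mode_count = Counter(observations).most_common(1)[0]

print(len(combined_categories), mode_label, mode_count)
```

With two genders and four marital statuses the combined variable has eight categories, and counting frequencies is all that is required to obtain the mode.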
Survey
Ordinal scale
Definition
Ordered nominal data are known as ordinal data, and the
variable that generates them is called an ordinal variable.
Example:
Shirt size = { S, M, L, XL, XXL}
Note
The values assumed by an ordinal variable can be ordered
among themselves as each pair of values can be compared
literally or using relational operators ( < , ≤ , > , ≥ ).
Operation on Ordinal data
Usually relational operators can be used on ordinal data.
Summary measures mode and median can be used on ordinal data.
Ordinal data can be ranked (numerically, alphabetically, etc.). Hence,
we can find any of the percentile measures of ordinal data.
Calculations based on order are permitted (such as count, min, max,
etc.).
Spearman’s rank correlation coefficient can be used as a measure of the strength of association
between two sets of ordinal data.
A numerical variable can be transformed into an ordinal variable and vice
versa, but with a loss of information.
For example, Age [1, …, 100] → {young, middle-aged, old}
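The order-based operations can be sketched with the shirt-size example. The helper names (`RANK`, `is_smaller`, `ordinal_median`, `age_group`) are hypothetical, and the age cut points of 30 and 60 are assumptions chosen only for illustration; the slides do not specify them.

```python
SIZES = ["S", "M", "L", "XL", "XXL"]

# Map each label to its rank so that relational operators can be applied.
RANK = {size: position for position, size in enumerate(SIZES)}


def is_smaller(a: str, b: str) -> bool:
    """Compare two ordinal labels by rank."""
    return RANK[a] < RANK[b]


def ordinal_median(values: list[str]) -> str:
    """Return the (lower) median label; no averaging of labels is needed."""
    ordered = sorted(values, key=RANK.__getitem__)
    return ordered[(len(ordered) - 1) // 2]


def age_group(age: int) -> str:
    """Bin a numerical age into an ordinal label (cut points are assumed)."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle-aged"
    return "old"


print(is_smaller("M", "XL"))
print(ordinal_median(["L", "S", "XL", "M", "M"]))
print(age_group(45))
```

The binning step shows the loss of information: ages 31 and 59 both map to the same label, and the original values cannot be recovered from it.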
Interval scale
Definition
Interval-scale variables are continuous measurements on a roughly linear scale.
Example:
weight, height, latitude, longitude, weather temperature, calendar dates, etc.
Note
Interval data have well-defined intervals between values.
Interval data are measured on a numeric scale (with positive, zero, and negative
values).
Interval data have a zero point (origin). However, the origin does not imply a
true absence of the measured characteristic.
For example, for temperature in Celsius or Fahrenheit, 0° does not mean an absence
of temperature, that is, no heat!
Operation on Interval data
A quantity can be added to interval data.
For example: date1 + x-days = date2
Subtraction can also be performed.
For example: current date – date of birth = age
Negation (changing the sign) and multiplication by a
constant are permitted.
All operations defined on ordinal data are also valid here.
Linear (e.g. cx + d ) or Affine transformations are
permissible.
Other one-to-one non-linear transformations (e.g., log, exp,
etc.) can also be applied.
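The date and temperature examples can be sketched with Python's standard `datetime` module. The specific dates and the helper `celsius_to_fahrenheit` are illustrative assumptions, not values taken from the slides.

```python
from datetime import date, timedelta

# Addition: date1 + x-days = date2
date1 = date(2019, 7, 4)
date2 = date1 + timedelta(days=30)

# Subtraction: the difference of two dates is a duration, not a date.
date_of_birth = date(2000, 1, 1)  # hypothetical date of birth
age_in_days = (date1 - date_of_birth).days


def celsius_to_fahrenheit(celsius: float) -> float:
    """Affine transformation of the form c*x + d."""
    return celsius * 9 / 5 + 32


# Ratios are not preserved by the affine map, which is why
# "20 degrees is twice as hot as 10 degrees" is not meaningful here.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

print(date2, age_in_days, ratio_celsius, ratio_fahrenheit)
```

Differences survive the change of unit up to the constant factor 9/5, while the ratio changes from 2.0 to 1.36, so only sums and differences carry meaning on this scale.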
Operation on Interval data
Note
Interval data can be transformed to nominal or ordinal
scale, but with loss of information.
Ratio scale
Definition
Interval data with a true zero, that is, a zero that denotes absence of the measured quantity, are called ratio data.
Example:
Temperature on the Kelvin scale, intensity of an earthquake on the Richter scale,
sound intensity in decibels, cost of an article, population of a country, etc.
Note
All ratio data are interval data but the reverse is not true.
On the ratio scale, both differences between data values and ratios
of (non-zero) data pairs are meaningful.
Ratio data may be on a linear or non-linear scale.
Both interval and ratio data can be stored in the same data types
(i.e., integer, float, double, etc.)
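A short sketch of why the true zero matters, using the Kelvin example. The helper `celsius_to_kelvin` and the sample temperatures are assumptions made for illustration.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Shift the origin to absolute zero (0 K = -273.15 degrees Celsius)."""
    return celsius + 273.15


# The same two temperatures, first on an interval scale, then on a ratio scale.
ratio_celsius = 20.0 / 10.0
ratio_kelvin = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

# Both values are plain floats: the data type does not record the scale,
# so the analyst has to know which operations are meaningful.
print(type(ratio_celsius).__name__, ratio_celsius, round(ratio_kelvin, 4))
```

Only the Kelvin ratio (about 1.035) has a physical meaning; the Celsius ratio of 2.0 is an artifact of where the zero happens to sit.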
Operation on Ratio data
All arithmetic operations on interval data are
applicable to ratio data.