
Topic: Data Presentation & Analysis

Week 9:

After data collection, students understand data description and can:

 Define and explain the difference between categorical, continuous and discrete data
 Define and give examples of nominal, ordinal, interval and ratio data
 Categorise example data into the above categories
 Define descriptive statistics
 Select suitable visualisation and statistical treatments to describe data of differing types

Week 10

For each of the methods below, students can explain their use and use them to manually construct
statistics for data:

 Visualisation of categorical data: tables, bar charts (inc. clustered, component and percentage
component), pie charts, pictograms
 Visualisation of continuous data: rank order list, frequency tables (with cumulative frequency &
relative frequency/ percentages, cumulative relative frequency and visualisation using
histograms/ ogives).

Resources and Activities

Display and present data using appropriate visualisations and calculate descriptive statistics; learn
techniques for drawing charts by hand and for using software to display charts and calculate descriptive statistics.

Definition/Explanation and Examples

Categorical data:

Continuous data:

Discrete data:

Nominal data
Definition/Explanation: Nominal data is the simplest data type. It classifies (or names) data without suggesting any implied relationship between those data. For instance, countries or species of animals are both forms of nominal data.
Examples: We can’t use a regression model on nominal data, because nominal data lacks the necessary characteristics required to carry out this type of analysis (namely: no dependent and independent variables).

Ordinal data
Definition/Explanation: Ordinal data also classifies data but it introduces the concept of ranking. An example might be labeling animals, but this time by using discrete and imprecise measures of their speed (‘slow’, ‘medium’, ‘fast’).

Interval data
Definition/Explanation: Interval data both classifies and ranks data (like ordinal data) but introduces continuous measurements. Examples might be the time of day or temperature measured on either the Celsius or Fahrenheit scale. Importantly, it always lacks a ‘true zero’: a measurement of zero can be midway through a scale (i.e. you can have minus temperatures).

Ratio data
Definition/Explanation: Ratio data classifies and ranks data, and uses measured, continuous intervals, just like interval data. However, unlike interval data, ratio data has a true zero. This basically means that zero is an absolute, below which there are no meaningful values. Speed, age, or weight are all excellent examples since none can have a negative value (you cannot be -10 years old or weigh -160 pounds!).

What is Nominal Data? Definition, Characteristics and Examples
https://careerfoundry.com/en/blog/data-analytics/what-is-nominal-data/

BY EMILY STEVENS, UPDATED ON AUGUST 23, 2022 (13 mins read)

What is nominal data and what is it used for? How is it collected and
analyzed? Learn everything you need to know in this guide.

There are many different industries and career paths that involve working
with data—including psychology, marketing, and, of course, data analytics.
If you’re working with data in any capacity, there are four main data types
(or levels of measurement) to be aware of: nominal, ordinal, interval, and
ratio. Here, we’ll focus on nominal data.

We’ll briefly introduce the four different types of data, before defining what
nominal data is and providing some examples. We’ll then look at how
nominal data can be collected and analyzed.

1. The four different types of data (or levels of measurement)

When we talk about the four different types of data, we’re actually referring
to different levels of measurement. Levels (or scales) of measurement
indicate how precisely a variable has been recorded. The level of
measurement determines how and to what extent you can analyze the
data.

The four levels of measurement are nominal, ordinal, interval, and ratio,
with nominal being the least complex and precise measurement, and ratio
being the most. In the hierarchy of measurement, each level builds upon
the last. So:

 Nominal data denotes labels or categories (e.g. blonde hair, brown hair).
 Ordinal data refers to data that can be categorized and also ranked
according to some kind of order or hierarchy (e.g. low income,
medium income, high income). Learn more about ordinal data in this
guide.

 Interval data can be categorized and ranked just like ordinal data,
and there are equal, evenly spaced intervals between the categories
(e.g. temperature in Fahrenheit). Learn more in this complete guide to
interval data.

 Ratio data is just like interval data in that it can be categorized and
ranked, and there are equal intervals between the data points.
Additionally, ratio data has a true zero. Weight in kilograms is an
example of ratio data; if something weighs zero kilograms, it truly
weighs nothing. On the other hand, a temperature of zero degrees
doesn’t mean there is “no temperature”—and that’s the difference
between interval and ratio data. You’ll find a complete guide to ratio
data here.

You can learn more in this comprehensive guide to the levels of


measurement (with examples).

What do the different levels of measurement tell you?
The various levels of measurement are important because they determine
how you can analyze your data. When analyzing data, you’ll
use descriptive statistics to describe or summarize the characteristics of
your dataset, and inferential statistics to test different hypotheses. The
descriptive and inferential methods you’re able to use will vary depending
on whether the data are nominal, ordinal, interval, or ratio. You can learn
more about the difference between descriptive and inferential statistics here .

So, before you start collecting data, it’s important to think about the levels
of measurement you’ll use.

2. Nominal data definition

Nominal data is a type of qualitative data which groups variables into


categories. You can think of these categories as nouns or labels; they are
purely descriptive, they don’t have any quantitative or numeric value, and
the various categories cannot be placed into any kind of meaningful order
or hierarchy.

At this point, it’s important to note that nominal variables may be


represented by numbers as well as words—however, these “number
labels” don’t have any kind of numeric meaning. To illustrate this with an
example, let’s imagine you’re collecting data on people’s hair color. You

might use a numbering system to denote the different hair colors: say, 1 to
represent brown hair, 2 to represent blonde hair, 3 for black hair, 4 for
auburn hair, 5 for gray hair, and so on.

Although you are using numbers to label each category, these numbers do
not represent any kind of value or hierarchy (e.g. gray hair as represented
by the number 5 is not “greater than” or “better than” brown hair
represented by the number 1, and vice versa).

As such, nominal data is the simplest, least precise level of measurement.


You can identify nominal data according to the following characteristics.

3. Key characteristics of nominal data


 Nominal data are categorical, and the categories are mutually
exclusive; there is no overlap between the categories.
 Nominal data are categorized according to labels which are purely
descriptive—they don’t provide any quantitative or numeric value.
 Nominal data cannot be placed into any kind of meaningful order or
hierarchy—no one category is greater than or “worth more” than
another.

What’s the difference between nominal and ordinal data?
While nominal and ordinal data both count as categorical data (i.e. not
numeric), there is one key difference. Nominal variables can be divided into
categories, but there is no order or hierarchy to the categories. Ordinal
variables, on the other hand, can be divided into categories that naturally
follow some kind of order.

For example, the variable “hair color” is nominal as it can be divided into
various categories (brown, blonde, gray, black, etc) but there is no
hierarchy to the various hair colors. The variable “education level” is ordinal
as it can be divided into categories (high school, bachelor’s degree,
master’s degree, etc.) and there is a natural order to the categories; we
know that a bachelor’s degree is a higher level of education than high
school, and that a master’s degree is a higher level of education than a
bachelor’s degree, and so on.

So, if there is no natural order to your data, you know that it’s nominal.

4. Nominal data examples


So what are some examples of nominal data that you might encounter?
Let’s take a look.

 Hair color (blonde, gray, brown, black, etc.)


 Nationality (Kenyan, British, Chinese, etc.)
 Relationship status (married, cohabiting, single, etc.)
 Preferred mode of public transportation (bus, train, tram, etc.)
 Blood type (O negative, O positive, A negative, and so on)
 Political parties voted for (party X, party Y, party Z, etc.)
 Attachment style according to attachment theory (secure, anxious-
preoccupied, dismissive-avoidant, fearful-avoidant)
 Personality type (introvert, extrovert, ambivert, for example)
 Employment status (employed, unemployed, retired, etc.)

As you can see, nominal data is really all about describing characteristics.
With those examples in mind, let’s take a look at how nominal data is
collected and what it’s used for.

5. How is nominal data collected and
what is it used for?
Nominal data helps you to gain insight into a particular population or
sample. This is useful in many different contexts, including marketing,
psychology, healthcare, education, and business—essentially any scenario
where you might benefit from learning more about your target
demographic.

Nominal data is usually collected via surveys. Where the variables of


interest can only be divided into two or a few categories, you can use
closed questions. For example:

 Question: What’s your favorite mode of public transportation? Possible answers: Bus, tram, train
 Question: Are you over 30 years of age? Possible answers: Yes,
no

If there are lots of different possible categories, you can use open
questions where the respondent is required to write their answer. For
example, “What is your native language?” or “What is your favorite genre of
music?”

Once you’ve collected your nominal data, you can analyze it. We’ll look at
how to analyze nominal data now.

6. Nominal data analysis


No matter what type of data you’re working with, there are some general
steps you’ll take in order to analyze and make sense of it. These include

gathering descriptive statistics to summarize the data, visualizing your
data, and carrying out some statistical analysis.

So how do you analyze nominal data? Let’s take a look, starting with
descriptive statistics.

Descriptive statistics for nominal data


Descriptive statistics help you to see how your data are distributed. Two
useful descriptive statistics for nominal data are frequency
distribution and central tendency (mode).

Frequency distribution tables


Let’s imagine you’re investigating what mode of public transportation
people living in London prefer. In its raw form, this data may appear quite
disorganized and unstructured—a spreadsheet containing a column for
“Preferred mode of public transport,” a column for “Location,” and a column
for “Income,” with the values for each variable entered at random.

Note that, in this example dataset, the first two variables—“Preferred mode
of transport” and “Location”—are nominal, but the third variable (“Income”)
is ordinal as it follows some kind of hierarchy (high, medium, low).

At first glance, it’s not easy to see how your data are distributed. For
example, it’s not immediately clear how many respondents answered “bus”
versus “tram,” nor is it easy to see if there’s a clear winner in terms of
preferred mode of transportation.

To bring some order to your nominal data, you can create a frequency
distribution table. This allows you to see how many responses there were
for each category. A simple way to do this in Microsoft Excel is to create a
pivot table. You can learn how to create a pivot table in this step-by-step
guide.

Here’s what a pivot table would look like for our transportation example:

You can also calculate the frequency distribution as a percentage, allowing
you to see what proportion of your respondents prefer which mode of
transport. Here’s what that would look like in our pivot table:
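The pivot-table screenshots from the original article are not reproduced in this copy. As a rough stand-in, here is a minimal pandas sketch that builds the same kind of frequency table; the responses are hypothetical (the text only tells us that “bus” received 11 of the 20 responses, so the train/tram split below is an assumption).

```python
import pandas as pd

# Hypothetical survey responses (nominal data). The article states that "bus"
# received 11 of 20 responses; the train/tram split here is an assumption.
responses = ["bus"] * 11 + ["train"] * 5 + ["tram"] * 4
df = pd.DataFrame({"preferred_transport": responses})

counts = df["preferred_transport"].value_counts()                     # frequency distribution
percentages = df["preferred_transport"].value_counts(normalize=True) * 100

print(pd.DataFrame({"count": counts, "percent": percentages.round(1)}))
```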

Measure of central tendency (mode)


As the name suggests, measures of central tendency help you to identify
the “center point” of your dataset; that is, the value that is most
representative of the entire dataset. Measures of central tendency include:

 The mode: The value that appears most frequently within a dataset
 The median: The middle value
 The mean: The average value

When it comes to nominal data, the only measure of central tendency


you can use is the mode. To identify the mode, look for the value or
category that appears most frequently in your distribution table. In the case
of our example dataset, “bus” has the most responses (11 out of a total of
20, or 55%) and therefore constitutes the mode.
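Continuing the same hypothetical responses, pandas can also return the mode of a nominal variable directly:

```python
import pandas as pd

# Mode of a nominal variable: the category that appears most frequently
responses = ["bus"] * 11 + ["train"] * 5 + ["tram"] * 4
print(pd.Series(responses).mode()[0])   # -> 'bus'
```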

As you can see, descriptive statistics help you to gain an overall picture of
your nominal dataset. Through your distribution tables, you can already
glean insights as to which modes of transport people prefer.

Visualizing nominal data
Data visualization is all about presenting your data in a visual format. Just
like the frequency distribution tables, visualizing your nominal data can help
you to see more easily what the data may be telling you.

Some simple yet effective ways to visualize nominal data are through bar
graphs and pie charts. You can do this in Microsoft Excel simply by clicking
“Insert” and then selecting “Chart” from the dropdown menu.
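The article describes the Excel workflow; as an alternative sketch, the same charts can be drawn in Python with matplotlib, again using the assumed counts from the hypothetical survey above.

```python
import matplotlib.pyplot as plt

# Assumed frequency counts from the hypothetical transport survey above
counts = {"bus": 11, "train": 5, "tram": 4}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.bar(list(counts.keys()), list(counts.values()))   # bar graph of the nominal categories
ax1.set_ylabel("Number of respondents")
ax1.set_title("Preferred mode of transport")

ax2.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.0f%%")  # pie chart
ax2.set_title("Share of responses")

plt.tight_layout()
plt.show()
```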

(Non-parametric) statistical tests for
nominal data
While descriptive statistics (and visualizations) merely summarize your
nominal data, inferential statistics enable you to test a hypothesis and
actually dig deeper into what the data are telling you.

There are two types of statistical tests to be aware of: parametric


tests which are used for interval and ratio data, and non-parametric
tests which are used for nominal and ordinal data. So, as we’re dealing
with nominal data, we’re only concerned with non-parametric tests.

When analyzing a nominal dataset, you might run:

 A chi-square goodness of fit test, if you’re only looking at one variable


 A chi-square test of independence, if you’re looking at two variables

Chi-square goodness of fit test (for a dataset with one nominal variable)
The Chi-square goodness of fit test helps you to assess whether the
sample data you’ve collected is representative of the whole population. In
our earlier example, we gathered data on the public transport preferences
of twenty Londoners. Let’s imagine that, prior to gathering this data, we
looked at historical data published by Transport for London (TFL) and
hypothesized that most Londoners will prefer to travel by train. However,
according to the sample of data we collected ourselves, bus is the most
popular way to travel.

Now we want to know how applicable our findings are to the whole
population of people living in London. Of course, it’s not possible to gather
data for every single person living in London; instead, we use the Chi-square
goodness of fit test to see how much, or to what extent, our
observations differ from what we expected or hypothesized. If
you’re interested in carrying out a Chi-square goodness of fit test, you’ll find a
comprehensive guide here.
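If you wanted to run this test in Python rather than follow the linked guide, a minimal sketch with scipy might look like the following. The observed counts come from the hypothetical survey above, and the expected counts are an assumed hypothesis for illustration, not real TfL figures.

```python
from scipy import stats

# Observed counts from the hypothetical survey of 20 Londoners (bus, train, tram)
observed = [11, 5, 4]

# Expected counts under the assumed hypothesis that train is most popular;
# these are illustrative and must sum to the same total as the observed counts
expected = [6, 10, 4]

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3f}")
# A small p-value suggests the observed preferences differ from the hypothesised ones
```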

Chi-square test of independence (for a dataset with two nominal variables)
If you want to explore the relationship between two nominal variables, you
can use the Chi-square test of independence. In our public transport
example, we also collected data on each respondent’s location (inner city
or suburbs). Perhaps you want to see if there’s a significant correlation
between people’s proximity to the city center and their preferred mode of
transport.

In this case, you could carry out a Chi-square test of independence


(otherwise known as a Chi-square association test). Essentially, the
frequency of each category for one nominal variable (say, bus, train, and
tram) is compared across the categories of the second nominal variable
(inner city or suburbs). You can learn more about how to run a Chi-square
test of independence here.
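A corresponding sketch for the test of independence, assuming a hypothetical contingency table of transport mode by location:

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table: rows = location, columns = (bus, train, tram)
contingency = np.array([
    [7, 2, 1],   # inner city
    [4, 3, 3],   # suburbs
])

chi2, p_value, dof, expected = stats.chi2_contingency(contingency)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p-value = {p_value:.3f}")
# A small p-value would suggest an association between location and preferred mode
```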

7. Key takeaways and next steps


In this guide, we answered the question: what is nominal data? We:

 Introduced the four levels of data measurement: Nominal, ordinal,


interval, and ratio.
 Defined nominal data as a type of qualitative data which groups
variables into mutually exclusive, descriptive categories.

 Explained the difference between nominal and ordinal data: Both are
divided into categories, but with nominal data, there is no hierarchy or
order to the categories.
 Shared some examples of nominal data: Hair color, nationality, blood
type, etc.
 Introduced descriptive statistics for nominal data: Frequency
distribution tables and the measure of central tendency (the mode).
 Looked at how to visualize nominal data using bar graphs and pie
charts.
 Introduced non-parametric statistical tests for analyzing nominal data:
The Chi-square goodness of fit test (for one nominal variable) and
the Chi-square test of independence (for exploring the relationship
between two nominal variables).

What Is Ordinal Data?
https://careerfoundry.com/en/blog/data-analytics/what-is-ordinal-data/

BY WILL HILLIER, UPDATED ON AUGUST 31, 2023 (14 mins read)

What is ordinal data, how is it used, and how do you collect and
analyze it? Find out in this comprehensive guide.

Whether you’re new to data analytics or simply need a refresher on the


fundamentals, a key place to start is with the four types of data. Also known
as the four levels of measurement, this data analytics term describes the level
of detail and precision with which data is measured. The four types (or
scales) of data are:

 nominal data
 ordinal data
 interval data
 ratio data

In this article, I’m going to dive deep into ordinal data.

If the concept of these data types is completely new to you, we’ll start with
a quick summary of the four different types, and then explore the various
aspects of ordinal data in a bit more detail.

If you’d like to learn more data analytics skills, try our free 5-day data
short course.

1. An introduction to the four different types of data
To analyze a dataset, you first need to determine what type of data you’re
dealing with.

Fortunately, to make this easier, all types of data fit into one of four broad
categories: nominal, ordinal, interval, and ratio data. While these are
commonly referred to as ‘data types,’ they are really different scales
or levels of measurement.

Each level of measurement indicates how precisely a variable has been


counted, determining the methods you can use to extract information from
it. The four data types are not always clearly distinguishable; rather, they
belong to a hierarchy. Each step in the hierarchy builds on the one before
it.

The first two types of data, known as categorical data, are nominal and
ordinal. These two scales take relatively imprecise measures.

While this makes them easier to analyze, it also means they offer less
accurate insights. The next two types of data are interval and ratio. These
are both types of numerical data, which makes them more complex. They
are more difficult to analyze but have the potential to offer much richer
insights.

 Nominal data is the simplest data type. It classifies data purely by


labeling or naming values e.g. measuring marital status, hair, or eye
color. It has no hierarchy to it.

 Ordinal data classifies data while introducing an order, or ranking.
For instance, measuring economic status using the hierarchy:
‘wealthy’, ‘middle income’ or ‘poor.’ However, there is no clearly
defined interval between these categories.
 Interval data classifies and ranks data but also introduces measured
intervals. A great example is temperature scales, in Celsius or
Fahrenheit. However, interval data has no true zero, i.e. a
measurement of ‘zero’ can still represent a quantifiable measure
(such as zero Celsius, which is simply another measure on a scale
that includes negative values).
 Ratio data is the most complex level of measurement. Like interval
data, it classifies and ranks data, and uses measured intervals.
However, unlike interval data, ratio data also has a true zero. When a
variable equals zero, there is none of this variable. A good example
of ratio data is the measure of height—you cannot have a negative
measure of height.

You’ll find a comprehensive guide to the four levels of data measurement


here.

What do the different levels of measurement tell you?
Distinguishing between the different levels of measurement is sometimes a
little tricky.

However, it’s important to learn how to distinguish them, because the type
of data you’re working with determines the statistical techniques you can
use to analyze it. Data analysis involves using descriptive analytics (to
summarize the characteristics of a dataset) and inferential statistics (to infer
meaning from those data).

These comprise a wide range of analytical techniques, so before collecting
any data, you should decide which level of measurement is best for your
intended purposes.

2. What is ordinal data? A definition


Ordinal data is a type of qualitative (non-numeric) data that groups variables
into descriptive categories.

A distinguishing feature of ordinal data is that the categories it uses are


ordered on some kind of hierarchical scale, e.g. high to low. On the levels
of measurement, ordinal data comes second in complexity, directly after
nominal data.

While ordinal data is more complex than nominal data (which has no
inherent order) it is still relatively simplistic.

For instance, the terms ‘wealthy’, ‘middle income’, and ‘poor’ may give you
a rough idea of someone’s economic status, but they are an imprecise
measure–there is no clear interval between them. Nevertheless, ordinal
data is excellent for ‘sticking a finger in the wind’ if you’re taking broad
measures from a sample group and fine precision is not a requirement.

While ordinal data is non-numeric, it’s important to understand that it can
still contain numerical figures. However, these figures can only be used as
categorizing labels, i.e. they should have no inherent mathematical value.

For instance, if you were to measure people’s economic status you could
use number 3 as shorthand for ‘wealthy’, number 2 for ‘middle income’, and
number 1 for ‘poor.’ At a glance, this might imply numerical value, e.g. 3 =
high and 1 = low. However, the numbers are only used to denote
sequence. You could just as easily switch 3 with 1, or with ‘A’ and ‘B’ and it
would not change the value of what you’re ordering; only the labels used to
order it.

Key characteristics of ordinal data


 Ordinal data are categorical (non-numeric) but may use numbers as
labels.
 Ordinal data are always placed into some kind of hierarchy or order
(hence the name ‘ordinal’—a good tip for remembering what makes it
unique!)
 While ordinal data are always ranked, the values do not have an
even distribution.
 Using ordinal data, you can calculate the following summary
statistics: frequency distribution, mode and median, and the range of
variables.

What’s the difference between ordinal data and nominal data?
While nominal and ordinal data are both types of non-numeric
measurement, nominal data have no order or sequence.

For instance, nominal data may measure the variable ‘marital status,’ with
possible outcomes ‘single’, ‘married’, ‘cohabiting’, ‘divorced’ (and so on).
However, none of these categories are ‘less’ or ‘more’ than any other.
Another example might be eye color. Meanwhile, ordinal data always has
an inherent order.

If a qualitative dataset lacks order, you know you’re dealing with nominal
data.

3. What are some examples of ordinal data?
What are some examples of ordinal data?

 Economic status (poor, middle income, wealthy)


 Income level in non-equally distributed ranges ($10K-$20K, $20K-
$35K, $35K-$100K)
 Course grades (A+, A-, B+, B-, C)
 Education level (Elementary, High School, College, Graduate, Post-
graduate)
 Likert scales (Very satisfied, satisfied, neutral, dissatisfied, very
dissatisfied)
 Military ranks (Colonel, Brigadier General, Major General, Lieutenant
General)
 Age (child, teenager, young adult, middle-aged, retiree)

As is hopefully clear by now, ordinal data is an imprecise but nevertheless


useful way of measuring and ordering data based on its characteristics.
Next up, let’s see how ordinal data is collected and how it generally tends
to be used.

4. How is ordinal data collected and
what is it used for?
Ordinal data are usually collected via surveys or questionnaires. Any type
of question that ranks answers using an explicit or implicit scale can be
used to collect ordinal data. An example might be:

 Question: Which best describes your knowledge of the Python programming language? Possible answers: Beginner, Basic, Intermediate, Advanced, Expert.

This commonly recognized type of ordinal question uses the Likert Scale,
which we described briefly in the previous section. Another example might
be:

 Question: To what extent do you agree that data analytics is the most important job for the 21st century? Possible answers: Strongly agree, Agree, Neutral, Disagree, Strongly Disagree.

It’s worth noting that the Likert Scale is sometimes used as a form of
interval data. However, this is strictly incorrect. That’s because Likert
Scales use discrete values, while interval data uses continuous
values with a precise interval between them.

The distinctions between values on an ordinal scale, meanwhile, lack clear


definition or separation, i.e. they are discrete. Although this means the
values are imprecise and do not offer granular detail about a population,
they are an excellent way to draw easy comparisons between different
values in a sample group.

How is ordinal data used?


Ordinal data are commonly used for collecting demographic information.

This is particularly prevalent in sectors like finance, marketing, and
insurance, but it is also used by governments, e.g. the census, and is
generally common when conducting customer satisfaction surveys (in any
industry).

5. How to analyze ordinal data


As discussed, the level of measurement you use determines the kinds of
analysis you can carry out on your data. In general, these fall into two broad
categories: descriptive statistics and inferential statistics.

We use descriptive statistics to summarize the characteristics of a dataset.


This helps us spot patterns. Meanwhile, inferential statistics allow us to
make predictions (or infer future trends) based on existing data. However,
depending on the measurement scale, there are limits. You can learn more
about the difference between descriptive and inferential statistics here .

For now, though, let’s see what kinds of descriptive and inferential
statistics you can measure using ordinal data.

Descriptive statistics for ordinal data


The descriptive statistics you can obtain using ordinal data are:

 Frequency distribution
 Measures of central tendency: Mode and/or median
 Measures of variability: Range

Now let’s look at each of these in more depth.

Frequency distribution
Frequency distribution describes how your ordinal data are distributed.

For instance, let’s say you’ve surveyed students on what grade they’ve
received in an examination. Possible grades range from A to C. You can
summarize this information using a pivot table or frequency table, with
values represented either as a percentage or as a count. To illustrate using
a very simple example, one such table might look like this:

As you can see, the values in the sum column show how many students
received each possible grade. This allows you to see how the values are
distributed. Another option is also to visualize the data, for instance using a
bar plot.

Viewing the data visually allows us to easily see the frequency distribution.
Note the hierarchical relationship between categories. This is different from
the other type of categorical data, nominal data, which lacks any hierarchy.
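The grade table and bar plot from the original article are not reproduced here. As a stand-in, this pandas sketch builds an ordered frequency table from hypothetical grades; the only figures taken from the text are the total of 165 responses and the fact that B is the most common grade, so the exact split is an assumption.

```python
import pandas as pd

# Hypothetical exam grades (ordinal data). The text states 165 responses with B as
# the most common grade; the exact split below is an assumption.
grade_order = ["A", "A-", "B", "B-", "C"]
grades = ["A"] * 20 + ["A-"] * 30 + ["B"] * 60 + ["B-"] * 35 + ["C"] * 20

# An ordered categorical preserves the grade hierarchy when sorting and summarising
s = pd.Series(pd.Categorical(grades, categories=grade_order, ordered=True))

freq = s.value_counts().reindex(grade_order)   # frequency distribution in rank order
print(pd.DataFrame({"count": freq, "percent": (freq / len(s) * 100).round(1)}))
```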

Measures of central tendency: Mode
and/or median
The mode (the value which is most often repeated) and median (the central
value) are two measures of what is known as ‘central tendency.’ There is
also a third measure of central tendency: the mean. However, because
ordinal data is non-numeric, it cannot be used to obtain the mean. That’s
because identifying the mean requires mathematical operations that cannot
be meaningfully carried out using ordinal data.

However, it is always possible to identify the mode in an ordinal dataset.


Using the barplot or frequency table, we can easily see that the mode of
the different grades is B. This is because B is the grade that most students
received.

In this case, we can also identify the median value. The median value is
the one that separates the top half of the dataset from the bottom half. If
you imagined all the respondents’ answers lined up end-to-end, you could
then identify the central value in the dataset. With 165 responses (as in our
grades example) the central value is the 83rd one. This falls under the
grade B.
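A small self-contained sketch of the same idea, reusing the assumed grade counts from above:

```python
import pandas as pd

# Assumed ordered grades from the sketch above (165 responses in total)
grade_order = ["A", "A-", "B", "B-", "C"]
grades = ["A"] * 20 + ["A-"] * 30 + ["B"] * 60 + ["B-"] * 35 + ["C"] * 20
s = pd.Series(pd.Categorical(grades, categories=grade_order, ordered=True))

print(s.mode()[0])                              # mode: 'B', the most common grade

ordered = s.sort_values().reset_index(drop=True)
print(ordered.iloc[len(ordered) // 2])          # median: the 83rd of 165 values, also 'B'
```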

Measures of variability: Range


The range is one measure of what is known as ‘variability.’ Other measures
of variability include variance and standard deviation. However, it is not
possible to measure these using ordinal data, for the same reasons you
cannot measure the mean.

The range describes the difference between the smallest and largest value.
To calculate this, you first need to use numeric codes to represent each
grade, i.e. A = 1, A- = 2, B = 3, etc. The range would be 5 – 1 = 4. So in this
simple example, the range is 4. This is an easy calculation to carry out. The

range is useful because it offers a basic understanding of how spread out
the values in a dataset are.
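As a quick sketch of that calculation (the numeric codes are simply labels assigned for illustration, as the text notes):

```python
# Numeric codes assigned to the ordered grades purely as labels (assumed for illustration)
codes = {"A": 1, "A-": 2, "B": 3, "B-": 4, "C": 5}

grades = ["A", "B", "C", "A-", "B-", "B", "C"]    # a few hypothetical responses
numeric = [codes[g] for g in grades]

print(max(numeric) - min(numeric))                # range: 5 - 1 = 4
```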

Inferential statistics for ordinal data


Descriptive statistics help us summarize data. To infer broader insights, we
need inferential statistics. Inferential statistics work by testing hypotheses
and drawing conclusions based on what we learn.

There are two broad types of techniques that we can use to do


this. Parametric and non-parametric tests. For qualitative (rather than
quantitative) data like ordinal and nominal data, we can only use non-
parametric techniques.

Non-parametric approaches you might use on ordinal data include:

 Mood’s median test


 The Mann-Whitney U test
 Wilcoxon signed-rank test
 The Kruskal-Wallis H test
 Spearman’s rank correlation coefficient

Let’s briefly look at these now.

Mood’s median test


The Mood’s median test lets you compare medians from two or more
sample populations in order to determine the difference between them. For
example, you may wish to compare the median number of positive reviews
of a company on Trustpilot versus the median number of negative reviews.
This will help you determine if you’re getting more negative or positive
reviews.

The Mann-Whitney U-test
The Mann-Whitney U test lets you compare whether two samples come
from the same population.

It can also be used to identify whether or not observations in one sample


group tend to be larger than observations in another sample. For example,
you could use the test to understand if salaries vary based on age. Your
dependent variable would be ‘salary’ while your independent variable would
be ‘age’, with two broad groups, e.g. ‘under 30,’ ‘over 60.’
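For illustration, here is a minimal scipy sketch of a Mann-Whitney U test on two hypothetical groups of Likert-coded ratings; the data are assumptions, not from the article.

```python
from scipy import stats

# Hypothetical Likert-coded ratings (1 = very dissatisfied ... 5 = very satisfied)
group_a = [4, 5, 3, 4, 4, 5, 2, 4]
group_b = [2, 3, 3, 1, 2, 4, 2, 3]

u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat}, p-value = {p_value:.3f}")
# A small p-value suggests the two groups' ratings come from different distributions
```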

Wilcoxon signed-rank test


The Wilcoxon signed-rank test explores the distribution of scores in two
dependent data samples (or repeated measures of a single sample) to
compare how, and to what extent, the mean rank of their populations
differs.

We can use this test to determine whether two samples have been
selected from populations with an equal distribution or if there is a
statistically significant difference.

The Kruskal-Wallis H test


The Kruskal-Wallis H test helps us to compare the mean ranking of scores
across three or more independent data samples.

It’s an extension of the Mann-Whitney U test that increases the number of


samples to more than two. In the Kruskal-Wallis H test, samples can be of
equal or different sizes. We can use it to determine if the samples originate
from the same distribution.

Spearman’s rank correlation coefficient

Spearman’s rank correlation coefficient explores possible relationships (or
correlations) between two ordinal variables.

Specifically, it measures the statistical dependence between those


variables’ rankings. For instance, you might use it to compare how many
hours someone spends a week on social media versus their IQ. This would
help you to identify if there is a correlation between the two.
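A brief sketch of Spearman’s rank correlation in scipy, using hypothetical rankings for the two variables mentioned above:

```python
from scipy import stats

# Hypothetical ranked data for the two variables discussed above (assumed values)
social_media_hours_rank = [1, 2, 3, 4, 5, 6, 7, 8]
iq_rank                 = [8, 7, 5, 6, 4, 3, 2, 1]

rho, p_value = stats.spearmanr(social_media_hours_rank, iq_rank)
print(f"Spearman's rho = {rho:.2f}, p-value = {p_value:.3f}")
# rho near -1 would indicate a strong negative monotonic relationship
```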

Don’t worry if these models are complex to get your head around. At this
stage, you just need to know that there are a wide range of statistical
methods at your disposal. While this means there is lots to learn, it also
offers the potential for obtaining rich insights from your data.

6. Summary and further reading


In this guide, we:

 Introduced the four levels of data measurement: Nominal, ordinal, interval,


and ratio.

 Defined ordinal data as a qualitative (non-numeric) data type that groups


variables into ranked descriptive categories.

 Explained the difference between ordinal and nominal data: Both are types of
categorical data. However, nominal data lacks hierarchy, whereas ordinal
data ranks categories using discrete values with a clear order.

 Shared some examples of ordinal data: Likert scales, education level, and
military rankings.

 Highlighted the descriptive statistics you can obtain using ordinal data:
Frequency distribution, measures of central tendency (the mode and median),
and variability (the range).

 Introduced some non-parametric statistical tests for analyzing ordinal data,


e.g. Mood’s median test and the Kruskal-Wallis H test.

What Is Interval Data?
https://careerfoundry.com/en/blog/data-analytics/what-is-interval-data/

BY WILL HILLIER, UPDATED ON AUGUST 20, 2021 (14 mins read)

What is interval data and how is it used? What’s the best way to
collect and analyze it? Find out in this guide.

These days, many jobs require at least a basic understanding of data


analytics. You don’t have to be a specialist to require some fundamental
know-how. Whether you work in marketing, sales, or the sciences, data
plays an increasingly important role in the modern workplace. One of the
first things you’ll need to learn is the four main types of
data: nominal, ordinal, interval, and ratio data. In this post, we’ll focus on
the third of these: interval data.

To make sure you’re up to speed, we’ll start by summarizing the four


different data types and how they relate to one another. We’ll then dive in
with interval data to learn a bit more about it.

Ready to learn all about interval data? Let’s go!

1. An introduction to the four different types of data
To analyze any dataset, you first need to know what kind of data you’re
working with.

Broadly, data falls into one or more of four


categories: nominal, ordinal, interval, and ratio. These scales, or levels
of measurement, tell us how precisely a variable has been measured. The

four data types are not mutually exclusive but rather belong to a hierarchy,
where each level of measurement builds on the previous one.

The simplest levels of measurement are nominal and ordinal data. These
are both types of categorical data that take useful but imprecise measures
of a variable. They are easier to work with but offer less accurate insights.
Building on these are interval data and ratio data, which are both types
of numerical data. While these are more complex, they can offer much
richer insights.

 Nominal data is the simplest (and most imprecise) data type. It uses
labels to identify values, without quantifying how those values relate
to one another e.g. employment status, blood type, eye color, or
nationality.
 Ordinal data also labels data but introduces the concept of ranking. A
dataset of different qualification types is an example of ordinal data
because it contains an explicit, increasing hierarchy, e.g. High School
Diploma, Bachelor’s, Master’s, Ph.D., etc.
 Interval data categorizes and ranks data, and introduces precise and
continuous intervals, e.g. temperature measurements in Fahrenheit
and Celsius, or the pH scale. Interval data always lack what’s known
as a ‘true zero.’ In short, this means that interval data can contain

negative values and that a measurement of ‘zero’ can represent a
quantifiable measure of something.
 Ratio data categorizes and ranks data, and uses continuous
intervals (like interval data). However, it also has a true zero, which
interval data does not. Essentially, this means that when a variable is
equal to zero, there is none of this variable. An example of ratio data
would be temperature measured on the Kelvin scale, for which there
is no measurement below absolute zero (which represents a total
absence of heat).
Why do the different levels of
measurement matter?
Distinguishing between the different levels of measurement helps you decide which

statistical technique to use for analysis. For example, data analysts commonly use

descriptive techniques (for summarizing the characteristics of a dataset) and inferential

techniques (to infer broader meaning from those data). Understanding what level of

measurement you have will help narrow down the type of analysis you can carry out. That’s

because the level of measurement has implications for the type of calculations that are

possible using those data. When collecting data, then, it’s important to first decide what

types of insights you require. This will determine which level of measurement to use.

2. What is interval data? A definition

Interval data is a type of quantitative (numerical) data. It groups variables
into categories and always uses some kind of ordered scale. Furthermore,
interval values are always ordered and separated using an equal measure
of distance. A very good example is the Celsius or Fahrenheit temperature
scales: each notch on the thermometer directly follows the previous one,
and each is the same distance apart. This type of continuous data is useful
because it means you can carry out certain mathematical equations, e.g.
determining the difference between variables using subtraction and
addition. This makes interval data more precise than the levels of measure
that come below it, i.e. nominal or ordinal data, which are both non-
numeric.

Another distinguishing feature of interval data is that it lacks a ‘true zero.’


Put simply, this means that a measure of zero on an interval scale does not
denote the absence of something. By default, this means that zero on an
interval scale is simply another variable. For instance, zero Celsius is a
measure of temperature that can be preceded by meaningful negative
values.

Of the four levels of measurement, interval data is the third most complex.
By introducing numerical values, it is eminently more useful for carrying out
statistical analyses than nominal or ordinal data.

Key characteristics of interval data


 Interval data are measured using continuous intervals that show
order, direction, and a consistent difference in values.
 The difference between values on an interval scale is always evenly
distributed.
 Interval datasets have no ‘true zero,’ i.e. they may contain negative
values. This means they can be subtracted and added, but when

multiplied or divided do not offer meaningful insights (this has
important implications for the type of analyses you can carry out).
 Using interval data, you can calculate the following summary
statistics: frequency distribution; mode, median, and mean; and the
range, standard deviation, and variance of a dataset.

What’s the difference between interval data and ratio data?
While interval and ratio data are both types of numerical data, the main
difference is that ratio data has a true zero, while interval data does not.
This distinction helps differentiate between the two types. If you are
working with quantitative data that contains negative values, you are
working with interval data. On paper, this distinction may seem minor, but
lacking a true zero has important implications for the types of statistical
analysis you can carry out. Using interval data, you cannot calculate the
ratios of your values, e.g. through multiplication, division, logarithms, and
squares.

3. What are some examples of interval data?
What are some examples of interval data?

 Temperature in Fahrenheit or Celsius (-20, -10, 0, +10, +20, etc.)


 Times of the day (1pm, 2pm, 3pm, 4pm, etc.)
 Income level on a continuous scale ($10K, $20K, $30K, $40K, and so
on)
 IQ scores (100, 110, 120, 130, 140, etc.)
 pH (pH of 2, pH of 4, pH of 6, pH of 8, pH of 10, etc.)

 SAT scores (900, 950, 1000, 1050, 1100 etc.)
 Credit ratings (20, 40, 60, 80, 100)
 Dates (1740, 1840, 1940, 2040, 2140, etc.)

As you can see, interval data is all about measuring variables on an


equidistant scale where the zero point is an arbitrary figure. Next, let’s
explore how interval data is collected and commonly used.

4. How is interval data collected and what is it used for?
Interval data can be collected in various ways. The collection method takes
into account important factors such as how the data will be used and the
nature of the target population.

Common collection techniques include surveys, interviews, or direct


observation. For instance, a survey question might be something like:

 Question: What is today’s temperature in Celsius? Possible answers: -10, 0, +10, +20, +30.

Note that the distance between the intervals is always equal. This is the
same as for ratio data. However, what distinguishes interval from ratio data
is that the temperature in Celsius can be negative. This is important
because it means you cannot carry out ratio calculations, i.e. the Celsius
scale goes down to -273.15 degrees, so you cannot say that +20 degrees
have twice the value of +10 degrees.

Interval data is also documented through direct observation. For example,


a researcher might note the number of people entering a department store
at regular intervals to measure the change in footfall. This kind of approach
can also be automated using smart technologies. For instance,
temperature data is regularly collected using weather satellites. The benefit

of automated collection is that it allows you to compare past and present
data without needing to measure it directly, which can be impractical.

In reality, because the vast majority of numeric scales have a true zero,
most types of quantitative data are ratio data, not interval data. Interval
data is generally collected and used for very specific use-cases. However,
it is still important to understand the difference.

How is interval data used?


Interval data is most commonly used in areas like statistical research, for
grading exams, measuring IQ, applying credit ratings, carrying out scientific
studies on a population, or performing measures of probability.

5. How to analyze interval data


The level of measurement you use will inform the type of analysis you carry
out on your data. Regardless of scale, however, there are two main
categories of analysis you will use: descriptive and inferential statistics.

Descriptive statistics summarize the characteristics of a dataset. Inferential


statistics draw comparisons between samples and offer insights (or ‘infer’
information) based on those data. You can learn more about the difference
between descriptive and inferential statistics here. For now, let’s explore
some common descriptive and inferential techniques you can use on
interval data.

Descriptive statistics for interval data


Descriptive statistics you can obtain using interval data include:

 Frequency distribution
 Central tendency: Mode, median, and mean
 Variability: Range, standard deviation, and variance
Let’s look at each of these now.

Frequency distribution
Frequency distribution looks at how data are distributed. Let’s say you take
temperature measurements in the city you live in every day throughout the
year. Your measurements range from -15 degrees Fahrenheit to +90
degrees Fahrenheit. You might represent this information using a table.
Using this simple example, here’s how this might look:

The values in the frequency column show the distribution of temperature


measurements across the year. Another method is to visualize the data, for
instance using a pie chart.

The important thing to note here is that the relationship between different
categories is both hierarchical and evenly spread, i.e. the number of
degrees Fahrenheit measured in the category ‘30 to 45’ is the same as the
number of degrees Fahrenheit measured in the category ‘45 to 60,’ and so
on.
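The frequency table and pie chart from the original article are not reproduced in this copy. The sketch below builds an equivalent table in pandas; the band frequencies are assumptions, chosen only so that the totals line up with the figures quoted in the text (365 daily measurements ranging from -15 to 90 degrees Fahrenheit).

```python
import pandas as pd

# Assumed number of daily readings per 15-degree band; the counts are illustrative and
# chosen so the totals are consistent with the figures quoted in the text (365 days)
bands = ["-15 to 0", "0 to 15", "15 to 30", "30 to 45", "45 to 60", "60 to 75", "75 to 90"]
frequency = [5, 12, 30, 55, 85, 100, 78]

table = pd.DataFrame({"Temperature (F)": bands, "Frequency (days)": frequency})
table["Percent"] = (table["Frequency (days)"] / sum(frequency) * 100).round(1)
print(table)
```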

Measures of central tendency: Mode, median, mean
Using interval data, it is possible to measure all three measures of central
tendency. These are:

 The mode (the value that occurs most often in your dataset)


 The median (the central value)
 The mean (the average value)

It’s easy to identify the mode by looking at the pie chart or pivot table. As
we can see, throughout the year, the temperature most often falls
somewhere between 60 and 75 degrees Fahrenheit.

We can also identify the median value. This is the value at the center of
your dataset. Since measurements in our temperature dataset were taken
on 365 days of the year, we can determine that the median value is the 183rd
value. This is the 45 to 60 degrees Fahrenheit category. The center point of
this category is 52.5 degrees, so this is our median value (or the best
possible estimate, using grouped data).

Finally, we can calculate the mean temperature. For grouped data, this
involves first calculating the midpoint of each group. We can add this to our
table. Next, we must find the product of each midpoint and its
corresponding frequency, which we can also add to our table.

By dividing the sum of the midpoint x frequency products by the sum of the frequencies
themselves, we obtain our mean temperature. Doing a quick calculation
(20,437.5 divided by 365) gives us a mean temperature of 56 degrees
Fahrenheit.
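Here is that grouped-mean calculation written out as a short sketch, using the assumed band frequencies from the table above; with those assumed counts the midpoint x frequency products sum to 20,437.5, giving a mean of roughly 56 degrees Fahrenheit, in line with the figure quoted in the text.

```python
# Grouped-mean sketch using the assumed band frequencies from the table above
midpoints = [-7.5, 7.5, 22.5, 37.5, 52.5, 67.5, 82.5]   # midpoint of each 15-degree band
frequency = [5, 12, 30, 55, 85, 100, 78]

total_days = sum(frequency)                                        # 365
weighted_sum = sum(m * f for m, f in zip(midpoints, frequency))    # 20,437.5

print(round(weighted_sum / total_days, 1))   # ~56 degrees Fahrenheit, the grouped-data mean
```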

Measures of variability: Range, standard deviation, and variance
Range, standard deviation, and variance are all important measures of
variability that you can extract from interval data.

The range is the simplest measure to determine—it describes the


difference between the smallest and largest value. It helps explain the
distribution of the data points. Because our weather data is already
numeric, we can easily calculate this. Our measurements range from -15 to
90. Doing the math (90 – [–15]) tells us that the range is 105.

The standard deviation (which measures the amount of variation or


dispersion in a set of values) and the variance (which measures variability
from the mean) are more complex measures to calculate. Rather than
getting into detail here, we recommend checking out the difference between
variance and standard deviation when you have a bit of spare time to get
your head around them!
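For completeness, a quick numpy sketch of these variability measures on hypothetical raw readings:

```python
import numpy as np

# Hypothetical raw temperature readings in Fahrenheit (assumed values)
temps = np.array([-15, 5, 22, 38, 51, 56, 63, 70, 78, 90])

print(temps.max() - temps.min())   # range: 90 - (-15) = 105
print(temps.var(ddof=1))           # sample variance
print(temps.std(ddof=1))           # sample standard deviation
```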

Inferential statistics for interval data
To analyze quantitative (rather than qualitative) datasets, it is best to use
what are known as parametric tests, i.e. tests that use data with clearly
defined parameters. You can also use non-parametric tests (more
commonly used for qualitative, non-numerical data, i.e. nominal and ordinal
data). However, these provide less meaningful insights.

To highlight, here is a handful of parametric tests you can use to explore


interval data:

 T-test
 Analysis of variance (ANOVA)
 Pearson correlation coefficient
 Simple linear regression

T-test
The t-test helps to determine if there’s a significant statistical difference
between the mean of two data samples that may be related to one another.
For instance, is there a difference in average credit rating between adults in
the age group 30-40 and the age group 40-50? T-tests are commonly used
for hypothesis testing. To carry out a t-test, all you need to know is the
mean difference between values of each data sample, the standard
deviation of each sample, and the number of data values in each group.
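A minimal scipy sketch of such a t-test, with assumed credit-rating values for the two age groups:

```python
from scipy import stats

# Hypothetical credit ratings for two age groups (assumed values)
age_30_40 = [62, 70, 68, 75, 66, 71, 69, 73]
age_40_50 = [72, 78, 74, 80, 77, 75, 79, 76]

t_stat, p_value = stats.ttest_ind(age_30_40, age_40_50)
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")
# A small p-value suggests the two group means differ significantly
```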

Analysis of variance
Analysis of variance (ANOVA) compares the mean values across three or
more data samples. For instance, is there a difference in credit rating
between adults in the age groups 30-40, 40-50, and 50-60? In essence,
you can use ANOVA in the same way as a t-test, but for more than two

variables. However many variables you have, ANOVA will help determine
the relationship between the dependent and independent values.
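The equivalent one-way ANOVA sketch in scipy, again with assumed values for three age groups:

```python
from scipy import stats

# Hypothetical credit ratings for three age groups (assumed values)
age_30_40 = [62, 70, 68, 75, 66, 71]
age_40_50 = [72, 78, 74, 80, 77, 75]
age_50_60 = [81, 79, 85, 83, 80, 84]

f_stat, p_value = stats.f_oneway(age_30_40, age_40_50, age_50_60)
print(f"F = {f_stat:.2f}, p-value = {p_value:.3f}")
# A small p-value suggests at least one group mean differs from the others
```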

Pearson correlation coefficient


Pearson correlation coefficient (also known as Pearson’s r) measures the
level of linear correlation between two sets of variables. For instance, does
a relationship exist between someone’s income and their credit rating? By
plotting quantitative variables on a graph, we can measure the direction
and strength of the linear relationship between them.

A graph of Pearson’s r, demonstrating the difference between a positive, neutral, and negative correlation. Source: statisticshowto.com

Using this approach, values will always fall between 1 and -1. A value of 1
indicates a strong positive correlation, while a value of -1 indicates a strong
negative correlation. A value of 0 suggests no strong correlation at all
between variables.
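A short scipy sketch of Pearson’s r on assumed income and credit-rating pairs:

```python
from scipy import stats

# Hypothetical income (in $1,000s) and credit-rating pairs (assumed values)
income = [25, 32, 40, 48, 55, 63, 70, 85]
credit_rating = [58, 62, 65, 70, 72, 78, 80, 88]

r, p_value = stats.pearsonr(income, credit_rating)
print(f"Pearson's r = {r:.2f}, p-value = {p_value:.3f}")
# r close to +1 indicates a strong positive linear relationship
```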

Simple linear regression


Simple linear regression predicts the relationship between two variables, or
measures the impact of an independent variable on a dependent variable.

For instance, can a person’s income be used to predict their credit rating?
Simple linear regression uses only two variables, but there are variations
on the model. For instance, multiple linear regression aims to predict the
dependent output variable based on two or more independent input variables.
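And a matching sketch of a simple linear regression on the same kind of assumed data, using scipy’s linregress:

```python
from scipy import stats

# Hypothetical income (in $1,000s) used to predict credit rating (assumed values)
income = [25, 32, 40, 48, 55, 63, 70, 85]
credit_rating = [58, 62, 65, 70, 72, 78, 80, 88]

result = stats.linregress(income, credit_rating)
print(f"credit_rating = {result.slope:.2f} * income + {result.intercept:.2f}")
print(f"R-squared = {result.rvalue ** 2:.2f}")
```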

Hopefully this offers a helpful summary of some of the inferential


techniques you can use on interval data. While we haven’t gone into great
detail here, these tests offer a tantalizing taste of the complexity of insights
you can obtain if you use the right data with the right model.

6. Summary and further reading


In this post, we:

 Introduced the four scales, or levels, of measurement: Nominal,


ordinal, interval, and ratio.
 Defined interval data as a quantitative data type that groups variables
into ranked categories, using continuous numerical values.
 Explained the difference between interval and ratio data: Both are
types of numerical data. However, interval data lacks a true zero,
whereas ratio data does not.
 Shared some examples of interval data: Temperature in Fahrenheit
or Celsius, pH measure, IQ and SAT scores.
 Highlighted the descriptive statistics you can obtain using interval
data: Frequency distribution, measures of central tendency (mode,
median, and mean), and variability (range, standard deviation, and
variance).
 Introduced some parametric statistical tests for analyzing interval data, e.g. T-test and ANOVA.

What is Ratio Data? Definition,
Characteristics and Examples
https://careerfoundry.com/en/blog/data-analytics/what-is-ratio-data/

BY WILL HILLIER, UPDATED ON APRIL 5, 2023 (16 mins read)

What is ratio data? What’s it used for? And how can we best collect
and analyze it? Find out in this guide.

Looking to break into the field of data analytics? Or simply a data


enthusiast? A prerequisite for exploring data more deeply is getting to grips
with the different types of data you might encounter. Broadly speaking,
there are four main types of data (also known as ‘levels of measurement’).
These are nominal, ordinal, interval, and ratio data. In this post, we’re going
to explore the last on this list—ratio data.

First up, though, it’s important to understand that the four data types do not
stand alone; they are closely related. We’ll start by summarizing the four.
We’ll then explore the various aspects of ratio data in closer detail.

1. An introduction to the four different types of data

Broadly speaking, whatever data you are using, you can be certain that it
falls into one or more of four
categories: nominal, ordinal, interval, and ratio. Introduced in 1946 by
the psychologist Stanley Smith Stevens, these four categories are also
known as the levels of measurement. They are now widely used across
the sciences and within data analytics to define the degree of precision to
which a variable has been measured. As a hierarchical scale, each level
builds on the one that comes before it.

The most basic levels of measurement are nominal and ordinal data. These
are types of categorical data that take relatively simplistic measures of a
given variable. Building on these are interval and ratio data—more complex
measures. These are both types of numerical data. They can be harder to
analyze but will, in general, lead to much richer, actionable insights. Let’s
briefly look at what each level measures:

 Nominal data is the simplest data type. It classifies (or names) data
without suggesting any implied relationship between those data. For
instance, countries or species of animals are both forms of nominal
data.
 Ordinal data also classifies data but it introduces the concept of
ranking. An example might be labeling animals, but this time by using
discrete and imprecise measures of their speed (‘slow’, ‘medium’,
‘fast’).
 Interval data both classifies and ranks data (like ordinal data) but
introduces continuous measurements. Examples might be the time of
day or temperature measured on either the Celsius or Fahrenheit
scale. Importantly, it always lacks a ‘true zero.’ A measurement of
zero can be midway through a scale (i.e. you can have minus
temperatures).

 Ratio data classifies and ranks data, and uses measured,
continuous intervals, just like interval data. However, unlike interval
data, ratio data has a true zero. This basically means that zero is an
absolute, below which there are no meaningful values. Speed, age,
or weight are all excellent examples since none can have a negative
value (you cannot be -10 years old or weigh -160 pounds!)

What do the different levels of measurement tell you?
Because each type of data has different features, this impacts how we analyze them.
For instance, we can’t use a regression model on nominal data, because nominal
data lacks the necessary characteristics required to carry out this type of analysis
(namely: no dependent and independent variables).

All statistical techniques fall into two broad categories: descriptive statistics (which
summarize a dataset’s features) and inferential statistics (which help us make
predictions based on those data). Determining if you’re working with nominal,
ordinal, interval, or ratio data helps narrow down which technique to use.
Conversely, determining what kind of analysis you wish to carry out (i.e. what your
goal is) will tell you which type of data measurement you need to take.

2. What is ratio data? A definition

Ratio data is a form of quantitative (numeric) data. It measures variables on
a continuous scale, with an equal distance between adjacent values. While
it shares these features with interval data (another type of quantitative
data), a distinguishing property of ratio data is that it has a ‘true zero.’ In
other words, a measure of zero on a ratio scale is absolute: ratio data can
never have a negative value. This is important because it allows us to apply
all the possible mathematical operations (addition, subtraction,
multiplication, and division) when carrying out statistical analyses.

It’s worth noting that while ratio data must have a true zero, it does not
necessarily require an endpoint. A ratio scale can have potentially infinite
values or a finite endpoint. The only essential distinction from interval
data is the existence of a true zero.

Of the four levels of measurement, ratio data is the most complex, sitting one
step up in the hierarchy from interval data. This also makes it the most
desirable type of data. Why? Well, in data analytics terms, this means it can
be used to carry out the widest possible range of analyses, vastly
improving our ability to test hypotheses and obtain accurate insights
(presuming, of course, that we’ve chosen the right analytical test and
executed it properly…but more on that later!)

Key characteristics of ratio data
 Ratio data are measured using a continuous, equidistant scale that
shows order, direction, and a precise difference in values.
 Ratio data have a ‘true zero,’ i.e. zero represents an absence of the
variable, and you cannot have negative values.

 Because ratio data have a true zero, all four arithmetic operations
(addition, subtraction, multiplication, and division) are meaningful,
which is not true of the other three types of data.
 Ratio data can be used to calculate measures including frequency
distribution; mode, median, and mean; range, standard deviation,
variance, and coefficient of variation.

What’s the difference between ratio data and interval data?
Both ratio and interval data are types of numerical data. The key difference
is that ratio data has a true zero, while interval data does not. So, if your
data are numerical, contain no negative numbers, and a measure of zero is
equivalent to an absence of the chosen variable, you are dealing with ratio
data. This difference is not trivial. Because interval scales lack a true zero
(and can include negative values), interval data prevents us from carrying out
key mathematical operations, namely multiplication and division.

To illustrate, if we are measuring distance (ratio data) then we could say
that 40 miles are double the value of 20 miles. However, if we are
measuring temperature in Celsius (interval data) we cannot say that 40
degrees is double the value of 20 degrees, since a measure of zero (rather
than being the absence of temperature) is simply another measurement
with an inherent value. This limits interval data’s usefulness. Ratio data is
always the preferable option if you can get your hands on it!
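To make the distinction concrete, here is a minimal Python sketch (the temperature readings are made up for illustration). Ratios are meaningful on the Kelvin scale, which has a true zero, but not on the Celsius scale:

# Illustrative (made-up) readings of the same two temperatures on two scales
celsius = [20.0, 40.0]                      # interval scale: zero is arbitrary
kelvin = [c + 273.15 for c in celsius]      # ratio scale: zero means no thermal energy

# On the Celsius (interval) scale this ratio is NOT meaningful:
print(celsius[1] / celsius[0])              # 2.0, but 40 °C is not "twice as hot" as 20 °C

# On the Kelvin (ratio) scale the ratio is meaningful:
print(kelvin[1] / kelvin[0])                # ~1.07, the physically correct ratio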

3. Ratio data examples
Now we have an idea of what ratio data is, what are some examples? Let’s
take a look.

 Temperature in Kelvin (0, +10, +20, +30, +40, etc.)

 Height (5ft. 8in., 5ft. 9in., 5ft. 10in., 5ft. 11in., 6ft. 0in. etc.)
 Price of goods ($0, $5, $10, $15, $20, $30, etc.)
 Age in years (from zero to 100+)
 Distance (from zero miles/km upwards)
 Time intervals (might include race times or the number of hours spent
watching Netflix!)

As you can see, ratio data is all about measuring continuous variables on
equidistant scales.

It’s important to note that while values in a ratio dataset must be capable of
reaching true zero, this is not the same thing as actually having values that
go down to zero. To illustrate, if you’re measuring the heights of a group of
adults, you probably won’t obtain many measurements below 5 feet. The
existence of true zero simply means that the measurement scale you are
using has a definitive starting point of zero, i.e. you could reach zero in
theory, even if not in practice.

Ratio data examples in survey questions
So, using those examples listed previously, here are some uses in mock
survey questions.

How old are you?

 18-24 years old
 25-34 years old
 35-44 years old
 45-54 years old
 55-64 years old

How many hours do you use your phone per day?

 0-3 hours
 3-6 hours
 6-9 hours
 More than 9 hours

How much does your pet weigh?

 20-25 kg
 26-30 kg
 31-35 kg
 36-40 kg

Next, let’s see how ratio data is typically collected and used in everyday
life.

4. How is ratio data collected and what is it used for?
There are many ways to collect ratio data. The chosen method depends on
the nature of what you are measuring and how you intend to use the data.
Common methods for collecting ratio data include surveys, questionnaires,
or interviews. A familiar type of question might be:

Question: How much time do you spend on social media per day?
Possible answers: 0-1 hours, 1-2 hours, 2-3 hours, 3-4 hours, 4-5 hours.

Note that, in this example, the distance between intervals is always equal
and there is a true zero, i.e. you cannot spend -2 hours a day on social
media. Plus, if your scale lacks equal distance between measures, you are
not collecting ratio data, but ordinal data.

Like interval data, ratio data are sometimes collected through direct
observation, too. For instance, a zoologist might measure the heights of
various elephants. To drive the point home, note once again that height
measurements have a true zero, i.e. an elephant with a height of zero is an
absence of an elephant.

Another common way of collecting ratio data is through automated data
collection. For instance, most vehicles have software that tracks their
speed and distance over time. Collecting and documenting this information
regularly and automatically is beneficial. It allows for direct comparison
between past and present data over periods that would be impractical to
measure through direct observation.

Finally, it’s helpful to remember that, as a general rule of thumb, most
quantitative data is ratio data. This is because most numerical
measurements use a true zero scale.

What is ratio data used for?
Because ratio data incorporates the cumulative characteristics of data from
all the levels of measurement (i.e. nominal, ordinal, and interval) it can be
used for any type of data analysis you can think of. This kind of makes ratio
data the holy grail of measurement scales! It can be used for everything
from measuring customer behaviors to predicting future sales trends, and
improving health outcomes…we could go on, but the list is pretty much
endless.

5. How to analyze ratio data
In all cases, ratio data is the best type of data to work with. This is because
it allows you to apply the entire arsenal of different statistical techniques.
Even in the case of summary statistics—the most fundamental type of
measurement—it allows you to scrutinize data at a deeper level than is
possible for nominal, ordinal, and interval data.

The two main types of statistical analysis are descriptive and inferential
statistics. Descriptive statistics summarize a dataset’s characteristics.
Inferential statistics allow you to test hypotheses or make predictions. Let’s
look at each more closely, in relation to ratio data.

Descriptive statistics for ratio data
Descriptive statistics you can obtain using ratio data include:

 Frequency distribution
 Central tendency: Mode, median, and mean
 Variability: Range, standard deviation, variance, and coefficient of
variation

Almost all of these statistics can also be measured using interval data. The
only exception is the coefficient of variation. For more detail on how you
might obtain each of these measures, check out section five of our post on
interval data, which uses more explicit examples.

Frequency distribution
As the name suggests, frequency distribution explores how a dataset’s
values are distributed. The most common way to measure frequency
distribution is to represent your data using a pivot table or some kind of
graph. For example, the bar graph here shows the distribution of weight in
a sample of marlins.

A bar plot showing the estimated weight of marlins; the x-axis shows weight, the y-axis shows frequency. Source: John C. Holdsworth / ResearchGate

Remember: while you can measure frequency distribution for many types
of data, ratio data must have a true zero, and a measure of weight (on any
scale) does.
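As a minimal sketch of how a frequency distribution for a ratio variable might be tabulated in Python with pandas (the weights below are invented, not taken from the marlin study):

import pandas as pd

# Invented weights (kg) for a small sample of animals
weights = pd.Series([120, 135, 120, 150, 135, 120, 160, 150, 135, 120])

# Absolute frequencies of each observed value
print(weights.value_counts().sort_index())

# Relative frequencies (proportions) of each value
print(weights.value_counts(normalize=True).sort_index())

# For continuous data it is more common to bin the values first
print(pd.cut(weights, bins=[100, 120, 140, 160]).value_counts().sort_index())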

Measures of central tendency: Mode, median, mean
Just like interval data, it’s possible to determine the three measures of
central tendency using ratio data. These are:

 The mode (the value that’s repeated most often throughout the data)
 The median (the central value in the dataset)
 The mean (the dataset’s average value)

The measures of central tendency are useful summary statistics for judging
the relative positions and importance of different values within a dataset.
For example, we can use these measures to determine whether a value
falls below or above the mean, how far from the mean it sits, what this
implies, and so on. This is all beneficial when you are first dealing with a
new set of data since it helps determine the best way to analyze it in more
depth.
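For instance, a minimal sketch using Python's built-in statistics module (the age values are made up for illustration):

import statistics

# Made-up ages (a ratio variable) for a small sample
ages = [23, 29, 31, 23, 45, 38, 23, 51]

print(statistics.mode(ages))     # mode: most frequent value -> 23
print(statistics.median(ages))   # median: central value -> 30.0
print(statistics.mean(ages))     # mean: arithmetic average -> 32.875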

Measures of variability: Range, standard deviation, variance, and coefficient of variation
Variability is a term used to describe a collection of different measures.
Range, standard deviation, and variance are all measures of variability that
you can extract from ratio and interval data. However, using ratio data you
can also calculate what’s known as the coefficient of variation. But what do
all these measures tell us?

 Range: Describes the difference between the smallest and largest value.
 Standard deviation: Measures the amount of variation, or
dispersion, in a set of values.
 Variance: Measures to what extent the values in a dataset vary from
the mean.
 Coefficient of variation: Measures the ratio between the standard
deviation and the mean. It’s usually expressed as a percentage. The
higher the number, the greater the degree of dispersion around the
mean. It’s a complex concept, but you can learn how to determine the
coefficient of variation in this guide.
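Here is a minimal Python sketch of the variability measures just listed, using invented race times (seconds) as the ratio variable:

import statistics

# Invented race times in seconds (ratio data: true zero, equal intervals)
times = [52.1, 49.8, 55.6, 47.3, 51.0, 53.4]

value_range = max(times) - min(times)            # range
std_dev = statistics.stdev(times)                # sample standard deviation
variance = statistics.variance(times)            # sample variance
cv = std_dev / statistics.mean(times) * 100      # coefficient of variation, in %

print(value_range, std_dev, variance, round(cv, 1))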

Inferential statistics for ratio data
When it comes to in-depth statistical analyses, you can analyze ratio data
using the same techniques that you would use to analyze interval data.
Ideally, you should apply parametric rather than non-parametric techniques.
That’s because parametric techniques are well suited to quantitative
data (which has clearly defined parameters) and offer a deeper level of
insight than non-parametric tests. While you can still apply non-parametric
tests to ratio data, they will not make the most of a ratio dataset’s full
range of characteristics.

Here are some statistical tests you can use on ratio data:

T-test
T-tests help you identify whether a statistically significant difference
exists between the mean values of two separate data samples. For instance,
is there a difference in average heights between adults who weigh less
than 180 pounds and those who weigh more than 180 pounds? If you want
to test your hypothesis, the t-test is very useful. While there’s a range of
different versions, in general, all you need are the mean and standard
deviation of each sample, together with each sample’s size.
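As a hedged illustration, an independent-samples t-test on two made-up height samples could look like this with SciPy (the group values are invented):

from scipy import stats

# Invented heights (cm) for two independent groups
group_a = [165, 170, 172, 168, 174, 169]   # e.g. adults under 180 lb
group_b = [171, 176, 178, 173, 180, 175]   # e.g. adults over 180 lb

# Two-sample t-test: is the difference in mean heights statistically significant?
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)   # a small p-value (e.g. < 0.05) suggests a real difference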

Analysis of variance (ANOVA)
You might use analysis of variance (ANOVA) to evaluate the mean values
across three or more data samples. How do they compare? For instance, is
there a difference in average heights between adults who weigh between
150 and 180 pounds, 180 and 210 pounds, and 210 and 240 pounds?
ANOVA provides a similar outcome to a t-test but is useful when there are
more than two groups to compare.
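A minimal one-way ANOVA sketch with SciPy, again using invented height samples for three weight bands:

from scipy import stats

# Invented heights (cm) for three independent weight bands
band_1 = [164, 167, 170, 166, 169]   # 150-180 lb
band_2 = [171, 173, 176, 172, 175]   # 180-210 lb
band_3 = [177, 180, 182, 179, 181]   # 210-240 lb

# One-way ANOVA: do the mean heights differ across the three groups?
f_stat, p_value = stats.f_oneway(band_1, band_2, band_3)
print(f_stat, p_value)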

Pearson correlation coefficient
You can use the Pearson correlation coefficient (also known as
Pearson’s r) to measure the extent of linear correlation between two
variables. For instance, can you identify a relationship between someone’s
weight and the amount they spend on weekly groceries? By plotting the two
quantitative variables on a graph, you can see the direction and strength of
the correlation between them. When calculating Pearson’s r, values always
fall between -1 and 1: a value close to 1 indicates a strong positive
correlation, while a value close to -1 indicates a strong negative
correlation. A value of 0 shows no linear correlation between the variables.
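For example, a short SciPy sketch of Pearson's r (the paired weight and grocery-spend figures are invented):

from scipy import stats

# Invented paired observations: weight (kg) and weekly grocery spend ($)
weight = [60, 65, 70, 75, 80, 85, 90]
spend = [45, 50, 48, 60, 66, 64, 75]

# Pearson's r: strength and direction of the linear relationship
r, p_value = stats.pearsonr(weight, spend)
print(r, p_value)   # r close to +1 suggests a strong positive linear correlation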

Simple linear regression
You can use simple linear regression to identify the relationship between
two variables. One of these will be the dependent variable, which is
impacted by the independent variable. It is commonly used in predictive
analytics. For instance, can a person’s height be used to predict their
weight? Simple linear regression uses only two variables but variations,
such as multiple linear regression, measure a dependent variable based on
two or more independent variables.
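A minimal sketch using scipy.stats.linregress, with invented height and weight values standing in for real measurements:

from scipy import stats

# Invented data: height (cm) as the independent variable, weight (kg) as the dependent one
height = [160, 165, 170, 175, 180, 185]
weight = [55, 60, 66, 70, 77, 82]

result = stats.linregress(height, weight)
print(result.slope, result.intercept)      # fitted line: weight ~ slope * height + intercept

# Predict the weight of a (hypothetical) 172 cm person from the fitted line
print(result.slope * 172 + result.intercept)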

This is just a small sample of the parametric tests you can use on ratio
data. The full selection is wide and includes variations on those already
described, from alternative regression tests (like logistic regression) to other
comparative tests (such as the paired t-test or multiple analysis of variance,
or MANOVA). While these methods require some getting used to, by now
you hopefully have an idea of the kinds of analyses you can carry out.

6. Summary and further reading
In this post, we’ve:

 Introduced the four levels of measurement: Nominal, ordinal, interval, and ratio.
 Defined ratio data as a type of quantitative data that measures
variables using continuous, equidistant numerical values on a scale
with a true zero.
 Explained the difference between ratio and interval data: Both are
types of numerical data. However, only ratio data has a true zero,
allowing us to apply all possible mathematical operations (addition,
subtraction, multiplication, and division) when carrying out an
analysis.
 Shared some ratio data examples: temperature in Kelvin, height,
distance, age in years.
 Highlighted the descriptive statistics you can obtain using ratio data:
Frequency distribution, measures of central tendency (mode, median,
and mean), and variability (range, standard deviation, variance, and
coefficient of variation).
 Introduced some parametric tests for analyzing ratio data, e.g.
Pearson correlation coefficient and linear regression.

A Guide to Data Types in Statistics
What types of data are used in statistics? Here's a comprehensive guide.

Written by Niklas Donges

https://builtin.com/data-science/data-types-statistics

Data types are an important concept in statistics: they need to be
understood in order to apply the right statistical measurements to your
data and to draw correct conclusions from it. This blog post will introduce
you to the different data types you need to know in order to
do proper exploratory data analysis (EDA), which is one of the
most underestimated parts of a machine learning project.

WHAT ARE THE MAIN TYPES OF DATA IN STATISTICS?
 Nominal data
 Ordinal data
 Discrete data
 Continuous data

Introduction to Data Types

Having a good understanding of the different data types, also called
measurement scales, is a crucial prerequisite for doing exploratory data
analysis, since you can use certain statistical measurements only for
specific data types.

You also need to know which data type you are dealing with to
choose the right visualization method. Think of data types as a
way to categorize different types of variables. We will discuss the
main types of variables and look at an example for each. We will
sometimes refer to them as measurement scales.

Categorical or Qualitative Data Types

Categorical data represents characteristics. Therefore it can represent
things like a person’s gender, language, etc. Categorical data can also take
on numerical values (example: 1 for female and 0 for male). Note that those
numbers don’t have mathematical meaning.

NOMINAL DATA

Nominal values represent discrete units and are used to label variables
that have no quantitative value. Just think of them as “labels.” Note that
nominal data has no order, so if you changed the order of its values, the
meaning would not change. A nominal feature that describes a person’s
gender, for example, would be called “dichotomous,” which is a type of
nominal scale that contains only two categories.

ORDINAL DATA

Ordinal values represent discrete and ordered units. Ordinal data is
therefore nearly the same as nominal data, except that its ordering matters.
An example is a variable that records education level (Elementary, High
School, College).

Note that the difference between Elementary and High School is not the
same as the difference between High School and College. This is the main
limitation of ordinal data: the differences between the values are not really
known. Because of that, ordinal scales are usually used to measure
non-numeric features like happiness, customer satisfaction and so on.

Numerical or Quantitative Data Types
DISCRETE DATA

We speak of discrete data if its values are distinct and separate. In other
words: we speak of discrete data if the data can only take on certain
values. This type of data can’t be measured but it can be counted. It
basically represents information that can be sorted into categories. An
example is the number of heads in 100 coin flips.

You can check whether you are dealing with discrete data by asking two
questions: Can you count it? Can it be divided up into smaller and smaller
parts?

CONTINUOUS DATA
Continuous data represents measurements and therefore their
values can’t be counted but they can be measured. An example
would be the height of a person, which you can describe by using
intervals on the real number line.

Interval Data

Interval values represent ordered units that have the same difference.
Therefore we speak of interval data when we have a variable that contains
numeric values that are ordered and where we know the exact differences
between the values. An example would be a feature that contains the
temperature of a given place.

The problem with interval data is that it doesn’t have a “true zero.” In
regard to our example, that means there is no such thing as no
temperature. With interval data, we can add and subtract, but we cannot
multiply, divide or calculate ratios. Because there is no true zero, a lot of
descriptive and inferential statistics can’t be applied.

Ratio Data

Ratio values are also ordered units that have the same
difference. Ratio values are the same as interval values, with the
difference that they do have an absolute zero. Good examples are
height, weight, length, etc.

Video: Types of Data: Nominal, Ordinal, Interval/Ratio - Statistics Help (Dr Nic's Maths and Stats)

Why Are Data Types Important in Statistics?

Data types are an important concept because statistical methods can only
be used with certain data types. You have to analyze continuous data
differently than categorical data; otherwise the analysis would be wrong.
Knowing the types of data you are dealing with therefore enables you to
choose the correct method of analysis.

We will now go over every data type again, but this time with regard to
which statistical methods can be applied. To understand properly what we
will discuss next, you need to know the basics of descriptive statistics. If
you don’t, you can read my blog post (9 min read) about
it: https://towardsdatascience.com/intro-to-descriptive-statistics-252e9c464ac9.

Statistical Methods for Nominal, Ordinal and Continuous Data Types

SUMMARIZING NOMINAL DATA

When you are dealing with nominal data, you collect information through:

 Frequencies: The frequency is the rate at which something occurs over a period of time or within a data set.
 Proportion: You can easily calculate the proportion by dividing the frequency by the total number of events (e.g. how often something happened divided by how often it could have happened).
 Visualization methods: To visualize nominal data you can use a pie chart or a bar chart.
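As a minimal sketch of these summaries in Python with pandas (the colour responses are invented, and pandas is just one convenient option):

import pandas as pd

# Invented nominal data: favourite colour of ten respondents
colours = pd.Series(["red", "blue", "red", "green", "blue",
                     "red", "blue", "red", "green", "red"])

print(colours.value_counts())                 # frequencies
print(colours.value_counts(normalize=True))   # proportions
colours.value_counts().plot(kind="bar")       # bar chart (requires matplotlib to be installed)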

In data science, you can use one-hot encoding to transform nominal data
into a numeric feature.
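For instance, a minimal sketch using pandas.get_dummies (the colour column is invented):

import pandas as pd

df = pd.DataFrame({"colour": ["red", "blue", "green", "blue", "red"]})

# One-hot encoding: one binary column per category, with no implied order
encoded = pd.get_dummies(df, columns=["colour"])
print(encoded)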

SUMMARIZING ORDINAL DATA

When you are dealing with ordinal data, you can use the same methods as
with nominal data, but you also have access to some additional tools.
Therefore you can summarize your ordinal data with frequencies,
proportions, and percentages, and you can visualize it with pie and bar
charts. Additionally, you can use percentiles, the median, the mode, and
the interquartile range to summarize your data.

In data science, you can use label encoding to transform ordinal data into
a numeric feature.
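A minimal sketch of label encoding an ordinal variable with pandas, where the category order is specified explicitly (the education column is invented):

import pandas as pd

df = pd.DataFrame({"education": ["High School", "Elementary", "College", "High School"]})

# Label encoding that respects the natural ordering of the categories
order = ["Elementary", "High School", "College"]
df["education_code"] = pd.Categorical(df["education"], categories=order, ordered=True).codes
print(df)   # Elementary -> 0, High School -> 1, College -> 2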

SUMMARIZING CONTINUOUS DATA

When you are dealing with continuous data, you can use the widest range
of methods to describe your data. You can summarize it using percentiles,
the median, the interquartile range, the mean, the mode, the standard
deviation, and the range.

Visualization methods: To visualize continuous data, you can use a
histogram or a boxplot. With a histogram, you can check the central
tendency, variability, modality, and kurtosis of a distribution.
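A minimal matplotlib sketch of both plots, using invented height measurements:

import matplotlib.pyplot as plt

# Invented continuous data: heights (cm) of a small sample
heights = [158, 162, 165, 167, 168, 170, 171, 172, 174, 175,
           176, 178, 180, 182, 185, 169, 173, 166, 179, 177]

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.hist(heights, bins=6)        # histogram: shape, centre and spread of the distribution
ax1.set_title("Histogram")
ax2.boxplot(heights)             # boxplot: median, interquartile range and outliers
ax2.set_title("Boxplot")
plt.show()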

Summary

In this post, you discovered the different data types that are used
throughout statistics. You learned the difference between
discrete & continuous data and learned what nominal, ordinal,
interval and ratio measurement scales are. Furthermore, you now know
which statistical measurements you can use with each data type and which
visualization methods are appropriate. You also learned which methods can
be used to transform categorical variables into numeric variables. This
enables you to carry out a big part of an exploratory analysis on a given
data set.
