1.2. Variables and Data
In the language of statistics, one of the most basic concepts is sampling. In most statistical
problems, a specified number of measurements or data – a sample – is drawn from a much
larger body of measurements, called the population.
Definition 1. A population is the set of all measurements of interest to the investigator.
Definition 2. A sample is a subset of measurements selected from the population of interest.
In most cases, we are interested primarily in the population, but the population may be difficult
or impossible to enumerate. Thus, we try to describe or predict the behavior of the population on
the basis of information obtained from a sample drawn from it, in the hope that the sample is
representative of the population.
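The relationship between a population and a sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population below is a synthetic list of GPA values generated for the sketch (not data from the text), and the names `population` and `sample` are chosen here for convenience.

```python
import random
import statistics

# Hypothetical population: GPAs of 10,000 undergraduates.
# These are synthetic values generated for illustration only.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [round(rng.uniform(1.0, 4.0), 2) for _ in range(10_000)]

# A sample is a subset of the measurements in the population.
# rng.sample draws without replacement, so no student is picked twice.
sample = rng.sample(population, k=5)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print("sample:", sample)
print("population mean:", round(population_mean, 2))
print("sample mean:", round(sample_mean, 2))
```

With only five measurements, the sample mean will usually differ somewhat from the population mean, which is why the representativeness of the sample matters.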
The words sample and population have two meanings for most people: each can refer either to a
set of objects or to the measurements taken on those objects. In statistics, we distinguish
between the set of objects on which the measurements are taken and the measurements
themselves. To experimenters, the objects on which measurements are taken are called
experimental units.
Definition 3. A variable is a characteristic that changes or varies over time and/or for
different individuals or objects under consideration.
Definition 4. An experimental unit is the individual or object on which a variable is measured.
Definition 5. A single measurement or data value results when a variable is actually
measured on an experimental unit.
Example 1. A set of five students is selected from all undergraduates at a large university, and
measurements are entered into a spreadsheet as shown below.
Identify the various elements involved in obtaining this set of measurements.
Answers.
1. The experimental unit on which the variables are measured is a particular undergraduate
student on the campus, found in column A.
2. Five variables are measured for each student: grade point average (GPA), gender, year in
college, major, and current number of units enrolled.
3. Each of these characteristics varies from student to student. If we consider the GPAs of all
students at this university to be the population of interest, the five GPAs in column B
represent a sample from this population.
In Example 1 we have measured each of the five variables on a single experimental unit — the
student. Therefore, in this example, a measurement really consists of five observations, one for
each of the five measured variables. For example, the measurement taken on student 2
produces this observation:
(2.3, F, So, Mathematics, 15)
You can see that there is a difference between a single variable measured on a single
experimental unit and multiple variables measured on a single experimental unit.
Definition 6. Univariate data results when a single variable is measured on a single
experimental unit.
Definition 7. Bivariate data results when two variables are measured on a single
experimental unit.
Definition 8. Multivariate data results when more than two variables are measured on a
single experimental unit.
In Example 1, five variables were measured on each student, resulting in multivariate data.
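Definitions 6 to 8 depend only on how many variables are measured on each experimental unit, which can be sketched as follows. The record uses the observation for student 2 given above; the class name `StudentRecord`, its field names, and the helper `data_kind` are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StudentRecord:
    """One measurement on one experimental unit (a student)."""
    gpa: float
    gender: str
    year: str
    major: str
    units: int


def data_kind(n_variables: int) -> str:
    """Name the kind of data from the number of variables per unit."""
    if n_variables < 1:
        raise ValueError("at least one variable must be measured")
    if n_variables == 1:
        return "univariate"
    if n_variables == 2:
        return "bivariate"
    return "multivariate"


# Observation for student 2, taken from Example 1.
student_2 = StudentRecord(2.3, "F", "So", "Mathematics", 15)

n_variables = len(fields(student_2))  # five variables per student
kind = data_kind(n_variables)
print(kind)  # multivariate
```

If only the GPA had been recorded for each student, the same helper would report univariate data.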
Variables can be classified into one of two types: qualitative or quantitative.
Definition 9. A qualitative variable measures a quality or characteristic on each experimental
unit.
Definition 10. A quantitative variable measures a numerical quantity or amount on each
experimental unit.
Example 2. Qualitative variables produce data that can be categorized according to
similarities or differences in kind; hence, they are often called categorical data. Examples are
Political affiliation: PDP-Laban, NP, NPC, Lakas-CMD, LP
Taste ranking: excellent, good, fair, poor
UAAP colors: blue, sky blue, dark blue, green, maroon, gold, white
Quantitative variables, often represented by the letter x, produce numerical data, such as
those listed here:
QPI (Quality Point Index)
Number of students in class
Time spent sleeping
Notice that there is a difference in the types of numerical values that these quantitative
variables can assume. The number of students, for example, can take on only the values
x = 0, 1, 2, ..., whereas QPI can take any value within an interval of real numbers. To describe
this difference, we define two types of quantitative variables: discrete and continuous.
Definition 11. A discrete variable can assume only a finite or countable number of values.
Definition 12. A continuous variable can assume the infinitely many values corresponding to
the points on an interval.
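As a rough illustration of Definitions 9 to 12, the sketch below classifies each variable in the observation for student 2 from Example 1. It rests on a simplifying assumption that does not hold in general: that the way a value is stored (text, whole number, decimal number) reflects the type of the variable. A student ID stored as a whole number, for instance, would still be qualitative. The function name `variable_type` and the dictionary keys are hypothetical.

```python
def variable_type(value) -> str:
    """Guess a variable's type from one stored value (heuristic only)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so reject it explicitly.
        raise TypeError("boolean values are not handled in this sketch")
    if isinstance(value, str):
        return "qualitative"
    if isinstance(value, int):
        return "quantitative, discrete"
    if isinstance(value, float):
        return "quantitative, continuous"
    raise TypeError(f"unsupported value: {value!r}")


# Observation for student 2, taken from Example 1.
student_2 = {
    "GPA": 2.3,
    "gender": "F",
    "year": "So",
    "major": "Mathematics",
    "units": 15,
}

types = {name: variable_type(value) for name, value in student_2.items()}
for name, kind in types.items():
    print(f"{name}: {kind}")
```

Under this heuristic, GPA comes out as continuous, the number of units as discrete, and gender, year, and major as qualitative.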
Example 3. The following table lists a few examples of discrete variables and continuous
variables.
Discrete Variable          Continuous Variable
number of siblings         height and weight
number of pets at home     COVID-19 recovery rate
number of gadgets owned    distance from house to school
The types of data discussed above can be summarized in the following diagram:
Source: Mendenhall III, W. et al. (2020). Introduction to Probability and Statistics, 15th edition.
Brooks/Cole.