STAT 101 Module Handout 1.1
Statistics is the science of learning from data. Its importance is emphasized in the various sectors of society.
Now more than ever, we are seeing how statistics shape the way we view the world. Statistics guide us in
making decisions, from the most trivial to the most crucial. For instance, statistics is used, often without you even
noticing it, in routine decisions such as what to wear. If publicly available weather data show that the heat index
is 48 °C, would you still wear your sweater? Likewise, statistics is used in decision-making
with far-reaching impacts. Given the statistics that fully vaccinated individuals are less likely to be hospitalized
due to COVID-19 than those who have not been vaccinated, would you still be hesitant to get yourself
vaccinated? With the voluminous data being produced today, it is increasingly evident that statistics plays an
important role in evidence-based decision-making. Thus, everyone should learn how to use statistics to understand
and make sense of this huge amount of data. In this module, you will learn the basic concepts in statistical
inference. Let's start training your brain for statistics by watching this TED Talk by Alan Smith and a video on why
you need to study statistics.
LEARNING OBJECTIVES
By the end of this module, you should be able to:
1. define basic statistical concepts in the context of a given problem,
2. differentiate parameter from statistic, and
3. classify variables according to their type and level of measurement.
The term statistics can be used in both a singular and a plural sense. In the singular sense, statistics refers to
a branch of science concerned with the collection, organization, presentation, analysis, and interpretation of
sets of numerical figures. It is the science consisting of theory and methods for planning the collection of data
to answer the research questions at hand, collecting the data, analyzing the data, and finally interpreting the
results of the analysis. In the plural sense, statistics refers to a collection of numerical figures. For instance,
the numbers that you hear and read in the news, such as the number of active COVID-19 cases, the unemployment
rate, the inflation rate, the proportion of Filipinos who are satisfied with the government response to COVID-19,
and even basketball statistics, are statistics in the plural sense.
Making sound decisions is anchored on the proper use of statistics. If the goal is to make good decisions based
on data, then it is very important to know what questions you hope to answer with the data and how the data were
obtained. In most studies, you might be interested in answering questions about the characteristics of a large
group or in comparing two or more well-defined groups. To do this, we select a smaller group of units from each
large group and use information from these units to learn about the characteristics of the large group.
DEFINITION
Population: The entire collection of entities that we want information about, on which inferences are made.
Sample: The subset of the population we actually examine to gather information.
If we want to learn about a population, why can we not just measure every unit in the population (census)
rather than observe a sample?
There are many reasons for studying a sample rather than the entire population. All too often, the population is
very large, so observing all of it would be very costly and time-consuming. That is why censuses such as the
Census of Population and Housing are conducted only infrequently by the Philippine Statistics Authority (PSA). In
some situations, the process of measuring items in the population is destructive. For example, if you are
interested in the breaking strength of glass bottles in a manufacturing lot, it would be foolish to test the entire
population, because every bottle tested is broken and none would be left. If you want to learn more about the
reasons for studying a sample rather than the entire population, you can watch this YouTube video.
Suppose you wish to determine the strategies used by UPLB students to cope with remote
learning. According to records, around 12,000 UPLB students were enrolled last semester.
Considering your limited resources, you decide to survey 373 randomly selected UPLB students
instead. In this example, the population of interest is the set of all UPLB students enrolled last
semester, while the sample is the set of 373 randomly selected UPLB students enrolled last
semester.
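The handout does not say how the 373 students were randomly selected. The Python sketch below assumes simple random sampling without replacement and uses hypothetical ID numbers 1 to 12,000 to stand in for the enrolled students (the population); the seed value is arbitrary and only makes the draw reproducible.

```python
import random

# Hypothetical sampling frame: ID numbers 1..12,000 stand in for the
# roughly 12,000 UPLB students enrolled last semester (the population).
N = 12_000
population_ids = list(range(1, N + 1))

# Assumed design: simple random sample of n = 373 students, without replacement.
n = 373
rng = random.Random(2024)  # arbitrary fixed seed so the draw is reproducible
sample_ids = rng.sample(population_ids, n)

print(len(sample_ids))       # 373 units in the sample
print(len(set(sample_ids)))  # 373: no student is selected twice
```

Only the 373 sampled students would be surveyed; what we learn from them is then used to make inferences about all 12,000.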
Summary measures computed from a population are called parameters. The population mean (µ) and the population
standard deviation (σ) are examples of parameters. You will often find parameters denoted by Greek letters. On
the other hand, summary measures computed from a sample are called statistics. The sample mean (x̄) and the
sample standard deviation (s) are examples of statistics. Notice that statistics are usually denoted by Latin
letters instead. Similarly, the sizes of the population and the sample are traditionally denoted by the uppercase
and lowercase letters N and n, respectively. You will see more about the distinction between parameters and
statistics in the succeeding modules.
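To see the distinction concretely, the Python sketch below builds a hypothetical population of 10,000 exam scores, computes the parameters µ and σ from all of it, and computes the statistics x̄ and s from a random sample of 100. The scores are simulated, not real data. The sketch also assumes a common convention that the handout does not state: σ divides by N (`statistics.pstdev`) while s divides by n − 1 (`statistics.stdev`).

```python
import random
import statistics

# Hypothetical population: simulated exam scores of N = 10,000 students.
rng = random.Random(1)
population = [rng.gauss(75, 8) for _ in range(10_000)]
N = len(population)

# Parameters: computed from every unit in the population.
mu = statistics.mean(population)       # population mean, mu
sigma = statistics.pstdev(population)  # population standard deviation, sigma (divides by N)

# Statistics: computed from a random sample of n = 100 units.
sample = rng.sample(population, 100)
n = len(sample)
x_bar = statistics.mean(sample)        # sample mean, x-bar
s = statistics.stdev(sample)           # sample standard deviation, s (divides by n - 1)

print(f"N = {N}, mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"n = {n}, x_bar = {x_bar:.2f}, s = {s:.2f}")
```

Running it with a different seed changes x̄ and s, since they depend on which units happen to be sampled. In practice µ and σ are usually unknown, and the sample statistics are what we use to learn about them.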
Statistics plays a key role in describing and analyzing information from the units in the population or sample.
For instance, we may observe your blood type (say, AB) and your age (say, 21 years old). This shows that the
observations we take may be represented either by labels or by numerical values. Further, these characteristics
vary from unit to unit: another person in your house may have a different blood type and age. These
characteristics may also change over time within a single unit. Your age will certainly change in the coming
years, but your blood type will not. A characteristic or attribute that is measured for each unit under
consideration is referred to as a variable. The realized value of a variable that is measured and recorded for a
unit is an observation. The collection of observations on one or more variables is called
data.
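One way to picture these terms is as a small table stored in code. In the hypothetical Python sketch below, each dictionary is one unit (a person), each key is a variable, and each value is an observation. Only the first record (AB, 21) comes from the example above; the other people are made up.

```python
# Hypothetical data set: each dict is one unit (a person), each key is a
# variable, and each value is the observation of that variable for that unit.
data = [
    {"blood_type": "AB", "age": 21},  # a label and a numerical value
    {"blood_type": "O", "age": 45},
    {"blood_type": "A", "age": 19},
]

variables = list(data[0].keys())
n_units = len(data)
n_observations = sum(len(record) for record in data)

print(variables)       # ['blood_type', 'age']
print(n_units)         # 3
print(n_observations)  # 6: 3 units x 2 variables
```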
Variables can be classified into one of two categories: qualitative or quantitative. A qualitative
variable measures a characteristic that can be classified into one of a group of categories and
cannot be measured on a natural numerical scale. Qualitative variables may be stored in a database
as numbers, for example, 1 if male and 0 if female, but these numerical values do not express
quantity, amount, or magnitude. The zip code (say, 4030 for Los Baños, Laguna) is another example
of a qualitative variable that is arbitrarily coded with numbers. Notice that performing arithmetic
operations on these variables does not convey any meaning. Observations made on qualitative
variables are sometimes referred to as categorical data.
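The Python sketch below illustrates why arithmetic on coded categories is not meaningful. All records are hypothetical, and only 4030 (Los Baños) comes from the text. Averaging the 1/0 sex codes produces a number, but that number is not a category. Counting how often each category occurs is the meaningful summary. Zip codes are deliberately stored as text so that no one is tempted to add them.

```python
from collections import Counter

# Hypothetical records where sex is coded 1 = male, 0 = female.
sex_codes = [1, 0, 0, 1, 0]
# Zip codes are labels, so they are stored as strings (hypothetical values besides 4030).
zip_codes = ["4030", "4030", "1101", "4027", "4030"]

# Averaging the codes gives 0.4, which is not a sex; it only restates
# the share of records that happen to be coded 1.
mean_code = sum(sex_codes) / len(sex_codes)

# Meaningful summaries of categorical data are counts per category.
sex_counts = Counter("male" if c == 1 else "female" for c in sex_codes)
zip_counts = Counter(zip_codes)

print(mean_code)                  # 0.4
print(sex_counts)                 # Counter({'female': 3, 'male': 2})
print(zip_counts.most_common(1))  # [('4030', 3)]
```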
A quantitative variable measures a characteristic that is recorded on a naturally occurring
numerical scale; thus, performing arithmetic operations on this type of variable is meaningful.
Quantitative variables can be further classified as either discrete or continuous. A quantitative
variable is discrete if it can take only a finite or countable number of values. The number of
correct answers in a 10-item multiple-choice test is a discrete variable since its possible values
are finite: 0, 1, 2, 3, 4, …, 10. On the other hand, a continuous quantitative variable can take any
of the infinitely many values in a given interval. For example, the height (in centimeters) of a
student can assume any value between the minimum possible height and the maximum possible height. In
fact, for any two values of height that are selected, there is always a value between them. As a word
of caution, do not let the appearance of the recorded data mislead you about their type. For example,
a height measurement of 174 cm does not imply that height is discrete. We have merely recorded height
to the nearest whole number; height can still take any value within an interval.
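The Python sketch below contrasts the two types. The test score has exactly 11 possible values. For height, taking the midpoint of any two heights always gives another possible height, so you can keep "zooming in" indefinitely. The actual height 174.38 cm is hypothetical and is included only to show that rounding it to 174 cm changes how it is recorded, not what kind of variable it is.

```python
# Discrete: correct answers on a 10-item test can take only a finite set of values.
possible_scores = list(range(0, 11))  # 0, 1, 2, ..., 10
print(len(possible_scores))           # 11 possible values

# Continuous: between any two heights there is always another one, e.g. their midpoint.
a, b = 174.0, 175.0
for _ in range(3):
    mid = (a + b) / 2
    print(mid)  # 174.5, then 174.25, then 174.125
    b = mid     # zoom in on the lower half and repeat

# Recorded values are rounded, but rounding does not make height discrete.
true_height = 174.38  # hypothetical actual height in cm
recorded = round(true_height)
print(recorded)       # 174
```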
Levels of Measurement
It is essential to know the type of variable being collected and the kind of data it generates, since the methods
of data presentation and analysis depend on the type of data collected. We now discuss the levels of measurement
of data. There are four levels of measurement, arranged in a hierarchy, namely: nominal, ordinal, interval, and
ratio.
The nominal level, the lowest level of measurement, gives names or labels to various categories
without any sense of order. For example, sex, employment status, farm type, tenure status of the
housing unit, and whether or not the child is fully immunized can be classified at the nominal
level. Arithmetic operations are not performed on variables at the nominal level. The information
that can be obtained from processing data on these variables is limited to frequency counts and
percentages.
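A frequency count and percentage table is the kind of summary that suits a nominal variable. The Python sketch below builds one for farm type, using hypothetical survey responses. The rows are listed by frequency only for convenience, because nominal categories have no natural order.

```python
from collections import Counter

# Hypothetical survey responses on farm type (a nominal variable).
farm_type = ["rice", "corn", "rice", "coconut", "rice", "corn", "vegetable", "rice"]

counts = Counter(farm_type)
total = len(farm_type)

# Frequency and percentage for each category; sorting by frequency is a
# convenience only, since nominal categories have no inherent order.
for category, freq in counts.most_common():
    pct = 100 * freq / total
    print(f"{category:<10} {freq:>2} {pct:5.1f}%")
```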
The ordinal level also deals with qualitative variables, but its categories have an inherent ordering.
However, the difference between categories cannot be measured and has no meaning. For example,
highest educational attainment, satisfaction rating of a newly elected politician (very dissatisfied,
dissatisfied, neutral, satisfied, very satisfied), and student classification (freshman, sophomore,
junior, senior) can be classified at the ordinal level. As with the nominal level, the information
that can be obtained from processing data on these variables is limited to frequency counts and
percentages, with additional information on ordering.
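The Python sketch below shows what the ordering adds for an ordinal variable. It uses the satisfaction scale from the text, and the individual ratings are hypothetical. Frequencies are reported in scale order, and "higher/lower" comparisons are valid. However, the positions 0 to 4 are only ranks, so subtracting them does not measure how much more satisfied one person is than another.

```python
from collections import Counter

# The ordering of the categories is part of the variable's definition.
SCALE = ["very dissatisfied", "dissatisfied", "neutral", "satisfied", "very satisfied"]
RANK = {label: i for i, label in enumerate(SCALE)}

# Hypothetical satisfaction ratings of a newly elected politician.
ratings = ["satisfied", "neutral", "very satisfied", "dissatisfied", "satisfied"]

counts = Counter(ratings)
# Report frequencies in the natural order of the scale, not alphabetically.
for label in SCALE:
    print(f"{label:<18} {counts.get(label, 0)}")

# Comparisons of order are meaningful ...
print(RANK["satisfied"] > RANK["neutral"])  # True
# ... but rank differences are not: the gap from "neutral" to "satisfied" need
# not equal the gap from "satisfied" to "very satisfied", even though both are 1.
```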
The interval level deals with quantitative variables in which equal differences between values represent
equal amounts of the characteristic being measured. This means that the intervals between values have
meaning. For example, if we consider the IQ scores of two individuals, we can tell not only which of the
two has the higher IQ score but also by how many points the two scores differ. However, note that variables
at the interval level have no absolute zero point. A zero value of a variable at the interval level does not
mean the absence of the characteristic being measured; instead, it has an arbitrary interpretation. Further,
because there is no absolute zero point, the ratio of two observed values of such a variable has no meaning.
For example, temperature measured in °C is considered interval level. A temperature of 0 °C does not indicate
the absence of heat in an area; in science, 0 °C is defined as the freezing point of water. Also, we cannot
say that a temperature reading of 40 °C is twice as hot as 20 °C.
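One way to check that "40 °C is twice as hot as 20 °C" is not meaningful is to re-express the same temperatures on another interval scale whose zero is placed differently. The Python sketch below uses degrees Fahrenheit for this (the Fahrenheit scale is not discussed in the handout and is brought in only as an illustration). Because the conversion shifts the zero point, the ratio changes. Differences keep the same meaning, because a 20-degree Celsius difference always equals a 36-degree Fahrenheit difference.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

hot_c, mild_c = 40.0, 20.0

# The ratio depends on where the (arbitrary) zero of the scale is placed.
print(hot_c / mild_c)                  # 2.0 on the Celsius scale
print(c_to_f(hot_c) / c_to_f(mild_c))  # 104/68, about 1.53, on the Fahrenheit scale

# Differences, by contrast, are consistent anywhere on the scale.
print(c_to_f(hot_c) - c_to_f(mild_c))  # 36.0
print(c_to_f(-10.0) - c_to_f(-30.0))   # 36.0
```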
The ratio level also deals with quantitative variables, but it has an absolute zero point. A zero
value of a variable at the ratio level means the absence of the characteristic being measured. For
example, the number of household members, the distance of the school from your house, and the
number of vehicles passing through EDSA can be classified at the ratio level. A value of 0 for the
number of vehicles passing through EDSA, which in reality is highly improbable, means that no
vehicle was observed passing through EDSA. Because the zero is absolute, ratios are meaningful: a
household with 10 members has twice as many members as a household with 5 members.
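For contrast with the temperature example, the Python sketch below shows that at the ratio level a ratio does not depend on the unit of measurement. The household sizes come from the text. The two distances from home to school are hypothetical. Converting kilometers to meters only rescales the values and leaves zero unchanged, so "twice as far" stays twice as far.

```python
# Household sizes (ratio level): the zero is absolute, so ratios are meaningful.
household_a, household_b = 10, 5
print(household_a / household_b)  # 2.0: twice as many members

# Hypothetical distances from home to school, first in kilometers.
dist_km = (4.0, 2.0)
# Changing units rescales the values but keeps the zero point fixed.
dist_m = tuple(d * 1000 for d in dist_km)
print(dist_km[0] / dist_km[1])    # 2.0
print(dist_m[0] / dist_m[1])      # 2.0
```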
TAKE NOTE!
The level of measurement depends mainly on the measuring process and not on the property being
measured. As an example, consider the weight of certain equipment. If we measure the weight of the
equipment in kilograms, it can be classified as a quantitative variable at the ratio level. However, if we
measure the weight of the equipment and label it as light, medium, or heavy, it becomes a qualitative
variable at the ordinal level. Additional examples for this topic can be found in this YouTube video.
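The Python sketch below shows how the same equipment weights can be measured at two levels. The weights and the cut-offs (under 20 kg is light, under 50 kg is medium, otherwise heavy) are hypothetical and are chosen only for illustration. Recording the kilograms gives ratio-level data. Mapping each weight to a label gives ordinal data, which loses the ability to form meaningful ratios or differences.

```python
def weight_class(kg: float) -> str:
    """Map a ratio-level weight (kg) to an ordinal label.

    The cut-offs are hypothetical and chosen only for illustration.
    """
    if kg < 20:
        return "light"
    if kg < 50:
        return "medium"
    return "heavy"

weights_kg = [12.5, 35.0, 80.2, 19.9]  # hypothetical equipment weights
labels = [weight_class(w) for w in weights_kg]
print(labels)  # ['light', 'medium', 'heavy', 'light']

# The kg values support ratios (80.2 kg is about 6.4 times 12.5 kg),
# but the labels only support ordering: light < medium < heavy.
```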
REFERENCES
Peck, R., Olsen, C., and Devore, J. L. (2015). Introduction to Statistics and Data Analysis. Cengage Learning.