STAT Module
Learning Outcomes
After working on this module, you will be able to:
1. define statistics;
2. differentiate descriptive and inferential statistics;
3. evaluate the different measures of the average;
4. explain the appropriate measure of the average to use in a given data set;
5. use a variety of statistical tools to process and manage numerical data; and
6. appreciate the use of statistical data in making important decisions.
Activities To Do
Watch the video on this link: https://www.youtube.com/watch?v=jbkSRLYSojo
Question To Ponder
What is the importance of data in understanding and dealing with various aspects of the past
and our present-day living?
In its plural sense, statistics is a set of numerical data; in its singular sense, it refers to
the scientific discipline consisting of theory and methods for processing numerical
information that one can use when making decisions in the face of uncertainty. Others define
statistics as the art and science of collecting, presenting, analyzing, and interpreting data.
Statistics aids in decision making, summarizes or describes data, helps forecast or predict
future outcomes, aids in making inferences, and helps in comparisons or establishing relationships.
In education, for instance, statistical techniques and methods are used to get information on
enrolment, finance, physical facilities, dropout rate, proficiency level, and many others. In
research, statistical tools are used to test differences, effectiveness, impact, relationships, or
interdependence of variables. In management, statistics is used in decision making in areas such as
labour relations, human resource allocation, performance assessment, and many others. In economics,
it determines trends and helps financial analysts make investment decisions. These are among the
many uses of statistics.
Department of Mathematics, College of Science, University of Eastern Philippines
GE 1 – Mathematics in the Modern World Module
Author: Engr. Ida E. Esquierdo Section 2: Mathematics as a Tool
Areas of Statistics
Levels of Measurement
The levels of measurement are nominal, ordinal, interval, and ratio. They differ only in
which properties of numbers (identity, order, additivity) they possess.
Levels of Measurement
The ordinal scale possesses the properties of identity and order. It can rank
or order objects according to whether they possess more, less, or the same amount
of the variable being measured.
Example 4
The following are examples of the nominal, ordinal, interval, and ratio levels of measurement.
Nominal: gender, eye color, smoking status, nationality
Ordinal: level of educational attainment, military ranks, academic ranks, class standing
Interval: Celsius scale measurement of temperature, intelligence score
Ratio: weight, height, time spent in watching television, number of students
Measures of center, or, more colloquially, averages, are the best-known measures of
numeric data. Each is a single value, intended as a representative, that characterizes
the whole group. In other words, a measure of center is any single value used to identify
the center of a given set of data. It is often referred to as the average. For example, trade unions
negotiating pay increases often use an ‘average wage’ as a standard on which to base those
increases. The three most commonly used measures of center are the mean, the median, and the
mode.
The Mean
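1. $$\bar{x} = \frac{\sum x}{n} = \frac{4 + 8 + 10 + 12 + 6}{5} = \frac{40}{5} = 8$$
2. $$\bar{x} = \frac{\sum x}{n} = \frac{84 + 75 + 90 + 98 + 88 + 79 + 95 + 86 + 93 + 89}{10} = \frac{877}{10} = 87.7$$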
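The two computations above can be checked with a short Python sketch. The function name `arithmetic_mean` is only an illustrative choice, not something taken from the module.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


first_list = [4, 8, 10, 12, 6]
second_list = [84, 75, 90, 98, 88, 79, 95, 86, 93, 89]

print(arithmetic_mean(first_list))   # 8.0
print(arithmetic_mean(second_list))  # 87.7
```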
Example 6
The following data relate to the number of successful sales made by the salesmen employed
by a large microcomputer firm in a particular quarter. Calculate the average number of sales.
Solution:
Since the given problem is presented in terms of number of salesmen for each range of
number of sales, we can use the formula for the mean of the grouped data. That is,
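$$\bar{x} = \frac{\sum fx}{n}$$
where f is the number of salesmen, x is the midpoint of each range of number of sales, and n
is the total number of salesmen.
$$\bar{x} = \frac{2(1) + 7(14) + 12(23) + 17(21) + 22(15) + 27(6)}{80}$$
$$\bar{x} = \frac{2 + 98 + 276 + 357 + 330 + 162}{80}$$
$$\bar{x} = \frac{1225}{80}$$
$$\bar{x} \approx 15.31$$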
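As a rough check of the grouped-data computation, the sketch below multiplies each class midpoint by its frequency and divides by the total frequency. The midpoints and frequencies are the ones used in the solution; the names `midpoints`, `frequencies`, and `grouped_mean` are illustrative.

```python
# Class midpoints (x) and number of salesmen in each class (f), as in the solution.
midpoints = [2, 7, 12, 17, 22, 27]
frequencies = [1, 14, 23, 21, 15, 6]


def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum of f*x divided by the total frequency n."""
    total = sum(f * x for f, x in zip(frequencies, midpoints))
    n = sum(frequencies)
    return total / n


mean_sales = grouped_mean(midpoints, frequencies)
print(round(mean_sales, 2))  # 15.31
```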
A weighted mean is a kind of average. Instead of each data point contributing equally to the
final mean, some data points contribute more “weight” than others. If all the weights are equal, then
the weighted mean equals the arithmetic mean. The formula is given by:
$$\bar{x} = \frac{\sum wx}{w_T}$$
The following example will show how the weighted mean is used in computing the grade
point average (GPA).
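Example 7
A university uses a 4-point grading system: A = 4, B = 3, C = 2, D = 1, and F = 0. Dillon’s 2nd
$$\bar{x} = \frac{\sum wx}{w_T} = \frac{(3 \times 4) + (4 \times 3) + (1 \times 3) + (2 \times 4)}{14} = \frac{35}{14}$$
$$\bar{x} = 2.5$$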
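The GPA computation can be sketched in Python as follows. The course table for this example is not reproduced in the module, so the split into grade points and weights is an assumption: the second factor of each product (4, 3, 3, 4) is taken as the weight because those values add up to the stated total of 14.

```python
# Assumption: first factor of each product is the grade point,
# second factor is its weight (the weights sum to 14).
grade_points = [3, 4, 1, 2]
weights = [4, 3, 3, 4]


def weighted_mean(values, weights):
    """Sum of weight * value divided by the total of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


gpa = weighted_mean(grade_points, weights)
print(gpa)  # 2.5
```

If every weight is the same, `weighted_mean` returns the ordinary arithmetic mean, which matches the remark that equal weights reduce the weighted mean to the arithmetic mean.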
The mean may not be an actual observed value in the data set. Depending on the
extreme values of the data set, the median or the mode may instead give an actual
value for the average.
It can be applied to data measured on at least an interval level.
It is easy to compute.
Every observation contributes to the value of the mean.
Subgroup means can be combined to come up with a group mean.
The mean is easily affected by extreme values.
The median may also be computed using the formula for grouped data.
Example 8
1. Find the median for the data in the following lists.
a. 4, 8, 1, 14, 9, 21, 12
b. 46, 23, 92, 89, 77, 108
2. The median of the ranked list 3, 4, 7, 11, 17, 29, 37 is 11. If the maximum value 37 is
increased to 55, what effect will this have on the median?
Solution:
1. The data must first be arranged in either increasing or decreasing order;
the middle value is then the median.
a. There are 7 observations. Arranged in increasing order, the list is
1, 4, 8, 9, 12, 14, 21, and the middle value is 9. Therefore, the median is 9.
b. There are 6 observations, so there are two middle values,
77 and 89. Their arithmetic mean is (77 + 89)/2 = 83. Therefore, the
median is 83.
2. The median will remain the same because 11 will still be the middle
number in the ranked list.
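These answers can be verified with Python's standard `statistics.median`, which sorts the data and averages the two middle values when the number of observations is even. The variable names are illustrative.

```python
import statistics

list_a = [4, 8, 1, 14, 9, 21, 12]
list_b = [46, 23, 92, 89, 77, 108]
ranked = [3, 4, 7, 11, 17, 29, 37]
ranked_changed = [3, 4, 7, 11, 17, 29, 55]  # maximum raised from 37 to 55

print(statistics.median(list_a))          # 9
print(statistics.median(list_b))          # 83.0
print(statistics.median(ranked))          # 11
print(statistics.median(ranked_changed))  # 11
```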
3. To compute the median of a set of grouped data, we can use the formula:
$$M_d = L + \left(\frac{\frac{n}{2} - F}{f}\right) i$$
Table 1 shows the number of sales and the number of salesmen.
Table 1
No. of Sales    No. of Salesmen (f)    F
0–4             1                      1
5–9             14                     15
10–14           23                     38
15–19           21                     59
20–24           15                     74
25–29           6                      80
i = 5, n = 80
From the table, n/2 = 40, so the median class is 15–19, the first class whose cumulative
frequency reaches 40. F is the cumulative frequency preceding the median class, whose value is
38; f is the frequency of the median class, which equals 21; and L is the true lower limit of
the median class, L = 15 − 0.5 = 14.5.
$$M_d = L + \left(\frac{\frac{n}{2} - F}{f}\right) i = 14.5 + \left(\frac{40 - 38}{21}\right)(5)$$
$$M_d \approx 14.5 + 0.48$$
$$M_d \approx 14.98$$
Therefore, the median of the number of sales is approximately 14.98.
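The grouped-median formula can be sketched in Python as below. The sketch locates the median class as the first class whose cumulative frequency reaches n/2 (here 15–19, because only 38 salesmen fall in the classes before it) and uses that class's true lower limit, 15 − 0.5 = 14.5. The tuple layout and the name `grouped_median` are illustrative choices.

```python
# Each class is (lower limit, upper limit, frequency), taken from Table 1.
classes = [(0, 4, 1), (5, 9, 14), (10, 14, 23),
           (15, 19, 21), (20, 24, 15), (25, 29, 6)]


def grouped_median(classes):
    """Median of grouped data: L + ((n/2 - F) / f) * i."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= half:
            true_lower = lower - 0.5   # L: true lower limit of the median class
            width = upper - lower + 1  # i: class width
            return true_lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("no classes given")


print(round(grouped_median(classes), 2))  # 14.98
```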
The Mode
The mode is the value that appears most often, that is, the value with the
greatest frequency. The mode may not exist, and even if it does exist it may not be unique. A
distribution having only one mode is called unimodal.
Example 9
Find the mode for the data in the following lists.
Data that has two modes is called bimodal. On the other hand, data that has three modes
is called trimodal.
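A small Python sketch of finding modes follows. The three lists are made-up illustrations, not the lists from the example, and the sketch assumes the convention that a non-empty data set in which every value occurs exactly once has no mode.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes; empty if every value occurs only once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no repeated value means no mode
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical data, for illustration only.
print(modes([3, 5, 5, 8, 9]))     # [5]      unimodal
print(modes([1, 2, 2, 4, 7, 7]))  # [2, 7]   bimodal
print(modes([4, 6, 9, 11]))       # []       no mode
```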
The mean, the median, and the mode are all averages; however, they are generally not equal.
The mean of a set of data is the most sensitive of the averages. A change in any of the numbers
changes the mean, and the mean can be changed drastically by changing an extreme value.
In contrast, the median and the mode of a set of data are usually not changed by changing an
extreme value. When a data set has one or more extreme values that are very different from the
majority of data values, the mean will not necessarily be a good indicator of an average value.
In the following example, we compare the mean, the median, and the mode for the salaries
of 5 employees of a small company.
Self-Assessment Activity 1
I. Identify whether each of the following statements involves descriptive or inferential statistics.
_________________ 1. Last semester, the ages of students at a certain college ranged from 16 to
25 years old.
_________________ 2. Based on the survey conducted by the National Statistics Office, it is
estimated that 24% of unemployed people are women.
_________________ 3. A survey says that 1 out of 10 Filipinos is a member of a fitness center.
_________________ 4. A recent study showed that eating garlic can lower blood pressure.
_________________ 5. After studying the effects of the gradual increase in dosage of a certain
drug on cancer patients, a scientist concludes that the drug can arrest
cancer cell growth at increased quantities in all cancer patients.
II. Classify the following variables as qualitative or quantitative. If the variable is quantitative,
identify if it is discrete or continuous. Then, identify further whether each variable is
nominal, ordinal, interval, or ratio.
__________________________________________________ 1. Room temperature
__________________________________________________ 2. Color of cellular phone casing
__________________________________________________ 3. Number of dining customers
__________________________________________________ 4. Serial number of car motors
__________________________________________________ 5. Flavor of ice cream
__________________________________________________ 6. Height of mercury level in a barometer
__________________________________________________ 7. Educational attainment of teachers
Activity
Compare the amounts of soda dispensed by the two machines in Table
2. Explain the variation in the data. What effect would this
have on either of the machines?
A measure of dispersion is the statistical name for the spread or variability of data. This
measure determines whether the set of observations tends to be quite similar (homogeneous) or
whether the observations vary considerably (heterogeneous). To measure the spread or dispersion
of data, we introduce statistical values known as the range and the standard deviation.
Range
The range of a set of data values is the difference between the greatest data
value and the least data value.
Example 10
Find the range of the numbers of ounces dispensed by Machine 1 in Table 2.
Solution:
The greatest number of ounces dispensed is 10.07 and the least is 5.85. The range of the
numbers of ounces dispensed is 10.07 − 5.85 = 4.22 oz.
Standard Deviation
The range of a set of data is easy to compute, but it can be deceiving. The range is a measure
that depends only on the two most extreme values, and as such it is very sensitive. A measure of
dispersion that is less sensitive to extreme values is the standard deviation. The standard deviation
of a set of numerical data makes use of the individual amount that each data value deviates from
the mean.
The standard deviation indicates how closely the values of a given data set are clustered
around the mean. A lower value of the standard deviation means that the values of that given data
set are spread over a smaller range around the mean. On the other hand, a larger value of the
standard deviation means that the values of the data set are spread over a larger range around the
mean.
Standard Deviation
The following examples illustrate how the standard deviation should be computed following
the procedure.
Example 11
The following numbers were obtained by sampling a population.
2, 4, 7, 12, 15.
Find the standard deviation of the sample.
Solution:
Step 1: Calculate the mean:
$$\bar{x} = \frac{2 + 4 + 7 + 12 + 15}{5} = 8$$
Step 2: Calculate the deviation between each number and the mean.
x     x − x̄
2     2 − 8 = −6
4     4 − 8 = −4
7     7 − 8 = −1
12    12 − 8 = 4
15    15 − 8 = 7
Step 3: Calculate the square of each deviation in Step 2, and find the sum of these squared
deviations.
x     x − x̄          (x − x̄)²
2     2 − 8 = −6     (−6)² = 36
4     4 − 8 = −4     (−4)² = 16
7     7 − 8 = −1     (−1)² = 1
12    12 − 8 = 4     (4)² = 16
15    15 − 8 = 7     (7)² = 49
Sum of the squared deviations: 118
Step 4: Because the data are a sample, divide the sum of the squared deviations by n − 1 = 4:
118/4 = 29.5.
Step 5: Take the square root: s = √29.5 ≈ 5.43.
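The whole procedure can be sketched in Python. The sketch assumes the sample form of the standard deviation, in which the sum of the squared deviations is divided by n − 1; this agrees with the value s = 5.43 that the module quotes for the same data in the variance example. The function name is illustrative.

```python
import math

sample = [2, 4, 7, 12, 15]


def sample_standard_deviation(values):
    """Sample standard deviation, dividing by n - 1 (assumed convention)."""
    n = len(values)
    mean = sum(values) / n                               # Step 1
    squared = [(x - mean) ** 2 for x in values]          # Steps 2 and 3
    return math.sqrt(sum(squared) / (n - 1))             # divide by n - 1, take the root


sum_of_squares = sum((x - 8) ** 2 for x in sample)
s = sample_standard_deviation(sample)
print(sum_of_squares)  # 118
print(round(s, 2))     # 5.43
```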
Example 12
A consumer group has tested a sample of 8 size-D batteries from each of 3 companies. The
results of the tests are shown in the following table. According to these tests, which company
produces batteries for which the values representing hours of constant use have the smallest
standard deviation?
From the result, the batteries from Dependable have the smallest standard deviation.
According to these results, the Dependable company produces the most consistent batteries
with regard to life expectancy under constant use.
The Variance
Variance
A statistic known as the variance is also used as a measure of dispersion. The
variance for a given set of data is the square of the standard deviation of the
data.
Example 13
The following numbers were obtained by sampling a population.
2, 4, 7, 12, 15.
Find the variance of the sample.
Solution:
The standard deviation of this sample was computed in Example 11 as s = 5.43, so the
variance is s² = (5.43)² ≈ 29.48.
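As a cross-check, the sketch below computes the sample variance directly with the standard `statistics.variance` function (which divides by n − 1) and compares it with squaring the rounded standard deviation. The small difference between the two results comes only from rounding s to 5.43 before squaring.

```python
import statistics

sample = [2, 4, 7, 12, 15]

exact_variance = statistics.variance(sample)  # sum of squared deviations / (n - 1)
from_rounded_s = round(5.43 ** 2, 2)          # squaring the rounded standard deviation

print(exact_variance)   # 29.5
print(from_rounded_s)   # 29.48
```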
Self-Assessment Activity 2
1. Find the range, the standard deviation, and the variance for the given samples. Round non-
integer results to the nearest tenth.
a. 1, 2, 5, 7, 8, 19, 22
b. 3, 4, 7, 11, 12, 12, 15, 16
c. 2.1, 3.0, 1.9, 1.5, 4.8
d. 5.2, 11.7, 19.1, 3.7, 8.2, 16.3
2. A mountain climber plans to buy some rope to use as a lifeline. Which of the following would
be the better choice? Explain why you think your choice is the better choice.
Rope A: Mean breaking strength: 500 lb; standard deviation of 100 lb
Rope B: Mean breaking strength: 500 lb; standard deviation of 10 lb
3. Evaluate the accuracy of the following statement: When the mean of a data set is large, the
standard deviation will be large. Explain.
There are times when we want to know the position of a value relative to the other
observations in a data set. For instance, suppose you took a 100-item test. You might want to
know how your score of 88 compares to the scores of the others.
Z - Score
Consider an Internet site that offers movie downloads. Based on data kept by the site, an
estimate of the mean time to download a certain movie is 12 min with a standard deviation of 4
min. When you download this movie, the download takes 20 min, and you think that is an unusually
long time for the download. On the other hand, when your friend downloads the movie, the
download takes only 6 min, and your friend is pleasantly surprised at how quickly she receives the
movie. The point here is that, in each case, a data value far from the mean is unexpected.
Measuring the distance of a data value from the mean in standard deviation units, instead of in
the units of the data (minutes in this example), is quite useful. The number of standard deviations a
data value is from the mean is known as its z-score or standard score.
Z-Score
The z-score for a given data value x is the number of standard deviations that
x is above or below the mean of the data. The following formulas show how to
calculate the z-score for a data value x in a population and in a sample.
Population: $$z = \frac{x - \mu}{\sigma}$$
Sample: $$z = \frac{x - \bar{x}}{s}$$
Example 14
Raul received a 72 on a first test, for which the mean
of all scores was 65 and the standard deviation was 8. He received a 60 on a second test, for
which the mean of all scores was 45 and the standard deviation was 12. In comparison to
the other students, did Raul do better on the first test or the second test?
Solution:
Find the z-score for each test:
First test: $$z = \frac{72 - 65}{8} = 0.875$$
Second test: $$z = \frac{60 - 45}{12} = 1.25$$
Raul scored 0.875 standard deviation above the mean on the first test and 1.25 standard
deviations above the mean on the second test. These z-scores indicate that, in comparison
to his classmates, Raul scored better on the second test than he did on the first test.
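The comparison can be reproduced with a short Python sketch. The function name `z_score` and its parameter names are illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


first_test = z_score(72, mean=65, std_dev=8)
second_test = z_score(60, mean=45, std_dev=12)

print(first_test)                # 0.875
print(second_test)               # 1.25
print(second_test > first_test)  # True: the second test is the better relative result
```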
Example 15
Which of the following exam grades has the better relative position?
Algebra: $$z = \frac{43 - 40}{3} = 1$$
Geometry: $$z = \frac{75 - 72}{5} = 0.6$$
Since the z-score for the Algebra test is larger, the relative position in the Algebra test is
higher than the relative position in the Geometry test.
Example 16
A consumer group tested a sample of 100 light bulbs. It found that the mean life expectancy
of the bulbs was 842 h, with a standard deviation of 90 h. One particular light bulb from the
DuraBright Company had a z-score of 1.2. What was the life span of this light bulb?
Solution:
Substitute the given values into the z-score equation. Here, z = 1.2, x̄ = 842, and s = 90.
Solve for x.
$$\frac{x - 842}{90} = 1.2$$
$$x - 842 = 108$$
$$x = 108 + 842$$
$$x = 950$$
The life span of the light bulb was 950 h.
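Solving the z-score equation for x gives x = x̄ + z·s, which the following sketch applies to the light bulb data. The function name `value_from_z` is illustrative.

```python
def value_from_z(z, mean, std_dev):
    """Invert z = (x - mean) / std_dev to recover the data value x."""
    return mean + z * std_dev


life_span = value_from_z(1.2, mean=842, std_dev=90)
print(round(life_span, 1))  # 950.0
```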
Percentile
Most standardized examinations provide scores in terms of percentiles, which are defined
as follows:
Percentile
A percentile, p, is a measure used to indicate the value below which a given
percentage of observations falls.
For example, the 90th percentile is the value below which 90% of the observations may
be found. It is also the value above which 10% of the observations may be found. The
following example illustrates how percentiles are used.
Example 17
The median annual salary for a physical therapist is ₱ 3,724,000. If the 90th percentile for
the annual salary of a physical therapist was ₱ 5,295,000, find the percent of physical
therapists whose annual salary is
a. more than ₱ 3,724,000.
b. less than ₱ 5,295,000.
c. between ₱ 3,724,000 and ₱ 5,295,000.
Solution:
a. By definition, the median is the 50th percentile. Therefore, 50% of the physical
therapists earned more than ₱ 3,724,000 per year.
b. Because ₱ 5,295,000 is the 90th percentile, 90% of all physical therapists made less
than ₱ 5,295,000.
c. From parts a and b, 90% − 50% = 40% of the physical therapists earned between
₱ 3,724,000 and ₱ 5,295,000.
Example 18
Find the percentile rank of a test score of 49 in the data set: 12, 28, 35, 42, 47, 49, 50.
Solution:
Arrange the data in order from lowest to highest. Notice that there are 5 scores
that are less than 49, out of 7 scores in all. Then substitute in the formula:
$$\text{Percentile rank} = \frac{\text{number of scores less than } 49}{\text{total number of scores}} \times 100 = \frac{5}{7} \times 100 \approx 71.4$$
The score of 49 is at approximately the 71st percentile. This means that about 71% of the
scores in the distribution are less than 49, and about 29% are greater than or equal to 49.
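The percentile rank used here, the number of scores below the given score divided by the total number of scores, times 100, can be sketched as follows. The function name `percentile_rank` is illustrative.

```python
def percentile_rank(value, data):
    """Percent of the data values that are strictly less than the given value."""
    below = sum(1 for x in data if x < value)
    return below / len(data) * 100


scores = [12, 28, 35, 42, 47, 49, 50]
rank = percentile_rank(49, scores)
print(round(rank, 1))  # 71.4
```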
Quartiles
2, 5, 5, 8, 11, 12, 19, 22, 23, 29, 31, 45, 83, 91, 104, 159, 181, 312, 354
Q1 = 11, Q2 = 29, Q3 = 104
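A sketch of computing the three quartiles for the list above is given below. It assumes the median-of-halves convention: Q2 is the median of the whole data set, Q1 is the median of the values below Q2's position, and Q3 is the median of the values above it. Other conventions exist and can give slightly different quartiles.

```python
import statistics


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, the middle value itself is left out of both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))


data = [2, 5, 5, 8, 11, 12, 19, 22, 23, 29, 31, 45, 83, 91,
        104, 159, 181, 312, 354]
print(quartiles(data))  # (11, 29, 104)
```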
Box-and-Whisker Plot
A box-and-whisker plot (sometimes called a box plot) is often used to provide a visual
summary of a set of data. A box-and-whisker plot shows the median, the first and third quartiles,
and the minimum and maximum values of a data set. See the figure below.
1. Draw a horizontal scale that extends from the minimum data value to
the maximum data value.
2. Above the scale, draw a rectangle (box) with its left side at Q1 and its
right side at Q3.
3. Draw a vertical line segment across the rectangle at the median, Q2.
4. Draw a horizontal line segment, called a whisker, that extends from
Q1 to the minimum and another whisker that extends from Q3 to the
maximum.
Example 19
Construct a box-and-whisker plot for the data set in Example 18.
Solution:
For the data set in Example 6, we determined that Q1 = 39, Q2 = 43, and Q3 = 51.5. The
minimum data value for the data set is 26, and the maximum data value is 73. Thus the
box-and-whisker plot is as shown below.
Stem-and-Leaf Diagrams
The relative position of each data value in a small set of data can be graphically displayed by
using a stem-and-leaf diagram.
1. Determine the stems and list them in a column from smallest to largest
or largest to smallest.
2. List the remaining digit of each data value as a leaf to the right of its stem.
3. Include a legend that explains the meaning of the stems and the leaves.
Include a title for the diagram.
Example 20
Consider the following history test scores:
65, 72, 96, 86, 43, 61, 75, 86, 49, 68, 98, 74, 84, 78, 85, 75, 86, 73
Solution:
In the stem-and-leaf diagram at the right, we have
organized the history test scores by placing all of the scores
that are in the 40s in the top row, the scores that are in the
50s in the second row, the scores that are in the 60s in the
third row, and so on. The tens digits of the scores have been
placed to the left of the vertical line. In this diagram, they
are referred to as stems. The ones digits of the test scores
have been placed in the proper row to the right of the
vertical line. In this diagram, they are the leaves. It is now
easy to make observations about the distribution of the
scores. Only two of the scores are in the 90s. Six of the
scores are in the 70s, and none of the scores are in the 50s.
The lowest score is 43, and the highest is 98.
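The diagram described above can be rebuilt with a short Python sketch that uses the tens digit of each score as the stem and the ones digit as the leaf. The function name and the text layout are illustrative choices.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Return one text row per stem (tens digit), with sorted ones digits as leaves."""
    leaves = defaultdict(list)
    for score in sorted(scores):
        leaves[score // 10].append(score % 10)
    rows = []
    # Include stems with no leaves (such as the 50s) so gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows


history_scores = [65, 72, 96, 86, 43, 61, 75, 86, 49, 68,
                  98, 74, 84, 78, 85, 75, 86, 73]

print("History Test Scores")
for row in stem_and_leaf(history_scores):
    print(row)
print("Legend: 4 | 3 represents 43")
```

The printed rows are `4 | 3 9`, `5 |`, `6 | 1 5 8`, `7 | 2 3 4 5 5 8`, `8 | 4 5 6 6 6`, and `9 | 6 8`, which agrees with the counts stated in the solution.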
The choice of how many leading digits to use as the stem will depend on the particular data
set. For instance, consider the following data set, in which a travel agent has recorded the amount
spent by customers for a cruise.
Self-Assessment Activity 3
1. A data set has a mean of x̄ = 75 and a standard deviation of 11.5. Find the z-score for each
of the following.
a. x = 85
b. x = 95
c. x = 50
d. x = 75
2. A blood pressure test was given to 450 women ages 20 to 36. It showed that their mean
systolic blood pressure was 119.4 mm Hg, with a standard deviation of 13.2 mm Hg.
a. Determine the z-score, to the nearest hundredth, for a woman who had a systolic blood
pressure reading of 110.5 mm Hg.
b. The z-score for one woman was 2.15. What was her systolic blood pressure reading?
3. The following data give the scores of 18 students in a Statistics class:
84 92 87 77 98 58 63 97 82
94 84 90 68 64 92 78 75 84
Summary
This module served as a review of SHS Math 2 (Statistics and Probability). We have
discussed the differences between descriptive and inferential statistics. We have learned to
classify variables as qualitative or quantitative (discrete or continuous), and we have covered
the levels of measurement; measures of central tendency (mean, median, and mode); measures of
dispersion (range, standard deviation, and variance); and measures of relative position
(z-score, percentile, and quartile).
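Population Mean: $$\mu = \frac{\sum x}{n}$$
Sample Mean: $$\bar{x} = \frac{\sum x}{n}$$
Standard Deviation
Population: $$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$$
Sample: $$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$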