Module 5
DATA MANAGEMENT
CORE IDEA
Statistical tools derived from mathematics are useful in processing and managing numerical data to
describe a phenomenon and predict values.
Learning Outcome:
5. Use a variety of statistical tools to process and manage numerical data.
6. Use the methods of linear regression and correlations to predict the value of a variable given
certain conditions.
7. Advocate the use of statistical data in making important decisions.
Introduction
It is written in the Holy Book that “the truth shall set us free;” therefore, understanding statistics paves
the way towards intellectual freedom. For without sufficient knowledge about it, we may be doomed
to a life of half-truth. Statistics will provide deeper insights to critically evaluate information and to
bring us to the well-lit arena of practicality.
Descriptive Statistics. If statistics, in general, deals with the analysis of data, then descriptive
statistics is the part of the general field that is about “describing” data in symbolic forms and
abbreviated fashions. Sometimes we deal with such a large amount of data that it is impossible to
describe it as it is.
To explore the characteristics of descriptive statistics, let us create a fictitious situation. What does it
mean if someone tells you that the majority of workers earn approximately P20,000.00 a month? Were
you able to dissect the idea behind the plain statement? Does it trigger your mind to question further?
This statement is a piece of information that describes a particular trait or characteristic of a group of
workers. Starting from this single piece of information, and armed with statistical inquisitiveness, we
can use descriptive statistics to describe the given information further, to the extent of its depth and
breadth.
Inferential Statistics. We could probably argue that descriptive statistics, with its characteristic to
describe, is sufficient to depict any given information. While it is effective to describe a manageable
size of data, it can hardly engulf a sizeable amount of data. Thus, for this kind of situation, inferential
statistics is the alternative technique that can be used. Inferential statistics has the ability to “infer”
and to generalize and it offers the right tool to predict values that are not really known.
Let us consider the fictitious situation we made under descriptive statistics, but this time, instead of
reporting the approximate monthly earnings of some workers, we want to determine the estimated
monthly earnings of all the workers in a certain region. If we attempted to apply descriptive statistics,
we would have to ask all the workers in the entire region about their monthly income, which would be
impossible. By using inferential statistics, we would instead select just a small number of workers, ask
them about their monthly income, and use their answers to predict or approximate, in a more-or-less
fashion, the monthly income of all workers in the entire region.
Of course, inference or generalization is a risky process. That is why we need to ensure that the small
group of workers we selected is approximately representative of the workers in the entire region.
Nevertheless, this inference or prediction is better than chance accuracy.
Measurement
Measurement essentially means quantifying an observation according to a certain rule. For instance, the
presence of fever can be quantified by using a thermometer. Body weight can be determined by using a
weighing scale. Mental ability can be quantified by using a written examination that generates
scores. Sometimes the quantification can be done by simply counting. In quantifying an observation,
there are two types of quantitative information: variable and constant. A variable is something that
can be measured and observed to vary, while a constant is something that does not vary and
maintains a single value.
Scales of Measurement
To quantify an observation, it is necessary to identify its scale of measurement, also known as its level
of measurement. The scale of measurement is the gateway to the fascinating world of statistics. Without
sufficient knowledge of it, all our statistical learning leads nowhere.
Nominal Scale. It is concerned with categorical data. It simply means using numbers to label categories.
This is done by counting the frequency of occurrence within categories. One condition is that the
categories must be independent or mutually exclusive. This implies that once something is identified
under a certain category, it cannot be assigned at the same time to another category.
For example, suppose we want to measure a group of people according to marital status. We can
categorize marital status by simply assigning a number, for instance “1” for single and “2” for married.
Obviously, those numbers only serve as labels and do not carry any numerical weight. Thus,
we cannot say that married people (having been labelled 2) have more marital status than single
people (having been labelled 1).
Ordinal Scale. It is concerned with ranked data. There are instances wherein comparison is necessary
and cannot be avoided. The ordinal scale provides a ranking of the observations in order to generate
information to the extent of “greater than” or “less than.” But the ranked data generated is also limited
to the extent of “greater than” or “less than.” It is not capable of telling how much greater or how
much less.
The ordinal scale is best illustrated in sports activities like a fun run. Finding the order of finish among
the participants in a fun run always produces a ranking. However, ranked data cannot provide
information as to the difference in time between the 1st placer and the 2nd placer. Relative to this,
reading reports with ordinal information is also tricky. For example, a TV commercial may extol a
certain brand for being the number one product in the country. This may seem acceptable, but if you
learn that there is no other product of its kind, then the message of the commercial will surely be
received with a smirk.
Interval Scale. It deals with measurement data. In the nominal scale, we use numbers to label
categories while in the ordinal scale we use numbers to merely provide information regarding greater
than or less than. However, in interval scale we assign numbers in such a way that there is meaning
and weight on the value of points between intervals. This scale of measurement provides more
information about the data. Consider the comparative illustration below:
                 Student A   Student B   Student C   Student D   Student E
Interval Data    99          74          73          70          70
Ordinal Data     1st         2nd         3rd         4th         5th
Nominal Data     Passed      Failed      Failed      Failed      Failed
As you may have noticed, the interval scale provides substantial information about the grades of the
students. Student A earned a grade of 99, and so on and so forth. Now look at the information given
by the ordinal data. It is simply about ranking. With this kind of information, Student B can proudly and
rightfully claim 2nd place in the ranking. The ordinal scale is a trusted friend that keeps a secret: the
grade of Student B, though claiming 2nd place, is actually 74. Let us analyze the nominal data in our
example. With this scale, the school can only announce, sadly, that one student passed and four
students failed. Nominal data cannot provide more information; specifically, it cannot put Student A
in a brighter limelight. The audience may assume that Student A just got a grade a little bit
higher than the passing mark, and Student A's grade of 99 will remain hidden forever.
Ratio Scale. This is an extension of the interval scale. It also pertains to measurement data, but the
ratio scale is built on an absolute value, that is, a true zero point. Because of this, we oftentimes
cannot use the ratio scale in the social sciences. We cannot justify an absolute value to gauge
intelligence. We cannot say that our Student A with a grade of 99 has an intelligence several points
superior to that of Student E, who barely but successfully achieved a grade of 70.
Population. A population is any group of people, things, or events having at least one trait in common
(Sprinthall, 1994). A common trait is the binding factor in
order to group a cluster and call it a population. Merely having a clustering of people, things or events
cannot be considered as a population. At least one common trait must be established to make a
population. But, on the other hand, adding too many common traits can also limit the size of the
population. In the illustration below, notice how a trait can severely reduce the size or membership in
the population.
A group of students (this is a population, since the common trait is “students”)
A group of male students
A group of male students attending the Statistics class with an iPhone and earphones
As we read the list, we can mentally visualize that the size of the population is dramatically
becoming smaller and as we add more traits we may wonder if anyone still qualifies. The more
common traits we add, the more we reduce the designated population.
Parameter. In gauging the entire population, any measure obtained is called a parameter.
Situationally, if someone asks you what the parameter of the study is, bear in mind that he
is referring to a measure of the entire population, not of a part of it. In situations where the actual
measure of the population is difficult to obtain, the parameters take the form of an estimate or inference.
Sample. The smaller number of observations taken from the total number making up a population is
called a sample. As long as the observations or data are not the totality of the entire population, they
are always considered a sample. For instance, in a population of 100, 1 is considered a sample. 30
is clearly a sample. It may seem absurd, but 99 taken from 100 is still considered a sample. Not until
we include that last number (making it 100) could we claim that it is already a population and no longer
a sample.
Statistic. In gauging the sample, any measure obtained from the sample is called a statistic.
Whenever we describe the sample, the measures we use are called statistics. Since a sample is easier to
observe or gather than the population, statistics are simpler to gather than parameters.
Graphs. A graph is another way to visually show the behavior of data. To create a graph, the distribution
of scores must first be organized. For instance, presenting the scores provided below in an
unorganized manner can give confusing information or no information at all; reporting raw scores can
even keep some significant scores from being noticed.
But when we arrange the scores from highest to lowest, which is a form of score distribution, some
pieces of information are gradually brought forth and exposed.
Distribution of Scores
120
110
105
105
100
100
95
90
90
90
85
85
80
75
65
The score distribution can be organized further in the form of a frequency distribution. A frequency
distribution lists each raw score together with its frequency of occurrence, and so provides clearer
insights about the behavior of the scores.
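As a rough illustration, the frequency distribution of the fifteen scores listed above can be tallied with a few lines of Python. This is only a sketch: the function name `frequency_distribution` is an illustrative choice, not part of the module.

```python
from collections import Counter

# The fifteen scores from the "Distribution of Scores" list above.
scores = [120, 110, 105, 105, 100, 100, 95, 90, 90, 90, 85, 85, 80, 75, 65]


def frequency_distribution(values):
    """Return (score, frequency) pairs, highest score first."""
    counts = Counter(values)
    return sorted(counts.items(), reverse=True)


table = frequency_distribution(scores)
print("Score  f")
for score, freq in table:
    print(f"{score:>5}  {freq}")
```

Each row pairs a raw score with the number of times it occurs; the frequencies add up to the number of scores.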
Another way of presenting data in a frequency distribution is to present them in tabular
form. A tabular form has the advantage of giving a visual representation of the data. This kind of
presentation is more appealing to the general audience.
Another way of showing the data in graphical form is by using Microsoft Excel, as illustrated in
the graphs below, which show the frequency polygon of the scores in our example above.
Notice that the two graphs in the illustration of the frequency polygon may appear different, but they
are actually the same and disclose the same information. This illustration should make you realize
that unless you see things with a critical eye, a graph can create a false impression of what the data
really reveal. It shows how graphs can be used to distort reality if you are
not equipped with a critical statistical mind. This type of deceitful cleverness in distorting graphs is
common in some corporations, which use it to portray gigantic leaps in sales in order to attract more
clients or buyers.
Learning Activity 1
1. The Globe and Smart phone number prefixes 0917 and 0923 served 1 million
and 2.5 million subscribers, respectively.
2. The Philippine Statistics Office announces that the average height of
Filipino male is 156.41 cm tall.
3. Postal Office shows that 4,231 individuals have a zip code of 4231.
4. The Sportsfest committee posted the names of individuals with their order
of finish for the first 50 runners to reach the finish line.
5. The University Admission Office posted the names and scores of student
applicants who took the entrance examination.
Discussion
As we venture into the realm of descriptive statistics, let us now focus on describing the nature of
quantitative data. By using an appropriate descriptive technique, we can organize and neatly
summarize both small and large data distributions. The procedure, utilizing measures
of central tendency, allows us to precisely describe the centrality of a data distribution.
Measures of central tendency are methods that can be used to determine information regarding the
average, ranking, and category of any data distribution. The mean, median and mode are the three tools
for obtaining the measures of central tendency. But only by knowing and using the appropriate tool can
the most accurate estimate of centrality be achieved. The objective of the measures of central
tendency is to describe the centrality of the distribution in a single numerical unit. This single
numerical unit must provide a clear description of the common trait being observed in the
distribution of scores.
In this example, the mean is an appropriate measure of central tendency because the distribution is
fairly well-balanced. This means that there are no extremely high or extremely low scores in either
direction that can unusually influence the average of the scores. Thus, the mean value of 190,083.00
represents the total picture of the distribution (i.e. annual incomes). This means that, in a “more or
less” or approximate fashion, it describes the entire distribution.
Mean of Skewed Distribution. There are situations wherein the mean cannot be trusted to provide a
measure of central tendency because it portrays an extremely distorted picture of the average value of
a distribution of scores. For instance, let us still consider our example of annual incomes, but this time
with some adjustment. Let us introduce another score: the annual income of an affluent new neighbor
who moved to this town just recently. This new neighbor has a high annual income that is
extremely far above the others.
As you may have noticed, the mean income of Php 367,769.00 this time provides a highly misleading
picture of great prosperity for this neighborhood. The distribution was unbalanced by the extreme
score of the new affluent neighbor. This is what we call a skewed distribution.
When the tail goes to the right, the curve is positively skewed; when it goes to the left, it is negatively
skewed. The skew is in the direction of the tail-off of scores, not of the majority of scores. The mean
is always pulled toward the extreme score in a skewed distribution. When the extreme score is at the
low end, then the mean is too low to reflect centrality. When the extreme score is at the high end, the
mean is too high.
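The pull of a single extreme score on the mean can be checked with a small sketch. The income figures below are hypothetical values chosen only for illustration; they are not the module's data set, and the variable names are assumptions.

```python
from statistics import mean

# Hypothetical annual incomes (illustrative values only, not the module's data).
incomes = [180_000, 185_000, 190_000, 193_000, 200_000]

# The same neighborhood after one extremely high income is added.
with_outlier = incomes + [2_500_000]

balanced_mean = mean(incomes)
skewed_mean = mean(with_outlier)

print(f"Mean without the extreme score: {balanced_mean:,.2f}")
print(f"Mean with the extreme score:    {skewed_mean:,.2f}")
```

In this sketch the second mean is higher than every one of the original five incomes, which is the distortion described above: the mean is pulled toward the extreme score at the high end.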
The Median
The median is the point that separates the upper half from the lower half of the distribution. It is the
middle point or midpoint of any distribution. If the
distribution is made up of an even number of scores, the median can be found by determining the
point that lies halfway between the two middlemost scores.
193,000.00
190,000.00
185,000.00 MEDIAN
180,000.00
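The rule for odd and even numbers of scores can be sketched as a short function. The function name `middle_point` and the sample incomes are hypothetical, chosen only to show both cases.

```python
def middle_point(scores):
    """Median: the middle score, or halfway between the two middlemost scores."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of scores: the single middle score is the median.
        return ordered[mid]
    # Even number of scores: take the point halfway between the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical incomes, for illustration only.
odd_case = middle_point([180_000, 185_000, 190_000])
even_case = middle_point([180_000, 185_000, 190_000, 193_000])
print(odd_case, even_case)
```

With three scores the median is the second one; with four scores it is the point halfway between the second and third.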
The Mode
Another measure of central tendency is called the mode. It is the most frequently occurring score in a
distribution. In a histogram, the mode is always located beneath the tallest bar.
The mode provides an extremely fast way of knowing the centrality of the distribution. You can
immediately spot the mode by simply looking at the data and finding the dominant constant, that is, the
most frequently occurring score.
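As a quick check, the mode of the fifteen scores in the earlier "Distribution of Scores" list can be found with Python's standard library. This is a sketch; `multimode` is used so that ties, if any, would all be reported.

```python
from statistics import multimode

# The fifteen scores from the "Distribution of Scores" list.
scores = [120, 110, 105, 105, 100, 100, 95, 90, 90, 90, 85, 85, 80, 75, 65]

# multimode returns every score that shares the highest frequency.
modes = multimode(scores)
print(modes)
```

Here a single score, 90, occurs three times, more often than any other score, so it is the mode.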
The best way to illustrate the comparative applicability of the mean, median and mode is to look
again at the skewed distribution.
Income distributions are almost always skewed to the right because the low end has a fixed limit of zero
while the high end has no limit. If we consider that the area under the curve is 100 percent, then the
median is the exact midpoint of the distribution. The areas below and above the median are both equal
to 50 percent. Thus, if the median income is P20,000.00, this means that 50% of the households have an
income below P20,000.00 and 50% of the households have an income above P20,000.00. On the other hand,
the mean in our figure above indicates a high income of P100,000. A mean this far above the median
shows that the curve is positively skewed, so the value of the mean gives a distorted picture.
The scale of measurement on which the data are based oftentimes dictates the measure of central
tendency to be used. Interval data can entertain the calculation of all three measures of central
tendency. Nominal and ordinal data cannot be used to calculate the mean; a mean computed from ordinal
data can give an extremely confusing and wrong result. Since the median is about ranking (half of the
scores fall above it and half fall below it), an ordinal arrangement is necessary in finding the median.
For nominal data, however, neither the mean nor the median can be used. Nominal data are restricted
to using a number as a label for a category, and the only measure of central tendency
permissible for nominal data is the mode.
In summary, if the interval data distribution is fairly well balanced, it is appropriate to use the mean to
measure the central tendency. If the distribution of the interval data is skewed, you may either remove
the outlier or adopt the median. If the interval data distribution manifests a significant clustering of
scores, then consider visually analyzing the scores to find the dominant constant, which
is the mode.
Learning Activity 2
1. A class of 13 students takes a 20-item quiz on Science 101. Their scores were as follows: 11,
11, 13, 14, 15, 18, 19, 9, 6, 4, 1, 2, 2.
2. A day after, the class of 13 students mentioned in problem 1 takes the same test a second time. This
time their scores were: 10, 10, 10, 10, 11, 13, 19, 9, 9, 8, 1, 7, 8.
d. Was there a difference in their performance when taking the test a second time?
3. For the set of scores: 1000, 50, 120, 170, 120, 90, 30, 120.
MEASURES OF POSITION
Learning Objectives:
10. perform the specific activities or tasks and complete the exercises and
assessments provided.
QUARTILE
Quartiles are values that divide a set of data into four equal parts. Each part is equal to
a quarter of the data. Quartiles are calculated only after the data have been sorted. Values are said to
be sorted if they are arranged in ascending order.
There are three quartiles, denoted by Q1, Q2, and Q3:
Q1 (First Quartile). This value is the median of the first half of the data. It separates the bottom 1/4
of the sorted values from the upper 3/4 of the values. If there are N sorted values, about 25% of N lie
at or below Q1 and about 75% of N lie at or above Q1.
Q2 (Second Quartile). This value is the median of the entire data. It separates the bottom 1/2 of the
sorted values from the upper 1/2 of the values. If there are N sorted values, about 50% of N lie at or
below Q2 and about 50% of N lie at or above Q2.
Q3 (Third Quartile). This value is the median of the second half of the data. It separates the bottom
3/4 of the sorted values from the upper 1/4 of the values. If there are N sorted values, about 75% of N
lie at or below Q3 and about 25% of N lie at or above Q3.
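Example 1
Find the three quartiles of the following sorted data:
4, 5, 6, 6, 7, 7, 7, 8, 9, 9, 9, 9, 10, 10, 15
Solution
The number of observations is $n = 15$. The position of each quartile in the sorted data is found as follows.
➢ For $Q_1$
$$\text{position of } Q_1 = \frac{n+1}{4} = \frac{15+1}{4} = \frac{16}{4} = 4$$
$Q_1$ = 4th observation
$Q_1 = 6$
➢ For $Q_2$
$$\text{position of } Q_2 = \frac{2(n+1)}{4} = \frac{2(15+1)}{4} = \frac{32}{4} = 8$$
$Q_2$ = 8th observation
$Q_2 = 8$
➢ For $Q_3$
$$\text{position of } Q_3 = \frac{3(n+1)}{4} = \frac{3(15+1)}{4} = \frac{48}{4} = 12$$
$Q_3$ = 12th observation
$Q_3 = 9$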
Example 2
The following are the test scores of 20 students for the first quarterly exam in mathematics.
Find the three quartiles.
72 84 70 80 62 78 44 74 72 82
72 56 74 58 75 64 78 64 79 82
Solution
Step 1: Determine the number of observations
➢ the number of observations is $n = 20$
Step 2: Arrange the data
➢ 44, 56, 58, 62, 64, 64, 70, 72, 72, 72, 74, 74, 75, 78, 78, 79, 80, 82, 82, 84
Step 3: Find the position of each quartile. When the position is not a whole number, it is rounded up
to the next whole position.
➢ For $Q_1$
$$\text{position of } Q_1 = \frac{n+1}{4} = \frac{20+1}{4} = \frac{21}{4} = 5.25 \approx 6$$
$Q_1$ = 6th observation
$Q_1 = 64$
➢ For $Q_2$
$$\text{position of } Q_2 = \frac{2(n+1)}{4} = \frac{2(20+1)}{4} = \frac{42}{4} = 10.5 \approx 11$$
$Q_2$ = 11th observation
$Q_2 = 74$
➢ For $Q_3$
$$\text{position of } Q_3 = \frac{3(n+1)}{4} = \frac{3(20+1)}{4} = \frac{63}{4} = 15.75 \approx 16$$
$Q_3$ = 16th observation
$Q_3 = 79$
Thus, in the result of the first quarterly exam in Math, 𝑄1 = 64, 𝑄2 = 74 and 𝑄3 = 79.
This means that:
1. 64 represents the first quartile. 25% of the scores are below 64.
2. 74 represents the second quartile. 50% of the scores are below 74.
3. 79 represents the third quartile. 75% of the scores are below 79.
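The position rule used in Examples 1 and 2 can be sketched in Python. The sketch assumes the rounding convention seen in the worked values (a fractional position such as 5.25 is rounded up to the next whole position); other textbooks interpolate instead, so treat this as one convention rather than the only one. The function name `quartile` is an illustrative choice.

```python
import math


def quartile(sorted_values, k):
    """k-th quartile by the k(n + 1)/4 position rule.

    Assumption: fractional positions are rounded up to the next whole
    position, as in the worked examples above.
    """
    n = len(sorted_values)
    position = math.ceil(k * (n + 1) / 4)  # 1-based position in the sorted data
    return sorted_values[position - 1]


example1 = [4, 5, 6, 6, 7, 7, 7, 8, 9, 9, 9, 9, 10, 10, 15]
example2 = sorted([72, 84, 70, 80, 62, 78, 44, 74, 72, 82,
                   72, 56, 74, 58, 75, 64, 78, 64, 79, 82])

quartiles1 = [quartile(example1, k) for k in (1, 2, 3)]
quartiles2 = [quartile(example2, k) for k in (1, 2, 3)]
print(quartiles1)
print(quartiles2)
```

Under this convention the sketch gives 6, 8 and 9 for the first data set and 64, 74 and 79 for the test scores.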
QUARTILES OF GROUPED DATA
Let us find out from the following example how we can compute 𝑄1 , 𝑄2 and 𝑄3 for grouped data.
Example 3
A survey was conducted among 500 families to find out the length of time they spent watching
TV per day. The following data were obtained:

No. of Hours    Families
0 – 1           55
2 – 3           87
4 – 5           145
…
10 – 11         35
12 – 13         15

To solve for the median or 𝑄2, the following steps are recommended:
1. Start by constructing a cumulative frequency (cf) column. This is done by successively adding the
frequencies starting from the lowest class interval.
2. Identify the class interval to which $\frac{N}{2}$ belongs. This is the median class.
3. Solve for the median by substituting the corresponding values of the variables in the formula
$$\text{median} = X_{LB} + \left(\frac{\frac{N}{2} - cf}{f_m}\right) i$$

Interval    $x_i$    $f_i$    Cumulative Frequency
12 – 13     12.5     15       500
10 – 11     10.5     35       485
…
4 – 5       4.5      145      287
2 – 3       2.5      87       142
0 – 1       0.5      55       55
Total                500

To get the cumulative frequency, add the frequency in this row to the cumulative frequency of the
previous row, e.g. 87 + 55 = 142.
Next, identify the class interval to which $\frac{N}{2}$ belongs.
$$\frac{N}{2} = \frac{500}{2} = 250$$
The class interval to which 250 belongs is 4 – 5 because its cumulative frequency is the first to reach
one half of the total frequency, which is 250. The lower class boundary of the median class is 3.5, so
$\frac{N}{2} = 250$
$X_{LB} = 3.5$
$cf = 142$ (cumulative frequency before the median class)
$f_m = 145$
$i = 2$
The lower class boundary is formed by subtracting 0.5 units from the lower class limit. The upper class
boundary is formed by adding 0.5 units to the upper class limit.
The median is
$$\text{median} = 3.5 + \left(\frac{250 - 142}{145}\right) 2 = 4.99$$
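The substitution for the grouped median can be verified with a short sketch. The function and parameter names are illustrative assumptions; the five input values (lower boundary 3.5, total frequency 500, cumulative frequency 142 before the median class, class frequency 145, class size 2) are the ones given in the example.

```python
def grouped_median(lower_boundary, total, cf_before, class_freq, width):
    """Median of grouped data: LB + ((N/2 - cf) / f_m) * i."""
    return lower_boundary + ((total / 2 - cf_before) / class_freq) * width


# Values from the TV-watching example: median class 4 - 5.
median_hours = grouped_median(
    lower_boundary=3.5,  # lower class boundary of 4 - 5
    total=500,           # N, the total frequency
    cf_before=142,       # cumulative frequency before the median class
    class_freq=145,      # frequency of the median class
    width=2,             # size of the class interval
)
print(round(median_hours, 2))
```

Rounded to two decimal places, the result is 4.99 hours.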
The following are the formulas for computing Q1 and Q3 for grouped data:
$$Q_1 = X_{LB} + \left(\frac{\frac{N}{4} - cf}{f_{Q_1}}\right) i$$
where
$X_{LB}$ = lower boundary of the $Q_1$ class
$N$ = total frequency
$cf$ = cumulative frequency before the $Q_1$ class
$f_{Q_1}$ = frequency of the $Q_1$ class
$i$ = size of the class interval
$$Q_3 = X_{LB} + \left(\frac{\frac{3N}{4} - cf}{f_{Q_3}}\right) i$$
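Compute the three quartiles of the heights of 50 Filipino children, 7 to 12 years of age.

Class Interval
Height (cm)      f     cf
134 – 139        10    50
128 – 133        9     40
122 – 127        8     31
116 – 121        1     23
110 – 115        5     22
104 – 109        2     17
98 – 103         9     15
92 – 97          5     6
86 – 91          1     1
N = 50

➢ For $Q_1$: $\frac{N}{4} = \frac{50}{4} = 12.5$, so the $Q_1$ class is 98 – 103, with $X_{LB} = 97.5$, $cf = 6$,
$f_{Q_1} = 9$ and $i = 6$.
$$Q_1 = 97.5 + \left(\frac{12.5 - 6}{9}\right) 6 = 101.83$$
➢ For $Q_2$: $\frac{2N}{4} = 25$, so the $Q_2$ class is 122 – 127, with $X_{LB} = 121.5$, $cf = 23$, $f_{Q_2} = 8$
and $i = 6$.
$$Q_2 = 121.5 + \left(\frac{25 - 23}{8}\right) 6 = 123$$
➢ For $Q_3$: $\frac{3N}{4} = 37.5$, so the $Q_3$ class is 128 – 133, with $X_{LB} = 127.5$, $cf = 31$,
$f_{Q_3} = 9$ and $i = 6$.
$$Q_3 = 127.5 + \left(\frac{37.5 - 31}{9}\right) 6 = 131.83$$

This means that 25% of the data is below 𝑄1 = 101.83, 50% of the data is below 𝑄2 = 123 and 75% of
the data is below 𝑄3 = 131.83.
In terms of the 50 children, about 12 to 13 have a height below 101.83 cm; 25 have a height of 123 cm
and below; and about 37 to 38 have a height of 131.83 cm and below.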
DECILE
Just like the quartiles, the deciles are values which divide a collection of data into equal
parts to analyze relationships. In statistics, even the most meager relationships are very significant in
tracing the existence of a pattern, or in finding solutions to problems.
Deciles are the score-points that divide a distribution into 10 equal parts. The deciles are computed in
the same way the quartiles are computed.
For ungrouped data, the position of the kth decile is
$$D_k = \frac{k(N + 1)}{10}$$
and for grouped data,
$$D_k = X_{LB} + \left(\frac{\frac{kN}{10} - cf_b}{f_{D_k}}\right) i$$
where $1 \le k \le 9$.
Example 1. A garment factory owner keeps track of the quality of all produced merchandise before
it is sold in the market. She patiently checks the garments and records all those with defects.
For the past 24 months, the owner noted the following numbers of defective items produced per
month:
45, 30, 36, 16, 21, 33, 40, 32, 14, 10, 29, 23, 39, 17, 11, 18, 34, 19, 24, 21, 65, 42, 37
Solution:
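➢ Number of observations = 24
➢ Sorted data: 10, 11, 14, 16, 17, 18, 19, 21, 21, 23, 24, 26, 29, 30, 32, 33, 34, 35, 36, 37, 39, 40, 42, 45
$$D_k = \frac{k(N + 1)}{10}$$
$$D_5 = \frac{5(24 + 1)}{10} = 12.5$$
$D_5$ = 12.5th observation, which lies halfway between the 12th observation (26) and the 13th
observation (29).
$$D_5 = 26 + \frac{29 - 26}{2} = 26 + 1.5 = 27.5$$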
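The decile position rule with interpolation can be sketched as follows, using the 24 sorted values from the solution. The function name `decile` is an illustrative choice, and the general interpolation step is an assumption extrapolated from the halfway case shown for D5.

```python
def decile(sorted_values, k):
    """k-th decile by the k(N + 1)/10 position rule with linear interpolation."""
    n = len(sorted_values)
    position = k * (n + 1) / 10  # 1-based position, may be fractional
    whole = int(position)        # observation just at or below the position
    fraction = position - whole
    if whole < 1:
        return sorted_values[0]
    if whole >= n:
        return sorted_values[-1]
    below = sorted_values[whole - 1]
    above = sorted_values[whole]
    # Move the fractional part of the way from `below` toward `above`.
    return below + fraction * (above - below)


defects = [10, 11, 14, 16, 17, 18, 19, 21, 21, 23, 24, 26,
           29, 30, 32, 33, 34, 35, 36, 37, 39, 40, 42, 45]

d5 = decile(defects, 5)
print(d5)
```

For k = 5 the position is 12.5, halfway between 26 and 29, so the sketch returns 27.5.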
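Example 2. Compute 𝐷3, 𝐷6, and 𝐷9 of the heights of 50 Filipino children, with ages 7 – 12
years old.

Class Interval
Height (cm)      f     cf
134 – 139        10    50
128 – 133        9     40
122 – 127        8     31
116 – 121        1     23
110 – 115        5     22
104 – 109        2     17
98 – 103         9     15
92 – 97          5     6
86 – 91          1     1
N = 50

➢ For $D_3$: $\frac{3N}{10} = \frac{3(50)}{10} = 15$, so the $D_3$ class is 98 – 103, with $X_{LB} = 97.5$,
$cf_b = 6$, $f_{D_3} = 9$ and $i = 6$.
$$D_3 = 97.5 + \left(\frac{15 - 6}{9}\right) 6 = 103.5$$
➢ For $D_6$: $\frac{6N}{10} = 30$, so the $D_6$ class is 122 – 127, with $X_{LB} = 121.5$, $cf_b = 23$,
$f_{D_6} = 8$ and $i = 6$.
$$D_6 = 121.5 + \left(\frac{30 - 23}{8}\right) 6 = 126.75$$
➢ For $D_9$: $\frac{9N}{10} = 45$, so the $D_9$ class is 134 – 139, with $X_{LB} = 133.5$, $cf_b = 40$,
$f_{D_9} = 10$ and $i = 6$.
$$D_9 = 133.5 + \left(\frac{45 - 40}{10}\right) 6 = 136.5$$

This means that 30% of the children have a height of 103.5 cm and below, 60% of the children have a
height of 126.75 cm and below, and 90% of the children have a height of 136.5 cm and below.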
PERCENTILE
Percentiles are the values of arranged data which divide the whole data set into one hundred equal parts.
Very often, schools use percentiles to rank students based on their academic performance.
In common use, the percentile indicates that a certain percentage of the scores falls below the given
percentile rank. For example, if you are ranked in the 25th percentile, then 25% of the test scores are
below your score. Expressed in another way, when you rank in the 25th percentile, 75% of the test
scores are above yours.
For ungrouped data, the location of a percentile is
$$l = n\left(\frac{p}{100}\right)$$
where
$l$ = location of the data value
$p$ = percentile as a whole number
$n$ = sample size
and for grouped data, we have the formula
$$P_k = L + \left(\frac{\frac{kn}{100} - cf}{f_k}\right) i$$
where
$P_k$ = the kth percentile
$L$ = the lower class boundary of the kth percentile class
$f_k$ = the frequency of the class containing $P_k$
$n$ = the total number of observations
$i$ = the width of the class interval containing the percentile point
$cf$ = cumulative frequency for the class interval immediately below the class interval containing the
percentile point
Example 1. Given the data 11, 11, 14, 15, 16, 16, 17, 19, 22, 25, 26, 27, 31, 34, 36, what data value lies at
the 30th percentile?
Solution
➢ Number of observations = 15
➢ 11, 11, 14, 15, 16, 16, 17, 19, 22, 25, 26, 27, 31, 34, 36
$$l = 15\left(\frac{30}{100}\right) = \frac{450}{100} = 4.5$$
Round the location to the 5th position. Thus, the 30th percentile is the 5th value, which is 16:
11, 11, 14, 15, [16], 16, 17, 19, 22, 25, 26, 27, 31, 34, 36
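The location rule for ungrouped percentiles can be sketched in Python. One caution: Python's built-in `round` rounds 4.5 to 4 (it rounds halves to the nearest even number), so the sketch rounds halves upward explicitly to match the step from 4.5 to the 5th position. The function name `percentile_value` and the round-half-up choice for other fractions are assumptions.

```python
import math


def percentile_value(sorted_values, p):
    """Value at the p-th percentile using the location rule l = n(p/100)."""
    n = len(sorted_values)
    location = n * p / 100
    # Round to the nearest whole position, with halves rounded up (4.5 -> 5).
    position = math.floor(location + 0.5)
    position = max(1, min(n, position))  # stay inside the data
    return sorted_values[position - 1]


data = [11, 11, 14, 15, 16, 16, 17, 19, 22, 25, 26, 27, 31, 34, 36]
p30 = percentile_value(data, 30)
print(p30)
```

For the 30th percentile of the 15 values, the location 4.5 is rounded to the 5th position, whose value is 16.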
Example 2
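Using the table of the tabulated heights of 50 students, compute 𝑃13, 𝑃32 and 𝑃84.

Class Interval
Height (cm)      f     cf
134 – 139        10    50
128 – 133        9     40
122 – 127        8     31
116 – 121        1     23
110 – 115        5     22
104 – 109        2     17
98 – 103         9     15
92 – 97          5     6
86 – 91          1     1
N = 50

➢ For the 13th percentile: $\frac{13n}{100} = \frac{13(50)}{100} = 6.5$, so the class interval is 98 – 103,
with $L = 97.5$, $cf = 6$, $f_{P_{13}} = 9$ and $i = 6$.
$$P_{13} = L + \left(\frac{\frac{13n}{100} - cf}{f_{P_{13}}}\right) i = 97.5 + \left(\frac{6.5 - 6}{9}\right) 6 = 97.83$$
➢ For the 32nd percentile: $\frac{32n}{100} = \frac{32(50)}{100} = 16$, so the class interval is 104 – 109,
with $L = 103.5$, $cf = 15$, $f_{P_{32}} = 2$ and $i = 6$.
$$P_{32} = 103.5 + \left(\frac{16 - 15}{2}\right) 6 = 106.5$$
➢ For the 84th percentile: $\frac{84n}{100} = \frac{84(50)}{100} = 42$, so the class interval is 134 – 139,
with $L = 133.5$, $cf = 40$, $f_{P_{84}} = 10$ and $i = 6$.
$$P_{84} = 133.5 + \left(\frac{42 - 40}{10}\right) 6 = 134.7$$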
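The grouped-data formulas for quartiles, deciles and percentiles share one pattern, which can be sketched as a single function applied to the height table. The function name `grouped_quantile` and its `(k, parts)` arguments are illustrative assumptions; the class limits and frequencies are taken from the height table of the 50 children. The sketch assumes whole-number class limits, so that each boundary lies 0.5 below the lower limit.

```python
# (lower limit, upper limit, frequency), lowest class first.
height_classes = [
    (86, 91, 1), (92, 97, 5), (98, 103, 9), (104, 109, 2), (110, 115, 5),
    (116, 121, 1), (122, 127, 8), (128, 133, 9), (134, 139, 10),
]


def grouped_quantile(classes, k, parts):
    """k-th of `parts` equal divisions: LB + ((kN/parts - cf) / f) * i.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    total = sum(freq for _, _, freq in classes)
    target = k * total / parts
    cf_before = 0
    for lower, upper, freq in classes:
        # The quantile class is the first whose cumulative frequency reaches the target.
        if cf_before + freq >= target:
            lower_boundary = lower - 0.5
            width = upper - lower + 1
            return lower_boundary + ((target - cf_before) / freq) * width
        cf_before += freq
    raise ValueError("k must not exceed parts")


p13 = grouped_quantile(height_classes, 13, 100)
p32 = grouped_quantile(height_classes, 32, 100)
p84 = grouped_quantile(height_classes, 84, 100)
q2 = grouped_quantile(height_classes, 2, 4)
d9 = grouped_quantile(height_classes, 9, 10)
print(round(p13, 2), round(p32, 2), round(p84, 2), round(q2, 2), round(d9, 2))
```

The same call with `parts=4` or `parts=10` handles the quartile and decile examples, since only the fraction of N being located changes.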
Learning Activity 3
1. 15, 6, 17, 84, 13, 57, 89, 19, 53, 85, 14, 23, 27, 66, 21, 1, 32, 62, 66, 38
1. The following are test scores of the first quarterly exam in Statistics and Probability. Find the three
quartiles
1–5 4
6 – 10 6
11 – 15 10
16 – 20 8
21 – 25 7
26 - 30 5
N=40
2. 122, 125, 133, 122, 132, 122, 123, 125, 122, 123
E. Complete the entries in the following table of the scores of the Oral Exam in Math and compute
and interpret the 9 deciles.
Score        Frequency (f)    Cumulative Frequency (cf)
91 – 100 2
81 – 90 6
71 – 80 9
61 – 70 11
51 – 60 8
41 – 50 7
31 – 40 5
21 – 30 4
11 – 20 3
1 – 10 1
Total
F. Refer to the set of data from the scores of 50 students in an English Proficiency Test and identify
the following information.
3. cf of P_53 class
10. P_43
Class Interval f cf
98 – 100 3 50
95 – 97 4 47
92 – 94 7 43
89 – 91 12 36
86 – 88 9 24
83 – 85 5 15
80 – 82 2 10
77 – 79 1 8
74 – 76 4 7
71 - 73 3 3
N=50