Chapter 4 Statistics & Data Management
LEARNING OBJECTIVES
4.1 STATISTICS
➢ Statistics is the science of collecting, describing, and interpreting data and making decisions
based on data.
• For instance, health authorities collect numerical facts and figures about the number of people
who tested positive for coronavirus disease 2019 (COVID-19), describe how those people may have
been infected, trace other people who may have been exposed to them and may also be infected,
and in the end compile lists of recoveries, deaths, and those who need to be quarantined.
Statistics showed that a pandemic of respiratory disease caused by a novel coronavirus was
spreading from person to person.
2. Inferential Statistics
Inferential statistics is used to make predictions or comparisons about a larger group
(population) using information gathered from a small part of that population (a sample). Its
results are produced through mathematical calculations that scientists use to draw inferences
about the population.
Inferential statistics is generally used when the user needs to draw a conclusion about
the whole population at hand. This is done using various techniques such as Linear
Regression Analysis, Analysis of Variance, Analysis of Covariance, tests of Statistical
Significance (t-test), and Correlation Analysis.
Quantitative data are easily amenable to statistical manipulation and can be represented
by a wide variety of statistical graphs and charts, such as line graphs, bar graphs, and scatter plots.
Example 3: Number of enrollees in your course this school year, height and weight of children,
distance travelled from home to office, scores in an examination, prices of commodities, salaries
of workers, etc.
Table 4.0: Types of data based on their mathematical properties
Nominal Data
Nominal data represent discrete units and are used to label variables that have no
quantitative value. They are used for classification purposes. Numbers or letters may be used to
represent nominal measurements.
Nominal scales are said to be the least powerful level of measurement: they have no arithmetic
origin, order, direction, or distance relationship, which is why their use is limited.
Ordinal Data
This type of data represents discrete and ordered units. The order of the values is what is
important and significant, but the differences between the values are not really known.
Ordinal scales typically measure non-numeric concepts such as satisfaction,
happiness, and discomfort. Because they only show sequence, arithmetic cannot be done
with them.
Interval Data
Interval data are sets of numerical measurements in which the distances between numbers are
known and of constant, equal size. However, they do not have a “true zero”.
Because there is no true zero, differences on an interval scale are meaningful but ratios
cannot be calculated. Interval data are more powerful than ordinal data because the intervals
are equal.
Ratio Data
Ratio data are quantitative data that have the same properties as interval data, plus a
definite ratio between any two values and an absolute “zero” that is treated as the point of
origin, which means there can be no negative numerical value in ratio data.
Ratio data are the most precise type of data and allow for the application of all statistical
techniques.
Summarizing the types of data: nominal variables are used to “name” or label a series of
values. Ordinal data provide good information about the order of choices, such as in a customer
satisfaction survey. Interval scales give us the order of values and the ability to quantify the
difference between each one. Ratio scales provide the order and interval values plus the ability
to calculate ratios, since a “true zero” can be defined.
Continuous Data
Continuous data represent measurements whose values cannot be counted but
can be measured. They are more precise and more informative, and they can reduce estimation and
rounding of measurements, but they are often more time consuming to obtain.
Example: Heights and weights of students, number of seconds after the start of a race
Note: A good rule for deciding whether data are continuous or discrete is that if the unit of
measurement can be cut in half and still make sense, the data are continuous.
4.7 FREQUENCY
Frequency is the number of times a data point occurs in the set of data, or the numerical
count of data in each class interval.
Frequency Distribution
A frequency distribution is a table that lists each data point and its frequency. It shows the
frequency, or number of occurrences, in each of several categories. Frequency distributions are
used to summarize large volumes of data values.
Relative Frequency
Relative frequency is the frequency of a data point expressed as a fraction of the total
number of data points. It can be written as a fraction, a percent, or a decimal.
Cumulative Frequency
Cumulative frequency is the running total of the frequencies: the cumulative frequency of a
row is its own frequency plus the frequencies of all previous rows. Cumulative relative
frequency is obtained the same way, by adding all the previous relative frequencies to the
relative frequency of the current row.
Raw Data
Raw data are data that have not been summarized or organized in any meaningful way. Often
they are data as collected or recorded, without any particular order except time of observation
or sequence of observation.
Class interval
Class intervals are one way of categorizing raw data according to numerical constant
intervals.
$$\text{Class (interval) width, } W = \frac{\text{highest value} - \text{lowest value}}{\text{number of classes}} = \frac{\text{Range}}{\text{number of classes}}$$
Number of Classes
It is a common practice to keep the number of classes between 5 and 20 classes.
However, in deciding the approximate number of classes, H.A. Sturges suggests the following
formula:
K = 1 + 3.322 log N
Where: K - represents the number of classes
N – the total number of observations (log N is its base-10 logarithm)
Example: If the total number of observations is 50, the number of classes would be
K = 1 + 3.322 log 50 = 1 + 3.322 (1.699) ≈ 6.64, or about 7 classes.
Example 6: Given the following ungrouped data, prepare the frequency distribution.
3, 1, 0, 2, 4, 3, 4, 0, 3, 2, 4, 1, 0, 4, 2, 1, 5, 3, 2, 2, 0, 4, 3,
1, 2, 1, 4, 2, 5, 1, 2, 4, 1, 2, 2, 4, 2, 4, 1, 1, 3, 2, 2, 2, 3, 3
Solution: The given data can be arranged as shown in Table 4.2.
Table 4.2
No. Frequency
0 4
1 9
2 14
3 8
4 9
5 2
Total = 46
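The tally in Table 4.2 can be reproduced with a short script. The following is a minimal Python sketch, not part of the original solution; the function name `frequency_table` is hypothetical. Alongside each frequency it also reports the relative frequency and the running (cumulative) frequency.

```python
from collections import Counter

# Ungrouped data from Example 6
data = [3, 1, 0, 2, 4, 3, 4, 0, 3, 2, 4, 1, 0, 4, 2, 1, 5, 3, 2, 2, 0, 4, 3,
        1, 2, 1, 4, 2, 5, 1, 2, 4, 1, 2, 2, 4, 2, 4, 1, 1, 3, 2, 2, 2, 3, 3]


def frequency_table(values):
    """Return rows of (value, frequency, relative frequency, cumulative frequency)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]          # cumulative frequency so far
        rows.append((value, counts[value], counts[value] / n, running))
    return rows


for value, f, rel, cum in frequency_table(data):
    print(f"{value:>3} {f:>4} {rel:>7.4f} {cum:>4}")
```

The last cumulative frequency equals the total number of observations, which is a quick check that no value was missed.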
2. Grouped data
Grouped data are lists of numbers that have been organized and arranged into classes
or categories, so that some data analysis has already been done.
Example 7: Frequency Distribution of Heights of Freshmen
Height (cm)    Number of Students (Frequency)
140-150        78
151-160        172
161-170        39
171-180        11
Total = 300
Example 7a: The speeds of 30 cars on a particular street were recorded to an accuracy of 1
km/hr, as follows. Construct a frequency distribution for the given data.
62 58 58 52 48 53 54 63 69 63
57 56 46 48 53 56 57 59 58 53
52 56 57 52 52 53 54 58 61 63
(Source: http://www.uobabylon.edu.iq/eprints/publication_1_326_638.pdf)
Solution: The following are suggested steps for the construction of frequency distribution.
Step 1: Find the range
Highest value = 69
Lowest value = 46
Range = 69 – 46 = 23
Step 2: Find the number of class intervals using Sturges formula where N = 30
K = 1 + 3.322 log 30 = 1 + 3.322 (1.477) = 1 + 4.9066 = 5.9066
Therefore, the number of classes = 6
Step 3: Width of class interval, $W = \frac{23}{6} = 3.833 \approx 4$
Step 4: The class limits of each class interval may be computed as follows. The frequency of
each class is then found by counting the observations that fall within its limits.
First class: Lower limit = 46
Upper limit = 46 + 4 – 1 = 49
Second class: Lower limit = 49 + 1 = 50
Upper limit = 50 + 4 – 1 = 53
Third class: Lower limit = 53 + 1 = 54
Upper limit = 54 + 4 – 1 = 57
Step 5: Preparing the frequency table.
Class Interval   Frequency
46 - 49 3
50 - 53 8
54 - 57 8
58 - 61 6
62 - 65 4
66 - 69 1
Note: Tally sheet is omitted
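Steps 1 to 5 can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, and rounding both Sturges' K and the class width up to the next whole number is an assumption that matches the rounding used in this example.

```python
import math

# Car speeds (km/hr) from Example 7a
speeds = [62, 58, 58, 52, 48, 53, 54, 63, 69, 63,
          57, 56, 46, 48, 53, 56, 57, 59, 58, 53,
          52, 56, 57, 52, 52, 53, 54, 58, 61, 63]


def sturges_classes(n):
    """Number of classes from K = 1 + 3.322 log10(N), rounded up (assumption)."""
    return math.ceil(1 + 3.322 * math.log10(n))


def grouped_frequency(values):
    """Return a list of (lower limit, upper limit, frequency) for integer data."""
    low, high = min(values), max(values)
    k = sturges_classes(len(values))
    width = math.ceil((high - low) / k)     # Step 3: W = range / K, rounded up
    table = []
    lower = low
    while lower <= high:
        upper = lower + width - 1           # Step 4: upper limit = lower + W - 1
        count = sum(lower <= v <= upper for v in values)
        table.append((lower, upper, count))
        lower = upper + 1
    return table


for lower, upper, count in grouped_frequency(speeds):
    print(f"{lower} - {upper}: {count}")
```

The `width - 1` step relies on the data being whole numbers; for data recorded to other precisions the class limits would need a different gap.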
4.9 METHODS OF DATA PRESENTATION
Table 4.4 Methods of Data Presentation
33 44 43 29 50 35 43 25
35 48 28 50 33 44 25 29
43 33 25 20 35 20 20 25
37 37 40 30 45 43 29 25
20 48 37 29 33 25 20 20
20 20 25 29 33 37 43 45
20 25 25 29 33 37 43 48
20 25 28 30 35 37 43 48
20 25 29 33 35 40 44 50
20 25 29 33 35 43 44 50
Stem-and-leaf Plot
Stem-and-leaf plot is a table which sorts data according to a certain pattern. It consists of
separating a number into two parts. For a two-digit number, the stem consists of the first digit and
the leaf consists of the second digit. For three-digit number, the stem consists of the first two
digits, and the leaf consists of the last digit. For a one-digit number, the stem is zero.
Table 4.5
Stem Leaves
2 0,0,0,0,0,0,5,5,5,5,5,5,8,9,9,9,9
3 0,3,3,3,3,5,5,5,7,7,7
4 0,3,3,3,3,4,4,5,8,8
5 0,0
In the above stem-and-leaf plot, the 11 top scores are 50, 50, 48, 48, 45, 44, 44,
43, 43, 43, 43, while the 12 lowest scores are 20, 20, 20, 20, 20, 20, 25, 25, 25, 25, 25, 25.
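A stem-and-leaf plot for two-digit data can be built by splitting each number into its tens digit (stem) and units digit (leaf). The sketch below is a minimal Python illustration using the 40 raw scores from the first five rows of the data above; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# The 40 raw scores (first five rows of the data table)
scores = [33, 44, 43, 29, 50, 35, 43, 25,
          35, 48, 28, 50, 33, 44, 25, 29,
          43, 33, 25, 20, 35, 20, 20, 25,
          37, 37, 40, 30, 45, 43, 29, 25,
          20, 48, 37, 29, 33, 25, 20, 20]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", ",".join(str(leaf) for leaf in leaves))
```

Counting the leaves on each stem gives the frequency of the corresponding class of width 10, so the plot doubles as a rough histogram.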
Table 4.6
Frequency Distribution
Graduating High School Students
No. of Hours Frequency
1 6
2 15
3 22
4 4
5 3
Total = 50
Figure 4.9
Contingency Table
A contingency table, sometimes called a two-way frequency table, is a tabular mechanism with at
least two rows and two columns used in statistics to present categorical data in terms of frequency
counts. The intersection of a row and a column of a contingency table is called a cell.
Example: The contingency table shown has two rows and five columns (not counting header
rows/columns) and shows the results of a random sample of 2,200 adults classified by two
variables, namely gender and favorite way to eat ice cream.
One benefit of having data presented in a contingency table is that it allows one to perform basic
probability calculations more easily, a feat made easier still by augmenting a summary row and
column to the table as shown below:
1. Bar Graph
The set of numbers (data) are illustrated using set of rectangular bars. It can be plotted
vertically or horizontally, and it can be simple, multiple, or component type.
Example: Shown in figure 4.1 and figure 4.2 are Birth weights of newly born babies.
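[Figures 4.1 and 4.2: bar graphs of the birth weights of newly born babies, with a scale from 0.00 to 6.00. Labels recovered from the figures: Mary…, Mheric, Maiko, Rica, Eiji, Emmanuel, Lorenze, Owen, Dennise]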
2. Histogram
It is a pictorial diagram of a frequency distribution. The class intervals are shown along the x-axis
and the frequencies along the y-axis using a series of bars. It looks like a bar chart, but there are
important differences between them.
Example: A Psychology test given to 642 students consisted of 197 items, each graded as
"correct" or "incorrect". The students' scores ranged from 46 to 167. The test scores are arranged
in a grouped frequency distribution in Table 4.8 for the purpose of creating a histogram.
Table 4.8
Grouped Frequency Distribution
of Psychology Test Scores
Lower Limit   Upper Limit   Class Frequency
39.5 49.5 3
49.5 59.5 10
59.5 69.5 53
69.5 79.5 107
79.5 89.5 147
89.5 99.5 130
99.5 109.5 78
109.5 119.5 59
119.5 129.5 36
129.5 139.5 11
139.5 149.5 6
149.5 159.5 1
159.5 169.5 1
Table 4.8 shows that the first interval is from 39.5 to 49.5, the second from 49.5 to 59.5,
etc. There are three scores in the first interval, 10 in the second, etc. Class intervals of width 10
provide enough detail about the distribution to be revealing without making the graph too
"choppy." Placing the limits of the class intervals midway between two numbers (e.g., 49.5)
ensures that every score will fall in an interval rather than on the boundary between intervals.
Figure 4.3
Source: http://onlinestatbook.com/2/graphing_distributions/histograms.html
In the above histogram, the class frequencies are represented by bars and the height of
each bar corresponds to its class frequency.
Figure 4.4
Source: http://static3.mbtfiles.co.uk/media/docs/newdocs/gcse/maths/data_handling/height_and_weight_of_pupils_and_other_mayfield_high_school_investigations/34766/html/images/image01.png
4. Frequency polygon
It is a graph obtained by joining the mid-points of histogram blocks using line segments.
Table 4.8 shows the scores in Quiz 1 of 3 sections in MMW and its equivalent frequency polygon
in Figure 4.5
5. Pie Charts
It is a chart in circular form that uses “pie slices” to illustrate the relative sizes of data. The
chart is divided into sectors of a circle, where each sector shows the relative size of one
category. Figure 4.6 shows an example of a pie chart.
Figure 4.6
6. Pictogram / Pictograph
A pictogram is a chart that uses pictures to represent data. Pictograms are drawn
in the same way as bar charts, but instead of bars they use columns of pictures to show the
numbers involved.
Example: The following pictograph shows the number of students using the different types of
transportation to go to school.
Walking
Tricycle
Bus
Private Car
LRT/MRT
- represents 10 students
c.) The percentage of students who cycle to school = $\frac{70}{400} \times 100 = 17.5\%$
Example: Shown below is the employee’s data of a certain company, where a pictograph was
created with the aid of MS Excel.
Figure 4.7
Source: https://excelchamps.com/blog/how-to-make-a-pictograph-in-excel/
7. Scatter Plot
It uses dots to represent values for two different variables. Scatter plots are used to
observe relationships between variables and shows patterns when data are taken as a whole.
Figure 4.9
Heights, cm    Number of Students
140 - 150      75
150 - 160      156
160 - 170      128
170 - 180      41
Total          400
2. Table 4.11 represents the heights, in inches, of a sample of 100 male basketball players in
NCR. Construct the cumulative frequency curve.
Table 4.11
3. Twenty people were asked the number of kilometers they commute to work every day.
2 5 7 3 2 10 15 20 7 10
18 5 12 13 12 4 5 10 18 4
Data   Frequency   Relative Frequency   Cumulative Relative Frequency
2 2 4/19 0.2105
3 3 3/19 0.1579
4 1 1/19 0.2105
5 3 3/19 0.1579
7 2 2/19 0.2632
10 3 4/19 0.4737
12 2 2/19 0.7895
13 1 1/19 0.8421
15 1 1/19 0.8948
18 1 1/19 0.9474
20 1 1/19 1.000
Problem:
a. Is the table correct? If it is not correct, what is wrong?
b. True or False: Three percent of the people surveyed commute 3 kilometers. If the statement is
not correct, what should it be? If the table is incorrect, make the corrections.
c. What fraction of the people surveyed commute 5 or 7 kilometers?
d. What fraction of the people surveyed commute 12 kilometers or more? Less than 12
kilometers? Between 5 and 13 kilometers (not including 5 and 13 kilometers)?
4. Draw the Ogive graph for the following set of data:
2, 7, 16, 21, 31, 3, 8, 17, 21, 55, 3, 13, 22,
55, 4, 14, 19, 25, 57, 6, 15, 20, 29, 58, 18
5. An insurance company requested a police report about color of cars that encountered accident
during first quarter of the year. The report is shown in the table below:
Construct a bar graph for the above police report. Also create its pie-chart.
6. Consider the following data on annual wages of 20 Security Guards assigned to different
establishments in Metro Manila Area.
Annual Wages of Security Guards in Hundred Thousand of Pesos
168 162 216 210 162 180 180 210 204 156
210 204 168 216 216 202 162 204 168 180
7. The temperature, °C in Baguio for the month of March 2020 is shown in Table 4.13
Table 4.13
BAGUIO TEMPERATURE, °C MARCH 2020
SUNDAY MONDAY TUESDAY WEDNESDAY THURSDAY FRIDAY SATURDAY
1 2 3 4 5 6 7
Table 4.14
ID   AGE   GENDER   HEIGHT   Blood Group   LDL   Feeling Happy?      Number of Children   Smoke?   Social Class
1    25    F        1.62     B             150   Agree               0                    No       I
2    35    F        1.58     O             123   Strongly Agree      1                    Yes      II
3    44    M        1.35     A             178   Disagree            3                    Yes      I
4    28    F        1.54     AB            205   Disagree            0                    No       III
5    3     M        1.35     O             229   Indifferent         2                    Yes      I
6    42    M        1.21     B             215   Agree               2                    Yes      IV
7    36    F        1.76     A             130   Strongly Disagree   1                    No       IV
8    38    M        1.57     A             175   Disagree            1                    Yes      V
9    30    M        1.47     AB            240   Indifferent         0                    No       III
10   40    F        1.18     B             167   Strongly Agree      6                    No       I
*LDL – Low Density Lipoprotein
Source: https://www.slideshare.net/WinonaEselBernardo/presentation-of-data-10958540
$$\bar{x} = \frac{37 + 33 + 33 + 32 + 29 + 28 + 23 + 22 + 22 + 22 + 21 + 21 + 21 + \dots + 6}{30} = \frac{606}{30} = 20.2$$
Therefore, the mean score of the class is $\bar{x} = 20.2$.
The Weighted Mean
The weighted mean is a mean in which each data value is multiplied by a weight that reflects
its relative importance.
$$\bar{x} = \frac{\sum wx}{\sum w}$$
Where w – the weight of the given data value x
Example 12: The numbers 1, 2, 3, 4 and 5 have weights 0.1, 0.25, 0.3, 0.15 and 0.2, respectively.
Calculate their weighted mean.
Solution: Substitute the given numbers with their corresponding weights as follows:
$$\bar{x} = \frac{\sum wx}{\sum w} = \frac{1(0.1) + 2(0.25) + 3(0.3) + 4(0.15) + 5(0.2)}{0.1 + 0.25 + 0.3 + 0.15 + 0.2} = \frac{3.1}{1.0} = 3.1$$
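The weighted mean of Example 12 can be checked numerically. This is a minimal Python sketch with a hypothetical helper name, `weighted_mean`; it simply evaluates the sum of w·x divided by the sum of w.

```python
values = [1, 2, 3, 4, 5]
weights = [0.1, 0.25, 0.3, 0.15, 0.2]


def weighted_mean(xs, ws):
    """Sum of w*x divided by the sum of the weights."""
    if len(xs) != len(ws):
        raise ValueError("each value needs exactly one weight")
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


print(round(weighted_mean(values, weights), 4))
```

When the weights already add up to 1, as they do here, the denominator has no effect and the weighted mean is just the sum of the products.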
frequencies.
Example 13: The following data shows the number of days that students were absent from school
due to sickness in the previous school year. Use the direct method to compute the mean.
$$\bar{x} = \frac{\sum fx}{n} = \frac{395}{40} = 9.875$$
Example 14: The following data shows the distance traveled by 100 commuters from house to
work.
Distance in kilometers   1 - 10   11 - 20   21 - 30   31 - 40   41 - 50
No. of Commuters         10       17        42        18        13
Solution: The given data can be tabulated as follows, where the assumed mean is $A = \frac{21+30}{2} = 25.5$ and $u = \frac{x - A}{10}$:
Distance, km   Number of Commuters, f   Mid-point, x   u    fu
1 – 10         10                       5.5            -2   -20
11 – 20        17                       15.5           -1   -17
21 – 30        42                       25.5           0    0
31 – 40        18                       35.5           1    18
41 – 50        13                       45.5           2    26
Total          ∑f = 100                                     ∑fu = 7
Solving for the arithmetic mean using the formula $\bar{x} = A + \frac{\sum fu}{\sum f}(h)$,
$$\bar{x} = 25.5 + \frac{7}{100}(10) = 26.2$$
Note: The difference from mid-point to mid-point is 10; this is the class size, h.
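The step-deviation calculation can be sketched in Python as below. This is an illustration under stated assumptions: the function name and the `assumed_index` parameter (which class supplies the assumed mean A) are hypothetical, and the classes are assumed to be of equal width.

```python
def step_deviation_mean(classes, freqs, assumed_index):
    """Mean of grouped data by the step-deviation method.

    classes: list of (lower limit, upper limit) pairs of equal width
    freqs: class frequencies, in the same order
    assumed_index: position of the class whose mid-point is the assumed mean A
    """
    mids = [(lo + hi) / 2 for lo, hi in classes]
    a = mids[assumed_index]
    h = mids[1] - mids[0]                       # class size
    sum_fu = sum(f * (x - a) / h for f, x in zip(freqs, mids))
    return a + sum_fu / sum(freqs) * h


# Example 14: distance travelled by 100 commuters
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
commuters = [10, 17, 42, 18, 13]
print(round(step_deviation_mean(classes, commuters, assumed_index=2), 2))
```

The result does not depend on which class is chosen for A; picking a class near the middle only keeps the u values small for hand computation.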
Example 15: Shown below is a table of scores of 50 students in MMW class. Find the arithmetic
mean by a.) direct method b.) step-deviation method.
Scores 20 - 29 30 - 39 40 - 49 50 - 59 60 - 69 70 - 79 80 - 89
Frequency 1 5 12 15 9 6 2
Marks     f    x      fx (Direct Method)   u    fu (Step-Deviation Method)
20 - 29   1    24.5   24.5                 -3   -3
30 - 39   5    34.5   172.5                -2   -10
40 - 49   12   44.5   534                  -1   -12
50 - 59   15   54.5   817.5                0    0
60 - 69   9    64.5   580.5                1    9
70 - 79   6    74.5   447                  2    12
80 - 89   2    84.5   169                  3    6
Total     50          2745                      2
a.) Direct method:
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{2745}{50} = 54.9$$
b.) Step-deviation method, where $A = \frac{50+59}{2} = 54.5$ and h = 10:
$$\bar{x} = A + \frac{\sum fu}{\sum f} \times h = 54.5 + \frac{2}{50} \times 10 = 54.9$$
The arithmetic mean is based on all observations, is easy to calculate, and can be determined
for almost every kind of data. It is only slightly affected by sampling fluctuations.
However, it is highly affected by extreme values. It is not a suitable average for ratios
and percentages, and it is not a suitable average for highly skewed distributions.
Median is defined as the middle value of the data when the data is arranged in ascending
or descending order. When the number of observations is odd, the middle number is the median,
but when the number of observations is even, the median is the average of the two middle
numbers.
If the number of observations is odd: $Md = \text{value of the } \left(\frac{n+1}{2}\right)\text{th observation}$
If the number of observations is even: $Md = \frac{\left(\frac{n}{2}\right)\text{th} + \left(\frac{n}{2}+1\right)\text{th}}{2}$
There are 7 terms in the given data, which is odd; therefore the median is the middle (4th)
term, which is 29, or Md = 29.
Note: The number of terms having values greater than or equal 29 is the same as the number of
terms having values less than or equal to it.
The number of terms is 6 which is even, which means the 3rd and 4th are middle terms. Median is
the average value of these terms,
$$\text{Median} = \frac{55 + 65}{2} = \frac{120}{2} = 60$$
Hence, Md = 60.
Note: The number of terms having values greater than or equal to 60 is the same as the number
of terms having values less than or equal to it.
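$$Md = L + \frac{i\left(\frac{n}{2} - f_{<}\right)}{f}$$
L – exact lower limit of the median class
i – class size
f – frequency of the median class
n – number of observations
f < – less than cumulative frequency of the class below the median class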
Example 18: Given the following frequency distribution, find the median.
Raw Score f
75 -79 3
70 - 74 1
65 - 69 7
60 -64 9
55 - 59 6
50 - 54 4
Solution: Add a third column for the less than cumulative frequency to the above table. The third
column is obtained by adding the frequencies of the class starting from the lowest class up to the
topmost class. The cumulative frequency of the topmost class must be equal to the number of
observations
For the median class, consider the entry in the third column (f <) that is immediately
greater than n/2. Since n/2 = 15, the entry in the third column that is immediately greater
than it is 19, and therefore the median class is 60 – 64. The lower limit of the median class is 60
and the exact lower limit of the median class is L = 60 – 0.5 = 59.5. The class size is i = 5, the
frequency of the median class is f = 9, and f < = 10, so Md is as follows:
$$Md = L + \frac{i\left(\frac{n}{2} - f_{<}\right)}{f} = 59.5 + \frac{5(15 - 10)}{9} = 59.5 + 2.78 = 62.28$$
Hence, Md = 62.28
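The grouped-median procedure can be sketched in Python as follows. This is a minimal illustration, not the textbook's own method of presentation: the function name is hypothetical, the classes are listed from lowest to highest, and the class limits are assumed to be whole numbers so that the exact lower limit is the lower limit minus 0.5.

```python
def grouped_median(classes, freqs):
    """Median of grouped data: Md = L + i * (n/2 - f<) / f.

    classes: (lower limit, upper limit) pairs in ascending order
    freqs: class frequencies, in the same order
    """
    n = sum(freqs)
    width = classes[1][0] - classes[0][0]       # class size i
    cumulative = 0                              # f<, cumulative frequency below
    for (lower, _), f in zip(classes, freqs):
        if cumulative + f >= n / 2:             # first class reaching n/2
            exact_lower = lower - 0.5           # L, for whole-number limits
            return exact_lower + width * (n / 2 - cumulative) / f
        cumulative += f
    raise ValueError("frequencies must not all be zero")


# Example 18, listed from the lowest class up
classes = [(50, 54), (55, 59), (60, 64), (65, 69), (70, 74), (75, 79)]
freqs = [4, 6, 9, 7, 1, 3]
print(round(grouped_median(classes, freqs), 2))
```

Here the cumulative frequency below the median class is 4 + 6 = 10, which is the value of f < that enters the formula.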
The mode is the number that appears most frequently in a data set. A set of data may
have one mode (unimodal), two modes (bimodal), three modes (trimodal), or more (multimodal), or no
mode at all. It is denoted by the symbol Mo and can be obtained by inspection of
the given set of data.
Example 19: Given the following sets of numbers, determine its mode.
c. 5, 10, 15, 20, 25, 30, 30, 25, 20, 15, 10, 5
Solution:
a. The number that occurs most frequently is 15; therefore Mo = 15.
b. There are three numbers that occur most frequently: 79, 80 and 91; therefore
Mo = 79, 80, 91. The given set of data is trimodal.
74, 76, 79, 79, 80, 80, 85, 91, 91, 93
5, 10, 15, 20, 25, 30, 30, 25, 20, 15, 10, 5
The mode of grouped data can be computed using the following formula:
$$Mo = LB + i\left(\frac{d_1}{d_1 + d_2}\right)$$
LB – lower boundary of the modal class (also called exact lower limit of modal class)
i – class size
d1 – difference between the frequency of the modal class and that of the class below it (the next lower class).
d2 – difference between the frequency of the modal class and that of the class above it (the next higher class).
Example 20: A record of Elite Coffee Shop shows the number of orders of Brewed Coffee made
by its customers per hour. Find the mode.
Number of Brewed Coffee   Frequency
0 - 2                     2
3 - 5                     3
6 - 8                     6
9 - 11                    7
12 - 14                   5
15 - 17                   3
18 - 20                   2
Image Source: https://youronevoicecanmakeadifference.files.wordpress.com/2012/06/cup-of-fresh-brewed-coffee-photo-credit-fanpop1.jpg
Solution: Locate the modal class, as shown below the modal class is 9 – 11, having the highest
frequency of 7,
Number of Brewed Coffee   Frequency
0-2 2
3-5 3
6-8 6
9 - 11 7
12 - 14 5
15 - 17 3
18 - 20 2
And the LB, lower boundary of the modal class = 9 – 0.5 = 8.5
d1 = 7 – 6 = 1
d2 = 7 – 5 = 2
i=3
$$Mo = LB + i\left(\frac{d_1}{d_1 + d_2}\right) = 8.5 + 3\left(\frac{1}{1 + 2}\right) = 9.5$$
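The grouped-mode formula can be sketched in Python as below. This is an illustration under assumptions: the function name is hypothetical, class limits are whole numbers (so the lower boundary is the lower limit minus 0.5), there is a single modal class, and a missing neighbouring class is treated as having frequency 0.

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data: Mo = LB + i * d1 / (d1 + d2)."""
    m = max(range(len(freqs)), key=freqs.__getitem__)    # index of modal class
    width = classes[1][0] - classes[0][0]                # class size i
    below = freqs[m - 1] if m > 0 else 0                 # next lower class
    above = freqs[m + 1] if m < len(freqs) - 1 else 0    # next higher class
    d1 = freqs[m] - below
    d2 = freqs[m] - above
    lower_boundary = classes[m][0] - 0.5
    return lower_boundary + width * d1 / (d1 + d2)


# Example 20: orders of brewed coffee per hour
classes = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14), (15, 17), (18, 20)]
freqs = [2, 3, 6, 7, 5, 3, 2]
print(grouped_mode(classes, freqs))
```

If two classes share the highest frequency, `max` returns the first of them, so a bimodal table would need separate handling.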
Advantages of Mode
Mode is easy to understand and calculate and is useful for qualitative data. It is not
affected by extreme values. It is easy to identify in a data set and in a frequency distribution. It
can be computed in an open -ended frequency table and can be located in a graph.
Disadvantages of Mode
Mode is not defined when no values are repeated in a data set. It is not based on all
values. It is unstable when the data consist of a small number of values.
Example 21: Find the arithmetic mean, median and mode of the following set of data:
$$\bar{x} = \frac{\sum x}{n} = \frac{73 + 78 + 73 + 74 + 74 + 73 + 76 + 81}{8} = 75.25$$
To solve for the median, arrange the given data in ascending order. There are 8 items in
the data, which means the median is the average of the 4th and 5th items in the sequence,
$$Md = \frac{74 + 74}{2} = 74$$
Note: In many cases, the modal value will differ from the average value in the data.
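The three measures of central tendency for this example can be checked with Python's standard `statistics` module. The data list below is taken from the eight values that appear in the mean computation above; this is a quick sketch rather than part of the worked solution.

```python
import statistics

# The eight values from the mean computation in Example 21
data = [73, 78, 73, 74, 74, 73, 76, 81]

mean = statistics.mean(data)          # arithmetic mean
median = statistics.median(data)      # average of the two middle values
modes = statistics.multimode(data)    # list of all most-frequent values

print(mean, median, modes)
```

`multimode` returns every value tied for the highest count, which makes it convenient for bimodal or trimodal data as well.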
Exercises 4.1
x f
10 6
23 13
27 15
30 17
35 12
2. Shown in the table below is the Age distribution of newly hired health workers to be deployed
to different hospitals in USA. Use the Step-Deviation Method to compute for mean deployment
rate of health workers.
Age Group   Number of Health workers
25 – 29 18
30 – 34 21
35 – 39 35
40 – 44 40
45 – 49 36
50 – 54 29
55 – 59 15
3. The final grade of Emmanuel in every subject is indicated in his report card shown below, use
weighted mean to calculate for Emmanuel’s GPA.
Subjects   No. of Units   Final Grade
Oral Communication in Context 3 88
Komunikasyon at Pananaliksik sa Wika at Kulturang Pilipino 3 90
General Mathematics 3 85
Earth Science 3 89
Introduction to the Philosophy of the Human Person 3 92
Empowerment and Technologies 3 91
Pre-Calculus 3 87
Note: P.E. is not included in the computation of GPA.
a. 13, 14, 15, 13, 16, 17, 12. b. 42, 44, 45, 43, 48, 47, 44, 40.
5. During the 4 months of community quarantine due to the pandemic, the Popular Electric Company
was not able to do actual meter readings of electric consumption and instead charged its customers
based on the average household consumption for the last 4 months. If a household's electric
consumption for the last 4 months was 95 kWh, 106.2 kWh, 98.9 kWh and 101.3 kWh, find the number of
kilowatt-hours that the Popular Electric Company charged the customer on the 5th month of the pandemic.
6. The typing speeds of 25 secretaries in a big company are recorded below (in words per minute).
Find the mode.
35, 43, 39, 46, 43, 47, 38, 51, 43, 38, 40, 45
7. The manager of a Burger House recorded the number of burgers sold per day in 2 weeks
(below). Which of the following statements is true?
132, 121, 119, 116, 130, 121, 131, 117, 119, 135, 121, 129, 119, 134
8. Medric’s cumulative GPA for 3 semesters was 2.5 for 42 course units. His fourth semester was
3.0 for 15 course units. What is his cumulative GPA for all 4 semesters?
10. Find the mean, median and all the modes for the following frequency distribution.
Points scored in basketball game   Frequency
3 5
4 6
5 5
8 3
11 2
15 1
20 1
11. Given the frequency distribution of scores in the final examination of 50 engineering students
in Calculus I. Compute the modal score in the final examination in Calculus 1.
12. Organizers of the 5 kilometer Race recorded the participants running pace in minutes. Find
the mean, median and all the modes of the runners.
After the mean, median, mode, range and variance were calculated for the scores, it was
discovered that one of the scores of 20 should have been an 18. Which of the following will change
when the calculations are redone using the correct scores?
A measure of dispersion provides information on how scattered or spread out a given
set of data is. It gives a clear idea about the distribution of the data and shows how much the
data vary from their average value. A measure of dispersion shows whether the distribution of
data is homogeneous or heterogeneous. It is also known as a measure of variability.
Some ways to measure dispersion are calculating the range, mean absolute deviation,
variance, and standard deviation.
The range and quartile deviation express the scattering of observations in terms of
distances, while the mean deviation and standard deviation express the variation in terms of the
average of the deviations of the observations.
Range is the simplest measure of dispersion. It is the difference between two extreme
observations of the data set.
R=H–L
Where R – range
Example 22: Compute the range of the final grades of 10 male and 10 female Grade 11
students in Statistics. Describe the result.
Male: 91, 98, 87, 76, 75, 82, 84, 78, 86, 83
Female: 92, 88, 83, 85, 82, 77, 91, 78, 82, 79
Solution: The highest final grade of the male students is 98 and the lowest grade is 75, while for
the female students the highest final grade is 92 and the lowest grade is 77; therefore the ranges are,
Male: R = 98 – 75 = 23
Female: R = 92 – 77 = 15
From the above computation the range of male is 23 while that of female is 15, this means
that the final grades of male students are more scattered while the final grades of female students
are closer. This also indicates that the final grades of female students are more homogeneous
than the final grades of male students.
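The comparison above can be reproduced with a few lines of Python. This is a minimal sketch; `value_range` is a hypothetical helper that applies R = H – L to a list of values.

```python
male = [91, 98, 87, 76, 75, 82, 84, 78, 86, 83]
female = [92, 88, 83, 85, 82, 77, 91, 78, 82, 79]


def value_range(values):
    """Range R = H - L: highest value minus lowest value."""
    return max(values) - min(values)


print("Male range:", value_range(male))
print("Female range:", value_range(female))
```

Because the range uses only the two extreme values, a single unusually high or low grade can change it substantially.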
1. Range = Upper boundary of the highest class - Lower boundary of the lowest class
R = UH – LL
R = MH – ML
Example 23: Given the weights of Female students in a PE class, find its range and coefficient
of range.
Weight (Pounds)      100 - 104   105 - 109   110 - 114   115 – 119   120 - 124   125 - 129
Number of Students   8           13          19          43          10          7
Note: Range is the simplest method and easily understood measure of dispersion, but it
is a poor measure of dispersion and does not give a good picture of the overall spread of
the observations with respect to the center of the observations.
Mean deviation is the arithmetic mean of the absolute deviations of the observations
from a measure of central tendency.
$$MAD = \frac{\sum |x - \bar{x}|}{n}$$
Where MAD – Mean absolute deviation
x – given value in the set of ungrouped data
$\bar{x}$ – mean of ungrouped data
n – total number of measurements
$\sum |x - \bar{x}|$ – summation of the absolute deviation of the values from the mean.
Example 23: The record of the 10 fastest participants of 5 Kilometer Fun Run in Manila are as
follows 50, 60, 75, 80, 90, 105, 125, 145, 165, 200 all in minutes. Find the mean absolute
deviation.
$$\bar{x} = \frac{\sum x}{n} = \frac{50+60+75+80+90+105+125+145+165+200}{10} = \frac{1095}{10} = 109.5$$
Now, solving for other items in the formula and presenting in tabulated form,
x   x − x̄   |x − x̄|
50 -59.5 59.5
60 -49.5 49.5
75 -34.5 34.5
80 -29.5 29.5
90 -19.5 19.5
105 -4.5 4.5
125 15.5 15.5
145 35.5 35.5
165 55.5 55.5
200 90.5 90.5
n = 10 ∑|𝑥 − 𝑥̅ | = 394
$$MAD = \frac{\sum |x - \bar{x}|}{n} = \frac{394}{10} = 39.4$$
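The mean absolute deviation of the fun-run times can be verified with the short Python sketch below; the function name `mean_absolute_deviation` is hypothetical.

```python
# Finishing times (minutes) of the 10 fastest participants
times = [50, 60, 75, 80, 90, 105, 125, 145, 165, 200]


def mean_absolute_deviation(values):
    """Average of the absolute deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


print(mean_absolute_deviation(times))
```

Taking absolute values is what keeps the positive and negative deviations from cancelling; without it the sum of deviations from the mean is always zero.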
$$MAD = \frac{\sum f|x - \bar{x}|}{n}$$
Note: The class mark of the class is the average of the lower and upper boundary of the class.
Example 24: Consider the previous problem, where the weights of Female students in a PE class
are given, find its MAD.
Weight (Pounds)      100 - 104   105 - 109   110 - 114   115 – 119   120 - 124   125 - 129
Number of Students   8           13          19          43          10          7
Solution: We first solve for $\bar{x}$ using the short cut (step-deviation) method of mean of grouped data, with assumed mean $A = \frac{115 + 119}{2} = 117$, class size h = 5, and $u = \frac{x - A}{5}$:
Weight (lbs)   f     x     u    fu    x − x̄    f|x − x̄|
100 – 104      8     102   -3   -24   -12.75   102.00
105 – 109      13    107   -2   -26   -7.75    100.75
110 – 114      19    112   -1   -19   -2.75    52.25
115 – 119      43    117   0    0     2.25     96.75
120 – 124      10    122   1    10    7.25     72.50
125 – 129      7     127   2    14    12.25    85.75
Total          100              -45            510
$$\bar{x} = A + \frac{\sum fu}{\sum f}(h) = 117 + \frac{(-45)}{100}(5) = 114.75$$
Substitute $\bar{x}$ to MAD,
$$MAD = \frac{\sum f|x - \bar{x}|}{n} = \frac{510}{100} = 5.10$$
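A grouped mean absolute deviation can also be computed directly from the class marks, which avoids intermediate rounding. The Python sketch below is illustrative only; the function name is hypothetical, and the mean it uses is the class-mark mean ∑fx / n, which equals A + (∑fu / ∑f)(h).

```python
def grouped_mad(classes, freqs):
    """Return (mean, MAD) of grouped data using class marks as x."""
    marks = [(lo + hi) / 2 for lo, hi in classes]      # class marks
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    mad = sum(f * abs(x - mean) for f, x in zip(freqs, marks)) / n
    return mean, mad


# Weights (pounds) of female students in a PE class
classes = [(100, 104), (105, 109), (110, 114),
           (115, 119), (120, 124), (125, 129)]
freqs = [8, 13, 19, 43, 10, 7]

mean, mad = grouped_mad(classes, freqs)
print(mean, mad)
```

The class-mark mean obtained here agrees with the direct-method value 11,475 / 100 used for the same data in the variance example.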
Note: The merits of the mean deviation are that it is based on all observations, it takes its
minimum value when the deviations are taken from the median, and it is independent of change of
origin. Its demerits are that it is not easily understandable, its calculation is time consuming,
and ignoring the negative signs makes it unsuitable for further mathematical treatment.
Standard deviation is defined as the positive square root of the arithmetic mean of the
squares of the deviations of the given values from their arithmetic mean.
$$s^2 = \frac{\sum (x - \bar{x})^2}{n-1} \quad \text{and} \quad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$$
n – number of measurements
Example 25: The following numbers were obtained from the daily sales of pair of trendy school
shoes for two weeks in one of its stores.
Week 1: 15 18 16 11 19 21 16
Week 2: 10 11 12 11 15 24 14
$$\bar{x} = \frac{15 + 18 + 16 + 11 + 19 + 21 + 16 + 10 + 11 + 12 + 11 + 15 + 24 + 14}{14} = \frac{213}{14} = 15.21$$
x    x − x̄    (x − x̄)²
10   -5.21   27.1
11   -4.21   17.7
11   -4.21   17.7
11   -4.21   17.7
12   -3.21   10.3
14   -1.21   1.5
15   -0.21   0.0
15   -0.21   0.0
16   0.79    0.6
16   0.79    0.6
18   2.79    7.8
19   3.79    14.4
21   5.79    33.5
24   8.79    77.3
n = 14       226.4
$$s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{226.4}{14 - 1} = 17.415$$
Therefore, the variance of daily sales of trendy shoes is 17.415 and its standard deviation is 4.17.
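The sample variance and standard deviation of the daily sales can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n − 1 as in the formula above. This is a quick sketch; small differences from the hand computation come from rounding the deviations in the table.

```python
import statistics

# Daily sales of trendy school shoes over two weeks
sales = [15, 18, 16, 11, 19, 21, 16,
         10, 11, 12, 11, 15, 24, 14]

variance = statistics.variance(sales)   # sample variance, divides by n - 1
std_dev = statistics.stdev(sales)       # positive square root of the variance

print(round(variance, 3), round(std_dev, 2))
```

For a population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by n instead.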
VARIANCE AND STANDARD DEVIATION FOR GROUPED DATA
The formula for variance and standard deviation for grouped data is shown below:
$$s^2 = \frac{\sum f(x-\bar{x})^2}{n-1} \quad \text{and} \quad s = \sqrt{\frac{\sum f(x-\bar{x})^2}{n-1}}$$
Example 26: Consider the previous problem, in which the weights of Female students in a PE class
are given. Calculate the variance and standard deviation.
$$\bar{x} = \frac{\sum fx}{n} = \frac{11{,}475}{100} = 114.75$$
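The remaining steps of the grouped variance and standard deviation can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name; it uses the class marks as x and the n − 1 divisor from the grouped formula.

```python
import math


def grouped_variance(classes, freqs):
    """Return (mean, sample variance) of grouped data using class marks."""
    marks = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    squares = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks))
    return mean, squares / (n - 1)


# Weights (pounds) of female students in a PE class
classes = [(100, 104), (105, 109), (110, 114),
           (115, 119), (120, 124), (125, 129)]
freqs = [8, 13, 19, 43, 10, 7]

mean, variance = grouped_variance(classes, freqs)
print(mean, round(variance, 3), round(math.sqrt(variance), 3))
```

Each squared deviation is weighted by its class frequency because every observation in a class is represented by the same class mark.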
1. Find the standard deviation of the sample of the following numbers obtained by sampling
method.
Company A Company B
Number of Employees 950 1,100
Average Daily Wage ₱ 680.00 ₱ 750.00
Variance in the distribution of Wages 100 145
4. Some 15 Students in Chemistry were observed to spend more time doing their experiments
beyond their laboratory class time. The amount of time (minutes) spent by these students are as
follows:
15, 28, 25, 48, 22, 43, 49, 34, 22, 33, 27, 25, 22, 20, 39
a. Find the Range, Standard Deviation, and Variance for the above data.
b. What does this information tell you about the variability of student's length of time
doing their experiments beyond their laboratory class time? Is it homogeneous or
heterogeneous?
5. The final scores in 10 Pin Bowling obtained by 6 players are given below:
Players Score
Lorenze 81
Maiko 130
Eiji 80
Rica 109
Dennise 150
Owen 86
a. Find the Range, Standard Deviation, and Variance for the above data.
b. What does this information tell you about the variability of final scores in 10 Pin
Bowling of the players? Is it homogeneous or heterogeneous?
6. The table below shows the total number of man-days lost to sickness during one week’s
operation of a small chemical plant.
Frequency 8 7 10 9 6
Calculate the variance and standard deviation of the number of lost days.
7. Use Step-deviation method to calculate the mean of the following data:
Quiz 1 Raw Scores    11 - 20   21 - 30   31 - 40   41 - 50   51 - 60   61 - 70   71 - 80
Number of Students   4         7         15        18        12        6         1
8. The variance of a sample of 121 observations equals 441. Find its standard deviation.
9. On a farm, 50 hogs are raised. The weight gains of the pigs, which were fed a starter
ration from weaning until two months of age, are recorded in the table below:
Weight Gain, Kg Number of Pigs
3.0 – 3.9 1
4.0– 4.9 5
5.0 – 5.9 8
6.0 – 6.9 12
7.0 – 7.9 15
8.0 – 8.9 6
9.0 – 9.9 3
10. The record of dental clinic inside the mall shows patients that availed of tooth restoration
during weekdays:
Compute the variance in the number of patients of dental clinic using two methods.