Module 345stat Roxas1stsem23 24
This chapter is concerned with another way of describing numerical data. In order to
transform the set of data into a meaningful form, the raw data must be organized into a frequency
distribution table and presented textually and graphically. The measure of central tendency is a
single number, which represents the general level of performance of the group.
The Central Tendency
The central tendency is the center or concentration of scores in the set of gathered data. It
is a single value that represents the set of data.
The three measures of central tendency are the mean, the median and the mode.
The Mean – the arithmetic mean or average is the sum of the values in the group of data
divided by the number of values. The mean is the most reliable central tendency measure.
The Population Mean is used when a study involves all the persons, animals, objects or
things in the group of interest. Raw data with fewer than 30 values are treated as ungrouped data.
Example: A researcher wants to find out the study habits of the female basketball players in the
university. Even though there are only 12 players, they comprise the population.
$$\mu = \frac{\sum x}{n}$$
where:
μ = population mean
n = total number of observations in the population
x = a particular value
Σx = sum of population values of x
Sample Mean
The sample mean is the sum of all the sample values divided by the total number
of values in the sample.
Consider the following data on the performance in Mathematics and Science of the fourth
year high school students in a certain public school.
Student    Performance in    Performance in
           Mathematics       Science
1          28                23
2          23                17
3          30                27
4          21                15
5          16                18
6          26                25
7          17                20
8          19                18
9          23                20
10         16                17
11         23                24
12         20                15
1. What is the arithmetic mean performance in Mathematics?
2. What is the arithmetic mean performance in Science?
3. Is the data considered as a population parameter? Why?
Solution:
1. Arithmetic mean of performance in Mathematics.
$$\bar{x} = \frac{\sum x}{n} = \frac{262}{12} = 21.83$$
2. Arithmetic mean of performance in Science.
$$\bar{x} = \frac{\sum x}{n} = \frac{239}{12} = 19.92$$
3. No, because we are considering only 12 fourth year high school students, and they represent a
sample of the population. Many public schools have more than 12 graduating
students.
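The two sample means in the solution can be checked with a short Python sketch. The scores are copied from the table of 12 students; the function and variable names are illustrative only.

```python
# Scores copied from the table of 12 students above.
mathematics = [28, 23, 30, 21, 16, 26, 17, 19, 23, 16, 23, 20]
science = [23, 17, 27, 15, 18, 25, 20, 18, 20, 17, 24, 15]


def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


math_mean = arithmetic_mean(mathematics)
science_mean = arithmetic_mean(science)

print(sum(mathematics), round(math_mean, 2))   # 262 21.83
print(sum(science), round(science_mean, 2))    # 239 19.92
```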
Weighted Mean
In the weighted mean, each value is weighted according to its importance.
The Mc Bee Fast Food chain pays its employees on an hourly basis, at rates of P15.00,
P16.50 and P18.50 an hour. 15 employees earn P15.00 an hour, 22 are paid P16.50 an hour and
7 are paid P18.50 per hour. Find the weighted mean.
Solution:
To compute the weighted mean, find the sum of the products of 15 by P15.00, 22 by
P16.50 and 7 by P18.50, and divide by the sum of 15, 22 and 7. The resulting average is called
the weighted mean.
$$\text{Weighted mean} = \frac{15(15.00) + 22(16.50) + 7(18.50)}{15 + 22 + 7} = \frac{717.5}{44} = \text{Php } 16.31$$
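The same computation can be sketched in Python. The rates and head counts come from the Mc Bee example; the names `pay_scale` and `weighted_mean` are illustrative.

```python
# (hourly rate in pesos, number of employees) from the Mc Bee example.
pay_scale = [(15.00, 15), (16.50, 22), (18.50, 7)]


def weighted_mean(pairs):
    """Sum of value * weight divided by the sum of the weights."""
    total = sum(value * weight for value, weight in pairs)
    weights = sum(weight for _, weight in pairs)
    return total / weights


total_pay = sum(rate * count for rate, count in pay_scale)
hourly_weighted_mean = weighted_mean(pay_scale)

print(total_pay, round(hourly_weighted_mean, 2))  # 717.5 16.31
```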
Median
The median is the middlemost value of the data after the values have been ranked from the
lowest to the highest, or vice versa.
Ranking is the process of arranging the data from the highest to the lowest or vice versa
based on certain criteria. Such criteria can be ordered in terms of quantity, quality, appraised
value and chronology.
Steps in ranking:
1. Arrange the data to be ranked in descending or ascending order.
2. Assign consecutive numbers to the items, starting from the highest or from the lowest.
3. An item occurring once takes its consecutive number as its rank.
4. An item occurring two or more times takes as its rank the sum of the consecutive
numbers of the tied items divided by the number of tied items.
Prices of denim jeans              Prices of denim jeans
from highest to lowest             from lowest to highest
Number   Data      Rank            Number   Data      Rank
1        850.00    1               1        550.00    1
2        750.00    2               2        600.00    2.5
3        725.00    3               3        600.00    2.5
4        675.00    4               4        675.00    4
5        600.00    5.5             5        725.00    5
6        600.00    5.5             6        750.00    6
7        550.00    7               7        850.00    7
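The ranking steps, including the averaging of tied ranks, can be sketched in Python as follows. The prices are those in the denim jeans table; `rank_with_ties` is a hypothetical helper written for this illustration, and the median is taken with the standard library `statistics.median`.

```python
from statistics import median

prices = [850.00, 750.00, 725.00, 675.00, 600.00, 600.00, 550.00]


def rank_with_ties(values, descending=True):
    """Return {value: rank}; tied values share the average of their positions."""
    ordered = sorted(values, reverse=descending)
    positions = {}
    for number, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(number)
    return {value: sum(nums) / len(nums) for value, nums in positions.items()}


ranks_high_to_low = rank_with_ties(prices, descending=True)
ranks_low_to_high = rank_with_ties(prices, descending=False)
median_price = median(prices)

print(ranks_high_to_low[600.00])  # 5.5
print(ranks_low_to_high[600.00])  # 2.5
print(median_price)               # 675.0
```

With seven ranked prices, the median is the fourth value in either ordering, which is 675.00.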
Mode
The mode is the value or score that occurs most frequently among the observations in the
study. When two values share the highest frequency, the distribution is called bimodal; when
three values do, it is called trimodal, and so on.
Example: The following data are the annual salaries of office secretaries in private companies:
The annual salary of P54,000 appears 5 times, more often than any of the other
salaries. It is the mode of the annual salaries stated.
Exercises 3a:
Name______________________________________________ Score_____________
Course/year________________________________________ Date_____________
1. All the students in Integral Calculus are considered in the study. Their grades are 2.25,
1.75, 3.0, 2.50, 2.75, 2.50, 2.0, 3.0.
2. There are six persons employed in an insurance company. The reported transactions for
the month (rounded to the nearest hundreds) are as follows: P35,000, P50,000, P75,000,
P28,000, P19,000, and P42,000.
3. The daily salaries of the workers in the construction company are P120, P150, P175, P130, P150,
P200, P280, P225, P160, P200, P175, and P120.
4. The grades of Peter in his subjects at the end of the school year are as follows: English – 88,
Filipino – 90, Mathematics – 89, Science – 85, T.H.E. – 85, Social Studies – 89, Values Education
– 88, P.E.H.M. – 87, and R.H.G.P. – 88. The units per subject are: one (1) unit each for
English, Filipino, Mathematics, Social Studies, Values Education, and P.E.H.M.; two (2) units each for
Science and T.H.E.; and (2) unit for RGHP.
Grouped Data
The Mean
To determine the mean of interval or ratio data organized into a
frequency distribution, sum the products of the frequency and the midpoint of each
class and divide by the total frequency. This is the frequency midpoint method.
The frequency midpoint method
Formula:
$$\bar{x} = \frac{\sum fm}{n}$$
Where:
x̄ = the arithmetic mean
m = the midpoint of each class
f = the frequency of each class
n = the total number of frequencies
In the first row of the frequency distribution table, the product 194 (fm) is obtained by
multiplying 2 (f) by 97 (m). This is repeated for every row until the last row is completed.
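As a minimal sketch of the frequency midpoint method, the Python below uses the midpoints and frequencies of the 75-score distribution tabulated in this section (classes 95–99 down to 45–49). The names are illustrative, and the totals Σfm and the mean are computed by the code rather than taken from the text.

```python
# (midpoint m, frequency f) for each class, highest class first.
classes = [(97, 2), (92, 3), (87, 6), (82, 8), (77, 10), (72, 14),
           (67, 12), (62, 7), (57, 6), (52, 4), (47, 3)]


def grouped_mean(rows):
    """Frequency midpoint method: sum(f * m) / sum(f)."""
    total_frequency = sum(f for _, f in rows)
    total_fm = sum(m * f for m, f in rows)
    return total_fm / total_frequency


n = sum(f for _, f in classes)
sum_fm = sum(m * f for m, f in classes)
mean_score = grouped_mean(classes)

print(n, sum_fm, round(mean_score, 2))  # 75 5355 71.4
```

The first product, 2 × 97 = 194, is the one described above; the remaining rows are handled the same way inside the sum.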
Median
The median is the point where half of the values lie above it and the other half of
the values lie below it. When the raw data have been organized into a frequency distribution, we
cannot determine the exact median. However, the median can be estimated by interpolating
within the class in which the median lies (the median class).
Formula:
$$Md = L_l + \left[ \frac{N/2 - C_f}{F} \right] I$$
Where:
Md = the median
Ll = the lower limit (lower class boundary) of the median class
Cf = the cumulative frequency below the median class
F = the frequency of the median class
I = the class interval (class width)
N = the total number of frequencies
                                         Cumulative   Cumulative
                 Frequency   Midpoint    Frequency    Frequency
Class interval   f           m           CF<          CF>
95 – 99          2           97          75           2
90 – 94          3           92          73           5
85 – 89          6           87          70           11
80 – 84          8           82          64           19
75 – 79          10          77          56           29
70 – 74          14          72          46           43
65 – 69          12          67          32           55
60 – 64          7           62          20           62
55 – 59          6           57          13           68
50 – 54          4           52          7            72
45 – 49          3           47          3            75
n = 75
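Substituting the following values in the formula:
$$L_l = \frac{69 + 70}{2} = \frac{139}{2} = 69.5$$
Cf< = 32
F = 14
I = 5
N = 75
$$Md = L_l + \left[ \frac{N/2 - C_f}{F} \right] I$$
$$Md = 69.5 + \left[ \frac{75/2 - 32}{14} \right] 5$$
$$Md = 69.5 + \left[ \frac{37.5 - 32}{14} \right] 5$$
$$Md = 69.5 + \frac{(5.5)(5)}{14}$$
$$Md = 69.5 + \frac{27.5}{14}$$
$$Md = 69.5 + 1.96$$
$$Md = 71.46$$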
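The substitution above can be reproduced with a small Python sketch. It locates the median class by accumulating frequencies from the lowest class and then interpolates. The function name and the `gap` parameter (the distance between one class's upper limit and the next class's lower limit, assumed to be 1 for whole-number scores) are assumptions of this illustration.

```python
# (lower class limit, upper class limit, frequency), lowest class first.
classes = [(45, 49, 3), (50, 54, 4), (55, 59, 6), (60, 64, 7),
           (65, 69, 12), (70, 74, 14), (75, 79, 10), (80, 84, 8),
           (85, 89, 6), (90, 94, 3), (95, 99, 2)]


def grouped_median(rows, gap=1):
    """Estimate Md = Ll + ((N/2 - Cf) / F) * I for a frequency distribution."""
    total = sum(f for _, _, f in rows)
    half = total / 2
    cumulative = 0  # Cf: cumulative frequency below the current class
    for lower, upper, freq in rows:
        if cumulative + freq >= half:
            boundary = lower - gap / 2      # Ll, e.g. (69 + 70) / 2 = 69.5
            width = upper - lower + gap     # I, the class interval
            return boundary + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("empty distribution")


median_score = grouped_median(classes)
print(round(median_score, 2))  # 71.46
```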
The Mode
Recall that the mode is the most frequent score or value. For data grouped into a
frequency distribution, the mode can be taken as the midpoint of the class with the largest frequency.
In the aforementioned problem, the largest class frequency is located in the class interval
69.5 – 74.5, with a frequency of 14. The midpoint is 72. Therefore, the mode is 72.
When two values occur most frequently with the same frequency, there
are two modes, and the distribution is called bimodal.
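A short Python sketch of this midpoint rule is given below, using the same 75-score distribution. The helper `modal_midpoints` is hypothetical; it returns a list so that a bimodal distribution would yield two midpoints.

```python
# (midpoint, frequency) for each class of the 75-score distribution.
classes = [(97, 2), (92, 3), (87, 6), (82, 8), (77, 10), (72, 14),
           (67, 12), (62, 7), (57, 6), (52, 4), (47, 3)]


def modal_midpoints(rows):
    """Midpoints of every class that shares the largest frequency."""
    highest = max(f for _, f in rows)
    return [m for m, f in rows if f == highest]


modes = modal_midpoints(classes)
print(modes)  # [72]
```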
Exercises 3b
Name____________________________________________________ Score_______
Course/year______________________________________________ Date ________
1. Determine the mean, the median and the mode of the following daily wages of sales clerks in
the following frequency distribution.
Daily wages Number of workers
130 - 149 5
150 - 169 26
170 - 189 30
190 - 209 42
210 - 229 20
230 – 249 8
250 – 269 4
2. The new management of a radio station changed its format from drama to commentaries. A
recent sample of radio listeners revealed the following age distribution. Find the mean, the
median and the mode age of listeners.
Age Frequency
10-19 15
20-29 29
30-39 38
40-49 62
50-59 40
60-70 19
3. There are 464 students in second year high school. The following is the frequency distribution
of their scores in the English test.
Class Interval    Frequency
28-30             5
25-27             8
22-24             10
19-21             20
16-18             17
13-15             20
10-12             18
7-9               12
4-6               5
1-3               1
N=
a. What is the mean score in English?
b. What is the median score?
c. What is the modal score?
d. Is this a sample estimate or a population parameter?
4. In a class of 30 students, the following are their scores in mathematics:
90 87 89 85 88 74 59 82 92 87 79 76 83 71 85
81 69 82 75 70 73 76 91 83 75 75 88 71 83 90
Prepare a worksheet and compute the mean, the median and the mode. Use 55 as the lower
limit of the lowest step interval and a class interval of 5.
Exercises 3c
Name____________________________________________________ Score_______
Course/year______________________________________________ Date ________
1. The following data on the daily wages of construction workers are presented in the
frequency distribution.
Daily wages Number of workers
120 – 134 2
135 – 149 16
150 – 164 25
165 – 179 39
180 – 194 53
195 – 209 28
210 – 224 12
225 – 239 3
a. Find Q1 and Q2.
b. Find D1 and D9.
c. Find P34 and P69.
2. Given the following frequency distribution, find quartile 1, quartile 3, decile 3, decile 8,
percentile 8, and percentile 98.
Classes Frequency
10.5 – 15.5 15
15.5 – 20.5 29
20.5 – 25.5 38
25.5 – 30.5 62
30.5 – 35.5 40
35.5 – 40.5 19
40.5 – 45.5 18
45.5 – 50.5 12
3. There are 464 students in second year high school. The following is the frequency
distribution of their scores in the English test.
Class interval Frequency
28 – 30 5
25 – 27 8
22 – 24 10
19 – 21 20
16 – 18 17
13 – 15 20
10 – 12 18
7 – 9 12
4 – 6 5
1 – 3 1
a. What are P29 and P89?
b. Determine D1 and D7.
c. What are Q1 and Q3?
d. Is this a sample estimate or a population parameter? Give a reason for your answer.
Measures of Location
The measures of location are points that divide the class frequency distribution of a
variable into a number of equal parts. Generally, 100 is a multiple of the number of parts, that
is, the number of parts (4, 10, or 100) divides 100 exactly. The more common point measures are
the quartile, the decile, and the percentile.
The Quartile
The quartile is a point measure that divides the class frequency distribution of a variable
theoretically into four equal parts. There are two quartiles that need to be computed, the first
quartile and the third quartile. The second quartile is the median.
[Figure: scale of the distribution divided at ¼, ½ and ¾]
First Quartile
The computation of the first quartile is the same as the computation of the median;
however, only ¼ of the scale is considered.
The scores of 110 students in a multiple choice test in Philippine History, grouped into a
class frequency distribution, are shown in Table 5.1.
Table 5.1
Frequency Distribution of the Multiple Choice
Test Scores in History
Procedure:
1. Use the formula:
Third Quartile
Using the same data given in Table 5.1, follow the process of computing quartile 1 (Q1).
To compute the third quartile (Q3), consider ¾ of the scale from the lowest to the highest values.
The Decile
The term deci means ten and the decile is a point measure that divides the class frequency
distribution of a variable into ten equal parts.
D1 D2 D3 D4 D5 D6 D7 D8 D9
The formula:
$$D_d = L_l + \left[ \frac{dN/10 - C_f}{F} \right] I$$
Note that d stands for the decile rank, so that Dd stands for D1, D2, D3, … D9.
The Percentile
Percentiles are points that divide the distribution theoretically into 100 equal parts.
Hence, percentile means one-hundredth.
[Figure: scale marked at D1 through D9, with P45 and P84 located on it]
Computation of Percentile
The formula:
$$P_r = L_l + \left[ \frac{rN/100 - C_f}{F} \right] I$$
Note: r stands for the percentile rank to be computed. So if percentile rank 45 is to be computed, Pr should be P45; if
percentile rank 84, Pr should be P84, etc.
Table 5.1
Frequency Distribution of the Multiple Choice
Test Scores in History
2. Substitute the values computed for their respective symbols in the formula and solve.
Exercise 4a
Name____________________________________________ Score____________
Course/year________________________________________Date_____________
1. The following data on the daily wages of construction workers are presented in the
frequency distribution.
Find the following: Q1 & Q3, D1 & D9, P34 & P69 .
2. Given the following frequency distribution, find quartile 1, quartile 3, decile 3, decile
8, percentile 8, and percentile 98.
Classes Frequency
10.5-15.5 15
15.5-20.5 29
20.5-25.5 38
25.5-30.5 62
30.5-35.5 40
35.5-40.5 19
40.5-45.5 18
45.5-50.5 12
Measures of Variability
This chapter presents several measures that describe the dispersion, variability, or
spread of the data. Discussed in this chapter are the range, quartile deviation, percentile range,
mean absolute deviation and standard deviation, the box plot, coefficient of variation, and
skewness.
A measure of variability, or measure of variation, is a method of measuring the degree to
which quantitative data or values spread out from the point of central tendency or cluster about
the mean. It is also called a measure of dispersion.
The most common measures of variation are the following:
1. The Range
2. The Quartile Deviation
3. The Percentile range
4. The Average Deviation
5. The Standard Deviation
1. The Range
A. Ungrouped Data
R = H – L
Where:
R = the range
H = the highest value
L = the lowest value
For example, with a highest value of 25 and a lowest value of 9:
R = H – L
R = 25 – 9
R = 16
B. Grouped Data
The range for grouped data is the difference between the upper boundary of the highest
class and the lower boundary of the lowest class:
R = UBh – LBl
Find the range for the grouped data in Table 5.1.
Table 5.1
Frequency Distribution of the Multiple Choice
Test Scores in History
Class Interval    Frequency    Midpoint    CF<
42.5 – 45.5       3            44          110
39.5 – 42.5       8            41          107
36.5 – 39.5       10           38          99
33.5 – 36.5       12           35          89
30.5 – 33.5       15           32          77
27.5 – 30.5       16           29          62
24.5 – 27.5       14           26          46
21.5 – 24.5       11           23          32
18.5 – 21.5       9            20          21
15.5 – 18.5       7            17          12
12.5 – 15.5       5            14          5
N = 110
Solution:
R = UBh – LBl
R = 45.5 – 12.5
R = 33
2. The Quartile Deviation
$$Q.D. = \frac{Q_3 - Q_1}{2}$$
Where:
Q.D. = the quartile deviation
Q1 = the first quartile
Q3 = the third quartile
The first quartile (Q1) separates the lower one-fourth of the frequency distribution from the
rest of the data, while the third quartile (Q3) separates the upper one-fourth. To
compute Q1, use the following formula:
$$Q_1 = L_l + \left[ \frac{N/4 - C_f}{F} \right] I$$
To compute Q3, the value that separates the upper one-fourth of the data from the rest, use
the following formula:
$$Q_3 = L_l + \left[ \frac{3N/4 - C_f}{F} \right] I$$
Find the quartile deviation (Q.D.) of the following scores in the arithmetic computation test of 100
students, as shown in the frequency distribution table below.
Table 6.1
Frequency Distribution of Arithmetic Computation Test
LL – HL        f     M     CF<
94.5 – 99.5    1     97    100
89.5 – 94.5    3     92    99
84.5 – 89.5    8     87    96
79.5 – 84.5    11    82    88
74.5 – 79.5    13    77    77
69.5 – 74.5    19    72    64
64.5 – 69.5    17    67    45
59.5 – 64.5    12    62    28
54.5 – 59.5    9     57    16
49.5 – 54.5    5     52    7
44.5 – 49.5    2     47    2
N = 100
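Solution:
Computation of Quartile 1
$$\frac{N}{4} = \frac{100}{4} = 25$$
$$Q_1 = L_l + \left[ \frac{N/4 - C_f}{F} \right] I$$
$$Q_1 = 59.5 + \left[ \frac{25 - 16}{12} \right] 5$$
$$Q_1 = 59.5 + \frac{(9)(5)}{12}$$
$$Q_1 = 63.25$$
Computation of Quartile 3
$$\frac{3N}{4} = \frac{3(100)}{4} = 75$$
$$Q_3 = L_l + \left[ \frac{3N/4 - C_f}{F} \right] I$$
$$Q_3 = 74.5 + \left[ \frac{75 - 64}{13} \right] 5$$
$$Q_3 = 74.5 + \frac{(11)(5)}{13}$$
$$Q_3 = 78.73$$
Quartile Deviation
$$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{78.73 - 63.25}{2} = 7.74$$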
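3. Percentile Range
The percentile range is the distance between the 10th percentile and the 90th percentile.
The symbol P10 is used to represent the 10th percentile and P90 the 90th percentile.
$$P_{10} = L_l + \left[ \frac{10N/100 - C_f}{F} \right] I$$
$$P_{90} = L_l + \left[ \frac{90N/100 - C_f}{F} \right] I$$
Using the data in Table 6.1:
$$P_{10} = 54.5 + \left[ \frac{10(100)/100 - 7}{9} \right] 5$$
$$P_{10} = 54.5 + \left[ \frac{10 - 7}{9} \right] 5$$
$$P_{10} = 56.17 \text{ (ans)}$$
$$P_{90} = 84.5 + \left[ \frac{90 - 88}{8} \right] 5$$
$$P_{90} = 84.5 + \frac{(2)(5)}{8}$$
$$P_{90} = 84.5 + 1.25$$
$$P_{90} = 85.75 \text{ (ans)}$$
$$\text{Percentile range} = P_{90} - P_{10} = 85.75 - 56.17 = 29.58$$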
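The quartile and percentile computations for Table 6.1 all follow one interpolation pattern, so they can be checked with a single helper. The Python below is a sketch under that assumption: `grouped_percentile` is an illustrative name, the class boundaries and frequencies are copied from Table 6.1, and Q1 and Q3 are obtained as the 25th and 75th percentiles.

```python
# Table 6.1: (lower boundary, upper boundary, frequency), lowest class first.
table_6_1 = [(44.5, 49.5, 2), (49.5, 54.5, 5), (54.5, 59.5, 9),
             (59.5, 64.5, 12), (64.5, 69.5, 17), (69.5, 74.5, 19),
             (74.5, 79.5, 13), (79.5, 84.5, 11), (84.5, 89.5, 8),
             (89.5, 94.5, 3), (94.5, 99.5, 1)]


def grouped_percentile(rows, r):
    """P_r = Ll + ((r * N / 100 - Cf) / F) * I, with rows given as boundaries."""
    total = sum(f for _, _, f in rows)
    target = r * total / 100
    cumulative = 0  # Cf: cumulative frequency below the current class
    for lower, upper, freq in rows:
        if freq > 0 and cumulative + freq >= target:
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("r must lie between 0 and 100")


q1 = grouped_percentile(table_6_1, 25)
q3 = grouped_percentile(table_6_1, 75)
quartile_deviation = (q3 - q1) / 2

p10 = grouped_percentile(table_6_1, 10)
p90 = grouped_percentile(table_6_1, 90)
percentile_range = p90 - p10

print(round(q1, 2), round(q3, 2), round(quartile_deviation, 2))  # 63.25 78.73 7.74
print(round(p10, 2), round(p90, 2), round(percentile_range, 2))  # 56.17 85.75 29.58
```

Deciles can be obtained the same way by passing r = 10, 20, …, 90.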
4. The Average Deviation
A. Ungrouped Data
The average deviation is considered more appropriate than the quartile
deviation because it takes into account all the individual values of the distribution. The mean
absolute deviation (MAD) measures the extent to which each individual value in the distribution
deviates from the mean of that distribution.
The formula:
$$MAD = \frac{\sum |X - \bar{x}|}{n}$$
To compute the mean absolute deviation for ungrouped data, first arrange the values in a
column and find the value of the mean (x̄). Determine the deviation of each raw score from the
mean (X − x̄). Take the absolute value of each deviation, |X − x̄|. Finally, get the sum
of the absolute deviations and divide the sum by the total frequency (n).
B. Grouped Data
When the data are presented in a frequency distribution, first compute the mean of the
distribution and find the deviation of each midpoint from the mean. Multiply each absolute
deviation by the corresponding class frequency. Finally, divide the sum of the products
by the total number of observations.
The formula:
$$MAD = \frac{\sum f\,|M - \bar{x}|}{n}$$
Where:
MAD = mean absolute deviation
f = class frequency
M = class mark
X = the individual value
x̄ = sample mean
n = total class frequency
LL    HL    f    M    fM    |M − x̄|    f|M − x̄|
Solutions:
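A minimal Python sketch of the grouped-data procedure follows. It assumes the Table 6.1 distribution of arithmetic computation scores as the data; that choice of table is an assumption of this illustration, and the resulting figures are computed by the code rather than quoted from the text.

```python
# Assumed data: Table 6.1 as (class mark M, class frequency f).
table_6_1 = [(97, 1), (92, 3), (87, 8), (82, 11), (77, 13), (72, 19),
             (67, 17), (62, 12), (57, 9), (52, 5), (47, 2)]


def grouped_mad(rows):
    """Return (mean, MAD) where MAD = sum(f * |M - mean|) / n."""
    n = sum(f for _, f in rows)
    mean = sum(m * f for m, f in rows) / n
    total_abs_dev = sum(f * abs(m - mean) for m, f in rows)
    return mean, total_abs_dev / n


mean_score, mad = grouped_mad(table_6_1)
print(round(mean_score, 2), round(mad, 2))  # 70.9 8.81
```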
5. The Standard Deviation
The standard deviation is the most stable and is considered the most important measure of
variability. Compared to the mean absolute deviation, it is relatively easier to handle
mathematically.
To solve for the standard deviation, first find the sample mean (x̄). Then find the deviation
of each X value from the mean, (X − x̄). Square each deviation, that is, (X − x̄)². Find the total of the
squared deviations, that is, Σ(X − x̄)². Divide the sum by the total number of observed values.
Extract the square root of the quotient. The resulting root is the value of the standard deviation.
Formula:
$$Sd = \sqrt{\frac{\sum (X - \bar{x})^2}{n}} \qquad \sigma = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$
Where:
Ungrouped Data
Find the standard deviation of the grades of a sample of students in high school physics.
Grades
X         (X − x̄)    (X − x̄)²
75        -6          36
77        -4          16
78        -3          9
79        -2          4
80        -1          1
81        0           0
83        2           4
84        3           9
85        4           16
88        7           49
ΣX = 810              Σ(X − x̄)² = 144
Solutions:
$$\bar{x} = \frac{\sum X}{n} = \frac{810}{10} = 81 \text{ (the mean)}$$
$$Sd = \sqrt{\frac{\sum (X - \bar{x})^2}{n}} = \sqrt{\frac{144}{10}} = \sqrt{14.4} = 3.79$$
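The steps for ungrouped data can be sketched in Python as follows. The grades are those in the physics example; the function name is illustrative, and the divisor is n, as in the worked solution.

```python
from math import sqrt

grades = [75, 77, 78, 79, 80, 81, 83, 84, 85, 88]


def standard_deviation(values):
    """Return (mean, sum of squared deviations, Sd), dividing by n."""
    n = len(values)
    mean = sum(values) / n
    sum_squared = sum((x - mean) ** 2 for x in values)
    return mean, sum_squared, sqrt(sum_squared / n)


mean_grade, sum_squared_dev, sd = standard_deviation(grades)
print(mean_grade, sum_squared_dev, round(sd, 2))  # 81.0 144.0 3.79
```

Dividing by n − 1 instead, as the grouped-data formula in this chapter does, would give √(144/9) = 4.0 for the same grades.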
Grouped Data
The formula for computing the SD from grouped data using the midpoint method is as
follows:
$$Sd = \sqrt{\frac{\sum fM^2 - \frac{(\sum fM)^2}{n}}{n - 1}}$$
Where:
Sd = standard deviation
M = the midpoint of a class
f = the class frequency
n = the total number of sample observations
LL    HL    f    M    fM     fM²
                      ΣfM    ΣfM²
B. Procedure:
$$Sd = \sqrt{\frac{\sum fM^2 - \frac{(\sum fM)^2}{n}}{n - 1}}$$
$$Sd = \sqrt{\frac{514510 - \frac{(7090)^2}{100}}{100 - 1}}$$
$$Sd = \sqrt{\frac{11829}{99}}$$
$$Sd = \sqrt{119.48}$$
$$Sd = 10.93$$
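The totals used in this procedure (ΣfM = 7090, ΣfM² = 514510, n = 100) agree with the Table 6.1 distribution, so the sketch below assumes that table as the source data. The function name is illustrative.

```python
from math import sqrt

# Assumed data: Table 6.1 as (midpoint M, class frequency f).
table_6_1 = [(97, 1), (92, 3), (87, 8), (82, 11), (77, 13), (72, 19),
             (67, 17), (62, 12), (57, 9), (52, 5), (47, 2)]


def grouped_sd(rows):
    """Midpoint method: sqrt((sum(f*M^2) - (sum(f*M))^2 / n) / (n - 1))."""
    n = sum(f for _, f in rows)
    sum_fm = sum(f * m for m, f in rows)
    sum_fm2 = sum(f * m ** 2 for m, f in rows)
    variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
    return n, sum_fm, sum_fm2, sqrt(variance)


n, sum_fm, sum_fm2, sd = grouped_sd(table_6_1)
print(n, sum_fm, sum_fm2, round(sd, 2))  # 100 7090 514510 10.93
```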
The Box Plot
The results of a 100-item mathematics test are shown with the following information. (Results
of computation were rounded off to the nearest whole number.)
Lowest score       28
Quartile 1 (Q1)    38
Median (Md)        47
Quartile 3 (Q3)    60
Highest score      95
Steps in constructing a box plot
a. Make an appropriate scale along the horizontal axis.
b. Draw a box that starts at Q1 (38) and ends at Q3 (60).
c. Inside the box, place a vertical line to represent the median (47).
d. Extend horizontal lines from the box out to the minimum score (28) and the maximum score (95).
Skewness
Another characteristic that can be measured in a set of data is the skewness of the
distribution. If the frequency distribution is symmetrical, it has no skewness; the skewness is zero
(0). If one or more values are extremely large, the mean of the distribution becomes greater
than the median or mode. Such a distribution is said to be positively skewed.
On the other hand, if one or more extremely small values are present and the mean is the
smallest of the three measures of central tendency, the distribution is said to be negatively
skewed.
Graphical Presentation
[Figure: skewed distributions (s = 4.0 and s = 3.0) with the median and mode marked]
The Coefficient of Skewness, a measure used to describe the degree of skewness,
was developed by Karl Pearson.
The formula:
$$Sk = \frac{3(\bar{x} - Md)}{Sd}$$
Where:
Sk = coefficient of skewness
x̄ = the mean
Md = the median
Sd = the standard deviation
The scores of 75 students in a mathematics test had the mean of 71.4, the median of 71.46
and the mode of 72. The standard deviation was 10.5.
a. Determine if the distribution is symmetrical, positively skewed, or negatively skewed.
b. What is the coefficient of skewness?
Solution:
a. The distribution is approximately symmetrical because there is only a slight difference
among the mean, the median and the mode. The three measures of central tendency lie almost
on the same point.
b. The coefficient of skewness
$$Sk = \frac{3(\bar{x} - Md)}{Sd}$$
$$Sk = \frac{3(71.4 - 71.46)}{10.5}$$
$$Sk = \frac{3(-0.06)}{10.5}$$
$$Sk = \frac{-0.18}{10.5}$$
$$Sk = -0.017$$
Interpretation: the result of -0.017 shows a negligible amount of negative skewness. The result is
almost zero. The coefficient of skewness generally lies between -3 and +3.
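Pearson's coefficient is a one-line computation; the Python sketch below reproduces the solution using the mean, median and standard deviation given in the problem. The function name is illustrative.

```python
def pearson_skewness(mean, median, sd):
    """Sk = 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


sk = pearson_skewness(mean=71.4, median=71.46, sd=10.5)
print(round(sk, 3))  # -0.017
```

A negative value indicates that the mean lies below the median; here the magnitude is so small that the distribution is treated as symmetrical.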
Exercise 5a
Name________________________________________________ Score___________
Course/year___________________________________________Date____________
a. The following is the frequency distribution table of multiple choice test scores in History.
N = 110