Lesson 6 - Lecture
Introduction
After scoring the test papers in a particular subject, teachers are often faced
with the problem of describing and interpreting the scores of their pupils. They
find it difficult to describe and synthesize the data in a way that facilitates
decision making. In this lesson, we will present different ways of
tabulating and graphing data.
Suppose we have just given a Math test to our Grade VI pupils. We have
scored the papers. What are we going to do with the data? Some of the questions
that we will probably ask include “What is the general pattern of the set of scores?”,
“What do these scores look like?”, or “How can we picture the set of scores to
get an impression of the group as a whole?” To answer these questions, we will need
to consider simple ways of tabulating and graphing a set of scores.
The simplest rearrangement would be to just arrange the scores from highest
to lowest, but this simple arrangement of scores still has too much detail for us to
understand the general pattern clearly. We need to condense it into a more compact
form so that computation and interpretation would be easier.
Objectives
Example: Thirty (30) or 60% of students who took the Mathematics test passed the test.
Of these, 13 or 26% got scores between 60-65, and 17 or 34% got scores higher than 65.
2. Tabular presentation – this method utilizes rows and columns, as in a frequency
distribution. Data are presented in a systematic and orderly manner. If the data set is
relatively large, say more than 30 scores, it is more appropriate to present the data in a
grouped frequency distribution.
One way of organizing the scores for presentation is to prepare what is termed as
a frequency distribution. This is a table showing how often each score occurred. Each
score value is listed and the number of times it occurred is shown.
1. Find the Range of the scores. The Range is the score distance between the highest
and the lowest scores.
Range = Highest score – Lowest Score
Too many or too few intervals may sacrifice the information needed to see the pattern. In
this illustration, 10 will be used as the number of class intervals.
5. Get the class mark. The class mark is the average of the lower and upper limits of a
class interval. For the interval 12 – 15:
Class mark = (12 + 15)/2 = 13.5
Illustrative example
A Math test is given to Grade 12 students. The scores of the 50 students are given
below. Let’s make a frequency distribution and tally the frequency.
48 35 36 40 42
32 30 46 43 40
35 15 44 48 45
28 16 41 46 39
20 19 38 47 31
25 18 39 43 28
28 33 19 39 29
36 34 29 31 18
38 13 16 29 19
41 15 44 28 12
Solution:
1. The Range of the distribution is 36 (HS – LS; 48 – 12 = 36).
2. The number of class intervals is 10.
3. The interval size of the distribution is 4 (Range/10; 36/10 = 3.6, rounded up to 4).
4. The lowest class interval of the distribution is 12 – 15. (Note that the lowest score (12) is
exactly divisible by the interval size (4).)
5. The class mark for the lowest step interval is 13.5, that is, (12 + 15)/2.
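The tallying that follows from these steps can also be sketched in Python. This is only an illustration: the function name grouped_frequency and its parameters (start, width, n_classes) are our own assumptions, not part of the lesson.

```python
from collections import Counter

# The 50 scores from the illustrative example.
scores = [
    48, 35, 36, 40, 42, 32, 30, 46, 43, 40,
    35, 15, 44, 48, 45, 28, 16, 41, 46, 39,
    20, 19, 38, 47, 31, 25, 18, 39, 43, 28,
    28, 33, 19, 39, 29, 36, 34, 29, 31, 18,
    38, 13, 16, 29, 19, 41, 15, 44, 28, 12,
]


def grouped_frequency(scores, start, width, n_classes):
    """Return (lower, upper, class_mark, frequency) rows, lowest class first."""
    # Each score falls in class number (score - start) // width.
    counts = Counter((s - start) // width for s in scores)
    rows = []
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1
        rows.append((lower, upper, (lower + upper) / 2, counts.get(k, 0)))
    return rows


score_range = max(scores) - min(scores)  # 48 - 12
table = grouped_frequency(scores, start=12, width=4, n_classes=10)

# Print the highest interval first, as in the lesson's tables.
for lower, upper, mark, freq in reversed(table):
    print(f"{lower}-{upper}  class mark = {mark}  f = {freq}")
```

Printing the highest interval first follows the layout used in the tables of this lesson.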
3. Graphical Presentation
The leftmost part of the histogram represents the step or score interval where
the lower scores are located, while the higher scores are located on the rightmost part of
the graph. Compared with a frequency distribution, the histogram shows at a glance
which score intervals contain the most scores and which contain the fewest, simply by
looking at the piles.
a. Bar graph
A College Algebra test was given to 60 students and the scores are:
Scores Number of students
10 3
11 4
12 8
13 10
14 5
15 7
16 8
17 4
18 5
19 4
20 2
b. Line graph
Below is the enrolment trend of a university over the last five years
c. Pie graph
Below is the enrolment data of the College of Education, SY 2018-2019.
Year level Number of students
First 175
Second 130
Third 120
Fourth 100
MEASURES OF CENTRAL TENDENCY
THE MEAN
The arithmetic mean or mean is the most familiar and most widely used measure
of central tendency. It is also the most reliable value in which all the values of the
variable are taken into consideration. The arithmetic mean for ungrouped data is obtained
by taking the sum of all the values in a set of observations divided by the number of
observations. In symbols,
X̄ = ∑X / n
where X̄ is the arithmetic mean of the X’s (observations), ∑X (∑ is the Greek letter sigma)
is “the sum of X”, and n is the number of observations, or in our case, scores.
Example 1. Carlo obtained the following scores for the 10 quizzes given by his teacher:
25, 28, 34, 35, 28, 37, 36, 32, 35 and 34.
The mean is:
X̄ = ∑X / n = 324/10 = 32.4
Example 2. Joan got the following scores in the performance tasks:
90, 75, 80, 83, 87, 86, 84, 80, and 92
The mean is:
X̄ = ∑X / n = 757/9 = 84.11
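The two examples can be checked with a few lines of Python. This is a minimal sketch; the variable names are ours, and statistics.mean from the standard library is used only as a cross-check.

```python
from statistics import mean

carlo = [25, 28, 34, 35, 28, 37, 36, 32, 35, 34]   # Example 1
joan = [90, 75, 80, 83, 87, 86, 84, 80, 92]        # Example 2

# Mean = sum of the scores divided by the number of scores.
carlo_mean = sum(carlo) / len(carlo)
joan_mean = sum(joan) / len(joan)

print(sum(carlo), len(carlo), carlo_mean)
print(sum(joan), len(joan), round(joan_mean, 2))
```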
Example 1
Fifty (50) students were given a test in Science and their scores are presented below. The
table shows that of the 50 students, 4 of them got a score of 70, 6 students got a score of
67, and so forth.
Xi fi Xifi
70 4 280 (70 x 4)
67 6 402
64 5 320
60 7 420
58 5 290
56 4 224
54 5 270
50 6 300
45 8 360
∑fi = 50 ∑Xifi = 2,866
The weighted mean is:
X̄ = ∑Xifi / ∑fi = 2,866/50 = 57.32
Example 2
Fifty-five (55) students were given a test in Chemistry and their scores are:
Xi fi Xifi
92 3 276 (92 x 3)
90 8 720
88 4 352
86 5 430
84 10 840
83 9 747
80 7 560
79 9 711
∑fi = 55 ∑Xifi = 4,636
The weighted mean is:
X̄ = ∑Xifi / ∑fi = 4,636/55 = 84.29
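The same computation can be sketched in Python. The function name weighted_mean and the (score, frequency) pair layout are illustrative assumptions.

```python
def weighted_mean(pairs):
    """pairs: list of (score, frequency). Returns sum(X*f) / sum(f)."""
    total_f = sum(f for _, f in pairs)
    total_xf = sum(x * f for x, f in pairs)
    return total_xf / total_f


science = [(70, 4), (67, 6), (64, 5), (60, 7), (58, 5),
           (56, 4), (54, 5), (50, 6), (45, 8)]
chemistry = [(92, 3), (90, 8), (88, 4), (86, 5),
             (84, 10), (83, 9), (80, 7), (79, 9)]

print(round(weighted_mean(science), 2))
print(round(weighted_mean(chemistry), 2))
```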
If there are n numbers in the array and n is an odd number, the position of the median
is found by the formula: Median position = (n+1)/2.
For example, if there are 11 numbers in the list, the median is the (11+1)/2 = 6th score,
which is the middle position in the array. If n is an even number, then the median lies
between the two observations occupying the middle of the array. For
example, if n = 10, then the median is at the (10+1)/2 = 5.5th position, that is, midway
between the 5th and the 6th observations.
Example 1
98 72 95 75 90 80 88 81 88 86 92
Arranged from highest to lowest, the scores are:
98 95 92 90 88 88 86 81 80 75 72
Median position = (11+1)/2 = 6.
The middle score is the 6th score, counted either from the highest or from the lowest, and it is 88.
Example 2
128 127 127 125 124 120 120 119 118 115
The scores are already in descending order. Since there are 10 scores, which is an even
number, the middle score is the 5.5th score. The 5th score is 124 and the 6th score is 120.
Therefore, the middle score is the average of 124 and 120 which is 122. (Note that in the
array, there is no score of 122, but statistically, it is the middle score. It is the score that
separates the top half from the bottom half of the score distribution).
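The two cases (odd and even n) can be sketched in Python as follows. The function name median_by_position is our own; statistics.median from the standard library gives the same values.

```python
from statistics import median


def median_by_position(scores):
    """Return (position, median) using the (n + 1) / 2 position rule."""
    ordered = sorted(scores)
    n = len(ordered)
    position = (n + 1) / 2
    if n % 2 == 1:
        # Odd n: the median is the score at the middle position.
        return position, ordered[int(position) - 1]
    # Even n: average the two scores on either side of the middle position.
    return position, (ordered[n // 2 - 1] + ordered[n // 2]) / 2


example_1 = [98, 72, 95, 75, 90, 80, 88, 81, 88, 86, 92]
example_2 = [128, 127, 127, 125, 124, 120, 120, 119, 118, 115]

print(median_by_position(example_1))
print(median_by_position(example_2))
```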
The computation and interpretation of median for ungrouped data is easy to do
and easy to understand. It is affected by the level of measurement and the shape of the
distribution, not the number of observations. Median is usually used for ordinal data or
when there are extreme (too high or too low) values in the distribution.
As discussed, the median is the value that occupies the middle
position in an array of scores. Since the actual values of a data set are lost when the
frequency table is constructed, it is only possible to approximate the value of
the median from grouped data. The first step in computing the median is to locate the
class that contains the median observation. Then compute the median value by
interpolating within the median class, on the assumption that the values are evenly
distributed throughout the class.
Example 1
Using the scores in the English test:
Test Scores Number of Students (f) Cumulative Frequency (cf)
48-51 2 50
44-47 6 48
40-43 7 42
36-39 7 35
32-35 5 (one step higher) 28
28-31 10 23 (less than 25)
24-27 1 13
20-23 1 12
16-19 7 11 (4+7=11)
12-15 4 4
N=50
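The formula for computing the median for grouped data is:
Median = LL + i [ (N/2 – cf) / f ]
where LL is the lowest limit of the median class, i is the interval size, N is the number
of scores, cf is the cumulative frequency below the median class, and f is the frequency of
the median class.
For the given scores, N/2 = 50/2 = 25. Find in the cf column the number which is less than
or equal to 25. In the group, this cf is 23. The frequency is taken one step
higher than the interval where this cf is located, hence f = 5. The lowest
limit is one-half less than the lower limit of the interval where the median is located. In
this example the lower limit is 32, so the lowest limit is 32 – 0.5 = 31.5. The interval
size is i = 4. Thus,
Median = 31.5 + 4 [ (25 – 23) / 5 ] = 31.5 + 1.6 = 33.1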
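The interpolation within the median class can be sketched in Python. The function name grouped_median and the (lower, upper, frequency) layout are assumptions made for illustration; the sketch also assumes whole-number class limits, so that the lowest limit is the lower limit minus 0.5.

```python
def grouped_median(classes):
    """classes: list of (lower, upper, frequency), lowest class first."""
    n = sum(f for _, _, f in classes)
    target = n / 2          # N/2, the number of cases below the median
    cf_below = 0            # cumulative frequency below the current class
    for lower, upper, f in classes:
        if f > 0 and cf_below + f >= target:
            lowest_limit = lower - 0.5
            width = upper - lower + 1
            return lowest_limit + width * (target - cf_below) / f
        cf_below += f
    raise ValueError("the distribution has no scores")


english = [(12, 15, 4), (16, 19, 7), (20, 23, 1), (24, 27, 1), (28, 31, 10),
           (32, 35, 5), (36, 39, 7), (40, 43, 7), (44, 47, 6), (48, 51, 2)]

print(round(grouped_median(english), 2))
```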
Example 2
THE MODE
The Crude Mode. The crude mode, the most frequently occurring score, can be found by
mere inspection. It may not exist in some sets of data, or there may be more than one
mode in other sets of data. A bimodal distribution has two modes. Extreme scores in the
distribution do not affect the mode, but it is a less reliable measure of central tendency
than the mean or the median.
Example 1 96 97 98 97 93 90 89 97 81 80
What is the most frequent score? Or, which score has the highest frequency? The most
frequently occurring score is 97, with 3 observations. Thus, the mode is 97.
Example 2 92 92 90 89 89 88 87 86 85 84
In this example, the modes are 92 and 89 (bimodal). Both scores have a frequency of 2.
Example 3 90 89 88 87 86 85 84 83 82 81
In this example, there is no mode because each score occurs only once.
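Finding the crude mode by inspection amounts to counting how often each score occurs. A minimal Python sketch follows; the function name crude_modes is our own, and returning an empty list when every score occurs only once is a convention we chose for the "no mode" case.

```python
from collections import Counter


def crude_modes(scores):
    """Return the most frequent score(s), or [] if no score repeats."""
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        return []  # every score occurs once, so there is no mode
    return sorted((s for s, c in counts.items() if c == highest), reverse=True)


example_1 = [96, 97, 98, 97, 93, 90, 89, 97, 81, 80]
example_2 = [92, 92, 90, 89, 89, 88, 87, 86, 85, 84]
example_3 = [90, 89, 88, 87, 86, 85, 84, 83, 82, 81]

print(crude_modes(example_1))
print(crude_modes(example_2))
print(crude_modes(example_3))
```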
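The mode can also be estimated from the median and the mean. The formula is:
Mode = 3(Median) – 2(Mean)
For the English test, the Median is 33.1 and the Mean is 32.22.
Therefore:
Mode = 3(33.1) – 2(32.22)
= 99.3 – 64.44
= 34.86
For the Math test, the Median is 73.75 and the Mean is 73.
Therefore:
Mode = 3(73.75) – 2(73)
= 221.25 – 146
= 75.25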
Advantages of the Mode
1. It is useful when we want a quick estimate of central tendency.
2. It can be easily observed, especially when the data are presented graphically.
RANGE
A very simple measure of variability is the difference between the
highest and the lowest score; this measure is called the range of the distribution. If in
a reading test, for example, the highest score is 95 and the lowest is 45, the range is 50.
However, the range depends only upon the 2 extreme scores in the total group. This
makes the measure very unreliable because it can be changed a good deal by the
inclusion or omission of a single extreme case. The example below illustrates that the
range of a set of scores is affected by a single extreme score. The range for Group 1 is
50 while the range for Group 2 is 70, but the only difference is the lowest score.
Group 1 45 50 76 77 80 81 90 95
Group 2 25 50 76 77 80 81 90 95
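A short Python sketch shows how a single extreme score changes the range. The function name score_range is illustrative.

```python
def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)


group_1 = [45, 50, 76, 77, 80, 81, 90, 95]
group_2 = [25, 50, 76, 77, 80, 81, 90, 95]

print(score_range(group_1))  # 95 - 45
print(score_range(group_2))  # 95 - 25
```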
Another measure of variability is the range of scores that includes a specified part
of the total group – usually the middle fifty percent. The middle fifty percent of the
group are scores lying between the 25th and 75th percentiles. The 25th (Q1) and 75th (Q3)
percentiles are called quartiles since they cut off the bottom quarter and the top quarter of
the group respectively. The score distance between them is called the interquartile range.
The statistic that is often reported as a measure of variability is the semi-interquartile
range (Q), which is half of the interquartile range.
So, Q = (Q3 – Q1)/2
Example
56 57 63 75 78 79 80 82 87 89 90 92
Position of Q1 = (n+1)/4, where n is the number of scores
= (12+1)/4
= 3.25
Similarly, position of Q3 = 3(n+1)/4
= 3(12+1)/4
= 9.75
Therefore:
For Grouped Scores
Here is a set of scores from a class of 50 students. The scores have already been
summarized into a frequency distribution.
X f cf
95-99 3 50
90-94 4 47
85-89 5 43
80-84 8 38
75-79 6 30 Q3
70-74 10 24
65-69 4 14
60-64 4 10 Q1
55-59 2 6
50-54 0 4
45-49 1 4
40-44 3 3
Note: Just like in finding the median of a set of grouped scores, the first step is to get the
cumulative frequency. In the formula for Q1, N is divided by 4 (N/4) because we are looking for
the score at the middle of the lower half of the distribution; for Q3 we use 3N/4. Just like the
median, the same procedure is applied in looking for the values of cf, f and the lowest limit.
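So,
Q1 = LL + i [ (N/4 – cf) / f ] = 64.5 + 5 [ (12.5 – 10) / 4 ] = 64.5 + 3.13 = 67.63
So,
Q3 = LL + i [ (3N/4 – cf) / f ] = 79.5 + 5 [ (37.5 – 30) / 8 ] = 79.5 + 4.69 = 84.19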
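The quartiles of this grouped distribution can be computed with the same interpolation used for the grouped median. The sketch below is hedged: the function name grouped_point and its arguments k and parts (k/parts of the cases fall below the point) are our own, and whole-number class limits are assumed.

```python
def grouped_point(classes, k, parts):
    """Score below which k/parts of the cases fall (grouped data).

    classes: list of (lower, upper, frequency), lowest class first.
    """
    n = sum(f for _, _, f in classes)
    target = k * n / parts   # e.g. N/4 for Q1, 3N/4 for Q3
    cf_below = 0
    for lower, upper, f in classes:
        if f > 0 and cf_below + f >= target:
            width = upper - lower + 1
            return (lower - 0.5) + width * (target - cf_below) / f
        cf_below += f
    raise ValueError("the distribution has no scores")


classes = [(40, 44, 3), (45, 49, 1), (50, 54, 0), (55, 59, 2),
           (60, 64, 4), (65, 69, 4), (70, 74, 10), (75, 79, 6),
           (80, 84, 8), (85, 89, 5), (90, 94, 4), (95, 99, 3)]

q1 = grouped_point(classes, 1, 4)
q3 = grouped_point(classes, 3, 4)
semi_interquartile = (q3 - q1) / 2

print(q1, q3, semi_interquartile)
```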
DECILES
The scores that divide the distribution into ten equal parts are called deciles. Just
like Q1 and Q3, we can compute a decile by determining the number of cases required.
For example, if we are looking for the 1st Decile (D1), we divide N by 10; we use 2N/10
for D2, 3N/10 for D3, and so on.
Example. Find the 3rd Decile (D3) and 8th Decile (D8)
56 57 63 75 78 79 80 82 87 89 90 92 95 96 97
If the answer is a whole number, get the average of that corresponding value in your
data set and the value that directly follows it. In this example, the 8th decile is the average of the
12th and the 13th score. That is (92+95)/2 = 93.5
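The whole-number rule above can be sketched in Python. One point is an assumption on our part: when the position is not a whole number, the sketch rounds it up to the next score, a common textbook convention that is not spelled out here. The function name ungrouped_point and its arguments are also illustrative.

```python
def ungrouped_point(scores, k, parts):
    """Score at position k*n/parts (k-th decile when parts=10, and so on)."""
    if not 0 < k < parts:
        raise ValueError("k must be between 1 and parts - 1")
    ordered = sorted(scores)
    whole, remainder = divmod(k * len(ordered), parts)
    if remainder == 0:
        # Whole-number position: average that score and the next one.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # ASSUMPTION: a fractional position is rounded up to the next score.
    return ordered[whole]


scores = [56, 57, 63, 75, 78, 79, 80, 82, 87, 89, 90, 92, 95, 96, 97]

d8 = ungrouped_point(scores, 8, 10)   # position 8(15)/10 = 12
d3 = ungrouped_point(scores, 3, 10)   # position 3(15)/10 = 4.5
print(d8, d3)
```

The same function can be used for percentiles by passing parts=100.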
Just like in finding the quartiles of a set of grouped scores, the first step is to get the
cumulative frequency. The same procedure is applied in looking for the value of cf, f and the
lowest limit.
X f cf
75-77 3 50
72-74 5 47
69-71 2 42
66-68 4 40
63-65 3 36
60-62 0 33
57-59 2 33
54-56 4 31
51-53 6 27
48-50 3 21
45-47 4 18
42-44 6 14
39-41 5 8
36-38 1 3
33-35 2 2
N=50
2nd Decile or D2 = LD2 + i [ (2N/10 – cf) / fD2 ]
= 41.5 + 3 [ ((.20 x 50) – 8) / 6 ]
= 41.5 + 3 [ (10 – 8) / 6 ]
= 41.5 + 1.00
= 42.50
PERCENTILES
The same procedure may be used when we find the score below which any
percentage of the group falls. These values are called percentiles. The median is the 50th
percentile, i.e., the score below which 50 percent of individuals fall. If we want to find
the 40th percentile, we must find the score below which 40 percent of the cases fall. Any
other percentiles can be found in the same way. Percentiles have many uses, especially
in connection with test norms and interpretation of scores.
Example. Find the 45th Percentile (P45) and 60th Percentile (P60)
56 57 63 75 78 79 80 82 87 89 90 92 95 96 97
If the answer is a whole number, get the average of that corresponding value in your
data set and the value that directly follows it. In this example, the 60th percentile is the average of
the 9th and the 10th score. That is (87 + 89)/2 = 88.
Just like in finding the quartiles and deciles of a set of grouped scores, the first step is to
get the cumulative frequency. The same procedure is applied in looking for the value of cf, f and
the lowest limit.
Example: Find P20
X f cf
75-77 3 50
72-74 5 47
69-71 2 42
66-68 4 40
63-65 3 36
60-62 0 33
57-59 2 33
54-56 4 31
51-53 6 27
48-50 3 21
45-47 4 18
42-44 6 14 P20
39-41 5 8
36-38 1 3
33-35 2 2
N=50
We are looking for the 20th percentile, or the score below which 20 percent of the cases
fall. Then,
P20 = LP20 + i [ (20N/100 – cf) / fP20 ]
= 41.5 + 3 [ (10 – 8) / 6 ]
= 41.5 + 1.00
= 42.50
Note: The values of P20 and D2 are the same because in either measure, we are looking
for the score below which 20 percent of the cases fall.
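That P20 and D2 coincide can be checked with a short Python sketch. The function name grouped_point and its arguments k and parts are our own illustrative choices; whole-number class limits are assumed.

```python
def grouped_point(classes, k, parts):
    """Score below which k/parts of the cases fall (grouped data)."""
    n = sum(f for _, _, f in classes)
    target = k * n / parts   # 2N/10 for D2, 20N/100 for P20
    cf_below = 0
    for lower, upper, f in classes:   # lowest class first
        if f > 0 and cf_below + f >= target:
            width = upper - lower + 1
            return (lower - 0.5) + width * (target - cf_below) / f
        cf_below += f
    raise ValueError("the distribution has no scores")


classes = [(33, 35, 2), (36, 38, 1), (39, 41, 5), (42, 44, 6), (45, 47, 4),
           (48, 50, 3), (51, 53, 6), (54, 56, 4), (57, 59, 2), (60, 62, 0),
           (63, 65, 3), (66, 68, 4), (69, 71, 2), (72, 74, 5), (75, 77, 3)]

d2 = grouped_point(classes, 2, 10)
p20 = grouped_point(classes, 20, 100)
print(d2, p20)
```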
The variance is a measure of variability among all scores in the distribution rather
than through extreme scores or only a proportion of the scores. It considers each
observation relative to the mean of the set of scores. It is derived by getting the sum of
the squared deviation from the mean divided by n-1 (for sample variance), and N (for
population variance)
Sample variance (s²): s² = ∑d² / (n – 1)
where:
d = X – X̄ = deviation from the mean
d² = squared deviation
∑d² = sum of the squared deviations
Example 1. Compute the variance of the following Algebra scores of ten students:
92 75 85 83 90 73 79 80 88 85
Steps:
1. Find the Mean (Mean = 83).
2. Subtract the Mean from the scores to get d (i.e. 92 – 83 = 9; 75 – 83 = –8, etc.).
3. Square the deviations (i.e. 9² = 81, (–8)² = 64, etc.).
4. Find the sum of the squared deviations (∑d² = 352).
5. Divide the sum of the squared deviations by n – 1 (10 – 1 = 9).
Score d d2
92 +9 81
75 -8 64
85 +2 4
83 0 0
90 +7 49
73 -10 100
79 -4 16
80 -3 9
88 +5 25
85 +2 4
N = 10 ∑d² = 352
Mean = 83
s² = ∑d² / (n – 1) = 352/9 = 39.11
Example 2. Compute for variance of the following Geometry scores of the ten students:
92 95 75 63 45 87 99 90 98 86
Score d d2
92 9 81
95 12 144
75 -8 64
63 -20 400
45 -38 1444
87 4 16
99 16 256
90 7 49
98 15 225
86 3 9
N = 10 ∑d² = 2,688
Mean = 83
s² = ∑d² / (n – 1) = 2,688/9 = 298.67
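The five steps can be sketched in Python for both examples. The function name sample_variance is our own; statistics.variance from the standard library computes the same sample variance and is used only as a cross-check.

```python
from statistics import variance


def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return sum(squared_deviations) / (n - 1)


algebra = [92, 75, 85, 83, 90, 73, 79, 80, 88, 85]
geometry = [92, 95, 75, 63, 45, 87, 99, 90, 98, 86]

print(round(sample_variance(algebra), 2))
print(round(sample_variance(geometry), 2))
```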
The standard deviation gives a better idea of how the data entries differ from the mean. It
is computed by extracting the square root of the variance. The formula for the sample standard
deviation is: s = √[ ∑d² / (n – 1) ] or s = √s².
Thus, in Example 1 (Algebra scores), the sample standard deviation
is s = √39.11 = 6.25. In Example 2 (Geometry scores), the sample standard deviation
is s = √298.67 = 17.28.
So how do we interpret the standard deviations of 6.25 and 17.28? For the Algebra scores,
it means that on the average, the scores are 6.25 points away from the mean. For the Geometry
scores, it means that on the average, the distance of the scores from the mean is 17.28 points.
Theoretically, the standard deviation and the variance describe how scattered the scores are
from a central point (the mean). In layman’s terms, the higher the value of the standard
deviation or variance, the more the scores scatter from the mean. Based on the two
given examples, the average scores are the same (Mean = 83), and the number of scores is also
the same (n = 10). But the scores for Geometry are farther away from each other and from the
mean, compared to the scores in Algebra.
An alternative way of computing the standard deviation is to use the sum of all the
scores and the sum of all their squares. The formula is:
s = √[ (n∑Xi² – (∑Xi)²) / (n(n – 1)) ]
Xi (scores) Xi²
92 8,464
75 5,625
85 7,225
83 6,889
90 8,100
73 5,329
79 6,241
80 6,400
88 7,744
85 7,225
∑Xi = 830 ∑Xi² = 69,242
Steps:
1. Get the sum of the scores (∑Xi).
2. Square all the scores and get the sum (∑Xi²).
3. Substitute these values into the formula:
s = √[ (10(69,242) – (830)²) / (10(9)) ] = √[ (692,420 – 688,900) / 90 ] = √39.11 = 6.25
Xi (scores) Xi²
92 8464
95 9025
75 5625
63 3969
45 2025
87 7569
99 9801
90 8100
98 9604
86 7396
∑Xi = 830 ∑Xi² = 71578
s = √[ (10(71578) – (830)²) / (10(9)) ] = √[ (715780 – 688900) / 90 ] = √298.67 = 17.28
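The sum-of-squares method can be sketched in Python. The function name sd_from_sums is our own, and the formula coded here, s = √[(n∑X² – (∑X)²) / (n(n – 1))], is one standard algebraic form of the sample standard deviation; it is assumed to be equivalent to the form intended in this lesson.

```python
import math


def sd_from_sums(scores):
    """Sample standard deviation from the sum of X and the sum of X squared."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


algebra = [92, 75, 85, 83, 90, 73, 79, 80, 88, 85]
geometry = [92, 95, 75, 63, 45, 87, 99, 90, 98, 86]

print(round(sd_from_sums(algebra), 2))
print(round(sd_from_sums(geometry), 2))
```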
The standard deviation and variance of grouped data are calculated using the class
marks of each step interval, or using the deviations.
X f Class Mark (CM) f x CM f x (CM)²
75-77 3 76 ((75+77)/2) 228 (3 x 76) 17328 (3 x 76²)
72-74 4 73 292 21316
69-71 6 70 420 29400
66-68 5 67 335 22445
63-65 8 64 512 32768
60-62 9 61 549 33489
57-59 5 58 290 16820
54-56 8 55 440 24200
51-53 3 52 156 8112
48-50 2 49 98 4802
45-47 2 46 92 4232
N = 55 ∑f(CM) = 3412 ∑f(CM)² = 214912
s = √[ (N∑f(CM)² – (∑f(CM))²) / (N(N – 1)) ]
= √[ (55(214912) – (3412)²) / (55(54)) ]
= √(178416 / 2970)
= √60.07
= 7.75
The variance (s²) in this data set is 178416/2970 = 60.07.
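The class-mark method can be sketched in Python. The function name grouped_sd and the (lower, upper, frequency) layout are illustrative assumptions; the sample formula with N(N – 1) in the denominator is used because it reproduces the value 7.75.

```python
import math


def grouped_sd(classes):
    """Sample SD of grouped data, using the class mark of each interval."""
    n = sum(f for _, _, f in classes)
    sum_fcm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    sum_fcm2 = sum(f * ((lower + upper) / 2) ** 2 for lower, upper, f in classes)
    return math.sqrt((n * sum_fcm2 - sum_fcm ** 2) / (n * (n - 1)))


classes = [(45, 47, 2), (48, 50, 2), (51, 53, 3), (54, 56, 8),
           (57, 59, 5), (60, 62, 9), (63, 65, 8), (66, 68, 5),
           (69, 71, 6), (72, 74, 4), (75, 77, 3)]

sd = grouped_sd(classes)
print(round(sd, 2), round(sd ** 2, 2))
```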
Using the deviations, the formula is:
s = i √[ (N∑fd² – (∑fd)²) / (N(N – 1)) ]
where:
i = interval size
N = number of scores
∑fd = summation of frequency x deviation
∑fd² = summation of frequency x squared deviation
Steps:
1. Choose any step interval for the assumed mean as the arbitrary starting point or
“origin”. In the example given, the interval 60-62 has been chosen. Call this interval zero
deviation, the next higher interval +1, the next lower interval –1, etc. These are shown in
the column labeled d. (Note: Any interval can be chosen, and the final result will be the
same.)
2. Multiply the frequency (f) by the number of deviations (d); the resulting product is
shown in the column labeled fd. Get the sum of fd by taking into account the plus and minus
signs.
3. To get fd², multiply fd by d. Then get the sum of fd².
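Illustration:
X f d fd fd²
75-77 3 5 15 75
72-74 4 4 16 64
69-71 6 3 18 54
66-68 5 2 10 20
63-65 8 1 8 8
60-62 9 0 0 0
57-59 5 -1 -5 5
54-56 8 -2 -16 32
51-53 3 -3 -9 27
48-50 2 -4 -8 32
45-47 2 -5 -10 50
Sum of the positive fd = +67; sum of the negative fd = –48
N=55 ∑fd = +19 ∑fd² = 367
s = 3 √[ (55(367) – (19)²) / (55(54)) ]
= 3 √(19824 / 2970)
= 3 √6.6747
= 3 (2.5835)
= 7.75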
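The deviation method can be sketched in Python, including a check that the choice of origin does not change the result. The function name coded_sd and its parameters are our own; the formula coded is s = i √[(N∑fd² – (∑fd)²) / (N(N – 1))], assumed here because it reproduces 7.75.

```python
import math


def coded_sd(frequencies, width, origin_index):
    """Sample SD of grouped data using coded deviations d = 0, +1, -1, ...

    frequencies: class frequencies, lowest class first.
    origin_index: position of the class chosen as zero deviation.
    """
    n = sum(frequencies)
    deviations = [k - origin_index for k in range(len(frequencies))]
    sum_fd = sum(f * d for f, d in zip(frequencies, deviations))
    sum_fd2 = sum(f * d * d for f, d in zip(frequencies, deviations))
    return width * math.sqrt((n * sum_fd2 - sum_fd ** 2) / (n * (n - 1)))


# Frequencies for 45-47, 48-50, ..., 75-77 (lowest class first).
frequencies = [2, 2, 3, 8, 5, 9, 8, 5, 6, 4, 3]

sd_origin_60_62 = coded_sd(frequencies, width=3, origin_index=5)
sd_origin_45_47 = coded_sd(frequencies, width=3, origin_index=0)
print(round(sd_origin_60_62, 2), round(sd_origin_45_47, 2))
```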
The Coefficient of Variation
The coefficient of variation (CV) is a measure that compares the variability of two
sets of data. The formula is:
CV = (s / X̄) x 100%
Using Example 1 (Algebra scores), the Standard deviation (s) = 6.25 and Mean = 83
CV = (6.25 / 83) x 100% = 7.53%
Using Example 2 (Geometry scores), the Standard deviation (s) = 17.28 and Mean = 83
CV = (17.28 / 83) x 100% = 20.82%
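The two coefficients can be reproduced with a short Python sketch. The function name coefficient_of_variation is our own; statistics.stdev (the sample standard deviation) and statistics.mean come from the standard library.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """CV = (sample standard deviation / mean) x 100, in percent."""
    return stdev(scores) / mean(scores) * 100


algebra = [92, 75, 85, 83, 90, 73, 79, 80, 88, 85]
geometry = [92, 95, 75, 63, 45, 87, 99, 90, 98, 86]

print(round(coefficient_of_variation(algebra), 2))
print(round(coefficient_of_variation(geometry), 2))
```

Although both sets of scores have the same mean, the Geometry scores show the larger relative spread.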