
Lesson 6 - Lecture

This lesson focuses on organizing and describing assessment data, specifically through methods such as frequency distribution and graphical representations. It outlines objectives for teachers, including preparing frequency distributions, analyzing data, and computing measures of central tendency. Various presentation methods, including textual, tabular, and graphical forms, are discussed to help educators effectively interpret and communicate assessment results.

Uploaded by cuaresmatrisha20
Copyright © All Rights Reserved

Lesson 6

ORGANIZING AND DESCRIBING ASSESSMENT DATA

Introduction

After scoring the test papers in a particular subject, teachers more often than not
face the problem of describing and interpreting their pupils' scores. They find it
difficult to describe and synthesize the data in a way that facilitates decision
making. In this lesson, we present different ways of tabulating and graphing data.

Suppose we have just given a Math test to our Grade VI pupils and have scored the
papers. What are we going to do with the data? Some of the questions that we will
probably ask include “What is the general pattern of the set of scores?”, “What do
these scores look like?”, or “How can we picture the set of scores to get an
impression of the group as a whole?” To answer these questions, we need to consider
simple ways of tabulating and graphing a set of scores.

The simplest rearrangement would be to arrange the scores from highest to lowest,
but this arrangement still has too much detail for us to see the general pattern
clearly. We need to condense the scores into a more compact form so that
computation and interpretation become easier.

Objectives

After completing this lesson, you are expected to:

• enumerate the steps in preparing a frequency distribution;
• prepare a frequency distribution of a set of scores;
• describe and compare sets of scores by analyzing frequency distributions;
• translate a frequency distribution into graphical representations such as the
histogram and the frequency polygon;
• describe assessment data using descriptive statistics, including the
measures of central tendency (mean, median and mode) and the measures of
variability (range, quartiles, deciles, percentiles and standard deviation);
• determine when to use the mean, median and mode as a way of describing
assessment data;
• compute the measures of central tendency and variability of sets of scores; and
• report the measures of central tendency and variability of a set of scores in an
organized and systematic manner.

Three different ways of presenting data

1. Textual presentation – the data are presented in narrative or paragraph form. This is
appropriate if the data being presented are simple and do not require much detail.
However, it may not immediately catch the readers' interest because it is purely textual.

Example: Thirty (30), or 60%, of the 50 students who took the Mathematics test passed.
Of these, 13 (26%) got scores between 60 and 65, and 17 (34%) got scores higher than 65.

2. Tabular presentation – this method uses rows and columns, as in a frequency
distribution, so the data are presented in a systematic and orderly manner. If the data
set is relatively large, say more than 30 observations, it is more appropriate to present
it as a grouped frequency distribution.

Preparing Frequency Distribution

One way of organizing the scores for presentation is to prepare what is termed a
frequency distribution: a table showing how often each score occurred. Each score
value (or class of score values) is listed together with the number of times it occurred.

Steps in constructing a frequency distribution

1. Find the Range of the scores. The Range is the score distance between the highest
and the lowest scores.
Range = Highest score – Lowest score

Example: The highest score in the test is 48 and the lowest is 12, so
Range = 48 – 12 = 36
2. Decide on the number of class intervals (k).
Maximum number of class intervals – 20
Minimum number of class intervals – 7
Ideal number of class intervals – 10 to 15
Note: There are at least two alternative ways of determining the number of class
intervals:
a. The square-root rule: $k = \sqrt{n}$, where n is the number of observations. So if
n = 50, the number of class intervals is $\sqrt{50} \approx 7.07$, or 7.
b. The Sturges formula: $k = 1 + 3.32 \log_{10} n$. If n = 50, then
$k = 1 + 3.32 \times 1.70 \approx 6.6$, or 7.

Too many or too few intervals may hide the pattern we need to see. In this
illustration, 10 will be used as the number of class intervals.
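The two rules in the note above can be checked with a short Python sketch. The function names are hypothetical, and it assumes each result is simply rounded to the nearest whole number, as the lesson does with 7.07 and 6.6.

```python
import math


def k_square_root(n):
    """Square-root rule: k = sqrt(n), rounded to the nearest whole number."""
    return round(math.sqrt(n))


def k_sturges(n):
    """Sturges formula as given in the lesson: k = 1 + 3.32 log10(n)."""
    return 1 + 3.32 * math.log10(n)


n = 50
print(k_square_root(n))          # sqrt(50) is about 7.07, so 7 intervals
print(round(k_sturges(n), 1))    # about 6.6, i.e. 7 intervals
```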

3. Determine the interval (i).
Interval = Range ÷ number of class intervals (in this example, 10 was chosen)
Interval = 36 ÷ 10 = 3.6, rounded to 4

4. Get the Lowest Limit (L.L.) of the step interval.
Divide the lowest score by the interval, then multiply the result by the interval.
Example: 12 ÷ 4 = 3, and 3 × 4 = 12.
So the lowest step interval is 12–15.
Remember that the Lowest Limit should be a number that is exactly divisible by the
interval. If (lowest score ÷ interval) is not a whole number, round it down before
multiplying, so that the lowest score still falls inside the lowest step interval.

5. Get the class mark. The class mark is the average of the lower and upper limits of a
class interval.
Class mark = (12 + 15) ÷ 2 = 13.5
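Steps 3 to 5 can be sketched in Python as follows. This is only an illustration with hypothetical helper names; it assumes whole-number scores and rounds the interval width up, so that the chosen number of intervals always covers the range.

```python
import math


def interval_width(highest, lowest, k):
    """Range divided by the number of class intervals, rounded up."""
    return math.ceil((highest - lowest) / k)


def lowest_limit(lowest_score, i):
    """Largest multiple of i that does not exceed the lowest score."""
    return (lowest_score // i) * i


def class_mark(lower, upper):
    """Average of the lower and upper limits of a class interval."""
    return (lower + upper) / 2


i = interval_width(48, 12, 10)       # 36 / 10 = 3.6, rounded up to 4
ll = lowest_limit(12, i)             # 12 // 4 = 3, and 3 * 4 = 12
first_interval = (ll, ll + i - 1)    # the lowest step interval, 12-15
print(i, first_interval, class_mark(*first_interval))  # 4 (12, 15) 13.5
```

With a lowest score of 15, for instance, `lowest_limit(15, 4)` gives 12 rather than 16, which keeps 15 inside the lowest interval.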
Illustrative example
A Math test is given to Grade 12 students. The scores of the 50 students are given
below. Let’s make a frequency distribution and tally the frequency.

48 35 36 40 42
32 30 46 43 40
35 15 44 48 45
28 16 41 46 39
20 19 38 47 31
25 18 39 43 28
28 33 19 39 29
36 34 29 31 18
38 13 16 29 19
41 15 44 28 12

Solution:
1. The Range of the distribution is 36 (HS – LS; 48 – 12 = 36).
2. The number of class intervals is 10.
3. The Interval of the distribution is 4 (Range ÷ 10; 36 ÷ 10 = 3.6, rounded to 4).
4. The lowest step interval of the distribution is 12–15 (note that the lowest score, 12,
is exactly divisible by the interval, 4).
5. The class mark of the lowest step interval is 13.5.

Frequency Distribution of Grade 12 Math Scores

Class Interval   Tally   Frequency   Class Mark
48-51 II 2 49.5
44-47 IIII – I 6 45.5
40-43 IIII – II 7 41.5
36-39 IIII – II 7 37.5
32-35 IIII 5 33.5
28-31 IIII – IIII 10 29.5
24-27 I 1 25.5
20-23 I 1 21.5
16-19 IIII – II 7 17.5
12-15 IIII 4 13.5
Remember that:
1. The lowest score (12) is located in the lowest step or score interval (12–15).
Similarly, the highest score (48) is located in the highest step or score interval
(48–51).
2. All the numbers on the left (the lower limits) are exactly divisible by the
interval (4).
3. There are 10 step or score intervals because we chose to divide the
distribution into 10. Nevertheless, the number of step or score intervals sometimes
exceeds ten, especially when the lowest score is not divisible by the interval (a
product of rounding off numbers).
4. The frequency distribution not only summarizes the scores but also makes it
clearer which scores occurred most frequently and least frequently, and how the
group performed as a whole. When the higher frequencies are found in the higher
step intervals, most of the students got high scores. Conversely, when most of the
students got low scores, the higher frequencies are found in the lower step or score
intervals.

5. We can summarize an even larger number of scores in a frequency distribution.
For example, we can summarize a set of 1,000 scores and still easily describe how the
scores run from high to low and how many obtained high and low scores, among other
important features.
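As a check on the frequency distribution of the Grade 12 Math scores, the following Python sketch tallies the 50 scores into the ten step intervals. The function name `frequency_distribution` is hypothetical; the sketch assumes whole-number scores, so an interval such as 12-15 contains every score from 12 to 15 inclusive.

```python
scores = [
    48, 35, 36, 40, 42, 32, 30, 46, 43, 40,
    35, 15, 44, 48, 45, 28, 16, 41, 46, 39,
    20, 19, 38, 47, 31, 25, 18, 39, 43, 28,
    28, 33, 19, 39, 29, 36, 34, 29, 31, 18,
    38, 13, 16, 29, 19, 41, 15, 44, 28, 12,
]


def frequency_distribution(data, lowest_limit, width, k):
    """Return (lower, upper, frequency, class mark) rows, highest interval first."""
    rows = []
    for j in range(k):
        lower = lowest_limit + j * width
        upper = lower + width - 1
        freq = sum(lower <= x <= upper for x in data)  # count scores in the interval
        rows.append((lower, upper, freq, (lower + upper) / 2))
    return rows[::-1]  # list the highest interval first, as in the table


table = frequency_distribution(scores, lowest_limit=12, width=4, k=10)
for lower, upper, freq, cm in table:
    print(f"{lower}-{upper}  {'I' * freq:<10}  {freq:>2}  {cm}")
```

The tally column here is a plain run of marks rather than groups of five.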

3. Graphical Presentation

It is often helpful to translate data into a pictorial representation. A common type
of graphic representation, called a histogram, is shown below.
The histogram can be thought of, somewhat grimly, as “piling up the bodies”. The
score intervals are shown along the horizontal base line (abscissa), and the vertical
height of each pile (ordinate) represents the number of cases. The diagram indicates that
there are four “bodies” piled up in the interval 12-15, seven in the interval 16-19, and so
forth. The figure gives a clear picture of how the scores pile up, with most of them in
the interval 28-31, while only 2 scores fall between 20 and 27.

The leftmost part of the histogram represents the step or score intervals where the
lower scores are located, while the higher scores are located on the rightmost part of
the graph. Compared with a frequency distribution, a histogram shows at a glance which
score intervals contain the most scores and which contain the fewest, simply by looking
at the piles.

Another pictorial representation is the frequency polygon.

Frequency polygons are graphical devices for understanding the shapes of
distributions. They serve the same purpose as histograms but are especially helpful for
comparing sets of data. Frequency polygons are also a good choice for displaying
cumulative frequency distributions.

To create a frequency polygon, start just as for a histogram, by choosing a class
interval. Then draw an X-axis representing the values of the scores in your data. Mark the
middle of each class interval with a tick mark, and label it with the middle value (class
mark) of the class. Draw the Y-axis to indicate the frequency of each class. Place a
point above the middle of each class interval at the height corresponding to its frequency.
Finally, connect the points. If an empty class (frequency 0) is added just below the
lowest interval and just above the highest, the graph will touch the X-axis on both sides.
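The points of a frequency polygon can be computed directly from the class marks and frequencies. Below is a minimal Python sketch using the frequency distribution of the Grade 12 Math scores; it assumes one empty class is added at each end, and the function name is hypothetical.

```python
def polygon_points(class_marks, freqs, width):
    """(class mark, frequency) points for a frequency polygon.

    An empty class (frequency 0) is added one interval below the lowest
    class and one interval above the highest, so the polygon touches the X-axis.
    """
    marks = [class_marks[0] - width] + list(class_marks) + [class_marks[-1] + width]
    heights = [0] + list(freqs) + [0]
    return list(zip(marks, heights))


# Grade 12 Math scores, lowest interval (12-15) first
marks = [13.5, 17.5, 21.5, 25.5, 29.5, 33.5, 37.5, 41.5, 45.5, 49.5]
freqs = [4, 7, 1, 1, 10, 5, 7, 7, 6, 2]
points = polygon_points(marks, freqs, width=4)
print(points[0], points[-1])  # the two points that sit on the X-axis
```

If matplotlib is installed, `plt.plot(*zip(*points), marker="o")` (after `import matplotlib.pyplot as plt`) would draw these points as a polygon; plotting itself is outside this sketch.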
Other graphical forms include:

a. Bar graph
A College Algebra test was given to 60 students, and the scores are:
Score   Number of students
10      3
11      4
12      8
13      10
14      5
15      7
16      8
17      4
18      5
19      4
20      2

b. Line graph
Below is the enrolment trend of a university over the last five years.
c. Pie graph
Below is the enrolment data of the College of Education, SY 2018-2019.

Year level   Number of students
First        175
Second       130
Third        120
Fourth       100
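To draw the pie graph, each year level's share of the total is converted into a central angle (share × 360°). A short Python sketch of that arithmetic; the variable names are illustrative only.

```python
enrolment = {"First": 175, "Second": 130, "Third": 120, "Fourth": 100}
total = sum(enrolment.values())  # 525 students in all

# Central angle of each slice = (count / total) x 360 degrees
angles = {level: count / total * 360 for level, count in enrolment.items()}

for level, count in enrolment.items():
    print(f"{level:<7} {count / total:7.2%} {angles[level]:7.2f} degrees")
```

The four slices come to about 120.00°, 89.14°, 82.29° and 68.57°, which add up to 360°.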
MEASURES OF CENTRAL TENDENCY

Measures of central tendency provide a convenient way of describing a set of data with
a single number. On average, how many times did you go to a library to borrow recent
books for the subject that you are teaching? What percentage of your income is allotted
for professional growth? Which month of the year do you spend the most on social
obligations? The word “averages” has been applied to several measures of central
tendency. The purpose of these averages is to summarize data into a single value, a
typical value, or the middle position of a set of values that can be used to describe the
basic characteristics of the frequency distribution. In this module, three commonly used
measures of central tendency (mean, median and mode) will be discussed for ungrouped
(raw) and grouped data. Ungrouped data are raw data, while grouped data are raw data that
have been compressed into a frequency distribution table for easier understanding.

THE MEAN

The Arithmetic Mean

The arithmetic mean, or simply the mean, is the most familiar and most widely used
measure of central tendency. It is also considered the most reliable, because all the
values of the variable are taken into account. The arithmetic mean for ungrouped data is
obtained by dividing the sum of all the values in a set of observations by the number of
observations. In symbols,

$$\bar{X} = \frac{\sum X}{n}$$

where: $\bar{X}$ is the arithmetic mean of the X's (observations), $\sum X$ (∑ is the Greek
letter sigma) is “the sum of X”, and n is the number of observations, or in our case, scores.

Example 1. Carlo obtained the following scores in the 10 quizzes given by his teacher:
25, 28, 34, 35, 28, 37, 36, 32, 35 and 34.
The mean is:

$$\bar{X} = \frac{324}{10} = 32.4$$

Example 2. Joan got the following scores in the performance tasks:
90, 75, 80, 83, 87, 86, 84, 80 and 92

$$\bar{X} = \frac{757}{9} = 84.11$$
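The same computation in Python, as a minimal sketch. The helper name `mean` is our own; Python's standard library also provides `statistics.mean`, which gives the same results for these scores.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


carlo = [25, 28, 34, 35, 28, 37, 36, 32, 35, 34]
joan = [90, 75, 80, 83, 87, 86, 84, 80, 92]
print(mean(carlo))            # 324 / 10
print(round(mean(joan), 2))   # 757 / 9, rounded to two decimals
```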

Mean can also be viewed as the ‘center of gravity’ of a distribution. It serves as
the fulcrum that balances all the values in the distribution.

The Weighted Mean

This method involves multiplying each score by its corresponding frequency, adding the
products, and dividing the sum by the total number of scores (the sum of the frequencies).

$$\bar{X}_w = \frac{\sum X_i f_i}{\sum f_i}$$

where: Xi = each distinct score value in the set
fi = frequency of the corresponding score

Example 1

Fifty (50) students were given a test in Science, and their scores are presented below. The
table shows that of the 50 students, 4 got a score of 70, 6 got a score of 67, and so forth.

Xi    fi    Xifi
70    4     280 (70 x 4)
67    6     402
64    5     320
60    7     420
58    5     290
56    4     224
54    5     270
50    6     300
45    8     360
      Σfi = 50    ΣXifi = 2,866

The weighted mean is:

$$\bar{X}_w = \frac{2{,}866}{50} = 57.32$$
Example 2
Fifty-five (55) students were given a test in Chemistry, and their scores are:

Xi    fi    Xifi
92    3     276 (92 x 3)
90    8     720
88    4     352
86    5     430
84    10    840
83    9     747
80    7     560
79    9     711
      Σfi = 55    ΣXifi = 4,636

The weighted mean is:

$$\bar{X}_w = \frac{4{,}636}{55} = 84.29$$
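A minimal Python sketch of the weighted mean, applied to both tables of scores and frequencies above. The function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, freqs):
    """Sum of (score x frequency) divided by the sum of the frequencies."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)


science_x = [70, 67, 64, 60, 58, 56, 54, 50, 45]
science_f = [4, 6, 5, 7, 5, 4, 5, 6, 8]
chem_x = [92, 90, 88, 86, 84, 83, 80, 79]
chem_f = [3, 8, 4, 5, 10, 9, 7, 9]
print(round(weighted_mean(science_x, science_f), 2))  # 2,866 / 50
print(round(weighted_mean(chem_x, chem_f), 2))        # 4,636 / 55
```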

The Mean for Grouped Data

Computing the arithmetic mean for a frequency distribution is similar to computing the
mean for ungrouped data. But since compressing the data into a frequency table loses the
actual values of the observations in each class, we must assume that every observation
in a class has a value equal to the class midpoint (class mark).
Example 1. The following are the students’ test scores in an English class.
Test Scores   Number of Students (f)   Class Mark (CM)    f x CM
48-51         2                        49.5 (48+51)/2     99 (2 x 49.5)
44-47         6                        45.5               273
40-43         7                        41.5               290.5
36-39         7                        37.5               262.5
32-35         5                        33.5               167.5
28-31         10                       29.5               295
24-27         1                        25.5               25.5
20-23         1                        21.5               21.5
16-19         7                        17.5               122.5
12-15         4                        13.5               54
              n = 50                                      Σ f x CM = 1,611
The formula for computing the mean for grouped data is:

$$\bar{X} = \frac{\sum (f \times CM)}{n}$$

where: f = the frequency, or number of observations, in a class
CM = the class mark of every observation in the class
n = the total number of frequencies or observations in the distribution

Solution:

$$\bar{X} = \frac{1{,}611}{50} = 32.22$$

Example 2

The following are the students’ test scores in a Math class.

Test Scores   Number of Students (f)   Class Mark (CM)   f x CM
90-94         3                        92 (90+94)/2      276 (3 x 92)
85-89         8                        87                696
80-84         6                        82                492
75-79         9                        77                693
70-74         10                       72                720
65-69         6                        67                402
60-64         5                        62                310
55-59         3                        57                171
50-54         4                        52                208
45-49         1                        47                47
              n = 55                                     Σ f x CM = 4,015

Solution:

$$\bar{X} = \frac{4{,}015}{55} = 73$$
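The grouped-data formula can be sketched in Python as below, under the same assumption the lesson makes: every observation sits at its class mark. The function name and the (lower limit, upper limit, frequency) layout are our own choices.

```python
def grouped_mean(classes):
    """Mean of grouped data, taking every observation to be at its class mark.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # sum of f x CM
    return total / n


english = [(48, 51, 2), (44, 47, 6), (40, 43, 7), (36, 39, 7), (32, 35, 5),
           (28, 31, 10), (24, 27, 1), (20, 23, 1), (16, 19, 7), (12, 15, 4)]
math_cls = [(90, 94, 3), (85, 89, 8), (80, 84, 6), (75, 79, 9), (70, 74, 10),
            (65, 69, 6), (60, 64, 5), (55, 59, 3), (50, 54, 4), (45, 49, 1)]
print(round(grouped_mean(english), 2))   # 1,611 / 50
print(round(grouped_mean(math_cls), 2))  # 4,015 / 55
```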

Advantages of the Mean:
1. The concept of the mean is familiar to most people and easy to understand.
2. It is the most stable and most reliable measure of central tendency.
3. Every set of data has a mean, and only one mean.
4. The mean is used in computing other statistics, such as mean comparisons
(t-test or ANOVA).

Disadvantages of the Mean:
1. The mean is affected by extremely low or high values.
2. It is time-consuming to compute because every data point is included, especially
with a large data set.
THE MEDIAN

The median is the measure of central tendency that occupies the middle position in an
array of observations. It is the 50th percentile of a distribution: 50% of the observations
lie above the median and 50% lie below it. The word ‘array’ means that the data must first
be ranked, either in ascending (lowest to highest) or descending (highest to lowest) order,
before the middle observation is selected.

The Median for Ungrouped Data

If there are n numbers in the array and n is odd, the median is the observation at
position (n + 1)/2.
For example, if there are 11 numbers in the list, the median is the (11 + 1)/2 = 6th
observation, which is the middle position in the array. If n is even, the median lies
between the two observations occupying the middle of the distribution. For example, if
n = 10, the median position is (10 + 1)/2 = 5.5, halfway between the 5th and 6th
observations.

Example 1
98 72 95 75 90 80 88 81 88 86 92

Arranging the scores from highest to lowest, we have

98 95 92 90 88 88 86 81 80 75 72

Median position = (11 + 1)/2 = 6

The middle score is the 6th score, counting either from the highest or from the lowest, and it is 88.

Example 2

128 127 127 125 124 120 120 119 118 115

Median position = (10 + 1)/2 = 5.5

The scores are already in descending order. Since there are 10 scores (an even number),
the median is at the 5.5th position. The 5th score is 124 and the 6th score is 120.
Therefore, the median is the average of 124 and 120, which is 122. (Note that there is no
score of 122 in the array, but statistically it is the middle score: it separates the top
half from the bottom half of the score distribution.)
The computation and interpretation of median for ungrouped data is easy to do
and easy to understand. It is affected by the level of measurement and the shape of the
distribution, not the number of observations. Median is usually used for ordinal data or
when there are extreme (too high or too low) values in the distribution.
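A minimal Python sketch of the median rule for ungrouped data, applied to the two examples above. The helper name is our own; `statistics.median` in the standard library follows the same rule.

```python
def median(scores):
    """Middle value of the sorted scores; average the two middle values when n is even."""
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # position (n + 1) / 2, counting from 1
    return (s[mid - 1] + s[mid]) / 2     # halfway between positions n/2 and n/2 + 1


print(median([98, 72, 95, 75, 90, 80, 88, 81, 88, 86, 92]))        # 11 scores
print(median([128, 127, 127, 125, 124, 120, 120, 119, 118, 115]))  # 10 scores
```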

The Median for Grouped Data

As discussed, the median is the value that occupies the middle position in an array of
scores. Since the actual values of a data set are lost when the frequency table is
constructed, we can only approximate the value of the median from grouped data. The first
step is to locate the class that contains the median observation. Then compute the median
value by interpolating within the median class, on the assumption that the values are
evenly distributed throughout the class.

Example 1
Using the scores in the English test
Test Scores   Number of Students (f)   Cumulative Frequency (cf)
48-51         2                        50
44-47         6                        48
40-43         7                        42
36-39         7                        35
32-35         5 (one step higher)      28
28-31         10                       23 (less than 25)
24-27         1                        13
20-23         1                        12
16-19         7                        11 (4 + 7 = 11)
12-15         4                        4
              n = 50

The formula for computing the median for grouped data is:

$$Md = LL_{Md} + i\left(\frac{\frac{n}{2} - cf}{f_{Md}}\right)$$

where: Md = the median
LLMd = lowest limit (lower class boundary) of the median class
n = total number of frequencies in the distribution
cf = cumulative frequency of the class just below the median class
fMd = frequency of the median class
i = size of the interval of the median class

First compute n/2 to determine the median class. In the given scores, n/2 = 50/2 = 25.
In the cf column, find the largest number that is less than or equal to 25; here it is 23,
so cf = 23. The median class is the interval one step higher (32-35), hence fMd = 5. The
lowest limit is one-half less than the lower limit of the interval where the median is
located. In this example the lower limit is 32, so the lowest limit is 32 – 0.5 = 31.5.
Thus,

$$Md = 31.5 + 4\left(\frac{25 - 23}{5}\right) = 31.5 + 1.6 = 33.1$$

Example 2

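Using the scores in the Math test

Test Scores   Number of Students (f)   Cumulative Frequency (cf)
90-94         3                        55
85-89         8                        52
80-84         6                        44
75-79         9                        38
70-74         10                       29
65-69         6                        19
60-64         5                        13
55-59         3                        8
50-54         4                        5
45-49         1                        1
              n = 55

Here n/2 = 55/2 = 27.5. The largest cf that does not exceed 27.5 is 19, so the median class
is 70-74, with fMd = 10, a lowest limit of 69.5 and i = 5:

$$Md = 69.5 + 5\left(\frac{27.5 - 19}{10}\right) = 69.5 + 4.25 = 73.75$$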
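The interpolation formula for the grouped median can be sketched in Python as follows. This is an illustration only: the function name and the (lower limit, frequency) input layout are assumptions, and the classes are listed from the lowest up so the cumulative frequency can be built as we go.

```python
def grouped_median(classes, width):
    """Median of grouped data by interpolation within the median class.

    classes: (lower_limit, frequency) pairs listed from the LOWEST class up;
    width: size of the class interval i.
    """
    n = sum(f for _, f in classes)
    half = n / 2
    cf = 0  # cumulative frequency of the classes below the current one
    for lower, f in classes:
        if f > 0 and cf + f >= half:
            boundary = lower - 0.5            # lowest limit, e.g. 32 -> 31.5
            return boundary + width * (half - cf) / f
        cf += f
    raise ValueError("empty distribution")


english = [(12, 4), (16, 7), (20, 1), (24, 1), (28, 10),
           (32, 5), (36, 7), (40, 7), (44, 6), (48, 2)]
math_cls = [(45, 1), (50, 4), (55, 3), (60, 5), (65, 6),
            (70, 10), (75, 9), (80, 6), (85, 8), (90, 3)]
print(round(grouped_median(english, 4), 2))   # English test
print(round(grouped_median(math_cls, 5), 2))  # Math test
```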
Advantages of the Median:
1. It is not affected by extreme values, unlike the mean.
2. It is easy to understand and can be calculated from any kind of data that can be ranked.
3. The median can be applied even to qualitative (ordinal) data.
4. It can be used when the data are ordinal or badly skewed (too many high or low values).

Disadvantages of the Median:
1. The data have to be arranged in an array before the median is computed, which is
time-consuming for large data sets.
2. Some statistical techniques that use the median are more complicated than those that
use the mean.

THE MODE

The mode, by definition, is the most frequently occurring observation in a series of
scores, or the most popular score in the list. It is the score that occurs more times than any
other score, and it lies under the highest point of a frequency polygon.

The Crude Mode. It can be found by mere inspection. It may not exist in some sets of
data, or there may be more than one mode in other sets of data; a distribution with two
modes is called bimodal. Extreme scores in the distribution do not affect the mode, but
it is a less reliable measure of central tendency than the mean or the median.

Example 1 96 97 98 97 93 90 89 97 81 80

What is the most frequent score? Or, which score has the highest frequency? The most
frequently occurring score is 97, with 3 observations. Thus, the mode is 97.

Example 2 92 92 90 89 89 88 87 86 85 84

In this example, the modes are 92 and 89 (bimodal). Both scores have a frequency of 2.

Example 3 90 89 88 87 86 85 84 83 82 81

In this example, there is no mode.
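Finding the crude mode by inspection amounts to counting each score and keeping the ones with the highest count. A minimal Python sketch with a hypothetical function name, using `collections.Counter` from the standard library:

```python
from collections import Counter


def modes(scores):
    """All scores tied for the highest frequency; [] when every score occurs once."""
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:
        return []                     # no score repeats, so there is no mode
    return sorted(s for s, c in counts.items() if c == top)


print(modes([96, 97, 98, 97, 93, 90, 89, 97, 81, 80]))  # Example 1
print(modes([92, 92, 90, 89, 89, 88, 87, 86, 85, 84]))  # Example 2 (bimodal)
print(modes([90, 89, 88, 87, 86, 85, 84, 83, 82, 81]))  # Example 3
```

Python's `statistics.multimode` returns every value when no score repeats, so the sketch handles that case separately to match the lesson's "no mode" answer.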


The True Mode. Since the actual values are lost in a frequency table, the mode must also
be approximated. From our previous lesson on the histogram, it is assumed that the most
commonly occurring value in a frequency distribution is found in the largest class,
directly under the peak of the frequency polygon. We can compute the value of the mode
from the values of the mean and the median. (It can also be computed with a separate
formula, but that is not part of this module. We will simply reuse the values obtained
earlier.)

The formula is:

$$\text{Mode} = 3\,Md - 2\bar{X}$$

For the English test, the median is 33.1 and the mean is 32.22.

Therefore:

$$\text{Mode} = 3(33.1) - 2(32.22) = 99.3 - 64.44 = 34.86$$

For the Math test, the median is 73.75 and the mean is 73.

Therefore:

$$\text{Mode} = 3(73.75) - 2(73) = 221.25 - 146 = 75.25$$
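The mean-median relation for the true mode is a one-line computation. This Python sketch (hypothetical function name) reproduces both worked results from the means and medians obtained earlier.

```python
def empirical_mode(mean, median):
    """Approximate (true) mode from the relation Mode = 3(Median) - 2(Mean)."""
    return 3 * median - 2 * mean


print(round(empirical_mode(mean=32.22, median=33.1), 2))  # English test
print(round(empirical_mode(mean=73, median=73.75), 2))    # Math test
```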
Advantages of the Mode:
1. It gives a quick estimate of the central tendency.
2. It can be easily observed, especially when the data are presented graphically.

Disadvantages of the Mode:
1. It is very unreliable because only the score with the highest frequency is reported.
2. It cannot be used in other statistical techniques, aside from ranking frequencies.
MEASURES OF VARIABILITY

When we summarize and describe a set of scores, or a frequency distribution, we are also
interested in reporting how variable the scores are, or how much they spread out from high
to low. For example, two groups of children, both with a median age of 10 years, would
represent quite different educational situations if one had an age spread of 9 to 11 while
the other ranged from 6 to 14. Similarly, two groups of children may both have a mean
score of 75 on an achievement test, yet one group's scores range from 50 to 93 while the
other group's scores range from 60 to 82. A measure of this spread or dispersion is an
important statistic for describing a group.

RANGE

A very simple measure of variability is the score difference between the highest and the
lowest score; this measure is called the range of the distribution. If, in a reading test
for example, the highest score is 95 and the lowest is 45, the range is 50. However, the
range depends only on the 2 extreme scores in the total group. This makes it a very
unreliable measure, because it can be changed considerably by the inclusion or omission
of a single extreme case. The example below illustrates that the range of a set of scores
is affected by a single extreme score. The range for Group 1 is 50 while the range for
Group 2 is 70, yet the only difference between the groups is the lowest score.

Group 1 45 50 76 77 80 81 90 95
Group 2 25 50 76 77 80 81 90 95
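A quick Python check of the two ranges (the function name is our own):

```python
def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)


group1 = [45, 50, 76, 77, 80, 81, 90, 95]
group2 = [25, 50, 76, 77, 80, 81, 90, 95]
print(score_range(group1), score_range(group2))  # 50 70
```

Changing a single score (45 to 25) moves the range from 50 to 70, even though the other seven scores are identical.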

Think about this!

Can we use the range to compare the relative performance of two groups? Does a higher
value of the range indicate lower performance? Why or why not?

SEMI-INTERQUARTILE RANGE OR QUARTILE DEVIATION

Another measure of variability is the range of scores that includes a specified part of
the total group, usually the middle fifty percent. The middle fifty percent of the group
are the scores lying between the 25th and 75th percentiles. The 25th (Q1) and 75th (Q3)
percentiles are called quartiles since they cut off the bottom quarter and the top quarter
of the group, respectively. The score distance between them is called the interquartile
range. The statistic often reported as a measure of variability is the semi-interquartile
range (Q), which is half of the interquartile range.
So,

$$Q = \frac{Q_3 - Q_1}{2}$$

Finding First Quartile (Q1 ) and Third Quartile (Q3)

For Ungrouped scores

Example

56 57 63 75 78 79 80 82 87 89 90 92

Q1 position = (n + 1)/4, where n is the number of scores
= (12 + 1)/4
= 3.25

Q1 = 3rd score from the lowest + .25 (4th score – 3rd score)
= 63 + .25 (75 – 63)
= 66

Similarly,

Q3 position = 3(n + 1)/4
= 3(12 + 1)/4
= 9.75

Q3 = 9th score from the lowest + .75 (10th score – 9th score)
= 87 + .75 (89 – 87)
= 88.5

Therefore:

$$Q = \frac{Q_3 - Q_1}{2} = \frac{88.5 - 66}{2} = 11.25$$
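The position-and-interpolation steps above can be sketched in Python. The helper name `value_at_position` is hypothetical; it follows the lesson's (n + 1)/4 and 3(n + 1)/4 positions and assumes positions outside 1..n simply clamp to the lowest or highest score.

```python
def value_at_position(sorted_scores, pos):
    """Score at a 1-based position that may be fractional.

    For a position such as 3.25, take the 3rd score and add .25 of the gap
    between the 3rd and 4th scores, as in the worked example.
    """
    whole = int(pos)
    frac = pos - whole
    if whole < 1:
        return sorted_scores[0]
    if whole >= len(sorted_scores):
        return sorted_scores[-1]
    lower = sorted_scores[whole - 1]
    upper = sorted_scores[whole]
    return lower + frac * (upper - lower)


scores = sorted([56, 57, 63, 75, 78, 79, 80, 82, 87, 89, 90, 92])
n = len(scores)
q1 = value_at_position(scores, (n + 1) / 4)      # position 3.25
q3 = value_at_position(scores, 3 * (n + 1) / 4)  # position 9.75
print(q1, q3, (q3 - q1) / 2)                     # Q1, Q3 and Q
```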
For Grouped Scores

Here is a set of scores from a class of 50 students. The scores have already been
summarized into a frequency distribution.
X       f    cf
95-99   3    50
90-94   4    47
85-89   5    43
80-84   8    38
75-79   6    30   Q3
70-74   10   24
65-69   4    14
60-64   4    10   Q1
55-59   2    6
50-54   0    4
45-49   1    4
40-44   3    3

$$Q_1 = L_{Q_1} + i\left(\frac{\frac{N}{4} - cf}{f_{Q_1}}\right)$$

where: Q1 = the first quartile (25th percentile)
LQ1 = lower limit (lower class boundary) of the first quartile class (25th percentile class)
N = total number of frequencies in the distribution
cf = cumulative frequency of the class just below the first quartile class
fQ1 = frequency of the first quartile class
i = size of the interval of the first quartile class

Note: Just as in finding the median of a set of grouped scores, the first step is to get the
cumulative frequency. In the formula, N is divided by 4 (N/4) because we are looking for the
score at the middle of the lower half of the distribution. As with the median, the same
procedure is applied in finding the values of cf, f and the lowest limit.

So,

Q1 = 64.5 + 5 [(50/4 – 10)/4]
= 64.5 + 5 [(12.5 – 10)/4]
= 64.5 + 5 (2.5/4)
= 64.5 + 5 (.625)
= 64.5 + 3.125
= 67.625, or about 67.63

Similarly, we use the same formula in finding Q3. The only difference is that we take ¾ of N,
because we are looking for the score that lies at the middle of the top half of the distribution.

$$Q_3 = L_{Q_3} + i\left(\frac{\frac{3N}{4} - cf}{f_{Q_3}}\right)$$

where: Q3 = the third quartile (75th percentile)
LQ3 = lower limit (lower class boundary) of the third quartile class (75th percentile class)
N = total number of frequencies in the distribution
cf = cumulative frequency of the class just below the third quartile class
fQ3 = frequency of the third quartile class
i = size of the interval of the third quartile class

So,

Q3 = 79.5 + 5 [(37.5 – 30)/8]
= 79.5 + 5 (7.5/8)
= 79.5 + 5 (.9375)
= 79.5 + 4.6875
= 84.1875, or about 84.19

Therefore, the semi-interquartile range is

$$Q = \frac{Q_3 - Q_1}{2} = \frac{84.19 - 67.63}{2} = 8.28$$
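Both quartile formulas share one pattern, L + i × (fraction × N – cf)/f, so a single Python helper can compute them. This is a sketch under stated assumptions: the name `grouped_fractile` is hypothetical, classes are given as (lower limit, frequency) pairs from the lowest class up, and the lowest limit is taken as the lower limit minus 0.5.

```python
def grouped_fractile(classes, width, fraction):
    """Score below which `fraction` of the cases fall, for grouped data.

    classes: (lower_limit, frequency) pairs listed from the LOWEST class up.
    Uses L + i * (fraction * N - cf) / f, where L is the lower class boundary.
    """
    n = sum(f for _, f in classes)
    target = fraction * n
    cf = 0  # cumulative frequency of the classes below the current one
    for lower, f in classes:
        if f > 0 and cf + f >= target:
            return (lower - 0.5) + width * (target - cf) / f
        cf += f
    raise ValueError("fraction must be between 0 and 1")


scores = [(40, 3), (45, 1), (50, 0), (55, 2), (60, 4), (65, 4),
          (70, 10), (75, 6), (80, 8), (85, 5), (90, 4), (95, 3)]
q1 = grouped_fractile(scores, 5, 0.25)
q3 = grouped_fractile(scores, 5, 0.75)
print(q1, q3, (q3 - q1) / 2)  # unrounded Q1, Q3 and Q
```

The sketch keeps full precision (67.625 and 84.1875); the worked solution rounds these to two decimals.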

DECILES

The scores that divide the distribution into ten equal parts are called deciles. Just like
Q1 and Q3, a decile is computed by first determining the number of cases required. For
example, for the 1st decile (D1) we divide N by 10; we use 2N/10 for D2, 3N/10 for D3, and
so on.

For Ungrouped Data

Example. Find the 3rd Decile (D3) and the 8th Decile (D8).

56 57 63 75 78 79 80 82 87 89 90 92 95 96 97

D3 position = 3n/10
= 3(15)/10
= 4.5
If the answer is not a whole number, round it up to the next whole number; that is the
position of the decile. In this example, 4.5 rounds up to 5. Thus, the 3rd decile (D3) is
the 5th score = 78.

D8 position = 8n/10
= 8(15)/10
= 12

If the answer is a whole number, take the average of the score at that position and the
score that directly follows it. In this example, the 8th decile is the average of the 12th
and 13th scores, that is, (92 + 95)/2 = 93.5.
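The decile rule for ungrouped scores can be sketched in Python as follows. The function name is hypothetical, and it restricts k to 1 through 9, which is our assumption about the deciles of interest.

```python
import math


def ungrouped_decile(scores, k):
    """k-th decile of raw scores, following the lesson's rule.

    The position is k * n / 10 (counting from 1). A fractional position is
    rounded up; a whole-number position averages that score and the next one.
    """
    if not 1 <= k <= 9:
        raise ValueError("k must be between 1 and 9")
    s = sorted(scores)
    pos = k * len(s) / 10
    if pos != int(pos):
        return s[math.ceil(pos) - 1]
    p = int(pos)
    return (s[p - 1] + s[p]) / 2


data = [56, 57, 63, 75, 78, 79, 80, 82, 87, 89, 90, 92, 95, 96, 97]
print(ungrouped_decile(data, 3))  # position 4.5 -> 5th score
print(ungrouped_decile(data, 8))  # position 12 -> average of 12th and 13th scores
```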

For Grouped Scores

Just like in finding the quartiles of a set of grouped scores, the first step is to get the
cumulative frequency. The same procedure is applied in looking for the value of cf, f and the
lowest limit.

Example: Look for D2

X       f    cf
75-77   3    50
72-74   5    47
69-71   2    42
66-68   4    40
63-65   3    36
60-62   0    33
57-59   2    33
54-56   4    31
51-53   6    27
48-50   3    21
45-47   4    18
42-44   6    14
39-41   5    8
36-38   1    3
33-35   2    2
N = 50

2nd Decile or D2 = LD2 + i [(2N/10 – cf)/fD2]
= 41.5 + 3 [((.20 x 50) – 8)/6]
= 41.5 + 3 [(10 – 8)/6]
= 41.5 + 1.00
= 42.50
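The grouped decile formula can be sketched in Python as below, assuming the same input layout as before: (lower limit, frequency) pairs from the lowest class up. The function name is hypothetical.

```python
def grouped_decile(classes, width, k):
    """k-th decile of grouped data: L + i * (kN/10 - cf) / f.

    classes: (lower_limit, frequency) pairs listed from the LOWEST class up.
    L is the lower class boundary (lower limit - 0.5) of the decile class, and
    cf is the cumulative frequency of the classes below it.
    """
    n = sum(f for _, f in classes)
    target = k * n / 10
    cf = 0
    for lower, f in classes:
        if f > 0 and cf + f >= target:
            return (lower - 0.5) + width * (target - cf) / f
        cf += f
    raise ValueError("k must be between 1 and 10")


table = [(33, 2), (36, 1), (39, 5), (42, 6), (45, 4), (48, 3), (51, 6), (54, 4),
         (57, 2), (60, 0), (63, 3), (66, 4), (69, 2), (72, 5), (75, 3)]
print(grouped_decile(table, 3, 2))  # D2 = 41.5 + 3 * (10 - 8) / 6
```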
PERCENTILES

The same procedure may be used to find the score below which any given percentage of
the group falls. These values are called percentiles. The median is the 50th percentile,
i.e., the score below which 50 percent of individuals fall. If we want to find the 40th
percentile, we must find the score below which 40 percent of the cases fall. Any other
percentile can be found in the same way. Percentiles have many uses, especially in
connection with test norms and the interpretation of scores.

For Ungrouped Data

Example. Find the 45th Percentile (P45) and the 60th Percentile (P60).

56 57 63 75 78 79 80 82 87 89 90 92 95 96 97

P45 position = 45n/100
= 45(15)/100
= 6.75

The procedure for finding the deciles of ungrouped scores also applies to the percentiles
of ungrouped scores. That is, if the answer is not a whole number, round it up to the next
whole number; that is the position of the percentile. In this example, 6.75 rounds up to 7.
Thus, the 45th percentile (P45) is the 7th score = 80.

P60 position = 60n/100
= 60(15)/100
= 9

If the answer is a whole number, take the average of the score at that position and the
score that directly follows it. In this example, the 60th percentile is the average of the
9th and 10th scores, that is, (87 + 89)/2 = 88.
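The same rule, written for percentiles, as a short Python sketch. The function name is hypothetical, and k is restricted to 1 through 99 by assumption.

```python
import math


def ungrouped_percentile(scores, k):
    """k-th percentile of raw scores by the same rule used for deciles.

    The position is k * n / 100 (counting from 1); a fractional position is
    rounded up, and a whole-number position averages that score and the next.
    """
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    s = sorted(scores)
    pos = k * len(s) / 100
    if pos != int(pos):
        return s[math.ceil(pos) - 1]
    p = int(pos)
    return (s[p - 1] + s[p]) / 2


data = [56, 57, 63, 75, 78, 79, 80, 82, 87, 89, 90, 92, 95, 96, 97]
print(ungrouped_percentile(data, 45))  # position 6.75 -> 7th score
print(ungrouped_percentile(data, 60))  # position 9 -> average of 9th and 10th scores
```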

For Grouped Scores

Just like in finding the quartiles and deciles of a set of grouped scores, the first step is to
get the cumulative frequency. The same procedure is applied in looking for the value of cf, f and
the lowest limit.
Example: Find P20
X f cf
75-77 3 50
72-74 5 47
69-71 2 42
66-68 4 40
63-65 3 36
60-62 0 33
57-59 2 33
54-56 4 31
51-53 6 27
48-50 3 21
45-47 4 18
42-44 6 14 P20
39-41 5 8
36-38 1 3
33-35 2 2
N=50

We are looking for the 20th percentile, the score below which 20 percent of the cases
fall. Then

20th percentile or P20 = LP20 + i [(20%N – cf)/fP20]
= 41.5 + 3 [((.20 x 50) – 8)/6]
= 41.5 + 3 [(10 – 8)/6]
= 41.5 + 1.00
= 42.50

Note: The values of P20 and D2 are the same because in either measure we are looking
for the score below which 20 percent of the cases fall.
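A Python sketch of the grouped percentile formula, with a hypothetical function name and the same (lower limit, frequency) layout, lowest class first. Besides P20, it computes P50, which by definition is the median of this distribution.

```python
def grouped_percentile(classes, width, k):
    """k-th percentile of grouped data: L + i * (k% of N - cf) / f.

    classes: (lower_limit, frequency) pairs listed from the LOWEST class up.
    """
    n = sum(f for _, f in classes)
    target = k * n / 100
    cf = 0  # cumulative frequency of the classes below the current one
    for lower, f in classes:
        if f > 0 and cf + f >= target:
            return (lower - 0.5) + width * (target - cf) / f
        cf += f
    raise ValueError("k must be between 1 and 100")


table = [(33, 2), (36, 1), (39, 5), (42, 6), (45, 4), (48, 3), (51, 6), (54, 4),
         (57, 2), (60, 0), (63, 3), (66, 4), (69, 2), (72, 5), (75, 3)]
print(grouped_percentile(table, 3, 20))  # same class and arithmetic as D2
print(grouped_percentile(table, 3, 50))  # the 50th percentile, i.e. the median
```

For k = 50 the target becomes N/2, so this is exactly the grouped median formula.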

The Variance (Ungrouped Data)

The variance is a measure of variability that uses all the scores in the distribution rather
than only the extreme scores or a proportion of the scores. It considers each observation
relative to the mean of the set of scores. It is obtained by dividing the sum of the squared
deviations from the mean by n – 1 (for the sample variance) or by N (for the population
variance).
Sample variance (s²):

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\sum d^2}{n - 1}$$

where:
d = X – X̄ = deviation from the mean
d² = squared deviation
∑d² = sum of the squared deviations

Example 1. Compute the variance of the following Algebra scores of ten students:
92 75 85 83 90 73 79 80 88 85

Steps:
1. Find the mean (Mean = 83).
2. Subtract the mean from each score to get d (i.e., 92 – 83 = 9; 75 – 83 = –8, etc.).
3. Square each deviation (i.e., 9² = 81, (–8)² = 64, etc.).
4. Find the sum of the squared deviations (∑d² = 352).
5. Divide the sum of the squared deviations by n – 1 = 9.

Score   d     d²
92      +9    81
75      -8    64
85      +2    4
83      0     0
90      +7    49
73      -10   100
79      -4    16
80      -3    9
88      +5    25
85      +2    4
N = 10        ∑d² = 352
Mean = 83

$$s^2 = \frac{352}{9} = 39.11$$
Example 2. Compute the variance of the following Geometry scores of ten students:
92 95 75 63 45 87 99 90 98 86

Score   d     d²
92      9     81
95      12    144
75      -8    64
63      -20   400
45      -38   1444
87      4     16
99      16    256
90      7     49
98      15    225
86      3     9
N = 10        ∑d² = 2,688
Mean = 83

$$s^2 = \frac{2{,}688}{9} = 298.67$$
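The deviation method for the sample variance, as a minimal Python sketch. The function name is hypothetical; `statistics.variance` from the standard library also divides by n – 1 and is printed alongside as a cross-check.

```python
import statistics


def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(scores)
    m = sum(scores) / n
    return sum((x - m) ** 2 for x in scores) / (n - 1)


algebra = [92, 75, 85, 83, 90, 73, 79, 80, 88, 85]
geometry = [92, 95, 75, 63, 45, 87, 99, 90, 98, 86]
for name, data in [("Algebra", algebra), ("Geometry", geometry)]:
    # statistics.variance also divides by n - 1, so both columns should agree
    print(name, round(sample_variance(data), 2), round(statistics.variance(data), 2))
```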

The Standard Deviation (Ungrouped Data)

The standard deviation gives a better idea of how the data entries differ from the mean. It
is computed by taking the square root of the variance. The formula for the sample standard
deviation is:

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \quad \text{or} \quad s = \sqrt{s^2}$$

Thus, in Example 1 (Algebra scores), the sample standard deviation is
$s = \sqrt{39.11} = 6.25$. In Example 2 (Geometry scores), the sample standard deviation is
$s = \sqrt{298.67} = 17.28$.

So how do we interpret standard deviations of 6.25 and 17.28? For the Algebra scores, it
means that, roughly speaking, a typical score lies about 6.25 points from the mean. For the
Geometry scores, a typical score lies about 17.28 points from the mean. Theoretically, the
standard deviation and the variance describe how scattered the scores are around a central
point (the mean). In layman's terms, the higher the standard deviation or variance, the more the
scores scatter from the mean; that is, the distances of the scores from the mean are larger. In
the two examples, the means are the same (Mean = 83) and the numbers of scores are the same
(n = 10), but the Geometry scores are farther from each other and from the mean than the
Algebra scores are.
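The comparison between the two subjects can be reproduced with `statistics.stdev` from Python's standard library, which returns the square root of the sample variance. This is an illustrative check added here, not the lesson's own procedure.

```python
import statistics

algebra = [92, 75, 85, 83, 90, 73, 79, 80, 88, 85]
geometry = [92, 95, 75, 63, 45, 87, 99, 90, 98, 86]

for name, scores in [("Algebra", algebra), ("Geometry", geometry)]:
    mean = statistics.mean(scores)  # both means equal 83
    s = statistics.stdev(scores)    # square root of the sample variance
    print(f"{name}: mean = {mean:.0f}, s = {s:.2f}")
# Algebra: mean = 83, s = 6.25
# Geometry: mean = 83, s = 17.28
```

Same mean, same n, but a standard deviation almost three times as large: that is the numerical form of "the Geometry scores are more scattered."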

An alternative way of computing the standard deviation is to use the sum of all the
scores and the sum of their squares. The formula is:

$$ s = \sqrt{\frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n(n - 1)}} $$

where: Xi = the ith observed value of the variable X
n = sample size
Using Example 1

Xi (scores)     Xi²
 92            8,464
 75            5,625
 85            7,225
 83            6,889
 90            8,100
 73            5,329
 79            6,241
 80            6,400
 88            7,744
 85            7,225
∑Xi = 830      ∑Xi² = 69,242

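Steps:
1. Get the sum of the scores (∑Xi).
2. Square all the scores and get the sum of the squares (∑Xi²).
3. Substitute the sums into the formula.

$$ s = \sqrt{\frac{10(69{,}242) - (830)^2}{10(10 - 1)}} = \sqrt{\frac{692{,}420 - 688{,}900}{90}} = \sqrt{\frac{3{,}520}{90}} = \sqrt{39.11} = 6.25 $$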

In this illustration, the variance is s² = 3,520/90 = 39.11 (the square of the unrounded s = 6.2539).
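The three steps of this raw-score method translate directly into code. The sketch below is an illustration under the assumption that the scores are held in a plain Python list; the function name `sd_raw_score` is hypothetical.

```python
import math


# Illustrative sketch; sd_raw_score is a hypothetical helper, not from the lesson.
def sd_raw_score(scores):
    """Sample SD from the sum of scores and the sum of squared scores:
    s = sqrt((n * sum(X^2) - (sum X)^2) / (n * (n - 1)))."""
    n = len(scores)
    sum_x = sum(scores)                    # Step 1: sum of the scores
    sum_x2 = sum(x * x for x in scores)    # Step 2: sum of the squared scores
    numerator = n * sum_x2 - sum_x ** 2    # Step 3: substitute into the formula
    return math.sqrt(numerator / (n * (n - 1)))


algebra = [92, 75, 85, 83, 90, 73, 79, 80, 88, 85]
print(sum(algebra), sum(x * x for x in algebra))  # 830 69242
print(round(sd_raw_score(algebra), 2))            # 6.25
```

This form needs only two running totals, which is why it is convenient when scores are tallied one at a time.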


Using Example 2

Xi (scores)     Xi²
 92            8,464
 95            9,025
 75            5,625
 63            3,969
 45            2,025
 87            7,569
 99            9,801
 90            8,100
 98            9,604
 86            7,396
∑Xi = 830      ∑Xi² = 71,578

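$$ s = \sqrt{\frac{10(71{,}578) - (830)^2}{10(10 - 1)}} = \sqrt{\frac{715{,}780 - 688{,}900}{90}} = \sqrt{\frac{26{,}880}{90}} = \sqrt{298.67} = 17.28 $$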

In this illustration, the variance is s² = 26,880/90 = 298.67 (the square of the unrounded s = 17.2820).
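The deviation method and the raw-score method are algebraically the same formula, so they must give identical variances. A sketch using Python's `fractions.Fraction` (exact rational arithmetic, so no rounding enters) makes this visible for the Geometry scores; the two function names are hypothetical.

```python
from fractions import Fraction


# Illustrative sketch; both helpers are hypothetical names, not from the lesson.
def variance_deviation(scores):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(scores)
    mean = Fraction(sum(scores), n)
    return sum((x - mean) ** 2 for x in scores) / (n - 1)


def variance_raw_score(scores):
    """(n * sum X^2 - (sum X)^2) / (n * (n - 1))."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return Fraction(n * sum_x2 - sum_x ** 2, n * (n - 1))


geometry = [92, 95, 75, 63, 45, 87, 99, 90, 98, 86]
print(variance_deviation(geometry))  # 896/3
print(variance_raw_score(geometry))  # 896/3
```

896/3 is 2,688/9 = 26,880/90 in lowest terms, that is, 298.67 to two decimal places.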

Finding the Standard Deviation and Variance of Grouped Data

The standard deviation and variance of grouped data can be calculated either from the class
marks of the step intervals or from coded deviations.

1. Finding the SD using the class marks. The formula is:

$$ s = \sqrt{\frac{n\sum f(CM)^2 - \left(\sum f(CM)\right)^2}{n(n - 1)}} $$

where: n = number of scores (the total frequency, ∑f)
f = frequency (number of observations in each class interval)
CM = class mark (the midpoint of each class interval)
∑f(CM)² = summation of frequency x class mark squared
∑f(CM) = summation of frequency x class mark
Illustration:

X        f    Class Mark (CM)    f x CM    f x (CM)²
75-77    3         76              228       17,328
72-74    4         73              292       21,316
69-71    6         70              420       29,400
66-68    5         67              335       22,445
63-65    8         64              512       32,768
60-62    9         61              549       33,489
57-59    5         58              290       16,820
54-56    8         55              440       24,200
51-53    3         52              156        8,112
48-50    2         49               98        4,802
45-47    2         46               92        4,232
n = 55                     ∑f(CM) = 3,412    ∑f(CM)² = 214,912

For the first interval, CM = (75 + 77)/2 = 76, f x CM = 3 x 76 = 228, and f x (CM)² = 3 x 76² = 17,328.

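$$ s = \sqrt{\frac{55(214{,}912) - (3{,}412)^2}{55(55 - 1)}} = \sqrt{\frac{11{,}820{,}160 - 11{,}641{,}744}{2{,}970}} = \sqrt{\frac{178{,}416}{2{,}970}} = \sqrt{60.07} = 7.75 $$

The variance (s²) of this data set is 178,416/2,970 = 60.07.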
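The class-mark computation can be sketched in Python as follows. The representation of the frequency table as `(lower limit, upper limit, frequency)` tuples and the function name `grouped_sd_class_marks` are assumptions made for this illustration.

```python
import math

# Assumed representation of the illustration's frequency table:
# (lower limit, upper limit, frequency) for each class interval.
classes = [
    (75, 77, 3), (72, 74, 4), (69, 71, 6), (66, 68, 5), (63, 65, 8),
    (60, 62, 9), (57, 59, 5), (54, 56, 8), (51, 53, 3), (48, 50, 2),
    (45, 47, 2),
]


# Illustrative sketch; grouped_sd_class_marks is a hypothetical helper.
def grouped_sd_class_marks(classes):
    """Sample SD of grouped data from class marks:
    s = sqrt((n * sum f*CM^2 - (sum f*CM)^2) / (n * (n - 1)))."""
    n = sum(f for _, _, f in classes)  # total frequency
    sum_f_cm = 0.0
    sum_f_cm2 = 0.0
    for lower, upper, f in classes:
        cm = (lower + upper) / 2       # class mark (midpoint of the interval)
        sum_f_cm += f * cm
        sum_f_cm2 += f * cm ** 2
    variance = (n * sum_f_cm2 - sum_f_cm ** 2) / (n * (n - 1))
    return n, sum_f_cm, sum_f_cm2, variance, math.sqrt(variance)


n, sum_f_cm, sum_f_cm2, s2, s = grouped_sd_class_marks(classes)
print(n, sum_f_cm, sum_f_cm2)          # 55 3412.0 214912.0
print(f"s = {s:.2f}, s^2 = {s2:.2f}")  # s = 7.75, s^2 = 60.07
```

Note that grouping treats every score in an interval as if it sat at the class mark, so the result describes the grouped table rather than the original raw scores.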

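2. Finding the SD using the deviations. The formula is:

$$ s = i\sqrt{\frac{n\sum fd^2 - \left(\sum fd\right)^2}{n(n - 1)}} $$

where: i = size of the class interval
n = number of scores (the total frequency)
∑fd = summation of frequency x deviation
∑fd² = summation of frequency x squared deviation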
Steps:
1. Choose any step interval to serve as the assumed mean, the arbitrary starting point or
“origin”. In the example given, the interval 60-62 has been chosen. Give this interval a
deviation of zero, the next higher interval +1, the next lower interval -1, and so on. These
values are shown in the column labeled d. (Note: any interval can be chosen, and the final
result will be the same.)
2. Multiply each frequency (f) by its deviation (d); the products are shown in the column
labeled fd. Get the sum of fd, taking the plus and minus signs into account.
3. To get fd², multiply d by fd. Then get the sum of fd².
Illustration:

X        f     d     fd     fd²
75-77    3     5     15      75
72-74    4     4     16      64
69-71    6     3     18      54
66-68    5     2     10      20
63-65    8     1      8       8    (positive fd total: +67)
60-62    9     0      0       0
57-59    5    -1     -5       5
54-56    8    -2    -16      32
51-53    3    -3     -9      27
48-50    2    -4     -8      32
45-47    2    -5    -10      50    (negative fd total: -48)
n = 55        ∑fd = +19     ∑fd² = 367

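Here i = 3, since each interval (e.g., 75-77) covers three score values:

$$ s = 3\sqrt{\frac{55(367) - (19)^2}{55(55 - 1)}} = 3\sqrt{\frac{20{,}185 - 361}{2{,}970}} = 3\sqrt{\frac{19{,}824}{2{,}970}} = 3\sqrt{6.6747} = 3(2.5835) = 7.75 $$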
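The coded-deviation steps can be sketched in Python as below. The table representation (intervals listed from highest to lowest as `(lower, upper, frequency)`) and the function name `grouped_sd_coded` are assumptions for this illustration. The second call checks the lesson's note that any interval may serve as the origin.

```python
import math

# Assumed representation: (lower limit, upper limit, frequency),
# listed from the highest interval down, as in the illustration.
classes = [
    (75, 77, 3), (72, 74, 4), (69, 71, 6), (66, 68, 5), (63, 65, 8),
    (60, 62, 9), (57, 59, 5), (54, 56, 8), (51, 53, 3), (48, 50, 2),
    (45, 47, 2),
]


# Illustrative sketch; grouped_sd_coded is a hypothetical helper.
def grouped_sd_coded(classes, origin_index, width):
    """Sample SD of grouped data by coded deviations:
    s = i * sqrt((n * sum fd^2 - (sum fd)^2) / (n * (n - 1))).
    origin_index selects the interval used as the assumed mean (d = 0);
    width is the class interval size i."""
    n = sum(f for _, _, f in classes)
    sum_fd = 0
    sum_fd2 = 0
    for index, (_, _, f) in enumerate(classes):
        d = origin_index - index  # +1 just above the origin, -1 just below
        fd = f * d
        sum_fd += fd
        sum_fd2 += fd * d         # fd^2 column: fd multiplied by d
    s = width * math.sqrt((n * sum_fd2 - sum_fd ** 2) / (n * (n - 1)))
    return sum_fd, sum_fd2, s


# Origin at 60-62 (index 5), as in the illustration; interval size i = 3.
sum_fd, sum_fd2, s = grouped_sd_coded(classes, origin_index=5, width=3)
print(f"sum fd = {sum_fd}, sum fd^2 = {sum_fd2}, s = {s:.2f}")

# A different origin, e.g. the top interval 75-77 (index 0), gives the same s.
_, _, s_top = grouped_sd_coded(classes, origin_index=0, width=3)
print(f"s with origin at 75-77 = {s_top:.2f}")
```

Both calls print s = 7.75, the same value as the class-mark method, because shifting every deviation by a constant does not change the spread.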
The Coefficient of Variation

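The coefficient of variation (CV) expresses the standard deviation as a percentage of the
mean, so it can be used to compare the relative variability of two sets of data. The formula is:

$$ CV = \frac{s}{\bar{X}} \times 100\% $$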

Using Example 1 (Algebra scores), the standard deviation (s) = 6.25 and Mean = 83:

$$ CV = \frac{6.25}{83} \times 100\% = 7.53\% $$

The computed CV of 7.53% indicates that the variability, or the degree of difference, among
the Algebra scores is relatively low (the scores are close to each other).

Using Example 2 (Geometry scores), the standard deviation (s) = 17.28 and Mean = 83:

$$ CV = \frac{17.28}{83} \times 100\% = 20.82\% $$

The computed CV of 20.82% indicates that the variability of the scores is relatively
higher than that of the data set in Example 1. This means that the Geometry scores
fluctuate more than the Algebra scores, or that the Geometry scores are more variable
than the Algebra scores.
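The CV comparison can be sketched with Python's `statistics` module. This is an illustration, not the lesson's procedure, and `coefficient_of_variation` is a hypothetical name. Because the sketch uses the unrounded standard deviations, its percentages can differ from the hand computation in later decimal places, though both round to the same two-decimal values.

```python
import statistics


# Illustrative sketch; coefficient_of_variation is a hypothetical helper.
def coefficient_of_variation(scores):
    """CV = (s / mean) x 100%, using the sample standard deviation."""
    return statistics.stdev(scores) / statistics.mean(scores) * 100


algebra = [92, 75, 85, 83, 90, 73, 79, 80, 88, 85]
geometry = [92, 95, 75, 63, 45, 87, 99, 90, 98, 86]

print(f"Algebra CV = {coefficient_of_variation(algebra):.2f}%")    # 7.53%
print(f"Geometry CV = {coefficient_of_variation(geometry):.2f}%")  # 20.82%
```

Because CV is a ratio with no units, it remains meaningful even when two data sets have different means or are measured on different scales, which is where it is most useful.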
