Interpreting Test Scores: UNIT-8
INTERPRETING TEST SCORES
Written By:
Muhammad Azeem
Reviewed By:
Dr. Muhammad Tanveer Afzal
CONTENT
Sr. No  Topic
Introduction
Objectives
8.1 Introduction of Measurement Scales and Interpretation of Test Scores
8.2 Interpreting Test Scores by Percentiles
8.3 Interpreting Test Scores by Percentages
8.4 Interpreting Test Scores by Ordering and Ranking
    8.4.1 Measurement Scales
        8.4.1.1 Nominal Scale
        8.4.1.2 Ordinal Scale
        8.4.1.3 Interval Scale
        8.4.1.4 Ratio Scale
8.5 Frequency Distribution
    8.5.1 Frequency Distribution Tables
8.6 Interpreting Test Scores by Graphic Displays of Distributions
8.7 Measures of Central Tendency
    8.7.1 Mean
    8.7.2 Median
    8.7.3 Mode
8.8 Measures of Variability
    8.8.1 Range
    8.8.2 Mean Deviation
    8.8.3 Variance
    8.8.4 Standard Deviation
8.9 Estimation
8.10 Planning the Test
8.11 Constructing and Assembling the Test
8.12 Test Administration
8.13 Self Assessment Questions
8.14 References and Suggested Readings
INTRODUCTION
Raw scores are the points a student earns on a test when it is scored according to the set procedure
or marking rubric. These points are not meaningful without interpretation or further information.
Criterion-referenced interpretation of test scores describes students' scores with respect to certain criteria,
while norm-referenced interpretation describes a student's score relative to the scores of the other test takers.
Test results are generally reported to parents as feedback on their children's learning achievements.
Parents have different academic backgrounds, so results should be presented to them in an understandable and
usable way. Among various objectives, three of the fundamental purposes of testing are (1) to portray
each student's developmental level within a test area, (2) to identify a student's relative strengths and
weaknesses in subject areas, and (3) to monitor learning of the basic skills over time. To achieve any one
of these purposes, it is important to select, from among the types of score reported, the one that will permit the
proper interpretation. Scores such as percentile ranks, grade equivalents, and percentage scores differ
from one another in the purposes they can serve, the precision with which they describe achievement, and
the kind of information they provide. A closer look at the various types of scores will help differentiate the
functions they can serve and the interpretations they can convey.
OBJECTIVES
After completing this unit, the students will be able to:
explain what test scores are
describe the measurement scales used for test scores
describe the ways of interpreting test scores
judge the accuracy of test scores
explain the meaning of test scores
interpret test scores
assess the usability of test scores
explain basic and significant concepts of statistics
understand and use measures of central tendency in educational measurement
understand and use measures of variation in educational measurement
plan and administer a test
8.1 Introduction of Measurement Scales and Interpretation of Test Scores
Interpreting Test Scores
All research data, test result data, survey data, etc. are called raw data and are collected using four
basic scales: nominal, ordinal, interval and ratio. Ratio is more
sophisticated than interval, interval is more sophisticated than ordinal, and ordinal is more sophisticated
than nominal. A variable measured on a "nominal" scale is a variable that does not really have any
evaluative distinction. One value is really not any greater than another. A good example of a nominal
variable is gender. With nominal variables, there is a qualitative difference between values, not a
quantitative one. Something measured on an "ordinal" scale does have an evaluative connotation. One
value is greater or larger or better than the other. With ordinal scales, we only know that one value is
better than another, for example that 10 is better than 9. A variable measured on an interval or ratio scale has the maximum
evaluative distinction. After the collection of data, there are three basic ways to compare and interpret
the results obtained from responses. Students' performance can be compared and interpreted with an absolute
standard, with a criterion-referenced standard, or with a norm-referenced standard. Some examples from
daily life and the educational context may make this clear:
Sr. No. | Standard | Characteristics | Daily life | Educational context
1 | Absolute | Simply states the observed outcome. | He is 6' 2" tall. | He spelled 45 out of 50 English words correctly.
2 | Criterion-referenced | Compares the person's performance with a standard, or criterion. | He is tall enough to catch the branch of this tree. | His score of 40 out of 50 is greater than the minimum cutoff point of 33, so he must be promoted to the next class.
3 | Norm-referenced | Compares a person's performance with that of other people in the same context. | He is the third fastest bowler in the Pakistani squad of 15. | His score of 37 out of 50 was not very good; 65% of his class fellows did better.
All three types of score interpretation are useful, depending on the purpose for which the comparison is made.
An absolute score merely describes a measure of performance or achievement without comparing it with
any set or specified standard. Scores are not particularly useful without any kind of comparison.
Criterion-referenced scores compare test performance with a specific standard; such a comparison enables
the test interpreter to decide whether the scores are satisfactory according to established standards. Norm-
referenced tests compare test performance with that of others who were measured by the same procedure.
Teachers are usually more interested in knowing how children compare with a useful standard than how
they compare with other children; but norm-referenced comparisons may also provide useful insights.
8.2 Interpreting Test Scores by Percentiles
For example, a score at the 60th percentile means that the individual's score is the same as or higher than
the scores of 60% of those who took the test. The 50th percentile is known as the median and represents
the middle score of the distribution.
Percentiles have the disadvantage that they are not equal units of measurement. For instance, a difference
of 5 percentile points between two individuals' scores will have a different meaning depending on its
position on the percentile scale, as the scale tends to exaggerate differences near the mean and collapse
differences at the extremes.
Percentiles cannot be averaged nor treated in any other way mathematically. However, they do have the
advantage of being easily understood and can be very useful when giving feedback to candidates or
reporting results to managers.
If you know your percentile score then you know how it compares with others in the norm group. For
example, if you scored at the 70th percentile, then this means that you scored the same or better than 70%
of the individuals in the norm group.
Percentile scores are especially useful when raw scores tend to bunch up around the average of the group, i.e. when
most of the students are of the same ability and their scores fall within a very small range.
To illustrate this point, consider a typical subject test consisting of 50 questions. Most of the students,
who are a fairly similar group in terms of their ability, will score around 40. Some will score a few less
and some a few more. It is very unlikely that any of them will score less than 35 or more than 45.
Reporting these results as raw achievement scores is a very poor way of analyzing them, because the scores differ
so little. Percentile scores, however, show clearly where each student stands within the group.
Definition
A percentile is a measure that tells us what percent of the total frequency scored at or below that
measure. A percentile rank is the percentage of scores that fall at or below a given score. OR
A percentile is a measure that tells us what percent of the total frequency scored below that
measure. A percentile rank is the percentage of scores that fall below a given score.
The two definitions seem to be the same, but statistically they are not. For example:
Example No.1
If Aslam stands 25th out of a class of 150 students, then 125 students were ranked below Aslam.
Formula:
To find the percentile rank of a score, x, out of a set of n scores, where x is
included:
$$\text{percentile rank} = \frac{B + 0.5E}{n} \times 100$$
Where B = number of scores below x
E = number of scores equal to x
n = number of scores
Using this formula with B = 125, n = 150 and E = 1 (taking Aslam's own score as the only score equal to x), Aslam's percentile rank would be:
$$\frac{125 + 0.5(1)}{150} \times 100 = 83.67 \approx 84\text{th percentile}$$
Formula:
To find the percentile rank of a score, x, out of a set of n scores, where x is not included:
$$\text{percentile rank} = \frac{\text{number of scores below } x}{n} \times 100$$
Using this formula, Aslam's percentile rank would be:
$$\frac{125}{150} \times 100 = 83.33 \approx 83\text{rd percentile}$$
Therefore the two definitions yield different percentile ranks. This difference is significant only for small
data sets. If we have the raw data, then we can find the percentile rank using either formula.
Example No.2
The science test scores are: 50, 65, 70, 72, 72, 78, 80, 82, 84, 84, 85, 86, 88, 88, 90, 94, 96, 98, 98,
99. Find the percentile rank for a score of 84 on this test.
Solution:
First rank the scores in ascending order:
50, 65, 70, 72, 72, 78, 80, 82, 84, |84, 85, 86, 88, 88, 90, 94, 96, 98, 98, 99
Since there are 2 values equal to 84, assign one to the group "above 84" and the other to the group "below
84". The group "below 84" then holds 9 of the 20 scores, so the percentile rank is 9/20 × 100 = 45, i.e. the 45th percentile.
Example No.3
The science test scores are: 50, 65, 70, 72, 72, 78, 80, 82, 84, 84, 85, 86, 88, 88, 90, 94, 96, 98, 98,
99. Find the percentile rank for a score of 86 on this test.
Solution:
First rank the scores in ascending or descending order
Since there is only one value equal to 86, it will be counted as "half" of a data value for the group "above
86" as well as the group "below 86".
Solution Using Formula:
$$\text{percentile rank} = \frac{B + 0.5E}{n} \times 100 = \frac{11 + 0.5(1)}{20} \times 100 = \frac{11.5}{20} \times 100 = 57.5 \approx 58\text{th percentile}$$
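The formula used in Examples 2 and 3 can be checked with a short Python sketch. The function name `percentile_rank` is only an illustrative choice, not something defined in this unit; the scores are the science test scores given above.

```python
def percentile_rank(scores, x):
    """Percentile rank of x using (B + 0.5E) / n * 100."""
    below = sum(1 for s in scores if s < x)    # B: number of scores below x
    equal = sum(1 for s in scores if s == x)   # E: number of scores equal to x
    return 100 * (below + 0.5 * equal) / len(scores)


science_scores = [50, 65, 70, 72, 72, 78, 80, 82, 84, 84,
                  85, 86, 88, 88, 90, 94, 96, 98, 98, 99]

rank_84 = percentile_rank(science_scores, 84)   # B = 8, E = 2
rank_86 = percentile_rank(science_scores, 86)   # B = 11, E = 1
print(rank_84)  # 45.0
print(rank_86)  # 57.5
```

The value 57.5 is then written to the nearest whole percent, which gives the 58th percentile.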
Keep in Mind:
Percentile rank is a number between 0 and 100 indicating the percent of cases falling at or below
that score.
Percentile ranks are usually written to the nearest whole percent: 64.5% ≈ 65% = 65th percentile
Scores are divided into 100 equally sized groups.
Scores are arranged in rank order from lowest to highest.
There is no 0 percentile rank - the lowest score is at the first percentile.
There is no 100th percentile - the highest score is at the 99th percentile.
Percentiles have the disadvantage that they are not equal units of measurement.
Percentiles cannot be averaged nor treated in any other way mathematically.
You cannot perform the same mathematical operations on percentiles that you can on raw
scores. You cannot, for example, compute the mean of percentile scores, as the results may be
misleading.
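Quartiles can be thought of as percentile measures. Remember that quartiles break the data set
into 4 equal parts. If 100% is broken into four equal parts, we have subdivisions at 25%, 50%,
and 75%, creating the first quartile (25th percentile), the second quartile or median (50th percentile) and the third quartile (75th percentile).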
8.3 Interpreting Test Scores by Percentages
Example:
The marks detail of Hussan's math test is shown. Find the percentage marks of Hussan.
Question          Q1   Q2   Q3   Q4   Q5   Total
Marks             10   10    5    5   20    50
Marks obtained     8    5    2    3   10    28
Solution:
Hussan's marks = 28
Total marks = 50
$$\text{Hussan got} = \frac{\text{Marks Obtained}}{\text{Total Marks}} \times 100 = \frac{28}{50} \times 100 = 56\%$$
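The same percentage can be worked out from the question-wise marks. This is a minimal Python sketch; the dictionary names are illustrative only.

```python
# Maximum marks and marks obtained per question, as in Hussan's test.
max_marks = {"Q1": 10, "Q2": 10, "Q3": 5, "Q4": 5, "Q5": 20}
obtained = {"Q1": 8, "Q2": 5, "Q3": 2, "Q4": 3, "Q5": 10}

total_marks = sum(max_marks.values())     # 50
marks_obtained = sum(obtained.values())   # 28

percentage = 100 * marks_obtained / total_marks
print(percentage)  # 56.0
```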
For example, a number can be used merely to label or categorize a response. This sort of number
(nominal scale) has a low level of meaning. A higher level of meaning comes with numbers that order
responses (ordinal data). An even higher level of meaning (interval or ratio data) is present when numbers
attempt to present exact scores, such as when we state that a person got 17 correct out of 20. Although
even the lowest scale is useful, higher level scales give more precise information and are more easily
adapted to many statistical procedures.
Scores can be summarized by using either the mode (most frequent score), the median (midpoint of the
scores), or the mean (arithmetic average) to indicate typical performance. When reporting data, you
should choose the measure of central tendency that gives the most accurate picture of what is typical in a
set of scores. In addition, it is possible to report the standard deviation to indicate the spread of the scores
around the mean.
Scores from measurement processes can be either absolute, criterion referenced, or norm referenced. An
absolute score simply states a measure of performance without comparing it with any standard. However,
scores are not particularly useful unless they are compared with something. Criterion-referenced scores
compare test performance with a specific standard; such a comparison enables the test interpreter to
decide whether the scores are satisfactory according to established standards. Norm-referenced tests
compare test performance with that of others who were measured by the same procedure. Teachers are
usually more interested in knowing how children compare with a useful standard than how they compare
with other children; but norm referenced comparisons may also provide useful insights.
Criterion-referenced scores are easy to understand because they are usually straightforward raw scores or
percentages. Norm-referenced scores are often converted to percentiles or other derived standard scores.
A student's percentile score on a test indicates what percentage of other students who took the same test
fell below that student's score. Derived scores are often based on the normal curve. They use an arbitrary
mean to make comparisons showing how respondents compare with other persons who took the same
test.
Nominal Data
classification or categorization of data, e.g. male or female
no ordering, e.g. it makes no sense to state that male is greater than female (M > F)
arbitrary labels, e.g. pass=1 and fail=2
Ordinal Data
ordered, but we only know that one value is better than another, not by how much
e.g., position in class (1st, 2nd, 3rd)
Interval Data
ordered, constant scale, but no natural zero
differences make sense, but ratios do not (e.g., 30°-20°=20°-10°, but 20° is not twice as hot as 10°)
e.g., temperature (C, F), dates
Ratio Data
ordered, constant scale, natural zero
e.g., height, weight, age, length
One can think of nominal, ordinal, interval, and ratio as being ranked in their relation to one another.
Ratio is more sophisticated than interval, interval is more sophisticated than ordinal, and ordinal is more
sophisticated than nominal.
Distribution
The distribution of a variable is the pattern of frequencies of the observations.
Frequency Distribution
It is a representation, either in a graphical or tabular format, which displays the number of
observations within a given interval. Frequency distributions are usually used within a statistical context.
Step 1:
Figure out how many classes (categories) you need. There are no hard rules about how many
classes to pick, but there are a couple of general guidelines:
Pick between 5 and 20 classes. For the list of IQ scores used in this example (given in Step 10), we picked 5 classes.
Make sure you have a few items in each category. For example, if you have 20 items, choose 5
classes (4 items per category), not 20 classes (which would give you only 1 item per category).
Step 2:
Subtract the minimum data value from the maximum data value. For example, the IQ list
has a minimum value of 118 and a maximum value of 154, so:
154 – 118 = 36
Step 3:
Divide your answer in Step 2 by the number of classes you chose in Step 1.
36 / 5 = 7.2
Step 4:
Round the number from Step 3 up to a whole number to get the class width. Rounded up, 7.2
becomes 8.
Step 5:
Write down the lowest data value as the first lower class limit:
The lowest value is 118
Step 6:
Add the class width from Step 4 to the value from Step 5 to get the next lower class limit:
118 + 8 = 126
Step 7:
Repeat Step 6 for the other lower class limits (in other words, keep on adding the class
width to the previous lower class limit) until you have created the number of classes you chose in
Step 1. We chose 5 classes, so our 5 lower class limits are:
118
126 (118 + 8)
134 (126 + 8)
142 (134 + 8)
150 (142 + 8)
Step 8:
Write down the upper class limits. These are the highest values that can be in each class, so in
most cases you can subtract 1 from the class width and add that to the lower class limit. For
example:
118 + (8 – 1) = 125
118 – 125
126 – 133
134 – 141
142 – 149
150 – 157
Step 9:
Add a second column for the number of items in each class, and label the columns with
appropriate headings:
IQ           Number
118 – 125
126 – 133
134 – 141
142 – 149
150 – 157
Step 10:
Count the number of items in each class, and put the total in the second column. The list of IQ
scores is: 118, 123, 124, 125, 127, 128, 129, 130, 130, 133, 136, 138, 141, 142, 149, 150, 154.
IQ           Number
118 – 125    4
126 – 133    6
134 – 141    3
142 – 149    2
150 – 157    2
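Steps 1 to 10 can be sketched in a few lines of Python. This is only an illustration under the choices made in the steps (5 classes, class width rounded up to a whole number, first lower limit equal to the lowest score); the variable names are not part of the unit.

```python
import math

iq_scores = [118, 123, 124, 125, 127, 128, 129, 130, 130, 133,
             136, 138, 141, 142, 149, 150, 154]
num_classes = 5                                      # Step 1

data_range = max(iq_scores) - min(iq_scores)         # Step 2: 154 - 118 = 36
class_width = math.ceil(data_range / num_classes)    # Steps 3-4: 7.2 rounded up to 8

# Steps 5-7: lower class limits, starting from the lowest score.
lower_limits = [min(iq_scores) + i * class_width for i in range(num_classes)]

# Step 8: upper class limit = lower limit + (class width - 1).
classes = [(low, low + class_width - 1) for low in lower_limits]

# Steps 9-10: count the scores that fall in each class.
frequency_table = [
    (low, high, sum(1 for s in iq_scores if low <= s <= high))
    for low, high in classes
]

for low, high, count in frequency_table:
    print(f"{low} - {high}: {count}")
```

With a class width of 8 the classes are 118 to 125, 126 to 133, 134 to 141, 142 to 149 and 150 to 157, and the counts add up to the 17 scores in the list.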
Example 2
A survey was taken in Lahore. In each of 20 homes, people were asked how many cars were registered to
their households. The results were recorded as follows:
1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0
Use the following steps to present this data in a frequency distribution table.
1. Divide the results (x) into intervals, and then count the number of results in each interval. In this
case, the intervals would be the number of households with no car (0), one car (1), two cars (2)
and so forth.
2. Make a table with separate columns for the interval numbers (the number of cars per household),
the tallied results, and the frequency of results in each interval. Label these columns Number of
cars, Tally and Frequency.
3. Read the list of data from left to right and place a tally mark in the appropriate row. For example,
the first result is a 1, so place a tally mark in the row beside where 1 appears in the interval
column (Number of cars). The next result is a 2, so place a tally mark in the row beside the 2, and
so on. When you reach your fifth tally mark, draw a tally line through the preceding four marks to
make your final frequency calculations easier to read.
4. Add up the number of tally marks in each row and record them in the final column
entitled Frequency.
Your frequency distribution table for this exercise should look like this:
Table 1. Frequency table for the number of cars registered in each household
Number of cars (x)    Frequency (f)
0                     4
1                     6
2                     5
3                     3
4                     2
By looking at this frequency distribution table quickly, we can see that out of 20 households surveyed,
4 households had no cars, 6 households had 1 car, etc.
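The tallying described in the steps above can also be done with Python's `collections.Counter`. This is a small sketch; the printed strokes stand in for hand-drawn tally marks.

```python
from collections import Counter

# Number of cars registered to each of the 20 households surveyed.
cars = [1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0]

frequency = Counter(cars)   # maps number of cars -> number of households

for number_of_cars in sorted(frequency):
    count = frequency[number_of_cars]
    print(number_of_cars, "|" * count, count)
```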
Relative frequency and percentage frequency
An analyst studying data such as battery lifetimes might want to know not only how long the batteries last, but also what
proportion of the batteries falls into each class interval of battery life.
The relative frequency of a particular observation or class interval is found by dividing the frequency (f)
by the number of observations (n): that is, (f ÷ n). Thus:
Relative frequency = frequency ÷ number of observations
The percentage frequency is found by multiplying each relative frequency value by 100. Thus:
Percentage frequency = relative frequency × 100 = (f ÷ n) × 100
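As an illustration, the two formulas can be applied to the frequencies from the car survey in Table 1 (that data set is used here only because its frequencies are given in this unit). A minimal Python sketch:

```python
# Frequencies from Table 1: number of cars -> number of households.
frequencies = {0: 4, 1: 6, 2: 5, 3: 3, 4: 2}
n = sum(frequencies.values())   # 20 observations

relative = {x: f / n for x, f in frequencies.items()}
percentage = {x: 100 * f / n for x, f in frequencies.items()}

for x in frequencies:
    print(x, frequencies[x], relative[x], percentage[x])
```

The relative frequencies add up to 1 and the percentage frequencies add up to 100.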
8.6 Interpreting Test Scores by Graphic Displays of Distributions
The data from a frequency table can be displayed graphically. A graph can provide a visual display of the
distributions, which gives us another view of the summarized data. For example, the relationship between
two different test scores can be represented graphically with a scatter plot, which lets us
describe in general terms the direction and strength of the relationship between the
scores by visually examining them as they are arranged in the graph. Other examples of these
types of graphs include histograms and frequency polygons.
A histogram is a bar graph of scores from a frequency table. The horizontal x-axis represents the scores
on the test, and the vertical y-axis represents the frequencies. The frequencies are plotted as bars.
A frequency polygon is a line graph representation of a set of scores from a frequency table. The
horizontal x-axis represents the scores on the scale and the vertical y-axis represents the
frequencies.
Frequency polygons are a graphical device for understanding the shapes of distributions. They serve the
same purpose as histograms, but are especially helpful in comparing sets of data. Frequency polygons are
also a good choice for displaying cumulative frequency distributions.
To create a frequency polygon, start just as for histograms, by choosing a class interval. Then draw an X-
axis representing the values of the scores in your data. Mark the middle of each class interval with a tick
mark, and label it with the middle value represented by the class. Draw the Y-axis to indicate the
frequency of each class. Place a point in the middle of each class interval at the height corresponding to
its frequency. Finally, connect the points. You should include one class interval below the lowest value in
your data and one above the highest value. The graph will then touch the X-axis on both sides.
A frequency polygon for 642 psychology test scores is shown in Figure 1. The first label on the X-axis is
35. This represents an interval extending from 29.5 to 39.5. Since the lowest test score is 46, this interval
has a frequency of 0. The point labeled 45 represents the interval from 39.5 to 49.5. There are three scores
in this interval. There are 150 scores in the interval that surrounds 85.
You can easily discern the shape of the distribution from Figure 1. Most of the scores are between 65 and
115. It is clear that the distribution is not symmetric inasmuch as good scores (to the right) trail off more
gradually than poor scores (to the left). In the terminology of Chapter 3 (where we will study shapes of
distributions more systematically), the distribution is skewed.
A cumulative frequency polygon for the same test scores is shown in Figure 2. The graph is the same as
before except that the Y value for each point is the number of students in the corresponding class interval
plus all numbers in lower intervals. For example, there are no scores in the interval labeled "35," three in
the interval "45,"and 10 in the interval "55."Therefore the Y value corresponding to "55" is 13. Since 642
students took the test, the cumulative frequency for the last interval is 642.
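The running totals behind a cumulative frequency polygon can be sketched in Python with `itertools.accumulate`. Only the first three intervals are used here, because those are the only frequencies stated in the text.

```python
from itertools import accumulate

# Interval labels (midpoints) and frequencies quoted in the text.
midpoints = [35, 45, 55]
frequencies = [0, 3, 10]

# Each cumulative value is the frequency of the interval plus all lower intervals.
cumulative = list(accumulate(frequencies))

for midpoint, total in zip(midpoints, cumulative):
    print(midpoint, total)
```

The Y value for the interval labeled 55 comes out as 13, as described above.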
Frequency polygons are useful for comparing distributions. This is achieved by overlaying the frequency
polygons drawn for different data sets. Figure 3 provides an example. The data come from a task in which
the goal is to move a computer mouse to a target on the screen as fast as possible. On 20 of the trials, the
target was a small rectangle; on the other 20, the target was a large rectangle. Time to reach the target was
recorded on each trial. The two distributions (one for each target) are plotted together in Figure 3. The
figure shows that although there is some overlap in times, it generally took longer to move the mouse to
the small target than to the large one.
It is also possible to plot two cumulative frequency distributions in the same graph. This is illustrated
in Figure 4 using the same data from the mouse task. The difference in distributions for the two targets is
again evident.
8.7 Measures of Central Tendency
Suppose that a teacher gave the same test to two different classes and the following results were obtained:
Class 1: 80%, 80%, 80%, 80%, 80%
Class 2: 60%, 70%, 80%, 90%, 100%
If you calculate the mean for both sets of scores, you get the same answer: 80%. But the data from
which this mean was obtained were very different in the two cases. It is also possible for two
different data sets to have the same median and mode and nearly the same mean. For example:
Class A: 72 73 76 76 78
Class B: 67 76 76 78 80
Class A and Class B have the same median and mode (76) and almost the same mean (75 and 75.4), although the scores in Class B are more spread out.
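The summary measures for Class A and Class B can be checked with Python's standard `statistics` module. This is only a quick sketch; the helper name `summarize` is illustrative.

```python
from statistics import mean, median, mode

class_a = [72, 73, 76, 76, 78]
class_b = [67, 76, 76, 78, 80]


def summarize(scores):
    """Return (mean, median, mode, range) for a list of scores."""
    return mean(scores), median(scores), mode(scores), max(scores) - min(scores)


summary_a = summarize(class_a)
summary_b = summarize(class_b)
print("Class A:", summary_a)
print("Class B:", summary_b)
```

The medians and modes agree and the means are close (75 and 75.4), but the range of Class B (13) is more than twice that of Class A (6), which is why a measure of variability is needed alongside a measure of central tendency.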
The way that statisticians distinguish such cases as this is known as measuring the variability of the
sample. As with measures of central tendency, there are a number of ways of measuring the variability of
a sample.
Probably the simplest method is to find the range of the sample, that is, the difference between the largest
and smallest observation. The range of measurements in Class 1 is 0, and the range in class 2 is 40%.
Simply knowing that fact gives a much better understanding of the data obtained from the two classes. In
class 1, the mean was 80%, and the range was 0, but in class 2, the mean was 80%, and the range was
40%.
Statisticians use summary measures to describe patterns of data. Measures of central tendency refer to
the summary measures used to describe the most "typical" value in a set of values.
Here, we are interested in the typical, most representative score. The three most common measures
of central tendency are the mean, the median, and the mode. A teacher should be familiar with these common
measures of central tendency.
8.7.1 Mean
The mean is simply the arithmetic average. It is
computed by adding all of the scores and dividing by the number of scores. When statisticians talk about
the mean of a population, they use the Greek letter μ to refer to the mean score. When they talk about the
mean of a sample, statisticians use the symbol $\bar{X}$ (read as "X-Bar") to refer to the mean score.
When computed on a sample, it is symbolized as:
$$\bar{X} = \frac{\sum X}{N}$$
Computation - Example: find the mean of 2, 3, 5, and 10.
$$\bar{X} = \frac{\sum X}{N} = \frac{2 + 3 + 5 + 10}{4} = \frac{20}{4} = 5.0$$
Since means are typically reported with one more digit of accuracy than is present in the data, the mean
is reported as 5.0 rather than just 5.
Example 1
The marks of seven students in a mathematics test with a maximum possible mark of 20 are given below:
15 13 18 16 14 17 12
Find the mean of this set of data values.
Solution:
Mean = (15 + 13 + 18 + 16 + 14 + 17 + 12) / 7 = 105 / 7 = 15
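For example, for a grouped frequency distribution the mean is computed from the class midpoints (Mid) and the frequencies (f):
Class interval    Mid    f     Mid*f
95-99             97     1     97
90-94             92     3     276
85-89             87     5     435
80-84             82     6     492
75-79             77     4     308
70-74             72     3     216
65-69             67     1     67
60-64             62     2     124
Total                    Σf=25=N    ΣMid*f=2015
Mean = ΣMid*f / N = 2015 / 25 = 80.6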
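The grouped-data calculation in the table above can be reproduced with a short Python sketch. The midpoint of each class is taken as the average of its two limits; the variable names are illustrative.

```python
# (lower limit, upper limit) and frequency f for each class interval.
grouped = [
    ((95, 99), 1), ((90, 94), 3), ((85, 89), 5), ((80, 84), 6),
    ((75, 79), 4), ((70, 74), 3), ((65, 69), 1), ((60, 64), 2),
]

total_f = sum(f for _, f in grouped)                                     # N
total_mid_f = sum((low + high) / 2 * f for (low, high), f in grouped)   # sum of Mid*f

grouped_mean = total_mid_f / total_f
print(total_f, total_mid_f, grouped_mean)
```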
8.7.2 Median or Md
The median is the score that cuts the distribution into two equal halves (or the middle score in the distribution).
The median of a set of data values is the middle value of the data set when it has been arranged in
ascending order, that is, from the smallest value to the highest value.
Example
The marks of nine students in a geography test that had a maximum possible mark of 50 are given below:
47 35 37 32 38 39 36 34 35
Find the median of this set of data values.
Solution:
Arrange the data values in order from the lowest value to the highest value:
32 34 35 35 36 37 38 39 47
The fifth data value, 36, is the middle value in this arrangement.
Median = 36
In general:
$$\text{Median} = \tfrac{1}{2}(n + 1)\text{th value}$$
where n is the number of data values in the sample.
If the number of values in the data set is even, then the median is the average of the two middle values.
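Both cases can be sketched in Python. The odd case uses the geography marks from the example; the even case is a hypothetical data set made by dropping the highest mark, used here only to illustrate averaging the two middle values.

```python
from statistics import median

geography = [47, 35, 37, 32, 38, 39, 36, 34, 35]
ordered = sorted(geography)

# Odd number of values: take the (n + 1) / 2 th value.
n = len(ordered)
middle_position = (n + 1) // 2               # 5th value when n = 9
median_odd = ordered[middle_position - 1]    # lists are indexed from 0

# Even number of values (hypothetical: highest mark removed).
even_values = ordered[:-1]
m = len(even_values)
median_even = (even_values[m // 2 - 1] + even_values[m // 2]) / 2

print(median_odd, median(geography))      # both give 36
print(median_even, median(even_values))   # both give 35.5
```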
Fortunately, there is a formula to take care of the more complicated situations, including computing the
median for grouped frequency distributions.
Where:
L = Lower exact limit of the interval containing Md.
8.7.3 Mode
The mode is the most frequently occurring score. Note:
o There can be more than one mode. A distribution can be bi- or tri-modal, in which case we speak of major and
minor modes.
o It is symbolized as Mo.
Example: Find the mode of 2, 2, 6, 0, 9, 6, 8, 5, 4, 5, 4, 6, 4, 7, 4
Solution: 4 is the most frequently occurring score (it occurs four times); therefore the mode is 4.
8.8.1 Range
The range is probably the simplest measure of the variability of a sample: it is the difference between the
largest (maximum, highest) and smallest (minimum, lowest) observations.
Range = Highest value - Lowest value
R = XH - XL
Example:
The range of Saleem's four test scores (3, 5, 5, 7) is:
XH = 7 and XL = 3
Therefore R = XH - XL= 7- 3= 4
Example
Consider the previous example in which results of the two different classes are:
Class 1: 80%, 80%, 80%, 80%, 80%
Class 2: 60%, 70%, 80%, 90%, 100%
The range of measurements in Class 1 is 0, and the range in class 2 is 40%. Simply knowing that fact
gives a much better understanding of the data obtained from the two classes. In class 1, the mean was
80%, and the range was 0, but in class 2, the mean was 80%, and the range was 40%. The relationship
between range and variability can be shown graphically as:
The range of Distribution A and B is the same, although Distribution A has more variability.
Co-efficient of Range
It is a relative measure of dispersion and is based on the value of the range. It is also called the range co-efficient
of dispersion. It is defined as:
Co-efficient of Range = (XH – XL) / (XH + XL)
Let us take two sets of observations. Set A contains the marks of five students in Mathematics out of 25
marks and set B contains the marks of the same students in English out of 100 marks.
Set A: 10, 15, 18, 20, 20
Set B: 30, 35, 40, 45, 50
The values of range and co-efficient of range are calculated as:
                        Range          Coefficient of Range
Set A: (Mathematics)    20–10=10       (20–10) / (20+10) = 0.33
Set B: (English)        50–30=20       (50–30) / (50+30) = 0.25
In set A the range is 10 and in set B the range is 20. Apparently it seems as if there is greater
dispersion in set B. But this is not true. The range of 20 in set B is for large observations and the range of
10 in set A is for small observations. Thus 20 and 10 cannot be compared directly. Their base is not the
same. Marks in Mathematics are out of 25 and marks of English are out of 100. Thus, it makes no sense
to compare 10 with 20. When we convert these two values into coefficient of range, we see that
coefficient of range for set A is greater than that of set B. Thus there is greater dispersion or variation in
set A. The marks of students in English are more stable than their marks in Mathematics.
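The comparison of the two sets can be sketched in Python. The function names below are illustrative only.

```python
def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Coefficient of range = (XH - XL) / (XH + XL)."""
    return (max(values) - min(values)) / (max(values) + min(values))


set_a = [10, 15, 18, 20, 20]   # Mathematics, out of 25
set_b = [30, 35, 40, 45, 50]   # English, out of 100

print(value_range(set_a), round(coefficient_of_range(set_a), 2))   # 10 0.33
print(value_range(set_b), round(coefficient_of_range(set_b), 2))   # 20 0.25
```

Although set B has the larger range, set A has the larger coefficient of range, which is the point made in the paragraph above.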
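8.8.2 Mean Deviation
$$M.D = \frac{\sum |x|}{N} = \frac{\sum |X - \bar{X}|}{N}$$
where $x = X - \bar{X}$ is the deviation of an observation from the mean.
Thus for sample data in which the suitable average is the $\bar{X}$, the mean deviation ($M.D$) is given by the
relation:
$$M.D = \frac{\sum |X - \bar{X}|}{n}$$
For frequency distribution, the mean deviation is given by
$$M.D = \frac{\sum f\,|X - \bar{X}|}{\sum f}$$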
Example:
Calculate the mean deviation from arithmetic mean in respect of the marks obtained by nine students
gives below and show that the mean deviation from median is minimum.
Marks (out of 25): 7, 4, 10, 9, 15, 12, 7, 9, 7
Solution:
After arranging the observations in ascending order, we get
Marks: 4, 7, 7, 7, 9, 9, 10, 12, 15
\[ \text{Mean} = \bar{X} = \frac{\sum X}{n} = \frac{80}{9} = 8.89 \]
Marks ($X$)    $|X - \bar{X}|$
4              4.89
7              1.89
7              1.89
7              1.89
9              0.11
9              0.11
10             1.11
12             3.11
15             6.11
Total          21.11
\[ M.D \text{ from mean} = \frac{\sum |X - \bar{X}|}{n} = \frac{21.11}{9} = 2.35 \]
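The second part of the example, showing that the mean deviation taken from the median is the smaller one, can be checked with a short Python sketch. The helper name `mean_deviation` is an illustrative choice; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median


def mean_deviation(scores, center):
    """Average absolute deviation of the scores from a chosen center."""
    return sum(abs(x - center) for x in scores) / len(scores)


marks = [7, 4, 10, 9, 15, 12, 7, 9, 7]  # marks out of 25

md_from_mean = mean_deviation(marks, mean(marks))      # center = 80/9
md_from_median = mean_deviation(marks, median(marks))  # center = 9

print(round(md_from_mean, 2), round(md_from_median, 2))
# 2.35 2.33
```

For these marks the median is 9, the absolute deviations from it sum to 21, and 21/9 = 2.33 is smaller than the 2.35 obtained from the mean.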
8.8.3 Variance
Variance is another absolute measure of dispersion. It is defined as the average of the squared
differences between each of the observations in a set of data and the mean. For sample data the
variance is denoted by $S^2$, and the population variance is denoted by $\sigma^2$ (sigma squared).
That is:
\[ S^2 = \frac{\sum (X - \bar{X})^2}{n}, \qquad \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]
Thus another name for the Variance is the Mean of the Squared Deviations About the Mean (or more
simply, the Mean of Squares (MS)). The problem with the MS is that its units are squared and thus
represent space, rather than a distance on the X axis like the other measures of variability.
Example:
Calculate the variance for the following sample data: 2, 4, 8, 6, 10, and 12.
Solution:
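$X$      $(X - \bar{X})^2$
2        $(2 - 7)^2 = 25$
4        $(4 - 7)^2 = 9$
8        $(8 - 7)^2 = 1$
6        $(6 - 7)^2 = 1$
10       $(10 - 7)^2 = 9$
12       $(12 - 7)^2 = 25$
$\sum X = 42$      $\sum (X - \bar{X})^2 = 70$
\[ \bar{X} = \frac{\sum X}{n} = \frac{42}{6} = 7 \]
\[ S^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{70}{6} = \frac{35}{3} = 11.67 \]
Variance = $S^2$ = 11.67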
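The worked solution can be reproduced step by step in Python. This is a minimal sketch that divides by n, as the example does; the cross-check uses `pvariance` from the standard `statistics` module, which also divides by n.

```python
from statistics import pvariance

data = [2, 4, 8, 6, 10, 12]
n = len(data)
mean = sum(data) / n                            # 42 / 6 = 7.0
squared_devs = [(x - mean) ** 2 for x in data]  # one squared deviation per score
s2 = sum(squared_devs) / n                      # 70 / 6

for x, d in zip(data, squared_devs):
    print(x, d)
print("S^2 =", round(s2, 2))

# Cross-check against the standard library (divisor n).
assert abs(s2 - pvariance(data)) < 1e-9
```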
A simple solution to the problem of the MS representing a space is to compute its square root, which
gives the standard deviation. That is:
\[ S = \sqrt{S^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} \]
Since the standard deviation can be very small, it is usually reported with 2-3 more decimals of accuracy
than what is available in the original data.
The standard deviation is in the same units as the original observations. If the original
observations are in grams, the value of the standard deviation will also be in grams. The standard
deviation plays a dominating role in the study of variation in the data. It is a very widely used measure
of dispersion and stands like a tower among the measures of dispersion. As far as the important statistical
tools are concerned, the first important tool is the mean $\bar{X}$ and the second important tool is the
standard deviation $S$. It is based on all the observations and is subject to mathematical treatment. It is
of great importance for the analysis of data and for the various statistical inferences.
Properties of the Variance & Standard Deviation:
1. Are always positive (or zero).
2. Equal zero when all scores are identical (i.e., there is no variability).
3. Like the mean, they are sensitive to all scores.
Example: In the previous example,
Variance = $S^2$ = 11.67, so the standard deviation is $S = \sqrt{11.67} = 3.42$.
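As a quick check, the standard deviation for the same data and the zero-variability property listed above can be sketched in Python. The list `identical` is a hypothetical data set added only to illustrate property 2; `pstdev` is the standard library's population standard deviation (divisor n).

```python
from math import sqrt
from statistics import pstdev

data = [2, 4, 8, 6, 10, 12]
n = len(data)
mean = sum(data) / n
variance = sum((x - mean) ** 2 for x in data) / n  # divisor n, as in the example
std_dev = sqrt(variance)                           # back in the original units

print(round(variance, 2), round(std_dev, 2))
# 11.67 3.42

# Property 2: identical scores have no variability (hypothetical scores).
identical = [5, 5, 5, 5]
print(pstdev(identical))
```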
8.8.9 Estimation
Estimation is the goal of inferential statistics. We use sample values to estimate population values. The
symbols are as follows:
                      Sample statistic    Population parameter
Mean                  $\bar{X}$           $\mu$
Variance              $s^2$               $\sigma_x^2$
Standard Deviation    $s$                 $\sigma_x$
It is important that the sample values (estimators) be unbiased. An unbiased estimator of a parameter is
one whose average over all possible random samples of a given size equals the value of the parameter.
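The definition of an unbiased estimator can be illustrated by enumerating every possible sample. The sketch below makes several assumptions that are not in the text: a hypothetical toy population of three values, samples of size 2, and sampling with replacement. It uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product


def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    m = Fraction(sum(values), len(values))
    return sum((x - m) ** 2 for x in values) / divisor


population = [1, 2, 3]  # hypothetical toy population
n = 2                   # sample size (assumption)

pop_variance = variance(population, len(population))

# All 3**2 = 9 equally likely samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

avg_with_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)
avg_with_n = sum(variance(s, n) for s in samples) / len(samples)

print(pop_variance, avg_with_n_minus_1, avg_with_n)
# 2/3 2/3 1/3
```

Under these assumptions, averaging the sample variance with divisor n - 1 over all samples reproduces the population variance exactly, while the divisor n underestimates it on average.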
Overall Example
Let's reconsider an example from above of two distributions (A & B):
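Distribution    A      B
Data            150    150
                145    110
                100    100
                100    100
                55     90
                50     50
$\sum X$        600    600
$N$             6      6
$\bar{X}$       100    100

A ($X$)    $\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
150        100          50               2500
145        100          45               2025
100        100          0                0
100        100          0                0
55         100          -45              2025
50         100          -50              2500
$\sum$ 600              0                9050
$N$ = 6
\[ s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{9050}{6 - 1} = \frac{9050}{5} = 1810 \]
\[ s = \sqrt{1810} = 42.54 \]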
Note that calculating the variance and standard deviation in this manner requires computing the mean and
subtracting it from each score. Since this is not very efficient and can be less accurate as a result of
rounding error, a computational formula is typically used. It is given as follows:
\[ s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1} \]
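A ($X$)    $X^2$
150        22500
145        21025
100        10000
100        10000
55         3025
50         2500
$\sum$ 600     69050
$N$ = 6
Then, plugging the appropriate values into the computational formula gives:
\[ s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1} = \frac{69050 - \frac{600^2}{6}}{6 - 1} = \frac{69050 - 60000}{5} = \frac{9050}{5} = 1810 \]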
Note that the defining and computational formulas give the same result, but the computational formula is
easier to work with (and potentially more accurate due to less rounding error).
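B ($X$)    $X^2$
150        22500
110        12100
100        10000
100        10000
90         8100
50         2500
$\sum$ 600     65200
$N$ = 6
Then, plugging the appropriate values into the computational formula gives:
\[ s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1} = \frac{65200 - \frac{600^2}{6}}{6 - 1} = \frac{65200 - 60000}{5} = \frac{5200}{5} = 1040 \]
\[ s = \sqrt{1040} = 32.25 \]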
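The agreement between the two approaches can be checked for both distributions with a short Python sketch. It assumes the computational formula meant here is the standard one, s² = (ΣX² − (ΣX)²/N) / (N − 1), which matches the sums tabulated above; the function names are illustrative.

```python
from math import sqrt


def variance_defining(scores):
    """Defining formula: sum of squared deviations from the mean over N - 1."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)


def variance_computational(scores):
    """Computational formula: (sum of X^2 - (sum of X)^2 / N) over N - 1."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


dist_a = [150, 145, 100, 100, 55, 50]
dist_b = [150, 110, 100, 100, 90, 50]

for label, scores in (("A", dist_a), ("B", dist_b)):
    s2 = variance_computational(scores)
    print(label, variance_defining(scores), s2, round(sqrt(s2), 2))
# A 1810.0 1810.0 42.54
# B 1040.0 1040.0 32.25
```

Both distributions have the same mean (100) and the same range (100), yet Distribution A has the larger variance and standard deviation.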
8.10 Planning the Test
One essential step in planning a test is to decide why you are giving the test. (The word
"test" is used although we are using it in a broad sense that includes performance
assessments as well as traditional paper and pencil tests.)
Are you trying to sort the students (so you can compare them, giving higher scores to
better students and lower scores to poor students)? If so, you will want to include some
difficult questions that you expect only a few of the better students will be able to
answer correctly. Or do you want to know how many of the students have mastered the
content? If your purpose is the latter, you have no need to distribute the scores, so very
difficult questions are unnecessary. You will, however, have to decide how many
correct answers are needed to demonstrate mastery. Another way to address the "why"
question is to identify if this is to be a formative assessment to help you diagnose
students' problems and guide future instruction, or a summative measure to determine
grades that will be reported to parents.
Airasian (1994) lists six decisions usually made by the classroom teacher in the test
development process: 1. what to test, 2. how much emphasis to give to various
objectives, 3. what type of assessment (or type of questions) to use, 4. how much time to
allocate for the assessment, 5. how to prepare the students, and 6. whether to use the test
from the textbook publisher or to create your own. Other decisions, such as whether to
use a separate answer sheet, arise later.
You, as the teacher, decide what to assess. The term "assess" is used here because the
term "test" is frequently associated only with traditional paper and pencil
assessments, to the exclusion of alternative assessments such as performance tasks and
portfolios. Classroom assessments are generally focused on content that has been
covered in the class, either in the immediate past or (as is the case with unit, semester,
and end-of-course tests) over a longer period of time. For example, if we were
constructing a test for preservice teachers on writing test questions, we might have the
following objectives:
Now that we have made the what decision, we can move to the next step: deciding
how much emphasis to place on each objective. We can look at the amount of time in
class we have devoted to each objective. We can also review the number and types of
assignments the students have been given. For this example, let's assume that 20% of
the assessment will be based on knowing the advantages and disadvantages, 40% will
be on differentiating between well written and poorly written questions, and the other
40% will be on writing good questions. Now our planning can be illustrated with the use
of a table of specifications (also called a test plan or a test blueprint) as shown in table
below.
Table of Specifications:
Objectives/Content area/Topics      Knowledge   Comprehension   Application   # items/% of test
2. Be able to differentiate
   between well and poorly
   written selection-type
   questions                                                                  40%
3. Be able to construct
   appropriate selection-type
   questions using the
   guidelines and rules that
   were presented in class.                                                   40%
A table of specifications is a two-way table that matches the objectives or content you
have taught with the level at which you expect students to perform. It contains an
estimate of the percentage of the test to be allocated to each topic at each level at which
it is to be measured. In effect we have established how much emphasis to give to each
objective or topic.
In estimating the time needed for this test, students would probably need from 5 to 10
minutes for the 20 True-False questions (15-30 seconds each), 5-7 1/2 minutes for the
five comprehension questions (60-90 seconds each), and 20-30 minutes (rough estimate)
to read the material and write the four questions measuring application. The total time
needed would be from 30 to 48 minutes. If you are a middle or high school teacher,
estimated response time is an important consideration. You will need to allow enough
time for the slowest students to complete your test, and it will need to fit within a single
class period.
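The time estimate above is simple arithmetic and can be sketched in Python so that it is easy to redo for a different mix of questions. The item counts and per-item times are the ones quoted in the paragraph; the variable names are illustrative.

```python
from math import ceil

# (question type, number of items, fastest seconds per item, slowest seconds per item)
timed_sections = [
    ("True-False", 20, 15, 30),
    ("Comprehension", 5, 60, 90),
]
# Rough estimate, in minutes, for reading the material and writing
# the four application questions.
application_minutes = (20, 30)

low_minutes = sum(n * fast for _name, n, fast, _slow in timed_sections) / 60
high_minutes = sum(n * slow for _name, n, _fast, slow in timed_sections) / 60
low_minutes += application_minutes[0]
high_minutes += application_minutes[1]

print(low_minutes, high_minutes, ceil(high_minutes))
# 30.0 47.5 48
```

The upper bound of 47.5 minutes rounds up to the 48 minutes quoted in the text.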
Accommodations
Accommodations may be needed for some of your students. It is helpful to keep those
students in mind as you plan your assessments. Some examples of accommodations
include:
Providing written instructions for students with hearing problems
Using large print, reading or recording the questions on audiotape (The student could
record the answers on tape.)
Having an aide or assistant write/mark the answers for the student who has coordination
problems, or having the student record the answers on audiotape or type the answers
Using written assessments for students with speech problems
Administering the test in sections if the entire test is too long for the attention span of a student
Asking the students to repeat the directions to make sure they understand what they are to do
Starting each sentence on a new line helps students identify it as a new sentence
Including an example with each type of question, showing how to mark answers