5 Item Analysis and Validation
LEARNING OUTCOMES
Example: What is the item difficulty index of an item if 25 students are unable to answer
it correctly while 75 answered it correctly?
Here, the total number of students is 100, hence the item difficulty is 75/100 or 75%.
Another example: 25 students answered the item correctly while 75 students did not. The
total number of students is 100, so the difficulty index is 25/100 or 25%. It is a more difficult
item than the one with a difficulty index of 75%.
A high percentage indicates an easy item/question while a low percentage indicates a
difficult item.
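The two examples above can be reproduced with a short Python sketch. The function name difficulty_index and its parameter names are illustrative choices, not part of the original text; the sketch assumes every student either answered the item correctly or did not.

```python
def difficulty_index(num_correct, num_students):
    """Proportion of students who answered the item correctly (0.0 to 1.0)."""
    if num_students <= 0:
        raise ValueError("num_students must be positive")
    if not 0 <= num_correct <= num_students:
        raise ValueError("num_correct must be between 0 and num_students")
    return num_correct / num_students


# First example: 75 of 100 students answered correctly (an easier item).
easy_item = difficulty_index(75, 100)

# Second example: 25 of 100 students answered correctly (a harder item).
hard_item = difficulty_index(25, 100)

print(f"{easy_item:.0%}")
print(f"{hard_item:.0%}")
```

A higher value means an easier item, so the first item (75%) is easier than the second (25%).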
One problem with this type of difficulty index is that it may not actually indicate that the
item is difficult (or easy). A student who does not know the subject matter will naturally be
unable to answer the item correctly even if the question is easy. How do we decide on the basis
of this index whether the item is too difficult or too easy?
Difficult items tend to discriminate between those who know and those who do not
know the answer. Conversely, easy items cannot discriminate between these two groups of
students. We are therefore interested in deriving a measure that will tell us whether an item can
discriminate between these two groups of students. Such a measure is called an index of
discrimination.
6.1.1 Discrimination Index
The discrimination index is the power of the item to discriminate between the students who scored high
and those who scored low in the overall test. In other words, it is the power of the item to distinguish
the students who know the lesson from those who do not. The discrimination index is the basis for
measuring the validity of an item. It can be interpreted as an indication of the extent to which
overall knowledge of the content area or mastery of the skills is related to the response on an item.
Options A B C D E
Upper Group
Lower Group
*Note: Put asterisk for the correct answer
4. Compute the value of the difficulty index and the discrimination index, and analyze each
response among the distracters.
5. Make an analysis for each item.
An easy way to derive such a measure is to measure how difficult an item is with respect
to those in the upper 25% of the class and how difficult it is with respect to those in the lower
25% of the class. If the upper 25% of the class found the item easy yet the lower 25% found it
difficult, then the item can discriminate properly between these two groups. Thus:
Index of discrimination = DU - DL (U = upper group; L = lower group)
Example: Obtain the index of discrimination of an item if the upper 25% of the class had
a difficulty index of 0.60 (i.e., 60% of the upper 25% got the correct answer) while the lower 25%
of the class had a difficulty index of 0.20.
Here, DU = 0.60 while DL = 0.20, thus index of discrimination = 0.60 - 0.20 = 0.40
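As a minimal sketch, the same subtraction can be written in Python. The function name discrimination_index and the parameter names d_upper and d_lower are assumptions made for illustration; they stand for DU and DL expressed as proportions between 0 and 1.

```python
def discrimination_index(d_upper, d_lower):
    """Index of discrimination: DU minus DL, both given as proportions."""
    for name, value in (("d_upper", d_upper), ("d_lower", d_lower)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return d_upper - d_lower


# Worked example from the text: DU = 0.60 and DL = 0.20.
index = discrimination_index(0.60, 0.20)

# Round for display, since binary floating point may not store 0.40 exactly.
print(round(index, 2))
```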
The discrimination index is the difference between the proportion of the top scorers who
got an item correct and the proportion of the lowest scorers who got the item correct. The
discrimination index ranges from -1 to +1. The closer the discrimination index is to +1, the
more effectively the item can discriminate or distinguish between the two groups of students. A
negative discrimination index means that more students from the lower group than from the upper
group got the item correct. Such an item is not good and so must be discarded.
Theoretically, the index of discrimination can range from -1.0 (when DU=0 and DL=1) to
1.0 (when DU=1 and DL=0). When the index of discrimination is equal to -1, all of the
lower 25% of the students got the correct answer while all of the upper 25% got the wrong
answer. In a sense, such an item does separate the two groups, but the item itself is highly
questionable. Why would the bright ones get the wrong answer and the poor ones get the right
answer? On the other hand, if the index of discrimination is 1.0, then all of the lower 25%
failed to get the correct answer while all of the upper 25% got the correct answer. This is a
perfectly discriminating item and is the ideal item that should be included in the test. From
this discussion, let us agree to discard or revise all items that have a negative discrimination
index, for although they separate the upper and lower 25% of the class, the content of the item
itself may be highly dubious. As in the case of the index of difficulty, we have the following
rule of thumb:
Item 1
Options      A    B*   C    D
Total        0    40   20   20
Upper 25%    0    15   5    0
Lower 25%    0    5    10   5
The correct response is B. Let us compute the difficulty index and index of discrimination:
DU = (no. of students in upper 25% with correct response) / (no. of students in the upper 25%)
= 15/20
= 0.75 or 75%
DL = (no. of students in lower 25% with correct response) / (no. of students in the lower 25%)
= 5/20
= 0.25 or 25%
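The computation for this item can be carried through to the end with a short Python sketch. It assumes that every student chose exactly one option, so that each group's size is the sum of its option counts (80 students in total, 20 in each of the upper and lower groups). The variable and function names are illustrative only. The overall difficulty index and the final index of discrimination are derived here from the table counts using the formulas given earlier.

```python
KEY = "B"  # the correct response, marked with an asterisk in the table

# Option counts taken from the table for Item 1.
total_counts = {"A": 0, "B": 40, "C": 20, "D": 20}
upper_counts = {"A": 0, "B": 15, "C": 5, "D": 0}
lower_counts = {"A": 0, "B": 5, "C": 10, "D": 5}


def proportion_correct(counts, key):
    """Share of a group that chose the keyed option.

    Assumes each student picked exactly one option, so the group size
    equals the sum of the option counts.
    """
    group_size = sum(counts.values())
    return counts[key] / group_size


difficulty = proportion_correct(total_counts, KEY)  # 40 / 80
d_upper = proportion_correct(upper_counts, KEY)     # 15 / 20
d_lower = proportion_correct(lower_counts, KEY)     # 5 / 20
discrimination = d_upper - d_lower                  # DU - DL

print(difficulty, d_upper, d_lower, discrimination)
```

Under these assumptions the item has a difficulty index of 0.50 and an index of discrimination of 0.75 - 0.25 = 0.50.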
It is also instructive to note that the distracter A is not an effective distracter since
this was never selected by the students. It is an implausible distracter. Distracters C and D
appear to have good appeal as distracters. They are plausible distracters.
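The distracter check described above can be sketched in Python as follows. The function names are hypothetical, and the rule applied is only the one stated in the text: a distracter that no student selected is treated as implausible, and any distracter selected at least once is treated as plausible.

```python
def implausible_distracters(total_counts, key):
    """Return the wrong options that no student selected."""
    return [
        option
        for option, count in total_counts.items()
        if option != key and count == 0
    ]


def plausible_distracters(total_counts, key):
    """Return the wrong options that at least one student selected."""
    return [
        option
        for option, count in total_counts.items()
        if option != key and count > 0
    ]


# Total option counts for Item 1; B is the correct response.
item_counts = {"A": 0, "B": 40, "C": 20, "D": 20}

print(implausible_distracters(item_counts, "B"))
print(plausible_distracters(item_counts, "B"))
```

For this item the first list contains only A and the second contains C and D, which matches the analysis in the text.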
Index of Difficulty
P = (RU + RL) / T × 100
Where:
RU = the number in the upper group who answered the item correctly
RL = the number in the lower group who answered the item correctly
T = the total number who tried the item
Example: P = 8/20 × 100 = 40%
The smaller the percentage figure the more difficult the item.
Estimate the item discriminating power using the formula below:
D = (RU - RL) / (T/2) = (6 - 2) / 10 = 0.40
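Both formulas can be checked with a small Python sketch. The function names are illustrative, and the values RU = 6, RL = 2, and T = 20 are the ones implied by the worked figures in the text (8 correct out of 20, and 6 - 2 in the numerator). The sketch assumes the upper and lower groups are of equal size, so each group contains T/2 students.

```python
def index_of_difficulty(r_upper, r_lower, total):
    """P = (RU + RL) / T x 100, returned as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * (r_upper + r_lower) / total


def discriminating_power(r_upper, r_lower, total):
    """D = (RU - RL) / (T / 2), assuming two groups of equal size."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (r_upper - r_lower) / (total / 2)


# RU = 6, RL = 2, T = 20, as in the worked example.
p = index_of_difficulty(6, 2, 20)
d = discriminating_power(6, 2, 20)

print(p)
print(d)
```

Note that P is reported as a percentage while D is reported as a decimal between -1 and +1.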
Reliability       Interpretation
0.90 and above    Excellent reliability; at the level of the best standardized tests
0.80 – 0.90       Very good for a classroom test
0.70 – 0.80       Good for a classroom test; in the range of most. There are probably
                  a few items which could be improved.
0.60 – 0.70       Somewhat low. This test needs to be supplemented by other
                  measures (e.g., more tests) to determine grades. There are
                  probably some items which could be improved.
0.50 – 0.60       Suggests need for revision of test, unless it is quite short (ten or fewer
                  items). The test definitely needs to be supplemented by other
                  measures (e.g., more tests) for grading.
0.50 or below     Questionable reliability. This test should not contribute heavily to
                  the course grade, and it needs revision.
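The table can be turned into a simple lookup, sketched below in Python. The bands in the table share their boundary values (for example, 0.80 appears in two rows), so this sketch makes an assumption that is not in the original: a coefficient exactly on a boundary is assigned to the higher band. The labels are shortened versions of the table entries, and the names BANDS and interpret_reliability are hypothetical.

```python
# (lower bound, short label), ordered from the highest band to the lowest.
# Assumption: a value exactly on a boundary falls into the higher band.
BANDS = [
    (0.90, "Excellent reliability"),
    (0.80, "Very good for a classroom test"),
    (0.70, "Good for a classroom test"),
    (0.60, "Somewhat low"),
    (0.50, "Suggests need for revision of test"),
]


def interpret_reliability(coefficient):
    """Map a reliability coefficient to its interpretation band."""
    for lower_bound, label in BANDS:
        if coefficient >= lower_bound:
            return label
    return "Questionable reliability"


print(interpret_reliability(0.85))
print(interpret_reliability(0.45))
```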
Exercises 5
A. Write TRUE if the statement is correct and FALSE if it is wrong.
1. Difficulty index indicates the proportion of students who got the item right.
2. Difficulty index indicates the proportion of students who got the item wrong.
3. A high percentage indicates an easy item/question, and a low percentage indicates a difficult
item.
4. Authors agree, in general, that items should have values of difficulty no less than 20% correct
and no greater than 80%.
5. Very difficult or very easy items contribute greatly to the discriminating power of a test.
6. The discrimination index range is between -1 and +2.
7. The farther the index is to +1, the more effectively the item distinguishes between the two
groups of students.
8. When an item discriminates negatively, such item should be revised and eliminated from
scoring.
9. A positive discrimination index indicates that the lower performing students actually selected
the key or correct response more frequently than the top performers.
10. If no one selects a distracter, it is important to revise the option and attempt to make the
distracter a more plausible choice.