Item Analysis and Validation: Learning Outcomes
“If you don’t validate your own feelings, no one else can.”
- Anonymous
LEARNING OUTCOMES
PRETEST
Direction: Read the statement and options carefully and encircle the letter of your answer.
1. The index of difficulty of a particular test item is 0.24. What does this mean? My students _________.
a. gained mastery over an item
b. performed very well against expectation
c. found that the test item was neither easy nor difficult
d. found that the test item was hard
2. The discrimination index of a test item is -0.46. What does this imply?
a. More students from the upper group answered the item incorrectly.
b. More students from the upper group answered the item correctly.
c. More students from the lower group answered the item correctly.
d. The number of students from the lower group and upper group who answered the item correctly are equal.
3. Doc Zam gave a test in Assessment of Learning 1. Item no. 18 has a difficulty index of 0.85 and
discrimination index of -0.10. What should he do?
a. retain the item
b. make the item bonus
c. reject the item
d. reject it and make the item bonus
4. The difficulty index of test item no. 7 is 1. What does this imply?
a. The test is very difficult
b. The test is very easy
c. The test item is a quality item
d. Nobody got the item correctly
5. In an item analysis, Doc Tej found out that more students from the lower group got test item #9 correct. This
means that the test item ________.
a. has a negative discriminating power
b. has a lower validity
c. has a positive discriminating power
d. has a high reliability
CONTENT
The teacher normally prepares a draft of the test. Such a draft is subjected to item analysis and
validation in order to ensure that the final version of the test will be useful and functional. First, the
teacher tries out the draft test on a group of students with characteristics similar to those of the intended
test takers (try-out phase). From the try-out group, each item is analyzed in terms of its ability to
discriminate between those who know and those who do not know, and also its level of difficulty (item
analysis phase). The item analysis provides information that allows the teacher to decide whether to revise
or replace an item (item revision phase). Finally, the final draft of the test is subjected to validation if the
intent is to use the test as a standard test for the particular unit or grading period. We shall be concerned
with these concepts in this module.
ITEM ANALYSIS: DIFFICULTY INDEX AND DISCRIMINATION INDEX
There are two important characteristics of an item that will be of interest to the teacher. These are
(a) item difficulty and (b) discrimination index. We shall learn how to measure these characteristics and
apply our knowledge in making a decision about the item in question.
The difficulty of an item or item difficulty is defined as the number of students who are able to
answer the item correctly divided by the total number of students. Thus:
Item difficulty = number of students with correct answer/total number of students
The item difficulty is usually expressed as a percentage.
Example: What is the item difficulty index of an item if 25 students are unable to answer it correctly while
75 answered it correctly?
Here, the total number of students is 100, hence the item difficulty index is 75/100 or 75%.
Another example: 25 students answered the item correctly while 75 students did not. The total number of
students is 100, so the difficulty index is 25/100 or 0.25, which is 25%.
This item is more difficult than the one with a difficulty index of 75%.
A high percentage indicates an easy item/question while a low percentage indicates a difficult item.
One problem with this type of difficulty index is that it may not actually indicate that the item is
difficult (or easy). A student who does not know the subject matter will naturally be unable to answer the
item correctly even if the question is easy. How do we decide on the basis of this index whether the item is
too difficult or too easy?
The following arbitrary rule is often used in the literature:
Range of Difficulty Index Interpretation Action
0 – 0.25 Difficult Revise or discard
0.26 – 0.75 Right difficulty Retain
0.76 – above Easy Revise or discard
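To make the computation concrete, here is a minimal Python sketch (the function names and the sample figures are illustrative, not taken from the module) that computes the difficulty index and applies the rule of thumb above:

```python
def difficulty_index(num_correct, num_students):
    """Proportion of students who answered the item correctly."""
    return num_correct / num_students

def interpret_difficulty(p):
    """Apply the rule-of-thumb table: revise or discard if too hard or too easy."""
    if p <= 0.25:
        return "Difficult - revise or discard"
    elif p <= 0.75:
        return "Right difficulty - retain"
    else:
        return "Easy - revise or discard"

# Example: 75 of 100 students answered the item correctly.
p = difficulty_index(75, 100)
print(p)                        # 0.75
print(interpret_difficulty(p))  # Right difficulty - retain
```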
Difficult items tend to discriminate between those who know and those who do not know the
answer. Conversely, easy items cannot discriminate between these two groups of students. We are
therefore interested in deriving a measure that will tell us whether an item can discriminate between these
two groups of students. Such a measure is called an index of discrimination.
An easy way to derive such a measure is to measure how difficult an item is with respect to those in
the upper 25% of the class and how difficult it is with respect to those in the lower 25% of the class. If the
upper 25% of the class found the item easy yet the lower 25% found it difficult, then the item can
discriminate properly between these two groups. Thus:
Index of discrimination = DU – DL (where DU = difficulty index of the upper group and DL = difficulty index of the lower group)
Example: Obtain the index of discrimination of an item if the upper 25% of the class had a difficulty
index of 0.60 (i.e., 60% of the upper 25% got the correct answer) while the lower 25% of the class had a
difficulty index of 0.20.
Here, DU = 0.60 while DL = 0.20, thus the index of discrimination = 0.60 – 0.20 = 0.40.
The discrimination index is the difference between the proportion of the top scorers who got an item
correct and the proportion of the lowest scorers who got the item right. The discrimination index ranges
between -1 and +1. The closer the discrimination index is to +1, the more effectively the item can
discriminate or distinguish between the two groups of students. A negative discrimination index means that
more students from the lower group got the item correct; such an item is not good and must be discarded.
Theoretically, the index of discrimination can range from -1.0 (when DU = 0 and DL = 1) to 1.0
(when DU = 1 and DL = 0). When the index of discrimination is equal to -1, then this means that all of the
lower 25% of the students got the correct answer while all of the upper 25% got the wrong answer. In a
sense, such an index discriminates correctly between the two groups but the item itself is highly
questionable. Why should the bright ones get the wrong answer and the poor ones get the right answer? On
the other hand, if the index of discrimination is 1.0, then this means that all of the lower 25% failed to get the
correct answer while all of the upper 25% got the correct answer. This is a perfectly discriminating item and
is the ideal item that should be included in the test. From these discussions, let us agree to discard or revise
all items that have negative discrimination index for although they discriminate correctly between the upper
and lower 25% of the class, the content of the item itself may be highly dubious or doubtful. As in the case of
the index of difficulty, we have the following rule of thumb:
Index Range      Interpretation                                Action
-1.0 – -0.50     Can discriminate, but item is questionable    Discard
-0.55 – 0.45     Non-discriminating                            Revise
0.46 – 1.0       Discriminating item                           Include
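A similar sketch can be written for the discrimination index, following the DU – DL definition above; the cut-off values simply mirror the rule of thumb in the table (again, the function names are illustrative):

```python
def discrimination_index(p_upper, p_lower):
    """DU - DL: difficulty index for the upper 25% minus that of the lower 25%."""
    return p_upper - p_lower

def interpret_discrimination(d):
    """Apply the rule-of-thumb table above."""
    if d <= -0.50:
        return "Can discriminate, but the item is questionable - discard"
    elif d < 0.46:
        return "Non-discriminating - revise"
    else:
        return "Discriminating item - include"

# Example from the text: DU = 0.60, DL = 0.20
d = discrimination_index(0.60, 0.20)
print(round(d, 2))                  # 0.4
print(interpret_discrimination(d))  # Non-discriminating - revise
```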
Example: Consider a multiple-choice test item for which the following data were obtained:

Item 1        Options
              A     B*    C     D
Total         0     40    20    20
Upper 25%     0     15    5     0
Lower 25%     0     5     10    5

The correct response is B. Let us compute the difficulty index and index of discrimination:
Difficulty index = no. of students getting correct response / total
                 = 40/100
                 = 40%, within the range of a “good item”
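The same bookkeeping can be scripted. The sketch below reproduces the difficulty computation for this item and, assuming that the upper and lower 25% groups each contain 25 students, continues it to the discrimination index:

```python
# Responses to item 1 (B is the correct option).
total    = {"A": 0, "B": 40, "C": 20, "D": 20}
upper_25 = {"A": 0, "B": 15, "C": 5,  "D": 0}
lower_25 = {"A": 0, "B": 5,  "C": 10, "D": 5}

key = "B"
num_students = 100   # total examinees, as stated in the text
group_size = 25      # assumed size of each 25% group

difficulty = total[key] / num_students    # 40/100 = 0.40
d_upper = upper_25[key] / group_size      # 15/25 = 0.60
d_lower = lower_25[key] / group_size      # 5/25  = 0.20
discrimination = d_upper - d_lower        # 0.60 - 0.20

print(difficulty)                  # 0.4
print(round(discrimination, 2))    # 0.4
```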
Index of Difficulty

P = (R / T) × 100

where:
P – percentage who answered the item correctly (index of difficulty)
R – number who answered the item correctly
T – total number who tried the item

The smaller the percentage figure, the more difficult the item.

Range of difficulty index:
0.00 – 0.20 = very difficult
0.21 – 0.80 = moderately difficult
0.81 – 1.00 = very easy

Index of Item Discriminating Power

Estimate the item discriminating power using the formula:

D = (Ru – Rl) / (T/2)

where:
Ru – the number in the upper group who answered the item correctly
Rl – the number in the lower group who answered the item correctly
T – the total number of students in the upper and lower groups who tried the item

The discriminating power of an item is reported as a decimal fraction; maximum discriminating
power is indicated by an index of 1.00. Maximum discrimination is usually found at the 50 percent level of difficulty.
For classroom achievement tests, most test constructors desire items with indices of difficulty no
lower than 20 nor higher than 80, with an average index of difficulty from 30 or 40 to a maximum of 60.
The INDEX OF DISCRIMINATION is the difference between the proportion of the upper group who
got an item right and the proportion of the lower group who got the item right. This index is dependent
upon the difficulty of an item. It may reach a maximum value of 100 for an item with an index of difficulty of
50, that is, when 100% of the upper group and none of the lower group answer the item correctly. For
items of less than or greater than 50 difficulty, the index of discrimination has a maximum value of less
than 100.
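To illustrate that last point, the following sketch (with hypothetical figures, not taken from the module) computes the index of difficulty and the discriminating power with the formulas above and shows how the best attainable discrimination shrinks once the difficulty moves away from 50:

```python
def index_of_difficulty(r, t):
    """P = (R / T) x 100: percentage of examinees who answered the item correctly."""
    return 100 * r / t

def discriminating_power(r_upper, r_lower, t):
    """D = (Ru - Rl) / (T / 2), reported as a decimal fraction;
    T is the number of students in the upper and lower groups combined."""
    return (r_upper - r_lower) / (t / 2)

# Hypothetical item tried by 40 students (20 in the upper group, 20 in the lower group).
# At 50% difficulty, discrimination can reach its maximum of 1.00:
print(index_of_difficulty(20, 40))       # 50.0
print(discriminating_power(20, 0, 40))   # 1.0  (all of the upper group, none of the lower)

# At 80% difficulty, the best attainable discrimination is smaller:
print(index_of_difficulty(32, 40))       # 80.0
print(discriminating_power(20, 12, 40))  # 0.4  (upper group perfect, 12 of 20 in the lower group)
```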
The item-analysis procedure for norm-referenced tests provides the following information:
1. The difficulty of the item;
2. The discriminating power of the item; and
3. The effectiveness of each alternative.
Some benefits derived from item analysis are:
1. It provides useful information for class discussion of the test.
2. It provides data which help students improve their learning.
3. It provides insights and skills that lead to the preparation of better tests in the future.
Criterion-related validity is also known as concrete validity because criterion validity refers to a
test’s correlation with a concrete outcome.
In the case of a pre-employment test, the two variables that are compared are test scores and
employee performance.
There are two main types of criterion validity – concurrent validity and predictive validity. Concurrent
validity refers to a comparison between the measure in question and an outcome assessed at the same
time. An example of concurrent validity is a comparison of scores on the NAT Math exam with course
grades in Grade 12 Math.
Predictive validity, in contrast, compares the measure in question with an outcome assessed at a later
time. Here we ask: do the scores on the NAT Math exam predict the Math grade in Grade 12? Another
example of predictive validity is a comparison of scores on the National Achievement Test (NAT) with
first-semester grade point average (GPA) in college: do NAT scores predict college performance?
Construct validity refers to the ability of a test to measure what it is supposed to measure. As a
researcher, you may intend to measure depression but actually measure anxiety, in which case your
research is compromised.
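Criterion validity is usually reported as a correlation coefficient between the test scores and the criterion measure. A minimal Python sketch, using invented scores purely for illustration, might look like this:

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between test scores and a criterion measure."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Hypothetical data: NAT Math scores vs. Grade 12 Math grades (concurrent validity).
nat_scores  = [78, 85, 92, 65, 70, 88, 95, 60]
math_grades = [80, 84, 90, 70, 72, 86, 93, 65]
print(round(pearson_r(nat_scores, math_grades), 2))  # a value near +1 suggests strong criterion validity
```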
Reliability
Reliability refers to the consistency of the scores obtained – how consistent they are for each
individual from one administration of an instrument to another and from one set of items to another. We
already gave the formula for computing the reliability of a test; for internal consistency, for instance, we
could use the split-half method or the Kuder-Richardson formulae (KR-20 or KR-21).
Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield valid
outcomes. As reliability improves, validity may improve (or it may not). However, if an instrument is shown
scientifically to be valid, then it is almost certain that it is also reliable.
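For a test whose items are scored right/wrong, the KR-20 coefficient mentioned above can be computed directly from the item responses. The sketch below is one possible implementation; the response matrix is invented for illustration:

```python
import statistics

def kr20(responses):
    """KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores),
    for dichotomously scored items. `responses` is a list of rows, one per
    student, each a list of 0/1 item scores."""
    k = len(responses[0])                     # number of items
    totals = [sum(row) for row in responses]  # each student's total score
    var_total = statistics.pvariance(totals)  # variance of total scores
    pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / len(responses)  # proportion correct on item i
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)

# Invented 5-item, 6-student response matrix (1 = correct, 0 = wrong).
data = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
]
print(round(kr20(data), 2))
```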
The following table presents a standard followed almost universally in educational testing and measurement:

Reliability      Interpretation
.90 and above    Excellent reliability; at the level of the best standardized tests
.80 – .90        Very good for a classroom test
.70 – .80        Good for a classroom test; in the range of most. There are probably a few items which could be improved.
.60 – .70        Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
.50 – .60        Suggests need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
Below .50        Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
LEARNING ACTIVITIES
Item Analysis – Discrimination and Difficulty Index
A. Give the term described/explained
__________1. Refers to a statistical technique that helps instructors identify the effectiveness of their test
items.
__________2. Refers to the proportion of students who got the test item correctly.
__________3. Which is the difference between the proportion of the top scorers who got an item correct and
the proportion of the bottom scorers who got the item right?
__________4. Which one is concerned with how easy or difficult a test item is?
__________5. Which adjective describes an effective distracter?
C. Problem Solving
Solve for the difficulty index of each test item:
Item No.                    1     2     3     4     5
No. of correct responses    2    10    20    30    15
No. of students            50    30    30    30    40
Difficulty index           __    __    __    __    __
D. Solve for the discrimination index of each test item:

Item No.                     1          2          3          4          5
                           UG   LG    UG   LG    UG   LG    UG   LG    UG   LG
No. of correct responses   12   20    10   20    20   10    10   24    20    5
No. of students            25   25    25   25    25   25    25   25    25   25
Discrimination index       ___        ___        ___        ___        ___
Based on the computed discrimination index, which are good test items? Not good test items?
_____________________________________________________________________________________
_____________________________________________________________________________________
_____________________________________________________________________________________
E. Study the following data. Compute for the difficulty index and the discrimination index of each
set of scores.
1. N = 80, number of wrong answers: upper 25% = 2, lower 25% = 9
2. N = 30, number of wrong answers: upper 25% = 1, lower 25% = 6
3. N = 50, number of wrong answers: upper 25% = 3, lower 25% = 8
4. N = 70, number of wrong answers: upper 25% = 4, lower 25% = 10
Solution:
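One way to set up the computation (a sketch that assumes each 25% group contains 25% of N and estimates the difficulty index from the two groups only, since data for the middle 50% are not given):

```python
def indices_from_wrong_answers(n, wrong_upper, wrong_lower):
    """Difficulty and discrimination indices for one item, given the number of
    wrong answers in the upper and lower 25% groups of a class of n students."""
    group = round(n * 0.25)           # size of each 25% group
    r_upper = group - wrong_upper     # correct answers in the upper group
    r_lower = group - wrong_lower     # correct answers in the lower group
    difficulty = (r_upper + r_lower) / (2 * group)
    discrimination = (r_upper - r_lower) / group
    return difficulty, discrimination

# Item 1 of the exercise: N = 80, 2 wrong in the upper 25%, 9 wrong in the lower 25%.
print(indices_from_wrong_answers(80, 2, 9))   # (0.725, 0.35)
```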
ASSESSMENT
Direction: Read the statement and options carefully and encircle the letter of your answer.
1. The index of difficulty of a particular test item is 0.24. What does this mean? My students _________.
a. gained mastery over an item
b. performed very well against expectation
c. found that the test item was neither easy nor difficult
d. found that the test item was hard
2. The discrimination index of a test item is -0.46. What does this imply?
a. More students from the upper group answered the item incorrectly.
b. More students from the upper group answered the item correctly.
c. More students from the lower group answered the item correctly.
d. The number of students from the lower group and upper group who answered the item correctly are equal.
3. Doc Zam gave a test in Assessment of Learning 1. Item no. 18 has a difficulty index of 0.85 and
discrimination index of -0.10. What should he do?
a. retain the item
b. make the item bonus
c. reject the item
d. reject it and make the item bonus
4. The difficulty index of test item no. 7 is 1. What does this imply?
a. The test is very difficult
b. The test is very easy
c. The test item is a quality item
d. Nobody got the item correctly
5. In an item analysis, Doc Tej found out that more students from the lower group got test item #9 correct. This
means that the test item ________.
a. has a negative discriminating power
b. has a lower validity
c. has a positive discriminating power
d. has a high reliability
REFERENCES
Books
Navarro, R., Santos, R., & Corpuz, B. (2019). Assessment of Learning 1 (4th ed.). Quezon City: Lorimar Publishing, Inc.
Navarro, R., Santos, R., & Corpuz, B. (2012). Assessment of Learning Outcomes (2nd ed.). Quezon City: Lorimar Publishing, Inc.
Gabuyo, Y. (2012). Assessment of Learning 1: Textbook and Reviewer. Manila: Rex Book Store, Inc.
Buendicho, F. (2010). Assessment of Student Learning 1. Manila: Rex Book Store, Inc.
Calmorin, L. (2011). Assessment of Student Learning 1. Manila: Rex Book Store, Inc.