Educ 71 FS2 Episode5
Learning Episode 5:
– Classroom level: determine the questions most students found very difficult or were guessing on, then reteach that concept.
– For questions that all students got right, don't waste more time on this area.
– Find the wrong answers students are choosing to identify common misconceptions.
– Individual level: isolate the specific errors this student made.
– Doing occasional item analysis will help you become a better test writer.
– It documents just how good your evaluation is.
– It is useful for dealing with parents or administrators if there is ever a dispute; once you start bringing out all these impressive-looking statistics, parents and administrators are more likely to accept why some students failed.
Example: What is the item difficulty index of an item if 25 students are unable to
answer it correctly while 75 answered it correctly?
Here, the total number of students is 100, hence the item difficulty index is
75/100 or 75%.
Another example: 25 students answered the item correctly while 75 students did
not. The total number of students is 100, so the difficulty index is 25/100 or
0.25, which is 25%.
This is a more difficult test item than the one with a difficulty index of 75%.
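As an illustration (my own sketch, not part of the episode itself), the difficulty index described above can be computed directly as the proportion of examinees answering correctly, expressed as a percentage:

```python
# Minimal sketch of the item difficulty index: the percentage of
# examinees who answered the item correctly.

def difficulty_index(num_correct, num_examinees):
    """Return the item difficulty index as a percentage."""
    if num_examinees == 0:
        raise ValueError("At least one examinee is required.")
    return 100.0 * num_correct / num_examinees

print(difficulty_index(75, 100))  # 75.0 -> the easier item above
print(difficulty_index(25, 100))  # 25.0 -> the more difficult item above
```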
One problem with this type of difficulty index is that it may not actually
indicate that the item is difficult (or easy). A student who does not know the subject
matter will naturally be unable to answer the item correctly even if the question is
easy. How do we decide on the basis of this index whether the item is too difficult or
too easy?
Thus, we also compare the performance of the upper 25% and the lower 25% of the class by means of the index of discrimination, which is the difference between their difficulty indices:
Example: Obtain the index of discrimination of an item if the upper 25% of the class
had a difficulty index of 0.60 (i.e. 60% of the upper 25% got the correct
answer) while the lower 25% of the class had a difficulty index of 0.20.
Here, the index of discrimination is 0.60 - 0.20 = 0.40.
From these discussions, let us agree to discard or revise all items that have a
negative discrimination index: such items discriminate between the upper and lower
25% of the class in the wrong direction (more low scorers than high scorers answered
them correctly), so the content of the item itself may be highly dubious or doubtful.
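A minimal sketch (mine, for illustration) of this index and of the discard-or-revise rule above:

```python
# Index of discrimination as used above: the difficulty index of the
# upper 25% minus that of the lower 25%. Negative values mark items
# to discard or revise, per the rule above.

def discrimination_index(upper_proportion, lower_proportion):
    """Difference between the proportion of the upper group and the
    proportion of the lower group answering the item correctly."""
    return upper_proportion - lower_proportion

d = discrimination_index(0.60, 0.20)
print(round(d, 2))  # 0.4, matching the worked example above

if d < 0:
    print("Discard or revise this item.")
```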
DISCRIMINATION INDEX TABLE
Example: Consider a multiple choice type of test of which the following data were
obtained:
Item 1 (options; the key is marked *):

            A     B*    C     D
Total       0    40    20    20
Upper 25%   0    15     5     0
Lower 25%   0     5    10     5
The correct response is B. Let us compute the difficulty index and the index of
discrimination.

Index of Difficulty

P = ((Ru + RL) / T) x 100

where:
P = percentage who answered the item correctly (index of difficulty)
Ru = the number in the upper group who answered the item correctly
RL = the number in the lower group who answered the item correctly
T = the total number who tried the item

The smaller the percentage figure, the more difficult the item.

Index of Discrimination

D = (Ru - RL) / (½T)

Equivalently, D = DU - DL, where
DU = no. of students in the upper 25% with the correct response / no. of students in the upper 25%
DL = no. of students in the lower 25% with the correct response / no. of students in the lower 25%

For example, if 6 students in the upper group and 2 students in the lower group answered an item correctly and T = 20, then D = (6 - 2) / (½ x 20) = 4/10 = 0.40.
For classroom achievement tests, most test constructors desire items with
indices of difficulty no lower than 20 nor higher than 80, with an average index of
difficulty from 30 or 40 to a maximum of 60.
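To make the two formulas concrete, here is a minimal Python sketch (my own illustration, not part of the episode) applied to the example item above, where 15 of the 20 upper-group students and 5 of the 20 lower-group students chose the key B. It treats T as the students in the two groups combined, as the worked example above does.

```python
# Apply the index of difficulty and index of discrimination formulas
# above to upper/lower group counts for one item.

def index_of_difficulty(r_upper, r_lower, n_upper, n_lower):
    """P = (Ru + RL) / T x 100, with T the number of students in the
    upper and lower groups combined who tried the item."""
    t = n_upper + n_lower
    return 100.0 * (r_upper + r_lower) / t

def index_of_discrimination(r_upper, r_lower, n_upper, n_lower):
    """D = (Ru - RL) / (T / 2), with T as above."""
    t = n_upper + n_lower
    return (r_upper - r_lower) / (t / 2)

print(index_of_difficulty(15, 5, 20, 20))      # 50.0, inside the desirable 20-80 range noted above
print(index_of_discrimination(15, 5, 20, 20))  # 0.5, positive, so the item favors the upper group
```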
You are expected to observe in your subject assignment how item analysis is
conducted and implemented by your respective CT in the teaching-learning process.
(Note to Student Teacher: As you participate and assist your CT in conducting item
analysis, please take note of what you are expected to give more attention to, as
asked in the next step of the Learning Episode (NOTICE).)
1. Assist your CT in conducting the item analysis of the summative test in one
grading period of the assigned class.
2. Offer your assistance to engage in the conduct of item analysis through your
CT.
NOTICE
3. What would be the effect of the results of the item analysis on the
teaching-learning process and on the performance of students?
Item Difficulty 2. Refers to the proportion of students who answered the test item correctly.
Discrimination Index 3. Which is the difference between the proportion of the top
scorers who got an item correct and the proportion of the
bottom scorers who got the item right?
Item Difficulty 4. Which one is concerned with how easy or difficult a test item is?
B. Problem Solving

Item No.            1     2     3     4     5
No. of Students    50    30    30    30    40

Item No.                      1          2          3          4          5
                           UG    LG   UG    LG   UG    LG   UG    LG   UG    LG
No. of Correct Responses   20    12   20    10   10    20   24    10    5    20
No. of Students            25    25   25    25   25    25   25    25   25    25
Discrimination Index        0.32       0.40      -0.40       0.56      -0.60
1. Based on the computed discrimination index, which are good
test items?
o Based on the computed discrimination index, the good
test items are items 1, 2, and 4. The best one is item 4,
as it is moderately difficult and has the highest
discrimination index (0.56).
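A quick check (my own sketch) reproduces the tabulated discrimination indices from the upper- and lower-group correct-response counts above, assuming 25 students in each group:

```python
# (upper-group correct, lower-group correct) per item, from the table above.
correct = {1: (20, 12), 2: (20, 10), 3: (10, 20), 4: (24, 10), 5: (5, 20)}

for item, (ug, lg) in correct.items():
    d = (ug - lg) / 25
    print(f"Item {item}: D = {d:+.2f}")
# Items 1, 2, and 4 come out positive, matching the answer above.
```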
3. A multiple choice type of test has 5 options. The table below indicates the number
of examinees out of 50 who chose each option.

Option        A     B     C     D     E
Examinees     0    20   15*     5    10

* - Correct answer
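As a rough check of my own (not part of the episode's answer key), the difficulty index and the spread of responses across options for this item can be computed as follows:

```python
# Item above: 50 examinees, key C.
counts = {"A": 0, "B": 20, "C": 15, "D": 5, "E": 10}
key = "C"

difficulty = 100 * counts[key] / sum(counts.values())
print(f"Difficulty index: {difficulty:.0f}%")  # 30% -> a difficult item

for option, n in counts.items():
    if option != key and n == 0:
        print(f"Distractor {option} attracted no one and may need revision.")
```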
4. Study the following data. Compute for the difficulty index and the discrimination
index of each set of scores.
Add other activities/techniques that you have researched, e.g., how item analysis
is conducted in different learning institutions using technology and software.
LEARNING EVIDENCES
Introduction
Item Analysis (a.k.a. Test Question Analysis) is a useful means of discovering how well individual test
items assess what students have learned. For instance, it helps us answer questions such as whether an
item was too easy or too difficult and whether it separated students who know the content from those who do not.
With this process, you can improve test score validity and reliability by analyzing item performance
over time and making necessary adjustments. Test items can be systematically analyzed regardless of
whether they are administered as a Canvas assignment or if they are submitted as "bubble sheets" to
Scanning Services.
With this guide, you’ll be able to
– Define and explain the indices related to item analysis.
– Locate each index of interest within Scanning Services’ Exam Analysis reports.
– Identify target values for each index, depending upon your testing intentions.
– Make informed decisions about whether to retain, revise, or remove test items.
Anatomy of a Test Item
In this guide, we refer to the following terms to describe the items (or questions) that make up multiple-
choice tests.
1. Stem refers to the portion of the item that presents a problem for the respondents (students) to
solve
2. Options refer to the various ways the problem might be solved, from which respondents select
the best answer.
a. Distractor is an incorrect option.
b. Key is the correct option.
Figure 1: Anatomy of a test item
Item Analysis in Canvas
By default, the quiz summary function in Canvas shows average score, high score, low score,
standard deviation (how far the values are spread across the entire score range), and average
time of quiz completion. This means that, after the quiz has been administered, you automatically
have access to those results, and you can sort those results by Student Analysis or Item Analysis.
The Canvas Doc Team offers a number of guides on using these functions in that learning
management system. Click on Search the Canvas Guides under the Help menu and enter "Item
Analysis" for the most current information.
Scanning Services offers an Exam Analysis Report (see example) through its Instructor
Tools web site. Learn how to generate and download the report at Scanning Services
Instructor Tools Help.
Item analysis typically focuses on four major pieces of information: test score reliability, item
difficulty, item discrimination, and distractor information. No single piece should be examined
independent of the others. In fact, understanding how to put them all together to help you make
a decision about the item’s future viability is critical.
Reliability
Test Score Reliability is an index of the likelihood that scores would remain consistent over
time if the same test was administered repeatedly to the same learners. Scanning Services’ Exam
Analysis Report uses the Cronbach's Alpha measure of internal consistency, which provides reliability
information about items scored dichotomously (i.e., correct/incorrect), such as multiple choice
items. A test showing a Cronbach's Alpha score of .80 and higher has less measurement error and
is thus said to have very good reliability. A value below .50 is considered to have low reliability.
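As a rough sketch of how such a coefficient can be computed for dichotomously scored items (Scanning Services' report may compute its exact figure somewhat differently), the standard Cronbach's Alpha formula can be applied to an examinees-by-items matrix of 0/1 scores:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's Alpha for an (examinees x items) matrix of 0/1 scores:
    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                          # number of items
    item_variances = x.var(axis=0, ddof=1)  # variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Made-up responses: 5 examinees x 4 items, 1 = correct, 0 = incorrect.
responses = [[1, 1, 1, 0],
             [1, 1, 0, 0],
             [1, 0, 1, 1],
             [0, 0, 0, 0],
             [1, 1, 1, 1]]
print(round(cronbach_alpha(responses), 2))  # about 0.70 with this toy data
```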
Item Reliability is an indication of the extent to which your test measures learning about a
single topic, such as "knowledge of the Battle of Gettysburg" or "skill in solving accounting
problems." Measures of internal consistency indicate how well the questions on the test
consistently and collectively address a common topic or construct.
In Scanning Services’ Exam Analysis Report, next to each item number is the percentage
of students who answered the item correctly.
To the right of that column, you’ll see a breakdown of the percentage of students who selected
each of the various options provided to them, including the key (in dark grey) and the
distractors (A, B, C, D, etc.). Under each option, the Total (TTL) indicates the total number
of students who selected that option. The Reliability coefficient (R) value shows the mean
score (%) and Standard Deviation of scores for a particular distractor.
Figure 2: Item number and percentage answered correctly on Exam Analysis Report
Score Reliability is dependent upon a number of factors, including some that you can control and some
that you can’t.
Factor / Why it's important
Length of the test: Reliability improves as more items are included.
Item difficulty: Very easy and very difficult items do not discriminate well and will lower the reliability estimate.
Homogeneity of item content: Reliability on a particular topic improves as more items on that topic are included. This can present a challenge when a test seeks to assess a lot of topics. In that case, ask questions that are varied enough to survey the topics, but similar enough to collectively represent a given topic.
Number of test takers: Reliability improves as more students are tested using the same pool of items.
Factors that influence any individual test taker on any given day: Preparedness, distraction, physical wellness, test anxiety, etc. can affect students' ability to choose the correct option.
Reliability coefficients range from 0.00 to 1.00. Ideally, score reliability should be above 0.80.
Coefficients in the range 0.80-0.90 are considered to be very good for course and licensure
assessments.
Difficulty
Item Difficulty represents the percentage of students who answered a test item correctly. This means
that low item difficulty values (e.g., 28, 56) indicate difficult items, since only a small percentage of
students got the item correct. Conversely, high item difficulty values (e.g., 84, 96) indicate easier items, as
a greater percentage of students got the item correct.
As indicated earlier, in Scanning Services’ Exam Analysis Report, there are two numbers in the Item
column: item number and the percentage of students who answered the item correctly. A higher
percentage indicates an easier item; a lower percentage indicates a more difficult item. It helps to gauge
this difficulty index against what you expect and how difficult you’d like the item to be. You should find a
higher percentage of students correctly answering items you think should be easy and a lower percentage
correctly answering items you think should be difficult.
Item difficulty is also important as you try to determine how well an item "worked" to separate students
who know the content from those who do not (see Item Discrimination below). Certain items do not
discriminate well. Very easy questions and very difficult questions, for example, are poor discriminators.
That is, when most students get the answer correct, or when most answer incorrectly, it is difficult to
ascertain who really knows the content, versus those who are guessing.
As you examine the difficulty of the items on your test, consider the following.
1. Which items did students find to be easy; which did they find to be difficult? Do those items
match the items you thought would be easy/difficult for students? Sometimes, for example, an
instructor may put an item on a test believing it to be one of the easier items on the exam when, in
fact, students find it to be challenging.
2. Very easy items and very difficult items don’t do a good job of discriminating between students
who know the content and those who do not. (The section on Item Discrimination discusses this
further.) However, you may have very good reason for putting either type of question on your
exam. For example, some instructors deliberately start their exam with an easy question or two
to settle down anxious test takers or to help students feel some early success with the exam.
What should you aim for?
Popular consensus suggests that the best approach is to aim for a mix of difficulties. That is, a few very
difficult, some difficult, some moderately difficult, and a few easy. However, the level of difficulty should be
consistent with the degree of difficulty of the concepts being assessed. The Testing Center provides the
following guidelines.
Difficulty index (% answering correctly):
21 – 60   Difficult
61 – 90   Moderately difficult
91 – 100  Easy
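If you want to apply these bands programmatically, a small lookup such as the following sketch (my own, not the Testing Center's) would do; values below 21 fall outside the quoted bands.

```python
def difficulty_band(pct_correct):
    """Map an item difficulty (% answering correctly) to the bands above."""
    if 91 <= pct_correct <= 100:
        return "Easy"
    if 61 <= pct_correct <= 90:
        return "Moderately difficult"
    if 21 <= pct_correct <= 60:
        return "Difficult"
    return "Below the tabulated bands"

print(difficulty_band(84))  # Moderately difficult
print(difficulty_band(28))  # Difficult
```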
Discrimination
Item Discrimination is the degree to which students with high overall exam scores also got a particular
item correct. It is often referred to as Item Effect, since it is an index of an item’s effectiveness at
discriminating those who know the content from those who do not.
The Point Biserial correlation coefficient (PBS) provides this discrimination index. Its possible range
is -1.00 to 1.00. A strong and positive correlation suggests that students who get a given question correct
also have a relatively high score on the overall exam. Theoretically, this makes sense: students who
know the content and who perform well on the test overall should be the ones who get a given item
correct. There’s a problem, however, if students are getting correct answers on a test and they don’t
actually know the content.
Figure 4: Total selections (TTL), option reliability (R), and point biserial correlation coefficient (PBS) on Exam
Analysis Report
In Scanning Services’ Exam Analysis Report, you’ll find the PBS in the final column, color-coded so you can easily
distinguish the items that may require revision. Likewise, the key for each item is color-coded grey.
Figure 3: Number of students who selected key and mean score/standard deviation, and point biserial
correlation coefficient (PBS)
As you examine item discrimination, there are a number of things you should consider.
1. Very easy or very difficult items are not good discriminators. If an item is so easy (e.g., difficulty =
98) that nearly everyone gets it correct or so difficult (e.g., difficulty = 12) that nearly everyone
gets it wrong, then it becomes very difficult to discriminate those who actually know the content
from those who do not.
2. That does not mean that all very easy and very difficult items should be eliminated. In fact, they
are viable as long you are aware that they will not discriminate well and if putting them on the test
matches your intention to either really challenge students or to make certain that everyone knows
a certain bit of content.
3. Nevertheless, a poorly written item will have little ability to discriminate.
It is typically recommended that item discrimination be at least 0.15. It’s best to aim even higher. Items with
a negative discrimination are theoretically indicating that either the students who performed poorly on the
test overall got the question correct or that students with high overall test performance did not get the item
correct. Thus, the index could signal a number of problems.
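To make the point biserial concrete, here is a minimal sketch of my own (not the report's actual computation, which may, for example, exclude the item from the total score) that correlates a 0/1 item-score vector with examinees' total scores and flags values below the 0.15 guideline above:

```python
import numpy as np

def point_biserial(item_correct, total_scores):
    """Pearson correlation between a dichotomous (0/1) item score and the
    total test score, i.e. the point biserial correlation."""
    return float(np.corrcoef(item_correct, total_scores)[0, 1])

# Made-up data: six examinees' total scores and whether each answered
# one particular item correctly.
totals = [95, 88, 76, 70, 62, 50]
item = [1, 1, 1, 0, 0, 0]

pbs = point_biserial(item, totals)
print(round(pbs, 2))  # strongly positive for this toy data
if pbs < 0.15:
    print("Below the 0.15 guideline: review this item.")
```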
Distractors are the multiple choice response options that are not the correct answer. They
are plausible but incorrect options that are often developed based upon students’ common
misconceptions or miscalculations to see if they’ve moved beyond them. As you examine
distractors, there are a number of things you should consider.
1. Are there at least some respondents for each distractor? If you have 4 possible options for
each item but students are selecting from between only one or two of them, it is an
indication that the other distractors are ineffective. Even low-knowledge students can
reduce the "real" options to one or two, so the odds are now good that they will choose
correctly.
2. It is not necessary to revisit every single "0" in the response table. Instead, be mindful,
and responsive, where it looks as if distractors are ineffective. Typically, this is where
there are two or more distractors selected by no one.
3. Are the distractors overly complex or vaguely worded, or do they contain obviously wrong, "jokey"
or "punny" content? Distractors should not be mini-tests in themselves, nor should they
be a waste of effort.
Distractors should be plausible options. Test writers often use students’ misconceptions,
mistakes on homework, or missed quiz questions as fodder for crafting distractors. When
this is the approach to distractor writing, information about student understanding can be
gleaned even from their selection of wrong answers.
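A small tally like the sketch below (my own illustration) makes the first check above easy: count how many examinees chose each option and flag distractors selected by no one.

```python
from collections import Counter

def distractor_report(responses, options, key):
    """Tally selections per option and flag distractors chosen by no one."""
    counts = Counter(responses)
    for option in options:
        role = "key" if option == key else "distractor"
        note = "  <- selected by no one" if option != key and counts[option] == 0 else ""
        print(f"{option} ({role}): {counts[option]}{note}")

# Made-up responses from ten examinees to a four-option item keyed B.
distractor_report(list("BBCBABBBCB"), options="ABCD", key="B")
```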
Conclusion
Suskie, L. (2017). "Making Multiple Choice Tests More Effective." Schreyer Institute for
Teaching Excellence, The Pennsylvania State University.
Understanding Item Analyses. (2018). Office of Educational Assessment, University of
Washington. Retrieved from http://www.washington.edu/assessment/scanning-scoring/scoring/reports/item-analysis/.
OBSERVE
1. One thing that went well in the conduct of item analysis is that the
well-specified learning objectives and well-constructed items gave me a head
start in the process, and the item analysis gave me feedback on how successful
the test questions I gave the students were.
2. One thing that did not go very well in the conduct of item analysis is the
process of solving for the item difficulty and computing the discrimination
index, since it requires much time and effort to work through the computations
and find where the problems lie.
3. One good thing observed in the conduct of item analysis is that these
analyses evaluate the quality of items and of the test as a whole. Such
analysis can also be employed to revise and improve both items and the
test as a whole. However, some best practices in item and test analysis are
too infrequently used in actual practice.
4. One thing in the conduct of item analysis that needs improvement, based on
what we have observed, is making fuller use of item analysis to diagnose why
some items did not work especially well and to suggest ways to improve
them (for example, if you find distractors that attracted no one, try
developing better ones).
REFLECT
a. The conduct of item analysis went well because all the necessary
information and tools needed in conducting such an analysis were
provided and well constructed.
b. The conduct of item analysis did not always go well because there were
some challenges along the way, such as difficulties in working through the
mathematical formulas. But with determination and perseverance to
learn, everything became easier.
ACT
To ensure that the conduct of item analysis serves its purpose and helps
the learning process, I will learn from others' best practices by
researching the different techniques and strategies for conducting item
analysis and how they can enrich my teaching and the learning process as
a competent educator in the future.
PLAN
To help improve item analysis practices and their implementation, I plan
to conduct action research on the analysis of test items in terms of
difficulty level and discrimination index for my research in education.
Learning Episodes (each criterion is rated Excellent = 50, Above Average = 40, Sufficient = 30, Minimal = 20, or Poor = 10, and carries the % weight shown):

Learning Activities (40%)
Excellent: All episodes were done with outstanding quality; work exceeds expectations.
Above Average: All or nearly all episodes were done with high quality.
Sufficient: Nearly all episodes were done with acceptable quality.
Minimal: Few activities of the episodes were done; only a few objectives were met.
Poor: Episodes were not done, or objectives were not met.

Analysis of the Learning Episode (30%)
Excellent: All questions/episodes were answered completely, with in-depth answers thoroughly grounded on theories; exemplary grammar and spelling.
Above Average: Analysis questions were answered completely; clear connections with theories; grammar and spelling are superior.
Sufficient: Half of the analysis questions were answered; vaguely related to the theories; grammar and spelling acceptable.
Minimal: Few parts of the analysis were answered; grammar and spelling need improvement.
Poor: Analysis questions were not answered.

Reflection/Insights (10%)
Excellent: Reflection statements are profound and clear, supported by experiences from the learning episodes.
Above Average: Reflection statements are clear but not clearly supported by experiences from the learning episodes.
Sufficient: Reflection statements are good and supported by experiences from the learning episodes.
Minimal: Few reflection statements contain minimal support from concrete real-life experiences relevant to the learning episodes.
Poor: Reflection statements are poor and no personal experiences are stated as relevant to the learning episodes.

Learning Portfolio (10%)
Excellent: Portfolio is complete, clear, and well organized, and all supporting documentation is located in clearly designated sections.
Above Average: Portfolio is complete, clear, and well organized, and most supporting documentation is available in logical and clearly marked locations.
Sufficient: Portfolio is incomplete; supporting documentation is organized but lacking.
Minimal: Few documents/proofs/evidences of the learning experiences from the episode are presented.
Poor: No documentation or other evidence of performing the learning episode is presented.

Submission of Learning Episodes (10%)
Excellent: Submitted before the deadline.
Above Average: Submitted on the deadline.
Sufficient: Submitted a day after the deadline.
Minimal: Submitted two to five days after the deadline.
Poor: Submitted a week or more after the deadline.

Total: 100%
COMMENT/S
Over-all Score
Rating: (Based on transmutation)