
GROUP 1: LEARNING EPISODE 5

Learning Episode 5:

BASICS OF ITEM ANALYSIS

To have a meaningful and successful accomplishment in this FS episode, be
sure to read through the whole episode before participating and assisting in your FS2
Resource Teacher's class (any class modality). Take note of all the information and tasks
you will need to accomplish before working on this episode.

At the end of this Learning Episode, I must be able to:


1. Explain the meaning of item analysis, validity, reliability, item difficulty and
discrimination index;
2. Determine the quality of a test item from its difficulty index, discrimination index,
and plausibility of options.

(BASICS OF ITEM ANALYSIS)

What is Item Analysis?


● a process that examines student responses to individual test items in order to assess
the quality of those items and of the test as a whole
● valuable in improving items which will be used again in later tests and in
eliminating ambiguous or misleading items
● valuable for increasing instructors' skills in test construction, and
● identifying specific areas of course content which need greater emphasis or
clarity.
Several Purposes

1. More diagnostic information on students
 – Classroom level: determine which questions students found very difficult or were guessing on, and reteach those concepts
 – For questions all students got right, don't waste more time on that area
 – Find which wrong answers students are choosing and identify common misconceptions
 – Individual level: isolate the specific errors a particular student made

2. Build future tests; revise test items to make them better
 – know how much work goes into writing good questions
 – whole tests SHOULD NOT be reused as is --> diagnostic teaching means responding to the needs of students, so after a few years a test bank is built up and tests can be chosen for the class
 – difficulty levels can be spread across your blueprint (TOS)

3. Part of continuing professional development
 – doing occasional item analysis will help you become a better test writer
 – it documents just how good your evaluation is
 – it is useful for dealing with parents or administrators if there is ever a dispute
 – once you start bringing out all these impressive-looking statistics, parents and administrators will understand why some students failed

Validity. Validity is the extent to which a test measures what it purports to measure, or the appropriateness, correctness, meaningfulness and usefulness of the specific decisions a teacher makes based on the test results. These two definitions of validity differ in the sense that the first refers to the test itself while the second refers to the decisions made by the teacher based on the test. A test is valid when it is aligned with the learning outcome.

Reliability refers to the consistency of the scores obtained: how consistent they are for each individual from one administration of an instrument to another and from one set of items to another. Formulas for computing the reliability of a test were given earlier; for internal consistency, for instance, we could use the split-half method or the Kuder-Richardson formulae (KR-20 or KR-21).
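As an illustration only (not part of the source text), here is a minimal Python sketch of the KR-20 computation, assuming a small matrix of dichotomous (0/1) item scores where each row is a student and each column is an item; the function name and the sample data are made up for the example.

# KR-20 = k/(k-1) * (1 - sum(p_i * q_i) / variance of total scores)
def kr20(scores):
    n_students = len(scores)
    k = len(scores[0])
    # p_i: proportion answering item i correctly; q_i = 1 - p_i
    p = [sum(row[i] for row in scores) / n_students for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    # population variance of the students' total scores
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    return (k / (k - 1)) * (1 - sum_pq / var_total)

# 4 students x 3 items (illustrative data)
print(round(kr20([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]), 2))  # 0.75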

Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield valid outcomes. As reliability improves, validity may improve (or it may not). However, if an instrument is shown scientifically to be valid, then it is almost certain that it is also reliable.
Item Analysis: Difficulty Index and Discrimination Index

There are two important characteristics of an item that will be of interest to the teacher. These are: (a) item difficulty and (b) discrimination index. We shall learn how to measure these characteristics and apply our knowledge in making a decision about the item in question.

The difficulty of an item, or item difficulty, is defined as the number of students who are able to answer the item correctly divided by the total number of students. Thus:

Item difficulty = number of students with correct answer / total number of students

The item difficulty is usually expressed as a percentage.

Example: What is the item difficulty index of an item if 25 students are unable to
answer it correctly while 75 answered it correctly?

Here, the total number of students is 100, hence the item difficulty index is
75/100 or 75%.

Another example: 25 students answered the item correctly while 75 students did
not. The total number of students is 100, so the difficulty index is 25/100 or 25%.
This is a more difficult test item than the one with a difficulty index of 75%.

A high percentage indicates an easy item/question while a low percentage indicates a difficult item.
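A minimal Python sketch of this computation (added for illustration, not from the source; the function name is arbitrary):

def difficulty_index(num_correct, total_students):
    # percentage of students who answered the item correctly
    return num_correct / total_students * 100

print(difficulty_index(75, 100))  # 75.0 -> relatively easy item
print(difficulty_index(25, 100))  # 25.0 -> more difficult item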

One problem with this type of difficulty index is that it may not actually
indicate that the item is difficult (or easy). A student who does not know the subject
matter will naturally be unable to answer the item correctly even if the question is
easy. How do we decide on the basis of this index whether the item is too difficult or
too easy?

DIFFICULTY INDEX TABLE

The following arbitrary rule is often used in the literature:

Range of Difficulty Index    Interpretation      Action
0.00 – 0.25                  Difficult           Revise or discard
0.26 – 0.75                  Right difficulty    Retain
0.76 and above               Easy                Revise or discard

Difficult items tend to discriminate between those who know and those who
do not know the answer. Conversely, easy items cannot discriminate between these
two groups of students. We are therefore interested in deriving a measure that will
tell us whether an item can discriminate between these two groups of students. Such
a measure is called an index of discrimination.

An easy way to derive such a measure is to measure how difficult an item is with respect to those in the upper 25% of the class and how difficult it is with respect to those in the lower 25% of the class. If the upper 25% of the class found the item easy yet the lower 25% found it difficult, then the item can discriminate properly between these two groups.

Thus:

Index of discrimination = DU - DL (U = upper group; L = lower group)

Example: Obtain the index of discrimination of an item if the upper 25% of the class
had a difficulty index of 0.60 (i.e. 60% of the upper 25% got the correct
answer) while the lower 25% of the class had a difficulty index of 0.20.

Here, DU = 0.60 while DL = 0.20,

Thus, index of discrimination = .60 - .20 = .40.

The discrimination index is the difference between the proportion of the top scorers who got an item correct and the proportion of the lowest scorers who got the item right. The discrimination index ranges between -1 and +1. The closer the discrimination index is to +1, the more effectively the item can discriminate or distinguish between the two groups of students. A negative discrimination index means that more students from the lower group got the item correct. Such an item is not good and so must be discarded.
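The computation itself is a simple subtraction; the sketch below (illustrative only, not from the source) takes the difficulty index of the upper 25% and of the lower 25% as inputs:

def discrimination_index(du, dl):
    # DU and DL are proportions (0 to 1) of correct answers in the
    # upper 25% and lower 25% of the class, respectively
    return du - dl

print(discrimination_index(0.60, 0.20))  # 0.40, as in the example above
print(discrimination_index(0.25, 0.75))  # -0.50, a questionable item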

Theoretically, the index of discrimination can range from -1.0 (when DU = 0 and DL = 1) to 1.0 (when DU = 1 and DL = 0). When the index of discrimination is equal to -1, this means that all of the lower 25% of the students got the correct answer while all of the upper 25% got the wrong answer. In a sense, such an index discriminates correctly between the two groups, but the item itself is highly questionable. Why should the bright ones get the wrong answer and the poor ones get the right answer? On the other hand, if the index of discrimination is 1.0, this means that all of the lower 25% failed to get the correct answer while all of the upper 25% got the correct answer. This is a perfectly discriminating item and is the ideal item that should be included in the test.

From these discussions, let us agree to discard or revise all items that have a negative discrimination index, for although they discriminate between the upper and lower 25% of the class, the content of the item itself may be highly dubious or doubtful.
DISCRIMINATION INDEX TABLE

We have the following rule of thumb:

Index Range       Interpretation                              Action
-1.0 – -0.50      Can discriminate but item is questionable   Discard
-0.49 – 0.45      Non-discriminating                          Revise
0.46 – 1.00       Discriminating item                         Include

Example: Consider a multiple-choice test item for which the following data were
obtained:

Item 1          A      B*     C      D
Total           0      40     20     20
Upper 25%       0      15     5      0
Lower 25%       0      5      10     5
The correct response is B. Let us compute the difficulty index and index of
discrimination:

Difficulty index = no. of students getting correct response / total = 40/100 = 40%, within the range of a "good item"

The discrimination index can similarly be computed:

DU = no. of students in upper 25% with correct response / no. of students in the upper 25% = 15/20 = .75 or 75%

DL = no. of students in lower 25% with correct response / no. of students in the lower 25% = 5/20 = .25 or 25%

Discrimination index = DU - DL = .75 - .25 = .50 or 50%.

Thus, the item also has a "good discriminating power."

It is also instructive to note that distracter A is not an effective distracter since it was never selected by the students. It is an implausible distracter. Distracters C and D appear to have good appeal as distracters. They are plausible distracters.
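The whole worked example above can be reproduced with a short script. The following sketch is illustrative only (the variable names and structure are not from the source); it assumes 100 examinees in total, with 20 in each of the upper and lower 25% groups, as in the table above.

upper = {"A": 0, "B": 15, "C": 5, "D": 0}    # responses of the upper 25%
lower = {"A": 0, "B": 5, "C": 10, "D": 5}    # responses of the lower 25%
total = {"A": 0, "B": 40, "C": 20, "D": 20}  # responses of the whole class
key = "B"
n_students = 100

difficulty = total[key] / n_students * 100   # 40.0%
du = upper[key] / sum(upper.values())        # 15/20 = 0.75
dl = lower[key] / sum(lower.values())        # 5/20  = 0.25
discrimination = du - dl                     # 0.50

# a distracter chosen by nobody is implausible
implausible = [opt for opt, n in total.items() if opt != key and n == 0]

print(difficulty, discrimination, implausible)  # 40.0 0.5 ['A']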

Index of Difficulty

P = (Ru + RL) / T x 100

Where:
Ru = the number in the upper group who answered the item correctly
RL = the number in the lower group who answered the item correctly
T = the total number who tried the item

Index of Item Discriminating Power

D = (Ru - RL) / (1/2 T)

Where:
P = percentage who answered the item correctly (index of difficulty)
R = number who answered the item correctly
T = total number who tried the item

P= 8/20 x 100 = 40%

The smaller the percentage figure the more difficult the item

Estimate the item discriminating power using the formula below:

D = (Ru - RL) / (1/2 T) = (6 - 2) / 10 = 0.40

The discriminating power of an item is reported as a decimal fraction; maximum discriminating power is indicated by an index of 1.00. Maximum discrimination is usually found at the 50 percent level of difficulty.

0.00 – 0.20 = Very difficult

0.21 – 0.80 = Moderately difficult

0.81 – 1.00 = Very easy

For classroom achievement tests, most test constructors desire items with
indices of difficulty no lower than 20 nor higher than 80, with an average index of
difficulty from 30 or 40 to a maximum of 60.
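A minimal sketch (illustrative, not from the source) of the two upper/lower-group formulas given above, using the same numbers as the worked computation (Ru = 6, RL = 2, T = 20):

def index_of_difficulty(ru, rl, t):
    # P = (Ru + RL) / T x 100
    return (ru + rl) / t * 100

def discriminating_power(ru, rl, t):
    # D = (Ru - RL) / (1/2 T)
    return (ru - rl) / (t / 2)

print(index_of_difficulty(6, 2, 20))   # 40.0 (%)
print(discriminating_power(6, 2, 20))  # 0.4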

The INDEX OF DISCRIMINATION is the difference between the proportion of the upper group who got an item right and the proportion of the lower group who got the item right. This index is dependent upon the difficulty of an item. It may reach a maximum value of 100 for an item with an index of difficulty of 50, that is, when 100% of the upper group and none of the lower group answer the item correctly. For items of less than or greater than 50 difficulty, the index of discrimination has a maximum value of less than 100.
THE CONTENTS OF THIS MATERIAL ARE ADAPTED FROM:
Navarro, R.L., Santos, R.G., & Corpuz, B.B. (2019). Assessment of Learning 1 (4th ed.). LORIMAR Publishing, Inc. All Rights Reserved.

You are expected to observe in your subject assignment how item analysis is
conducted and implemented by your respective CT in the teaching-learning process.

(Note to Student Teacher: As you participate and assist your CT in conducting item
analysis, please take note of what you are expected to give more attention to, as
asked in the next step of the Learning Episode (NOTICE).)

1. Assist your CT in conducting the item analysis of the summative test in one
grading period of the assigned class.
2. Offer your assistance to engage in the conduct of item analysis through your
CT.

NOTICE

1. Take note of:


a. Alignment of the different learning behavior or domains with the
learning outcomes based on the TOS and the results of the item
analysis.
b. The distribution of the test items in the learning domains against the
retained/discarded items in the subject as the result of the item
analysis.
c. How the percentage allocation of lower- and higher-order thinking
skills is observed and distributed in the TOS, as manifested in the item
analysis results.

1. Are the results of the conducted item analysis expected to measure the
learning competencies of students?

 Item analysis is a process which examines student responses to individual test items (questions) in order to assess the quality of those items and of the test as a whole. Item analysis is especially valuable in improving items which will be used again in later tests, but it can also be used to eliminate ambiguous or misleading items in a single test administration.

2. Was the item analysis constructed favorably or unfavorably in assessing
students' performance?

 In measuring student performance, statistical analysis of test item data is designed positively. It is also beneficial for teachers to develop valid and reliable evaluation instruments in order to accurately gauge student performance. The information gathered from this sort of study is critical for determining the instrument's strengths and weaknesses.

3. What would be the effect of the results of the item analysis in the
teaching-learning process and the performance of students?

 Item analysis is the process of assessing student responses to individual exam questions in order to assess exam quality. It is a crucial tool for maintaining test effectiveness and fairness in order to improve student performance and the flow of the teaching and learning process.

1. How would attainment of learning outcomes be measured if item analysis
were not employed or conducted after the summative test?

 Test, measurement, and evaluation are terms used in education to describe how students' learning progress and ultimate learning outcomes are examined. When the process is institutionalized, however, item analysis becomes a scientific tool for improving exams and maintaining academic integrity. If item analysis cannot be performed, measuring students' performance will be difficult. Furthermore, there are a range of approaches for completing an item analysis, which is typically used, for example, to choose which items will be maintained for the final edition of a test. Item analysis is used to assist in "building" reliability and validity into the test from the beginning.
A. Give the term described/explained.

Item Analysis 1. Refers to a statistical technique that helps instructors identify the effectiveness of their test items.

Item Difficulty 2. Refers to the proportion of students who got the test item correctly.

Discrimination Index 3. Which is the difference between the proportion of the top
scorers who got an item correct and the proportion of the
bottom scorers who got the item right?

Item Difficulty 4. Which one is concerned with how easy or difficult a test item is?

Plausible 5. Which adjective describes an effective distracter?

B. Problem Solving

1. Solve for the difficulty index of each test item:

Item No.                    1      2       3       4       5
No. of Correct Responses    2      10      20      30      15
No. of Students             50     30      30      30      40
Difficulty Index (%)        4      33.33   66.67   100     37.5

1. Which is most difficult? Most easy?

o The most difficult is item number 1, since only two out of 50 students got the right answer.
o The easiest is item number 4, as everyone got the correct answer.

2. Which needs revision? Which should be discarded? Why?

o I believe that item number four (4) should be discarded or revised as it seems very simple or easy for the students. Additionally, this indicates that the students really know the lesson implied by item number four (4).

2. Solve for the discrimination indexes of the following test items:

Item No.                     1          2          3          4          5
                           UG   LG    UG   LG    UG   LG    UG   LG    UG   LG
No. of Correct Responses   20   12    20   10    10   20    24   10     5   20
No. of Students            25   25    25   25    25   25    25   25    25   25
Discrimination Index         0.32       0.4       -0.4       0.56       -0.6
1. Based on the computed discrimination index, which are good
test items?

o Based on the computed discrimination index, the good test items are item numbers 1, 2, and 4. But the best one is item number 4 as it is moderately difficult.

2. Not good test items?

o Item number 5 got a negative discrimination index, which means it is not a good test item; thus it should be revised or deleted.

3. A multiple-choice type of test has 5 options. The table below indicates the number
of examinees out of 50 who chose each option.

Option     A      B      C      D      E
           0      20     15*    5      10

* - Correct answer

1. Which options are plausible?

o The plausible option is "B", as more students chose it than chose the correct answer.

2. Which ones are implausible?

o Option "A" is the implausible distracter since nobody chose it as their answer.

4. Study the following data. Compute for the difficulty index and the discrimination
index of each set of scores. (A computational sketch follows this list.)

1. N = 80, number of wrong answers: upper 25% = 2, lower 25% = 9
2. N = 30, number of wrong answers: upper 25% = 1, lower 25% = 6
3. N = 50, number of wrong answers: upper 25% = 3, lower 25% = 8
4. N = 70, number of wrong answers: upper 25% = 4, lower 25% = 10
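One possible way to set up the computation for Problem 4 is sketched below (this is an added illustration, not an answer key from the source); it assumes each 25% group contains N/4 students, which may need rounding when N is not a multiple of 4.

def analyze(n_students, wrong_upper, wrong_lower):
    group = n_students * 0.25          # assumed size of each 25% group
    ru = group - wrong_upper           # correct answers in the upper group
    rl = group - wrong_lower           # correct answers in the lower group
    difficulty = (ru + rl) / (2 * group) * 100
    discrimination = (ru - rl) / group
    return round(difficulty, 1), round(discrimination, 2)

print(analyze(80, 2, 9))  # set 1: (72.5, 0.35)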

Compile the activities and techniques in conducting item analysis used by your FS Resource Teacher in the classes you observed and were assigned to. Include your drafts/improvements/annotations on the conduct of item analysis.

Add other activities/techniques that you have researched on, e.g., how item analysis is conducted in different learning institutions using technology and software.
LEARNING EVIDENCES

GUIDE TO ITEM ANALYSIS

Introduction

Item Analysis (a.k.a. Test Question Analysis) is a useful means of discovering how well individual test
items assess what students have learned. For instance, it helps us to answer the following questions.

 Is a particular question as difficult, complex, or rigorous as you intend it to be?


 Does the item do a good job of separating students who know the content from those who
may merely either guess the right answer or apply test-taking strategies to eliminate the
wrong answers?
 Which items should be eliminated or revised before use in subsequent administrations of the
test?

With this process, you can improve test score validity and reliability by analyzing item performance
over time and making necessary adjustments. Test items can be systematically analyzed regardless of
whether they are administered as a Canvas assignment or if they are submitted as "bubble sheets" to
Scanning Services.
With this guide, you’ll be able to
 Define and explain the indices related to item analysis.
 Locate each index of interest within Scanning Services’ Exam Analysis reports.
 Identify target values for each index, depending upon your testing intentions.
 Make informed decisions about whether to retain, revise, or remove test items.
Anatomy of a Test Item
In this guide, we refer to the following terms to describe the items (or questions) that make up multiple-
choice tests.
1. Stem refers to the portion of the item that presents a problem for the respondents (students) to
solve
2. Options refer to the various ways the problem might be solved, from which respondents select
the best answer.
a. Distractor is an incorrect option.
b. Key is a correct option.

Figure 1: Anatomy of a test item
Item Analysis in Canvas

By default, the quiz summary function in Canvas shows average score, high score, low score,
standard deviation (how far the values are spread across the entire score range), and average
time of quiz completion. This means that, after the quiz has been administered, you automatically
have access to those results, and you can sort those results by Student Analysis or Item Analysis.
The Canvas Doc Team offers a number of guides on using these functions in that learning
management system. Click on Search the Canvas Guides under the Help menu and enter "Item
Analysis" for the most current information.

Item Analysis in Scanning Services

Scanning Services offers an Exam Analysis Report (see example) through its Instructor
Tools web site. Learn how to generate and download the report at Scanning Services
Instructor Tools Help.

Four Steps to Item Analysis

Item analysis typically focuses on four major pieces of information: test score reliability, item
difficulty, item discrimination, and distractor information. No single piece should be examined
independent of the others. In fact, understanding how to put them all together to help you make
a decision about the item’s future viability is critical.

Reliability

Test Score Reliability is an index of the likelihood that scores would remain consistent over
time if the same test was administered repeatedly to the same learners. Scanning Services’ Exam
Analysis Report uses the Cronbach's Alpha measure of internal consistency, which provides reliability
information about items scored dichotomously (i.e., correct/incorrect), such as multiple choice
items. A test showing a Cronbach's Alpha score of .80 or higher has less measurement error and
is thus said to have very good reliability. A value below .50 is considered to have low reliability.
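For reference, Cronbach's Alpha can be computed directly from a score matrix. The sketch below is illustrative only (it is not the Scanning Services implementation); for items scored 0/1 it gives the same value as the KR-20 formula mentioned earlier in this episode.

def cronbach_alpha(scores):
    # scores: rows = students, columns = items
    n_students = len(scores)
    k = len(scores[0])

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

print(round(cronbach_alpha([[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]), 2))  # 0.75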

Item Reliability is an indication of the extent to which your test measures learning about a
single topic, such as "knowledge of the battle of Gettysburg" or "skill in solving accounting
problems." Measures of internal consistency indicate how well the questions on the test
consistently and collectively address a common topic or construct.

In Scanning Services’ Exam Analysis Report, next to each item number is the percentage
of students who answered the item correctly.

To the right of that column, you’ll see a breakdown of the percentage of students who selected
each of the various options provided to them, including the key (in dark grey) and the
distractors (A, B, C, D, etc.). Under each option, the Total (TTL) indicates the total number
of students who selected that option. The Reliability coefficient (R) value shows the mean
score (%) and Standard Deviation of scores for a particular distractor.

Figure 2: Item number and percentage answered correctly on Exam Analysis Report

How would you use this information?

Score Reliability is dependent upon a number of factors, including some that you can control and some
that you can’t.
Factors and why they are important:

– Length of the test: Reliability improves as more items are included.
– Proportion of students responding correctly and incorrectly to each item: Helps determine item reliability.
– Item difficulty: Very easy and very difficult items do not discriminate well and will lower the reliability estimate.
– Homogeneity of item content: Reliability on a particular topic improves as more items on that topic are included. This can present a challenge when a test seeks to assess a lot of topics. In that case, ask questions that are varied enough to survey the topics, but similar enough to collectively represent a given topic.
– Number of test takers: Reliability improves as more students are tested using the same pool of items.
– Factors that influence any individual test taker on any given day: Preparedness, distraction, physical wellness, test anxiety, etc. can affect students' ability to choose the correct option.

What should you aim for?

Reliability coefficients range from 0.00 to 1.00. Ideally, score reliability should be above 0.80.
Coefficients in the range 0.80-0.90 are considered to be very good for course and licensure
assessments.
Difficulty

Item Difficulty represents the percentage of students who answered a test item correctly. This means
that low item difficulty values (e.g., 28, 56) indicate difficult items, since only a small percentage of
students got the item correct. Conversely, high item difficulty values (e.g., 84, 96) indicate easier items, as
a greater percentage of students got the item correct.

As indicated earlier, in Scanning Services’ Exam Analysis Report, there are two numbers in the Item
column: item number and the percentage of students who answered the item correctly. A higher
percentage indicates an easier item; a lower percentage indicates a more difficult item. It helps to gauge
this difficulty index against what you expect and how difficult you’d like the item to be. You should find a
higher percentage of students correctly answering items you think should be easy and a lower percentage
correctly answering items you think should be difficult.

Item difficulty is also important as you try to determine how well an item "worked" to separate students
who know the content from those who do not (see Item Discrimination below). Certain items do not
discriminate well. Very easy questions and very difficult questions, for example, are poor discriminators.
That is, when most students get the answer correct, or when most answer incorrectly, it is difficult to
ascertain who really knows the content, versus those who are guessing.

Figure 3: Item number and item difficulty on Exam Analysis Report

How should you use this information?

As you examine the difficulty of the items on your test, consider the following.
1. Which items did students find to be easy; which did they find to be difficult? Do those items
match the items you thought would be easy/difficult for students? Sometimes, for example, an
instructor may put an item on a test believing it to be one of the easier on the exam when, in
fact, students find it to be challenging.
2. Very easy items and very difficult items don’t do a good job of discriminating between students
who know the content and those who do not. (The section on Item Discrimination discusses this
further.) However, you may have very good reason for putting either type of question on your
exam. For example, some instructors deliberately start their exam with an easy question or two
to settle down anxious test takers or to help students feel some early success with the exam.
What should you aim for?

Popular consensus suggests that the best approach is to aim for a mix of difficulties. That is, a few very
difficult, some difficult, some moderately difficult, and a few easy. However, the level of difficulty should be
consistent with the degree of difficulty of the concepts being assessed. The Testing Center provides the
following guidelines.

% Correct    Item Difficulty Designation
0 – 20       Very difficult
21 – 60      Difficult
61 – 90      Moderately difficult
91 – 100     Easy

Discrimination

Item Discrimination is the degree to which students with high overall exam scores also got a particular
item correct. It is often referred to as Item Effect, since it is an index of an item’s effectiveness at
discriminating those who know the content from those who do not.

The Point Biserial correlation coefficient (PBS) provides this discrimination index. Its possible range
is -1.00 to 1.00. A strong and positive correlation suggests that students who get a given question correct
also have a relatively high score on the overall exam. Theoretically, this makes sense: students who know
the content and who perform well on the test overall should be the ones who get individual items correct. There's a
problem, however, if students are getting correct answers on a test and they don't actually know the
content.
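A minimal sketch (illustrative only, not the Scanning Services implementation) of a point biserial coefficient for one item, assuming 0/1 item scores and each student's total exam score:

import math

def point_biserial(item_scores, total_scores):
    n = len(total_scores)
    mean_all = sum(total_scores) / n
    sd_all = math.sqrt(sum((t - mean_all) ** 2 for t in total_scores) / n)
    correct = [t for i, t in zip(item_scores, total_scores) if i == 1]
    wrong = [t for i, t in zip(item_scores, total_scores) if i == 0]
    p = len(correct) / n                  # proportion who got the item right
    m1 = sum(correct) / len(correct)      # mean total score, item correct
    m0 = sum(wrong) / len(wrong)          # mean total score, item wrong
    return (m1 - m0) / sd_all * math.sqrt(p * (1 - p))

print(round(point_biserial([1, 1, 0, 0], [9, 8, 5, 4]), 2))  # 0.97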

Figure 4: Total selections (TTL), option reliability (R), and point biserial correlation coefficient (PBS) on Exam Analysis Report
In Scanning Services' Exam Analysis Report, you'll find the PBS in the final column, color-coded so you can easily
distinguish the items that may require revision. Likewise, the key for each item is color-coded grey.

Figure 3: Number of students who selected key and mean score/standard deviation, and point biserial
correlation coefficient (PBS)

How should you use this information?

As you examine item discrimination, there are a number of things you should consider.

1. Very easy or very difficult items are not good discriminators. If an item is so easy (e.g., difficulty =
98) that nearly everyone gets it correct or so difficult (e.g., difficulty = 12) that nearly everyone
gets it wrong, then it becomes very difficult to discriminate those who actually know the content
from those who do not.
2. That does not mean that all very easy and very difficult items should be eliminated. In fact, they
are viable as long you are aware that they will not discriminate well and if putting them on the test
matches your intention to either really challenge students or to make certain that everyone knows
a certain bit of content.
3. Nevertheless, a poorly written item will have little ability to discriminate.

What should you aim for?

It is typically recommended that item discrimination be at least 0.15; it's best to aim even higher. Items with
a negative discrimination theoretically indicate that either the students who performed poorly on the
test overall got the question correct or that students with high overall test performance did not get the item
correct. Thus, the index could signal a number of problems.

1. There is a mistake on the scoring key.


2. Poorly prepared students are guessing correctly.
3. Well prepared students are somehow justifying or misled by the wrong answer.
DISTRACTORS

 Distractors are the multiple choice response options that are not the correct answer. They
are plausible but incorrect options that are often developed based upon students’ common
misconceptions or miscalculations to see if they’ve moved beyond them. As you examine
distractors, there are a number of things you should consider.
1. Are there at least some respondents for each distractor? If you have 4 possible options for
each item but students are selecting from between only one or two of them, it is an
indication that the other distractors are ineffective. Even low-knowledge students can
reduce the "real" options to one or two, so the odds are now good that they will choose
correctly.
2. It is not necessary to revisit every single "0" in the response table. Instead, be mindful,
and responsive, where it looks as if distractors are ineffective. Typically, this is where
there are two or more distractors selected by no one.
3. Are the distractors overly complex, vaguely worded, or do they contain obviously wrong, "jokey"
or "punny" content? Distractors should not be mini-tests in themselves, nor should
they be a waste of effort.

What should you aim for?

Distractors should be plausible options. Test writers often use students’ misconceptions,
mistakes on homework, or missed quiz questions as fodder for crafting distractors. When
this is the approach to distractor writing, information about student understanding can be
gleaned even from their selection of wrong answers.

For best practices on developing effective multiple-choice items, schedule a SITE consultation or see the following resources:

14 Rules for Writing Multiple Choice Questions


Developing Test Items for Course Examinations, Idea Paper 70
Constructing Written Test Questions for the Basic and Clinical Sciences

Conclusion

Item analysis is an empowering process. Knowledge of score reliability, item difficulty, item discrimination, and crafting effective distractors can help an instructor make decisions about whether to retain items for future administrations, revise them, or eliminate them from the test item pool. Item analysis can also help an instructor to determine whether a particular portion of course content should be revisited. In any case, all indices should be considered together before making decisions or revisions. One important thing to always keep in mind is that decisions about item revision should be based on the extent to which item performance matches your intent for the item and your intent for the overall exam.
Resources:

Suskie, L. (2017). "Making Multiple Choice Tests More Effective." Schreyer Institute for Teaching Excellence, The Pennsylvania State University.
Understanding Item Analyses. (2018). Office of Educational Assessment, University of Washington. Retrieved from http://www.washington.edu/assessment/scanning-scoring/scoring/reports/item-analysis/
OBSERVE
1. One thing that went well in the conduct of item analysis is
the well-specified learning objectives and well-constructed items that give
me a head start in that process, and that item analysis can give me
feedback on how successful I was in the test questions I have rendered to
the students.

2. One thing that did not go very well in the conduct of item analysis is the
process of solving the item difficulty and computing the discrimination
index, since it requires much time and effort to work out the problems.

3. One good thing observed in the conduct of item analysis is that these
analyses evaluate the quality of items and of the test as a whole. Such
analysis can also be employed to revise and improve both items and the
test as a whole. However, some best practices in item and test analysis are
too infrequently used in actual practice.

4. One thing in the conduct of item analysis that needs improvement, based on
what we have observed, is how item analysis helps you diagnose why
some items did not work especially well and thus suggests ways to improve
them (for example, if you find distracters that attracted no one, try
developing better ones).

REFLECT
a. The conduct of item analysis went well because all the necessary
information and tools that are needed in conducting such analysis are all
provided and well-constructed.

b. The conduct of item analysis did not go well at times because there were some
challenges along the process, such as issues in solving the mathematical
formulas. But with determination and perseverance to learn, everything
becomes easier.

ACT
To ensure that the process in the conduct of item analysis serves its purpose
and helps in the learning process, I will learn from others' best
practices by researching the different techniques and strategies in
conducting item analysis and how they help enrich my teaching and
learning process as a competent educator in the future.

PLAN
To help improve the conduct of item analysis practices and implementation,
I plan to conduct an action research on the analysis of test items' difficulty
level and discrimination index in the test for research in education.
Rating scale per learning episode: Excellent = 50, Above Average = 40, Sufficient = 30, Minimal = 20, Poor = 10 (with a % weighted average per criterion).

Learning Activities (40%)
Excellent: All episodes were done with outstanding quality; work exceeds expectations.
Above Average: All or nearly all episodes were done with high quality.
Sufficient: Nearly all episodes were done with acceptable quality.
Minimal: Few activities of the episodes were done; only few objectives were met.
Poor: Episodes were not done, or objectives were not met.

Analysis of the Learning Episode (30%)
Excellent: All questions/episodes were answered completely; in-depth answers, thoroughly grounded on theories; exemplary grammar and spelling.
Above Average: Analysis questions were answered completely; clear connections with theories; grammar and spelling are superior.
Sufficient: Half of the analysis questions were answered; vaguely related to the theories; grammar and spelling acceptable.
Minimal: Few parts of the analysis were answered; grammar and spelling need improvement.
Poor: Analysis questions were not answered.

Reflection/Insights (10%)
Excellent: Reflection statements are profound and clear; clearly supported by experiences from the learning episodes.
Above Average: Reflection statements are clear but not clearly supported by experiences from the learning episodes.
Sufficient: Reflection statements are good and supported by experiences from the learning episodes.
Minimal: Few reflection statements contain minimal support from concrete real-life experiences relevant to the learning episodes.
Poor: Reflection statements are poor and no personal experiences were stated as relevant to the learning episodes.

Learning Portfolio (10%)
Excellent: Portfolio is complete, clear, well-organized, and all supporting documentations are located in clearly designated sections.
Above Average: Portfolio is complete, clear, well-organized, and most supporting documentations are available in logical and clearly marked locations.
Sufficient: Portfolio is incomplete; supporting documentation is organized but lacking.
Minimal: Few documents/proofs/evidences of the learning experiences from the learning episode are presented.
Poor: No documentation or any other evidence of performing the episode is presented.

Submission of Learning Episodes (10%)
Excellent: Submitted before the deadline.
Above Average: Submitted on the deadline.
Sufficient: Submitted a day after the deadline.
Minimal: Submitted two to five days after the deadline.
Poor: Submitted a week or more after the deadline.

Total: 100%

COMMENT/S

3.0 (50-51)      2.0 (70-75)
2.75 (52-57)     1.75 (76-81)
2.5 (58-63)      1.5 (82-87)
2.25 (64-69)     1.25 (88-93)
                 1.0 (94-100)

Over-all Score

Rating: (Based on transmutation)
