
Item Analysis and Validation 1

Item analysis is used to evaluate test items and assess the overall quality of a test. It examines student responses to determine the validity, reliability, difficulty, and discrimination of individual items. Validity refers to how well a test measures what it is intended to measure. Reliability indicates how consistent test scores are. Item difficulty and discrimination are also calculated to evaluate how well items differentiate between higher and lower performing students.

ITEM ANALYSIS

AND
VALIDATION
LEARNING OUTCOMES
 Explain the meaning of item analysis, item validity, reliability, item difficulty, and discrimination index
 Determine the validity and reliability of the given test items
ITEM ANALYSIS

Item Analysis is a process which examines student responses to individual test items (questions) in order to assess the quality of those items and of the test as a whole.
There are two important characteristics of an item that will be of interest to the teacher: item difficulty and the discrimination index.

Item Difficulty

Item difficulty = (number of students with the correct answer) / (total number of students)

The item difficulty is usually expressed as a percentage.
Example: What is the item difficulty index of an item if 25 students are unable to answer it correctly while 75 answered it correctly?

Here, the total number of students is 100; hence, the item difficulty index is 75/100 or 75%.
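The computation above can be sketched in Python (the function name is illustrative, not from the source):

```python
def item_difficulty(correct: int, total: int) -> float:
    """Proportion of students who answered the item correctly."""
    return correct / total

# The worked example: 75 of 100 students answered correctly.
p = item_difficulty(75, 100)
print(f"Difficulty index: {p:.2f} ({p:.0%})")  # 0.75, i.e. 75%
```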

How do we decide on the basis of this index whether the item is too difficult or too easy? The following arbitrary rule is often used in the literature:

Range of Difficulty Index    Interpretation       Action
0 – 0.25                     Difficult            Revise or discard
0.26 – 0.75                  Right difficulty     Retain
0.76 and above               Easy                 Revise or discard
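The rule of thumb above can be expressed as a small lookup function (a sketch; the thresholds follow the table, and the function name is illustrative):

```python
def interpret_difficulty(p: float) -> tuple[str, str]:
    """Map a difficulty index (0..1) to its interpretation and action."""
    if p <= 0.25:
        return ("Difficult", "Revise or discard")
    elif p <= 0.75:
        return ("Right difficulty", "Retain")
    else:
        return ("Easy", "Revise or discard")

print(interpret_difficulty(0.75))  # ('Right difficulty', 'Retain')
```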
Index of Discrimination

An easy way to derive such a measure is to compare how difficult an item is for those in the upper 25% of the class with how difficult it is for those in the lower 25% of the class. If the upper 25% of the class found the item easy yet the lower 25% found it difficult, then the item can discriminate properly between these two groups.

Index of Discrimination = DU – DL

Example: Obtain the index of discrimination of an item if the upper 25% of the class had a difficulty index of 0.60 (i.e., 60% of the upper 25% got the correct answer) while the lower 25% of the class had a difficulty index of 0.20.

Hence, DU = 0.60 while DL = 0.20; thus, the index of discrimination = 0.60 – 0.20 = 0.40.
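The same subtraction, as a minimal sketch (names are illustrative):

```python
def discrimination_index(du: float, dl: float) -> float:
    """Index of discrimination: upper-group difficulty minus lower-group difficulty."""
    return du - dl

# The worked example: DU = 0.60, DL = 0.20.
d = discrimination_index(0.60, 0.20)
print(f"Index of discrimination: {d:.2f}")
```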
Index of Difficulty

P = (RU + RL) / T × 100

Where:
RU – the number in the upper group who answered the item correctly.
RL – the number in the lower group who answered the item correctly.
T – the total number who tried the item.
Index of Item Discriminating Power

D = (RU – RL) / (½T)

P = R / T × 100

Where:
P – percentage who answered the item correctly (index of difficulty)
R – number who answered the item correctly
T – total number who tried the item

The smaller the percentage figure, the more difficult the item.

Estimate the item discriminating power using the formula above: from the earlier example, D = 0.60 – 0.20 = 0.40.
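Assuming two equal-sized groups, the two formulas can be sketched as follows. The counts below are hypothetical (not from the source), chosen so that D works out to 0.40 as in the earlier example:

```python
def index_of_difficulty(ru: int, rl: int, t: int) -> float:
    """P = (RU + RL) / T x 100: percentage who answered correctly."""
    return (ru + rl) / t * 100

def discriminating_power(ru: int, rl: int, t: int) -> float:
    """D = (RU - RL) / (T/2), with T split into two equal groups."""
    return (ru - rl) / (t / 2)

# Hypothetical counts: 25 students per group (T = 50),
# 15 correct in the upper group, 5 correct in the lower group.
print(index_of_difficulty(15, 5, 50))   # 40.0 (percent)
print(discriminating_power(15, 5, 50))  # 0.4
```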

The discriminating power of an item is reported as a decimal fraction; maximum discriminating power is indicated by an index of 1.00. Maximum discrimination is usually found at the 50 percent level of difficulty.

0.00 – 0.20 = very difficult
0.21 – 0.80 = moderately difficult
0.81 – 1.00 = very easy
VALIDATION

Validity is the extent to which a test measures what it purports to measure; it also refers to the appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a teacher makes based on the test results.
There are essentially three main types of evidence that may be collected: content-related evidence of validity, criterion-related evidence of validity, and construct-related evidence of validity.

Content-related evidence of validity refers to the content and format of the instrument.

Criterion-related evidence of validity refers to the relationship between scores obtained using the instrument and scores obtained using one or more other tests (often called the criterion).

Construct-related evidence of validity refers to the nature of the psychological construct or characteristic being measured by the test.
Reliability

Reliability refers to the consistency of the scores obtained – how consistent they are for each individual from one administration of an instrument to another and from one set of items to another. We already gave the formula for computing the reliability of a test; for internal consistency, for instance, we could use the split-half method or the Kuder-Richardson formulas (KR-20 or KR-21).
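As one concrete illustration, KR-21 needs only the number of items, the mean, and the variance of the total scores. A minimal sketch, using hypothetical scores on a 20-item test (the data are invented for illustration):

```python
from statistics import mean, pvariance

def kr21(scores: list[int], k: int) -> float:
    """Kuder-Richardson formula 21: (k/(k-1)) * (1 - M(k-M)/(k*s^2)),
    where M is the mean and s^2 the variance of the total scores."""
    m = mean(scores)
    var = pvariance(scores)
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * var))

# Hypothetical total scores of ten students on a 20-item test.
scores = [12, 15, 9, 18, 14, 11, 16, 13, 10, 17]
print(round(kr21(scores, k=20), 2))
```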

Reliability and validity are related concepts. If an instrument is unreliable, it cannot yield valid outcomes. As reliability improves, validity may improve (or it may not). However, if an instrument is shown scientifically to be valid, then it is almost certain that it is also reliable.
The following table is a standard followed almost universally in educational testing and measurement:
Reliability       Interpretation
.90 and above     Excellent reliability; at the level of the best standardized tests.
.80 – .90         Very good for a classroom test.
.70 – .80         Good for a classroom test; in the range of most. There are probably a few items which could be improved.
.60 – .70         Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
.50 – .60         Suggests the need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
.50 or below      Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
GENERALIZATION

Item Analysis is a process which examines student responses to individual test items (questions) in order to assess the quality of those items and of the test as a whole.

Validity is the extent to which a test measures what it purports to measure; it also refers to the appropriateness, correctness, meaningfulness, and usefulness of the specific decisions a teacher makes based on the test results.

Reliability refers to the consistency of the scores obtained – how consistent they are for each individual from one administration of an instrument to another and from one set of items to another.

Item Difficulty is defined as the number of students who are able to answer the item correctly divided by the total number of students.

Index of Discrimination is the difference between the percent of correct responses in the upper group and the percent of correct responses in the lower group.