Test Construction and Validation
TEST
A TEST is an instrument or tool used to measure any quality, ability, skill, or knowledge.
MEASUREMENT is the process of quantifying an individual's achievement, personality, attitudes, among others; it is the process of quantifying the degree to which someone or something possesses a given trait.
EDUCATIONAL MEASUREMENT
Feedback to students
Forms / Kinds of Assessment
Traditional Assessment - refers to forced-choice measures.
Performance Assessment - a complex task, often involving the creation of a product.
Portfolio Assessment - a collection of many different indicators of student progress, gathered in support of curricular goals through a dynamic, ongoing, and collaborative process.
Authentic Assessment - evaluates a student's work by measuring the product against real-life criteria.
Principles of Evaluation
Integral part of the teaching process
Continuous process
Objectives of learning
Validity
Reliability
Diagnostic characteristic
Participative
Variety
Validity
In evaluating learners, there must
be a close relationship between
what the test measures and what it
is supposed to measure.
Validity is shown when the arrow hits its target.
Reliability
refers to the consistency
with which students perform on a
test.
A test is both valid and reliable when the arrows consistently hit the target.
Classification of Teacher-Made Tests
Objective Test
– Supply Type
short answer
completion
– Selective Type
True-false or alternative response
Matching
Multiple choice
Essay Test
– Extended response
– Restricted response
NON-TEST METHOD
– Observation of student work
– Group Evaluation Activities
Class Discussion
Homework
Notebooks and Note-Taking
Reports, Themes and Research Papers
Discussions and Debates
General Suggestions for Writing Test Items
– Use your test specification as a guide to item
writing.
– Write more test items than needed.
– Write the test items well in advance of the
testing date.
– Write each test item so that the task to be
performed is clearly defined.
– Write each item at an appropriate difficulty
level.
– Write each item so that it does not provide
help in answering other items in the test.
– Write each test item so that the answer is
one that would be agreed upon by experts.
Short-Answer Items
Content Validity
Face Validity
Criterion-Related Validity
Content Validity – involves essentially the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured. This is assured by a table of specification.
Face Validity – refers not to
what a test actually
measures, but to what it
appears superficially to
measure. Face validity
pertains to whether the test
“looks valid” to the
examinees who take it.
Criterion-Related Validity
– indicates the effectiveness
of a test in predicting an
individual’s behavior in
specific situations.
CRITERION-RELATED VALIDITY – is
established statistically such that a set of
scores revealed by a test is correlated
with the scores obtained in an identified
criterion or measure.
– Concurrent Validity – describes the present
status of the individual by correlating the sets
of scores obtained from two measures given
concurrently.
– Predictive Validity – describes the future performance of an individual by correlating the sets of scores obtained from two measures given after a longer time interval.
CONSTRUCT VALIDITY – involves the psychological meaningfulness of a test score, that is, the degree to which certain theoretical factors or constructs can account for item responses or performances.
Validation of Content
Validity. The instrument
exhibits validity when it
measures what it is
supposed to measure, and
when it hits its target
information.
Instruments such as tests should show content validity. Content validity in tests, such as diagnostic tests, achievement tests, quarterly tests, etc., must be assured by a table of specification, which shows the distribution of items within the content scope of the test.
Table of Specification

Objectives:    Knowledge  Computation  Analysis  Comprehension  Total
# of items         5           5           2           2          14
%                 40         33.3        13.3        13.3        100
Aside from the table of specification, a test must be evaluated through its indices of difficulty and discrimination.
The difficulty index
The difficulty index shows whether an item is acceptable or not, relative to the students' difficulty in answering the item.
The discrimination index
The discrimination index shows how well an item discriminates between the high-scoring and low-scoring groups of students. It validates the performance of the high group against the low group: if the discrimination index is high, the item confirms the better performance of the high group compared to the low group.
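The two indices can be computed directly from the upper- and lower-group tallies. A minimal sketch, assuming equal group sizes of n each, with Difficulty = (Ru + Rl) / 2n and Discrimination = (Ru – Rl) / n (the latter is the (Ru – Rl)/(N/2) formula given later, with n = N/2); the function names are my own:

```python
def difficulty_index(r_upper, r_lower, n_per_group):
    """Proportion of the combined upper and lower groups answering correctly."""
    return (r_upper + r_lower) / (2 * n_per_group)

def discrimination_index(r_upper, r_lower, n_per_group):
    """(Ru - Rl) / (N/2): difference in correct answers between the groups."""
    return (r_upper - r_lower) / n_per_group

# Item 7 of the worked item-analysis table: 10 of 15 upper-group and
# 1 of 15 lower-group students answered correctly.
print(round(difficulty_index(10, 1, 15), 2))      # 0.37
print(round(discrimination_index(10, 1, 15), 2))  # 0.6
```

These values reproduce the .37 difficulty and .60 discrimination reported for that item.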
Item analysis
Item analysis follows the given procedure:
1. Dry run the test and score the papers.
2. Arrange the papers from highest to lowest.
3. Get the upper and lower 27% of the
papers. The upper 27% shall compose the
upper group while the lower 27%, the lower
group.
4. Tally the answers of the upper and lower
group in each item.
5. Compute necessary statistics to analyze
the items and the whole test.
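The five steps above can be sketched as a small Python routine (the 27% split, per-item tallies, and index computation; the function name and input layout are assumptions, not from the slides):

```python
def item_analysis(responses, fraction=0.27):
    """responses: one list per student of 0/1 item scores (1 = correct).
    Steps 2-5 of the procedure: rank papers by total score, take the
    upper and lower 27%, tally correct answers per item, and compute
    the difficulty and discrimination indices for each item."""
    ranked = sorted(responses, key=sum, reverse=True)   # step 2: high to low
    n = max(1, round(len(ranked) * fraction))           # step 3: 27% group size
    upper, lower = ranked[:n], ranked[-n:]
    indices = []
    for item in range(len(responses[0])):               # steps 4-5
        ru = sum(student[item] for student in upper)
        rl = sum(student[item] for student in lower)
        difficulty = (ru + rl) / (2 * n)
        discrimination = (ru - rl) / n
        indices.append((difficulty, discrimination))
    return indices
```

For instance, four students answering two items as [[1, 1], [1, 0], [0, 1], [0, 0]] yield a one-student upper and lower group and indices (0.5, 1.0) for both items.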
A Response Analysis Table

Item  Group     a     b     c     d
 1    Upper     5     7    12*    0
      Lower    10     6    11*    0
 2    Upper     0     2     2    15*
      Lower     7     5     4    11*
(* marks the correct option)
Discrimination Index = (Ru – Rl) / (N/2)

Where:
Ru = number of correct responses in the upper group
Rl = number of correct responses in the lower group
N = total number of examinees in both groups

Test reliability can be estimated with the Kuder-Richardson Formula 20 (KR-20):

rKR20 = [k / (k – 1)] × [1 – (Σpq / σ²x)]

Where:
k = total number of items
σ²x = the variance of the total test
p = proportion of those who got the item correct
q = 1 – p
Σpq = the sum of the products of pi and qi
Item  Group     a     b     c     d    Difficulty  Discrimination
 7    Upper     0    10*    5     0       .37           .60
      Lower     1     1*   10     3
 8    Upper     0    15*    0     0       .76           .47
      Lower     2     8*    2     3
 9    Upper     0    14*    0     1       .53           .80
      Lower     1     2*    9     2
10    Upper    15*    0     0     0       .67           .67
      Lower     5*    7     1     2
(* marks the correct option)
To judge the results as to acceptability, each item's difficulty and discrimination indices are compared against a chart of acceptable ranges (difficulty plotted against discrimination values from .1 to 1).
Example: rKR20 = (10/9) × [1 – (2.0363/9.53)] = .87
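The example plugs k = 10, Σpq = 2.0363, and σ²x = 9.53 into KR-20; a quick check in Python (the function name is mine):

```python
def kr20(k, sum_pq, variance):
    """Kuder-Richardson Formula 20: [k / (k-1)] * [1 - sum(pq) / variance]."""
    return (k / (k - 1)) * (1 - sum_pq / variance)

# Worked example from the text: 10 items, sum of pq = 2.0363, test variance = 9.53
print(round(kr20(10, 2.0363, 9.53), 2))  # 0.87
```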
Split-Half Reliability
Scorer Reliability
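Split-half reliability is only named above. A common implementation, assumed here rather than taken from the slides, correlates each student's score on the odd items with their score on the even items, then steps the half-test correlation up to full-test length with the Spearman-Brown formula, r = 2·r_half / (1 + r_half):

```python
from math import sqrt

def pearson(x, y):
    # Raw-score Pearson correlation between two lists of scores.
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = sqrt((n * sum(a * a for a in x) - sum(x) ** 2)
               * (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den

def spearman_brown(r_half):
    # Step the half-test correlation up to full-test length.
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    # item_scores: one list per student of per-item scores.
    odd = [sum(s[0::2]) for s in item_scores]
    even = [sum(s[1::2]) for s in item_scores]
    return spearman_brown(pearson(odd, even))
```

For example, a half-test correlation of .60 steps up to a full-test reliability of .75.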
1. Test-Retest reliability (coefficient
of stability) – repeating a test on a
second occasion using the same
group of examinees. The two sets of
scores are then correlated by
scores are then correlated by Pearson's r. The computed value is
the reliability coefficient.
Stud.  Test (X)  Retest (Y)    X²     XY     Y²
  1       11         8        121     88     64
  2        9        10         81     90    100
  3        5         6         25     30     36
  4       13        14        169    182    196
  5       15        16        225    240    256
  6        3         4          9     12     16
  7        1         3          1      3      9
  8        2         3          4      6      9
  9        8         9         64     72     81
 10        5         6         25     30     36
  Σ       72        79        724    753    803
r = (NΣXY – ΣXΣY) / √[(NΣX² – (ΣX)²)(NΣY² – (ΣY)²)]

r = (10×753 – 72×79) / √[(10×724 – (72)²)(10×803 – (79)²)]
  = 1842 / √(2056 × 1789)
  ≈ .96
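The substitution of the column totals can be checked in Python; the raw-score formula and the test-retest data are from the table above (the function name is mine):

```python
from math import sqrt

def pearson_r(x, y):
    """Raw-score formula: r = (N*sum(XY) - sum(X)*sum(Y)) /
    sqrt((N*sum(X^2) - sum(X)^2) * (N*sum(Y^2) - sum(Y)^2))."""
    n = len(x)
    num = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    den = sqrt((n * sum(a * a for a in x) - sum(x) ** 2)
               * (n * sum(b * b for b in y) - sum(y) ** 2))
    return num / den

# Test and retest scores of the ten students in the table
test_scores   = [11, 9, 5, 13, 15, 3, 1, 2, 8, 5]
retest_scores = [8, 10, 6, 14, 16, 4, 3, 3, 9, 6]
print(round(pearson_r(test_scores, retest_scores), 2))  # 0.96
```

The high coefficient indicates strong test-retest reliability for this data set.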
Thank You