"Development of Large Scale Student Assessment Test": Chapter 13)
"Development of Large Scale Student Assessment Test": Chapter 13)
[Diagram] The Standard Test Development Process: Write Questions, Content Review, Pilot Testing, Statistical Review
Step 3: Writing and Reviewing Questions
Item developers and reviewers must see to it that each item has only one correct answer among the options provided in the test.
Validity
Construct validity refers to whether a scale or test
measures the construct adequately. An example is the measurement
of an unobservable attribute of the human mind, such as intelligence,
level of emotion, proficiency, or ability.
Content validity involves examination of the test content to
determine whether it covers a representative sample of the domain
the test is meant to measure. The psychological construct
hypothetically assumed to be measured by the test is established
by doing a factor analysis of the test items to bring out what
defines the overall construct: the analysis determines whether the
test measures a unitary construct or a multi-dimensional construct,
as shown by the resultant factors. These "validities" have for some
time been what educational and psychological tests are required to
establish.
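To make the factor-analysis step concrete, here is a minimal sketch in Python (with simulated data; all variable names are illustrative, not from this module) that examines the eigenvalues of the inter-item correlation matrix, a common preliminary to a full factor analysis, to judge whether the items reflect a unitary or a multi-dimensional construct.

```python
# A minimal sketch of checking test dimensionality, assuming scored item
# responses are arranged as an examinees-by-items matrix. The data below
# are simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 examinees, 10 items driven by one latent ability,
# so the check should suggest a unitary construct.
ability = rng.normal(size=(200, 1))
items = ability + rng.normal(scale=1.0, size=(200, 10))

# Eigenvalues of the inter-item correlation matrix; by the common
# Kaiser criterion, eigenvalues above 1 hint at meaningful factors.
corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

n_factors = int(np.sum(eigenvalues > 1.0))
print("Eigenvalues:", np.round(eigenvalues, 2))
print("Suggested number of factors:", n_factors)
```

A full study would follow this check with factor extraction and rotation using dedicated factor-analysis routines.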
Criterion validity (or criterion-related validity) measures how well one
measure predicts an outcome for another measure. A test has this type of
validity if it is useful for predicting performance or behavior in another
situation (past, present, or future).
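As an illustration (the data below are invented, not from this module), the criterion validity coefficient is commonly just the Pearson correlation between scores on the test and scores on the criterion measure, gathered earlier, at the same time, or later depending on whether the study is retrospective, concurrent, or predictive.

```python
# A hedged sketch of estimating criterion-related validity: correlate test
# scores with an external criterion measure (e.g., later course grades).
import numpy as np

test_scores = np.array([78, 85, 62, 90, 71, 88, 65, 80])    # predictor test
course_grades = np.array([81, 89, 60, 94, 70, 85, 68, 77])  # criterion

# The Pearson product-moment correlation serves as the validity coefficient.
r = np.corrcoef(test_scores, course_grades)[0, 1]
print(f"Criterion validity coefficient r = {r:.2f}")
```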
Five categories of evidence support a score interpretation and have brought about other forms of validity (per the Standards for Educational and Psychological Testing):
1. Evidence based on test content
2. Evidence based on response processes
3. Evidence based on internal structure
4. Evidence based on relations to other variables
5. Evidence based on consequences of testing
With only a single administration, split-half reliability is workable. This divides the
test into two halves using the odd-even split: all the odd-numbered items make up form A,
while the even-numbered items compose form B. The coefficient of correlation
between the two half-tests is obtained using the Pearson Product-Moment Correlation, with
the Spearman-Brown formula applied to step the half-test correlation up to an estimate of
the full-length test's reliability (r).
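For a two-halves split, the Spearman-Brown formula reduces to: full-test estimate = 2r / (1 + r), where r is the correlation between the two halves. Below is a minimal sketch in Python, assuming dichotomously scored items (the responses are simulated for illustration only).

```python
# A minimal sketch of split-half reliability with the odd-even split,
# assuming a 0/1-scored response matrix (simulated here for illustration).
import numpy as np

rng = np.random.default_rng(1)
ability = rng.normal(size=(100, 1))
difficulty = rng.normal(size=20)
responses = (ability + rng.normal(size=(100, 20)) > difficulty).astype(int)

# Form A = odd-numbered items, Form B = even-numbered items
# (0-based columns 0, 2, 4, ... are the odd-numbered items).
form_a = responses[:, 0::2].sum(axis=1)
form_b = responses[:, 1::2].sum(axis=1)

# Pearson r between the two half-tests...
r_half = np.corrcoef(form_a, form_b)[0, 1]
# ...stepped up to full test length by the Spearman-Brown formula.
r_full = 2 * r_half / (1 + r_half)
print(f"Half-test r = {r_half:.2f}, Spearman-Brown estimate = {r_full:.2f}")
```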
Inter-rater reliability assesses the degree to which different judges or raters agree in their
assessment decisions. This is quite useful for avoiding doubts about the scoring procedure of tests
with non-objective items. The sets of scores obtained on the test from two raters can also
be subjected to Pearson r to get a reliability coefficient. Another type of reliability looks at
the internal consistency of responses to all items. Under the assumption that all items in the
test measure the same construct, there will be inter-item consistency in the
responses of the test takers. The procedure requires a record of how individuals perform (i.e.,
pass/fail) on each item; Kuder-Richardson Formula 20 (KR-20) is then applied to
estimate the reliability coefficient.
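To ground the procedure, here is a hedged sketch (simulated 0/1 data; names illustrative, not from this module) of KR-20, computed as (k / (k - 1)) x (1 - sum(pq) / total score variance), where p is the proportion passing each item and q = 1 - p.

```python
# A minimal sketch of the Kuder-Richardson Formula 20 (KR-20) for
# dichotomously scored items; `responses` is a 0/1 matrix (examinees x items)
# simulated here purely for illustration.
import numpy as np

rng = np.random.default_rng(2)
ability = rng.normal(size=(150, 1))
difficulty = rng.normal(size=25)
responses = (ability + rng.normal(size=(150, 25)) > difficulty).astype(int)

k = responses.shape[1]                   # number of items
p = responses.mean(axis=0)               # proportion passing each item
q = 1 - p                                # proportion failing each item
total_var = responses.sum(axis=1).var()  # variance of total scores

# KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores)
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(f"KR-20 reliability estimate = {kr20:.2f}")
```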
Establishing the validity and estimating the reliability of
tests are given attention in this last chapter to emphasize
their significance in the development process of large-scale
tests. Test documentation must include how reliability was
estimated, and this need not be limited to only one type. The
more evidence there is of a test's reliability, the more
confident one can be in its fidelity to measurement consistency.
In terms of validity, supporting evidence for the possible score
interpretations and the actions recommended from them should be
effectively reported. These two technical merits speak well of a
test's usability for its recommended use. With large-scale
student assessments now growing in acceptance all over the world,
it is important that the integrity of the development process
be upheld.
Thank You
&
God Bless !!!
Answer the Following Questions
1. What are the three conventional types of validity?
2. This is a measurement of student learning designed to describe the achievement of
students in particular areas of learning across an education system. (2 points)
3. What are the five categories of evidence supporting a score interpretation that have
brought about other forms of validity?
4. In your own words, what is the difference between validity and reliability? (4 points)
5. What are the steps in test development by ETS?
Overall Total: 15 points