ESTABLISHING TEST VALIDITY AND RELIABILITY

[Figure: four target diagrams illustrating the possible combinations — Neither Valid nor Reliable; Reliable but not Valid; Fairly Valid but not very Reliable; Valid & Reliable]
ESTABLISHING TEST VALIDITY AND RELIABILITY

OBJECTIVES

1. Use procedures and statistical analysis to establish test validity and reliability
2. Decide whether the test is valid or reliable
3. Decide which test items are easy and difficult
Reliability

Reliability is the consistency of responses to a measure under the following three conditions.

1. When retested on the same person.
 A consistent response is expected when the test is given to the same participants again.

2. When retested on the same measure.
 Reliability is attained if responses are consistent across the same test, its equivalent, or another test that measures the same characteristic when administered at a different time.
Reliability

3. Similarity of responses across items that measure the same characteristic.
 There is reliability when the person responds in the same way, or consistently, across items that measure the same characteristic.
Reliability

Factors that affect the reliability of the measure. The reliability of a measure can be high or low, depending on the following factors:

1. The number of items in the test - The more items a test has, the higher the likelihood of reliability. The probability of obtaining consistent scores is high because of the larger pool of items.
Reliability

2. Individual differences of participants – Every participant possesses characteristics that affect their performance in a test, such as fatigue, concentration, innate ability, perseverance, and motivation.
Reliability

Reliability is the degree to which a test consistently measures whatever it measures.

1. Stability of measures
 Test-Retest Method
 Equivalent-Forms Method

2. Internal-consistency methods
 Split-Half Procedure
 Kuder-Richardson Approaches
 Alpha Coefficient
Reliability

Test-Retest Method
 The degree to which scores are consistent over time. It indicates the score variation that occurs from testing session to testing session as a result of errors of measurement.

 Administer the same test again at another time to the same group of examinees.
Reliability

Equivalent-Forms Method / Parallel-Forms Method
 When the equivalent-forms method is used, two different but equivalent (also called alternate or parallel) forms of an instrument are administered to the same group of individuals during the same time period.

 The test is repeatedly used for different groups, as with entrance examinations and licensure examinations.
Reliability

SPLIT-HALF PROCEDURE
 Especially appropriate when the test is very long. The most commonly used method of splitting the test into two is the odd-even strategy.
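A minimal sketch of the odd-even split, assuming a small matrix of hypothetical 0/1 item scores; the half-test correlation is stepped up with the standard Spearman-Brown correction to estimate full-length reliability:

```python
import numpy as np

# Rows = examinees, columns = items (1 = correct, 0 = incorrect); hypothetical data.
scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 0, 1],
])

odd_half  = scores[:, 0::2].sum(axis=1)  # items 1, 3, 5, 7
even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown correction to full length
print(f"split-half reliability = {r_full:.2f}")
```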
Reliability

KUDER-RICHARDSON APPROACHES

KR-21 = [K / (K − 1)] × [1 − M(K − M) / (K × SD²)]

► K = number of items on the test
► M = mean of the set of test scores
► SD = standard deviation of the set of test scores
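A minimal sketch of the KR-21 computation from the three summary statistics above; the sample values are hypothetical:

```python
def kr21(k: int, mean: float, sd: float) -> float:
    """Kuder-Richardson formula 21; assumes items are of roughly equal difficulty."""
    return (k / (k - 1)) * (1 - (mean * (k - mean)) / (k * sd ** 2))

# Hypothetical 40-item test with a mean score of 28 and an SD of 6.
print(f"KR-21 = {kr21(40, 28, 6):.2f}")  # ~0.79
```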
Reliability

Cronbach's Alpha Method
 This technique works well when the assessment tool has a large number of items. It is also applicable to scales and inventories (e.g., a Likert scale from “strongly agree” to “strongly disagree”).
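A small sketch of the alpha coefficient on a hypothetical respondents-by-items matrix of Likert responses:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items score matrix."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_variance = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical 5-point Likert responses: 6 respondents x 4 items.
likert = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [3, 2, 3, 3],
])
print(f"alpha = {cronbach_alpha(likert):.2f}")
```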
Reliability

Inter-Rater Reliability
 Used to determine the consistency of multiple raters when using rating scales and rubrics to judge performance.

 Applicable when the assessment requires the use of multiple raters.
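One simple consistency index is the exact-agreement rate between two raters, sketched below with hypothetical rubric scores; chance-corrected indices such as Cohen's kappa are also common:

```python
# Hypothetical scores from two raters judging the same eight performances
# on a 4-point rubric.
rater_a = [3, 4, 2, 4, 3, 1, 4, 2]
rater_b = [3, 4, 2, 3, 3, 1, 4, 2]

agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"exact agreement = {agreement:.0%}")  # 88% for these data
```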
How to improve Reliability?
 Quality of items: concise statements, homogeneous wording (some degree of uniformity)
 Adequate sampling of the content domain: comprehensiveness of items
 Longer assessment: less distorted by chance factors
 Developing a scoring plan (especially for subjective items: rubrics)
 Ensuring VALIDITY
Linear Regression

 Linear regression is demonstrated when you have two measured variables, such as two sets of scores on a test taken at two different times by the same participants.

 When the two sets of scores are plotted on a graph (with an X-axis and a Y-axis), they tend to form a straight line.
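A minimal sketch fitting that straight line by least squares to hypothetical paired scores:

```python
import numpy as np

# Hypothetical paired scores from two administrations of the same test.
x = np.array([10, 12, 14, 15, 18, 20])  # first testing
y = np.array([11, 13, 13, 16, 19, 21])  # second testing

slope, intercept = np.polyfit(x, y, 1)  # least-squares line y = slope*x + intercept
print(f"y = {slope:.2f}x + {intercept:.2f}")
```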
Question…

In the context of what you understand about VALIDITY and RELIABILITY, how do you go about establishing and ensuring them in your own test papers?
Indicators of quality
 Validity
 Reliability
 Utility
 Fairness

Question: how are they all inter-related?


Types of validity measures
 Face validity
 Construct validity
 Content validity
 Criterion validity
1. Predictive
2. Concurrent
 Consequences validity
Face Validity
 Does it appear to measure what it is supposed to
measure?

 Example: Let’s say you are interested in measuring ‘propensity towards violence and aggression’. By simply looking at the following items, state which ones qualify to measure the variable of interest:
 Have you been arrested?
 Have you been involved in physical fighting?
 Do you get angry easily?
 Do you sleep with your socks on?
 Is it hard to control your anger?
 Do you enjoy playing sports?
Construct Validity
 Does the test measure the ‘human’
CHARACTERISTIC(s) it is supposed to?
 Examples of constructs or ‘human’ characteristics:
 Mathematical reasoning
 Verbal reasoning
 Musical ability
 Spatial ability
 Mechanical aptitude
 Motivation
 Applicable to PBA/authentic assessment
 Each construct is broken down into its component parts
 E.g. ‘motivation’ can be broken down to:
 Interest
 Attention span
 Hours spent
 Assignments undertaken and submitted, etc.
All of these sub-constructs, put together, measure ‘motivation’.
Content Validity
 How well do the elements of the test relate to the content domain?
 How closely does the content of the test questions relate to the content of the curriculum?
 Directly relates to instructional objectives and the
fulfillment of the same!
 Major concern for achievement tests (where content
is emphasized)
 Can you test students on things they have not been
taught?
How to establish Content
Validity?
 Instructional objectives (looking at your list)
 Table of Specification
 E.g.
 At the end of the chapter, the student will be able
to do the following:
1. Explain what ‘stars’ are
2. Discuss the type of stars and galaxies in our universe
3. Categorize different constellations by looking at the stars
4. Differentiate between our stars, the sun, and all other
stars
Table of Specification (An Example)

Content areas          Categories of Performance (Mental Skills)
                       Knowledge    Comprehension    Analysis    Total
1. What are ‘stars’?
2. Our star, the Sun
3. Constellations
4. Galaxies
Total                                                            Grand Total

(The cells are blank in this template; each cell holds the number of items planned for that content area at that skill level.)
Criterion Validity
 The degree to which content on a test (the predictor) correlates with performance on relevant criterion measures (a concrete criterion in the "real" world).
 If they correlate highly, the test (predictor) is a valid one!
 E.g., if you taught skills relating to ‘public speaking’ and had students take a test on it, the test can be validated by looking at how it relates to students’ actual performance (public speaking) inside or outside of the classroom.
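As a sketch, the validity coefficient is simply the correlation between predictor and criterion; the scores below are hypothetical:

```python
import numpy as np

# Hypothetical written public-speaking test scores (predictor) and judges'
# ratings of the students' actual speeches (criterion).
test_scores    = np.array([78, 85, 62, 90, 70, 88])
speech_ratings = np.array([3.5, 4.2, 2.8, 4.6, 3.1, 4.0])

validity = np.corrcoef(test_scores, speech_ratings)[0, 1]
print(f"criterion validity coefficient = {validity:.2f}")
```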
Two Types of Criterion Validity
 Concurrent Criterion Validity = how well performance on a test estimates current performance on some valued measure (criterion). E.g., a test of dictionary skills can estimate students’ current skill in the actual use of a dictionary, as checked by observation.

 Predictive Criterion Validity = how well performance on a test predicts future performance on some valued measure (criterion). E.g., a reading readiness test might be used to predict students’ later achievement in reading.

 Both are only possible IF the predictors are VALID.

Consequences Validity
 The extent to which the assessment served
its intended purpose
 Did the test improve performance?
Motivation? Independent learning?
 Did it distort the focus of instruction?
 Did it encourage or discourage creativity?
Exploration? Higher order thinking?
Factors that can lower Validity
 Unclear directions
 Difficult reading vocabulary and sentence structure
 Ambiguity in statements
 Inadequate time limits
 Inappropriate level of difficulty
 Poorly constructed test items
 Test items inappropriate for the outcomes being measured
 Tests that are too short
 Improper arrangement of items (complex to easy?)
 Identifiable patterns of answers
 Teaching
 Administration and scoring
 Students
 Nature of criterion
