ESTABLISHING TEST VALIDITY AND RELIABILITY

[Figure: four target diagrams illustrating the possible combinations — Neither Valid nor Reliable; Reliable but not Valid; Fairly Valid but not very Reliable; Valid & Reliable]
ESTABLISHING TEST VALIDITY AND RELIABILITY

OBJECTIVES

1. Use procedures and statistical analysis to establish test validity and reliability
2. Decide whether the test is valid or reliable
3. Decide which test items are easy and difficult
Reliability

Reliability is the consistency of responses to a measure under the following three conditions.

1. When retested on the same person.
 A consistent response is expected when the test is given to the same participants again.

2. When retested on the same measure.
 Reliability is attained if responses are consistent across the same test, its equivalent, or another test that measures the same characteristic when administered at a different time.
Reliability

3. Similarity of responses across items that measure the same characteristic.
 There is reliability when the person responds in the same way, or consistently, across items that measure the same characteristic.
Reliability

Factors that affect the reliability of the measure. The reliability of a measure can be high or low, depending on the following factors:

1. The number of items in the test - The more items a test has, the higher the likelihood of reliability. The probability of obtaining consistent scores is high because of the larger pool of items.
Reliability

2. Individual differences of participants – Every participant possesses characteristics that affect their performance in a test, such as fatigue, concentration, innate ability, perseverance, and motivation.
Reliability

Reliability is the degree to which a test consistently measures whatever it measures.

1. Stability of measures
 Test-Retest Method
 Equivalent-Forms Method

2. Internal-consistency methods
 Split-Half Procedure
 Kuder-Richardson Approaches
 Alpha Coefficient
Reliability

Test-Retest Method
 The degree to which scores are consistent over time. It indicates the score variation that occurs from testing session to testing session as a result of errors of measurement.

 Administer the same test again at another time to the same group of examinees.
Reliability

Equivalent-Forms Method / Parallel-Forms Method
 When the equivalent-forms method is used, two different but equivalent (also called alternate or parallel) forms of an instrument are administered to the same group of individuals during the same time period.

 The test is repeatedly used for different groups, as with entrance examinations and licensure examinations.
Reliability

SPLIT-HALF PROCEDURE
 Especially appropriate when the test is very long. The most commonly used method of splitting the test into two is the odd-even strategy.
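A minimal sketch of the odd-even split, assuming a small matrix of hypothetical 0/1 item scores; the half-test correlation is stepped up with the standard Spearman-Brown correction to estimate full-length reliability:

```python
import numpy as np

# Rows = examinees, columns = items (1 = correct, 0 = incorrect); hypothetical data.
scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 0, 1],
])

odd_half  = scores[:, 0::2].sum(axis=1)  # items 1, 3, 5, 7
even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown correction to full length
print(f"split-half reliability = {r_full:.2f}")
```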
Reliability

KUDER-RICHARDSON APPROACHES

KR-21 = [K / (K − 1)] × [1 − M(K − M) / (K × SD²)]

► K = number of items on the test
► M = mean of the set of test scores
► SD = standard deviation of the set of test scores
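A minimal sketch of the KR-21 computation from the three summary statistics above; the sample values are hypothetical:

```python
def kr21(k: int, mean: float, sd: float) -> float:
    """Kuder-Richardson formula 21; assumes items are of roughly equal difficulty."""
    return (k / (k - 1)) * (1 - (mean * (k - mean)) / (k * sd ** 2))

# Hypothetical 40-item test with a mean score of 28 and an SD of 6.
print(f"KR-21 = {kr21(40, 28, 6):.2f}")  # ~0.79
```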
Reliability

Cronbach's Alpha Method
 This technique works well when the assessment tool has a large number of items. It is also applicable to scales and inventories (e.g., a Likert scale from “strongly agree” to “strongly disagree”).
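A small sketch of the alpha coefficient on a hypothetical respondents-by-items matrix of Likert responses:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a respondents x items score matrix."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_variance = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical 5-point Likert responses: 6 respondents x 4 items.
likert = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [3, 2, 3, 3],
])
print(f"alpha = {cronbach_alpha(likert):.2f}")
```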
Reliability

Inter-Rater Reliability
 Used to determine the consistency of multiple raters when using rating scales and rubrics to judge performance.

 Applicable when the assessment requires the use of multiple raters.
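One simple consistency index is the exact-agreement rate between two raters, sketched below with hypothetical rubric scores; chance-corrected indices such as Cohen's kappa are also common:

```python
# Hypothetical scores from two raters judging the same eight performances
# on a 4-point rubric.
rater_a = [3, 4, 2, 4, 3, 1, 4, 2]
rater_b = [3, 4, 2, 3, 3, 1, 4, 2]

agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"exact agreement = {agreement:.0%}")  # 88% for these data
```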
How to improve Reliability?
 Quality of items: concise statements, homogeneous wording (some degree of uniformity)
 Adequate sampling of the content domain: comprehensiveness of items
 Longer assessment: less distorted by chance factors
 Developing a scoring plan (especially for subjective items: rubrics)
 Ensuring VALIDITY
Linear Regression

 Linear regression is demonstrated when you have two measured variables, such as two sets of scores on a test taken at two different times by the same participants.

 When the two sets of scores are plotted on a graph (with an X-axis and a Y-axis), they tend to form a straight line.
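A minimal sketch fitting that straight line by least squares to hypothetical paired scores:

```python
import numpy as np

# Hypothetical paired scores from two administrations of the same test.
x = np.array([10, 12, 14, 15, 18, 20])  # first testing
y = np.array([11, 13, 13, 16, 19, 21])  # second testing

slope, intercept = np.polyfit(x, y, 1)  # least-squares line y = slope*x + intercept
print(f"y = {slope:.2f}x + {intercept:.2f}")
```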
Question…

In the context of what you understand about VALIDITY and RELIABILITY, how do you go about establishing and ensuring them in your own test papers?
Indicators of quality
 Validity
 Reliability
 Utility
 Fairness

Question: how are they all inter-related?


Types of validity measures
 Face validity
 Construct validity
 Content validity
 Criterion validity
1. Predictive
2. Concurrent
 Consequences validity
Face Validity
 Does it appear to measure what it is supposed to
measure?

 Example: Let’s say you are interested in measuring ‘propensity towards violence and aggression’. By simply looking at the following items, state which ones qualify to measure the variable of interest:
 Have you been arrested?
 Have you been involved in physical fighting?
 Do you get angry easily?
 Do you sleep with your socks on?
 Is it hard to control your anger?
 Do you enjoy playing sports?
Construct Validity
 Does the test measure the ‘human’
CHARACTERISTIC(s) it is supposed to?
 Examples of constructs or ‘human’ characteristics:
 Mathematical reasoning
 Verbal reasoning
 Musical ability
 Spatial ability
 Mechanical aptitude
 Motivation
 Applicable to PBA/authentic assessment
 Each construct is broken down into its component parts
 E.g. ‘motivation’ can be broken down to:
 Interest
 Attention span
 Hours spent
 Assignments undertaken and submitted, etc.
All of these sub-constructs, put together, measure ‘motivation’.
Content Validity
 How well do the elements of the test relate to the content domain?
 How closely does the content of the test questions relate to the content of the curriculum?
 Directly relates to instructional objectives and the
fulfillment of the same!
 Major concern for achievement tests (where content
is emphasized)
 Can you test students on things they have not been
taught?
How to establish Content
Validity?
 Instructional objectives (looking at your list)
 Table of Specification
 E.g.
 At the end of the chapter, the student will be able
to do the following:
1. Explain what ‘stars’ are
2. Discuss the type of stars and galaxies in our universe
3. Categorize different constellations by looking at the stars
4. Differentiate between our stars, the sun, and all other
stars
Table of Specification (An Example)

Content areas          Categories of Performance (Mental Skills)
                       Knowledge    Comprehension    Analysis    Total
1. What are ‘stars’?
2. Our star, the Sun
3. Constellations
4. Galaxies
Total                                                            Grand Total

(The cells are blank in this template; each cell holds the number of items planned for that content area at that skill level.)
Criterion Validity
 The degree to which content on a test (the predictor) correlates with performance on relevant criterion measures (a concrete criterion in the "real" world).
 If they correlate highly, the test (predictor) is a valid one!
 E.g., if you taught skills relating to ‘public speaking’ and had students take a test on it, the test can be validated by looking at how it relates to students’ actual performance (public speaking) inside or outside of the classroom.
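As a sketch, the validity coefficient is simply the correlation between predictor and criterion; the scores below are hypothetical:

```python
import numpy as np

# Hypothetical written public-speaking test scores (predictor) and judges'
# ratings of the students' actual speeches (criterion).
test_scores    = np.array([78, 85, 62, 90, 70, 88])
speech_ratings = np.array([3.5, 4.2, 2.8, 4.6, 3.1, 4.0])

validity = np.corrcoef(test_scores, speech_ratings)[0, 1]
print(f"criterion validity coefficient = {validity:.2f}")
```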
Two Types of Criterion Validity
 Concurrent Criterion Validity = how well performance on a test estimates current performance on some valued measure (criterion). E.g., a test of dictionary skills can estimate students’ current skill in the actual use of a dictionary, as checked by observation.

 Predictive Criterion Validity = how well performance on a test predicts future performance on some valued measure (criterion). E.g., a reading readiness test might be used to predict students’ later achievement in reading.

 Both are only possible IF the predictors are VALID.

Consequences Validity
 The extent to which the assessment served
its intended purpose
 Did the test improve performance?
Motivation? Independent learning?
 Did it distort the focus of instruction?
 Did it encourage or discourage creativity?
Exploration? Higher order thinking?
Factors that can lower Validity
 Unclear directions
 Difficult reading vocabulary and sentence structure
 Ambiguity in statements
 Inadequate time limits
 Inappropriate level of difficulty
 Poorly constructed test items
 Test items inappropriate for the outcomes being measured
 Tests that are too short
 Improper arrangement of items (complex to easy?)
 Identifiable patterns of answers
 Teaching
 Administration and scoring
 Students
 Nature of criterion
