
Module 4

Psychometric Properties
Topics to cover…

• Unit 1: Reliability: Meaning and significance.

• Unit 2: Types of reliability: Test-retest, Alternate forms, Split-half, Coefficient alpha, KR-20, Inter-scorer reliability.

• Unit 3: Standard error of measurement. Factors influencing reliability.

• Unit 4: Validity: Content, Criterion-related (Predictive and Concurrent), Construct (Convergent and Discriminant).

• Unit 5: Validity coefficient and standard error of estimate. Factors influencing validity.
Reliability

• Precision or accuracy of the measurement or score.

• Refers to the consistency of scores or measurements, reflected in the reproducibility of the scores.

• When all factors are controlled, a reliable test is one that produces identical results from one occasion to another.

• The self-correlation of the test.


Reliability

• Temporal stability: consistency of scores obtained upon testing and retesting.

• A test is said to be consistent if examinees who obtain high scores on one set of items also score high on an equivalent set of items (and vice versa).

• The correlation coefficient indicating temporal stability is the 'coefficient of stability'.

• Coefficient of stability: the correlation between the two sets of scores obtained upon testing and retesting.
Reliability

• Internal consistency: the consistency of scores obtained from two equivalent sets of items of a single test after a single administration.

• The correlation coefficient indicating internal consistency is the alpha coefficient.

• Alpha coefficient: the correlation between two sets of scores from two equivalent sets of items of the same test after its single administration.
Reliability- features

• Concerned with the test scores obtained with an assessment instrument, not with the test itself.

• It is a property of the test scores, not of the test.

• It is a necessary but not sufficient condition for validity.

• An estimate of reliability is concerned with a particular type of consistency (scores consistent over different periods, different raters, or different samples of items).
Test retest reliability

• A single form of the test is administered twice to the same sample with a reasonable time gap.

• The two administrations of the same test yield two independent sets of scores.

• Reliability coefficient: the correlation between the two sets of scores.

• Temporal stability coefficient: the extent to which examinees retain their relative position, as measured in terms of the test score, over a period of time.
Test retest reliability

• A high test-retest reliability coefficient means an examinee who obtained a low score in the first administration tends to score low in the second administration (and vice versa).

• Problem: choosing a reasonable time gap (often about two weeks).
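
A minimal sketch of the test-retest estimate described above: two administrations of the same test to the same sample are correlated. The scores are hypothetical illustration data.

```python
import numpy as np
from scipy import stats

first = np.array([12, 18, 9, 22, 15, 11, 20, 17])    # first administration
second = np.array([14, 17, 10, 21, 16, 10, 19, 18])  # retest after the time gap

# Coefficient of stability: Pearson correlation between the two sets of scores
r, p = stats.pearsonr(first, second)
print(f"Test-retest reliability: r = {r:.2f}")
```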


Test retest reliability-disadvantage

• Time-consuming method.

• Assumes the physical and psychological set-up remains unchanged.

• Uncontrolled environmental factors are likely to influence the total score.

• Maturational effects contribute to the error variance.

• Not appropriate for tests that measure constantly changing characteristics.
Internal consistency reliability

• Indicates the homogeneity of the test.

• Homogeneity: when all items measure the same function or trait.

• Split-half method: the test is divided into two equal halves.

• Odd-even method: odd-numbered items constitute one part of the test and even-numbered items constitute the second part.

• Each examinee gets two sets of scores from a single administration (odd and even parts).

• The Product Moment Correlation is computed, giving the reliability of the half test.

Internal consistency reliability

Using the split-half method (Spearman-Brown formula):

• Reliability of the whole test = (2 × reliability of half the test) / (1 + reliability of half the test)

• r_tt = 2r_hh / (1 + r_hh), where r_hh is the correlation between the two halves.
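
A minimal sketch of the odd-even split with the Spearman-Brown correction above; the persons × items response matrix is hypothetical illustration data.

```python
import numpy as np

items = np.array([  # persons x items, single administration
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
])

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...

r_half = np.corrcoef(odd_half, even_half)[0, 1]  # reliability of half the test
r_full = 2 * r_half / (1 + r_half)               # Spearman-Brown corrected
print(f"half-test r = {r_half:.2f}, whole-test r = {r_full:.2f}")
```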
Internal consistency reliability

Advantage:

• All data necessary for computing the reliability coefficient are available from one single administration.

• Gives a quick estimate of reliability.

Disadvantage:

• It should not be used with speed tests.

• Temporary conditions and changes within the examinee and the environment work either favourably or unfavourably, enhancing or depressing the reliability coefficient.
Internal consistency reliability

Kuder-Richardson Formulas

• K-R20

• K-R21

Requirements:

• All items should be homogeneous (items measure the same factor).

• Items must be given a score of 1 or 0.

• K-R20: indices of item difficulty may vary (equal difficulty is not required).
Internal consistency reliability

Kuder-Richardson Formulas

• K-R21: items should be of the same difficulty value.

• If the indices of difficulty are not equal, the K-R21 value of reliability will be lower (it underestimates the K-R20 value).

Internal consistency reliability

Kuder-Richardson Formulas

• K-R20: r_KR20 = [k / (k − 1)] × [1 − Σpq / σ²], where k = number of items, p = proportion passing each item, q = 1 − p, and σ² = variance of the total scores.

Internal consistency reliability

Kuder-Richardson Formulas

• K-R21: r_KR21 = [k / (k − 1)] × [1 − M(k − M) / (kσ²)], where M = mean of the total scores.
Internal consistency reliability

Kuder-Richardson Formulas

• K-R20 requires: an item analysis worksheet giving the difficulty value of each item.

• K-R21 requires: the mean of the total scores, the SD of the total scores, and the number of items.
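
A minimal sketch of K-R20 and K-R21 on dichotomously scored (0/1) items, using the standard formulas above; the response matrix is hypothetical illustration data.

```python
import numpy as np

X = np.array([  # persons x items, each item scored 0 or 1
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 1],
])
k = X.shape[1]                 # number of items
total = X.sum(axis=1)          # each examinee's total score
var_total = total.var(ddof=1)  # variance of total scores

p = X.mean(axis=0)             # difficulty value of each item
q = 1 - p
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / var_total)

M = total.mean()               # mean of total scores
kr21 = (k / (k - 1)) * (1 - M * (k - M) / (k * var_total))

print(f"K-R20 = {kr20:.2f}, K-R21 = {kr21:.2f}")
```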
Internal consistency reliability

Kuder-Richardson Formulas - Disadvantages

• Not appropriate for heterogeneous tests (where items measure different functions).

• If items differ widely in difficulty value, the coefficient is lowered.

• Not suitable for speed tests.

• Applicable only if items are scored 0 or 1.

Internal consistency reliability

Coefficient alpha

• Suitable for multipoint items.

• A.K.A. Cronbach's alpha (α).

Internal consistency reliability

Coefficient alpha

• Indicates the overall consistency of the measure.

• A good Cronbach's alpha: .60 to .90.

• Ranges from 0 (zero internal consistency) to 1 (perfect internal consistency).
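
A minimal sketch of coefficient alpha, α = [k / (k − 1)] × [1 − Σ(item variances) / variance of total scores], for multipoint items; the Likert-style data matrix is hypothetical.

```python
import numpy as np

X = np.array([  # persons x items, e.g. 1-5 Likert responses
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
])
k = X.shape[1]
item_vars = X.var(axis=0, ddof=1)      # variance of each item
total_var = X.sum(axis=1).var(ddof=1)  # variance of total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")  # 0 = none, 1 = perfect consistency
```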


Parallel / Alternate / Equivalent forms reliability

• Requires the test to be developed in two forms which are comparable or equivalent.

• The two forms of the test are administered to the same sample, either immediately one after the other or with a time gap of about two weeks.

• When reliability is calculated from data collected immediately: alternate-form immediate reliability.

• When reliability is calculated after a time gap: alternate-form delayed reliability.
Parallel / Alternate / Equivalent forms reliability

Criteria for judging if two forms are parallel (see the sketch after this list):

• The number of items in both forms should be the same.

• Items in both forms should be uniform regarding content and difficulty.

• The distribution of the index of difficulty of items in both forms should be similar.

• Items should be of an equal degree of homogeneity.

• The mean and SD of both forms should be equal or nearly equal.
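
A minimal sketch of checking two forms against the mean/SD criterion above, then correlating them to get the alternate-forms reliability; both score sets are hypothetical.

```python
import numpy as np
from scipy import stats

form_a = np.array([23, 31, 18, 27, 35, 22, 29, 26])
form_b = np.array([24, 30, 19, 28, 33, 23, 28, 27])

# Means and SDs should be equal or nearly equal for parallel forms
print(f"Form A: M = {form_a.mean():.1f}, SD = {form_a.std(ddof=1):.1f}")
print(f"Form B: M = {form_b.mean():.1f}, SD = {form_b.std(ddof=1):.1f}")

r, _ = stats.pearsonr(form_a, form_b)  # alternate-forms reliability
print(f"Alternate-forms reliability: r = {r:.2f}")
```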


Inter scorer reliability

• The extent to which independent evaluators produce similar ratings in judging the same abilities or characteristics in the same target person or object.

• If consistency is high, a researcher can be confident that similarly trained individuals would likely produce similar scores on targets of the same kind.

• The resulting correlation coefficient is the scorer reliability.
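
A minimal sketch of inter-scorer reliability: two independent raters score the same targets, and their ratings are correlated. The ratings are hypothetical.

```python
import numpy as np
from scipy import stats

rater_1 = np.array([7, 5, 8, 6, 9, 4, 7, 8])  # rater 1's scores on 8 targets
rater_2 = np.array([6, 5, 8, 7, 9, 5, 6, 8])  # rater 2's scores on the same targets

r, _ = stats.pearsonr(rater_1, rater_2)  # scorer reliability coefficient
print(f"Inter-scorer reliability: r = {r:.2f}")
```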


VALIDITY

• Anastasi (1968, 99) has said, "The validity of a test concerns what the test measures and how well it does so."

• Lindquist (1951, 213) has defined validity of a test as "the accuracy with which it measures that which it is intended to measure, or the degree to which it approaches infallibility in measuring what it purports to measure."
Types

Content or Curricular Validity

• Content validity is a nonstatistical type of validity that is usually associated with achievement tests.

• Anastasi (1968): it involves essentially the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured.

• Content validity requires both item validity and sampling validity.
Types

Content or Curricular Validity

• Item validity is concerned with whether the test items represent measurements in the intended content area.

• Sampling validity: the extent to which the test samples the total content area.
Types

Content or Curricular Validity

The following points should be fully covered for ensuring full content validation of a test:

• The area of content (or items) should be specified explicitly so that all major portions are adequately covered by the items, in equal proportion.

• Before item writing starts, the content area should be fully defined in clear words.

• The relevance of the contents or items should be established in the light of the defined content area.
Types

Criterion-related Validity

• Criterion-related validity is obtained by comparing (or correlating) the test scores with scores obtained on a criterion available at present or to be available in the future.

• The criterion is defined as an external and independent measure of essentially the same variable that the test claims to measure.

• There are two subtypes of criterion-related validity: (a) predictive validity, and (b) concurrent validity.
Types

Predictive Validity

• The predictive validity coefficient is a Pearson product-moment correlation between the scores on the test and an appropriate criterion, where the criterion measure is obtained after the desired lapse of time.

Concurrent Validity

• There is no time gap in obtaining test scores and criterion scores. The test is correlated with a criterion which is available at the present time.

• The resulting coefficient of correlation is an indicator of concurrent validity.
Types

Concurrent Validity

• Concurrent validity can be determined by establishing relationship or discrimination.

a) The test is administered to a defined group of individuals.

b) The criterion or a previously established valid test is also administered to the same group of individuals.

c) Subsequently, the two sets of scores are correlated.

d) The resulting coefficient indicates the concurrent validity of the test. If the coefficient is high, the test has good concurrent validity.
Types
Concurrent Validity

• Major Qualities Desired in a Criterion Measure

1. Relevance

2. Freedom from bias

3. Reliability

4. Availability
Types

Construct Validity

• The term "construct validity" was first introduced in 1954 in the Technical Recommendations of the American Psychological Association.

• It is the extent to which the test may be said to measure a theoretical construct or trait.
Types

Construct Validity

• A construct is a non-observable trait, such as intelligence, which explains our behaviour.

The process of validation:

• Specifying the possible different measures of the construct.

• Determining the extent of correlation between all or some of those measures of the construct.

• Determining whether or not all or some measures act as if they were measuring the construct.
Types

Construct Validity

Gregory (2005) - indicators that a new test has construct validity:

i. The test appears to be homogeneous and, therefore, measures a single construct.

ii. The test correlates more highly with related tests/instruments/variables than with unrelated tests/instruments/variables.

iii. Developmental changes over time or across different ages are consistent with the theory of the construct being assessed.
Types

Construct Validity

iv. Differences among well-defined groups on the test are theory-consistent.

v. Intervention effects produce changes in the test scores that are theory-consistent.

vi. Factor analysis of the test scores produces results that are understandable in the light of the theory by which the test was constructed.
Types

Convergent validation

• Convergent validity takes two measures that are supposed to be measuring the same construct and shows that they are related.

Discriminant validation

• Discriminant validity shows that two measures that are not supposed to be related are, in fact, unrelated.
Types

E.g.:

In order to measure depression (the construct), you use two measurements: a survey and participant observation. If the scores from your two measurements are close enough (i.e. they converge), this demonstrates that they are measuring the same construct. If they don't converge, this could indicate they are measuring different constructs (for example, anger and depression, or self-worth and depression).
Statistical methods for calculating validity

Correlation Methods

• The validity of a test is defined as its correlation with some outside independent criterion.

• The correlation coefficient can be calculated by different methods, the important ones being the Pearson r, biserial r, point-biserial r, tetrachoric r, phi coefficient, and multiple correlation.
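
A minimal sketch of one of these methods: a point-biserial validity coefficient between continuous test scores and a dichotomous criterion (e.g. pass/fail). All data are hypothetical.

```python
import numpy as np
from scipy import stats

test_scores = np.array([55, 62, 48, 70, 66, 51, 59, 73, 45, 68])
criterion = np.array([0, 1, 0, 1, 1, 0, 1, 1, 0, 1])  # 0 = fail, 1 = pass

# Point-biserial correlation as the validity coefficient
r_pb = stats.pointbiserialr(criterion, test_scores).correlation
print(f"Point-biserial validity coefficient = {r_pb:.2f}")
```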
Statistical methods for calculating validity

Expectancy Tables

• The expectancy table is one way of showing the relation between the test scores and the criterion measures.

• In an expectancy table, the expectancy (expressed in terms of percentage or proportion) on the criterion measure is given against each test score.
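
A minimal sketch of building an expectancy table: the proportion of examinees reaching the criterion (e.g. course success) is shown for each band of test scores. The scores, outcomes, and band boundaries are hypothetical.

```python
import numpy as np
import pandas as pd

test = np.array([42, 55, 61, 48, 70, 66, 51, 59, 73, 45, 68, 57])
success = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1])  # criterion outcome

bands = pd.cut(test, bins=[40, 50, 60, 70, 80])     # test-score bands
table = pd.Series(success).groupby(bands).mean()    # proportion successful per band
print(table)
```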
Statistical methods for calculating validity

Cut-off Score

• The cut-off score is defined as that score on the test which separates or demarcates the potentially superior from the potentially inferior examinees.
Standard error of estimate

• The standard error of estimate is used to express test validity.

• A small standard error of estimate indicates a more valid test.
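
A minimal sketch of the standard error of estimate using the standard formula SE_est = SD_y × √(1 − r²_xy), where SD_y is the SD of the criterion scores and r_xy is the validity coefficient; the values are hypothetical.

```python
import math

sd_criterion = 10.0  # SD of the criterion scores (hypothetical)
r_xy = 0.60          # validity coefficient (hypothetical)

se_est = sd_criterion * math.sqrt(1 - r_xy ** 2)
print(f"Standard error of estimate = {se_est:.2f}")  # smaller = more valid test
```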


FACTORS INFLUENCING VALIDITY
• Length of the Test

• Range of Ability (or Sample Heterogeneity)

• Ambiguous Directions

• Socio-cultural Differences

• Addition of Inappropriate Items


Standard error of measurement

• Used to express the reliability of test scores.

• Not influenced by the variability in the range of scores.

• The standard error of measurement is defined as the standard deviation of the error component (or score) in the obtained test scores.

• The standard error of measurement is calculated indirectly from the standard deviation of the test scores and the reliability of the test: SEM = SD × √(1 − r_tt).
Standard error of measurement

• Provides a direct indication of the absolute accuracy of test scores.

• The smaller the standard error of measurement, the more reliable or consistent are the test scores.
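
A minimal sketch of the SEM formula above, plus a rough 68% band (obtained score ± 1 SEM) as one way to read the absolute accuracy of a score; the SD, reliability, and score are hypothetical.

```python
import math

sd = 15.0    # SD of the test scores (hypothetical)
r_tt = 0.91  # reliability coefficient (hypothetical)

sem = sd * math.sqrt(1 - r_tt)  # SEM = SD * sqrt(1 - reliability)
score = 100
print(f"SEM = {sem:.2f}")
print(f"~68% band for an obtained score of {score}: "
      f"{score - sem:.1f} to {score + sem:.1f}")
```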
Factors influencing reliability
Extrinsic Factors

• Group variability

• Guessing by the examinees

• Environmental conditions

• Momentary fluctuations in the examinee


Factors influencing reliability
Intrinsic Factors

• Length of the test

• Range of the total scores

• Homogeneity of items

• Difficulty value of items

• Discrimination value

• Scorer reliability
Improve reliability of test scores

• The group of examinees should be heterogeneous.

• Items should be homogeneous.

• The test should preferably be a longer one.

• As far as possible, items should be of moderate difficulty value; in other words, indices of item difficulty should fall in the range of about 0.40 to 0.60.

• Items should be discriminatory ones.
Improve reliability of test scores

• Increase the length of the test: this assumes that if new items similar to the original set of items are added, the reliability of the test will tend to increase (see the sketch after this list).

• Discard the items that bring down the reliability, using two techniques: factor analysis and item analysis.

• Factor analysis ensures that tests are most reliable when they are unidimensional.

• Item analysis: the correlation between each item and the total score for the test is examined.
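
A minimal sketch of the lengthening point above, using the general Spearman-Brown prophecy formula: the predicted reliability when a test is made n times longer with comparable items. The current reliability value is hypothetical.

```python
def spearman_brown(r_current: float, n: float) -> float:
    """Predicted reliability of a test lengthened by a factor of n."""
    return n * r_current / (1 + (n - 1) * r_current)

# Doubling a test with r = .70 is predicted to raise reliability to about .82
print(f"{spearman_brown(0.70, 2):.2f}")
```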
RELATION OF VALIDITY TO RELIABILITY

• Reliability and validity are two dimensions of test efficiency.

• Validity is the correlation of the test with some outside independent criterion, and reliability is the self-correlation of the test.

• Validity is dependent upon reliability (for homogeneous tests only).

• Reliability is a necessary but not a sufficient condition for validity.
