
Module 4

Psychometric Properties
Topics to cover…

• Unit 1: Reliability: Meaning and significance.

• Unit 2: Types of reliability: Test-retest, Alternate forms, Split-half, Coefficient alpha, KR-20, Inter-scorer reliability.

• Unit 3: Standard error of measurement. Factors influencing reliability.

• Unit 4: Validity: Content, Criterion-related (Predictive and Concurrent), Construct (Convergent and Discriminant).

• Unit 5: Validity coefficient and standard error of estimate. Factors influencing validity.
Reliability

• Precision or accuracy of the measurement or score.

• Refers to the consistency of scores or measurements, reflected in the reproducibility of the scores.

• When all factors are controlled, a reliable test is one that produces identical results from one occasion to another.

• The self-correlation of the test.


Reliability

• Temporal stability: consistency of scores obtained upon testing and retesting.

• A test is said to be consistent if examinees who obtain high scores on one set of items also score high on an equivalent set of items (and vice versa).

• The correlation coefficient indicating temporal stability is the 'coefficient of stability'.

• Coefficient of stability: the correlation between the two sets of scores obtained upon testing and retesting.
Reliability

• Internal consistency: the consistency of scores obtained from two equivalent sets of items of a single test after a single administration.

• The correlation coefficient indicating internal consistency is the alpha coefficient.

• Alpha coefficient: the correlation between two sets of scores from two equivalent sets of items of the same test after its single administration.
Reliability- features

• Concerned with the test scores obtained with an assessment instrument, not with the test itself.

• It is a property of the test scores, not of the test.

• It is a necessary but not sufficient condition for validity.

• An estimate of reliability is concerned with a particular type of consistency (scores consistent over different periods, different raters, or different samples of items).
Test retest reliability

• A single form of the test is administered twice to the same sample with a reasonable time gap.

• The two administrations of the same test yield two independent sets of scores.

• Reliability coefficient: the correlation between the two sets of scores.

• Temporal stability coefficient: the extent to which examinees retain their relative position, as measured in terms of the test score, over a period of time.
Test retest reliability

• A high test-retest reliability coefficient means an examinee who obtained a low score in the first administration tends to score low in the second administration (and vice versa).

• Problem: choosing a reasonable time gap (often about two weeks).
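
A minimal sketch of the test-retest estimate described above: two administrations of the same test to the same sample are correlated. The scores are hypothetical illustration data.

```python
import numpy as np
from scipy import stats

first = np.array([12, 18, 9, 22, 15, 11, 20, 17])    # first administration
second = np.array([14, 17, 10, 21, 16, 10, 19, 18])  # retest after the time gap

# Coefficient of stability: Pearson correlation between the two sets of scores
r, p = stats.pearsonr(first, second)
print(f"Test-retest reliability: r = {r:.2f}")
```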


Test retest reliability-disadvantage

• Time-consuming method.

• Assumes the physical and psychological set-up remains unchanged.

• Uncontrolled environmental factors are likely to influence the total score.

• Maturational effects contribute to the error variance.

• Not appropriate for tests that measure constantly changing characteristics.
Internal consistency reliability

• Indicates the homogeneity of the test.

• Homogeneity: when all items measure the same function or trait.

• Split-half method: the test is divided into two equal halves.

• Odd-even method: odd-numbered items constitute one part of the test and even-numbered items constitute the second part.

• Each examinee gets two sets of scores from a single administration (odd and even parts).

• The Product Moment Correlation is computed, giving the reliability of the half test.

Internal consistency reliability

Using the split-half method (Spearman-Brown formula):

• Reliability of the whole test = (2 × reliability of half the test) / (1 + reliability of half the test)

• r_tt = 2r_hh / (1 + r_hh), where r_hh is the correlation between the two halves.
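
A minimal sketch of the odd-even split with the Spearman-Brown correction above; the persons × items response matrix is hypothetical illustration data.

```python
import numpy as np

items = np.array([  # persons x items, single administration
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
])

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...

r_half = np.corrcoef(odd_half, even_half)[0, 1]  # reliability of half the test
r_full = 2 * r_half / (1 + r_half)               # Spearman-Brown corrected
print(f"half-test r = {r_half:.2f}, whole-test r = {r_full:.2f}")
```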
Internal consistency reliability

Advantage:

• All data necessary for computing the reliability coefficient are available from one single administration.

• Gives a quick estimate of reliability.

Disadvantage:

• It should not be used with speed tests.

• Temporary conditions and changes within the examinee and the environment work either favourably or unfavourably, enhancing or depressing the reliability coefficient.
Internal consistency reliability

Kuder-Richardson Formulas

• K-R20

• K-R21

Requirements:

• All items should be homogeneous (items measure the same factor).

• Items must be given a score of 1 or 0.

• K-R20: indices of item difficulty may vary (equal difficulty is not required).
Internal consistency reliability

Kuder-Richardson Formulas

• K-R21: items should be of the same difficulty value.

• If the indices of difficulty are not equal, the K-R21 value of reliability will be lower (it underestimates the K-R20 value).

Internal consistency reliability

Kuder-Richardson Formulas

• K-R20: r_KR20 = [k / (k − 1)] × [1 − Σpq / σ²], where k = number of items, p = proportion passing each item, q = 1 − p, and σ² = variance of the total scores.

Internal consistency reliability

Kuder-Richardson Formulas

• K-R21: r_KR21 = [k / (k − 1)] × [1 − M(k − M) / (kσ²)], where M = mean of the total scores.
Internal consistency reliability

Kuder-Richardson Formulas

• K-R20 requires: an item analysis worksheet giving the difficulty value of each item.

• K-R21 requires: the mean of the total scores, the SD of the total scores, and the number of items.
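
A minimal sketch of K-R20 and K-R21 on dichotomously scored (0/1) items, using the standard formulas above; the response matrix is hypothetical illustration data.

```python
import numpy as np

X = np.array([  # persons x items, each item scored 0 or 1
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 1],
])
k = X.shape[1]                 # number of items
total = X.sum(axis=1)          # each examinee's total score
var_total = total.var(ddof=1)  # variance of total scores

p = X.mean(axis=0)             # difficulty value of each item
q = 1 - p
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / var_total)

M = total.mean()               # mean of total scores
kr21 = (k / (k - 1)) * (1 - M * (k - M) / (k * var_total))

print(f"K-R20 = {kr20:.2f}, K-R21 = {kr21:.2f}")
```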
Internal consistency reliability

Kuder-Richardson Formulas - Disadvantages

• Not appropriate for heterogeneous tests (where items measure different functions).

• If items differ widely in difficulty value, the coefficient is lowered.

• Not suitable for speed tests.

• Applicable only if items are scored 0 or 1.

Internal consistency reliability

Coefficient alpha

• Suitable for multipoint items.

• A.K.A. Cronbach's alpha (α).

Internal consistency reliability

Coefficient alpha

• Indicates the overall consistency of the measure.

• A good Cronbach's alpha: .60 to .90.

• Ranges from 0 (zero internal consistency) to 1 (perfect internal consistency).
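
A minimal sketch of coefficient alpha, α = [k / (k − 1)] × [1 − Σ(item variances) / variance of total scores], for multipoint items; the Likert-style data matrix is hypothetical.

```python
import numpy as np

X = np.array([  # persons x items, e.g. 1-5 Likert responses
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
])
k = X.shape[1]
item_vars = X.var(axis=0, ddof=1)      # variance of each item
total_var = X.sum(axis=1).var(ddof=1)  # variance of total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")  # 0 = none, 1 = perfect consistency
```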


Parallel / Alternate / Equivalent forms reliability

• Requires the test to be developed in two forms which are comparable or equivalent.

• The two forms of the test are administered to the same sample, either immediately one after the other or with a time gap of about two weeks.

• When reliability is calculated from data collected immediately: alternate-form immediate reliability.

• When reliability is calculated after a time gap: alternate-form delayed reliability.
Parallel / Alternate / Equivalent forms reliability

Criteria for judging if two forms are parallel (see the sketch after this list):

• The number of items in both forms should be the same.

• Items in both forms should be uniform regarding content and difficulty.

• The distribution of the index of difficulty of items in both forms should be similar.

• Items should be of an equal degree of homogeneity.

• The mean and SD of both forms should be equal or nearly equal.
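
A minimal sketch of checking two forms against the mean/SD criterion above, then correlating them to get the alternate-forms reliability; both score sets are hypothetical.

```python
import numpy as np
from scipy import stats

form_a = np.array([23, 31, 18, 27, 35, 22, 29, 26])
form_b = np.array([24, 30, 19, 28, 33, 23, 28, 27])

# Means and SDs should be equal or nearly equal for parallel forms
print(f"Form A: M = {form_a.mean():.1f}, SD = {form_a.std(ddof=1):.1f}")
print(f"Form B: M = {form_b.mean():.1f}, SD = {form_b.std(ddof=1):.1f}")

r, _ = stats.pearsonr(form_a, form_b)  # alternate-forms reliability
print(f"Alternate-forms reliability: r = {r:.2f}")
```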


Inter scorer reliability

• The extent to which independent evaluators produce similar ratings in judging the same abilities or characteristics in the same target person or object.

• If consistency is high, a researcher can be confident that similarly trained individuals would likely produce similar scores on targets of the same kind.

• The resulting correlation coefficient is the scorer reliability.
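
A minimal sketch of inter-scorer reliability: two independent raters score the same targets, and their ratings are correlated. The ratings are hypothetical.

```python
import numpy as np
from scipy import stats

rater_1 = np.array([7, 5, 8, 6, 9, 4, 7, 8])  # rater 1's scores on 8 targets
rater_2 = np.array([6, 5, 8, 7, 9, 5, 6, 8])  # rater 2's scores on the same targets

r, _ = stats.pearsonr(rater_1, rater_2)  # scorer reliability coefficient
print(f"Inter-scorer reliability: r = {r:.2f}")
```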


VALIDITY

• Anastasi (1968, 99) has said, "The validity of a test concerns what the test measures and how well it does so."

• Lindquist (1951, 213) has defined validity of a test as "the accuracy with which it measures that which it is intended to measure, or the degree to which it approaches infallibility in measuring what it purports to measure."
Types

Content or Curricular Validity

• Content validity is a nonstatistical type of validity that is usually associated with achievement tests.

• Anastasi (1968): it involves essentially the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured.

• Content validity requires both item validity and sampling validity.
Types

Content or Curricular Validity

• Item validity is concerned with whether the test items represent measurements in the intended content area.

• Sampling validity: the extent to which the test samples the total content area.
Types

Content or Curricular Validity

The following points should be fully covered for ensuring full content validation of a test:

• The area of content (or items) should be specified explicitly so that all major portions are adequately covered by the items, in equal proportion.

• Before item writing starts, the content area should be fully defined in clear words.

• The relevance of the contents or items should be established in the light of the defined content area.
Types

Criterion-related Validity

• Criterion-related validity is obtained by comparing (or correlating) the test scores with scores obtained on a criterion available at present or to be available in the future.

• The criterion is defined as an external and independent measure of essentially the same variable that the test claims to measure.

• There are two subtypes of criterion-related validity: (a) predictive validity, and (b) concurrent validity.
Types

Predictive Validity

• The predictive validity coefficient is a Pearson product-moment correlation between the scores on the test and an appropriate criterion, where the criterion measure is obtained after the desired lapse of time.

Concurrent Validity

• There is no time gap in obtaining test scores and criterion scores. The test is correlated with a criterion which is available at the present time.

• The resulting coefficient of correlation is an indicator of concurrent validity.
Types

Concurrent Validity

• Concurrent validity can be determined by establishing relationship or discrimination.

a) The test is administered to a defined group of individuals.

b) The criterion or a previously established valid test is also administered to the same group of individuals.

c) Subsequently, the two sets of scores are correlated.

d) The resulting coefficient indicates the concurrent validity of the test. If the coefficient is high, the test has good concurrent validity.
Types
Concurrent Validity

• Major Qualities Desired in a Criterion Measure

1. Relevance

2. Freedom from bias

3. Reliability

4. Availability
Types

Construct Validity

• The term "construct validity" was first introduced in 1954 in the Technical Recommendations of the American Psychological Association.

• It is the extent to which the test may be said to measure a theoretical construct or trait.
Types

Construct Validity

• A construct is a non-observable trait, such as intelligence, which explains our behaviour.

The process of validation:

• Specifying the possible different measures of the construct.

• Determining the extent of correlation between all or some of those measures of the construct.

• Determining whether or not all or some measures act as if they were measuring the construct.
Types

Construct Validity

Gregory (2005) - indicators that a new test has construct validity:

i. The test appears to be homogeneous and, therefore, measures a single construct.

ii. The test correlates more highly with related tests/instruments/variables than with unrelated tests/instruments/variables.

iii. Developmental changes over time or across different ages are consistent with the theory of the construct being assessed.
Types

Construct Validity

iv. Differences among well-defined groups on the test are theory-consistent.

v. Intervention effects produce changes in the test scores that are theory-consistent.

vi. Factor analysis of the test scores produces results that are understandable in the light of the theory by which the test was constructed.
Types

Convergent validation

• Convergent validity takes two measures that are supposed to be measuring the same construct and shows that they are related.

Discriminant validation

• Discriminant validity shows that two measures that are not supposed to be related are, in fact, unrelated.
Types

E.g.:

In order to measure depression (the construct), you use two measurements: a survey and participant observation. If the scores from your two measurements are close enough (i.e. they converge), this demonstrates that they are measuring the same construct. If they don't converge, this could indicate they are measuring different constructs (for example, anger and depression, or self-worth and depression).
Statistical methods for calculating validity

Correlation Methods

• The validity of a test is defined as its correlation with some outside independent criterion.

• The correlation coefficient can be calculated by different methods, the important ones being the Pearson r, biserial r, point-biserial r, tetrachoric r, phi coefficient, and multiple correlation.
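
A minimal sketch of one of these methods: a point-biserial validity coefficient between continuous test scores and a dichotomous criterion (e.g. pass/fail). All data are hypothetical.

```python
import numpy as np
from scipy import stats

test_scores = np.array([55, 62, 48, 70, 66, 51, 59, 73, 45, 68])
criterion = np.array([0, 1, 0, 1, 1, 0, 1, 1, 0, 1])  # 0 = fail, 1 = pass

# Point-biserial correlation as the validity coefficient
r_pb = stats.pointbiserialr(criterion, test_scores).correlation
print(f"Point-biserial validity coefficient = {r_pb:.2f}")
```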
Statistical methods for calculating validity

Expectancy Tables

• The expectancy table is one way of showing the relation between the test scores and the criterion measures.

• In an expectancy table, the expectancy (expressed in terms of percentage or proportion) on the criterion measure is given against each test score.
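
A minimal sketch of building an expectancy table: the proportion of examinees reaching the criterion (e.g. course success) is shown for each band of test scores. The scores, outcomes, and band boundaries are hypothetical.

```python
import numpy as np
import pandas as pd

test = np.array([42, 55, 61, 48, 70, 66, 51, 59, 73, 45, 68, 57])
success = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1])  # criterion outcome

bands = pd.cut(test, bins=[40, 50, 60, 70, 80])     # test-score bands
table = pd.Series(success).groupby(bands).mean()    # proportion successful per band
print(table)
```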
Statistical methods for calculating validity

Cut-off Score

• The cut-off score is defined as that score on the test which separates or demarcates the potentially superior from the potentially inferior examinees.
Standard error of estimate

• The standard error of estimate is used to express test validity.

• A small standard error of estimate indicates a more valid test.
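
A minimal sketch of the standard error of estimate using the standard formula SE_est = SD_y × √(1 − r²_xy), where SD_y is the SD of the criterion scores and r_xy is the validity coefficient; the values are hypothetical.

```python
import math

sd_criterion = 10.0  # SD of the criterion scores (hypothetical)
r_xy = 0.60          # validity coefficient (hypothetical)

se_est = sd_criterion * math.sqrt(1 - r_xy ** 2)
print(f"Standard error of estimate = {se_est:.2f}")  # smaller = more valid test
```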


FACTORS INFLUENCING VALIDITY
• Length of the Test

• Range of Ability (or Sample Heterogeneity)

• Ambiguous Directions

• Socio-cultural Differences

• Addition of Inappropriate Items


Standard error of measurement

• Used to express the reliability of test scores.

• Not influenced by the variability in the range of scores.

• The standard error of measurement is defined as the standard deviation of the error component (or score) in the obtained test scores.

• The standard error of measurement is calculated indirectly from the standard deviation of the test scores and the reliability of the test: SEM = SD × √(1 − r_tt).
Standard error of measurement

• Provides a direct indication of the absolute accuracy of test scores.

• The smaller the standard error of measurement, the more reliable or consistent are the test scores.
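
A minimal sketch of the SEM formula above, plus a rough 68% band (obtained score ± 1 SEM) as one way to read the absolute accuracy of a score; the SD, reliability, and score are hypothetical.

```python
import math

sd = 15.0    # SD of the test scores (hypothetical)
r_tt = 0.91  # reliability coefficient (hypothetical)

sem = sd * math.sqrt(1 - r_tt)  # SEM = SD * sqrt(1 - reliability)
score = 100
print(f"SEM = {sem:.2f}")
print(f"~68% band for an obtained score of {score}: "
      f"{score - sem:.1f} to {score + sem:.1f}")
```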
Factors influencing reliability
Extrinsic Factors

• Group variability

• Guessing by the examinees

• Environmental conditions

• Momentary fluctuations in the examinee


Factors influencing reliability
Intrinsic Factors

• Length of the test

• Range of the total scores

• Homogeneity of items

• Difficulty value of items

• Discrimination value

• Scorer reliability
Improve reliability of test scores

• The group of examinees should be heterogeneous.

• Items should be homogeneous.

• The test should preferably be a longer one.

• As far as possible, items should be of moderate difficulty value; in other words, indices of item difficulty should fall in the range of about 0.40 to 0.60.

• Items should be discriminatory ones.
Improve reliability of test scores

• Increase the length of the test: this assumes that if new items similar to the original set of items are added, the reliability of the test will tend to increase (see the sketch after this list).

• Discard the items that bring down the reliability, using two techniques: factor analysis and item analysis.

• Factor analysis ensures that tests are most reliable when they are unidimensional.

• Item analysis: the correlation between each item and the total score for the test is examined.
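
A minimal sketch of the lengthening point above, using the general Spearman-Brown prophecy formula: the predicted reliability when a test is made n times longer with comparable items. The current reliability value is hypothetical.

```python
def spearman_brown(r_current: float, n: float) -> float:
    """Predicted reliability of a test lengthened by a factor of n."""
    return n * r_current / (1 + (n - 1) * r_current)

# Doubling a test with r = .70 is predicted to raise reliability to about .82
print(f"{spearman_brown(0.70, 2):.2f}")
```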
RELATION OF VALIDITY TO RELIABILITY

• Reliability and validity are two dimensions of test efficiency.

• Validity is the correlation of the test with some outside independent criterion, and reliability is the self-correlation of the test.

• Validity is dependent upon reliability (for homogeneous tests only).

• Reliability is a necessary but not a sufficient condition for validity.
