Validity and Reliability
Class 4
A person's performance in a single administration of a measure does not reflect with complete
accuracy the 'true' amount of the trait that the individual possesses.
Other systematic or chance factors may also affect his/her score on the measure,
e.g. the person's emotional state of mind, fatigue, the noise outside the test room, etc.
The observed test score thus comprises the true score and the error score.
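As a minimal numeric sketch of this decomposition: the scores below are invented for illustration, and the idea of expressing reliability as the share of observed variance that is true-score variance comes from classical test theory rather than from these notes.

```python
from statistics import pvariance

# Hypothetical true scores and error scores for five test-takers.
# The errors are chosen to average zero and to be uncorrelated with the true scores.
true_scores = [50.0, 55.0, 60.0, 65.0, 70.0]
error_scores = [2.0, -4.0, 4.0, -4.0, 2.0]

# Observed score = true score + error score.
observed_scores = [t + e for t, e in zip(true_scores, error_scores)]

# Under these conditions the observed variance splits into true and error variance,
# so reliability can be read as true-score variance divided by observed variance.
reliability = pvariance(true_scores) / pvariance(observed_scores)

print(observed_scores)
print(round(reliability, 3))
```

In practice the true and error scores are never observed directly, which is why the reliability estimates listed next all work from observed scores only.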
TYPES OF RELIABILITY
Test-retest reliability – Administer the same measure twice to the same individuals and correlate
the two sets of scores; the correlation coefficient is the reliability estimate. Personality and
exercise example: similar results for both the personality questionnaire and the cardio exercises
on two different occasions.
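A small sketch of the test-retest idea, assuming invented scores for five participants and a hand-written Pearson correlation (the helper name `pearson_r` is ours, not part of the notes):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores of the same five participants on two occasions.
first_occasion = [10, 12, 14, 16, 18]
second_occasion = [11, 12, 15, 15, 19]

# The correlation between the two occasions is the test-retest estimate.
test_retest_r = pearson_r(first_occasion, second_occasion)
print(round(test_retest_r, 2))
```

The same calculation would serve for alternate-form reliability, with the second list holding scores on the equivalent form.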
Alternate-form reliability – Two equivalent forms of the questionnaire and exercises are
administered on two different occasions, and the two sets of scores are then correlated.
Split-half reliability – Divide the questionnaire and exercises into two halves, administer the
first half of the questionnaire/exercises, administer the second half, and correlate the scores
on the two halves.
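The split-half procedure can be sketched as follows. The item scores are invented, and the Spearman-Brown step-up at the end is an addition not mentioned in these notes: it is a commonly used correction because each half is only half as long as the full measure.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def spearman_brown(half_r):
    """Step a half-test correlation up to an estimate for the full-length measure."""
    return 2 * half_r / (1 + half_r)

# Hypothetical item scores: one row per participant, one column per item.
item_scores = [
    [4, 5, 4, 5],
    [2, 3, 3, 2],
    [5, 5, 4, 4],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]

# Score the first half (items 1-2) and the second half (items 3-4) separately.
first_half = [sum(row[:2]) for row in item_scores]
second_half = [sum(row[2:]) for row in item_scores]

half_r = pearson_r(first_half, second_half)
full_length_r = spearman_brown(half_r)
print(round(half_r, 2), round(full_length_r, 2))
```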
Inter-item consistency – A formula is used to see how consistent the questionnaire questions and
exercises we created are with one another.
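The notes do not name the formula. One widely used choice is Cronbach's alpha, so the sketch below assumes that is what is meant; the function name and the item scores are illustrative only.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of participants and columns of items."""
    k = len(item_scores[0])                      # number of items
    item_columns = list(zip(*item_scores))       # one tuple of scores per item
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Hypothetical responses: five participants answering four items.
responses = [
    [4, 5, 4, 5],
    [2, 3, 3, 2],
    [5, 5, 4, 4],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Alpha rises when participants who score high on one item also tend to score high on the others, which is the sense of "consistent" intended here.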
Inter-scorer (rater) reliability – How consistently are we, as scientists, scoring or rating our
participants on their questionnaires and exercises?
Intra-scorer (rater) reliability – Is each of us, as an individual scientist, rating or scoring
consistently, without being affected by personal characteristics of the participant?
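One way to put a number on rater consistency is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. This statistic is not named in the notes; it is offered here as one possible sketch, with invented pass/fail ratings (1 = pass, 0 = fail).

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two lists of categorical ratings of the same participants."""
    n = len(ratings_a)
    # Proportion of participants on which the two sets of ratings agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of ten participants by two raters.
rater_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]

kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

For intra-rater reliability the same function could be applied to one rater's scores of the same participants on two occasions.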
FACTORS AFFECTING RELIABILITY
Respondent (testee) error (person doing the test)
Non-response errors / Self-selection bias
Measure is speeded
Variability in individual scores (compare scores to the population for which the measure was intended)
Ability levels (compute reliability separately for homogeneous subgroups, such as age, gender, occupation)
Response bias:
- Extremity bias (very + or -)
- Centrality or neutrality bias
- Stringency or leniency bias (raters are lenient or strict)
- Acquiescence bias (respondent agrees with all questions - no preferences are noted)
- Halo effect (raters rate more favourably if they like individual)
- Social desirability bias (respond in a way you think is socially desirable)
- Purposive falsification
- Unconscious misrepresentation
Administrative error (person administering the test)
Variations in:
- Instructions
- Assessment conditions
- Interpretation of instructions
- Scoring or ratings
Countered by:
- Manuals with standardised instructions
- Following these instructions
INTERPRETING RELIABILITY
Standardised measures should have reliabilities ranging between 0.80 and 0.90. Some scholars
state that reliability coefficients should be 0.85 or higher if measures are used to make
decisions about individuals, while 0.65 or higher may suffice for decisions about groups.
Magnitude of reliability coefficient:
- Standardised measures: 0.80 to 0.90
- Decisions about individuals: 0.85 or higher
- Decisions about groups: 0.65 or higher
- Personality and interest measures: 0.80 to 0.85
- Aptitude measures: 0.90 or higher
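The minimum values listed above can be turned into a simple lookup. The dictionary keys and the function name are our own labels, and only the lower bound of each range is checked.

```python
# Minimum reliability coefficients as listed in the notes above.
# The keys are illustrative labels, not standard terminology.
MINIMUM_RELIABILITY = {
    "standardised": 0.80,
    "individual_decisions": 0.85,
    "group_decisions": 0.65,
    "personality_interest": 0.80,
    "aptitude": 0.90,
}

def meets_guideline(coefficient, purpose):
    """Return True when the coefficient reaches the minimum listed for the purpose."""
    return coefficient >= MINIMUM_RELIABILITY[purpose]

# A coefficient of 0.82 is enough for group decisions but not for individual ones.
print(meets_guideline(0.82, "group_decisions"))
print(meets_guideline(0.82, "individual_decisions"))
```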
TYPES OF VALIDITY
Content-description Procedures:
Face validity – Does it look like a personality measure or an exercise watch?
Content validity – Involves determining whether the content of the measure (the
questions in our questionnaire or the exercises we want our watch to measure) covers
a representative sample of the behaviour domain/aspect to be measured (e.g. the
competency to be measured). A frequently used procedure to ensure high content
validity is the use of a panel of subject experts to evaluate the items during the test
construction phase.
Construct-Identification Procedures:
Construct validity – Involves a quantitative, statistical analysis procedure. The construct validity of a measure
is the extent to which it measures the theoretical construct or trait it is supposed to measure, such as
extroversion, neuroticism, resting heart rate, metabolic age, etc.
Correlation with other tests – We would administer our personality measure and compare the results to those of
an already established personality measure, and compare a single participant's readings on our exercise watch
to their readings on a Polar and a Fitbit.
Factorial validity – Factor analysis is a statistical technique for analysing the interrelationships of variables.
The aim is to determine the underlying structure or dimensions of a set of variables because, by identifying
the common variance between them, it is possible to reduce a large number of variables to a relatively small
number of factors or dimensions. We can use this technique when we consult our panel of experts to do a
thematic analysis of which personality characteristics are important for going to space. We can also do a
meta-analysis of previous studies on which physical exercises stand in your favour for going to space, and
correlate those constructs with what our exercise watch actually measures.
Convergent and discriminant validity – A measure demonstrates construct validity when it correlates highly
with other variables with which it should theoretically correlate (convergent validity), e.g. extroversion and
warmth towards others, or cardiovascular fitness and metabolic age, and when it correlates minimally with
variables from which it should differ (discriminant validity), e.g. extroversion and structure, or heart rate
and shoe size.
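A brief sketch of how the two kinds of evidence might be checked side by side. All scores are invented, and the variable names simply echo the extroversion, warmth and structure example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores of six participants on three scales.
extroversion = [1, 2, 3, 4, 5, 6]
warmth = [2, 1, 4, 3, 6, 5]       # theoretically related to extroversion
structure = [3, 1, 2, 2, 1, 3]    # theoretically unrelated to extroversion

convergent_r = pearson_r(extroversion, warmth)        # should be high
discriminant_r = pearson_r(extroversion, structure)   # should be near zero
print(round(convergent_r, 2), round(discriminant_r, 2))
```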
Incremental and differential validity – A measure displays incremental validity when it explains
additional variance compared to a set of other measures when predicting a dependent variable. For instance,
a measure of emotional intelligence would possess incremental validity if it explains additional variance
compared to a set of ‘big five’ personality measures when predicting job performance. The question for us is
whether our personality questionnaire gives any more information than an already existing measure.
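Incremental validity is often reported as the gain in explained variance (R squared) when the new measure is added to the existing ones. The sketch below simplifies the existing set to a single composite score and uses the standard two-predictor formula for R squared; the three correlations are invented for illustration.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 when a criterion is regressed on two predictors, from their correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Hypothetical correlations.
r_job_bigfive = 0.40   # job performance with a 'big five' composite
r_job_ei = 0.30        # job performance with emotional intelligence
r_bigfive_ei = 0.20    # 'big five' composite with emotional intelligence

# Variance explained by the existing measure alone, then by both together.
baseline_r2 = r_job_bigfive ** 2
combined_r2 = r_squared_two_predictors(r_job_bigfive, r_job_ei, r_bigfive_ei)

# The difference is the additional variance explained by the new measure.
incremental_r2 = combined_r2 - baseline_r2
print(round(incremental_r2, 3))
```

With a genuine set of several baseline predictors, the same comparison would be made between two multiple regression models rather than with this closed-form shortcut.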
Criterion-Prediction Procedures:
Concurrent validity – How accurately can our personality questionnaire or exercise watch measure our
participants’ current personality functioning and current fitness levels?
Predictive validity – How accurately can our personality measure and fitness watch predict future behaviour
or future fitness levels?
INTERPRETATION OF VALIDITY
Predictive validity coefficient = correlation coefficient between the predictor variable(s) and the
criterion variable
The magnitude of the validity coefficient:
- Should be statistically significant at the 0.05 or 0.01 level
- For selection purposes, values of 0.30 and 0.20 are acceptable