
Survey Methodology

Reliability and Validity


EPID 626
Lecture 12
References
The majority of this lecture was taken from:
Litwin, Mark. How to Measure Survey Reliability and Validity. Sage Publications, 1995.
Lecture objectives
To review the definitions of reliability and validity
To review methods of evaluating reliability and validity in survey research
Reliability

Definition
The degree of stability exhibited when a measurement is repeated under identical conditions
Lack of reliability may arise from divergences between observers or instruments of measurement, or from instability of the attribute being measured
(from Last, Dictionary of Epidemiology)
Assessment of reliability
Reliability is assessed in three forms:
Test-retest reliability
Alternate-form reliability
Internal consistency reliability

Test-retest reliability
Most common form in surveys
Measured by having the same respondents complete a survey at two different points in time to see how stable the responses are
Usually quantified with a correlation coefficient (r value)
In general, r values are considered good if r > 0.70
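A minimal sketch (hypothetical scores, not from Litwin) of how the test-retest correlation might be computed in Python; statistics.correlation returns the Pearson r and requires Python 3.10 or later:

    from statistics import correlation  # Pearson r; Python 3.10+

    # Hypothetical scores from the same respondents at two timepoints
    time1 = [12, 15, 9, 20, 17, 11, 14]
    time2 = [13, 14, 10, 19, 18, 10, 15]

    r = correlation(time1, time2)
    print(round(r, 2))  # r values above 0.70 are generally considered good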
Test-retest reliability (2)
If data are recorded by an observer, you can have the same observer make two separate measurements
The comparison between the two measurements is intraobserver reliability
What does a difference mean?
Test-retest reliability (3)
You can test-retest specific questions or the entire survey instrument
Be careful about test-retest with items or scales that measure variables likely to change over a short period of time, such as energy, happiness, or anxiety
If you do it, make sure that you test-retest over very short periods of time
Test-retest reliability (4)
A potential problem with test-retest is the practice effect
Individuals become familiar with the items and simply answer based on their memory of the last answer
What effect does this have on your reliability estimates?
It inflates the reliability estimate
Alternate-form reliability
Use differently worded forms to measure the same attribute
Questions or responses are reworded or their order is changed to produce two items that are similar but not identical
Alternate-form reliability (2)
Be sure that the two items address the same aspect of behavior with the same vocabulary and the same level of difficulty
Items should differ in wording only
It is common to simply change the order of the response alternatives
This forces respondents to read the response alternatives carefully and thus reduces the practice effect
Example: Assessment of depression
Circle one item
Version A:
During the past 4 weeks, I have felt downhearted:
Every day 1
Some days 2
Never 3

Version B:
During the past 4 weeks, I have felt downhearted:
Never 1
Some days 2
Every day 3
Alternate-form reliability (3)
You could also change the wording of the response alternatives without changing the meaning
Example: Assessment of urinary function
Version A:
During the past week, how often did you usually empty your
bladder?
1 to 2 times per day
3 to 4 times per day
5 to 8 times per day
12 times per day
More than 12 times per day
Example: Assessment of urinary function
Version B:
During the past week, how often did you usually empty your
bladder?
Every 12 to 24 hours
Every 6 to 8 hours
Every 3 to 5 hours
Every 2 hours
More than every 2 hours

Alternate-form reliability (4)
You could also change the actual wording of the question
Be careful to make sure that the two items are equivalent
Items with different degrees of difficulty do not measure the same attribute
What might they measure?
Reading comprehension or cognitive function
Example: Assessment of loneliness
Version A:
How often in the past month have you felt alone in the world?
Every day
Some days
Occasionally
Never
Version B:
During the past 4 weeks, how often have you felt a sense of
loneliness?
All of the time
Sometimes
From time to time
Never
Example of nonequivalent item rewording
Version A:
When your boss blames you for something you did not do,
how often do you stick up for yourself?
All the time
Some of the time
None of the time
Version B:
When presented with difficult professional situations where a
superior censures you for an act for which you are not
responsible, how frequently do you respond in an
assertive way?
All of the time
Some of the time
None of the time
Alternate-form reliability (5)
You can measure alternate-form reliability at the same timepoint or at separate timepoints
If you have a large enough sample, you can split it in half, administer one item to each half, and then compare the two halves
This is called a split-halves method
You could also split into thirds and administer three forms of the item, etc.
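For the same-timepoint case, a minimal sketch (hypothetical responses, not from Litwin) that correlates the two versions across the same respondents, analogous to test-retest:

    from statistics import correlation  # Pearson r; Python 3.10+

    # Hypothetical coded responses from the same respondents, with both
    # versions scored in the same direction (reverse-coded where needed)
    version_a = [1, 2, 2, 3, 1, 2, 3, 1]
    version_b = [1, 2, 3, 3, 1, 2, 3, 2]

    print(round(correlation(version_a, version_b), 2))  # high r suggests equivalent forms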
Internal consistency reliability
Applied not to one item, but to groups of items that are thought to measure different aspects of the same concept
Cronbach's coefficient alpha
Measures internal consistency reliability among a group of items combined to form a single scale
It is a reflection of how well the different items complement each other in their measurement of different aspects of the same variable or quality
Interpret like a correlation coefficient (>0.70 is good)
Example: Assessment of physical function
                                                           Limited   Limited    Not
                                                           a lot     a little   limited
Vigorous activities, such as running, lifting heavy
objects, participating in strenuous sports                    1         2         3
Moderate activities, such as moving a table, pushing
a vacuum cleaner, bowling, or playing golf                    1         2         3
Lifting or carrying groceries                                 1         2         3
Climbing several flights of stairs                            1         2         3
Bending, kneeling, or stooping                                1         2         3
Walking more than a mile                                      1         2         3
Walking several blocks                                        1         2         3
Walking one block                                             1         2         3
Bathing or dressing yourself                                  1         2         3
Calculation of Cronbach's coefficient alpha
Example: Assessment of emotional health
During the past month:                                               Yes   No
Have you been a very nervous person?                                  1     0
Have you felt downhearted and blue?                                   1     0
Have you felt so down in the dumps that nothing could cheer you up?   1     0
Results

Patient                Item 1     Item 2     Item 3     Summed scale score
1                        0          1          1                2
2                        1          1          1                3
3                        0          0          0                0
4                        1          1          1                3
5                        1          1          0                2
Percentage positive   3/5 = .6   4/5 = .8   3/5 = .6

Calculations
Mean score = 2
Sample variance = 1.5

CC alpha = [k / (k - 1)] * [1 - (sum over items of (%pos_i)(%neg_i)) / Var]
         = (3 / 2) * [1 - ((.6)(.4) + (.8)(.2) + (.6)(.4)) / 1.5]
         = (3 / 2) * (1 - 0.64 / 1.5)
         = 0.86
Conclude that this scale has good reliability
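A minimal Python sketch that reproduces the calculation above from the item responses in the Results table; the formula and the 0.86 result come from the slides, and the code itself is only an illustration:

    import statistics

    # Responses to the 3 emotional-health items (1 = yes, 0 = no), one row per patient
    items = [
        [0, 1, 1],  # patient 1
        [1, 1, 1],  # patient 2
        [0, 0, 0],  # patient 3
        [1, 1, 1],  # patient 4
        [1, 1, 0],  # patient 5
    ]

    k = len(items[0])                     # number of items in the scale
    totals = [sum(row) for row in items]  # summed scale scores: 2, 3, 0, 3, 2
    var = statistics.variance(totals)     # sample variance of summed scores = 1.5

    # Sum over items of (% positive) x (% negative)
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in items) / len(items)  # proportion answering yes
        sum_pq += p * (1 - p)

    alpha = (k / (k - 1)) * (1 - sum_pq / var)
    print(round(alpha, 2))  # 0.86, matching the worked example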
Internal consistency reliability (2)
If internal consistency is low, you can add more items or re-examine existing items for clarity
Interobserver reliability
How well two evaluators agree in their assessment of a variable
Use a correlation coefficient to compare data between observers
May be used as a property of the test or as an outcome variable
Validity

Definition
How well a survey measures what it sets out to measure
Assessment of validity
Validity is measured in four forms
Face validity
Content validity
Criterion validity
Construct validity
Face validity
Cursory review of survey items by untrained judges
Ex.: showing the survey to untrained individuals to see whether they think the items look okay
Very casual, soft
Many don't really consider this a measure of validity at all
Content validity
Subjective measure of how appropriate the items seem to a set of reviewers who have some knowledge of the subject matter
Usually consists of an organized review of the survey's contents to ensure that it contains everything it should and doesn't include anything that it shouldn't
Still very qualitative
Content validity (2)
Who might you include as reviewers?
How would you incorporate these two assessments of validity (face and content) into your survey instrument design process?
Criterion validity
Measure of how well one instrument stacks up against another instrument or predictor
Concurrent: assess your instrument against a gold standard
Predictive: assess the ability of your instrument to forecast future events, behavior, attitudes, or outcomes
Assess with a correlation coefficient
Construct validity
Most valuable and most difficult measure of validity
Basically, it is a measure of how meaningful the scale or instrument is when it is in practical use
Construct validity (2)
Convergent: implies that several different methods for obtaining the same information about a given trait or concept produce similar results
Evaluation is analogous to alternate-form reliability, except that it is more theoretical and requires a great deal of work, usually by multiple investigators with different approaches
Construct validity (3)
Divergent: the ability of a measure to estimate the underlying truth in a given area; the measure must be shown not to correlate too closely with similar but distinct concepts or traits
