Chapter 12 - Measurement-Scaling, Reliability and Validity
LEVELS OF MEASUREMENT
1. The nominal level of measurement classifies data into mutually exclusive categories in which
no order or ranking can be imposed on the data.
2. The ordinal level of measurement classifies data into categories that can be ranked; however,
precise differences between the ranks do not exist.
3. The interval level of measurement ranks data, and precise differences between units of
measure do exist; however, there is no meaningful zero.
4. The ratio level of measurement possesses all the characteristics of interval measurement, and
there exists a true zero. In addition, true ratios exist when the same variable is measured on two
different members of the population.
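As a quick arithmetic illustration of the interval/ratio distinction (the numbers below are made up, not from the text), a ratio of two Celsius temperatures changes when the same readings are converted to Fahrenheit, whereas a ratio of two weights is the same in any unit because weight has a true zero:
```python
# Interval scale: Celsius has no true zero, so ratios are not meaningful.
celsius = (20.0, 40.0)                        # two temperature readings
fahrenheit = tuple(c * 9 / 5 + 32 for c in celsius)
print(celsius[1] / celsius[0])                # 2.0
print(fahrenheit[1] / fahrenheit[0])          # ~1.53 -- the "ratio" changes with the unit

# Ratio scale: weight has a true zero, so ratios are unit-independent.
kilograms = (50.0, 100.0)
pounds = tuple(k * 2.20462 for k in kilograms)
print(kilograms[1] / kilograms[0])            # 2.0
print(pounds[1] / pounds[0])                  # 2.0 -- the ratio is preserved
```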
Comparison of different scales
RATING SCALES
Dichotomous scale
A dichotomous scale is a two-point scale that presents two mutually exclusive, opposite options. This type of response scale does not give the respondent an opportunity to remain neutral on a question.
Examples
Yes … No
True … False
Agree … Disagree
Category scale
The category scale uses multiple items to elicit a single response: the respondent ticks the one category, out of several listed, that applies. This scale also uses the nominal level of measurement.
Semantic differential scale
A semantic differential scale is used, often in specialist surveys, to gather data and interpret them based on the connotative meaning of the respondent’s answer. It uses pairs of clearly opposite words, and the respondent is asked to rate an object, person, or concept by putting a mark in one of the spaces along each dimension.
Likert scale
The Likert scale is designed to examine how strongly subjects agree or disagree with statements on a five-point scale.
Example:
This job is interesting.
Strongly agree Agree Neutral Disagree Strongly disagree
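For analysis, the five anchors are typically coded numerically (5 = Strongly agree down to 1 = Strongly disagree). The sketch below is a minimal, illustrative scoring of one respondent; the responses and codes are assumptions, not taken from the text.
```python
# Illustrative numeric coding for a five-point Likert item.
LIKERT_CODES = {
    "Strongly agree": 5,
    "Agree": 4,
    "Neutral": 3,
    "Disagree": 2,
    "Strongly disagree": 1,
}

# Hypothetical answers by one respondent to three statements measuring the same concept.
responses = ["Agree", "Strongly agree", "Neutral"]
scores = [LIKERT_CODES[r] for r in responses]
total = sum(scores)                     # summated score across the items
mean = total / len(scores)              # average item score for this respondent
print(scores, total, round(mean, 2))    # [4, 5, 3] 12 4.0
```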
Fixed or constant sum scale
Here the respondents are asked to distribute a given number of points across various items, as per the example below. This is more in the nature of an ordinal scale.
Example: In choosing a toilet soap, indicate the importance you attach to each of the following five aspects by allotting points for each to total 100 in all.
Fragrance —
Color —
Shape —
Size —
Texture of lather —
Total points 100
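A constant-sum response is usable only if the allotted points actually total 100, and it supports ordinal comparisons of the aspects. A minimal checking sketch, using a hypothetical allocation (the point values are invented for illustration):
```python
# Hypothetical constant-sum allocation by one respondent (aspect names from the example above).
allocation = {"Fragrance": 30, "Color": 10, "Shape": 5, "Size": 15, "Texture of lather": 40}

total = sum(allocation.values())
assert total == 100, f"Points must total 100, got {total}"

# The allocation supports ordinal statements, e.g. ranking the aspects by importance.
ranked = sorted(allocation, key=allocation.get, reverse=True)
print(ranked)   # ['Texture of lather', 'Fragrance', 'Size', 'Color', 'Shape']
```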
Stapel scale
The Stapel scale is a unipolar rating scale designed to measure the respondent’s attitude toward an object or event.
Example:
How do you like the food?
+3
+2
+1
Food Quality
–1
–2
–3
VALIDITY TYPES
– Face validity: Do “experts” validate that the instrument measures what its name suggests it
measures?
– Criterion-related validity: Does the measure differentiate in a manner that helps to predict a
criterion variable?
– Concurrent validity: Does the measure differentiate in a manner that helps to predict a criterion
variable currently?
– Predictive validity: Does the measure differentiate individuals in a manner that helps predict a
future criterion?
– Discriminant validity: Does the measure have a low correlation with a variable that is supposed
to be unrelated to the construct being measured?
RELIABILITY
The reliability of a measure indicates the extent to which it is without bias and hence ensures
consistent measurement across time and across the various items in the instrument. In other words,
the reliability of a measure is an indication of the stability and consistency with which the
instrument measures the concept and helps to assess the “goodness” of a measure.
a. Test–retest reliability
The reliability coefficient obtained by repetition of the same measure on a second occasion is called
the test–retest reliability. That is, when a questionnaire containing some items that are supposed to
measure a concept is administered to a set of respondents now, and again to the same respondents, say, several weeks to six months later, the correlation between the scores obtained at the two different times from the same set of respondents is called the test–retest coefficient. The higher it is, the better the test–retest reliability and, consequently, the stability of the measure across time.
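In practice the test–retest coefficient is simply the correlation between the two administrations. A minimal sketch with hypothetical scores for six respondents:
```python
import numpy as np

# Hypothetical total scores of six respondents at time 1 and at time 2 (same instrument).
time1 = np.array([12, 15, 9, 20, 14, 17])
time2 = np.array([13, 14, 10, 19, 15, 18])

# Pearson correlation between the two administrations = test-retest reliability coefficient.
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 3))   # closer to 1.0 indicates a more stable measure across time
```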
b. Parallel-form reliability
When responses on two comparable sets of measures tapping the same construct are highly
correlated, we have parallel-form reliability. Both forms have similar items and the same response
format, the only changes being the wording and the order or sequence of the questions. What we try
to establish here is the error variability resulting from wording and ordering of the questions. If two
such comparable forms are highly correlated (say, .8 and above), we may be fairly certain that the measures are reasonably reliable, with minimal error variance caused by wording, ordering, or other factors.
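Parallel-form reliability is computed the same way, as the correlation between scores on the two comparable forms, and the .8 rule of thumb mentioned above can be checked directly. A minimal sketch with hypothetical form scores:
```python
import numpy as np

# Hypothetical total scores of the same six respondents on two comparable forms of the measure.
form_a = np.array([22, 30, 18, 27, 25, 33])
form_b = np.array([24, 29, 17, 28, 24, 34])

r = np.corrcoef(form_a, form_b)[0, 1]
print(round(r, 3))
print("parallel-form reliability acceptable" if r >= 0.8 else "forms diverge; check wording/ordering")
```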
c. Inter-item consistency reliability
The inter-item consistency reliability is a test of the consistency of respondents’ answers to all the
items in a measure. To the degree that items are independent measures of the same concept, they
will be correlated with one another.
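Inter-item consistency is most commonly summarized with Cronbach’s coefficient alpha (the statistic is not named in the text above, so this is offered as a standard illustration). A minimal sketch with hypothetical item scores:
```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scores for one concept."""
    k = items.shape[1]                             # number of items
    item_vars = items.var(axis=0, ddof=1)          # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)      # variance of the summated scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses of five people to four Likert items measuring the same concept.
scores = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
])
print(round(cronbach_alpha(scores), 3))   # higher values indicate more consistent items
```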
d. Split-half reliability