Chapter Five Measurement and Scaling Techniques Measurement in Research
In an artificial or nominal way, categorical data (qualitative or descriptive) can be made into
numerical data and if we code the various categories, we refer to the numbers we record as
nominal data. Nominal data are numerical in name only, because they do not share any of the
properties of the numbers we deal with in ordinary arithmetic. For instance, if we record marital
status as 1, 2, 3, or 4 for single, married, divorced and widowed respectively, we cannot write
4 > 2 or 3 < 4 and we cannot write 3 – 1 = 4 – 2, 1 + 3 = 4 or 4 = 2 + 2.
In those situations when we cannot do anything except set up inequalities, we refer to the data
as ordinal data. For instance, if one mineral can scratch another, it receives a higher hardness
number and on Mohs' scale the numbers from 1 to 10 are assigned respectively to talc,
gypsum, calcite, fluorite, apatite, feldspar, quartz, topaz, sapphire and diamond. With these
numbers we can write 5 > 2 or 6 < 9 as apatite is harder than gypsum and feldspar is softer
than sapphire, but we cannot write for example 10 – 9 = 5 – 4, because the difference in
hardness between diamond and sapphire is actually much greater than that between apatite
and fluorite. It would also be meaningless to say that topaz is twice as hard as fluorite simply
because their respective hardness numbers on Mohs' scale are 8 and 4. The greater than
symbol (i.e., >) in connection with ordinal data may be used to designate "happier than",
"preferred to" and so on.
When in addition to setting up inequalities we can also form differences, we refer to the data
as interval data. Suppose we are given the following temperature readings (in degrees
Fahrenheit): 58°, 63°, 70°, 95°, 110°, 126° and 135°. In this case, we can write 110° > 70° or
95° < 135° which simply means that 110° is warmer than 70° and that 95° is cooler than
135°. We can also write for example 95° – 70° = 135° – 110°, since equal temperature
differences are equal in the sense that the same amount of heat is required to raise the
temperature of an object from 70° to 95° or from 110° to 135°. On the other hand, it would
not mean much if we said that 126° is twice as hot as 63°, even though 126° = 63° * 2. To
show the reason, we have only to change to the centigrade scale, where the first temperature
becomes 5/9 (126 – 32) ≈ 52°, the second temperature becomes 5/9 (63 – 32) ≈ 17° and the
first figure is now more than three times the second. This difficulty arises from the fact that
Fahrenheit and Centigrade scales both have artificial origins (zeros) i.e., the number 0 of
neither scale is indicative of the absence of whatever quantity we are trying to measure.
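The temperature argument above can be checked numerically. The following is a minimal sketch using the readings from the text; the function and variable names are illustrative and not part of the original discussion.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit reading to the centigrade (Celsius) scale."""
    return 5.0 / 9.0 * (f - 32.0)

hot_f, cool_f = 126.0, 63.0
hot_c = fahrenheit_to_celsius(hot_f)
cool_c = fahrenheit_to_celsius(cool_f)

# The ratio depends on the scale, so it carries no physical meaning.
ratio_f = hot_f / cool_f      # exactly 2 on the Fahrenheit scale
ratio_c = hot_c / cool_c      # a little over 3 on the centigrade scale

# Equal differences remain equal after the change of scale:
# 95 - 70 and 135 - 110 are both 25 Fahrenheit degrees.
diff_a_c = fahrenheit_to_celsius(95.0) - fahrenheit_to_celsius(70.0)
diff_b_c = fahrenheit_to_celsius(135.0) - fahrenheit_to_celsius(110.0)

print(f"ratio in Fahrenheit: {ratio_f:.2f}, ratio in centigrade: {ratio_c:.2f}")
print(f"differences in centigrade: {diff_a_c:.2f} and {diff_b_c:.2f}")
```

Differences survive the change of scale while quotients do not, which is what makes these readings interval data rather than ratio data.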
When in addition to setting up inequalities and forming differences we can also form
quotients (i.e., when we can perform all the customary operations of mathematics), we refer
to such data as ratio data. In this sense, ratio data include all the usual measurements (or
determinations) of length, height, money amounts, weight, volume, area, pressure, etc. The
above stated distinction between nominal, ordinal, interval and ratio data is important for the
use of particular statistical techniques. A researcher has to be quite alert about this aspect
while measuring properties of objects or of abstract concepts.
Nominal scale: Nominal scale is simply a system of assigning number symbols to events in
order to label them. The usual example of this is the assignment of numbers to football
players in order to identify them. Nominal scales provide convenient ways of keeping track
of people, objects and events. One cannot do much with the numbers involved. For example,
one cannot usefully average the numbers on the back of a group of football players and come
up with a meaningful value. Neither can one usefully compare the numbers assigned to one
group with the numbers assigned to another. The counting of members in each group is the
only possible arithmetic operation when a nominal scale is employed. Accordingly, we are
restricted to using the mode as the measure of central tendency. There is no generally used measure
of dispersion for nominal scales.
The chi-square test is the most common test of statistical significance that can be utilized, and as
a measure of correlation, the contingency coefficient can be worked out. Nominal scale is
the least powerful level of measurement. It indicates no order or distance relationship and has
no arithmetic origin. Nominal data are, thus, counted data. In spite of all this, nominal scales
are still very useful.
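As an illustration of what can be done with nominal data, the sketch below counts coded categories, finds the mode, and works out a chi-square statistic with the contingency coefficient C = sqrt(chi2 / (chi2 + N)) for a small cross-tabulation. All figures are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math
from collections import Counter

# Hypothetical coded responses: 1 = single, 2 = married, 3 = divorced, 4 = widowed.
labels = {1: "single", 2: "married", 3: "divorced", 4: "widowed"}
codes = [1, 2, 2, 3, 2, 1, 4, 2, 1, 2]

# Counting members of each category is the only arithmetic that makes sense.
counts = Counter(labels[c] for c in codes)
modal_category, modal_count = counts.most_common(1)[0]

def chi_square(table):
    """Return (chi-square statistic, total count) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

# Hypothetical 2 x 2 cross-tabulation of two nominal variables.
table = [[30, 10],
         [20, 40]]
chi2, n = chi_square(table)
contingency_coefficient = math.sqrt(chi2 / (chi2 + n))

print(modal_category, modal_count)
print(round(chi2, 3), round(contingency_coefficient, 3))
```

The code numbers themselves never enter the calculation; only the category counts do.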
Ordinal scale: The lowest level of the ordered scale that is commonly used is the ordinal
scale. The ordinal scale places events in order, but there is no attempt to make the intervals of
the scale equal in terms of some rule. Rank orders represent ordinal scales and are frequently
used in research relating to qualitative phenomena. A student's rank in his graduation class
involves the use of an ordinal scale. One has to be very careful in making statements about
scores based on ordinal scales. For instance, if Robel's position in his class is 10 and
Mulugeta's position is 40, it cannot be said that Robel's position is four times as good as that
of Mulugeta. The statement would make no sense at all. Ordinal scales only permit the
ranking of items from highest to lowest. Ordinal measures have no absolute values, and the
real differences between adjacent ranks may not be equal. All that can be said is that one
person is higher or lower on the scale than another, but more precise comparisons cannot be
made.
Thus, the use of an ordinal scale implies a statement of 'greater than' or 'less than' (an
equality statement is also acceptable) without our being able to state how much greater or
less. The real difference between ranks 1 and 2 may be more or less than the difference
between ranks 5 and 6. Since the numbers of this scale have only a rank meaning, the
appropriate measure of central tendency is the median. A percentile or quartile measure is
used for measuring dispersion. Correlations are restricted to various rank order methods.
Measures of statistical significance are restricted to the non-parametric methods.
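The measures named above for ordinal data can be sketched as follows. The ratings and the two sets of ranks are hypothetical. Spearman's coefficient is used here as one example of a rank order correlation method, in its simple form for ranks without ties; `statistics.quantiles` assumes Python 3.8 or later.

```python
import statistics

# Hypothetical ratings on a five-point ordinal scale.
ratings = [2, 3, 3, 4, 5, 1, 4, 3, 2, 5, 4]

median_rating = statistics.median(ratings)         # central tendency
q1, q2, q3 = statistics.quantiles(ratings, n=4)    # quartiles for dispersion

def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings of the same items (no ties)."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

# Hypothetical ranks given to the same five items by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_a, judge_b)

print(median_rating, q1, q3, rho)
```

Only the order of the values is used throughout; no step relies on the distance between adjacent ranks.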
Interval scale: In the case of interval scale, the intervals are adjusted in terms of some rule
that has been established as a basis for making the units equal. Interval scales can have an
arbitrary zero, but it is not possible to determine for them what may be called an absolute
zero or the unique origin. The primary limitation of the interval scale is the lack of a true
zero; it does not have the capacity to measure the complete absence of a trait or characteristic.
The Fahrenheit scale is an example of an interval scale and illustrates what one can
and cannot do with it. One can say that an increase in temperature from 30° to 40° involves
the same increase in temperature as an increase from 60° to 70°, but one cannot say that the
temperature of 60° is twice as warm as the temperature of 30°, because both numbers
depend on where the zero of the scale has been set, and that point is arbitrary (on the
Fahrenheit scale water freezes at 32°, not at 0°). The ratio of the two temperatures, 30° and
60°, means nothing because zero is an arbitrary point.
Interval scales provide more powerful measurement than ordinal scales because the interval scale also
incorporates the concept of equality of interval. As such, more powerful statistical measures
can be used with interval scales. The mean is the appropriate measure of central tendency, while
the standard deviation is the most widely used measure of dispersion. Product moment
correlation techniques are appropriate and the generally used tests for statistical significance
are the 't' test and 'F' test.
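A minimal sketch of the interval-level measures mentioned above (mean, standard deviation and product moment correlation) is given below. The paired readings are hypothetical, and the correlation is computed directly from its definition rather than through a library routine.

```python
import math
import statistics

# Hypothetical paired interval-scale readings.
x = [30.0, 40.0, 50.0, 60.0, 70.0]
y = [12.0, 15.0, 21.0, 24.0, 28.0]

mean_x = statistics.mean(x)
sd_x = statistics.stdev(x)    # sample standard deviation

def pearson_r(a, b):
    """Product moment correlation coefficient of two equally long sequences."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cross = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    ss_a = sum((p - mean_a) ** 2 for p in a)
    ss_b = sum((q - mean_b) ** 2 for q in b)
    return cross / math.sqrt(ss_a * ss_b)

r = pearson_r(x, y)
print(mean_x, round(sd_x, 3), round(r, 4))
```

Each step relies on differences from the mean, which are meaningful only because the intervals of the scale are equal.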
Ratio scale: Ratio scales have an absolute or true zero of measurement. We can conceive of
an absolute zero of length and similarly we can conceive of an absolute zero of time. For
example, the zero point on a centimeter scale indicates the complete absence of length or
height. The number of minor traffic-rule violations and the number of incorrect letters in a
page of typescript represent scores on ratio scales. Both these scales have absolute zeros and
as such all minor traffic violations and all typing errors can be assumed to be equal in
significance.
Ratio scale represents the actual amounts of variables. Measures of physical dimensions such
as weight, height, distance, etc. are examples. Generally, all statistical techniques are usable
with ratio scales and all manipulations that one can carry out with real numbers can also be
carried out with ratio scale values. Multiplication and division can be used with this scale but
not with other scales mentioned above. Geometric and harmonic means can be used as
measures of central tendency and coefficients of variation may also be calculated.
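The ratio-level measures named above can be sketched with the standard library. The weights are hypothetical, and `statistics.geometric_mean` assumes Python 3.8 or later.

```python
import statistics

# Hypothetical weights in kilograms (a ratio scale with a true zero).
weights = [2.0, 4.0, 8.0]

geometric = statistics.geometric_mean(weights)
harmonic = statistics.harmonic_mean(weights)

# Coefficient of variation: standard deviation relative to the mean.
coefficient_of_variation = statistics.stdev(weights) / statistics.mean(weights)

# Quotients are meaningful here: 8 kg really is four times as heavy as 2 kg.
times_heavier = weights[2] / weights[0]

print(geometric, harmonic, coefficient_of_variation, times_heavier)
```

These measures all involve multiplication or division of the values themselves, which is why they presuppose a true zero.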
Thus, proceeding from the nominal scale (the least precise type of scale) to the ratio scale (the
most precise), progressively more relevant information is obtained. If the nature of the variables
permits, the researcher should use the scale that provides the most precise description.
5.3. Sources of Error in Measurement
Measurement should be precise and unambiguous in an ideal research study. The following
are the possible sources of error in measurement.
Respondent: The respondent may be reluctant to express strong negative feelings or it is just
possible that he may have very little knowledge. Temporary factors like fatigue, boredom,
anxiety, etc. may limit the ability of the respondent to respond accurately and fully.
Situation: Situational factors may also come in the way of correct measurement. Any
condition which places a strain on the interview can have serious effects on the
interviewer-respondent rapport. For instance, if someone else is present or if the respondent
feels that anonymity is not assured, he may be reluctant to express certain feelings.
Measurer: The interviewer can distort responses by rewording or reordering questions. His
behaviour, style and looks may encourage or discourage certain replies from respondents.
Careless mechanical processing may distort the findings. Errors may also creep in because of
incorrect coding, faulty tabulation and/or statistical calculations, particularly in the data-
analysis stage.
Instrument: Error may arise because of the defective measuring instrument. The use of
complex words, beyond the comprehension of the respondent, ambiguous meanings, poor
printing, inadequate space for replies, response choice omissions, etc. are a few things that
make the measuring instrument defective and may result in measurement errors.
Therefore, the researcher must, to the extent possible, try to eliminate, neutralize or otherwise
deal with all the possible sources of error so that the final results may not be contaminated.
1. Test of Validity
Validity refers to the extent to which a test measures what we actually wish to measure.
Validity is the most critical criterion and indicates the degree to which an instrument
measures what it is supposed to measure. One can certainly consider three types of validity in
this connection: Content validity, Criterion-related validity and Construct validity.
Content validity is the extent to which a measuring instrument provides adequate coverage of
the topic under study. If the instrument contains a representative sample of the universe, the
content validity is good. Its determination is primarily judgmental and intuitive. It can also be
determined by using a panel of persons who shall judge how well the measuring instrument
meets the standards, but there is no numerical way to express it.
Criterion-related validity relates to our ability to predict some outcome or estimate the
existence of some current condition. This form of validity reflects the success of measures
used for some empirical estimating purpose. The concerned criterion must possess the
following qualities:
Relevance: A criterion is relevant if it is defined in terms we judge to be the proper measure.
Free from bias: Freedom from bias is attained when the criterion gives each subject an equal
opportunity to score well.
Reliability: A reliable criterion is stable or reproducible.
Availability: The information specified by the criterion must be available.
Construct validity is the most complex and abstract. A measure is said to possess construct
validity to the degree that it conforms to predicted correlations with other theoretical
propositions. Construct validity is the degree to which scores on a test can be accounted for
by the explanatory constructs of a sound theory. For determining construct validity, we
associate a set of other propositions with the results received from using our measurement
instrument. If measurements on our devised scale correlate in a predicted way with these
other propositions, we can conclude that there is some construct validity.
If the above stated criteria and tests are met, we may state that our measuring instrument
is valid and will result in correct measurement; otherwise we shall have to look for more
information and/or resort to exercise of judgement.
2. Test of Reliability
Reliability has to do with the accuracy and precision of a measurement procedure. The test of
reliability is another important test of sound measurement. A measuring instrument is reliable
if it provides consistent results. A reliable measuring instrument does contribute to validity, but
a reliable instrument need not be a valid instrument. For instance, a scale that consistently
overweighs objects by five kg is a reliable scale, but it does not give a valid measure of
weight.
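The weighing-scale example can be made concrete with a small simulation. This is a sketch under simplifying assumptions: the true weights are hypothetical and the scale is modelled as adding exactly five kilograms with no random error.

```python
import statistics

BIAS_KG = 5.0   # the constant overweighing described in the text

def biased_scale(true_weight):
    """A hypothetical scale that always reads five kg too heavy."""
    return true_weight + BIAS_KG

true_weights = [50.0, 62.5, 71.0, 80.0]

first_pass = [biased_scale(w) for w in true_weights]
second_pass = [biased_scale(w) for w in true_weights]

# Reliability: repeated readings of the same object agree with each other.
max_disagreement = max(abs(a - b) for a, b in zip(first_pass, second_pass))

# Validity: the readings still miss the true weights by a constant amount.
errors = [reading - true for reading, true in zip(first_pass, true_weights)]
mean_error = statistics.mean(errors)

print(max_disagreement, mean_error)
```

The disagreement between passes is zero, so the scale is perfectly consistent, yet every reading is wrong by the same amount.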
Two aspects of reliability viz., stability and equivalence deserve special mention. The
stability aspect is concerned with securing consistent results with repeated measurements of
the same person and with the same instrument. We usually determine the degree of stability
by comparing the results of repeated measurements. The equivalence aspect considers how
much error may get introduced by different investigators or different samples of the items
being studied. A good way to test for the equivalence of measurements by two investigators
is to compare their observations of the same events.
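One simple way to compare two investigators' observations of the same events is the proportion of events on which they agree. The sketch below uses hypothetical observations; percentage agreement is only one of several possible indices and is chosen here for its simplicity.

```python
def proportion_agreement(obs_a, obs_b):
    """Share of events on which two observers recorded the same category."""
    if len(obs_a) != len(obs_b):
        raise ValueError("both observers must record the same events")
    matches = sum(1 for a, b in zip(obs_a, obs_b) if a == b)
    return matches / len(obs_a)

# Hypothetical records of the same eight events by two investigators.
investigator_a = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]
investigator_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes"]

agreement = proportion_agreement(investigator_a, investigator_b)
print(agreement)
```

The same function can serve the stability aspect if the two lists hold one observer's first and repeated measurements of the same persons.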
3. Test of Practicality
The practicality of a measuring instrument can be judged in terms of economy, convenience
and interpretability. From the operational point of view, the measuring instrument ought to
be practical, i.e., it should be economical, convenient and interpretable. The economy
consideration suggests that some trade-off is needed between the
ideal research project and that which the budget can afford.
The convenience test suggests that the measuring instrument should be easy to administer. For
instance, a questionnaire, with clear instructions, is certainly more effective and easier to
complete than one which lacks these features.
The interpretability consideration is especially important when persons other than the designers of
the test are to interpret the results. The measuring instrument, in order to be interpretable,
must be supplemented by: detailed instructions for administering the test; scoring keys;
evidence about its reliability; and guides for using the test and for interpreting results.
5.5. Scaling
In research we quite often face a measurement problem, especially when the concepts to be
measured are complex and abstract and we do not possess standardised measurement
tools. Alternatively, we can say that while measuring attitudes and opinions, we face the
problem of their valid measurement. A similar problem may be faced by a researcher, though
to a lesser degree, while measuring physical or institutional concepts. As such, we should
study some procedures which may enable us to measure abstract concepts more accurately.
Scaling describes the procedures of assigning numbers to various degrees of opinion, attitude
and other concepts. A scale is a range, consisting of the highest point and the lowest point
along with several intermediate points between these two extreme points.
Hence the term 'scaling' is applied to the procedures for attempting to determine quantitative
measures of subjective abstract concepts. Scaling has been defined as a "procedure for the
assignment of numbers (or other symbols) to a property of objects in order to impart some of
the characteristics of numbers to the properties in question."
Rating scales: The rating scale involves qualitative description of a limited number of
aspects of a thing or of traits of a person. When we use rating scales (or categorical scales),
we judge an object in absolute terms against some specified criteria i.e., we judge properties
of objects without reference to other similar objects. These ratings may be in such forms as
"like-dislike", "above average, average, below average", or other classifications with more
categories such as "like very much, like somewhat, neutral, dislike somewhat, dislike
very much"; "excellent, good, average, below average, poor"; "always, often,
occasionally, rarely, never", and so on. There is no specific rule on whether to use a
two-point scale, a three-point scale or a scale with still more points. In practice, three- to
seven-point scales are generally used for the simple reason that more points on a scale provide an
opportunity for greater sensitivity of measurement.
Rating scales have certain good points. The results obtained from their use compare
favourably with alternative methods. They require less time, are interesting to use and have a
wide range of applications. Besides, they may also be used with a large number of properties
or variables. But their value for measurement purposes depends upon the assumption that the
respondents can and do make good judgements. If the respondents are not very careful while
rating, errors may occur.
Ranking scales: Under ranking scales (or comparative scales) we make relative judgements
against other similar objects. The respondents under this method directly compare two or
more objects and make choices among them. Two approaches to ranking scales are
generally used, viz.:
a. Method of paired comparisons: Under it the respondent can express his attitude by
making a choice between two objects, say between a new flavour of soft drink and an
established brand of drink.
b. Method of rank order: Under this method of comparative scaling, the respondents are
asked to rank their choices. This method has its limitations; for instance, respondents may
become careless in assigning ranks, particularly when there are
many items.
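The two ranking approaches above can be connected in a short sketch: every pair of objects is presented once, the respondent's choices are tallied, and a rank order is read off from the number of times each object was preferred. The drink names and the choices are hypothetical. With n objects there are n(n - 1)/2 pairs to judge.

```python
from itertools import combinations

# Hypothetical objects to be compared.
items = ["cola", "lemon", "orange", "ginger"]

# Every unordered pair is presented once: n(n - 1)/2 pairs for n items.
pairs = list(combinations(items, 2))

# Hypothetical choices of one respondent, in the same order as `pairs`.
preferred = ["cola", "cola", "cola", "orange", "lemon", "orange"]

# Tally how often each item was chosen over its rival.
wins = {item: 0 for item in items}
for pair, choice in zip(pairs, preferred):
    if choice not in pair:
        raise ValueError(f"{choice!r} is not one of {pair}")
    wins[choice] += 1

# Rank order from most to least often preferred.
ranking = sorted(items, key=lambda item: wins[item], reverse=True)

print(len(pairs), wins, ranking)
```

The number of pairs grows quickly with the number of objects (4 objects give 6 pairs, 10 objects give 45), which is one practical reason respondents may become careless when many items are involved.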
Researchers must also be aware that inferring attitude from what has been recorded in
opinionnaires has several limitations. People may conceal their attitudes and express socially
acceptable opinions. They may not really know how they feel about a social issue. People
may be unaware of their attitude about an abstract situation; until confronted with a real
situation, they may be unable to predict their reaction. Even behaviour itself is at times not a
true indication of attitude. For instance, when politicians kiss babies, their behaviour may not
be a true expression of affection toward infants.
Thus, there is no sure method of measuring attitude; we only try to measure the expressed
opinion and then draw inferences from it about people's real feelings or attitudes. With all
these limitations in mind, psychologists and sociologists have developed several scale
construction techniques for the purpose. The researcher should know these techniques so as
to develop an appropriate scale for his own study. Some of the important approaches, along
with the corresponding scales developed under each approach to measure attitude are as
follows:
1. Arbitrary Scales
Arbitrary scales are developed on an ad hoc basis and are designed largely through the
researcher's own subjective selection of items. The researcher first collects a few statements or
items which he believes are unambiguous and appropriate to a given topic. Some of these are
selected for inclusion in the measuring instrument and then people are asked to check in a list
the statements with which they agree.
The chief merit of such scales is that they can be developed very easily, quickly and at
relatively little expense. They can also be designed to be highly specific and adequate.
Because of these benefits, such scales are widely used in practice. At the same time there are
some limitations of these scales. The most important one is that we do not have objective
evidence that such scales measure the concepts for which they have been developed. We have
simply to rely on the researcher's insight and competence.
2. Likert-type Scales
This scale consists of a number of statements which express either a favourable or
unfavourable attitude towards the given object to which the respondent is asked to react. The
respondent indicates his agreement or disagreement with each statement in the instrument.
Each response is given a numerical score, indicating its favourableness or unfavourableness,
and the scores are totalled to measure the respondent's attitude. In other words, the overall
score represents the respondent's position on the continuum of favourable-unfavourableness
towards an issue.
In a Likert scale, the respondent is asked to respond to each of the statements in terms of
several degrees, usually five degrees (but at times 3 or 7 may also be used) of agreement or
disagreement. For example, when asked to express an opinion on whether one considers his job
quite pleasant, the respondent may respond in any one of the following ways: (i) strongly
agree, (ii) agree, (iii) undecided, (iv) disagree, (v) strongly disagree.
The response indicating the least favourable degree of job satisfaction is given the least score
(say 1) and the most favourable is given the highest score (say 5). These score values are
normally not printed on the instrument but are shown here just to indicate the scoring pattern.
The Likert scaling technique assigns a scale value to each of the five responses.
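The scoring pattern just described can be sketched as follows. The 1-to-5 values follow the text; the reversal of scores for unfavourably worded statements (6 minus the raw value) is an assumption added here so that a higher total always indicates a more favourable attitude, and the responses are hypothetical.

```python
# Score values for the five degrees of agreement (5 = most favourable).
SCORES = {
    "strongly agree": 5,
    "agree": 4,
    "undecided": 3,
    "disagree": 2,
    "strongly disagree": 1,
}

def score_item(response, favourable=True):
    """Score one response; unfavourable statements are reverse-scored (assumption)."""
    value = SCORES[response]
    return value if favourable else 6 - value

def total_score(responses, favourable_flags):
    """Sum the item scores to place the respondent on the attitude continuum."""
    return sum(score_item(r, f) for r, f in zip(responses, favourable_flags))

# Hypothetical respondent answering four statements; the third is worded unfavourably.
responses = ["agree", "strongly agree", "disagree", "undecided"]
favourable_flags = [True, True, False, True]

total = total_score(responses, favourable_flags)
print(total)
```

With four statements the total can range from 4 (least favourable) to 20 (most favourable).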
3. Cumulative scales
Cumulative scales, like other scales, consist of a series of statements to which a respondent
expresses his agreement or disagreement. The special feature of this type of scale is that
statements in it form a cumulative series. This, in other words, means that the statements are
related to one another in such a way that an individual who replies favourably to, say, item
No. 3 also replies favourably to items No. 2 and 1, and one who replies favourably to item
No. 4 also replies favourably to items No. 3, 2 and 1, and so on. This being so, an individual
whose attitude is at a certain point in a cumulative scale will answer favourably all the items
on one side of this point, and answer unfavourably all the items on the other side of this point.
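The cumulative property can be checked mechanically. In the sketch below, a response pattern is a list of 1s (favourable) and 0s (unfavourable) for items No. 1, 2, 3 and so on; the function name and the example patterns are hypothetical.

```python
def is_cumulative(pattern):
    """True if every favourable reply is preceded only by favourable replies.

    `pattern` lists the replies to items No. 1, 2, 3, ... in order,
    with 1 for a favourable reply and 0 for an unfavourable one.
    """
    seen_unfavourable = False
    for reply in pattern:
        if reply == 0:
            seen_unfavourable = True
        elif seen_unfavourable:
            # A favourable reply after an unfavourable one breaks the series.
            return False
    return True

def scale_position(pattern):
    """Number of items answered favourably, i.e. the respondent's point on the scale."""
    return sum(pattern)

consistent = [1, 1, 1, 0]      # favourable up to item No. 3, then unfavourable
inconsistent = [1, 0, 1, 0]    # favourable to item No. 3 but not to item No. 2

print(is_cumulative(consistent), scale_position(consistent))
print(is_cumulative(inconsistent))
```

For a pattern that passes the check, the count of favourable replies alone is enough to reconstruct the whole pattern.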