Measurement in Research
We measure physical objects and abstract concepts in our daily lives
o When we use a yardstick or a weighing scale to determine the height, weight, or some other
feature of a physical object
o When we judge how much we like a song or a painting or the personalities of our friends
It is a complex and demanding task – especially when concerned with measuring abstract
phenomena (an event, an occurrence, happening, circumstance, situation)
In the case of research, measurement means the process of assigning numbers to objects or
observations
When is it difficult?
It is easy to assign numbers in respect of properties of some objects but it is relatively difficult in
respect of others
Measuring things such as social conformity, intelligence, or marital adjustment is much less
obvious and requires much closer attention than measuring physical weight, age, or financial
assets
Easy to measure properties like weight, height, etc. with use of standard unit of measurement –
not so much with properties like motivation, stress, etc.
o We expect high accuracy when measuring the length of a pipe with a yardstick
o We are less confident about accuracy of the results of measurement if the concept is
abstract and the measurement tools are not standardized
For example:
o If we want to find the male to female attendance ratio while conducting a study of
persons who attend some show, we may tabulate those who come to the show
according to sex
o Mapping the observed physical properties of an audience in a show (the domain) on to a
sex classification (the range)
o The rule of correspondence is: If the person/object is male, we assign it to “0” and if
female, assign to “1”
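The mapping described above can be sketched in a few lines of Python. This is only an illustration: the `audience` list and the `SEX_CODES` name are hypothetical and not part of the original notes.

```python
from collections import Counter

# Hypothetical observations of people arriving at a show (illustrative data only).
audience = ["male", "female", "female", "male", "female"]

# Rule of correspondence: map each observed category (the domain)
# to a number label (the range).
SEX_CODES = {"male": 0, "female": 1}

coded = [SEX_CODES[person] for person in audience]
counts = Counter(coded)

# The codes are labels only; the useful arithmetic is counting members per class.
male_to_female_ratio = counts[0] / counts[1]

print(coded)
print(male_to_female_ratio)
```

Swapping the codes (male as 1, female as 0) would change nothing of substance, which is one way to see that the numbers carry no quantitative meaning.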
Measurement Scales
Nominal Scale
It is simply a system of assigning number symbols to events in order to label them
o The number is not associated with an ordered scale: you can’t say that 1 is greater
than 0 because the numbers are just labels for the particular class of events and as such
have no quantitative value
Example:
o Assignment of numbers to basketball players in order to identify them
Possible arithmetic: counting only of members in each group
Measure of central tendency: mode
Test of significance: commonly chi-square test is utilized
Measure of correlation: contingency coefficient
Least powerful level
No order or distance relationship, no arithmetic origin
Nominal scale simply describes differences between things by assigning them to categories
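As a rough sketch of the statistics listed above for nominal data, the following Python computes a mode, a Pearson chi-square statistic, and Pearson's contingency coefficient by hand. All data are invented for illustration, and the p-value of the chi-square test is deliberately not computed here because that would require a chi-square distribution from a statistics library.

```python
import math
from statistics import mode

# Hypothetical nominal data: preferred show type (labels, no order implied).
preferences = ["drama", "comedy", "comedy", "music", "comedy", "drama"]
most_common = mode(preferences)  # the mode is the appropriate "average" here

# Hypothetical 2x2 contingency table:
# rows = sex (male, female), columns = attended (yes, no).
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

# Pearson's contingency coefficient: C = sqrt(chi2 / (chi2 + N)).
contingency_coefficient = math.sqrt(chi_square / (chi_square + n))

print(most_common, chi_square, round(contingency_coefficient, 3))
```

Note that every step uses only counts per category, which is the one arithmetic operation a nominal scale supports.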
Ordinal Scale
Places events in order, but makes no attempt to make the intervals of the scale equal in terms of
some rule
Ordinal scales only permit the ranking of items from highest to lowest. Ordinal measures have
no absolute values, and the real differences between adjacent ranks may not be equal.
Example:
o Ranks in competitions use an ordinal scale
Measure of central tendency: median
Dispersion: percentile or quartile measure
The median is the score at the middle of all scores, or more formally defined “the middle value
in a distribution, below and above which lie values with equal total frequencies or probabilities”
(Porkess, 1991, p. 134). This means that 50% of the respondents scored equal to or higher than the
median, and 50% of the respondents scored lower than or equal to it. If for example at a school exam
the results indicate that the median is a 70 (out of 100, with 55 or more being a pass), then we
know that at least 50% of the students passed. From a frequency table, the median can quickly
be found by looking at the cumulative percentages.
In the example from Table 5 we can see that the cumulative percent passes the 50% mark when
it goes from 31.3 to 67.8. So, one of the 348 people that chose ‘Not too scientific’ is the one
exactly in the middle. The median is therefore ‘Not too scientific’.
43350_4.pdf (sagepub.com)
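The cumulative-percentage procedure for finding a median can be sketched as follows. The counts and three of the four category names below are invented for illustration and are not the actual values of Table 5; only the general pattern (the 50% mark being crossed inside one category) follows the text.

```python
# Hypothetical frequency table for an ordinal item (counts are invented,
# not the real Table 5). Categories must be listed in rank order.
frequencies = [
    ("Very scientific", 100),
    ("Pretty scientific", 200),
    ("Not too scientific", 350),
    ("Not scientific at all", 300),
]

total = sum(count for _, count in frequencies)

median_category = None
cumulative_percents = []
running = 0
for category, count in frequencies:
    running += count
    cumulative = 100 * running / total
    cumulative_percents.append(round(cumulative, 1))
    # The median is the first category whose cumulative percent reaches 50%.
    if median_category is None and cumulative >= 50:
        median_category = category

print(cumulative_percents)
print(median_category)
```

Because the data are ordinal, the result is a category label rather than a number; no averaging of the category codes is involved.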
Interval Scale
More powerful than ordinal scale because it incorporates the concept of equality of interval
Interval scales lack a true zero – it does not have the capacity to measure the complete absence
of a trait or characteristic
Central tendency: mean
Dispersion: standard deviation
Statistical significance: t test and the F test
An interval scale has ordered numbers with meaningful divisions; the magnitude between
consecutive intervals is equal. Interval scales do not have a true zero, e.g. 0 degrees Celsius
does not mean the absence of heat.
Interval scales have the properties of:
Identity
Magnitude
Equal distance
For example, on a Fahrenheit or Celsius thermometer, 90° is hotter than 45°, and
the difference between 10° and 30° is the same as the difference between 60° and
80°.
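A small Python sketch can make the interval properties concrete: equal differences stay equal when the unit changes, but ratios do not, because the zero point is arbitrary. The helper name `celsius_to_fahrenheit` is just an illustrative choice.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit."""
    return c * 9 / 5 + 32

# Equal distance: both pairs are 20 degrees apart in Celsius...
diff_low_c = 30 - 10
diff_high_c = 80 - 60

# ...and the two differences are still equal to each other in Fahrenheit.
diff_low_f = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(10)
diff_high_f = celsius_to_fahrenheit(80) - celsius_to_fahrenheit(60)

# Ratios are not meaningful: 90 looks like "twice" 45 only in Celsius.
ratio_c = 90 / 45
ratio_f = celsius_to_fahrenheit(90) / celsius_to_fahrenheit(45)

print(diff_low_c, diff_high_c, diff_low_f, diff_high_f)
print(ratio_c, ratio_f)
```

This is why 90° can be called hotter than 45° (magnitude), but not "twice as hot".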
Ratio Scale
Ratio scales represent the actual amounts of variables.
Have an absolute or true zero
Example:
o Length, weight, distance – measures of physical dimensions
All statistical techniques are usable
The ratio scale of measurement is similar to the interval scale in that it also represents quantity
and has equality of units with one major difference: zero is meaningful (no numbers exist below
the zero). The true zero allows us to know how many times greater one case is than another.
Ratio scales have all of the characteristics of the nominal, ordinal and interval scales. The
simplest example of a ratio scale is the measurement of length. Having zero length or zero
money means that there is no length and no money, but zero degrees on the Celsius or Fahrenheit
scale is not an absolute zero.
Properties of Ratio Scale:
Identity
Magnitude
Equal distance
Absolute/true zero
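To illustrate what a true zero buys us, the sketch below checks that "how many times greater" survives a change of unit on a ratio scale. The pipe lengths are hypothetical and the conversion factor is approximate.

```python
import math

METERS_TO_FEET = 3.28084  # approximate conversion factor

# Hypothetical pipe lengths in meters.
pipe_a_m, pipe_b_m = 6.0, 2.0

# On a ratio scale, zero means "no length" in every unit,
# so the ratio between two measurements does not depend on the unit.
ratio_m = pipe_a_m / pipe_b_m
ratio_ft = (pipe_a_m * METERS_TO_FEET) / (pipe_b_m * METERS_TO_FEET)

print(ratio_m, ratio_ft)
print(math.isclose(ratio_m, ratio_ft))
```

Pipe A is three times as long as pipe B whether we measure in meters or feet, which is the kind of statement an interval scale cannot support.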
Validity - extent to which a test measures what we actually wish to measure; validity is the evidence for
inferences made about a test score
The use of categories does not imply that there are distinct forms of validity; care must be
exercised in making distinctions because the categories actually overlap (Kaplan & Saccuzzo, 2015)
Is it really measuring what it is supposed to measure?
o Most critical criterion
o Can be thought of as utility?
How do we check an instrument’s validity? We seek other relevant evidence that confirms the
answers we have found with our measuring tool
Content validity – the extent to which the instrument provides adequate coverage of the topic
under study
o E.g. we can have an expert panel to judge how the instrument meets the standards
o No numerical way to express it
Criterion related validity – our ability to predict some outcome or estimate the existence of
some current condition; broad term that actually refers to predictive and concurrent validity
What do we mean by criterion? A basis, a reference
o Criterion must be: relevant, free from bias, reliable (stable), and available
o Predictive validity – usefulness of a test in predicting some future performance
o Concurrent validity – usefulness of a test in closely relating to other measures of known
validity;
Criterion and measure are taken at the same time
Example: learning disability test and school performance (Kaplan & Saccuzzo,
2015)
Here the measure and the criterion are taken at the same time because the test
is designed to explain why the person is now having difficulty in school
Expression: coefficient of correlation between test scores and some measure of future
performance or between test scores and scores on another measure of known validity
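As a minimal sketch of how such a validity coefficient might be computed, the following Python calculates a Pearson correlation between test scores and a later criterion measure. The function name `pearson_r` and both data lists are hypothetical, chosen only to illustrate the calculation.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: admission test scores and a later performance measure
# for the same six people (predictive validity setting).
test_scores = [52, 60, 65, 70, 78, 85]
later_performance = [2.1, 2.4, 2.9, 3.0, 3.4, 3.8]

validity_coefficient = pearson_r(test_scores, later_performance)
print(round(validity_coefficient, 2))
```

For concurrent validity the same calculation applies; the only difference is that the second list would hold scores on another measure of known validity collected at the same time as the test.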
Construct validity – the degree to which scores/measurement using a test can be accounted for
by explanatory constructs of a sound theory
o Convergent - Convergent evidence comes from correlations between the test and other
variables that are hypothetically related to the construct.
o Divergent or discriminant validity - Discriminant evidence shows that the measure does
not include superfluous items and that the test measures something distinct from other
tests.
o Construct validity evidence is used when a specific criterion is not well defined.
Reliability and validity are related because it is difficult to obtain evidence for validity
unless a measure has reasonable reliability.
o Construct validity evidence is established through a series of activities in which a
researcher simultaneously defines some construct and develops the instrumentation to
measure it. This process is required when “no criterion or universe of content is
accepted as entirely adequate to define the quality to be measured” (Cronbach &
Meehl, 1955, p. 282; Sackett, 2003). Construct validation involves assembling evidence
about what a test means. This is done by showing the relationship between a test and
other tests and measures. Each time a relationship is demonstrated, one additional bit
of meaning can be attached to the test. Over a series of studies, the meaning of the test
gradually begins to take shape. The gathering of construct validity evidence is an
ongoing process that is similar to amassing support for a complex scientific theory.
Although no single set of observations provides crucial or critical evidence, many
observations over time gradually clarify what the test means.
As we saw in Chapter 4, if a test measures whatever it measures well, its scores may be deemed to be
reliable (consistent, precise, or trustworthy), but they are not necessarily valid in the contemporary,
fuller sense of the term. In other words, test scores may be relatively free of measurement error, and
yet may not be very useful as bases for making the inferences we need to make.
According to the testing pioneer Lee Cronbach, it may not be appropriate to continue to divide validity
into three parts: “All validation is one, and in a sense all is construct validation” (1980, p. 99). Recall
that the 2012 edition of Standards for Educational and Psychological Testing no longer recognizes
different categories of validity. Instead, it recognizes different categories of evidence for validity.