Validity and Reliability in Research
John Adepoju
Validity and Reliability in Research
Validity and reliability are central concepts in research methodology: they describe how accurate and how consistent measurements and results are, and together they make the findings of a study credible and generalisable.
Validity is the extent to which a research instrument actually measures what it is supposed to measure, and hence how accurate a scientific finding is. It addresses the question: does your study measure what you say it measures?
There are several types of validity, each addressing different aspects of the measurement process:
1. Content Validity: This concerns whether the measuring instrument covers the whole range of the construct it is intended to measure. For instance, a test designed to measure mathematical ability has good content validity if it includes questions from every area of mathematics rather than, say, only algebra.
2. Construct Validity: This concerns whether the instrument actually captures the theoretical construct it intends to measure. Construct validity is assessed through convergent and divergent validity: convergent validity tests that the measure correlates well with other measures of the same construct, while divergent (discriminant) validity tests that it does not correlate strongly with measures of different constructs (a brief code sketch of this check appears after this list).
3. Criterion-related Validity: This refers to how well a measure relates to an external criterion (e.g., predictive validity is the extent to which the measure predicts a future outcome, and concurrent validity is the extent to which it correlates with an outcome measured at the same time).
4. Face Validity: Though considered the weakest form of validity, face validity is the extent to
which a measurement instrument appears effective in terms of its stated aims, purely based on
subjective judgment.
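To make the convergent/divergent distinction under construct validity concrete, here is a minimal Python sketch with entirely invented data and scale names: a hypothetical new anxiety scale should correlate strongly with an established anxiety measure (convergent validity) and only weakly with a measure of a different construct, such as extraversion (divergent validity).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Invented scores: an established anxiety scale, a new anxiety scale that
# shares most of its variance, and an unrelated extraversion scale.
established_anxiety = rng.normal(50, 10, n)
new_anxiety = established_anxiety + rng.normal(0, 5, n)
extraversion = rng.normal(50, 10, n)

# Convergent validity: the new scale should correlate highly with another
# measure of the same construct.
r_convergent = np.corrcoef(new_anxiety, established_anxiety)[0, 1]

# Divergent (discriminant) validity: it should correlate only weakly with a
# measure of a different construct.
r_divergent = np.corrcoef(new_anxiety, extraversion)[0, 1]

print(f"convergent r = {r_convergent:.2f}")  # expected to be high
print(f"divergent  r = {r_divergent:.2f}")   # expected to be near zero
```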
Validity therefore has to be built into the initial design, including the development of measurement
instruments that must be written, critiqued by peers, and usually piloted before use.
Reliability is the extent to which a test produces essentially the same results on repeated administrations
under comparable conditions. There are several kinds of reliability:
1. Test-retest Reliability: This refers to the stability of an instrument over time. It is evaluated by administering the same test to the same group of individuals at two different points in time and correlating the scores.
2. Inter-rater Reliability: This indicates the extent to which different raters agree when using the same instrument. High inter-rater reliability means the instrument yields similar results across different raters.
3. Internal Consistency: This assesses how consistent results are across the items within a test. The most common estimate is Cronbach’s alpha, which reflects the number of items and the average correlation between them (a short sketch of the calculation follows this list).
4. Parallel-forms Reliability: This involves comparing two different forms of the same test, which
are designed to be equivalent, to see if they produce similar results.
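To illustrate the internal-consistency calculation, here is a minimal Python sketch of Cronbach’s alpha using the standard formula alpha = k/(k−1) × (1 − sum of item variances / variance of the total score); the response matrix is invented purely for illustration.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Invented example: 5 respondents answering 4 Likert-type items.
scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 4],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
])
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")
```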
Reliability is necessary but not sufficient for validity: high reliability, by itself, does not guarantee that an instrument is valid. An instrument can be highly reliable yet invalid, that is, consistent and wrong every time, which is the cruellest kind of inaccuracy. Validity, however, presupposes reliability. Researchers therefore strive for both, using sophisticated statistical techniques such as factor analysis, structural equation modelling and item response theory to strengthen the psychometric properties of their instruments.
It can be hard to strike such a balance. For example, increasing a test’s reliability (i.e., making it more standardised and structured) might, at times, decrease its validity by making it less holistic and flexible. It falls to researchers to design their studies in ways that optimise both qualities, refining their measurement tools iteratively where necessary.
Sampling Techniques
Sampling techniques involve selecting a subset of a population in such a way that the subset reflects the population. The chosen technique can greatly affect the generalisability and reliability of research findings. Sampling techniques fall into two broad categories: probability sampling and non-probability sampling.
Probability Sampling involves random selection: every member of the population has a known, non-zero chance of being selected, which strengthens the ability to generalise the results. Specific types of probability sampling include the following (a brief code sketch of these methods follows the list):
1. Simple Random Sampling: Every member of the population has an equal chance of being
selected. This can be done by using random number generators or drawing lots. Simple random
sampling is easy, but it might be impractical to use for a very large population.
2. Systematic Sampling: Every nth member of the population is selected after a random starting point. It is easier to implement than simple random sampling, but bias can be introduced if the population has a hidden pattern.
3. Stratified Sampling: The population is divided into strata based on certain characteristics (e.g., age, gender) and a random sample is taken from each stratum. This ensures representation of all key subgroups and increases the precision of the results.
4. Cluster Sampling: The population is divided into clusters (such as geographical areas), a random sample of clusters is selected, and everyone in the selected clusters is included in the sample. Cluster sampling is useful for very large and geographically dispersed populations, since it saves time and resources, but it can produce larger sampling error than stratified sampling.
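The following Python sketch (with an invented population of 1,000 people) shows one simple way each of the four probability sampling methods above might be implemented; the sample sizes, strata and cluster definitions are assumptions for illustration only.

```python
import random
from collections import defaultdict

random.seed(42)

# Invented population: 1,000 people, each with a region and an age group.
population = [
    {"id": i,
     "region": random.choice(["north", "south", "east", "west"]),
     "age_group": random.choice(["18-34", "35-54", "55+"])}
    for i in range(1000)
]

# 1. Simple random sampling: every member has an equal chance of selection.
simple_sample = random.sample(population, k=100)

# 2. Systematic sampling: every nth member after a random starting point.
step = len(population) // 100
start = random.randrange(step)
systematic_sample = population[start::step]

# 3. Stratified sampling: draw randomly within each age-group stratum,
#    allocating the sample in proportion to each stratum's size.
strata = defaultdict(list)
for person in population:
    strata[person["age_group"]].append(person)
stratified_sample = []
for members in strata.values():
    share = round(100 * len(members) / len(population))
    stratified_sample.extend(random.sample(members, k=share))

# 4. Cluster sampling: randomly pick whole regions and include everyone in them.
regions = defaultdict(list)
for person in population:
    regions[person["region"]].append(person)
chosen_regions = random.sample(sorted(regions), k=2)
cluster_sample = [p for r in chosen_regions for p in regions[r]]

print(len(simple_sample), len(systematic_sample),
      len(stratified_sample), len(cluster_sample))
```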
Non-probability Sampling does not involve random selection, which limits the generalisability of the results. However, it is often easier and cheaper to implement. Common types include:
1. Convenience Sampling: The sample is taken from a group that is conveniently accessible to the
researcher. This method is quick and easy but also highly biased and often unrepresentative.
2. Judgmental (or Purposive) Sampling: The researcher uses their own judgment to select participants considered most representative of, or informative about, the population. This works well for qualitative research or exploratory studies, but it is highly subjective and can therefore introduce bias.
3. Snowball Sampling: Existing study subjects recruit future subjects from among their
acquaintances. This method is useful for accessing hard-to-reach populations but can lead to
biased samples due to the non-random nature of the recruitment process.
4. Quota Sampling: The researcher fills pre-assigned quotas for specific characteristics (e.g., 50% males and 50% females). Although this can improve representativeness compared with convenience sampling, it still lacks the randomness of probability sampling (a short sketch follows this list).
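As an illustration of quota sampling, here is a minimal Python sketch with invented volunteers and quotas: respondents are taken in the order they become available until each pre-assigned quota is full, with no randomisation by the researcher.

```python
import random

random.seed(7)

# Invented stream of volunteers arriving in convenience order.
volunteers = [{"id": i, "gender": random.choice(["male", "female"])}
              for i in range(500)]

# Pre-assigned quotas, e.g. 50 males and 50 females.
quotas = {"male": 50, "female": 50}
filled = {"male": 0, "female": 0}
quota_sample = []

# Accept respondents as they arrive until every quota is filled.
for person in volunteers:
    g = person["gender"]
    if filled[g] < quotas[g]:
        quota_sample.append(person)
        filled[g] += 1
    if all(filled[k] >= quotas[k] for k in quotas):
        break

print(filled)  # each quota filled, but the sample was never randomised
```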
New technologies and data science are transforming sampling approaches. In some contexts, advances in big data analytics allow near-population-level analyses. Nevertheless, whether we refer to our subjects as ‘participants’, ‘respondents’ or ‘students’, questions of data quality, representativeness and ethics in data collection and use remain paramount.
These technologies also make available a range of adaptive sampling designs that respond dynamically to patterns arising in the course of data collection. Such approaches can be particularly useful in ecological and rare-population studies, where fixed designs might prove costly or insufficient.
For hidden populations, the most notable advance in survey design has been respondent-driven sampling, which builds on snowball sampling but combines it with a mathematical model that weights the sample to account for non-random recruitment patterns.
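One common weighting scheme in respondent-driven sampling (the Volz–Heckathorn approach) down-weights respondents in proportion to their reported personal network size, since better-connected people are more likely to be recruited. The sketch below uses entirely invented respondents and a generic outcome variable.

```python
# Invented respondents: reported network size (degree) and a binary outcome.
respondents = [
    {"id": "A", "degree": 20, "outcome": 1},
    {"id": "B", "degree": 5,  "outcome": 0},
    {"id": "C", "degree": 8,  "outcome": 0},
    {"id": "D", "degree": 2,  "outcome": 1},
]

# Weight each respondent by the inverse of their network size.
weights = {r["id"]: 1 / r["degree"] for r in respondents}
total_weight = sum(weights.values())

# Degree-weighted prevalence estimate versus the naive sample proportion.
weighted_prevalence = sum(weights[r["id"]] * r["outcome"]
                          for r in respondents) / total_weight
naive_prevalence = sum(r["outcome"] for r in respondents) / len(respondents)

print(f"naive = {naive_prevalence:.2f}, degree-weighted = {weighted_prevalence:.2f}")
```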
Longitudinal studies, meanwhile, face the troublesome problem of attrition, requiring sophisticated sampling and analysis strategies, such as multiple imputation and inverse probability weighting, to mitigate bias from differential dropout rates. These measures help to ensure the validity of long-term follow-up studies.
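As a simplified illustration of inverse probability weighting for attrition, the sketch below (with invented data) estimates each baseline group’s probability of completing follow-up and then weights completers by the inverse of that probability, so groups that dropped out more heavily count for more among those who remain. Real analyses would typically model retention with covariates (for example via logistic regression) rather than simple group proportions.

```python
from collections import defaultdict

# Invented longitudinal data: baseline group, completion status, follow-up score.
participants = [
    {"group": "younger", "completed": True,  "score": 70},
    {"group": "younger", "completed": False, "score": None},
    {"group": "younger", "completed": False, "score": None},
    {"group": "older",   "completed": True,  "score": 55},
    {"group": "older",   "completed": True,  "score": 60},
    {"group": "older",   "completed": False, "score": None},
]

# Step 1: estimate each group's probability of remaining in the study.
totals, stayed = defaultdict(int), defaultdict(int)
for p in participants:
    totals[p["group"]] += 1
    stayed[p["group"]] += p["completed"]
retention = {g: stayed[g] / totals[g] for g in totals}

# Step 2: weight each completer by the inverse of that probability.
weighted_sum = weight_total = 0.0
for p in participants:
    if p["completed"]:
        w = 1 / retention[p["group"]]
        weighted_sum += w * p["score"]
        weight_total += w

print(f"IPW-adjusted mean follow-up score = {weighted_sum / weight_total:.1f}")
```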
As mixed-methods research designs have become more commonplace, so too have novel sampling designs that transcend the quantitative-qualitative dichotomy. Sequential mixed-methods sampling represents one such synthesis, in which the results of an initial quantitative phase inform subsequent qualitative sampling (or vice versa).