Chap 4
Learning Objectives
By the end of this chapter you should be able to:
• distinguish between random and non-random methods of
sample selection
• describe the advantages of random sample selection
• identify different methods of sample selection
• match different methods of sample selection to research
design
• describe the factors influencing sample size
• describe how to calculate the appropriate sample size
Introduction
Sampling is a crucial issue in health and social care research
design. When undertaking a quantitative approach, researchers
are striving to be able to generalise their findings to the
wider world (achieve external validity). It is therefore
essential that both the sampling method used and the sample
size achieved are appropriate: firstly, if the results are to be
representative, and secondly, if statistically significant
associations or differences are to be identified.
It is important to achieve representativeness in a sample,
because it is this very ‘representativeness’ that allows the
researcher to generalise the findings derived from a sample to
the wider population. Obtaining a representative sample is
therefore desirable as this means that your sample accurately
reflects the population under investigation.
Definition:
Internal validity relates to the validity of the study itself,
including both the design and the instruments used.
Definition:
External validity relates to the extent to which the findings of
the study can be generalised to a wider population and be
claimed to be representative.
In qualitative research, issues of sampling are different,
although no less important. Here, the objective of research is
more concerned with internal validity – with providing an
accurate representation of reality – than with the external
validity that will enable generalisation to other settings. Hence,
appropriate sampling to achieve representativeness is of less
importance.
In this chapter we will concentrate on the different sampling
techniques that can be used in health and social care research to
achieve the desired validity objectives. We will examine
random and non-random methods of sample selection, which
are typically used in quantitative and qualitative research
respectively. In the second part of this chapter we will discuss
issues and techniques of sample size.
Why do we need to select a sample anyway?
In some circumstances it is not necessary to select a sample. If
the subjects of your study are so rare, for example a child with
a disease seen once in 100,000 children, then you might want to
study every case you can find. However, more generally you
are likely to find yourself in a situation where the potential
subjects of your study are more common and you cannot
include everybody.
If you were to include everybody who is eligible in your study,
for example everybody in the UK who has been diagnosed as
suffering from asthma, then this would be defined as a census.
In many instances, a census is simply too large to handle. It
would take too long and cost too much money. If a census is
too large, it will be necessary to use a sample to carry out the
research.
Definition:
A census is an enumeration of a population, originally for
purposes of taxation and military service. Derived from the
Latin censere, meaning ‘to rate’.
Before looking at different sampling techniques it is important
to differentiate between two terms: population and sample.
A population refers to the ‘units’ (e.g. people, schools, cities)
that you wish to make a statement about, whereas the sample
refers to the group of units selected from the population from
which information will be collected. For example, imagine you
wish to carry out a study investigating the height of
seven-year-old boys in England. The population would be all
seven-year-old boys in England, whereas the sample might be a
group of seven-year-old boys in Sheffield from whom you could
gather data.
There are a number of different sampling techniques, but overall
they can be split into two main groups: probability and
non-probability sampling or, as they are more commonly called,
random and non-random methods.
Probability Sampling
Probability sampling methods allow the use of inferential
statistics. Inferential statistics provide ways of judging
whether it is possible to generalise the results from the sample
to the population.
Definition:
Inferential statistics are used to make generalisations from a
sample to a population.
There are five different types of probability sampling, which are
described in Fig 4.1 below:
Figure 4.1. Type of sampling and selection strategy (adapted from Henry 1990: 27)
Type of sampling: Selection strategy
Simple random: Each member of the study group has an equal probability of being selected.
Systematic: Each member of the study population is either assembled or listed, a random start is designated, then members of the population are selected at equal intervals.
Stratified: Each member of the study population is assigned to a group or stratum, then a simple random sample is selected from each stratum.
Cluster: Each member of the study population is assigned to a group or cluster, then clusters are selected at random and all members of selected clusters are included in the sample.
Multistage: Clusters are selected as in a cluster sample, and then sample members are selected from the cluster members by simple random sampling. Clustering may be done at more than one stage.
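The selection strategies in Figure 4.1 can be sketched in a few lines of Python. This is an illustration only: the sampling frame of 1,000 patients in 10 clinics, the field names and the function names are all invented for the example, and a real study would draw from its own sampling frame.

```python
import random

def simple_random(frame, n, rng):
    """Every unit in the frame has the same chance of selection."""
    return rng.sample(frame, n)

def stratified(frame, stratum_of, fraction, rng):
    """Group units into strata, then draw a simple random sample from each."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, round(len(members) * fraction)))
    return chosen

def cluster(frame, cluster_of, n_clusters, rng):
    """Pick whole clusters at random and keep every member of each."""
    names = sorted({cluster_of(u) for u in frame})
    picked = set(rng.sample(names, n_clusters))
    return [u for u in frame if cluster_of(u) in picked]

def multistage(frame, cluster_of, n_clusters, n_per_cluster, rng):
    """Pick clusters at random, then a simple random sample within each."""
    names = sorted({cluster_of(u) for u in frame})
    chosen = []
    for name in rng.sample(names, n_clusters):
        members = [u for u in frame if cluster_of(u) == name]
        chosen.extend(rng.sample(members, n_per_cluster))
    return chosen

# Hypothetical frame: 1,000 patients spread over 10 clinics.
rng = random.Random(1)
frame = [{"id": i, "clinic": i % 10, "sex": "F" if i % 2 else "M"}
         for i in range(1000)]

srs = simple_random(frame, 100, rng)
strat = stratified(frame, lambda u: u["sex"], 0.1, rng)
clus = cluster(frame, lambda u: u["clinic"], 2, rng)
multi = multistage(frame, lambda u: u["clinic"], 3, 20, rng)
```

Systematic sampling is left out here because it depends on the order of the list rather than on grouping.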
likely to influence the outcome or dependent variables in
either of the groups.
It is important to remember that randomisation is different
from random sampling. As we described in chapter 2,
randomisation within an experimental design is a way of
ensuring control over confounding variables and as such it
allows the researcher to have a greater confidence in
identifying real associations between an independent variable
(the cause) and a dependent variable (the effect or outcome
measure).
The term ‘random’ may imply to you that it is possible to take
some sort of haphazard or ad hoc approach, for example
stopping the first 20 people you meet in the street for inclusion
in your study. This is not random in the true sense of the word.
To be a ‘random’ sample, every individual in the population
must have an equal probability of being selected. In order to
carry out random sampling properly, strict procedures need to
be adhered to.
Random sampling techniques can be split into simple random
sampling and systematic random sampling.
clinics in total and we only require a sample of 200, we would
need to:
• calculate the sampling interval by dividing 3,000 by 200 to
give an interval of 15 (a sampling fraction of 1 in 15)
• select a random number between one and 15 using a table of
random numbers
• if this number were 13, we select the individual allocated
number 13 and then go on to select every 15th person
• this will give us a total sample size of 200 as required.
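The steps above can be sketched as follows. The list of 3,000 numbered individuals and the function name are placeholders for illustration, and Python's random number generator stands in for a table of random numbers.

```python
import random

def systematic_sample(frame, sample_size, rng):
    """Select every k-th unit after a random start within the first interval."""
    interval = len(frame) // sample_size      # 3000 // 200 = 15
    start = rng.randint(1, interval)          # random start from 1 to 15 inclusive
    # Positions are 1-based, matching individuals numbered 1..N on a list.
    positions = range(start, len(frame) + 1, interval)
    return start, [frame[p - 1] for p in positions]

frame = list(range(1, 3001))   # hypothetical list of 3,000 numbered individuals
start, sample = systematic_sample(frame, 200, random.Random(0))
```

Whatever start is drawn between 1 and 15, stepping through the list in intervals of 15 yields exactly 200 individuals.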
Care needs to be taken when using a systematic random
sampling method in case there is some bias in the way that lists
of individuals are compiled, for example if all the husbands’
names precede wives’ names and the sampling interval is an
even number, then we may end up selecting all women and no
men.
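The effect of list order can be demonstrated on an invented list of 100 couples in which each husband is written immediately before his wife. The list and the helper function are hypothetical, but the pattern they show is the one described above.

```python
# Hypothetical list of 100 couples, always written husband first, wife second.
frame = []
for couple in range(100):
    frame.append(("husband", couple))
    frame.append(("wife", couple))

def every_kth(frame, start, interval):
    """Systematic selection with a 1-based start position."""
    return frame[start - 1::interval]

even_interval = every_kth(frame, start=2, interval=10)
odd_interval = every_kth(frame, start=2, interval=9)

roles_even = {role for role, _ in even_interval}   # only one role appears
roles_odd = {role for role, _ in odd_interval}     # both roles appear
```

With an even interval the selection keeps landing on the same position within each couple, so only wives are chosen. An odd interval alternates between the two positions.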
If we really want to be able to compare the survey results of the
minority individuals with those of the larger group, then it is
necessary to use a disproportionate sampling method.
With disproportionate sampling, the strata selected are not
selected pro-rata for their size in the wider population.
For instance, if we are interested in comparing the views and
behaviour of particular minority groups with other larger
groups, then it is necessary to over sample the smaller
categories in order to achieve statistical power, i.e. in order to
be able to demonstrate statistically significant differences
between groups.
If during data analysis we wish to treat the total sample as a
whole that is representative of the wider population, then it
will be necessary to re-weight the categories back into the
proportions in which they are represented in reality.
For example, if we wanted to compare the views and
satisfaction levels of women who gave birth in a birth pool
compared with those who gave birth ‘normally’ in a bed, a
systematic random sample, although representative of all
women giving birth would not produce a sufficient number of
women giving birth in water to be able to compare the results,
unless the total sample was so big that it would take many
years to collate. We would also end up interviewing more
women than we needed who have given birth ‘normally’. In
this case it would be necessary to over sample or over
represent those women giving birth in water to have enough
individuals in each group in order to compare them. We would
therefore use disproportionate stratified random sampling to
select the sample in this instance.
The important thing to note here is that random sampling is still
taking place within each stratum or category. So we would use
systematic random selection to select a sample of women
giving birth in water and the same process to select women
giving birth ‘normally’.
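A minimal sketch of disproportionate stratified sampling and re-weighting, using the birth pool example. Every figure here is invented for illustration: the population of 500 water births and 9,500 bed births, the satisfaction scores, and the choice of 200 women per stratum. For brevity the sketch draws a simple random sample within each stratum rather than the systematic selection described above.

```python
import random

rng = random.Random(42)

# Hypothetical population with made-up satisfaction scores out of 10.
population = {
    "water": [rng.gauss(8.0, 1.0) for _ in range(500)],
    "bed": [rng.gauss(7.0, 1.0) for _ in range(9500)],
}

# Disproportionate design: the same number from each stratum, chosen at random.
per_stratum = 200
sample = {name: rng.sample(scores, per_stratum)
          for name, scores in population.items()}

def mean(values):
    return sum(values) / len(values)

stratum_means = {name: mean(scores) for name, scores in sample.items()}

# Pooling without weights treats water births as half of all births.
unweighted = mean(sample["water"] + sample["bed"])

# Re-weighting: each stratum mean counts in proportion to its population share.
total = sum(len(scores) for scores in population.values())
shares = {name: len(scores) / total for name, scores in population.items()}
weighted = sum(shares[name] * stratum_means[name] for name in population)
```

The two stratum means can be compared directly with each other, while only the weighted figure should be used as an estimate for all women giving birth.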
Cluster Sampling
Cluster sampling is a method frequently employed in national
surveys where it is uneconomic to carry out interviews with
individuals scattered across the country. Cluster sampling
allows individuals to be selected in geographic batches. So, for
instance, before selecting at random, the researcher may decide
to focus on certain towns, electoral wards, hospitals or general
practices.
An example of a piece of research that used a cluster sampling
method can be found in:
Kennedy et al (2003) A cluster-randomised controlled trial of a
patient-centred guidebook for patients with ulcerative colitis:
effect on knowledge, anxiety and quality of life. Health and
Social Care in the Community, 11: 64
You can find this article online at:
http://www.shef.ac.uk/library/elecjnls/
The aim of the study was to evaluate the impact of an
evidence-based guidebook on anxiety, knowledge and quality of
life in patients with ulcerative colitis.
Stage 1: A random sample of six hospitals was chosen from
the 20 district hospitals in the North West of England.
Stage 2: Hospitals were randomised to either receiving the
guidebook (intervention group) or not receiving the guidebook
(control group).
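The two stages can be sketched as below. The hospital names are placeholders, and the equal split of three hospitals per arm is an assumption made for the example, not a detail taken from the study.

```python
import random

rng = random.Random(7)

# Placeholder names for the 20 district hospitals.
hospitals = [f"Hospital {i}" for i in range(1, 21)]

# Stage 1: a random sample of six hospitals.
selected = rng.sample(hospitals, 6)

# Stage 2: randomise whole hospitals, not individual patients, to the two arms.
shuffled = selected[:]
rng.shuffle(shuffled)
allocation = {
    "intervention": sorted(shuffled[:3]),   # assumed equal split
    "control": sorted(shuffled[3:]),
}
```

Because the hospital is the unit of allocation, every patient recruited at a given hospital falls into the same arm.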
Multi-stage sampling
Multi-stage sampling allows the individuals within the
selected cluster units to then be selected at random. Obviously
care must be taken to ensure that the cluster units selected are
generally representative of the population and are not strongly
biased in any way.
An example of a piece of research that used a multistage
sampling method is:
Hughes et al (1997) Young people, alcohol and designer
drinks: quantitative and qualitative study. BMJ, 314: 414-
You can find this article on-line via the University of Sheffield
library, at:
http://www.shef.ac.uk/library/elecjnls/brbz.html
Non-Random Sampling
Non-random or non-probability sampling refers to sampling
methods that do not adhere to the principles of probability
sampling, i.e. that not everyone in the population has an equal
chance of being selected. For this reason, non-random
sampling is not used very often in quantitative health and social
care research.
Since the objective of qualitative research is to understand and
give meaning to a social process, rather than quantify and
generalise to a wider population, it is inappropriate to use
random sampling and apply statistical tests. The issues for
qualitative studies are therefore somewhat different, and the
approach to sampling is distinctive. Sample sizes used in
qualitative research are usually very small and the application
of statistical tests would be neither appropriate nor feasible.
For example, a study may wish to consider attitudes towards
compliance with asthma treatment. A small sample of local
asthmatic teenagers may be interviewed. The sample is thus
chosen on the basis of ‘theory’: this kind of theoretical
sampling aims to maximise the range of responses, but does
not strictly seek to ‘represent’ all the people in the community
and other stakeholders.
Non-random sampling techniques are also important to
consider as they are being used increasingly in market research
and commissioned studies such as political opinion polling.
There are three main types of non-random sampling: quota
sampling, convenience sampling and snowball sampling. The
technique most commonly used is known as quota sampling.
Quota Sampling
Quota sampling is a technique for sampling whereby the
researcher decides in advance on certain key characteristics that
they will use to stratify the sample. Interviewers are often set
sample quotas in terms of age and sex. So, for example, with a
sample of 200 people, they may decide that 50% should be
male and 50% should be female; and 40% should be aged 40
years or over and 60% aged 39 years or less. The difference from
a stratified sample is that the respondents in a quota sample are
not randomly selected within the strata. The respondents may
be selected just because they are accessible to the interviewer.
Because random sampling is not employed, it is not possible to
apply inferential statistics and generalise the findings to a wider
population.
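The quota example can be sketched as follows. The stream of passers-by is simulated and purely hypothetical, and the age bands are taken as 40 or over and 39 or under. The point of the sketch is that people are accepted in the order they happen to appear, not drawn at random from a sampling frame.

```python
import random

rng = random.Random(3)

# Quotas for a sample of 200: 50/50 by sex, 40/60 by age band.
quotas = {
    "sex": {"male": 100, "female": 100},
    "age": {"40 or over": 80, "39 or under": 120},
}
counts = {key: {cat: 0 for cat in cats} for key, cats in quotas.items()}
sample = []

def passer_by():
    """Hypothetical stand-in for whoever is accessible to the interviewer."""
    return {
        "sex": rng.choice(["male", "female"]),
        "age": rng.choice(["40 or over", "39 or under"]),
    }

while len(sample) < 200:
    person = passer_by()
    # Accept only if every quota this person falls under still has room.
    if all(counts[k][person[k]] < quotas[k][person[k]] for k in quotas):
        sample.append(person)
        for k in quotas:
            counts[k][person[k]] += 1
```

The finished sample matches the quotas exactly, yet nothing in the procedure gives each member of the population a known chance of selection.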
Convenience or Opportunistic Sampling
Selecting respondents purely because they are easily accessible
is known as convenience sampling. Although quantitative
researchers generally frown upon this technique, it is an
acceptable approach when using a qualitative design, since
generalisability is not a main aim of qualitative approaches.
In fact qualitative data are often collected using a convenience
or opportunistic sampling approach, for instance where the
researcher selects volunteers amongst his or her work
colleagues. However, as this is rather a haphazard method,
many qualitative researchers employ a purposive sample to
identify specific groups of people who exhibit the
characteristics of the social process or phenomenon under
study. For example, a researcher may be seeking to interview
people who have recently been bereaved, or are being
rehabilitated into intermediate care; another researcher may be
seeking people who have experienced long-term
unemployment and suffer from chronic asthma. Sometimes
researchers try to find people who typify the characteristics
they are looking for. This is known as an ‘ideal type’.
Occasionally it is useful to deliberately include people who
exhibit the required characteristics in the extreme. Close
examination of extreme cases can sometimes be very
illuminating, when trying to formulate a theory. However,
caution must be adopted with this approach, to ensure these
extreme cases are not used to generalise.
Theoretical Sampling
If a researcher wishes to develop a social theory, a
‘theoretical’ sampling technique may be used. The idea is
that the researcher selects the subjects, collates and analyses the
data to produce an initial theory that is then used to guide
further sampling and data collection from which further theory
is developed. Since theoretical sampling is one of the main
methods used in qualitative research, this raises issues about
the degree to which findings can be generalised or achieve
transferability. For more information on this topic and
qualitative research in general see Chapter 7.
Self-Assessment Questions 4.1 Sampling methods
1. A sample of school children, some with asthma and some without, is selected
from GP records. The children are selected randomly within each of the two
groups and the number of children in each group is representative of the total
patient population for this age group.
2. A sample of children is selected from social workers’ records. The children are selected
so that 50% have been placed into foster care and 50% are waiting for foster parents. The
children are randomly selected within each group.
3. A survey is carried out to examine the attitudes of mothers with children under one
year. The sample is selected by interviewers stopping likely-looking women pushing
prams in the street. The number of respondents who fall into different age bands and
social classes is strictly collated.
4. A sample of drug users is gathered by advertising in the local newspaper for potential
respondents.
5. All male adults whose National Insurance numbers end in ‘5’ are selected for a survey.
6. All patients attending wards 3, 5 and 10 in a hospital are selected for a study.
Now that you have been introduced to sampling techniques
used in qualitative and quantitative research we would like you
to read more on these sampling methods. Please read chapter
four of Alan Bryman’s book ‘Sampling’, which is provided
for you in the supplementary reading. When you have
finished, please complete the following SAQ. You may wish
to take notes as you read.
2. For which research methodology are you most likely to use a snowball sampling technique
and why?
4. What is meant by non-response? In what way could this affect obtaining a representative
sample?
Answers to SAQ 4.1
1. Stratified random sample. The sample is stratified because
the sample has been selected to ensure that two different groups
are represented.
2. Disproportionate stratified random sample. This sample is
stratified to ensure that patients from the two different groups
are picked up; however, the two groups are selected so that they
are equal in size and are not representative of the patient base.
3. Quota. The sample is not randomly selected but the
respondents are selected to meet certain criteria.
4. Convenience. The sample is not randomly selected and no
quotas are applied.
5. Systematic random sample.
6. Cluster sample because the patients are selected only from
certain wards.
Reflective Exercise 4.1 Sampling
We would like you to look at the practical issues surrounding
sampling techniques in research. Please read the journal article:
Helder D et al (2002) Living with Huntington's disease: Illness
perceptions, coping mechanisms, and patients' well-being. British
Journal of Health Psychology, 7 (4): 449 – 462.
(Tip: read the methods and the discussion).
Answers to SAQs 4.2
1. Non-representative, because of the possible interviewer bias
involved in selecting the respondents. Respondents may be
untypical, as only those who are available and in the
interviewer’s vicinity are typically selected. The interviewer is
often prone to bias in selecting respondents, e.g. they may
assume that a person is younger than required for the study and
therefore not approach them. The reliance upon social class
also often causes problems in making sure that respondents are
correctly assigned to these social class groups.
2. Qualitative, because generalisability, which is an essential
requirement within quantitative research, is not possible with
this non-probability sampling technique.
3. The main advantage of using a stratified sample is that the
sample should be distributed in the same way as the population
which may not be achieved with just using a simple random
sample.
4. Non-response arises in social survey research and refers to
those subjects selected for the research who do not return
self-completed questionnaires. Non-response can affect obtaining
a representative sample, as those who have not participated may
differ from those who have (e.g. in terms of age, social class,
reading and writing abilities), which may make the sample left
in the study unrepresentative of the population.
Part 2: Sample size and the ‘power’ of research
In the previous section, we looked at methods of sampling.
Now we want to turn to another aspect of sampling: how big a
sample needs to be in quantitative research to enable a study to
have sufficient ‘power’ to do the job of testing a hypothesis.
At first glance, many pieces of research seem to choose a
sample size merely on the basis of what ‘looks’ about right, or
perhaps simply for reasons of convenience: ten seems a bit
small, and a hundred would be difficult to obtain, so 40 is a
happy compromise! Unfortunately, a lot of published research
uses precisely this kind of logic. In the following section, we
want to show you why using such reasoning could make your
research worthless. Choosing the correct size of sample is not
a matter of preference; it is a crucial element of the research
process, without which you may well spend months trying
to investigate a problem with a tool which is either completely
useless or over-expensive in terms of time and other resources.
The measurement of such generalisability of a study is done by
statistical tests of inference.
You may be familiar with some such tests: tests such as
chi-squared, the t-test, and tests of correlation (see Chapter 6
for further details of these tests). We will not look
at these tests in any detail in this course, but we need to
understand that the purpose of these and other tests of
statistical inference is to assess the extent to which the
findings of a study can be accepted as valid for the population
from which the study sample has been drawn. If the statistics
we use suggest that the findings are ‘true’, then we can be
happy to conclude that (within certain limits of probability), we
can assume that the study’s findings can be generalised, and we
can act on them (to reduce exposure to traffic by school
children, for instance).
From common sense, we see that the larger the sample is, the
easier it is to be satisfied that it is representative of the
population from which it is drawn: but how large does it need
to be? This is the question that we need to answer, and to do
so, we need to think a little more about the possibilities that our
findings may not reflect reality: that we have committed an
error in our conclusions.
asthma example again, we conduct a study and find no
association, missing one that really does exist.)
Both types of error are serious. Both have consequences:
imagine the money that might be spent on reducing traffic
pollution, and all the time it does not really affect asthma
(result of a Type 1 error). Or imagine allowing traffic pollution
to continue, while it really is affecting children’s health (result
of a Type 2 error). Good research will minimise the chances of
committing both Type 1 and Type 2 errors as far as possible,
although they can never be ruled out absolutely.
Figure 4.1: The Null Hypothesis (Ho), Statistical Significance and Statistical Power
                                    POPULATION: Null Hypothesis is:
                                    False         True
STUDY: Null Hypothesis is False     ❶             ❷
STUDY: Null Hypothesis is True      ❸             ❹
Cell 1. The null hypothesis has been disproved by the results of
the study (that is, there is support for a hypothesis which
suggests some differences between groups or association
between variables). This is also the situation in the population.
Thus, we can be satisfied that the study is reflecting the world
outside the limits of the study and it is to be accepted as a
'correct' result.
Cell 4. The results from the study support the null hypothesis.
This is the situation which pertains in the population, so we can
be satisfied that our study reflects the circumstances in the
population. Once again, this is a 'correct' result.
Cell 2. In this cell, as in cell 1, the study results falsify the null
hypothesis, indicating some kind of difference or association
between variables. However, in the world beyond the study,
the null hypothesis is actually true and there is no effect. This
is the Type I error: the error of wrongly rejecting a true null
hypothesis. The likelihood of committing a Type I error is
known as the alpha value or the level of statistical significance
of the test. Some of you may be familiar with alpha as the
threshold against which the quoted p value of a test is judged.
Alpha marks the probability of committing a Type I error when
the null hypothesis is true; thus an alpha of 0.05
indicates a five per cent (or one in 20) chance of committing a
Type I error. Cell 2 thus reflects an incorrect finding from a
study, and the alpha value represents the likelihood of this
occurring.
Cell 3. This cell similarly reflects an undesirable outcome of a
study. Here, as in Cell 4, a study supports the null hypothesis,
implying that there is no difference or association in the
population under investigation. But in reality, the null
hypothesis is false and there is some kind of difference or
association that the study is missing. This mistake is the Type
II error of wrongly accepting a false null hypothesis. The
likelihood of committing a Type II error is the beta value of a
statistical test, and the value (1 - beta) is the statistical power
of the test. Thus, the statistical power of a test is the likelihood
of avoiding a Type II error. Conventionally, a value of 0.80 or
80% is the target value for statistical power, representing the
likelihood that, four times out of five, a false null hypothesis
will be rejected. Outcomes of studies that fall into Cell 3 are
incorrect; beta measures the likelihood of such an outcome, and
its complement (1 - beta) is the power of the study.
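The four cells can be made concrete with a small simulation. The sketch below repeats an imaginary two-group study many times, first with no true difference and then with a true difference of half a standard deviation. The group size of 50, the size of the difference, and the use of a simple z-test with a known standard deviation of 1 are all assumptions chosen to keep the example short.

```python
import random
from statistics import NormalDist

def rejects_null(group_a, group_b, alpha=0.05):
    """Two-sided z-test for a difference in means, assuming a known SD of 1
    and equal group sizes."""
    n = len(group_a)
    difference = sum(group_a) / n - sum(group_b) / n
    z = difference / (2 / n) ** 0.5
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

def rejection_rate(true_difference, n, trials, rng):
    """Share of simulated studies that reject the null hypothesis."""
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(true_difference, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        rejections += rejects_null(a, b)
    return rejections / trials

rng = random.Random(2024)
type_1_rate = rejection_rate(0.0, n=50, trials=2000, rng=rng)  # null true (Cell 2)
power = rejection_rate(0.5, n=50, trials=2000, rng=rng)        # null false (Cell 1)
type_2_rate = 1 - power                                        # null false (Cell 3)
```

When the null hypothesis is true, the share of studies that wrongly reject it should sit close to alpha. When it is false, the share that correctly reject it is the power, and the remainder are Type II errors.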
All research should seek to avoid both Type I and Type II
errors, which lead to incorrect inferences about the world
beyond the study. In practice, there is a trade-off. Reducing
the likelihood of committing a Type I error by demanding a more
stringent level of significance before one is willing to accept a
positive finding (for example 0.01 rather than 0.05) reduces the
statistical power of the test, increasing the
possibility of a Type II error, and vice versa. However, both
statistical significance and statistical power are affected by
sample size.
Most researchers who use statistics will be aware that the
chances of gaining a statistically significant result will be
increased by enlarging a study's sample. Similarly, the
statistical power of a study is enhanced as sample size
increases. Let us look at each of these aspects of quantitative
research in turn.
The Statistical Power of a Study
As we have just seen, statistical tests build in a safety margin to
avoid generalising false positive results, possibly with
disastrous or expensive consequences. Researchers who use
small samples thus run the risk of not being able to demonstrate
differences or associations that really do exist. Thus they are in
danger of committing a Type II error (Cell 3 in the null
hypothesis figure above), of accepting a false null hypothesis.
Such studies are ‘under-powered’, not possessing sufficient
statistical power to detect
the effects they set out to detect. Conventionally, the target is
a power of 80% or 0.8, meaning that a study has an 80 per
cent likelihood of detecting a difference or association that
exists.
Examination of research undertaken in various fields of study
suggests that many studies do not meet this 0.8 conventional
target for power (Fox and Mathers 1997). What this means is
that many studies have much reduced likelihood of being able
to discern the effects which they set out to seek: a study with a
power of 0.66 will only detect an effect two times out of three,
while studies with power of 0.5 or less will detect effects no
more often than a correct call on the toss of a coin. A
non-significant finding of a study may thus simply reflect
the inadequate power of the study to detect differences or
associations at levels that are conventionally accepted as
statistically significant.
In such situations one must ask the simple question of such
research: ‘Why did you bother, when your study had little
chance of finding what you set out to find?’
A hypothetical example will illustrate the importance of being
able to detect a false null hypothesis. Imagine that researchers
wish to compare two inhalers for their efficacy in delivering
drugs for countering asthma attacks. Inhaler A is twice as
expensive as inhaler B. In the study, inhaler A demonstrates a
slightly higher level of efficacy, but because the study is small
(only 50 people in each group), this difference does not reach
significance, and the researchers conclude there is no
demonstrable difference. They report this finding, and as a
consequence, the cheaper but less effective inhaler comes to be
the one recommended to doctors to prescribe. If this study is
under-powered, this decision may rest on the acceptance of a
false null hypothesis (Cell 3: false negative), and the
consequences for children using a less effective inhaler could
be fatal.
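To see how little power a study of this size may have, the sketch below estimates power for two groups of 50 using a normal approximation. It assumes that efficacy is measured on a continuous scale and that the true difference is a standardised effect size of 0.2 or 0.5 (the figures conventionally labelled small and medium). Neither assumption comes from the inhaler example itself, and the function name is invented.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided comparison of two group means.
    effect_size is the standardised difference between the group means."""
    norm = NormalDist()
    critical = norm.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    return norm.cdf(shift - critical) + norm.cdf(-shift - critical)

# Assumed effect sizes: 0.2 (small) and 0.5 (medium).
for label, d in [("small", 0.2), ("medium", 0.5)]:
    print(f"{label} effect, 50 per group: power = {approx_power(d, 50):.2f}")
```

Under these assumptions the power for a small effect comes out at roughly 0.17, and even a medium effect falls short of the conventional 0.8 target, so a non-significant result from such a study says little about whether the inhalers really differ.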
Statistical power calculations can be undertaken after a study
has been completed, to assess the likelihood of a study
discovering effects. More importantly, such calculations need
to be undertaken prior to a study to avoid the wasteful
consequences both of under-powering and of over-powering (in
which sample sizes are excessively large, leading to very high
power at the expense of higher than necessary study costs).
Power is a function of three variables: sample size, the chosen
level of statistical significance (alpha) and effect size. While
calculation of power entails recourse to tables of values for
these variables, the calculation is relatively straightforward in
most cases.
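As an alternative to looking values up in tables, the relationship between the three variables can be sketched with a standard normal approximation for comparing two group means. This is a rough illustration rather than a substitute for published tables or a statistician's advice. The function name is invented, the effect size is a standardised difference between means, and the figures it gives need not agree with any particular table in this chapter.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means.
    Uses n = 2 * (z_alpha + z_power)^2 / effect_size^2, rounded up."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)   # critical value for the significance level
    z_power = norm.inv_cdf(power)           # value corresponding to the target power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

medium = n_per_group(0.5)                   # alpha 0.05, power 0.80
small = n_per_group(0.2)
higher_power = n_per_group(0.5, power=0.90)
```

The required sample grows sharply as the effect to be detected shrinks: halving the effect size roughly quadruples the number needed per group. Asking for more power or a stricter alpha also increases it.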
3. One can make a decision about the smallest size of effect
which it is worth identifying. To consider the earlier
example of two rival inhalers, if we are willing to accept the
two inhalers as equivalent if there is no more than a ten per
cent difference in their efficacy of treatment, then this effect
size may be set, acknowledging that smaller effects will not
be discernible.
4. As a last resort, one can use a ‘guesstimate’ as to whether an
ES is ‘small’, ‘medium’, or ‘large’.
These definitions and values for ‘small’, ‘medium’ and ‘large’
effects are conventions, as described by Cohen (1970). A
‘medium’ effect is defined as one which is ‘visible to the naked
eye’, in other words one which could be discerned from everyday
experience without recourse to formal measurement.
For example, the difference between male and female adult
heights in the UK would be counted as a medium ES. Most
effects encountered in biomedical and social research should be
assumed to be small, unless there is a good reason to claim a
medium effect, while a ‘large’ effect size would probably need
to be defined as one which is so large that it hardly seems
necessary to undertake research into something so well
established.
Cohen offers the example of the difference between the heights
of 13 and 18 year old girls as a ‘large’ effect.
Power calculations may be used as part of the critical appraisal
of research papers. Unfortunately it is rare to see beta values
quoted for tests in research reports, and indeed often the results
reported are inadequate to calculate effect sizes. Studies have
evaluated various scientific subjects, including nursing,
education, management and medicine.
Table 4.2: Necessary sample sizes for statistical tests where alpha (p) = 0.05
Test                    Degrees of Freedom   ES = Small      ES = Medium
T-test                                       300 per group   50 per group
F-test                  2                    322 per group   64 per group
                        3                    274 per group   45 per group
                        4                    240 per group   49 per group
Chi squared             1                    785 total       87 total
                        2                    964 total       107 total
                        3                    1090 total      121 total
Pearson’s correlation                        618             68
Reflective Exercise 4.2: Statistical power in health research
Please read the article in the associated reading by Fox and
Mathers entitled ‘Empowering your research: statistical power
in general practice research’
Now answer these questions:
2. What proportion of papers had too high a power, and why is this an issue?
4. At what point in your PhD studies should you consult a statistician to ensure your
study is neither under- nor over-powered?
Summary
Key points to remember when deciding on sample selection
are:
1. Always try to use a random method where possible and
remember that random does not mean haphazard.
2. Random selection means that everybody in your sampling
frame has an equal opportunity of being included in your
study.
3. If you need to be able to generalise about small or minority
groups and to compare those with larger groups, consider
using disproportionate stratified sampling, but remember to
re-weight the results afterwards if you wish to generalise
from the whole sample.
Recommended Reading
Learning Review Form: Chapter 4
Please complete this form when you have completed the chapter. If you do not consider you
have achieved the learning outcomes, you need to go back and do more work on the chapter
or read the recommended reading.
When you have completed the form, save it to hand in with your log book