Stratified Random Sampling
Let’s also assume that we want to sample 200 teachers. Since 50% of those
teachers need to be elementary teachers, we need 100 elementary teachers
in our sample (200 × 0.50). To achieve this, we obtain a list of all of the
elementary teachers in the system. From that list we randomly select 100.
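As a rough illustration of that step, the sketch below computes the number of elementary teachers needed and draws them at random. The roster of 1,000 teacher IDs, the variable names, and the fixed seed are assumptions made for the example and do not come from the text.

```python
import random

# Hypothetical roster: 1,000 elementary teachers identified by ID.
elementary_teachers = [f"elem-{i:04d}" for i in range(1000)]

total_sample_size = 200
elementary_share = 0.50  # 50% of the sample must be elementary teachers

# Number of elementary teachers needed: 200 x 0.50 = 100.
n_elementary = round(total_sample_size * elementary_share)

# Fixed seed only so the sketch is repeatable; drop it for a real draw.
rng = random.Random(42)

# random.sample draws without replacement, so nobody is picked twice.
elementary_sample = rng.sample(elementary_teachers, n_elementary)
```

The same two steps (work out the stratum's share, then draw that many names from the stratum's list) would be repeated for every other group of teachers.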
Importantly, strata used in this technique must not overlap, because if they
did, some individuals would have a higher chance of being selected than
others. This would create a skewed sample that would bias the research and
render the results invalid.
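One way to guard against overlapping strata is to check the lists before sampling. The sketch below is a minimal example of such a check; the helper name `find_overlaps` and the sample names are hypothetical.

```python
from collections import defaultdict

def find_overlaps(strata):
    """Return {individual: [stratum names]} for anyone listed in more than one stratum."""
    membership = defaultdict(list)
    for name, members in strata.items():
        for person in set(members):  # set() ignores repeats inside one stratum
            membership[person].append(name)
    return {person: names for person, names in membership.items() if len(names) > 1}

# Hypothetical strata in which "cara" was listed twice by mistake.
strata = {
    "elementary": ["ana", "ben", "cara"],
    "secondary": ["dev", "eli", "cara"],
}

overlaps = find_overlaps(strata)
print(overlaps)  # {'cara': ['elementary', 'secondary']}
```

An empty result means every individual belongs to exactly one of the listed strata.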
Some of the most common strata used in stratified random sampling include
age, gender, religion, race, educational attainment, socioeconomic status, and
nationality.
For example, let’s say you have four strata with population sizes of 200, 400,
600, and 800. If you choose a sampling fraction of ½, this means you must
randomly sample 100, 200, 300, and 400 subjects from each stratum
respectively. The same sampling fraction is used for each stratum regardless
of the differences in population size of the strata.
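The arithmetic in this example can be sketched in a few lines. The function name `allocate` is an assumption for illustration; the stratum sizes and the fraction are the ones given above.

```python
from fractions import Fraction

def allocate(stratum_sizes, sampling_fraction):
    """Apply one sampling fraction to every stratum and return the sample sizes."""
    return [round(size * sampling_fraction) for size in stratum_sizes]

stratum_sizes = [200, 400, 600, 800]
sample_sizes = allocate(stratum_sizes, Fraction(1, 2))
print(sample_sizes)  # [100, 200, 300, 400]

# With a uniform fraction, each stratum's share of the sample
# equals its share of the population.
population_shares = [Fraction(size, sum(stratum_sizes)) for size in stratum_sizes]
sample_shares = [Fraction(size, sum(sample_sizes)) for size in sample_sizes]
print(population_shares == sample_shares)  # True
```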
A stratified sample can also be smaller than a simple random sample,
which can save a lot of time, money, and effort for the researchers. This is
because this type of sampling technique has higher statistical precision
than simple random sampling.
When it comes to statistical surveys and getting the data you need, there’s no shortage of
sampling techniques you can use.
Simple sampling, systematic sampling, quota sampling, cluster sampling — there are numerous
methods for designing a sample to represent your population of interest.
Of course, each varies in accuracy, reliability, and efficiency. No two methods are the same and
some are more complicated than others.
In this article, we’re going to focus on one in particular: stratified random sampling. We’re going
to highlight what it is, how you can use it to your advantage, and several best-practice tips to
help you get going.
Each stratum (the singular for strata) is formed based on shared attributes or characteristics —
such as level of education, income and/or gender. Random samples are then selected from each
stratum and can be compared against each other to reach specific conclusions.
For example, a researcher might want to know the correlation between income and education.
They could use stratified random sampling to divide the population into strata and take a random
sample from each stratum.
Stratified random sampling is typically used by researchers when trying to evaluate data from
different subgroups or strata. It allows them to quickly obtain a sample population that best
represents the entire population being studied.
Stratified random sampling is one of four probability sampling techniques: Simple random
sampling, systematic sampling, stratified sampling, and cluster sampling.
Of course, your choice of sampling technique will depend on your goals, budget, and desired
level of accuracy. With this in mind, make sure to clearly outline what it is you want to achieve
and try out different methods to see which work best for your research.
But for now, where do you start with stratified random sampling?
If we’re investigating wage differences between genders, we can stratify a larger population into
different genders (e.g. female and male) or pay grades (e.g. under $50k, $50-100k, $100-250k,
over $250k).
If we choose to stratify by gender and randomly select a sample across each of the gender
groups, then these samples can be compared using pay grades to explore wage gaps.
So in this example, the total population is 15. When gender is applied to the population, we
can see there are more men (9) than women (6). This gives us a sample ratio of 3:2, or a sample
fraction of ⅗ men to ⅖ women.
If we want a sample size of 5 (one-third of the total population), we must randomly select
participants in proportion to the size of each stratum. The number of participants selected must
reflect the sample ratio.
As a result, the final sample will have 5 randomly selected participants, which will be split by
gender (made up of 2 women and 3 men).
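A minimal sketch of this draw is shown below, assuming each participant is simply a label in a list. The participant labels, variable names, and seed are made up for the example.

```python
import random

# Hypothetical population of 15 people, already split into strata by gender.
population = {
    "men": [f"man-{i}" for i in range(1, 10)],     # 9 men
    "women": [f"woman-{i}" for i in range(1, 7)],  # 6 women
}
sample_size = 5
total = sum(len(members) for members in population.values())  # 15

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
sample = {}
for stratum, members in population.items():
    # Proportional share of the sample: 5 * 9 / 15 = 3 men, 5 * 6 / 15 = 2 women.
    n_stratum = sample_size * len(members) // total
    sample[stratum] = rng.sample(members, n_stratum)

print({stratum: len(chosen) for stratum, chosen in sample.items()})
# {'men': 3, 'women': 2}
```

The division is exact here. With other numbers the shares would need a rounding rule so that the stratum samples still add up to the intended total.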
There are three forms of cluster sampling: one-stage, two-stage and multi-stage.
One-stage cluster sampling first creates groups, or clusters, from the population of participants
that represent the total population. These groups are based on comparable groupings that exist –
e.g. zip codes, schools or cities.
The clusters are randomly selected, and then every participant within the selected clusters is
sampled. There can be many clusters and these are mutually exclusive, so participants don't
overlap between the groups.
Two-stage cluster sampling first randomly selects the cluster, then the participants are randomly
selected from within that cluster.
Multi-stage cluster sampling is a more complex process which involves dividing the population
into groups before one or more clusters are chosen at random and sampled.
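The difference between the one-stage and two-stage forms can be sketched as follows. This assumes the usual convention that one-stage sampling surveys everyone in the chosen clusters, while two-stage sampling draws a further random sample inside each chosen cluster. The school names, student labels, and function names are hypothetical.

```python
import random

# Hypothetical clusters: four schools with four students each.
clusters = {
    "school_a": ["a1", "a2", "a3", "a4"],
    "school_b": ["b1", "b2", "b3", "b4"],
    "school_c": ["c1", "c2", "c3", "c4"],
    "school_d": ["d1", "d2", "d3", "d4"],
}

def one_stage(clusters, n_clusters, rng):
    """Pick clusters at random and keep every participant in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

def two_stage(clusters, n_clusters, n_per_cluster, rng):
    """Pick clusters at random, then pick participants at random within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [
        person
        for name in chosen
        for person in rng.sample(clusters[name], n_per_cluster)
    ]

rng = random.Random(7)  # fixed seed only so the sketch is repeatable
one_stage_sample = one_stage(clusters, n_clusters=2, rng=rng)
two_stage_sample = two_stage(clusters, n_clusters=2, n_per_cluster=2, rng=rng)
print(len(one_stage_sample), len(two_stage_sample))  # 8 4
```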
The main difference between stratified sampling and cluster sampling is that with cluster
sampling, there are natural groups separating your population. In cluster sampling, the sampling
unit is the whole cluster. Instead of sampling individuals from each group, a researcher will
study whole clusters.
In stratified random sampling, however, a sample is drawn from each stratum (using a random
sampling method like simple random sampling or systematic sampling). Elements of each of the
samples will be distinct, giving the entire population an equal opportunity to be part of these
samples. Typically, natural groups do not exist, so you divide your target population into groups
(strata).
Generally, cluster sampling is much more affordable and “efficient”, whereas stratified random
sampling is more precise.
Simple random sampling selects a smaller group (the sample) from a larger group of the total
number of participants (the population). It’s one of the simplest probability sampling methods
used to gain a random sample. Simple random sampling relies on using a selection method that
provides each participant with an equal chance of being selected. And, since the selection
process is based on probability and a random selection, the smaller sample is more likely to be
representative of the total population and free from researcher bias. This method is also called a
method of chance.
Simple random sampling involves randomly selecting data from the entire population so each
possible sample is equally likely to occur. There are no constraints with this method and
therefore no selection bias on the part of the researcher.
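To make this concrete, the sketch below draws a simple random sample of 5 from a population of 15 and counts how many different samples of that size exist. The population labels and seed are assumptions for the example.

```python
import math
import random

# Hypothetical population of 15 people.
population = [f"person-{i}" for i in range(1, 16)]
sample_size = 5

# Number of distinct samples of 5 that can be drawn from 15 people.
n_possible_samples = math.comb(len(population), sample_size)
print(n_possible_samples)  # 3003

# random.sample draws without replacement from the whole population,
# so each of those 3003 samples has the same chance of being chosen.
rng = random.Random(1)  # fixed seed only so the sketch is repeatable
simple_random_sample = rng.sample(population, sample_size)
```

Because nothing constrains the draw, the make-up of the sample (for example, how many men and women it contains) is left to chance.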
Stratified random sampling, on the other hand, divides the population into smaller groups (strata)
based on shared characteristics. A random sample is then taken from each (in direct proportion to
the size of the stratum compared to the population) and combined to create a random sample.
With stratified random sampling, you will end up with a sample that is proportionally
representative of the population based on the strata used.
In most cases, this will work well. However, you may need to vary the proportions manually if
you’re aware of additional information that could skew the results.
For instance, using our wage example from above, the sample has 5 randomly selected
participants, which will be split by gender (made up of 2 women and 3 men). If you’re aware that
the wage gap range is larger across men, then this sample may miss key information as you don’t
have enough male data to support the reality.
Either adjust the sample ratio to include more men, e.g. from 3 men and 2 women to 4 men and
1 woman.
Or, increase the sample size to include more of the population, to better reflect the wage range in
the male proportion of the sample – e.g. increasing the sample size from 5 to 10.
If you have 4 strata with respective sizes of 500, 1000, 1500, and 2000, and the
research organization selects ½ as the sampling fraction, a researcher then has
to select 250, 500, 750, and 1000 members from the respective strata.
Stratum                        A      B      C      D
Population Size                500    1000   1500   2000
Sampling Fraction              1/2    1/2    1/2    1/2
Final Sampling Size Results    250    500    750    1000
Irrespective of the population size of each stratum, the sampling fraction will
remain uniform across all the strata.
Alternatively, the sampling fraction can vary from one stratum to the next:
Stratum                        A      B      C      D
Population Size                500    1000   1500   2000
Sampling Fraction              1/2    1/3    1/4    1/5
Final Sampling Size Results    250    333    375    400
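The sketch below reproduces these sample sizes and compares each stratum's share of the population with its share of the sample, which shows where strata end up over- or underrepresented. Truncating fractional results with `int()` is an assumption chosen to match the 333 in the table.

```python
from fractions import Fraction

stratum_sizes = {"A": 500, "B": 1000, "C": 1500, "D": 2000}
sampling_fractions = {
    "A": Fraction(1, 2),
    "B": Fraction(1, 3),
    "C": Fraction(1, 4),
    "D": Fraction(1, 5),
}

# int() truncates, so 1000 * 1/3 = 333.33... becomes 333.
sample_sizes = {
    name: int(size * sampling_fractions[name])
    for name, size in stratum_sizes.items()
}
print(sample_sizes)  # {'A': 250, 'B': 333, 'C': 375, 'D': 400}

population_total = sum(stratum_sizes.values())  # 5000
sample_total = sum(sample_sizes.values())       # 1358

for name in stratum_sizes:
    population_share = stratum_sizes[name] / population_total
    sample_share = sample_sizes[name] / sample_total
    print(f"{name}: {population_share:.1%} of population, {sample_share:.1%} of sample")
```

Stratum A makes up 10% of the population but a larger share of the sample, while stratum D makes up 40% of the population but a smaller share of the sample.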
The success of this sampling method depends on the researcher’s precision
at fraction allocation. If the allotted fractions aren’t accurate, the results may
be biased due to the overrepresented or underrepresented strata.
Let’s say 100 (n) students from a school of 1000 (N) students were asked
questions about their favorite subject. Students in the 8th grade are likely to
have different subject preferences than students in the 9th grade. For the
survey to deliver precise results, the ideal approach is to treat each grade as
a stratum.
Here’s a table of the number of students in each grade:
Grade    Number of students (Nh)
5        150
6        250
7        300
8        200
9        100
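One way to split the 100-student sample across these grades is proportional allocation, where each grade's share of the sample matches its share of the 1000 students. The sketch below works through that arithmetic. The variable names are assumptions, and proportional allocation is one common choice, not the only one.

```python
# Number of students in each grade, taken from the table above.
grade_sizes = {5: 150, 6: 250, 7: 300, 8: 200, 9: 100}

sample_size = 100
population_size = sum(grade_sizes.values())  # 1000

# Students to sample per grade = sample size * grade size / population size.
allocation = {
    grade: sample_size * size // population_size
    for grade, size in grade_sizes.items()
}
print(allocation)  # {5: 15, 6: 25, 7: 30, 8: 20, 9: 10}
print(sum(allocation.values()))  # 100
```

The divisions are exact for these numbers. With other grade sizes the results would need rounding, and the rounded values should be checked so they still add up to the intended sample size.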