Unit 1 Tutorials Key Principles of Statistical Methods
Statistical Methods
INSIDE UNIT 1
Statistics Fundamentals
Statistics Overview
Data
Qualitative and Quantitative Data
Discrete vs. Continuous Data
Sampling
Sampling
Random & Probability Sampling
Simple Random and Systematic Random Sampling
Stratified Random and Cluster Sampling
Multi-Stage Sampling
Experiments
Data
Variables
Question Types
Accuracy and Precision in Measurements
Absolute Change and Relative Change
Using Percentages in Statistics
Index Number and Reference Value
© 2023 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 1
Evaluating Studies
Bias
Nonresponse and Response Bias
Selection and Deliberate Bias
Convenience & Self-Selected Samples
Random and Systematic Errors
Margin of Error
Statistics Overview
by Sophia
WHAT'S COVERED
This lesson will provide you with an overview of what statistics really is by exploring:
1. Statistics
2. Types of Statistics
1. Statistics
You might be wondering, what is statistics? Is it some complicated formula? Is it some goofy graph that you
really don't know that much about?
When people refer to statistics, they're usually referring to information called data that's been collected and
synthesized within a statistical study, and sometimes presented in a graphical form, like this.
While the image may be small and difficult to read, you get the idea that a LOT of information can be
presented in the form of a graph.
It can also be presented numerically such as "The median household income in the United States is $46,326."
Video Transcription
[MUSIC PLAYING] The practice of statistics deals with these four concepts here. Collect, analyze,
interpret, and present. You begin by collecting information from a variety of sources. You then proceed
to analyze that information that you've collected. After that, you interpret what that analysis means and
then you present it in a way that anyone can understand. And in this course you're going to learn how to
do all those things, and if I may try to be honest-- though as a robot, I can't fully experience the feeling of
honesty-- I do understand statistics quite well.
And I must say it's a really neat way to describe our messy world. It's not pretty all the time, but statistics
allow us a way to simplify things.
[MUSIC PLAYING]
STEP BY STEP
Statistics is a neat way to describe a messy world. It's not pretty all the time. But statistics allows us a way to
simplify things down.
TERM TO KNOW
Statistical study
A way to collect information from individuals
2. Types of Statistics
When you use descriptive statistics, you are going to analyze what's going on at a particular point and use
statistics to describe the information that you've obtained.
On the other hand, when you use inferential statistics, you are going to use statistics that you've obtained and
make a generalization about the population at large.
IN CONTEXT
Let's say that you read the newspaper this morning and discovered that the average household
income in the United States was reported to be $46,700.
This information didn't come from surveying every household in the United States. It wouldn't be
realistic or feasible to knock on all the doors and speak to all those people. But someone arrived at
this number. So, how did they get it?
Well, a sample was taken, and a generalization was made about the entire United States based on
that sample.
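The idea of generalizing from a sample can be sketched in a few lines of Python. Everything here is hypothetical: the simulated incomes, the population size of 100,000, and the sample size of 1,000 are made-up values for illustration, not real survey data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 simulated household incomes (made-up data).
population = [random.lognormvariate(10.7, 0.6) for _ in range(100_000)]

# Inferential step: look at only 1,000 households chosen at random...
sample = random.sample(population, 1_000)

# ...and use the sample mean as an estimate of the population mean.
sample_mean = statistics.mean(sample)
population_mean = statistics.mean(population)

print(f"Sample mean:     {sample_mean:,.0f}")
print(f"Population mean: {population_mean:,.0f}")
```

The two printed values should land close together, even though the sample covers only 1% of the simulated households. In a real study you would not know the population mean at all, which is why the sample has to stand in for it.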
TERMS TO KNOW
Descriptive statistics
Using only the information at hand to describe the selected group of individuals.
Inferential statistics
Using the information at hand to make a larger, more general statement about the entire population of
individuals.
SUMMARY
Statistics allows us to synthesize the information we get from the world around us. There are two
types of statistics. Descriptive statistics describe information gathered at a particular point. Inferential
statistics use gathered information to make a generalization or prediction about the population.
Good luck!
Source: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS. bar chart, CC,
https://en.wikipedia.org/wiki/Chart#/media/File:Black_cherry_tree_histogram.svg no modifications made
TERMS TO KNOW
Descriptive statistics
Using only the information at hand to describe the selected group of individuals
Inferential statistics
Using the information at hand to make a larger, more general statement about the entire population of
individuals
Statistical analysis
All the ways of collecting, analyzing, and interpreting the data
Statistical study
A way to collect information from individuals
Statistics
The study of collecting, analyzing, interpreting, and presenting information
Data
by Sophia
WHAT'S COVERED
This lesson will introduce the collection and evaluation of data including:
1. Defining Data
2. Evaluating Types of Data
3. Gathering Data
1. Defining Data
Data are the pieces of information that we use to answer a statistical question. A piece of data could be a
number or an attribute.
Ultimately, every piece of data helps us get a more accurate picture of a scenario, which raises the question:
how do you obtain data? Where does it come from? Do you just make it up?
TERM TO KNOW
Data
Information used in a study to answer a statistical question.
2. Evaluating Types of Data
Now, who collects data? Well, a lot of places collect data, such as:
Government organizations
Polling organizations
News sources
Private entities
The vast majority of sources are trustworthy. However, when using available data, it's important to think
critically about what the information is trying to convey. It’s essential to break apart the information and ask
yourself these questions:
Who collected it?
Are they reputable?
Are they trustworthy?
When was it collected?
How was it collected?
Why did they collect it?
So, how do you know when you need to gather the information yourself? Information that you gather yourself is
called raw data. Obviously, if the population in the available data doesn’t match your topic of interest, then
that data is of no value to you, so you need to gather it yourself.
But what about less obvious characteristics such as whether or not a source has an agenda? This is a key
point here. Having an agenda, whether intentional or not, can introduce what's called bias.
Often, polling organizations, news organizations, and government entities try to do the best job they can to
get relevant information. Bias is not usually introduced intentionally. But sometimes it is, when a source is
trying to push some kind of agenda.
TERMS TO KNOW
Available Data
Data collected by some other entity - a government organization or private company.
Raw Data
Unorganized, unprocessed, and not summarized. Typically, this is data that is not already available.
Bias
The systematic favoring of certain outcomes in a study. There are many ways to introduce bias into a
study.
3. Gathering Data
If you choose to collect your own data, you must think critically and ask yourself these questions:
Collecting data is important because it's the source of statistics. Think about data as the raw material for
creating something useful. If you collect your data well, the statistics are going to be accurate. If you
collect your data poorly, then your statistics will be poor too. There's no rescuing that.
⭐ BIG IDEA
You can't make useful statistics out of poor data. Thinking critically will help you determine which type of
data should be used for your purposes.
SUMMARY
This tutorial defined data as “information used in a study to answer a statistical question.” We
discussed how to evaluate types of data, available or raw, and questions focusing on the who, what,
why, and how should be posed to help identify bias. When gathering your own data, it’s important to
understand your audience and consider how they will gain access to all your hard work.
Good luck!
TERMS TO KNOW
Available Data
Data collected by some other entity - a government organization or private company.
Bias
The systematic favoring of certain outcomes in a study. There are many ways to introduce bias into a
study
Data
Information used in a study to answer a statistical question
Raw Data
Unorganized, unprocessed, and not summarized. Typically, this is data that is not already available
Qualitative and Quantitative Data
by Sophia
WHAT'S COVERED
In this tutorial, you're going to learn about the difference between qualitative data and quantitative
data by examining:
1. Qualitative Data
a. Nominal Measurements
b. Ordinal Measurements
2. Quantitative Data
3. Qualitative and Quantitative Data in Practice
1. Qualitative Data
Qualitative data is also often called “categorical data”. It is not numerical in the sense that we can do
numerical operations with it, like adding numbers together or finding an average. Rather, each value fits into
a category.
EXAMPLE Gender: male and female. That's a qualitative variable with two categories.
Letter grades and zip codes are qualitative, too. Zip codes feature numbers, but you wouldn’t do arithmetic with
them. You wouldn’t find an average zip code, for instance. The purpose of zip codes is to divide areas into
categories. Hair color is another example of qualitative data because you can put those with black hair in one
group and those with blonde hair in another group.
It's important to know that qualitative data can be divided further into two categories:
Nominal Measurements
Ordinal Measurements
TERM TO KNOW
Qualitative/Categorical Data
Data whose values are the names of categories. These can be numbers, but not the kinds of numbers
with which it makes sense to do any numerical operations.
With nominal data, it only makes sense to reference which category has the largest frequency. In this case,
let’s say most people say that green is their favorite color. That is what you would report and it doesn’t matter
that green is the 4th box from the left.
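TERM TO KNOW
Nominal Data
Categorical data with qualities that cannot be ordered or ranked.
[Figure: Pain Scale with seven points, labeled from "No Pain" through "Moderate Pain" to "Worst Pain"]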
With ordinal data, it’s important to keep the categories in order, because they express a spectrum ranging
from lowest to highest, or worst to best, as ratings do.
TERM TO KNOW
Ordinal Data
Categorical data with qualities that can be ordered or ranked.
2. Quantitative Data
On the other hand, you have quantitative data. Quantitative data are expressed numerically. It makes sense
to do numerical operations with it, like finding averages or adding them together.
Weight
Commute time to work
Outdoor temperature
All of these are measured in numbers. It makes sense to find, for instance, averages of these. So you can do
numerical operations with them.
It's important to note that qualitative data is displayed differently than quantitative data. The statistical
operations we can use also depend on the type of data that we have.
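As a rough illustration of how the type of data decides which operation makes sense, here is a small Python sketch. The favorite-color and commute-time values are hypothetical, invented only for this example.

```python
from collections import Counter
import statistics

# Hypothetical survey responses (made-up values for illustration).
favorite_colors = ["green", "blue", "green", "red", "green", "blue"]  # qualitative
commute_minutes = [30.0, 32.5, 28.0, 41.0, 35.5]                      # quantitative

# Qualitative data: report which category has the largest frequency.
color_counts = Counter(favorite_colors)
most_common_color, frequency = color_counts.most_common(1)[0]

# Quantitative data: numerical operations such as an average make sense.
average_commute = statistics.mean(commute_minutes)

print(most_common_color, frequency)   # green 3
print(average_commute)                # 33.4
```

Notice that asking for the average of the colors would be meaningless, while asking for the most frequent commute time is possible but usually less informative than the average.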
TERM TO KNOW
Quantitative Data
Data whose values are numbers with which it makes sense to do numerical operations.
Video Transcription
[MUSIC PLAYING] Here we have some examples to help you understand the differences between
qualitative and quantitative data. So first, we have blood type. That's going to be an example of
qualitative data. It's a description. It's telling you something about yourself, but it's not something that can
be added, or subtracted, or used for arithmetic even.
On the other hand, number of kids is quantitative data. Or how about a phone number? So even though
it's a number, it's still qualitative data because, really, who would ever add or subtract those values? So
what's an example of quantitative data? How about something like income? Income is quantitative data
because, again, it's a value that's giving us a quantity. It's telling us how much money you make, and
that's a value you could add, subtract, and do the mean and other measures of arithmetic.
[MUSIC PLAYING]
SUMMARY
Data used in statistics falls under one of two broad classifications: categorical, which is also called
“qualitative,” or numerical, which is also called “quantitative.”
Qualitative data branches out even further into either nominal, which means that the names are
important, or ordinal, which means the order is important.
For data to count as quantitative, it must make sense to do numerical operations with the values. The two
types of data are treated differently when organizing graphical displays and applying statistics to them.
Good luck!
TERMS TO KNOW
Nominal Data
Categorical data with qualities that cannot be ordered or ranked.
Ordinal Data
Categorical data with qualities that can be ordered or ranked.
Discrete vs. Continuous Data
by Sophia
WHAT'S COVERED
This tutorial will contrast two types of quantitative data:
1. Discrete Data
2. Continuous Data
3. Discrete and Continuous Data in Practice
1. Discrete Data
Both discrete and continuous data are numerical, or quantitative, but discrete data can only take on certain
values within a range. An example of discrete data would be the number of pets that someone has. It can only
take whole number values. You can't have half of a pet.
Other examples are the number of rail cars on a train and shoe sizes. You can have half sizes, but that's all
you can have. You can't have quarter sizes, or eighth sizes, or 0.01 sizes. You can't say that you're a
size 9 and an eighth. So there are only certain values that shoe size can take. That makes it discrete.
TERM TO KNOW
Discrete Data
Data that can only take so many different values.
2. Continuous Data
Now the difference between discrete and continuous is continuous data can take any value within a range.
Some examples of data that are continuous are temperature, commute time, and weight. With all of these
examples, you can take on any value within a range. So for instance, suppose you're talking about daytime
temperature.
The daytime temperature could be something between 50 and 80 degrees on a summer's day, and it takes
on any value between those. Same with commute time. One day it might take you 30 minutes and five
seconds to get to work. The next day it might take you 32 minutes and 17 seconds.
And weight, one person might weigh 150.75 pounds, and one person might weigh 102.62 pounds. They can
take on any value within a spectrum. Discrete data, by contrast, can only take certain values within a
spectrum.
TERM TO KNOW
Continuous Data
Data that can take any value within an interval.
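One way to picture the contrast is with a short Python sketch. The shoe-size range of 5 to 13 and the helper name `is_valid_shoe_size` are assumptions made up for this example; the 50 to 80 degree range comes from the temperature example.

```python
import random

random.seed(3)

# Discrete: shoe sizes come only in whole and half sizes (assumed range 5 to 13).
allowed_shoe_sizes = [5 + 0.5 * step for step in range(17)]  # 5.0, 5.5, ..., 13.0

def is_valid_shoe_size(size):
    """Return True only if the size is one of the allowed discrete values."""
    return size in allowed_shoe_sizes

print(is_valid_shoe_size(9.5))    # True
print(is_valid_shoe_size(9.125))  # False: "9 and an eighth" is not a shoe size

# Continuous: a temperature can be any value in the interval from 50 to 80.
temperature = random.uniform(50, 80)
print(50 <= temperature <= 80)    # True
```

The discrete values can be written out as a list, while the continuous value can only be described by the interval it falls in.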
Video Transcription
[MUSIC PLAYING] Now, let's take a look at a few examples and determine if a situation is discrete, or
continuous. The time it takes to complete a race-- is this discrete or continuous? The time to complete a
race or any task is continuous data. Time can take on any value. You can measure the time it takes to
finish a race in hours, minutes, seconds, even fractions of a second.
The number of pairs of shoes you own-- discrete or continuous? This is discrete. You can't have half a
pair. OK, I suppose if you lose a shoe, you can have half a pair. But then, it's no longer a pair. Am I right?
Whatever the case, your number of pairs of shoes is not any number within a certain range. Your number
of shoes is a specific whole number, which is therefore a discrete number.
The time it takes for a light bulb to burn out-- is this a discrete or continuous number? This would be
continuous data. It could take any length of time for your light bulb to burn out, from 0 seconds up to
many years. How about the number of green chocolate candies in a bag? Is that discrete or continuous?
If you said discrete, you're correct. You typically would be dealing with only whole number values,
unless the poor bag of candy is crushed.
Barometric pressure-- is this discrete or continuous? You should have said that barometric pressure is
continuous because it can take any value within a certain range, usually somewhere around 30 inches
of mercury (inHg).
TRY IT
Question: What about the time for a light bulb to burn out?
Answer: That's continuous. It could take any length of time from zero seconds all the way to a couple of
years.
Question: Number of green M&Ms in a bag?
Answer: Discrete. Typically, again, we're dealing only with whole number values.
SUMMARY
Quantitative data can be broken down into two subcategories. If it can take on any value within a range, we
call it continuous; if it can only take certain values, we call it discrete. Every quantitative
data measurement that we get is going to be either continuous or discrete. This tutorial also put discrete
and continuous data into practice to allow for some application!
Good Luck!
TERMS TO KNOW
Continuous Data
Data that can take any value within an interval.
Discrete Data
Data that can only take so many different values.
Sampling
by Sophia
WHAT'S COVERED
In this tutorial, you're going to learn all about sampling, focusing on:
Typically, the population that we wish to generalize our findings to is something like the population of the
United States, the population of the world, or the population of a state. Examining all members of a population
is called a census, and for groups that large it may not be feasible. Hopefully, a smaller group of people can
represent the population.
Since the population of the United States is too big to use as an example, a smaller example of
billiard balls will be demonstrated. As you see in the image below, the complete set of things in this particular
example is the 15 billiard balls on a pool table.
With a group so small, it's possible to take all of them and define some attribute of them like color, or weight,
or what have you--whether they're striped or solid, there are lots of different ways that you could describe
each pool ball. And it's easy enough just to take the entire population and examine all of them.
TERMS TO KNOW
Census
Using the entire population to obtain data.
Population
The entire set of individuals from which to sample.
2. Sample
When you think about the United States example, you can see that a census is not always feasible. Suppose
your population is a large group of people, much larger than 15 people. It's a big group, and it might be
hard to get answers from everybody.
What you might choose to do is take a small subset of those individuals and make a sample. In this case,
perhaps seven of these many individuals in the population were chosen. A sample is a subset of the
population and you would obtain data from that subset and leave everyone else out.
From that sample, you would obtain your data and calculate your statistics. Ideally, the sample is a small
version of the population, a microcosm of it, such that the statistics you calculate from the sample data
are about the same as what you would have gotten if you had measured the population directly. That's what
we mean when we say that we want the sample to be a representative sample of the population.
There are certain ways to make it likely that a sample will be representative. One way is to take the
entire population and put them in a hat.
Now again, this is a lot easier with billiard balls than it is with people. But imagine putting all the billiard balls
into the hat.
Let’s say you shake up the hat, and take out a sample of five.
There are also certain ways to guarantee that you won't get a representative sample. Suppose you specifically
cherry-picked only solid-colored billiard balls. Well, that wouldn't be very representative of the population of
15.
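The contrast between the hat and cherry-picking can be sketched in Python. This is only an illustration: it assumes balls 1 through 8 are solid and balls 9 through 15 are striped, as on a standard pool table, and the helper `share_striped` is a made-up name.

```python
import random

random.seed(7)

# Assumption: balls 1-8 are solid and balls 9-15 are striped.
balls = list(range(1, 16))

def share_striped(selection):
    """Fraction of the selected balls that are striped (numbered 9 or higher)."""
    return sum(1 for ball in selection if ball >= 9) / len(selection)

# "Hat" method: every ball has the same chance of ending up in the sample.
random_sample = random.sample(balls, 5)

# Cherry-picked sample: only solid balls, chosen on purpose.
cherry_picked = [1, 2, 3, 4, 5]

print("Population:   ", round(share_striped(balls), 2))   # 0.47
print("Random sample:", share_striped(random_sample))
print("Cherry-picked:", share_striped(cherry_picked))     # 0.0
```

The cherry-picked sample always reports zero striped balls, no matter what the population looks like. The random sample can miss too, but it has no built-in preference for one kind of ball.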
⚙ THINK ABOUT IT
Is it possible that when you take that hat and pull out five billiard balls that all five of them are solid? Sure,
that's possible, it's just not all that likely. Cherry-picking is not a good idea because the sample you get
is specifically not representative.
TERMS TO KNOW
Sample/Sampling
A subset of the population. There are many ways to select a sample.
Representative Sample
A sample that accurately reflects the population.
SUMMARY
A census is a way of collecting data that uses everybody. And a sample only uses some. To
generalize the findings from the sample to the population at large, it has to be representative of your
population at large. Once again, the terms that we've described in this tutorial are population, census,
the noun sample, and the verb sampling, and the idea that a sample should be representative.
Good luck!
TERMS TO KNOW
Census
Using the entire population to obtain data
Population
The entire set of individuals from which to sample
Representative Sample
A sample that accurately reflects the population
Sample/Sampling
A subset of the population. There are many ways to select a sample.
Random & Probability Sampling
by Sophia
WHAT'S COVERED
This tutorial covers random and probability sampling methods, focusing on:
1. Random Sample
2. Probability
1. Random Sample
The term “random” is used a lot in everyday speech, but what does it mean when it comes to statistics? In
statistics, random refers to something that is unpredictable and does not have a recognizable pattern.
With a random sample, every member of the population has the same chance of getting selected. This is the
best way to get a representative sample. Recall that a representative sample is when the population and the
sample have the same set of relevant characteristics.
If you want a random sample, you would need to select participants in such a way that every member of that
population has an equal chance of being selected for the sample. This is also known as random selection.
You need to come up with a method to achieve a random sample, and you can do that with a probability
sampling plan. This plan must be made first before a random sample can be taken. You can also “weight”
certain people so that they might be more likely to be selected for the sample, too.
IN CONTEXT
What does a random sample look like in context? Suppose there are 15 billiard balls from a pool
table:
You place them all in a hat, and you shake the hat, and voila, here's a sample of five.
Shake #1
Suppose you place the billiard balls back in the hat and shake the hat for a second time.
Shake #2
This is another sample of five and is not that different from the previous sample. If you repeated
the same hat trick over and over again, every ball would have an equal chance of being pulled each time.
Shake #3
What happened here was we got balls 9, 11, 12, 13, and 14--all of which happened to be striped
billiard balls. No solids. If you only had access to this information, you might be led to believe that all
the balls in the hat were striped, which wouldn't be the case.
This may seem odd, but it can certainly happen even though you selected these randomly by following a
probability sampling plan. The reason is that this sample of five is just as likely as any other sample
of five to be chosen.
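To put rough numbers on this, you can count the possible samples. The sketch below assumes the hat holds 15 balls, 7 of which are striped (balls 9 through 15); that split is an assumption based on a standard set of pool balls.

```python
from math import comb

# Assumption: 15 balls in the hat, 7 of them striped (balls 9 through 15).
total_samples = comb(15, 5)        # number of different samples of five
all_striped_samples = comb(7, 5)   # samples made up of striped balls only

# Any one specific sample, such as {9, 11, 12, 13, 14}, is one of 3,003 equally likely samples.
p_specific_sample = 1 / total_samples

# An all-striped sample of some kind is rarer than a mixed one, but not impossible.
p_all_striped = all_striped_samples / total_samples

print(total_samples)               # 3003
print(all_striped_samples)         # 21
print(round(p_all_striped, 4))     # 0.007
```

So the exact sample from Shake #3 is exactly as likely as the exact sample from Shake #1. What makes it feel odd is that only 21 of the 3,003 possible samples are all striped.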
TERMS TO KNOW
Random Sample
A sample that has been selected in a manner where every member of the population has some
predetermined chance of being selected for the sample.
Random Selection
The method of obtaining a random sample.
SUMMARY
The best method for selecting a sample that's representative is a random sample taken with a probability
sampling plan. Now, this won't always get you a representative sample. But often, you will get one
when you take random samples.
Good luck!
TERMS TO KNOW
Random Sample
A sample that has been selected in a manner where every member of the population has some
predetermined chance of being selected for the sample
Random Selection
The method of obtaining a random sample
Simple Random and Systematic Random Sampling
by Sophia
WHAT'S COVERED
This lesson will explain how to ensure everyone in the population has an equal chance of
participating in a sample, specifically focusing on:
If you’ve ever experienced a raffle situation, you’ve experienced a simple random sample. What generally
happens at these events is that someone takes the raffle tickets and puts them into a bucket.
The tickets are mixed up in the bucket, and one ticket is pulled out. The owner of that ticket usually wins
some kind of fantastic prize. Now, being in a simple random sample is pretty much the same thing. The only
difference is that instead of winning the prize, you get to be part of the sample and that's your prize.
IN CONTEXT
Suppose you take billiard balls from a pool table and put those all into a hat.
Next, shake it up, and pour out five billiard balls. Do this for two shakes.
Shake #1 Shake #2
You may have noticed that the solid, yellow “1” ball was in both of these first two examples.
However, it doesn't mean it's any more likely to be selected than any of the other balls. It's the same
likelihood. Any two samples of five, such as the first sample and the second sample, are equally likely
samples of five.
Shake #3
Now, notice, all five of these were striped billiard balls, not one solid ball in the bunch. Is that
unusual? Sure, it's kind of unusual. But unusual samples have an equal likelihood of happening,
too. Just because they're strange and don't happen very often doesn't mean they can't happen. In
fact, this one has the same likelihood as any other selection of five.
Therefore, knowing how to take a Simple Random Sample, abbreviated SRS, is important because most
inferences that we make about the population assume that we collected data in this way. So names in a hat are
fine. In our case, raffle tickets in a bucket, or billiard balls in a hat...that's all fine.
TERM TO KNOW
However, what about the situations where we don’t have the manpower to pull numbers or names from a hat?
There are two other ways to take a simple random sample. One way is using a random number generator
and the other is a random number table. First, we are going to discuss the random number generator.
EXAMPLE Suppose that we want to take a sample of 100 individuals from a population of 2,000
people. Below you will see some of those individuals lined up, and you can imagine that individuals 10
through 1,995 are somewhere in the middle. Each is assigned a unique number so no one can have
the same number as anybody else.
Using technology such as a website, you can search "random number generator" on the internet, and
websites will come up. Or, you can use a calculator. This particular model of a calculator is the Texas
Instruments calculator:
“RandInt” stands for “random integer” (an integer is a whole number). Here it is set to run from 0 to 1, so it
picks either 0 or 1. The third number you put in tells it how many random integers you want. In this case, you
entered five. Now, you don't want numbers between 0 and 1 in this case, and you don't want five of them. You want
numbers between 0 and 2,000, and you want 100 of them. Now, why was 150 written when you only want
100 numbers?
You can’t select one person twice, so repeats must be ignored. It's incredibly likely that if you had just written
100 instead of 150, there would have been at least one repeat in the bunch.
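You can check the claim about repeats with a short calculation. The sketch below assumes each of the 100 numbers is drawn independently and that every integer from 0 to 2,000 is equally likely; the function name `probability_of_repeat` is made up for this example.

```python
def probability_of_repeat(draws, low=0, high=2000):
    """Chance that at least two of `draws` random integers in [low, high] match."""
    possible_values = high - low + 1
    p_all_different = 1.0
    for already_drawn in range(draws):
        # The next draw must avoid every value that has already come up.
        p_all_different *= (possible_values - already_drawn) / possible_values
    return 1 - p_all_different

print(round(probability_of_repeat(100), 2))  # roughly 0.92
```

Under those assumptions, about nine times out of ten a list of 100 random numbers contains at least one repeat, so drawing extra numbers is a sensible precaution.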
Finally, you're going to select the individuals that correspond to those first 100 different numbers that were
picked.
So, person number 8, and the person that corresponds to 1,119, and the person who corresponds to 1,996 are
a few that are chosen. Now, notice that the person corresponding to 8 was chosen again--you can see that it’s
listed twice in the list. You're not going to select that person twice because they've already been selected
once, so they are crossed out. This is the reason 150 numbers were created, so you have room to cross
repeats out.
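The same procedure can be sketched in Python instead of on a calculator. This is a minimal sketch, not the calculator's method: it assumes the individuals are numbered 1 through 2,000, so the range starts at 1 rather than 0, and the constant names are made up for this example.

```python
import random

random.seed(2000)

POPULATION_SIZE = 2000   # individuals are numbered 1 through 2,000
SAMPLE_SIZE = 100
NUMBERS_DRAWN = 150      # extra draws leave room to cross out repeats

# Draw 150 random integers; repeats are possible.
# Assumption: the range starts at 1 so every number matches a person.
draws = [random.randint(1, POPULATION_SIZE) for _ in range(NUMBERS_DRAWN)]

selected = []
for number in draws:
    if number not in selected:       # cross out anyone already chosen
        selected.append(number)
    if len(selected) == SAMPLE_SIZE:
        break                        # stop at the first 100 different numbers

print(len(selected), "individuals selected")
print(len(draws) - len(set(draws)), "repeats among the 150 draws")
```

If you are working in Python anyway, `random.sample(range(1, 2001), 100)` picks 100 different numbers in one step, with no repeats to cross out.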
TERM TO KNOW
The same method as the random number generator cannot be used, because the number 2,000 has four
digits, and the number 1 only has one digit. All of these must have the same number of digits, so instead of 1,
it's 0001. Instead of 2, it's 0002, and so forth, all the way up to 2,000. A table of random digits can be found in
a textbook or online. Four digits will be selected at a time because each individual has a four-digit number.
EXAMPLE Suppose the first four digits found were 1-9-2-2. That corresponds to someone in the
list. There is someone who is 1,922 so that individual will be selected for the sample. It’s circled in
green below since a person corresponds to that number. The next number found is 3-9-5-0. No one
on the list corresponds to the number 3,950, so it is ignored. The next number, 3-4-0-5, does not
correspond to an individual either, so that is ignored as well.
You'll notice that all numbers circled in red are numbers that are unassigned in our list. This is going to make
this a very cumbersome process. It will go for a while until 100 individuals are obtained. Will this work? It will
work, but it might take a very long time.
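To get a feel for how cumbersome this is, here is a rough Python simulation. It does not use a real printed table: the helper `next_four_digit_label` is a made-up stand-in that produces four random digits at a time, which is an assumption about how the table would be read.

```python
import random

random.seed(1922)

POPULATION_SIZE = 2000
SAMPLE_SIZE = 100

def next_four_digit_label():
    """Read four random digits, the way you would from a printed table."""
    return "".join(str(random.randint(0, 9)) for _ in range(4))

selected = []
groups_read = 0
while len(selected) < SAMPLE_SIZE:
    label = next_four_digit_label()          # e.g. "1922" or "3950"
    groups_read += 1
    number = int(label)
    # Keep the label only if it belongs to someone (0001-2000) and is not a repeat.
    if 1 <= number <= POPULATION_SIZE and number not in selected:
        selected.append(number)

print(len(selected), "individuals selected")
print(groups_read, "four-digit groups read from the table")
```

Only 2,000 of the 10,000 possible four-digit labels belong to someone, so on average about four out of every five groups are ignored. That is why the table method takes so much longer than the random number generator.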
One of the numbers circled in green is 0001. This is the very first person on the list, and it just happens that
person 0001 will be among the sample. This individual will be selected along with everyone else whose
four-digit number was selected.
TERM TO KNOW
The value of "k" can be anything. You could choose every second individual, in which case all the green
people are in, and all these black stick figures are out. Or, you could do every third person, where one person
is in and then skip two; then the fourth person is in and skip two. Or, you could go every fourth person.
Often people prefer systematic samples to simple random samples because systematic samples are so much
easier to take. It's easier than getting a whole list of people and assigning everyone a number or putting all
the people's names in a hat. It's easier to take every fifth person or whatever you decide "k" should be.
HINT
The nice thing about a systematic sample is that it can be tailored to fit your sample size. If you wanted a
sample of 25 from 500 individuals, you could sample every 20th person since 500 divided by 25 equals
20. So you would obtain your sample of 25 by sampling every 20th person.
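As a rough sketch of the arithmetic in this hint, the interval "k" is the population size divided by the sample size. The function name `systematic_sample` is hypothetical, and choosing a random starting position within the first "k" individuals is an assumption added for this illustration; the lesson itself simply counts off every "k"-th person.

```python
import random

def systematic_sample(population, sample_size, rng=None):
    """Take every k-th individual, where k = population size // sample size."""
    rng = rng or random.Random()
    k = len(population) // sample_size   # e.g. 500 // 25 = 20
    start = rng.randrange(k)             # assumed: random start among the first k
    return population[start::k][:sample_size]

people = list(range(1, 501))             # individuals numbered 1 to 500
sample = systematic_sample(people, 25, random.Random(1))
print(len(sample))  # 25
```

Every selected individual is exactly 20 positions after the previous one, so once the first person is chosen, the rest of the sample is determined.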
IN CONTEXT
Suppose that you have 20 students in a class, and they're in rows, assigned to their desks randomly.
If that were the case, you could count off every fourth student and have five students go up to the
chalkboard to do a homework problem on the chalkboard.
1 2 3 ✘ 5
6 7 ✘ 9 10
11 ✘ 13 14 15
✘ 17 18 19 ✘
So, persons one, two, and three don't have to do it. Person number four heads up to the chalkboard
to work on a problem. Five, six, and seven don't have to do it, but number eight does. You can see
the ✘ marks that indicate the pattern and who needs to go up to the chalkboard.
Abbott      Acosta      Adams       Adamson ✘   Adler
Anderson    Bueller     Frye ✘      Grey        Jones
McClurg     Morris ✘    Peterson    Pickett     Rooney
Ruck ✘      Sara        Sheen       Stein       Ward ✘
By selecting say, Adamson, you automatically know who all the rest of the people are going to be.
Since Adler is right next to Adamson, you know that Adler won't get chosen. Nor will Anderson or
Bueller, but Frye will.
If these students were randomly assigned to the seats, picking Adamson would not predetermine
which other students were going to be selected for the sample, but having them alphabetized
undermines the randomness of the selection process.
TERM TO KNOW
SUMMARY
A simple random sample is the ideal sampling method if your goal is to obtain a representative
sample. Sometimes, with big populations, it's not feasible to assign everyone a number or put
every name into a hat, so other sampling methods may be used. A random number generator,
typically found on a calculator, is a fast way to produce random integers without needing to
assign each individual a label with the same number of digits. A random number table is a more
time-consuming method and is generally used when technology is not available. A systematic sample
can be similarly valid, and it is much easier to perform. It involves taking every "k"-th individual;
however, the population must be randomly ordered before the systematic selection. Otherwise, the
sample won't be considered random.
Good luck!
TERMS TO KNOW
Stratified Random and Cluster Sampling
by Sophia
WHAT'S COVERED
This tutorial will cover the topic of stratified random sampling, which is a random sampling procedure
that subdivides the population into groups. In addition, we will introduce cluster samples. This lesson
will focus on:
For a simple random sample of 42 students, think of ways that 42 students could be chosen, each having an
equal chance of being selected. First, assign each student a unique number 1 to 420 (total number of
students). Once this is done, you could:
Use a random number generator to select 42 numbers, ignoring repeats. The students who
correspond to those numbers will be surveyed about the school's new, healthy options.
Put the 420 student names in a hat and draw out 42.
Now, is there a way that the study might improve and guarantee an accurate cross-section of students
across the grades? After all, freshmen might feel differently about the healthy options than seniors, so it
will be important to have individuals from each grade weigh in on the lunch options.
This can be done with a stratified random sample. Stratified random sampling is a method where the
population is subdivided into groups called strata. Strata are groups whose members share one or more
characteristics. The population is separated by a characteristic that we think might affect the results,
so that no single group ends up over- or under-represented in the sample.
In the above example, it would look something like this: since 42 is 10% of the school's population, your survey
should be 10% of each grade.
10% of the freshman class of 100 is 10, so you would want to randomly select ten individuals from the
freshman class to participate.
10% of the sophomore class of 110 is 11, so you would want to randomly select 11 individuals from the
sophomore class to participate.
10% of the junior class of 120 is 12, so you would want to randomly select 12 individuals from the junior
class to participate.
10% of the senior class of 90 is 9, so you would want to randomly select nine individuals from the senior
class to participate.
Once the groups are in place, a simple random sample is carried out within each stratum, like putting names
in a hat or assigning everyone a unique number and randomly selecting numbers. You can have as many
strata as you please, but they must be roughly homogeneous.
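The 10%-per-grade allocation described above can be sketched as follows. The class sizes come from the example; the function name `stratified_sample` and the student labels (such as "freshman-1") are hypothetical placeholders invented for this illustration.

```python
import random

def stratified_sample(strata, fraction, rng=None):
    """Take a simple random sample of the same fraction from every stratum."""
    rng = rng or random.Random()
    sample = {}
    for name, members in strata.items():
        n = round(len(members) * fraction)     # e.g. 10% of 110 sophomores = 11
        sample[name] = rng.sample(members, n)  # simple random sample within the stratum
    return sample

class_sizes = {"freshman": 100, "sophomore": 110, "junior": 120, "senior": 90}
strata = {
    grade: [f"{grade}-{i}" for i in range(1, size + 1)]  # placeholder student labels
    for grade, size in class_sizes.items()
}

sample = stratified_sample(strata, 0.10, random.Random(42))
sizes = {grade: len(chosen) for grade, chosen in sample.items()}
print(sizes)  # {'freshman': 10, 'sophomore': 11, 'junior': 12, 'senior': 9}
```

The four stratum samples add up to 42 students, the same total as the simple random sample, but each grade is now guaranteed its proportional share.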
Video Transcription
[MUSIC PLAYING] Pretend you've subdivided billiard balls into low, middle, and high numbers. To take a
stratified random sample of the 15, this is what you do. Put all the low numbered balls in hat one. Put all
the middle numbered balls in hat two. And finally, put all the high numbered balls in hat three.
At that point, you'd randomly select two from each hat. The result would give you a stratified random
sample of six billiard balls. You're guaranteed to have exactly two low numbers, exactly two middle
numbers, and exactly two high numbers.
TERM TO KNOW
Stratum/Strata
The homogeneous groups in a stratified random sample. All individuals in each stratum have
something in common, and we would like to see how that affects the outcome of the sample.
2. Cluster Samples
When using a cluster sample, the population is divided into groups. These groups are called clusters. It’s
important to note that these groups are natural groupings. They don't necessarily have anything in common
other than, typically, geography. We then take a random sample of clusters instead of a
random sample of individuals.
Each individual in the cluster is going to be part of the sample if we select that cluster. So unlike the groups in
a stratified random sample, the groups in a cluster sample aren't based on a characteristic or variable. The
individuals in the cluster just happen to be near each other.
IN CONTEXT
Suppose you work at a potato chip company and it’s your job to implement some quality control in
the manufacturing department. Maybe you stand at the start of the assembly line and take a simple
random sample of individual chips. That would work just fine.
However, it might be easier for you to sample some bags of chips. The bags of chips are clusters.
You would then take a bag of chips off the assembly line and sample every chip in that bag for
quality control. That’s cluster sampling.
Similar to every sampling method, cluster sampling has pros and cons.
Advantages
Easier than a simple random sample, and often it doesn't cost as much
Typically gives similar results because the clusters are fairly heterogeneous
Disadvantages
Risk that clusters are NOT heterogeneous--perhaps they do have some characteristic other than just
being geographically different from each other that might affect the sample's findings.
TERMS TO KNOW
Cluster Sample
A sampling method where the population is separated into groups, typically geographically, and a
random selection of clusters is made. Each individual in the cluster becomes part of the sample.
Clusters
Smaller subgroups of the population, not necessarily similar in any way besides all being together in
one place, making the individuals easier to sample together.
3. Real-World Comparison
Suppose a landlord of an apartment complex wants to know whether a new carpet he's considering is
appropriate for all the apartments in the building. Each of the four floors has eight apartments.
⚙ THINK ABOUT IT
What would a simple random sample look like? How might a cluster sample be different from a stratified
random sample?
Simple Random Sample: He could randomly select eight apartments from the building.
Stratified Random Sample: He could randomly select two apartments per floor.
Cluster Sample: He could take a spinner like the one shown below and spin it.
Suppose it landed on three. That means that every apartment on the third floor would receive carpeting.
He doesn't have to have the carpet installers going to all these different rooms on all these different
floors. He can simply instruct everyone to go up to the third floor and install carpet in every room on that
floor, which would be far easier for him and just as cost-effective.
But what if all the floors were NOT heterogeneous? What if apartments on the third floor allowed pets? The
carpet might not hold up as well. That’s one of the disadvantages of cluster sampling in action. But typically,
the clusters are fairly representative and very similar to a simple random sample.
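To see the three designs side by side, here is a minimal sketch of the landlord example with four floors of eight apartments. The apartment labels (such as "101" or "308") are hypothetical, and Python's random module stands in for the spinner.

```python
import random

rng = random.Random(7)

# Four floors with eight apartments each; labels like "101" are made up.
floors = {
    floor: [f"{floor}{unit:02d}" for unit in range(1, 9)]
    for floor in range(1, 5)
}
all_apartments = [apt for units in floors.values() for apt in units]

# Simple random sample: any eight apartments in the building.
simple = rng.sample(all_apartments, 8)

# Stratified random sample: two apartments chosen at random from each floor.
stratified = [apt for units in floors.values() for apt in rng.sample(units, 2)]

# Cluster sample: "spin the spinner" to pick one floor, then take every apartment on it.
chosen_floor = rng.choice(list(floors))
cluster = floors[chosen_floor]

print(len(simple), len(stratified), len(cluster))  # 8 8 8
```

All three designs produce eight apartments. The stratified sample always covers every floor, while the cluster sample covers exactly one floor.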
SUMMARY
In a stratified random sample, the population is broken down into homogeneous groups called
"strata." The reason for this is to keep any one group, whose shared characteristic may affect the
results, from being over- or under-represented in the sample. The idea is to force individuals into
groups and then take a simple random sample within each of the strata. Cluster sampling, on the other
hand, is done by taking naturally occurring groups, typically defined by geography, and taking a simple
random sample of the clusters. Then, each member of a selected cluster becomes part of the sample. A
couple of advantages of cluster samples are that they are more cost-effective and usually achieve
results similar to those of a simple random sample. The disadvantage is that sometimes the clusters
may not be heterogeneous, as seen in the landlord example, where one floor allowed pets.
Good luck!
TERMS TO KNOW
Cluster Sample
A sampling method where the population is separated into groups, typically geographically, and a
random selection of clusters is made. Each individual in the cluster becomes part of the sample.
Clusters
Smaller subgroups of the population, not necessarily similar in any way besides all being together in one
place, making the individuals easier to sample together.
Stratum/Strata
The homogeneous groups in a stratified random sample. All individuals in each stratum have something
in common, and we would like to see how that affects the outcome of the sample.
Multi-Stage Sampling
by Sophia
WHAT'S COVERED
You'd have to somehow account for every person in the United States, and maybe assign them a number,
and pull numbers out of a hat, or use some kind of random sampling procedure. Assigning a number to
everyone would be too difficult.
Strata, in this case, are still too big. You might take a few people from Maine, and a few people from
Minnesota, and a few people from North Dakota, etc., and it would still be too large. Plus, it really wouldn't be
cost effective, commuting to all these different places.
If you identified states as clusters, you would randomly select some of the clusters and then sample everyone
within that cluster. You'd be sampling entire states. For example, everyone in North Carolina would be in the
sample if you select that state as a cluster, which simply isn't feasible.
Therefore, none of those really make any sense. The way out of the box here is a multi-stage design.
2. Multi-Stage Sampling
Multi-stage sampling is a common sampling procedure utilized when the population is very, very large. With
multi-stage sampling, you continue zooming in from larger areas to smaller and smaller areas until you can
find a small enough sample of the people you need.
To perform multi-stage sampling, first select clusters, then take a simple random sample from each cluster.
Video Transcription
[MUSIC PLAYING] Suppose you want to sample the United States as a whole. Because of geographic
simplicity, states make the most sense as clusters. If every state needs to be represented, a stratified
random sample should be performed. However, it's not realistic or feasible to sample everyone within
each state. So in this instance, you can randomly select five states to make up the clusters for your multi-
stage sample. Of these five states, you pick one to begin the process.
Let's say you start with Minnesota. And because it's equally unrealistic to sample everyone in a state,
you continue to narrow down your population with a random selection of counties. You once again
select five. If you were able to sample everyone in these counties, you can stop. But if you still need a
smaller sample size, randomly choose just one, such as Carver County. Then you can randomly select
three towns within that county.
Again, if those are small enough units, you can stop. However, if the sample size is still too large,
continue to narrow it down by selecting just one town, like Chaska. Within Chaska, for example, you can
sample some neighborhoods.
Typically, by the time you get to the neighborhood level, it's easy enough to walk around and get almost
everybody within that neighborhood. This method of drilling down from state to county to town to
neighborhood would give you a multi-stage sample of your first cluster, Minnesota. Then it's on to the
next cluster, where you would repeat the process with the remaining four to achieve a multi-stage
sampling of the United States.
STEP BY STEP
Step 1: States
When sampling the United States as a whole, states make the most sense as clusters because of
geographic simplicity. It’s not realistic or feasible to sample everyone within a state, so randomly select
just five states: California, Tennessee, Minnesota, Massachusetts, and Oklahoma. Pick one state and start
the process.
Step 2: Counties
It is equally unrealistic to sample everyone in Minnesota, so you can narrow your sample by randomly
selecting counties. Perhaps you select Carver County, Marshall County, and maybe a few other counties. If
that's a small enough basis for you to get everyone within the county, then you can stop.
Step 3: Towns
If you need yet a smaller sample size, you can choose just one county, like Carver County, and sample
towns within that county. Perhaps you randomly select three of those towns: Chanhassen, Waconia, and
Chaska. If those are small enough units, then you can stop.
Step 4: Neighborhoods
However, if the sample size is still too large, you can continue to narrow it down. Within Chaska, for
example, you can sample some neighborhoods. Typically by the time you get to neighborhoods within a
town, it's easy enough to walk around the neighborhood and get almost everybody within that
neighborhood.
Now you can move on to the next cluster, where you would repeat this process with the remaining four states.
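The four steps above can be sketched as a repeated "pick a few units, then zoom in" procedure. Everything below is illustrative only: the function name `multistage_sample` is hypothetical, the county, town, and neighborhood names are generated placeholders rather than real places, and the number of units chosen at each stage is an arbitrary assumption.

```python
import random

def multistage_sample(units, picks, rng):
    """Randomly choose picks[0] units at this stage, then zoom in on each one.

    `units` is a nested dict (state -> county -> town) ending in a list of
    neighborhoods. Returns one tuple per selected neighborhood, recording
    the full path that led to it.
    """
    n = min(picks[0], len(units))
    chosen = rng.sample(sorted(units), n)
    if isinstance(units, dict) and len(picks) > 1:
        result = []
        for name in chosen:
            for path in multistage_sample(units[name], picks[1:], rng):
                result.append((name,) + path)
        return result
    return [(name,) for name in chosen]

# Placeholder sampling frame: 5 states x 6 counties x 3 towns x 4 neighborhoods.
states = ["California", "Tennessee", "Minnesota", "Massachusetts", "Oklahoma"]
frame = {
    state: {
        f"{state} county {c}": {
            f"town {c}.{t}": [f"neighborhood {c}.{t}.{h}" for h in range(1, 5)]
            for t in range(1, 4)
        }
        for c in range(1, 7)
    }
    for state in states
}

# Assumed stage sizes: 2 states, 1 county per state, 1 town, 2 neighborhoods.
selected = multistage_sample(frame, [2, 1, 1, 2], random.Random(5))
for path in selected:
    print(" > ".join(path))
```

Each printed line traces one selected neighborhood back through its town, county, and state, mirroring the way the lesson drills down from Minnesota to Carver County to Chaska.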
TERM TO KNOW
Multi-Stage Sampling
A sampling design which combines elements of cluster sampling, stratified random sampling, and
simple random sampling. It "zooms in" on smaller areas to sample so that sampling becomes more
feasible.
SUMMARY
Multi-stage sampling is used when the population is so big, and the strata or clusters so large,
that it makes more sense to zoom in and take smaller groups. You begin with certain clusters, and then
you sample within those clusters instead of taking the full cluster. Multi-stage sampling therefore
combines elements of cluster sampling, stratified designs, and simple random designs. Those three
designs were contrasted in this tutorial, and none of them was feasible on its own for sampling
the United States.
Good luck!
Source: SOURCE: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS. MN MAP:
HTTPS://EN.WIKIPEDIA.ORG/WIKI/LIST_OF_COUNTIES_IN_... CARVER COUNTY:
HTTPS://EN.WIKIPEDIA.ORG/WIKI/LIST_OF_COUNTIES_IN_...
TERMS TO KNOW
Multi-Stage Sampling
A sampling design which combines elements of cluster sampling, stratified random sampling, and simple
random sampling. It "zooms in" on smaller areas to sample so that sampling becomes more feasible.
Observational Studies and Experiments
by Sophia
WHAT'S COVERED
This tutorial will explore observational studies and how they are conducted. We will also cover
experiments, which are a little different than observational studies, through the exploration of:
1. Observational Studies
2. Types of Observational Studies
3. Experiments
4. Experiments vs. Observational Studies
1. Observational Studies
An observational study is a type of study where the researcher can observe but does not administer any
treatment. Therefore, whatever would normally happen, the researcher has to allow it to happen.
Researchers can't change anything about the people or subjects they are studying. The researcher can
record the variables of interest, but again, can't affect the study. People have to be allowed to do whatever it
is they were going to do without interruption.
TERM TO KNOW
Observational Study
A type of study where researchers can observe the participants, but not affect the behavior or
outcomes in any way.
Retrospective Study: Researchers look to the past to see what has already happened; also known as a
case-control study.
EXAMPLE Consider observing people who are sick--those are called the cases--versus people
who aren't sick, who are the controls. Then, you look back to see what similarities the cases have in
common and what similarities the controls have in common.
Prospective Study: Researchers select individuals to participate and record what happens as it happens;
also known as a longitudinal study.
EXAMPLE Individuals are engaging in activities like smoking or jogging. You record what happens
as it happens, as opposed to trying to look back and figure it out.
IN CONTEXT
The year is 1929 and a cancer doctor has a suspicion that smoking may cause cancer. His cancer
patients become his subjects, or participants, in his study. He asks his subjects, “Did you happen to
smoke before you got cancer?” What he found was an overwhelming majority of his cancer patients
did, in fact, smoke. Therefore, this doctor was the very first person to suggest a link between
smoking and cancer.
That inspired some new studies, one of which began in 1934. It dealt with several thousand doctors,
so it was a physicians' smoking study. The reason doctors were chosen is that doctors are usually
very diligent about following protocols, meaning that those who smoked would likely continue to
smoke, and those who didn't smoke would likely continue not smoking. Also, doctors typically
wouldn't drop out of a study. Notice in the image below, how some of these physicians smoked, and
some of them did not.
They did the study, and some of the doctors got cancer. Now, not every doctor who smoked ended
up getting cancer, and not every person who got cancer was a smoker. However, what they found
was that the vast majority of the time, it was the doctors who smoked that got cancer.
This study was conducted over a long period of time--a 20-year study. At its conclusion, this was the
most convincing evidence that smoking had an effect on cancer. This was an example of a
prospective study because it started with the doctors and followed them through to 1954.
It is important to note, however, that neither of these types of studies, prospective or retrospective, can
actually prove a cause-and-effect relationship. The only thing that can prove a cause-and-effect relationship
between two variables is an experiment.
TERMS TO KNOW
Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they
became the way they are in the present.
Prospective Study
A study that begins by selecting participants, tracking them, and keeping data on the subjects as they
go into the future.
Subjects/Participants
The people or things being examined in an observational study.
3. Experiments
An experiment is a different type of study than an observational study. The differences will be covered in
detail shortly, but essentially, the researchers are allowed to impose treatments on the participants.
Treatments are administered and response to those treatments is measured. Because the researchers are the
ones implementing the treatments and measuring the response, a cause-and-effect relationship between
variables can be determined.
When discussing experiments, there is some very common terminology that you should be aware of. For
example, as mentioned in the section above, subjects and participants are used interchangeably and describe
people involved in an experiment. If animals or things are used in an experiment, they are referred to as
experimental units. While it may seem a bit impersonal, it is universal terminology in the field of experiments.
TERMS TO KNOW
Experiment
A type of study where researchers impose treatments on the participants or experimental units.
Experimental Unit
An animal or thing involved in an experiment.
An experiment, on the other hand, is far more active on the part of the researcher. The researcher is creating
the differences between the two groups, then determining whether or not there is a cause-and-effect
relationship.
If you have a study that you'd like to do, but you can't perform it due to ethical or practical concerns, or it
takes too much time or money, you can avoid those concerns or circumvent them by doing an observational
study.
⚙ THINK ABOUT IT
When trying to determine if cigarette smoking causes cancer, several observational studies have been
conducted, but never a true experiment. Why would that be?
Well, it would be unethical to break people into groups and administer cigarettes to a group of people
when trying to determine if it causes terminal illness. The same applies to alcohol consumption.
⭐ BIG IDEA
There are certain instances in which an observational study will be preferred over an experiment due to
factors like time, money, and privacy, since people are unlikely to divulge certain types of information.
SUMMARY
An observational study is a type of study where the researcher can observe but not influence the
behavior of the participants, or subjects. A retrospective study involves looking back at behavior,
while a prospective study involves gathering your participants and following them along as they live
their lives. An observational study, though, cannot prove a cause-and-effect relationship.
Conversely, in an experiment, a researcher can directly influence the subjects by applying treatments.
Because the researchers are the ones implementing the treatments and measuring the response, a
cause-and-effect relationship between variables can be determined. Terminology such as subjects
and participants is important to know since it identifies individuals directly involved in the experiment.
Animals may be directly involved in an experiment, but they are referred to as experimental units
rather than subjects or participants.
Sometimes an experiment may be unethical, expensive, or too lengthy. In those cases, observational
studies may be used, which allow a researcher to study occurrences in a natural setting without
administering treatment of any kind.
Good luck!
TERMS TO KNOW
Experiment
A type of study where researchers impose treatments on the participants or experimental units.
Experimental Unit
An animal or thing involved in an experiment.
Observational Study
A type of study where researchers can observe the participants, but not affect the behavior or outcomes
in any way.
Prospective Study
A study that begins by selecting participants, then tracks them and keeps data on the subjects as they
go into the future.
Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they
became the way they are in the present.
Subjects/Participants
The people or things being examined in an observational study.
Prospective and Retrospective Studies
by Sophia
WHAT'S COVERED
This tutorial will explore observational studies and how they are conducted, focusing on two
types, prospective and retrospective studies, through the exploration of:
1. Observational Studies
2. Types of Observational Studies
a. Prospective Study
b. Retrospective Study
1. Observational Studies
An observational study is a type of study where the researcher can observe but does not administer any
treatment. Therefore, whatever would normally happen, the researcher has to allow it to happen.
Researchers can't change anything about the people or subjects they are studying. The researcher can
record the variables of interest, but again, can't affect the study. People have to be allowed to do whatever it
is they were going to do without interruption.
TERM TO KNOW
Observational Study
A type of study where researchers can observe the participants, but not affect the behavior or
outcomes in any way.
It can be similar to a matched-pair design in an experiment, but in this case, the researchers are not giving a
treatment or doing anything to affect the people.
EXAMPLE In a study, suppose you take a pair of participants who are similar across most
variables except for one major difference: one participant has a disease ("the case") and the other
participant does not ("the control"). Because the participants are so similar, you can
focus on just that disease and see how it affects the participants or what causes the disease.
This is considered retrospective because it looks in the past. You ask the participants to recall past
events or use information about their past to determine what risk factors there are for the disease.
TERM TO KNOW
Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they
became the way they are in the present.
EXAMPLE The Framingham Heart Study started in 1948 and is still going on today. 5,209 healthy
adults from Framingham enrolled in this study. Researchers collected a variety of information about the
subjects, including social networks, eating habits, exercise habits, and several markers for heart
health.
Over a thousand different research papers have been written using this information. Some of these
papers have linked obesity and smoking to an increased risk of heart failure. Other papers look at
how social networks tie to obesity risks.
TERM TO KNOW
Prospective Study
A study that begins by selecting participants, tracking them, and keeping data on the subjects as they
go into the future.
Subjects/Participants
The people or things being examined in an observational study.
SUMMARY
An observational study is a type of study where the researcher can observe but not influence the
behavior of the participants, or subjects. A retrospective study involves looking back at behavior,
while a prospective study involves gathering your participants and following them along as they live
their lives. An observational study, though, cannot prove a cause-and-effect relationship.
Good luck!
TERMS TO KNOW
Observational Study
A type of study where researchers can observe the participants, but not affect the behavior or outcomes
in any way.
Prospective Study
A study that begins by selecting participants, then tracks them and keeps data on the subjects as they
go into the future.
Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they
became the way they are in the present.
Subjects/Participants
The people or things being examined in an observational study.
Experimental Design
by Sophia
WHAT'S COVERED
In this tutorial, you're going to learn about the principles of experimental design.
1. Control
2. Randomization
3. Replication
TERMS TO KNOW
Experimental Design
The way in which an experiment is carried out. A good design has key elements of randomization,
replication, and control.
Treatment
Something the researchers administer to the subjects or experimental units.
1a. Control
Control means holding everything else besides what you're trying to measure constant. The purpose is to
determine whether or not your treatment is effective. In other words, if there is an observable difference
between groups, is it due to the treatments or due to a confounding variable? It is important to control all other
variables to help limit confounding.
Source: This work is adapted from Sophia author Jonathan Osters.
Video Transcription
[MUSIC PLAYING] Hello. Let's take a look at a real-life example using experimental design. Suppose a
farmer wants to try a new fertilizer in the fields. The three components of experimental design can be
used to determine if the new fertilizer is better than the old one. Here's how it would work.
The first thing the farmer would do is determine the control by selecting 10 fields with similar soil
nutrients, sunlight, and water. These are all variables that could affect the crop growth. The farmer would
then apply the old fertilizer to five fields and the new fertilizer to the other five. By keeping the control
elements consistent across the 10 fields, the differences between them can be isolated and attributed to
either the old or the new fertilizer.
Next, the farmer takes randomization into account by randomly assigning which five fields will get the
new fertilizer. While the fields selected were as similar as possible, there may be an unknown variable
that was not accounted for. Perhaps some fields had moles underground. And that would affect how the
crops grow.
By randomly assigning treatments, the farmer should get some fields with moles using the new fertilizer
and some fields with moles using the old fertilizer. Randomization smooths out those effects that
unknown variables might bring into the equation.
Lastly, the farmer understands the significance of repeated results rather than a one-off result. Say the
farmer was only able to find two fields similar to each other and randomly assigned one for the new
fertilizer and one for the old. It is possible in that case that the field with the old fertilizer does very well
just by random chance. This would make it seem like the new fertilizer is not effective when perhaps it is.
Or the opposite could happen where it seems like the new fertilizer is effective when it's not. So it would
always be better to randomly assign 10 fields as the farmer is more likely to find valid trends among 10
fields than two. Thanks for watching. And see you next time.
IN CONTEXT
Suppose you are a farmer and you want to try a new fertilizer in your field. One thing you could do is
choose ten fields with similar soil nutrients, sunlight, and water--all variables that could affect the
crop growth.
You could then apply the old fertilizer to five fields and the new fertilizer to the other five. By keeping
all the other variables--soil nutrients, sunlight, water--consistent, the differences between the fields
can be isolated and attributed to the old fertilizer or the new fertilizer.
Does the new fertilizer work? Is it effective? This is the idea behind controlling for all of these other
variables.
TERM TO KNOW
Control
The principle of experimental design that requires that other variables which may confound the
experiment be held constant between the treatment groups so that any differences in the groups can
be attributed to the different treatments.
1b. Randomization
The second big idea of experimental design is randomization. The treatments must be assigned to the
subjects using a random process, otherwise known as "randomization." The purpose of random assignment is
to filter out the other sources of variation that you couldn't anticipate and control for.
EXAMPLE Referring to the farmer example, even though you made the fields as similar as possible
with respect to water, sunlight, and soil, it's possible that there is a variable that you didn't think to
control for. Perhaps some fields had moles under the ground, and that would affect how the crops
grow. How would you know to control for moles?
By randomly assigning treatments to the fields, you will likely end up with some mole-affected fields
in both the new-fertilizer group and the old-fertilizer group. Randomization smooths out the effects
that other variables might bring into the equation.
HINT
Randomizing also helps avoid bias, because you can’t be tempted to assign treatments to the
experimental units you think might give favorable outcomes.
Randomization in an experiment does not really achieve the same purpose as a random selection in a sample.
When you do a simple random sample, the idea is to get a sample that's representative of the population. In
an experiment, the purpose of randomly assigning individuals to groups is to filter out unknown sources of
variation. The assignment in an experiment, however, is fairly similar to the way you would randomly select in
a sample.
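As a minimal sketch of the random assignment in the farmer example, the Python snippet below shuffles ten field labels and gives the new fertilizer to the first five. The field labels, the function name `assign_treatments`, and the fixed seed are assumptions made for illustration; they are not part of the tutorial.

```python
import random


def assign_treatments(units, seed=None):
    """Randomly split experimental units into two equal-sized treatment groups."""
    rng = random.Random(seed)      # a seed only makes the sketch reproducible
    shuffled = list(units)         # copy so the caller's list is left untouched
    rng.shuffle(shuffled)          # the random process that decides the groups
    half = len(shuffled) // 2
    return {"new fertilizer": sorted(shuffled[:half]),
            "old fertilizer": sorted(shuffled[half:])}


# Hypothetical labels for the ten similar fields.
fields = [f"field_{i:02d}" for i in range(1, 11)]
groups = assign_treatments(fields, seed=1)
for treatment, members in groups.items():
    print(treatment, members)
```

Because the shuffle decides the groups, neither the farmer nor any unnoticed variable (such as moles) gets to choose which fields receive the new fertilizer.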
TERM TO KNOW
Randomization
The principle of experimental design that requires that the subjects/experimental units be assigned to
groups using some random process. This ensures that the two groups are roughly equal prior to
assigning treatments.
1c. Replication
Replication is the last key idea in experimental design. It basically states that a bigger experiment is better:
repeating the experiment on many subjects or experimental units is a better idea than repeating it on just a few.
Why is that?
A larger size of the experiment means it's more likely that you can find trends that perhaps you wouldn't have
found in a smaller experiment. The more you replicate, and the more experimental units you can get into your
experiment, the more likely it is that you're going to find the true trends that arise, rather than some freak
anomaly.
⚙ THINK ABOUT IT
What if the farmer could find only two fields that were similar to each other, instead of 10 fields, and
randomly assigned one to get the new fertilizer and one to get the old? Isn't it possible in that case that
the field with the old fertilizer does very well just by random chance?
This would make it seem like the new fertilizer is not effective when perhaps it is. Or the opposite could
happen, where it seems like the new fertilizer is effective when it's not. It would be better to randomly
assign 10 fields (five per fertilizer), as opposed to just two, because the farmer is more likely to find
valid trends among 10 fields than among two.
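To make the replication argument concrete, here is a rough simulation sketch. It assumes, purely for illustration, that the new fertilizer truly raises yield by 5 units on a baseline of 100 and that field-to-field noise has a standard deviation of 10; none of these numbers come from the tutorial. It then counts how often the new fertilizer looks worse than the old one by chance.

```python
import random
import statistics


def misleading_rate(fields_per_group, trials=2000, true_gain=5.0, noise_sd=10.0, seed=0):
    """Fraction of simulated experiments in which the new fertilizer looks worse
    than the old one even though it truly adds `true_gain` to the yield."""
    rng = random.Random(seed)
    misleading = 0
    for _ in range(trials):
        old = [rng.gauss(100.0, noise_sd) for _ in range(fields_per_group)]
        new = [rng.gauss(100.0 + true_gain, noise_sd) for _ in range(fields_per_group)]
        if statistics.mean(new) < statistics.mean(old):
            misleading += 1
    return misleading / trials


rate_two_fields = misleading_rate(fields_per_group=1)   # 2 fields in total
rate_ten_fields = misleading_rate(fields_per_group=5)   # 10 fields in total
print(f"2 fields:  {rate_two_fields:.0%} of runs point the wrong way")
print(f"10 fields: {rate_ten_fields:.0%} of runs point the wrong way")
```

Under these made-up settings, the two-field experiment should point the wrong way noticeably more often than the ten-field one, which is the reason replication is part of good experimental design.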
TERM TO KNOW
Replication
Repeating the experiment on multiple subjects/experimental units. This principle of experimental
design states that a larger experiment with more subjects/experimental units will allow us to more
clearly see differences between the treatments.
SUMMARY
Good luck!
TERMS TO KNOW
Control
The principle of experimental design that requires that other variables which may confound the
experiment be held constant between the treatment groups, so that any differences in the groups can
be attributed to the different treatments.
Experimental Design
The way in which an experiment is carried out. A good design has key elements of randomization,
replication, and control.
Randomization
The principle of experimental design that requires that the subjects/experimental units be assigned to
groups using some random process. This ensures that the two groups are roughly equal prior to
assigning treatments.
Replication
Repeating the experiment on multiple subjects/experimental units. This principle of experimental design
states that a larger experiment with more subjects/experimental units will allow us to more clearly
see differences between the treatments.
Treatment
Something the researchers administer to the subjects or experimental units.
Randomized Block Design
by Sophia
WHAT'S COVERED
This tutorial is going to teach you about a randomized block design. A randomized block design is a
little bit different than other types of designs that we've studied so this tutorial will focus on:
1. Randomized Design
2. Block Design vs. Randomized Design
1. Randomized Design
Randomized block design is a type of experiment where participants are first divided into homogeneous
groups. This means that they are the same across some variable of interest, such as age, race, income,
location, job, or gender.
Once participants are in their similar group, they are randomly assigned to treatment or control within that
group.
An advantage is that it controls for variables that would otherwise be confounding. If we think that job has an
effect, we can make sure that a proportional number of people who have the same job are assigned to the
treatment and control groups.
IN CONTEXT
Suppose you are a researcher and you want to identify whether a new acid reflux drug is more
effective than the one that's currently available. You gather 500 volunteers with acid reflux, put the
number one on 250 cards, and the number two on another 250, and place all the cards in a hat. You
mix them up and have people pull out numbers.
People who drew a "1" receive the new drug, and those who drew a "2" receive the old drug.
The image below would be your original plan, starting with all these volunteers, men and women,
and then you randomly assigned them to groups.
The problem is, what if men and women respond differently to the drug?
The better design is using a randomized block design, so you try something different. First, take
your large group and break it into smaller subgroups of just men and just women.
The image above has nine men and 14 women; you had a lot more in the old design, but now you’re
going to run the experiments essentially in parallel: one experiment for men and one experiment for
women. Now you’re going to take the men and randomly assign half of them to the treatment and
half to the control. You’re going to do the same with the women, randomly assigning half of them to
the treatment and half to the control, which looks like this:
Men and women receiving the treatment are in purple, and the men and women receiving the
control are in green. You might notice there are five men receiving treatment and only four receiving
control. It’s not necessary to have exactly equally sized groups.
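The two steps of a randomized block design (form the blocks, then randomize inside each block) can be sketched in Python as follows. The subject IDs, the helper name `block_randomize`, and the rule that an odd-sized block sends its extra subject to the treatment are assumptions for illustration only.

```python
import random
from collections import defaultdict


def block_randomize(subjects, block_of, seed=None):
    """Randomly assign treatment/control separately inside each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of[subject]].append(subject)   # step 1: form homogeneous blocks

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)                        # step 2: randomize within the block
        cut = (len(members) + 1) // 2               # odd-sized block: extra subject goes to treatment
        for subject in members[:cut]:
            assignment[subject] = "treatment"
        for subject in members[cut:]:
            assignment[subject] = "control"
    return assignment


# Hypothetical IDs matching the counts in the illustration: nine men and 14 women.
block_of = {f"M{i}": "men" for i in range(1, 10)}
block_of.update({f"W{i}": "women" for i in range(1, 15)})
assignment = block_randomize(list(block_of), block_of, seed=7)

counts = defaultdict(int)
for subject, group in assignment.items():
    counts[(block_of[subject], group)] += 1
print(dict(counts))
```

With nine men, this sketch puts five in the treatment group and four in the control group, and it splits the 14 women seven and seven, so both groups contain men and women in nearly equal numbers.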
TERM TO KNOW
EXAMPLE Suppose the drug was more effective for women than for men. You would see that in
this experiment here. You would see that the drug was effective for women. You would also see that it
wasn't effective for men.
One minor disadvantage to running a block design is that you do lose some of the replication that you would
have if you had run it in a large group. Sometimes you need to make your sample size a little bit bigger to
overcome that. It might be a little bit harder to draw legitimate conclusions with small groups.
SUMMARY
In a randomized design, you saw how an experiment might miss an extra level of depth, such as men
and women reacting differently to a drug. The subjects or experimental units are grouped by some
similar characteristic that you think might affect the outcome. In this example, we used gender. When
evaluating block design vs. randomized design, you saw that with a randomized block design,
experiments run in parallel, resulting in two or more separate experiments. Then, you can compare
the treatments within each of those groups.
Good luck!
TERMS TO KNOW
Completely Randomized Design
by Sophia
WHAT'S COVERED
This tutorial will discuss a completely randomized design of an experiment through an exploration of:
An advantage of this design is that it is very quick and easy to implement. You could take your group of
experimental units, randomly assign each one a number, and put the odds in the treatment group and the
evens in the control group.
However, a disadvantage of this design is that treatment and control groups could have disproportionate
representations of the population.
IN CONTEXT
Let’s say you developed a new drug to combat the symptoms of acid reflux. You want to see if it’s
more effective than what is currently available. So you get 500 volunteers and write “1” on 250 slips
of paper and “2” on the other 250 slips of paper. You put all 500 slips of paper into a hat, mix
them up, and the volunteers retrieve one slip of paper each.
Those who selected “1” will receive the new drug and those who selected “2” receive the drug that's
currently available. This is the simplest way to assign subjects to treatments. However, it's not
necessarily ideal for every scenario.
Let’s say that the acid reflux drug is more effective for men than it is for women. It’s not really a
problem if you divide the treatment control groups like this:
In this particular case, you can see there is roughly the same number of females and males in the
treatment group and the control group. Since there is a relatively equal assignment on each side, it will
be easy to see if the new drug is more effective for males than for females. Problems occur when the
random assignment doesn't match the proportions of the population.
Both groups are roughly the same size. Will you be able to determine if the treatment is more
effective for men? Why not?
If the drug were more effective for men than for women, you actually wouldn't notice because there
aren't that many men in the treatment group. The proportions are way out of whack. This sometimes
happens with random assignment.
You can see that in a completely randomized design, subjects are assigned using random processes such as
numbers in a random number generator, random number table, numbers in a hat, or names in a hat. The
problem is that it's not always the best way to assign treatments.
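The hat draw from the acid reflux example can be sketched in a few lines of Python. The tutorial does not say how many of the 500 volunteers are men, so the 100/400 split below is an assumption made only to show how you could check the balance of a completely randomized assignment; `draw_from_hat` is a hypothetical helper name.

```python
import random


def draw_from_hat(volunteers, seed=None):
    """Completely randomized design: shuffle equal numbers of "1" and "2" slips and hand them out."""
    rng = random.Random(seed)
    half = len(volunteers) // 2
    slips = [1] * half + [2] * (len(volunteers) - half)
    rng.shuffle(slips)                       # mixing the slips in the hat
    return dict(zip(volunteers, slips))      # each volunteer keeps one slip


# Hypothetical volunteer pool: 100 men and 400 women, assumed purely for illustration.
volunteers = [("man", i) for i in range(100)] + [("woman", i) for i in range(400)]
slip_of = draw_from_hat(volunteers, seed=3)

new_drug = [v for v, slip in slip_of.items() if slip == 1]
men_on_new_drug = sum(1 for gender, _ in new_drug if gender == "man")
print(len(new_drug), "volunteers get the new drug;", men_on_new_drug, "of them are men")
```

The group sizes are always 250 and 250, but nothing in this design controls how many men land in each group. Rerunning with different seeds shows how much that count can drift.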
TRY IT
A tire company wants to launch a new type of rubber for its bicycle tires. It has 300 bikes to use for the
study, and a completely randomized design is desired. What would be the first step to achieving a
completely randomized design?
They could place numbers 1-300 in a hat and have each rider pull out one number. Numbers 1-150 receive
the old rubber tires and 151-300 receive the new rubber tires. The cyclists won’t know which type of tire
they are receiving.
There is an issue with this design. Can you think what this might be?
What if bike commuters are all in the same group? They might wear their tires out faster regardless of the
new or old tires. Can you think of other aspects that may impact this experiment?
⭐ BIG IDEA
While there are better ways to gather information for an experiment, a completely randomized design is
the easiest.
TERM TO KNOW
Completely Randomized Design
An experimental design where the assignment of subjects to treatments is done entirely at random.
SUMMARY
In a completely randomized design, which is the simplest way of assigning individuals, the subjects
are assigned using a random process like numbers in a random number generator, a random number
table, numbers in a hat, or names in a hat. The problem is that it's not always the best way to assign
treatments.
Good luck!
TERMS TO KNOW
Matched-Pair Design
by Sophia
WHAT'S COVERED
This tutorial will explain matched-pair design experiments by examining the characteristics and
examples of:
1. Matched-Pair Design
a. With Subjects in Pairs
b. With Subjects as Individuals
1. Matched-Pair Design
In a matched-pair design experiment, you form experimental units by pairing subjects that are as similar as
possible. One subject goes to the treatment group and the other subject goes to the control group. Having
very similar pairs helps control for the other variables we haven't considered.
EXAMPLE Choosing a pair of women who are the same age, have the same exercise habits, and
live in the same area allows us to look at only the variable we are studying, while avoiding the effects
of age, exercise, and location on the outcomes of the experiment.
In matched-pair design, subjects can be assigned to the treatment and control groups in two different ways:
Subjects who are similar with respect to variables that could affect the outcome of the experiment are
paired together, and then one of them is assigned to the treatment group and one is assigned to the
control group.
Each subject is assigned to both groups, so that each subject acts as their own matched pair.
HINT
This type of design is also similar to a case-control study, but here researchers are giving a treatment
instead of just observing the participants.
TERM TO KNOW
Matched-Pair Design
An experimental design where two subjects who are similar with respect to variables that could affect
the outcome of the experiment are paired together, then one of them is assigned to one treatment
and one is assigned to the control. This can also be done by assigning each subject to both groups,
where each subject acts as their own matched-pair.
Video Transcription
[MUSIC PLAYING] Hello. Let's take a look at a common instance of how matched pair design is used. An
experiment is being conducted to test the effectiveness of a new flu vaccine. Gender and age are the
two variables that may play a significant role in how well this vaccine works.
So how can we study only the effects of the vaccine? A matched pair design, that's how. Groups of two
who are similar in both gender and age are created. Then one is given the vaccine. And the other is
given a placebo shot. This allows us to study only the effects of the vaccine and not the effects of the
other variables.
Here's how it goes. For this study, there are 20 participants-- 10 men and 10 women of varying ages
labeled A through T. The first variable being gender, we separate the 20 participants into two groups--
one group of 10 males, the other 10 females.
With the second variable being age, we will pair males of similar ages. Then we'll do the same with
females. So looking at the males, the first similar ages we see are 24 and 25. Our first matched pair will
be participants A and H. Using this process, we see participants L and J, D and C, T and K, and P and R
will also be good matched pairs. Then the same method is applied for similarly-aged females.
Once we have our 10 sets of matched pairs, we can randomly assign the treatment to one half of the
pair and the control to the other half. This will allow us to study how this new flu vaccine works. Oh, that
reminds me. Be sure to get your flu shot. I'm getting one after yoga this evening. Kidding. That's
ridiculous since I'm a computer. Totally can't do yoga or flu shots. Thanks for watching and see you next
time.
IN CONTEXT
There are 20 participants for an experiment for a flu vaccine. Gender and age may play a role in how
well this treatment works. Groups of two are created; each group is as similar as possible with
respect to any variable that may affect the outcome.
Participant   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Gender        M   F   M   M   F   F   F   M   F   M   M   M   F   F   F   M   F   M   F   M
Age          24  21  42  39  35  37  22  25  31  32  51  31  61  26  38  55  26  56  52  48
There are 10 men and 10 women of all different ages. Participants will be listed by gender. So
participant 1, 3, 4, 8, 10, 11, 12, 16, 18, and 20 are the males. The rest are females.
Males
Participant   1   3   4   8  10  11  12  16  18  20
Age          24  42  39  25  32  51  31  55  56  48
Females
Participant   2   5   6   7   9  13  14  15  17  19
Age          21  35  37  22  31  61  26  38  26  52
Age is also suspected to play a role in effectiveness, so within the male category, the two ages that
are closest together--24 and 25--are chosen first. Therefore, participants 1 and 8 will form a matched pair.
Participants 10 & 12, 4 & 3, 20 & 11, and 16 & 18 are also matched pairs of similarly aged males.
The same criterion is applied to pair similarly aged females.
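Males (adjacent columns form a matched pair)
Participant   1   8 |  12  10 |   4   3 |  20  11 |  16  18
Age          24  25 |  31  32 |  39  42 |  48  51 |  55  56
Females (adjacent columns form a matched pair)
Participant   2   7 |  14  17 |   9   5 |   6  15 |  19  13
Age          21  22 |  26  26 |  31  35 |  37  38 |  52  61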
Now, to continue the experiment, one of the two in the pair is randomly assigned to receive the flu
vaccine and the other one will be assigned to the control group.
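The pairing and assignment steps can be sketched in Python using the 20 participants from the table. One simple way to pair "closest ages" is to sort each gender by age and pair neighbors; that shortcut is an assumption of this sketch, not necessarily the procedure the tutorial had in mind, but for this data it reproduces the pairs listed above. The function names and the seed are also illustrative.

```python
import random

# Participant -> (gender, age), copied from the participant table.
participants = {
    1: ("M", 24), 2: ("F", 21), 3: ("M", 42), 4: ("M", 39), 5: ("F", 35),
    6: ("F", 37), 7: ("F", 22), 8: ("M", 25), 9: ("F", 31), 10: ("M", 32),
    11: ("M", 51), 12: ("M", 31), 13: ("F", 61), 14: ("F", 26), 15: ("F", 38),
    16: ("M", 55), 17: ("F", 26), 18: ("M", 56), 19: ("F", 52), 20: ("M", 48),
}


def make_pairs(people, gender):
    """Sort one gender by age (ties broken by ID) and pair neighbors."""
    ids = sorted((pid for pid, (g, _) in people.items() if g == gender),
                 key=lambda pid: (people[pid][1], pid))
    return [(ids[i], ids[i + 1]) for i in range(0, len(ids), 2)]


def assign_within_pairs(pairs, seed=None):
    """Pick one member of each pair at random to receive the vaccine."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        vaccinated = rng.choice((first, second))
        assignment[first] = "vaccine" if vaccinated == first else "control"
        assignment[second] = "vaccine" if vaccinated == second else "control"
    return assignment


male_pairs = make_pairs(participants, "M")
female_pairs = make_pairs(participants, "F")
assignment = assign_within_pairs(male_pairs + female_pairs, seed=11)
print(male_pairs)
print(female_pairs)
```

Every pair ends up with exactly one vaccinated participant and one control participant, so gender and age are balanced across the two groups by construction.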
IN CONTEXT
Suppose that you have a tire company that's considering rolling out a new type of rubber for its
bicycle tires. There are 300 bicycles available. In a completely randomized design, you would place
the numbers 1-300 in a hat. Bikers who pull numbers 1-150 would receive old rubber tires, and those who
pull 151-300 would receive the new rubber tires. They won’t necessarily know who's getting which tires.
But what if the 300 riders don't all ride the same way or equally as often? What do you do then?
How do you create two groups that are roughly the same, with the exception of the bicycle tires?
One way to do it is with a matched-pair design. You could still put the numbers 1-300 in a hat. The
only difference is that the people who pull out 1-150 would get both the old and the new. They
would put the old rubber tire in the front and the new rubber tire in the back.
Then, the people who pulled out 151-300 would get the new rubber tire in the front and the old one
in the back.
So there's still some randomization going on. The only difference is that every biker will get one old
tire and one new tire. This will allow you to compare the tread wear for each bike because the front
and rear tire get worn somewhat equally. It won't matter how much the biker rides or where.
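Here is a short Python sketch of the tire version, where each bike is its own matched pair. The hat draw decides only which tire goes in front; the comparison is then made within each bike. The helper names and the shape of the `wear` data (one front and one rear measurement per bike) are assumptions for illustration.

```python
import random


def assign_tire_positions(n_bikes=300, seed=None):
    """Each bike gets one old and one new tire; a random draw decides which goes in front."""
    rng = random.Random(seed)
    numbers = list(range(1, n_bikes + 1))
    rng.shuffle(numbers)                        # riders pull numbers from the hat
    layout = {}
    for bike, number in enumerate(numbers, start=1):
        if number <= n_bikes // 2:              # numbers 1-150: old in front, new in back
            layout[bike] = {"front": "old", "rear": "new"}
        else:                                   # numbers 151-300: new in front, old in back
            layout[bike] = {"front": "new", "rear": "old"}
    return layout


def wear_difference(layout, wear):
    """Per-bike difference: old tire wear minus new tire wear."""
    diffs = {}
    for bike, positions in layout.items():
        old_pos = "front" if positions["front"] == "old" else "rear"
        new_pos = "rear" if old_pos == "front" else "front"
        diffs[bike] = wear[bike][old_pos] - wear[bike][new_pos]
    return diffs


layout = assign_tire_positions(seed=5)
old_in_front = sum(1 for p in layout.values() if p["front"] == "old")
print(old_in_front, "bikes have the old tire in front")
```

Because the difference is computed bike by bike, a rider who commutes every day and a rider who rides once a month each contribute a fair old-versus-new comparison.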
SUMMARY
In a matched-pair design, two subjects whose characteristics are very similar are paired, then each
one is sent to a different group. A common alternative is to assign both treatments to every
subject, as was the case with the bicycle tires, so that each subject acts as their own
matched pair.
Good luck!
TERMS TO KNOW
Matched-Pair Design
An experimental design where two subjects who are similar with respect to variables that could affect
the outcome of the experiment are paired together, then one of them is assigned to one treatment and
one is assigned to the control. This can also be done by assigning each subject to both treatments,
where each subject acts as their own matched-pair.
Surveys
by Sophia
WHAT'S COVERED
This tutorial will briefly introduce you to surveys, demonstrating the following concepts:
1. Introduction to Surveys
2. Survey Design
1. Introduction to Surveys
A survey is a data gathering technique. It's an information collection tool, and a lot of organizations use these.
Surveys give organizations a way to gather data that targets the specific information they
want.
A store might use a survey to figure out something about its customers.
Politicians might use a survey to gather information about their constituents.
Someone hiring for a position in a company might use a survey to learn more about their labor market,
who they can hire, and who is not available in that area, etc.
In all of these examples, the survey is a tool being used to increase the amount of specific information
someone has. For each survey, the researcher has selected the variables of interest, or the variables that he
or she is interested in gathering data on.
TERMS TO KNOW
Survey/Sample Survey
A data collection tool that individuals in a study can fill out and return to the researcher.
Variables of Interest
The variables the survey wishes to measure about those taking the survey.
2. Survey Design
A survey must be carefully designed to elicit the intended information. The survey design is an important
element of surveys. If you are designing a survey, you want to get a representative sample of your population.
So as with every sampling technique, designing a survey is all about the process and being able to get
accurate data from a representative sample.
⭐ BIG IDEA
Just like with any sample, it's important to define what you're interested in before you begin surveying.
BRAINSTORM
You might ask yourself: What are the variables that you want to measure? What information do you want
people to provide in your survey? Answering these questions is going to be important because those
answers will help you understand the purpose of the information you generate with your survey.
So, for example, if it's a survey about employment, you're going to want to ask about employment, former
employment, current employment, and things like that.
IN CONTEXT
Suppose a teacher uses the following survey at the end of the year for her students:
Course Survey
Answer choices (left to right): Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree
2. The methods for evaluating student work have been applied fairly. ❍ ❍ ❍ ❍ ❍
8. This course covered more material than I thought it was going to. ❍ ❍ ❍ ❍ ❍
This teacher wants to know whether or not she did a good job outlining course objectives. This
survey asks about evaluating student work and academic challenge. You'll notice that she's provided
answer choices from strongly agree to strongly disagree.
The teacher thought about all of the different things she wanted to learn from her students including
her teaching and listed them all in her survey. The information she gathers from this survey will help
her answer the question of how clearly she outlined her course objectives for her students.
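Once a survey like this comes back, the answers to each item are usually tallied. The sketch below shows one way to do that in Python for a five-point agreement scale. The responses are made up for illustration, and `tally` is a hypothetical helper rather than anything prescribed by the tutorial.

```python
from collections import Counter

# The five answer choices, from strongly agree to strongly disagree.
SCALE = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]


def tally(responses):
    """Count how many students picked each answer choice, in scale order."""
    unknown = set(responses) - set(SCALE)
    if unknown:
        raise ValueError(f"unexpected answers: {sorted(unknown)}")
    counts = Counter(responses)
    return {choice: counts[choice] for choice in SCALE}


# Made-up responses to one survey item, for illustration only.
example_responses = ["Agree", "Strongly Agree", "Neutral", "Agree", "Disagree", "Agree"]
print(tally(example_responses))
```

Fixing the answer choices in advance, as the teacher did, is what makes this kind of tally possible: every response falls into one of a small number of known categories.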
TERM TO KNOW
Survey Design
The way the survey is set up. This deals with the wording of questions and answer choices.
SUMMARY
To recap, surveys are used to obtain data or information from the population. It's important that you
determine what you want to understand and why and for whom this is being collected, which may
impact survey design. We talked about surveys, which are also called sample surveys. We also talked
about variables of interest, which are the things that you wanted to measure because you're
interested in knowing them.
TERMS TO KNOW
Survey Design
The way the survey is set up. This deals with the wording of questions and answer choices.
Survey/Sample Survey
A data collection tool that individuals in a study can fill out and return to the researcher.
Variables of Interest
The variables the survey wishes to measure about those taking the survey.
Blinding
by Sophia
WHAT'S COVERED
This tutorial is going to teach you about blinding and will explain the following topics:
1. Blinding
2. Double-Blind and Single-Blind Experiments
1. Blinding
Blinding is one of those principles of experimental design whereby the subjects don't know what treatments
they're going to receive.
When you randomize an experiment, it is done to reduce bias. However, it's still possible to give subjects
subtle clues regarding what treatment they're receiving, so it’s important that the subjects don’t know what
they're receiving. Why is this? Because knowing might give them an incentive to either stay on the treatment
if it's a drug or go off the treatment if they think they're not getting the real drug.
Also, it may be true that people with an agenda might want to bend the results in their favor. They might want
to make the results of an experiment seem more positive than they really are. This idea of the experimenter
wanting to bend the results in their favor is called the “experimenter effect”.
To counteract both of these problems, we implement a strategy called blinding. Only people who are behind
the scenes will know who is getting what. No one directly involved in the experiment, whether administering
or taking the treatments, knows which treatment each subject is receiving.
IN CONTEXT
If subjects know which treatment group they are assigned to, it may influence behavior. So the
treatment group will receive a pill, and the control group will receive a pill. The only difference is that
one pill has the active treatment in it and will be only given to those in the treatment group.
Ideally, when you open the pills up, they would look the same on the inside, too. The idea is that no
one knows which pill is fake and which one has the tested drug.
The fake drug is usually some kind of a sugar or something that makes the person in the control
group feel like they're actually taking something when they’re really not.
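One common way to keep the assignment hidden is to label every pill bottle with a neutral code and keep the code-to-treatment key behind the scenes. The Python sketch below illustrates that idea; the `KIT-001` style codes, the function name, and the even split between active pill and placebo are assumptions for illustration only.

```python
import random


def blind_allocation(subject_ids, seed=None):
    """Return (labels, key): what staff and subjects see, and the sealed key."""
    rng = random.Random(seed)
    half = len(subject_ids) // 2
    treatments = ["active"] * half + ["placebo"] * (len(subject_ids) - half)
    rng.shuffle(treatments)                         # random assignment

    codes = [f"KIT-{n:03d}" for n in range(1, len(subject_ids) + 1)]
    rng.shuffle(codes)                              # so the codes carry no hidden pattern

    labels = dict(zip(subject_ids, codes))          # seen by subjects and staff
    key = dict(zip(codes, treatments))              # held only behind the scenes
    return labels, key


labels, key = blind_allocation([f"subject_{i}" for i in range(1, 9)], seed=2)
print(labels)     # kit codes only; nothing here reveals who gets the active pill
```

Only someone holding `key` can tell which kit contains the active treatment, which is the separation of knowledge that blinding relies on.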
TERM TO KNOW
Blinding
The practice of making sure that certain individuals do not know which subjects are receiving which
treatment.
In single-blind experiments, on the other hand, only one side is blinded. For example, the subjects can be blinded while the researchers are not.
IN CONTEXT
A double-blind study is ideal, but sometimes it is just not feasible. Suppose there is an exercise
study--whether or not exercise is effective for weight loss. People are going to know if they're
exercising or not. It's impossible to assign people to exercise--the treatment in this case--and have
them not know they're receiving the treatment.
However, the experimenters don't need to know who was assigned to exercise and who was not. This is
single-blind because the experimenters don't know. The experimenters were blinded, but the subjects were not.
BRAINSTORM
Can you think of a single-blind experiment that would be set up to have the researchers know group
assignments, but the participants do not?
TERMS TO KNOW
Double-Blind Experiment
An experiment where neither the subjects nor anyone in contact with them has any knowledge of
which subjects are receiving which treatment.
Single-Blind Experiment
An experiment where either the subjects have no knowledge of which subjects are receiving which
treatment or people in contact with the subjects have no knowledge of which subjects are receiving
which treatment, but not both.
SUMMARY
Blinding is a powerful tool for preventing different types of biases, such as the experimenter effect.
Different studies allow for different levels of blinding. Ideally, double-blind is best since both
participants and the people with direct contact with the participants are not aware of group
assignment. As you saw in the exercise example, sometimes double-blind just is not realistic.
Participants will know if they are exercising or not. In that case, single-blind experiments are the next
best thing, which means that either the subjects or the researchers are aware of group assignments,
but not both.
Good luck!
TERMS TO KNOW
Blinding
The practice of making sure that certain individuals do not know which subjects are receiving which
treatment.
Double-Blind Experiment
An experiment where neither the subjects, nor anyone in contact with them, has any knowledge of
which subjects are receiving which treatment.
Single-Blind Experiment
An experiment where either the subjects have no knowledge of which subjects are receiving which
treatment, or people in contact with the subjects have no knowledge of which subjects are receiving
which treatment, but not both.
Placebo
by Sophia
WHAT'S COVERED
1. Placebo
1. Placebo
In basic terms, a placebo is a fake treatment. That doesn’t mean that people don’t respond to it; instead, they
think or expect that the treatment will result in a change. A placebo doesn't do anything. It has no active
treatment, yet people feel better anyway as if they have willed themselves to feel better. This is called the
Placebo Effect.
While the treatment group gets the actual drug, the control group receives a placebo as their treatment. They
get a fake drug, usually a sugar pill or something similar, that has no active ingredient in it and doesn't
do anything.
Sometimes, the treatment containing the actual drug doesn't work any better than the placebo. This can
happen. It’s evidence against the treatment working.
IN CONTEXT
Suppose that you developed a treatment that relieved pain and you conducted a study on pain. You
had a control group receiving a sugar pill and a treatment group receiving the actual drug that you
created. Here are your results.
Would you say that your treatment is effective? Why or why not?
The answer here is that your treatment is not very effective. The numbers, 42% and 36%, are not far
apart. These results would be weak evidence for the effectiveness of the drug.
What if the results looked like this?
Notice that you still have 36% of patients in the placebo group reporting relief of pain. However, the
difference between 36% and 80% is large. This would be considered evidence for the
effectiveness of the drug.
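To put numbers on "not far apart" versus "large," the sketch below computes the gap between the treatment and placebo relief rates in percentage points. It also computes a two-proportion z statistic, which goes beyond what this tutorial covers and needs a group size; the tutorial does not report one, so 100 patients per group is assumed purely for illustration.

```python
from math import sqrt


def percentage_point_gap(treatment_pct, placebo_pct):
    """Difference between the two relief rates, in percentage points."""
    return treatment_pct - placebo_pct


def two_proportion_z(treatment_pct, placebo_pct, n_per_group):
    """z statistic for the difference of two proportions with equal group sizes."""
    p1, p2 = treatment_pct / 100, placebo_pct / 100
    pooled = (p1 + p2) / 2                       # equal groups, so the pooled rate is the average
    std_error = sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    return (p1 - p2) / std_error


N = 100  # assumed group size; the tutorial does not report one
for treatment_pct in (42, 80):
    gap = percentage_point_gap(treatment_pct, 36)
    z = two_proportion_z(treatment_pct, 36, N)
    print(f"{treatment_pct}% vs 36%: gap = {gap} points, z = {z:.2f}")
```

Under the assumed group size, the 6-point gap gives a z value below the common cutoff of about 2, while the 44-point gap gives a z value far above it, which matches the "weak evidence" and "evidence" readings in the two scenarios.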
TERMS TO KNOW
Placebo
An inert drug or treatment given to the control group. It has no active ingredient in it.
Placebo Effect
The observed phenomenon whereby certain individuals will exhibit a desired response even when
taking a placebo, which contains no active ingredient.
SUMMARY
Placebos are a form of control. They're a fake drug. People can respond to the fake drug, thinking
they are receiving treatment, which is called the Placebo Effect. Experimenters will assess the
effectiveness of the treatment against the effectiveness of the placebo. If the gap between the two is
significant, it is considered evidence that treatment has a considerable effect.
Good luck!
TERMS TO KNOW
Placebo
An inert drug or treatment given to the control group. It has no active ingredient in it.
Placebo Effect
The observed phenomenon whereby certain individuals will exhibit a desired response even when
taking a placebo, which contains no active ingredient.
Variables
by Sophia
WHAT'S COVERED
This tutorial will discuss variables within the field of statistics, and introduce the concept of
confounding variables. The following elements will be the main focus of this tutorial:
1. Variables
a. Variables of Interest
b. Explanatory and Response Variables
2. Confounding Variables
1. Variables
In statistics, a variable is any attribute that we can measure about a population, used in a study. It is very
important to carefully define the variables to be measured when creating a study.
Age
Weight
Gender
Ethnicity
Favorite Food
Number of Pets
Smoker or Non-Smoker
ZIP Code
Number of Siblings
Political Affiliation
Favorite Sport
All sorts of these things are variables. You might only want to know one of these things or some of these
things.
TERM TO KNOW
Variable
Any attribute or number that can be measured about individuals in a study.
1a. Variables of Interest
For a political poll, for example, you wouldn't necessarily need to know whether a respondent is a smoker or the
number of pets they have. However, you might want to know about their age, gender, state, political affiliation,
zip code, ethnicity, and city.
Those variables could potentially have some bearing on a political poll, so they are the variables of interest
for this study--literally, the variables you would be interested in measuring.
However, if you were conducting a weight loss study, the political affiliation will likely not be a variable to
measure, but favorite food might seem important.
TERM TO KNOW
Variable of Interest
Any variable which we need to know about in the context of a study.
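1b. Explanatory and Response Variables
Sometimes one variable is believed to cause, or at least help predict, a change in another. In those cases,
we define the one that causes the other as the explanatory variable. A study can have more than one
explanatory variable.
The variables that are the result are called response variables.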
Explanatory: Average monthly temperature
Response: Ice cream sales
You might assume that as the temperatures get warmer, ice cream sales would go up in kind.
Something that's a little bit less obvious is whether or not gender, which is a categorical variable, plays a role
in which political party people will choose. Are males more likely to be Republican? Or are women more likely
to be independent voters? We don't know. But that would be an interesting question to investigate.
TERMS TO KNOW
Explanatory Variable
A variable that we believe is predictive of something else. An increase in this variable will correspond
to an increase or decrease in some other variable.
Response Variable
A variable that is affected by the explanatory variable.
2. Confounding Variables
The word confounding refers to when two variables get mixed up with one another and you can't tell the
effect of one variable from the effect of the other variable. The confounding variable is the one not accounted
for in a study. It is an unseen variable that has a significant effect on the response variable and is also related
to the explanatory variable.
IN CONTEXT
Suppose that a researcher wants to know whether a high protein diet will help lab rats gain more
weight than a low protein diet. The researcher has 26 lab rats and she selects 13 of the smallest rats
to receive the low protein diet and 13 of the largest to receive the high protein diet. At the end of the
study, she weighs the rats to determine their weight gain and finds that the rats on the high protein
diet gained more weight.
Can you think of anything that she did wrong in this study?
The answer involves the occurrence of confounding. Remember, confounding is when two variables
get mixed up and you can't tell the effect of one variable from the effect of the other variable.
So in this case, the effect of the diets--whether or not the high protein diet caused the rats to gain
more weight--was confounded by the fact that the heaviest rats were put on the high protein diet. It’s
not clear if the high protein diets were effective at weight gain. Something else may have caused the
weight gain since they were heavy already.
There are two variables of interest in this study. The high protein diet was supposed
to be the explanatory variable. The weight gain was supposed to be the response variable. The
researcher was going to try to figure out a link between the two.
However, because of the way she assigned the rats, only a limited conclusion could be drawn. She
wasn't able to draw the direct conclusion that she was hoping for--and that is confounding.
Confounding should be limited in experiments when possible.
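One common way to limit this kind of confounding is to let chance, rather than size, decide which rat gets which diet. The text above does not prescribe a fix, so the following is only a sketch of random assignment. The starting weights are made-up illustrative values, and the names `high_protein` and `low_protein` are hypothetical.

```python
import random

# Hypothetical starting weights in grams for 26 lab rats (illustrative values only).
rng = random.Random(42)  # fixed seed so the sketch is reproducible
starting_weights = [round(rng.uniform(200, 400), 1) for _ in range(26)]

# Shuffle the rat indices, then split them in half, so that size plays no
# role in deciding which rat receives which diet.
rat_ids = list(range(26))
rng.shuffle(rat_ids)
high_protein = rat_ids[:13]
low_protein = rat_ids[13:]

def mean_weight(ids):
    """Average starting weight of the rats with the given indices."""
    return sum(starting_weights[i] for i in ids) / len(ids)

print(f"High protein group mean starting weight: {mean_weight(high_protein):.1f} g")
print(f"Low protein group mean starting weight: {mean_weight(low_protein):.1f} g")
```

With random assignment, the two groups tend to start out with similar average weights, so a difference in weight gain is easier to attribute to the diet.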
TRY IT
A high school math teacher, hoping to have his students do well on the final, offers an optional review
session. He states, “No one who's ever attended the review session has ever scored less than a B”.
What is the teacher trying to imply? Why isn’t his implication correct?
You may have concluded that he's trying to imply that the review session will cause students to do
better. That may be true; however, there may be a few confounding variables. Maybe only his best and
brightest students attend the optional review and these are students that may have done well on the final
exam anyway. The effects, if any, are confounded by the intrinsic motivation of students to show up to the
session.
TERMS TO KNOW
Confounding
Occurs when the effects of the treatments, if any, are indistinguishable from the potential effects of
some other variable which was unaccounted for.
Confounding Variable
A variable which was not accounted for in a study, which limits the conclusions that the study can
draw.
SUMMARY
Variables are what we choose to measure in a study. The variables of interest will depend on the
questions that you're trying to answer. Not every variable must be measured--just the ones that are of
interest. By looking at variables in context, you learned that if a cause and effect relationship is
thought to exist, you can break the variables down even further into explanatory and response
variables. Confounding occurs when a variable is chosen as the explanatory variable in an
experiment, but because another variable got in the way, its effect on the response cannot be isolated.
You explored confounding variables in action to demonstrate how they can limit the conclusions that
can be drawn from the supposed explanatory variable. In effect, the confounding variable inhibits a
cause-and-effect conclusion. Often, it's one that you didn't think to measure, which is problematic.
Good luck!
TERMS TO KNOW
Confounding
Occurs when the effects of the treatments, if any, are indistinguishable from the potential effects of
some other variable which was unaccounted for.
Confounding Variable
A variable which was not accounted for in a study, which limits the conclusions that the study can draw.
Explanatory Variable
A variable that we believe is predictive of something else. An increase in this variable will correspond to
an increase or decrease in some other variable.
Response Variable
A variable that is affected by the explanatory variable.
Variable
Any attribute or number that can be measured about individuals in a study.
Variable of Interest
Any variable which we need to know about in the context of a study.
Question Types
by Sophia
WHAT'S COVERED
This tutorial will cover the topic of question types. We will cover binomial questions, as well as discuss
the difference between open-ended and closed questions, through the exploration of:
1. Binomial Questions
2. Closed Questions
3. Open-ended Questions
1. Binomial Questions
Recall that there are two types of data: qualitative and quantitative.
A binomial question is a type of question with only two answer choices. In order to understand what a
binomial question is, it helps to break down the word itself. Bi means “two” and nomial means “names”. So a
binomial question is a question with two names.
Do you think that this is a qualitative type of question or a quantitative type of question?
A binomial question collects qualitative data because its two possible responses are categories, not
numerical measurements. It's a question with two categories.
EXAMPLE The simplest version of a binomial question is yes or no. You might remember this type
of question from elementary or middle school:
Yes No
Other examples of binomial questions include:
In that last question, some people feel like they fall somewhere in between the two options. They may
currently be a smoker, but they are trying to quit. Sometimes questions have some shades of gray. What
about this one?
Sometimes things don't neatly fit into two boxes. Nor do they work when the questions have more than two
answers or are open-ended questions such as, “How do you feel about the construction of the new baseball
diamond located on the north end of town?” It doesn't really work to place something like that into two
categories.
TERM TO KNOW
Binomial Question
A question with only two answer choices.
2. Closed Questions
Many surveys have a combination of open and closed questions. Closed questions have short, definite,
usually multiple-choice answers.
                Poor   Fair   Satisfactory   Good   Excellent
The Teacher      ❍      ❍         ❍           ❍        ❍
Class Content    ❍      ❍         ❍           ❍        ❍
In the above example, you'll notice that there are multiple choices--poor, fair,
satisfactory, good, and excellent--and those are your only choices.
HINT
When there are only certain answers to select, such as yes/no or multiple choice, that is the signal that
you are dealing with a closed question.
TERM TO KNOW
Closed Question
A question type with only so many different answer choices.
3. Open-ended Questions
Open questions, also called open-ended questions, are subjective. These are areas where someone can click
into the field and start to type their comments and/or opinions. These comments are open to the
interpretation of the person being surveyed.
The comments are also open to the interpretation of the person conducting the survey when they do the
analysis. Usually, they need to be analyzed by a person in order to get the full meaning from them. Oftentimes,
in the desire for simplicity, someone will give a question in closed form that really should be an open-ended
question.
The Teacher ❍ ❍ ❍ ❍ ❍
Class Content ❍ ❍ ❍ ❍ ❍
TERM TO KNOW
Open Question
A question type with no answer choices; the respondent can choose what he or she wants to say to
answer the question.
⚙ THINK ABOUT IT
Suppose you are in a court of law and the lawyer asks, “Were you at the crime scene?”
“Yes, but I didn’t see anything other than people running and police arriving. It was chaos.”
The lawyer asked a closed question and wants only a yes/no answer. By attempting to explain your
circumstance, you were trying to answer it in an open-ended question type. The lawyer reverts back to
the closed question again by asking you to select either “yes” or “no.”
SUMMARY
Binomial questions produce categorical data. These are questions with two possible responses, or
two categories. It's important to consider whether or not there really are just two categories before
you ask something as a binomial question. Open questions allow for more explanation and they're
sometimes difficult to interpret because they're not very cut and dried like closed questions.
Sometimes open-ended questions are called "essay" questions. Closed questions are easier to
interpret, but they're not always appropriate for the situation. Closed questions are sometimes called
multiple choice type questions.
Good luck!
TERMS TO KNOW
Closed Question
A question type with only so many different answer choices.
Open Question
A question type with no answer choices; the respondent can choose what he or she wants to say to
answer the question.
Accuracy and Precision in Measurements
by Sophia
WHAT'S COVERED
This tutorial will discuss accuracy in measurement versus precision through the following exploration:
Precision, on the other hand, is concerned with how consistent the measurements are to each other. In other
words, how close are the measurements to a single value, regardless of whether or not that single value is the
right answer.
TERMS TO KNOW
Accuracy
The extent to which the values, when considered all together, center around the correct value for a
variable.
Precision
The extent to which the values are very close to each other, even if they are not near the correct
value.
Take a look at Scale #1 and determine if this scale is accurate, precise, both, or neither.
Scale 1
Accuracy ✔ Precision ✘
Scale #1 is accurate because the numbers average out to the right answer of 161.8. Although it reported a
fairly low number such as 158.8 and a high number of 164.2, by and large, the numbers average out to what's
pretty close to the right answer.
However, Scale #1 is not precise because the numbers are not close to a single value every time.
Take a look at Scale #2 and determine if this scale is accurate, precise, both, or neither.
Scale 2
Accuracy ✘ Precision ✔
You can tell just by looking at the numbers that all values are within 1 pound of each other, which means it is
precise. Remember, it doesn’t need to be close to the actual correct number, but they need to be close to
each other.
But take a look at the average. The average of Scale #2 is about 168, which overestimates the true weight of
161.8 by about 6 pounds, so this scale is not accurate.
Take a look at Scale #3 and determine if this scale is accurate, precise, both, or neither.
Scale 3
Accuracy ✔ Precision ✔
All of these are within a pound of each other. They're also very close to 161.8 pounds, the true weight of the
individual you selected. Having the numbers all close to each other makes it precise, and the numbers average
out to be very close to the correct weight of 161.8. Therefore, Scale #3 is both accurate and precise.
Take a look at Scale #4 and determine if this scale is accurate, precise, both, or neither.
Scale 4
Accuracy ✘ Precision ✘
It actually did get the correct weight of 161.8 once, but if you look at the five measurements taken as a whole,
they're pretty far off and they tend to overestimate. They don't really center around the right number all that
much, so it’s not accurate. The numbers are also all over the place, so this scale is not precise.
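The two checks used for each scale can be written down explicitly: accuracy looks at how close the average is to the true value, and precision looks at how tightly the readings cluster. In the sketch below, the true weight of 161.8 pounds comes from the text, but the individual readings are hypothetical values made up to match the descriptions of the four scales, and the 1-pound cutoffs are assumptions chosen for illustration.

```python
from statistics import mean

TRUE_WEIGHT = 161.8  # pounds, the true weight given in the text

# Hypothetical readings: invented to match the descriptions of the four scales.
readings = {
    "Scale 1": [158.8, 164.2, 160.5, 163.0, 162.5],
    "Scale 2": [167.6, 168.1, 168.3, 167.9, 168.1],
    "Scale 3": [161.5, 162.0, 161.8, 161.9, 161.6],
    "Scale 4": [161.8, 166.0, 170.3, 163.5, 168.9],
}

def is_accurate(values, true_value, tolerance=1.0):
    """Accurate if the average lands within `tolerance` of the true value."""
    return abs(mean(values) - true_value) <= tolerance

def is_precise(values, max_spread=1.0):
    """Precise if all readings fall within `max_spread` of each other."""
    return max(values) - min(values) <= max_spread

for name, values in readings.items():
    print(name, "accurate:", is_accurate(values, TRUE_WEIGHT),
          "precise:", is_precise(values))
```

The two checks are independent, which is why a scale can pass one and fail the other.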
BRAINSTORM
If you worked for a consumer report company and you were evaluating the above scales, which scale
would you choose and why?
[Figure: four dartboards in a two-by-two grid. The top row is labeled Accurate and the bottom row Not Accurate.]
Precise and Accurate: In the top left corner, the darts are clumped together AND around the bulls-eye.
Not Precise, but Accurate: In the top right corner, the darts are not clumped together, but they loosely
surround the bulls-eye.
Precise, but Not Accurate: In the bottom left corner, the darts are clumped together, but not around the
correct “value”, or in this case, the bulls-eye.
Neither Precise nor Accurate: In the bottom right corner, the darts are spread out and are not surrounding the
bulls-eye.
SUMMARY
By contrasting accuracy and precision, you now know that accuracy is how close the measurements
are to the right answer, though they may not necessarily land exactly on the correct answer. Precision
is how consistent measurements are with each other, even if they are not near the correct value.
Generally, you will see them clumped together. In a given measurement scenario, high accuracy and
high precision is ideal.
Good luck!
TERMS TO KNOW
Accuracy
The extent to which the values, when considered all together, center around the correct value for a
variable.
Precision
The extent to which the values are very close to each other, even if they are not near the correct value.
Absolute Change and Relative Change
by Sophia
WHAT'S COVERED
In this tutorial, you're going to learn about the difference between absolute change, which is an
increase or decrease represented as a raw number, and relative change, which relates that change
differential back to the original value. Specifically, this lesson will cover:
EXAMPLE Suppose a political candidate's approval rating went up from 44% to 48%. That absolute
change is four percentage points.
Relative change is the percent difference from the previous value, and it's always expressed as a percent.
HINT
IN CONTEXT
An infant weighed 6.5 pounds at birth, and one year later, weighed 14.5 pounds. Decide if each of
the following statements is true.
Well, that's a true statement. 14.5 minus 6.5 is 8 pounds. It increased by 8 pounds.
This one's a little bit less obvious, but it's also true. The eight-pound increase was more than
the birth weight itself, so it was an increase of over 100%. In fact, when you do the calculation, 8
divided by 6.5 is about 1.23, or 123%.
TERMS TO KNOW
Absolute Change
The raw increase or decrease in the value of a variable.
Relative Change
The percent increase or decrease in the value of a variable.
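FORMULA
Absolute Change
Absolute Change = New Value - Original Value
FORMULA
Relative Change
Relative Change = Absolute Change / Original Value, expressed as a percent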
In the example above, the absolute difference was 8 pounds and the original value was 6.5 pounds. When you
divide 8 by 6.5, you get about 1.23.
When expressed as a percent, 1.23 is 123%. That means that there was a 123% increase over the birth weight.
That was the relative change.
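The infant example can be checked with two small functions. This is a minimal sketch; the function names `absolute_change` and `relative_change` are hypothetical, and the weights are the ones given above.

```python
def absolute_change(original, new):
    """Raw increase or decrease, in the same units as the values."""
    return new - original

def relative_change(original, new):
    """Change as a fraction of the original value (multiply by 100 for a percent)."""
    return (new - original) / original

birth_weight = 6.5         # pounds
weight_at_one_year = 14.5  # pounds

abs_change = absolute_change(birth_weight, weight_at_one_year)
rel_change = relative_change(birth_weight, weight_at_one_year)

print(f"Absolute change: {abs_change} pounds")
print(f"Relative change: {rel_change:.0%}")
```

Note that the relative change divides by the original value, not the new one, so swapping the two arguments gives a different answer.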
Video Transcription
Hey again. Let's walk through an example of how absolute change and relative change are found and
the differences between them. The data we'll use is the enrollment for this year's and last year's classes at
Memorial High School. First, we'll seek to determine which class has the highest absolute change. Then
the highest relative change.
Will it be the burnouts, the nerds, the geeks, or the dweebs? It's more than anyone's guess. It's statistics.
Anyway, let's find out. We'll start with absolute change. To calculate this, simply subtract last year's value
from this year. As you can see, three of the four classes had increases in enrollment.
So of the classes that had a positive absolute change, the burnouts had the highest with 310 students.
Now onto relative change, which is calculated by dividing the absolute change by the original number.
With that in mind, and looking at the classes again, repeat this formula with all four groups. This is what
you'll see.
The relative change for the burnouts is a sizable increase of 24%, while a more modest 10% appears for
the nerds. The geeks, on the other hand, experienced a decrease of 6%. Which finally brings us to the
dweebs. While they're the smallest overall class, they have the highest relative change with an increase
of 26%. The dweebs' enrollment wasn't big to begin with, so even a normal absolute change resulted in
the largest relative change.
To summarize, here's a breakdown of the distinction between the two categories. Absolute change is
the difference in raw numbers. In this case, it's the actual change in enrollment from one year to the next.
Whereas the relative change expresses how this year compared to last year as a percent of the
original number.
Looking at the absolute change and relative change can tell different stories, and oftentimes you
humans find these stories are a valuable way to analyze data. There you have it. A quick illustration of
absolute change and relative change. Keep plugging away and I'll see you in the next video.
IN CONTEXT
Let's look at another example. The following table shows the results of the 1990 census and the
2000 census, along with the absolute change and relative change.
State | 1990 Population | 2000 Population | Absolute Change | Relative Change
Illinois | 11,430,602 | 12,419,293 | 988,691 | 9%
Absolute Change: To calculate the absolute change, simply subtract the 1990 value from the 2000
value. For example, Florida's absolute change can be found by subtracting 12,937,926 from 15,982,378
to get an absolute change of 3,044,452.
All of the states in the list had increases in the population. Some were not very much, like Hawaii,
which only had about a 100,000-person increase. Some were a lot, like Georgia and Florida, which
increased by over a million people. The highest absolute change was 3,044,452 people, in Florida.
Relative Change: The question of which state had the largest relative change over that period is a
little bit different. Looking at Florida again, you need to figure out if the absolute change of around 3
million was a large change percentage-wise from the old population of about 13 million. It was a
large increase but was it the largest percent increase in the list?
To find the relative change, take each absolute change and divide by the old population from 1990.
Florida's relative change was positive 24%--approximately 3 million divided by 13 million gives you
about 24%. Georgia's increase was about 26%, a little bit larger of a percent increase than Florida.
The highest of the list was a 29% increase in the state of Idaho. Notice it didn't have a very large
absolute change. But its population wasn't very big to begin with, and so even a small absolute
change can be a large relative change.
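The same two calculations can be applied to each state in turn. The sketch below uses only the two states whose populations are quoted in full above (Florida and Illinois); the variable names are hypothetical.

```python
# (1990 population, 2000 population) for the two states quoted in full above.
populations = {
    "Florida": (12_937_926, 15_982_378),
    "Illinois": (11_430_602, 12_419_293),
}

changes = {}
for state, (pop_1990, pop_2000) in populations.items():
    absolute = pop_2000 - pop_1990   # change in number of people
    relative = absolute / pop_1990   # change as a fraction of the 1990 value
    changes[state] = (absolute, relative)
    print(f"{state}: {absolute:,} people, {relative:.0%}")
```

Sorting the states by the first value ranks them by absolute change, while sorting by the second ranks them by relative change, and the two orderings need not agree.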
SUMMARY
Absolute change is the absolute difference in raw numbers. It's the change in units. Relative change
examines how the new number compares to the previous number in terms of a percent. Did it go up
by 10%? Did it go down by 7%? What happened percentage-wise from then to now?
Good luck!
TERMS TO KNOW
Absolute Change
The raw increase or decrease in the value of a variable.
Relative Change
The percent increase or decrease in the value of a variable.
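FORMULAS TO KNOW
Absolute Change
Absolute Change = New Value - Original Value
Relative Change
Relative Change = Absolute Change / Original Value, expressed as a percent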
Using Percentages in Statistics
by Sophia
WHAT'S COVERED
This tutorial will discuss how to use percentages wisely in statistics by focusing on:
Percents are used to describe the relative change. Percentage points are used to measure absolute change.
TERMS TO KNOW
Percentage Points
An absolute increase or decrease in a percent value.
Percent Change
A relative increase or decrease in a percent value.
2. Examples
2a. Retaking a Test
Suppose a teacher gives a particularly difficult exam and these six students all failed it. The teacher graciously
offered a retake to the students and they all passed.
The table below shows their original score and their retake score. On the retake, Jonathan scored an 88,
Ryan scored a 78, Katherine scored an 84, etc.
Student | Original Score | Retake Score
Katherine | 61% | 84%
These changes can be expressed as either percentage points or percent increase. First, which student had
the highest increase in percentage points?
Student | Original Score | Retake Score | Change in Percentage Points
Jonathan went from 52% to 88%, that's an increase of 36 percentage points. Ryan went from 38% to 78%,
that's an increase of 40 percentage points. We can calculate that for all of them and see that it was Kelly who
increased 47 percentage points.
Now, who had the highest percent increase? You need to look at the raw increases and
determine who had the highest percent increase over their old score.
Begin with Jonathan's scores. We need to determine how much of an increase 36 percentage points was
over that original score of 52.
Student | Original Score | Retake Score | Change in Percentage Points | Percent Increase
Jonathan's score increased by 69%. Katherine's only increased by 38% because she had a fairly high score to
begin with.
But it was Ryan who had the highest percent increase. He started with a 38 and finished with a 78, a 40
percentage point increase. A 40 percentage point increase over a score of 38 is over 100%, meaning he
more than doubled his old score.
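The two measures can be computed side by side for the students whose scores are quoted in the text. This is only a sketch: it covers Jonathan, Ryan, and Katherine, since the other students' scores are not given here, and the variable names are hypothetical.

```python
# (original score, retake score) in percent, as quoted in the text.
scores = {
    "Jonathan": (52, 88),
    "Ryan": (38, 78),
    "Katherine": (61, 84),
}

results = {}
for student, (original, retake) in scores.items():
    points = retake - original                 # absolute change, in percentage points
    percent_increase = points / original * 100  # relative change, in percent
    results[student] = (points, percent_increase)
    print(f"{student}: +{points} percentage points, {percent_increase:.0f}% increase")

# Among these three students, find the largest percent increase.
top_student = max(results, key=lambda name: results[name][1])
print("Highest percent increase:", top_student)
```

Ryan's lower starting score is what makes his 40-point gain the largest percent increase of the three.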
Video Transcription
Let's take a moment to look at one more example of using percentages in statistics. Suppose young
Patrick hare has found his way to class president at Memorial High School, but his approval rating has
just hit the skids, dropping from 56% to 42%. Perhaps this is thanks in part to his proposal to phase out
all computer generated voices with English accents. I'm just saying.
Whatever the case, let's determine the absolute change in his approval rating. Take a moment to
calculate it out. All right, here's what you should have done. Take 42 and subtract 56 from it. This gives
you negative 14. So Patrick's approval rating dropped 14 percentage points. It's a drop, but looking at it
that way, Patrick isn't too concerned.
However, how does that drop look when you calculate it in terms of relative change? Again, take a
moment to calculate it out. OK, here's where you start. Take the 14 percentage point drop and divide it
by the original approval rating, 56. That will give you minus 0.25, or a 25% drop. Viewed in this context,
Patrick sees the drop is a significant one, which he might not have expected.
Do you see what happens, Patrick? Do you see what happens when you try to phase out a crisp and
pleasant sounding computer generated English accent?
Suppose Patrick has found his way to class president at Memorial High School. But his approval rating has just
hit the skids, dropping from 56% to 42%.
First, let’s determine the absolute change in his approval rating. Take 42 and subtract 56 from it.
This gives you negative 14. So Patrick's approval rating dropped 14 percentage points. It’s a drop, but looking
at it that way, Patrick isn’t too concerned.
However, how does that drop look when you calculate it in terms of relative change? Take the 14 percentage
point drop and divide it by the original approval rating, 56.
That will give you -0.25, or a 25% drop. Viewed in this context, Patrick sees the drop is a significant one, which
he might not have expected.
SUMMARY
When percentages are used in statistics it's important to know whether the focus is absolute change
or relative change. Absolute change is the difference in percentage points and relative change is a
percent increase or percent decrease.
TERMS TO KNOW
Percent Change
A relative increase or decrease in a percent value.
Percentage Points
An absolute increase or decrease in a percent value.
Index Number and Reference Value
by Sophia
WHAT'S COVERED
This tutorial is going to teach you about index numbers and reference values, through the definition
and discussion of:
To calculate the index value for other points in time, you would take the current price, divide by the reference
value, and then convert that value to a percent.
FORMULA
Index Number
Index Number = (Current Value / Reference Value) x 100
How do we work with index numbers and reference values most of the time? Consider the following example:
In 1983 a gallon of milk cost $2.24, so you assign this reference value of $2.24 an index value of 100.
Essentially this means that it cost 100% of what it cost in 1983--a fairly obvious statement.
To calculate the index value for other points in time, like in 1988 when a gallon of milk cost $2.30 or 1993
when it cost $2.86, you would take the current price, divide by the reference value of $2.24, and then convert
that value to a percent.
The index value in 1988, then, is $2.30 divided by the reference value of $2.24. That gives you 1.027, which as
a percent is 102.7%. Note that index values are expressed without the percent symbol, so the index value in
1988 was 102.7. You can complete the table with the remaining values.
What this indicates is that by the time you get to 2003, a gallon of milk cost 142.4% as much as it did in 1983,
or a 42% increase over 1983.
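The index calculation for the milk example can be sketched as follows. It uses only the prices quoted above for 1983, 1988, and 1993; the function name `index_number` is hypothetical.

```python
def index_number(current_value, reference_value):
    """Current value as a percent of the reference value, without the % sign."""
    return current_value / reference_value * 100

REFERENCE_PRICE = 2.24  # price of a gallon of milk in 1983, the reference year

milk_prices = {1983: 2.24, 1988: 2.30, 1993: 2.86}

index_values = {
    year: round(index_number(price, REFERENCE_PRICE), 1)
    for year, price in milk_prices.items()
}

for year, value in index_values.items():
    print(year, value)
```

An index value above 100 means the price is higher than in the reference year, and the amount above 100 is the percent increase.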
TERMS TO KNOW
Index Number
A way to measure the relative change in a value, usually the price of a good or service, over time. If
the index number is over 100, that means the price has increased. If the price has decreased, then the
index number will be less than 100.
Reference Value
An arbitrarily chosen starting value for an index. It is assigned an index number of 100.
The Consumer Price Index (CPI) is a general measure of inflation. Inflation means that the index is going up. It's a decline in
purchasing power, which means that it costs more now to buy these goods and services than it did then. That
means that the dollar is inflated. Put another way, inflation means that with the same amount of money coming
in and with the same income, you have less purchasing power. It may cost you much more now to do what it
cost $100 to do in 1983.
Here's a graph of the CPI over time. Notice the index value is 100 in 1983, between 1980 and 1990. Goods and
services costing $100 in 1983 will cost you around $200 if you look at around 2007. Therefore, the index
value was 200 in 2007.
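Reading the index in the other direction gives the cost of a fixed basket of goods in a later year. The sketch below assumes the approximate index value of 200 for 2007 mentioned above; the function name `cost_at_index` is hypothetical.

```python
def cost_at_index(base_cost, index_value):
    """Cost of goods that were priced at `base_cost` when the index was 100."""
    return base_cost * index_value / 100

# Goods costing $100 in 1983 (index 100), priced at an index value of about 200.
cost_2007 = cost_at_index(100, 200)
print(f"Approximate cost in 2007: ${cost_2007:.2f}")
```

Because the index doubled, the same goods cost about twice as much, which is another way of saying each dollar buys about half of what it did in 1983.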
TERMS TO KNOW
Inflation
A relative increase in the price of a good or service over time. A person will need to pay more to
receive the same good or service than they did at a previous point in time.
SUMMARY
Index numbers allow us to check changes, typically in prices, from one point in time to another. We
begin with a reference value, which is the price at some arbitrary point in time. The index numbers
express each later price as a percent of that reference value. If the price goes up, the index number will
be over 100. If the price goes down, the index number will be under 100. The most commonly referred
index would be the Consumer Price Index or CPI. The CPI shows percent increase or decrease in the
prices of many goods and services, which helps determine the amount of inflation.
Good luck!
TERMS TO KNOW
Index Number
A way to measure the relative change in a value, usually the price of a good or service, over time. If the
index number is over 100, that means the price has increased. If the price has decreased, then the index
number will be less than 100.
Inflation
A relative increase in the price of a good or service over time. A person will need to pay more to receive
the same good or service than they did at a previous point in time.
Reference Value
An arbitrarily chosen starting value for an index. It is assigned an index number of 100.
FORMULAS TO KNOW
Index Number
Index Number = (Current Value / Reference Value) x 100
Bias
by Sophia
WHAT'S COVERED
This tutorial will cover the topic of bias, specifically focusing on:
1. Bias
2. Hawthorne Effect
1. Bias
Most often, research is done accurately and with integrity. People want to get the job done right. They want to
get the answer correct. But sometimes there's something that happens systematically in the experiment or the
study that limits the accurate representation of the population being researched.
Bias, in the statistics world, is systematically misrepresenting the population. It refers to the favoring of certain
outcomes in a sample that limits our ability to draw conclusions about the population. The key word is
systematically: it's not necessarily intentional. It could be intentional, but it doesn't have to be.
A way of selecting the sample for your study such that the sample doesn't accurately reflect the population is
called selection bias. It's not good, but sometimes it can't be avoided. On the other hand, sometimes it can be
avoided, but isn't.
Publication bias occurs when researchers only want to publish the most sensational findings, or rather, only
the positive ones. Only the results that people will want to read make it to people's eyeballs, while findings
deemed boring do not.
TERMS TO KNOW
Bias
The tendency for collected data to differ from what is expected in a systematic way. Biased data can
often favor a specific group of those studied.
Selection Bias
Selecting a sample in such a way that certain subsets of the population are systematically excluded.
Publication Bias
The desire of researchers (and research publications) to only print the most sensational or interesting
articles.
2. Hawthorne Effect
Often, people will behave differently if they know that they're under observation. They become a bit self-
conscious when they are observed and want to do it “right”, so they act differently.
This idea that people might change what they would typically do based on the fact they're under observation
is a type of bias called the Hawthorne Effect.
IN CONTEXT
Suppose you are in charge of a weight loss study. One group is told to take a pill every day. The
other group is also told to take a pill every day, but it doesn't have any active ingredient in it.
You instruct them not to change their behavior. You don’t want them changing the results by eating
differently or exercising more. However, these people might change their behavior based on the fact
that they know they're going to be weighed later.
Another thing to consider is when a study is based on participants volunteering their time to be a part of this
study. What may happen is that only people with a passion specific to the study may sign up, which is known
as participation bias.
Furthermore, another issue may be that the participants tell you what they think you want to hear, which is
response bias.
TERMS TO KNOW
Hawthorne Effect
People have the tendency to change their behavior when they know they are being monitored.
Response Bias
Bias that occurs when a respondent tells the interviewer "what they want to hear" or lies due to the
sensitive nature of the question.
SUMMARY
Bias has a problematic influence on many experiments and samples. Unfortunately, when bias exists,
the results received cannot be generalized to the population, because they are not reliable. It’s
important to know that bias is not always intentional. It can be a systematic flaw in the sample or the
experiment, but it's not always on purpose. Selection bias happens when the sample is not truly
representative of the population to which you want to generalize the information. Publication bias is
when researchers publish only the information that they think people want to see. The
Hawthorne Effect is a type of bias that happens when people act differently, just knowing they are
being observed.
Good luck!
Source: Adapted from Sophia tutorial by Jonathan Osters.
TERMS TO KNOW
Bias
The tendency for collected data to differ from what is expected in a systematic way. Biased data can
often favor a specific group of those studied.
Hawthorne Effect
People have the tendency to change their behavior when they know they are being monitored.
Publication Bias
The desire of researchers (and research publications) to only print the most sensational or interesting
articles.
Response Bias
Bias that occurs when a respondent tells the interviewer "what they want to hear" or lies due to the
sensitive nature of the question.
Selection Bias
Selecting a sample in such a way that certain subsets of the population are systematically excluded.
Nonresponse and Response Bias
by Sophia
WHAT'S COVERED
This tutorial will cover the topics of nonresponse bias and response bias by focusing on:
1. Nonresponse Bias
2. Participation Bias
3. Response Bias
1. Nonresponse Bias
A nice way to think of sampling is to use a "pot of soup" analogy. You want a representative sample, right?
Well, you don't need to drink the entire pot of soup in order to figure out what's in it. You just need the right
taste.
It would be like selecting all of the ingredients from the soup in a single tasting, but certain things can go
wrong with the taste test that can affect what you think is in the soup. Just like you don't really know what the
population looks like, you really don’t have a clear idea of all the ingredients in the soup. All you get is the
taste, and if you don't get the right taste, you're going to leave something out and not know exactly what's in
the soup (or, population).
In terms of sampling, nonresponse means that someone selected for the sample either can't be contacted or
is unwilling to participate.
Now, nonresponse happens. It's an inevitability that you will get uncooperative people, people that don't want
to take your survey or people who refuse to be part of your experiment. It may be that you just won't be able
to contact certain people.
Nonresponse becomes a problem when the people who couldn't be contacted or refused to
participate differ substantially from the people who were in the sample. Then the sample is not representative
of the population. That is called nonresponse bias because you're not getting an accurate cross-section of
opinions. The opinions of people that you wanted to get are left out.
IN CONTEXT
A workplace wishes to survey 200 of its 1,000 employees about their workload and their stress level,
so they put 200 surveys in the workers' mailboxes. It’s likely that the people who have the biggest
workloads might get left out of the sample because they don't check their mailboxes as often as
other people. Or if they do get around to checking their mailbox, they may not complete the survey,
or don't return it, because they're so busy.
What effect might that have? The employees who completed the survey may have reported
that the workload level is not that high. The problem is that the people with the lower workloads may be
the only people who turned their surveys in, because they had the time to complete them. The people with the
higher workloads didn't have the time, so the company might wrongly conclude
that the workload level is lower than it really is.
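To see how this plays out in numbers, here is a small sketch of the workplace survey. Every figure in it is hypothetical and invented for illustration: the split of the 200 surveyed employees, the 1-to-10 workload scores, and the number of surveys returned by each group are assumptions, not data from the lesson.

```python
# Hypothetical numbers, invented for illustration only.
# Each entry: (employees surveyed, workload score on a 1-10 scale, surveys returned)
groups = {
    "lighter workload": (120, 3, 108),
    "heavier workload": (80, 8, 20),
}


def mean_score(counts_and_scores):
    """Average score, weighting each group's score by its head count."""
    total_people = sum(count for count, _ in counts_and_scores)
    total_score = sum(count * score for count, score in counts_and_scores)
    return total_score / total_people


everyone = [(surveyed, score) for surveyed, score, _ in groups.values()]
respondents = [(returned, score) for _, score, returned in groups.values()]

true_mean = mean_score(everyone)          # what the company wants to know
observed_mean = mean_score(respondents)   # what the returned surveys show
print(true_mean, observed_mean)
```

Because the busiest employees return fewer surveys in this made-up scenario, the average computed from the returned surveys comes out lower than the average for everyone who was surveyed.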
The nonresponse rate is easy to calculate. You subtract the number that you got back from the number
that you mailed out, then divide by the number that you mailed out. The result is your nonresponse rate.
EXAMPLE Say you mailed out 100, and you only got 80 back. Well, that's 20 out of 100, or a 20%
nonresponse rate.
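The same arithmetic can be written as a short Python function. This is only a sketch; the name `nonresponse_rate` and its parameters are hypothetical, chosen here for illustration.

```python
def nonresponse_rate(sent, returned):
    """Fraction of distributed surveys that were not returned."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= returned <= sent:
        raise ValueError("returned must be between 0 and sent")
    return (sent - returned) / sent


# The example above: 100 surveys mailed out, 80 returned.
rate = nonresponse_rate(100, 80)
print(f"{rate:.0%}")
```

Dividing by the number sent is what turns the count of missing surveys into a rate, so results from mailings of different sizes can be compared.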
⚙ THINK ABOUT IT
Consider the different ways of conducting a survey, a poll, or a sample. Which of the following methods do
you think has the highest nonresponse rate?
Mail
Telephone
Face-to-Face
The answer is the mail. People will either throw it away, forget to fill it out, or maybe they'll fill it out and
then forget to mail it back. This is problematic because when the United States takes its census of
everyone in the country, it does so by mail. Sometimes they have to do follow-ups.
In samples with high rates of nonresponse, follow-ups typically are needed. Suppose you started with a
mailing. You might need to follow up by calling them at home. If you can't reach them by calling them at home,
you might need to follow up by coming directly to their house.
Sometimes, even when they are contacted, someone will refuse to participate. Follow-ups like this might be
more necessary in some areas of the country than others because different areas of the country have
different rates of nonresponse.
TERMS TO KNOW
Nonresponse
Nonresponse is a lack of response from people you've selected. It affects the ability to draw
conclusions from your sample.
Nonresponse Bias
Bias that occurs when the people who were unable to be reached or unwilling to participate in a
sample have substantially different opinions than the people who were included in the sample,
resulting in a misrepresentation of the population.
2. Participation Bias
On the other end of the spectrum is when people are excessively passionate about a topic and they’re eager
to participate. The people who raise their hand to participate are volunteering their time because they have a
strong opinion about the topic at hand. Participation bias happens when people participate because they
have strong opinions about the topic, or when they are ambivalent and participate only because they
are getting paid to participate.
EXAMPLE Suppose you need to gather information on an upcoming election and you ask people
to participate in a focus group. In your group, you find that you have a group in strong support of the
Democratic party and you have a group in strong support of the Republican party, and no one in the
middle.
To correct this, you decide you’re going to pay participants $20 for their time. Now your group is filled
with people who will simply tell you what they think you want to hear, which invites participation bias.
TERM TO KNOW
Participation Bias
Bias that occurs when participation in a study is voluntary. People who feel strongly may be the only
participants.
3. Response Bias
Response bias is when people's answers are influenced. Remember the pot of soup analogy? When you get a
representative sample, that's like getting a little taste of everything in the soup. However, things can go wrong
and you don't get the right taste of the soup.
Response bias can occur if the wording of the question is unclear to the respondent, if a respondent is
uncomfortable due to the sensitive or personal nature of the questions, or if the respondent feels like the
questioner is implying that the question has a "correct" response. Answering in order to look good rather than truthfully is also called social desirability bias.
IN CONTEXT
On April 20, 1993, the New York Times published an article on a survey conducted by the Roper
Organization on behalf of the American Jewish Committee about the soon-to-be opened Holocaust
Museum in Washington, DC.
The newspaper reported that 22%, an astounding number of adults surveyed, expressed some
doubt as to whether the Holocaust had actually occurred. The actual question that was presented to
people was:
"Does it seem possible, or does it seem impossible to you, that the Nazi extermination of the
Jews never happened?"
This seems to be a fairly straightforward question, but there was a big problem with it, and it caused
response bias. The problem is that the question contained a double negative, which is confusing.
Saying it is impossible that it never happened is the same as someone saying they are certain that it
did happen, but the question doesn't clearly read that way.
The good thing is that, one year later, the question was revised, and it became clearer. The new
question stated:
"Does it seem possible to you that the Nazi extermination of the Jews never happened, or do
you feel certain that it happened?"
The revised question clearly distinguishes between the two options: "does it seem possible,"
or "do you feel certain?" With the two options clearly defined, less than
2% of individuals were unsure as to whether it was real or not. This provided a more accurate
interpretation of what the American public felt.
Therefore, unclear questions can lead to an inaccurate representation due to response bias. The other
scenario in which this can occur is when people answer a question untruthfully because they are either ashamed, or
they think that there's a "right" answer that someone is fishing for.
There are certain topics that are particularly sensitive and might make a person want to lie.
Drugs: This may result in many people saying they've never used drugs, whether they actually have or not. Even if there's no consequence and the survey is anonymous, they'll still say they've never used drugs when, in fact, they have.
Criminal history: Participants might say they don't have one, even if they do.
Sexual behavior: This might cover topics of a highly sensitive and personal nature.
Racial prejudice: There's an implied right answer; people don't want to say that they're racially prejudiced.
Income: People will report it as being higher than it actually is if they're of low-income status, or even possibly more surprisingly, people will report it as lower than it really is if they're of very high-income status. A lot of people don't want to be showy about their wealth, and so they'll try and come up with a more reasonable number, in their eyes.
How does this affect what we think about the population? How does this affect the "soup?"
It's like taking a sample of the soup and only tasting the things that you want to taste. Maybe you don't like
beans, and so you just sort of ignore the fact that they're in there. You don't get the true overall flavor of
the soup. It's the same thing with response bias. It doesn't give you the right overall picture
of what the population is really like.
TERM TO KNOW
Response Bias
Bias that occurs when either (1) the question is poorly worded so that certain responses are over-represented,
or (2) the respondent is confused by the question or feels like they should lie due to the
sensitive nature of the question.
SUMMARY
Nonresponse bias occurs when people who are selected for the sample can't participate, either
because you can't find them, or because they're actively refusing. The biggest problem is that if you
have high rates of nonresponse, it might give you an inaccurate representation of what's going on
with your population. You won't be able to use your sample to draw an inference about your
population. Response bias occurs one of two ways: either a respondent doesn't understand the
question and so gives an answer that they weren't intending; or, the respondent wants to give a
supposedly correct answer to the questioner. Both of these can be inaccurate representations of
what actually is the truth about the population. Response bias is a tough thing to get rid of, especially
when it is unintentional and surrounds the wording of the questions.
Good luck!
TERMS TO KNOW
Nonresponse Bias
Bias that occurs when the people who were unable to be reached or unwilling to participate in a sample
have substantially different opinions than the people who were included in the sample, resulting in a
misrepresentation of the population.
Participation Bias
Bias that occurs when participation in a study is voluntary. People who feel strongly may be the only
participants.
Response Bias
Bias that occurs when either (1) the question is poorly worded so that certain responses are over-represented,
or (2) the respondent is confused by the question or feels like they should lie due to the
sensitive nature of the question.
Selection and Deliberate Bias
by Sophia
WHAT'S COVERED
This tutorial will cover the topics of selection, deliberate, and unintentional bias. These may all impact
the selection of the right group of people for your sample, so it’s very important to be aware of them
when attempting to generalize findings. Our discussion breaks down as follows:
1. Selection Bias
2. Random Digit Dialing
3. Deliberate Bias
4. Unintentional Bias
1. Selection Bias
You may recall that sampling is like a pot of soup. Selecting a little bit of each ingredient for the soup is like
obtaining a representative sample for an experiment. However, things can go wrong with the taste test, which
may limit the ability to draw conclusions about the pot of soup as a whole.
Selection bias is also called undercoverage bias. It occurs when a significant subset of the population is left
out of the sample. This is not necessarily intentional; it occurs when that subset is systematically overlooked
by whoever is taking the sample.
IN CONTEXT
In 2008, almost every poll showed Barack Obama leading by at least five percentage points leading
up to the New Hampshire presidential primary. All of these were based on random digit dialers
calling a random sample of New Hampshire households. By all accounts, the surveys were well done.
However, what happened was that Clinton gained some support in the last few days. Mainly, a lot of
college students ended up coming out in support of Hillary Clinton in the last days when people
were expecting all college students to come out in support of Obama.
Because a lot of the college students are from out of state, they aren't actually New Hampshire
residents. For that reason, they were not counted in the polls and, as a result, the polls got the prediction
wrong: Clinton ended up winning.
TERM TO KNOW
Selection Bias
A bias that results from systematically excluding certain subsets of the population from the sample. It
is not necessarily intentional.
2. Random Digit Dialing
The biggest advantage of using random digit dialers is that they can reach mobile phones and unlisted
numbers that you wouldn't be able to obtain using a phone book. So, it evens the playing field a bit since
anyone can be selected for that sample as long as the phone number is within that particular area code.
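The idea behind random digit dialing can be sketched in Python. This is a simplified illustration, not how any real polling system works: it treats every 7-digit local number as possible, whereas real dialers skip number ranges that are not in service. The function name `random_digit_dial` is hypothetical, and 603 is used only as an example area code.

```python
import random


def random_digit_dial(area_code, count, seed=None):
    """Return `count` distinct, randomly generated numbers in one area code."""
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < count:
        local = rng.randrange(10**7)  # any 7-digit local number, leading zeros allowed
        chosen.add(f"{area_code}-{local // 10**4:03d}-{local % 10**4:04d}")
    return sorted(chosen)


for number in random_digit_dial("603", 5, seed=1):
    print(number)
```

Because the digits are generated rather than looked up, a number does not have to appear in any directory to be selected, which is the advantage described above.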
⚙ THINK ABOUT IT
How does selection bias affect what we think is in the soup? Imagine that certain ingredients were
located only in certain locations in the pot. Maybe noodles sank to the bottom. If you tasted only from the
top, it doesn't matter how big that taste is. If you missed the noodles, you wouldn't even know they were
there. That's the same as dealing with selection bias. Because you didn't select the representative group
of ingredients from the population, you don't get the right idea of what's going on. It limits your ability to
generalize your findings to the general population.
TERM TO KNOW
3. Deliberate Bias
Deliberate bias is exactly what it sounds like: it's a bias that's done on purpose. While deliberate bias doesn’t
happen very often, it can occur when there's a conflict of interest between the people performing research
and the people funding--who are usually the ones benefiting from--that research.
Typically deliberate bias is motivated by an interest unrelated to the integrity of whatever you’re researching.
Most research is done with integrity, but when personal prestige, the advancement of some ideology, or
money get in the way, it’s harder to prove that intentions are pure. Politics can be an industry ripe for
deliberate bias. Perhaps people call with a poll, but the survey includes a leading question to cause the person
to respond in a certain way. When this is done it's called “push polling” and it’s highly suspect.
IN CONTEXT
Deliberate bias can happen in other areas too--even the medical field. Suppose there are two drugs:
Drug A and Drug B. The company for Drug B posed the following leading question:
Based on how this question was posed, Drug B would be more likely to be chosen.
But there’s more. They've put a thought into the participant’s head that Drug A is linked to cancer.
Did they ever explicitly say that? No, they said if it was linked to cancer. However, now they've
placed the association in the participant's mind. Subconsciously they're beginning to steer
consumers away from Drug A and towards Drug B.
If a drug company funds a study to determine if its latest drug is effective, the researchers stand to gain a lot
of money and prestige for having tested the drug, if proven effective. For this reason, they might not be the
best choice to test the drug.
IN CONTEXT
An environmental research group is hired by a real estate developer to investigate the effects of a
new building. If the results are favorable, they might get another contract with that real estate
developer. If the environmental research group doesn’t come through with a favorable
interpretation, another group will, and that group will get the next contract.
The environmental research group wants to be hired by the developer on another project, so there
is a conflict of interest.
TERM TO KNOW
Deliberate Bias
The purposeful misrepresentation of data for the purpose of advancing an agenda.
4. Unintentional Bias
Unintentional bias occurs when there is simply an error in the design of the study. Two types of unintentional
bias include:
Response bias, which involves the wording of questions or refers to people feeling like they have to lie.
Selection bias, which involves how the sample was selected, such as when people are not included in the
selection process, even though they make up a portion of the population.
Both are simply errors with no hidden agenda. They're not intentional and are not meant to purposely steer
the direction of the respondents.
TERM TO KNOW
Unintentional Bias
Bias that is not purposeful. It exists because of errors in the design of the study.
SUMMARY
Selection bias occurs when some subset of the population is left out. It might be intentional or
unintentional. Since some section of the population is left out, the coverage is lacking, which is why
selection bias is also known as “under-coverage”. Random digit dialing is a great tool to use since it
helps extend coverage to mobile phones and unlisted numbers. Deliberate bias, a bias that is done on
purpose, is not typically a cause of concern. Sometimes, however, people with
personal interests, like the advancement of an ideology or financial gain, steer results towards
outcomes that are favorable to them. Most of the time, research is done with integrity. When bias
does occur, it is usually accidental, which is called unintentional bias.
Good luck!
TERMS TO KNOW
Deliberate Bias
The purposeful misrepresentation of data for the purpose of advancing an agenda.
Selection Bias
A bias that results from systematically excluding certain subsets of the population from the sample. It is
not necessarily intentional.
Unintentional Bias
Bias that is not purposeful. It exists because of errors in the design of the study.
Convenience & Self-Selected Samples
by Sophia
WHAT'S COVERED
This lesson will explain two types of samples: convenience and self-selected samples. Our discussion
breaks down as follows:
1. Representative Samples
2. Non-Representative Samples
a. Convenience Samples
b. Self-selected Samples
1. Representative Samples
One of the things that we know about sampling is that it's important for samples to be representative of the
population. A sample with this property is called a representative sample. What we mean by that is when we take our sample,
which is a subset of a larger population, we want this sample to behave just like the population would if we
sampled them all.
⭐ BIG IDEA
The sample should represent the group/population at large, so it’s important individuals are selected
carefully for the sample. That way, accurate information will be gained and can be used to describe the
group/population at large.
The goal is to generalize what is found in the sample and apply it to the people outside of the box, or the
population.
TERM TO KNOW
Representative Sample
A sample that accurately reflects the population.
2. Non-Representative Samples
The two methods analyzed in this tutorial have major flaws: these two designs do not result in representative
samples. They are conducted often, so it’s important for you to recognize them.
a. Convenience Samples
IN CONTEXT
Suppose there is a crowd of people at a mall and there is one guy with a clipboard, and he wants
some data. He might take the people nearest to him, and say, “Hey, would you like to take my
survey, please?”
The people he asks might be representative of the population, but they might not. They all simply
happen to be at the same place at the same time. This means they might have some similarities that
could make them not representative of the larger population. The risk of them not representing the
group/population at large is too high.
EXAMPLE If you ask people about their spending habits, and they all happen to be shopping in
the headphones section, that probably means they have similar ideas about how they should spend
their money.
TERM TO KNOW
Convenience Sample
A sample that is easily obtained. It is often not representative of the population.
b. Self-selected Samples
EXAMPLE If your focus group is about politics, you might get only the very, very liberal people or
the very, very conservative people. You might get the most extreme viewpoints but none of the
viewpoints in the middle. Or, there are also a lot of people who are ambivalent about politics. They
don't really care, but they want to get paid if this is a sample that offers compensation or another type
of reward like free lunch.
TERM TO KNOW
SUMMARY
Representative samples are important if we want to accurately generalize our findings to the
population. Convenience samples consist of people who are simply in the vicinity and happen to be at
the same place at the same time. Self-selected samples are also called “voluntary response” samples and tend
to elicit either strong opinions or no opinion at all.
Good luck!
TERMS TO KNOW
Convenience Sample
A sample that is easily obtained. It is often not representative of the population.
Representative Sample
A sample that accurately reflects the population.
Random and Systematic Errors
by Sophia
WHAT'S COVERED
This tutorial will compare random errors vs. systematic errors. Our discussion breaks down as follows:
1. Random Errors
2. Systematic Errors
1. Random Errors
Random errors are exactly that: random. They can simply occur through no fault of the person taking the
sample. When a sample is taken from a larger population, the results are unknown, meaning that it’s unclear if
the results will accurately represent exactly what the population looks like.
IN CONTEXT
Suppose there were 100 individuals, which we will consider the population. Twenty of them were
college students. You select 5 people out of the overall 100 for a sample. What would you expect to
happen?
Twenty percent of the population are college students, which is one out of
every 5 people. So you would probably expect one individual within your sample of 5 people to be a
college student.
However, that doesn't always happen. You might not get any college students, or all five of them
may be college students. Just because you expect to get one doesn't mean that will actually
happen. Why not?
Let’s say that the individuals with numbers 1 - 20 are the college students. Numbers 21 - 100 are
individuals not in college. Using a random number generator, you might get a simple random sample
that looks like this:
Sample Percentage
Another simple random sample might look like this:
Sample Percentage
However, you might get a simple random sample that looks like this:
Sample Percentage
Here, the second person, number 5, and the fifth person, number 20, are college students, out of
5 individuals in the sample. That’s 40%. What went wrong? Nothing went wrong; it’s just that
random errors happen sometimes.
Random error occurs when the sample, just by chance, doesn't match up perfectly with the population.
Random error is not a mistake that is correctable; it is simply something that happens when sampling
randomly. While it can’t be corrected or avoided completely, the impact can be minimized by increasing the
sample size. The larger the group, the better the chances are that a representative group will be obtained.
EXAMPLE Recall the example from above. Suppose that ten individuals from the group of 100
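were chosen instead of five. Two college students would be expected to make it into the sample. If
the sample were off by one student, the sample percentage would be 10% or 30% instead of 20%, a smaller
miss than the 0% or 40% that the same slip produces in a sample of five, and at least one college student
would still be represented.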
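The effect of sample size on random error can be checked exactly rather than by trial. The sketch below assumes simple random sampling without replacement from the population described above (20 college students among 100 individuals), which means the count of students in the sample follows a hypergeometric distribution. The function name `prob_students_in_sample` is hypothetical.

```python
from math import comb


def prob_students_in_sample(k, sample_size, students=20, population=100):
    """Chance of exactly k college students in a simple random sample."""
    others = population - students
    ways = comb(students, k) * comb(others, sample_size - k)
    return ways / comb(population, sample_size)


for n in (5, 10):
    dist = [prob_students_in_sample(k, n) for k in range(n + 1)]
    expected = sum(k * p for k, p in enumerate(dist))
    # Expected count of students, and the chance of drawing none at all.
    print(n, round(expected, 3), round(dist[0], 3))
```

Under these assumptions, a sample of five misses college students entirely roughly a third of the time, while a sample of ten does so far less often. The random error does not disappear, but it shrinks as the sample grows.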
TERM TO KNOW
Random Error
When the resulting value obtained from the sample does not match the value from the population
simply by chance. This is not a mistake, but is inherent in the variability in sampling.
2. Systematic Errors
Now, by contrast, systematic errors are mistakes. Systematic errors are due to flaws in the design.
IN CONTEXT
Suppose a school board wants to estimate how many students are eligible for free or reduced lunch.
If you have an under-coverage bias, or selection bias, your sample may leave out students from a
poorer neighborhood whose families didn't respond to a questionnaire that was sent out. Perhaps their parents
were working nights and didn’t have time to complete the survey.
Therefore, the board may underestimate the true number of students requiring free and reduced
lunch. This type of error cannot be remedied by increasing the sample size.
EXAMPLE A child has a growth chart in his room and his parents mistakenly put it up above the
baseboard--an extra 2 inches from the floor. This is going to result in the child thinking he’s 2 inches
shorter than he actually is, an example of measurement bias, which is systematically wrong.
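A short simulation shows why taking more measurements does not rescue a systematic error like the misplaced growth chart. This is a sketch under stated assumptions: the child's true height of 48 inches, the small random reading noise, and the function name `measured_heights` are all hypothetical. Only the 2-inch offset comes from the example.

```python
import random
from statistics import mean


def measured_heights(true_height, n, offset=-2.0, noise_sd=0.25, seed=0):
    """Simulate n readings: random noise plus a constant systematic offset."""
    rng = random.Random(seed)
    return [true_height + offset + rng.gauss(0, noise_sd) for _ in range(n)]


true_height = 48.0  # inches, hypothetical
for n in (5, 500):
    error = mean(measured_heights(true_height, n)) - true_height
    print(n, round(error, 2))
```

With more readings the random noise averages out, but the average error settles near the 2-inch offset instead of near zero. The only fix is to correct the measurement itself.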
TERMS TO KNOW
Systematic Error
When the resulting value obtained from the sample does not match the value from the population as a
result of an incorrect measurement or bias. This is a mistake made by the researcher.
Selection Bias
A bias that occurs when certain groups are systematically left out of the sample. This is a systematic
error.
Measurement Bias
A mistake in the measurements taken in the study. This is a systematic error.
SUMMARY
Random errors occur when the sample selected doesn't match up with the population simply by chance. They cannot be
controlled, but using a larger sample will lessen the effect. Conversely, systematic errors result in
wrong answers or wrong values in your sample, due to some kind of bias or error with your
measurement. Increasing the sample size will not fix the issue. When a systematic error occurs, you
might as well just start over, because there's no rescuing poorly collected data!
Good luck!
TERMS TO KNOW
Measurement Bias
A mistake in the measurements taken in the study. This is a systematic error.
Random Error
When the resulting value obtained from the sample does not match the value from the population
simply by chance. This is not a mistake, but is inherent in the variability in sampling.
Selection Bias
A bias that occurs when certain groups are systematically left out of the sample. This is a systematic
error.
Systematic Error
When the resulting value obtained from the sample does not match the value from the population as a
result of an incorrect measurement or bias. This is a mistake made by the researcher.
Margin of Error
by Sophia
WHAT'S COVERED
1. Margin of Error
2. Confidence Interval
1. Margin of Error
You may have seen something in your local newspaper stating that, for example, a political candidate leads
the field by 5%, and that there is a 3% margin of error in the poll. What does this mean?
When surveys are done, collecting enough data, and collecting it well, is important for getting a trustworthy
answer. Sample results are often reported with something called a margin of error, meaning that the results
may be off by a little bit, though we can estimate by how much. It tells the reader that the reported result is
not exact, but it is a close estimate.
IN CONTEXT
Suppose you are an administrator of a school and you need to determine the overall percentage of
left-handed students. Maybe 10% of students in the school are left-handed, but when you take a
sample, even though you were diligent about the way data was collected, you got 8%. The answer
was not accurate. What happened?
It's possible that the sample simply did not match the population exactly. Maybe only 8% of the
people in the sample were left-handed, even though 10% of the population is left-handed.
You didn't do anything wrong, but samples might be
inherently off the mark due to the random selection process.
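The left-handed example can be sketched as a short simulation. The school size of 1,000, the sample size of 50, and the function name are assumptions made for illustration; only the 10% population value comes from the example.

```python
import random

def sample_percent_left_handed(population, sample_size, rng):
    """Draw a simple random sample and return the percent who are left-handed."""
    sample = rng.sample(population, sample_size)
    return 100 * sum(sample) / sample_size

# Hypothetical school of 1,000 students, exactly 10% left-handed (1 = left-handed).
population = [1] * 100 + [0] * 900
rng = random.Random(42)  # fixed seed so the run is repeatable

# Take five separate samples of 50 students each.
results = [sample_percent_left_handed(population, 50, rng) for _ in range(5)]
print(results)  # five estimates of the true 10%
```

Each sample is collected correctly, yet the estimates typically land a little above or below 10%. That gap is random error, not a mistake.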
TERMS TO KNOW
Margin of Error
An amount by which we believe our sample's mean may deviate from the true mean of the population.
Estimate
The mean value obtained from the sample. If the sample was well-collected, the estimate should be
reasonably close to the true value.
2. Confidence Interval
The confidence interval uses both the estimate and the margin of error. When we combine these two parts,
we get a range of plausible values for the true population value.
The confidence level attached to the interval tells us how sure we are that the interval contains the actual
population value.
IN CONTEXT
Suppose a newspaper polled 500 voters and 48% responded that they were going to vote for
Candidate X in the upcoming election. The newspaper might print a margin of error along with that
48% mark; perhaps they use four percentage points as their margin of error. It's not particularly
important how this 4% was calculated, but it is important to note that a margin of error was reported
along with the percent value.
What does this 4% margin of error mean? It means the researchers are pretty confident that the true
percentage of people who will vote for Candidate X is within 4 percentage points of 48%, which means
that it could be as low as 44%, or as high as 52%, or anywhere in between. This range, with some
wiggle room on either side of 48%, is the confidence interval.
Suppose on election day, 46% of the people voted for Candidate X. Since this falls within the range of
44% to 52%, the poll's estimate was consistent with the actual result.
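The arithmetic in this example can be written out in a few lines. The function name is a hypothetical helper, and all values are in percent.

```python
def confidence_interval(estimate, margin_of_error):
    """Return (low, high): the estimate minus and plus the margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error

low, high = confidence_interval(48, 4)   # poll result and margin of error
print(low, high)                         # 44 52

election_result = 46
print(low <= election_result <= high)    # True
```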
⚙ THINK ABOUT IT
What happens to the margin of error as the sample size increases? Will the margin of error go up, down,
or stay about the same?
As the sample size goes up, the margin of error goes down because a larger sample size gives a more
accurate portrait of the population. What’s happening is that you cast a wider net to include people that may
be closer to representing the actual population.
If you had a sample size of 4 people and you wanted to generalize the findings to a population of 200 people,
it’s unlikely that just those four people have enough of the characteristics to represent the population.
However, when the sample size is increased, you get closer to achieving a representative sample, which
means the confidence interval can be narrower; in other words, the larger the sample size, the less wiggle
room is needed on each side of the measurement.
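The lesson does not give a formula for the margin of error, so the sketch below assumes one common choice: the 95% normal-approximation formula for a sample proportion, z * sqrt(p(1 - p) / n) with z = 1.96. It is only a rough guide for very small samples such as 4 people, but it shows the trend.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p from n respondents.

    Assumes the normal approximation; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Same poll result (48%), increasingly large samples.
for n in (4, 50, 500, 5000):
    moe = margin_of_error(0.48, n)
    print(f"n={n:>4}: about +/- {100 * moe:.1f} percentage points")
```

Under this formula the margin shrinks with the square root of the sample size, so quadrupling the sample cuts the margin of error in half.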
TERM TO KNOW
Confidence Interval
A range of potential values that the true value could be. It is obtained by adding and subtracting the
margin of error from the value in the sample.
SUMMARY
Many statistical results are reported alongside a margin of error, which is an amount by which the
sample's mean may deviate from the true mean of the population. If the data is well-collected, then it's
likely that the true population value is within the confidence interval created by the reported value,
plus or minus the margin of error. It's a bad idea to rank two values that fall within the same confidence
interval, since either one could plausibly be the true value. That would be a statistical dead heat.
Good luck!
TERMS TO KNOW
Confidence Interval
A range of potential values that the true value could be. It is obtained by adding the margin of error to,
and subtracting it from, the sample mean.
Estimate
The mean value obtained from the sample. If the sample was well-collected, the estimate should be
reasonably close to the true value.
Margin of Error
An amount by which we believe our sample's mean may deviate from the true mean of the population.
Terms to Know
Absolute Change
The raw increase or decrease in the value of a variable
Accuracy
The extent to which the values, when considered all together, center around the correct
value for a variable.
Available Data
Data collected by some other entity - a government organization or private company.
Bias
The tendency for collected data to differ from what is expected in a systematic way. Biased
data can often favor a specific group of those studied.
Blinding
The practice of making sure that certain individuals do not know which subjects are
receiving which treatment.
Census
Using the entire population to obtain data
Closed Question
A question type with a limited set of answer choices.
Cluster Sample
A sampling method where the population is separated into groups, typically geographically,
and a random selection of clusters is made. Each individual in the cluster becomes part of
the sample.
Clusters
Smaller subgroups of the population, not necessarily similar in any way besides all being
together in one place, making the individuals easier to sample together.
random.
Confidence Interval
A range of potential values that the true value could be. It is obtained by adding the
margin of error to, and subtracting it from, the sample mean.
Confounding
Occurs when the effects of the treatments, if any, are indistinguishable from the potential
effects of some other variable which was unaccounted for.
Confounding Variable
A variable which was not accounted for in a study, which limits the conclusions that the
study can draw.
Continuous Data
Data that can take any value within an interval.
Control
The principle of experimental design that requires that other variables which may confound
the experiment be held constant between the treatment groups, so that any differences in
the groups can be attributed to the different treatments.
Convenience Sample
A sample that is easily obtained. It is often not representative of the population.
Data
Information used in a study to answer a statistical question
Deliberate Bias
The purposeful misrepresentation of data for the purpose of advancing an agenda.
Descriptive statistics
Using only the information at hand to describe the selected group of individuals
Discrete Data
Data that can take only distinct, countable values.
Double-Blind Experiment
An experiment where neither the subjects, nor anyone in contact with them, has any
knowledge of which subjects are receiving which treatment.
Estimate
The mean value obtained from the sample. If the sample was well-collected, the estimate
should be reasonably close to the true value.
Experiment
A type of study where researchers impose treatments on the participants or experimental
units.
Experimental Design
The way in which an experiment is carried out. A good design has key elements of
randomization, replication, and control.
Experimental Unit
An animal or thing involved in an experiment.
Explanatory Variable
A variable that we believe is predictive of something else. An increase in this variable will
correspond to an increase or decrease in some other variable.
Hawthorne Effect
People have the tendency to change their behavior when they know they are being
monitored.
Index Number
A way to measure the relative change in a value, usually the price of a good or service, over
time. If the index number is over 100, that means the price has increased. If the price has
decreased, then the index number will be less than 100.
Inferential statistics
Using the information at hand to make a larger, more general statement about the entire
population of individuals
Inflation
A relative increase in the price of a good or service over time. A person will need to pay
more to receive the same good or service than they did at a previous point in time.
Margin of Error
An amount by which we believe our sample's mean may deviate from the true mean of the
population.
Matched-Pair Design
An experimental design where two subjects who are similar with respect to variables that
could affect the outcome of the experiment are paired together, then one of them is
assigned to one treatment and one is assigned to the control. This can also be done by
assigning each subject to both treatments, where each subject acts as their own matched-
pair.
Measurement Bias
A mistake in the measurements taken in the study. This is a systematic error.
Multi-Stage Sampling
A sampling design which combines elements of cluster sampling, stratified random
sampling, and simple random sampling. It "zooms in" on smaller areas to sample so that
sampling becomes more feasible.
Nominal Data
Categorical data with qualities that cannot be ordered or ranked.
Nonresponse Bias
Bias that occurs when the people who were unable to be reached or unwilling to participate
in a sample have substantially different opinions than the people who were included in the
sample, resulting in a misrepresentation of the population.
Observational Study
A type of study where researchers can observe the participants, but not affect the behavior
or outcomes in any way.
Open Question
A question type with no answer choices; the respondent can choose what he or she wants
to say to answer the question.
Ordinal Data
Categorical data with qualities that can be ordered or ranked.
Participation Bias
Bias that occurs when participation in a study is voluntary. People who feel strongly may be
the only participants.
Percent Change
A relative increase or decrease in a percent value
Percentage Points
An absolute increase or decrease in a percent value.
Placebo
An inert drug or treatment given to the control group. It has no active ingredient in it.
Placebo Effect
The observed phenomenon whereby certain individuals will exhibit a desired response
even when taking a placebo, which contains no active ingredient.
Population
The entire set of individuals from which to sample
Precision
The extent to which the values are very close to each other, even if they are not near the
correct value.
Prospective Study
A study that begins by selecting participants, then tracks them and keeps data on the
subjects as they go into the future.
Publication Bias
The desire of researchers (and research publications) to only print the most sensational or
interesting articles.
Random Digit Dialing
A method of contacting people on the phone. Random numbers are dialed, so this allows
researchers to sample people with unlisted phone numbers.
Random Error
When the resulting value obtained from the sample does not match the value from the
population simply by chance. This is not a mistake, but is inherent in the variability in
sampling.
Random Sample
A sample that has been selected in a manner where every member of the population has
some predetermined chance of being selected for the sample
Random Selection
The method of obtaining a random sample
Randomization
The principle of experimental design that requires that the subjects/experimental units be
assigned to groups using some random process. This ensures that the two groups are
roughly equal prior to assigning treatments.
Raw Data
Data that is unorganized, unprocessed, and not summarized. Typically, this is data that is not
already available.
Reference Value
An arbitrarily chosen starting value for an index. It is assigned an index number of 100.
Relative Change
The percent increase or decrease in the value of a variable.
Replication
Repeating the experiment on multiple subjects/experimental units. This principle of
experimental design states that a larger experiment with more subjects/experimental
units will allow us to more clearly see differences between the treatments.
Representative Sample
A sample that accurately reflects the population
Response Bias
Bias that occurs when a respondent tells the interviewer "what they want to hear" or lies
due to the sensitive nature of the question.
Response Variable
A variable that is affected by the explanatory variable.
Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand
how they became the way they are in the present.
Sample/Sampling
A subset of the population. There are many ways to select a sample.
Selection Bias
Selecting a sample in such a way that certain subsets of the population are systematically
excluded.
Single-Blind Experiment
An experiment where either the subjects have no knowledge of which subjects are
receiving which treatment, or people in contact with the subjects have no knowledge of
which subjects are receiving which treatment, but not both.
Statistical analysis
All the ways of collecting, analyzing, and interpreting the data
Statistical study
A way to collect information from individuals
Statistics
The study of collecting, analyzing, interpreting, and presenting information
Stratum/Strata
The homogeneous groups in a stratified random sample. All individuals in each stratum have
something in common, and we would like to see how that affects the outcome of the
sample.
Subjects/Participants
The people or things being examined in an observational study.
Survey Design
The way the survey is set up. This deals with the wording of questions and answer choices.
Survey/Sample Survey
A data collection tool that individuals in a study can fill out and return to the researcher.
Systematic Error
When the resulting value obtained from the sample does not match the value from the
population as a result of an incorrect measurement or bias. This is a mistake made by the
researcher.
Treatment
Something the researchers administer to the subjects or experimental units.
Unintentional Bias
Bias that is not purposeful. It exists because of errors in the design of the study.
Variable
Any attribute or number that can be measured about individuals in a study.
Variable of Interest
Any variable which we need to know about in the context of a study.
Variables of Interest
The variables the survey wishes to measure about those taking the survey.
Formulas to Know
Absolute Change
Index Number
Relative Change
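The three formulas named above can be sketched in code, following the glossary definitions: absolute change is the raw difference, relative change is that difference as a percent of the reference value, and an index number is the value as a percent of the reference value (so the reference value itself gets 100). The function names and prices are hypothetical.

```python
def absolute_change(reference_value, new_value):
    """Raw increase or decrease in the value."""
    return new_value - reference_value

def relative_change(reference_value, new_value):
    """Percent increase or decrease relative to the reference value."""
    return (new_value - reference_value) / reference_value * 100

def index_number(reference_value, value):
    """Value expressed relative to a reference value that is assigned 100."""
    return value / reference_value * 100

# Hypothetical prices: $2.00 in the reference year, $2.50 now.
print(absolute_change(2.00, 2.50))  # 0.5
print(relative_change(2.00, 2.50))  # 25.0
print(index_number(2.00, 2.50))     # 125.0
```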