Unit 1 Tutorials: Key Principles of Statistical Methods
INSIDE UNIT 1

Statistics Fundamentals

Statistics Overview
Data
Qualitative and Quantitative Data
Discrete vs. Continuous Data

Sampling

Sampling
Random & Probability Sampling
Simple Random and Systematic Random Sampling
Stratified Random and Cluster Sampling
Multi-Stage Sampling

Experiments

Observational Studies and Experiments

Prospective and Retrospective Studies
Experimental Design
Randomized Block Design
Completely Randomized Design
Matched-Pair Design
Surveys
Blinding
Placebo

Data

Variables
Question Types
Accuracy and Precision in Measurements
Absolute Change and Relative Change
Using Percentages in Statistics
Index Number and Reference Value

Bias
Nonresponse and Response Bias
Selection and Deliberate Bias
Convenience & Self-Selected Samples
Random and Systematic Errors
Margin of Error

Statistics Overview
by Sophia

 WHAT'S COVERED

This lesson will provide you with an overview of what statistics really is by exploring:

1. Statistics
2. Types of Statistics

1. Statistics
You might be wondering, what is statistics? Is it some complicated formula? Is it some goofy graph that you
really don't know that much about?

When people refer to statistics, they're usually referring to information called data that's been collected and
synthesized within a statistical study, and sometimes presented in graphical form, such as a bar chart.

© 2023 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 2
A single graph can present a LOT of information at once.

It can also be presented numerically such as "The median household income in the United States is $46,326."

Video Transcription
[MUSIC PLAYING] The practice of statistics deals with these four concepts here. Collect, analyze,
interpret, and present. You begin by collecting information from a variety of sources. You then proceed
to analyze that information that you've collected. After that, you interpret what that analysis means and
then you present it in a way that anyone can understand. And in this course you're going to learn how to
do all those things, and if I may try to be honest-- though as a robot, I can't fully experience the feeling of
honesty-- I do understand statistics quite well.

And I must say it's a really neat way to describe our messy world. It's not pretty all the time, but statistics
allow us a way to simplify things.

[MUSIC PLAYING]

 STEP BY STEP

The practice of statistics deals with four main steps:

1. Collect. Collect the information from a variety of sources.
2. Analyze. Analyze the information that you've collected.
3. Interpret. Interpret what that analysis means.
4. Present. Present it in a way that anyone can understand.

Statistics is a neat way to describe a messy world. It's not pretty all the time, but statistics gives us a way to
simplify things.

 TERM TO KNOW

Statistical study
A way to collect information from individuals

2. Types of Statistics
When you use descriptive statistics, you analyze what's going on at a particular point and use statistics to
describe the information that you've obtained.

On the other hand, when you use inferential statistics, you are going to use statistics that you've obtained and
make a generalization about the population at large.

IN CONTEXT
Let's say that you read the newspaper this morning and discovered that the average household
income in the United States was reported to be $46,700.

This information didn't come from surveying every household in the United States. It wouldn't be realistic
or feasible to knock on every door and speak to all those people. But someone arrived at this number. So,
how did they get it?

Well, a sample was taken, and a generalization was made about the entire United States based on
that sample.

This is inferential statistics.
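The difference between the two kinds of statistics is in how a number is used, not in how it is calculated. The sketch below makes that concrete with simulated data: the population of 20,000 household incomes is entirely hypothetical (generated with a log-normal distribution, an assumption chosen only because incomes are skewed), and the sample size of 500 is an arbitrary illustrative choice.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 20,000 household incomes (simulated, not real data)
population = [rng.lognormvariate(10.7, 0.6) for _ in range(20_000)]

# In practice we only ever see a sample, never the whole population
sample = rng.sample(population, 500)

# Descriptive: summarize only the 500 households we actually surveyed
sample_mean = statistics.mean(sample)
print(f"Mean income of the 500 sampled households: ${sample_mean:,.0f}")

# Inferential: use that same number as an estimate for all 20,000 households
population_mean = statistics.mean(population)  # normally unknown; shown only to compare
print(f"Estimated population mean: ${sample_mean:,.0f} (true value: ${population_mean:,.0f})")
```

The two printed means will usually be close but not identical; that gap is the price of generalizing from a sample instead of knocking on every door.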

 TERMS TO KNOW

Descriptive statistics
Using only the information at hand to describe the selected group of individuals.

Inferential statistics
Using the information at hand to make a larger, more general statement about the entire population of
individuals.

 SUMMARY

Statistics allows us to synthesize the information we get from the world around us. There are two
types of statistics. Descriptive statistics describe information gathered at a particular point. Inferential
statistics use the information gathered to make a generalization or prediction about the population.

Good luck!

Source: This work is adapted from Sophia author Jonathan Osters. Bar chart, CC,
https://en.wikipedia.org/wiki/Chart#/media/File:Black_cherry_tree_histogram.svg (no modifications made)

 TERMS TO KNOW

Descriptive statistics
Using only the information at hand to describe the selected group of individuals

Inferential statistics
Using the information at hand to make a larger, more general statement about the entire population of
individuals

Statistical analysis
All the ways of collecting, analyzing, and interpreting the data

Statistical study
A way to collect information from individuals

Statistics
The study of collecting, analyzing, interpreting, and presenting information

Data
by Sophia

 WHAT'S COVERED

This lesson will introduce the collection and evaluation of data including:

1. Defining Data
2. Evaluating Types of Data
3. Gathering Data

1. Defining Data
Data are the pieces of information that we use to answer a statistical question. A piece of data could be a
number or an attribute.

Ultimately, data are the pieces of information that we use to get a more accurate picture of a scenario. Every
piece of data helps us get a more accurate description, which raises the question: how do you obtain data?
Where does it come from? Do you just make it up? Where do you find it?

 TERM TO KNOW

Data
Information used in a study to answer a statistical question.

2. Evaluating Types of Data

There are two types of data that can serve your purposes: available data and raw data. Often, the easier route
is to use data someone else has already collected. Available data is data that has already been collected by
somebody else.

Now, who collects data? Well, a lot of places collect data, such as:

Government organizations
Polling organizations
News sources
Government entities
Private entities

The vast majority of sources are trustworthy. However, when using available data, it's important to think
critically about what the information is trying to convey. It’s essential to break apart the information and ask
yourself these questions:

Who collected it?
Are they reputable?
Are they trustworthy?
When was it collected?
How was it collected?
Why did they collect it?

So, how do you know when you need to gather the information yourself? Data you gather yourself is called
raw data. Obviously, if the population behind the available data doesn't match your topic of interest, then that
data is of no value to you, and you need to gather your own.

But what about less obvious characteristics such as whether or not a source has an agenda? This is a key
point here. Having an agenda, whether intentional or not, can introduce what's called bias.

Polling organizations, news organizations, and government entities usually try to do the best job they can to
get relevant information, and bias is usually not introduced intentionally. Sometimes, though, it is, when a
source is trying to push some kind of agenda.

 TERMS TO KNOW

Available Data
Data collected by some other entity - a government organization or private company.

Raw Data
Unorganized, unprocessed, and not summarized. Typically, this is data that is not already available.

Bias
The systematic favoring of certain outcomes in a study. There are many ways to introduce bias into a
study.

3. Gathering Data
If you choose to collect your own data, you must think critically and ask yourself these questions:

Who will receive this data?

For whom is the data intended?
How will you and others gain access to it?

Collecting data is important because data is the source of statistics. Think of data as the raw material for
creating something useful. If you collect your data well, your statistics are going to be accurate. If you collect
your data poorly, your statistics will be poor, and there's no rescuing them.

⭐ BIG IDEA

You can't make useful statistics out of poor data. Thinking critically will help you determine which type of
data should be used for your purposes.

 SUMMARY

This tutorial defined data as “information used in a study to answer a statistical question.” We
discussed how to evaluate types of data, available or raw, and questions focusing on the who, what,
why, and how should be posed to help identify bias. When gathering your own data, it’s important to
understand your audience and consider how they will gain access to all your hard work.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Available Data
Data collected by some other entity - a government organization or private company.

Bias
The systematic favoring of certain outcomes in a study. There are many ways to introduce bias into a
study

Data
Information used in a study to answer a statistical question

Raw Data
Unorganized, unprocessed, and not summarized. Typically, this is data that is not already available

Qualitative and Quantitative Data
by Sophia

 WHAT'S COVERED

In this tutorial, you're going to learn about the difference between qualitative data and quantitative
data by examining:

1. Qualitative Data
a. Nominal Measurements
b. Ordinal Measurements
2. Quantitative Data
3. Qualitative and Quantitative Data in Practice

1. Qualitative Data
Qualitative data is also often called “categorical data.” It is not numerical in the sense that we can perform
numerical operations on it, like adding numbers together or finding an average; rather, each value places an
individual in a category.

 EXAMPLE Gender: male and female. That's a qualitative variable with two categories.
Letter grades and zip codes are also qualitative. Zip codes feature numbers, but you wouldn’t do arithmetic
with them. You wouldn’t find an average zip code, for instance, because the purpose of zip codes is to divide
areas into categories. Hair color is another example of qualitative data: you can put people with black hair in
one group and people with blonde hair in another.

It's important to know that qualitative data can be divided further into two categories:

Nominal Measurements
Ordinal Measurements

 TERM TO KNOW

Qualitative/Categorical Data
Data whose values are the names of categories. These can be numbers, but not the kinds of numbers
with which it makes sense to do any numerical operations.

1a. Nominal Measurements

 EXAMPLE Favorite color. The order of the listed categories makes no difference. It doesn't matter
if you put the colors below in the order of the color spectrum or not.

With nominal data, it only makes sense to report which category has the largest frequency. In this case,
let’s say most people say that green is their favorite color. That is what you would report, and it doesn’t matter
where green happens to appear in the list of colors.
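Finding the most frequent category can be sketched in a few lines of Python. The survey responses below are hypothetical, made up only for illustration; `collections.Counter` tallies each category, and `most_common(1)` returns the category with the largest frequency.

```python
from collections import Counter

# Hypothetical favorite-color survey responses (illustrative only)
responses = ["blue", "green", "red", "green", "purple", "green", "blue", "orange"]

counts = Counter(responses)
most_common_color, frequency = counts.most_common(1)[0]
print(most_common_color, frequency)  # green 3

# For nominal data the listing order is irrelevant: reversing it changes no count
assert Counter(reversed(responses)) == counts
```

Because the categories have no natural order, rearranging them (as the last line checks) leaves the frequencies, and therefore the answer, unchanged.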

 TERM TO KNOW

Nominal Level of Measurement

Qualitative data where the order in which the categories are presented does not matter.

1b. Ordinal Measurements

 EXAMPLE Rating scale. The order of the listed categories is very important because the order is
associated with a type of value. It’s very important that you don’t mix up the order here because the
circle on the furthest left indicates you are feeling no pain.

Pain Scale
❍ ❍ ❍ ❍ ❍ ❍ ❍
No Pain (far left), Moderate Pain (middle), Worst Pain (far right)

With ordinal data, it’s important to keep the categories in order, because the order expresses a spectrum
ranging from lowest to highest, or worst to best, as in rating scales like this one.

 TERM TO KNOW

Ordinal Level of Measurement

Qualitative data where the order in which the categories are presented matters.
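To see why the order must be kept straight, the sketch below sorts some responses two ways. The five-level pain labels and the patient responses are hypothetical (the scale in the text has seven unlabeled circles); the point is that sorting by position on the scale preserves the lowest-to-highest spectrum, while alphabetical sorting scrambles it.

```python
# Hypothetical five-level pain scale, listed from lowest to highest (illustrative labels)
PAIN_SCALE = ["no pain", "mild pain", "moderate pain", "severe pain", "worst pain"]
RANK = {label: position for position, label in enumerate(PAIN_SCALE)}

# Hypothetical patient responses
responses = ["severe pain", "no pain", "moderate pain", "mild pain", "moderate pain"]

# Alphabetical order ignores the meaning of the scale
alphabetical = sorted(responses)
# Sorting by position on the scale keeps the lowest-to-highest spectrum intact
by_scale = sorted(responses, key=RANK.get)
# "Highest" only makes sense relative to the scale's order
highest = max(responses, key=RANK.get)

print(alphabetical)  # ['mild pain', 'moderate pain', 'moderate pain', 'no pain', 'severe pain']
print(by_scale)      # ['no pain', 'mild pain', 'moderate pain', 'moderate pain', 'severe pain']
print(highest)       # severe pain
```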

2. Quantitative Data
On the other hand, you have quantitative data. Quantitative data are expressed numerically. It makes sense
to do numerical operations with it, like finding averages or adding them together.

Examples of quantitative data include:

Weight
Commute time to work
Outdoor temperature

All of these are measured in numbers. It makes sense to find, for instance, averages of these. So you can do
numerical operations with them.

It's important to note that qualitative and quantitative data are displayed differently, and the statistical
operations you can perform depend on the type of data that you have.
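A short sketch of the distinction, using hypothetical values: averaging commute times gives a meaningful answer, while the software will just as happily average zip codes and return a number that describes nothing. The code cannot tell the two apart; deciding whether an operation makes sense is up to you.

```python
import statistics

# Hypothetical commute times in minutes: quantitative, so an average is meaningful
commute_minutes = [30.1, 32.3, 28.5, 35.0]
average_commute = statistics.mean(commute_minutes)
print(f"Average commute: {average_commute:.2f} minutes")

# Hypothetical zip codes: Python will compute an average,
# but the result has no meaning, because zip codes are really categories
zip_codes = [10001, 60614, 94103]
meaningless_average = statistics.mean(zip_codes)
print(meaningless_average)
```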

 TERM TO KNOW

Quantitative Data
Data whose values are numbers and it makes sense to do numerical operations.

3. Qualitative and Quantitative Data in Practice

Determine if each situation is qualitative or quantitative data.

Video Transcription
[MUSIC PLAYING] Here we have some examples to help you understand the differences between
qualitative and quantitative data. So first, we have blood type. That's going to be an example of
qualitative data. It's a description. It's telling you something about yourself, but it's not something that can
be added, or subtracted, or used for arithmetic even.

Number of kids, on the other hand, is quantitative data. Or how about a phone number? Even though it's a
number, it's still qualitative data because, really, who would ever add or subtract those values? So what's
another example of quantitative data? How about something like income? Income is quantitative data
because, again, it's a value that's giving us a quantity. It's telling us how much money you make, and
that's a value you could add, subtract, and use to find the mean and other arithmetic measures.

[MUSIC PLAYING]

 SUMMARY

Data used in statistics falls under one of two broad classifications: categorical, which is called
“qualitative,” or numerical, which is called “quantitative.”

Qualitative data branches out even further into nominal, where only the names of the categories
matter, and ordinal, where the order of the categories matters.

For data to be quantitative, it must make sense to do numerical operations with its values. The two
types of data are treated differently when organizing graphical displays and applying statistics to them.

Good luck!

Source: This work is adapted from Sophia author Jonathan Osters.

 TERMS TO KNOW

Nominal Data
Categorical data with qualities that cannot be ordered or ranked.

Ordinal Data
Categorical data with qualities that can be ordered or ranked.

Qualitative (Categorical) Data

Data that describes. It can't be measured or used for arithmetic.

Quantitative (Numerical) Data

Data that is numerical. It can be measured and it can be used for arithmetic.

Discrete vs. Continuous Data
by Sophia

 WHAT'S COVERED

This tutorial will discuss types of data by contrasting the following:

1. Discrete Data
2. Continuous Data
3. Discrete and Continuous Data in Practice

1. Discrete Data
Discrete and continuous data are both numerical, or quantitative, data, but discrete data can only take on
certain values within a range. One example of discrete data is the number of pets someone has. That can
only take whole number values; you can't have half of a pet.

Other examples are the number of rail cars on a train and shoe sizes. Shoe sizes do come in half sizes, but
that's all you can have. You can't have quarter sizes, eighth sizes, or 0.01 sizes; you can't say that you're a
size 9 and an eighth. So there are only certain values that shoe size can take. That makes it discrete.

 TERM TO KNOW

Discrete Data
Data that can only take so many different values.
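Because shoe sizes come only in whole and half sizes (the assumption stated in the text), every possible size in a range can be listed. The hypothetical helpers below do exactly that and check whether a given value is one of the allowed sizes.

```python
def possible_shoe_sizes(low, high):
    """List every shoe size from low to high, assuming only whole and half sizes exist."""
    steps = round((high - low) * 2)  # number of half-size steps in the range
    return [low + 0.5 * i for i in range(steps + 1)]

def is_valid_shoe_size(size):
    # A whole or half size doubles to a whole number; 9.125 doubles to 18.25, which does not
    return float(size * 2).is_integer()

print(possible_shoe_sizes(8, 10))  # [8.0, 8.5, 9.0, 9.5, 10.0]
print(is_valid_shoe_size(9.5))     # True
print(is_valid_shoe_size(9.125))   # False ("size 9 and an eighth")
```

The finite list is what makes the data discrete: between 8 and 10 there are exactly five possible values, and nothing in between them.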

2. Continuous Data
The difference between discrete and continuous data is that continuous data can take any value within a
range. Some examples of continuous data are temperature, commute time, and weight. Each of these can take
on any value within a range. For instance, suppose you're talking about daytime
temperature.

The daytime temperature could be something between 50 and 80 degrees on a summer's day, and it takes
on any value between those. Same with commute time. One day it might take you 30 minutes and five
seconds to get to work. The next day it might take you 32 minutes and 17 seconds.

And weight: one person might weigh 150.75 pounds, and another might weigh 102.62 pounds. These
measurements can take on any value within a spectrum, whereas discrete data can only take certain values
within a spectrum.

 TERM TO KNOW

Continuous Data
Data that can take any value within an interval.
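One way to see "any value within an interval" is that between two continuous values there is always another. The sketch below uses the two commute times from the text and repeatedly takes midpoints; in code, Python's `timedelta` stores durations to the microsecond, so the halving eventually hits that resolution, but the commute time itself has no such limit.

```python
from datetime import timedelta

# The two commute times mentioned in the text
day1 = timedelta(minutes=30, seconds=5)
day2 = timedelta(minutes=32, seconds=17)

# Halfway between them is another possible commute time...
midpoint = (day1 + day2) / 2
print(midpoint)  # 0:31:11

# ...and halfway between day1 and that midpoint is yet another.
quarter_point = (day1 + midpoint) / 2
print(quarter_point)  # 0:30:38
```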

3. Discrete and Continuous Data in Practice

Determine if each situation is discrete or continuous.

Video Transcription
[MUSIC PLAYING] Now, let's take a look at a few examples and determine if a situation is discrete, or
continuous. The time it takes to complete a race-- is this discrete or continuous? The time to complete a
race or any task is continuous data. Time can take on any value. You can measure the time it takes to
finish a race in hours, minutes, seconds, even fractions of a second.

The number of pairs of shoes you own-- discrete or continuous? This is discrete. You can't have half a
pair. OK, I suppose if you lose a shoe, you can have half a pair. But then, it's no longer a pair. Am I right?
Whatever the case, your number of pairs of shoes is not any number within a certain range. Your number
of shoes is a specific whole number, which is therefore a discrete number.

The time it takes for a light bulb to burn out-- is this a discrete or continuous number? This would be
continuous data. It could take any length of time for your light bulb to burn out, from 0 seconds up to
many years. How about the number of green chocolate candies in a bag? Is that discrete or continuous?
If you said discrete, you're correct. You typically would be dealing with only whole number values,
unless the poor bag of candy is crushed.

Barometric pressure-- is this discrete or continuous? You should have said that barometric pressure is
continuous because it can take any value within a certain range, usually somewhere around 30 inches of
mercury (inHg).

 TRY IT

Question: Barometric pressure. Discrete or is it continuous?

Answer: You should have said that barometric pressure is continuous because it can take any value
within a certain range, usually somewhere around 30 inches of mercury.

Question: The number of pairs of shoes someone owns?

Answer: Discrete. You can't have half a pair (I suppose you can have half a pair of shoes if you've lost one),
and you can't have just any number of pairs of shoes within a certain range. Typically, it takes only whole
number values.

Question: What about the time for a light bulb to burn out?
Answer: That's continuous. It could take any length of time from zero seconds all the way to a couple of
years.

Question: Number of green M&Ms in a bag?
Answer: Discrete. Typically, again, we're dealing only with whole number values.

 SUMMARY

Quantitative data can be broken down into two subcategories. If it can take on any value in a range, it
is called continuous; if it can only take certain values, it is called discrete. Every quantitative
measurement we get is either continuous or discrete. The terms we used are continuous data, which
can take on any number in a range, and discrete data, which can only take on certain values. This
tutorial also put discrete and continuous data into practice to allow for some application!

Good Luck!

Source: This work is adapted from Sophia author Jonathan Osters.

 TERMS TO KNOW

Continuous Data
Data that can take any value within an interval.

Discrete Data
Data that can only take so many different values.

Sampling
by Sophia

 WHAT'S COVERED

In this tutorial, you're going to learn all about sampling, focusing on:

1. Population and Census

2. Sample

1. Population and Census

Sampling always starts with a population. Population is the complete set of all the things that are being
studied.

Typically, the population we wish to generalize our findings to is a large one, such as the population of the
United States, the population of the world, or the population of a state. Examining all members of such a
population, a method called a census, may not be feasible. Instead, we hope that a smaller group of people
can represent the population.

Since the population of the United States is too big to work with as an example, a smaller example of billiard
balls will be used instead. Here, the complete set of things being studied is the 15 billiard balls on a pool
table.

With a group so small, it's possible to take all of them and define some attribute of them like color, or weight,
or what have you--whether they're striped or solid, there are lots of different ways that you could describe
each pool ball. And it's easy enough just to take the entire population and examine all of them.

Census
Using the entire population to obtain data.

Population
The entire set of individuals from which to sample.

2. Sample
When you think about the United States example, you can see that it's not really always feasible. Suppose
your population is a large group of people, much larger than 15 people. It's kind of a big group, and it might be
hard to get answers from everybody.

What you might choose to do is take a small subset of those individuals and make a sample. For example,
you might choose seven of the many individuals in the population. A sample is a subset of the population: you
would obtain data from that subset and leave everyone else out.

From that sample, you would obtain your data and calculate your statistics. Ideally, the sample is a small
version of the population, a microcosm of it, so that the statistics calculated from the sample's data are about
the same as what we would have gotten if we had measured the population directly. That's what we mean
when we say that we want the sample to be a representative sample of the population.

There are ways to make it likely that a sample will be representative. One way is to take the entire population
and put them in a hat.

Now again, this is a lot easier with billiard balls than it is with people. But imagine putting all the billiard balls
into the hat.

Let’s say you shake up the hat, and take out a sample of five.
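Drawing five balls from the hat can be sketched with Python's standard `random.sample`, which picks distinct items so that no ball can be drawn twice in one shake. Numbering the balls 1 to 15 is the only assumption here.

```python
import random

balls = list(range(1, 16))        # the 15 billiard balls, numbered 1-15
sample = random.sample(balls, 5)  # five distinct balls: one shake of the hat
print(sorted(sample))
```

Each run produces a different sample, just as each shake of the hat would.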

There are also ways to guarantee that you won't get a representative sample. Suppose I specifically
cherry-picked only solid colored billiard balls. That wouldn't be very representative of the population of
15.

⚙ THINK ABOUT IT

Is it possible that when you take that hat and pull out five billiard balls, all five of them are solid? Sure,
that's possible; it's just not all that likely. Cherry-picking, however, is not a good idea, because you're
deliberately choosing a sample that is not representative.
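"Not all that likely" can be made precise by counting samples. The sketch assumes a standard pool set in which balls 1 to 8 are solid (counting the black 8-ball as solid) and balls 9 to 15 are striped; `math.comb(n, k)` counts the ways to choose k items from n.

```python
from math import comb

SOLIDS = 8   # balls 1-8, counting the black 8-ball as solid (assumption)
STRIPES = 7  # balls 9-15
TOTAL = SOLIDS + STRIPES
SAMPLE_SIZE = 5

total_samples = comb(TOTAL, SAMPLE_SIZE)  # every possible sample of five
p_all_solid = comb(SOLIDS, SAMPLE_SIZE) / total_samples
p_all_striped = comb(STRIPES, SAMPLE_SIZE) / total_samples

print(total_samples)            # 3003
print(round(p_all_solid, 4))    # 0.0186
print(round(p_all_striped, 4))  # 0.007
```

So a random draw produces an all-solid sample fewer than 2 times in 100, while cherry-picking produces one every time.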

Sample/Sampling
A subset of the population. There are many ways to select a sample.

Representative Sample
A sample that accurately reflects the population.

 SUMMARY

A census is a way of collecting data that uses everybody. And a sample only uses some. To
generalize the findings from the sample to the population at large, it has to be representative of your
population at large. Once again, the terms that we've described in this tutorial are population, census,
the noun sample, and the verb sampling, and the idea that a sample should be representative.

Good luck!

Source: This work is adapted from Sophia author Jonathan Osters.

 TERMS TO KNOW

Census
Using the entire population to obtain data

Population
The entire set of individuals from which to sample

Representative Sample
A sample that accurately reflects the population

Sample/Sampling
A subset of the population. There are many ways to select a sample.

Random & Probability Sampling
by Sophia

 WHAT'S COVERED

This tutorial covers random and probability sampling methods, focusing on:

1. Random Sample
2. Probability

1. Random Sample
The term “random” is used a lot in everyday speech, but what does it mean when it comes to statistics? In
statistics, random refers to something that is unpredictable and does not have a recognizable pattern.

With a random sample, every member of the population has the same chance of getting selected. This is the
best way to get a representative sample. Recall that a representative sample is one that has the same relevant
characteristics as the population.

If you want a random sample, you would need to select participants in such a way that every member of that
population has an equal chance of being selected for the sample. This is also known as random selection.

You need to come up with a method to achieve a random sample, and you can do that with a probability
sampling plan. This plan must be made before a random sample can be taken. A probability sampling plan
can also “weight” certain people so that they are more likely to be selected for the sample; what matters is that
each member's chance of selection is set in advance.

IN CONTEXT
What does a random sample look like in context? Suppose there are 15 billiard balls from a pool
table:

You place them all in a hat, and you shake the hat, and voila, here's a sample of five.

We got ball numbers 1, 5, 7, 10, and 14.

Suppose you place the billiard balls back in the hat and shake the hat for a second time.

Shake #2

This is another sample of five, and it is not that different from the previous one. If you conducted the
same hat trick over and over again, every ball would have an equal chance of being pulled.

Let's shake the hat for a third time.

Shake #3

What happened here was we got balls 9, 11, 12, 13, and 14--all of which happened to be striped
billiard balls. No solids. If you only had access to this information, you might be led to believe that all
the balls in the hat were striped, which wouldn't be the case.

This may seem odd, but it can certainly happen even though you selected these randomly using a
probability sampling plan. The reason is that this sample of five is just as likely to be chosen as any
other sample of five.
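The claim that every ball has an equal chance can be checked by simulation: repeat the hat trick many times and count how often each ball is drawn. Since each shake takes 5 of 15 balls, each ball should turn up in about 5/15, or one third, of the shakes. The number of repetitions (30,000) and the seed are arbitrary choices for this sketch.

```python
import random
from collections import Counter

rng = random.Random(2023)
balls = list(range(1, 16))  # billiard balls numbered 1-15
trials = 30_000

times_drawn = Counter()
for _ in range(trials):
    times_drawn.update(rng.sample(balls, 5))  # one shake of the hat

for ball in balls:
    print(ball, round(times_drawn[ball] / trials, 3))  # each close to 5/15
```

Individual shakes can still come out all striped; the equal chances only show up across many repetitions.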

Random Sample
A sample that has been selected in a manner where every member of the population has some
predetermined chance of being selected for the sample.

Random Selection
The method of obtaining a random sample.

Probability Sampling Plan

The way to collect a random sample that guarantees a certain likelihood for each member of the
population to be selected.
Might you get something that's unrepresentative? Yes. But the vast majority of the time, it will be
representative.

 SUMMARY

The best method for selecting a representative sample is to take a random sample using a probability
sampling plan. This won't always get you a representative sample, but most of the time it will.

Good luck!

Source: This work is adapted from Sophia author Jonathan Osters.

 TERMS TO KNOW

Probability Sampling Plan

The way to collect a random sample that guarantees a certain likelihood for each member of the
population to be selected

Random Sample
A sample that has been selected in a manner where every member of the population has some
predetermined chance of being selected for the sample

Random Selection
The method of obtaining a random sample

Simple Random and Systematic Random Sampling
by Sophia

 WHAT'S COVERED

This lesson will explain how to ensure everyone in the population has an equal chance of
participating in a sample, specifically focusing on:

1. Simple Random Sample

a. Random Number Generator
b. Random Number Table
2. Systematic Random Samples

1. Simple Random Sample

A Simple Random Sample (SRS) is a sampling method that not only ensures that everyone in the population
has an equal chance of being in the sample, but also that every sample is equally likely to be the sample that's
being selected.

If you’ve ever taken part in a raffle, you’ve experienced a simple random sample. What generally happens at
these events is that someone takes the raffle tickets and puts them into a bucket.

The tickets are mixed up in the bucket, and one ticket is pulled out. The owner of that ticket usually wins
some kind of fantastic prize. Now, being in a simple random sample is pretty much the same thing. The only
difference is that instead of winning the prize, you get to be part of the sample and that's your prize.

IN CONTEXT
Suppose you take billiard balls from a pool table and put those all into a hat.

Next, shake it up, and pour out five billiard balls. Do this for two shakes.

Shake #1 Shake #2

You may have noticed that the solid, yellow “1” ball was in both of these first two samples.
However, that doesn't mean it's any more likely to be selected than any of the other balls; it has the
same likelihood. The first and second samples of five were equally likely, as is any other sample of
five.

Let's shake the hat for a third time.

Shake #3

Now, notice that all five of these were striped billiard balls, not one solid ball in the bunch. Is that
unusual? Sure. But this particular sample of five is exactly as likely as any other particular sample of
five; all-striped samples are rare only because very few of the possible samples of five are all
striped. Just because a sample looks strange doesn't mean it can't happen.

Knowing how to take a simple random sample, abbreviated SRS, is important because most of the inferences
we make about a population assume that the data were collected in this way. So names in a hat are fine, as
are, in our case, raffle tickets in a bucket or billiard balls in a hat.

 TERM TO KNOW

Simple Random Sample (SRS)

A method of selection that guarantees that every sample of a certain size has an equal chance of
being the selected sample
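The SRS definition is stronger than "every individual has an equal chance": every possible *sample* must be equally likely. A small simulation can check this. The population of five labeled people and the sample size of two are hypothetical, chosen so that there are only 10 possible samples; `random.sample` draws without replacement, and each resulting sample is stored as a `frozenset` so that order within the sample is ignored.

```python
import random
from collections import Counter
from itertools import combinations

rng = random.Random(0)
population = ["A", "B", "C", "D", "E"]  # hypothetical population of five people
sample_size = 2
trials = 50_000

# Tally how often each unordered sample of two is drawn
counts = Counter(
    frozenset(rng.sample(population, sample_size)) for _ in range(trials)
)

all_samples = [frozenset(c) for c in combinations(population, sample_size)]
for s in all_samples:
    print(sorted(s), round(counts[s] / trials, 3))  # each should be near 1/10
```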

1a. Random Number Generator

But what about situations where we don't have the manpower to pull numbers or names from a hat?
There are two other ways to take a simple random sample: a random number generator and a random
number table. First, we'll discuss the random number generator.

 EXAMPLE Suppose that we want to take a sample of 100 individuals from a population of 2,000
people. Below you will see some of those individuals lined up, and you can imagine that individuals 10
through 1,995 are somewhere in the middle. Each person is assigned a unique number, so no two people
share the same number.

To generate the random numbers, you can use technology: search "random number generator" on the
internet, and several websites will come up. Or, you can use a calculator, such as the Texas
Instruments graphing calculator shown here:

The command “randInt” stands for “random integer” (an integer is a whole number). The command
randInt(0, 1, 5) picks integers from 0 to 1, so each result is either 0 or 1, and the third number tells
the calculator how many integers you want; in this case, five. Here, though, you don't want numbers
between 0 and 1, and you don't want five of them. You want numbers from 1 to 2,000, one for each person,
and you want 100 of them. So why ask the calculator for 150 numbers when you only want 100?

You can't select one person twice, so repeats must be ignored. If you had asked for only 100 numbers
instead of 150, it's very likely there would have been at least one repeat in the bunch.

Finally, you select the individuals who correspond to the first 100 different numbers that were
picked.

So, person number 8, the person who corresponds to 1,119, and the person who corresponds to 1,996 are
a few of those chosen. Notice that the number 8 came up a second time; you can see that it's listed
twice. You're not going to select that person twice, because they've already been selected once, so
the repeat is crossed out. This is the reason 150 numbers were generated: it leaves room to cross out
repeats and still end up with 100 different people.
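The crossing-out procedure can be sketched in Python, with `random.randint` standing in for the calculator's randInt command. This is a hypothetical illustration: it assumes the people are labeled 1 through 2,000, and the function names are made up for this example. It also computes the chance of at least one repeat if you ask for exactly 100 numbers, which comes out to roughly 0.92.

```python
import random

POP_SIZE = 2000    # people labeled 1 through 2,000 (assumption)
SAMPLE_SIZE = 100

def prob_all_distinct(draws, n_values):
    """Chance that `draws` independent, equally likely picks from n_values
    labels contain no repeats (the 'birthday problem' calculation)."""
    p = 1.0
    for i in range(draws):
        p *= (n_values - i) / n_values
    return p

def srs_by_crossing_out(rng, pop_size=POP_SIZE, k=SAMPLE_SIZE, draws=150):
    """Mimic the calculator approach: generate `draws` random integers, which
    may repeat, then keep the first k different ones (repeats are crossed out)."""
    numbers = [rng.randint(1, pop_size) for _ in range(draws)]  # randint includes both ends
    chosen, seen = [], set()
    for x in numbers:
        if x in seen:
            continue            # already selected: cross it out
        seen.add(x)
        chosen.append(x)
        if len(chosen) == k:
            return chosen
    raise ValueError("too many repeats; generate more numbers")

if __name__ == "__main__":
    p_repeat = 1 - prob_all_distinct(SAMPLE_SIZE, POP_SIZE)
    print(f"P(at least one repeat among 100 numbers) = {p_repeat:.2f}")
    sample = srs_by_crossing_out(random.Random(42))
    print(len(sample), len(set(sample)))   # 100 100
    # random.sample draws without replacement, so no crossing out is needed.
    direct = random.Random(42).sample(range(1, POP_SIZE + 1), SAMPLE_SIZE)
    print(len(direct))                     # 100
```

In practice, `random.sample` never produces repeats, so it does the whole job in one call; the longer version is shown only because it mirrors the calculator steps.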

 TERM TO KNOW

Random Number Generator

A method of collecting a sample that utilizes technology to select random numbers corresponding to
individuals in the population.

1b. Random Number Table

Using a random number table is basically the same idea, though it is a bit more cumbersome. It is
generally used when no technology is available, and as you'll soon see, it is a long process that is
more time-consuming than using a random number generator. Each individual is assigned a unique number,
just as with the random number generator; however, every member's number must have the same number
of digits.

The numbering used with the random number generator won't work as-is, because the number 2,000 has
four digits while the number 1 has only one. Every label must have the same number of digits, so
instead of 1, it's 0001. Instead of 2, it's 0002, and so forth, all the way up to 2000. A table of random
digits can be found in a textbook or online. Digits will be read four at a time, because each
individual's label has four digits.

 EXAMPLE Suppose the first four numbers found were 1-9-2-2. That corresponds to someone in the
list: there is a person numbered 1922, so that individual will be selected for the sample. It's circled in
green below because a person corresponds to that number. The next four digits are 3-9-5-0. No one
on the list corresponds to the number 3950, so it is ignored. The next number, 3-4-0-5, does not
correspond to an individual either, so it is ignored as well.

You'll notice that all the numbers circled in red are labels that aren't assigned to anyone on the list.
This makes the process very cumbersome: it goes on for a while until 100 individuals are obtained. Will
this work? It will, but it might take a very long time.

One of the numbers circled in green is 0001. This is the very first person on the list, and it just happens that
person 0001 will be among the sample. This individual will be selected along with everyone else whose four-
digit number was selected.

 TERM TO KNOW

Random Number Table

A method of collecting a sample that uses a table of random digits to select numbers corresponding to
individuals in the population. Each individual is assigned a number, and numbers are then selected from
the table.

2. Systematic Random Sampling

There is one thing to know about systematic sampling right off the bat: it is not inherently random. You have to
be very careful about this. A systematic random sample involves choosing a value, "k," and then selecting
every "k"th individual in the population, similar to counting off by 3s to create teams in elementary
school.

The value of "k" can be anything. You could choose every second individual, in which case all the green
people are in, and all the black stick figures are out. Or, you could choose every third person: one
person is in, the next two are skipped, the fourth person is in, the next two are skipped, and so on. Or,
you could choose every fourth person.

People often prefer systematic samples to simple random samples because systematic samples are so much
easier to take. Instead of getting a whole list of people and assigning everyone a number, or putting all
the people's names in a hat, you simply take every fifth person, or whatever you decide "k" should be.

 HINT

The nice thing about a systematic sample is that it can be tailored to fit your sample size. If you wanted a
sample of 25 from 500 individuals, you could sample every 20th person since 500 divided by 25 equals
20. So you would obtain your sample of 25 by sampling every 20th person.
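The arithmetic in the hint, k = 500 ÷ 25 = 20, can be written as a short function. Choosing the starting point at random among the first k individuals is a common convention but is an assumption here, since the lesson's examples simply count off from the beginning of the list. The function name is illustrative.

```python
import random

def systematic_sample(population, n, rng=random):
    """Take every kth member, with k = len(population) // n.
    The start is a random position among the first k members (an assumed
    convention; the lesson's examples count off from the start of the list)."""
    k = len(population) // n
    if k < 1:
        raise ValueError("sample size cannot exceed the population size")
    start = rng.randrange(k)          # 0-based index from 0 to k - 1
    return population[start::k][:n]   # every kth member from the start

people = list(range(1, 501))          # 500 individuals, numbered 1 to 500
k = len(people) // 25
print(k)                              # 20
sample = systematic_sample(people, 25, random.Random(3))
print(len(sample))                    # 25
```

Slicing with a step of k (`population[start::k]`) does the counting off; the final `[:n]` only matters when the population size is not an exact multiple of n.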

IN CONTEXT

Suppose that you have 20 students in a class, sitting in rows, who were assigned to their desks randomly.
In that case, you could count off every fourth student and send those five students to the chalkboard
to work a homework problem.

1 2 3 ✘ 5

6 7 ✘ 9 10

11 ✘ 13 14 15

✘ 17 18 19 ✘

So, persons one, two, and three don't have to do it. Person number four heads up to the chalkboard
to work on a problem. Five, six, and seven don't have to do it, but number eight does. The ✘ marks
show the pattern of who needs to go up to the chalkboard.

What if they were alphabetized instead of randomly assigned?

Abbott  Acosta  Adams  Adamson ✘  Adler

Anderson  Bueller  Frye ✘  Grey  Jones

McClurg  Morris ✘  Peterson  Pickett  Rooney

Ruck ✘  Sara  Sheen  Stein  Ward ✘

By selecting Adamson, you automatically know who all the rest of the selected people are going to be.
Since Adler sits right next to Adamson, you know that Adler won't get chosen. Nor will Anderson or
Bueller, but Frye will.

If these students had been randomly assigned to their seats, picking Adamson would not predetermine
which other people would be selected for the sample. Having them alphabetized, however, takes the
randomness out of the selection process.
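The alphabetized classroom can be simulated to show the difference. Counting off every fourth seat in alphabetical order always picks the same five students; shuffling the seating first (here with `random.shuffle`, standing in for random seat assignment) makes the selection depend on chance. The helper name `every_kth` is hypothetical.

```python
import random

# The 20 students from the example, listed alphabetically.
students = ["Abbott", "Acosta", "Adams", "Adamson", "Adler",
            "Anderson", "Bueller", "Frye", "Grey", "Jones",
            "McClurg", "Morris", "Peterson", "Pickett", "Rooney",
            "Ruck", "Sara", "Sheen", "Stein", "Ward"]

def every_kth(seats, k):
    """Count off 1..k and select whoever is in seat k, 2k, 3k, ..."""
    return seats[k - 1::k]

# Alphabetized seating: the same five students are chosen every time.
print(every_kth(sorted(students), 4))
# ['Adamson', 'Frye', 'Morris', 'Ruck', 'Ward']

# Random seating: shuffle a copy first, then count off.
seats = students[:]
random.shuffle(seats)
print(every_kth(seats, 4))   # usually a different group of five on each run
```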

 TERM TO KNOW

Systematic Random Sample

A sampling method where every "k"th individual is selected for the sample (e.g. every 2nd, 4th, 20th
individual).

 SUMMARY

A simple random sample is the ideal sampling method if your goal is to obtain a representative
sample. Sometimes, with big populations, it's not feasible to put everyone's name into a hat, so other
tools or sampling methods may be used. A random number generator, typically on a calculator or
website, is a fast way to produce random integers without needing to give every individual a label
with the same number of digits. A random number table is more time-consuming and is generally used
when technology is not available. A systematic sample can be similarly valid, and it is much easier to
perform. It involves taking every "k"th individual; however, the population must be in random order
before the systematic selection. Otherwise, it won't be considered random.

Good luck!

Source: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS.

 TERMS TO KNOW

Random Number Generator

A method of collecting a sample that utilizes technology to select random numbers corresponding to
individuals in the population

Random Number Table

A method of collecting a sample that uses a table of random digits to select numbers corresponding to
individuals in the population. Each individual is assigned a number, and numbers are then selected from the table.

Simple Random Sample

A method of selection that guarantees that every sample of a certain size has an equal chance of being
the selected sample

Systematic Random Sample

A sampling method where every "k"th individual is selected for the sample (e.g. every 2nd, 4th, 20th
individual)

Stratified Random and Cluster Sampling
by Sophia

 WHAT'S COVERED

This tutorial will cover the topic of stratified random sampling, which is a random sampling procedure
that subdivides the population into groups. In addition, we will introduce cluster samples. This lesson
will focus on:

1. Stratified Random Samples

2. Cluster Samples
3. Real-World Comparison

1. Stratified Random Samples

Suppose a high school has just adopted a new, healthy lunch provider, and they would like to solicit student
opinion on the healthy lunch options. The school has a total of 420 students: 100 freshmen, 110 sophomores,
120 juniors, and 90 seniors.

How would a simple random sample look?

For a simple random sample of 42 students, think of ways that 42 students could be chosen, each having an
equal chance of being selected. First, assign each student a unique number from 1 to 420 (the total number of
students). Once this is done, you could:

Use a random number generator to select 42 numbers, ignoring repeats. The students who
correspond to those numbers will be surveyed about the school's new, healthy options.
Put the 420 student names in a hat and draw out 42.

Now, is there a way to improve the study and guarantee an accurate cross-section of students
across the grades? After all, freshmen might feel differently about the healthy options than seniors, so it
is important to have individuals from each grade weigh in on the lunch options.

This can be done with a stratified random sample. Stratified random sampling is a method where the
population is subdivided into groups called strata. Strata are groups whose members share a characteristic.
The population is separated according to a characteristic that we think might affect the results, so that
the sample doesn't end up with too many (or too few) individuals who have that characteristic.

In the above example, it would look something like this: since 42 is 10% of the school's population, your survey
should be 10% of each grade.

10% of the freshmen class of 100 is 10, so you would want to randomly select ten individuals from the
freshman class to participate.

10% of the sophomore class of 110 is 11, so you would want to randomly select 11 individuals from the
sophomore class to participate.
10% of the junior class of 120 is 12, so you would want to randomly select 12 individuals from the junior
class to participate.
10% of the senior class of 90 is 9, so you would want to randomly select nine individuals from the senior
class to participate.

Once the groups are in place, a simple random sample is carried out within each stratum, like putting names
in a hat or assigning everyone a unique number and randomly selecting numbers. You can have as many
strata as you please, but they must be roughly homogeneous.
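The proportional allocation for the lunch survey, followed by a simple random sample inside each grade, might look like this in Python. The student rosters are hypothetical consecutive ID numbers, the function names are illustrative, and rounding is used to turn each grade's 10% share into a whole number, which happens to be exact here.

```python
import random

# Enrollment by grade (the strata), from the lunch-survey example.
strata_sizes = {"freshman": 100, "sophomore": 110, "junior": 120, "senior": 90}
TOTAL_SAMPLE = 42

def proportional_allocation(sizes, n_total):
    """Give each stratum the same sampling fraction as the whole population
    (42 / 420 = 10% here). Rounded counts may not always sum to n_total."""
    population = sum(sizes.values())
    return {name: round(n_total * size / population) for name, size in sizes.items()}

def stratified_sample(rosters, allocation, rng):
    """Take a simple random sample (without replacement) inside each stratum."""
    return {name: rng.sample(rosters[name], allocation[name]) for name in rosters}

allocation = proportional_allocation(strata_sizes, TOTAL_SAMPLE)
print(allocation)  # {'freshman': 10, 'sophomore': 11, 'junior': 12, 'senior': 9}

# Hypothetical rosters: student IDs numbered consecutively across the school.
rosters, next_id = {}, 1
for name, size in strata_sizes.items():
    rosters[name] = list(range(next_id, next_id + size))
    next_id += size

sample = stratified_sample(rosters, allocation, random.Random(0))
print(sum(len(ids) for ids in sample.values()))  # 42
```

With other enrollments, the rounded counts may not add up exactly to the desired total, and a small adjustment would be needed.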

Video Transcription
[MUSIC PLAYING] Pretend you've subdivided billiard balls into low, middle, and high numbers. To take a
stratified random sample of the 15, this is what you do. Put all the low numbered balls in hat one. Put all
the middle numbered balls in hat two. And finally, put all the high numbered balls in hat three.

At that point, you'd randomly select two from each hat. The result would give you a stratified random
sample of six billiard balls. You're guaranteed to have exactly two low numbers, exactly two middle
numbers, and exactly two high numbers.

 TERM TO KNOW

Stratified Random Sample

A random sampling method where individuals are separated into homogeneous groups, then simple
random samples are taken within each group.

Stratum/Strata
The homogeneous groups in a stratified random sample. All individuals in each stratum have
something in common, and we would like to see how that affects the outcome of the sample.

2. Cluster Samples
When using a cluster sample, the population is divided into groups. These groups are called clusters. It's
important to note that these groups are natural groupings. Their members don't necessarily have anything
in common other than, typically, location. So we take a random sample of clusters instead of a
random sample of individuals.

Each individual in the cluster is going to be part of the sample if we select that cluster. So unlike the groups in
a stratified random sample, the groups in a cluster sample aren't based on a characteristic or variable. The
individuals in the cluster just happen to be near each other.

IN CONTEXT

Suppose you work at a potato chip company and it’s your job to implement some quality control in
the manufacturing department. Maybe you stand at the start of the assembly line and take a simple
random sample of individual chips. That would work just fine.

However, it might be easier for you to sample some bags of chips. The bags of chips are clusters.
You would then take a bag of chips off the assembly line and sample every chip in that bag for
quality control. That’s cluster sampling.
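For the potato chip example, a cluster sample chooses whole bags at random and then inspects every chip in each chosen bag. The production run below (50 bags of 30 chips) is made up for illustration, and `cluster_sample` is a hypothetical helper.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters; every member of a chosen cluster is included."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical production run: 50 bags of chips, 30 chips per bag.
bags = {f"bag{b:02d}": [f"bag{b:02d}-chip{c:02d}" for c in range(1, 31)]
        for b in range(1, 51)}

selected = cluster_sample(bags, 3, random.Random(11))
chips_to_inspect = [chip for chips in selected.values() for chip in chips]
print(len(selected), len(chips_to_inspect))   # 3 90
```

The randomness happens only once, at the bag level; after that, nothing is left to chance, because every chip in a chosen bag is inspected.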

Like every sampling method, cluster sampling has pros and cons.

Advantages and Disadvantages of Cluster Sampling

Advantages
Easier than a simple random sample, and often less costly
Typically gives results similar to a simple random sample, because the clusters are fairly heterogeneous

Disadvantages
Risk that the clusters are NOT heterogeneous: members of a cluster may share some characteristic, other
than simply being located together, that might affect the sample's findings

 TERMS TO KNOW

Cluster Sample
A sampling method where the population is separated into groups, typically geographically, and a
random selection of clusters is made. Each individual in the cluster becomes part of the sample.

Clusters
Smaller subgroups of the population, not necessarily similar in any way besides all being together in
one place, making the individuals easier to sample together.

3. Real-World Comparison
Suppose a landlord of an apartment complex wants to know whether a new carpet he's considering is
appropriate for all the apartments in the building. Each of the four floors has eight apartments.

⚙ THINK ABOUT IT

What would a simple random sample look like? How might a cluster sample be different from a stratified
random sample?

Simple Random Sample: He could randomly select eight apartments from the building.
Stratified Random Sample: He could randomly select two apartments per floor.
Cluster Sample: He could take a spinner like the one shown below and spin it.

Suppose it landed on three. That means that every apartment on the third floor would receive the new carpet.
He doesn't have to send the carpet installers to scattered rooms on all the different floors. He can
simply instruct everyone to go up to the third floor and install carpet in every room on that floor,
which would be far easier and cheaper for him.
But what if the floors were NOT heterogeneous, and one floor had something in common that the rest of
the building didn't? What if the apartments on the third floor allowed pets? The carpet might not hold up
as well there. That's one of the disadvantages of cluster sampling in action. Typically, though, the
clusters are fairly representative, and the results are very similar to those of a simple random sample.
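The three designs for the landlord can be compared side by side. In this sketch the 32 apartments are labeled (floor, unit); floors serve as strata in the stratified design and as clusters in the cluster design, and `randint` plays the role of the spinner. The labels and the seed are arbitrary choices for illustration.

```python
import random

FLOORS, UNITS_PER_FLOOR = 4, 8
# Each apartment is labeled (floor, unit); (3, 5) is unit 5 on floor 3.
apartments = [(f, u) for f in range(1, FLOORS + 1) for u in range(1, UNITS_PER_FLOOR + 1)]
rng = random.Random(5)

# Simple random sample: any 8 of the 32 apartments.
srs = rng.sample(apartments, 8)

# Stratified random sample: floors are strata; 2 chosen at random from each floor.
stratified = []
for f in range(1, FLOORS + 1):
    floor_units = [a for a in apartments if a[0] == f]
    stratified.extend(rng.sample(floor_units, 2))

# Cluster sample: floors are clusters; "spin" for one floor and take all 8 units.
spun_floor = rng.randint(1, FLOORS)
cluster = [a for a in apartments if a[0] == spun_floor]

for name, chosen in [("SRS", srs), ("stratified", stratified), ("cluster", cluster)]:
    floors_covered = sorted({floor for floor, _ in chosen})
    print(f"{name}: {len(chosen)} apartments on floors {floors_covered}")
```

Each design examines eight apartments, but they differ in how many floors are covered: the stratified sample always covers all four floors, the cluster sample covers exactly one, and the simple random sample covers whatever floors chance produces.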

 SUMMARY

In a stratified random sample, the population is broken down into homogeneous groups called
"strata." The reason for this is to keep a characteristic that might affect the results from being
over- or underrepresented in the sample. The idea is to force individuals into groups and then
take a simple random sample within each of the strata. Cluster sampling, on the other hand, is done
by taking naturally occurring groups, typically based on location, and taking a simple random
sample of the clusters. Then, each member of a selected cluster becomes part of the sample. Two
advantages of cluster samples are that they are more cost-effective and usually achieve results
similar to those of a simple random sample. The disadvantage is that sometimes a cluster may not be
heterogeneous, as seen in the landlord example, where the third floor allowed pets.

Good luck!

 TERMS TO KNOW

Clusters
Smaller subgroups of the population, not necessarily similar in any way besides all being together in one
place, making the individuals easier to sample together.

Stratified Random Sample

A random sampling method where individuals are separated into homogeneous groups, then simple
random samples are taken within each group.

Stratum/Strata
The homogeneous groups in a stratified random sample. All individuals in each stratum have something
in common, and we would like to see how that affects the outcome of the sample.

Multi-Stage Sampling
by Sophia

 WHAT'S COVERED

This tutorial will introduce multi-stage sampling, focusing specifically on:

1. Comparing Sampling Methods

2. Multi-Stage Sampling

1. Comparing Sampling Methods

Suppose that you wanted to sample the United States as a whole.

Can you perform a simple random sample (SRS)?

You'd have to somehow account for every person in the United States, assign each of them a number,
and pull numbers out of a hat or use some other random sampling procedure. Assigning a number to
everyone in the country would be far too difficult.

Can you perform a stratified random sample?

Strata, in this case, are still too big. You might take a few people from Maine, a few people from
Minnesota, a few people from North Dakota, and so on, but each state is still too large a population to
sample from directly. Plus, it really wouldn't be cost-effective to travel to all these different places.

Can you perform a cluster sample of states?

If you identified states as clusters, you would randomly select some of the clusters and then sample everyone
within that cluster. You'd be sampling entire states. For example, everyone in North Carolina would be in the
sample if you select that state as a cluster, which simply isn't feasible.

None of these approaches really makes sense here. The way out of the box is a multi-stage design.

2. Multi-Stage Sampling
Multi-stage sampling is a common sampling procedure utilized when the population is very, very large. With
multi-stage sampling, you continue zooming in from larger areas to smaller and smaller areas until you can
find a small enough sample of the people you need.

To perform multi-stage sampling, first select clusters, then take a simple random sample from within each cluster.

Let's take a look at an example:

Video Transcription
[MUSIC PLAYING] Suppose you want to sample the United States as a whole. Because of geographic
simplicity, states make the most sense as clusters. If every state needs to be represented, a stratified
random sample should be performed. However, it's not realistic or feasible to sample everyone within
each state. So in this instance, you can randomly select five states to make up the clusters for your multi-
stage sample. Of these five states, you pick one to begin the process.

Let's say you start with Minnesota. And because it's equally unrealistic to sample everyone in a state,
you continue to narrow down your population with a random selection of counties. You once again
select five. If you were able to sample everyone in these counties, you can stop. But if you still need a
smaller sample size, randomly choose just one, such as Carver County. Then you can randomly select
three towns within that county.

Again, if those are small enough units, you can stop. However, if the sample size is still too large,
continue to narrow it down by selecting just one town, like Chaska. Within Chaska, for example, you can
sample some neighborhoods.

Typically, by the time you get to the neighborhood level, it's easy enough to walk around and get almost
everybody within that neighborhood. This method of drilling down from state to county to town to
neighborhood would give you a multi-stage sample of your first cluster, Minnesota. Then it's on to the
next cluster, where you would repeat the process with the remaining four to achieve a multi-stage
sampling of the United States.

Step 1: States
When sampling the United States as a whole, states make the most sense as clusters because of
geographic simplicity. It’s not realistic or feasible to sample everyone within a state, so randomly select
just five states: California, Tennessee, Minnesota, Massachusetts, and Oklahoma. Pick one state and start
the process.
Step 2: Counties
It is equally unrealistic to sample everyone in Minnesota, so you can narrow your sample by randomly
selecting counties. Perhaps you select Carver County, Marshall County, and maybe a few other counties. If
that's a small enough basis for you to get everyone within the county, then you can stop.
Step 3: Towns
If you need yet a smaller sample size, you can choose just one county, like Carver County, and sample
towns within that county. Perhaps you randomly select three of those towns: Chanhassen, Waconia, and
Chaska. If those are small enough units, then you can stop.
Step 4: Neighborhoods
However, if the sample size is still too large, you can continue to narrow it down. Within Chaska, for
example, you can sample some neighborhoods. Typically by the time you get to neighborhoods within a
town, it's easy enough to walk around the neighborhood and get almost everybody within that
neighborhood.
Now you can move on to the next cluster, where you would repeat this process with the remaining four states.
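The drilling-down idea can be sketched as a recursive function that takes a simple random sample of sub-units at each stage and then includes everyone at the last level. Everything about the frame is hypothetical (50 states, each with 5 counties of 4 towns, each town with 3 neighborhoods of 10 households), and the per-stage counts are an assumption; the walkthrough above sometimes narrows to a single county or town instead.

```python
import random

def build_frame():
    """Hypothetical nested frame: 50 states > 5 counties > 4 towns >
    3 neighborhoods > 10 households. All sizes are made up for illustration."""
    return {
        f"s{s}": {
            f"c{c}": {
                f"t{t}": {
                    f"h{h}": [f"s{s}-c{c}-t{t}-h{h}-house{i}" for i in range(10)]
                    for h in range(3)
                }
                for t in range(4)
            }
            for c in range(5)
        }
        for s in range(50)
    }

def multistage_sample(units, stage_sizes, rng):
    """At each stage, take a simple random sample of the sub-units
    (states, then counties, then towns, then neighborhoods). Once every
    stage is used up, include everyone in the selected unit."""
    if not stage_sizes:
        return list(units)              # final level: every household
    k, remaining = stage_sizes[0], stage_sizes[1:]
    chosen = rng.sample(sorted(units), min(k, len(units)))
    sample = []
    for name in chosen:
        sample.extend(multistage_sample(units[name], remaining, rng))
    return sample

frame = build_frame()
# 5 states, 2 counties per state, 3 towns per county, 2 neighborhoods per town
households = multistage_sample(frame, [5, 2, 3, 2], random.Random(1))
print(len(households))   # 600 = 5 * 2 * 3 * 2 * 10
```

Only the final level is taken in full; every level above it is reached through a random selection, which is what keeps the design feasible for a population as large as the whole country.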

 TERM TO KNOW

Multi-Stage Sampling
A sampling design which combines elements of cluster sampling, stratified random sampling, and
simple random sampling. It "zooms in" on smaller areas to sample so that sampling becomes more
feasible.

 SUMMARY

Multi-stage sampling is used when the population is so big, and the groups (strata or clusters) so
large, that it makes more sense to zoom in and take small groups. You begin with certain clusters, and
then you sample within those clusters instead of taking the full cluster. Multi-stage sampling therefore
combines elements of cluster sampling, stratified designs, and simple random designs. Those three designs
were contrasted in this tutorial, and, as you may recall, none of them was feasible on its own for
sampling the United States.

Good luck!

Source: THIS WORK IS ADAPTED FROM SOPHIA AUTHOR JONATHAN OSTERS. MN MAP:
HTTPS://EN.WIKIPEDIA.ORG/WIKI/LIST_OF_COUNTIES_IN_... CARVER COUNTY:
HTTPS://EN.WIKIPEDIA.ORG/WIKI/LIST_OF_COUNTIES_IN_...

 TERMS TO KNOW

Multi-Stage Sampling
A sampling design which combines elements of cluster sampling, stratified random sampling, and simple
random sampling. It "zooms in" on smaller areas to sample so that sampling becomes more feasible.

Observational Studies and Experiments
by Sophia

 WHAT'S COVERED

This tutorial will explore observational studies and how they are conducted. We will also cover
experiments, which are a little different than observational studies, through the exploration of:

1. Observational Studies
2. Types of Observational Studies
3. Experiments
4. Experiments vs. Observational Studies

1. Observational Studies
An observational study is a type of study where the researcher can observe but does not administer any
treatment. Therefore, whatever would normally happen, the researcher has to allow it to happen.

Researchers can't change anything about the people or subjects they are studying. The researcher can
record the variables of interest, but again, can't affect the study. People have to be allowed to do whatever it
is they were going to do without interruption.

 TERM TO KNOW

Observational Study
A type of study where researchers can observe the participants, but not affect the behavior or
outcomes in any way.

2. Types of Observational Studies

There are two types of observational studies:

Retrospective Study: Researchers look to the past to see what has already happened; also known as a
case-control study.

 EXAMPLE Consider people who are sick, called the cases, and people who aren't sick, called the
controls. Then, you look back to see what the cases have in common with each other and what the
controls have in common with each other.

Prospective Study: Researchers select individuals to participate and record what happens as it happens;
also known as a longitudinal study.

 EXAMPLE Suppose individuals are engaging in activities like smoking or jogging. You record what
happens as it happens, as opposed to trying to look back and figure it out.

IN CONTEXT
The year is 1929, and a cancer doctor suspects that smoking may cause cancer. His cancer
patients become his subjects, or participants, in his study. He asks his subjects, “Did you happen to
smoke before you got cancer?” What he found was that an overwhelming majority of his cancer patients
did, in fact, smoke. This doctor was one of the first people to suggest a link between
smoking and cancer.

That inspired some new studies, one of which began in 1934. It dealt with several thousand doctors,
so it was a physician’s smoking study. The reason doctors were chosen is that doctors are usually
very diligent about following protocols, meaning that those who smoked would likely continue to
smoke, and those who didn't smoke would likely continue not smoking. Also, doctors typically
wouldn't drop out of a study. Notice in the image below how some of these physicians smoked, and
some of them did not.

They did the study, and some of the doctors got cancer. Now, not every doctor who smoked ended
up getting cancer, and not every person who got cancer was a smoker. However, what they found
was that the vast majority of the time, it was the doctors who smoked that got cancer.

This study was conducted over a long period of time: 20 years. At its conclusion, it provided the
most convincing evidence yet that smoking had an effect on cancer. This was an example of a
prospective study because it started with the doctors and followed them through to 1954.

It is important to note, however, that neither of these types of studies, prospective or retrospective, can
actually prove a cause-and-effect relationship. The only thing that can prove a cause-and-effect relationship
between two variables is an experiment.

 TERMS TO KNOW

Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they
became the way they are in the present.

Prospective Study
A study that begins by selecting participants, tracking them, and keeping data on the subjects as they
go into the future.

Subjects/Participants
The people or things being examined in an observational study.

3. Experiments
An experiment is a different type of study than an observational study. The differences will be covered in
detail shortly, but essentially, the researchers are allowed to impose treatments on the participants.
Treatments are administered and response to those treatments is measured. Because the researchers are the
ones implementing the treatments and measuring the response, a cause-and-effect relationship between
variables can be determined.

When discussing experiments, there is some very common terminology that you should be aware of. For
example, as mentioned in the section above, the terms subjects and participants are used interchangeably to
describe people involved in an experiment. If animals or things are used in an experiment, they are referred to
as experimental units. While that may seem a bit impersonal, it is standard terminology in the field of experiments.

 TERMS TO KNOW

Experiment
A type of study where researchers impose treatments on the participants or experimental units.

Experimental Unit
An animal or thing involved in an experiment.

4. Experiments vs. Observational Studies

In an observational study, the researcher observes the individuals but does not administer treatment. The
researcher simply has to allow what would normally happen to happen. Again, they can record variables of
interest, but not affect them in any way. The researcher is not necessarily an active participant in the study,
other than by observing and recording.

An experiment, on the other hand, is far more active on the part of the researcher. The researcher is creating
the differences between the two groups, then determining whether or not there is a cause-and-effect
relationship.

If you'd like to do a study but can't perform it as an experiment due to ethical or practical concerns, or
because it would take too much time or money, you can often circumvent those concerns by doing an
observational study instead.

⚙ THINK ABOUT IT

When trying to determine if cigarette smoking causes cancer, several observational studies have been
conducted, but never a true experiment. Why would that be?

Well, it would be unethical to divide people into groups and have one group smoke cigarettes in order
to determine whether smoking causes terminal illness. The same applies to alcohol consumption.

⭐ BIG IDEA

There are certain instances in which an observational study will be preferred over an experiment because of
factors like time, money, and privacy (for example, when people are unlikely to divulge certain types of information).

 SUMMARY

An observational study is a type of study where the researcher can observe but not influence the
behavior of the participants, or subjects. A retrospective study involves looking back at behavior,
while a prospective study involves gathering your participants and following them along as they live
their lives. An observational study, though, cannot prove a cause-and-effect relationship.

Conversely, in an experiment, a researcher can directly influence the subjects by applying treatments.
Because the researchers are the ones implementing the treatments and measuring the response, a
cause-and-effect relationship between variables can be determined. Terminology such as subjects
and participants is important to know since it identifies individuals directly involved in the experiment.
Animals may be directly involved in an experiment, but they are referred to as experimental units
rather than subjects or participants.

Sometimes an experiment may be unethical, expensive, or too lengthy. In those cases, observational
studies may be used, which allow a researcher to study occurrences in a natural setting without
administering treatment of any kind.

© 2023 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 42

Good luck!

 TERMS TO KNOW

Experiment
A type of study where researchers impose treatments on the participants or experimental units.

Experimental Unit
An animal or thing involved in an experiment.

Observational Study
A type of study where researchers can observe the participants, but not affect the behavior or outcomes
in any way.

Prospective Study
A study that begins by selecting participants, then tracks them and keeps data on the subjects as they
go into the future.

Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they
became the way they are in the present.

Subjects/Participants
The people or things being examined in an observational study.

Prospective and Retrospective Studies
by Sophia

 WHAT'S COVERED

This tutorial will explore observational studies and how they are conducted. We will also touch on experiments,
which are a little different from observational studies, through the exploration of:

1. Observational Studies
2. Types of Observational Studies
a. Retrospective Study
b. Prospective Study

 TERM TO KNOW

Observational Study
A type of study where researchers can observe the participants, but not affect the behavior or
outcomes in any way.

2. Types of Observational Studies

There are two types of observational studies:

2a. Retrospective Study

A retrospective study (the case-control study is a common example) looks to the past to see what has
already happened.

It can be similar to a matched-pair design in an experiment, but in this case, the researchers are not giving a
treatment or doing anything to affect the people.

 EXAMPLE In a study, suppose you take a pair of participants, who are similar across most
variables except for one major difference: one participant has a disease ("the case"), and the other does
not ("the control"). Because the participants are otherwise so similar, differences in their pasts can point
to possible causes of, or risk factors for, the disease.

This is considered retrospective because it looks in the past. You ask the participants to recall past
events or use information about their past to determine what risk factors there are for the disease.
 TERM TO KNOW

Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they
became the way they are in the present.

2b. Prospective Study

A prospective study, also known as a longitudinal study, takes place over a long period of time, sometimes
several decades. It observes the same set of people and follows the same variables throughout. While this type
of study is not quick to do, it provides a lot of data, and many different researchers can use this information
in a variety of ways.

 EXAMPLE The Framingham Heart Study started in 1948 and is still going on today. 5,209 healthy
adults from Framingham enrolled in this study. Researchers collected a variety of information about the
subjects, including social networks, eating habits, exercise habits, and several markers for heart
health.

Over a thousand different research papers have been written using this information. Some of these
papers have linked obesity and smoking to an increased risk of heart failure. Other papers look at
how social networks are tied to obesity risk.
 TERM TO KNOW

Prospective Study
A study that begins by selecting participants, tracking them, and keeping data on the subjects as they
go into the future.

Subjects/Participants
The people or things being examined in an observational study.

 SUMMARY

Good luck!

 TERMS TO KNOW

Observational Study
A type of study where researchers can observe the participants, but not affect the behavior or outcomes
in any way.

Prospective Study
A study that begins by selecting participants, then tracks them and keeps data on the subjects as they
go into the future.

Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand how they
became the way they are in the present.

Subjects/Participants
The people or things being examined in an observational study.

Experimental Design
by Sophia

 WHAT'S COVERED

In this tutorial, you're going to learn about the principles of experimental design.

1. Components of Experimental Design

a. Control
b. Randomization
c. Replication

1. Components of Experimental Design

Experimental design refers to how an experiment is carried out. Many experimental designs include a control
group and a treatment group to compare the effects of a treatment (exercise, a drug, watching a video, etc.). An
experiment can be well designed or poorly designed.

Good experimental design will have these three components:

1. Control
2. Randomization
3. Replication

 TERMS TO KNOW

Experimental Design
The way in which an experiment is carried out. A good design has key elements of randomization,
replication, and control.

Treatment
Something the researchers administer to the subjects or experimental units.

1a. Control
Control means holding constant everything except the treatment whose effect you're trying to measure. The purpose is to
determine whether or not your treatment is effective. In other words, if there is an observable difference
between groups, is it due to the treatments or due to a confounding variable? It is important to control all other
variables to help limit confounding.
Source: This work is adapted from Sophia author Jonathan Osters.

Video Transcription
[MUSIC PLAYING] Hello. Let's take a look at a real-life example using experimental design. Suppose a
farmer wants to try a new fertilizer in the fields. The three components of experimental design can be
used to determine if the new fertilizer is better than the old one. Here's how it would work.

The first thing the farmer would do is determine the control by selecting 10 fields with similar soil
nutrients, sunlight, and water. These are all variables that could affect the crop growth. The farmer would
then apply the old fertilizer to five fields and the new fertilizer to the other five. By keeping the control
elements consistent across the 10 fields, the differences between them can be isolated and attributed to
either the old or the new fertilizer.

Next, the farmer takes randomization into account by randomly assigning which five fields will get the
new fertilizer. While the fields selected were as similar as possible, there may be an unknown variable
that was not accounted for. Perhaps some fields had moles underground. And that would affect how the
crops grow.

By randomly assigning treatments, the farmer should get some fields with moles using the new fertilizer
and some fields with moles using the old fertilizer. Randomization smooths out those effects that
unknown variables might bring into the equation.

Lastly, the farmer understands the significance of repeated results rather than a one-off result. Say the
farmer was only able to find two fields similar to each other and randomly assigned one for the new
fertilizer and one for the old. It is possible in that case that the field with the old fertilizer does very well
just by random chance. This would make it seem like the new fertilizer is not effective when perhaps it is.

Or the opposite could happen where it seems like the new fertilizer is effective when it's not. So it would
always be better to randomly assign 10 fields as the farmer is more likely to find valid trends among 10
fields than two. Thanks for watching. And see you next time.

IN CONTEXT
Suppose you are a farmer and you want to try a new fertilizer in your field. One thing you could do is
choose ten fields with similar soil nutrients, sunlight, and water--all variables that could affect the
crop growth.

You could then apply the old fertilizer to five fields and the new fertilizer to the other five. By keeping
all the other variables--soil nutrients, sunlight, water--consistent, the differences between the fields
can be isolated and attributed to the old fertilizer or the new fertilizer.

Does the new fertilizer work? Is it effective? This is the idea behind controlling for all of these other
variables.

 TERM TO KNOW

Control
The principle of experimental design that requires that other variables which may confound the
experiment be held constant between the treatment groups so that any differences in the groups can
be attributed to the different treatments.

1b. Randomization
The second big idea of experimental design is randomization. The treatments must be assigned to the
subjects using a random process, otherwise known as "randomization." The purpose of random assignment is
to try to filter out all the other sources of variation that you couldn't anticipate and control for.

 EXAMPLE Referring to the farmer example, even though you made the fields as similar as possible
with respect to water, sunlight, and soil, it's possible that there is a variable that you didn't think to
control for. Perhaps some fields had moles under the ground, and that would affect how the crops
grow. How would you know to control for moles?

By randomly assigning treatments to the fields, you will likely end up with some mole-infested fields in both
the new-fertilizer group and the old-fertilizer group. Randomization smooths out those effects that other
variables might bring into the equation.

 HINT

Randomizing also helps avoid bias, because you can’t be tempted to assign treatments to the
experimental units you think might give favorable outcomes.
Randomization in an experiment does not really achieve the same purpose as a random selection in a sample.
When you do a simple random sample, the idea is to get a sample that's representative of the population. In
an experiment, the purpose of randomly assigning individuals to groups is to filter out unknown sources of
variation. The assignment in an experiment, however, is fairly similar to the way you would randomly select in
a sample.
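Random assignment like the farmer's can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the field labels `field_1` through `field_10` are hypothetical, and `random.sample` stands in for drawing five field names out of a hat.

```python
import random

# Hypothetical labels for the farmer's 10 similar fields.
fields = [f"field_{n}" for n in range(1, 11)]

# A fixed seed makes the example reproducible; omit it for a fresh assignment.
rng = random.Random(2023)

# Draw 5 of the 10 fields without replacement for the new fertilizer;
# the remaining 5 fields get the old fertilizer.
new_fertilizer = rng.sample(fields, k=5)
old_fertilizer = [f for f in fields if f not in new_fertilizer]

print("New fertilizer:", new_fertilizer)
print("Old fertilizer:", old_fertilizer)
```

Because chance alone decides which fields get the new fertilizer, an unmeasured factor such as moles should tend to show up in both groups rather than piling up in one.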

 TERM TO KNOW

Randomization
The principle of experimental design that requires that the subjects/experimental units be assigned to
groups using some random process. This makes it likely that the groups are roughly equal before treatments
are assigned.

1c. Replication
Replication is the last key idea in experimental design, which basically states that a bigger sample is better.
Repeating the experiment on many subjects or experimental units is a better idea than using only a few. Why is
that?
A larger experiment makes it more likely that you can find trends that you might not have found in a smaller
experiment. The more you replicate, and the more experimental units you can get into your experiment, the more
likely it is that you're going to find the true trends, rather than some freak anomaly.

⚙ THINK ABOUT IT

What if the farmer could only find two fields that were similar to each other, instead of 10, and randomly
assigned one to get the new fertilizer and one to get the old? Isn't it possible in that case that the field
with the old fertilizer does very well just by random chance?

This would make it seem like the new fertilizer is not effective when perhaps it is. Or the opposite could
happen, where it seems like the new fertilizer is effective when it's not. It would be better to randomly assign
ten fields (five to each fertilizer) as opposed to just two, as the farmer is more likely to find valid trends
among ten fields than among two.
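The risk of relying on just two fields can be checked with a small simulation. This sketch rests on a hypothetical model, not data from the text: both fertilizers are assumed to work identically, and each field's yield is random noise drawn from a standard normal distribution. It estimates how often the two groups' average yields still differ by more than 1 unit (an arbitrary threshold) purely by chance.

```python
import random
import statistics

def chance_of_misleading_result(fields_per_group, trials=10_000, threshold=1.0, seed=0):
    """Fraction of simulated experiments in which the group means differ by more
    than `threshold`, even though both fertilizers are assumed identical.

    Hypothetical model: every field's yield is N(0, 1) noise, so any gap
    between the groups is due to chance alone.
    """
    rng = random.Random(seed)
    misleading = 0
    for _ in range(trials):
        new = [rng.gauss(0, 1) for _ in range(fields_per_group)]
        old = [rng.gauss(0, 1) for _ in range(fields_per_group)]
        if abs(statistics.mean(new) - statistics.mean(old)) > threshold:
            misleading += 1
    return misleading / trials

two_fields = chance_of_misleading_result(fields_per_group=1)  # 1 field per fertilizer
ten_fields = chance_of_misleading_result(fields_per_group=5)  # 5 fields per fertilizer
print(f"2 fields:  {two_fields:.2f}")
print(f"10 fields: {ten_fields:.2f}")
```

Under these assumptions, the two-field version should show a misleading gap roughly half the time (about 0.48), while the ten-field version should show one only about 11% of the time, because averaging over more fields shrinks the chance variation.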
 TERM TO KNOW

Replication
Repeating the experiment on multiple subjects/experimental units. This principle of experimental
design states that a larger experiment with more subjects/experimental units will allow us to more
clearly see differences between the treatments.

 SUMMARY

The components of an experimental design--that is, a well-designed experiment--are control,
randomization, and replication. Control helps to isolate the effects of the treatments, randomization
helps to make the groups as similar as possible and helps to avoid bias, and replication helps you to
see the differences that might not have been evident if you had used a small sample. Treatments,
again, are the things that the researchers administer to the subjects or experimental units.

Good luck!

 TERMS TO KNOW

Control
The principle of experimental design that requires that other variables which may confound the
experiment be held constant between the treatment groups, so that any differences in the groups can
be attributed to the different treatments.

Experimental Design
The way in which an experiment is carried out. A good design has key elements of randomization,
replication, and control.

Replication
Repeating the experiment on multiple subjects/experimental units. This principle of experimental design
states that a larger experiment with more subjects/experimental units will allow us to more clearly
see differences between the treatments.

Treatment
Something the researchers administer to the subjects or experimental units.

Randomized Block Design
by Sophia

 WHAT'S COVERED

This tutorial is going to teach you about a randomized block design. A randomized block design is a
little bit different from other types of designs that we've studied, so this tutorial will focus on:

1. Randomized Design
2. Block Design vs. Randomized Design

1. Randomized Design
Randomized block design is a type of experiment where participants are first divided into homogeneous
groups. This means that the members of each group are the same with respect to some variable of interest, such
as age, race, income, location, job, or gender.

Once participants are in their similar group, they are randomly assigned to treatment or control within that
group.

An advantage is that it controls for variables that would otherwise be confounding. If we think that job has an
effect, we can make sure that people who have the same job are split proportionally between the treatment and
control groups.

A disadvantage is that it can reduce the sample size of each group.
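Assigning treatments within blocks can be sketched as follows. This is a minimal illustration, assuming subjects are identified by hypothetical IDs and that gender is the blocking variable; when a block has an odd number of members, this sketch gives the extra subject to the treatment group (a choice made here only for illustration).

```python
import random
from collections import defaultdict

def randomized_block_assignment(subjects, block_of, seed=None):
    """Group subjects into blocks, then randomly split each block
    between treatment and control (a randomized block design)."""
    rng = random.Random(seed)

    # Step 1: form homogeneous blocks (e.g., one block per gender).
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of[s]].append(s)

    # Step 2: randomize separately inside each block.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]             # copy so the block list is unchanged
        rng.shuffle(shuffled)
        cut = (len(shuffled) + 1) // 2    # odd block: extra subject to treatment
        for s in shuffled[:cut]:
            assignment[s] = "treatment"
        for s in shuffled[cut:]:
            assignment[s] = "control"
    return assignment

# Hypothetical volunteers matching the example: 9 men (M1..M9), 14 women (W1..W14).
subjects = [f"M{i}" for i in range(1, 10)] + [f"W{i}" for i in range(1, 15)]
gender = {s: s[0] for s in subjects}      # first letter encodes the block
plan = randomized_block_assignment(subjects, gender, seed=7)

for g in ("M", "W"):
    block = [s for s in subjects if gender[s] == g]
    treated = sum(1 for s in block if plan[s] == "treatment")
    print(g, "treatment:", treated, "control:", len(block) - treated)
```

With 9 men and 14 women this prints `M treatment: 5 control: 4` and `W treatment: 7 control: 7` whatever the seed; the seed only changes which people land in each group.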

IN CONTEXT
Suppose you are a researcher and you want to identify whether a new acid reflux drug is more
effective than the one that's currently available. You gather 500 volunteers with acid reflux, put the
number one on 250 cards, and the number two on another 250, and place all the cards in a hat. You
mix them up and have people pull out numbers.

People who drew a "1" receive the new drug, and those who drew a "2" receive the old drug.
The image below would be your original plan: starting with all these volunteers, men and women,
you randomly assign them to groups.

The problem is, what if men and women respond differently to the drug?

The better design is using a randomized block design, so you try something different. First, take
your large group and break it into smaller subgroups of just men and just women.

The image above has nine men and 14 women; you had a lot more in the old design, but now you’re
going to run the experiments essentially in parallel: one experiment for men and one experiment for
women. Now you’re going to take the men and randomly assign half of them to the treatment and
half to the control. Then you’re going to do the same with the women, randomly assigning half of them
to the treatment and half to the control, which looks like this:

Men and women receiving the treatment are in purple, and the men and women receiving the
control are in green. You might notice there are five men receiving treatment and only four receiving
control. It’s not necessary to have exactly equally sized groups.

Randomized Block Design

An experimental design where the subjects are separated into homogeneous groups, called blocks,
based on some variable we think may affect the outcome of the experiment. We then run the
experiment separately within each block.

2. Block Design vs. Randomized Design

By doing a block design rather than a completely randomized design, you can observe differences between the
blocks that you might have missed had you run the experiment on one large group.

 EXAMPLE Suppose the drug was more effective for women than for men. You would see that in
this experiment: the results in the women's block would show that the drug was effective, while the
results in the men's block would show that it wasn't.
One minor disadvantage to running a block design is that you do lose some of the replication that you would
have if you had run it in a large group. Sometimes you need to make your sample size a little bit bigger to
overcome that. It might be a little bit harder to draw legitimate conclusions with small groups.

 SUMMARY

You saw how a completely randomized design might miss an extra level of depth, such as men and
women reacting differently to a drug. In a randomized block design, the subjects or experimental units
are grouped by some similar characteristic that you think might affect the outcome. In this example,
we used gender. When evaluating block design vs. randomized design, you saw that with a randomized
block design, experiments run in parallel, resulting in two or more separate experiments. Then, you
can compare the treatments within each of those groups.

Good luck!

Source: This work is adapted from Sophia author Jonathan Osters.

 TERMS TO KNOW

Randomized Block Design

An experimental design where the subjects are separated into homogeneous groups, called blocks,
based on some variable we think may affect the outcome of the experiment. We then run the experiment
separately within each block.

Completely Randomized Design
by Sophia

 WHAT'S COVERED

This tutorial will discuss a completely randomized design of an experiment through an exploration of:

1. Completely Randomized Design

A completely randomized design means that treatments will be randomly assigned to individual participants
in an experiment.

An advantage of this design is that it is very quick and easy to implement. You could take your group of
experimental units, randomly assign each one a number, and put the odd numbers in the treatment group and the
even numbers in the control group.

However, a disadvantage of this design is that, by chance, the treatment and control groups could end up with
disproportionate representations of the population (for example, many more men in one group than in the other).

IN CONTEXT
Let’s say you developed a new drug to combat the symptoms of acid reflux. You want to see if it’s
more effective than what is currently available. So you get 500 volunteers and write “1” on 250 slips
of paper and “2” on the other 250 slips of paper. You put all 500 slips of paper into a hat, mix
them up, and the volunteers retrieve one slip of paper each.

Those who selected “1” will receive the new drug and those who selected “2” receive the drug that's
currently available. This is the simplest way to assign subjects to treatments. However, it's not
necessarily ideal for every scenario.
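The slips-in-a-hat procedure can be mirrored in code by shuffling a list of labels. A minimal sketch, assuming the volunteers are numbered 1 to 500 (hypothetical IDs) and that volunteer i draws the i-th slip after mixing:

```python
import random

rng = random.Random(42)  # fixed seed only so the example is reproducible

volunteers = list(range(1, 501))   # hypothetical IDs for the 500 volunteers
slips = [1] * 250 + [2] * 250      # 250 slips marked "1", 250 marked "2"
rng.shuffle(slips)                 # mix the slips in the hat

# Volunteer i draws the i-th slip: "1" -> new drug, "2" -> current drug.
drug = {v: ("new" if slip == 1 else "current") for v, slip in zip(volunteers, slips)}

print(sum(d == "new" for d in drug.values()))      # 250
print(sum(d == "current" for d in drug.values()))  # 250
```

Because exactly 250 slips of each kind go into the hat, the groups always have 250 people each; what the shuffle leaves to chance is who ends up in which group.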

Let’s say that the acid reflux drug is more effective for men than it is for women. It’s not really a
problem if you divide the treatment and control groups like this:

In this particular case, you can see there are roughly the same numbers of females and males in the
treatment group and the control group. Since the assignment is roughly equal on each side, it will
be easy to see if the new drug is more effective for males than for females. Problems occur when the
random assignment doesn't match the proportions of the population equally.

Consider for a moment if this happened:

Both groups are roughly the same size. Will you be able to determine if the treatment is more
effective for men? Why not?

If the drug were more effective for men than for women, you actually wouldn't notice because there
aren't that many men in the treatment group. The proportions are way out of whack. This sometimes
happens with random assignment.
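How often does a completely randomized design produce a lopsided split like this? A rough sketch under stated assumptions: a hypothetical group of 10 men and 10 women is split at random into two groups of 10, and a split counts as "lopsided" when the treatment group gets 3 or fewer, or 7 or more, of the men (thresholds chosen only for illustration). The exact probability comes from the hypergeometric distribution, and a simulation checks it.

```python
import math
import random

def exact_lopsided(men=10, women=10, group_size=10, low=3, high=7):
    """Exact chance that a random group of `group_size` contains <= low
    or >= high men (hypergeometric distribution)."""
    ways = 0
    for k in range(0, min(men, group_size) + 1):
        if k <= low or k >= high:
            ways += math.comb(men, k) * math.comb(women, group_size - k)
    return ways / math.comb(men + women, group_size)

def simulated_lopsided(men=10, women=10, group_size=10, low=3, high=7,
                       trials=20_000, seed=0):
    """Estimate the same chance by repeating a completely random assignment."""
    rng = random.Random(seed)
    people = ["M"] * men + ["W"] * women
    hits = 0
    for _ in range(trials):
        treatment = rng.sample(people, group_size)  # the rest form the control group
        k = treatment.count("M")
        if k <= low or k >= high:
            hits += 1
    return hits / trials

exact = exact_lopsided()
estimate = simulated_lopsided()
print(f"exact: {exact:.3f}, simulated: {estimate:.3f}")
```

With these numbers the exact probability is about 0.179, so roughly 18% of completely randomized splits would leave the groups this unbalanced by gender. Splitting each gender separately, as a randomized block design does, rules that out.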

You can see that in a completely randomized design, subjects are assigned using random processes such as
numbers in a random number generator, random number table, numbers in a hat, or names in a hat. The
problem is that it's not always the best way to assign treatments.

 TRY IT

A tire company wants to launch a new type of rubber for its bicycle tires. It has 300 bikes to use for the study,
and a completely randomized design is desired. What would be the first step to achieving a completely
randomized design?
They could place numbers 1-300 in a hat and have each rider pull out one number. Numbers 1-150 receive
the old rubber tires and 151-300 receive the new rubber tires. The cyclists won’t know which type of tire
they are receiving.
There is an issue with this design. Can you think what this might be?
What if bike commuters are all in the same group? They might wear their tires out faster regardless of the
new or old tires. Can you think of other aspects that may impact this experiment?

⭐ BIG IDEA

While there are better ways to gather information for an experiment, a completely randomized design is
the easiest.
 TERM TO KNOW

Completely Randomized Design

An experimental design where the assignment of subjects to treatments is done entirely at random.

 SUMMARY

In a completely randomized design, which is the simplest way of assigning individuals, the subjects
are assigned using a random process, such as a random number generator, a random number table, or
numbers or names drawn from a hat. The problem is that it's not always the best way to assign
treatments.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Completely Randomized Design

An experimental design where the assignment of subjects to treatments is done entirely at random.

Matched-Pair Design
by Sophia

 WHAT'S COVERED

This tutorial will explain matched-pair design experiments by examining the characteristics and
examples of:

1. Matched-Pair Design
a. With Subjects in Pairs
b. With Subjects as Individuals

1. Matched-Pair Design
In a matched-pair design experiment, you form experimental units by pairing subjects that are as similar as
possible. One subject goes to the treatment group and the other goes to the control group. Because the
members of each pair are so similar, differences in their outcomes are less likely to be caused by the
variables they were matched on.

 EXAMPLE Choosing a pair of women who are the same age, have the same exercise habits, and
live in the same area allows us to look at only the variable we are studying, while avoiding the effects
of age, exercise, and location on the outcomes of the experiment.
In matched-pair design, subjects can be assigned to the treatment and control groups in two different ways:

Subjects who are similar with respect to variables that could affect the outcome of the experiment are
paired together, and then one of them is assigned to the treatment group and one is assigned to the
control group.
Each subject is assigned to both groups, where each subject acts as their own matched-pair.

 HINT

This type of design is also similar to a case-control study, but here researchers are giving a treatment
instead of just observing the participants.
 TERM TO KNOW

Matched-Pair Design
An experimental design where two subjects who are similar with respect to variables that could affect
the outcome of the experiment are paired together, then one of them is assigned to one treatment
and one is assigned to the control. This can also be done by assigning each subject to both groups,
where each subject acts as their own matched-pair.

1a. With Subjects in Pairs

Matched-pair design involves matching subjects into pairs that are as similar as possible with respect to any
variable that may affect the outcome.

Video Transcription
[MUSIC PLAYING] Hello. Let's take a look at a common instance of how matched pair design is used. An
experiment is being conducted to test the effectiveness of a new flu vaccine. Gender and age are the
two variables that may play a significant role in how well this vaccine works.

So how can we study only the effects of the vaccine? A matched pair design, that's how. Groups of two
who are similar in both gender and age are created. Then one is given the vaccine. And the other is
given a placebo shot. This allows us to study only the effects of the vaccine and not the effects of the
other variables.

Here's how it goes. For this study, there are 20 participants-- 10 men and 10 women of varying ages
labeled A through T. The first variable being gender, we separate the 20 participants into two groups--
one group of 10 males, the other 10 females.

With the second variable being age, we will pair males of similar ages. Then we'll do the same with
females. So looking at the males, the first similar ages we see are 24 and 25. Our first matched pair will
be participants A and H. Using this process, we see participants L and J, D and C, T and K, and P and R
will also be good matched pairs. Then the same method is applied for similarly-aged females.

Once we have our 10 sets of matched pairs, we can randomly assign the treatment to one half of the
pair and the control to the other half. This will allow us to study how this new flu vaccine works. Oh, that
reminds me. Be sure to get your flu shot. I'm getting one after yoga this evening. Kidding. That's
ridiculous since I'm a computer. Totally can't do yoga or flu shots. Thanks for watching and see you next
time.

IN CONTEXT
There are 20 participants for an experiment for a flu vaccine. Gender and age may play a role in how
well this treatment works. Groups of two are created; each group is as similar as possible with
respect to any variable that may affect the outcome.

Participant 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Gender M F M M F F F M F M M M F F F M F M F M

Age 24 21 42 39 35 37 22 25 31 32 51 31 61 26 38 55 26 56 52 48

There are 10 men and 10 women of all different ages. Participants will be listed by gender. So
participants 1, 3, 4, 8, 10, 11, 12, 16, 18, and 20 are the males. The rest are females.

Males
Participant  1   3   4   8   10  11  12  16  18  20
Age          24  42  39  25  32  51  31  55  56  48

Females
Participant  2   5   6   7   9   13  14  15  17  19
Age          21  35  37  22  31  61  26  38  26  52

Age is suspected to also play a role in effectiveness, so within the male category, the two ages that
are closest together--24 and 25--are chosen. Therefore, participants 1 and 8 will form a matched pair.
Participants 10 & 12, 4 & 3, 20 & 11, and 16 & 18 are also matched pairs due to similarly aged males.
The same criterion is applied for similarly aged females.

Males
Participant  1   8   12  10  4   3   20  11  16  18
Age          24  25  31  32  39  42  48  51  55  56

Females
Participant  2   7   14  17  9   5   6   15  19  13
Age          21  22  26  26  31  35  37  38  52  61

Now, to continue the experiment, one of the two in the pair is randomly assigned to receive the flu
vaccine and the other one will be assigned to the control group.
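The pairing steps above (split by gender, sort by age, pair neighbors, then randomize within each pair) can be sketched in Python using the participant table from this example. This is one simple way to form pairs, not necessarily the only or best one: it pairs adjacent ages after sorting, and ties (participants 14 and 17, both 26) keep their table order. The function names `matched_pairs` and `assign_within_pairs` are hypothetical.

```python
import random

# Participant data from the flu-vaccine table: ID -> (gender, age)
participants = {
    1: ("M", 24), 2: ("F", 21), 3: ("M", 42), 4: ("M", 39), 5: ("F", 35),
    6: ("F", 37), 7: ("F", 22), 8: ("M", 25), 9: ("F", 31), 10: ("M", 32),
    11: ("M", 51), 12: ("M", 31), 13: ("F", 61), 14: ("F", 26), 15: ("F", 38),
    16: ("M", 55), 17: ("F", 26), 18: ("M", 56), 19: ("F", 52), 20: ("M", 48),
}

def matched_pairs(data):
    """Split by gender, sort each gender by age, and pair neighbors in that order."""
    pairs = []
    for gender in ("M", "F"):
        ids = [pid for pid, (g, _) in data.items() if g == gender]
        ids.sort(key=lambda pid: data[pid][1])  # stable sort: ties keep table order
        pairs.extend(zip(ids[0::2], ids[1::2]))
    return pairs

def assign_within_pairs(pairs, seed=None):
    """Flip a fair coin for each pair to decide who gets the vaccine."""
    rng = random.Random(seed)
    plan = {}
    for a, b in pairs:
        vaccinated, control = (a, b) if rng.random() < 0.5 else (b, a)
        plan[vaccinated] = "vaccine"
        plan[control] = "control"
    return plan

pairs = matched_pairs(participants)
print(pairs)
# → [(1, 8), (12, 10), (4, 3), (20, 11), (16, 18), (2, 7), (14, 17), (9, 5), (6, 15), (19, 13)]
plan = assign_within_pairs(pairs, seed=3)
```

The pairs agree with the ones listed in the tables above; only the coin flips inside each pair are left to chance.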

1b. With Subjects as Individuals

Also in a matched-pair design, each subject can be assigned to both treatments instead of one, with the order
in which the treatments are applied randomly assigned. Each participant then counts as his or her own matched
pair. This design essentially compares someone to themselves.

IN CONTEXT
Suppose that you have a tire company that's considering rolling out a new type of rubber for its
bicycle tires. There are 300 bicycles available. In a completely randomized design, you would place
the numbers 1-300 in a hat. Bikers who pull numbers 1-150 would receive old rubber tires, and those who
pull 151-300 would receive the new rubber tires. The riders won’t necessarily know which tires they're getting.

But what if the 300 riders don't all ride the same way or equally as often? What do you do then?
How do you create two groups that are roughly the same, with the exception of the bicycle tires?

One way to do it is with a matched-pair design. You could still put the numbers 1-300 in a hat. The
only difference is that the people who pull out 1-150 would get both the old and the new. They
would put the old tire on the front and the new rubber tire on the back.

Then, the people who pulled out 151-300 would get the new rubber tire on the front and the old one
on the back.

So there's still some randomization going on. The only difference is that every biker will get one old
tire and one new tire. This will allow you to compare the tread wear of the two tires on each bike, and because
half the bikes have the new tire in front and half have it in back, any difference between front and rear wear
is balanced out. It won't matter how much the biker rides or where.
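The tire version of the design can be sketched the same way. This minimal illustration assumes bikes are numbered 1 to 300 (hypothetical IDs) and that bike i receives the i-th number drawn from the hat; draws of 1-150 put the old rubber in front, as described above. The function name `assign_tire_positions` is hypothetical.

```python
import random

def assign_tire_positions(n_bikes=300, seed=None):
    """Each bike gets one number from a shuffled hat of 1..n_bikes.
    Numbers 1..n_bikes//2: old rubber in front, new in back.
    Remaining numbers: new rubber in front, old in back."""
    rng = random.Random(seed)
    hat = list(range(1, n_bikes + 1))
    rng.shuffle(hat)
    setup = {}
    for bike, number in enumerate(hat, start=1):
        if number <= n_bikes // 2:
            setup[bike] = {"front": "old", "back": "new"}
        else:
            setup[bike] = {"front": "new", "back": "old"}
    return setup

setup = assign_tire_positions(seed=11)
new_in_front = sum(1 for s in setup.values() if s["front"] == "new")
print(new_in_front)  # 150: half the bikes have the new tire in front
```

Once tread wear is measured, the comparison would be made bike by bike (new-tire wear minus old-tire wear on the same bike), which is what lets each bike serve as its own matched pair.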

 SUMMARY

In a matched-pair design, two subjects whose characteristics are very similar are paired, then each
one is sent to a different group. A matched-pair design can also assign each subject to both treatments
instead of one, as was the case with the bicycle tires, so that each subject serves as their own
matched pair.

Good luck!

 TERMS TO KNOW

Matched-Pair Design
An experimental design where two subjects who are similar with respect to variables that could affect
the outcome of the experiment are paired together, then one of them is assigned to one treatment and
one is assigned to the control. This can also be done by assigning each subject to both treatments,
where each subject acts as their own matched-pair.

 WHAT'S COVERED

This tutorial will briefly introduce you to surveys, demonstrating the following concepts:

1. Introduction to Surveys
2. Survey Design

1. Introduction to Surveys
A survey is a data-gathering technique: an information collection tool that a lot of organizations use.
Surveys give organizations a way to gather data so that they can target the specific information that they
want.

The following are examples of how surveys can be used:

A store might use a survey to figure out something about its customers.
Politicians might use a survey to gather information about their constituents.
Someone hiring for a position in a company might use a survey to learn more about the local labor market,
such as who they can hire and who is not available in that area.

In all of these examples, the survey is a tool being used to increase the amount of specific information
someone has. For each survey, the researcher has selected the variables of interest, or the variables that he
or she is interested in gathering data on.

 TERMS TO KNOW

Survey/Sample Survey
A data collection tool that individuals in a study can fill out and return to the researcher.

Variables of Interest
The variables the survey wishes to measure about those taking the survey.

2. Survey Design
A survey must be carefully designed to elicit the intended information. If you are designing a survey, you
want responses from a representative sample of your population. As with every sampling technique, good survey
design is about the process: getting accurate data from a representative sample.

Just like with any sample, it's important to define what you're interested in before you begin surveying.

 BRAINSTORM

You might ask yourself: What are the variables that you want to measure? What information do you want
people to provide in your survey? Answering these questions is going to be important because those
answers will help you understand the purpose of the information you generate with your survey.
So, for example, if it's a survey about employment, you're going to want to ask about employment, former
employment, current employment, and things like that.

IN CONTEXT
Suppose a teacher uses the following survey at the end of the year for her students:

Course Survey

Strongly Agree   Agree   Neutral   Disagree   Strongly Disagree

1. The course objectives have been clearly outlined for me. ❍ ❍ ❍ ❍ ❍

2. The methods for evaluating student work have been applied fairly. ❍ ❍ ❍ ❍ ❍

3. This course has challenged me intellectually. ❍ ❍ ❍ ❍ ❍

4. I have worked hard to meet the requirements of this course. ❍ ❍ ❍ ❍ ❍

5. This course was harder than I thought it was going to be. ❍ ❍ ❍ ❍ ❍

6. I looked forward to attending classes. ❍ ❍ ❍ ❍ ❍

7. I have learned a great deal. ❍ ❍ ❍ ❍ ❍

8. This course covered more material than I thought it was going to. ❍ ❍ ❍ ❍ ❍

9. I know more now than before taking this course. ❍ ❍ ❍ ❍ ❍

This teacher wants to know, among other things, whether she did a good job outlining the course objectives.
The survey also asks about how student work was evaluated and how challenging the course was. Notice that
she has provided answer choices ranging from strongly agree to strongly disagree.

The teacher thought about all of the different things she wanted to learn from her students, including
feedback on her teaching, and listed them in her survey. The information she gathers will help her answer
questions such as how clearly she outlined her course objectives for her students.

 TERM TO KNOW

Survey Design
The way the survey is set up. This deals with the wording of questions and answer choices.

 SUMMARY

© 2023 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 63
To recap, surveys are used to obtain data or information from a population. It's important to determine
what you want to understand, why, and for whom the information is being collected, since these choices may
affect the survey design. We talked about surveys, which are also called sample surveys. We also talked
about variables of interest, which are the things you want to measure because you're interested in
knowing them.

Thank you and good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Survey Design
The way the survey is set up. This deals with the wording of questions and answer choices.

Survey/Sample Survey
A data collection tool that individuals in a study can fill out and return to the researcher.

Variables of Interest
The variables the survey wishes to measure about those taking the survey.

 WHAT'S COVERED

This tutorial is going to teach you about blinding and will explain the following topics:

1. Blinding
2. Double-Blind and Single-Blind Experiments

1. Blinding
Blinding is a principle of experimental design whereby the subjects don't know which treatment they're
receiving.

Randomizing an experiment reduces bias. However, subtle clues can still reveal which treatment a subject is
receiving, so it's important to make sure the subjects don't know.

Why is this? Knowing could give them an incentive to stay on the treatment if they believe it is the real drug,
or to go off the treatment if they think they're not getting the real drug.

Also, people with an agenda might want to bend the results in their favor. They might want to make the
results of an experiment seem more positive than they really are. This tendency of experimenters to bend the
results in their favor is called the “experimenter effect.”

To counteract both of these problems, we use a strategy called blinding. Only people who work behind the
scenes know who is getting what. No one directly involved in running the experiment or receiving the
treatments knows which treatment each subject is receiving.
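The behind-the-scenes arrangement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function name `blinded_assignment`, the `PILL-` code format, and the even split between treatment and placebo are all hypothetical choices. Subjects and staff would see only the neutral codes in `labels`; the `key` that maps codes to groups stays with whoever works behind the scenes.

```python
import random


def blinded_assignment(subject_ids, seed=None):
    """Randomly assign subjects to treatment or placebo behind neutral codes.

    Returns two dictionaries (a hypothetical design for illustration):
      labels: subject -> neutral pill code (what subjects and staff see)
      key:    pill code -> actual group (kept only behind the scenes)
    """
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)  # random order, so chance decides who lands in which group
    half = len(ids) // 2
    # Unique 4-digit codes that reveal nothing about the group.
    codes = rng.sample(range(1000, 10000), len(ids))
    labels, key = {}, {}
    for position, (subject, code) in enumerate(zip(ids, codes)):
        group = "treatment" if position < half else "placebo"
        label = f"PILL-{code}"
        labels[subject] = label
        key[label] = group
    return labels, key


labels, key = blinded_assignment([f"subject{i}" for i in range(1, 11)], seed=42)
for subject, label in labels.items():
    print(subject, label)  # no group information is visible here
```

Keeping `labels` and `key` as separate objects mirrors the idea that the people handing out pills never need to see the mapping; with an odd number of subjects, this sketch puts the extra subject in the placebo group.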

IN CONTEXT
If subjects know which treatment group they are assigned to, it may influence their behavior. So the
treatment group receives a pill, and the control group also receives a pill. The only difference is that
one pill contains the active treatment and is given only to those in the treatment group.

Ideally, when you open the pills up, they would look the same on the inside, too. The idea is that no
one knows which pill is fake and which one has the tested drug.

The fake drug is usually a sugar pill or another inactive substance that makes the person in the control
group feel like they're taking something when they're really not.

 TERM TO KNOW

Blinding
The practice of making sure that certain individuals do not know which subjects are receiving which
treatment.

2. Double-Blind and Single-Blind Experiments

Many experiments are what we call double-blind. In a double-blind experiment, the subjects don't know which
treatment they're receiving, and neither does anyone who has contact with them. This can eliminate bias due
to a subject thinking they know which group they're in. It also reduces the experimenter effect of someone
trying to bend the results.

Single-blind experiments, on the other hand, blind only one side: either the subjects or the people in
contact with them, but not both.

IN CONTEXT
A double-blind study is ideal, but sometimes it is just not feasible. Suppose there is a study of whether
exercise is effective for weight loss. People are going to know whether they're exercising. It's
impossible to assign people to exercise (the treatment in this case) and have them not know they're
receiving the treatment.

However, the experimenters who measure the results don't need to know who was assigned to exercise and who
was not. This makes the study single-blind: the experimenters were blinded, but the subjects were not.

 BRAINSTORM

Can you think of a single-blind experiment that would be set up to have the researchers know group
assignments, but the participants do not?
 TERMS TO KNOW

Double-Blind Experiment
An experiment where neither the subjects nor anyone in contact with them has any knowledge of
which subjects are receiving which treatment.

Single-Blind Experiment
An experiment where either the subjects have no knowledge of which subjects are receiving which
treatment or people in contact with the subjects have no knowledge of which subjects are receiving
which treatment, but not both.

 SUMMARY

Blinding is a powerful tool for preventing different types of bias, such as the experimenter effect.
Different studies allow for different levels of blinding. Double-blind is ideal, since neither the
participants nor the people in direct contact with them are aware of group assignments. As you saw in the
exercise example, sometimes double-blind just is not realistic: participants will know whether they are
exercising. In that case, a single-blind experiment is the next best thing, which means that either the
subjects or the researchers are unaware of group assignments, but not both.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Blinding
The practice of making sure that certain individuals do not know which subjects are receiving which
treatment.

Double-Blind Experiment
An experiment where neither the subjects, nor anyone in contact with them, has any knowledge of
which subjects are receiving which treatment.

Single-Blind Experiment
An experiment where either the subjects have no knowledge of which subjects are receiving which
treatment, or people in contact with the subjects have no knowledge of which subjects are receiving
which treatment, but not both.

 WHAT'S COVERED

This tutorial will discuss the Placebo Effect by focusing on:

1. Placebo

1. Placebo
In basic terms, a placebo is a fake treatment. That doesn't mean people don't respond to it; they think or
expect that the treatment will result in a change. A placebo has no active ingredient and does nothing on its
own, yet some people feel better anyway, as if they had willed themselves to feel better. This is called the
Placebo Effect.

While the treatment group gets the actual drug, the control group receives a placebo as its treatment: a fake
drug, usually a sugar pill, with no active ingredient.

Sometimes the treatment containing the actual drug doesn't work any better than the placebo. When that
happens, it is evidence against the treatment working.

IN CONTEXT
Suppose that you developed a treatment that relieves pain, and you conducted a study on pain. You
had a control group receiving a sugar pill and a treatment group receiving the actual drug that you
created. Here are your results.

Group                    Reported pain relief
Placebo (sugar pill)     36%
Treatment (your drug)    42%

Would you say that your treatment is effective? Why or why not?

The answer here is that your treatment is not very effective. The numbers, 42% and 36%, are not far
apart. These results would be weak evidence for the effectiveness of the drug.

What if the results looked like this?

Group                    Reported pain relief
Placebo (sugar pill)     36%
Treatment (your drug)    80%

Notice that you still have 36% of patients in the placebo group reporting relief of pain. However, the
difference between 36% and 80% is significant. This would be considered strong evidence for the
effectiveness of the drug.
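The comparisons above are made by eye, and whether a gap counts as significant also depends on how many patients were in each group, which the results do not state. One common check, not covered in this tutorial, is a pooled two-proportion z-test. The Python sketch below applies it to both sets of results under the hypothetical assumption of 50 patients per group; the function name `two_proportion_z` and the group size `N` are illustrative only.

```python
import math


def two_proportion_z(p_treat, p_placebo, n_treat, n_placebo):
    """Pooled two-proportion z statistic and two-sided p-value."""
    x_treat = p_treat * n_treat        # number of treated patients reporting relief
    x_placebo = p_placebo * n_placebo  # number of placebo patients reporting relief
    pooled = (x_treat + x_placebo) / (n_treat + n_placebo)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_treat + 1 / n_placebo))
    z = (p_treat - p_placebo) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal curve
    return z, p_value


N = 50  # hypothetical group size; the tutorial does not give one
for label, p_treat in [("First results", 0.42), ("Second results", 0.80)]:
    z, p = two_proportion_z(p_treat, 0.36, N, N)
    print(f"{label}: z = {z:.2f}, p = {p:.4f}")
```

With 50 patients per group, the 42% versus 36% gap gives a z value well below the conventional 1.96 cutoff, while 80% versus 36% lands far above it. With several hundred patients per group, even the smaller gap could become statistically significant, which is why sample size matters when judging a gap.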

 TERMS TO KNOW

Placebo
An inert drug or treatment given to the control group. It has no active ingredient in it.

Placebo Effect
The observed phenomenon whereby certain individuals will exhibit a desired response even when
taking a placebo, which contains no active ingredient.

 SUMMARY

Placebos are a form of control: they're a fake drug. People can respond to the fake drug because they think
they are receiving treatment, which is called the Placebo Effect. Experimenters assess the effectiveness of
the treatment against the effectiveness of the placebo. If the gap between the two is significant, it is
considered evidence that the treatment has a real effect.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Placebo
An inert drug or treatment given to the control group. It has no active ingredient in it.

Placebo Effect
The observed phenomenon whereby certain individuals will exhibit a desired response even when
taking a placebo, which contains no active ingredient.

 WHAT'S COVERED

This tutorial will discuss variables within the field of statistics, and introduce the concept of
confounding variables. The following elements will be the main focus of this tutorial:

1. Variables
a. Variables of Interest
b. Explanatory and Response Variables
2. Confounding Variables

1. Variables
In statistics, a variable is any attribute that we can measure about the individuals in a study. It is very
important to carefully define the variables to be measured when creating a study.

Think of things that we could find out about people:

Age
Weight
Gender
Ethnicity
Favorite Food
Number of Pets
Smoker or Non-Smoker
ZIP Code
Number of Siblings
Political Affiliation
Favorite Sport

All of these are variables. In a given study, you might want to know only one of them or only some of
them.

 TERM TO KNOW

Variable
Any attribute or number that can be measured about individuals in a study.

1a. Variables of Interest
For a political poll, for example, you wouldn't necessarily need to know whether a respondent is a smoker or
how many pets they have. However, you might want to know their age, gender, state, political affiliation,
ZIP code, ethnicity, and city.

Those variables could potentially have some bearing on a political poll, so they are the variables of
interest for this study: literally, the variables you would be interested in measuring.

However, if you were conducting a weight loss study, political affiliation would likely not be a variable to
measure, but favorite food might be important.

 TERM TO KNOW

Variable of Interest
Any variable which we need to know about in the context of a study.

1b. Explanatory and Response Variables

Some studies try to determine a cause-and-effect relationship between two variables, in which one variable
causes changes in the other: an increase in one corresponds to an increase or decrease in the other.

In those cases, we define the variable that causes the other as the explanatory variable. A study can have
more than one explanatory variable.

The variables that change as a result are called response variables.

Examples of Explanatory and Response Variables

Explanatory: Number of hours you study
Response: Grade on the exam
You might hypothesize that as you increase the number of hours that you study, your grade on the exam will
increase as well. So the number of hours you study helps to explain your grade.

Explanatory: Average monthly temperature
Response: Ice cream sales
You might assume that as temperatures get warmer, ice cream sales would go up in kind.

Something a little less obvious is whether gender, which is a categorical variable, plays a role in which
political party people choose. Are men more likely to be Republicans? Are women more likely to be
independent voters? We don't know, but that would be an interesting question to investigate.

 TERMS TO KNOW

Explanatory Variable
A variable that we believe is predictive of something else. An increase in this variable will correspond
to an increase or decrease in some other variable.

Response Variable
A variable that is affected by the explanatory variable.

2. Confounding Variables
Confounding occurs when two variables get mixed up with one another, so you can't tell the effect of one
variable from the effect of the other. A confounding variable is one that is not accounted for in a study.
It is an unseen variable that has a significant effect on the response variable and is also related to the
explanatory variable.

IN CONTEXT
Suppose that a researcher wants to know whether a high protein diet will help lab rats gain more
weight than a low protein diet. The researcher has 26 lab rats and she selects 13 of the smallest rats
to receive the low protein diet and 13 of the largest to receive the high protein diet. At the end of the
study, she weighs the rats to determine their weight gain and finds that the rats on the high protein
diet gained more weight.

Can you think of anything that she did wrong in this study?

The answer involves the occurrence of confounding. Remember, confounding is when two variables
get mixed up and you can't tell the effect of one variable from the effect of the other variable.

So in this case, the effect of the diets (whether the high protein diet caused the rats to gain more
weight) was confounded by the fact that the heaviest rats were put on the high protein diet. It's not
clear whether the high protein diet was effective for weight gain. Something else, such as the rats'
larger starting size, may have caused the extra weight gain.

The two variables of interest in this study were diet and weight gain. The high protein diet was supposed
to be the explanatory variable, and the weight gain was supposed to be the response variable. The
researcher was trying to establish a link between the two.

However, because of the way she assigned the rats, only a limited conclusion could be drawn. She wasn't
able to draw the direct conclusion she was hoping for, and that is the result of confounding.
Confounding should be limited in experiments whenever possible.
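The rats were assigned by size, which tied starting weight to diet. A common remedy, consistent with the earlier point that randomizing reduces bias, is to let chance decide the groups. The Python sketch below is illustrative only: the starting weights are simulated (uniformly between 200 and 300 grams, an assumption), not data from the study, and the helper `gap` is a hypothetical name. It compares the difference in average starting weight between the two diet groups under the researcher's size-sorted split and under random assignment.

```python
import random
import statistics

rng = random.Random(2023)  # fixed seed so the sketch is reproducible

# Simulated starting weights in grams for 26 rats (hypothetical, not study data).
weights = [rng.uniform(200, 300) for _ in range(26)]

# What the researcher did: 13 smallest -> low protein, 13 largest -> high protein.
by_size = sorted(weights)
low_sorted, high_sorted = by_size[:13], by_size[13:]

# Random assignment: shuffle a copy, then split it in half.
shuffled = weights[:]
rng.shuffle(shuffled)
low_random, high_random = shuffled[:13], shuffled[13:]


def gap(low_group, high_group):
    """Mean starting weight of the high protein group minus the low protein group."""
    return statistics.mean(high_group) - statistics.mean(low_group)


print(f"Size-sorted split: gap = {gap(low_sorted, high_sorted):.1f} g")
print(f"Random split:      gap = {gap(low_random, high_random):.1f} g")
```

Because the size-sorted split puts every heavier rat in the high protein group, it produces the largest possible gap in starting weight. A random split will typically produce a much smaller gap, so starting size is far less likely to masquerade as a diet effect.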

 TRY IT

A high school math teacher, hoping to have his students do well on the final, offers an optional review
session. He states, “No one who's ever attended the review session has ever scored less than a B”.

What is the teacher trying to imply? Why isn’t his implication correct?

You may have concluded that he's trying to imply that the review sessions cause students to do better. That
may be true; however, there may be confounding variables. Maybe only his best and brightest students attend
the optional review, and these students might have done well on the final exam anyway. The effects of the
session, if any, are confounded by the students' intrinsic motivation to show up to it.

 TERMS TO KNOW

Confounding
Occurs when the effects of the treatments, if any, are indistinguishable from the potential effects of
some other variable which was unaccounted for.

Confounding Variable
A variable which was not accounted for in a study, which limits the conclusions that the study can
draw.

 SUMMARY

Variables are what we choose to measure in a study. The variables of interest depend on the questions
you're trying to answer. Not every variable must be measured, just the ones that are of interest. By
looking at variables in context, you learned that if a cause-and-effect relationship is thought to exist,
you can break the variables down further into explanatory and response variables. Confounding occurs when
a variable is chosen as the explanatory variable in an experiment, but because another variable got in the
way, you cannot determine whether it causes the response. You explored confounding variables in action to
see how they limit the conclusions that can be drawn about the supposed explanatory variable. In effect, a
confounding variable prevents a cause-and-effect conclusion. Often, it's one that you didn't think to
measure, which is problematic.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Confounding
Occurs when the effects of the treatments, if any, are indistinguishable from the potential effects of
some other variable which was unaccounted for.

Confounding Variable
A variable which was not accounted for in a study, which limits the conclusions that the study can draw.

Explanatory Variable
A variable that we believe is predictive of something else. An increase in this variable will correspond to
an increase or decrease in some other variable.

Response Variable
A variable that is affected by the explanatory variable.

Variable
Any attribute or number that can be measured about individuals in a study.

Variable of Interest
Any variable which we need to know about in the context of a study.

 WHAT'S COVERED

This tutorial will cover the topic of question types. We will cover binomial questions, as well as discuss
the difference between open-ended and closed questions, through the exploration of:

1. Binomial Questions
2. Closed Questions
3. Open-ended Questions

1. Binomial Questions
Recall that there are two types of data:

Qualitative Data: Deals with categories or descriptions. Also called categorical data.
Quantitative Data: Deals with numbers that can be measured or used to perform arithmetic. Also called
numerical data.

A binomial question is a type of question with only two answer choices. In order to understand what a
binomial question is, it helps to break down the word itself. Bi means “two” and nomial means “names”. So a
binomial question is a question with two names.

Do you think that this is a qualitative type of question or a quantitative type of question?

A binomial question collects qualitative data, because its two possible responses are categories rather than
numbers. It's a question with two categories.

 EXAMPLE The simplest version of a binomial question is yes or no. You might remember this type
of question from elementary or middle school:

Do you like me?

(Check Yes or No)

Yes No

Other examples of binomial questions include:

Do you prefer dogs or cats?

Are you a smoker or non-smoker?

In that last question, some people feel like they fall somewhere in between the two options. They may
currently be a smoker, but they are trying to quit. Sometimes questions have some shades of gray. What
about this one?

"Have you ever smoked?"

This is a binomial question that would address people who don't currently smoke but used to.

Sometimes responses don't fit neatly into two boxes. Binomial questions also don't work when a question has
more than two possible answers or is open-ended, such as, “How do you feel about the construction of the new
baseball diamond located on the north end of town?” It doesn't really work to place a response like that into
two categories.

 TERM TO KNOW

Binomial Question
A question with only two answer choices.

2. Closed Questions
Many surveys have a combination of open and closed questions. Closed questions have short, definite,
usually multiple choice type answers.

Your Overall Experience

Excellent Good Satisfactory Fair Poor

The Teacher ❍ ❍ ❍ ❍ ❍

Class Content ❍ ❍ ❍ ❍ ❍

The Class as a Whole ❍ ❍ ❍ ❍ ❍

What did you like about the class?

In the above example, the rating rows offer multiple choices (poor, fair, satisfactory, good, and
excellent), and those are your only choices.

 HINT

When there are only certain answers to select, such as yes/no or multiple choice, that is the signal that
you are dealing with a closed question.
 TERM TO KNOW

Closed Question
A question type with only so many different answer choices.

3. Open-ended Questions
Open questions, also called open-ended questions, are subjective. These are areas where someone can click
into the field and start to type their comments and/or opinions. These comments are open to the
interpretation of the person being surveyed.

The comments are also open to the interpretation of the person conducting the survey when they do the
analysis. Usually, open-ended responses need to be read and analyzed by a person to get their full value.
Often, for the sake of simplicity, someone will ask a question in closed form that really should be
open-ended.

In the survey below, the rating rows are closed questions, and the final item, "What did you like about the
class?", is an open-ended question.

Your Overall Experience

Excellent Good Satisfactory Fair Poor

The Teacher ❍ ❍ ❍ ❍ ❍

Class Content ❍ ❍ ❍ ❍ ❍

The Class as a Whole ❍ ❍ ❍ ❍ ❍

What did you like about the class?

 TERM TO KNOW

Open Question
A question type with no answer choices; the respondent can choose what he or she wants to say to
answer the question.

⚙ THINK ABOUT IT

Suppose you are in a court of law and the lawyer asks, “Were you at the crime scene?”

“Yes, but I didn’t see anything other than people running and police arriving. It was chaos.”

“Just yes or no, please.”

The lawyer asked a closed question and wants only a yes/no answer. By attempting to explain your
circumstances, you were answering it as if it were an open-ended question. The lawyer returns to the
closed question by asking you to select either “yes” or “no.”

 SUMMARY

Binomial questions produce categorical data. These are questions with two possible responses, or
two categories. It's important to consider whether there really are just two categories before
you ask something as a binomial question. Open questions allow for more explanation, but they're
sometimes difficult to interpret because they're not as cut and dried as closed questions.
Open-ended questions are sometimes called "essay" questions. Closed questions are easier to
interpret, but they're not always appropriate for the situation. Closed questions are sometimes called
multiple choice questions.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Binomial Question Type

A question that will yield categorical data with just two possible values.

Closed Question
A question type with only so many different answer choices.

Open Question
A question type with no answer choices; the respondent can choose what he or she wants to say to
answer the question.

Accuracy and Precision in Measurements
by Sophia

 WHAT'S COVERED

This tutorial will discuss accuracy in measurement versus precision through the following exploration:

1. Contrasting Accuracy and Precision

a. Scale Example
b. Dartboard Example

1. Contrasting Accuracy and Precision

When talking about accuracy, the focus is on how close the measurements are to what they should have
been.

Precision, on the other hand, is concerned with how consistent the measurements are with each other. In other
words, how close are the measurements to a single value, regardless of whether that single value is the
right answer?

 TERMS TO KNOW

Accuracy
The extent to which the values, when considered all together, center around the correct value for a
variable.

Precision
The extent to which the values are very close to each other, even if they are not near the correct
value.

1a. Scale Example

Suppose you work for a consumer report company that evaluates personal weight scales. It's your job to decide
whether each of these scales, labeled #1, #2, #3, and #4, is accurate, precise, both, or neither.
You take someone who weighs 161.8 pounds and place them on each of the four scales five times.

Take a look at Scale #1 and determine if this scale is accurate, precise, both, or neither.

Scale 1
Accuracy ✔ Precision ✘

160.4 158.8 161.4 164.2 162.0

Scale #1 is accurate because the numbers average out to about 161.4, close to the right answer of 161.8.
Although it reported a fairly low number, 158.8, and a high number, 164.2, by and large the readings center
on the right answer.

However, Scale #1 is not precise because the numbers are not close to a single value every time.

Take a look at Scale #2 and determine if this scale is accurate, precise, both, or neither.

Scale 2
Accuracy ✘ Precision ✔

168.2 167.8 167.8 168.0 168.4

You can tell just by looking at the numbers that all values are within 1 pound of each other, which means the
scale is precise. Remember, the values don't need to be close to the correct number, but they do need to be
close to each other.

But take a look at the average. The average of Scale #2 is about 168, which overestimates the true weight by
about 6 pounds, so this scale is not accurate.

Take a look at Scale #3 and determine if this scale is accurate, precise, both, or neither.

Scale 3
Accuracy ✔ Precision ✔

161.0 161.8 161.6 162.0 161.2

All of these are within a pound of each other. They're also very close to 161.8 pounds, the true weight of the
individual you selected. Having the numbers all close to each other makes the scale precise, and the numbers
average out to about 161.5, very close to the correct weight of 161.8. Therefore, Scale #3 is both accurate
and precise.

Take a look at Scale #4 and determine if this scale is accurate, precise, both, or neither.

Scale 4
Accuracy ✘ Precision ✘

161.8 170.2 165.4 168.4 164.8

It actually did get the correct weight of 161.8 once, but if you look at the five measurements taken as a whole,
they're pretty far off and they tend to overestimate. They don't really center around the right number all that
much, so it’s not accurate. The numbers are also all over the place, so this scale is not precise.
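The judgments above are made by eye. The short Python sketch below makes them mechanical: it uses each scale's mean to judge accuracy and the spread between its highest and lowest reading to judge precision. The 1-pound cutoffs (`ACCURACY_TOL`, `PRECISION_TOL`) are assumptions chosen to match the "within a pound" reasoning in the text; the tutorial itself does not define numeric thresholds.

```python
import statistics

TRUE_WEIGHT = 161.8  # pounds, the person's actual weight

readings = {
    "Scale 1": [160.4, 158.8, 161.4, 164.2, 162.0],
    "Scale 2": [168.2, 167.8, 167.8, 168.0, 168.4],
    "Scale 3": [161.0, 161.8, 161.6, 162.0, 161.2],
    "Scale 4": [161.8, 170.2, 165.4, 168.4, 164.8],
}

# Hypothetical cutoffs: the tutorial judges by eye, so these are assumptions.
ACCURACY_TOL = 1.0   # the mean must be within 1 lb of the true weight
PRECISION_TOL = 1.0  # the readings must span at most 1 lb


def evaluate(values, true_value):
    """Return (mean, spread, accurate, precise) for one scale's readings."""
    mean = statistics.mean(values)
    spread = max(values) - min(values)
    accurate = abs(mean - true_value) <= ACCURACY_TOL
    precise = spread <= PRECISION_TOL
    return mean, spread, accurate, precise


for name, values in readings.items():
    mean, spread, accurate, precise = evaluate(values, TRUE_WEIGHT)
    print(f"{name}: mean={mean:.2f}, spread={spread:.1f}, "
          f"accurate={accurate}, precise={precise}")
```

The range (highest minus lowest) is used as the spread because it matches the "within a pound of each other" wording; a standard deviation would be a reasonable alternative for larger sets of readings.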

 BRAINSTORM

If you worked for a consumer report company and you were evaluating the above scales, which scale
would you choose and why?

1b. Dartboard Example

A dartboard is a very popular example of precision and accuracy, assuming the bulls-eye is the desired
outcome, or “value”.
[Figure: four dartboards in a 2 × 2 grid; columns labeled Precise and Not Precise, rows labeled Accurate and Not Accurate]

For the cases above:

Precise and Accurate: In the top left corner, the darts are clumped together AND around the bulls-eye.
Not Precise, but Accurate: In the top right corner, the darts are not clumped together, but they loosely
surround the bulls-eye.
Precise, but Not Accurate: In the bottom left corner, the darts are clumped together, but not around the
correct “value”, or in this case, the bulls-eye.
Neither Precise nor Accurate: In the bottom right corner, the darts are spread out and are not surrounding the
bulls-eye.

 SUMMARY

By contrasting accuracy and precision, you now know that accuracy is how close the measurements, taken
together, are to the right answer, though they may not land exactly on the correct answer. Precision
is how consistent measurements are with each other, even if they are not near the correct value; precise
measurements appear clumped together. In a given measurement scenario, high accuracy and high precision
are ideal.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Accuracy
The extent to which the values, when considered all together, center around the correct value for a
variable.

Precision
The extent to which the values are very close to each other, even if they are not near the correct value.

Absolute Change and Relative Change
by Sophia

 WHAT'S COVERED

In this tutorial, you're going to learn about the difference between absolute change, which is an
increase or decrease represented as a raw number, and relative change, which relates that change
differential back to the original value. Specifically, this lesson will cover:

1. Absolute Change and Relative Change
2. Calculating Absolute Change
3. Calculating Relative Change
4. Examples of Absolute Change and Relative Change

1. Absolute Change and Relative Change

Absolute change is the actual change in units. It could be the actual change in pounds, degrees, inches,
percentage points, or lots of different things.

 EXAMPLE Suppose a political candidate's approval rating went up from 44% to 48%. That absolute
change is four percentage points.
Relative change is the percent difference from the previous value, and it's always expressed as a percent.

 HINT

Relative change can also be referred to as the percent change.

IN CONTEXT
An infant weighed 6.5 pounds at birth, and one year later weighed 14.5 pounds. Decide whether each of
the following statements is true.

Statement 1: The infant's weight change was an increase of eight pounds.

Well, that's a true statement. 14.5 minus 6.5 is 8 pounds. It increased by 8 pounds.

Statement 2: The infant's weight change was an increase of 123%.

This one's a little less obvious, but it's also true. The eight-pound increase is larger than the birth
weight itself, so the infant's weight more than doubled. That is an increase of over 100%. In fact, when you
do the calculation, 8 divided by 6.5 is about 1.23, or 123%.

Absolute Change
The raw increase or decrease in the value of a variable.

Relative Change
The percent increase or decrease in the value of a variable.

2. Calculating Absolute Change

How do you calculate absolute change? It is also called the absolute difference: you simply subtract the
old (original) value from the new value.

 FORMULA

Absolute Change

$$\text{Absolute Change} = \text{new value} - \text{original value}$$

In the example above, 14.5 minus 6.5 gives a difference of 8 pounds.

The change is a positive 8 pounds because the weight went up; a decrease would give a negative absolute change.

3. Calculating Relative Change

The relative change, or relative difference, is calculated by dividing the absolute difference by the
original value.

 FORMULA

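Relative Change

$$\text{Relative Change} = \frac{\text{new value} - \text{original value}}{\text{original value}} \times 100\%$$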

In the example above, the absolute difference was 8 pounds and the original value was 6.5 pounds. Dividing
8 by 6.5 on a calculator gives about 1.23.

When expressed as a percent, 1.23 is 123%. That means that there was a 123% increase over the birth weight.
That was the relative change.
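To check these two formulas, here is a minimal Python sketch. The function names `absolute_change` and `relative_change` are illustrative choices, not part of the tutorial; the relative change is returned as a percent, and an original value of zero is rejected because the division would be undefined.

```python
def absolute_change(original: float, new: float) -> float:
    """Raw change in units: new value minus original value."""
    return new - original


def relative_change(original: float, new: float) -> float:
    """Change expressed as a percent of the original value."""
    if original == 0:
        raise ValueError("relative change is undefined when the original value is 0")
    return absolute_change(original, new) * 100 / original


# Infant example: 6.5 lb at birth, 14.5 lb one year later.
birth, one_year = 6.5, 14.5
print(absolute_change(birth, one_year))         # 8.0
print(round(relative_change(birth, one_year)))  # 123
```

A negative result from either function means the value went down.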

4. Examples of Absolute Change and Relative Change

Consider the following example, which shows this year's and last year's enrollment for each class at Memorial High School.

Video Transcription
Hey again. Let's walk through an example of how absolute change and relative change are found and
how they differ. The data we'll use is this year's and last year's enrollment for the classes at
Memorial High School. First, we'll determine which class has the highest absolute change, and then which
has the highest relative change.

Will it be the burnouts, the nerds, the geeks, or the dweebs? It's more than anyone's guess. It's statistics.
Anyway, let's find out. We'll start with absolute change. To calculate this, simply subtract last year's value
from this year's. As you can see, three of the four classes had increases in enrollment.

So of the classes that had a positive absolute change, the burnouts had the highest with 310 students.
Now onto relative change, which is calculated by dividing the absolute change by the original number.
With that in mind, and looking at the classes again, repeat this formula with all four groups. This is what
you'll see.

The relative change for the burnouts is a sizable increase of 24%, while the nerds show a more modest
10%. The geeks, on the other hand, experienced a decrease of 6%. Which finally brings us to the
dweebs. While they're the smallest class overall, they have the highest relative change, an increase
of 26%. The dweebs' enrollment wasn't big to begin with, so even a modest absolute change resulted in
the largest relative change.

To summarize, here's the distinction between the two. Absolute change is the difference in raw numbers;
in this case, it's the actual change in enrollment from one year to the next. Relative change expresses
how this year compares to last year as a percent of the original number.

Looking at the absolute change and relative change can tell different stories, and often times you
humans find these stories are a valuable way to analyze data. There you have it. A quick illustration of
absolute change and relative change. Keep plugging away and I'll see you in the next video.

IN CONTEXT
Let's look at another example. The following table shows the results of the 1990 census and the
2000 census, along with the absolute change and relative change.

State      1990 Population   2000 Population   Absolute Change   Relative Change
Florida    12,937,926        15,982,378        3,044,452         24%
Georgia    6,478,216         8,186,453         1,708,237         26%
Hawaii     1,108,229         1,211,537         103,308           9%
Idaho      1,006,749         1,293,953         287,204           29%
Illinois   11,430,602        12,419,293        988,691           9%
Indiana    5,544,159         6,080,485         536,326           10%
Iowa       2,776,755         2,926,324         149,569           5%
Kansas     2,477,574         2,688,418         210,844           9%

Absolute Change: To calculate the absolute change, simply subtract the 1990 value from the 2000
value. For example, Florida's absolute change is 15,982,378 minus 12,937,926, which is
3,044,452.

All of the states in the list had population increases. Some increases were small, like Hawaii's, at
only about 100,000 people. Others were large, like Georgia's and Florida's, each over a million
people. The highest absolute change was Florida's, at 3,044,452 people.

Relative Change: The question of which state had the largest relative change over that period is a
little different. Looking at Florida again, you need to figure out whether an absolute change of around
3 million was large, percentage-wise, compared with the 1990 population of about 13 million. It was a
large increase, but was it the largest percent increase in the list?

To find the relative change, take each absolute change and divide by the old population from 1990.

Florida's relative change was positive 24%: 3,044,452 divided by 12,937,926 is about 0.235, or
24%. Georgia's increase was about 26%, a slightly larger percent increase than Florida's.
The highest in the list was Idaho's 29% increase. Notice that Idaho didn't have a very large
absolute change, but its population wasn't very big to begin with, so even a small absolute
change can be a large relative change.
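The same calculation can be repeated for every state in the census table. This sketch copies the table's figures into a Python dictionary (the variable names are illustrative) and reports which state had the largest change of each kind.

```python
# 1990 and 2000 census populations, copied from the table above.
census = {
    "Florida": (12_937_926, 15_982_378),
    "Georgia": (6_478_216, 8_186_453),
    "Hawaii": (1_108_229, 1_211_537),
    "Idaho": (1_006_749, 1_293_953),
    "Illinois": (11_430_602, 12_419_293),
    "Indiana": (5_544_159, 6_080_485),
    "Iowa": (2_776_755, 2_926_324),
    "Kansas": (2_477_574, 2_688_418),
}

# Absolute change: 2000 value minus 1990 value.
absolute = {state: new - old for state, (old, new) in census.items()}
# Relative change: absolute change as a percent of the 1990 value.
relative = {state: (new - old) * 100 / old for state, (old, new) in census.items()}

for state in census:
    print(f"{state:<9} {absolute[state]:>10,} {relative[state]:>5.0f}%")

print("Largest absolute change:", max(absolute, key=absolute.get))
print("Largest relative change:", max(relative, key=relative.get))
```

Printing both columns side by side makes the point of the example visible: Florida leads on absolute change, while Idaho leads on relative change.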

 SUMMARY

Absolute change is the absolute difference in raw numbers. It's the change in units. Relative change
examines how the new number compares to the previous number in terms of a percent. Did it go up
by 10%? Did it go down by 7%? What happened percentage-wise from then to now?

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Absolute Change
The raw increase or decrease in the value of a variable.

Relative Change
The percent increase or decrease in the value of a variable.

 FORMULAS TO KNOW

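Absolute Change

$$\text{Absolute Change} = \text{new value} - \text{original value}$$

Relative Change

$$\text{Relative Change} = \frac{\text{new value} - \text{original value}}{\text{original value}} \times 100\%$$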

Using Percentages in Statistics
by Sophia

 WHAT'S COVERED

This tutorial will discuss how to use percentages wisely in statistics by focusing on:

1. Percentage Point vs. Percent

2. Examples
a. Retaking a Test
b. A Politician's Approval Rating

1. Percentage Point vs. Percent

People tend to use percentages without really thinking about what type of percentages they're talking about.
Results and statistics are often expressed as percents but it's important to distinguish between percentage
points and percents.

Percents are used to describe the relative change. Percentage points are used to measure absolute change.

 TERMS TO KNOW

Percentage Points
An absolute increase or decrease in a percent value.

Percent Change
A relative increase or decrease in a percent value.

2. Examples
2a. Retaking a Test
Suppose a teacher gives a particularly difficult exam and six students all fail it. The teacher graciously
offers the students a retake, and they all pass.

The table below shows their original scores and their retake scores. On the retake, Jonathan scored an 88,
Ryan scored a 78, Katherine scored an 84, and so on.

Student     Original Score   Retake Score
Jonathan    52%              88%
Ryan        38%              78%
Katherine   61%              84%
Isaiah      44%              89%
Teri        50%              82%
Kelly       48%              95%

These changes can be expressed as either percentage points or percent increase. First, which student had
the highest increase in percentage points?

Student     Original Score   Retake Score   Change (Percentage Points)
Jonathan    52%              88%            36
Ryan        38%              78%            40
Katherine   61%              84%            23
Isaiah      44%              89%            45
Teri        50%              82%            32
Kelly       48%              95%            47

Jonathan went from 52% to 88%, an increase of 36 percentage points. Ryan went from 38% to 78%,
an increase of 40 percentage points. Calculating this for every student shows that Kelly had the
largest increase: 47 percentage points.

Now, who had the highest percent increase? For that, you need to divide each student's increase by
their original score to see how large the increase was relative to where they started.

Begin with Jonathan's scores. We need to determine how much of an increase 36 percentage points was
over that original score of 52.

Student     Original Score   Retake Score   Change (Percentage Points)   Percent Increase
Jonathan    52%              88%            36                           69%
Ryan        38%              78%            40                           105%
Katherine   61%              84%            23                           38%
Isaiah      44%              89%            45                           102%
Teri        50%              82%            32                           64%
Kelly       48%              95%            47                           98%

Jonathan's score increased by 69%. Katherine's only increased by 38% because she had a fairly high score to
begin with.

But it was Ryan who had the highest percent increase. He started with a 38 and finished with a 78, a
40-percentage-point increase. A 40-point increase over a score of 38 is over 100%, meaning he
more than doubled his original score.
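Both rankings can be reproduced with a short Python sketch. The scores are copied from the tables above; the dictionary and variable names are illustrative. Note that the change in points is a plain subtraction, while the percent increase divides that change by the original score.

```python
# Original and retake scores (in percent), copied from the tables above.
scores = {
    "Jonathan": (52, 88),
    "Ryan": (38, 78),
    "Katherine": (61, 84),
    "Isaiah": (44, 89),
    "Teri": (50, 82),
    "Kelly": (48, 95),
}

# Absolute change: difference in percentage points.
points = {name: retake - orig for name, (orig, retake) in scores.items()}
# Relative change: the point change as a percent of the original score.
pct_increase = {
    name: (retake - orig) * 100 / orig for name, (orig, retake) in scores.items()
}

for name in scores:
    print(f"{name:<10} +{points[name]} points  +{pct_increase[name]:.0f}%")

print("Most percentage points:", max(points, key=points.get))
print("Highest percent increase:", max(pct_increase, key=pct_increase.get))
```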

2b. A Politician's Approval Rating

Video Transcription
Let's take a moment to look at one more example of using percentages in statistics. Suppose young
Patrick hare has found his way to class president at Memorial High School, but his approval rating has
just hit the skids, dropping from 56% to 42%. Perhaps this is thanks in part to his proposal to phase out
all computer-generated voices with English accents. I'm just saying.

Whatever the case, let's determine the absolute change in his approval rating. Take a moment to
calculate it out. All right, here's what you should have done. Take 42 and subtract 56 from it. This gives
you negative 14. So Patrick's approval rating dropped 14 percentage points. It's a drop, but looking at it
that way, Patrick isn't too concerned.

However, how does that drop look when you calculate it in terms of relative change? Again, take a
moment to calculate it out. OK, here's where you start. Take the change of negative 14 percentage points
and divide it by the original approval rating, 56. That gives you negative 0.25, or a 25% drop. Viewed in
this context, Patrick sees that the drop is a significant one, which he might not have expected.

Do you see what happens, Patrick? Do you see what happens when you try to phase out a crisp and
pleasant-sounding computer-generated English accent?

Suppose Patrick has found his way to class president at Memorial High School. But his approval rating has just
hit the skids, dropping from 56% to 42%.

First, let’s determine the absolute change in his approval rating. Take 42 and subtract 56 from it.

This gives you negative 14. So Patrick's approval rating dropped 14 percentage points. It’s a drop, but looking
at it that way, Patrick isn’t too concerned.

However, how does that drop look when you calculate it in terms of relative change? Take the change of
negative 14 percentage points and divide it by the original approval rating, 56.

That gives you -0.25, or a 25% drop. Viewed in this context, Patrick sees that the drop is a significant one,
which he might not have expected.
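The two ways of describing a change in a percent value can be computed side by side. This is a minimal sketch; the function name `describe_percent_change` is hypothetical. It is applied to Patrick's drop from 56% to 42% and, for contrast, to the candidate whose approval rose from 44% to 48% earlier in this unit.

```python
def describe_percent_change(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (change in percentage points, relative change in percent)."""
    point_change = new_pct - old_pct          # absolute change
    relative = point_change * 100 / old_pct   # relative change
    return point_change, relative


# Patrick's approval rating: 56% -> 42%.
points, relative = describe_percent_change(56, 42)
print(f"{points:+.0f} percentage points, {relative:+.0f}%")  # -14 percentage points, -25%

# Candidate's approval rating: 44% -> 48%.
points, relative = describe_percent_change(44, 48)
print(f"{points:+.0f} percentage points, {relative:+.1f}%")  # +4 percentage points, +9.1%
```

The same 4-point rise would be a smaller relative change for a candidate starting higher, which is why the two measures should always be labeled clearly.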

 SUMMARY

When percentages are used in statistics, it's important to know whether the focus is absolute change
or relative change. Absolute change is the difference in percentage points, and relative change is a
percent increase or percent decrease.

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Percent Change
A relative increase or decrease in a percent value.

Percentage Points
An absolute increase or decrease in a percent value.

Index Number and Reference Value
by Sophia

 WHAT'S COVERED

This tutorial is going to teach you about index numbers and reference values, through the definition
and discussion of:

1. Index Numbers and Reference Values

2. Consumer Price Index and Inflation

1. Index Numbers and Reference Values

An index number is a way to measure a percent increase or decrease from one point to another. This is
typically done with price changes. We set an arbitrary starting point in time and assign that price an index
number of 100. This starting price is called the reference value because we refer back to it every time the
price changes.

To calculate the index value for other points in time, you would take the current price, divide by the reference
value, and then convert that value to a percent.

 FORMULA

Index Number

$$\text{Index Number} = \frac{\text{current value}}{\text{reference value}} \times 100$$

How do we work with index numbers and reference values most of the time? Consider the following example:

In 1983 a gallon of milk cost $2.24, so you assign this reference value of $2.24 an index value of 100.
Essentially, this means that milk in 1983 cost 100% of what it cost in 1983, a fairly obvious statement.

Year 1983 1988 1993 1998 2003

Price($) $2.24 $2.30 $2.86 $3.16 $3.19

Index Value 100

To calculate the index value for other points in time, like 1988, when a gallon of milk cost $2.30, or 1993,
when it cost $2.86, you would take the price in that year, divide by the reference value of $2.24, and then
convert the result to a percent.

The index value in 1988, then, is $2.30 divided by the reference value of $2.24. That gives you about 1.027,
which as a percent is 102.7%. Index values are written without the percent symbol, so the index value in
1988 was 102.7. You can complete the table with the remaining values.

Year 1983 1988 1993 1998 2003

Price($) $2.24 $2.30 $2.86 $3.16 $3.19

Index Value 100 102.7 127.7 141.1 142.4

This indicates that by 2003, a gallon of milk cost 142.4% as much as it did in 1983,
a 42.4% increase over 1983.
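The index column of the milk table can be reproduced in a few lines of Python. This is a sketch; `index_number` is an illustrative name, and the prices are the ones listed in the table above, with 1983 as the reference year.

```python
def index_number(price: float, reference: float) -> float:
    """Express a price as a percent of the reference value (reference -> 100)."""
    return price / reference * 100


# Price of a gallon of milk, copied from the table above.
milk = {1983: 2.24, 1988: 2.30, 1993: 2.86, 1998: 3.16, 2003: 3.19}
reference = milk[1983]  # 1983 is the reference year, so its index is 100

for year, price in milk.items():
    # Index values are reported without a percent sign, rounded to one decimal.
    print(year, round(index_number(price, reference), 1))
```

Subtracting 100 from an index value gives the percent change since the reference year, for example 142.4 - 100 = 42.4% for 2003.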

 TERMS TO KNOW

Index Number
A way to measure the relative change in a value, usually the price of a good or service, over time. If
the index number is over 100, that means the price has increased. If the price has decreased, then the
index number will be less than 100.

Reference Value
An arbitrarily chosen starting value for an index. It is assigned an index number of 100.

2. Consumer Price Index and Inflation

The most prominent index number that you see in everyday life is the Consumer Price Index. The
Consumer Price Index (abbreviated as CPI) measures the percent increase or decrease in the prices of many goods
and services. Its reference period is 1982 to 1984, centered on 1983, which is why 1983 was used as the reference year in the previous
example. The U.S. Bureau of Labor Statistics updates the CPI every month.

The CPI is a general measure of inflation. Inflation means that the index is going up. It's a decline in
purchasing power: it costs more now to buy the same goods and services than it did before. Put another
way, with the same income, you can buy less than you could before. It may cost you much more now to do
what cost $100 to do in 1983.

Here's a graph of the CPI over time. Notice that the index value is 100 in 1983, between 1980 and 1990 on the graph.
Goods and services that cost $100 in 1983 cost around $200 by about 2007, so the index value was about
200 in 2007.
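Because an index number is a percent of the reference value, the ratio of two index values can be used to scale a price from one year to another. The sketch below assumes the rounded values read from the graph described above (100 in 1983, about 200 in 2007); the function name `adjust_for_inflation` is hypothetical, and actual CPI figures would have to be taken from the Bureau of Labor Statistics.

```python
def adjust_for_inflation(amount: float, index_then: float, index_now: float) -> float:
    """Scale an amount by the ratio of two index values."""
    return amount * index_now / index_then


# Assumed, rounded CPI values from the graph: 100 in 1983, about 200 in 2007.
cost_2007 = adjust_for_inflation(100, index_then=100, index_now=200)
print(cost_2007)  # 200.0
```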

Consumer Price Index

An index published by the US Bureau of Labor Statistics that shows the change in the price of many
different goods or services in the United States. It provides a measure of purchasing power.

Inflation
A relative increase in the price of a good or service over time. A person will need to pay more to
receive the same good or service than they did at a previous point in time.

 SUMMARY

Index numbers allow us to track changes, typically in prices, from one point in time to another. We
begin with a reference value, which is the price at some arbitrary point in time and is assigned an index
number of 100. Each index number expresses a later price as a percent of that reference value. If the
price goes up, the index number will be over 100. If the price goes down, the index number will be under
100. The most commonly referenced index is the Consumer Price Index, or CPI. The CPI shows the percent
increase or decrease in the prices of many goods and services, which helps measure inflation.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Consumer Price Index

An index published by the US Bureau of Labor Statistics that shows the change in the price of many
different goods or services in the United States. It provides a measure of purchasing power.

Index Number
A way to measure the relative change in a value, usually the price of a good or service, over time. If the
index number is over 100, that means the price has increased. If the price has decreased, then the index
number will be less than 100.

Inflation
A relative increase in the price of a good or service over time. A person will need to pay more to receive
the same good or service than they did at a previous point in time.

Reference Value
An arbitrarily chosen starting value for an index. It is assigned an index number of 100.

 FORMULAS TO KNOW

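Index Number

$$\text{Index Number} = \frac{\text{current value}}{\text{reference value}} \times 100$$

Bias
by Sophia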

 WHAT'S COVERED

This tutorial will cover the topic of bias, specifically focusing on:

1. Bias
2. Hawthorne Effect

1. Bias
Most often, research is done accurately and with integrity. People want to get the job done right and get
the answer correct. But sometimes something happens systematically in an experiment or study that keeps it
from accurately representing the population being researched.

Bias, in the statistics world, is systematically misrepresenting the population. It refers to the favoring of certain
outcomes in a sample, which limits our ability to draw conclusions about the population. The key word is
systematic: bias is not necessarily intentional. It could be intentional, but it doesn't have to be.

A way of selecting the sample for your study such that the sample doesn't accurately reflect the population is
called selection bias. It's not good, but sometimes it can't be avoided. Other times it can be
avoided, but isn't.

Publication bias occurs when researchers (and research publications) publish only the most sensational
findings, often only the positive ones. Only the results that people will want to read reach an audience,
while findings deemed boring do not.

 TERMS TO KNOW

Bias
The tendency for collected data to differ from what is expected in a systematic way. Biased data can
often favor a specific group of those studied.

Selection Bias
Selecting a sample in such a way that certain subsets of the population are systematically excluded.

Publication Bias
The desire of researchers (and research publications) to only print the most sensational or interesting
articles.

2. Hawthorne Effect

Often, people will behave differently if they know that they're under observation. They become a bit
self-conscious when they are observed and want to do it “right”, so they act differently.

This idea that people might change what they would typically do based on the fact they're under observation
is a type of bias called the Hawthorne Effect.

IN CONTEXT
Suppose you are in charge of a weight loss study. One group is told to take a pill every day. The
other group is also told to take a pill every day, but it doesn't have any active ingredient in it.

You instruct them not to change their behavior. You don’t want them changing the results by eating
differently or exercising more. However, these people might change their behavior based on the fact
that they know they're going to be weighed later.

Another thing to consider is a study that relies on participants volunteering their time. Only people who
feel strongly about the subject of the study may sign up, which is known as participation bias.

Furthermore, participants may tell you what they think you want to hear, which is
response bias.

 TERMS TO KNOW

Hawthorne Effect
People have the tendency to change their behavior when they know they are being monitored.

Participation (Voluntary Response) Bias

Bias that occurs when a sample consists entirely of volunteers. People with strong opinions may be
the only ones who volunteer.

Response Bias
Bias that occurs when a respondent tells the interviewer "what they want to hear" or lies due to the
sensitive nature of the question.

 SUMMARY

Bias has a problematic influence on many experiments and samples. Unfortunately, when bias exists,
the results cannot be generalized to the population, because they are not reliable. It's
important to know that bias is not always intentional; it can be a systematic flaw in the sample or the
experiment that no one intended. Selection bias happens when the sample is not truly
representative of the population to which you want to generalize the information. Publication bias
happens when researchers publish only the findings they think people want to see. The
Hawthorne Effect is a type of bias that happens when people act differently simply because they know
they are being observed.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Bias
The tendency for collected data to differ from what is expected in a systematic way. Biased data can
often favor a specific group of those studied.

Hawthorne Effect
People have the tendency to change their behavior when they know they are being monitored.

Participation (Voluntary Response) Bias

Bias that occurs when a sample consists entirely of volunteers. People with strong opinions may be the
only ones who volunteer.

Publication Bias
The desire of researchers (and research publications) to only print the most sensational or interesting
articles.

Response Bias
Bias that occurs when a respondent tells the interviewer "what they want to hear" or lies due to the
sensitive nature of the question.

Selection Bias
Selecting a sample in such a way that certain subsets of the population are systematically excluded.

Nonresponse and Response Bias
by Sophia

 WHAT'S COVERED

This tutorial will cover the topics of nonresponse bias and response bias by focusing on:

1. Nonresponse Bias
2. Participation Bias
3. Response Bias

1. Nonresponse Bias
A nice way to think of sampling is to use a "pot of soup" analogy. You want a representative sample, right?
Well, you don't need to drink the entire pot of soup in order to figure out what's in it. You just need the right
taste.

A good sample is like a single spoonful that gives you a taste of every ingredient in the soup, but certain
things can go wrong with the taste test that affect what you think is in the soup. Just as you don't really
know what the population looks like, you don't have a clear idea of all the ingredients in the soup. All
you get is the taste, and if you don't get the right taste, you're going to miss something and not know
exactly what's in the soup (or, the population).

In terms of sampling, nonresponse means that someone selected for the sample either can't be contacted or
is unwilling to participate.

Now, nonresponse happens. It's an inevitability that you will get uncooperative people, people that don't want
to take your survey or people who refuse to be part of your experiment. It may be that you just won't be able
to contact certain people.

Nonresponse becomes a problem when the people who couldn't be contacted or refused to
participate differ substantially from the people who were in the sample. Then the sample is not representative
of the population. That is called nonresponse bias, because you're not getting an accurate cross-section of
opinions: the opinions of some of the people you wanted to hear from are left out.

IN CONTEXT
A workplace wishes to survey 200 of its 1,000 employees about their workload and their stress level,
so it puts 200 surveys in the workers' mailboxes. It's likely that the people with the biggest
workloads will be left out of the sample because they don't check their mailboxes as often as
other people. Even if they do check their mailboxes, they may not complete or return the survey
because they're so busy.

What effect might that have? The employees who completed the survey may have reported that
their workload is not that high. The problem is that the people with lighter workloads are the ones
who returned the survey, because they had time to take it, while the people with heavier workloads
didn't. As a result, the company might conclude that the workload is lower than it really is.
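A small simulation can make this mechanism concrete. Everything in it is a hypothetical assumption, not data from the tutorial: weekly workloads are drawn evenly between 35 and 65 hours, and the chance of returning the survey is made up to fall from 0.9 for the least busy employees to 0.1 for the busiest. It compares the true average workload with the average among employees who actually respond.

```python
import random
import statistics

# Hypothetical setup: 1,000 employees whose weekly workloads are assumed
# to be spread evenly between 35 and 65 hours.
rng = random.Random(1)  # fixed seed so the run is reproducible
workloads = [rng.uniform(35, 65) for _ in range(1000)]


def returns_survey(hours: float) -> bool:
    """Assumed response model: busier employees are less likely to respond.

    The chance of returning the survey falls linearly from 0.9 at 35 hours
    to 0.1 at 65 hours.
    """
    p_return = 0.9 - 0.8 * (hours - 35) / 30
    return rng.random() < p_return


surveyed = rng.sample(workloads, 200)  # simple random sample of 200 employees
returned = [hours for hours in surveyed if returns_survey(hours)]

print(f"Population mean workload:   {statistics.mean(workloads):.1f} hours")
print(f"Mean of all 200 surveyed:   {statistics.mean(surveyed):.1f} hours")
print(f"Mean of {len(returned)} respondents: {statistics.mean(returned):.1f} hours")
```

Under these assumptions the respondents' average comes out noticeably below the true average even though the 200 employees were chosen at random; the bias comes entirely from who sends the survey back.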

The nonresponse rate is easy to calculate. Subtract the number of surveys you got back from the number
you sent out, then divide that difference by the number sent out and express it as a percent.

 EXAMPLE Say you mailed out 100, and you only got 80 back. Well, that's 20 out of 100, or 20%
nonresponse rate.
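The calculation can be written as a small Python function. This is a sketch; the name `nonresponse_rate` is illustrative, and it simply rejects impossible inputs such as more surveys returned than were sent.

```python
def nonresponse_rate(sent: int, returned: int) -> float:
    """Percent of selected people who did not respond."""
    if sent <= 0 or not 0 <= returned <= sent:
        raise ValueError("need sent > 0 and 0 <= returned <= sent")
    return (sent - returned) * 100 / sent


print(nonresponse_rate(100, 80))  # 20.0
```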

⚙ THINK ABOUT IT

Consider the different ways of conducting a survey, a poll, or a sample. Which of the following methods do
you think has the highest nonresponse rate?
Mail
Telephone
Face-to-Face

The answer is mail. People will throw the survey away, forget to fill it out, or fill it out and
then forget to mail it back. This is problematic because when the United States takes its census of
everyone in the country, it contacts households by mail. Sometimes the Census Bureau has to do follow-ups.
In samples with high rates of nonresponse, follow-ups typically are needed. Suppose you started with a
mailing. You might need to follow up by calling them at home. If you can't reach them by calling them at home,
you might need to follow up by coming directly to their house.

Sometimes, even when they are contacted, someone will refuse to participate. Follow-ups like this might be
more necessary in some areas of the country than others because different areas of the country have
different rates of nonresponse.

 TERMS TO KNOW

Nonresponse
Nonresponse is a lack of response from people you've selected. It affects the ability to draw
conclusions from your sample.

Nonresponse Bias
Bias that occurs when the people who were unable to be reached or unwilling to participate in a
sample have substantially different opinions than the people who were included in the sample,
resulting in a misrepresentation of the population.

2. Participation Bias
On the other end of the spectrum are people who are especially passionate about a topic and eager
to participate. The people who raise their hands are volunteering their time because they have a
strong opinion about the topic at hand. Participation bias happens when people participate because they
have strong opinions about the topic, or, at the other extreme, when they don't care about the topic and
participate only because they are being paid.

 EXAMPLE Suppose you need to gather information on an upcoming election and you ask people
to participate in a focus group. You find that your group contains people who strongly support the
Democratic party and people who strongly support the Republican party, and no one in the
middle.

To correct this, you decide to pay participants $20 for their time. Now your group is filled
with people who are there for the money and may simply tell you what they think you want to hear, which invites participation bias.
 TERM TO KNOW

Participation Bias
Bias that occurs when participation in a study is voluntary. People who feel strongly may be the only
participants.

3. Response Bias
Response bias occurs when something about the question or the survey situation influences people's answers.
Remember the pot of soup analogy? A representative sample is like getting a little taste of everything in the
soup. However, things can go wrong, and you don't get the right taste of the soup.

Response bias can occur if the wording of the question is unclear to the respondent, if a respondent is
uncomfortable due to the sensitive or personal nature of the questions, or if the respondent feels that the
questioner is implying the question has a "correct" response. That last case is also called social desirability bias.

IN CONTEXT
On April 20, 1993, the New York Times published an article on a survey conducted by the Roper
Organization on behalf of the American Jewish Committee about the soon-to-be-opened Holocaust
Museum in Washington, DC.

The newspaper reported that an astounding 22% of adults surveyed expressed some
doubt as to whether the Holocaust had actually occurred. The actual question presented to
people was:

"Does it seem possible, or does it seem impossible to you, that the Nazi extermination of the
Jews never happened?"

This seems to be a fairly straightforward question, but there was a big problem with it, and it caused
response bias: the question contained a double negative, which is confusing. Saying it is impossible
that it never happened is the same as saying you are certain that it did happen, but the question
doesn't clearly read that way.

The good thing is that, one year later, the question was revised, and it became clearer. The new
question stated:

"Does it seem possible to you that the Nazi extermination of the Jews never happened, or do
you feel certain that it happened?"

This revised question clearly distinguishes between the two options: "does it seem possible," or
"do you feel certain?" With the two options clearly defined, less than 2% of individuals were unsure
whether the Holocaust was real. This provided a more accurate picture of what the American
public believed.

Therefore, unclear questions can lead to an inaccurate representation due to response bias. The other
scenario in which response bias can occur is when people answer a question untruthfully, either because
they are ashamed of the truth or because they think there's a "right" answer that someone is fishing for.

There are certain topics that are particularly sensitive and might make a person want to lie.

Topics that Could Result in a Response Bias

Drugs: Many people may say they've never used drugs, whether they actually have or not. Even if
there's no consequence and the survey is anonymous, some will still say they've never used drugs
when, in fact, they have.

Criminal history: Participants might say they don't have one, even if they do.

Sexual behavior: Questions on this topic can be highly sensitive and personal.

Racial prejudice: There's an implied right answer; people don't want to say that they're racially
prejudiced.

Income: People of low-income status may report their income as higher than it actually is. Perhaps
more surprisingly, people of very high-income status may report it as lower than it really is: many
don't want to be showy about their wealth, so they come up with a number that seems more
reasonable in their eyes.
How does this affect what we think about the population? How does this affect the "soup?"

It's like taking a sample of the soup and only tasting the things that you want to taste. Maybe you don't like
beans, so you just ignore the fact that they're in there. You don't get the overall flavor of the soup. It's the
same with response bias: it doesn't give you an accurate overall picture of what the population is actually
like.

 TERM TO KNOW

Response Bias
Bias that occurs when either (1) the question is poorly worded so that certain responses are over-represented, or (2) the respondent is confused by the question or feels they should lie due to the sensitive nature of the question.

 SUMMARY

Nonresponse bias occurs when people who are selected for the sample don't participate, either because you can't find them or because they actively refuse. The biggest problem is that high rates of nonresponse might give you an inaccurate representation of what's going on with your population, so you won't be able to use your sample to draw an inference about the population. Response bias occurs in one of two ways: either a respondent doesn't understand the question and gives an answer they didn't intend, or the respondent wants to give the supposedly correct answer to the questioner. Both can give an inaccurate representation of the truth about the population. Response bias is hard to eliminate, especially when it is unintentional and stems from the wording of the questions.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Nonresponse Bias
Bias that occurs when the people who were unable to be reached or unwilling to participate in a sample
have substantially different opinions than the people who were included in the sample, resulting in a
misrepresentation of the population.

Participation Bias
Bias that occurs when participation in a study is voluntary. People who feel strongly may be the only
participants.

Selection and Deliberate Bias
by Sophia

 WHAT'S COVERED

This tutorial will cover the topics of selection, deliberate, and unintentional bias. These may all impact
the selection of the right group of people for your sample, so it’s very important to be aware of them
when attempting to generalize findings. Our discussion breaks down as follows:

1. Selection Bias
2. Random Digit Dialing
3. Deliberate Bias
4. Unintentional Bias

1. Selection Bias
You may recall that sampling is like a pot of soup. Selecting a little bit of each ingredient for the soup is like
obtaining a representative sample for an experiment. However, things can go wrong with the taste test, which
may limit the ability to draw conclusions about the pot of soup as a whole.

Selection bias is also called undercoverage bias. It occurs when a significant subset of the population is left out of the sample. This is not necessarily intentional; rather, it occurs when that subset is systematically excluded by whoever is taking the sample.

IN CONTEXT
In 2008, almost every poll leading up to the New Hampshire Democratic presidential primary showed Barack Obama leading by at least five percentage points. All of these polls were based on random digit dialers calling a random sample of New Hampshire households. By all accounts, they were well-done surveys.

However, Hillary Clinton gained support in the last few days before the primary. In particular, many college students came out in support of Clinton, when people had expected nearly all college students to support Obama.

Because many of the college students are from out of state, they aren't actually New Hampshire residents. For that reason, they were not counted in the polls' samples. As a result, the polls' predictions were wrong, and Clinton ended up winning.

 TERM TO KNOW

Selection Bias
A bias that results from systematically excluding certain subsets of the population from the sample. It
is not necessarily intentional.

2. Random Digit Dialing

The New Hampshire primary polls used random digit dialers. Random digit dialing uses a machine to select random phone numbers from within selected area codes. The area code itself isn't necessarily chosen at random, but within a chosen area code, the machine randomly selects the remaining digits and dials that number, after which the poll can be conducted.

The biggest advantage of random digit dialers is that they can reach mobile phones and unlisted numbers that you wouldn't be able to find in a phone book. This levels the playing field a bit, since anyone can be selected for the sample as long as their phone number is within that particular area code.

⚙ THINK ABOUT IT

How does selection bias affect what we think is in the soup? Imagine that certain ingredients were
located only in certain locations in the pot. Maybe noodles sunk to the bottom. If you tasted only from the
top, it doesn't matter how big that taste is. If you missed the noodles, you wouldn't even know they were
there. Selection bias works the same way. Because you didn't select a representative group of ingredients from the population, you don't get the right idea of what's going on. It limits your ability to generalize your findings to the general population.
 TERM TO KNOW

Random Digit Dialing

A method of contacting people on the phone. Random numbers are dialed, so this allows researchers
to sample people with unlisted phone numbers.

3. Deliberate Bias
Deliberate bias is exactly what it sounds like: it's a bias that's introduced on purpose. While deliberate bias doesn't happen very often, it can occur when there's a conflict of interest, such as when the people funding the research are also the ones who stand to benefit from its results.

Deliberate bias is typically motivated by an interest unrelated to the integrity of the research. Most research is done with integrity, but when personal prestige, the advancement of some ideology, or money gets in the way, it's harder to prove that intentions are pure. Politics is a field ripe for deliberate bias. For example, people may call with a poll whose survey includes leading questions designed to push respondents toward a certain answer. This practice is called "push polling," and it's highly suspect.

IN CONTEXT
Deliberate bias can happen in other areas too--even the medical field. Suppose there are two drugs:
Drug A and Drug B. The company for Drug B posed the following leading question:

“If Drug A was linked to cancer, would you be:

more likely to choose Drug B?
less likely to choose Drug B?
equally likely to choose Drug B?”

Based on how this question was posed, Drug B would be more likely to be chosen.

But there's more. They've planted the thought in the participant's head that Drug A is linked to cancer. Did they ever explicitly say that? No, they said if it was linked to cancer. However, they've now placed the association in the participant's mind, and they're subtly beginning to steer consumers away from Drug A and toward Drug B.

If a drug company funds a study to determine whether its latest drug is effective, the researchers stand to gain a lot of money and prestige if the drug is proven effective. For this reason, they might not be the best choice to test the drug.

IN CONTEXT
An environmental research group is hired by a real estate developer to investigate the effects of a
new building. If the results are favorable, they might get another contract with that real estate
developer. If the environmental research group doesn’t come through with a favorable
interpretation, another group will, and that group will get the next contract.

The environmental research group wants to be hired by the developer on another project, so there
is a conflict of interest.

 TERM TO KNOW

Deliberate Bias
The purposeful misrepresentation of data for the purpose of advancing an agenda.

4. Unintentional Bias
Unintentional bias occurs when there is simply an error in the design of the study. Two types of unintentional bias are:

Response bias, which stems from the wording of questions or from people feeling like they have to lie.
Selection bias, which stems from how the sample was selected, such as when people are left out of the selection process even though they make up a portion of the population.

Both are simply errors with no hidden agenda. They're not intentional and are not meant to steer respondents in any particular direction.

 TERM TO KNOW

Unintentional Bias
Bias that is not purposeful. It exists because of errors in the design of the study.

 SUMMARY

Selection bias occurs when some subset of the population is left out. It might be intentional or unintentional. Since some section of the population is left out, the coverage is lacking, which is why selection bias is also known as “under-coverage.” Random digit dialing is a great tool because it helps extend coverage to mobile phones and unlisted numbers. Deliberate bias, a bias that is introduced on purpose, is not usually a cause for concern. Sometimes, however, people with personal interests, like the advancement of an ideology or financial gain, steer results toward outcomes that are favorable to them. Most of the time, research is done with integrity, and when bias does occur, it is usually accidental. This is called unintentional bias.

Good luck!

 TERMS TO KNOW

Deliberate Bias
The purposeful misrepresentation of data for the purpose of advancing an agenda.

Random Digit Dialing

A method of contacting people on the phone. Random numbers are dialed, so this allows researchers to
sample people with unlisted phone numbers.

Selection Bias
A bias that results from systematically excluding certain subsets of the population from the sample. It is
not necessarily intentional.

Unintentional Bias
Bias that is not purposeful. It exists because of errors in the design of the study.

Convenience & Self-Selected Samples
by Sophia

 WHAT'S COVERED

This lesson will explain two types of samples: convenience and self-selected samples. Our discussion
breaks down as follows:

1. Representative Samples
2. Non-Representative Samples
a. Convenience Samples
b. Self-selected Samples

1. Representative Samples
One of the things we know about sampling is that it's important for a sample to be representative of the population; such a sample is called a representative sample. What we mean is that when we take our sample, which is a subset of a larger population, we want it to behave just as the population would if we measured everyone in it.

 DID YOU KNOW

Now, sampling everybody is not a sample at all; that’s called a census.

We want the sample to behave as similarly to the population as possible so that when we calculate statistics from our data, the statistics are as accurate about the population as they can be.

⭐ BIG IDEA

The sample should represent the group/population at large, so it's important that individuals are selected carefully for the sample. That way, the information gained will be accurate and can be used to describe the group/population at large.
The goal is to generalize what is found in the sample and apply it to the people outside the sample, that is, the rest of the population.

 TERM TO KNOW

Representative Sample
A sample that accurately reflects the population.

2. Non-Representative Samples
The two methods analyzed in this tutorial have a major flaw: they do not result in representative samples. Because they are used often, it's important for you to recognize them.

2a. Convenience Samples

A convenience sample is a sample that is easily obtained. It is often not valid because people in the same location often feel the same way.

IN CONTEXT
Suppose there is a crowd of people at a mall and there is one guy with a clipboard, and he wants
some data. He might take the people nearest to him, and say, “Hey, would you like to take my
survey, please?”

The people he asks might be representative of the population, but they might not. They all simply
happen to be at the same place at the same time. This means they might have some similarities that
could make them not representative of the larger population. The risk of them not representing the
group/population at large is too high.

 EXAMPLE If you ask people about their spending habits, and they all happen to be shopping in
the headphones section, that probably means they have similar ideas about how they should spend
their money.
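The clipboard scenario can be simulated. This sketch assumes a made-up population of 1,000 shoppers in which the 100 people in the headphones section all like to spend freely, while exactly 3 in 10 of everyone else do. It compares the 20 people nearest the clipboard (modeled, as an assumption, as the first 20 people in the headphones section) with a simple random sample of 200 drawn from everyone. All numbers are hypothetical.

```python
import random

# Hypothetical population: True means "likes to spend freely".
# People 0-99 are in the headphones section; all of them are spenders.
# Among people 100-999, exactly 3 in every 10 are spenders.
population = [True] * 100 + [i % 10 < 3 for i in range(100, 1000)]
true_share = sum(population) / len(population)   # 370 / 1000 = 0.37

def share(sample):
    """Proportion of spenders in a sample."""
    return sum(sample) / len(sample)

# Convenience sample: the 20 people standing closest to the clipboard.
convenience = population[:20]

# Simple random sample: 200 people chosen at random from the whole population.
rng = random.Random(1)
srs = rng.sample(population, 200)

print(f"population:           {true_share:.2f}")
print(f"convenience sample:   {share(convenience):.2f}")
print(f"simple random sample: {share(srs):.2f}")
```

The convenience sample reports that every shopper is a big spender, while the random sample typically lands much closer to the population's 37%.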
 TERM TO KNOW

Convenience Sample
A sample that is easily obtained. It is often not representative of the population.

2b. Self-selected Samples

Next, let's discuss self-selected samples, which are also called voluntary response samples. These are
samples where people can choose to participate.

 EXAMPLE Focus groups are a common example of self-selected samples.

People who feel very strongly about the subject at hand are the most likely to volunteer for a self-selected sample. At the other end of the spectrum, participants who are compensated for their time may not care much about the topic and may simply tell the interviewer what they want to hear.

 EXAMPLE If your focus group is about politics, you might get only the very, very liberal people or
the very, very conservative people. You might get the most extreme viewpoints but none of the
viewpoints in the middle. Or, there are also a lot of people who are ambivalent about politics. They
don't really care, but they want to get paid if this is a sample that offers compensation or another type
of reward like free lunch.
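A small deterministic calculation shows why voluntary response over-represents extreme views. It assumes a hypothetical population of 1,000 people (150 very liberal, 700 moderate or indifferent, 150 very conservative) and hypothetical volunteer rates: 40% of people with strong views sign up, but only 5% of everyone else. The expected makeup of the volunteer group is each group's size times its volunteer rate. This models only the strong-opinion effect, not the pull of compensation.

```python
# Hypothetical population counts and volunteer rates (illustration only).
counts = {"very liberal": 150, "moderate": 700, "very conservative": 150}
volunteer_rate = {"very liberal": 0.40, "moderate": 0.05, "very conservative": 0.40}

# Expected number of volunteers from each group.
volunteers = {group: counts[group] * volunteer_rate[group] for group in counts}

def extreme_share(group_sizes):
    """Fraction of a group made up of the two extreme viewpoints."""
    extreme = group_sizes["very liberal"] + group_sizes["very conservative"]
    return extreme / sum(group_sizes.values())

print(f"extreme views in population:    {extreme_share(counts):.0%}")
print(f"extreme views among volunteers: {extreme_share(volunteers):.0%}")
```

Under these assumptions, extreme views make up 30% of the population but about 77% of the volunteers (120 of an expected 155).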
 TERM TO KNOW

Self-Selected (Voluntary Response) Sample

A sample that the participants choose to be a part of.

 SUMMARY

Representative samples are important if we want to accurately generalize our findings to the population. Convenience samples consist of people who are simply nearby and happen to be in the same place at the same time. Self-selected samples, also called “voluntary response” samples, tend to attract either people with strong opinions or people with no opinion at all.

Good luck!

 TERMS TO KNOW

Convenience Sample
A sample that is easily obtained. It is often not representative of the population.

Representative Sample
A sample that accurately reflects the population.

Self-Selected (Voluntary Response) Sample

A sample that the participants choose to be a part of.

Random and Systematic Errors
by Sophia

 WHAT'S COVERED

This tutorial will compare random errors vs. systematic errors. Our discussion breaks down as follows:

1. Random Errors
2. Systematic Errors

1. Random Errors
Random errors are exactly that: random. They can occur through no fault of the person taking the sample. When a sample is taken from a larger population, we can't know in advance whether it will accurately represent what the population looks like.

IN CONTEXT
Suppose there were 100 individuals, which we will consider the population. Twenty of them were
college students. You select 5 people out of the overall 100 for a sample. What would you expect to
happen?

Twenty percent of the population are college students, which is one out of every 5 people. So you would probably expect one individual within your sample of 5 people to be a college student.

However, that doesn't always happen. You might not get any college students, or all five of them
may be college students. Just because you expect to get one doesn't mean that will actually
happen. Why not?

Let’s say that the individuals with numbers 1 - 20 are the college students. Numbers 21 - 100 are
individuals not in college. Using a random number generator, you might get a simple random sample
that looks like this:

Sample                 College students in sample
85, 27, 17, 94, 74     1 of 5, or 20%

One out of five of those is a college student, which is 20%.

Another simple random sample might look like this:

Sample                 College students in sample
72, 92, 45, 20, 38     1 of 5, or 20%

Again, one out of five is a college student.

However, you might get a simple random sample that looks like this:

Sample                 College students in sample
46, 5, 83, 26, 20      2 of 5, or 40%

Here, the second person (number 5) and the fifth person (number 20) are college students. That's 2 of 5, or 40%, double the population's 20%. What went wrong? Nothing went wrong; random errors just happen sometimes.

Random error occurs when the sample, just by chance, doesn't match up perfectly with the population. Random error is not a correctable mistake; it is simply something that happens when sampling randomly. While it can't be corrected or avoided completely, its impact can be reduced by increasing the sample size. The larger the sample, the better the chances of obtaining a representative group.

 EXAMPLE Recall the example from above. Suppose that ten individuals from the group of 100
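were chosen instead of five. Two college students would be expected to make it into the sample. If this sample were off by one, it would contain 10% or 30% college students, a miss of 10 percentage points rather than the 20-point miss (0% or 40%) that being off by one causes in a sample of five. The larger sample reduces the impact of random error.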
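The effect of sample size can be computed exactly. When sampling without replacement from 100 people of whom 20 are college students, the number of students in the sample follows a hypergeometric distribution. This sketch uses Python's math.comb to find the probability that the sample percentage lands within 10 percentage points of the true 20% for samples of 5, 10, and 50. For a sample of 5, the chances of getting zero, one, or two college students work out to about 32%, 42%, and 21%, so only the 42% case is within 10 points.

```python
from math import comb

N, K = 100, 20   # population size and number of college students (from the example)

def prob_students(k, n):
    """Hypergeometric probability of exactly k college students in a
    simple random sample of n people drawn without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def prob_within(n, tolerance=0.10):
    """Probability that the sample proportion of college students is within
    `tolerance` of the population proportion K / N (small slack for rounding)."""
    p = K / N
    return sum(prob_students(k, n) for k in range(min(n, K) + 1)
               if abs(k / n - p) <= tolerance + 1e-9)

for n in (5, 10, 50):
    print(f"n = {n:2d}: P(sample % within 10 points of 20%) = {prob_within(n):.2f}")
```

The probability climbs as n grows, which is the sense in which a larger sample "lessens the impact" of random error: it doesn't remove the luck of the draw, it makes big misses less likely.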
 TERM TO KNOW

Random Error
When the resulting value obtained from the sample does not match the value from the population
simply by chance. This is not a mistake, but is inherent in the variability in sampling.

2. Systematic Errors
Now, by contrast, systematic errors are mistakes. Systematic errors are due to flaws in the design.

IN CONTEXT
Suppose a school board wants to estimate how many students are eligible for free or reduced lunch, so it sends a questionnaire to a sample of families. Families from a poorer neighborhood may be less likely to respond; perhaps the parents work nights and don't have time to complete the survey. Those families end up underrepresented in the results, which is a form of under-coverage, or selection, bias.

Therefore, the board may underestimate the true number of students eligible for free or reduced lunch. This type of error cannot be remedied by increasing the sample size.
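A simulation shows why a bigger sample doesn't help here. The sketch assumes a hypothetical district of 100,000 students: in neighborhood A (30,000 students) 80% are eligible and parents return the form 30% of the time; in neighborhood B (70,000 students) 20% are eligible and 90% of forms come back. The true eligible share is 38%, but the expected share among returned forms is about 27.5%, whatever the sample size. All rates are made up for illustration.

```python
import random

# Hypothetical district: 100,000 students.
# Neighborhood A (first 30,000): 80% eligible, 30% of forms returned.
# Neighborhood B (other 70,000): 20% eligible, 90% of forms returned.
N_A, N = 30_000, 100_000
eligible = [i % 5 < 4 if i < N_A else i % 5 < 1 for i in range(N)]
response_rate = [0.30 if i < N_A else 0.90 for i in range(N)]
true_share = sum(eligible) / N   # 38,000 / 100,000 = 0.38

def survey_estimate(n, rng):
    """Send the questionnaire to a simple random sample of n students and
    estimate the eligible share from the forms that come back."""
    sample = rng.sample(range(N), n)
    returned = [i for i in sample if rng.random() < response_rate[i]]
    return sum(eligible[i] for i in returned) / len(returned)

rng = random.Random(7)
for n in (500, 5_000, 50_000):
    print(f"n = {n:6d}: estimate = {survey_estimate(n, rng):.3f} (true = {true_share:.3f})")
```

As n grows, the estimate settles down, but it settles near the wrong value: the nonresponding families are missing no matter how many questionnaires are sent.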

 EXAMPLE A child has a growth chart in his room and his parents mistakenly put it up above the
baseboard, an extra 2 inches from the floor. Every reading will make the child seem 2 inches shorter than he actually is. This is an example of measurement bias: the measurements are wrong in the same direction every time.
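A short simulation contrasts the two kinds of error. It assumes a hypothetical true height of 48 inches and gives each reading a small random error (standard deviation 0.25 inch, also an assumption) on top of the chart's fixed 2-inch offset. Averaging more readings shrinks the random part, but the systematic part never goes away.

```python
import random

rng = random.Random(3)
true_height = 48.0    # hypothetical true height, in inches
chart_offset = -2.0   # chart mounted 2 inches too high: every reading is 2 inches short

def reading(rng):
    """One reading from the mis-mounted chart: a small random error
    (posture, where the ruler lands) plus the fixed systematic offset."""
    return true_height + rng.gauss(0, 0.25) + chart_offset

for n in (1, 10, 10_000):
    avg = sum(reading(rng) for _ in range(n)) / n
    print(f"average of {n:5d} readings: {avg:.2f} in (error {avg - true_height:+.2f})")
```

With 10,000 readings the average error is essentially exactly -2 inches: the random error has averaged out, and what remains is pure measurement bias.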
 TERMS TO KNOW

Systematic Error
When the resulting value obtained from the sample does not match the value from the population as a
result of an incorrect measurement or bias. This is a mistake made by the researcher.

Selection Bias
A bias that occurs when certain groups are systematically left out of the sample. This is a systematic
error.

Measurement Bias
A mistake in the measurements taken in the study. This is a systematic error.

 SUMMARY

Random errors occur when the sample selected doesn't match up with the population just by chance. They cannot be controlled, but using a larger sample will lessen their effect. Conversely, systematic errors result in wrong answers or wrong values in your sample due to some kind of bias or error in your measurement, and increasing the sample size will not fix the issue. When a systematic error occurs, you might as well just start over, because there's no rescuing poorly collected data!

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Measurement Bias
A mistake in the measurements taken in the study. This is a systematic error.

Random Error
When the resulting value obtained from the sample does not match the value from the population
simply by chance. This is not a mistake, but is inherent in the variability in sampling.

Selection Bias
A bias that occurs when certain groups are systematically left out of the sample. This is a systematic
error.

Margin of Error
by Sophia

 WHAT'S COVERED

This tutorial will explain margin of error by focusing specifically on:

1. Margin of Error
2. Confidence Interval

1. Margin of Error
You may have seen something in your local newspaper stating that, for example, a political candidate leads
the field by 5%, and that there is a 3% margin of error in the poll. What does this mean?

When surveys are done, collecting enough data is important for getting an answer close to the truth. Survey results are often reported with a margin of error, which acknowledges that the results may be off by a little bit and estimates by about how much. It tells the reader that the reported value is not exactly right, but is a close estimate.

IN CONTEXT
Suppose you are an administrator of a school and you need to determine the overall percentage of
left-handed students. Maybe 10% of students in the school are left-handed, but when you take a
sample, even though you were diligent about the way data was collected, you got 8%. The answer
was not accurate. What happened?

It's possible that the sample simply didn't match the population exactly. Maybe only 8% of the students in the sample were left-handed, even though 10% of the population is left-handed. You didn't do anything wrong; samples can be slightly off the mark due to the random selection process.

 TERMS TO KNOW

Margin of Error
An amount by which we believe our sample's mean may deviate from the true mean of the population.

Estimate
The mean value obtained from the sample. If the sample was well-collected, the estimate should be
reasonably close to the true value.

2. Confidence Interval
The confidence interval uses both the estimate and the margin of error. Combining these two parts gives us a range of plausible values for the true population value: from the estimate minus the margin of error up to the estimate plus the margin of error.

A confidence interval is reported with a confidence level, which tells us how sure we are that the interval contains the actual population value.

IN CONTEXT
Suppose a newspaper polled 500 voters and 48% responded that they were going to vote for
Candidate X in the upcoming election. The newspaper might print a margin of error along with that
48% mark; perhaps they use four percentage points as their margin of error. It's not particularly
important how this 4% was calculated, but it is important to note that a margin of error was reported
along with the percent value.

What does this 4% margin of error mean? It means the researchers are fairly confident that the true percentage of people who will vote for Candidate X is within 4 percentage points of 48%: it could be as low as 44%, as high as 52%, or anywhere in between. This wiggle room on either side of 48% is the confidence interval.

Suppose that on election day, 46% of the people voted for Candidate X. Since this falls in the range of 44% to 52%, the poll's 48% was a close enough estimate of the right answer.
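The arithmetic in this example can be written out directly. The tutorial doesn't say how the newspaper got its 4%. The second function shows one common large-sample approximation for a 95% margin of error on a proportion, 1.96 * sqrt(p(1 - p) / n). That formula is an assumption here, not necessarily the newspaper's method. For 500 voters and 48% support it gives about 4.4 percentage points, in line with the 4% reported.

```python
from math import sqrt

def confidence_interval(estimate, margin):
    """Estimate plus or minus the margin of error."""
    return estimate - margin, estimate + margin

low, high = confidence_interval(48, 4)                # percentages from the newspaper example
print(f"confidence interval: {low}% to {high}%")      # confidence interval: 44% to 52%
print("46% inside interval:", low <= 46 <= high)      # 46% inside interval: True

def approx_margin_95(p_hat, n):
    """Common large-sample approximation of a 95% margin of error for a
    sample proportion p_hat based on n responses."""
    return 1.96 * sqrt(p_hat * (1 - p_hat) / n)

print(f"approximate 95% margin for 48% of 500: {approx_margin_95(0.48, 500):.1%}")  # 4.4%
```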

⚙ THINK ABOUT IT

What happens to the margin of error as the sample size increases? Will the margin of error go up, down,
or stay about the same?
As the sample size goes up, the margin of error goes down because a larger sample size gives a more
accurate portrait of the population. What’s happening is that you cast a wider net to include people that may
be closer to representing the actual population.

If you had a sample size of 4 people and you want to generalize the findings to a population of 200 people,
it’s unlikely that just those four people have enough of the characteristics to represent the population.

However, as the sample size is increased, you get closer to achieving a representative sample, which means the confidence interval can be narrower. In other words, the larger the sample size, the less wiggle room is needed on each side of the measurement.
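The relationship can be made concrete with a common large-sample approximation for a 95% margin of error on a proportion, 1.96 * sqrt(p(1 - p) / n). This formula is an assumption here (the tutorial doesn't specify one), and it is only reasonable for fairly large samples, not for the 4-person case. It uses p = 0.5, the value that gives the largest margin. Because n sits under a square root, quadrupling the sample size cuts the margin of error in half.

```python
from math import sqrt

def approx_margin_95(p_hat, n):
    """Large-sample approximation of a 95% margin of error for a proportion."""
    return 1.96 * sqrt(p_hat * (1 - p_hat) / n)

# p_hat = 0.5 gives the largest possible margin, so it is a cautious choice for planning.
for n in (100, 400, 1_600, 6_400):
    print(f"n = {n:5d}: margin of error about {approx_margin_95(0.5, n):.1%}")
```

Going from 100 to 400 respondents drops the margin from about 9.8 to about 4.9 percentage points. Each further halving costs four times as many respondents, which is why very small margins of error are expensive.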

 TERM TO KNOW

Confidence Interval
A range of potential values that the true value could be. It is obtained by adding and subtracting the
margin of error from the value in the sample.

 SUMMARY

Many statistical results are reported alongside a margin of error, which is an amount by which the sample's mean may deviate from the true mean of the population. If the data is well-collected, then it's likely that the true population value is within the confidence interval created by the reported value, plus or minus the margin of error. If two values both fall within the same confidence interval, it's a bad idea to declare one of them larger than the other, since the data can't reliably tell them apart. That situation is called a statistical dead heat.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Confidence Interval
A range of potential values that the true value could be. It is obtained by adding and subtracting the
margin of error from the sample mean.

Estimate
The mean value obtained from the sample. If the sample was well-collected, the estimate should be
reasonably close to the true value.

Margin of Error
An amount by which we believe our sample's mean may deviate from the true mean of the population.

Terms to Know
Absolute Change
The raw increase or decrease in the value of a variable

Accuracy
The extent to which the values, when considered all together, center around the correct
value for a variable.

Available Data
Data collected by some other entity - a government organization or private company.

Bias
The tendency for collected data to differ from what is expected in a systematic way. Biased
data can often favor a specific group of those studied.

Binomial Question Type

A question that will yield categorical data with just two possible values.

Blinding
The practice of making sure that certain individuals do not know which subjects are
receiving which treatment.

Census
Using the entire population to obtain data

Closed Question
A question type with only so many different answer choices.

Cluster Sample
A sampling method where the population is separated into groups, typically geographically,
and a random selection of clusters is made. Each individual in the cluster becomes part of
the sample.

Clusters
Smaller subgroups of the population, not necessarily similar in any way besides all being
together in one place, making the individuals easier to sample together.

Completely Randomized Design

An experimental design where the assignment of subjects to treatments is done entirely at random.

Confidence Interval
A range of potential values that the true value could be. It is obtained by adding and
subtracting the margin of error from the sample mean.

Confounding
Occurs when the effects of the treatments, if any, are indistinguishable from the potential
effects of some other variable which was unaccounted for.

Confounding Variable
A variable which was not accounted for in a study, which limits the conclusions that the
study can draw.

Consumer Price Index

An index published by the US Bureau of Labor Statistics that shows the change in the price
of many different goods or services in the United States. It provides a measure of
purchasing power.

Continuous Data
Data that can take any value within an interval.

Control
The principle of experimental design that requires that other variables which may confound
the experiment be held constant between the treatment groups, so that any differences in
the groups can be attributed to the different treatments.

Convenience Sample
A sample that is easily obtained. It is often not representative of the population.

Data
Information used in a study to answer a statistical question

Deliberate Bias
The purposeful misrepresentation of data for the purpose of advancing an agenda.

Descriptive statistics
Using only the information at hand to describe the selected group of individuals

Discrete Data
Data that can only take so many different values.

Double-Blind Experiment
An experiment where neither the subjects, nor anyone in contact with them, has any
knowledge of which subjects are receiving which treatment.

Estimate
The mean value obtained from the sample. If the sample was well-collected, the estimate
should be reasonably close to the true value.

Experiment
A type of study where researchers impose treatments on the participants or experimental
units.

Experimental Design
The way in which an experiment is carried out. A good design has key elements of
randomization, replication, and control.

Experimental Unit
An animal or thing involved in an experiment.

Explanatory Variable
A variable that we believe is predictive of something else. An increase in this variable will
correspond to an increase or decrease in some other variable.

Hawthorne Effect
People have the tendency to change their behavior when they know they are being
monitored.

Index Number
A way to measure the relative change in a value, usually the price of a good or service, over
time. If the index number is over 100, that means the price has increased. If the price has
decreased, then the index number will be less than 100.

Inferential statistics
Using the information at hand to make a larger, more general statement about the entire
population of individuals

Inflation
A relative increase in the price of a good or service over time. A person will need to pay
more to receive the same good or service than they did at a previous point in time.

Margin of Error

An amount by which we believe our sample's mean may deviate from the true mean of the
population.

Matched-Pair Design
An experimental design where two subjects who are similar with respect to variables that
could affect the outcome of the experiment are paired together, then one of them is
assigned to one treatment and one is assigned to the control. This can also be done by
assigning each subject to both treatments, where each subject acts as their own matched-
pair.

Measurement Bias
A mistake in the measurements taken in the study. This is a systematic error.

Multi-Stage Sampling
A sampling design which combines elements of cluster sampling, stratified random
sampling, and simple random sampling. It "zooms in" on smaller areas to sample so that
sampling becomes more feasible.

Nominal Data
Categorical data with qualities that cannot be ordered or ranked.

Nonresponse Bias
Bias that occurs when the people who were unable to be reached or unwilling to participate
in a sample have substantially different opinions than the people who were included in the
sample, resulting in a misrepresentation of the population.

Observational Study
A type of study where researchers can observe the participants, but not affect the behavior
or outcomes in any way.

Open Question
A question type with no answer choices; the respondent can choose what he or she wants
to say to answer the question.

Ordinal Data
Categorical data with qualities that can be ordered or ranked.

Participation (Voluntary Response) Bias
Bias that occurs when a sample consists entirely of volunteers. People with strong opinions
may be the only ones who volunteer.

Participation Bias
Bias that occurs when participation in a study is voluntary. People who feel strongly may be
the only participants.

Percent Change
A relative increase or decrease in a percent value.

Percentage Points
An absolute increase or decrease in a percent value.
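The difference between percentage points and percent change is easy to see with numbers. A minimal Python sketch, using a hypothetical rate that rises from 4% to 5%; the function names are illustrative.

```python
def percentage_point_change(old_pct: float, new_pct: float) -> float:
    """Absolute change between two percentages, in percentage points."""
    return new_pct - old_pct


def percent_change(old_pct: float, new_pct: float) -> float:
    """Relative change between two percentages, as a percent of the old value."""
    return (new_pct - old_pct) / old_pct * 100


# Hypothetical: a rate rises from 4% to 5%
print(percentage_point_change(4.0, 5.0))  # 1.0 (percentage points)
print(percent_change(4.0, 5.0))           # 25.0 (percent)
```

The same one-point rise is a 25% relative increase, which is why reports should say which measure they are using.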

Placebo
An inert drug or treatment given to the control group. It has no active ingredient in it.

Placebo Effect
The observed phenomenon whereby certain individuals will exhibit a desired response
even when taking a placebo, which contains no active ingredient.

Population
The entire set of individuals from which a sample is drawn.

Precision
The extent to which repeated measurements are close to each other, even if they are not close to the correct value.

Probability Sampling Plan
A way of collecting a random sample that gives each member of the population a known, predetermined likelihood of being selected.

Prospective Study
A study that begins by selecting participants, then follows them forward in time, recording data on them as events occur.

Publication Bias
The tendency of researchers (and research publications) to publish only the most sensational or interesting results.

Qualitative (Categorical) Data
Data that describes qualities or categories rather than amounts. It is not numerical and cannot be used for arithmetic.

Quantitative (Numerical) Data
Data that is numerical. It can be measured and used for arithmetic.

Random Digit Dialing
A method of contacting people on the phone. Random numbers are dialed, so this allows
researchers to sample people with unlisted phone numbers.

Random Error
When the resulting value obtained from the sample does not match the value from the population simply by chance. This is not a mistake; it is inherent in the variability of sampling.

Random Number Generator
A method of collecting a sample that uses technology to generate random numbers corresponding to individuals in the population.

Random Number Table
A method of collecting a sample using a table of random digits. Each individual in the population is assigned a number, and the individuals whose numbers are read from the table are selected.

Random Sample
A sample that has been selected in a manner where every member of the population has some predetermined chance of being selected for the sample.

Random Selection
The method of obtaining a random sample.

Randomization
The principle of experimental design that requires subjects/experimental units to be assigned to groups by some random process. This helps make the groups roughly comparable before treatments are assigned.
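Random assignment can be sketched in Python by shuffling the subjects and splitting the shuffled list in half. This is a minimal illustration with hypothetical subject labels; it assumes two groups of (nearly) equal size.

```python
import random


def randomize_two_groups(subjects, seed=None):
    """Completely random assignment: shuffle, then split into two groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # With an odd number of subjects, the second group gets the extra one
    return shuffled[:half], shuffled[half:]


subjects = [f"S{i}" for i in range(1, 21)]  # 20 hypothetical subjects
treatment, control = randomize_two_groups(subjects, seed=7)
print("Treatment:", treatment)
print("Control:  ", control)
```

Because chance alone decides who lands in each group, characteristics the researchers did not think to control tend to spread out evenly across the groups.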

Randomized Block Design
An experimental design where the subjects are separated into homogeneous groups, called blocks, based on some variable we think may affect the outcome of the experiment. We then run the experiment separately within each block, randomly assigning treatments to the subjects in that block.
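A minimal Python sketch of a randomized block assignment: within each block, the units are shuffled and then dealt out to the treatments in turn. The blocking variable (an age group), the unit labels, and the treatment names are hypothetical.

```python
import random


def randomized_block_assignment(blocks, treatments, seed=None):
    """Randomly assign treatments separately inside each block.

    Units in a block are shuffled and then dealt out to the treatments in turn,
    so each treatment gets as equal a share of the block as possible.
    """
    rng = random.Random(seed)
    plan = {}
    k = len(treatments)
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        plan[block_name] = {t: shuffled[i::k] for i, t in enumerate(treatments)}
    return plan


# Hypothetical blocks based on a variable thought to affect the outcome
blocks = {
    "under 40": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "40 and over": ["B1", "B2", "B3", "B4", "B5", "B6"],
}
plan = randomized_block_assignment(blocks, ["new drug", "placebo"], seed=3)
for block_name, groups in plan.items():
    print(block_name, groups)
```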

Raw Data
Data that is unorganized, unprocessed, and not summarized. Typically, this is newly collected data rather than data that is already available.

Reference Value
An arbitrarily chosen starting value for an index. It is assigned an index number of 100.

Relative Change
The percent increase or decrease in the value of a variable.

Replication
Repeating the experiment on multiple subjects/experimental units. This principle of experimental design states that a larger experiment, with more subjects/experimental units, allows us to see differences between the treatments more clearly.

Representative Sample
A sample that accurately reflects the population.

Response Bias
Bias that occurs when a respondent tells the interviewer "what they want to hear" or lies
due to the sensitive nature of the question.

Response Variable
A variable that is affected by the explanatory variable.

Retrospective Study
A study that observes what happened to the subjects in the past, in an effort to understand
how they became the way they are in the present.

Sample/Sampling
A subset of the population. There are many ways to select a sample.

Selection Bias
Selecting a sample in such a way that certain subsets of the population are systematically
excluded.

Self-Selected (Voluntary Response) Sample
A sample that the participants choose to be a part of.

Simple Random Sample
A method of selection that guarantees that every sample of a certain size has an equal chance of being the selected sample.
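A simple random sample can be drawn with a random number generator. In this Python sketch, `random.sample` selects individuals without replacement, so every subset of the requested size is equally likely; the roster of 500 numbered students is hypothetical, and the fixed seed only makes the draw reproducible.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical roster: students numbered 1 through 500
roster = list(range(1, 501))
print(sorted(simple_random_sample(roster, 10, seed=2023)))
```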

Single-Blind Experiment
An experiment in which either the subjects do not know which treatment each subject is receiving, or the people in contact with the subjects do not know which treatment each subject is receiving, but not both.

Statistical analysis
All the ways of collecting, analyzing, and interpreting the data.

Statistical study
A way to collect information from individuals.

Statistics
The study of collecting, analyzing, interpreting, and presenting information.

Stratified Random Sample
A random sampling method where individuals are separated into homogeneous groups, then simple random samples are taken within each group.
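A minimal Python sketch of a stratified random sample, taking a separate simple random sample from each stratum. It assumes equal allocation (the same number from every stratum); proportional allocation, where larger strata contribute more, is a common alternative. The class-year strata are hypothetical.

```python
import random


def stratified_random_sample(strata, n_per_stratum, seed=None):
    """Take a separate simple random sample of the same size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


# Hypothetical strata: 50 students in each class year
strata = {
    year: [f"{year}-{i}" for i in range(1, 51)]
    for year in ["freshman", "sophomore", "junior", "senior"]
}
sample = stratified_random_sample(strata, n_per_stratum=5, seed=11)
for year, members in sample.items():
    print(year, members)
```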

Stratum/Strata
The homogeneous groups in a stratified random sample. All individuals in each stratum have something in common, and we would like to see how that affects the outcome of the sample.

Subjects/Participants
The people or things being examined in an observational study.

Survey Design
The way the survey is set up. This deals with the wording of questions and answer choices.

Survey/Sample Survey
A data collection tool that individuals in a study can fill out and return to the researcher.

Systematic Error
When the resulting value obtained from the sample does not match the value from the
population as a result of an incorrect measurement or bias. This is a mistake made by the
researcher.
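The contrast between random and systematic error can be simulated. In this Python sketch, each hypothetical measurement equals a true value plus a constant bias (systematic error) plus normally distributed noise (random error). The true value, bias, and noise level are assumptions chosen for illustration.

```python
import random
import statistics

TRUE_VALUE = 50.0  # hypothetical true population value


def simulate_measurements(n, bias=0.0, noise_sd=2.0, seed=None):
    """Each measurement = true value + constant bias (systematic error)
    + random noise (random error)."""
    rng = random.Random(seed)
    return [TRUE_VALUE + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


unbiased = simulate_measurements(10_000, bias=0.0, seed=1)
biased = simulate_measurements(10_000, bias=3.0, seed=1)

# Random error averages out over many measurements; systematic error does not.
print(round(statistics.mean(unbiased), 2))
print(round(statistics.mean(biased), 2))
```

The first average lands very close to 50, while the second stays about 3 units too high no matter how many measurements are taken: collecting more data reduces random error but cannot remove a systematic one.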

Systematic Random Sample
A sampling method where every kth individual is selected for the sample (for example, every 2nd, 4th, or 20th individual), typically starting from a randomly chosen individual among the first k.
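A minimal Python sketch of a systematic random sample: pick a random starting position among the first k individuals, then take every kth individual after it. The customer list and the choice k = 20 are hypothetical.

```python
import random


def systematic_random_sample(population, k, seed=None):
    """Choose a random starting position among the first k individuals,
    then select every kth individual after it."""
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    start = rng.randrange(k)  # 0-based position in the range 0 .. k-1
    return population[start::k]


# Hypothetical list of 100 customers; select every 20th one
customers = [f"Customer {i}" for i in range(1, 101)]
print(systematic_random_sample(customers, k=20, seed=5))
```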

Treatment
Something the researchers administer to the subjects or experimental units.

Unintentional Bias
Bias that is not purposeful. It exists because of errors in the design of the study.

Variable
Any attribute or number that can be measured about individuals in a study.

Variable of Interest
Any variable which we need to know about in the context of a study.

Variables of Interest
The variables the survey wishes to measure about those taking the survey.

Formulas to Know
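Absolute Change
$$\text{absolute change} = \text{new value} - \text{reference value}$$

Index Number
$$\text{index number} = \frac{\text{value}}{\text{reference value}} \times 100$$

Relative Change
$$\text{relative change} = \frac{\text{new value} - \text{reference value}}{\text{reference value}} \times 100\%$$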
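The three calculations listed under Formulas to Know can be sketched in Python, assuming the usual definitions: absolute change is the new value minus the reference value, relative change is that difference divided by the reference value (times 100%), and an index number is a value divided by the reference value, times 100. The grocery prices are hypothetical.

```python
def absolute_change(new_value: float, reference_value: float) -> float:
    """Absolute change = new value - reference value."""
    return new_value - reference_value


def relative_change(new_value: float, reference_value: float) -> float:
    """Relative change, in percent = (new - reference) / reference * 100."""
    return (new_value - reference_value) / reference_value * 100


def index_number(value: float, reference_value: float) -> float:
    """Index number = value / reference value * 100 (the reference value maps to 100)."""
    return value / reference_value * 100


# Hypothetical: a basket of groceries cost $200 in the reference year and $250 now
print(absolute_change(250, 200))  # 50
print(relative_change(250, 200))  # 25.0
print(index_number(250, 200))     # 125.0
print(index_number(200, 200))     # 100.0
```

An index above 100 signals a price increase relative to the reference value, and an index below 100 signals a decrease.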
