Welcome to S&DS 242!
Why study statistics?
▶ Statistics is everywhere: academic fields, finance, industry,
politics, sports
▶ And the misuse of statistics is astounding
▶ Why Most Published Research Findings Are False
▶ The ASA Statement on p-Values: Context, Process, and Purpose
▶ https://xkcd.com/882/
Goals
Course goals:
▶ Give you a solid foundation in statistical theory
▶ Be able to formulate a statistical approach to real-world
problems
▶ Understand the assumptions and limitations of various
statistical methods
What is statistics?
Figure 1: https://towardsdatascience.com/difference-between-probability-and-statistics-d69db0ff3f71
Example 1
Probability: What is the probability of obtaining 60 or more heads
in 100 tosses of a fair coin?
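The probability question has an exact answer: if X ~ Binomial(100, 0.5) counts the heads, then P(X ≥ 60) is a finite sum of binomial probabilities. A minimal sketch using only the Python standard library (the helper name `binom_tail` is our own, not from the course):

```python
from math import comb

def binom_tail(n: int, p: float, k: int) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p) by summing the pmf exactly."""
    # P(X = j) = C(n, j) * p^j * (1 - p)^(n - j), summed for j = k, ..., n
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

prob = binom_tail(100, 0.5, 60)  # fair coin, 100 tosses, at least 60 heads
print(f"P(X >= 60) = {prob:.4f}")
```

The result is a little under 3%, so 60 or more heads is unusual for a fair coin but far from impossible.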
Statistics: Suppose we flip a coin 100 times and obtain heads 60
times. How can we estimate 𝑝, the probability that the coin comes
up heads? How can we quantify our uncertainty based on limited
data?
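For the statistics question, the natural estimate is p̂ = 60/100 = 0.6. One way to quantify the uncertainty, sketched here under our own assumptions (a parametric bootstrap via a hypothetical helper `bootstrap_phat`, 10,000 replications, a fixed seed), is to re-simulate 100 tosses with p = p̂ many times and measure how much p̂ varies. That spread should agree with the formula-based standard error sqrt(p̂(1 - p̂)/n) ≈ 0.049.

```python
import random
import statistics

def bootstrap_phat(heads: int, n: int, reps: int = 10_000, seed: int = 0) -> list[float]:
    """Parametric bootstrap: simulate n tosses with p = heads / n, repeated reps times,
    and return the estimated proportion of heads from each simulated data set."""
    rng = random.Random(seed)
    p_hat = heads / n
    return [sum(rng.random() < p_hat for _ in range(n)) / n for _ in range(reps)]

p_hat = 60 / 100
sims = bootstrap_phat(60, 100)
se_sim = statistics.stdev(sims)                    # spread of the simulated estimates
se_formula = (p_hat * (1 - p_hat) / 100) ** 0.5    # sqrt(p(1 - p) / n), about 0.049
print(p_hat, round(se_sim, 3), round(se_formula, 3))
```

The bootstrap is only one of several approaches; the formula-based standard error gives essentially the same answer here without simulation.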
Example 2
Probability: A drug has an effectiveness of 80% at curing a
particular disease. If the drug is given to 50 patients, on average
40 will be cured, and at least 34 will be cured roughly 99% of the time.
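These claims can be checked by modeling the number cured as X ~ Binomial(50, 0.8). The sketch below (variable names are our own) builds the full pmf, then computes its mean and the tail probabilities P(X ≥ 34) and P(X ≥ 35):

```python
from math import comb

n, p = 50, 0.8  # 50 patients, each cured independently with probability 0.8
# Binomial pmf: P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), for k = 0, ..., n
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))  # should equal n * p = 40
tail_34 = sum(pmf[34:])  # P(X >= 34)
tail_35 = sum(pmf[35:])  # P(X >= 35)
print(f"mean = {mean:.2f}, P(X >= 34) = {tail_34:.4f}, P(X >= 35) = {tail_35:.4f}")
```

The exact value of P(X ≥ 34) is about 0.986, which rounds to 99%, while P(X ≥ 35) falls to about 0.969, so 34 is the largest count that comes with a roughly 99% guarantee.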
Statistics: Suppose 41 out of 50 patients were cured. We can be
95% confident that the drug's effectiveness in the broader patient
population is between 71.35% and 92.65%.
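The interval above matches the normal-approximation (Wald) interval p̂ ± 1.96 · sqrt(p̂(1 - p̂)/n) with p̂ = 41/50 = 0.82. We are assuming this is the construction behind the slide's numbers; the function name `wald_interval` is our own.

```python
from math import sqrt

def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a binomial proportion."""
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)  # z times the estimated standard error
    return p_hat - half_width, p_hat + half_width

lo, hi = wald_interval(41, 50)  # 41 cures out of 50 patients
print(f"({lo:.2%}, {hi:.2%})")  # prints (71.35%, 92.65%)
```

The Wald interval is the simplest choice; alternatives such as the Wilson interval tend to behave better when n is small or p̂ is close to 0 or 1.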
Example 3: German tank problem
From https://en.wikipedia.org/wiki/German_tank_problem
The problem is named after its historical application by Allied
forces in World War II to the estimation of the monthly rate of
German tank production from very limited data. This exploited the
manufacturing practice of assigning and attaching ascending
sequences of serial numbers to tank components (chassis, gearbox,
engine, wheels), with some of the tanks eventually being captured
in battle by Allied forces.
Example 3 (continued)
The members of a population of unknown size are labeled
1, 2, …, 𝑁 (where 𝑁 is unknown). Suppose we obtain a random
sample from the population whose labels are 3, 24, 57, 66, and
119. How can we estimate 𝑁?
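As a preview, here are two standard estimators of 𝑁, sketched under the assumption that the sample is drawn uniformly at random without replacement from 1, …, 𝑁. The function names are hypothetical. The first, N̂ = m + m/k - 1 (m the sample maximum, k the sample size), is the minimum-variance unbiased estimator given on the Wikipedia page; the second is a method-of-moments estimate based on the fact that a uniformly chosen label has mean (𝑁 + 1)/2.

```python
def mvue_estimate(labels: list[int]) -> float:
    """N_hat = m + m/k - 1, where m is the sample maximum and k is the sample size."""
    m, k = max(labels), len(labels)
    return m + m / k - 1

def moment_estimate(labels: list[int]) -> float:
    """N_hat = 2 * (sample mean) - 1, solving mean = (N + 1) / 2 for N."""
    return 2 * sum(labels) / len(labels) - 1

sample = [3, 24, 57, 66, 119]
print(f"{mvue_estimate(sample):.1f}")    # prints 141.8  (119 + 119/5 - 1)
print(f"{moment_estimate(sample):.1f}")  # prints 106.6  (2 * 53.8 - 1)
```

The moment estimate (106.6) is smaller than the largest observed label (119), which cannot be true of 𝑁; this is one reason estimators built on the sample maximum are preferred for this problem.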
Let’s do some statistics!
… but first we’re going to spend a few class periods reviewing
probability theory