Level-Up Probability and Statistics

The document outlines an agenda for a workshop on probability theory and statistics, beginning with an overview of probability theory including probability distributions, expected value, variance, conditional probability, and independence. It then discusses statistics topics such as data types, common distributions like the normal distribution, and the central limit theorem. The document provides examples and explanations of key probability concepts to help introduce participants to probability and statistics.


Level-up!

Probability and Statistics


Morris Alper – ITC Level-up December 2020 cohort

Agenda

Probability Theory
● Probability and random variables
● Expected value, variance, and standard deviation
● Conditional probability
● Independence

Statistics
● Data types
● Common distributions
● Central limit theorem
Probability Theory

The probability of an event is a number between 0 and 1 that describes how likely that event is to occur.

For example, I can ask:

● What is the probability of getting heads when I flip a fair coin? 0.5
● What is the probability of getting HH when I flip a fair coin twice? 0.25
● What is the probability of getting all heads when I flip a fair coin 100 times? 1/2^100 ≈ 7.9 × 10^−31
Probability Theory

For discrete, equally likely events we can measure the probabilities of outcomes by counting.

Example: Probability of heads when flipping one coin:

Positive outcome(s): H (1 possibility)
Negative outcome(s): T (1 possibility)
P(H) = 1/(1+1) = 0.5

Reproduced from https://en.wikipedia.org/wiki/Probability
Probability Theory

Example: Probability of HH when flipping two coins:

Positive outcome(s): HH (1 possibility)
Negative outcome(s): TT, HT, TH (3 possibilities)
P(HH) = 1/(1+3) = 0.25
Probability Theory

Example: Probability of all heads when flipping 100 coins:

Positive outcome(s): HHH… (1 possibility)
Negative outcome(s): 2^100 − 1 possibilities
P(all heads) = 1/2^100 ≈ 7.9 × 10^−31
Probability Theory

Example: Probability of getting total K when rolling two dice:

P(K = 2) = 1/36 = 0.0277…
P(K = 7) = 6/36 = 0.1666…
etc.
Probability Theory

A random variable is any quantity whose value is random and can be sampled.

A probability distribution is a function that represents the probability of the random variable taking some value(s).

For a random variable X with discrete values, we write P(X = k) to mean the probability that X takes value k. As a function of k, this is called the Probability Mass Function (PMF).
Probability Theory

Example 1: The random variable X is the result of a coin toss:

P(X = heads) = 0.5
P(X = tails) = 0.5

Example 2: The random variable X is the age bracket of a random Israeli:

P(X = “15-24 years”) = 0.16

Source: https://www.indexmundi.com/israel/age_structure.html
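A discrete PMF like the ones above can be sketched as a plain Python dictionary mapping values to probabilities (a minimal illustration; the function name pmf_prob is my own, not from the slides):

```python
# A discrete PMF as a dict: value -> probability.
coin_pmf = {"heads": 0.5, "tails": 0.5}

# A valid PMF is non-negative and sums to 1.
assert all(p >= 0 for p in coin_pmf.values())
assert abs(sum(coin_pmf.values()) - 1.0) < 1e-12

def pmf_prob(pmf, value):
    """Look up P(X = value); values outside the support have probability 0."""
    return pmf.get(value, 0.0)
```

Values not in the dictionary (outside the support) get probability 0 by convention.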
Probability Theory

For a random variable X with continuous values, we write

P(a < X < b) = ∫ₐᵇ p_X(t) dt

where p_X(t) is the Probability Density Function (PDF) of X.
Probability Theory

Example 3: The random variable X is the height (cm) of a random adult male.

The PDF is approximately a normal curve (explanation to come…):

p_X(t) ≈ (1 / (8√(2π))) e^(−(t − 177)² / 128)

Integrating this gives the probability of an adult being in a height range.
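The integral of a normal PDF has a closed form in terms of the error function, so this probability can be computed without numerical integration. A sketch using the parameters in the formula above (μ = 177, σ = 8):

```python
import math

MU, SIGMA = 177.0, 8.0  # mean and standard deviation from the height example

def normal_cdf(x, mu=MU, sigma=SIGMA):
    """CDF of the normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def prob_between(a, b, mu=MU, sigma=SIGMA):
    """P(a < X < b): integral of the normal PDF from a to b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# About 68% of heights fall within one standard deviation of the mean.
p = prob_between(MU - SIGMA, MU + SIGMA)
```

For example, prob_between(169, 185) gives the familiar one-sigma probability of roughly 0.68.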
Probability Theory

Definition: The expected value of a random variable X is defined to be

• E[X] = Σ t · P(X = t)   (if X is discrete)
• E[X] = ∫ t · p_X(t) dt   (if X is continuous)
Probability Theory

Example:

Let X be the payoff upon playing the lottery, with probability p = 5.7 × 10^−9 of winning $2 million.

The expected payoff is:

E[X] = p · 2×10^6 + (1 − p) · 0 ≈ $0.0114

So on average you can expect to win over one cent every time you play! 🤑 🤑 🤑
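The lottery calculation above is a one-liner in code (numbers taken from the slide):

```python
# Expected payoff of the lottery: one prize, every other outcome pays 0.
p_win = 5.7e-9          # probability of winning (from the slide)
prize = 2_000_000       # $2 million prize

# E[X] = p * prize + (1 - p) * 0
expected_payoff = p_win * prize + (1 - p_win) * 0
```

This gives roughly $0.0114 — slightly over one cent per play, as stated.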
Probability Theory

Different probability distributions can have the same expected value but be more or less spread out.

Q: How can we measure this?
Probability Theory

A: Variance, defined as:

Var(X) = E[(X − μ)²]

where μ = E[X] is the expected value of X.

Standard deviation is defined as the square root of variance:

σ_X = √Var(X)
Probability Theory

Example 1:

For a fair coin (P(X = 0) = P(X = 1) = 0.5), we have

E[X] = 0.5 · 0 + 0.5 · 1 = 0.5
Var(X) = E[(X − 0.5)²] = 0.5 · (−0.5)² + 0.5 · (0.5)² = 0.25
σ_X = √0.25 = 0.5
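The discrete formulas for expected value and variance translate directly into code; a minimal sketch (helper names are my own):

```python
import math

def expected_value(pmf):
    """E[X] = sum of t * P(X = t) over the support."""
    return sum(t * p for t, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - mu)^2], with mu = E[X]."""
    mu = expected_value(pmf)
    return sum((t - mu) ** 2 * p for t, p in pmf.items())

# The fair-coin example: P(X = 0) = P(X = 1) = 0.5
fair_coin = {0: 0.5, 1: 0.5}
mu = expected_value(fair_coin)   # 0.5
var = variance(fair_coin)        # 0.25
sigma = math.sqrt(var)           # 0.5
```

The three computed values match the worked example above.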
Probability Theory

Example 2:

Suppose X ~ Unif(4, 6) is selected from a uniform distribution on the interval [4, 6]. Its PDF is

p_X(t) = 1/2 for 4 ≤ t ≤ 6, and 0 otherwise

(Figure: the flat uniform PDF, reproduced from https://en.wikipedia.org/wiki/Uniform_distribution_(continuous))
Probability Theory

Example 2:

Try sampling from X~Unif(4, 6) yourself:
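One way to try this, using only the standard library (the seed is fixed just for reproducibility):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Draw many samples from Unif(4, 6) and look at the empirical mean.
samples = [random.uniform(4, 6) for _ in range(100_000)]
empirical_mean = sum(samples) / len(samples)
# The empirical mean should be close to the true expected value, 5.
```

With 100,000 samples, the empirical mean typically lands within a few thousandths of 5.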


Probability Theory

Example 2 (continued):

E[X] = ∫₄⁶ t p_X(t) dt = (1/2) ∫₄⁶ t dt = (1/2)(6²/2 − 4²/2) = 5
Probability Theory

Example 2 (continued):

Var(X) = E[(X − E[X])²] = E[(X − 5)²] = ∫₄⁶ (t − 5)² p_X(t) dt
= (1/2) ∫₄⁶ (t − 5)² dt = (1/2)[(1/3)(6 − 5)³ − (1/3)(4 − 5)³] = 1/3 = 0.333…

σ_X = √Var(X) = √(1/3) = 0.577…


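The variance integral can be double-checked numerically, e.g. with a midpoint-rule Riemann sum over [4, 6] (a sketch, not part of the slides):

```python
# Numerically integrate (t - 5)^2 * p_X(t) over [4, 6], where p_X(t) = 1/2,
# using the midpoint rule with N subintervals.
N = 100_000
width = (6 - 4) / N
var = sum(
    ((4 + (i + 0.5) * width) - 5) ** 2 * 0.5 * width  # (t - 5)^2 * p_X(t) * dt
    for i in range(N)
)
# var should agree with the exact answer, 1/3.
```

The midpoint rule converges quickly here, so the sum matches 1/3 to many decimal places.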
Probability Theory

Given two events A and B, the conditional probability P(A | B) is defined as:

P(A | B) = P(A ∩ B) / P(B)

where P(A ∩ B) is the probability of A and B both occurring.
Probability Theory

Example: Suppose in our clinic we observe the following distribution of people and symptoms:

              Healthy   Sick
Not coughing    0.72    0.06
Coughing        0.08    0.14
Probability Theory

Q1: If someone is sick, what is the probability that they are coughing?

A1: P(coughing | sick) = P(coughing ∩ sick) / P(sick) = 0.14 / (0.14 + 0.06) = 0.7
Probability Theory

Q2: If someone is coughing, what is the probability that they are sick?

A2: P(sick | coughing) = P(coughing ∩ sick) / P(coughing) = 0.14 / (0.08 + 0.14) ≈ 0.64
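Both conditional probabilities can be computed mechanically from the joint table; a sketch with the table encoded as a dict (the helper `marginal` is my own naming):

```python
# Joint distribution from the clinic example: (cough status, health) -> probability.
joint = {
    ("not coughing", "healthy"): 0.72, ("not coughing", "sick"): 0.06,
    ("coughing", "healthy"): 0.08,     ("coughing", "sick"): 0.14,
}

def marginal(joint, axis, value):
    """P(one variable = value), summing the joint table over the other variable."""
    return sum(p for key, p in joint.items() if key[axis] == value)

# P(coughing | sick) = P(coughing and sick) / P(sick)
p_cough_given_sick = joint[("coughing", "sick")] / marginal(joint, 1, "sick")
# P(sick | coughing) = P(coughing and sick) / P(coughing)
p_sick_given_cough = joint[("coughing", "sick")] / marginal(joint, 0, "coughing")
```

This reproduces A1 (0.7) and A2 (about 0.64).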
Probability Theory

Two random variables X and Y are independent if, for all values x and y,

P(X = x ∩ Y = y) = P(X = x) P(Y = y)

This means that the outcomes of X and Y do not affect each other.

Otherwise, X and Y are dependent.
Probability Theory

Example 1: I roll two dice; X represents the outcome of the first die and Y the outcome of the second die.

X and Y are independent (they do not affect each other):

P(X = 4 ∩ Y = 5) = P(X = 4) P(Y = 5) = (1/6) · (1/6) = 1/36
Probability Theory

Example 2: Let X and Y be as before and let Z = X + Y be the sum of the values on the two dice.

X and Z are dependent:

P(X = 4 ∩ Z = 5) = 1/36 (can only happen if X = 4 and Y = 1)

But P(X = 4) = 1/6 and P(Z = 5) = 4/36, so P(X = 4 ∩ Z = 5) is not equal to P(X = 4) P(Z = 5) = 1/54.


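Both claims can be verified by brute-force enumeration of the 36 equally likely outcomes, using exact fractions (a sketch; the helper `prob` is my own):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))
P_EACH = Fraction(1, 36)

def prob(event):
    """Probability that predicate `event` holds, by counting outcomes."""
    return sum(P_EACH for o in outcomes if event(o))

# X = first die, Y = second die, Z = X + Y.
p_x4 = prob(lambda o: o[0] == 4)                        # 1/6
p_y5 = prob(lambda o: o[1] == 5)                        # 1/6
p_x4_and_y5 = prob(lambda o: o[0] == 4 and o[1] == 5)   # 1/36
p_z5 = prob(lambda o: o[0] + o[1] == 5)                 # 4/36
p_x4_and_z5 = prob(lambda o: o[0] == 4 and o[0] + o[1] == 5)  # 1/36

independent_xy = p_x4_and_y5 == p_x4 * p_y5  # True: X and Y independent
independent_xz = p_x4_and_z5 == p_x4 * p_z5  # False: X and Z dependent
```

Using Fraction avoids any floating-point rounding, so the equality tests are exact.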
Statistics

Numerical Data        Categorical Data
Heights               Months (Jan, May, Dec)
Temperature           Gender (male, female, other)
Number of children    Nationality (Israel, USA, France)
Age                   Age brackets (18-25, 26-35, …)

Statistics

⚠️ Be careful!

A feature containing numbers might be either numerical data or categorical data.

Example: family data collected on subjects split into five different groups.

Number of siblings (0, 1, 2, 3, 4): numerical
Group number (0, 1, 2, 3, 4): categorical
Statistics

The Bernoulli distribution is the discrete probability distribution of a random variable that can take two values (normally 0 and 1):

X ~ Bernoulli(p)
P(X = 1) = p
P(X = 0) = 1 − p
Statistics

Example 1: flipping a coin
p ≈ 0.5, q ≈ 0.5

Example 2: winning the lottery
p ≈ 5.7 × 10^−9, q = 1 − p
Statistics

For X ~ Bernoulli(p):

E[X] = p
Var(X) = p(1 − p)
σ_X = √(p(1 − p))
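These formulas are easy to sanity-check by simulation; a sketch with an arbitrary p = 0.3 (my choice, not from the slides) and a fixed seed for reproducibility:

```python
import random

random.seed(1)
p = 0.3  # arbitrary Bernoulli parameter for illustration

# Sample X ~ Bernoulli(p) many times: 1 with probability p, else 0.
samples = [1 if random.random() < p else 0 for _ in range(200_000)]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
# mean should approach E[X] = p = 0.3
# var should approach p * (1 - p) = 0.21
```

Both empirical values land within about a hundredth of the theoretical ones at this sample size.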
Statistics

The normal distribution X ~ 𝒩(μ, σ²) has PDF

p_X(t) = (1 / (σ√(2π))) e^(−(t − μ)² / (2σ²))

E[X] = μ
Var(X) = σ²
σ_X = σ

The shape of the normal PDF is known as a “bell curve”.

Q: Why is this a useful definition?
Statistics

A: The Central Limit Theorem states that, under reasonable conditions, the sum of independent, identically distributed trials asymptotically approaches a normal distribution.

Lots of real-life data includes many nearly independent sources of random noise added together.
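The theorem can be seen in a quick simulation: sum many independent Unif(0, 1) draws and check that the sums behave like a normal distribution (a sketch; the sample sizes are arbitrary choices of mine):

```python
import math
import random

random.seed(2)

# Sum n i.i.d. Unif(0, 1) draws. By the CLT, the sum is approximately
# normal with mean n/2 and variance n/12.
n, trials = 48, 20_000
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

mean = sum(sums) / trials                                  # expect ~ 24
std = math.sqrt(sum((s - mean) ** 2 for s in sums) / trials)  # expect ~ 2

# For a normal distribution, about 68% of mass lies within one sigma.
within_1sigma = sum(1 for s in sums if abs(s - mean) < std) / trials
```

The one-sigma fraction coming out near 0.68 is the bell-curve signature the slide describes.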
Statistics

Examples:
• Number of heads when flipping one million coins (approximately)
• Height in human population
• Birth weight of newborn babies
• Variation in outdoor temperature from monthly average

See https://galtonboard.com/probabilityexamplesinlife for more examples.
Further Reading

● Introduction to Probability from MIT OCW
● An Introduction to Statistics by Keone Hon
● The Probability Cheatsheet
● The Statistics Cheatsheet
● Common Probability Distributions by Sean Owens
● Central Limit Theorem Explained