Formula Sheet for Test 2 of Stat4001
2:00 pm – 5:00 pm on Wednesday July 24, 2024 at SJC-207
Student ID:____________ Signature: _______________________
Classical Probability:
$P(\text{event}) = \dfrac{\text{number of favourable outcomes}}{\text{total number of possible outcomes}}$
Empirical Probability:
$P(\text{event happening}) = \dfrac{\text{number of times the event occurred in the past}}{\text{total number of observations}}$
Probability Rules
Addition Rule: If A and B are disjoint events, then 𝑃(𝐴 𝑜𝑟 𝐵) = 𝑃(𝐴) + 𝑃(𝐵)
General Addition Rule: For any two events A and B
𝑃(𝐴 𝑜𝑟 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 𝑎𝑛𝑑 𝐵)
Complement Rule: $P(A) = 1 - P(A^c)$
Conditional Probability: The probability of A given B is denoted $P(A \mid B)$, and
$P(A \mid B) = \dfrac{P(A \text{ and } B)}{P(B)}$, if $P(B) > 0$
General Multiplication Rule: For any two events A and B
𝑃(𝐴 𝑎𝑛𝑑 𝐵) = 𝑃(𝐴)𝑃(𝐵|𝐴) = 𝑃(𝐵)𝑃(𝐴|𝐵)
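Bayes Rule: If the events $A_1, A_2, \dots, A_n$ are mutually exclusive and together cover all possible outcomes, then
$P(A_i \mid B) = \dfrac{P(B \mid A_i)\,P(A_i)}{\sum_{j} P(B \mid A_j)\,P(A_j)}$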
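As a hedged illustration of Bayes Rule, the sketch below uses made-up numbers: a hypothetical partition into two events A1 and A2, with assumed prior probabilities and assumed values of P(B|A1) and P(B|A2). None of these values come from the course.

```python
# Hypothetical numbers for illustration only.
priors = [0.01, 0.99]        # P(A1), P(A2): the events partition the sample space
likelihoods = [0.95, 0.05]   # P(B|A1), P(B|A2)

# Denominator of Bayes Rule: total probability of B, summed over every A_j
p_b = sum(l * p for l, p in zip(likelihoods, priors))

# Posterior probability P(A_i | B) for each i
posteriors = [l * p / p_b for l, p in zip(likelihoods, priors)]
print(posteriors)
```

The posteriors always sum to 1, which is a quick sanity check on the arithmetic.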
Independence: If events A and B are independent, then
𝑃(𝐴 𝑎𝑛𝑑 𝐵) = 𝑃(𝐴)𝑃(𝐵)
Expected Value: 𝜇 = 𝐸𝑉 = 𝐸(𝑋) = ∑ 𝑥𝑃(𝑥)
Variance: $\sigma^2 = Var(X) = \sum (x - \mu)^2 P(x)$, Standard Deviation: $\sigma = SD(X) = \sqrt{Var(X)}$
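A minimal sketch of the expected value, variance, and standard deviation formulas, applied to a hypothetical discrete distribution (the values and probabilities are assumptions chosen for easy arithmetic):

```python
import math

# Hypothetical distribution: value x -> probability P(x)
dist = {0: 0.2, 1: 0.5, 2: 0.3}

mu = sum(x * p for x, p in dist.items())               # E(X) = sum of x * P(x)
var = sum((x - mu) ** 2 * p for x, p in dist.items())  # Var(X) = sum of (x - mu)^2 * P(x)
sd = math.sqrt(var)                                    # SD(X) = sqrt(Var(X))

print(mu, var, sd)
```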
The Empirical Rule: For a symmetrical, bell-shaped frequency distribution:
➢ Approximately 68 percent of the observations will lie within ±1 standard
deviation of the mean.
➢ About 95 percent of the observations will lie within ±2 standard deviations of
the mean.
➢ Practically all (99.7 percent) will lie within ±3 standard deviations of the mean.
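The three percentages in the Empirical Rule can be reproduced from the standard normal distribution. This sketch assumes an exactly normal model and uses Python's standard-library `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability of falling within k standard deviations of the mean
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(f"within ±{k} SD: {prob:.4f}")
```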
Addition Rule for Expected Values of Random Variable: 𝐸(𝑋 ± 𝑌) = 𝐸𝑋 ± 𝐸𝑌
Addition Rule for Variance of Random Variable: If random variables X and Y are
independent, then
$Var(X \pm Y) = Var(X) + Var(Y)$
$SD(X \pm Y) = \sqrt{Var(X) + Var(Y)}$
If c is a constant number, then
$E(X \pm c) = E(X) \pm c$, $E(cX) = cE(X)$
$Var(X \pm c) = Var(X)$, $Var(cX) = c^2\,Var(X)$
$SD(X \pm c) = SD(X)$, $SD(cX) = |c|\,SD(X)$
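A short sketch of the rules for combining and rescaling random variables, using hypothetical means and standard deviations for two independent variables X and Y. Note that the variances add even when the variables are subtracted.

```python
import math

# Hypothetical, independent random variables
ex, sdx = 10.0, 2.0   # E(X), SD(X)
ey, sdy = 4.0, 1.5    # E(Y), SD(Y)

# Difference X - Y: the means subtract, the variances add
e_diff = ex - ey
sd_diff = math.sqrt(sdx ** 2 + sdy ** 2)

# Shifting and scaling by constants
c = -3.0
e_shift, sd_shift = ex + 5, sdx           # adding a constant leaves SD unchanged
e_scaled, sd_scaled = c * ex, abs(c) * sdx

print(e_diff, sd_diff, e_shift, sd_shift, e_scaled, sd_scaled)
```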
Bernoulli Trial
➢ There are only two possible outcomes (success and failure) for each trial.
➢ The probability of success, denoted p, is the same for each trial. The probability
of failure is q = 1 – p.
➢ The trials are independent.
Geometric Distribution: Random variable X represents the number of trials until the
first success happens.
$P(X = k) = q^{k-1} p$, $k = 1, 2, 3, \dots$; $\mu = E(X) = \dfrac{1}{p}$, $\sigma = SD(X) = \sqrt{\dfrac{q}{p^2}}$
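A minimal sketch of the geometric formulas, assuming a hypothetical success probability p = 0.2. The helper name `geom_pmf` is illustrative only.

```python
import math

p = 0.2        # hypothetical probability of success
q = 1 - p

def geom_pmf(k: int) -> float:
    """P(X = k): first success occurs on trial k (k = 1, 2, 3, ...)."""
    return q ** (k - 1) * p

mu = 1 / p                    # expected number of trials
sd = math.sqrt(q / p ** 2)    # standard deviation

print(geom_pmf(3), mu, sd)
```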
Binomial Distribution: Random variable X represents the number of successes
in n trials.
$P(X = k) = \binom{n}{k} p^k q^{n-k} = {}_nC_k\, p^k q^{n-k}$, $k = 0, 1, 2, \dots, n$; $\mu = E(X) = np$; $\sigma = SD(X) = \sqrt{npq}$
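The binomial probability can be computed directly with `math.comb`. This sketch assumes hypothetical values n = 10 and p = 0.3, and `binom_pmf` is an illustrative helper name.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial model with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3   # hypothetical model
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

mu = n * p                         # mean from the formula
sd = math.sqrt(n * p * (1 - p))    # standard deviation from the formula
mean_from_pmf = sum(k * pk for k, pk in enumerate(pmf))  # should agree with mu

print(pmf[3], mu, sd)
```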
Poisson Distribution: X represents the number of events that occur over a given
interval of time or space.
$P(X = k) = \dfrac{e^{-\lambda} \lambda^k}{k!}$, $k = 0, 1, 2, \dots$; $\mu = E(X) = \lambda$, $\sigma = SD(X) = \sqrt{\lambda}$
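A brief sketch of the Poisson probability and its cumulative sum, assuming a hypothetical rate λ = 2. The function names are illustrative only.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson model with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k): sum of the probabilities for 0, 1, ..., k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

lam = 2.0   # hypothetical average number of events per interval
print(poisson_pmf(0, lam), poisson_cdf(3, lam))
```

The cumulative sum starts at i = 0, because zero events is a possible outcome.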
Uniform Distribution:
For values c and d ($c \le d$) both within the interval [a, b]:
$P(c \le X \le d) = \dfrac{d - c}{b - a}$
$E(X) = \dfrac{a + b}{2}$, $Var(X) = \dfrac{(b - a)^2}{12}$, $SD(X) = \sqrt{\dfrac{(b - a)^2}{12}}$
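A small worked example of the uniform formulas, assuming a hypothetical interval [a, b] = [0, 10] and a sub-interval [c, d] = [2, 5]:

```python
import math

a, b = 0.0, 10.0   # hypothetical interval of the uniform distribution
c, d = 2.0, 5.0    # sub-interval of interest, with a <= c <= d <= b

prob = (d - c) / (b - a)     # P(c <= X <= d)
mean = (a + b) / 2           # E(X)
var = (b - a) ** 2 / 12      # Var(X)
sd = math.sqrt(var)          # SD(X)

print(prob, mean, var, sd)
```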
Standard Normal Value: $z = \dfrac{X - \mu}{\sigma}$
If random variable X is normally distributed with mean $\mu$ and standard
deviation $\sigma$, then $z = \dfrac{X - \mu}{\sigma}$ follows the standard normal distribution.
Normal Approximation:
A discrete Binomial model is approximately Normal if
$np \ge 10$ and $nq \ge 10$.
Then, for sufficiently large n, the random variable $\dfrac{X - \mu}{\sigma}$ has approximately a standard normal
distribution, where $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$.
Equivalently, X is approximately normally distributed with mean $\mu = np$ and standard
deviation $\sigma = \sqrt{np(1 - p)}$.
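To see how close the normal approximation is, the sketch below compares it with the exact binomial sum for a hypothetical model (n = 100, p = 0.5, cut-off k = 55). It applies the formula exactly as stated on this sheet; the numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5    # hypothetical binomial model
q = 1 - p
k = 55             # we want P(X <= k)

# Check the conditions np >= 10 and nq >= 10 before approximating
conditions_ok = n * p >= 10 and n * q >= 10

mu = n * p
sigma = math.sqrt(n * p * q)
z = (k - mu) / sigma               # standardized cut-off
approx = NormalDist().cdf(z)       # normal approximation to P(X <= k)

# Exact binomial probability for comparison
exact = sum(math.comb(n, i) * p ** i * q ** (n - i) for i in range(k + 1))

print(conditions_ok, z, approx, exact)
```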
Exponential Distribution:
$P(s \le X \le t) = e^{-\lambda s} - e^{-\lambda t}$, $P(X \le t) = 1 - e^{-\lambda t}$, $\mu = E(X) = \dfrac{1}{\lambda}$, $\sigma = SD(X) = \dfrac{1}{\lambda}$
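A minimal sketch of the exponential formulas, assuming a hypothetical rate λ = 0.5 and the interval from s = 1 to t = 3. The helper name `expon_cdf` is illustrative.

```python
import math

lam = 0.5          # hypothetical rate parameter
s, t = 1.0, 3.0    # interval of interest, with 0 <= s <= t

def expon_cdf(x: float) -> float:
    """P(X <= x) = 1 - e^(-lam * x) for x >= 0."""
    return 1 - math.exp(-lam * x)

p_between = math.exp(-lam * s) - math.exp(-lam * t)   # P(s <= X <= t)
mean = 1 / lam                                        # E(X)
sd = 1 / lam                                          # SD(X) equals the mean

print(p_between, expon_cdf(t), mean, sd)
```

The interval probability is the difference of the two cumulative probabilities, which is an easy way to double-check an answer.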
Central Limit Theorem
The means of all possible random samples of the same size have a
sampling distribution that is approximately normal. The larger the
sample size, the better the approximation.
Sampling Distribution for Proportion
$\mu(\hat{p}) = p$, $SD(\hat{p}) = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{pq}{n}}$
The normal model $N\left(p, \sqrt{\dfrac{pq}{n}}\right)$ is a sampling distribution model for the sample proportion.
n is the sample size, q = 1 – p is the proportion of failures.
Sampling Distribution for Mean
When a random sample of size n is drawn from any population with mean μ and standard
deviation σ, its sample mean, $\bar{x}$, has an approximately normal distribution (for
sufficiently large n) with mean μ and standard deviation
$SD(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$
Confidence Interval for Proportions
$\hat{p}$ is the sample estimate of the true proportion $p$, and $\hat{q} = 1 - \hat{p}$.
Standard Error is $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}} = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
Confidence Interval is $\hat{p} \pm z^* SE(\hat{p}) = \hat{p} \pm z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$, Margin of Error, $ME = z^* \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
$z^*$ is the critical value based on the confidence level, n is the sample size.
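As a hedged worked example of the one-proportion interval, the sketch below assumes hypothetical survey data (240 successes in 400 observations) and a 95 percent confidence level. The critical value is taken from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

successes, n = 240, 400   # hypothetical sample
level = 0.95              # confidence level

p_hat = successes / n
q_hat = 1 - p_hat

# Critical value z*: cuts off (1 - level)/2 in each tail of the standard normal
z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)

se = math.sqrt(p_hat * q_hat / n)   # standard error of p-hat
me = z_star * se                    # margin of error
ci = (p_hat - me, p_hat + me)

print(z_star, se, me, ci)
```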
Confidence Interval for the difference between two Proportions
$(\hat{p}_1 - \hat{p}_2) \pm z^* SE(\hat{p}_1 - \hat{p}_2)$, $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}$, $\hat{q}_1 = 1 - \hat{p}_1$, $\hat{q}_2 = 1 - \hat{p}_2$
where $\hat{p}_1$ and $\hat{p}_2$ are the sample estimates of the true proportions from population 1 and 2,
respectively, and $n_1$ and $n_2$ are the corresponding sample sizes. $z^*$ is the critical value based on the confidence level.
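The two-proportion interval follows the same pattern. This sketch assumes hypothetical sample proportions and sample sizes, and a 95 percent confidence level.

```python
import math
from statistics import NormalDist

# Hypothetical samples from two populations
p1_hat, n1 = 0.55, 200
p2_hat, n2 = 0.45, 250
level = 0.95

q1_hat, q2_hat = 1 - p1_hat, 1 - p2_hat

# Standard error of the difference: the two variance terms add
se = math.sqrt(p1_hat * q1_hat / n1 + p2_hat * q2_hat / n2)

z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
diff = p1_hat - p2_hat
ci = (diff - z_star * se, diff + z_star * se)

print(diff, se, ci)
```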
Confidence Interval for Means when Population Standard Deviation is known
Confidence Interval is $\bar{y} \pm z^* SD(\bar{y}) = \bar{y} \pm z^* \dfrac{\sigma}{\sqrt{n}}$, Margin of Error, $ME = z^* \dfrac{\sigma}{\sqrt{n}}$.
$z^*$ is the critical value based on the confidence level, n is the sample size.
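A minimal sketch of the interval for a mean with known population standard deviation, using hypothetical values (sample mean 100, σ = 15, n = 36, 95 percent confidence). The margin of error computed here is the same quantity that the Excel function CONFIDENCE.NORM(α, σ, n) on this sheet is described as returning.

```python
import math
from statistics import NormalDist

y_bar = 100.0   # hypothetical sample mean
sigma = 15.0    # known population standard deviation (assumed)
n = 36          # sample size
alpha = 0.05    # 1 - alpha is the confidence level

z_star = NormalDist().inv_cdf(1 - alpha / 2)   # critical value
me = z_star * sigma / math.sqrt(n)             # margin of error
ci = (y_bar - me, y_bar + me)

print(z_star, me, ci)
```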
List of Excel Functions
AVERAGE(range of data in Excel) gives the mean of the data collection
MEDIAN(range of data in Excel) gives the median of the data collection
MODE(range of data in Excel) gives the mode of the data collection
STDEV.P(range of data in Excel) gives the population standard deviation
STDEV.S(range of data in Excel) gives the sample standard deviation
VAR.P(range of data in Excel) gives the population variance
VAR.S(range of data in Excel) gives the sample variance
EXP(x) gives the value of $e^x$
FACT(k) gives the value of k!
COMBIN(n,k) gives the value of the combination ${}_nC_k$ or $\binom{n}{k}$
BINOM.DIST(k,n,p,0) gives the value of $P(X = k) = \binom{n}{k} p^k q^{n-k} = {}_nC_k\, p^k q^{n-k}$, $q = 1 - p$
BINOM.DIST(k,n,p,1) gives the value of $P(X \le k) = \sum_{i=0}^{k} P(X = i) = \sum_{i=0}^{k} {}_nC_i\, p^i q^{n-i}$
EXPON.DIST(t,λ,1) gives the value of cumulative probability $P(X \le t) = 1 - e^{-\lambda t}$
POISSON.DIST(k,λ,0) gives the value of probability $P(X = k) = \dfrac{e^{-\lambda} \lambda^k}{k!}$
POISSON.DIST(k,λ,1) gives the value of cumulative probability $P(X \le k) = \sum_{i=0}^{k} \dfrac{e^{-\lambda} \lambda^i}{i!}$
NORM.S.DIST(z,1) gives the value of cumulative probability $P(X < z)$
if X follows the standard normal distribution.
NORM.DIST(z,µ,σ,1) gives the value of cumulative probability $P(X < z)$
if X follows a normal distribution with mean µ and standard deviation σ.
NORM.S.INV(p) gives the standard value z
if X follows the standard normal distribution and $P(X < z) = p$.
NORM.INV(p,µ,σ) gives the cut-off value z
if X follows a normal distribution with mean µ and
standard deviation σ and $P(X < z) = p$.
CONFIDENCE.NORM(α,σ,n) gives the margin of error of a confidence interval,
where 1 - α is the confidence level, σ is the population standard
deviation and n is the sample size.
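For checking Excel results outside a spreadsheet, the sketch below shows rough Python standard-library counterparts for several of the functions listed above. The pairing of each Python call with an Excel function is this example's own suggestion, not part of the course material, and the data values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

data = [2, 4, 4, 5, 7, 9]   # hypothetical data collection

mean = statistics.mean(data)            # AVERAGE
median = statistics.median(data)        # MEDIAN
mode = statistics.mode(data)            # MODE
sd_pop = statistics.pstdev(data)        # STDEV.P
sd_samp = statistics.stdev(data)        # STDEV.S
var_pop = statistics.pvariance(data)    # VAR.P
var_samp = statistics.variance(data)    # VAR.S

comb = math.comb(5, 2)                  # COMBIN(5,2)
fact = math.factorial(4)                # FACT(4)

z_cdf = NormalDist().cdf(1.96)               # NORM.S.DIST(1.96,1)
x_cdf = NormalDist(100, 15).cdf(115)         # NORM.DIST(115,100,15,1)
z_cut = NormalDist().inv_cdf(0.975)          # NORM.S.INV(0.975)
x_cut = NormalDist(100, 15).inv_cdf(0.975)   # NORM.INV(0.975,100,15)

print(mean, median, mode, sd_pop, sd_samp, z_cdf, z_cut)
```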