Probability 3.2 EdX
Dave Goldsman
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology
3/2/20
ISYE 6739
Introduction to Descriptive Statistics
Examples:
Election polling.
Coke vs. Pepsi.
The effect of cigarette smoking on the probability of getting cancer.
The effect of a new drug on the probability of contracting hepatitis.
What’s the most popular TV show during a certain time period?
The effect of various heat-treating methods on steel tensile strength.
Which fertilizers improve crop yield?
King of Siam — etc., etc., etc.
Idea (Election polling example): We can’t poll every single voter. Thus, we
take a sample of data from the population of voters, and try to make a
reasonable conclusion based on that sample.
Statistics tells us how to conduct the sampling (i.e., how many observations to
take, how to take them, etc.), and then how to draw conclusions from the
sampled data.
Types of Data
Continuous variables: Can take on any real value in a certain
interval. For example, the lifetime of a lightbulb or the weight of a
newborn child.
Discrete variables: Can only take on specific values. E.g., the number
of accidents this week at a factory or the possible rolls of a pair of dice.
Categorical variables: These data are not typically numerical.
What’s your favorite TV show during a certain time slot?
Plotting Data
A picture is worth 1000 words. Always plot data before doing anything else,
if only to identify any obvious issues such as nonstandard distributions,
missing data points, outliers, etc.
Histograms provide a quick, succinct look at what you are dealing with. If
you take enough observations, the histogram will eventually converge to the
true distribution. But sometimes choosing the optimal number of cells is a
little tricky — like Goldilocks!
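To see this concretely, here is a minimal plotting sketch (assuming numpy and matplotlib are available; the Nor(0,1) sample is just an illustration, not course data). With a sensible number of cells the histogram tracks the true pdf; with too few or too many, it doesn't:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(6739)
x = rng.normal(loc=0.0, scale=1.0, size=1000)   # sample from a known pdf

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, bins in zip(axes, [3, 25, 200]):        # too few, about right, too many
    ax.hist(x, bins=bins, density=True, edgecolor="black")
    ax.set_title(f"{bins} cells")
plt.show()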
Summarizing Data
It’s nice to have lots of data. But sometimes it’s too much of a good thing! Need to summarize. Example: n = 30 exam grades:
23 62 91 83 82 64 73 94 94 52
67 11 87 99 37 62 40 33 80 83
99 90 18 73 68 75 75 90 36 55
Stem-and-Leaf Diagram of grades. Easy way to write down all of the data.
Saves some space, and looks like a sideways histogram.
Stem | Leaves
 9   | 9944100
 8   | 73320
 7   | 5533
 6   | 87422
 5   | 52
 4   | 0
 3   | 763
 2   | 3
 1   | 81
Grouped Data

Range     Freq.   Cumul. Freq.   Proportion of observations so far
0–20        2          2                2/30
21–40       5          7                7/30
41–60       2          9                9/30
61–80      10         19               19/30
81–100     11         30               30/30 = 1
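The table above is easy to reproduce in code. A small sketch, assuming numpy; the half-integer cell edges are a device to make the integer-valued grades fall into right-closed cells (1–20, 21–40, etc.), matching the table:

import numpy as np

grades = np.array([23, 62, 91, 83, 82, 64, 73, 94, 94, 52,
                   67, 11, 87, 99, 37, 62, 40, 33, 80, 83,
                   99, 90, 18, 73, 68, 75, 75, 90, 36, 55])

edges = [0.5, 20.5, 40.5, 60.5, 80.5, 100.5]   # right-closed integer cells
freq, _ = np.histogram(grades, bins=edges)
cumul = np.cumsum(freq)
n = len(grades)
for lo, hi, f, c in zip((1, 21, 41, 61, 81), (20, 40, 60, 80, 100), freq, cumul):
    print(f"{lo:3d}-{hi:3d}   freq = {f:2d}   cumul = {c:2d}   prop = {c}/{n}")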
Summary Statistics:

n = 30 observations.

Sample Mean: X̄ ≡ (1/n) Σᵢ₌₁ⁿ Xᵢ = 1996/30 ≈ 66.5.

Sample Median: the middle value of the sorted data; here it is the average of the 15th and 16th smallest observations, (73 + 73)/2 = 73.
Remark: The sample median is less susceptible to “outlier” data than the
sample mean. One bad number can spoil the sample mean’s entire day.
Sample Mode: “Most common” value. Not the most useful measure
sometimes.
Sample Variance:

S² ≡ (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)² = (1/(n−1)) [Σᵢ₌₁ⁿ Xᵢ² − n X̄²].
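As a quick sanity check, both expressions for S² can be computed on the grade data (a sketch assuming numpy; np.var with ddof=1 uses the same n − 1 divisor):

import numpy as np

x = np.array([23, 62, 91, 83, 82, 64, 73, 94, 94, 52,
              67, 11, 87, 99, 37, 62, 40, 33, 80, 83,
              99, 90, 18, 73, 68, 75, 75, 90, 36, 55])
n, xbar = len(x), x.mean()

s2_definition = ((x - xbar) ** 2).sum() / (n - 1)           # first form
s2_shortcut = ((x ** 2).sum() - n * xbar ** 2) / (n - 1)    # second form
print(s2_definition, s2_shortcut, np.var(x, ddof=1))        # all three agree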
Example: Suppose n = 10 discrete observations (e.g., die rolls) are summarized by their values Xⱼ and frequencies fⱼ:

Xⱼ   1  2  3  4  5  6
fⱼ   2  1  1  3  0  3

Then X̄ = (1/n) Σⱼ fⱼ Xⱼ = 37/10 = 3.7.
Example: Now suppose the n = 30 observations are only available in grouped form, with cell midpoints mⱼ and frequencies fⱼ:

Xⱼ interval    mⱼ    fⱼ
100–150       125    10
150–200       175    15
200–300       250     5

Then, with c = 3 cells,

X̄ ≈ (Σⱼ₌₁ᶜ fⱼ mⱼ)/n = 170.833 and S² ≈ (Σⱼ₌₁ᶜ fⱼ mⱼ² − n X̄²)/(n − 1) = 1814. □
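A sketch (assuming numpy) of the grouped-data approximations above:

import numpy as np

m = np.array([125, 175, 250])   # cell midpoints m_j
f = np.array([10, 15, 5])       # cell frequencies f_j
n = f.sum()                     # n = 30

xbar = (f * m).sum() / n                             # ~ 170.833
s2 = ((f * m ** 2).sum() - n * xbar ** 2) / (n - 1)  # ~ 1814
print(xbar, s2)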
Candidate Distributions
Introduction to Estimation
A statistic is a function of the data that doesn’t depend on any unknown parameters, e.g., X̄ and S². Statistics are random variables: if we take two different samples, we’d expect to get two different values of a statistic.

We use statistics to estimate unknown parameters. Examples of parameters: µ, σ².
What properties should a “good” estimator have? For one, its expected value should equal the parameter it’s trying to estimate; such an estimator is called unbiased.
Unbiased Estimation

Definition: An estimator T(X) is unbiased for a parameter θ if E[T(X)] = θ.

Example: If X₁, . . . , Xₙ are iid with mean µ, then E[X̄] = (1/n) Σᵢ₌₁ⁿ E[Xᵢ] = µ, so X̄ is unbiased for µ.

Theorem: The sample variance S² is an unbiased estimator of σ². Proof:
E[S²] = (1/(n−1)) E[Σᵢ₌₁ⁿ (Xᵢ − X̄)²] = (1/(n−1)) E[Σᵢ₌₁ⁿ Xᵢ² − n X̄²]

      = (1/(n−1)) (Σᵢ₌₁ⁿ E[Xᵢ²] − n E[X̄²])

      = (n/(n−1)) (E[X₁²] − E[X̄²])   (since the Xᵢ’s are iid)

      = (n/(n−1)) (Var(X₁) + (E[X₁])² − Var(X̄) − (E[X̄])²)

      = (n/(n−1)) (σ² − σ²/n)   (since E[X₁] = E[X̄] and Var(X̄) = σ²/n)

      = σ². Done. □
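A Monte Carlo sanity check of this result (a sketch assuming numpy; the choices n = 10 and σ² = 4 are arbitrary illustrations):

import numpy as np

rng = np.random.default_rng(6739)
n, sigma2, reps = 10, 4.0, 200_000

x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
dev2 = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

print(np.mean(dev2 / (n - 1)))   # ~ 4.0: the 1/(n-1) version is unbiased
print(np.mean(dev2 / n))         # ~ 3.6: the 1/n version is biased low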
Big Example: Suppose that X₁, . . . , Xₙ ∼ iid Unif(0, θ), i.e., the pdf is f(x) = 1/θ, for 0 < x < θ. Think of it this way: I give you a bunch of random numbers between 0 and θ, and you have to guess what θ is. Consider three estimators:

Y₁ = 2X̄.

Y₂ = ((n+1)/n) max₁≤ᵢ≤ₙ Xᵢ.

Y₃ = 12X̄ w.p. 1/2, or −8X̄ w.p. 1/2.
Why might Y₂ make sense as an estimator for θ? Intuitively, max₁≤ᵢ≤ₙ Xᵢ always falls a little below θ, so we scale it up slightly. (We’ll say why it’s “better” in a little while.)

Proof (that it’s unbiased): E[Y₂] = ((n+1)/n) E[maxᵢ Xᵢ] = θ iff E[max Xᵢ] = nθ/(n+1) (which is what we’ll show below).
Let M ≡ max₁≤ᵢ≤ₙ Xᵢ. For 0 < y < θ, the cdf of M is

P(M ≤ y) = P(X₁ ≤ y, . . . , Xₙ ≤ y) = ∏ᵢ₌₁ⁿ P(Xᵢ ≤ y)   (since the Xᵢ’s are independent)

= (y/θ)ⁿ.

Thus, the pdf of M is
f_M(y) ≡ (d/dy)(y/θ)ⁿ = n yⁿ⁻¹/θⁿ, 0 < y < θ,

and this implies that

E[M] = ∫₀^θ y f_M(y) dy = ∫₀^θ (n yⁿ/θⁿ) dy = nθ/(n+1).

Whew! This finally shows that Y₂ = ((n+1)/n) max₁≤ᵢ≤ₙ Xᵢ is an unbiased estimator for θ! □
“Ugly” Estimator: Y₃ = 12X̄ w.p. 1/2, or −8X̄ w.p. 1/2.

It is indeed unbiased: E[Y₃] = (1/2)(12 E[X̄]) + (1/2)(−8 E[X̄]) = 2 E[X̄] = 2(θ/2) = θ.

Ha! It’s possible to get a negative estimate for θ, which is strange since θ > 0! Usually, it’s good for an estimator to be unbiased, but the “ugly” estimator Y₃ shows that unbiased estimators can sometimes be goofy.
So which of the unbiased estimators Y₁ and Y₂ is better? Let’s compare their variances. First,

Var(Y₁) = 4 Var(X̄) = (4/n) Var(Xᵢ) = (4/n)(θ²/12) = θ²/(3n).
Meanwhile,

Var(Y₂) = ((n+1)/n)² Var(M)

        = ((n+1)/n)² E[M²] − (((n+1)/n) E[M])²

        = ((n+1)/n)² ∫₀^θ (n yⁿ⁺¹/θⁿ) dy − θ²

        = θ² (n+1)²/(n(n+2)) − θ² = θ²/(n(n+2)) < θ²/(3n).

Thus, both Y₁ and Y₂ are unbiased, but Y₂ has much lower variance than Y₁. We can break the “unbiasedness tie” by choosing Y₂. □
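A simulation sketch (assuming numpy; θ = 10 and n = 20 are arbitrary) that reproduces both conclusions, unbiasedness and the variance gap:

import numpy as np

rng = np.random.default_rng(6739)
theta, n, reps = 10.0, 20, 100_000

x = rng.uniform(0.0, theta, size=(reps, n))
y1 = 2 * x.mean(axis=1)
y2 = (n + 1) / n * x.max(axis=1)

print(y1.mean(), y1.var())   # ~ 10, ~ theta^2/(3n)     = 1.667
print(y2.mean(), y2.var())   # ~ 10, ~ theta^2/(n(n+2)) = 0.227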
Mean Squared Error
Definition: The mean squared error of an estimator T(X) of θ is

MSE(T) ≡ E[(T − θ)²] = E[T²] − 2θ E[T] + θ² = Var(T) + (E[T] − θ)² = Var(T) + Bias²(T),

where Bias(T) ≡ E[T] − θ. So MSE = Bias² + Var, and thus combines the bias and variance of an estimator. □
The lower the MSE the better. If T1 (X) and T2 (X) are two estimators of θ,
we’d usually prefer the one with the lower MSE — even if it happens to have
higher bias.
Example: Suppose that estimator A has bias = 3 and variance = 10, while estimator B has bias = −2 and variance = 14. Which estimator (A or B) has the lower mean squared error? Answer: MSE(A) = 3² + 10 = 19 and MSE(B) = (−2)² + 14 = 18, so B wins, despite its larger variance. □
Example: X₁, . . . , Xₙ ∼ iid Unif(0, θ).

Two estimators: Y₁ = 2X̄ and Y₂ = ((n+1)/n) maxᵢ Xᵢ. Both are unbiased, so both have Bias = 0.

Also, Var(Y₁) = θ²/(3n) and Var(Y₂) = θ²/(n(n+2)).

Thus,

MSE(Y₁) = θ²/(3n) and MSE(Y₂) = θ²/(n(n+2)),

so Y₂ is better (by an order of magnitude, actually). □
Maximum Likelihood Estimation
Definition: Given iid observations X₁, . . . , Xₙ, each with pmf/pdf f(x), the likelihood function is L(θ) ≡ ∏ᵢ₌₁ⁿ f(xᵢ). The maximum likelihood estimator (MLE) of θ is the value θ̂ that maximizes L(θ).
Remark: We can very informally regard the MLE as the “most likely”
estimate of θ.
Example: Suppose X₁, . . . , Xₙ ∼ iid Exp(λ). Find the MLE for λ.

The likelihood is L(λ) = ∏ᵢ₌₁ⁿ f(xᵢ) = λⁿ exp(−λ Σᵢ₌₁ⁿ xᵢ).

Now maximize L(λ) with respect to λ. Could take the derivative and plow through all of the horrible algebra. Too tedious. Need a trick. . . .

Useful Trick: Since the natural log function is one-to-one, it’s easy to see that the λ that maximizes L(λ) also maximizes ln(L(λ))!

ln(L(λ)) = ln(λⁿ exp(−λ Σᵢ₌₁ⁿ xᵢ)) = n ln(λ) − λ Σᵢ₌₁ⁿ xᵢ.

Then (d/dλ) ln(L(λ)) = n/λ − Σᵢ₌₁ⁿ xᵢ ≡ 0 gives λ̂ = 1/X̄.
Remarks:

λ̂ = 1/X̄ makes sense, since E[X] = 1/λ.

At the end, we put a little hat over λ to indicate that this is the MLE. It’s like a party hat!

At the end, we also make all of the little xᵢ’s into big Xᵢ’s to indicate that the MLE is a random variable.

Just to be careful, you “probably” ought to do a second-derivative test.
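In that spirit, here is a numerical sanity check (a sketch assuming numpy and scipy; the true λ = 2 and n = 100 are arbitrary illustrations): maximizing ln L(λ) by brute force should land on λ̂ = 1/X̄.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6739)
x = rng.exponential(scale=1 / 2.0, size=100)   # true lambda = 2

def neg_log_lik(lam):
    # -ln L(lambda) = -(n ln(lambda) - lambda * sum(x_i))
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")
print(res.x, 1 / x.mean())   # the two should agree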
Example: Suppose X₁, . . . , Xₙ ∼ iid Bern(p). Find the MLE for p.

Useful trick for this problem: Since

Xᵢ = 1 w.p. p, or 0 w.p. 1 − p,

we can write the pmf as f(x) = p^x (1 − p)^(1−x), x = 0, 1. Thus

L(p) = ∏ᵢ₌₁ⁿ p^(xᵢ) (1 − p)^(1−xᵢ) = p^(Σᵢ xᵢ) (1 − p)^(n − Σᵢ xᵢ),

so that ln(L(p)) = (Σᵢ xᵢ) ln(p) + (n − Σᵢ xᵢ) ln(1 − p).
⇒ (d/dp) ln(L(p)) = (Σᵢ xᵢ)/p − (n − Σᵢ xᵢ)/(1 − p) ≡ 0

⇒ (1 − p) Σᵢ₌₁ⁿ xᵢ = p (n − Σᵢ₌₁ⁿ xᵢ)

⇒ p̂ = X̄.

This makes sense since E[X] = p. □
Trickier MLE Examples
Example: X₁, . . . , Xₙ ∼ iid Nor(µ, σ²). Get simultaneous MLEs for µ and σ².

L(µ, σ²) = ∏ᵢ₌₁ⁿ f(xᵢ) = ∏ᵢ₌₁ⁿ (1/√(2πσ²)) exp{−(1/2)(xᵢ − µ)²/σ²}

         = (2πσ²)^(−n/2) exp{−(1/2) Σᵢ₌₁ⁿ (xᵢ − µ)²/σ²}

⇒ ln(L(µ, σ²)) = −(n/2) ln(2π) − (n/2) ln(σ²) − (1/(2σ²)) Σᵢ₌₁ⁿ (xᵢ − µ)²

⇒ (∂/∂µ) ln(L(µ, σ²)) = (1/σ²) Σᵢ₌₁ⁿ (xᵢ − µ) ≡ 0,

and so µ̂ = X̄.
Similarly, (∂/∂σ²) ln(L(µ, σ²)) = −n/(2σ²) + (1/(2σ⁴)) Σᵢ₌₁ⁿ (xᵢ − µ)² ≡ 0 yields the MLE σ̂² = (1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄)².

σ̂² is a little bit biased, but it has slightly less variance than S². Anyway, as n gets big, S² and σ̂² become essentially the same.
Example: Suppose X₁, . . . , Xₙ ∼ iid Gam(r, λ), where the pdf is

f(x) = (λʳ/Γ(r)) x^(r−1) e^(−λx), x > 0.

Find the MLEs for r and λ.

L(r, λ) = ∏ᵢ₌₁ⁿ f(xᵢ) = (λ^(nr)/[Γ(r)]ⁿ) (∏ᵢ₌₁ⁿ xᵢ)^(r−1) e^(−λ Σᵢ xᵢ)

⇒ ln(L) = rn ln(λ) − n ln(Γ(r)) + (r − 1) ln(∏ᵢ₌₁ⁿ xᵢ) − λ Σᵢ₌₁ⁿ xᵢ

⇒ (∂/∂λ) ln(L) = rn/λ − Σᵢ₌₁ⁿ xᵢ ≡ 0,

so that λ̂ = r̂/X̄.
The Trouble in River City is, we need to find r̂. To do so, we have

(∂/∂r) ln(L) = (∂/∂r) [rn ln(λ) − n ln(Γ(r)) + (r − 1) ln(∏ᵢ₌₁ⁿ xᵢ) − λ Σᵢ₌₁ⁿ xᵢ]

             = n ln(λ) − (n/Γ(r)) (d/dr) Γ(r) + ln(∏ᵢ₌₁ⁿ xᵢ)

             = n ln(λ) − n Ψ(r) + ln(∏ᵢ₌₁ⁿ xᵢ) ≡ 0,

where Ψ(r) ≡ Γ′(r)/Γ(r) is the digamma function.
This last equation must be solved for r̂ numerically. If your software doesn’t supply Ψ(r), you can approximate the derivative of Γ via

Γ′(r) ≐ (Γ(r + h) − Γ(r))/h (for any small h of your choosing). □
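A sketch (assuming numpy and scipy, which supplies Ψ as scipy.special.digamma) of that numerical solution: substitute λ = r/X̄ into the ∂/∂r equation and root-find for r̂. The true values r = 3, λ = 2 and n = 500 are arbitrary illustrations.

import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(6739)
x = rng.gamma(shape=3.0, scale=1 / 2.0, size=500)   # true r = 3, lambda = 2

xbar, mean_log = x.mean(), np.log(x).mean()

def score(r):
    # (1/n) d/dr ln L, with lambda = r/xbar substituted in
    return np.log(r / xbar) - digamma(r) + mean_log

r_hat = brentq(score, 1e-6, 100.0)   # root-find over a wide bracket
lam_hat = r_hat / xbar
print(r_hat, lam_hat)                # should land near (3, 2)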
Example: Suppose X₁, . . . , Xₙ ∼ iid Unif(0, θ). Find the MLE for θ.

The pdf is f(x) = 1/θ, 0 < x < θ (beware of the funny limits). Then

L(θ) = ∏ᵢ₌₁ⁿ f(xᵢ) = 1/θⁿ if 0 ≤ xᵢ ≤ θ, ∀i (and 0 otherwise).

Since 1/θⁿ is decreasing in θ, L(θ) is maximized by taking θ as small as the constraint θ ≥ xᵢ ∀i allows, i.e., θ̂ = max₁≤ᵢ≤ₙ Xᵢ. □
Invariance Property of MLEs
Invariance Property: If θ̂ is the MLE of θ, then for any reasonable function h(·), the MLE of h(θ) is h(θ̂).

Remark: We noted before that such a property does not hold for unbiasedness. For instance, although E[S²] = σ², it is usually the case that E[√S²] ≠ σ.
Example: Suppose X₁, . . . , Xₙ ∼ iid Nor(µ, σ²).

We saw that the MLE for σ² is σ̂² = (1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄)².

If we consider the function h(y) = +√y, then the Invariance Property says that the MLE of σ is

σ̂ = √(σ̂²) = √((1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄)²). □
Example: Suppose X₁, . . . , Xₙ ∼ iid Bern(p).

We saw that the MLE for p is p̂ = X̄. Then Invariance says that the MLE for Var(Xᵢ) = p(1 − p) is p̂(1 − p̂) = X̄(1 − X̄). □
Example: Suppose X₁, . . . , Xₙ ∼ iid Exp(λ), and consider the survival function F̄(x) ≡ 1 − F(x) = e^(−λx). Since the MLE of λ is λ̂ = 1/X̄, Invariance says that the MLE of F̄(x) is

e^(−λ̂x) = e^(−x/X̄).

This kind of thing is used all of the time in the actuarial sciences. □
Method of Moments Estimation
Method of Moments Estimation
Remark:
Pn As n → ∞, the Law of Large Numbers implies that
k k
i=1 Xi /n → E[X ], i.e., mk → µk (so this is a good estimator).
General Game Plan: Express the parameter of interest in terms of the true moments µₖ = E[Xᵏ]. Then substitute in the sample moments mₖ.

Examples:
Example: Suppose X₁, . . . , Xₙ ∼ iid Pois(λ). Since λ = E[X], the natural MoM estimator for λ is the first sample moment, X̄. □

Example: Suppose X₁, . . . , Xₙ ∼ iid Nor(µ, σ²).

MoM estimators for µ and σ² are X̄ and ((n−1)/n) S² (or S²), respectively. For this example, these estimators are the same as the MLEs. □
Example: Suppose X₁, . . . , Xₙ ∼ iid Beta(a, b). The pdf is

f(x) = (Γ(a + b)/(Γ(a)Γ(b))) x^(a−1) (1 − x)^(b−1), 0 < x < 1,

and it turns out that

E[X] = a/(a + b) and Var(X) = ab/((a + b)²(a + b + 1)).
We have

E[X] = a/(a + b) ⇒ a = b E[X]/(1 − E[X]) ≐ b X̄/(1 − X̄), (1)

so

Var(X) = ab/((a + b)²(a + b + 1)) = E[X] b/((a + b)(a + b + 1)).

Plug into the above X̄ for E[X], S² for Var(X), and bX̄/(1 − X̄) for a. Then after lots of algebra, we can solve for b:

b ≐ (1 − X̄)² X̄/S² − 1 + X̄.

To finish up, you can plug back into Equation (1) to get the MoM estimator for a.
Example (continued): Suppose we observe the following n = 10 observations from the Beta(a, b) distribution:

0.86 0.77 0.84 0.38 0.83 0.54 0.77 0.94 0.37 0.40

Then X̄ = 0.67 and S² ≐ 0.0514, so the MoM estimators work out to b ≐ 1.09 and a ≐ 2.21.
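A sketch (assuming numpy) that carries out these MoM calculations on the data above:

import numpy as np

x = np.array([0.86, 0.77, 0.84, 0.38, 0.83, 0.54, 0.77, 0.94, 0.37, 0.40])
xbar, s2 = x.mean(), x.var(ddof=1)           # 0.67 and about 0.0514

b = (1 - xbar) ** 2 * xbar / s2 - 1 + xbar   # ~ 1.09
a = b * xbar / (1 - xbar)                    # ~ 2.21, via Equation (1)
print(a, b)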
Sampling Distributions
Since statistics are RV’s, it’s useful to figure out their distributions.
The distribution of a statistic is called a sampling distribution.
Example: X₁, . . . , Xₙ ∼ iid Nor(µ, σ²) ⇒ X̄ ∼ Nor(µ, σ²/n).
χ² Distribution

Definition/Theorem: If Z₁, . . . , Zₖ ∼ iid Nor(0, 1), then Y ≡ Σᵢ₌₁ᵏ Zᵢ² has the chi-squared distribution with k degrees of freedom (df), and we write Y ∼ χ²(k).
For k > 2, the χ2 (k) pdf is skewed to the right. (You get an occasional
“large” observation.)
Example: If X₁, . . . , Xₙ ∼ iid Nor(µ, σ²), then, as we’ll show in the next module,

S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)² ∼ (σ²/(n−1)) χ²(n−1). □
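A simulation sketch (assuming numpy and scipy; n = 8 and σ = 3 are arbitrary) comparing empirical quantiles of (n − 1)S²/σ² with χ²(n − 1) quantiles:

import numpy as np
from scipy import stats

rng = np.random.default_rng(6739)
n, sigma, reps = 8, 3.0, 50_000

x = rng.normal(loc=5.0, scale=sigma, size=(reps, n))
stat = (n - 1) * x.var(axis=1, ddof=1) / sigma ** 2

for q in (0.25, 0.50, 0.75, 0.95):   # empirical vs. chi-squared(n-1) quantiles
    print(q, np.quantile(stat, q), stats.chi2.ppf(q, df=n - 1))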
t Distribution

Definition/Theorem: Suppose that Z ∼ Nor(0, 1), Y ∼ χ²(k), and Z and Y are independent. Then T ≡ Z/√(Y/k) has the Student t distribution with k degrees of freedom, and we write T ∼ t(k).

The pdf is

f_T(x) = (Γ((k+1)/2)/(√(πk) Γ(k/2))) (x²/k + 1)^(−(k+1)/2), x ∈ ℝ.
Fun Facts: The t(k) looks like the Nor(0,1), except the t has fatter tails.
The k = 1 case gives the Cauchy distribution, which has really fat tails.
It’s used when we find confidence intervals and conduct hypothesis tests for
the mean µ. Stay tuned.
“Student” is the pseudonym of the guy (William Gossett) who first derived it.
Gossett was a statistician at the Guinness Brewery.
F Distribution
Definition/Theorem: Suppose that X ∼ χ²(n), Y ∼ χ²(m), and X and Y are independent. Then F ≡ (X/n)/(Y/m) has the F distribution with n and m degrees of freedom, and we write F ∼ F(n, m). The pdf is

f_F(x) = (Γ((n+m)/2) (n/m)^(n/2) x^(n/2 − 1)) / (Γ(n/2) Γ(m/2) ((n/m)x + 1)^((n+m)/2)), x > 0.
Tables of the upper-α critical values F_{α,n,m} can be found in the back of the book for various α, n, m, or you can use the Excel function F.INV(1 − α, n, m).

Remarks: It can be shown that F_{1−α,m,n} = 1/F_{α,n,m}. Use this fact if you have to find something like F_{0.95,10,5} = 1/F_{0.05,5,10} = 1/3.326.
It’s used when we find confidence intervals and conduct hypothesis tests for
the ratio of variances from two different processes. Details later.
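For readers using Python instead of Excel, scipy’s f.ppf plays the role of F.INV (a sketch; note that ppf takes the left-tail probability, so the upper-α point F_{α,n,m} is f.ppf(1 − α, n, m)):

from scipy.stats import f

upper = f.ppf(0.95, 5, 10)   # F_{0.05,5,10} ~ 3.326 (upper 5% point)
lower = f.ppf(0.05, 10, 5)   # F_{0.95,10,5}
print(lower, 1 / upper)      # the two should agree (~ 0.3007)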