Probability and Statistics
Probability
or
\[ L(\theta) = \prod_{i=1}^{n} f_X(x_i, \theta) \quad \text{for the continuous case} \]
ln properties
ln(AB) = ln(A) + ln(B)
ln(A/B) = ln(A) − ln(B)
ln(A^B) = B · ln(A)
ln(e) = 1
ln(1) = 0
\[ L(\theta) = \prod_{i=1}^{n} \frac{y_i}{\theta^2} e^{-y_i/\theta} \]
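\[ \ln(L(\theta)) = l(\theta) = \sum_{i=1}^{n} \ln\left( \frac{y_i}{\theta^2} e^{-y_i/\theta} \right) = \sum_{i=1}^{n} \left( \ln y_i - \frac{y_i}{\theta} - 2\ln\theta \right) \]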
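\[ \left. \frac{dl(\theta)}{d\theta} \right|_{\theta=\hat{\theta}} = 0 = \sum_{i=1}^{n} \left( 0 + \frac{y_i}{\hat{\theta}^2} - \frac{2}{\hat{\theta}} \right) = \frac{1}{\hat{\theta}^2} \sum_{i=1}^{n} y_i - \frac{2n}{\hat{\theta}} \]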
\[ \sum_{i=1}^{n} y_i = 2n\hat{\theta} \;\Rightarrow\; \hat{\theta} = \frac{\sum_{i=1}^{n} y_i}{2n} = \frac{9.2 + 5.6 + 18.4 + 12.1 + 10.7}{2(5)} = 5.6. \]
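As a quick numerical cross-check of the closed-form estimate, the sketch below evaluates the log-likelihood l(θ) = Σ (ln y_i − y_i/θ − 2 ln θ) on a grid and compares the grid maximizer with Σ y_i / (2n). The grid range (0, 20] and the step 0.001 are arbitrary choices made for this illustration, not part of the original example.

```python
import math

# Sample from the worked example above.
y = [9.2, 5.6, 18.4, 12.1, 10.7]

def log_likelihood(theta, data):
    """l(theta) = sum(ln y_i - y_i/theta - 2 ln theta)."""
    return sum(math.log(v) - v / theta - 2.0 * math.log(theta) for v in data)

# Closed-form estimate obtained by setting dl/dtheta = 0.
theta_hat = sum(y) / (2 * len(y))

# Crude grid search over (0, 20] with step 0.001 as an independent check.
# The range and step are assumptions made only for this sketch.
grid = [k / 1000 for k in range(1, 20001)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, y))

print(round(theta_hat, 3), round(theta_grid, 3))
```

Both numbers should agree to the resolution of the grid, which supports the calculus-based answer.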
\[ L(\lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda} \lambda^{k_i}}{k_i!} \]
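\[ \ln(L(\lambda)) = l(\lambda) = \sum_{i=1}^{n} \ln\left( \frac{e^{-\lambda} \lambda^{k_i}}{k_i!} \right) = \sum_{i=1}^{n} \left( -\lambda + k_i \ln(\lambda) - \ln(k_i!) \right) \]
\[ = -n\lambda + \ln(\lambda) \sum_{i=1}^{n} k_i - \sum_{i=1}^{n} \ln(k_i!) \]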
\[ \frac{dl(\lambda)}{d\lambda} = -n + \frac{\sum_{i=1}^{n} k_i}{\lambda} = 0 \]
\[ \Rightarrow\; \hat{\lambda} = \frac{\sum_{i=1}^{n} k_i}{n} = \bar{k}. \]
Note: n = 4.
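The Poisson result can be illustrated with a short sketch. The counts below are hypothetical values chosen only so that n = 4; they are not the data from the original exercise. The code evaluates l(λ) = −nλ + ln(λ) Σ k_i − Σ ln(k_i!) and checks that the sample mean does at least as well as nearby values of λ.

```python
import math

# Hypothetical counts for illustration only (n = 4, matching the note above);
# they are NOT the data from the original exercise.
k = [2, 0, 3, 1]

def log_likelihood(lam, counts):
    """l(lambda) = -n*lambda + ln(lambda) * sum(k_i) - sum(ln(k_i!))."""
    n = len(counts)
    # math.lgamma(c + 1) equals ln(c!) for a non-negative integer c.
    return (-n * lam + math.log(lam) * sum(counts)
            - sum(math.lgamma(c + 1) for c in counts))

lam_hat = sum(k) / len(k)  # closed-form MLE: the sample mean

# The log-likelihood at the sample mean is at least as large as at nearby values.
for lam in (lam_hat - 0.1, lam_hat + 0.1):
    assert log_likelihood(lam_hat, k) >= log_likelihood(lam, k)

print(lam_hat)  # 1.5
```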
\[ L(\theta) = \prod_{i=1}^{n} \theta e^{-\theta(x_i + 1)} \]
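\[ \ln(L(\theta)) = \sum_{i=1}^{n} \ln\left( \theta e^{-\theta(x_i + 1)} \right) = \sum_{i=1}^{n} \ln\left( e^{-\theta(x_i + 1)} \right) + \sum_{i=1}^{n} \ln(\theta) \]
\[ = n \ln(\theta) - \theta \sum_{i=1}^{n} (x_i + 1) \]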
\[ \frac{d \ln(L(\theta))}{d\theta} = n \cdot \frac{1}{\theta} - \sum_{i=1}^{n} (x_i + 1) = 0 \]
\[ \Rightarrow\; \hat{\theta} = \frac{n}{\sum_{i=1}^{n} (x_i + 1)} \]
For a parameter θ in the pdf of a random variable, we have an
estimate θ̂ based on a sample. We want to find an interval
[θ̂ − d, θ̂ + d] with confidence level 100(1 − α)%.
Normal Distributions
Suppose X ∼ Normal(µ, σ²) with known σ and unknown µ. The
maximum likelihood estimator for µ based on a sample
X1 = x1 , X2 = x2 , . . . , Xn = xn is
\[ \hat{\mu} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \bar{x} \]
By CLT, we know that the sample mean satisfies X̄ ∼ Normal(µ, σ²/n). Then,
\[ Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim \text{Normal}(0, 1) \]
[Figure: density of Z (standard normal), with the central 95% of the area lying between −b = −z_{α/2} and b = z_{α/2}.]
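The step from the standardized variable Z to an interval for µ can be written out as follows. This is the standard rearrangement, shown here as a sketch; the symbol d for the half-width follows the notation [θ̂ − d, θ̂ + d] used earlier.

```latex
\[
P\left( -z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2} \right) = 1 - \alpha
\quad\Longrightarrow\quad
\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}
\]
\[
d = z_{\alpha/2}\frac{\sigma}{\sqrt{n}},
\qquad
\text{width } 2d \le w \iff n \ge \left( \frac{2\, z_{\alpha/2}\, \sigma}{w} \right)^{2}
\]
```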
Example 1
Suppose X ∼ Normal(µ, σ²) with known σ = 2. Suppose a sample of
size 6 is 10.1, 15, 11.7, 14.2, 10, 11, with sample mean x̄ = 12.
1 Find the 95% confidence interval for µ.
2 In order for x̄ to have a 95% confidence interval of width at most
3, how large does the sample size have to be?
3 Find the 99% confidence interval for µ.
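One way to work Example 1 numerically is sketched below, assuming the interval x̄ ± z_{α/2} σ/√n for known σ. The helper names `z_interval` and `min_n_for_width` are made up for this illustration, and `statistics.NormalDist().inv_cdf` plays the role of the calculator's invNorm.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf):
    """Two-sided interval xbar +/- z_{alpha/2} * sigma / sqrt(n) for known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    d = z * sigma / sqrt(n)
    return xbar - d, xbar + d

def min_n_for_width(sigma, width, conf):
    """Smallest n with 2 * z_{alpha/2} * sigma / sqrt(n) <= width."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((2 * z * sigma / width) ** 2)

sample = [10.1, 15, 11.7, 14.2, 10, 11]
xbar = sum(sample) / len(sample)

lo95, hi95 = z_interval(xbar, sigma=2, n=len(sample), conf=0.95)  # part 1
n_min = min_n_for_width(sigma=2, width=3, conf=0.95)              # part 2
lo99, hi99 = z_interval(xbar, sigma=2, n=len(sample), conf=0.99)  # part 3

print(round(lo95, 2), round(hi95, 2))
print(n_min)
print(round(lo99, 2), round(hi99, 2))
```

Under these assumptions the 95% interval is roughly [10.40, 13.60], the required sample size is 7, and the 99% interval is roughly [9.90, 14.10].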
Example 2
An institute wants to estimate the household income in a country.
The incomes are normally distributed with standard deviation
$26,000. The institute takes a random survey of 2416 households.
The average household income in the survey is $56,000.
1 Find a 95% confidence interval for the average household
income in the country.
2 How large does the sample size have to be to guarantee that the
length of the 95% confidence interval for µ will be less than
$1000?
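Example 2 can be checked the same way. The sketch below assumes the known-σ interval x̄ ± z_{α/2} σ/√n; because the question asks for a length strictly less than $1000, it takes the first integer above (2 z_{α/2} σ / 1000)². Variable names are chosen for this illustration.

```python
from math import floor, sqrt
from statistics import NormalDist

sigma, n, xbar = 26_000, 2416, 56_000
z = NormalDist().inv_cdf(0.975)  # z_{alpha/2} for 95% confidence, about 1.96

# Part 1: 95% interval xbar +/- z * sigma / sqrt(n).
d = z * sigma / sqrt(n)
interval = (xbar - d, xbar + d)

# Part 2: the total length 2 * z * sigma / sqrt(n) must be strictly below 1000,
# so n must exceed (2 * z * sigma / 1000)^2.
n_required = floor((2 * z * sigma / 1000) ** 2) + 1

print([round(v) for v in interval], n_required)
```

With these inputs the half-width is about $1037, giving an interval of roughly [$54,963, $57,037], and the sample size needed is 10388 households.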
100(1 − α)% confidence interval for a population proportion p
\[ \hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \;\le\; p \;\le\; \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
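Theorem
In order for p̂ to have a 100(1 − α)% confidence interval of width at
most 2d, the sample size should be no smaller than
\[
n =
\begin{cases}
\dfrac{z_{\alpha/2}^{2}\, \hat{p}(1 - \hat{p})}{d^{2}} & \text{if } \hat{p} \text{ is known} \\[2ex]
\dfrac{z_{\alpha/2}^{2}}{4d^{2}} & \text{if } \hat{p} \text{ is unknown}
\end{cases}
\]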
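The two cases of the theorem are connected by a short argument, sketched here: solving the half-width equation for n gives the first case, and bounding p̂(1 − p̂) by its maximum value 1/4 gives the second.

```latex
\[
d = z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}
\quad\Longrightarrow\quad
n = \frac{z_{\alpha/2}^{2}\, \hat{p}(1 - \hat{p})}{d^{2}}
\]
\[
\hat{p}(1 - \hat{p}) = \frac{1}{4} - \left( \hat{p} - \frac{1}{2} \right)^{2} \le \frac{1}{4}
\quad\Longrightarrow\quad
\frac{z_{\alpha/2}^{2}\, \hat{p}(1 - \hat{p})}{d^{2}} \le \frac{z_{\alpha/2}^{2}}{4d^{2}}
\]
```

So the sample size from the second case is large enough whatever the value of p̂ turns out to be.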
Example 3
In the example of flipping a coin 10 times and observing 6 heads, n = 10 and
p̂ = 6/10.
1 Find the 90% confidence interval for p.
2 Find the margin of error associated with p̂.
3 In order for p̂ to have a 90% confidence interval of width at most
0.2, how large does the sample size have to be?
Solution
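1 The 90% confidence interval for p is calculated by
\[ \hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \;\le\; p \;\le\; \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
Here, z_{α/2} = invNorm(0.95, 0, 1) ≈ 1.64, so
\[ 0.6 - 1.64 \sqrt{0.6(0.4)/10} \;\le\; p \;\le\; 0.6 + 1.64 \sqrt{0.6(0.4)/10} \]
which is [0.346, 0.854].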
2 The margin of error associated with p̂ is
\[ d = \frac{z_{\alpha/2}}{2\sqrt{n}} = \frac{1.64}{2\sqrt{10}} \approx 0.26 \]
3 With width at most 0.2, d = 0.1 and
\[ n = \frac{z_{\alpha/2}^{2}}{4d^{2}} = \frac{1.64^{2}}{4(0.1)^{2}} \approx 67.2 \]
The sample size should be no smaller than 68.
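The three parts of this solution can be reproduced with a short sketch. The function names are invented for this illustration, and `statistics.NormalDist().inv_cdf` stands in for invNorm. Because the code uses the unrounded z_{α/2} ≈ 1.645 rather than 1.64, the interval endpoints can differ from the values above in the third decimal place.

```python
from math import ceil, sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, conf):
    """Interval p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    d = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - d, p_hat + d

def conservative_margin(n, conf):
    """Bound z_{alpha/2} / (2 * sqrt(n)) on the margin, using p(1 - p) <= 1/4."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z / (2 * sqrt(n))

def conservative_sample_size(d, conf):
    """Smallest integer n with n >= z_{alpha/2}^2 / (4 * d^2)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z ** 2 / (4 * d ** 2))

lo, hi = proportion_interval(0.6, 10, 0.90)  # part 1
margin = conservative_margin(10, 0.90)       # part 2
n_min = conservative_sample_size(0.1, 0.90)  # part 3

print(round(lo, 3), round(hi, 3))
print(round(margin, 2))
print(n_min)
```

The same three functions apply to any other proportion problem by changing p̂, n, d, and the confidence level.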
Example 4
A poll was conducted to find out the percentage of people who will
vote A or B for mayor of a city. Out of 500 people polled, 263 said A
and the rest said B. Let p be the proportion of voters for A.
1 Find the MLE for p.
2 Find the 95% confidence interval for p.
3 Find the margin of error at the 95% confidence level for p.
4 Find the minimal number of people to be polled for a margin of error
of at most 2.6%.
Solution
1 The MLE for p is p̂ = 263/500 = 0.526 = 52.6%.
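2 The 95% confidence interval for p is calculated by
\[ \hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \;\le\; p \;\le\; \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
Here, z_{α/2} = invNorm(0.975, 0, 1) ≈ 1.96, so
\[ 0.526 - 1.96 \sqrt{\frac{0.526(0.474)}{500}} \;\le\; p \;\le\; 0.526 + 1.96 \sqrt{\frac{0.526(0.474)}{500}} \]
which is [0.482, 0.570].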
3 The margin of error associated with p̂ is
\[ d = \frac{z_{\alpha/2}}{2\sqrt{n}} = \frac{1.96}{2\sqrt{500}} \approx 0.04 \]
4 With d = 0.026,
\[ n = \frac{z_{\alpha/2}^{2}}{4d^{2}} = \frac{1.96^{2}}{4(0.026)^{2}} \approx 1420.7. \]
The sample size should be no smaller than 1421.
Section 9.4: Interval Estimate