Chrysafis Vogiatzis
Lecture 19
Learning objectives
We flip a coin 10 times and we get 6 Heads and 4 Tails. Do you believe it is a fair coin? What do the method of moments and the maximum likelihood estimation method say about this situation?
Quick review
$$L(\theta) = f(X_1, \theta) \cdot f(X_2, \theta) \cdot \ldots \cdot f(X_n, \theta)$$
$$\ln(L(\theta)) = \ln(f(X_1, \theta)) + \ln(f(X_2, \theta)) + \ldots + \ln(f(X_n, \theta))$$
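As a quick numerical illustration (a sketch that is not part of the original notes), the following Python snippet evaluates the likelihood and the log-likelihood of a Bernoulli sample; the hard-coded sample of 6 Heads and 4 Tails and the function names are placeholders.

```python
import math

# Sketch: likelihood and log-likelihood of an i.i.d. Bernoulli(p) sample.
# The product of the individual pmfs becomes a sum once we take logarithms.
def likelihood(sample, p):
    L = 1.0
    for x in sample:
        L *= p**x * (1 - p)**(1 - x)      # f(x, p) for a single observation
    return L

def log_likelihood(sample, p):
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in sample)

sample = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]   # hypothetical: 6 Heads, 4 Tails
print(likelihood(sample, 0.5))            # 0.0009765625 = 0.5**10
print(log_likelihood(sample, 0.5))        # about -6.93 = ln(0.5**10)
```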
Suppose I carry three coins with me:
1. One that has Tails on both sides (it can never land Heads).
2. One that has Heads on both sides (it always lands Heads).
3. One that is fair and has a side of Heads and a side of Tails.
Assume I randomly pick one coin and start flipping it. I report
to you the number of tries (n) and the number of Heads (x). For
example, I may tell you n = 8, x = 5 or n = 2, x = 0, and so on.
Flipping the coin: first take
I let you know that I flipped the coin three times and got Heads twice: n = 3, x = 2. What are the method of moments and the maximum likelihood estimators for p?
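For this question, both approaches give the sample proportion. Here is a small hedged check in Python; the grid search below is only an illustration, not the notes' derivation.

```python
# Both the method of moments and MLE give p-hat = x / n for a binomial sample.
n, x = 3, 2
print(x / n)                               # 0.666...

# A brute-force check: maximize L(p) = p^x (1 - p)^(n - x) over a fine grid.
grid = [i / 1000 for i in range(1001)]
p_mle = max(grid, key=lambda p: p**x * (1 - p)**(n - x))
print(p_mle)                               # 0.667
```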
But... I carry three coins with me. Shouldn’t I use this information somehow?
Bayesian estimation
We separate the discussion into discrete sets for the values the parameter can take (like in the previous example, where I carried 3 distinct coins with me) and continuous sets, where the parameter can be any real number in a range of values.
• $P(p = 0) = \frac{0}{0 + 0.125 + 0.1875} = 0$.
• $P(p = 1) = \frac{0.125}{0 + 0.125 + 0.1875} = 0.4$.
• $P(p = 0.5) = \frac{0.1875}{0 + 0.125 + 0.1875} = 0.6$.
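A minimal Python sketch of the discrete Bayesian update used here; the candidate coins, the uniform prior, and the data (n = 2 flips, x = 2 Heads) below are hypothetical placeholders for illustration, not the exact numbers behind the computation above.

```python
from math import comb

# Sketch: posterior over a discrete set of candidate values of p.
# numerator_i = prior_i * P(x Heads in n flips | p_i); then normalize.
def discrete_posterior(candidates, priors, n, x):
    numerators = [prior * comb(n, x) * p**x * (1 - p)**(n - x)
                  for p, prior in zip(candidates, priors)]
    total = sum(numerators)
    return [num / total for num in numerators]

# Hypothetical data: two flips, both Heads, with a uniform prior on the coins.
print(discrete_posterior([0.0, 1.0, 0.5], [1/3, 1/3, 1/3], n=2, x=2))
# [0.0, 0.8, 0.2]
```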
From the results, it seems that the vehicle that first passed today is most likely a truck!
• personal car: $\frac{6.24 \cdot 10^{-17}}{6.24 \cdot 10^{-17} + 1.43 \cdot 10^{-7} + 1.81 \cdot 10^{-7} + 8.18 \cdot 10^{-8}} = 1.54 \cdot 10^{-10} \approx 0$.
• motorcycle: 0.3527.
• truck: 0.4458.
• bike: 0.2015.
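As a hedged cross-check (not part of the notes), normalizing the four rounded numerators reproduces these probabilities up to rounding:

```python
# Sketch: divide each numerator by their sum; the small discrepancies with the
# values above come from the numerators already being rounded.
numerators = {"personal car": 6.24e-17, "motorcycle": 1.43e-7,
              "truck": 1.81e-7, "bike": 8.18e-8}
total = sum(numerators.values())
for vehicle, weight in numerators.items():
    print(vehicle, round(weight / total, 4))
# personal car 0.0, motorcycle 0.3524, truck 0.446, bike 0.2016
```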
Figure 1: The distribution of the probability of getting Heads in the continuous version of the problem. We see that p = 0.5 is the most likely value, but values as low as 0.1 or 0.2, or as high as 0.8 or 0.9, are also possible, albeit with very small likelihood.
Now that we know this, say we tossed a coin 10 times and got Heads 10 straight times! Recall that both the method of moments and the maximum likelihood estimation method would simply estimate that the coin has p = 1 and proceed.
Getting 10 Heads in 10 tosses would be highly improbable for a coin that is “50-50”, but it could mean that I have a coin biased towards Heads. So, what should our estimate be?
First, calculate the likelihood function, the way we did during the maximum likelihood estimation calculations. In this case, it would be $L(p) = p^{10}$. Let’s plot that (see Figure 2).
Figure 2: The likelihood function of getting 10 Heads after tossing a coin 10 times. It is
maximized at p = 1, which would then be our maximum likelihood estimator.
Figure 3: The posterior distribution, found by multiplying f(θ) (the pdf of the normal distribution N(0.5, 0.01)) with the likelihood function L(θ). The maximizer here is the Bayesian estimator and is found at p̂ = 0.6531.
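As a numerical sanity check of the caption above (a sketch, not part of the notes), a simple grid search over the product of the N(0.5, 0.01) pdf and the likelihood $L(p) = p^{10}$ recovers the same maximizer:

```python
import math

# Sketch: maximize prior(p) * likelihood(p) on a fine grid over (0, 1).
# The prior is N(0.5, 0.01), i.e., mean 0.5 and variance 0.01 (std. dev. 0.1).
def prior(p):
    return math.exp(-(p - 0.5)**2 / (2 * 0.01)) / math.sqrt(2 * math.pi * 0.01)

def posterior(p):
    return prior(p) * p**10          # likelihood L(p) = p^10

grid = [i / 100000 for i in range(1, 100000)]
print(max(grid, key=posterior))      # about 0.6531
```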
And, yes! This sums it up. Let us view one example from beginning
to end using the method.
Mortality risk
Prior distribution:
$$f(p) = \frac{1}{1.5} e^{-\frac{1}{1.5} p}.$$
Likelihood function:
$$L(p) = p^{25} \cdot (1 - p)^{125}.$$
Posterior distribution:
$$f(p) \cdot L(p) = \frac{1}{1.5} e^{-\frac{1}{1.5} p} \cdot p^{25} \cdot (1 - p)^{125}.$$
$$\frac{\partial f(p) \cdot L(p)}{\partial p} = \frac{4}{9} e^{-\frac{2}{3} p} (p - 1)^{124} \, p^{24} \left((p - 226)\, p + 37.5\right) = 0 \implies$$
$$\implies (p - 226)\, p + 37.5 = 0 \implies p = 0.16605.$$
(The quadratic’s other root, p ≈ 225.83, falls outside [0, 1], so we discard it.)
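The same maximizer can be checked numerically. The Python sketch below simply evaluates the posterior on a fine grid (an illustration, not the notes' derivation):

```python
import math

# Sketch: maximize f(p) * L(p) = (1/1.5) exp(-p/1.5) * p^25 * (1 - p)^125
# on a fine grid over (0, 1).
def posterior(p):
    return (1 / 1.5) * math.exp(-p / 1.5) * p**25 * (1 - p)**125

grid = [i / 100000 for i in range(1, 100000)]
print(max(grid, key=posterior))      # about 0.16605
```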
If we want to, we can see the same result visually. First, plot our
prior beliefs/distribution:
$$\frac{\partial f(\theta) L(\theta)}{\partial \theta} = 0 \implies 0.0447628 \cdot 0.58441^{\theta} \, (1 + \theta)^3 \left(17.4783 + \theta(-11.3083 + \theta)\right) = 0.$$
We get three possible solutions: θ = −1, θ = 1.85, or θ = 9.46. We note that the last one cannot happen, as θ is between −2 and 2.
Between the two remaining possible solutions, we compare their posterior distribution values:
• $f(-1) \cdot L(-1) = \frac{1}{12}\,(3 - (-1)) \cdot ((-1) + 1)^4 \cdot 0.5844096^{-1} = 0$.
• $f(1.85) \cdot L(1.85) = \frac{1}{12}\,(3 - 1.85) \cdot (1.85 + 1)^4 \cdot 0.5844096^{1.85} = 2.34$.
Since the posterior is larger at θ = 1.85, the Bayesian estimator is θ̂ = 1.85.
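For a numerical cross-check, here is a Python sketch; the prior and likelihood are read off from the two computations above (f(θ) = (3 − θ)/12 and L(θ) = (θ + 1)^4 · 0.5844096^θ), so treat them as inferred from the excerpt rather than quoted definitions.

```python
# Sketch: evaluate the posterior at the two candidate solutions and search the
# allowed range [-2, 2] for its maximizer.
def posterior(theta):
    return (3 - theta) / 12 * (theta + 1)**4 * 0.5844096**theta

for theta in (-1, 1.85):
    print(theta, posterior(theta))   # 0.0 and roughly 2.34

grid = [-2 + i / 10000 for i in range(40001)]
print(max(grid, key=posterior))      # about 1.85
```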