Lecture 1: Intro to Bayes
Rebecca C. Steorts
Bayesian Methods and Modern Statistics: STA 360/601
Lecture 1
- Course webpage
- Syllabus
- LaTeX reference manual
- R Markdown reference manual
- Please come to office hours for all questions.
- Office hours are not a review period if you cannot come to class.
- Join the Google group.
- Graded on labs/HWs and exams.
- Labs/HWs and exams must be submitted in R Markdown format (it must compile).
- Nothing late will be accepted.
- Your lowest homework will be dropped.
- Announcements: by email or in class.
- All your lab/homework assignments will be uploaded to Sakai.
- How to reach me and the TAs: email or the Google group.
Expectations
Expectations for Homework
Things available to you!
- Why should we learn about Bayesian concepts?
- They are natural if we think of unknown parameters as random.
- They give a full distribution for the parameter when we perform an update.
- We automatically get uncertainty quantification.
- Drawbacks: they can be slow and inconsistent.
Record linkage
These are clearly not the same Steve Fienberg!
Syrian Civil War
Bayesian Model
Y_{j′ℓ} ∼ G_ℓ
z_{ijℓ} | β_{iℓ} ∼ Bernoulli(β_{iℓ})
β_{iℓ} ∼ Beta(a, b)
λ_{ij} ∼ DiscreteUniform(1, . . . , N_max), where N_max = ∑_{i=1}^{k} n_i
The model I showed you is very complicated.
This course will give you an intro to Bayesian models and methods.
Often Bayesian models are hard to work with, so we’ll learn about
approximations.
- “Bayesian” traces its origin to the 18th-century English Reverend Thomas Bayes, who, along with Pierre-Simon Laplace, discovered what we now call “Bayes’ Theorem”:

  p(θ|x) = p(x|θ) p(θ) / p(x) ∝ p(x|θ) p(θ).    (1)

  p(θ|x)  posterior
  p(x|θ)  likelihood
  p(θ)    prior
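To make the pieces of (1) concrete, here is a minimal sketch in R (not from the slides): it places a grid over θ, multiplies a likelihood by a prior, and normalizes to obtain the posterior. The flat prior and the Binomial(6, θ) likelihood with x = 5 are assumptions chosen purely for illustration.

# Minimal sketch: posterior is proportional to likelihood times prior.
# The flat prior and Binomial(6, theta) likelihood with x = 5 are illustrative assumptions.
theta <- seq(0, 1, length.out = 101)              # grid of parameter values
prior <- rep(1, length(theta))                    # flat prior p(theta)
likelihood <- dbinom(5, size = 6, prob = theta)   # p(x | theta) for x = 5 heads in 6 flips
unnormalized <- likelihood * prior                # p(x | theta) p(theta)
posterior <- unnormalized / sum(unnormalized)     # normalize over the grid
theta[which.max(posterior)]                       # grid point closest to the posterior mode (about 5/6)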
Polling Example 2012
[Figure: density of the prior for θ, plotted over θ ∈ (0, 1)]
[Figure: densities of the prior and the likelihood for θ]
[Figure: densities of the prior, the likelihood, and the posterior for θ]
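Figures like the three above can be produced with a few lines of R. The sketch below assumes a Beta prior and a binomial likelihood; the particular values of a, b, x, and n are made up for illustration, since the slides do not state the actual polling numbers.

# Illustrative only: Beta prior, binomial likelihood, Beta posterior.
# The numbers (a, b, x, n) are assumptions, not the values used on the slides.
a <- 2; b <- 2            # prior: theta ~ Beta(a, b)
x <- 55; n <- 100         # data: x "successes" out of n polled
theta <- seq(0, 1, length.out = 500)

prior      <- dbeta(theta, a, b)
likelihood <- dbeta(theta, x + 1, n - x + 1)   # binomial likelihood rescaled to integrate to 1
posterior  <- dbeta(theta, a + x, b + n - x)   # conjugate update (derived in the Beta-Binomial slide)

plot(theta, posterior, type = "l", xlab = expression(theta), ylab = "Density")
lines(theta, likelihood, lty = 2)
lines(theta, prior, lty = 3)
legend("topleft", legend = c("Posterior", "Likelihood", "Prior"), lty = 1:3)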
The basic philosophical difference between the frequentist and Bayesian paradigms is that
- Bayesians treat an unknown parameter θ as random.
- Frequentists treat θ as unknown but fixed.
Stopping Rule
Let θ be the probability of a particular coin landing on heads, and suppose we want to test the hypotheses H0: θ = 1/2 versus H1: θ > 1/2.
- Suppose the experiment is “Flip six times and record the results.”
- X counts the number of heads, and X ∼ Binomial(6, θ).
- The observed data was x = 5, and the p-value of our hypothesis test is

  p-value = P_{θ=1/2}(X ≥ 5)
          = P_{θ=1/2}(X = 5) + P_{θ=1/2}(X = 6)
          = 6/64 + 1/64 = 7/64 = 0.109375 > 0.05.

  So we fail to reject H0 at α = 0.05.
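The same tail probability can be checked in R; this is just a numerical restatement of the calculation above.

# P(X >= 5) when X ~ Binomial(6, 1/2): the p-value for the "flip six times" experiment
dbinom(5, size = 6, prob = 0.5) + dbinom(6, size = 6, prob = 0.5)   # 7/64 = 0.109375
pbinom(4, size = 6, prob = 0.5, lower.tail = FALSE)                 # same value, via the upper tail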
- Suppose now the experiment is “Flip until we get tails.”
- X counts the number of the flip on which the first tails occurs, and X ∼ Geometric(1 − θ).
- The observed data was x = 6, and the p-value of our hypothesis test is

  p-value = P_{θ=1/2}(X ≥ 6)
          = 1 − P_{θ=1/2}(X < 6)
          = 1 − ∑_{x=1}^{5} P_{θ=1/2}(X = x)
          = 1 − (1/2 + 1/4 + 1/8 + 1/16 + 1/32) = 1/32 = 0.03125 < 0.05.

  So we reject H0 at α = 0.05.
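Again this can be checked in R. One caveat: R's geometric functions count the number of failures before the first success, so the event X ≥ 6 (first tails on flip six or later) corresponds to at least five failures.

# P(X >= 6) when X ~ Geometric(1 - theta), theta = 1/2, support 1, 2, ...
# R's dgeom/pgeom count failures before the first success, so X = x means x - 1 failures.
1 - sum(dgeom(0:4, prob = 0.5))             # 1 - P(X <= 5) = 1/32 = 0.03125
pgeom(4, prob = 0.5, lower.tail = FALSE)    # same value: P(at least 5 failures)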
- The conclusions differ, which strikes some people as absurd.
- The p-values aren't even close: one is 3.5 times as large as the other.
- The result of our hypothesis test depends on whether we would have stopped flipping if we had gotten a tails sooner.
- The tests depend on what we call the stopping rule.
- The likelihood for the actual value of x that was observed is the same for both experiments (up to a constant):

  p(x|θ) ∝ θ^5 (1 − θ).
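A small sketch verifying this in R: the binomial and geometric probabilities of the observed data differ only by the constant factor choose(6, 5) = 6, so they carry the same information about θ.

theta <- seq(0.05, 0.95, by = 0.05)
binom_lik <- dbinom(5, size = 6, prob = theta)   # "flip six times", observe 5 heads
geom_lik  <- dgeom(5, prob = 1 - theta)          # "flip until tails", first tails on flip 6
unique(round(binom_lik / geom_lik, 10))          # constant ratio of 6 = choose(6, 5)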
Hierarchical Bayesian Models
X | θ ∼ f(x|θ)
Θ | γ ∼ π(θ|γ)
Γ ∼ φ(γ).
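To make the three levels concrete, here is a small simulation sketch in R. The specific choices of f, π, and φ below (Binomial, Beta, and Exponential) are assumptions made purely for illustration; the slide leaves them generic.

# Illustrative hierarchy (distributional choices are assumptions, not from the slides):
#   gamma ~ Exponential(1)               hyperprior  phi(gamma)
#   theta | gamma ~ Beta(gamma, gamma)   prior       pi(theta | gamma)
#   X | theta ~ Binomial(10, theta)      likelihood  f(x | theta)
set.seed(360)
gamma_draw <- rexp(1, rate = 1)
theta_draw <- rbeta(1, shape1 = gamma_draw, shape2 = gamma_draw)
x_draw     <- rbinom(1, size = 10, prob = theta_draw)
c(gamma = gamma_draw, theta = theta_draw, x = x_draw)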
Conjugate Distributions
A prior p(θ) is called conjugate for a likelihood p(x|θ) when the resulting posterior p(θ|x) is in the same family of distributions as the prior, as the Beta-Binomial example below illustrates.
Beta-Binomial
Suppose x | θ ∼ Binomial(n, θ) and θ ∼ Beta(a, b). Then

π(θ|x) ∝ p(x|θ) p(θ)
       ∝ [ (n choose x) θ^x (1 − θ)^{n−x} ] × [ Γ(a + b)/(Γ(a)Γ(b)) θ^{a−1} (1 − θ)^{b−1} ]
       ∝ θ^x (1 − θ)^{n−x} θ^{a−1} (1 − θ)^{b−1}
       ∝ θ^{x+a−1} (1 − θ)^{n−x+b−1}
  =⇒ θ | x ∼ Beta(x + a, n − x + b).
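A short numerical check of this conjugate update in R: the normalized product of the binomial likelihood and the Beta(a, b) prior matches the Beta(x + a, n − x + b) density. The particular values of n, x, a, and b below are arbitrary.

# Numerical check of the Beta-Binomial conjugate update (values of n, x, a, b are arbitrary).
n <- 20; x <- 13; a <- 2; b <- 3
theta <- seq(0.001, 0.999, length.out = 999)
step  <- theta[2] - theta[1]

unnormalized   <- dbinom(x, size = n, prob = theta) * dbeta(theta, a, b)
grid_posterior <- unnormalized / (sum(unnormalized) * step)   # normalize on the grid
closed_form    <- dbeta(theta, x + a, n - x + b)               # Beta(x + a, n - x + b)

max(abs(grid_posterior - closed_form))   # small: the grid posterior matches the closed form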