
Slides 1

This document discusses Bayesian methods and the R statistical computing environment. It provides information on R, including that it is a language and environment for statistical computing and graphics. R allows users to add functionality by defining new functions. The document also notes that R is a GNU project similar to the S language and environment. It then discusses Bayesian analysis of binomial data using uniform and beta priors. Finally, it introduces simulation-based Bayesian estimation approaches.
Copyright
© Attribution Non-Commercial (BY-NC)

Bayesian Methods

LABORATORY
Lesson 1: Jan 24, 2002. Software: R

Bayesian Methods p.1/20

The R Project for Statistical Computing


http://www.r-project.org/

R is a language and environment for statistical computing and graphics.


R, like S, is designed around a computer language, and it allows users to add additional functionality by defining new functions.

The term "environment" is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools.

R is a GNU project which is similar to the S language and environment.

The GNU Project was launched in 1984 to develop a complete Unix-like operating system which is free software.

Free software is a matter of the users' freedom to run, copy, distribute, study, change and improve the software. It is not a matter of price!

R can be considered a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

ONE-DIMENSIONAL PARAMETER MODELS

1. The Binomial model and its conjugate Beta prior: in Bin$(n, \theta)$ there is a single parameter of interest, $\theta$ ($n$ is typically assumed known), that is, the probability of a certain outcome in each of the $n$ trials considered.

Bayesian estimation of a probability from BINOMIAL data

Gelman book, p. 39, sec. 2.5. The R code placenta.r is in the Lab notes on the course web page.

Our interest focuses on the proportion of female births in the so-called maternal condition placenta previa.

Our data come from an early study in Germany: 437 females out of 980 placenta previa births.

How much evidence do they provide that the proportion of female births in placenta previa is < 0.485, the proportion of female births in the general population?

Analysis using a UNIFORM PRIOR

Let the parameter $\theta$ denote the proportion of placenta previa female births.

We assume a Bin$(980, \theta)$ model, $p(y \mid \theta) \propto \theta^{437}(1-\theta)^{980-437}$, to be the model generating the data.

We specify the prior for $\theta$ to be U[0, 1].

The posterior for $\theta$ is, then, $p(\theta \mid y) \propto \theta^{437}(1-\theta)^{980-437}$, i.e., $\theta \mid y \sim$ Beta$(437+1,\ 980-437+1)$.
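Because the posterior is a known Beta distribution, the question "is the proportion < 0.485?" can be answered by direct calculation. The R lines below are a minimal sketch of that calculation; they are not the course file placenta.r, and the variable names are our own.

```r
# Placenta previa data (Gelman book, sec. 2.5): y female births out of n
y <- 437
n <- 980

# Uniform U[0, 1] prior = Beta(1, 1), so the posterior is Beta(y + 1, n - y + 1)
a_post <- y + 1
b_post <- n - y + 1

post_mean <- a_post / (a_post + b_post)            # posterior mean
post_ci   <- qbeta(c(0.025, 0.975), a_post, b_post) # central 95% posterior interval
p_below   <- pbeta(0.485, a_post, b_post)           # P(theta < 0.485 | y)

cat(sprintf("posterior mean = %.3f\n", post_mean))
cat(sprintf("95%% interval   = [%.3f, %.3f]\n", post_ci[1], post_ci[2]))
cat(sprintf("P(theta < 0.485 | y) = %.4f\n", p_below))
```

The interval should agree, to about three decimals, with the uniform-prior row of the sensitivity table, and the posterior probability below 0.485 comes out very close to 1.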

[Figure: (Beta-)Uniform-Binomial; successive builds add the likelihood, the U[0, 1] prior and the posterior of $\theta$, plotted for $\theta$ between 0.35 and 0.55]

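Analysis using different BETA PRIORS

As the likelihood $p(y \mid \theta) \propto L(\theta; y)$ is $\propto \theta^{y}(1-\theta)^{n-y}$, if the prior is of the same form, e.g., $p(\theta) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}$, then the posterior will also be of this form. In fact,

$$p(\theta \mid y) \propto \theta^{y+\alpha-1}(1-\theta)^{n-y+\beta-1}, \quad \text{i.e.,} \quad \theta \mid y \sim \text{Beta}(\alpha+y,\ \beta+n-y).$$

→ the BETA prior distribution is a conjugate family for the BINOMIAL likelihood.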
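Conjugacy can be checked numerically: normalising prior × likelihood on a fine grid of $\theta$ values should reproduce the closed-form Beta$(\alpha+y,\ \beta+n-y)$ density. A small R sketch; the helper `beta_binom_update` and the prior Beta(2, 3) are illustrative assumptions, not part of the course code.

```r
# Hypothetical helper: conjugate Beta-Binomial update
beta_binom_update <- function(alpha, beta, y, n) {
  c(alpha = alpha + y, beta = beta + n - y)
}

alpha <- 2; beta <- 3   # illustrative prior (assumption)
y <- 437; n <- 980      # placenta previa data

post <- beta_binom_update(alpha, beta, y, n)

# Grid approximation: prior density x binomial likelihood, normalised to integrate to 1
theta  <- seq(0.001, 0.999, length.out = 2000)
h      <- theta[2] - theta[1]
unnorm <- dbeta(theta, alpha, beta) * dbinom(y, n, theta)
grid_post  <- unnorm / sum(unnorm * h)

# Closed-form conjugate posterior density on the same grid
exact_post <- dbeta(theta, post[["alpha"]], post[["beta"]])

max_abs_diff <- max(abs(grid_post - exact_post))
print(post)
print(max_abs_diff)
```

The grid is only a check here; in the conjugate case the two posterior parameters are all that is needed.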

[Figure: posterior densities of $\theta$ for the placenta previa data under Beta priors with prior information $\alpha+\beta-2$ = 0, 0, 10, 100, 1000, 10000 (one panel each); $\theta$ axis from 0.35 to 0.55]

How does the posterior COMPROMISE between the prior and the data?

The compromise depends on how much weight the prior has (i.e., how informative it is) with respect to the data at hand; in the binomial case, it depends on the relative weight of $\alpha+\beta-2$, the number of prior observations (≈ prior precision (1)), with respect to $n$, the sample size.

(1) Note: precision = 1/variance, where for a Beta$(\alpha, \beta)$ prior var $= \dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$.
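One way to see the compromise: the posterior mean $(\alpha+y)/(\alpha+\beta+n)$ is a weighted average of the prior mean $\alpha/(\alpha+\beta)$ and the sample proportion $y/n$, with prior weight $(\alpha+\beta)/(\alpha+\beta+n)$. A short R sketch of this identity; the function name is hypothetical, and the example prior (mean 0.485, $\alpha+\beta-2 = 100$) is one of the priors used in the sensitivity analysis.

```r
# Posterior mean of theta under a Beta(alpha, beta) prior with y successes in n trials,
# written as a weighted average of the prior mean and the sample proportion
posterior_mean_parts <- function(alpha, beta, y, n) {
  w <- (alpha + beta) / (alpha + beta + n)   # weight given to the prior
  prior_mean  <- alpha / (alpha + beta)
  sample_mean <- y / n
  list(weight_prior = w,
       as_mixture   = w * prior_mean + (1 - w) * sample_mean,
       closed_form  = (alpha + y) / (alpha + beta + n))
}

# Placenta previa data, prior with mean 0.485 and alpha + beta - 2 = 100
s <- 100 + 2
parts <- posterior_mean_parts(0.485 * s, 0.515 * s, y = 437, n = 980)
str(parts)
```

With $n = 980$ the prior carries under 10% of the weight here, which is why the posterior mean only moves from 0.446 to about 0.450.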

A first SENSITIVITY ANALYSIS

Concept of sensitivity: the sensitivity (or robustness) of the inferences to the choice of the prior.

Prior information              Posterior information
mean α/(α+β)    α+β-2          mean     95% interval
0.500               0          0.446    [0.415, 0.477]
0.485               0          0.446    [0.415, 0.477]
0.485              10          0.446    [0.416, 0.477]
0.485             100          0.450    [0.420, 0.479]
0.485            1000          0.466    [0.444, 0.488]
0.485           10000          0.482    [0.472, 0.491]

NOTE: in the placenta previa example, n ≈ 1000 and y/n ≈ 0.446.
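The table can be recomputed in closed form. The sketch below assumes each prior is Beta$(m(k+2),\ (1-m)(k+2))$, i.e. prior mean $m$ and prior information $\alpha+\beta-2 = k$, which follows from the definitions above; since the slide's intervals may come from simulation, the third decimal of an interval endpoint can differ slightly.

```r
# Sensitivity analysis: Beta priors with mean m and prior information alpha + beta - 2 = k,
# combined with y = 437 females out of n = 980 placenta previa births
y <- 437; n <- 980
m <- c(0.500, 0.485, 0.485, 0.485, 0.485, 0.485)  # prior means
k <- c(0, 0, 10, 100, 1000, 10000)                # prior information alpha + beta - 2

alpha  <- m * (k + 2)
beta   <- (1 - m) * (k + 2)
a_post <- alpha + y
b_post <- beta + n - y

sens <- data.frame(
  prior_mean = m,
  prior_info = k,
  post_mean  = a_post / (a_post + b_post),
  lower      = qbeta(0.025, a_post, b_post),
  upper      = qbeta(0.975, a_post, b_post)
)
print(round(sens, 3))
```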

The SIMULATION-based estimation approach

The modern approach to Bayesian estimation has become closely linked to simulation-based estimation methods, because Bayesian estimation focuses on estimating the entire density of a parameter. This density estimation is based on generating samples from the posterior density of the parameters themselves, or of functions of the parameters.
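As an illustration, posterior draws of $\theta$ for the placenta previa data can be turned into a density estimate, and the same draws, once transformed, give the posterior of any function of $\theta$. In this R sketch the seed, the number of draws and the chosen function (the male-to-female ratio $(1-\theta)/\theta$) are illustrative assumptions.

```r
set.seed(2002)  # arbitrary seed for reproducibility

# Posterior under the uniform prior: Beta(437 + 1, 980 - 437 + 1)
a_post <- 437 + 1
b_post <- 980 - 437 + 1

# Samples from the posterior density of theta ...
theta_draws <- rbeta(10000, a_post, b_post)

# ... and of a function of theta, obtained by transforming each draw
ratio_draws <- (1 - theta_draws) / theta_draws

# Kernel density estimate of the posterior of theta, built from the draws only
dens      <- density(theta_draws)
dens_mode <- dens$x[which.max(dens$y)]

mc_mean  <- mean(theta_draws)
ratio_ci <- quantile(ratio_draws, c(0.025, 0.975))
print(c(mc_mean = mc_mean, exact_mean = a_post / (a_post + b_post), dens_mode = dens_mode))
print(ratio_ci)
```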

In the BETA-BINOMIAL model, conjugacy gives us the posterior density in closed form. Then, direct calculations are feasible, or we can simulate directly from it. However, even when the posterior density cannot be integrated explicitly, iterative simulation methods (MCMC) can be used instead. We will see them in future labs.
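The two routes available in the conjugate case can be compared side by side: direct calculation with the Beta distribution functions, and direct simulation from the same posterior. This is a sketch only (no MCMC is involved); the seed and the number of draws are arbitrary choices.

```r
set.seed(24)  # arbitrary seed for reproducibility

# Closed-form posterior for the placenta previa data under a uniform prior
a_post <- 437 + 1
b_post <- 980 - 437 + 1

# Direct calculation from the Beta posterior
exact_p  <- pbeta(0.485, a_post, b_post)            # P(theta < 0.485 | y)
exact_ci <- qbeta(c(0.025, 0.975), a_post, b_post)  # 95% posterior interval

# Direct simulation: the same quantities estimated from independent posterior draws
draws  <- rbeta(10000, a_post, b_post)
sim_p  <- mean(draws < 0.485)
sim_ci <- unname(quantile(draws, c(0.025, 0.975)))

print(rbind(exact = c(exact_p, exact_ci), simulated = c(sim_p, sim_ci)))
```

The simulated values differ from the exact ones only by Monte Carlo error, which shrinks as the number of draws grows.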

A first (direct) simulation

Congdon book, p. 31, sec. 2.11.

Wilcox (1996) presents data from a 1991 Gallup opinion poll about the morality of President Bush's not helping Iraqi rebel groups after the formal end of the Gulf War. Of the 751 adults responding, 150 thought the president's actions were not moral.

We are interested in assessing the probability that a randomly sampled adult would respond "immoral".

In the inference we might use evidence from previous polls on the proportion of the population generally likely to consider a President's actions immoral.

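A first (direct) simulation

The R code is in betabin.r on the course web page.

We present Bayesian inference about the probability $\theta$ of an adult responding "immoral", assuming different Beta$(\alpha, \beta)$ priors:
1. $\alpha = \beta = 1$: prior information $\alpha+\beta-2 = 0$, $E(\theta) = 1/2$
2. $\alpha = \beta = 0.001$: prior information $< 0$, $E(\theta) = 1/2$
3. $\alpha = 1$, $\beta = 0.11$: prior information $< 0$, $E(\theta) \approx 0.9$
4. $\alpha = 1.8$, $\beta = 0.2$: prior information $= 0$, $E(\theta) = 0.9$
5. $\alpha = 4.5$, $\beta = 0.5$: $\alpha+\beta = 5$, $E(\theta) = 0.9$
6. $\alpha = 45$, $\beta = 5$: $\alpha+\beta = 50$, $E(\theta) = 0.9$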
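The prior means and prior information of the six priors can be tabulated directly, and draws from a prior can be smoothed with `density()`, as in the prior plot that follows. A minimal R sketch; the data-frame layout and the seed are our own choices.

```r
# The six Beta(alpha, beta) priors of the list above
priors <- data.frame(
  alpha = c(1, 0.001, 1,    1.8, 4.5, 45),
  beta  = c(1, 0.001, 0.11, 0.2, 0.5, 5)
)
priors$prior_mean <- priors$alpha / (priors$alpha + priors$beta)
priors$prior_info <- priors$alpha + priors$beta - 2  # "number of prior observations"
print(priors)

# Kernel density estimate from 50,000 draws of the most informative prior, Beta(45, 5)
set.seed(1)  # arbitrary seed
d <- density(rbeta(50000, 45, 5))
print(d)
```

Priors 2 and 3 have negative prior information, i.e. they carry less weight than the uniform prior.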

1. and 2. are both non-informative, but 2. is a reasonable choice for one-off events (or for correlated data). 3. and 4. may be assumed on the basis of previous polls; although $E(\theta) = 0.9$, they are still diffuse. 5. and 6. are increasingly informative.

[Figure: prior densities estimated from draws, e.g. density(x = rbeta(50000, 45, 5)) with N = 50000 and bandwidth = 0.004278, for a=1 b=1, a=.001 b=.001 and a=45 b=5; $\theta$ axis from 0 to 1]

Legend for the next figure:

In each panel, curves: histogram of 10,000 draws from the posterior Beta$(150+\alpha,\ 601+\beta)$; likelihood Bin$(751, \theta)$ with $y = 150$. Intervals: Unif-Bin 95% posterior interval; 95% Beta$(150+\alpha,\ 601+\beta)$ posterior interval; normal approximation of the 95% posterior interval; inverted 95% posterior interval on the logit scale.

Though $\theta$ is close to 0, because of the large sample size (751) the normal approximation is good, and posterior inferences are insensitive to the prior choice (even when the prior is discordant with the data), at least for prior information close to 0.
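The four kinds of interval in the legend can be sketched in R for one prior. This is not betabin.r: the seed is arbitrary, and the "logit scale" interval is assumed here to be a normal interval built from the logit-transformed draws and mapped back with the inverse logit.

```r
set.seed(1991)  # arbitrary seed
y <- 150; n <- 751
alpha <- 1; beta <- 1  # prior 1 of the list; change to try the other priors

a_post <- alpha + y
b_post <- beta + n - y
post <- rbeta(10000, a_post, b_post)  # 10,000 direct draws from the posterior

# Exact central 95% posterior interval from the Beta quantiles
exact_ci <- qbeta(c(0.025, 0.975), a_post, b_post)

# Normal approximation: posterior mean +/- 1.96 posterior sd
m <- a_post / (a_post + b_post)
s <- sqrt(a_post * b_post / ((a_post + b_post)^2 * (a_post + b_post + 1)))
normal_ci <- m + c(-1, 1) * qnorm(0.975) * s

# Normal interval on the logit scale (assumption), inverted with plogis()
lp <- qlogis(post)
logit_ci <- plogis(mean(lp) + c(-1, 1) * qnorm(0.975) * sd(lp))

# Monte Carlo interval read off the draws
mc_ci <- unname(quantile(post, c(0.025, 0.975)))

print(rbind(exact = exact_ci, normal = normal_ci, logit = logit_ci, mc = mc_ci))
```

With $n = 751$ the four intervals agree to within a few thousandths, consistent with the remark that the normal approximation is good here.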

[Figure: six panels "Histogram of post", one per prior: $\alpha+\beta-2 = 0$, $\alpha = 1$; $\alpha+\beta-2 = -1.998$, $\alpha = 0.001$; $\alpha+\beta-2 = -0.89$, $\alpha = 1$; $\alpha+\beta-2 = 0$, $\alpha = 1.8$; $\alpha+\beta-2 = 3$, $\alpha = 4.5$; $\alpha+\beta-2 = 48$, $\alpha = 45$; $\theta$ axis roughly from 0.16 to 0.30]

And what if our sample size were only n = 5, with y = 1 adult considering the President's actions immoral? →

NOTE: the empirical mean is still y/n = 0.2.
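The effect of the smaller sample can be anticipated from the posterior means $(\alpha+y)/(\alpha+\beta+n)$ alone. A short R sketch comparing the six priors for the real poll (n = 751, y = 150) and for the hypothetical n = 5, y = 1; the function and column names are our own.

```r
alpha <- c(1, 0.001, 1,    1.8, 4.5, 45)
beta  <- c(1, 0.001, 0.11, 0.2, 0.5, 5)

# Posterior mean of theta under Beta(alpha, beta) with y "immoral" answers out of n
post_mean <- function(alpha, beta, y, n) (alpha + y) / (alpha + beta + n)

comparison <- data.frame(
  alpha, beta,
  poll_n751 = post_mean(alpha, beta, y = 150, n = 751),  # observed poll
  small_n5  = post_mean(alpha, beta, y = 1,   n = 5)     # hypothetical small sample
)
print(round(comparison, 3))
```

With n = 751 the posterior means stay close to 0.2 for every prior, while with n = 5 they range from about 0.2 to above 0.8, so the choice of prior now matters.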

[Figure: the same six "Histogram of post" panels for n = 5, y = 1; $\theta$ axis from 0 to 1 (about 0.65 to 0.95 for $\alpha = 45$)]
