
Chapter 2: Belief, Probability, and Exchangeability

MSU-STT-465: Summer-20B

Lecture 1: Probability, Bayes Theorem, Distributions

(P. Vellaisamy: MSU-STT-465: Summer-20B) Bayesian Statistical Methods 1 / 17


Introduction
Belief Function

A belief function is a function that assigns numerical values to statements/beliefs such that the larger the value, the higher the degree of belief. We also expect belief functions to satisfy certain natural properties.

Let F, G, and H be events/statements. A “belief” function Be(·) should reflect our beliefs about the likelihood of events. For example, Be(F | H) > Be(G | H) means that we believe more in F than in G if we knew H were true.

Since probabilities satisfy the properties of a reasonable belief function and are what the Bayesian approach uses, we begin with a brief review of probability.

Exchangeability and de Finetti’s theorem also play an important role, especially in Bayesian statistics; we will discuss them in some detail.
Probability functions and properties

Since probability functions satisfy the properties of a belief function and also have a solid mathematical foundation, we use them henceforth. We start with

Axioms of Probability
P1 Contradictions and tautologies:
0 = P (not H | H ) ≤ P (F | H ) ≤ P (H | H ) = 1;
P2 Addition rule: P (F ∪ G | H ) = P (F | H ) + P (G | H ), if F ∩ G = ∅;
P3 Multiplication rule: P (F ∩ G | H ) = P (F | H )P (G | F ∩ H ),
where ∅ denotes the empty set.

Note that the probability axioms and their properties are common to both the Bayesian and the frequentist interpretations of probability.



Properties

Consider the set H, the “set of all possible truths.” Partition H into disjoint subsets {H1, . . . , Hn}, exactly one of which contains the truth. Statistically, H is called the sample space, the set of possible outcomes of a random experiment, and each Hi ⊂ H is called an event. We can assign probabilities to whether each of these sets contains the truth. First, some event in H is true, so that P(H) = 1.

Let E be some observation (related to the truth of one Hi ). Then,


(i) Rule of total probability: Σ_{i=1}^n P(Hi) = 1.

(ii) Marginal probability: P(E) = Σ_{i=1}^n P(E ∩ Hi) = Σ_{i=1}^n P(E | Hi) P(Hi).

(iii) Rule (ii) says that the total probability of an event occurring is the sum of all of its probabilities under the possible partitions of the truth.



Bayes’ rule for events

Let H1, . . . , Hn be n disjoint hypotheses/events such that ∪_{i=1}^n Hi = H. Assume the prior probabilities P(H1), . . . , P(Hn) are known, and let E be an event for which the probabilities P(E | Hi) are also known, for every 1 ≤ i ≤ n.

Then the posterior probabilities P(Hi | E) are given by

P(Hi | E) = P(Hi ∩ E) / P(E)
          = P(Hi ∩ E) / Σ_{j=1}^n P(E ∩ Hj)
          = P(E | Hi) P(Hi) / Σ_{j=1}^n P(E | Hj) P(Hj),

where P(E | Hi) is the likelihood of E under Hi and P(Hi) is the prior probability of Hi.



An Example
Let H1 and H2 denote the events that the “landing gear” of a plane extends and fails to extend, respectively. Let E denote the event that the “warning light” goes on. Suppose, based on previous experience, P(E | H1) = 0.05 and P(E | H2) = 0.99. Also, records show that P(H1) = 0.98 and P(H2) = 0.02. Find P(H1 | E).
From Bayes’ theorem,

P(H1 | E) = P(E | H1) P(H1) / [P(E | H1) P(H1) + P(E | H2) P(H2)]
          = (0.05)(0.98) / [(0.05)(0.98) + (0.99)(0.02)]
          = 49 / 68.8
          ≈ 0.71.
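For readers who want to check the arithmetic numerically, here is a minimal Python sketch of the same calculation; the probabilities are exactly those stated in the example, while the variable names are ours.

```python
# Minimal sketch of the landing-gear calculation above.
prior = [0.98, 0.02]          # P(H1), P(H2)
likelihood = [0.05, 0.99]     # P(E | H1), P(E | H2)

# Marginal probability of the warning light: P(E) = sum_i P(E | Hi) P(Hi)
p_e = sum(l * p for l, p in zip(likelihood, prior))

# Posterior P(H1 | E) by Bayes' rule
posterior_h1 = likelihood[0] * prior[0] / p_e
print(round(posterior_h1, 3))   # 0.712
```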



Bayes factor

Bayes Factor. Observe that, from Bayes’ rule,

P(Hi | E) / P(Hj | E) = [P(E ∩ Hi)/P(E)] / [P(E ∩ Hj)/P(E)]
                      = [P(E | Hi) / P(E | Hj)] × [P(Hi) / P(Hj)]
                      = Bayes factor × prior odds.

So Bayes’ rule does not tell us what our prior beliefs should be; it tells us how they should change once the data are observed.
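As a quick numerical illustration of this identity, the following Python sketch reuses the landing-gear numbers from the previous example (our choice, since this slide gives no numbers) and checks that posterior odds = Bayes factor × prior odds.

```python
# Posterior odds as Bayes factor times prior odds (landing-gear numbers).
p_e_h1, p_e_h2 = 0.05, 0.99     # P(E|H1), P(E|H2)
p_h1, p_h2 = 0.98, 0.02         # P(H1), P(H2)

bayes_factor = p_e_h1 / p_e_h2          # P(E|H1) / P(E|H2)
prior_odds = p_h1 / p_h2                # P(H1) / P(H2)
posterior_odds = bayes_factor * prior_odds

# Cross-check against the direct computation of P(H1|E) / P(H2|E)
p_e = p_e_h1 * p_h1 + p_e_h2 * p_h2
print(posterior_odds, (p_e_h1 * p_h1 / p_e) / (p_e_h2 * p_h2 / p_e))  # both ~2.47
```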



Independence
(i) Two events F and G are independent if P (F ∩ G ) = P (F )P (G ).
(ii) Two events F and G are conditionally independent given H,
denoted by F ⊥ G | H, if
P (F ∩ G | H ) = P (F | H )P (G | H ).
Note that, by P3, we have
P(F ∩ G | H) = P(F | H) P(G | F ∩ H)   (always true)
             = P(F | H) P(G | H)        (by conditional independence).

Comparing the two lines, conditional independence means P(G | F ∩ H) = P(G | H); that is, knowing F does not change our belief about G when H is known.
Example 1
Consider F = {patient is a smoker}, G = {patient has lung cancer} and H = {smoking causes lung cancer}. If we know H is true, then learning that the patient has lung cancer increases our belief in F; that is, F and G are related. What if H is not true?
Random variables and their distributions

Discrete RVs

A random variable Y is discrete if the set Y of all its possible values is countable, i.e., its values can be enumerated as Y = {y1, y2, . . . }. Examples include the binomial and Poisson distributions. Such distributions are described through the probability mass function (PMF), sometimes also called the probability density function (PDF), p(y) = P(Y = y), which assigns a probability to every point in the sample space. Using the PMF, the cumulative distribution function (CDF) is defined as

F(y) = P(Y ≤ y) = Σ_{yi ≤ y} p(yi),   y ∈ R.

If A and B are disjoint subsets of Y, then

P(Y ∈ A or Y ∈ B) = P(Y ∈ A ∪ B) = P(Y ∈ A) + P(Y ∈ B).
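A short Python sketch of a discrete rv, using a Binomial(10, 0.3) distribution purely as an illustrative choice (scipy is assumed to be available):

```python
# Sketch of a discrete rv: PMF and CDF of a Binomial(n=10, p=0.3) variable.
from scipy import stats

rv = stats.binom(n=10, p=0.3)
y = 3
print(rv.pmf(y))                              # p(3) = P(Y = 3)
print(rv.cdf(y))                              # F(3) = P(Y <= 3)
print(sum(rv.pmf(k) for k in range(y + 1)))   # same as F(3), summing the PMF
```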



Random variables and their distributions (Contd.)

Continuous RVs

A rv Y is called continuous if it can take any value in an interval and the probability of Y taking any single value in the interval/sample space is 0. So we describe such distributions with probability density functions (PDFs) f(y), which must be integrated over an interval to obtain a probability. For a < b, we have

P(a ≤ Y ≤ b) = ∫_a^b f(y) dy.

The CDF of a continuous rv Y is defined in a similar way:

F(y) = P(Y ≤ y) = ∫_{−∞}^y f(x) dx,   y ∈ R.

Continuous distributions include the normal, gamma and beta distributions.
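As an illustrative sketch (using a standard normal, our choice and not from the slide), the probability of an interval can be obtained either from the CDF or by numerically integrating the PDF:

```python
# Sketch for a continuous rv: probabilities come from integrating the PDF.
from scipy import stats, integrate

rv = stats.norm(loc=0.0, scale=1.0)
a, b = -1.0, 1.0

print(rv.cdf(b) - rv.cdf(a))            # P(a <= Y <= b) via the CDF, ~0.683

val, _ = integrate.quad(rv.pdf, a, b)   # same probability by numerical integration
print(val)                              # ~0.683
```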



Joint distributions - Discrete
(i) Discrete RVs.
Let Y1 and Y2 be two discrete random variables taking values in Y1 and Y2, respectively. Then the joint PDF of Y1 and Y2 is defined as

p(y1, y2) = P({Y1 = y1} ∩ {Y2 = y2}),

and the marginal density of Y1 is obtained by summing over all possible values of Y2:

p(y1) = Σ_{y2 ∈ Y2} p(y1, y2) = Σ_{y2 ∈ Y2} p(y1 | y2) p(y2),

where p(y1 | y2) is the conditional density of Y1 given Y2 = y2, defined by

p(y1 | y2) = p(y1, y2) / p(y2);

similarly, p(y2 | y1) = p(y1, y2) / p(y1) is the conditional density of Y2 given Y1 = y1.
Joint distributions - Discrete

Remark 0.1
Given the joint density p (y1 , y2 ), we can calculate marginal and conditional
densities {p (y1 ), p (y2 ), p (y1 | y2 ), p (y2 | y1 )}. Also, given p (y1 ) and
p (y2 | y1 ), we can reconstruct the joint distribution.

However, given only the marginal densities p(y1) and p(y2), we cannot reconstruct the joint distribution, since we do not know whether Y1 and Y2 are independent or how they depend on each other.
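A small numerical sketch of these points, with a made-up 2 × 3 joint PMF (the numbers are illustrative only): the marginals and conditionals follow from the joint, the joint is recovered from p(y1) and p(y2 | y1), but the product of the marginals generally differs from the joint.

```python
# Toy 2 x 3 joint PMF for (Y1, Y2); all numbers are made up for illustration.
import numpy as np

p_joint = np.array([[0.10, 0.20, 0.10],    # rows index the values of Y1
                    [0.25, 0.15, 0.20]])   # columns index the values of Y2

p_y1 = p_joint.sum(axis=1)                 # marginal p(y1)
p_y2 = p_joint.sum(axis=0)                 # marginal p(y2)
p_y2_given_y1 = p_joint / p_y1[:, None]    # conditional p(y2 | y1), row-wise

# Joint recovered from p(y1) and p(y2 | y1) ...
print(np.allclose(p_y1[:, None] * p_y2_given_y1, p_joint))   # True

# ... but the product of the marginals is not the joint here (dependence matters)
print(np.allclose(np.outer(p_y1, p_y2), p_joint))            # False
```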



Joint distributions - Continuous
(ii) Continuous RVs
Let Y1 and Y2 be two continuous rvs. Then their joint PDF p(y1, y2) (note we use the same notation as in the discrete case) satisfies

(i) p(y1, y2) ≥ 0;   (ii) ∫_{−∞}^{∞} ∫_{−∞}^{∞} p(y1, y2) dy1 dy2 = 1.

Its CDF is given by

F(y1, y2) = ∫_{−∞}^{y1} ∫_{−∞}^{y2} p(x1, x2) dx2 dx1.

The marginal densities can be obtained from

p(y1) = ∫_{−∞}^{∞} p(y1, y2) dy2,    p(y2) = ∫_{−∞}^{∞} p(y1, y2) dy1.

With the marginal densities, we can compute the conditional densities p(y2 | y1) = p(y1, y2)/p(y1), etc.
Mixed Random Variables
In Bayesian inference, the parameters that appear in probability distributions are treated as rvs. Also, the sampling model p(y | θ) may be discrete while the distribution p(θ) of θ is continuous. This leads to the following case.
(iii) Mixed RVs
Let Y1 be a discrete rv and Y2 be a continuous rv. Then the joint density of
Y1 and Y2 is defined as

p(y1, y2) = p(y1) p(y2 | y1),

so that

Pr(Y1 ∈ A, Y2 ∈ B) = Σ_{y1 ∈ A} ∫_{y2 ∈ B} p(y1) p(y2 | y1) dy2 = ∫_{y2 ∈ B} { Σ_{y1 ∈ A} p(y1, y2) } dy2.
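The following sketch evaluates such a mixed-case probability for an illustrative choice of distributions (Y1 Bernoulli(0.3) and Y2 | Y1 = y1 normal with mean y1), which are our assumptions and not part of the slide:

```python
# Mixed pair: Y1 discrete, Y2 continuous given Y1; compute Pr(Y1 in A, Y2 in B).
from scipy import stats

p_y1 = {0: 0.7, 1: 0.3}      # discrete marginal p(y1), Bernoulli(0.3)
A = [0, 1]                   # event for Y1
B = (-1.0, 1.0)              # interval for Y2

# Pr(Y1 in A, Y2 in B) = sum over y1 in A of p(y1) * integral over B of p(y2 | y1)
prob = sum(
    p_y1[y1] * (stats.norm(loc=y1).cdf(B[1]) - stats.norm(loc=y1).cdf(B[0]))
    for y1 in A
)
print(prob)   # ~0.62
```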



Bayes Rule for Densities
Bayes’ Rule and Parameter Estimation
In Bayesian inference, the parameter θ is random and its density p(θ) is called the prior density of θ. The sampling model p(y | θ) is then the conditional density of Y given θ.
For example, θ might represent the proportion of people having some property and Y the count of people in the sample with that property. In that case, it is natural to treat θ as continuous and Y as a discrete rv.
Theorem 0.1 (Bayes Theorem)
Let p(θ) denote the prior density of θ. Then the posterior density p(θ | y) is

p(θ | y) = p(θ) p(y | θ) / ∫ p(θ) p(y | θ) dθ = p(θ) p(y | θ) / p(y),

where p(y) is called the marginal density of Y.
Bayes Rule for Densities
Proof.
Let p(y) be the marginal density of Y. Then the joint density p(y, θ) is

p (y , θ) = p (θ)p (y |θ).

Using this, we have

p(θ | y) = p(y, θ) / p(y)
         = p(y, θ) / ∫ p(y, θ) dθ
         = p(θ) p(y | θ) / ∫ p(θ) p(y | θ) dθ
         = p(θ) p(y | θ) / p(y),

which proves the result. □
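A rough numerical sketch of the theorem: the posterior of a binomial proportion θ computed on a grid, with a uniform Beta(1, 1) prior and data y = 7 successes out of n = 10 (all of these choices are illustrative assumptions, not from the slides).

```python
# Grid approximation of the posterior p(theta | y) for a binomial proportion.
import numpy as np
from scipy import stats

n, y = 10, 7
theta = np.linspace(0.0, 1.0, 1001)          # grid over the parameter
dtheta = theta[1] - theta[0]

prior = stats.beta(1, 1).pdf(theta)          # p(theta), uniform on (0, 1)
likelihood = stats.binom(n, theta).pmf(y)    # p(y | theta) as a function of theta

unnormalized = prior * likelihood
p_y = np.sum(unnormalized) * dtheta          # marginal p(y), by a Riemann sum
posterior = unnormalized / p_y               # p(theta | y) on the grid

print(np.sum(posterior * theta) * dtheta)    # posterior mean, ~ (y + 1)/(n + 2) = 0.667
```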


Bayes Rule for Densities
Remarks 0.1
(i) In the posterior density p(θ | y), the information on θ coming from the data y enters through p(y | θ).
(ii) Since Y = y is given, p(y | θ) should be viewed as a function of θ, called the likelihood function and denoted by ℓ(θ | y). That is,
ℓ(θ | y) = p(y | θ).
(iii) The posterior density p (θ|y ) is the main basis of Bayesian inference;
that is, inference procedures for θ are based on p (θ|y ) only.
(iv) Let θ1 and θ2 be two values of θ. Then

p(θ1 | y) / p(θ2 | y) = [p(θ1) p(y | θ1)/p(y)] / [p(θ2) p(y | θ2)/p(y)]
                      = p(θ1) p(y | θ1) / [p(θ2) p(y | θ2)],

and so we do not really need to compute p(y).
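Continuing the grid sketch above, remark (iv) can be checked numerically: the ratio of unnormalized posterior values at two parameter values equals the ratio of posterior densities, and p(y) is never computed (the prior and data below are the same illustrative choices as before).

```python
# Posterior ratio p(theta1|y)/p(theta2|y) without the marginal p(y).
from scipy import stats

n, y = 10, 7
t1, t2 = 0.7, 0.5
ratio = (stats.beta(1, 1).pdf(t1) * stats.binom(n, t1).pmf(y)) / (
    stats.beta(1, 1).pdf(t2) * stats.binom(n, t2).pmf(y)
)
print(ratio)   # ~2.28; p(y) cancels from numerator and denominator
```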


