
Bayes for Beginners

LUCA CHECH AND JOLANDA MALAMUD


SUPERVISOR: THOMAS PARR
13TH FEBRUARY 2019
Outline
• Probability distributions
• Joint probability
• Marginal probability
• Conditional probability
• Bayes’ theorem
• Bayesian inference
• Coin toss example
“Probability is orderly opinion and inference from data is
nothing other than the revision of such opinion in the light of
relevant new information.”
Eliezer S. Yudkowsky
Some applications
Probability distribution

Discrete: described by a probability mass function (PMF).
Example: X takes the values 1, 2, …, 100, each with P(X = x) = 1/100.
The PMF sums to one: ∑_x P(X = x) = 1

Continuous: described by a probability density function (PDF).
Example: height in the UK population. The probability of any single exact value is zero, e.g. P(X = 1.8 m) = 0; probabilities are given by areas under the density, e.g. P(1.75 ≤ X ≤ 1.85).
Probability
• Probability of A occurring: P(A)

• Probability of B occurring: P(B)

• Joint probability (A AND B both occurring): P(A,B)


 
Marginal probability

Joint probability: P(X = 0, Y = 1) = 0.1

Joint table, with X = disease and Y = symptoms:

              X = 0   X = 1
      Y = 0    0.5     0.1
      Y = 1    0.1     0.3

All joint probabilities sum to one: ∑_{x,y} P(X = x, Y = y) = 1

A marginal probability is obtained by summing out the other variable:
P(X = x) = ∑_y P(X = x, Y = y)

e.g. P(Y = 1) = 0.1 + 0.3 = 0.4 and P(X = 0) = 0.1 + 0.5 = 0.6
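As a small illustration (not from the original slides), the marginalisation above can be written in a few lines of Python; the joint table is the one shown, everything else is assumed for the sketch:

```python
import numpy as np

# Joint distribution P(X, Y) from the table above:
# rows index Y (symptoms), columns index X (disease).
joint = np.array([[0.5, 0.1],   # Y = 0
                  [0.1, 0.3]])  # Y = 1

assert np.isclose(joint.sum(), 1.0)  # all joint probabilities sum to one

# Marginalise by summing out the other variable
p_x = joint.sum(axis=0)  # P(X = x) = sum_y P(X = x, Y = y) -> [0.6, 0.4]
p_y = joint.sum(axis=1)  # P(Y = y) = sum_x P(X = x, Y = y) -> [0.6, 0.4]

print(p_x[0], p_y[1])  # P(X = 0) = 0.6, P(Y = 1) = 0.4
```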
Conditional probability
What is the probability of A occurring, given that B has occurred?

Probability of A given B?
Conditional Probability

Joint probability: P(X = 0, Y = 1) = 0.1

Using the same joint table (X = disease, Y = symptoms):

              X = 0   X = 1
      Y = 0    0.5     0.1
      Y = 1    0.1     0.3

Conditional probabilities:
P(X = 1 | Y = 1) = 0.3 / (0.1 + 0.3) = 3/4
P(X = 0 | Y = 1) = 0.1 / (0.1 + 0.3) = 1/4

In general: P(X | Y) = P(X, Y) / P(Y)
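A short sketch of the same conditioning step in Python (illustrative only; the joint table is the one above):

```python
import numpy as np

# Same joint table: rows index Y (symptoms), columns index X (disease).
joint = np.array([[0.5, 0.1],
                  [0.1, 0.3]])

# Condition on Y = 1: P(X | Y = 1) = P(X, Y = 1) / P(Y = 1)
p_y1 = joint[1, :].sum()             # P(Y = 1) = 0.1 + 0.3 = 0.4
p_x_given_y1 = joint[1, :] / p_y1    # [0.25, 0.75]

print(p_x_given_y1[1])  # P(X = 1 | Y = 1) = 0.75 = 3/4
print(p_x_given_y1[0])  # P(X = 0 | Y = 1) = 0.25 = 1/4
```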
Conditional probability: Example

Let C = cancer, NC = no cancer, and + = a positive test result, with:
P(C) = 1/100,  P(NC) = 99/100,  P(+ | C) = 90/100,  P(+ | NC) = 8/100

We want P(C | +) = P(+, C) / P(+)

The evidence is obtained by marginalising over the disease state:
P(+) = ∑_x P(X = x, +) = P(+, C) + P(+, NC)

The joint probabilities follow from the conditionals:
P(+, C) = P(+ | C) × P(C) = 90/100 × 1/100 = 90/10000
P(+, NC) = P(+ | NC) × P(NC) = 8/100 × 99/100 = 792/10000
Conditional probability: Example

P(C) = 1/100,  P(NC) = 99/100,  P(+ | C) = 90/100,  P(+ | NC) = 8/100

Putting it together:
P(C | +) = P(+, C) / [P(+, C) + P(+, NC)] = (90/10000) / (90/10000 + 792/10000) = 90/882 ≈ 0.10

So even after a positive test, the probability of cancer is only about 10%.
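The same calculation in Python, using only the four probabilities given on the slide (the code itself is just an illustrative sketch):

```python
# C = cancer, NC = no cancer, "+" = positive test result
p_c = 1 / 100        # P(C)
p_nc = 99 / 100      # P(NC)
p_pos_c = 90 / 100   # P(+ | C)
p_pos_nc = 8 / 100   # P(+ | NC)

# Joint probabilities
p_pos_and_c = p_pos_c * p_c      # P(+, C)  = 90/10000
p_pos_and_nc = p_pos_nc * p_nc   # P(+, NC) = 792/10000

# Evidence: P(+) = P(+, C) + P(+, NC)
p_pos = p_pos_and_c + p_pos_and_nc

# Bayes' theorem: P(C | +) = P(+, C) / P(+)
p_c_given_pos = p_pos_and_c / p_pos
print(round(p_c_given_pos, 3))  # ~0.102: a positive test leaves only ~10% probability of cancer
```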
 
Derivation of Bayes' theorem

Bayes' theorem
P(A | B) = P(B | A) × P(A) / P(B)

(1) From the definition of conditional probability:
    P(A | B) = P(A ∩ B) / P(B)

(2) Likewise:
    P(B | A) = P(B ∩ A) / P(A) = P(A ∩ B) / P(A)
    ⟹ P(A ∩ B) = P(B | A) × P(A)

Substituting (2) into (1) recovers Bayes' theorem.
Bayes' theorem, alternative form

P(A | B) = P(B | A) × P(A) / P(B)

Expanding the denominator with the law of total probability:
P(A | B) = P(B | A) × P(A) / [P(B | A) × P(A) + P(B | ¬A) × P(¬A)]
Bayes’ theorem problems
Example 1
10% of patients in a clinic have liver disease. Five percent of the clinic’s patients are alcoholics.
Amongst those patients diagnosed with liver disease, 7% are alcoholics. You are interested in knowing
the probability of a patient having liver disease, given that he is an alcoholic.

 
P(A) = probability of liver disease = 0.10
P(B) = probability of alcoholism = 0.05
P(B|A) = 0.07
P(A|B) = ?

P(A|B) = P(B|A) × P(A) / P(B) = (0.07 × 0.10) / 0.05 = 0.14

In other words, if the patient is an alcoholic, their chances of having liver disease are 0.14 (14%)
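As a quick check, the same numbers in Python (a sketch, not part of the original exercise):

```python
# A = liver disease, B = alcoholism
p_a = 0.10          # P(A)
p_b = 0.05          # P(B)
p_b_given_a = 0.07  # P(B | A)

# Bayes' theorem
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 2))  # 0.14, i.e. a 14% chance of liver disease given alcoholism
```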
Example 2
A disease occurs in 0.5% of the population
A diagnostic test gives a positive result in:
◦ 99% of people with the disease
◦ 5% of people without the disease (false positive)

A person receives a positive result


What is the probability of them having the disease, given a positive result?
 

We know:

P(+ | D) = 0.99     chance of a positive test given that the disease is present
P(+ | ¬D) = 0.05    chance of a positive test given that the disease isn't present
P(D) = 0.005        chance of having the disease
P(¬D) = 0.995       chance of not having the disease
P(D | +) = ???

Remember (Bayes' theorem with the denominator expanded):
P(D | +) = P(+ | D) × P(D) / [P(+ | D) × P(D) + P(+ | ¬D) × P(¬D)]

Therefore:
P(D | +) = (0.99 × 0.005) / (0.99 × 0.005 + 0.05 × 0.995) = 0.00495 / 0.0547 ≈ 0.09

So even with a positive test result, the probability of actually having the disease is only about 9%.
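The same calculation as a short Python sketch (illustrative; D denotes having the disease):

```python
# D = has the disease, "+" = positive test result
p_d = 0.005          # P(D): prevalence
p_not_d = 1 - p_d    # P(not D)
p_pos_d = 0.99       # P(+ | D): true-positive rate
p_pos_not_d = 0.05   # P(+ | not D): false-positive rate

# Law of total probability for the denominator
p_pos = p_pos_d * p_d + p_pos_not_d * p_not_d

# Bayes' theorem
p_d_given_pos = p_pos_d * p_d / p_pos
print(round(p_d_given_pos, 3))  # ~0.09: the disease is still unlikely despite the positive test
```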
Frequentist vs. Bayesian statistics
Frequentist models in practice
•  Model:

• Data is a random variable, while the parameters are unknown but fixed

• We assume there is a true set of parameters, or true model of the world, and we
are concerned with getting the best possible estimate

• We are interested in point estimates of parameters given the data


Bayesian models in practice
 • Model:

• Data is fixed, while parameters are considered to be random variables

• There is no single set of parameters that denotes a true model of the world - we have parameters that are more or less probable

• We are interested in distribution of parameters given the data


Bayesian Inference
• Provides a dynamic model through which our belief is constantly updated as
we add more data
• The ultimate goal is to calculate the posterior probability density, which is
proportional to the likelihood (the probability of our data given the parameters)
multiplied by our prior knowledge
• Can be used as a model for the brain (the "Bayesian brain"), history and human
behaviour
Bayes' rule

posterior = likelihood × prior / evidence:
P(θ | D) = P(D | θ) × P(θ) / P(D) ∝ P(D | θ) × P(θ)

Evidence: P(D) = ∫ P(D | θ) × P(θ) dθ

• How good are our parameters given the data?
• Prior knowledge is incorporated and used to update our beliefs about the parameters
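A minimal numerical sketch of this rule for a coin-flip likelihood, evaluated on a grid of θ values; the data (7 heads in 10 flips) and the Beta(2, 2) prior are made-up examples, not from the slides:

```python
import numpy as np
from scipy.stats import beta, binom

theta = np.linspace(0, 1, 1001)          # grid over the parameter
prior = beta.pdf(theta, 2, 2)            # P(theta), an assumed prior
likelihood = binom.pmf(7, 10, theta)     # P(D | theta) for D = "7 heads in 10 flips"

unnormalised = likelihood * prior        # P(D | theta) * P(theta)
evidence = np.trapz(unnormalised, theta) # P(D) = integral of P(D | theta) P(theta) dtheta
posterior = unnormalised / evidence      # P(theta | D)

print(theta[np.argmax(posterior)])       # posterior mode: a compromise between prior and data
```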
Generative models

• Specify a joint probability distribution over all variables (observations and
parameters) → this requires a likelihood function and a prior:
P(D, θ | m) = P(D | θ, m) × P(θ | m)

• Model comparison is based on the model evidence:
P(D | m) = ∫ P(D | θ, m) × P(θ | m) dθ
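One way to make the word "generative" concrete is to sample from the joint distribution: draw θ from the prior, then draw data from the likelihood. The particular prior (Beta(2, 2)) and likelihood (10 coin flips) below are assumptions made for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_joint(n_samples=5, n_flips=10):
    """Draw (theta, D) pairs from P(D, theta) = P(D | theta) * P(theta)."""
    theta = rng.beta(2, 2, size=n_samples)   # theta ~ P(theta), an assumed Beta(2, 2) prior
    heads = rng.binomial(n_flips, theta)     # D ~ P(D | theta), number of heads in n_flips
    return theta, heads

theta, heads = sample_joint()
print(theta.round(2), heads)  # each pair is one sample from the generative model
```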


Principles of Bayesian Inference

• Formulation of a generative model
  Model → likelihood function P(D | θ) and prior distribution P(θ)

• Observation of data
  Measurement data D

• Model inversion – updating one's belief
  Posterior distribution: P(θ | D) ∝ P(D | θ) × P(θ)
  Model evidence: P(D)
Priors

Priors can be of different sorts, e.g.
• empirical (based on previous data)
• uninformative
• principled (e.g. positivity constraints)
• shrinkage

Conjugate priors: the posterior belongs to the same family of distributions as the prior.
 
P(θ | D) ∝ P(D | θ) × P(θ) ∝ likelihood × prior

• Figure: effect of more informative prior distributions on the posterior distribution
 
P(θ | D) ∝ P(D | θ) × P(θ) ∝ likelihood × prior

• Figure: effect of larger sample sizes on the posterior distribution
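A small sketch of both effects using the conjugate Beta-Binomial pair for a coin (all numbers are illustrative, not taken from the animations):

```python
from scipy.stats import beta

# With a Beta(a, b) prior on theta and h heads in n flips,
# conjugacy gives the posterior Beta(a + h, b + n - h).
def coin_posterior(a, b, heads, n):
    return beta(a + heads, b + (n - heads))

# More informative prior, same data (7 heads in 10 flips): the posterior is pulled towards 0.5
print(coin_posterior(1, 1, 7, 10).mean())    # flat Beta(1, 1) prior     -> posterior mean ~0.67
print(coin_posterior(20, 20, 7, 10).mean())  # strong Beta(20, 20) prior -> posterior mean ~0.54

# Larger sample size, same prior and same head rate: the data dominate and the posterior narrows
print(coin_posterior(2, 2, 7, 10).std())     # wide posterior
print(coin_posterior(2, 2, 70, 100).std())   # much narrower posterior
```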
Example: Coin flipping model
• Someone flips a coin

• We don’t know if the coin is fair or not

• We are told only the outcome of the coin flipping


Example: Coin flipping model
• 1st Hypothesis: Coin is fair, 50% Heads or Tails

• 2nd Hypothesis: Both sides of the coin are heads, 100% Heads
Example: Coin flipping model
 • 1st Hypothesis: Coin is fair, 50% Heads or Tails

• 2nd Hypothesis: Both sides of the coin are heads, 100% Heads
Example: Coin flipping model
• 
Example: Coin flipping model
• 
Example: Coin flipping model

Coin is flipped a second time and it is heads again

→ The posterior from the previous time step becomes the new prior!


Example: Coin flipping model
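The sequential update described above can be sketched in a few lines of Python; the 50/50 prior over the two hypotheses is an assumption made here for illustration:

```python
# H1: coin is fair, P(heads) = 0.5;  H2: coin is two-headed, P(heads) = 1.0
p_heads = {"fair": 0.5, "two-headed": 1.0}
belief = {"fair": 0.5, "two-headed": 0.5}   # assumed prior over the hypotheses

def update(belief, outcome):
    """One Bayesian update; the returned posterior is the prior for the next flip."""
    likelihood = {h: p if outcome == "H" else 1 - p for h, p in p_heads.items()}
    unnorm = {h: likelihood[h] * belief[h] for h in belief}
    evidence = sum(unnorm.values())
    return {h: v / evidence for h, v in unnorm.items()}

belief = update(belief, "H")   # first flip: heads
print(belief)                  # {'fair': 0.33..., 'two-headed': 0.66...}
belief = update(belief, "H")   # second flip: heads again (yesterday's posterior is today's prior)
print(belief)                  # {'fair': 0.2, 'two-headed': 0.8}
```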
 
Hypothesis testing

Classical:
• Define the null hypothesis
• H0: the coin is fair, θ = 0.5

Bayesian inference:
• Define a hypothesis
• H: θ > 0.1, and compute its posterior probability P(θ > 0.1 | D)
Example: Coin flipping model
… and we think a priori that the coin is fair:

Evidence for a fair model is:

And for a bent model:

Posterior for the models:

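The slide's own formulas are not reproduced here, but a common way to set up this comparison is sketched below: the fair model fixes θ = 0.5, the bent model places a uniform prior on θ, and the data (9 heads in 10 flips) as well as the 0.8/0.2 prior over the two models are assumptions made purely for illustration:

```python
from scipy.stats import binom
from scipy.integrate import quad

heads, n = 9, 10  # assumed data: 9 heads in 10 flips

# Fair model: theta is fixed at 0.5, so the evidence is just the likelihood at 0.5
evidence_fair = binom.pmf(heads, n, 0.5)

# Bent model: theta unknown with a uniform prior on [0, 1];
# evidence = integral of P(D | theta) * P(theta) dtheta
evidence_bent, _ = quad(lambda th: binom.pmf(heads, n, th), 0, 1)

# Assumed prior model probabilities: a priori we think the coin is probably fair
p_fair, p_bent = 0.8, 0.2

post_fair = evidence_fair * p_fair / (evidence_fair * p_fair + evidence_bent * p_bent)
print(round(evidence_fair, 4), round(evidence_bent, 4))  # ~0.0098 vs ~0.0909
print(round(post_fair, 2))  # posterior probability that the coin is fair (~0.30 here)
```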

"A Bayesian is one who,
vaguely expecting a horse,
and catching a glimpse of a donkey,
strongly believes he has seen a mule."
References
• Previous MfD slides
• Bayesian statistics (a very brief introduction) – Ken Rice
• http://www.statisticshowto.com/bayes-theorem-problems/
• Slides "Bayesian inference and generative models" by K.E. Stephan
• Intro slides to probabilistic & unsupervised learning by M. Sahani
• Animations: https://blog.stata.com/2016/11/01/introduction-to-bayesian-statistics-part-1-the-basic-concepts/
