Understanding Bayesian Statistics Intuitively (In Python)

Imam AR, Indonesia
Published on Sep 15, 2020

Tags: bayesianstatistics, bayesianinference, statistics

Overview

Table of Contents

Introduction to Bayesian inference

On using informative prior

Some other example: linear regression

Practical implementation using PyMC3

Introduction to Bayesian inference

I would like to start this, unlike other typical Bayesian tutorials, without Bayes' theorem first. Instead, let us start with a very simple practical example in statistics: a coin toss! Suppose we do 5 consecutive coin tosses and the result is {HTTTT} (H means the head side of the coin and T means the tail side; in other words, we get 1 head and 4 tails). The question is: how certain are we that it is a fair coin? Or that it is not?

In Bayesian inference, it is essential to understand how the data was generated. In this case, the coin toss can be modeled as a binomial process. Here is the key point from the Wikipedia page:

In probability theory and statistics, the binomial distribution with parameters n and p, denoted Bin(n,p), is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes-no question, and each with its own boolean-valued outcome: success/yes/true/one (with probability p), or failure/no/false/zero (with probability q = 1 − p).

If we consider H (head) a success and T (tail) a failure, this is exactly a binomial process by that definition. The binomial process has two parameters, n (number of experiments conducted) and p (probability of success), and is denoted Bin(n,p). Let us map our case onto the binomial distribution's parameters. What we have: 5 coin tosses (n=5) with an unknown probability of success (unknown p) and an outcome of {HTTTT} (1 of 5 outcomes is a success). Putting this into an input-process-output perspective:

Input: n & p

Process: binomial distribution with parameters n & p

Output: {HTTTT}
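
This generative view can be written in a couple of lines of numpy. A toy sketch (the value of p here is made up purely for illustration):

import numpy as np

# Input: number of tosses n and a hypothetical probability of heads p
n, p = 5, 0.5

# Process: one draw from Bin(n, p)
n_heads = np.random.binomial(n, p)

# Output: the number of heads, e.g. 1 for an outcome like {HTTTT}
print(n_heads)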

As said before, understanding the data generation process is essential for Bayesian inference. Since we would like to know how likely it is that the coin is fair, we are basically looking for the parameter p. If p is likely to be around 0.5, then we are fairly sure it is a fair coin, and vice versa. How do we get this p? Why don't we try a simulation?

Let us assume we have no idea at all about the value of p before looking at the data. Then we can assume any value between 0 and 1 is equally probable, which means we model our prior belief about p as uniformly distributed from 0 to 1, denoted p_prior ~ Uniform(0,1). We put this into our simulation machine by doing the following:

1. Take a random value from the distribution of p_prior; for example, we get p_prior = Uniform(0,1) = 0.3.

2. Put this value into our data generation process, which is a binomial process. For example, we get Bin(n,p) = Bin(5,0.3) = {HHTTT} (2 successes and 3 failures). As this differs from the actual data we observed ({HTTTT}), we reject this value.

3. Redo the simulation from step 1 as many times as possible, collecting only the values of p that produce the same data as the data we have.

Here are several iterations of the simulation:


Iteration 1: p = Uniform(0,1) = 0.32 --> Bin(5,0.32) = {HHTTT}. Rejected.

Iteration 2: p = Uniform(0,1) = 0.11 --> Bin(5,0.11) = {HTTTT}. Accepted: keep p = 0.11.

Iteration 3: p = Uniform(0,1) = 0.69 --> Bin(5,0.69) = {HHHTT}. Rejected.

Doing this many times, we get a set of possible values of p that match our data. I tried this in Python with 100,000 iterations and plotted the distribution of the values of p obtained from the simulation. Here is the code:

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

sns.set()
matplotlib.rcParams['figure.figsize'] = [10, 5]

n = 5  # number of trials / coin tosses
p_posterior = list()
for i in range(100000):
    p_prior = np.random.random()  # draw p from Uniform(0, 1)
    n_success = np.random.binomial(n, p_prior, 1)[0]  # how many heads we get
    if n_success == 1:  # collect p every time the simulation matches the observed data
        p_posterior.append(p_prior)
ax = sns.distplot(p_posterior)
ax.set(xlabel='p')

And here is the distribution of p we generated:

Oh, look! We get an intuitive view of the result. The distribution of the parameter p shows that the most probable value of p is around 0.2 (of course: 1 of the 5 observations is a "success", or head). BUT (big but there), since we only have 5 observations, we are actually quite uncertain about that, and this is also reflected in the resulting distribution: it is quite a fat distribution, which implies our uncertainty about the value. Beyond that, we can also infer other values from the result. For example, how confident are we that the actual value of p lies between 0.15 and 0.25? We can estimate this as the number of our samples that lie between 0.15 and 0.25 divided by the total number of samples (from my code, I get ~24%), with the following code:
p_posterior = np.array(p_posterior)
sum((p_posterior < 0.25) & (p_posterior > 0.15)) / len(p_posterior)

Because the result of Bayesian inference is a sample distribution, unlike in the frequentist approach, we can infer many things directly from that result.
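
For instance, a couple of other summaries fall straight out of the same samples (a small illustration using numpy):

p_posterior.mean()  # posterior mean of p
np.percentile(p_posterior, [2.5, 97.5])  # an approximate 95% credible interval for p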

This resulting distribution is what we call the posterior distribution, or posterior belief. In a sense, what we have done is update our prior belief about p (which was previously uniformly distributed from 0 to 1) into a new belief after looking at the data. And this is what Bayesian inference is all about: using the data to update our belief about some parameter/value. In probability terms, our prior belief is denoted P(p) (the probability of p), and we infer the probability of p given the data, denoted P(p|x), via some likelihood distribution P(x|p) (the probability of the data given the parameter). P(p|x) is basically proportional to the product of the probability of p, P(p), and the likelihood of the data given the parameter p, P(x|p) (remember our process: multiplying probabilities corresponds to requiring two events to both hold, which is exactly what happens in our simulation). We can write this as

P(p|x) ∝ P(p)P(x|p)
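
To make this concrete with our coin toss (a worked illustration): with the uniform prior, P(p) is constant, and the binomial likelihood of getting 1 head in 5 tosses is P(x|p) = C(5,1) p (1-p)^4. So P(p|x) ∝ p(1-p)^4, which is maximized at p = 0.2, exactly where the simulated distribution above peaks.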

You will encounter this notation in any Bayesian inference tutorial out there, as it is the core concept of the inference. The concept is actually very natural for humans: we revise our prior belief after looking at data/evidence. After watching the sun rise every day, we believe with huge confidence that it will also rise tomorrow. Or when you would like to lend a car to somebody, you would rather trust a friend you have already seen to be trustworthy (in terms of driving capability and honesty) than a complete stranger. And the more evidence you have, the more confident you are about the future outcome (whether the car will come back safely or not). This is very natural to the way humans think.

On using informative prior

One interesting thing about Bayesian statistics is its ability to incorporate our opinion, or prior belief, into the model. In the coin toss case, we usually know that most coins have a 50:50 chance of landing heads or tails. So we would like to incorporate this belief into our model. How do we do it? This is done by using an informative prior.


Remember that the prior distribution we were using previously was p_prior ~ Uniform(0,1)? That meant we believed the value of p could be anything from 0 to 1 with equal probability, and we then updated this belief according to the data. Now, we want to set this prior belief according to our knowledge first. In real-world scenarios, this can come in the form of expert opinion or a previous similar experiment that has been conducted.

Going back to the coin toss case, we know that p can only take values between 0 and 1, and we have some belief of a 50:50 chance of getting heads or tails. First, we need to find a distribution that helps us incorporate this belief. Let us use a Beta distribution in this case. Note that the most important thing about a prior distribution is its shape. The Beta distribution is a perfect choice here because its range of possible values is between 0 and 1, and we can alter its shape by changing the α and β parameters of the distribution. The Beta distribution is denoted Beta(α,β), and you can see on its Wikipedia page how different parameters affect the distribution's shape. I would like to use Beta(5,5) as my prior belief, which gives the distribution shape below.

p_prior = np.random.beta(5, 5, 100000)
ax = sns.distplot(p_prior)
ax.set(xlabel="p")

Why Beta(5,5)? As I said, the most important thing about this prior distribution is its shape. If you want a thinner shape, reflecting a stronger belief that p equals 0.5, you may choose Beta(100,100); for a weaker belief that p = 0.5, you may choose Beta(2,2). It really depends on how well the distribution reflects your prior belief.
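
To get a feel for this, here is a quick side-by-side of those three choices (an illustrative sketch, using the same plotting style as above):

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Compare how alpha = beta controls how tightly the prior concentrates around 0.5
for a in [2, 5, 100]:
    samples = np.random.beta(a, a, 100000)
    sns.distplot(samples, hist=False, label="Beta({0},{0})".format(a))
plt.xlabel("p")
plt.legend()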

Now let us see the difference between our previous setup and the current setup with the modified prior belief.


Previously

p_prior ~ Uniform(0,1)
data ~ Binomial(5, p)

Current Setup

p_prior ~ Beta(5,5)
data ~ Binomial(5, p)

With this configuration, we can redo the simulation as follows:

n = 5  # number of coin tosses
p_posterior = list()
for i in range(100000):
    p_prior = np.random.beta(5, 5, 1)[0]  # draw p from the Beta(5,5) prior
    n_success = np.random.binomial(n, p_prior, 1)[0]
    if n_success == 1:  # keep p only when the simulation matches the observed data
        p_posterior.append(p_prior)
ax = sns.distplot(p_posterior)
ax.set(xlabel="p")

See the difference in the result? Here, the resulting distribution is not really skewed towards 0.2; the peak of the distribution is still closer to 0.5 (compared with the previous result), but it is not exactly at 0.5 either; in fact it lies between 0.3 and 0.4. Since we have a prior belief of a 50:50 chance, our belief starts to change after looking at the data, but not as extremely as when we had zero assumptions about the value of p.

Some other example: linear regression

In this section, I will give you a simple example of how linear regression is performed within the Bayesian framework. I will only show you the setup, not actually implement it in code, in order to give you a better view of how to set up a problem in Bayesian inference.

By now you might have realized that in Bayesian inference the parameters themselves have distributions instead of single-number values. In the usual approach, we fit data to the equation y = ax + c to do linear regression. But in Bayesian statistics, we say y is the data, and the data is normally distributed with a mean value of ax + c. We can write this as follows:

y ~ Normal(μ, σ)
μ = ax + c

Notice that there are other parameters, namely a, c, and σ, for which we also need to define prior distributions. Since we don't have any prior information or knowledge, we can use non-informative priors in this case, so our final model becomes:

y ~ Normal(μ, σ)
μ = ax + c
a ~ Uniform(-inf, inf)
c ~ Uniform(-inf, inf)
σ ~ HalfUniform(0, inf)

The actual implementation of this model will not be covered in this example; I only want to show you how model construction works in Bayesian inference.
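
That said, for a sense of how this particular setup would translate into code, here is a rough PyMC3 sketch (illustrative only; the x and y arrays and the use of Flat/HalfFlat as the non-informative priors are my own assumptions, not part of the setup above):

import numpy as np
import pymc3 as pm

# made-up data purely for illustration
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + np.random.normal(0, 0.3, size=50)

with pm.Model() as linreg_model:
    a = pm.Flat("a")              # non-informative prior on the slope
    c = pm.Flat("c")              # non-informative prior on the intercept
    sigma = pm.HalfFlat("sigma")  # sigma must be positive
    mu = a * x + c
    y_obs = pm.Normal("y", mu=mu, sigma=sigma, observed=y)
    trace = pm.sample(2000, tune=1000)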

Practical implementation using PyMC3

One of the biggest problems with Bayesian inference is its expensive computational cost, which requires huge resources. In our previous case, it was possible to use a simulation procedure to get posterior samples because we only handled 5 observations with very limited parameters and processes. But with many observations, finding a set of simulation results that exactly match our observations would require a really long time and many, many iterations.

Fortunately, in this open-source era and with current computational resources, applying Bayesian inference has become much easier and more practical. For example, one of the most popular families of algorithms for obtaining posterior samples without this kind of brute-force simulation is MCMC (Markov chain Monte Carlo). Of course, if you would like to implement such an algorithm yourself, it takes time to get it right (and it will not be demonstrated in this article either). Instead, we can use a popular open-source package available in Python called PyMC3. Since our focus is practical inference, we can use this package directly, as the most important thing is the result. There are several assumptions and diagnostics that need to be understood before using this method for Bayesian inference, but that will be for another article (stay tuned!). As a closing example, I will show you how to implement our coin toss problem using the PyMC3 package below.


First, define our model

p ~ Beta(5,5)
observation/data ~ Bin(5,p)

Second, put it into code

import pymc3 as pm

data = np.array([1])  # 1 of the 5 tosses resulted in success (head)
with pm.Model() as coin_model:
    p = pm.Beta("p", 5, 5)
    obs = pm.Binomial("obs", n=5, p=p, observed=data)
    trace = pm.sample(10000, tune=2000, cores=4)
ax = sns.distplot(trace["p"])
ax.set(xlabel="p")
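
Here trace["p"] holds the posterior samples of p drawn by the sampler, so the plot is directly comparable to the Beta(5,5) simulation result from earlier.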

Of course, not all subjects related to Bayesian statistics can be discussed within one article. There are many other aspects, such as the various types of statistical distributions, diagnostics of MCMC results, etc. The whole point of this article is to understand the general idea of the Bayesian inference framework intuitively. If you are interested in learning further, below are several sources I recommend reading on this matter:

Book: Bayesian Methods for Hackers. A book about Bayesian inference from a practical point of view with PyMC. If you are a hacker, this book is really convenient to use, as it puts more weight on practical implementation. The book uses PyMC instead of PyMC3, but there is a converted PyMC3 version of the implementation on GitHub.

Book: Bayesian Analysis with Python, Second Edition. Covers the concepts and practical implementation using PyMC3.

Book: Statistical Rethinking. One of the most-used books for thoroughly understanding and starting Bayesian statistics. Implementation uses Stan and R.
