
Notes

Bayesian (hierarchical) modeling


Showcasing a framework for good modeling
for the INLA course in Iceland 2016

Haakon Chris. Bakka

Norwegian University of Science and Technology

Modeling: How and why?

Wide time estimate: 3h presentation, 1h reflecting on the exercises so far, 3h doing and reflecting on the remaining exercises.

Goal
Notes

The goal of this presentation is to answer a question that has no clear answer, and to explain why that is:

"What is a good model?"

And then to give a framework for how to create good models.


Introduction
Notes

All models are wrong

...but some are useful.

It cannot be answered
Notes

”What is a good model?”

- This is not a mathematical question
- What you want is subjective
- You cannot observe it
- There are so many different answers out there
It can be answered
Notes

”What is a good model?”

- Physics works really well
- Like cleaning dishes with an unclean rag
- People all around you have their own answer that they often don't communicate
- A decent, limited and temporary answer beats a non-answer

How does the answer look?


Notes

”What is a good model?”

- To a mathematical question, the answer looks nice, compact, and clear
- But this kind of transient answer has many facets and viewpoints
- Not too much philosophy
- Cannot focus on a single modeling problem
- Must take a step up: "How to do good modeling"
- We can borrow a lot from physics!
- We know rules are correct when they give correct conclusions
Variance of estimators
Notes
Example of a better-than-nothing answer that is still not so good

When you want to know the uncertainty of an estimate, one of the standard things you can do is to compute the variance of the estimator.

This is not what you want!

What you want to know is: "what other parameter values represent OK models of this data?"

Everyone likes their own framework/tools ...


Notes

...so why should you trust me?

You should not. Be critical!

If you disagree let me know...

Discuss with others, their opinions...

Create systematic and consistent rules for your own modeling!


Notes

Part 1: A very good answer

A framework for good modeling


Notes
Making models
1. Fully specified prior models π(y)
2. Simulating data should give somewhat reasonable datasets
3. The model should be self-consistent (separable explanations)

Drawing conclusions
1. Look at posteriors directly (not utility-based estimates)
2. Run several models, and see how conclusions change
3. Is the posterior conclusion already assumed in the prior model?
— The common approach by scientific R-INLA users!

Rejecting inferior models


1. How model borrows strength and extrapolates (interpretability)
2. Predictive measures of fit (interpolation, extrapolation, train/test)
3. Measures of fit (data consistent with distributional assumptions)
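Point 2 under "Making models" (simulated data should look somewhat reasonable) can be checked mechanically: draw whole datasets from the prior model and eyeball them. A minimal sketch in Python (the course itself uses R-INLA; the toy model and prior choices here are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def simulate_from_prior(n=50):
    """Draw one dataset from a fully specified prior model pi(y)."""
    sigma_eps = rng.exponential(scale=5.0)   # prior on the noise sd
    beta = rng.normal(0.0, np.sqrt(1000.0))  # a "vague" fixed-effect prior
    x = np.linspace(0.0, 1.0, n)             # a known covariate
    eta = beta * x                           # linear predictor
    return eta + rng.normal(0.0, sigma_eps, size=n)

# Simulate a handful of datasets; if their spreads are absurd on the
# scale of the real data, the prior model fails the check
datasets = [simulate_from_prior() for _ in range(5)]
spreads = [float(y.max() - y.min()) for y in datasets]
```

With the wide N(0, 1000) fixed-effect prior, many of these simulated datasets will look implausibly spread out, which is exactly the kind of diagnosis the check is meant to give.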
Above all...
Notes
this is about understanding what your model is doing

Waypoint
Notes

So far: Described a way to do good modeling
Presentation: I have selected the sub-topics I think are the most important
Next topic: Fully specified models π(y)
Notes

Part 2: A bit of history

Probability: Objective
Notes

"In 1654, [Pascal] corresponded with Pierre de Fermat on the subject of gambling problems [...]. The specific problem was that of two players who want to finish a game early and, given the current circumstances of the game, want to divide the stakes fairly, based on the chance each has of winning the game from that point."

Objective probability: relative frequency of occurrence (ontological, irreducible; aleatory uncertainty)
Probability: Objective
Notes
Examples

Action: You throw two dice, and take the maximum
Repeat: You repeat the action many, many times
Frequency: You note down the fraction of outcomes giving each value
Conclusion: When the fractions stop changing, you conclude they are the probabilities

If you assume "this" to be the (truly iid) data generating mechanism, then you must conclude that "...".
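The Action/Repeat/Frequency/Conclusion recipe can be run as a simulation. A small Python sketch for the maximum of two dice:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# The action: throw two dice and take the maximum; the repetition: n times
n = 100_000
rolls = rng.integers(1, 7, size=(n, 2))   # values 1..6
m = rolls.max(axis=1)

# The frequency: fraction of repetitions giving each value
freq = {k: float(np.mean(m == k)) for k in range(1, 7)}

# The conclusion: the fractions settle near P(max = k) = (2k - 1)/36
exact = {k: (2 * k - 1) / 36 for k in range(1, 7)}
```

For example, the fraction of sixes stabilises near 11/36 ≈ 0.306 as n grows.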

Probability: Subjective
Notes

"Before the middle of the seventeenth century, the term 'probable' (Latin probabilis) meant approvable, [...] opinion and to action. A probable action or opinion was one such as sensible people would undertake or hold, in the circumstances. However, in legal contexts especially, 'probable' could also apply to propositions for which there was good evidence."

Subjective probability, i.e. degree of belief (epistemological, reducible). Commonly used as betting odds. (Subjective uncertainty)
Probability: Subjective
Notes
Examples

Prior: You think that symbol on the blackboard, a, means either "becomes" or "is part of"
Data: You see it used two more times
Posterior: You update your belief about what the symbol means

If you believe "this" before you see data, you must believe "that" after you see the data.
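The "if you believe this, you must believe that" step is just Bayes' rule applied to the two hypotheses. A toy Python sketch, where the likelihood numbers are entirely made up for illustration:

```python
# Two hypotheses about the blackboard symbol, with a prior degree of belief
prior = {"becomes": 0.5, "is part of": 0.5}

# Assumed (made-up) likelihoods: the probability of the two observed
# usages under each hypothesis
likelihood = {"becomes": 0.9 * 0.8, "is part of": 0.2 * 0.3}

# Bayes' rule: posterior is proportional to prior times likelihood
unnorm = {h: prior[h] * likelihood[h] for h in prior}
z = sum(unnorm.values())
posterior = {h: p / z for h, p in unnorm.items()}
```

Given the prior, the likelihoods force the posterior; there is no further choice to make.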

Is there any method of inference consistent with both?


Notes

Yes, fully specified models π(y).

These are sometimes known as Bayesian models (when you do not claim prior knowledge).
Example model π(y)
Notes
The model fits in one slide! There is nothing hidden!

y_i | η_i ~ N(η_i, σ_ε²), independently
η_i = X_i β + u_i
u_i ~ N(0, σ_u² Q⁻¹)
β_j ~ N(0, 1000)
σ_ε ~ Exp(1/5)
σ_u ~ Exp(1)
r⁻¹ ~ Exp(5)

where Q is the precision matrix for a discretised Matérn field with range hyperparameter r, marginal standard deviation 1, and smoothness ν = 1. The X_i are known (or NA) vectors of covariate values.
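Because the model is fully specified, you can simulate data straight from it. A simplified Python sketch: the discretised Matérn field is replaced here by iid random effects to keep it short, and reading Exp(1/5) as rate 1/5 (mean 5) is an assumption about the notation.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 100

# Hyperpriors roughly as on the slide (parameterisation assumptions noted above)
sigma_eps = rng.exponential(scale=5.0)   # noise sd, Exp with mean 5
sigma_u = rng.exponential(scale=1.0)     # random-effect sd, Exp with mean 1
beta = rng.normal(0.0, np.sqrt(1000.0))  # fixed effect, beta ~ N(0, 1000)

# iid stand-in for the Matern random effect u
x = rng.uniform(0.0, 1.0, size=n)        # known covariate values
u = rng.normal(0.0, sigma_u, size=n)
eta = beta * x + u                       # linear predictor
y = rng.normal(eta, sigma_eps)           # y_i | eta_i ~ N(eta_i, sigma_eps^2)
```

Each run draws the hyperparameters first and then a dataset, i.e. a sample from π(y) itself.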

Notes

Part 3: What is the posterior?


Interpreting the Posterior
Notes
When you have a prior model π(y)

Say you observe first y1, then y2. Mathematics demands

π(y1, y2) = π(y1) π(y2 | y1).

- Prediction: π(y2 | y1)
- Posterior model: π(y_new | y1)
- The posterior can be summarised as those parameter values that are needed for π(y_new | y1), namely π(θ | y1)
- 95% credible intervals summarise those parameter values that are relevant
- 95% CI: values supported by the data

No choices to be made
Notes
The posterior is a direct result of the prior

1. Set up the prior model


2. No more choices to make
3. Get INLA/MCMC to compute the posterior
Testing
Notes
Credible intervals are constructed from posterior marginals

Claim: You can use credible intervals instead of hypothesis testing

Strengths of using credible intervals
- Sensitivity analysis (wrt other parameters' values)
- Sensitivity to other hypotheses
- Interpretation: the reasonable values under your assumptions
- Interpretation: plotting prior vs posterior
- Interpretation: the values supported by your dataset

Weaknesses of using credible intervals
- Sometimes people ask you for a test
- Some think they are subjective

Extract whatever you want!


Notes

- You can extract [thing] from your prior that you did not explicitly model!
- You do not need [thing] explicitly in your prior model to find it in the posterior

Example:
- A factor for car type
- Extract: the difference between two car types
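Extracting a derived quantity is just transforming the joint posterior samples. A hedged Python sketch, where made-up normal samples stand in for real INLA/MCMC output:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Stand-in joint posterior samples for two car-type effects; in practice
# these come from the fitted model (the numbers here are invented)
n_samples = 10_000
beta_type_a = rng.normal(1.2, 0.3, size=n_samples)
beta_type_b = rng.normal(0.5, 0.3, size=n_samples)

# The difference was never an explicit model parameter, but its
# posterior is just a transformation of the joint samples
diff = beta_type_a - beta_type_b
ci_low, ci_high = np.percentile(diff, [2.5, 97.5])
```

Any other functional of the parameters (ratios, maxima, predictions at new covariates) works the same way.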
Notes

Part 4: Drawing conclusions

Two ways of being incorrect


Notes
What do you want to know?
Notes

p_pregnant: Estimate 0.88, Interval [0.673, 0.98]

The posterior probability distribution of this person being pregnant!
- You can afterwards summarise however you want
- Interpretation for p: the fraction of people "like her" who are pregnant
- Summarising this removes so much information!
  - An estimate (minimising some loss function)
  - A category (pregnant/not)
  - A hypothesis test

Utility functions
Notes

For deciding what decision or categorisation to draw from a model, we need to know the Bayesian utility.
- How much do these errors cost compared to those?

Given a utility, you can compute point estimates, e.g. a classification. Utility is a useful concept to think about. But it is not really that useful to make a utility assumption when you do not have to.
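Given a utility, the point decision drops out mechanically: categorise so as to minimise expected loss under the posterior. A Python sketch with made-up posterior samples and made-up costs:

```python
import numpy as np

rng = np.random.default_rng(seed=11)

# Stand-in posterior samples of p (probability of being pregnant);
# Beta(8, 2) is invented for illustration, not fitted to anything
p = rng.beta(8, 2, size=50_000)

# Assumed utility: a wrong "pregnant" call costs 1 unit, a wrong
# "not pregnant" call costs 5 (these numbers are the analyst's choice)
cost_false_pregnant = 1.0
cost_false_not = 5.0

# Expected loss of each categorisation, averaged over the posterior
loss_say_pregnant = cost_false_pregnant * float(np.mean(1 - p))
loss_say_not = cost_false_not * float(np.mean(p))

decision = "pregnant" if loss_say_pregnant < loss_say_not else "not pregnant"
```

Changing the cost ratio changes the decision while the posterior stays fixed, which is exactly why it is better to report the posterior and leave the utility to whoever has to act on it.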
Notes

Part 5: Just because it is not right does not mean that it is wrong.

It could also be that it does not matter.

What is True? And what is relevant?


Notes
Thing Explainer

The foreword in Thing Explainer

Parallels:
- The world is round!
- Newton's equations and integrals

Nitpicky: if talking about Columbus. Not nitpicky: sending up a satellite. Not nitpicky: computing GPS.

Now: talk about what may not be relevant, in our practical examples.
Example
Notes
Prior for fixed effects

β₁ ~ N(0, 1000)

What if y is between -2 and 3, and x1 is between 0 and 100, in a linear model?
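One way to see the problem is to compute the prior's implied scale for the fitted values. A quick Python check:

```python
import numpy as np

# beta_1 ~ N(0, 1000): standard deviation sqrt(1000), about 31.6
sd_beta = float(np.sqrt(1000.0))

# At x1 = 100, the prior contribution x1 * beta_1 has standard deviation
sd_contribution = 100.0 * sd_beta   # about 3162

# ...while y itself only spans about 5 units (from -2 to 3). The
# "vague" prior is enormously wide on the scale of the data.
ratio = sd_contribution / 5.0
```

The prior puts almost all its mass on fitted values hundreds of times larger than anything the data could show, so it is not harmless vagueness on this covariate scale.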

Notes

Part X: Good and bad hyper-priors


Priors add useful information
Notes
A question for discussion

You suspect your friend has a cheating die, giving too many 6’s.
You plan to roll six hundred times to check. Set up any reasonable
model ⇡(y ).

A very bad prior for the size of a random effect

Notes
Inverse-Gamma prior on the variance
Also known as: Gamma prior on the precision parameter
Also known as: conjugate prior to the Gaussian distribution
Also known as: "I assume there is a significant spatial effect"
[Figure: two density plots implied by this prior. Left panel: density of the precision (x-axis 0–500). Right panel: density as a function of distance (x-axis 0–2).]

u ~ N(0, τ_u⁻¹ Σ) = N(0, σ_u² Σ)

Here, τ_u is the precision, and σ_u is the standard deviation.
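To see why this prior amounts to "I assume there is a significant spatial effect", simulate from it and look at the implied prior on the standard deviation. A Python sketch using one classic set of default numbers (illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(seed=5)

# A classic "vague" default: precision tau_u ~ Gamma(shape=1, rate=5e-5)
# (note numpy parameterises by scale = 1/rate)
tau = rng.gamma(shape=1.0, scale=1.0 / 5e-5, size=100_000)
sd = 1.0 / np.sqrt(tau)

# The implied prior on sigma_u puts essentially no mass near zero,
# i.e. it quietly rules out "no random effect"
frac_near_zero = float(np.mean(sd < 0.001))
```

A prior that cannot produce a small standard deviation has already decided the effect is there before seeing any data.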


PC priors: A good default solution
Notes

π(σ_u) = λ exp(−λ σ_u)

where you set the λ so that your median takes a reasonable value. Or, as in the paper, you set a different quantile. Or, you ensure the probability decay is reasonable.

Other parameters: find a parametrisation that is "like" the standard deviation (using information theory).
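Setting λ from a quantile statement is a one-liner. A small Python sketch (the particular U and α below are illustrative choices):

```python
import numpy as np

# PC prior on a standard deviation: pi(sigma) = lam * exp(-lam * sigma).
# Pick lam from a tail statement P(sigma > U) = alpha:
def pc_rate(U, alpha):
    return -np.log(alpha) / U

# Example statement: "I am 95% sure sigma is below 2"
lam = pc_rate(U=2.0, alpha=0.05)

# Sanity check: under Exp(lam), P(sigma > U) = exp(-lam * U)
prob_exceed = float(np.exp(-lam * 2.0))
```

The median version is the special case α = 0.5, giving λ = log(2)/median.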

- "Penalising model component complexity: A principled, practical approach to constructing priors" - Simpson et al.
- "Constructing Priors that Penalize the Complexity of Gaussian Random Fields" - Fuglstad et al.

Notes

Part X: You cannot just fit one model


You cannot just run a model and wash your hands
Notes
If people misinterpret your model, it is still partially your fault

People may one day lose their health, life or job based on one of your statistical analyses.
"I did a model and got p = 0.01"

- Translate mathematical assumptions into statements
- How do assumptions impact conclusions?
- Are there any natural interpretations of your answer?

M1: π(y) → π(θ | y = data) → Conclusion 1
M2: π(y) → π(θ | y = data) → Conclusion 2
M3: π(y) → π(θ | y = data) → Conclusion 3

When Reality is Incompatible


Notes
Example: The spatial range
Practical example 1
Notes
Extreme overfitting

See Wikipedia on overfitting

- Flat prior vs PC prior
- What distribution are you sampling curves from?

Practical example 2
Notes
Different priors for the range give different significance of random effects

For the smoothing of a time series, or for a spatial effect.

- Compare the posteriors of fixed effects
- Supporting/auxiliary hypotheses
- Run several models and reject the bad ones
Practical example 3
Notes
Prior-posterior plots

If the prior were "prior information", we would not do this!

- The prior is "prior assumptions"
- Is the posterior sensitive to this prior?
- We want to know whether we should run different models, or whether there will be no difference in the result

Exercises
Notes

Let us discuss the problems in exercise 1. What assumptions are being made, and what impact do they have?
Notes

Part X: Modeling considerations

Learning about different modeling issues.

You have to run many models


Notes

Researchers "get to know" their data after working with it and modeling it in different ways...

This is a good thing!

- You are only cheating if you remove good models from your analysis!
- Consider different observation likelihoods
- Consider the sampling scheme
- Consider how you borrow information/strength
- Consider where you assume additivity
- Consider the type of random effect
- Consider hyper-priors
From individual to population
Notes

Claim: You are trying to draw information from one individual to say something about the entire population
Consider: In what way are you borrowing information one-to-one and one-to-all?
Pooling param.: Hierarchical models have pooling parameters, which are very powerful tools
Simplify: What would you plot, illustrate or put in tables?
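The pooling idea fits in one formula: in a normal-normal hierarchy, the estimate of a group effect is a precision-weighted compromise between the group's own mean and the population mean. A Python sketch of this standard conjugate result (the numbers are illustrative):

```python
# Partial pooling: a group mean is shrunk toward the population mean,
# with the weight set by the within-group noise variance sigma2 and the
# pooling (between-group) variance tau2
def pooled_mean(ybar_group, n, mu_pop, sigma2, tau2):
    w = (n / sigma2) / (n / sigma2 + 1.0 / tau2)  # trust in the group's own data
    return w * ybar_group + (1.0 - w) * mu_pop

# A group with little data borrows strength from the whole population;
# a data-rich group mostly keeps its own mean
small_group = pooled_mean(ybar_group=5.0, n=2, mu_pop=0.0, sigma2=1.0, tau2=1.0)
big_group = pooled_mean(ybar_group=5.0, n=200, mu_pop=0.0, sigma2=1.0, tau2=1.0)
```

The pooling variance tau2 is itself a parameter with a posterior, which is what makes hierarchical models adapt how much they borrow.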

Consider the Observation process


Notes

The observation model: your probability of observing may depend on the location, group size, or other factors. "What you predict that you would observe" vs "what you predict is the true value".
Random samples
Notes

Random samples ...

... you absolutely need them ...

... but you very rarely have them!

"Truly random sampling is the only thing that can protect us from unknown unknowns." - Kristin Lennox

Example sampling issue and resolution


Notes
Track data of seals

Data gathered
- Go to some seal "homes" (haul-out sites) and tag seals
- Track their GPS position every 15 minutes

Goal
- Where (in space) are the seals?

Problem
- False: "The more you measure, the more you know"
- The seal is not representative of the population, and it becomes less representative the more data you have.

Solution
- Information is sub-linear in the number of measurements

Why can they do this? Because it gives sensible results.
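The "information is sub-linear" point can be made precise with the effective sample size under exchangeable correlation; a Python sketch (ρ = 0.5 is illustrative, and real GPS tracks have a more complicated correlation structure):

```python
import numpy as np

# One standard approximation: for n equally correlated observations
# with pairwise correlation rho, the effective sample size is
def effective_n(n, rho):
    return n / (1.0 + (n - 1.0) * rho)

# Many more fixes from the SAME seal add very little new information:
n = np.array([10.0, 100.0, 1000.0])
n_eff = effective_n(n, rho=0.5)
```

As n grows, n_eff plateaus near 1/ρ, so going from 100 to 1000 fixes on one animal buys almost nothing about the population.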
Notes
Information as a function of observations.

[Figure: information plotted against the number of observations (0–1000), growing sub-linearly.]

Conclusion
- Describing how you sample is more important than the exact level of credibility/confidence of your result

What information is in your samples


Notes

If you want both additive noise, additive local species interaction and local spatial density, how would you need to sample?
Consider the Likelihood
Notes

Example: the disease mapping in Brazil

- Extremely sensitive likelihood
- INLA problems with convergence
- INLA problems with matrices
- Part of the likelihood in the linear predictor
- Zero-inflated or not?

Thank you!
Notes

Thank you for your attention!
