Bayes Lecture Notes
Bayesian Analysis
Dr. Vilda Purutcuoglu
Edited by Anil A. Aksu, based on lecture notes of the STAT 565 course by Dr. Vilda Purutcuoglu
Contents
Decision Theory and Bayesian Analysis 1
Lecture 1. Bayesian Paradigm 5
1.1. Bayes theorem for distributions 5
1.2. How Bayesian Statistics Uses Bayes Theorem 6
1.3. Prior to Posterior 8
1.4. Triplot 8
Lecture 2. Some Common Probability Distributions 13
2.1. Posterior 15
2.2. Weak Prior 17
2.3. Sequential Updating 19
2.4. Normal Sample 19
2.5. NIC distributions 19
2.6. Posterior 20
2.7. Weak prior 21
Lecture 3. Inference 23
3.1. Shape 24
3.2. Visualizing multivariate densities 26
3.3. Informal Inferences 27
3.4. Multivariate inference 28
Lecture 4. Formal Inference 29
4.1. Utility and decisions 29
4.2. Formal Hypothesis Testing 30
4.3. Nuisance Parameter 31
4.4. Transformation 32
4.5. The prior distribution 32
4.6. Subjectivity 32
4.7. Noninformative Priors 33
4.8. Informative Priors 34
4.9. Prior Choices 34
Lecture 5. Structuring Prior Information 39
LECTURE 1
Bayesian Paradigm
(1.14) f (θ | X) = posterior.
(1.15) f (X | θ) = likelihood.
1.4. Triplot
If for any value of θ we have either f(θ) = 0 or f(x | θ) = 0, then we will also have f(θ | x) = 0. This is called the property of zero preservation. So if either
• the prior information says that this θ value is impossible, or
• the data say that this value of θ is impossible (because if it were the true value, then the observed data would have been impossible),
then the posterior distribution confirms that this value of θ is impossible.
Definition 1.16. Cromwell's Rule: if either information source completely rules out a specific θ, then the posterior must rule it out too.
[Figure 1.1: Triplot showing the prior, likelihood and posterior on the same axes.]
This means that we should be very careful about giving zero proba-
bility to something unless it is genuinely impossible. Once something
has zero probability then no amount of further evidence can cause it
to have a non-zero posterior probability.
• More generally, f(θ | x) will be low if either f(θ) or f(x | θ) is very small. We will tend to find that f(θ | x) is large when both f(θ) and f(x | θ) are relatively large, so that this θ value is given support by both information sources.
When θ is a scalar parameter, a useful diagram is the triplot, which shows the prior, likelihood and posterior on the same graph. An example is shown in Figure 1.1.
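A triplot like Figure 1.1 can be reproduced with a few lines of R. The sketch below assumes, purely for illustration, a N(0, 1) prior for µ and a normal likelihood with observed mean x̄ = 1 and standard error 0.7; none of these numbers come from the notes.

## Minimal triplot sketch (illustrative values): N(0,1) prior,
## likelihood centred at xbar = 1 with standard error 0.7.
theta <- seq(-4, 4, length = 201)
prior <- dnorm(theta, mean = 0, sd = 1)
likelihood <- dnorm(theta, mean = 1, sd = 0.7)
## posterior: precision-weighted combination of prior and likelihood
post.prec <- 1/1^2 + 1/0.7^2
post.mean <- (0/1^2 + (1/0.7^2) * 1)/post.prec
posterior <- dnorm(theta, mean = post.mean, sd = sqrt(1/post.prec))

plot(theta, posterior, type = "n", xlab = expression(theta), ylab = "density")
lines(theta, prior, lty = 2)
lines(theta, likelihood, lty = 3)
lines(theta, posterior, lty = 1, lwd = 2)
legend("topleft", legend = c("prior", "likelihood", "posterior"), lty = c(2, 3, 1))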
where a = nσ^{-2}/(nσ^{-2} + v^{-1}) and

(1.24) R = (n^{-1}σ² + v)^{-1} (x̄ − m)².

Therefore,

(1.25) f(µ | x) ∝ exp{−(µ − m*)²/(2v*)},

and we have shown that the posterior distribution is normal too: µ | x ∼ N(m*, v*), where
• m* is a weighted average of the prior mean m and the usual frequentist data-only estimate x̄, with weights proportional to the precisions nσ^{-2} (data) and v^{-1} (prior), so that m* = a x̄ + (1 − a)m.
• Bayes' theorem typically works in this way. We usually find that posterior estimates are compromises between prior estimates and data-based estimates, and tend to be closer to whichever information source is stronger. We also usually find that the posterior variance is smaller than the prior variance.
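As a small numerical illustration of this compromise, the weight a and the posterior mean and variance can be computed directly; all values below are made up, not taken from the notes.

## Illustrative normal-normal update (all numbers are made up).
m <- 0; v <- 4                       # prior mean and variance for mu
xbar <- 2.5; sigma2 <- 1; n <- 10    # data summary: mean, known variance, sample size

a <- (n/sigma2) / (n/sigma2 + 1/v)   # weight on the data estimate xbar
m.star <- a * xbar + (1 - a) * m     # posterior mean: compromise between xbar and m
v.star <- 1 / (n/sigma2 + 1/v)       # posterior variance: smaller than the prior variance v
c(m.star = m.star, v.star = v.star, prior.var = v)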
2.1. Posterior
Not only are the beta distributions the simplest and most convenient distributions for a random variable confined to [0, 1], they also work very nicely as prior distributions for a binomial observation. If x | θ ∼ Bin(n, θ), then
(2.22) f(x | θ) = \binom{n}{x} θ^x (1 − θ)^{n−x}   for x = 0, 1, 2, ..., n,
and if θ ∼ Be(p, q), then

(2.23) f(θ) = \frac{1}{Be(p, q)} θ^{p−1} (1 − θ)^{q−1},

where 0 ≤ θ ≤ 1 and p, q > 0.
(2.24) f(x) = ∫ f(θ) f(x | θ) dθ = \binom{n}{x} \frac{1}{Be(p, q)} ∫_0^1 θ^{p+x−1} (1 − θ)^{q+n−x−1} dθ = \binom{n}{x} \frac{Be(p + x, q + n − x)}{Be(p, q)}.
From
f (θ)f (x | θ)
(2.25) f (θ | x) = .
f (x)
(2.26) f(θ | x) = \frac{θ^{p+x−1} (1 − θ)^{q+n−x−1}}{Be(p + x, q + n − x)} ∝ θ^{p−1}(1 − θ)^{q−1} · θ^x(1 − θ)^{n−x},

where the first factor is the beta (prior) part and the second is the binomial (likelihood) part.
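A quick numerical check of this conjugate update in R, with illustrative values p = 2, q = 3, n = 20 and x = 7 (these are not from the notes):

## Beta(p, q) prior combined with a Binomial(n, theta) observation x
## gives a Beta(p + x, q + n - x) posterior (illustrative numbers).
p <- 2; q <- 3; n <- 20; x <- 7
theta <- seq(0, 1, length = 201)
prior <- dbeta(theta, p, q)
posterior <- dbeta(theta, p + x, q + n - x)
likelihood <- dbinom(x, n, theta)
## rescale the likelihood so it is visible on the same axes
likelihood <- likelihood / max(likelihood) * max(posterior)

plot(theta, posterior, type = "l", xlab = expression(theta), ylab = "density")
lines(theta, prior, lty = 2)
lines(theta, likelihood, lty = 3)
legend("topright", legend = c("posterior", "prior", "scaled likelihood"), lty = c(1, 2, 3))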
[Figures: two triplots showing the prior, likelihood and posterior on common axes.]
2.6. Posterior
Supposing then that the prior distribution is NIC(m, v, a, d), we find

(2.34) f(µ, σ² | x) ∝ σ^{−(d+n+3)} exp{−Q/(2σ²)},

where Q denotes the quadratic form in µ obtained by combining the prior with the likelihood.
[Figure: posterior density f(θ | x) over 0 ≤ θ ≤ 20, titled "Posterior Distribution".]
[Figures: a perspective plot of the surface Sinc(r) over X and Y, a filled contour plot, and a univariate density plot, illustrating ways of visualizing posterior densities.]
3.1. Shape
In general, plots illustrate the shape of the posterior distribution. Important features of shape are modes (and antimodes), skewness and kurtosis (peakedness or heavy tails). Quantitative summaries of shape, such as the locations of modes and antimodes, are needed to supplement the plots. The first task is to identify the turning points of the density, i.e. the solutions of f′(θ) = 0. Such points include the local maxima and minima of f(θ), which we call modes and antimodes, respectively.
For the gamma density f(θ) = (a^b/Γ(b)) θ^{b−1} e^{−aθ} (shape b, rate a), differentiating twice gives

(3.3) f″(θ) = \frac{a^b}{Γ(b)} [a²θ² − 2a(b − 1)θ + (b − 1)(b − 2)] θ^{b−3} e^{−aθ}.
So from f′(θ), there is a turning point at θ = (b − 1)/a. For b ≤ 1, f′(θ) < 0 for all θ ≥ 0, so f(θ) is monotonic decreasing and the mode is at θ = 0. For b > 1, f(θ) → 0 as θ → 0, so θ = 0 is not a mode. In this case, f′(θ) > 0 for θ < (b − 1)/a and f′(θ) < 0 for θ > (b − 1)/a. Therefore θ = (b − 1)/a is the mode. Looking at f″(θ), the quadratic expression has roots at θ = (b − 1)/a ∓ (b − 1)^{1/2}/a. Therefore, for b > 1, these are the points of inflection.
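These results are easy to check numerically; the sketch below uses the illustrative values a = 2 and b = 5 (not from the notes):

## Numerical check of the gamma mode and inflection points for a = 2, b = 5.
a <- 2; b <- 5
f <- function(theta) dgamma(theta, shape = b, rate = a)

## analytic mode and inflection points
mode.analytic <- (b - 1)/a
inflect <- (b - 1)/a + c(-1, 1) * sqrt(b - 1)/a

## numerical mode via optimize()
mode.numeric <- optimize(f, interval = c(0, 10), maximum = TRUE)$maximum
c(mode.analytic = mode.analytic, mode.numeric = mode.numeric)
inflect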
Example 3.1.2. Consider the mixture of two normal distributions

(3.4) f(θ) = \frac{0.8}{\sqrt{2π}} exp(−θ²/2) + \frac{0.2}{\sqrt{2π}} exp(−(θ − 4)²/2),

i.e. an N(0, 1) component with weight 0.8 and an N(4, 1) component with weight 0.2. Then

(3.5) f′(θ) = −\frac{0.8θ}{\sqrt{2π}} exp(−θ²/2) − \frac{0.2(θ − 4)}{\sqrt{2π}} exp(−(θ − 4)²/2).
We find f′(θ) > 0 for θ ≤ 0 and f′(θ) < 0 for θ ≥ 4; the turning points are at θ = 0.00034, 2.46498 and 3.9945.
(3.6) f″(θ) = \frac{0.8(θ² − 1)}{\sqrt{2π}} exp(−θ²/2) + \frac{0.2(θ² − 8θ + 15)}{\sqrt{2π}} exp(−(θ − 4)²/2).
This is positive for θ ≤ −1, for 1 ≤ θ ≤ 3 and θ ≥ 5, confirming
that the middle turning point is an antimode. Calculating f 00 (θ) at the
other points confirms them to be modes. Finally points of inflection
are at θ = −0.99998, θ = 0.98254, θ = 3.17903, θ = 4.99971.
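The quoted turning points can be recovered numerically by locating sign changes of f′(θ), for example:

## Numerical turning points of the 0.8 N(0,1) + 0.2 N(4,1) mixture.
fprime <- function(theta) {
  -0.8 * theta * dnorm(theta) - 0.2 * (theta - 4) * dnorm(theta, mean = 4)
}
## bracket each root (intervals chosen from a plot of f') and solve
roots <- sapply(list(c(-1, 1), c(1, 3), c(3, 5)),
                function(br) uniroot(fprime, interval = br)$root)
roots   # approximately 0.00034, 2.46498, 3.99450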
[Figures: two density plots of f(θ | x), one over −4 ≤ θ ≤ 6 and one over 0 ≤ θ ≤ 20.]
(4.5) L(d₁, θ) = k if θ ∈ A, and L(d₁, θ) = 0 if θ ∉ A,

where k defines the seriousness of the first kind of error relative to the second. Then E_θ(L(d₀, θ)) = P(θ ∉ A | x), while E_θ(L(d₁, θ)) = kP(θ ∈ A | x). The optimal decision is to select d₀ (say that H is true) if its probability P(θ ∈ A | x) exceeds 1/(k + 1). The greater the relative seriousness k of the first kind of error, the more willing we are to accept H.
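In code the rule is a one-liner; the probabilities and loss ratio below are illustrative only:

## Accept H (decision d0) when P(theta in A | x) exceeds 1/(k + 1).
decide <- function(p.A, k) if (p.A > 1/(k + 1)) "d0: accept H" else "d1: reject H"
decide(p.A = 0.30, k = 4)   # threshold is 1/5 = 0.2, so accept H
decide(p.A = 0.10, k = 4)   # below the threshold, so reject H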
4.4. Transformation
If θ̂ is an estimate of θ, is g(θ̂) the appropriate estimate of φ?
This depends on the kind of inference being made. In the particular case of point estimation, the posterior mean is not invariant in this way.
Example 4.4.1. If φ = θ², then
(4.6) E(φ | x) = E(θ² | x) = Var(θ | x) + E(θ | x)² ≥ E(θ | x)².
The mode is not invariant to transformations, but the median is invariant, at least to 1-1 transformations.
Interval estimates are also invariant to 1-1 transformations, in the sense that if [a, b] is a 90% interval for θ, then [g(a), g(b)] is a 90% interval for φ if g is a monotone increasing function. But if [a, b] is a 90% HPD interval for θ, is [g(a), g(b)] an HPD interval for φ?
4.6. Subjectivity
The main criticism of Bayesian methods is the subjectivity introduced by the prior density.
If the data are sufficiently strong, the remaining element of personal judgement will not matter, because all priors based on reasonable interpretations of the prior information will lead to effectively the same posterior inferences. Then we can claim robust conclusions on the basis of the synthesis of prior information and data.
If the data are not that strong, then we do not yet have enough
scientific evidence to reach an objective conclusion. Any method which
(4.10) log f(x | θ) = −n(x̄ − θ)²/(2v) + const.

(4.11) \frac{d²}{dθ²} log f(x | θ) = −n/v.

Therefore,

(4.12) I(θ) = −E\left(\frac{d²}{dθ²} log f(x | θ)\right) = n/v.

As a result,

(4.13) f₀(θ) ∝ \sqrt{I(θ)} = \sqrt{n/v}, i.e. a flat (uniform) prior for θ.
Example 4.9.2. If x₁, x₂, ..., xₙ are distributed as N(µ, σ²) with θ = (µ, σ²), then

(4.14) f(x | θ) ∝ σ^{−n} exp{−n[s + (x̄ − µ)²]/(2σ²)},

where s = Σᵢ(xᵢ − x̄)²/n. Then what is the Jeffreys prior f(µ, σ²)?
Solution:

(4.16) log f(x | θ) = −n log σ − n[s + (x̄ − µ)²]/(2σ²) + const.
A number of objections can be made to the Jeffreys prior, the most important of which is that it depends on the form of the data. The prior distribution should only represent the prior information, and should not be influenced by what data are to be collected.
(5.1) xi | µ ∼ N (µ, 1)
(5.12) f (θ, φ) =
(5.13) f (θ, φ | x) ∝
(5.14) f (θ | x) =
Shrinkage: the posterior distributions and posterior estimates of these parameters will generally be closer together than the corresponding data-only estimates; this phenomenon is known as "shrinkage". Let w = (ξ, σ², τ²) and µ = (µ₁, µ₂, ..., µ_k), and
(5.15)
f(µ | x, w) ∝ f(x | µ, σ²) f(µ | w)
            = Π_{i=1}^{k} \left[ Π_{j=1}^{n_i} \frac{1}{\sqrt{2π}σ} exp\left\{−\frac{(y_{ij} − µ_i)²}{2σ²}\right\} \right] exp\left\{−\frac{(µ_i − ξ)²}{2τ²}\right\}
            = Π_{i=1}^{k} f(µ_i | x, w).
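Each factor above is a normal density in µ_i whose mean is a precision-weighted average of the group mean ȳ_i and ξ, which is exactly where the shrinkage comes from. A minimal sketch with simulated data (the group sizes and the values of ξ, σ² and τ² are illustrative assumptions, not from the notes):

## Shrinkage of the conditional posterior means of mu_i toward xi
## (illustrative values; w = (xi, sigma2, tau2) treated as known).
set.seed(1)
k <- 5; ni <- 4              # 5 groups, 4 observations each
xi <- 0; sigma2 <- 1; tau2 <- 0.5
mu.true <- rnorm(k, xi, sqrt(tau2))
y <- matrix(rnorm(k * ni, rep(mu.true, each = ni), sqrt(sigma2)), nrow = k, byrow = TRUE)

ybar <- rowMeans(y)                            # data-only estimates of mu_i
w.data <- (ni/sigma2) / (ni/sigma2 + 1/tau2)   # weight on the data
post.mean <- w.data * ybar + (1 - w.data) * xi # conditional posterior means

cbind(ybar = ybar, post.mean = post.mean)      # posterior means pulled toward xi = 0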
6.2. Identifiability
Let θ = (g(θ), h(θ)) and suppose that f(x | θ) depends only on g(θ). Then

(6.8)
f(θ | x) ∝ f(x | θ) f(θ) = f(x | g(θ)) f(θ)
         = f(x | g(θ)) f(g(θ)) f(h(θ) | g(θ))
         ∝ f(g(θ) | x) f(h(θ) | g(θ)).

This says that the posterior distribution of θ is made up of the posterior distribution of g(θ) and the prior conditional distribution of h(θ) given g(θ). So the conditional posterior distribution of h(θ) given g(θ) is the same as the prior. That is,

(6.9) f(h(θ) | x, g(θ)) = f(h(θ) | g(θ)).

We say that h(θ) is not identifiable from these data. No matter how much data we get, we cannot learn exactly what h(θ) is. With sufficient data we can learn g(θ), but not h(θ).
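A concrete illustration (added here, not from the notes): let θ = (θ₁, θ₂) with independent priors θ₁, θ₂ ∼ N(0, 1), and suppose x₁, ..., xₙ | θ ∼ N(θ₁ + θ₂, 1), so the data depend on θ only through g(θ) = θ₁ + θ₂. The posterior of θ₁ + θ₂ concentrates around x̄ as n grows, but

θ₁ − θ₂ | x, g(θ) has the same distribution as θ₁ − θ₂ | g(θ), namely N(0, 2),

since the sum and the difference of two independent N(0, 1) variables are independent. So h(θ) = θ₁ − θ₂ keeps its prior conditional distribution however much data we observe: it is not identifiable.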
Let the starting point be S₀ = [0.5 0.5]. To get the first state,
(7.5) S₁ =
To get the second state,
(7.6) S₂ =
(7.7) S₃ =
(7.8) S₄ =
So the choice proportions are converging to [0.75 0.25]: the transition matrix is pushing the proportions toward a steady state, or stationary distribution. Once we reach this distribution, all future state proportions stay the same; that is, the distribution is stationary.
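The transition matrix is not reproduced above, so the sketch below uses a hypothetical 2-state matrix P whose stationary distribution happens to be [0.75 0.25], and iterates S_{t+1} = S_t P to show the proportions converging:

## Iterating a hypothetical 2-state transition matrix whose stationary
## distribution is (0.75, 0.25); this matrix is NOT the one from the notes.
P <- rbind(c(0.9, 0.1),
           c(0.3, 0.7))
S <- c(0.5, 0.5)                   # starting proportions S0
for (t in 1:10) {
  S <- as.vector(S %*% P)          # S_{t+1} = S_t P
  cat("t =", t, " S =", round(S, 4), "\n")
}
S                                   # close to the stationary distribution (0.75, 0.25)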
(7.10) p^{m₁+m₂}(x, y) = ∫_{range of z} p^{m₁}(x, z) p^{m₂}(z, y) dz   (continuous case).
(7.15) π^m(θ_j) =

(7.17) ∫ π^t(θ_i) p(θ_i, θ_j) dθ_i = π^{t+1}(θ_j)   (continuous case)
[Figure: L(θ)π(θ) plotted together with a scaled density g(θ).]
Example 7.5.1.
So

λ | φ, k ∼ Gamma(α + Σ_{i=1}^{k} y_i, β + k),

φ | λ, k ∼ Gamma(γ + Σ_{i=k+1}^{n} y_i, δ + n − k).
(8.8)
f(y | k, λ, φ) = \left(Π_{i=1}^{n} \frac{1}{y_i!}\right) e^{k(φ−λ)} e^{−nφ} λ^{Σ_{i=1}^{k} y_i} Π_{i=k+1}^{n} φ^{y_i}
             = \left(Π_{i=1}^{n} \frac{e^{−φ} φ^{y_i}}{y_i!}\right) e^{k(φ−λ)} (λ/φ)^{Σ_{i=1}^{k} y_i}
             = f(y | φ) L(k, λ | y, φ),

so the likelihood factorises into a term that does not involve (k, λ) and a term that carries all the dependence on (k, λ).
Example 8.1.3. The time series stored in the file gives the number of British coal-mining disasters per year over the period 1851–1962.
[Figure: yearly number of coal-mining disasters plotted against year.]
There has been a reduction in the rate of disasters over the period. Let y_i denote the number of disasters in year i = 1, ..., n (relabelling the years by the numbers 1 to n = 112). A model that has been proposed in the literature has the form:
yi ∼ P oisson(θ), i = 1, ..., k
yi ∼ P oisson(λ), i = k + 1, ..., n
Let
θ ∼ Gamma(a1 , b1 )
λ ∼ Gamma(a2 , b2 )
k ∼ discrete uniform over {1, ..., n}
b1 ∼ Gamma(c1 , d1 )
b2 ∼ Gamma(c2 , d2 )
So

π(θ, λ, k, b1, b2 | y) ∝ e^{−kθ} θ^{Σ_{i=1}^{k} y_i} e^{−(n−k)λ} λ^{Σ_{i=k+1}^{n} y_i} b1^{a1} θ^{a1−1} e^{−b1θ}
    × b1^{c1−1} e^{−d1 b1} b2^{a2} λ^{a2−1} e^{−b2λ} b2^{c2−1} e^{−d2 b2} I[k ∈ {1, 2, ..., n}].
θ | y, λ, b1, b2, k ∼ Gamma(a1 + Σ_{i=1}^{k} y_i, b1 + k)

λ | y, θ, b1, b2, k ∼ Gamma(a2 + Σ_{i=k+1}^{n} y_i, b2 + n − k)

b1 | y, θ, λ, b2, k ∼ Gamma(c1 + a1, d1 + θ)

b2 | y, θ, λ, b1, k ∼ Gamma(c2 + a2, d2 + λ)
and

p(k | y, θ, λ, b1, b2) = \frac{e^{(λ−θ)k} (θ/λ)^{Σ_{i=1}^{k} y_i} I[k ∈ {1, 2, ..., n}]}{Σ_{j=1}^{n} e^{(λ−θ)j} (θ/λ)^{Σ_{i=1}^{j} y_i}}.
Burn-in period: The observations obtained after the chain has settled down to the posterior will be more useful in estimating probabilities and expectations under the posterior. If we throw out the early observations,
taken while the process was settling down, the remainder of the process
should be a very close approximation to one in which every observa-
tion is sampled from the posterior. Dropping the early observations is
referred to as using a burn-in period.
Thinning is a process used to make the observations more nearly independent, and hence more nearly a random sample from the posterior distribution. Frankly, after a burn-in, there is not much point in thinning unless the correlations are extremely large. If there is a lot of correlation between adjacent observations, a larger overall MC sample size is needed to achieve reasonable numerical accuracy, in addition to needing a much larger burn-in.
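Both operations are simple indexing on the stored chain. A sketch with a toy autocorrelated series standing in for MCMC output (the burn-in length and thinning interval are illustrative):

## Discard a burn-in period and optionally thin an MCMC output vector.
## 'chain' here is a toy AR(1) series standing in for sampler output.
set.seed(42)
n.iter <- 5000
chain <- numeric(n.iter)
for (t in 2:n.iter) chain[t] <- 0.9 * chain[t - 1] + rnorm(1)  # correlated draws

burnin <- 1000
thin   <- 10
kept <- chain[-(1:burnin)]                      # drop the burn-in period
kept <- kept[seq(1, length(kept), by = thin)]   # keep every 'thin'-th draw

acf(chain, plot = FALSE)$acf[2]   # lag-1 autocorrelation before thinning
acf(kept,  plot = FALSE)$acf[2]   # much smaller after burn-in and thinning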
LECTURE 9
Summary of the properties of the Gibbs sampler
(a) Since the Gibbs sampler conditions on values from the last
iteration of its chain values, it clearly constitutes a Markov
chain.
(b) The Gibbs sampler has the true posterior distribution of pa-
rameter vector as its limiting distribution.
(c) The Gibbs sampler is a homogeneous Markov chain: its transition probabilities do not depend on n, the current length of the chain.
(d) The Gibbs sampler converges at a geometric rate: the total variation distance between the distribution of the chain at time t and the limiting distribution decreases at a geometric rate in t.
(e) The Gibbs sampler is an ergodic Markov chain.
(9.1) α(θ*, θ^k) = min\left(1, \frac{p(θ*)\, h(θ^k | θ*)}{p(θ^k)\, h(θ* | θ^k)}\right) = α.
(9.6) h^{Gibbs}_{j,t}(θ* | θ^{t−1}) = p(θ*_j | θ^{t−1}_{−j}, y) if θ*_{−j} = θ^{t−1}_{−j}, and 0 otherwise.

(9.7)
α = \frac{p(θ* | y)/h^{Gibbs}_{j,t}(θ* | θ^{t−1})}{p(θ^{t−1} | y)/h^{Gibbs}_{j,t}(θ^{t−1} | θ*)}
  = \frac{p(θ* | y)/p(θ*_j | θ^{t−1}_{−j}, y)}{p(θ^{t−1} | y)/p(θ^{t−1}_j | θ^{t−1}_{−j}, y)}
  = \frac{p(θ^{t−1}_{−j} | y)}{p(θ^{t−1}_{−j} | y)}
  = 1.
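For contrast, the sketch below shows a generic Metropolis-Hastings step with a symmetric random-walk proposal, where the proposal densities cancel and α reduces to the ratio of target densities; the standard normal target is purely illustrative.

## Generic Metropolis-Hastings with a symmetric random-walk proposal;
## the target is an illustrative standard normal density.
set.seed(1)
log.p <- function(theta) dnorm(theta, log = TRUE)

n.iter <- 5000
theta <- numeric(n.iter)
for (t in 2:n.iter) {
  prop <- theta[t - 1] + rnorm(1, sd = 1)        # symmetric proposal h
  log.alpha <- log.p(prop) - log.p(theta[t - 1]) # log acceptance ratio
  if (log(runif(1)) < log.alpha) theta[t] <- prop else theta[t] <- theta[t - 1]
}
mean(theta); var(theta)   # roughly 0 and 1, as expected for N(0, 1)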
To complete the Gibbs sampler, we also need to sample from the conditional posterior distribution of the missing observation y_i^*. This has density

(9.13)
f(y_i^* | y, θ) ∝ f(y, y_i^* | θ)
              ∝ [(y_i^* − y_{i−1})(1 − (y_i^* − y_{i−1}))(y_{i+1} − y_i^*)(1 − (y_{i+1} − y_i^*))]^{θ−1}

on the region y_i^* ∈ (y_{i−1}, y_{i−1} + 1) ∩ (y_{i+1} − 1, y_{i+1}). This sampling can be carried out by rejection sampling.
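A sketch of that rejection step in R, treating θ and the two observed neighbours as given (the numerical values are illustrative and assume θ > 1, so the conditional density is bounded): propose y_i^* uniformly on the allowed interval and accept with probability g(y_i^*)/M, where M bounds the unnormalised density on that interval.

## Rejection sampling for the missing y_i^* given its neighbours and theta
## (illustrative values; assumes theta > 1 so the density is bounded).
theta <- 3; y.prev <- 0.40; y.next <- 1.10

lower <- max(y.prev, y.next - 1)      # intersection of the two unit intervals
upper <- min(y.prev + 1, y.next)

g <- function(y) {                    # unnormalised conditional density
  ((y - y.prev) * (1 - (y - y.prev)) * (y.next - y) * (1 - (y.next - y)))^(theta - 1)
}
M <- optimize(g, interval = c(lower, upper), maximum = TRUE)$objective

draw <- function() {
  repeat {
    y <- runif(1, lower, upper)
    if (runif(1) < g(y)/M) return(y)
  }
}
ystar <- replicate(1000, draw())
range(ystar)   # all draws lie inside (lower, upper)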
GIBBS SAMPLING
yi ∼ P oisson(θ), i = 1, ..., k
yi ∼ P oisson(λ), i = k + 1, ..., n
θ ∼ Gamma(a1 , b1 )
λ ∼ Gamma(a2 , b2 )
k ∼ discrete uniform over {1, ..., n}
b1 ∼ Gamma(c1 , d1 )
b2 ∼ Gamma(c2 , d2 )
Likelihoods:

f(y_I | θ) = Π_{i=1}^{k} f(y_i | θ),   f(y_J | λ) = Π_{j=k+1}^{n} f(y_j | λ),

with

Π_{i=1}^{k} \frac{e^{−θ} θ^{y_i}}{y_i!} = \frac{e^{−kθ} θ^{Σ_{i=1}^{k} y_i}}{Π_{i=1}^{k} y_i!},

Π_{j=k+1}^{n} \frac{e^{−λ} λ^{y_j}}{y_j!} = \frac{e^{−(n−k)λ} λ^{Σ_{j=k+1}^{n} y_j}}{Π_{j=k+1}^{n} y_j!}.
Priors (shape-rate parameterisation):

π(θ) = \frac{b1^{a1}}{Γ(a1)} θ^{a1−1} e^{−b1 θ}

π(λ) = \frac{b2^{a2}}{Γ(a2)} λ^{a2−1} e^{−b2 λ}

π(b1) = \frac{d1^{c1}}{Γ(c1)} b1^{c1−1} e^{−d1 b1}

π(b2) = \frac{d2^{c2}}{Γ(c2)} b2^{c2−1} e^{−d2 b2}
Since the Gamma(α, β) density in this shape-rate parameterisation is

g(x; α, β) = \frac{β^α}{Γ(α)} x^{α−1} e^{−βx} for x > 0, and 0 elsewhere,

each full conditional below can be recognised as a Gamma kernel.
Explicitly,

π(θ, λ, k, b1, b2 | y) ∝ (e^{−kθ} θ^{Σ_{i=1}^{k} y_i})(e^{−(n−k)λ} λ^{Σ_{i=k+1}^{n} y_i})(d1^{c1} b1^{c1−1} e^{−d1 b1})
    × (d2^{c2} b2^{c2−1} e^{−d2 b2}) I[k ∈ {1, 2, ..., n}] (b2^{a2} λ^{a2−1} e^{−b2 λ})(b1^{a1} θ^{a1−1} e^{−b1 θ}).
(b)
π(λ | y, θ, k, b1, b2) ∝ π(θ, λ, k, b1, b2 | y)
                       ∝ e^{−(n−k)λ} λ^{Σ_{i=k+1}^{n} y_i} λ^{a2−1} e^{−b2 λ}
                       = λ^{a2 + Σ_{i=k+1}^{n} y_i − 1} e^{−(b2 + n − k)λ},
which is the kernel of Gamma(a2 + Σ_{i=k+1}^{n} y_i, b2 + (n − k)).
(c)
π(b1 | y, θ, λ, b2, k) ∝ π(θ, λ, k, b1, b2 | y)
                       ∝ b1^{c1−1} e^{−d1 b1} · b1^{a1} e^{−b1 θ}
                       = b1^{c1+a1−1} e^{−(d1+θ) b1},
which is the kernel of Gamma(a1 + c1, d1 + θ).
(d)
π(b2 | y, θ, λ, b1, k) ∝ π(θ, λ, k, b1, b2 | y)
                       ∝ b2^{c2−1} e^{−d2 b2} · b2^{a2} e^{−b2 λ}
                       = b2^{c2+a2−1} e^{−(d2+λ) b2},
which is the kernel of Gamma(a2 + c2, d2 + λ).
(e)
π(k | y, θ, λ, b1, b2) ∝ π(θ, λ, k, b1, b2 | y)
                       ∝ e^{(λ−θ)k} (θ/λ)^{Σ_{i=1}^{k} y_i} I[k ∈ {1, 2, ..., n}],
so, normalising over k,
π(k | y, θ, λ, b1, b2) = \frac{e^{(λ−θ)k} (θ/λ)^{Σ_{i=1}^{k} y_i}}{Σ_{j=1}^{n} e^{(λ−θ)j} (θ/λ)^{Σ_{i=1}^{j} y_i}},   k ∈ {1, 2, ..., n}.
As the conditional distribution of k is discrete, it is characterised by a probability mass function.
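Putting the full conditionals together gives a Gibbs sampler along the following lines. This is a sketch: the counts y are simulated as a stand-in for the disaster data, the hyperparameter values are illustrative, and all Gamma distributions are in the shape-rate parameterisation used above.

## Gibbs sampler for the Poisson change-point model using the full
## conditionals derived above; y is simulated stand-in data.
set.seed(123)
n <- 112
y <- c(rpois(40, 3), rpois(n - 40, 1))     # stand-in for the disaster counts

a1 <- a2 <- 0.5; c1 <- c2 <- 1; d1 <- d2 <- 1
n.iter <- 5000
out <- matrix(NA, n.iter, 5, dimnames = list(NULL, c("theta", "lambda", "k", "b1", "b2")))

theta <- lambda <- 1; k <- n %/% 2; b1 <- b2 <- 1
for (it in 1:n.iter) {
  S.k <- sum(y[1:k])                                    # sum of y_1..y_k
  theta  <- rgamma(1, a1 + S.k,          rate = b1 + k)
  lambda <- rgamma(1, a2 + sum(y) - S.k, rate = b2 + n - k)
  b1 <- rgamma(1, c1 + a1, rate = d1 + theta)
  b2 <- rgamma(1, c2 + a2, rate = d2 + lambda)

  ## discrete conditional for k, computed on the log scale for stability
  cs <- cumsum(y)
  log.w <- (lambda - theta) * (1:n) + log(theta/lambda) * cs
  w <- exp(log.w - max(log.w))
  k <- sample(1:n, 1, prob = w)

  out[it, ] <- c(theta, lambda, k, b1, b2)
}
colMeans(out[-(1:1000), ])                            # posterior means after burn-in
sort(table(out[-(1:1000), "k"]), decreasing = TRUE)[1:5]  # most probable change points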
DATA AUGMENTATION
Suppose Y0, Y1, ..., Yn is a time series of random variables defined by Y0 = 0 and, for i = 1, ..., n, Y_i = Y_{i−1} + S_i, where S_i ∼ Beta(θ, θ) with θ > 0. Therefore Y_i | Y0, Y1, ..., Y_{i−1} ∼ Y_{i−1} + S_i.
The likelihood of θ for the observations (y0, y1, ..., yn) is

f(y0, ..., yn | θ) = f(y0 | θ) Π_{i=1}^{n} f(y_i | y0, ..., y_{i−1}, θ) = Π_{i=1}^{n} f(y_i | y_{i−1}, θ)
                  = Π_{i=1}^{n} \frac{Γ(2θ)}{\{Γ(θ)\}²} (y_i − y_{i−1})^{θ−1} \{1 − (y_i − y_{i−1})\}^{θ−1} I[0 < y_i − y_{i−1} < 1].
However, suppose that an observation y_i^* is missing; the likelihood is then no longer available in closed form. As another example, suppose that Y1, ..., Yn are iid data from the mixture density

f(y_i | θ) = \frac{1}{2} \frac{1}{(2π)^{1/2}} e^{−y_i²/2} + \frac{1}{2} \frac{1}{(2π)^{1/2}} e^{−(y_i−θ)²/2},

so that

L(θ | y) = Π_{i=1}^{n} f(y_i | θ) ∝ Π_{i=1}^{n} \left(e^{−y_i²/2} + e^{−(y_i−θ)²/2}\right).
Let
• z = the additional variables included in the model (z may be just a single variable or a vector containing several variables);
• θ = the original parameters in the model, with prior π(θ);
• y = the vector of observations.
So the posterior distribution of (θ, z) satisfies
π(θ, z | y) ∝ f(y, z | θ) π(θ).
Data augmentation proceeds by carrying out Gibbs sampling, sampling successively from θ and z to produce a sample from this joint distribution. The marginal distribution of θ is therefore the posterior distribution of interest.
####################################################
#                                                  #
#   Posterior, Perspective and Contour plots       #
#   by Anil Aksu                                   #
#                                                  #
####################################################

## the range of theta values for plotting
x = seq(0, 20, length = 101)
## posterior density evaluated on the grid: normal with mean 7, sd 1.5
posterior = dnorm(x, mean = 7, sd = 1.5, log = FALSE)

## let's plot it
plot(range(x), range(c(posterior)), type = 'n',
     xlab = expression(paste(theta)),
     ylab = expression(paste("f(", theta, " | x )")))
lines(x, posterior, type = 'l', col = 'blue', lwd = 5)
title("Posterior Distribution")
legend("topright", legend = c("posterior"), col = 'blue', lwd = 5)

## perspective plot
x <- seq(-10, 10, length = 30)
y <- x
f <- function(x, y) { r <- sqrt(x^2 + y^2); 10 * sin(r)/r }
z <- outer(x, y, f)
z[is.na(z)] <- 1
op <- par(bg = "white")
persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "lightblue")
persp(x, y, z, theta = 30, phi = 30, expand = 0.5, col = "lightblue",
      ltheta = 120, shade = 0.75, ticktype = "detailed",
      xlab = "X", ylab = "Y", zlab = "Sinc( r )") -> res
round(res, 3)

# contour plot
a <- expand.grid(1:20, 1:20)
b <- matrix(a[, 1]^2 + a[, 2]^2, 20)
filled.contour(x = 1:20, y = 1:20, z = b,
               plot.axes = { axis(1); axis(2); points(10, 10) })

## two-component normal mixture density (Example 3.1.2)
## the range of theta values for plotting
x = seq(-4, 6, length = 101)
## mixture density: 0.8 N(0, 1) + 0.2 N(4, 1)
posterior = 0.8 * dnorm(x, mean = 0, sd = 1, log = FALSE) +
  0.2 * dnorm(x, mean = 4, sd = 1, log = FALSE)

## let's plot it
plot(range(x), range(c(posterior)), type = 'n',
     xlab = expression(paste(theta)),
     ylab = expression(paste("f(", theta, " | x )")))
lines(x, posterior, type = 'l', col = 'blue', lwd = 5)
# title("Bivariate Posterior Distribution")
legend("topright", legend = c("posterior"), col = 'blue', lwd = 5)

## credible interval posterior plot

## the range of theta values for plotting
x = seq(0, 20, length = 101)
## posterior density evaluated on the grid: normal with mean 7, sd 2
posterior = dnorm(x, mean = 7, sd = 2, log = FALSE)

## let's plot it
plot(range(x), range(c(posterior)), type = 'n',
     xlab = expression(paste(theta)),
     ylab = expression(paste("f(", theta, " | x )")))
lines(x, posterior, type = 'l', col = 'blue', lwd = 5)
# title("Bivariate Posterior Distribution")
legend("topright", legend = c("posterior"), col = 'blue', lwd = 5)