Lec4 IntroToProbabilityAndStatistics
Email: [email protected]
URL: https://www.zabaras.com/
Statistical Computing, University of Notre Dame, Notre Dame, IN, USA (Fall 2018, N. Zabaras)
Contents
• Covariance, Uncorrelated Random Variables, Multivariate Random Variables, Independence vs Uncorrelated Random Variables
• Dirichlet Distribution
References
• Following closely Chris Bishop's PRML book, Chapter 2
Covariance
Consider two random variables $X, Y: \Omega \to \mathbb{R}$ with joint density $p(x, y)$:
$$\mathbb{P}(X \in A,\, Y \in B) = \mathbb{P}\big(X^{-1}(A) \cap Y^{-1}(B)\big) = \int_A \int_B p(x, y)\, dx\, dy$$
$X$ and $Y$ are independent if $p(x, y) = p(x)\, p(y)$. The covariance is defined as:
$$\operatorname{cov}(X, Y) = \mathbb{E}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big]$$
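A quick numerical sketch of the equivalent shortcut $\operatorname{cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]$ (the data values below are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical small dataset
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

# Definition: mean of the product of the centered variables
cov_def = np.mean((x - x.mean()) * (y - y.mean()))

# Shortcut: E[XY] - E[X] E[Y]
cov_short = np.mean(x * y) - x.mean() * y.mean()

# numpy's population covariance (bias=True divides by N, matching E[.])
cov_np = np.cov(x, y, bias=True)[0, 1]
```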
Correlation, Center Normalized Random Variables
Consider two random variables $X, Y: \Omega \to \mathbb{R}$. Define the centered and normalized variables
$$\tilde{X} = \frac{X - \mathbb{E}[X]}{\sqrt{\operatorname{var}[X]}}, \qquad \tilde{Y} = \frac{Y - \mathbb{E}[Y]}{\sqrt{\operatorname{var}[Y]}}$$
so that
$$\mathbb{E}[\tilde{X}] = \mathbb{E}[\tilde{Y}] = 0, \qquad \operatorname{var}[\tilde{X}] = \operatorname{var}[\tilde{Y}] = 1$$
Variance and Covariance
Variance:
$$\operatorname{var}[f] = \mathbb{E}\big[(f(X) - \mathbb{E}[f(X)])^2\big] = \mathbb{E}[f(X)^2] - \mathbb{E}[f(X)]^2$$
Covariance:
$$\operatorname{cov}[X, Y] = \mathbb{E}_{X,Y}\big[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\big] = \mathbb{E}_{X,Y}[XY] - \mathbb{E}[X]\, \mathbb{E}[Y]$$
Uncorrelated Random Variables
Consider two random variables $X, Y: \Omega \to \mathbb{R}$. They are called uncorrelated if $\operatorname{cov}(X, Y) = 0$, i.e. if $\mathbb{E}[XY] = \mathbb{E}[X]\, \mathbb{E}[Y]$.
Multivariate Random Variables
Consider the random vector
$$\boldsymbol{X} = (X_1, X_2, \ldots, X_n)^T : \Omega \to \mathbb{R}^n$$
Its mean is
$$\mathbb{E}[\boldsymbol{X}] = \int_{\mathbb{R}^n} \boldsymbol{x}\, p(\boldsymbol{x})\, d\boldsymbol{x} \in \mathbb{R}^n, \quad \text{or} \quad \mathbb{E}[X_i] = \int_{\mathbb{R}^n} x_i\, p(\boldsymbol{x})\, d\boldsymbol{x} = \int_{\mathbb{R}} x_i\, p(x_i)\, dx_i, \quad i = 1, 2, \ldots, n$$
Covariance Matrix
Consider $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)^T : \Omega \to \mathbb{R}^n$. The covariance matrix is:
$$\operatorname{cov}[\boldsymbol{X}] = \int_{\mathbb{R}^n} \big(\boldsymbol{x} - \mathbb{E}[\boldsymbol{X}]\big)\big(\boldsymbol{x} - \mathbb{E}[\boldsymbol{X}]\big)^T p(\boldsymbol{x})\, d\boldsymbol{x} \in \mathbb{R}^{n \times n}$$
Note that the diagonal of the covariance matrix gives the variances of the individual components:
$$\operatorname{cov}[\boldsymbol{X}]_{ii} = \int_{\mathbb{R}^n} (x_i - \mathbb{E}[X_i])^2\, p(\boldsymbol{x})\, d\boldsymbol{x} = \int_{\mathbb{R}} \left( \int_{\mathbb{R}^{n-1}} p(x_i, \boldsymbol{x}_{i'})\, d\boldsymbol{x}_{i'} \right) (x_i - \mathbb{E}[X_i])^2\, dx_i = \int_{\mathbb{R}} (x_i - \mathbb{E}[X_i])^2\, p(x_i)\, dx_i = \operatorname{var}[X_i]$$
Covariance Matrix
The covariance matrix of a vector $\boldsymbol{X}$ can be written explicitly:
$$\operatorname{cov}[\boldsymbol{X}] = \begin{pmatrix} \operatorname{var}[X_1] & \operatorname{cov}[X_1, X_2] & \cdots & \operatorname{cov}[X_1, X_d] \\ \operatorname{cov}[X_2, X_1] & \operatorname{var}[X_2] & \cdots & \vdots \\ \vdots & & \ddots & \\ \operatorname{cov}[X_d, X_1] & \operatorname{cov}[X_d, X_2] & \cdots & \operatorname{var}[X_d] \end{pmatrix}$$
To show that $-1 \le \operatorname{corr}(X, Y) \le 1$, where
$$\operatorname{corr}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\, \operatorname{var}(Y)}}$$
note that
$$0 \le \operatorname{var}\left[ \frac{X - \mathbb{E}[X]}{\sigma_X} + \frac{Y - \mathbb{E}[Y]}{\sigma_Y} \right] = \frac{\operatorname{var}[X]}{\sigma_X^2} + \frac{\operatorname{var}[Y]}{\sigma_Y^2} + \frac{2\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = 1 + 1 + 2\operatorname{corr}(X, Y)$$
which gives $\operatorname{corr}(X, Y) \ge -1$. Similarly, starting with
$$0 \le \operatorname{var}\left[ \frac{X - \mathbb{E}[X]}{\sigma_X} - \frac{Y - \mathbb{E}[Y]}{\sigma_Y} \right]$$
we obtain $\operatorname{corr}(X, Y) \le 1$.
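The definition and the bound can be checked numerically (a sketch with numpy; the synthetic data below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)  # noisy linear relationship

# corr = cov / sqrt(var_x * var_y), all population (divide-by-N) moments
cov_xy = np.mean(x * y) - x.mean() * y.mean()
corr = cov_xy / np.sqrt(x.var() * y.var())

# numpy's built-in correlation coefficient as reference
corr_np = np.corrcoef(x, y)[0, 1]
```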
Variance and Covariance
The covariance of vector random variables is:
$$\operatorname{cov}[\boldsymbol{X}, \boldsymbol{Y}] = \mathbb{E}_{\boldsymbol{X},\boldsymbol{Y}}\big[(\boldsymbol{X} - \mathbb{E}[\boldsymbol{X}])(\boldsymbol{Y} - \mathbb{E}[\boldsymbol{Y}])^T\big] = \mathbb{E}_{\boldsymbol{X},\boldsymbol{Y}}[\boldsymbol{X}\boldsymbol{Y}^T] - \mathbb{E}[\boldsymbol{X}]\, \mathbb{E}[\boldsymbol{Y}]^T$$
$$\operatorname{cov}[\boldsymbol{X}, \boldsymbol{X}] = \mathbb{E}[\boldsymbol{X}\boldsymbol{X}^T] - \mathbb{E}[\boldsymbol{X}]\, \mathbb{E}[\boldsymbol{X}]^T$$
Correlation as a Degree of Linearity
It can be shown that
If $Y = aX + b$ with $a > 0$, then $\operatorname{corr}(X, Y) = 1$.
If $Y = aX + b$ with $a < 0$, then $\operatorname{corr}(X, Y) = -1$.
Independent vs Uncorrelated
Note that:
$$\operatorname{var}[X + Y] = \mathbb{E}\big[\big((X - \mathbb{E}[X]) + (Y - \mathbb{E}[Y])\big)^2\big] = \operatorname{var}[X] + \operatorname{var}[Y] + 2\big(\mathbb{E}[XY] - \mathbb{E}[X]\mathbb{E}[Y]\big) = \sigma_X^2 + \sigma_Y^2 + 2\operatorname{cov}(X, Y)$$
Independence implies zero correlation, but the converse does not hold. As a counterexample, let $X$ be uniform on $(-1, 1)$ and $Y = X^2$:
$$\mathbb{E}[X] = 0, \qquad \operatorname{var}[X] = \mathbb{E}[X^2] = \int_{-1}^{1} \frac{x^2}{2}\, dx = \frac{1}{3}, \qquad \mathbb{E}[Y] = \mathbb{E}[X^2] = \frac{1}{3}$$
$$\mathbb{E}[XY] = \mathbb{E}[X^3] = \int_{-1}^{1} \frac{x^3}{2}\, dx = 0$$
$$\operatorname{cov}(X, Y) = \mathbb{E}[XY] - \mathbb{E}[X]\, \mathbb{E}[Y] = 0 - 0 \cdot \tfrac{1}{3} = 0 \;\Rightarrow\; \operatorname{corr}(X, Y) = 0$$
yet $Y$ is completely determined by $X$, so the two variables are certainly not independent.
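The moments of this counterexample can be verified exactly by quadrature (a sketch using scipy):

```python
from scipy.integrate import quad

# X ~ Uniform(-1, 1) with density 1/2; Y = X^2
E_X  = quad(lambda x: x * 0.5, -1, 1)[0]        # E[X]   = 0
E_Y  = quad(lambda x: x**2 * 0.5, -1, 1)[0]     # E[X^2] = 1/3
E_XY = quad(lambda x: x**3 * 0.5, -1, 1)[0]     # E[X^3] = 0

# Zero covariance although Y is a deterministic function of X
cov_XY = E_XY - E_X * E_Y
```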
Mutual Information
The Figure given next shows several data sets where
there is clear dependence between 𝑋 and 𝑌, and yet the
correlation coefficient is 0.
Uncorrelated Random Variables
Several sets of (𝑥, 𝑦) points, with the correlation coefficient of 𝑥 and 𝑦
for each set.
The correlation reflects the noisiness and direction of a linear
relationship (top row), but not the slope of that relationship (middle),
nor nonlinear relationships (bottom).
The figure in the center has a slope of 0 but the correlation coefficient
is undefined because 𝑣𝑎𝑟[𝑌] = 0.
Marginal Density
Consider two random variables $X, Y: \Omega \to \mathbb{R}$ with joint probability density $p(x, y)$.
The probability density of $X$ when $Y$ can take any value is defined as:
$$p(x) = \int p(x, y)\, dy$$
Similarly:
$$p(y) = \int p(x, y)\, dx$$
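A sketch recovering the marginal of a correlated bivariate Gaussian by numerical integration (the correlation $\rho = 0.6$ and the query point are illustrative; with unit marginal variances the marginal is a standard normal):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal, norm

rho = 0.6
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

# Marginalize over y numerically: p(x) = int p(x, y) dy
x0 = 0.7
p_x = quad(lambda y: joint.pdf([x0, y]), -np.inf, np.inf)[0]

# Known marginal for this construction: N(0, 1)
p_ref = norm.pdf(x0)
```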
Conditional Probability Density
Consider two random variables $X, Y: \Omega \to \mathbb{R}$ with joint density $p(x, y)$. For a small interval $(y, y + \Delta y)$,
$$\mathbb{P}(a \le X \le b \mid y \le Y \le y + \Delta y) = \frac{\int_a^b p(x, y)\, dx\, \Delta y}{p(y)\, \Delta y} = \int_a^b \frac{p(x, y)}{p(y)}\, dx$$
so the conditional probability density is
$$p(x \mid Y = y) = \frac{p(x, y)}{p(y)}$$
The law of total expectation follows:
$$\mathbb{E}[X] = \int \mathbb{E}[X \mid y]\, p(y)\, dy$$
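A numerical sketch of $p(x \mid Y = y) = p(x, y)/p(y)$ for the Gaussian case, using the known closed form of the conditional (the parameters are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Bivariate Gaussian with unit variances and correlation rho
rho, y0, x0 = 0.6, 2.0, 1.0
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

# Conditional density via the ratio p(x, y) / p(y)
p_cond = joint.pdf([x0, y0]) / norm.pdf(y0)

# Known closed form: X | Y = y ~ N(rho * y, 1 - rho^2)
p_closed = norm.pdf(x0, loc=rho * y0, scale=np.sqrt(1 - rho**2))
```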
Linear Transformations
Suppose $\boldsymbol{y} = f(\boldsymbol{x}) = \boldsymbol{A}\boldsymbol{x} + \boldsymbol{b}$. You can show that:
$$\mathbb{E}[\boldsymbol{y}] = \boldsymbol{A}\, \mathbb{E}[\boldsymbol{x}] + \boldsymbol{b}, \qquad \operatorname{cov}[\boldsymbol{y}] = \boldsymbol{A}\, \operatorname{cov}[\boldsymbol{x}]\, \boldsymbol{A}^T$$
Similarly, for the scalar $y = \boldsymbol{a}^T \boldsymbol{x} + b$:
$$\mathbb{E}[y] = \boldsymbol{a}^T \mathbb{E}[\boldsymbol{x}] + b, \qquad \operatorname{var}[y] = \boldsymbol{a}^T \operatorname{cov}[\boldsymbol{x}]\, \boldsymbol{a}$$
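These identities hold exactly for empirical means and covariances as well; a sketch with numpy (the matrix $A$ and offset $b$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[2.0, 1.0], [0.0, 3.0]])
b = np.array([1.0, -1.0])

x = rng.normal(size=(1000, 2))   # samples of x
y = x @ A.T + b                  # y = A x + b, applied row-wise

# cov[y] = A cov[x] A^T and E[y] = A E[x] + b (exact for sample moments)
cov_y_emp = np.cov(y, rowvar=False)
cov_y_thy = A @ np.cov(x, rowvar=False) @ A.T
mean_y_thy = A @ x.mean(axis=0) + b
```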
Multivariate Gaussian
A random variable $X: \Omega \to \mathbb{R}$ is Gaussian or normally distributed, $X \sim \mathcal{N}(x_0, \sigma^2)$, if:
$$\mathbb{P}(X \le t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - x_0)^2}{2\sigma^2} \right) dx$$
A multivariate $\boldsymbol{X}: \Omega \to \mathbb{R}^D$ is Gaussian if its probability density is
$$p(\boldsymbol{x}) = \frac{1}{(2\pi)^{D/2} \det(\boldsymbol{\Sigma})^{1/2}} \exp\left( -\frac{1}{2} (\boldsymbol{x} - \boldsymbol{x}_0)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{x}_0) \right)$$
where $\boldsymbol{x}_0 \in \mathbb{R}^D$ and $\boldsymbol{\Sigma} \in \mathbb{R}^{D \times D}$ is symmetric positive definite (the covariance matrix).
The symmetry property of the covariance matrix does not affect the value of $(\boldsymbol{x} - \boldsymbol{x}_0)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{x}_0)$. However, for symmetric covariance matrices we only need to describe $D(D+1)/2$ elements rather than $D^2$.
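A sketch evaluating this density directly from the formula and against scipy (the mean, covariance, and query point are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

D = 3
x0 = np.array([1.0, 0.0, -1.0])
S = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 1.5]])   # symmetric positive definite
x = np.array([0.5, 0.5, 0.0])

# Manual evaluation of the multivariate Gaussian density
d = x - x0
quad_form = d @ np.linalg.solve(S, d)
p_manual = np.exp(-0.5 * quad_form) / np.sqrt((2 * np.pi)**D * np.linalg.det(S))

p_scipy = multivariate_normal(mean=x0, cov=S).pdf(x)
```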
Conditional and Marginal Probability Densities
[Figure: conditional and marginal bivariate normal pdfs. Left: ellipsoids (equiprobability curves) of $p(x, y)$ with the conditional $p(x \mid y = 2)$ and the marginal $p(x)$ indicated; right: surface plots of the corresponding probability densities.]
Transformation of Probability Density
A probability density transforms differently from functions. Let $x = g(y)$. Then
$$p_y(y) = p_x(g(y)) \left| \frac{dx}{dy} \right| = p_x(g(y))\, |g'(y)| = p_x(g(y))\, s\, g'(y), \quad s \in \{-1, 1\}$$
This is easily derived by taking observations in the interval $(x, x + dx)$ to be transformed to observations in $(y, y + dy)$, i.e.
$$p_y(y)\, |dy| = p_x(x)\, |dx|$$
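A one-dimensional sketch: for $X \sim \mathcal{N}(0,1)$ and $Y = e^X$, the rule gives the standard log-normal density (scipy's `lognorm` is used as the reference; the query point is illustrative):

```python
import numpy as np
from scipy.stats import norm, lognorm

# x = g(y) = ln(y), so |dx/dy| = 1/y
y = 1.7
p_y = norm.pdf(np.log(y)) / y      # p_x(g(y)) |g'(y)|
p_ref = lognorm(s=1.0).pdf(y)      # scipy's standard log-normal
```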
Transformation of Probability Density
For example, consider the Gamma distribution
$$\mathrm{Gamma}(x \mid a, b) = \frac{b^a}{\Gamma(a)}\, x^{a-1} e^{-xb}$$
Let us compute the density of $Y = 1/X$. With $x = g(y) = 1/y$ and
$$\left| \frac{dx}{dy} \right| = \frac{1}{y^2}$$
we obtain
$$p_y(y) = p_x(g(y)) \left| \frac{dx}{dy} \right| = \frac{b^a}{\Gamma(a)}\, y^{-(a-1)} e^{-b/y}\, \frac{1}{y^2} = \frac{b^a}{\Gamma(a)}\, y^{-(a+1)} e^{-b/y}$$
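This is the inverse-Gamma density; a sketch checking the derived formula against scipy's `invgamma` (the parameter values are illustrative):

```python
import numpy as np
from scipy.special import gamma as Gamma
from scipy.stats import invgamma

a, b, y = 3.0, 2.0, 0.8

# Derived density of Y = 1/X for X ~ Gamma(shape a, rate b)
p_y = b**a / Gamma(a) * y**(-(a + 1)) * np.exp(-b / y)

# scipy's inverse-gamma with shape a and scale b
p_ref = invgamma(a, scale=b).pdf(y)
```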
Multivariate Change of Variables
If $\boldsymbol{f}$ is an invertible mapping, we can define the pdf of the transformed variables using the Jacobian of the inverse mapping $\boldsymbol{y} \to \boldsymbol{x}$:
$$p_y(\boldsymbol{y}) = p_x(\boldsymbol{x}) \left| \det \frac{\partial \boldsymbol{x}}{\partial \boldsymbol{y}} \right|, \qquad \frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial y_n}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_n} \end{pmatrix}$$
As an example, it is trivial to show that transforming a density from Cartesian coordinates $\boldsymbol{x} = (x, y)$ to polar coordinates $\boldsymbol{y} = (r, \theta)$, where $x = r\cos\theta$ and $y = r\sin\theta$ so that $\left| \frac{\partial(x, y)}{\partial(r, \theta)} \right| = r$, gives:
$$p_{r,\theta}(r, \theta) = p_{x,y}(r\cos\theta, r\sin\theta)\, r, \qquad p_{r,\theta}(r, \theta)\, dr\, d\theta = p_{x,y}(r\cos\theta, r\sin\theta)\, r\, dr\, d\theta$$
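A sketch confirming that the polar-coordinate density still integrates to one, for a standard 2-D Gaussian (the finite upper limit on $r$ stands in for $\infty$; the tail beyond it is negligible):

```python
import numpy as np
from scipy.integrate import dblquad

# Standard 2-D Gaussian, then p(r, theta) = p_xy(r cos t, r sin t) * r
p_xy = lambda x, y: np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)
p_polar = lambda r, th: p_xy(r * np.cos(th), r * np.sin(th)) * r

# Integrate over theta in [0, 2*pi) and r in [0, 50] (~ [0, inf))
total, _ = dblquad(p_polar, 0, 2 * np.pi, 0, 50.0)
```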
Transformation of Probability Density
Recall
$$p_y(y) = p_x(g(y)) \left| \frac{dx}{dy} \right| = p_x(g(y))\, s\, g'(y), \quad s \in \{-1, 1\}$$
Using this equation, note that modes of densities depend on the choice of variables (see the 2nd term on the rhs below):
$$p_y'(y) = s\, p_x'(g(y))\, \big(g'(y)\big)^2 + s\, p_x(g(y))\, g''(y)$$
The univariate Student's T distribution can be written as a scale mixture of Gaussians:
$$\mathcal{T}(x \mid \mu, \lambda, \nu) = \int_0^{\infty} \mathcal{N}\big(x \mid \mu, (\eta\lambda)^{-1}\big)\, \mathrm{Gamma}(\eta \mid \nu/2, \nu/2)\, d\eta$$
This form is useful in providing the generalization to a multivariate Student's T:
$$\mathcal{T}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Lambda}, \nu) = \int_0^{\infty} \mathcal{N}\big(\boldsymbol{x} \mid \boldsymbol{\mu}, (\eta\boldsymbol{\Lambda})^{-1}\big)\, \mathrm{Gamma}(\eta \mid \nu/2, \nu/2)\, d\eta$$
*Use the change of variables formula for distributions, with $\tau = \eta\lambda$ and $d\tau = \lambda\, d\eta$, and notice that the extra terms that appear cancel out.
Multivariate Student’s T Distribution
$$\mathcal{T}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Lambda}, \nu) = \int_0^{\infty} \mathcal{N}\big(\boldsymbol{x} \mid \boldsymbol{\mu}, (\eta\boldsymbol{\Lambda})^{-1}\big)\, \mathrm{Gamma}(\eta \mid \nu/2, \nu/2)\, d\eta$$
This integral can be computed analytically as:
$$\mathcal{T}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Lambda}, \nu) = \frac{\Gamma\!\left(\frac{\nu}{2} + \frac{D}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}\, \frac{|\boldsymbol{\Lambda}|^{1/2}}{(\pi\nu)^{D/2}} \left[ 1 + \frac{\Delta^2}{\nu} \right]^{-\frac{D}{2} - \frac{\nu}{2}}, \qquad \Delta^2 = (\boldsymbol{x} - \boldsymbol{\mu})^T \boldsymbol{\Lambda} (\boldsymbol{x} - \boldsymbol{\mu}) \ \text{(Mahalanobis distance)}$$
One can derive the above form of the distribution by substitution in the equation at the top:
$$\mathcal{T}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Lambda}, \nu) = \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)}\, \frac{|\boldsymbol{\Lambda}|^{1/2}}{(2\pi)^{D/2}} \int_0^{\infty} \eta^{D/2 + \nu/2 - 1}\, e^{-\eta\Delta^2/2}\, e^{-\nu\eta/2}\, d\eta \qquad \left( \text{use } z = \eta\, \frac{\nu + \Delta^2}{2} \right)$$
$$= \frac{(\nu/2)^{\nu/2}}{\Gamma(\nu/2)}\, \frac{|\boldsymbol{\Lambda}|^{1/2}}{(2\pi)^{D/2}} \left( \frac{\nu + \Delta^2}{2} \right)^{-D/2 - \nu/2} \int_0^{\infty} z^{D/2 + \nu/2 - 1} e^{-z}\, dz = \frac{\Gamma\!\left(\frac{\nu}{2} + \frac{D}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}\, \frac{|\boldsymbol{\Lambda}|^{1/2}}{(\pi\nu)^{D/2}} \left[ 1 + \frac{\Delta^2}{\nu} \right]^{-D/2 - \nu/2}$$
In the limit $\nu \to \infty$, the Student's T approaches a Gaussian, since
$$\left[ 1 + \frac{\Delta^2}{\nu} \right]^{-D/2 - \nu/2} = \exp\left[ -\left( \frac{D}{2} + \frac{\nu}{2} \right) \ln\left( 1 + \frac{\Delta^2}{\nu} \right) \right] = \exp\left( -\frac{\Delta^2}{2} + O(1/\nu) \right)$$
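A univariate sketch of the scale-mixture representation: with $\mu = 0$ and $\lambda = 1$ the mixture integral should match the standard Student's T with $\nu$ degrees of freedom (the values of $\nu$ and $x$ are illustrative):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, gamma, t

nu, x = 5.0, 1.3

# Integrand: N(x | 0, eta^{-1}) * Gamma(eta | nu/2, rate nu/2)
mix = lambda eta: norm.pdf(x, scale=eta**-0.5) * gamma.pdf(eta, a=nu/2, scale=2/nu)
p_mix, _ = quad(mix, 0, np.inf)

# Reference: standard Student's T density
p_t = t.pdf(x, df=nu)
```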
Dirichlet Distribution
We introduce the Dirichlet distribution as a family of "conjugate priors" (to be formally introduced in a follow-up lecture) for the parameters $\mu_k$ of the multinomial distribution.
Dirichlet Distribution
Its probability density function returns the belief that the probabilities of $K$ rival events are $\mu_k$, given that each event has been observed $\alpha_k - 1$ times:
$$p(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) \propto \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}, \qquad 0 \le \mu_k \le 1, \qquad \sum_{k=1}^{K} \mu_k = 1$$
Dirichlet Distribution
The Dirichlet distribution of order $K \ge 2$ with parameters $\alpha_1, \ldots, \alpha_K > 0$ has a PDF with respect to Lebesgue measure on $\mathbb{R}^{K-1}$ given by
$$p(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) = \frac{1}{\mathrm{Beta}(\boldsymbol{\alpha})} \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}$$
for all $\mu_1, \ldots, \mu_{K-1} > 0$ satisfying $\mu_1 + \ldots + \mu_{K-1} < 1$, where $\mu_K$ is an abbreviation for $1 - \mu_1 - \cdots - \mu_{K-1}$. The normalizing constant is the multinomial Beta function:
$$\mathrm{Beta}(\boldsymbol{\alpha}) = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}{\Gamma\!\left( \sum_{k=1}^{K} \alpha_k \right)}, \qquad \boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_K)^T$$
The Dirichlet distribution over (𝜇1, 𝜇2, 𝜇3)
is confined on a plane as shown.
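A sketch evaluating the Dirichlet PDF from the multinomial Beta function and comparing with scipy (the parameters and query point are illustrative):

```python
import numpy as np
from scipy.special import gamma as Gamma
from scipy.stats import dirichlet

alpha = np.array([2.0, 3.0, 4.0])
mu = np.array([0.2, 0.3, 0.5])   # a point on the simplex

# Multinomial Beta normalizer: prod Gamma(alpha_k) / Gamma(sum alpha_k)
beta_fn = np.prod(Gamma(alpha)) / Gamma(alpha.sum())
p_manual = np.prod(mu ** (alpha - 1)) / beta_fn

p_scipy = dirichlet.pdf(mu, alpha)
```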
Dirichlet Distribution
We write the Dirichlet distribution as:
$$p(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) = K(\boldsymbol{\alpha}) \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}, \qquad K(\boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)}, \qquad \alpha_0 = \alpha_1 + \ldots + \alpha_K$$
Note the following useful relation. Differentiating the normalization $\int \prod_k \mu_k^{\alpha_k - 1}\, d\boldsymbol{\mu} = 1/K(\boldsymbol{\alpha})$ with respect to $\alpha_j$:
$$\mathbb{E}[\ln \mu_j] = K(\boldsymbol{\alpha}) \int_0^1 \!\!\cdots\! \int_0^1 \ln \mu_j \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}\, d\mu_1 \cdots d\mu_K = K(\boldsymbol{\alpha})\, \frac{\partial}{\partial \alpha_j} \frac{1}{K(\boldsymbol{\alpha})} = -\frac{\partial \ln K(\boldsymbol{\alpha})}{\partial \alpha_j}$$
Since $\ln K(\boldsymbol{\alpha}) = \ln \Gamma(\alpha_0) - \sum_k \ln \Gamma(\alpha_k)$, this gives
$$\mathbb{E}[\ln \mu_j] = \psi(\alpha_j) - \psi(\alpha_0), \qquad \psi(\alpha) \equiv \frac{d \ln \Gamma(\alpha)}{d\alpha}$$
where $\psi$ is the digamma function.
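For $K = 2$ the Dirichlet reduces to a Beta distribution, so the relation $\mathbb{E}[\ln \mu_j] = \psi(\alpha_j) - \psi(\alpha_0)$ can be checked by quadrature (the parameters are illustrative):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import digamma
from scipy.stats import beta

# Dirichlet with K = 2 is Beta(a, b); check E[ln mu] = psi(a) - psi(a + b)
a, b = 3.0, 2.0
E_log_mu = quad(lambda m: np.log(m) * beta.pdf(m, a, b), 0, 1)[0]
E_ref = digamma(a) - digamma(a + b)
```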
Dirichlet Distribution: Normalization
To show the normalization, we use induction. The case $M = 2$ was shown earlier for the Beta distribution. Assume that the Dirichlet normalization formula is valid for $M - 1$ terms; we now show it for $M$ terms:
$$p_M(\mu_1, \ldots, \mu_{M-1}) = C_M \prod_{k=1}^{M-1} \mu_k^{\alpha_k - 1} \left( 1 - \sum_{j=1}^{M-1} \mu_j \right)^{\alpha_M - 1}$$
Integrating out $\mu_{M-1}$:
$$p_{M-1}(\mu_1, \ldots, \mu_{M-2}) = C_M \prod_{k=1}^{M-2} \mu_k^{\alpha_k - 1} \int_0^{1 - \sum_{j=1}^{M-2} \mu_j} \mu_{M-1}^{\alpha_{M-1} - 1} \left( 1 - \sum_{j=1}^{M-1} \mu_j \right)^{\alpha_M - 1} d\mu_{M-1}$$
Substituting $\mu_{M-1} = t \left( 1 - \sum_{j=1}^{M-2} \mu_j \right)$:
$$p_{M-1}(\mu_1, \ldots, \mu_{M-2}) = C_M \prod_{k=1}^{M-2} \mu_k^{\alpha_k - 1} \left( 1 - \sum_{j=1}^{M-2} \mu_j \right)^{\alpha_{M-1} + \alpha_M - 1} \int_0^1 t^{\alpha_{M-1} - 1} (1 - t)^{\alpha_M - 1}\, dt$$
Dirichlet Distribution: Normalization
Evaluating the Beta integral:
$$p_{M-1}(\mu_1, \ldots, \mu_{M-2}) = C_M \prod_{k=1}^{M-2} \mu_k^{\alpha_k - 1} \left( 1 - \sum_{j=1}^{M-2} \mu_j \right)^{\alpha_{M-1} + \alpha_M - 1} \frac{\Gamma(\alpha_{M-1})\, \Gamma(\alpha_M)}{\Gamma(\alpha_{M-1} + \alpha_M)}$$
This is a Dirichlet over $M - 1$ variables with parameters $\alpha_1, \ldots, \alpha_{M-2}, \alpha_{M-1} + \alpha_M$, so the induction hypothesis fixes $C_M = \Gamma(\alpha_0) / \big( \Gamma(\alpha_1) \cdots \Gamma(\alpha_M) \big)$.
Dirichlet Distribution
The Dirichlet distribution over (𝜇1, 𝜇2, 𝜇3) where the horizontal
axes are 𝜇1 and 𝜇2 and the vertical axis is the density.
[Figure: surface plots of the Dirichlet density for $\{\alpha_k\} = 0.1$, $\{\alpha_k\} = 1$, and $\{\alpha_k\} = 10$ (MATLAB code).]
Dirichlet Distribution
The Dirichlet distribution over $(\mu_1, \mu_2, \mu_3)$ where the horizontal axes are $\mu_1$ and $\mu_2$ and the vertical axis is the density.
[Figure: densities for $\{\alpha_k\} = (0.1, 0.1, 0.1)$, $(2, 2, 2)$, and $(10, 10, 10)$. If $\alpha_k < 1$ for all $k$, we obtain spikes at the corners of the simplex. Run visDirichletGui and dirichlet3dPlot from PMTK.]
Dirichlet Distribution
Samples from a 5-dimensional symmetric Dirichlet distribution.
[Figure: bar plots of samples from $\mathrm{Dir}(\alpha = 5)$, i.e. $\{\alpha_k\} = (5, 5, \ldots, 5)$, and from $\mathrm{Dir}(\alpha = 0.1)$.]
Dirichlet Distribution
In closing, we have the following properties (you only need the normalization $\int \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}\, d\boldsymbol{\mu} = \frac{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)}{\Gamma(\alpha_0)}$ of the Dirichlet to derive them):
$$\mathbb{E}[\mu_k] = \frac{\alpha_k}{\alpha_0}, \quad \operatorname{mode}[\mu_k] = \frac{\alpha_k - 1}{\alpha_0 - K}, \quad \operatorname{var}[\mu_k] = \frac{\alpha_k(\alpha_0 - \alpha_k)}{\alpha_0^2(\alpha_0 + 1)}, \quad \operatorname{cov}[\mu_j, \mu_l] = -\frac{\alpha_j \alpha_l}{\alpha_0^2(\alpha_0 + 1)}, \ j \ne l$$
$$\text{where:} \quad \alpha_0 = \sum_{k=1}^{K} \alpha_k$$
Often we use a symmetric Dirichlet with $\alpha_k = \alpha / K$. In this case:
$$\mathbb{E}[\mu_k] = \frac{1}{K}, \qquad \operatorname{var}[\mu_k] = \frac{K - 1}{K^2(\alpha + 1)}$$