Differential Entropy
Peng-Hua Wang
Example. If $X \sim N(0, \sigma^2)$ with pdf $\varphi(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/(2\sigma^2)}$, then
$$\begin{aligned}
h_a(X) &= -\int \varphi(x)\log_a \varphi(x)\,dx \\
&= -\int \varphi(x)\left[\log_a \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{x^2}{2\sigma^2}\log_a e\right]dx \\
&= \frac{1}{2}\log_a(2\pi\sigma^2) + \frac{\log_a e}{2\sigma^2}\,E_\varphi[X^2]
= \frac{1}{2}\log_a(2\pi e\sigma^2)
\end{aligned}$$
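As a quick sanity check (not part of the lecture), the closed form $\frac{1}{2}\log(2\pi e\sigma^2)$ can be verified by a Monte Carlo estimate of $E_\varphi[-\log \varphi(X)]$; the sketch below works in nats (i.e. $a = e$) and the sample size and seed are arbitrary choices.

```python
import numpy as np

# Monte Carlo check of h(X) = E[-log phi(X)] for X ~ N(0, sigma^2),
# against the closed form (1/2) log(2*pi*e*sigma^2), in nats.
rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(0.0, sigma, size=1_000_000)

# log phi(x_i) for each sample, from the Gaussian pdf
log_phi = -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)
h_mc = -log_phi.mean()                     # sample mean of -log phi(X)

h_exact = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
print(h_mc, h_exact)
```

With $10^6$ samples the two values agree to about three decimal places.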
Remark. If a random variable with pdf $f(x)$ has zero mean and variance $\sigma^2$, then
$$\begin{aligned}
-\int f(x)\log_a \varphi(x)\,dx
&= -\int f(x)\left[\log_a \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{x^2}{2\sigma^2}\log_a e\right]dx \\
&= \frac{1}{2}\log_a(2\pi\sigma^2) + \frac{\log_a e}{2\sigma^2}\,E_f[X^2]
= \frac{1}{2}\log_a(2\pi e\sigma^2)
\end{aligned}$$
Suppose that a random variable $X$ with pdf $f(x)$ has zero mean and variance $\sigma^2$. What is its maximal differential entropy?

Let $\varphi(x)$ be the pdf of $N(0, \sigma^2)$.
$$\begin{aligned}
h(X) + \int f(x)\log \varphi(x)\,dx &= \int f(x)\log \frac{\varphi(x)}{f(x)}\,dx \\
&\le \log \int f(x)\,\frac{\varphi(x)}{f(x)}\,dx \quad \text{(concavity of the logarithm; Jensen's inequality)} \\
&= \log \int \varphi(x)\,dx = 0
\end{aligned}$$
That is,
$$h(X) \le -\int f(x)\log \varphi(x)\,dx = \frac{1}{2}\log(2\pi e\sigma^2)$$
and equality holds if $f(x) = \varphi(x)$.
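The bound can be illustrated with any non-Gaussian density of the same variance; for instance (my example, not the lecture's), a uniform density on $[-a, a]$ has variance $a^2/3$ and differential entropy $\log 2a$ in nats, which must fall below the Gaussian bound:

```python
import math

# Uniform on [-a, a] matched to variance sigma^2 (a^2/3 = sigma^2):
# its entropy log(2a) must be below the Gaussian maximum
# (1/2) log(2*pi*e*sigma^2), with equality only for the Gaussian.
sigma = 1.0
a = math.sqrt(3) * sigma
h_uniform = math.log(2 * a)
h_bound = 0.5 * math.log(2 * math.pi * math.e * sigma**2)
print(h_uniform, h_bound)
```

Here $\log 2\sqrt{3} \approx 1.24 < 1.42 \approx \frac{1}{2}\log 2\pi e$, consistent with the corollary.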
$$-\frac{1}{n}\log f(X_1, X_2, \ldots, X_n) \to E[-\log f(X)] = h(X)$$
in probability.
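This convergence is easy to observe numerically; the sketch below (an illustration I added, assuming i.i.d. $X_i \sim N(0,1)$ and natural logs) compares the normalized log-likelihood of one long block with $h(X) = \frac{1}{2}\log 2\pi e$.

```python
import numpy as np

# AEP illustration: for i.i.d. X_i ~ N(0,1), the per-symbol value
# -(1/n) log f(X_1,...,X_n) concentrates around h(X) as n grows.
rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)

log_f = -0.5 * np.log(2 * np.pi) - x**2 / 2   # log f(x_i), nats
aep_estimate = -log_f.sum() / n               # -(1/n) log f(x_1,...,x_n)
h = 0.5 * np.log(2 * np.pi * np.e)
print(aep_estimate, h)
```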
Definition 2 (Typical Set) For $\epsilon > 0$ the typical set $A_\epsilon^{(n)}$ with respect to $f(x)$ is defined as
$$A_\epsilon^{(n)} = \left\{ (x_1, x_2, \ldots, x_n) \in S^n :
\left| -\frac{1}{n}\log f(x_1, x_2, \ldots, x_n) - h(X) \right| \le \epsilon \right\}$$
2. $\mathrm{Vol}(A_\epsilon^{(n)}) \le 2^{n(h(X)+\epsilon)}$ for all $n$.
3. $\mathrm{Vol}(A_\epsilon^{(n)}) \ge (1-\epsilon)\,2^{n(h(X)-\epsilon)}$ for $n$ sufficiently large.
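The high-probability property of $A_\epsilon^{(n)}$ can also be seen by simulation; the sketch below (my illustration, in nats, with arbitrary $n$, $\epsilon$, and sample counts) estimates the fraction of i.i.d. $N(0,1)$ blocks that land in the typical set.

```python
import numpy as np

# Estimate P((X_1,...,X_n) in A_eps^(n)) for i.i.d. N(0,1) blocks:
# it should exceed 1 - eps once n is large enough.
rng = np.random.default_rng(2)
n, trials, eps = 2000, 1000, 0.05
h = 0.5 * np.log(2 * np.pi * np.e)            # h(X) in nats

x = rng.normal(size=(trials, n))
log_f = -0.5 * np.log(2 * np.pi) - x**2 / 2
dev = np.abs(-log_f.sum(axis=1) / n - h)      # |-(1/n) log f - h(X)|
p_typical = np.mean(dev <= eps)
print(p_typical)
```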
Therefore,
$$\begin{aligned}
h(X_1, X_2, \ldots, X_n) &= -\int \varphi(\mathbf{x})\log_a \varphi(\mathbf{x})\,d\mathbf{x} \\
&= \int \varphi(\mathbf{x})\left[\frac{1}{2}\log_a (2\pi)^n|K| + \frac{1}{2}(\mathbf{x}-\mu)^t K^{-1}(\mathbf{x}-\mu)\log_a e\right]d\mathbf{x} \\
&= \frac{1}{2}\log_a (2\pi)^n|K| + \frac{1}{2}(\log_a e)\,\underbrace{E\left[(\mathbf{x}-\mu)^t K^{-1}(\mathbf{x}-\mu)\right]}_{=\,n} \\
&= \frac{1}{2}\log_a (2\pi)^n|K| + \frac{1}{2}\,n\log_a e \\
&= \frac{1}{2}\log_a (2\pi e)^n|K|
\end{aligned}$$
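The resulting formula $h = \frac{1}{2}\log (2\pi e)^n|K|$ can be checked numerically; the sketch below (my example, in nats, with an arbitrary $2\times 2$ covariance $K$) estimates $E[-\log \varphi(X)]$ by Monte Carlo.

```python
import numpy as np

# Monte Carlo check of the multivariate Gaussian entropy
# h(X) = (1/2) log((2*pi*e)^n |K|) for X ~ N(mu, K).
rng = np.random.default_rng(3)
K = np.array([[2.0, 0.5],
              [0.5, 1.0]])
mu = np.zeros(2)
n = K.shape[0]

x = rng.multivariate_normal(mu, K, size=500_000)
Kinv = np.linalg.inv(K)
# quadratic form (x-mu)^t K^{-1} (x-mu) for every sample row
quad = np.einsum('ij,jk,ik->i', x - mu, Kinv, x - mu)
log_phi = -0.5 * (n * np.log(2 * np.pi) + np.log(np.linalg.det(K)) + quad)
h_mc = -log_phi.mean()

h_exact = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))
print(h_mc, h_exact)
```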
Proof. Denote
$$K = E[YY^t] = \begin{bmatrix} | & | & & | \\ k_1 & k_2 & \cdots & k_n \\ | & | & & | \end{bmatrix}$$
and
$$K^{-1} = \begin{bmatrix} a_1^t \\ a_2^t \\ \vdots \\ a_n^t \end{bmatrix}.$$
Now, with $Y = (Y_1, Y_2, \ldots, Y_n)^t$,
$$Y^t K^{-1} Y = Y^t \begin{bmatrix} a_1^t Y \\ a_2^t Y \\ \vdots \\ a_n^t Y \end{bmatrix}
= Y_1 a_1^t Y + Y_2 a_2^t Y + \cdots + Y_n a_n^t Y$$
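The identity $E[Y^t K^{-1} Y] = n$ used above (it equals $\mathrm{tr}(K^{-1}K)$) can be confirmed empirically; the sketch below uses an arbitrary $3\times 3$ positive definite $K$ of my choosing.

```python
import numpy as np

# Check E[Y^t K^{-1} Y] = n = tr(K^{-1} K) for zero-mean Y with cov K.
rng = np.random.default_rng(4)
K = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 1.0]])   # positive definite covariance
n = K.shape[0]

y = rng.multivariate_normal(np.zeros(n), K, size=500_000)
Kinv = np.linalg.inv(K)
quad_mean = np.einsum('ij,jk,ik->i', y, Kinv, y).mean()
print(quad_mean, n)
```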
and
$$I(X;Y) = \int f(x,y)\log \frac{f(x,y)}{f(x)f(y)}\,dx\,dy
= D\big(f(x,y)\,\|\,f(x)f(y)\big) \ge 0,$$
since $D(f\|g) \ge 0$.
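For a concrete value (my example, not from the lecture): for a bivariate Gaussian pair with correlation $\rho$, the integral evaluates in closed form to $I(X;Y) = -\frac{1}{2}\log(1-\rho^2)$, which is indeed nonnegative.

```python
import math

# Closed-form mutual information of a bivariate Gaussian with
# correlation rho: I(X;Y) = -(1/2) log(1 - rho^2) >= 0 (nats).
rho = 0.6
mi = -0.5 * math.log(1 - rho**2)
print(mi)
```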
Corollary 2
$$h(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^n h(X_i)$$
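For Gaussians this subadditivity is a direct calculation; the sketch below (my example, with an arbitrary correlated $2\times 2$ covariance) compares the joint entropy with the sum of the marginal entropies, in nats.

```python
import numpy as np

# Gaussian case of Corollary 2: joint entropy <= sum of marginals,
# with a gap whenever the components are correlated.
K = np.array([[2.0, 0.8],
              [0.8, 1.0]])
n = K.shape[0]
h_joint = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))
h_marginals = sum(0.5 * np.log(2 * np.pi * np.e * K[i, i]) for i in range(n))
print(h_joint, h_marginals)
```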
Corollary 3 (Hadamard’s inequality) If $K$ is the covariance matrix of a multivariate normal distribution, then
$$|K| \le \prod_{i=1}^n K_{ii}.$$
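Hadamard’s inequality can be spot-checked on a random covariance matrix; the sketch below (an illustration I added) builds a positive semidefinite $K = AA^t$ and compares $|K|$ with the product of its diagonal entries.

```python
import numpy as np

# Hadamard's inequality |K| <= prod_i K_ii for a PSD matrix,
# checked on a random covariance K = A A^t.
rng = np.random.default_rng(5)
A = rng.normal(size=(4, 4))
K = A @ A.T                          # random positive semidefinite K
det_K = np.linalg.det(K)
prod_diag = np.prod(np.diag(K))
print(det_K, prod_diag)
```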