Lecture 3
Fundamental inequality
Convex function and Jensen’s inequality
Convexity/Concavity of information measures
Outline
1 Generalized entropy
2 Fundamental inequality
This definition can be extended to α > 1 if PX̂ (x) > 0 for all
x ∈ X.
Lemma
When α → 1, we have the following:
and
\[ \lim_{\alpha \to 1} D_\alpha(X \| \hat{X}) = D(X \| \hat{X}). \]
Fundamental inequality
Setting y = 1/x in FI above, we also have, for any y > 0,
\[ \log_D(y) \ge \log_D(e)\left(1 - \frac{1}{y}\right), \]
again with equality iff y = 1. Here the logarithm is taken to base D.
In particular, for base 2 the two inequalities combine, for any x > 0, into
\[ \log_2(e)\left(1 - \frac{1}{x}\right) \le \log_2(x) \le \log_2(e)\,(x - 1), \]
with equality iff x = 1.
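The base-2 sandwich above can be checked numerically. The following is a minimal sketch; the helper name `fi_bounds` and the sample points are illustrative choices, not part of the lecture.

```python
import math

def fi_bounds(x):
    """Return (lower, log2(x), upper) for the base-2 fundamental inequality."""
    log2_e = math.log2(math.e)
    lower = log2_e * (1.0 - 1.0 / x)   # log2(e) * (1 - 1/x)
    upper = log2_e * (x - 1.0)         # log2(e) * (x - 1)
    return lower, math.log2(x), upper

# Sample points on both sides of x = 1, where all three terms coincide.
for x in (0.1, 0.5, 1.0, 2.0, 10.0):
    lower, middle, upper = fi_bounds(x)
    print(f"x={x:5.1f}  {lower:9.4f} <= {middle:9.4f} <= {upper:9.4f}")
```

At x = 1 the three values are all zero, which matches the equality condition.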
Information inequality
Theorem
Let X and X̂ be two random variables, with probability mass
functions PX and PX̂ . Then
D(X∥X̂) ≥ 0,
with equality if and only if PX (x) = PX̂ (x) for all x ∈ X , i.e., X
and X̂ have the same distribution.
Proof.
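\[
\begin{aligned}
D(X \| \hat{X}) &= \sum_{x \in \mathcal{X}} P_X(x) \log_2 \frac{P_X(x)}{P_{\hat{X}}(x)} \\
&\ge (\log_2 e) \sum_{x \in \mathcal{X}} P_X(x) \left(1 - \frac{P_{\hat{X}}(x)}{P_X(x)}\right) \\
&= (\log_2 e) \left( \sum_{x \in \mathcal{X}} P_X(x) - \sum_{x \in \mathcal{X}} P_{\hat{X}}(x) \right) \\
&= 0,
\end{aligned}
\]
where the second step follows from FI in the form log2(y) ≥ log2(e)(1 − 1/y), applied with y = PX(x)/PX̂(x). Equality holds if and only if
\[ \frac{P_X(x)}{P_{\hat{X}}(x)} = 1 \]
for all x ∈ X.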
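As a numerical illustration of the information inequality, the sketch below computes D(X∥X̂) in bits for two example distributions. The function name `kl_divergence` and the distributions `p` and `q` are hypothetical choices for illustration; the code assumes q[i] > 0 wherever p[i] > 0.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits for two PMFs given as equal-length sequences.

    Terms with p[i] == 0 contribute 0; assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]      # example distribution of X
q = [1 / 3, 1 / 3, 1 / 3]  # example distribution of X-hat (uniform)

d_pq = kl_divergence(p, q)  # strictly positive because p != q
d_pp = kl_divergence(p, p)  # zero because the two distributions coincide
print(d_pq, d_pp)
```

Since `q` is uniform on three symbols, `d_pq` also equals log2(3) − H(p), which ties this example to the bound H(X) ≤ log2 |X|.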
Corollary
For any two random variables X, Y ,
I(X; Y ) ≥ 0,
with equality if and only if X and Y are independent.
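Since I(X; Y) is the relative entropy between the joint distribution and the product of the marginals, the corollary can be checked on small examples. This is a sketch only; `mutual_information` and the two joint tables are illustrative assumptions.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits from a joint PMF given as a list of rows joint[x][y]."""
    px = [sum(row) for row in joint]          # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (column sums)
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

dependent = [[0.4, 0.1], [0.1, 0.4]]
independent = [[0.3, 0.2], [0.3, 0.2]]  # equals the product of its marginals

i_dep = mutual_information(dependent)
i_ind = mutual_information(independent)
print(i_dep, i_ind)
```

The second table factorizes as P(x)P(y), so its mutual information is zero up to rounding.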
Corollary
D(p(y|x)∥q(y|x)) ≥ 0,
with equality if and only if p(y|x) = q(y|x) for all y and x such
that p(x) > 0.
Corollary
I(X; Y |Z) ≥ 0,
with equality if and only if X and Y are conditionally independent
given Z.
Theorem
If a random variable X takes values from a finite set X , then
H(X) ≤ log2 |X |,
with equality if and only if X is uniformly distributed over X.
Proof.
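\[
\begin{aligned}
\log_2 |\mathcal{X}| - H(X)
&= \sum_{x \in \mathcal{X}} P_X(x) \log_2 |\mathcal{X}| + \sum_{x \in \mathcal{X}} P_X(x) \log_2 P_X(x) \\
&= \sum_{x \in \mathcal{X}} P_X(x) \log_2 \left[ |\mathcal{X}| \cdot P_X(x) \right] \\
&\ge \sum_{x \in \mathcal{X}} P_X(x) \log_2(e) \left( 1 - \frac{1}{|\mathcal{X}| \cdot P_X(x)} \right) \\
&= \log_2(e) \sum_{x \in \mathcal{X}} \left( P_X(x) - \frac{1}{|\mathcal{X}|} \right) \\
&= \log_2(e)(1 - 1) = 0.
\end{aligned}
\]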
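The bound can be seen on a four-symbol alphabet. The sketch below is illustrative only: the helper `entropy` and the two example distributions are assumptions, not taken from the lecture.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits; zero-probability terms contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

skewed = [0.7, 0.1, 0.1, 0.1]
uniform = [0.25] * 4
bound = math.log2(len(uniform))  # log2 |X| = 2 bits for four symbols

h_skewed = entropy(skewed)
h_uniform = entropy(uniform)
print(h_skewed, h_uniform, bound)
```

The skewed distribution stays strictly below the bound, while the uniform one attains it.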
Theorem
H(X|Y ) ≤ H(X),
with equality if and only if X and Y are independent.
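A small example may help here. The sketch below computes H(X|Y) as the average of H(X | Y = y) over y; the function names and the joint table are hypothetical, chosen so that one particular observation of Y increases the uncertainty about X even though the average does not.

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def conditional_entropy(joint):
    """H(X|Y) in bits from joint[x][y], computed as sum_y P(y) * H(X | Y = y)."""
    total = 0.0
    for column in zip(*joint):  # one column per value of y
        py = sum(column)
        if py > 0:
            total += py * entropy([pxy / py for pxy in column])
    return total

# Rows index x, columns index y.
joint = [[0.0, 0.125],
         [0.75, 0.125]]

h_x = entropy([sum(row) for row in joint])  # H(X), roughly 0.544 bits
h_x_given_y = conditional_entropy(joint)    # 0.75 * 0 + 0.25 * 1 = 0.25 bits
print(h_x, h_x_given_y)
```

Conditioning on the second value of Y leaves X uniform on two values (1 bit, more than H(X)), yet the average H(X|Y) is still below H(X), as the theorem requires.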
Theorem
Let X1 , X2 , · · · , Xn be drawn according to p(x1 , x2 , · · · , xn ).
Then
\[ H(X_1, X_2, \cdots, X_n) \le \sum_{i=1}^{n} H(X_i). \]
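To illustrate the bound on the joint entropy, the sketch below uses a hypothetical joint distribution of three bits in which the third bit is the XOR of two fair, independent bits. The helpers `entropy` and `marginal` are illustrative names.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a PMF given as a dict outcome -> probability."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, i):
    """Marginal PMF of the i-th coordinate of a joint PMF over tuples."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[outcome[i]] += p
    return m

# Hypothetical example: X1, X2 fair and independent, X3 = X1 XOR X2.
joint = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}

joint_entropy = entropy(joint)                                       # 2 bits
sum_marginals = sum(entropy(marginal(joint, i)) for i in range(3))   # 3 bits
print(joint_entropy, sum_marginals)
```

Each bit is individually uniform, so the marginal entropies add up to 3 bits, while the joint entropy is only 2 bits because the three bits are dependent.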
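\[ \sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \ge \left( \sum_{i=1}^{n} a_i \right) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}, \]
with equality if and only if
\[ \frac{a_i}{b_i} = \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}, \]
which is a constant that does not depend on i.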
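The inequality above and its equality case can be checked on concrete numbers. This is a minimal sketch that assumes strictly positive a_i and b_i and uses base-2 logarithms; `log_sum_sides` and the example vectors are illustrative.

```python
import math

def log_sum_sides(a, b):
    """Return (lhs, rhs) of the inequality for strictly positive a_i, b_i."""
    lhs = sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b))
    rhs = sum(a) * math.log2(sum(a) / sum(b))
    return lhs, rhs

# Ratios a_i / b_i differ across i: strict inequality is expected.
lhs, rhs = log_sum_sides([1.0, 2.0, 3.0], [2.0, 1.0, 1.0])

# Ratios a_i / b_i are all 1/2: both sides should coincide.
lhs_eq, rhs_eq = log_sum_sides([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(lhs, rhs, lhs_eq, rhs_eq)
```

In the second call every ratio equals 1/2, so both sides reduce to 6 · log2(1/2) = −6.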
Definition
A function f (x) is said to be convex over an interval (a, b) if for
every x1 , x2 ∈ (a, b) and 0 ≤ λ ≤ 1,
\[ f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2). \]
Definition
A function f is concave if −f is convex.
Theorem
If the function f has a second derivative that is non-negative
(positive) over an interval, the function is convex (strictly convex)
over that interval.
Jensen’s inequality
Theorem
If f is a convex function and X is a random variable,
E f (X) ≥ f (E X).
\[ \sum_{i=1}^{n} \alpha_i f(t_i) \ge f\left( \sum_{i=1}^{n} \alpha_i t_i \right). \]
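The finite form of Jensen's inequality can be tried on two convex functions, t² and −log2(t). The sketch assumes non-negative weights that sum to 1; `jensen_sides`, the weights, and the points are illustrative choices.

```python
import math

def jensen_sides(f, weights, points):
    """Return (sum_i w_i f(t_i), f(sum_i w_i t_i)) for weights summing to 1."""
    lhs = sum(w * f(t) for w, t in zip(weights, points))
    rhs = f(sum(w * t for w, t in zip(weights, points)))
    return lhs, rhs

weights = [0.2, 0.3, 0.5]  # non-negative, sum to 1
points = [1.0, 2.0, 4.0]

# f(t) = t^2 is convex on the real line.
lhs_sq, rhs_sq = jensen_sides(lambda t: t * t, weights, points)

# f(t) = -log2(t) is convex on t > 0.
lhs_log, rhs_log = jensen_sides(lambda t: -math.log2(t), weights, points)
print(lhs_sq, rhs_sq, lhs_log, rhs_log)
```

Reading the weights as a probability mass function on the points recovers the statement E f(X) ≥ f(E X).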
Theorem
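H(PX ) is a concave function of PX , namely, for any two probability
mass functions PX and QX on X and any 0 ≤ λ ≤ 1,
\[ H(\lambda P_X + (1 - \lambda) Q_X) \ge \lambda H(P_X) + (1 - \lambda) H(Q_X). \]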
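Concavity of the entropy means that mixing two distributions never yields less entropy than the corresponding average of their entropies. The sketch below checks this on a grid of mixing weights; the helpers `entropy` and `mix` and the two distributions are hypothetical examples.

```python
import math

def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def mix(p, q, lam):
    """Convex combination lam * p + (1 - lam) * q of two PMFs."""
    return [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]

p = [0.9, 0.1]
q = [0.2, 0.8]

# Gap = H(mixture) - (mixture of entropies); concavity says it is >= 0.
concavity_gaps = []
for k in range(11):
    lam = k / 10
    gap = entropy(mix(p, q, lam)) - (lam * entropy(p) + (1 - lam) * entropy(q))
    concavity_gaps.append(gap)
print(concavity_gaps)
```

The gap vanishes at the endpoints of the grid, where the mixture coincides with one of the two distributions.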
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Theorem
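Noting that I(X; Y ) can be written as I(PX , PY |X ), where
\[ I(P_X, P_{Y|X}) := \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_{Y|X}(y|x) P_X(x) \log_2 \frac{P_{Y|X}(y|x)}{\sum_{a \in \mathcal{X}} P_{Y|X}(y|a) P_X(a)}, \]
I(X; Y ) is a concave function of PX for fixed PY |X and a convex
function of PY |X for fixed PX .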
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Theorem
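D(PX ∥PX̂ ) is convex in the pair (PX , PX̂ ), i.e., if (PX , PX̂ ) and
(QX , QX̂ ) are two pairs of probability mass functions, then for all
0 ≤ λ ≤ 1,
\[ D(\lambda P_X + (1 - \lambda) Q_X \,\|\, \lambda P_{\hat{X}} + (1 - \lambda) Q_{\hat{X}}) \le \lambda D(P_X \| P_{\hat{X}}) + (1 - \lambda) D(Q_X \| Q_{\hat{X}}). \]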
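Joint convexity of the relative entropy can be checked numerically by mixing both arguments with the same weight. The following sketch uses hypothetical pairs (`p1`, `q1`) and (`p2`, `q2`) and assumes the second argument is strictly positive wherever the first is.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mix(p, q, lam):
    """Convex combination lam * p + (1 - lam) * q of two PMFs."""
    return [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]

p1, q1 = [0.8, 0.2], [0.5, 0.5]  # first pair of PMFs
p2, q2 = [0.1, 0.9], [0.4, 0.6]  # second pair of PMFs

# Gap = (average of divergences) - (divergence of mixtures); convexity says >= 0.
convexity_gaps = []
for k in range(11):
    lam = k / 10
    mixed = kl_divergence(mix(p1, p2, lam), mix(q1, q2, lam))
    average = lam * kl_divergence(p1, q1) + (1 - lam) * kl_divergence(p2, q2)
    convexity_gaps.append(average - mixed)
print(convexity_gaps)
```

This property follows from the log sum inequality applied term by term, which is why the gap is never negative.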