
Lecture 3: More properties of entropy and mutual information

September 6th, 2022

Outline

1 Generalized entropy

2 Fundamental inequality

3 Convex function and Jensen’s inequality

4 Convexity/Concavity of information measures

Definition (Rényi entropy)

Given a parameter α > 0 with α ≠ 1, and a discrete random variable X with alphabet X and distribution PX, the Rényi entropy of X of order α is given by

    Hα(X) = (1/(1−α)) · log( ∑_{x∈X} PX(x)^α ).
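The definition translates directly into code. Below is a minimal numerical sketch (not from the slides; it assumes base-2 logarithms and an arbitrary example pmf):

    import numpy as np

    def renyi_entropy(p, alpha):
        """Rényi entropy of order alpha (alpha > 0, alpha != 1), in bits."""
        p = np.asarray(p, dtype=float)
        return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

    p = [0.5, 0.25, 0.25]                # example pmf
    print(renyi_entropy(p, 0.5))         # order-1/2 entropy
    print(renyi_entropy(p, 2.0))         # order-2 (collision) entropy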

Definition (Rényi divergence)

Given a parameter 0 < α < 1, and two discrete random variables X and X̂ with common alphabet X and distributions PX and PX̂, respectively, the Rényi divergence of order α between X and X̂ is given by

    Dα(X∥X̂) = (1/(α−1)) · log( ∑_{x∈X} PX(x)^α · PX̂(x)^(1−α) ).

This definition can be extended to α > 1 provided PX̂(x) > 0 for all x ∈ X.
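As with the entropy, a short numerical sketch may help (again my own illustration, not lecture material; base-2 logarithms and example pmfs assumed):

    import numpy as np

    def renyi_divergence(p, q, alpha):
        """Rényi divergence of order alpha (0 < alpha < 1), in bits."""
        p, q = np.asarray(p, float), np.asarray(q, float)
        return np.log2(np.sum(p ** alpha * q ** (1.0 - alpha))) / (alpha - 1.0)

    p = [0.5, 0.25, 0.25]
    q = [1/3, 1/3, 1/3]
    print(renyi_divergence(p, q, 0.5))   # non-negative; 0 iff p == q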

Lemma
When α → 1, we have the following:

    lim_{α→1} Hα(X) = H(X)

and

    lim_{α→1} Dα(X∥X̂) = D(X∥X̂).
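The first limit can be checked numerically; the sketch below (my own, with base-2 logarithms and an arbitrary pmf) shows the gap to the Shannon entropy shrinking as α → 1:

    import numpy as np

    p = np.array([0.5, 0.25, 0.25])
    shannon = -np.sum(p * np.log2(p))              # H(X) = 1.5 bits
    for alpha in [0.9, 0.99, 0.999, 1.001, 1.01]:
        renyi = np.log2(np.sum(p ** alpha)) / (1 - alpha)
        print(alpha, renyi, abs(renyi - shannon))  # gap -> 0 as alpha -> 1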

Fundamental inequality

Lemma (Fundamental inequality (FI))


For any x > 0 and D > 1, we have that

logD (x) ≤ logD (e) · (x − 1),

with equality if and only if x = 1.

Setting y = 1/x and applying FI directly, we obtain that for any y > 0,

    logD(y) ≥ logD(e) · (1 − 1/y),

also with equality iff y = 1. Above, the base-D logarithm was used. Specifically, for the base-2 logarithm, the two inequalities become

    log2(e) · (1 − 1/x) ≤ log2(x) ≤ log2(e) · (x − 1),

with equality iff x = 1.
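A quick numerical check of the two-sided base-2 bound (a sketch with my own choice of test points):

    import numpy as np

    log2e = np.log2(np.e)
    for x in [0.1, 0.5, 1.0, 2.0, 10.0]:
        lower = log2e * (1 - 1 / x)
        upper = log2e * (x - 1)
        assert lower <= np.log2(x) <= upper   # equality only at x = 1
        print(x, lower, np.log2(x), upper)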

Information inequality

Theorem
Let X and X̂ be two random variables, with probability mass
functions PX and PX̂ . Then

D(X∥X̂) ≥ 0,

with equality if and only if PX (x) = PX̂ (x) for all x ∈ X , i.e., X
and X̂ have the same distribution.

Proof.

    D(X∥X̂) = ∑_{x∈X} PX(x) · log2( PX(x) / PX̂(x) )
            ≥ (log2 e) · ∑_{x∈X} PX(x) · (1 − PX̂(x) / PX(x))
            = (log2 e) · ( ∑_{x∈X} PX(x) − ∑_{x∈X} PX̂(x) )
            = 0,

where the second step follows from FI, with equality if and only if PX(x)/PX̂(x) = 1 for every x ∈ X.
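The theorem is easy to test empirically; here is a minimal sketch (random pmfs via numpy, base-2 logarithms; the helper name kl_divergence is mine):

    import numpy as np

    def kl_divergence(p, q):
        """D(X||X̂) in bits; assumes q(x) > 0 wherever p(x) > 0."""
        p, q = np.asarray(p, float), np.asarray(q, float)
        m = p > 0                          # terms with p(x) = 0 contribute 0
        return np.sum(p[m] * np.log2(p[m] / q[m]))

    rng = np.random.default_rng(0)
    for _ in range(5):
        p, q = rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(4))
        assert kl_divergence(p, q) >= 0
    print(kl_divergence([0.5, 0.5], [0.5, 0.5]))   # 0.0 when the pmfs coincide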
Corollary
For any two random variables X, Y,

    I(X; Y) ≥ 0,

with equality if and only if X and Y are independent. (This follows from the information inequality, since I(X; Y) = D(PX,Y ∥ PX PY).)

Corollary

D(p(y|x)∥q(y|x)) ≥ 0,
with equality if and only if p(y|x) = q(y|x) for all y and x such
that p(x) > 0.

Corollary

I(X; Y |Z) ≥ 0,
with equality if and only if X and Y are conditionally independent
given Z.

Upper bound on entropy

Theorem
If a random variable X takes values from a finite set X , then

H(X) ≤ log2 |X |,

where |X | is the size of the set X. Equality holds if and only if X is equiprobable or uniformly distributed over X (i.e. PX(x) = 1/|X | for all x ∈ X).

Proof.

    log2 |X | − H(X)
    = ∑_{x∈X} PX(x) · log2 |X | + ∑_{x∈X} PX(x) · log2 PX(x)
    = ∑_{x∈X} PX(x) · log2 [ |X | · PX(x) ]
    ≥ ∑_{x∈X} PX(x) · log2(e) · (1 − 1/(|X | · PX(x)))
    = log2(e) · ∑_{x∈X} (PX(x) − 1/|X |)
    = log2(e) · (1 − 1) = 0,

with equality if and only if |X | · PX(x) = 1 for all x ∈ X.

Intuitively, entropy tells us how random X is.

X is deterministic if and only if H(X) = 0.
If X is uniform (equiprobable), H(X) is maximized and equal to log2 |X |.

Theorem
H(X|Y ) ≤ H(X),
with equality if and only if X and Y are independent.
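A small sketch illustrating the theorem on a 2×2 joint pmf (the joint pmf is an arbitrary example; H(X|Y) is computed via the chain rule H(X|Y) = H(X,Y) − H(Y)):

    import numpy as np

    def entropy(p):
        p = np.asarray(p, float).ravel()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    pxy = np.array([[0.4, 0.1],          # joint pmf P(X, Y); rows = x, cols = y
                    [0.1, 0.4]])
    h_x = entropy(pxy.sum(axis=1))
    h_x_given_y = entropy(pxy) - entropy(pxy.sum(axis=0))
    print(h_x_given_y, "<=", h_x)        # about 0.722 <= 1.0 here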

Theorem
Let X1, X2, · · · , Xn be drawn according to p(x1, x2, · · · , xn). Then

    H(X1, X2, · · · , Xn) ≤ ∑_{i=1}^{n} H(Xi),

with equality if and only if the Xi are independent.
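The same joint pmf as above also illustrates this bound (a sketch; the product of the marginals gives the equality case):

    import numpy as np

    def entropy(p):
        p = np.asarray(p, float).ravel()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    pxy = np.array([[0.4, 0.1],
                    [0.1, 0.4]])                      # dependent X and Y
    print(entropy(pxy), "<=", entropy(pxy.sum(1)) + entropy(pxy.sum(0)))

    indep = np.outer(pxy.sum(1), pxy.sum(0))          # product of the marginals
    print(entropy(indep), "==", entropy(indep.sum(1)) + entropy(indep.sum(0)))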

Theorem (Log-sum inequality)

For non-negative numbers a1, a2, · · · , an and b1, b2, · · · , bn,

    ∑_{i=1}^{n} ai · log(ai/bi) ≥ ( ∑_{i=1}^{n} ai ) · log( ∑_{i=1}^{n} ai / ∑_{i=1}^{n} bi ),

with equality if and only if ai/bi = ∑_{i=1}^{n} ai / ∑_{i=1}^{n} bi, which is a constant that does not depend on i.
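A numerical spot-check of the inequality and its equality condition (a sketch; random positive ai, bi and base-2 logarithms are my own choices):

    import numpy as np

    rng = np.random.default_rng(2)
    a = rng.uniform(0.1, 1.0, size=5)
    b = rng.uniform(0.1, 1.0, size=5)
    lhs = np.sum(a * np.log2(a / b))
    rhs = np.sum(a) * np.log2(np.sum(a) / np.sum(b))
    assert lhs >= rhs
    print(lhs, ">=", rhs)

    b_eq = a / 3.0                       # now a_i / b_i = 3 for every i
    print(np.sum(a * np.log2(a / b_eq)), np.sum(a) * np.log2(3.0))   # equal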


Convex and concave function

Definition
A function f (x) is said to be convex over an interval (a, b) if for
every x1 , x2 ∈ (a, b) and 0 ≤ λ ≤ 1,

f (λx1 + (1 − λ)x2 ) ≤ λf (x1 ) + (1 − λ)f (x2 ).


A function f is said to be strictly convex if equality holds only if
λ = 0 or λ = 1.

Definition
A function f is concave if −f is convex.

A function is convex if it always lies below any chord; a function is concave if it always lies above any chord.

Theorem
If the function f has a second derivative that is non-negative
(positive) over an interval, the function is convex (strictly convex)
over that interval.

Jensen’s inequality

Theorem
If f is a convex function and X is a random variable, then

    E f(X) ≥ f(EX).

Moreover, if f is strictly convex, equality above implies that X = EX with probability 1.
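For a discrete X, Jensen’s inequality is a one-line check (a sketch with f(t) = t², a strictly convex function, and an arbitrary pmf):

    import numpy as np

    vals = np.array([1.0, 2.0, 4.0])     # support of X
    p = np.array([0.2, 0.5, 0.3])        # pmf of X
    Ef = np.sum(p * vals ** 2)           # E f(X) = 7.0
    fE = np.sum(p * vals) ** 2           # f(EX) = 5.76
    print(Ef, ">=", fE)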

All the inequalities in the last section can also be proved using Jensen’s inequality.

Let f be a strictly convex function, αi ≥ 0, and ∑_{i=1}^{n} αi = 1. Jensen’s inequality states that

    ∑_{i=1}^{n} αi · f(ti) ≥ f( ∑_{i=1}^{n} αi · ti ).

Equality holds if and only if ti is a constant for all i.

To prove the log-sum inequality, set αi = bi / ∑_{j=1}^{n} bj, ti = ai/bi, and f(t) = t · logD(t); we then obtain the desired result.


Theorem
H(PX ) is a concave function of PX , namely

H(λPX + (1 − λ)PX̃ ) ≥ λH(PX ) + (1 − λ)H(PX̃ )

for all λ ∈ [0, 1].
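The concavity of H in the distribution can be spot-checked at a few mixture weights (a sketch with two arbitrary pmfs):

    import numpy as np

    def entropy(p):
        p = np.asarray(p, float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    p = np.array([0.7, 0.2, 0.1])
    q = np.array([0.1, 0.3, 0.6])
    for lam in [0.0, 0.25, 0.5, 0.75, 1.0]:
        mix = lam * p + (1 - lam) * q
        assert entropy(mix) >= lam * entropy(p) + (1 - lam) * entropy(q) - 1e-12
    print("concavity holds at the sampled weights")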

Theorem
Noting that I(X; Y) can be written as I(PX, PY|X), where

    I(PX, PY|X) := ∑_{x∈X} ∑_{y∈Y} PY|X(y|x) · PX(x) · log2 [ PY|X(y|x) / ∑_{a∈X} PY|X(y|a) · PX(a) ],

then I(X; Y) is a concave function of PX (for fixed PY|X), and a convex function of PY|X (for fixed PX).
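The functional I(PX, PY|X) can be written out directly; the sketch below (my own helper, with a binary symmetric channel as the fixed PY|X) also spot-checks concavity in PX:

    import numpy as np

    def mutual_information(px, py_given_x):
        """I(PX, PY|X) in bits; py_given_x[x, y] = P(Y=y | X=x)."""
        px = np.asarray(px, float)
        w = np.asarray(py_given_x, float)
        py = px @ w                            # output distribution of Y
        pxy = px[:, None] * w                  # joint pmf
        m = pxy > 0
        return np.sum(pxy[m] * np.log2(pxy[m] / (px[:, None] * py[None, :])[m]))

    bsc = np.array([[0.9, 0.1],
                    [0.1, 0.9]])               # binary symmetric channel
    p1, p2, lam = np.array([0.8, 0.2]), np.array([0.3, 0.7]), 0.5
    mix = lam * p1 + (1 - lam) * p2
    assert mutual_information(mix, bsc) >= (lam * mutual_information(p1, bsc)
                                            + (1 - lam) * mutual_information(p2, bsc))
    print(mutual_information([0.5, 0.5], bsc)) # about 0.531 bits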

Theorem
D(PX∥PX̂) is convex in the pair (PX, PX̂), i.e., if (PX, PX̂) and (QX, QX̂) are two pairs of probability mass functions, then

    D(λPX + (1 − λ)QX ∥ λPX̂ + (1 − λ)QX̂) ≤ λ · D(PX∥PX̂) + (1 − λ) · D(QX∥QX̂),

for all λ ∈ [0, 1].
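Finally, the convexity of D in the pair can be spot-checked the same way (a sketch with random pmf pairs):

    import numpy as np

    def kl(p, q):
        p, q = np.asarray(p, float), np.asarray(q, float)
        m = p > 0
        return np.sum(p[m] * np.log2(p[m] / q[m]))

    rng = np.random.default_rng(4)
    PX, PXhat = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
    QX, QXhat = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
    for lam in np.linspace(0.0, 1.0, 5):
        lhs = kl(lam * PX + (1 - lam) * QX, lam * PXhat + (1 - lam) * QXhat)
        rhs = lam * kl(PX, PXhat) + (1 - lam) * kl(QX, QXhat)
        assert lhs <= rhs + 1e-12
    print("convexity in the pair holds at the sampled weights")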
