The lecture covers the concepts of entropy and mutual information in information theory, emphasizing their definitions, properties, and relationships. Entropy quantifies uncertainty in random variables, while mutual information measures the information one variable contains about another. Key formulas and properties, such as joint entropy and conditional entropy, are discussed to illustrate these concepts.

Lecture 2: Entropy and Mutual Information

• Entropy
• Mutual Information

Dr. Yao Xie, ECE587, Information Theory, Duke University


The winner is:
Eunsu Ryu, with number 6
[Figure: bar chart of the numbers chosen, over the range 0 to 100]

A strategy to win the game?

Dr. Yao Xie, ECE587, Information Theory, Duke University 1



Dr. Yao Xie, ECE587, Information Theory, Duke University 2


Uncertainty measure

• Let X be a random variable taking on a finite number M of different values x1, . . . , xM
• What is X: English letter in a file, last digit of the Dow-Jones index, result of coin tossing, password
• With probabilities p1, . . . , pM, where pi > 0 and ∑_{i=1}^{M} pi = 1
• Question: what is the uncertainty associated with X?
• Intuitively: a few properties that an uncertainty measure should satisfy
• It should not depend on the way we choose to label the alphabet

Dr. Yao Xie, ECE587, Information Theory, Duke University 3


Desired properties

• It is a function of p1, . . . , pM
• Let this uncertainty measure be

H(p1, . . . , pM)

• Monotonicity. Let f(M) = H(1/M, . . . , 1/M). If M < M′, then

f(M) < f(M′)

Picking one person randomly from the classroom should involve less uncertainty than picking a person randomly from the US.

Dr. Yao Xie, ECE587, Information Theory, Duke University 4


• Additivity. Two independent RVs X and Y, each uniformly distributed, with alphabet sizes M and L. The pair (X, Y) has ML equally likely outcomes, so its uncertainty is f(ML). However, due to independence, when X is revealed, the remaining uncertainty in Y should not be affected. This means

f(ML) − f(M) = f(L)

• Grouping rule (Problem 2.27 in the text). Dividing the outcomes into two groups, randomly choosing one group, and then randomly picking an element from that group should not change the total uncertainty.

Dr. Yao Xie, ECE587, Information Theory, Duke University 5


Entropy

• The only function that satisfies the requirements is the entropy function

H(p1, . . . , pM) = −∑_{i=1}^{M} pi log2 pi

• General definition of entropy

H(X) = −∑_x p(x) log2 p(x) bits

• 0 log 0 = 0
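A minimal numerical sketch of this definition (Python with NumPy is assumed; the helper name entropy_bits is ours, not from the lecture):

```python
import numpy as np

def entropy_bits(p):
    """H(p) = -sum_i p_i log2 p_i, using the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                          # drop zero-probability outcomes (0 log 0 = 0)
    return float(-np.sum(nz * np.log2(nz)))

print(entropy_bits([0.5, 0.5]))            # fair coin: 1.0 bit
print(entropy_bits([0.9, 0.1]))            # biased coin: about 0.47 bits
print(entropy_bits(np.full(32, 1/32)))     # uniform over 32 outcomes: 5.0 bits
```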

Dr. Yao Xie, ECE587, Information Theory, Duke University 6


• Uncertainty in a single random variable
• Can also be written as:

H(X) = E[ log (1/p(X)) ]

• Intuition: H = log(# of outcomes/states) when the outcomes are equally likely
• Entropy is a functional of p(x)
• Entropy is a lower bound on the number of bits needed to represent a RV.

E.g.: a RV that has a uniform distribution over 32 outcomes needs log2 32 = 5 bits

Dr. Yao Xie, ECE587, Information Theory, Duke University 7


Properties of entropy

• H(X) ≥ 0
• Definition: for a Bernoulli random variable with X = 1 w.p. p and X = 0 w.p. 1 − p,

H(p) = −p log p − (1 − p) log(1 − p)

– Concave in p
– Maximized at p = 1/2

Example: how to ask questions?
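A quick numerical check of these two properties (a sketch; Python with NumPy assumed, and the function name binary_entropy is ours):

```python
import numpy as np

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

grid = np.linspace(0, 1, 1001)
values = [binary_entropy(p) for p in grid]
print(grid[int(np.argmax(values))])   # 0.5 -> maximized at p = 1/2
print(max(values))                    # 1.0 bit at the maximum
# Concavity spot-check: the value at a midpoint beats the average of the endpoints
print(binary_entropy(0.4) >= 0.5 * (binary_entropy(0.1) + binary_entropy(0.7)))  # True
```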

Dr. Yao Xie, ECE587, Information Theory, Duke University 8


Joint entropy

• Extend the notion to a pair of discrete RVs (X, Y)
• Nothing new: can be considered as a single vector-valued RV
• Useful to measure dependence of two random variables

H(X, Y) = −∑_x ∑_y p(x, y) log p(x, y)

H(X, Y) = −E[ log p(X, Y) ]

Dr. Yao Xie, ECE587, Information Theory, Duke University 9


Conditional Entropy

• Conditional entropy: the entropy of a RV given another RV. If (X, Y) ∼ p(x, y),

H(Y|X) = ∑_x p(x) H(Y|X = x)

• Various ways of writing this
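Continuing the made-up table from the joint-entropy sketch, H(Y|X) can be computed as the p(x)-weighted average of the per-row entropies:

```python
import numpy as np

p_xy = np.array([[0.25, 0.25],     # same made-up joint pmf as before
                 [0.40, 0.10]])

p_x = p_xy.sum(axis=1)             # marginal p(x)
h_y_given_x = 0.0
for i, px in enumerate(p_x):
    cond = p_xy[i] / px                                         # p(y | X = x_i)
    h_y_given_x += px * float(-np.sum(cond * np.log2(cond)))    # p(x) * H(Y | X = x)
print(h_y_given_x)   # about 0.86 bits
```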

Dr. Yao Xie, ECE587, Information Theory, Duke University 10


Chain rule for entropy
Entropy of a pair of RVs = entropy of one + conditional entropy of the
other:
H(X, Y) = H(X) + H(Y|X)
Proof:

• H(Y|X) ≠ H(X|Y)
• H(X) − H(X|Y) = H(Y) − H(Y|X)
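A numerical check of the chain rule on the same made-up table (a sketch, not part of the lecture):

```python
import numpy as np

p_xy = np.array([[0.25, 0.25],
                 [0.40, 0.10]])
p_x = p_xy.sum(axis=1)

def H(p):
    """Entropy in bits of a pmf given as an array, skipping zero entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

h_xy = H(p_xy.ravel())                                               # H(X, Y)
h_y_given_x = sum(px * H(p_xy[i] / px) for i, px in enumerate(p_x))  # H(Y|X)
print(np.isclose(h_xy, H(p_x) + h_y_given_x))   # True: H(X, Y) = H(X) + H(Y|X)
```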

Dr. Yao Xie, ECE587, Information Theory, Duke University 11


Relative entropy

• Measure of distance between two distributions

D(p‖q) = ∑_x p(x) log [ p(x) / q(x) ]

• Also known as the Kullback-Leibler distance in statistics: the expected log-likelihood ratio
• A measure of the inefficiency of assuming that the distribution is q when the true distribution is p
• If we use distribution q to construct a code, we need H(p) + D(p‖q) bits on average to describe the RV
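A sketch of D(p‖q) for two made-up distributions, including the coding interpretation above (Python/NumPy assumed; the helper name kl_bits is ours):

```python
import numpy as np

def kl_bits(p, q):
    """D(p||q) = sum_x p(x) log2 (p(x)/q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

p = np.array([0.5, 0.25, 0.25])   # true distribution
q = np.array([1/3, 1/3, 1/3])     # assumed distribution
print(kl_bits(p, q))              # about 0.085 bits
print(kl_bits(q, p))              # about 0.082 bits -- not symmetric, so not a true metric

# Coding interpretation: ideal codeword lengths -log2 q(x), averaged under the true p,
# cost H(p) + D(p||q) bits per symbol
H_p = float(-np.sum(p * np.log2(p)))
avg_len = float(np.sum(p * -np.log2(q)))
print(np.isclose(avg_len, H_p + kl_bits(p, q)))   # True
```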

Dr. Yao Xie, ECE587, Information Theory, Duke University 12


Mutual information

• Measure of the amount of information that one RV contains about another RV

I(X; Y) = ∑_x ∑_y p(x, y) log [ p(x, y) / (p(x) p(y)) ] = D(p(x, y) ‖ p(x) p(y))

• Reduction in the uncertainty of one random variable due to the knowledge of the other
• Relationship between entropy and mutual information:

I(X; Y) = H(Y) − H(Y|X)

Proof:

Dr. Yao Xie, ECE587, Information Theory, Duke University 13


• I(X; Y) = H(Y) − H(Y|X)
• H(X, Y) = H(X) + H(Y|X) ⇒ I(X; Y) = H(X) + H(Y) − H(X, Y)
• I(X; X) = H(X) − H(X|X) = H(X)

Entropy is self-information
Example: calculating mutual information
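As a sketch of such a calculation on the made-up joint table used earlier (treating I(X; Y) as D(p(x, y)‖p(x)p(y))):

```python
import numpy as np

p_xy = np.array([[0.25, 0.25],
                 [0.40, 0.10]])
p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X, kept as a column
p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y, kept as a row

# I(X; Y) = sum_{x,y} p(x, y) log2 [ p(x, y) / (p(x) p(y)) ]
mi = float(np.sum(p_xy * np.log2(p_xy / (p_x * p_y))))
print(mi)   # about 0.073 bits shared between X and Y
```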

Dr. Yao Xie, ECE587, Information Theory, Duke University 14


Venn diagram

[Figure: Venn diagram of two overlapping circles H(X) and H(Y); the whole region is H(X, Y), the non-overlapping parts are H(X|Y) and H(Y|X), and the overlap is I(X; Y)]

I(X; Y) is the intersection of the information in X with the information in Y

Dr. Yao Xie, ECE587, Information Theory, Duke University 15


Example: X is a blood type, Y is the chance of skin cancer; joint distribution p(x, y):

            X = A    X = B    X = AB   X = O
  Y = 1:     1/8      1/16     1/32     1/32
  Y = 2:     1/16     1/8      1/32     1/32
  Y = 3:     1/16     1/16     1/16     1/16
  Y = 4:     1/4      0        0        0

X marginal: (1/2, 1/4, 1/8, 1/8)
Y marginal: (1/4, 1/4, 1/4, 1/4)

H(X) = 7/4 bits, H(Y) = 2 bits
Conditional entropies: H(X|Y) = 11/8 bits, H(Y|X) = 13/8 bits
H(Y|X) ≠ H(X|Y)
Mutual information: I(X; Y) = H(X) − H(X|Y) = 0.375 bit
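These numbers can be reproduced from the table (a sketch in Python/NumPy, not part of the original slide):

```python
import numpy as np

# Joint pmf from the slide: rows are values of Y, columns are values of X
p_yx = np.array([[1/8,  1/16, 1/32, 1/32],
                 [1/16, 1/8,  1/32, 1/32],
                 [1/16, 1/16, 1/16, 1/16],
                 [1/4,  0,    0,    0   ]])

def H(p):
    """Entropy in bits, skipping zero entries (0 log 0 = 0)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

p_x = p_yx.sum(axis=0)             # (1/2, 1/4, 1/8, 1/8)
p_y = p_yx.sum(axis=1)             # (1/4, 1/4, 1/4, 1/4)
h_xy = H(p_yx.ravel())             # H(X, Y) = 27/8 bits
print(H(p_x), H(p_y))              # 1.75 bits (= 7/4) and 2.0 bits
print(h_xy - H(p_y))               # H(X|Y) = 11/8 = 1.375 bits
print(h_xy - H(p_x))               # H(Y|X) = 13/8 = 1.625 bits
print(H(p_x) - (h_xy - H(p_y)))    # I(X; Y) = 3/8 = 0.375 bits
```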
Dr. Yao Xie, ECE587, Information Theory, Duke University 16

Dr. Yao Xie, ECE587, Information Theory, Duke University 17


Summary

[Diagram: a sequence X1, . . . , Xn with distribution p(x) and a sequence Y1, . . . , Ym, annotated with the entropy H(X1, . . . , Xn), the conditional entropy H(Y1, . . . , Ym | X1, . . . , Xn), the entropy H(Y1, . . . , Ym), and the mutual information I(X1, . . . , Xn; Y1, . . . , Ym)]

Dr. Yao Xie, ECE587, Information Theory, Duke University 18
