Lecture 5 - Information theory
Jan Bouda
FI MU
May 18, 2012
Part I
Uncertainty
Given a random experiment, it is natural to ask how uncertain we are about its outcome.

Compare two experiments: tossing an unbiased coin and throwing a fair six-sided die. The first experiment has two possible outcomes, the second has six, and both have the uniform probability distribution. Our intuition says that we are more uncertain about the outcome of the second experiment.

Let us compare the toss of an ideal coin with a binary message source emitting 0 and 1, each with probability 1/2. Intuitively, we should expect the uncertainty about the outcome of each of these experiments to be the same. Therefore the uncertainty should depend only on the probability distribution and not on the concrete sample space.

Therefore, the uncertainty about a particular random experiment can be specified as a function of the probability distribution {p_1, p_2, ..., p_n}, and we will denote it H(p_1, p_2, ..., p_n).
Uncertainty - requirements
One of the requirements is that the uncertainty of the uniform distribution should not decrease with the number of outcomes:
$$H(\overbrace{1/n, \ldots, 1/n}^{n\times}) \le H(\overbrace{1/(n+1), \ldots, 1/(n+1)}^{(n+1)\times}).$$
Entropy and uncertainty
8. Let us consider a random choice of one of n + m balls, m being red and n being blue. Let $p = \sum_{i=1}^{m} p_i$ be the probability that a red ball is chosen and $q = \sum_{i=m+1}^{m+n} p_i$ be the probability that a blue one is chosen. Then the uncertainty about which ball is chosen equals the uncertainty whether a red or a blue ball is chosen, plus the weighted uncertainty about which particular ball is chosen given that a red/blue ball was chosen. Formally,
$$H(p_1, \ldots, p_m, p_{m+1}, \ldots, p_{m+n}) = H(p, q) + p\,H\!\left(\frac{p_1}{p}, \ldots, \frac{p_m}{p}\right) + q\,H\!\left(\frac{p_{m+1}}{q}, \ldots, \frac{p_{m+n}}{q}\right). \tag{1}$$
It can be shown that any function satisfying Axioms 1–8 is of the form
$$H(p_1, \ldots, p_m) = -(\log_a 2) \sum_{i=1}^{m} p_i \log_2 p_i, \tag{2}$$
showing that the function is defined uniquely up to multiplication by a constant, which effectively changes only the base of the logarithm.
Entropy
The function H(p_1, ..., p_n) we informally introduced is called the (Shannon) entropy and, as justified above, it measures our uncertainty about the outcome of an experiment.

Definition
Let X be a random variable with probability distribution p(x). Then the (Shannon) entropy of the random variable X is defined as
$$H(X) = -\sum_{x \in Im(X)} p(x) \log p(x).$$

Lemma
H(X) ≥ 0.

Proof.
0 < p(x) ≤ 1 implies log(1/p(x)) ≥ 0.
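To connect the definition with the opening coin/die comparison, here is a minimal Python sketch (not part of the original lecture; the helper name `entropy` is ours) that evaluates H for a few distributions, using base-2 logarithms so the results are in bits.

```python
from math import log2

def entropy(probs):
    """H = -sum_x p(x) log2 p(x); terms with p(x) = 0 contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/2, 1/2]))   # unbiased coin: 1.0 bit
print(entropy([1/6] * 6))    # fair die: ~2.585 bits (more uncertainty)
print(entropy([1.0]))        # a sure outcome: 0.0 bits
```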
Part II
Joint entropy
In order to examine the entropy of more complex random experiments described by correlated random variables, we have to introduce the entropy of a pair (or n-tuple) of random variables.
Definition
Let X and Y be random variables distributed according to the probability distribution p(x, y) = P(X = x, Y = y). We define the joint (Shannon) entropy of random variables X and Y as
$$H(X, Y) = -\sum_{x \in Im(X)} \sum_{y \in Im(Y)} p(x, y) \log p(x, y),$$
or, alternatively,
$$H(X, Y) = -E[\log p(X, Y)] = E\left[\log \frac{1}{p(X, Y)}\right].$$
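A small numerical illustration (the joint distribution is our own example, not from the slides): the joint entropy is just the entropy of the distribution over pairs.

```python
from math import log2

# joint probabilities p(x, y) for two binary random variables
p_xy = {(0, 0): 1/2, (0, 1): 1/4, (1, 0): 1/8, (1, 1): 1/8}

H_XY = -sum(p * log2(p) for p in p_xy.values() if p > 0)
print(H_XY)  # 1.75 bits
```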
Conditional Entropy
Definition
Let X and Y be random variables distributed according to the probability distribution p(x, y) = P(X = x, Y = y). Let us denote p(x|y) = P(X = x|Y = y). The conditional entropy of X given Y is
$$\begin{aligned}
H(X|Y) &= \sum_{y \in Im(Y)} p(y) H(X|Y = y) \\
&= -\sum_{y \in Im(Y)} p(y) \sum_{x \in Im(X)} p(x|y) \log p(x|y) \\
&= -\sum_{x \in Im(X)} \sum_{y \in Im(Y)} p(x, y) \log p(x|y).
\end{aligned} \tag{4}$$
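The two expressions in (4) can be checked numerically; the sketch below (our own toy distribution, not from the lecture) computes H(X|Y) once as a weighted average of the entropies H(X|Y = y) and once directly from p(x, y) and p(x|y).

```python
from math import log2

p_xy = {(0, 0): 1/2, (0, 1): 1/4, (1, 0): 1/8, (1, 1): 1/8}

# marginal p(y)
p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + p

# H(X|Y) = sum_y p(y) H(X | Y = y)
h1 = 0.0
for y, py in p_y.items():
    cond = [p / py for (x, yy), p in p_xy.items() if yy == y]   # p(x|y)
    h1 += py * -sum(q * log2(q) for q in cond if q > 0)

# H(X|Y) = -sum_{x,y} p(x, y) log2 p(x|y)
h2 = -sum(p * log2(p / p_y[y]) for (x, y), p in p_xy.items() if p > 0)

print(h1, h2)  # both ~0.7956 bits
```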
Conditional Entropy
Using the previous definition, we may ask how much information we learn on average about X when given an outcome of Y. Naturally, we may interpret it as the decrease of our uncertainty about X when we learn the outcome of Y, i.e. H(X) − H(X|Y). Analogously, the amount of information we obtain when we learn the outcome of X itself is H(X).
Chain rule of conditional entropy
Theorem
H(X, Y) = H(Y) + H(X|Y).

Proof.
$$\begin{aligned}
H(X, Y) &= -\sum_{x \in Im(X)} \sum_{y \in Im(Y)} p(x, y) \log p(x, y) \\
&= -\sum_{x \in Im(X)} \sum_{y \in Im(Y)} p(x, y) \log\big[p(y) p(x|y)\big] \\
&= -\sum_{x \in Im(X)} \sum_{y \in Im(Y)} p(x, y) \log p(y) - \sum_{x \in Im(X)} \sum_{y \in Im(Y)} p(x, y) \log p(x|y) \\
&= -\sum_{y \in Im(Y)} p(y) \log p(y) - \sum_{x \in Im(X)} \sum_{y \in Im(Y)} p(x, y) \log p(x|y) \\
&= H(Y) + H(X|Y).
\end{aligned} \tag{5}$$
Chain rule of conditional entropy
Proof.
Alternatively, we may use log p(X, Y) = log p(Y) + log p(X|Y) and take the expectation of both sides to get the desired result.
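As a sanity check of the chain rule (our own test, not from the slides), the sketch below draws an arbitrary joint distribution and compares H(X, Y) with H(Y) + H(X|Y).

```python
import random
from math import log2

random.seed(0)
w = [[random.random() for _ in range(3)] for _ in range(4)]  # 4 x-values, 3 y-values
total = sum(sum(row) for row in w)
p = [[v / total for v in row] for row in w]                  # joint p(x, y)

p_y = [sum(p[x][y] for x in range(4)) for y in range(3)]
H_XY = -sum(p[x][y] * log2(p[x][y]) for x in range(4) for y in range(3))
H_Y = -sum(q * log2(q) for q in p_y)
H_X_given_Y = -sum(p[x][y] * log2(p[x][y] / p_y[y]) for x in range(4) for y in range(3))

print(abs(H_XY - (H_Y + H_X_given_Y)) < 1e-9)  # True
```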
Part III
Relative entropy
Let us start with the definition of the relative entropy, which measures the inefficiency of assuming that the distribution is q(x) when the true distribution is p(x).
Definition
The relative entropy or Kullback-Leibler distance between two
probability distributions p(x) and q(x) is defined as
$$D(p\|q) = \sum_{x \in Im(X)} p(x) \log \frac{p(x)}{q(x)} = E\left[\log \frac{p(X)}{q(X)}\right].$$
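A quick numerical sketch (our own distributions p and q, assuming q(x) > 0 wherever p(x) > 0) computing D(p‖q) in bits; note that it is not symmetric, so it is not a metric.

```python
from math import log2

def kl(p, q):
    """D(p||q) = sum_x p(x) log2(p(x)/q(x)), skipping terms with p(x) = 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [1/2, 1/4, 1/4]
q = [1/3, 1/3, 1/3]
print(kl(p, q))  # ~0.0850 bits
print(kl(q, p))  # ~0.0817 bits -- D(p||q) != D(q||p) in general
```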
Mutual Information and Entropy
Theorem
I(X;Y) = H(X) − H(X|Y).
Proof.
$$\begin{aligned}
I(X;Y) &= \sum_{x,y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)} = \sum_{x,y} p(x, y) \log \frac{p(x|y)}{p(x)} \\
&= -\sum_{x,y} p(x, y) \log p(x) + \sum_{x,y} p(x, y) \log p(x|y) \\
&= -\sum_{x} p(x) \log p(x) - \left(-\sum_{x,y} p(x, y) \log p(x|y)\right) \\
&= H(X) - H(X|Y).
\end{aligned} \tag{7}$$
Mutual information
Theorem
I(X;Y) = H(X) + H(Y) − H(X, Y).
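The identities above, together with I(X;Y) = D(p(x, y)‖p(x)p(y)), can be verified on a concrete joint distribution. The sketch below is our own example (not from the lecture).

```python
from math import log2

p_xy = {(0, 0): 3/8, (0, 1): 1/8, (1, 0): 1/8, (1, 1): 3/8}
xs = {x for x, _ in p_xy}
ys = {y for _, y in p_xy}
p_x = {x: sum(p for (a, b), p in p_xy.items() if a == x) for x in xs}
p_y = {y: sum(p for (a, b), p in p_xy.items() if b == y) for y in ys}

H = lambda d: -sum(p * log2(p) for p in d.values() if p > 0)
H_X, H_Y, H_XY = H(p_x), H(p_y), H(p_xy)

I1 = H_X - (H_XY - H_Y)                    # H(X) - H(X|Y), using the chain rule
I2 = H_X + H_Y - H_XY                      # H(X) + H(Y) - H(X,Y)
I3 = sum(p * log2(p / (p_x[x] * p_y[y]))   # D(p(x,y) || p(x)p(y))
         for (x, y), p in p_xy.items())
print(I1, I2, I3)  # all ~0.1887 bits
```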
Part IV
General Chain Rule for Entropy
Theorem
Let X_1, X_2, ..., X_n be random variables. Then
$$H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i | X_{i-1}, \ldots, X_1).$$
Proof.
We use repeated application of the chain rule for a pair of random variables:
General Chain Rule for Entropy
Proof.
$$\begin{aligned}
H(X_1, X_2, \ldots, X_n) &= H(X_1) + H(X_2|X_1) + \cdots + H(X_n|X_{n-1}, \ldots, X_1) \\
&= \sum_{i=1}^{n} H(X_i | X_{i-1}, \ldots, X_1).
\end{aligned}$$
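A numerical check of the general chain rule for three binary variables (our own randomly generated joint distribution; the conditional entropies are computed directly from the conditional probabilities).

```python
import itertools
import random
from math import log2

random.seed(1)
keys = list(itertools.product([0, 1], repeat=3))
w = [random.random() for _ in keys]
p = {k: v / sum(w) for k, v in zip(keys, w)}          # joint p(x1, x2, x3)

def H(dist):
    return -sum(q * log2(q) for q in dist.values() if q > 0)

def marginal(dist, idx):
    out = {}
    for k, q in dist.items():
        kk = tuple(k[i] for i in idx)
        out[kk] = out.get(kk, 0.0) + q
    return out

p1 = marginal(p, (0,))
p12 = marginal(p, (0, 1))
H_X2_given_X1 = -sum(q * log2(q / p1[(a,)]) for (a, b), q in p12.items())
H_X3_given_X12 = -sum(q * log2(q / p12[(a, b)]) for (a, b, c), q in p.items())

print(abs(H(p) - (H(p1) + H_X2_given_X1 + H_X3_given_X12)) < 1e-9)  # True
```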
Conditional Mutual Information
Definition
The conditional mutual information between random variables X and Y
given Z is defined as
$$I(X;Y|Z) = H(X|Z) - H(X|Y, Z) = E\left[\log \frac{p(X, Y|Z)}{p(X|Z)\, p(Y|Z)}\right].$$
Conditional Relative Entropy
Definition
The conditional relative entropy is the average, over the probability distribution p(x), of the relative entropies between the conditional probability distributions p(y|x) and q(y|x). Formally,
$$D\big(p(y|x) \,\big\|\, q(y|x)\big) = \sum_{x} p(x) \sum_{y} p(y|x) \log \frac{p(y|x)}{q(y|x)} = E\left[\log \frac{p(Y|X)}{q(Y|X)}\right].$$
Chain Rule for Relative Entropy
Theorem
D(p(x, y)‖q(x, y)) = D(p(x)‖q(x)) + D(p(y|x)‖q(y|x)).

Proof.
$$\begin{aligned}
D(p(x, y)\|q(x, y)) &= \sum_{x}\sum_{y} p(x, y) \log \frac{p(x, y)}{q(x, y)} \\
&= \sum_{x}\sum_{y} p(x, y) \log \frac{p(x)\, p(y|x)}{q(x)\, q(y|x)} \\
&= \sum_{x,y} p(x, y) \log \frac{p(x)}{q(x)} + \sum_{x,y} p(x, y) \log \frac{p(y|x)}{q(y|x)} \\
&= D(p(x)\|q(x)) + D(p(y|x)\|q(y|x)).
\end{aligned} \tag{9}$$
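The chain rule for relative entropy can also be checked numerically; the two joint distributions below are our own example (all probabilities strictly positive, so every term is defined).

```python
from math import log2

p = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}
q = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

px = {x: p[(x, 0)] + p[(x, 1)] for x in (0, 1)}
qx = {x: q[(x, 0)] + q[(x, 1)] for x in (0, 1)}

D_joint = sum(p[k] * log2(p[k] / q[k]) for k in p)
D_marg = sum(px[x] * log2(px[x] / qx[x]) for x in (0, 1))
# conditional relative entropy D(p(y|x) || q(y|x)), averaged over p(x)
D_cond = sum(p[(x, y)] * log2((p[(x, y)] / px[x]) / (q[(x, y)] / qx[x]))
             for x in (0, 1) for y in (0, 1))

print(abs(D_joint - (D_marg + D_cond)) < 1e-9)  # True
```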
Part V
Information inequality
Information Inequality
Theorem
D(p‖q) ≥ 0 with equality if and only if p(x) = q(x) for all x.
Information Inequality
Proof.
Let A = {x | p(x) > 0} be the support set of p(x). Then
$$\begin{aligned}
-D(p\|q) &= -\sum_{x \in A} p(x) \log \frac{p(x)}{q(x)} = \sum_{x \in A} p(x) \log \frac{q(x)}{p(x)} \\
&\stackrel{(*)}{\le} \log \sum_{x \in A} p(x) \frac{q(x)}{p(x)} = \log \sum_{x \in A} q(x) \\
&\le \log \sum_{x} q(x) = \log 1 = 0,
\end{aligned} \tag{10}$$
where the step (∗) follows from Jensen's inequality applied to the concave function log.
Theorem
I(X;Y) ≥ 0 with equality if and only if X and Y are independent.

Proof.
I(X;Y) = D(p(x, y)‖p(x)p(y)) ≥ 0 with equality if and only if p(x, y) = p(x)p(y), i.e. X and Y are independent.
Consequences of Information Inequality
Corollary
D(p(y|x)‖q(y|x)) ≥ 0
with equality if and only if p(y|x) = q(y|x) for all y and x with p(x) > 0.
Corollary
I(X;Y|Z) ≥ 0
with equality if and only if X and Y are conditionally independent given Z.
Theorem
H(X) ≤ log |Im(X)| with equality if and only if X has a uniform distribution over Im(X).
Consequences of Information Inequality
Proof.
Let u(x) = 1/|Im(X)| be the uniform probability distribution over Im(X) and let p(x) be the probability distribution of X. Then
$$D(p\|u) = \sum_{x} p(x) \log \frac{p(x)}{u(x)} = -\sum_{x} p(x) \log u(x) - \left(-\sum_{x} p(x) \log p(x)\right) = \log |Im(X)| - H(X).$$
Since D(p‖u) ≥ 0 by the information inequality, H(X) ≤ log |Im(X)| follows.
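The identity D(p‖u) = log |Im(X)| − H(X) is easy to observe numerically (our own example distribution over four outcomes).

```python
from math import log2

p = [0.5, 0.25, 0.15, 0.10]
n = len(p)

H = -sum(pi * log2(pi) for pi in p)
D = sum(pi * log2(pi / (1 / n)) for pi in p)   # D(p || uniform)

print(H, log2(n) - D)   # both ~1.7427 bits, and H <= log2(4) = 2
```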
Consequences of Information Inequality
Theorem
H(X|Y) ≤ H(X) with equality if and only if X and Y are independent, i.e. conditioning does not increase entropy.

Proof.
0 ≤ I(X;Y) = H(X) − H(X|Y).
Consequences of Information Inequality
Theorem
$$H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i).$$

Proof.
We use the chain rule for entropy:
$$H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i | X_{i-1}, \ldots, X_1) \le \sum_{i=1}^{n} H(X_i), \tag{11}$$
where the inequality follows directly from the previous theorem. We have equality if and only if each $X_i$ is independent of $X_{i-1}, \ldots, X_1$, i.e. if and only if $X_1, \ldots, X_n$ are mutually independent.
Part VI
Log Sum Inequality

Theorem
For non-negative numbers $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$,
$$\sum_{i=1}^{n} a_i \log \frac{a_i}{b_i} \ge \left(\sum_{i=1}^{n} a_i\right) \log \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}.$$
Log Sum Inequality
Proof.
Assume WLOG that $a_i > 0$ and $b_i > 0$. The function $f(t) = t \log t$ is strictly convex, since $f''(t) = \frac{1}{t} \log e > 0$ for all positive $t$. We use Jensen's inequality to get
$$\sum_i \alpha_i f(t_i) \ge f\left(\sum_i \alpha_i t_i\right)$$
for $\alpha_i \ge 0$, $\sum_i \alpha_i = 1$. Setting $\alpha_i = b_i / \sum_{j=1}^{n} b_j$ and $t_i = a_i / b_i$ we obtain
$$\sum_i \frac{a_i}{\sum_j b_j} \log \frac{a_i}{b_i} \ge \left(\sum_i \frac{a_i}{\sum_j b_j}\right) \log \left(\sum_i \frac{a_i}{\sum_j b_j}\right),$$
which is the log sum inequality after multiplying both sides by $\sum_j b_j$.
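A numerical spot-check of the log sum inequality with arbitrary positive numbers (our own values).

```python
from math import log2

a = [0.2, 0.5, 1.3]
b = [0.4, 0.3, 0.9]

lhs = sum(ai * log2(ai / bi) for ai, bi in zip(a, b))
rhs = sum(a) * log2(sum(a) / sum(b))

print(lhs, rhs, lhs >= rhs)  # ~0.858, ~0.644, True
```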
Consequences of Log Sum Inequality
Theorem
D(p‖q) is convex in the pair (p, q), i.e. if (p_1, q_1) and (p_2, q_2) are two pairs of probability distributions, then
$$D\big(\lambda p_1 + (1-\lambda) p_2 \,\big\|\, \lambda q_1 + (1-\lambda) q_2\big) \le \lambda D(p_1\|q_1) + (1-\lambda) D(p_2\|q_2)$$
for all 0 ≤ λ ≤ 1.
Theorem
Let (X, Y) ∼ p(x, y) = p(x)p(y|x). The mutual information I(X;Y) is a concave function of p(x) for fixed p(y|x) and a convex function of p(y|x) for fixed p(x).
Part VII
Data Processing Inequality
Theorem
X → Y → Z is a Markov chain if and only if X and Z are conditionally independent given Y, i.e.
p(x, z|y) = p(x|y)p(z|y).
Data Processing Inequality
Theorem
If X → Y → Z is a Markov chain, then I(X;Y) ≥ I(X;Z).

Proof.
We expand the mutual information using the chain rule in two different ways as
$$\begin{aligned}
I(X; Y, Z) &= I(X;Z) + I(X;Y|Z) \\
&= I(X;Y) + I(X;Z|Y).
\end{aligned} \tag{12}$$
Since X → Y → Z is a Markov chain, I(X;Z|Y) = 0, and since I(X;Y|Z) ≥ 0, we conclude that
I(X;Y) ≥ I(X;Z).
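To see the inequality in action, the sketch below builds a toy Markov chain X → Y → Z (our own construction: Y and Z are progressively noisier copies of a uniform bit X) and compares I(X;Y) with I(X;Z).

```python
from math import log2

def mutual_information(p_joint, p_a, p_b):
    """I(A;B) = sum_{a,b} p(a,b) log2( p(a,b) / (p(a)p(b)) )."""
    return sum(p * log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in p_joint.items() if p > 0)

p_x = {0: 0.5, 1: 0.5}
# Y is X flipped with probability 0.1
p_xy = {(x, y): p_x[x] * (0.9 if y == x else 0.1) for x in (0, 1) for y in (0, 1)}
# Z is Y flipped with probability 0.2, independently of X given Y (Markov property)
p_xz = {}
for (x, y), p in p_xy.items():
    for z in (0, 1):
        p_xz[(x, z)] = p_xz.get((x, z), 0.0) + p * (0.8 if z == y else 0.2)

p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}
p_z = {z: sum(p for (_, zz), p in p_xz.items() if zz == z) for z in (0, 1)}

print(mutual_information(p_xy, p_x, p_y))  # I(X;Y) ~ 0.531 bits
print(mutual_information(p_xz, p_x, p_z))  # I(X;Z) ~ 0.173 bits, never exceeds I(X;Y)
```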