
Application of Information Theory, Lecture 2

Joint & Conditional Entropy, Mutual Information


Handout Mode

Iftach Haitner

Tel Aviv University.

Nov 4, 2014



Part I

Joint and Conditional Entropy



Joint entropy

▸ Recall that the entropy of an rv X over X is defined by

  H(X) = − ∑_{x∈X} PX(x) log PX(x)

▸ Shorter notation: for X ∼ p, let H(X) = − ∑_x p(x) log p(x)
  (where the summation is over the domain of X).
▸ The joint entropy of (jointly distributed) rvs X and Y with (X, Y) ∼ p is

  H(X, Y) = − ∑_{x,y} p(x, y) log p(x, y)

  This is simply the entropy of the rv Z = (X, Y).
▸ Example: consider the joint distribution

            Y = 0   Y = 1
    X = 0    1/4     1/4
    X = 1    1/2      0

  H(X, Y) = − (1/4) log(1/4) − (1/4) log(1/4) − (1/2) log(1/2)
          = (1/2) ⋅ 2 + (1/2) ⋅ 1 = 3/2
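
A quick numerical check of this example (not part of the original slides): a minimal Python sketch, where joint and joint_entropy are illustrative names.

```python
from math import log2

# p(x, y) for the example: rows X = 0, 1 and columns Y = 0, 1.
joint = {(0, 0): 1/4, (0, 1): 1/4, (1, 0): 1/2, (1, 1): 0.0}

def joint_entropy(p):
    """H(X, Y) = -sum_{x,y} p(x, y) log p(x, y), with the convention 0 log 0 = 0."""
    return -sum(q * log2(q) for q in p.values() if q > 0)

print(joint_entropy(joint))  # 1.5
```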



Joint entropy, cont.

▸ The joint entropy of (X1, . . . , Xn) ∼ p is

  H(X1, . . . , Xn) = − ∑_{x1,...,xn} p(x1, . . . , xn) log p(x1, . . . , xn)



Conditional entropy

▸ Let (X , Y ) ∼ p.
▸ For x ∈ Supp(X ), the random variable Y ∣X = x is well defined.
▸ The entropy of Y conditioned on X is defined by

  H(Y∣X) := E_{x←X} H(Y∣X = x) = E_X H(Y∣X)

▸ Measures the uncertainty in Y given X .


▸ Let pX & pY∣X be the marginal & conditional distributions induced by p.

  H(Y∣X) = ∑_{x∈X} pX(x) ⋅ H(Y∣X = x)
         = − ∑_{x∈X} pX(x) ∑_{y∈Y} pY∣X(y∣x) log pY∣X(y∣x)
         = − ∑_{x∈X, y∈Y} p(x, y) log pY∣X(y∣x)
         = − E_{(X,Y)} log pY∣X(Y∣X)
         = − E_{Z = pY∣X(Y∣X)} log Z



Conditional entropy, cont.

▸ Example: consider again the joint distribution

            Y = 0   Y = 1
    X = 0    1/4     1/4
    X = 1    1/2      0

  What are H(Y∣X) and H(X∣Y)?

  H(Y∣X) = E_{x←X} H(Y∣X = x)
         = (1/2) H(Y∣X = 0) + (1/2) H(Y∣X = 1)
         = (1/2) H(1/2, 1/2) + (1/2) H(1, 0) = 1/2.

  H(X∣Y) = E_{y←Y} H(X∣Y = y)
         = (3/4) H(X∣Y = 0) + (1/4) H(X∣Y = 1)
         = (3/4) H(1/3, 2/3) + (1/4) H(1, 0) ≈ 0.6887 ≠ H(Y∣X).
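
Another quick check (not from the slides): a minimal Python sketch computing both conditional entropies of this table; conditional_entropy is an illustrative helper.

```python
from collections import defaultdict
from math import log2

joint = {(0, 0): 1/4, (0, 1): 1/4, (1, 0): 1/2, (1, 1): 0.0}

def conditional_entropy(p, given_x=True):
    """H(Y|X) if given_x, else H(X|Y): -sum p(x,y) log p(y|x) (resp. log p(x|y))."""
    marg = defaultdict(float)
    for (x, y), q in p.items():
        marg[x if given_x else y] += q
    return -sum(q * log2(q / marg[x if given_x else y])
                for (x, y), q in p.items() if q > 0)

print(conditional_entropy(joint, given_x=True))   # H(Y|X) = 0.5
print(conditional_entropy(joint, given_x=False))  # H(X|Y) ~ 0.6887
```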
Conditional entropy, cont.

  H(X∣Y, Z) = E_{(y,z)←(Y,Z)} H(X∣Y = y, Z = z)
            = E_{y←Y} E_{z←Z∣Y=y} H(X∣Y = y, Z = z)
            = E_{y←Y} E_{z←Z∣Y=y} H((X∣Y = y)∣Z = z)
            = E_{y←Y} H(Xy∣Zy)

  for (Xy, Zy) = (X, Z)∣Y = y



Relating joint entropy to conditional entropy

▸ What is the relation between H(X), H(Y), H(X, Y) and H(Y∣X)?


▸ Intuitively, 0 ≤ H(Y∣X) ≤ H(Y)
  Non-negativity is immediate. We prove the upper bound later.
▸ H(Y∣X) = H(Y) iff X and Y are independent.
▸ In our example, H(Y) = H(3/4, 1/4) > 1/2 = H(Y∣X)
▸ Note that H(Y ∣X = x) might be larger than H(Y ) for some x ∈ Supp(X ).
▸ Chain rule (proved next). H(X , Y ) = H(X ) + H(Y ∣X )
▸ Intuitively, uncertainty in (X , Y ) is the uncertainty in X plus the
uncertainty in Y given X .
▸ H(Y∣X) = H(X, Y) − H(X) can serve as an alternative definition of H(Y∣X).



Chain rule (for the entropy function)

Claim 1
For rvs X , Y , it holds that H(X , Y ) = H(X ) + H(Y ∣X ).

▸ The proof follows immediately from the grouping axiom:


  Arrange the joint probabilities Pi,j = Pr[X = i, Y = j] in an n × n table
  (rows indexed by X, columns by Y), and let qi = ∑_{j=1}^n Pi,j. The grouping axiom gives

  H(P1,1, . . . , Pn,n) = H(q1, . . . , qn) + ∑_i qi ⋅ H(Pi,1/qi, . . . , Pi,n/qi)
                        = H(X) + H(Y∣X).
▸ Another proof. Let (X, Y) ∼ p.
▸ p(x, y) = pX(x) ⋅ pY∣X(y∣x)
  ⟹ log p(x, y) = log pX(x) + log pY∣X(y∣x)
  ⟹ E log p(X, Y) = E log pX(X) + E log pY∣X(Y∣X)
  ⟹ H(X, Y) = H(X) + H(Y∣X).
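
A minimal sketch (not from the slides) that checks the chain rule H(X,Y) = H(X) + H(Y∣X) numerically on a random joint distribution; all names are illustrative.

```python
import random
from collections import defaultdict
from math import log2, isclose

random.seed(0)
support = [(x, y) for x in range(2) for y in range(3)]
w = [random.random() for _ in support]
joint = {xy: wi / sum(w) for xy, wi in zip(support, w)}  # random p(x, y)

def entropy(probs):
    return -sum(q * log2(q) for q in probs if q > 0)

p_x = defaultdict(float)
for (x, y), q in joint.items():
    p_x[x] += q  # marginal p_X

# H(Y|X) = -sum p(x, y) log p(y|x)
h_y_given_x = -sum(q * log2(q / p_x[x]) for (x, y), q in joint.items() if q > 0)

assert isclose(entropy(joint.values()), entropy(p_x.values()) + h_y_given_x)
```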



H(Y ∣X ) ≤ H(Y )
Jensen's inequality: for any concave function f, values t1, . . . , tk and
λ1, . . . , λk ∈ [0, 1] with ∑_i λi = 1, it holds that ∑_i λi f(ti) ≤ f(∑_i λi ti).
Let (X, Y) ∼ p.

  H(Y∣X) = − ∑_{x,y} p(x, y) log pY∣X(y∣x)
         = ∑_{x,y} p(x, y) log [pX(x) / p(x, y)]
         = ∑_{x,y} pY(y) ⋅ [p(x, y)/pY(y)] ⋅ log [pX(x)/p(x, y)]
         = ∑_y pY(y) ∑_x [p(x, y)/pY(y)] ⋅ log [pX(x)/p(x, y)]
         ≤ ∑_y pY(y) log ∑_x [p(x, y)/pY(y)] ⋅ [pX(x)/p(x, y)]      (Jensen, with f = log)
         = ∑_y pY(y) log [1/pY(y)] = H(Y).



H(Y ∣X ) ≤ H(Y ) cont.

▸ Assume X and Y are independent (i.e., p(x, y) = pX(x) ⋅ pY(y) for all x, y)
  ⟹ pY∣X = pY
  ⟹ H(Y∣X) = H(Y)



Other inequalities

▸ H(X ), H(Y ) ≤ H(X , Y ) ≤ H(X ) + H(Y ).


Follows from H(X , Y ) = H(X ) + H(Y ∣X ).
▸ Left inequality: H(Y∣X) is non-negative.
▸ Right inequality: H(Y∣X) ≤ H(Y).
▸ H(X , Y ∣Z ) = H(X ∣Z ) + H(Y ∣X , Z ) (by chain rule)
▸ H(X ∣Y , Z ) ≤ H(X ∣Y )
Proof:

  H(X∣Y, Z) = E_{Y,Z} H(X∣Y, Z)
            = E_Y E_{Z∣Y} H(X∣Y, Z)
            = E_Y E_{Z∣Y} H((X∣Y)∣Z)
            ≤ E_Y E_{Z∣Y} H(X∣Y)
            = E_Y H(X∣Y)
            = H(X∣Y).
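
A minimal sketch (not from the slides) checking H(X∣Y,Z) ≤ H(X∣Y) on a random distribution over three binary variables; cond_entropy_of_x is an illustrative helper.

```python
import random
from collections import defaultdict
from math import log2

random.seed(1)
support = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]
w = [random.random() for _ in support]
joint = {o: wi / sum(w) for o, wi in zip(support, w)}  # random p(x, y, z)

def cond_entropy_of_x(p, cond):
    """H(X | coordinates listed in cond), where X is coordinate 0 of each outcome."""
    p_xc, p_c = defaultdict(float), defaultdict(float)
    for o, q in p.items():
        c = tuple(o[i] for i in cond)
        p_xc[(o[0], c)] += q
        p_c[c] += q
    return -sum(q * log2(q / p_c[c]) for (x, c), q in p_xc.items() if q > 0)

assert cond_entropy_of_x(joint, (1, 2)) <= cond_entropy_of_x(joint, (1,)) + 1e-12
```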



Chain rule (for the entropy function), general case

Claim 2
For rvs X1 , . . . , Xk , it holds that
H(X1, . . . , Xk) = H(X1) + H(X2∣X1) + . . . + H(Xk∣X1, . . . , Xk−1).

Proof: ?
▸ Extremely useful property!
▸ Analogously to the two variables case, it also holds that:
▸ H(Xi ) ≤ H(X1 , . . . , Xk ) ≤ ∑i H(Xi )
▸ H(X1, . . . , Xk∣Y) ≤ ∑_i H(Xi∣Y)



Examples

▸ (From last class) Let X1, . . . , Xn be Boolean iid with Xi ∼ (1/3, 2/3).
  Compute H(X1, . . . , Xn)
▸ As above, but under the condition that ⊕i Xi = 0 ?
▸ Via chain rule?
▸ Via mapping?



Applications

▸ Let X1, . . . , Xn be Boolean iid with Xi ∼ (p, 1 − p), and let X = (X1, . . . , Xn).
  Let f be such that Pr[f(X) = z] = Pr[f(X) = z′] for every k ∈ N and z, z′ ∈ {0, 1}^k.
  Let K = ∣f(X)∣.
  Prove that E K ≤ n ⋅ h(p).

  n ⋅ h(p) = H(X1, . . . , Xn)
           ≥ H(f(X), K)
           = H(K) + H(f(X) ∣ K)
           = H(K) + E K        (given K = k, f(X) is uniform over {0, 1}^k)
           ≥ E K

▸ Interpretation
▸ Positive results



Applications cont.

▸ How many comparisons does it take to sort n elements?
  Let A be a sorting algorithm for n elements that makes t comparisons.
  What can we say about t?
▸ Let X be a uniform random permutation of [n] and let Y1 , . . . , Yt be the
answers A gets when sorting X .
▸ X is determined by Y1 , . . . , Yt .
Namely, X = f (Y1 , . . . , Yt ) for some function f .
▸ H(X) = log n!

  H(X) = H(f(Y1, . . . , Yt))
       ≤ H(Y1, . . . , Yt)
       ≤ ∑_i H(Yi)
       ≤ t        (each Yi is a single bit)

  ⟹ t ≥ log n! = Θ(n log n)
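
For illustration only (not from the slides), a tiny Python sketch comparing log2(n!) with n·log2(n) for a few values of n.

```python
from math import lgamma, log, log2

def log2_factorial(n):
    return lgamma(n + 1) / log(2)  # lgamma(n + 1) = ln(n!)

for n in (8, 64, 1024):
    print(n, round(log2_factorial(n), 1), round(n * log2(n), 1))
```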



Concavity of entropy function
Let p = (p1 , . . . , pn ) and q = (q1 , . . . , qn ) be two distributions, and for λ ∈ [0, 1]
consider the distribution τλ = λp + (1 − λ)q.
(i.e., τλ = (λp1 + (1 − λ)q1, . . . , λpn + (1 − λ)qn)).

Claim 3
H(τλ ) ≥ λH(p) + (1 − λ)H(q)

Proof:
▸ Let Y over {0, 1} be 1 w.p. λ.
▸ Let X be distributed according to p if Y = 1 and according to q otherwise.
▸ H(τλ) = H(X) ≥ H(X ∣ Y) = λH(p) + (1 − λ)H(q)
We are now certain that we drew the graph of the (two-dimensional) entropy
function right...
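
A minimal sketch (not from the slides) checking Claim 3 on random p, q and λ; helper names are illustrative.

```python
import random
from math import log2

random.seed(2)

def entropy(p):
    return -sum(x * log2(x) for x in p if x > 0)

def random_dist(n):
    w = [random.random() for _ in range(n)]
    return [x / sum(w) for x in w]

p, q = random_dist(5), random_dist(5)
lam = random.random()
mix = [lam * a + (1 - lam) * b for a, b in zip(p, q)]

assert entropy(mix) >= lam * entropy(p) + (1 - lam) * entropy(q) - 1e-12
```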



Part II

Mutual Information



Mutual information
▸ I(X; Y) — the "information" that X gives on Y

  I(X; Y) := H(Y) − H(Y∣X)
           = H(Y) − (H(X, Y) − H(X))
           = H(X) + H(Y) − H(X, Y)
           = I(Y; X).
▸ The mutual information that X gives about Y equals the mutual
information that Y gives about X .
▸ I(X ; X ) = H(X )
▸ I(X; f(X)) = H(f(X)) (which can be smaller than H(X) when f is non-injective)
▸ I(X ; Y , Z ) ≥ I(X ; Y ), I(X ; Z ) (since H(X ∣ Y , Z ) ≤ H(X ∣ Y ), H(X ∣ Z ))
▸ I(X ; Y ∣Z ) ∶= H(Y ∣Z ) − H(Y ∣X , Z )
▸ I(X ; Y ∣Z ) = I(Y ; X ∣Z ) (since I(X ′ ; Y ′ ) = I(Y ′ ; X ′ ))



Numerical example

▸ Example: consider again the joint distribution

            Y = 0   Y = 1
    X = 0    1/4     1/4
    X = 1    1/2      0

  I(X; Y) = H(X) − H(X∣Y)
          = 1 − (3/4) ⋅ h(1/3)
          = I(Y; X)
          = H(Y) − H(Y∣X)
          = h(1/4) − (1/2) ⋅ h(1/2)
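
A minimal sketch (not from the slides) that recomputes I(X;Y) for this table and checks it against both closed forms above; helper names are illustrative.

```python
from collections import defaultdict
from math import log2, isclose

joint = {(0, 0): 1/4, (0, 1): 1/4, (1, 0): 1/2, (1, 1): 0.0}

def entropy(probs):
    return -sum(q * log2(q) for q in probs if q > 0)

def h(p):  # binary entropy
    return entropy([p, 1 - p])

p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), q in joint.items():
    p_x[x] += q
    p_y[y] += q

mi = entropy(p_x.values()) + entropy(p_y.values()) - entropy(joint.values())

assert isclose(mi, 1 - (3/4) * h(1/3))
assert isclose(mi, h(1/4) - (1/2) * h(1/2))
print(mi)  # ~0.3113
```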



Chain rule for mutual information

Claim 4 (Chain rule for mutual information)


For rvs X1 , . . . , Xk , Y , it holds that
I(X1, . . . , Xk; Y) = I(X1; Y) + I(X2; Y∣X1) + . . . + I(Xk; Y∣X1, . . . , Xk−1).

Proof: ? HW



Examples

▸ Let X1, . . . , Xn be iid with Xi ∼ (p, 1 − p), under the condition that ⊕i Xi = 0.
  Compute I(X1, . . . , Xn−1; Xn).
  By the chain rule,

  I(X1, . . . , Xn−1; Xn)
  = I(X1; Xn) + I(X2; Xn∣X1) + . . . + I(Xn−1; Xn∣X1, . . . , Xn−2)
  = 0 + 0 + . . . + 1 = 1.

▸ Let T and F be the top and front sides, respectively, of a fair 6-sided die.
  Compute I(T; F).

  I(T; F) = H(T) − H(T∣F)
          = log 6 − log 4
          = log 3 − 1.
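
A minimal sketch (not from the slides) that verifies I(T;F) = log 3 − 1 by enumerating the 24 equally likely (top, front) pairs, assuming opposite faces sum to 7; all names are illustrative.

```python
from collections import defaultdict
from math import log2, isclose

# front can be any face other than the top and its opposite (opposites sum to 7)
pairs = [(t, f) for t in range(1, 7) for f in range(1, 7) if f != t and f != 7 - t]
joint = {tf: 1 / len(pairs) for tf in pairs}  # 24 pairs, uniform

def entropy(probs):
    return -sum(q * log2(q) for q in probs if q > 0)

p_t, p_f = defaultdict(float), defaultdict(float)
for (t, f), q in joint.items():
    p_t[t] += q
    p_f[f] += q

mi = entropy(p_t.values()) + entropy(p_f.values()) - entropy(joint.values())
assert isclose(mi, log2(3) - 1)  # = log 6 - log 4
```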



Part III

Data processing



Data processing inequality

Definition 5 (Markov Chain)


Rvs (X , Y , Z ) ∼ p form a Markov chain, denoted X → Y → Z , if
p(x, y , z) = pX (x) ⋅ pY ∣X (y ∣x) ⋅ pZ ∣Y (z∣y ), for all x, y , z.

Example: random walk on graph.


Claim 6
If X → Y → Z , then I(X ; Y ) ≥ I(X ; Z ).

▸ By Chain rule, I(X ; Y , Z ) = I(X ; Z ) + I(X ; Y ∣Z ) = I(X ; Y ) + I(X ; Z ∣Y ).


▸ I(X; Z∣Y) = 0:
▸ pZ∣Y=y = pZ∣Y=y,X=x for any x, y, hence

  I(X; Z∣Y) = H(Z∣Y) − H(Z∣Y, X)
            = E_Y H(pZ∣Y=y) − E_{Y,X} H(pZ∣Y=y,X=x)
            = E_Y H(pZ∣Y=y) − E_Y H(pZ∣Y=y) = 0.

▸ Since I(X ; Y ∣Z ) ≥ 0, we conclude I(X ; Y ) ≥ I(X ; Z ).
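
A minimal sketch (not from the slides): sample a random Markov chain X → Y → Z and check I(X;Y) ≥ I(X;Z) numerically; all names are illustrative.

```python
import random
from collections import defaultdict
from math import log2

random.seed(3)

def random_dist(n):
    w = [random.random() for _ in range(n)]
    return [x / sum(w) for x in w]

n = 3
p_x = random_dist(n)
p_y_given_x = [random_dist(n) for _ in range(n)]
p_z_given_y = [random_dist(n) for _ in range(n)]

# p(x, y, z) = p_X(x) p_{Y|X}(y|x) p_{Z|Y}(z|y)
joint = {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
         for x in range(n) for y in range(n) for z in range(n)}

def mutual_information(p_pair):
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), q in p_pair.items():
        pa[a] += q
        pb[b] += q
    return sum(q * log2(q / (pa[a] * pb[b])) for (a, b), q in p_pair.items() if q > 0)

p_xy, p_xz = defaultdict(float), defaultdict(float)
for (x, y, z), q in joint.items():
    p_xy[(x, y)] += q
    p_xz[(x, z)] += q

assert mutual_information(p_xy) >= mutual_information(p_xz) - 1e-12
```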


Fano’s Inequality

▸ How well can we guess X from Y ?


▸ We could guess with no error if H(X∣Y) = 0. What if H(X∣Y) is small?
Theorem 7 (Fano’s inequality)
For any rvs X and Y , and any (even random) g, it holds that

h(Pe ) + Pe log ∣X ∣ ≥ H(X ∣X̂ ) ≥ H(X ∣Y )

for X̂ = g(Y ) and Pe = Pr [X̂ ≠ X ].

▸ Note that Pe = 0 implies that H(X∣Y) = 0
▸ The inequality can be weakened to 1 + Pe log ∣X∣ ≥ H(X∣Y),
▸ Alternatively, to Pe ≥ (H(X∣Y) − 1) / log ∣X∣
▸ Intuition for the ∝ 1/log ∣X∣ factor
▸ We call X̂ an estimator for X (from Y).
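
A minimal sketch (not from the slides) checking Fano's inequality for the MAP estimator g(y) = argmax_x p(x∣y) on a random joint distribution; all names are illustrative.

```python
import random
from collections import defaultdict
from math import log2

random.seed(4)
nx, ny = 4, 3
w = [random.random() for _ in range(nx * ny)]
joint = {(x, y): w[x * ny + y] / sum(w) for x in range(nx) for y in range(ny)}

p_y = defaultdict(float)
for (x, y), q in joint.items():
    p_y[y] += q

# H(X|Y)
h_x_given_y = -sum(q * log2(q / p_y[y]) for (x, y), q in joint.items() if q > 0)

# MAP estimator g(y) = argmax_x p(x, y) and its error probability Pe
g = {y: max(range(nx), key=lambda x: joint[(x, y)]) for y in range(ny)}
pe = sum(q for (x, y), q in joint.items() if g[y] != x)

def h(p):  # binary entropy
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

assert h(pe) + pe * log2(nx) >= h_x_given_y - 1e-12
```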



Proving Fano’s inequality
Let X and Y be rvs, let X̂ = g(Y ) and Pe = Pr [X̂ ≠ X ].
▸ Let E be the indicator of an error: E = 1 if X̂ ≠ X, and E = 0 if X̂ = X.

  H(E, X∣X̂) = H(X∣X̂) + H(E∣X, X̂)        [H(E∣X, X̂) = 0]
             = H(E∣X̂) + H(X∣E, X̂)        [H(E∣X̂) ≤ H(E) = h(Pe);  H(X∣E, X̂) ≤ Pe log ∣X∣ (?)]

▸ It follows that h(Pe ) + Pe log ∣X ∣ ≥ H(X ∣X̂ )


▸ Since X → Y → X̂ , it holds that I(X ; Y ) ≥ I(X ; X̂ )
  ⟹ H(X∣X̂) ≥ H(X∣Y)

