
Tufts University EE194 Network Information Theory

Electrical and Computer Engineering Prof. Mai Vu

Lecture 1: Entropy and mutual information

1 Introduction
Imagine two people, Alice and Bob, living in Toronto and Boston, respectively. Alice (Toronto) goes
jogging whenever it is not snowing heavily. Bob (Boston) never goes jogging.
Notice that Alice's actions give information about the weather in Toronto, while Bob's actions give
no information. This is because Alice's actions are random and correlated with the weather in
Toronto, whereas Bob's actions are deterministic.
How can we quantify the notion of information?

2 Entropy
Definition The entropy of a discrete random variable X with pmf p_X(x) is

$$H(X) = -\sum_x p(x)\log p(x) = E\left[-\log p(X)\right] \tag{1}$$

The entropy measures the expected uncertainty in X. We also say that H(X) is approximately
equal to how much information we learn on average from one instance of the random variable X.
Note that the base of the logarithm is not important, since changing the base only changes the
value of the entropy by a multiplicative constant:

$$H_b(X) = -\sum_x p(x)\log_b p(x) = \log_b(a)\Big[-\sum_x p(x)\log_a p(x)\Big] = \log_b(a)\,H_a(X).$$

Customarily, we use base 2 for the calculation of entropy, in which case entropy is measured in bits.
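As a small numerical check of definition (1) and of the change-of-base identity, here is a minimal sketch in Python. It assumes the pmf is given as a plain list of probabilities; the function name `entropy` and the example distribution are our own choices for illustration, not part of these notes.

```python
import math

def entropy(pmf, base=2.0):
    """H(X) = -sum_x p(x) log_base p(x) for a pmf given as a list of probabilities.

    Outcomes with p(x) = 0 are skipped, following the convention 0 log 0 = 0.
    """
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

pmf = [0.5, 0.25, 0.125, 0.125]          # hypothetical distribution
h_bits = entropy(pmf, base=2)            # 1.75 bits
h_nats = entropy(pmf, base=math.e)       # same uncertainty, measured in nats
# Change of base with b = 2 and a = e: H_2(X) = log_2(e) * H_e(X).
assert math.isclose(h_bits, math.log(math.e, 2) * h_nats)
```

Skipping zero-probability outcomes is the usual way to implement the convention 0 log 0 = 0 without special-casing `math.log(0)`.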

2.1 Example
Suppose you have a random variable X such that:

$$X = \begin{cases} 0 & \text{with prob. } p \\ 1 & \text{with prob. } 1-p, \end{cases} \tag{2}$$

then the entropy of X is given by

$$H(X) = -p\log p - (1-p)\log(1-p) = H(p). \tag{3}$$

Note that the entropy does not depend on the values that the random variable takes (0 and 1
in this case), but only depends on the probability distribution p(x).
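The point that only the probabilities matter can be checked directly: in the sketch below (Python, with hypothetical helper names `entropy_of` and `binary_entropy`), relabeling the outcomes 0 and 1 as -5 and 42 leaves H(p) unchanged.

```python
import math

def entropy_of(pmf):
    """Entropy in bits of a pmf given as a dict {value: probability}; 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2(1 - p), equation (3) in bits."""
    return entropy_of({0: p, 1: 1 - p})

p = 0.2
# Relabeling the outcomes (0, 1) -> (-5, 42) leaves the entropy unchanged,
# because the formula only reads the probabilities, never the values.
assert math.isclose(binary_entropy(p), entropy_of({-5: p, 42: 1 - p}))
```

A fair bit (p = 0.5) gives H = 1 bit, and a deterministic X (p = 0 or 1) gives H = 0.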


2.2 Two variables

Consider now two random variables X, Y jointly distributed according to the p.m.f p(x, y). We now
define the following two quantities.

Definition The joint entropy is given by

$$H(X,Y) = -\sum_{x,y} p(x,y)\log p(x,y). \tag{4}$$

The joint entropy measures how much uncertainty there is in the two random variables X and Y
taken together.

Definition The conditional entropy of X given Y is

$$H(X|Y) = -\sum_{x,y} p(x,y)\log p(x|y) = E\left[-\log p(X|Y)\right] \tag{5}$$

The conditional entropy is a measure of how much uncertainty remains about the random variable
X when we know the value of Y .
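To make definitions (4) and (5) concrete, the following minimal Python sketch computes both quantities from a joint pmf stored as a dict {(x, y): p(x, y)}. The pmf and the function names are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict

def joint_entropy(pxy):
    """H(X,Y) = -sum_{x,y} p(x,y) log2 p(x,y), definition (4); 0 log 0 is taken as 0."""
    return -sum(p * math.log2(p) for p in pxy.values() if p > 0)

def conditional_entropy(pxy):
    """H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y), definition (5), with p(x|y) = p(x,y)/p(y)."""
    py = defaultdict(float)
    for (_, y), p in pxy.items():
        py[y] += p
    return -sum(p * math.log2(p / py[y]) for (_, y), p in pxy.items() if p > 0)

# Hypothetical joint pmf {(x, y): p(x, y)}, used only for illustration.
pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
h_joint = joint_entropy(pxy)              # 1.5 bits
h_x_given_y = conditional_entropy(pxy)    # 0.5 bits: given Y = 0, X is known exactly
```

Here knowing Y removes part of the uncertainty about X: when Y = 0 we know X = 0, and only when Y = 1 is X still a fair coin flip.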

2.3 Properties
The entropic quantities defined above have the following properties:

Non-negativity: H(X) ≥ 0; entropy is always non-negative, and H(X) = 0 iff X is deterministic.

Chain rule: We can decompose the joint entropy as follows:

$$H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X^{i-1}), \tag{6}$$

where we use the notation $X^{i-1} = \{X_1, X_2, \ldots, X_{i-1}\}$.

For two variables, the chain rule becomes:

\begin{align}
H(X,Y) &= H(X|Y) + H(Y) \tag{7}\\
&= H(Y|X) + H(X). \tag{8}
\end{align}

Note that in general H(X|Y) ≠ H(Y|X).

Monotonicity: Conditioning never increases entropy:

$$H(X|Y) \le H(X), \tag{9}$$

with equality iff X and Y are independent. In other words, information never hurts.


Maximum entropy: Let $\mathcal{X}$ be the set from which the random variable X takes its values
(sometimes called the alphabet); then

$$H(X) \le \log|\mathcal{X}|. \tag{10}$$

The above bound is achieved when X is uniformly distributed over $\mathcal{X}$.

Non-increasing under functions: Let X be a random variable and let g(X) be some
deterministic function of X. Then

$$H(X) \ge H(g(X)), \tag{11}$$

with equality iff g is invertible (one-to-one on the values that X takes with positive probability).

Proof: We use the two different expansions of the chain rule for two variables:

\begin{align}
H(X, g(X)) &= H(X, g(X)) \tag{12}\\
H(X) + \underbrace{H(g(X)|X)}_{=0} &= H(g(X)) + H(X|g(X)), \tag{13}
\end{align}

so we have

$$H(X) - H(g(X)) = H(X|g(X)) \ge 0, \tag{14}$$

with equality if and only if X can be determined exactly from g(X), which is the
case only if g is invertible.
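The properties listed above can be spot-checked numerically. The sketch below is a minimal Python illustration on one hypothetical joint pmf plus a uniform X on {0, 1, 2, 3}; the helper names (`H`, `marginal`, `cond_H`, `pmf_of_function`) are our own. A check on one example is of course not a proof.

```python
import math
from collections import defaultdict

def H(probs):
    """Entropy in bits of an iterable of probabilities (0 log 0 = 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal(pxy, axis):
    """Marginal pmf of coordinate `axis` (0 for X, 1 for Y) of a joint pmf dict."""
    m = defaultdict(float)
    for xy, p in pxy.items():
        m[xy[axis]] += p
    return m

def cond_H(pxy, given):
    """Entropy of the other coordinate given coordinate `given`:
    cond_H(pxy, 1) = H(X|Y) and cond_H(pxy, 0) = H(Y|X)."""
    pg = marginal(pxy, given)
    return -sum(p * math.log2(p / pg[xy[given]]) for xy, p in pxy.items() if p > 0)

def pmf_of_function(px, g):
    """pmf of g(X): add up p(x) over all x that g maps to the same value."""
    pg = defaultdict(float)
    for x, p in px.items():
        pg[g(x)] += p
    return pg

# Hypothetical joint pmf {(x, y): p(x, y)}, used only for illustration.
pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
px, py = marginal(pxy, 0), marginal(pxy, 1)
h_x, h_y, h_xy = H(px.values()), H(py.values()), H(pxy.values())

assert math.isclose(h_xy, cond_H(pxy, 1) + h_y)    # chain rule (7)
assert math.isclose(h_xy, cond_H(pxy, 0) + h_x)    # chain rule (8)
assert cond_H(pxy, 1) != cond_H(pxy, 0)            # H(X|Y) != H(Y|X) here
assert cond_H(pxy, 1) <= h_x                       # conditioning reduces entropy (9)
assert h_x <= math.log2(len(px))                   # maximum entropy bound (10)

# X uniform on {0, 1, 2, 3} attains the bound (10) with H(X) = 2 bits.
pu = {x: 0.25 for x in range(4)}
assert math.isclose(H(pu.values()), math.log2(4))
# (11): a many-to-one g loses entropy, an invertible g keeps it.
assert math.isclose(H(pmf_of_function(pu, lambda x: x % 2).values()), 1.0)
assert math.isclose(H(pmf_of_function(pu, lambda x: 3 - x).values()), 2.0)
```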

3 Continuous random variables

Similarly to the discrete case we can define entropic quantities for continuous random variables.

Definition The differential entropy of a continuous random variable X with p.d.f. f(x) is

$$h(X) = -\int f(x)\log f(x)\,dx = E\left[-\log f(X)\right] \tag{15}$$

Definition Consider a pair of continuous random variables (X, Y) distributed according to the joint
p.d.f. f(x, y). The joint entropy is given by

$$h(X,Y) = -\iint f(x,y)\log f(x,y)\,dx\,dy, \tag{16}$$

while the conditional entropy is

$$h(X|Y) = -\iint f(x,y)\log f(x|y)\,dx\,dy. \tag{17}$$
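Definition (15) can be evaluated numerically when no closed form is at hand. The sketch below is a minimal Python illustration that approximates h(X) in nats with a midpoint Riemann sum; the function name, the truncation interval and the grid size are our own assumptions. It compares the result for a Gaussian with the standard closed form (1/2) log(2πeσ²).

```python
import math

def differential_entropy(f, a, b, n=200_000):
    """Approximate h(X) = -∫ f(x) log f(x) dx (in nats) by a midpoint Riemann sum on [a, b].

    Assumes f is (numerically) zero outside [a, b]; points where f(x) == 0 are skipped.
    """
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        fx = f(a + (i + 0.5) * dx)
        if fx > 0:
            total -= fx * math.log(fx) * dx
    return total

sigma = 2.0

def gauss(x):
    """p.d.f. of N(0, sigma^2)."""
    return math.exp(-x * x / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

approx = differential_entropy(gauss, -12 * sigma, 12 * sigma)
exact = 0.5 * math.log(2 * math.pi * math.e * sigma**2)  # closed form for N(0, sigma^2)
assert math.isclose(approx, exact, rel_tol=1e-6)
```

Truncating at ±12σ discards a negligible amount of probability mass for a Gaussian; heavier-tailed densities would need a wider interval.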


3.1 Properties
Some of the properties of the discrete random variables carry over to the continuous case, but some
do not. Let us go through the list again.

Non-negativity does not hold: h(X) can be negative.

Example: Consider the R.V. X uniformly distributed on the interval [a, b]. Its differential entropy is

$$h(X) = -\int_a^b \frac{1}{b-a}\log\frac{1}{b-a}\,dx = \log(b-a), \tag{18}$$

which is negative if b − a is less than 1.
Chain rule holds for continuous variables:

\begin{align}
h(X,Y) &= h(X|Y) + h(Y) \tag{19}\\
&= h(Y|X) + h(X). \tag{20}
\end{align}

Monotonicity:

$$h(X|Y) \le h(X). \tag{21}$$

The proof follows from the non-negativity of mutual information (Section 4.1), since h(X) − h(X|Y) = I(X; Y).
Maximum entropy: We do not have a bound for general p.d.f.s f(x), but we do
have a formula for power-limited random variables. Consider a R.V. X ∼ f(x) such that

$$E[X^2] = \int x^2 f(x)\,dx \le P, \tag{22}$$

then

$$\max h(X) = \frac{1}{2}\log(2\pi e P), \tag{23}$$

and the maximum is achieved by X ∼ N(0, P).
To verify this claim, one can use standard Lagrange multiplier techniques from calculus to
solve the problem $\max_f h(f) = -\int f\log f\,dx$ subject to $E[X^2] = \int x^2 f\,dx \le P$ (together with the normalization $\int f\,dx = 1$).
Non-increasing under functions: Does not necessarily hold, since we cannot guarantee
h(X|g(X)) ≥ 0. For example, scaling gives h(2X) = h(X) + log 2 > h(X).
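The continuous properties above can be illustrated with well-known closed-form differential entropies. This minimal Python sketch (helper names are our own; the Laplace and uniform comparisons are our choice of competitors, not from the notes) shows a negative h(X), compares three zero-mean densities with the same power E[X²] = P, and checks the scaling example h(2X) = h(X) + log 2.

```python
import math

# Closed-form differential entropies in nats (standard results, not derived here).
def h_uniform(a, b):
    return math.log(b - a)

def h_gaussian(var):
    return 0.5 * math.log(2 * math.pi * math.e * var)

def h_laplace(scale):
    return 1.0 + math.log(2 * scale)

# Non-negativity fails: a uniform on an interval shorter than 1 has h(X) < 0.
assert h_uniform(0.0, 0.5) < 0           # log(0.5), about -0.693 nats

# Maximum entropy under E[X^2] <= P: compare zero-mean densities with E[X^2] = P.
P = 3.0
candidates = {
    "gaussian": h_gaussian(P),
    "uniform": h_uniform(-math.sqrt(3 * P), math.sqrt(3 * P)),  # variance c^2/3 = P
    "laplace": h_laplace(math.sqrt(P / 2)),                      # variance 2 b^2 = P
}
assert max(candidates, key=candidates.get) == "gaussian"

# Scaling breaks "non-increasing under functions": h(2X) = h(X) + log 2.
assert math.isclose(h_gaussian(4 * P) - h_gaussian(P), math.log(2))
```

Comparing against a few specific densities only illustrates (23); the Lagrange multiplier argument is what establishes that no density with E[X²] ≤ P does better.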

4 Mutual information
Definition The mutual information between two discrete random variables X, Y jointly distributed
according to p(x, y) is given by

\begin{align}
I(X;Y) &= \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)} \tag{24}\\
&= H(X) - H(X|Y) \nonumber\\
&= H(Y) - H(Y|X) \nonumber\\
&= H(X) + H(Y) - H(X,Y). \tag{25}
\end{align}
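A minimal Python sketch, assuming joint pmfs stored as dicts {(x, y): p(x, y)}, that evaluates definition (24) directly and compares it with the last line of (25). The "Bob" pmf is a hypothetical echo of the introduction: Y is constant, so it carries no information about X.

```python
import math
from collections import defaultdict

def H(probs):
    """Entropy in bits of an iterable of probabilities (0 log 0 = 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(pxy):
    """Return the marginal pmfs p(x) and p(y) of a joint pmf dict."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in pxy.items():
        px[x] += p
        py[y] += p
    return px, py

def mutual_information(pxy):
    """I(X;Y) = sum_{x,y} p(x,y) log2 [p(x,y) / (p(x)p(y))], definition (24)."""
    px, py = marginals(pxy)
    return sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items() if p > 0)

# Hypothetical joint pmf, used only for illustration.
pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
px, py = marginals(pxy)
# Last line of (25): I(X;Y) = H(X) + H(Y) - H(X,Y).
assert math.isclose(mutual_information(pxy), H(px.values()) + H(py.values()) - H(pxy.values()))

# Bob-style variable: Y is constant, so it tells us nothing about X.
bob = {(0, 0): 0.3, (1, 0): 0.7}
assert math.isclose(mutual_information(bob), 0.0, abs_tol=1e-12)
```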


We can also define the analogous quantity for continuous variables.

Definition The mutual information between two continuous random variables X, Y with joint p.d.f.
f(x, y) is given by

$$I(X;Y) = \iint f(x,y)\log\frac{f(x,y)}{f(x)f(y)}\,dx\,dy. \tag{26}$$

For two variables, it is possible to represent the different entropic quantities by analogy
with set theory. Figure 1 shows the different quantities, and how the mutual information is the
uncertainty that is common to both X and Y.

[Figure: overlapping circles labeled H(X) and H(Y); from left to right the regions are H(X|Y), I(X;Y) and H(Y|X).]

Figure 1: Graphical representation of the conditional entropy and the mutual information.

4.1 Non-negativity of mutual information

In this section we will show that

$$I(X;Y) \ge 0, \tag{27}$$

and this is true for both the discrete and continuous cases.
Before we get to the proof, we have to introduce some preliminary concepts: Jensen's inequality
and the relative entropy.

Jensen's inequality tells us something about the expected value of a random variable after
applying a convex function to it.
We say a function f is convex on the interval [a, b] if, for all x_1, x_2 ∈ [a, b] and all λ ∈ [0, 1], we have:

$$f(\lambda x_1 + (1-\lambda)x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2). \tag{28}$$

Another way of stating the above is that the graph of the function always lies on or below the chord
joining the points (x_1, f(x_1)) and (x_2, f(x_2)). For a twice-differentiable function f(x), convexity is
equivalent to the condition f''(x) ≥ 0 for all x ∈ [a, b].


Lemma Jensen's inequality states that for any convex function f(x), we have

$$E[f(X)] \ge f(E[X]). \tag{29}$$

The proof can be found in [Cover & Thomas].

Note that an analogue of Jensen's inequality exists for concave functions, where the inequality
simply reverses direction: E[f(X)] ≤ f(E[X]).
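As an informal sanity check of (29) and its concave counterpart, the Python sketch below draws random discrete distributions on positive support points and tests f(x) = x² (convex) and f(x) = log x (concave). The function name, the sampling scheme and the tolerance are our own assumptions; a passing spot-check is not a proof.

```python
import math
import random

def check_jensen(trials=1000, seed=0):
    """Spot-check (29) on random discrete distributions with positive support points.

    Uses the convex f(x) = x**2 and, for the concave case, f(x) = log(x).
    """
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.uniform(0.1, 10.0) for _ in range(5)]   # values taken by X
        ws = [rng.random() + 1e-9 for _ in range(5)]
        total = sum(ws)
        ps = [w / total for w in ws]                       # their probabilities
        mean = sum(p * x for p, x in zip(ps, xs))
        # Convex f: E[f(X)] >= f(E[X]), up to floating-point rounding.
        if sum(p * x * x for p, x in zip(ps, xs)) < mean * mean - 1e-9:
            return False
        # Concave log: the inequality reverses, E[log X] <= log E[X].
        if sum(p * math.log(x) for p, x in zip(ps, xs)) > math.log(mean) + 1e-9:
            return False
    return True

print(check_jensen())
```

The concave log case is exactly the form of Jensen's inequality used in the proof of non-negativity of the relative entropy below.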

Relative entropy A natural way to measure how far apart two probability distributions are
is the relative entropy, also sometimes called the Kullback-Leibler divergence. It is not a true
distance, however, since it is not symmetric in its two arguments.

Definition The relative entropy between two probability distributions p(x) and q(x) is given by
$$D(p(x)\|q(x)) = \sum_x p(x)\log\frac{p(x)}{q(x)}. \tag{30}$$

We are interested in the relative entropy in this section because it is related
to the mutual information in the following way:

$$I(X;Y) = D(p(x,y)\|p(x)p(y)). \tag{31}$$

Thus, if we can show that the relative entropy is a non-negative quantity, we will have shown that
the mutual information is also non-negative.
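Relation (31) can be checked numerically with a small Python sketch, assuming pmfs stored as dicts; `kl_divergence` and `product_of_marginals` are hypothetical helper names. The second part evaluates D in both directions for one pair of distributions to show that it is not symmetric.

```python
import math
from collections import defaultdict

def kl_divergence(p, q):
    """D(p||q) = sum_x p(x) log2 [p(x)/q(x)], definition (30), for pmfs stored as dicts.

    Assumes q(x) > 0 wherever p(x) > 0; otherwise the divergence is infinite.
    """
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

def product_of_marginals(pxy):
    """Return the pmf p(x)p(y) built from the marginals of the joint pmf pxy."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in pxy.items():
        px[x] += p
        py[y] += p
    return {(x, y): px[x] * py[y] for x in px for y in py}

# Hypothetical joint pmf, used only for illustration.
pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
mi = kl_divergence(pxy, product_of_marginals(pxy))   # I(X;Y) via (31), about 0.311 bits

# D is not symmetric in its arguments.
p, q = {0: 0.5, 1: 0.5}, {0: 0.9, 1: 0.1}
print(kl_divergence(p, q), kl_divergence(q, p))
```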

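Proof of non-negativity of relative entropy: Let p(x) and q(x) be two arbitrary probability
distributions. We calculate the negative of the relative entropy as follows, with all sums taken
over the values x for which p(x) > 0:

\begin{align*}
-D(p(x)\|q(x)) &= -\sum_x p(x)\log\frac{p(x)}{q(x)}\\
&= \sum_x p(x)\log\frac{q(x)}{p(x)}\\
&= E\left[\log\frac{q(X)}{p(X)}\right]\\
&\le \log E\left[\frac{q(X)}{p(X)}\right] \quad \text{(by Jensen's inequality for the concave function } \log(\cdot)\text{)}\\
&= \log\left(\sum_x p(x)\frac{q(x)}{p(x)}\right)\\
&= \log\left(\sum_x q(x)\right)\\
&\le \log 1 = 0.
\end{align*}

Hence D(p(x)||q(x)) ≥ 0.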
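The inequality just proved can be spot-checked on random distributions. In this minimal Python sketch (helper names, alphabet size and seeds are our own choices), D(p||q) is computed for many random pairs of pmfs with strictly positive entries, and D(p||p) is confirmed to be zero. This is an illustration of the result, not a substitute for the proof.

```python
import math
import random

def kl_divergence(p, q):
    """D(p||q) in bits for pmfs stored as equal-length lists, assuming q[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def random_pmf(k, rng):
    """A random pmf on k points; the small offset keeps every entry positive."""
    w = [rng.random() + 1e-9 for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]

def check_nonnegativity(trials=1000, k=4, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        p, q = random_pmf(k, rng), random_pmf(k, rng)
        if kl_divergence(p, q) < -1e-12:        # D(p||q) >= 0, up to rounding
            return False
        if abs(kl_divergence(p, p)) > 1e-12:    # equality when q = p
            return False
    return True

print(check_nonnegativity())
```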


4.2 Conditional mutual information

Definition Let X, Y, Z be jointly distributed according to some p.m.f. p(x, y, z). The conditional
mutual information between X and Y given Z is

\begin{align}
I(X;Y|Z) &= \sum_{x,y,z} p(x,y,z)\log\frac{p(x,y|z)}{p(x|z)p(y|z)} \tag{32}\\
&= H(X|Z) - H(X|Y,Z) \nonumber\\
&= H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z). \nonumber
\end{align}

The conditional mutual information measures how much uncertainty X and Y share once Z is
known, that is, the uncertainty shared by X and Y but not by Z.
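A minimal Python sketch that evaluates definition (32) directly and compares it with the last entropy expression in (32). The joint pmf is a hypothetical example: X and Z are independent fair bits and Y = X AND Z. Given Z = 0 we have Y = 0 regardless of X, and given Z = 1 we have Y = X, so I(X;Y|Z) should come out to 0.5 bits.

```python
import math
from collections import defaultdict

def H(pmf):
    """Entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(pxyz, keep):
    """Marginal pmf over the coordinates listed in `keep`, e.g. (0, 2) for (X, Z)."""
    m = defaultdict(float)
    for xyz, p in pxyz.items():
        m[tuple(xyz[i] for i in keep)] += p
    return m

def cond_mi(pxyz):
    """I(X;Y|Z) from definition (32): sum p(x,y,z) log2 [p(x,y|z) / (p(x|z) p(y|z))]."""
    pxz, pyz, pz = marginal(pxyz, (0, 2)), marginal(pxyz, (1, 2)), marginal(pxyz, (2,))
    total = 0.0
    for (x, y, z), p in pxyz.items():
        if p > 0:
            # p(x,y|z) / (p(x|z) p(y|z)) = p(x,y,z) p(z) / (p(x,z) p(y,z))
            total += p * math.log2(p * pz[(z,)] / (pxz[(x, z)] * pyz[(y, z)]))
    return total

# Hypothetical pmf: X, Z independent fair bits and Y = X AND Z.
pxyz = {(x, x & z, z): 0.25 for x in (0, 1) for z in (0, 1)}
i_def = cond_mi(pxyz)
i_ent = (H(marginal(pxyz, (0, 2))) + H(marginal(pxyz, (1, 2)))
         - H(pxyz) - H(marginal(pxyz, (2,))))
assert math.isclose(i_def, i_ent)
```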

4.3 Properties
Chain rule: We have the following chain rule:

$$I(X; Y_1, Y_2, \ldots, Y_n) = \sum_{i=1}^{n} I(X; Y_i \mid Y^{i-1}), \tag{33}$$

where we have again used the shorthand notation $Y^{i-1} = \{Y_1, Y_2, \ldots, Y_{i-1}\}$.

No monotonicity: Conditioning can either increase or decrease the mutual information
between two variables, so in general

$$I(X;Y|Z) \not\le I(X;Y), \quad\text{and}\quad I(X;Y|Z) \not\ge I(X;Y). \tag{34}$$

To illustrate the last point, consider the following two examples, where conditioning has different
effects. In both cases we will make use of the following equation, obtained by expanding I(X; Y, Z)
with the chain rule in the two possible orders:

\begin{align}
I(X;Y,Z) &= I(X;Y,Z) \nonumber\\
I(X;Y) + I(X;Z|Y) &= I(X;Z) + I(X;Y|Z). \tag{35}
\end{align}

Increasing example: If we have some X, Y, Z such that I(X; Z) = 0 (which means X and Z
are independent variables), then equation (35) becomes:

$$I(X;Y) + I(X;Z|Y) = I(X;Y|Z), \tag{36}$$

so I(X; Y|Z) − I(X; Y) = I(X; Z|Y) ≥ 0, which implies

$$I(X;Y|Z) \ge I(X;Y). \tag{37}$$

Decreasing example: On the other hand, if we have a situation in which I(X; Z|Y) = 0,
equation (35) becomes:

$$I(X;Y) = I(X;Z) + I(X;Y|Z), \tag{38}$$

which implies that I(X; Y|Z) ≤ I(X; Y).
So we see that conditioning can either increase or decrease the mutual information, depending
on the situation.
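Both examples can be made concrete with small discrete distributions. In this Python sketch the two pmfs are our own hypothetical choices: for the increasing case, X and Z are independent fair bits and Y = X XOR Z (so I(X;Z) = 0); for the decreasing case, X is a fair bit and Y = Z = X (so I(X;Z|Y) = 0). Mutual informations are computed from the entropy identities (25) and (32).

```python
import math
from collections import defaultdict

def H(pmf):
    """Entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(pxyz, keep):
    """Marginal pmf over the coordinates listed in `keep`."""
    m = defaultdict(float)
    for xyz, p in pxyz.items():
        m[tuple(xyz[i] for i in keep)] += p
    return m

def mi(pxyz):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return H(marginal(pxyz, (0,))) + H(marginal(pxyz, (1,))) - H(marginal(pxyz, (0, 1)))

def cond_mi(pxyz):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (H(marginal(pxyz, (0, 2))) + H(marginal(pxyz, (1, 2)))
            - H(pxyz) - H(marginal(pxyz, (2,))))

# Increasing: X, Z independent fair bits, Y = X XOR Z, so I(X;Z) = 0.
xor = {(x, x ^ z, z): 0.25 for x in (0, 1) for z in (0, 1)}
print(mi(xor), cond_mi(xor))          # I(X;Y) = 0, but I(X;Y|Z) = 1 bit

# Decreasing: X a fair bit and Y = Z = X, so I(X;Z|Y) = 0.
copies = {(x, x, x): 0.5 for x in (0, 1)}
print(mi(copies), cond_mi(copies))    # I(X;Y) = 1 bit, but I(X;Y|Z) = 0
```

In the XOR case Y alone is independent of X, but once Z is known Y reveals X completely; in the copy case, knowing Z already tells us everything Y could.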


5 Data processing inequality

For three variables X, Y, Z, one situation of particular interest is when they form a Markov
chain: X → Y → Z. This relation means that the probability distribution satisfies p(x, z|y) =
p(x|y)p(z|y), which in turn implies that I(X; Z|Y) = 0, as in the decreasing example above.
This situation often occurs when we have some input X that gets transformed by a channel to
give an output Y, and then we apply some processing to Y to obtain a signal Z, as illustrated
below.
X → [Channel] → Y → [Processing] → Z

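In this case we have the data processing inequality:

$$I(X;Z) \le I(X;Y). \tag{39}$$

This follows from (38): with I(X; Z|Y) = 0, we get I(X; Z) = I(X; Y) − I(X; Y|Z) ≤ I(X; Y).
In other words, processing cannot increase the information contained in a signal.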
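As an illustration of (39), the Python sketch below assumes a specific, hypothetical setup: X is a uniform bit, the channel is a binary symmetric channel (BSC) with crossover probability e1, and the "processing" is a second BSC with crossover e2. It uses the standard facts that a BSC with crossover ε and uniform input has I = 1 − H(ε), and that two cascaded BSCs act as a single BSC with crossover e1(1 − e2) + e2(1 − e1).

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def bsc_mi(eps):
    """I(input; output) of a binary symmetric channel with crossover eps and uniform input."""
    return 1.0 - h2(eps)

# Hypothetical cascade: X uniform bit -> BSC(e1) -> Y -> BSC(e2) -> Z.
# The cascade X -> Z is itself a BSC with crossover e1(1 - e2) + e2(1 - e1).
e1, e2 = 0.1, 0.2
i_xy = bsc_mi(e1)
i_xz = bsc_mi(e1 * (1 - e2) + e2 * (1 - e1))
assert i_xz <= i_xy   # data processing inequality (39)
```

Any processing of Y, deterministic or random, would do here; a second BSC is simply a case where both mutual informations have closed forms.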

