Lecture 3 - Mutual Information. Source Coding and Channel Coding 2
● Let us start with the definition of the relative entropy, which measures the inefficiency of
assuming that the distribution is q(x) when the true distribution is p(x).
● The relative entropy or Kullback-Leibler distance between two probability
distributions p(x) and q(x) is defined as
$$D(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$$
Relative entropy
● Importantly, the relative entropy is always nonnegative, and it is zero if and only
if p(x) = q(x).
● It is not a distance in the mathematical sense since it is not symmetric in its
parameters and it does not satisfy the triangle inequality. Nonetheless, it is often
useful to think of relative entropy as a “distance” between distributions.
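As an illustration (not from the slides), the short Python sketch below evaluates D(p‖q) for two made-up distributions p and q; it also shows the lack of symmetry mentioned above, since D(p‖q) and D(q‖p) come out different.

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits, for pmfs given as equal-length lists."""
    # Terms with p(x) = 0 contribute nothing; q(x) is assumed nonzero wherever p(x) > 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]       # made-up "true" distribution
q = [0.75, 0.25]     # made-up "assumed" distribution

print(kl_divergence(p, q))   # ~0.208 bits, > 0
print(kl_divergence(q, p))   # ~0.189 bits -- not equal to D(p||q), so not symmetric
print(kl_divergence(p, p))   # 0.0 -- zero exactly when the distributions coincide
```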
Mutual information
● Consider two random variables X and Y with a joint probability mass function p(x, y)
and marginal probability mass functions p(x) and p(y). The mutual information I(X; Y)
is the relative entropy between the joint distribution and the product distribution
p(x)p(y):
$$I(X;Y) = D\big(p(x,y) \,\|\, p(x)p(y)\big) = \sum_{x}\sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
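A minimal Python sketch of this definition (my own illustration, with a made-up joint pmf): it recovers the marginals from the joint distribution and then evaluates the sum above over all (x, y) pairs.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits, for a joint pmf given as a 2-D list joint[x][y]."""
    px = [sum(row) for row in joint]          # marginal p(x)
    py = [sum(col) for col in zip(*joint)]    # marginal p(y)
    return sum(pxy * math.log2(pxy / (px[x] * py[y]))
               for x, row in enumerate(joint)
               for y, pxy in enumerate(row) if pxy > 0)

# Made-up joint pmf of two dependent binary random variables
joint = [[0.4, 0.1],
         [0.1, 0.4]]
print(mutual_information(joint))   # ~0.278 bits
```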
Properties of mutual information
● Symmetry property:
I(X;Y) = I(Y;X)
● Non-negative property:
I(X;Y) ≥ 0
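Both properties can be checked numerically. The snippet below (again my own illustration) repeats the mutual_information helper from the previous sketch and applies it to a randomly generated joint pmf; transposing the joint pmf swaps the roles of X and Y.

```python
import math
import random

def mutual_information(joint):
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(pxy * math.log2(pxy / (px[x] * py[y]))
               for x, row in enumerate(joint)
               for y, pxy in enumerate(row) if pxy > 0)

# Random 3x4 joint pmf, normalised so that all entries sum to 1
raw = [[random.random() for _ in range(4)] for _ in range(3)]
total = sum(sum(row) for row in raw)
joint = [[v / total for v in row] for row in raw]
joint_t = [list(col) for col in zip(*joint)]    # transpose: exchanges X and Y

print(mutual_information(joint) >= 0)                                         # True
print(math.isclose(mutual_information(joint), mutual_information(joint_t)))   # True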
Relationship between entropy and mutual information
● The mutual information can be rewritten in terms of entropies:
$$I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X,Y)$$
● In particular, setting Y = X gives
$$I(X;X) = H(X) - H(X \mid X) = H(X)$$
Thus the mutual information of a random variable with itself is the entropy of the random
variable. This is the reason that entropy is sometimes referred to as self-information.
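To make the identity I(X;Y) = H(X) − H(X|Y) concrete, the sketch below (my own illustration, using the same made-up joint pmf as before) computes both sides directly from the joint distribution.

```python
import math

def entropy(p):
    """H in bits of a pmf given as a list."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def mutual_information(joint):
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(pxy * math.log2(pxy / (px[x] * py[y]))
               for x, row in enumerate(joint)
               for y, pxy in enumerate(row) if pxy > 0)

joint = [[0.4, 0.1],
         [0.1, 0.4]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

# H(X|Y) = sum_y p(y) * H(X | Y = y), with p(x|y) = p(x,y) / p(y)
h_x_given_y = sum(py[y] * entropy([joint[x][y] / py[y] for x in range(len(joint))])
                  for y in range(len(py)))

print(math.isclose(mutual_information(joint), entropy(px) - h_x_given_y))   # True
```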
Relationship between entropy and mutual information
● It can be shown from the definitions that the mutual information of (X, Y) and Z is
the sum of the mutual information of X and Z and the conditional mutual information
of Y and Z given X. That is,
$$I(X,Y;Z) = I(X;Z) + I(Y;Z \mid X)$$
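A short derivation of this identity (standard, but not shown on the slides) follows from expanding both entropies with the chain rule for entropy:

```latex
\begin{align*}
I(X,Y;Z) &= H(X,Y) - H(X,Y \mid Z) \\
         &= H(X) + H(Y \mid X) - H(X \mid Z) - H(Y \mid X, Z) \\
         &= \bigl[ H(X) - H(X \mid Z) \bigr] + \bigl[ H(Y \mid X) - H(Y \mid X, Z) \bigr] \\
         &= I(X;Z) + I(Y;Z \mid X).
\end{align*}
```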
● Source coding
○ Lossless data compression
○ Lossy data compression
● Channel coding
● Prefix coding
● Huffman coding
● Shannon-Fano coding
● Lempel-Ziv coding
● DM and adaptive DM (ADPCM)
A Few Terms Related to the Source Coding Process:
● Codeword Length
● Average Codeword Length
● Code Efficiency
● Code Redundancy
Codeword length
Let X be a DMS with finite entropy H(X) and an alphabet {x1, …, xm} with
corresponding probabilities of occurrence P(xi) (i = 1, …, m). Let the binary codeword
assigned to symbol xi by the encoder have length ni, measured in bits. The length of a
codeword is the number of binary digits in the codeword.
Average codeword length
The average codeword length of the source encoder is
$$L = \sum_{i=1}^{m} P(x_i)\, n_i$$
The parameter L represents the average number of bits per source symbol used in
the source coding process.
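For instance (a made-up code of mine, not from the slides), with four symbols whose probabilities P(xi) and codeword lengths ni are listed below, L follows directly from the formula above:

```python
# Made-up example: four source symbols, their probabilities, and assigned codeword lengths
probs   = [0.5, 0.25, 0.125, 0.125]   # P(x_i)
lengths = [1,   2,    3,     3]       # n_i, length of the codeword for x_i in bits

avg_length = sum(p * n for p, n in zip(probs, lengths))   # L = sum_i P(x_i) * n_i
print(avg_length)   # 1.75 bits per source symbol
```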
Code efficiency
The code efficiency η is defined as
$$\eta = \frac{L_{\min}}{L}$$
where Lmin is the minimum possible value of L. When η approaches unity, the code is said
to be efficient.
Code redundancy
The code redundancy γ is defined as
$$\gamma = 1 - \eta$$
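Continuing the same made-up example, and taking Lmin = H(X) as given by the source coding theorem, the efficiency and redundancy are computed below; for this particular source the code turns out to be optimal, so η = 1 and γ = 0.

```python
import math

probs   = [0.5, 0.25, 0.125, 0.125]   # P(x_i)
lengths = [1,   2,    3,     3]       # n_i

avg_length = sum(p * n for p, n in zip(probs, lengths))    # L
h_x = -sum(p * math.log2(p) for p in probs if p > 0)       # H(X), taken here as L_min

efficiency = h_x / avg_length    # eta = L_min / L
redundancy = 1 - efficiency      # gamma = 1 - eta
print(efficiency, redundancy)    # 1.0 0.0 -- the code matches the source entropy exactly
```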
Channel coding techniques:
● Block coding
● Convolutional coding
● Turbo coding