ITC_Module2
Module 2
Topics
Joint entropy,
Conditional entropy,
Mutual information,
Discrete memoryless channels - BSC, BEC, noise-free channel, Channel with
independent I/O,
Cascaded channels. Channel Capacity.
Kullback–Leibler divergence (Relative Entropy),
Cross-Entropy,
Jensen–Shannon divergence
Joint entropy
Joint entropy is a measure of the uncertainty associated with a set of variables.
The joint entropy of two discrete random variables X and Y with joint distribution p(x, y) is defined as:
H(X,Y) = − Σ_{x∈X} Σ_{y∈Y} p(x,y) log p(x,y)
Properties:
1) The joint entropy of a set of random variables is a non-negative number, i.e. H(X,Y) ≥ 0.
2) The joint entropy of a set of variables is greater than or equal to the maximum of the individual entropies of the variables in the set, i.e. H(X,Y) ≥ max[H(X), H(Y)].
3) The joint entropy of a set of variables is less than or equal to the sum of the individual entropies of the variables in the set, i.e. H(X,Y) ≤ H(X) + H(Y).
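As a quick illustration, here is a minimal Python sketch (the joint distribution below is a made-up example, not one from these notes) that computes H(X,Y) and checks properties 2) and 3):

```python
import math

# Hypothetical joint distribution p(x, y), given as a dict of probabilities.
p_xy = {
    ('x1', 'y1'): 0.25, ('x1', 'y2'): 0.25,
    ('x2', 'y1'): 0.40, ('x2', 'y2'): 0.10,
}

def entropy(dist):
    """Shannon entropy in bits of a distribution given as a dict of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Joint entropy H(X,Y)
H_xy = entropy(p_xy)

# Marginal distributions p(x) and p(y)
p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0) + p
    p_y[y] = p_y.get(y, 0) + p

H_x, H_y = entropy(p_x), entropy(p_y)
print(f"H(X,Y) = {H_xy:.4f} bits")
print(H_xy >= max(H_x, H_y))   # property 2
print(H_xy <= H_x + H_y)       # property 3
```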
Conditional entropy
Conditional entropy H(X|Y) measures the uncertainty that remains about X when Y is known:
H(X|Y) = − Σ_{x∈X} Σ_{y∈Y} p(x,y) log p(x|y) = − Σ_{x∈X} Σ_{y∈Y} p(x,y) log [p(x,y) / p(y)]
where p(x,y) = p(x|y)·p(y) = p(y|x)·p(x)
Properties:
1) H(Y|X) = 0 iff the value of Y is completely determined by the value of X.
2) H(Y|X) = H(Y) and H(X|Y) = H(X) iff Y and X are independent random variables.
3) Chain rule of conditional entropy: H(Y|X) = H(X,Y) − H(X); H(X|Y) = H(X,Y) − H(Y)
4) H(Y|X) ≤ H(Y)
5) H(X,Y) = H(Y|X) + H(X|Y) + I(X;Y)
6) H(X,Y) = H(X) + H(Y) − I(X;Y)
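A short sketch (reusing the same made-up joint distribution as above) that verifies the chain rule H(X|Y) = H(X,Y) − H(Y) numerically:

```python
import math

p_xy = {('x1', 'y1'): 0.25, ('x1', 'y2'): 0.25,
        ('x2', 'y1'): 0.40, ('x2', 'y2'): 0.10}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginal p(y)
p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0) + p

# Direct computation: H(X|Y) = -sum p(x,y) log2 p(x|y), with p(x|y) = p(x,y)/p(y)
H_x_given_y = -sum(p * math.log2(p / p_y[y]) for (x, y), p in p_xy.items() if p > 0)

# Chain rule: H(X|Y) = H(X,Y) - H(Y)
print(abs(H_x_given_y - (entropy(p_xy) - entropy(p_y))) < 1e-12)  # True
```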
Mutual Information
Mutual information is a measure of the mutual dependence between two random variables. More specifically, it quantifies the "amount of information" obtained about one random variable by observing the other:
I(X;Y) = Σ_{x∈X} Σ_{y∈Y} p(x,y) log [p(x,y) / (p(x)·p(y))]
Properties:
1) I(X;Y) = I(Y;X)
2) I(X;Y) ≥ 0
3) I(X;Y) = H(X) + H(Y) – H(X,Y)
Prove:
1) I(X;Y) = I(Y;X)
2) I(X;Y) ≥ 0
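The symmetry and non-negativity above can also be checked numerically; a minimal sketch (again with the made-up joint distribution) computes I(X;Y) both from its definition and from property 3):

```python
import math

p_xy = {('x1', 'y1'): 0.25, ('x1', 'y2'): 0.25,
        ('x2', 'y1'): 0.40, ('x2', 'y2'): 0.10}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

p_x, p_y = {}, {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0) + p
    p_y[y] = p_y.get(y, 0) + p

# Definition: I(X;Y) = sum p(x,y) log2[ p(x,y) / (p(x) p(y)) ]  (symmetric in x and y)
I_def = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items() if p > 0)

# Property 3: I(X;Y) = H(X) + H(Y) - H(X,Y)
I_ent = entropy(p_x) + entropy(p_y) - entropy(p_xy)

print(f"I(X;Y) = {I_def:.4f} bits")
print(abs(I_def - I_ent) < 1e-12, I_def >= 0)
```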
Noiseless Channels
A channel in which there are at least as many output symbols as input symbols, but in which each output symbol can be produced by only one particular input symbol, is called a noiseless channel. The channel matrix of a noiseless channel has the property that there is one, and only one, non-zero element in each column. For a noiseless channel:
I(A;B) = H(A)
That is, the amount of information provided by the channel is the same as the information sent through the channel.
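A small sketch illustrating I(A;B) = H(A); the 2-input, 3-output channel matrix and the input distribution are hypothetical examples with one non-zero element per column:

```python
import math

def mutual_information(p_a, P):
    """I(A;B) for input probabilities p_a and channel matrix P, with P[i][j] = P(b_j | a_i)."""
    p_b = [sum(p_a[i] * P[i][j] for i in range(len(p_a))) for j in range(len(P[0]))]
    I = 0.0
    for i, pa in enumerate(p_a):
        for j, pb in enumerate(p_b):
            joint = pa * P[i][j]
            if joint > 0:
                I += joint * math.log2(joint / (pa * pb))
    return I

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Noiseless channel: each output column has exactly one non-zero entry.
p_a = [0.3, 0.7]                      # hypothetical input distribution
P = [[0.6, 0.4, 0.0],                 # a1 can only produce b1 or b2
     [0.0, 0.0, 1.0]]                 # a2 can only produce b3

print(abs(mutual_information(p_a, P) - entropy(p_a)) < 1e-12)  # True: I(A;B) = H(A)
```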
Deterministic Channels
A channel in which there are at least as many input symbols as output symbols, but in which each input symbol is capable of producing only one of the output symbols, is called a deterministic channel.
The channel matrix of a deterministic channel has the property that there is one, and only one, non-zero element in each row, and since the entries along each row must sum to 1, that non-zero element is equal to 1.
Mutual information through a deterministic channel:
I(A;B) = H(B)
That is, the amount of information provided by the channel is the same as the information produced by the channel output.
Cascaded Channels
The output of channel AB is connected to the input of channel BC.
Channels tend to leak information: the amount of information out of a cascade can be no greater (and is usually less) than the information out of the first channel.
If channel BC is noiseless:
I(A;B) = I(A;C)
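A brief sketch with hypothetical channel matrices: the matrix of the cascade AC is the matrix product of P_AB and P_BC, and the mutual information out of the cascade does not exceed that of the first channel:

```python
import math

def mutual_information(p_in, P):
    """I between channel input and output; P[i][j] = P(out_j | in_i)."""
    p_out = [sum(p_in[i] * P[i][j] for i in range(len(p_in))) for j in range(len(P[0]))]
    I = 0.0
    for i, pi in enumerate(p_in):
        for j, pj in enumerate(p_out):
            joint = pi * P[i][j]
            if joint > 0:
                I += joint * math.log2(joint / (pi * pj))
    return I

def cascade(P_ab, P_bc):
    """Channel matrix of the cascade: ordinary matrix product P_AB * P_BC."""
    return [[sum(P_ab[i][k] * P_bc[k][j] for k in range(len(P_bc)))
             for j in range(len(P_bc[0]))] for i in range(len(P_ab))]

p_a = [0.5, 0.5]                       # hypothetical input distribution
P_ab = [[0.9, 0.1], [0.1, 0.9]]        # a noisy first channel
P_bc = [[0.8, 0.2], [0.2, 0.8]]        # a noisy second channel

P_ac = cascade(P_ab, P_bc)
print(mutual_information(p_a, P_ab) >= mutual_information(p_a, P_ac))  # True: information leaks
```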
Channel Capacity: Maximum Mutual Information
Consider an information channel with input alphabet A, output alphabet B and channel matrix P_AB with conditional channel probabilities P(bj | ai). The mutual information is:
I(A;B) = H(A) − H(A|B)
The maximum amount of information-carrying capacity for the channel is H(A), the amount of information that is being transmitted through the channel. But this is reduced by H(A|B), which is an indication of the amount of "noise" present in the channel.
The expression for mutual information depends not only on the channel probabilities P(bj | ai), which uniquely identify a channel, but also on how the channel is used, i.e. on the input or source probability assignment P(ai).
As such, I(A;B) cannot be used to provide a unique and comparative measure of the information-carrying capacity of a channel, since it depends on how the channel is used.
One solution is to ensure that the same probability assignment (or input distribution) is used in calculating the mutual
information for different channels.
Channel Capacity: Maximum Mutual Information
The maximum average mutual information I(A;B) in any single use of a channel defines the channel capacity. Mathematically, the channel capacity, C, is defined as:
C = max over all input assignments P(ai) of I(A;B)
that is, the maximum mutual information over all possible input probability assignments.
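A minimal sketch that approximates the capacity of a BSC (hypothetical crossover probability p = 0.1) by a brute-force search over input assignments and compares it with the known closed form C = 1 − H(p):

```python
import math

def mutual_information(p_a, P):
    p_b = [sum(p_a[i] * P[i][j] for i in range(len(p_a))) for j in range(len(P[0]))]
    I = 0.0
    for i, pa in enumerate(p_a):
        for j, pb in enumerate(p_b):
            joint = pa * P[i][j]
            if joint > 0:
                I += joint * math.log2(joint / (pa * pb))
    return I

p = 0.1                                    # hypothetical crossover probability
P = [[1 - p, p], [p, 1 - p]]               # BSC channel matrix

# Brute-force search over input assignments P(a1) = q, P(a2) = 1 - q.
C = max(mutual_information([q, 1 - q], P) for q in [k / 1000 for k in range(1001)])

H_p = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
print(f"C ≈ {C:.4f} bits/use, closed form 1 - H(p) = {1 - H_p:.4f}")
```

For a symmetric channel such as the BSC the maximum is attained at the uniform input assignment, which the search confirms.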
Shannon–Hartley theorem
In information theory, the Shannon–Hartley theorem tells the maximum rate at which information can be transmitted
over a communications channel of a specified bandwidth in the presence of noise.
The theorem establishes Shannon's channel capacity for such a communication link, a bound on the maximum
amount of error-free information per time unit that can be transmitted with a specified bandwidth in the presence of
the noise interference, assuming that the signal power is bounded, and that the Gaussian noise process is
characterized by a known power or power spectral density.
The Shannon–Hartley theorem gives the channel capacity C, the theoretical tightest upper bound on the information rate of data that can be communicated at an arbitrarily low error rate using an average received signal power S through an analog communication channel subject to additive white Gaussian noise (AWGN) of power N:
C = B log₂(1 + S/N)
Shannon–Hartley theorem
C = B log₂(1 + S/N), where:
C: the channel capacity in bits per second, a theoretical upper bound on the net bit rate (information rate, sometimes denoted I);
B: the bandwidth of the channel in hertz;
S: the average received signal power over the bandwidth (in case of a carrier-modulated passband transmission, often denoted C), measured in watts (or volts squared);
N: the average power of the noise and interference over the bandwidth, measured in watts (or volts squared);
S/N: the signal-to-noise ratio (SNR) or the carrier-to-noise ratio (CNR) of the communication signal to the noise and interference at the receiver (expressed as a linear power ratio, not in logarithmic decibels).
Examples:
Q1: At an SNR of 0 dB (signal power = noise power), the capacity C in bit/s is equal to the …………………………… in hertz.
Q2: If the SNR is 20 dB, and the bandwidth available is 4 kHz, which is appropriate for telephone communications, then C = ?
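A worked sketch for Q2, assuming the stated 4 kHz bandwidth; 20 dB corresponds to a linear power ratio of 100:

```python
import math

B = 4000.0                 # bandwidth in Hz (4 kHz telephone channel)
snr_db = 20.0
snr = 10 ** (snr_db / 10)  # convert dB to a linear power ratio: 20 dB -> 100

C = B * math.log2(1 + snr)
print(f"C = {C:.0f} bit/s")   # roughly 26.6 kbit/s
```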
Binary symmetric channel (BSC)
• In a BSC the input to the channel is the binary digits {0, 1}.
• The channel matrix is Pij = P(bj | ai).
• Noise-free: P = [1 0; 0 1].
• Noisy: say the channel is noisy and introduces a bit inversion 1% of the time; then the channel matrix is given by P = [0.99 0.01; 0.01 0.99].
• Another effect that noise (or, more usually, loss of signal) may have is to prevent the receiver from deciding whether the symbol was a 0 or a 1; this is the case modelled by the binary erasure channel (BEC).
BEC: Numerical
A Channel has following channel matrix
P(Y|X) = [1−p  p  0; 0  p  1−p]
a) Draw the channel diagram
b) If the source has equally likely output, compute the probability associated with the channel
output for p = 0.2
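A short sketch for part b), assuming equally likely source outputs and p = 0.2; the output probabilities follow the columns of the channel matrix:

```python
p = 0.2
P_yx = [[1 - p, p, 0.0],       # row i gives P(y_j | x_i)
        [0.0, p, 1 - p]]
p_x = [0.5, 0.5]               # equally likely source outputs

# P(y_j) = sum_i P(x_i) * P(y_j | x_i)
p_y = [sum(p_x[i] * P_yx[i][j] for i in range(2)) for j in range(3)]
print(p_y)   # [0.4, 0.2, 0.4]
```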
BSC: Numerical
For the given BSC.
1) Show that the mutual information is given by
I(X;Y) = H(Y) + p log₂ p + (1−p) log₂ (1−p)
2) Calculate I(X;Y) for α = 0.5 and p= 0.1
3) Calculate I(X;Y) for α = 0.5 and p= 0.5, comment on results
[Channel diagram: inputs x1, x2 with P(x1) = α, P(x2) = 1−α; transitions P(y1|x1) = P(y2|x2) = 1−p, P(y2|x1) = P(y1|x2) = p]
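A sketch for parts 2) and 3): with α = 0.5 the output is equiprobable, so H(Y) = 1 bit and I(X;Y) reduces to 1 − H(p); at p = 0.5 the channel conveys no information:

```python
import math

def H2(q):
    """Binary entropy in bits."""
    return 0.0 if q in (0.0, 1.0) else -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

def I_bsc(alpha, p):
    """I(X;Y) = H(Y) - H(p) for a BSC with P(x1) = alpha and crossover probability p."""
    p_y1 = alpha * (1 - p) + (1 - alpha) * p
    return H2(p_y1) - H2(p)

print(f"alpha = 0.5, p = 0.1: I = {I_bsc(0.5, 0.1):.4f} bits")  # about 0.531
print(f"alpha = 0.5, p = 0.5: I = {I_bsc(0.5, 0.5):.4f} bits")  # 0: output independent of input
```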