
Information Theory and Coding

Module 2
Topics
• Joint entropy
• Conditional entropy
• Mutual information
• Discrete memoryless channels: BSC, BEC, noise-free channel, channel with independent I/O
• Cascaded channels
• Channel capacity
• Kullback–Leibler divergence (relative entropy)
• Cross-entropy
• Jensen–Shannon divergence
Joint entropy
Joint Entropy is a measure of the uncertainty associated with a set of variables.

Joint Entropy of two discrete random variables X and Y with a joint distribution p(x, y) may be defined as:

H(X, Y) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \log p(x, y)
Properties:
1) The joint entropy of a set of random variables is a nonnegative number, i.e., H(X,Y) ≥ 0.
2) The joint entropy of a set of variables is greater than or equal to the maximum of the individual entropies of the variables in the set, i.e., H(X,Y) ≥ max[H(X), H(Y)].
3) The joint entropy of a set of variables is less than or equal to the sum of the individual entropies of the variables in the set, i.e., H(X,Y) ≤ H(X) + H(Y).

Relations to other entropy measures:


• Joint entropy is used in the definition of conditional entropy: H(X|Y) = H(X,Y) − H(Y).
• It is also used in the definition of mutual information: I(X;Y) = H(X) + H(Y) − H(X,Y).
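
For illustration, here is a minimal Python sketch (the function name and example distribution are assumptions, not from the slides) that evaluates H(X,Y) directly from a joint pmf:

```python
import numpy as np

def joint_entropy(p_xy):
    """H(X,Y) = -sum over x,y of p(x,y) * log2 p(x,y); zero-probability cells contribute 0."""
    p = np.asarray(p_xy, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Assumed example: X and Y independent and uniform over {0, 1}
p_xy = np.array([[0.25, 0.25],
                 [0.25, 0.25]])
print(joint_entropy(p_xy))   # 2.0 bits = H(X) + H(Y), since X and Y are independent here
```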
Conditional entropy
• Conditional entropy quantifies the amount of information needed to describe the outcome of a random variable Y
given that the value of another random variable X is known.
• Entropy of Y given X is written as H(Y|X).
H(Y|X) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \log p(y|x) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)}

• Entropy of X given Y is written as H(X|Y).

H(X|Y) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \log p(x|y) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(y)}

• NOTE (Bayes’ theorem): p(x, y) = p(x|y) · p(y) = p(y|x) · p(x)
Conditional entropy
Properties:
1) H(Y|X) = 0 if and only if the value of Y is completely determined by the value of X.
2) H(Y|X) = H(Y) and H(X|Y) = H(X) if and only if Y and X are independent random variables.
3) Chain rule of conditional entropy: H(Y|X) = H(X,Y) − H(X) and H(X|Y) = H(X,Y) − H(Y).
4) H(Y|X) ≤ H(Y)
5) H(X,Y) = H(Y|X) + H(X|Y) + I(X;Y)
6) H(X,Y) = H(X) + H(Y) − I(X;Y)
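
A short numerical check of property 3 (the joint pmf below is an assumed example), comparing the direct definition of H(Y|X) with H(X,Y) − H(X):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Assumed joint pmf p(x, y): rows index x, columns index y
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)                              # marginal p(x)

# Direct definition: H(Y|X) = -sum p(x,y) log2 p(y|x)
p_y_given_x = p_xy / p_x[:, None]
H_Y_given_X = -np.sum(p_xy * np.log2(p_y_given_x))

print(H_Y_given_X, entropy(p_xy) - entropy(p_x))    # both ~0.722 bits, as the chain rule requires
```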
Mutual Information

Mutual information is a measure of the mutual dependence between two random variables. More specifically, it quantifies the "amount of information" obtained about one random variable by observing the other.

I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) bits/symbol

Properties:
1) I(X;Y) = I(Y;X)
2) I(X;Y) ≥ 0
3) I(X;Y) = H(X) + H(Y) – H(X,Y)

Prove:
1) I(X;Y) = I(Y;X)
2) I(X;Y) ≥ 0
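
The proofs are left as exercises in the slides; the following sketch (with an assumed joint pmf) is only a numerical sanity check of the two properties via I(X;Y) = H(X) + H(Y) − H(X,Y):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Assumed joint pmf p(x, y)
p_xy = np.array([[0.30, 0.20],
                 [0.10, 0.40]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

I_xy = entropy(p_x) + entropy(p_y) - entropy(p_xy)
I_yx = entropy(p_y) + entropy(p_x) - entropy(p_xy.T)   # roles of X and Y swapped
print(I_xy, I_yx)                                      # identical values, and both are non-negative
```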
Noiseless Channels
A channel in which there are at least as many output symbols as input symbols, but in which each output symbol can be produced by only one particular input symbol, is called a noiseless channel. The channel matrix of a noiseless channel has the property that there is one, and only one, non-zero element in each column.

• Given knowledge of the received output symbol, we know with certainty which input symbol was transmitted.
• There is only one non-zero element in each column.
Noiseless Channels
Mutual Information through a noiseless channel:

The mutual information for noiseless channels is given by

I(A;B) = H(A)

That is, the amount of information provided by the channel is the same as the information sent through the
channel.
Deterministic Channels
A channel in which there are at least as many input symbols as output symbols, but
in which each of the input symbols is capable of producing only one of the output
symbols is called a deterministic channel.

The channel matrix of a deterministic channel has the property that there is one,
and only one, non-zero element in each row, and since the entries along each row
must sum to 1, that non-zero element is equal to 1.

• There is only one non-zero element in each row, and that element is 1.
Deterministic Channels
Mutual Information through a deterministic channel

The mutual information for deterministic channels is given by

I(A;B) = H(B)

That is, the amount of information provided by the channel is the same as the information produced by the channel output.
Cascaded Channels
The output of channel AB is connected to the input of channel BC.

Say the input symbol ai is transmitted through channel AB, and this produces bj as the output from channel AB. Then bj forms the input to channel BC, which in turn produces ck as the output from channel BC.

The output ck depends solely on bj, not on ai.

The mutual information for cascaded channels satisfies

I(A;B) ≥ I(A;C), with equality if and only if P(a|c) = P(a|b) whenever P(b,c) ≠ 0.

That is, channels tend to leak information: the amount of information out of a cascade can be no greater (and is usually less) than the information out of the first channel.

If channel BC is noiseless, then I(A;B) = I(A;C).
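
A minimal sketch of this behaviour (the channel matrices and input distribution are assumed examples): because the cascade forms a Markov chain A → B → C, the overall channel matrix is the product P_AC = P_AB · P_BC, and I(A;C) never exceeds I(A;B).

```python
import numpy as np

def mutual_information(p_a, P):
    """I(input; output) for input pmf p_a and channel matrix P (rows: inputs, cols: outputs)."""
    p_joint = p_a[:, None] * P                 # p(a, out) = p(a) * P(out | a)
    p_out = p_joint.sum(axis=0)
    h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return h(p_a) + h(p_out) - h(p_joint.ravel())

# Assumed channel matrices; each row sums to 1
P_AB = np.array([[0.9, 0.1],
                 [0.2, 0.8]])
P_BC = np.array([[0.95, 0.05],
                 [0.05, 0.95]])

# Because c_k depends only on b_j, P(c_k | a_i) = sum_j P(b_j | a_i) * P(c_k | b_j),
# i.e. a matrix product.
P_AC = P_AB @ P_BC

p_a = np.array([0.5, 0.5])                     # assumed input distribution
print(mutual_information(p_a, P_AB))           # I(A;B)
print(mutual_information(p_a, P_AC))           # I(A;C), never larger than I(A;B)
```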
Channel Capacity: Maximum Mutual Information

Consider an information channel with input alphabet A, output alphabet B, and channel matrix P_AB with conditional channel probabilities P(bj | ai). The mutual information is:

I(A;B) = H(A) − H(A|B)

The maximum amount of information-carrying capacity for the channel is H(A), the amount of information that is being transmitted through the channel. But this is reduced by H(A|B), which is an indication of the amount of "noise" present in the channel.

The expression for mutual information depends not only on the channel probabilities P(bj | ai), which uniquely identify a channel, but also on how the channel is used, i.e., the input or source probability assignment P(ai).

As such, I(A;B) cannot be used to provide a unique and comparative measure of the information-carrying capacity of a channel, since it depends on how the channel is used.

One solution is to ensure that the same probability assignment (or input distribution) is used in calculating the mutual
information for different channels.
Channel Capacity: Maximum Mutual Information

The maximum average mutual information, I(A;B), in any single use of a channel defines the channel capacity. Mathematically, the channel capacity, C, is defined as:

C = \max_{P(a)} I(A;B)

that is, the maximum mutual information over all possible input probability assignments.
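
A brute-force sketch of this definition for a binary-input channel (the channel matrix is an assumed example): sweep the input probability assignment and keep the largest mutual information.

```python
import numpy as np

def mutual_information(p_a, P):
    """I(A;B) for input pmf p_a and channel matrix P (rows: inputs, cols: outputs)."""
    p_joint = p_a[:, None] * P
    p_out = p_joint.sum(axis=0)
    h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return h(p_a) + h(p_out) - h(p_joint.ravel())

# Assumed binary-input channel matrix (each row sums to 1)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Capacity = max over input distributions; here a brute-force sweep over alpha = P(a1)
alphas = np.linspace(0.001, 0.999, 999)
rates = [mutual_information(np.array([a, 1 - a]), P) for a in alphas]
best = int(np.argmax(rates))
print(rates[best], alphas[best])   # approximate capacity (bits/use) and the maximizing P(a1)
```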
Shannon–Hartley theorem

In information theory, the Shannon–Hartley theorem gives the maximum rate at which information can be transmitted over a communications channel of a specified bandwidth in the presence of noise.

The theorem establishes Shannon's channel capacity for such a communication link, a bound on the maximum
amount of error-free information per time unit that can be transmitted with a specified bandwidth in the presence of
the noise interference, assuming that the signal power is bounded, and that the Gaussian noise process is
characterized by a known power or power spectral density.

The Shannon–Hartley theorem states that the channel capacity C, meaning the theoretical tightest upper bound on the information rate of data that can be communicated at an arbitrarily low error rate using an average received signal power S through an analog communication channel subject to additive white Gaussian noise (AWGN) of power N, is:

C = B \log_2\left(1 + \frac{S}{N}\right)
Shannon–Hartley theorem
C = B \log_2\left(1 + \frac{S}{N}\right)

where:

C is the channel capacity in bits per second, a theoretical upper bound on the net bit rate (information rate, sometimes denoted I);

B is the bandwidth of the channel in hertz (passband bandwidth in the case of a bandpass signal);

S is the average received signal power over the bandwidth (in the case of a carrier-modulated passband transmission, often denoted C), measured in watts (or volts squared);

N is the average power of the noise and interference over the bandwidth, measured in watts (or volts squared);

S/N is the signal-to-noise ratio (SNR) or carrier-to-noise ratio (CNR) of the communication signal to the noise and interference at the receiver (expressed as a linear power ratio, not in logarithmic decibels).
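
A minimal sketch of the formula (the function name is an assumption); note that the SNR must be converted from decibels to a linear power ratio before use:

```python
import math

def shannon_capacity(bandwidth_hz, snr_db):
    """C = B * log2(1 + S/N), with the SNR supplied in dB and converted to a linear ratio."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

print(shannon_capacity(4_000, 0))    # at 0 dB (S = N), C equals the bandwidth: 4000 bit/s
print(shannon_capacity(4_000, 20))   # a 20 dB, 4 kHz telephone channel: roughly 26.6 kbit/s
```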
Examples:
Q1: At an SNR of 0 dB (signal power = noise power), the capacity in bit/s is equal to the …………………………… in hertz.

Q2: If the SNR is 20 dB and the available bandwidth is 4 kHz, which is appropriate for telephone communications, then C = ?

Q3: If the requirement is to transmit at 50 kbit/s and a bandwidth of 10 kHz is used, then the minimum S/N required is ?
Discrete Memoryless Channel (DMC)
• A DMC is a statistical model with an input A and an output B.
• The channel accepts an input symbol from A and responds with an output symbol from B.
• The channel is "discrete" in the sense that the numbers of symbols in A and B are finite.
• Memoryless: the current output depends only on the current input, not on previous inputs.
• P(ai) is assumed to be known.
• Each possible input-output path can be represented by a conditional probability P(bj | ai), the channel transition probability.
• The conditional probabilities that describe an information channel can be represented conveniently using a matrix representation.
Discrete Memoryless Channel (DMC)
P is the channel matrix, and for notational convenience we may sometimes rewrite this as

P_ij = P(bj | ai)

The channel matrix exhibits the following properties and structure:

• Each row of P contains the probabilities of all possible outputs from the same input to the channel.
• Each column of P contains the probabilities of all possible inputs to a particular output from the channel.
• If we transmit the symbol ai we must receive an output symbol with probability 1, that is:

\sum_{j} P(b_j \mid a_i) = 1 \quad \text{for all } i

that is, the probability terms in each row must sum to 1.
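
A small sketch (the channel matrix and input distribution are assumed examples) illustrating the row-sum property and how the output probabilities follow from P(ai) and the channel matrix:

```python
import numpy as np

# Assumed channel matrix with P[i, j] = P(b_j | a_i); rows index inputs, columns index outputs
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Each row must sum to 1: transmitting a_i always produces some output symbol
assert np.allclose(P.sum(axis=1), 1.0)

# With a known input distribution P(a_i), the output distribution is
# P(b_j) = sum_i P(a_i) * P(b_j | a_i), i.e. a vector-matrix product.
p_a = np.array([0.5, 0.5])          # assumed input probabilities
p_b = p_a @ P
print(p_b)                          # [0.55, 0.45]
```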


Discrete Memoryless Channel (DMC)
Noiseless: If the channel is noiseless, there will be no error in transmission and the channel matrix is given by

P = [1 0; 0 1]

Noisy: If the channel is noisy and introduces a bit inversion 1% of the time, then the channel matrix is given by

P = [0.99 0.01; 0.01 0.99]
Binary symmetric channel (BSC)

• In a BSC the input to the channel is the binary digits {0, 1}.

• The channel is assumed memoryless.

• Ideally, if there is no noise, a transmitted 0 is detected by the receiver as a 0, and a transmitted 1 is detected by the receiver as a 1.

• The most common effect of noise is to force the detector to detect the wrong bit (bit inversion), that is, a 0 is detected as a 1, and a 1 is detected as a 0.

p = 1 − q, where q is the probability of error (also called the bit error probability, bit error rate (BER), or "crossover" probability).

• The BSC is an important channel for digital communication systems, as noise present in physical transmission media (fibre-optic cable, copper wire, etc.) typically causes bit-inversion errors in the receiver.
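
A minimal simulation sketch (parameters assumed) of the bit-inversion behaviour described above, estimating the empirical bit error rate for a given crossover probability q:

```python
import numpy as np

rng = np.random.default_rng(0)

q = 0.01                                    # assumed crossover (bit-inversion) probability
bits = rng.integers(0, 2, size=100_000)     # transmitted bits
flips = rng.random(bits.size) < q           # each bit inverted independently with probability q
received = bits ^ flips.astype(bits.dtype)  # XOR applies the inversions

print(np.mean(received != bits))            # empirical BER, close to q = 0.01
```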
Binary erasure channel (BEC)

• Another effect that noise (or, more usually, loss of signal) may have is to prevent the receiver from deciding whether the symbol was a 0 or a 1.

• In this case the output alphabet includes an additional symbol, '?', called the "erasure" symbol, which denotes a bit that could not be detected.

• Strictly speaking, a BEC does not model the effect of bit inversion; a transmitted bit is either received correctly (probability p) or received as an "erasure" (probability q = 1 − p).

• The BEC is becoming an increasingly important model for wireless mobile and satellite communication channels, which suffer mainly from dropouts and loss of signal, leading to the receiver failing to detect any signal.
BSC: Numerical
For the given binary channel (channel diagram: P(y1 | x1) = 0.9, P(y2 | x2) = 0.8):
a) Find the channel matrix.
b) Find P(y1) and P(y2) if P(x1) = P(x2) = 0.5.
c) Find the joint probabilities P(x1, y2) and P(x2, y1) when P(x1) = P(x2) = 0.5.
BEC: Numerical
A channel has the following channel matrix:

P(Y|X) = \begin{bmatrix} 1 - p & p & 0 \\ 0 & p & 1 - p \end{bmatrix}

a) Draw the channel diagram.
b) If the source has equally likely outputs, compute the probabilities associated with the channel output for p = 0.2.
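
A sketch of the computation asked for in part (b), using the channel matrix above with equally likely inputs (the output ordering y1, erasure, y2 is assumed):

```python
import numpy as np

p = 0.2
# Channel matrix from the problem: rows are the two inputs, columns are (y1, erasure, y2)
P = np.array([[1 - p, p, 0.0],
              [0.0, p, 1 - p]])

p_x = np.array([0.5, 0.5])   # equally likely source outputs
p_y = p_x @ P                # P(y_j) = sum_i P(x_i) * P(y_j | x_i)
print(p_y)                   # [0.4, 0.2, 0.4]
```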
BSC: Numerical
For the given BSC:
1) Show that the mutual information is given by
I(X;Y) = H(Y) + p log2 p + (1 − p) log2(1 − p)
2) Calculate I(X;Y) for α = 0.5 and p = 0.1.
3) Calculate I(X;Y) for α = 0.5 and p = 0.5, and comment on the results.

(Channel diagram: P(x1) = α, P(x2) = 1 − α; P(y1 | x1) = P(y2 | x2) = 1 − p, with crossover probability p.)
BSC: Numerical

A binary channel has the following noise characteristics:


If the input symbols are transmitted with probabilities ¾ & ¼ respectively, find H(X), H(Y),
H(X,Y), H(Y|X), H(X|Y).
BSC: Numerical
The joint probability matrix for a channel is given. Compute H(X), H(Y), H(X,Y), H(X|Y) and H(Y|X).
BSC: Numerical
A source delivers the binary digits 0 and 1 with equal probability into a noisy channel at a
rate of 1000 digits / second. Owing to noise on the channel the probability of receiving a
transmitted ‘0’ as a ‘1’ is 1/16, while the probability of transmitting a ‘1’ and receiving a ‘0’
is 1/32. Determine the rate at which information is received.
BSC: Numerical
A transmitter produces three symbols A, B, C with the joint probabilities shown. Calculate H(X,Y).
BSC: Numerical (Assignment)
BSC: Numerical (Assignment)
