
Information Theory and Coding

Module 2
H(X,Y) Joint entropy (combined randomness of X and Y)
P(X,Y) Joint probability (combined probability of occurrence of X and Y)
H(X|Y) Conditional entropy (The uncertainty in X, Y is known)
H(Y|X) Conditional entropy (The uncertainty in Y, X is known)
P(X|Y) Conditional probability (The probability of occurrence of X, Y is known)
P(Y|X) Conditional probability (The probability of occurrence of Y, X is known)
I(X;Y) Mutual information (amount of information shared between X and Y)
H(X;Y) Mutual entropy (total information contained in X and Y.)
Joint entropy
The joint entropy of two random variables X and Y, denoted H(X,Y), measures the total uncertainty or
information contained in the pair (X,Y); p(x,y) is the joint probability.
It generalizes the entropy of a single random variable to the case where two variables are involved:

$$H(X,Y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y)\,\log p(x,y)$$
Properties:
1) The joint entropy of a set of random variables is a nonnegative number, i.e. H(X,Y) ≥ 0.
2) The joint entropy of a set of variables is greater than or equal to the maximum of all of the individual entropies of the variables in the set, i.e. H(X,Y) ≥ max [H(X), H(Y)].
3) The joint entropy of a set of variables is less than or equal to the sum of the individual entropies of the variables in the set, i.e. H(X,Y) ≤ H(X) + H(Y).

Relations to other entropy measures:


• Joint entropy is used in the definition of conditional entropy: H(X|Y) = H(X,Y) – H(Y)
• It is also used in the definition of mutual information: I(X;Y) = H(X) + H(Y) – H(X,Y)



Joint entropy: Example

Marginal Probability
P(X=S)=0.1+0.4=0.5
P(X=R)=0.4+0.1=0.5
P(Y=Y)=0.1+0.4=0.5
P(Y=N)=0.4+0.1=0.5
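
The joint probability table itself was on the slide image; assuming it is p(S,Y) = 0.1, p(S,N) = 0.4, p(R,Y) = 0.4, p(R,N) = 0.1 (the only table consistent with the marginals above), a minimal Python sketch of the joint-entropy computation is:

```python
import math

# Assumed joint table (consistent with the marginal probabilities above):
#           Y=Y   Y=N
#   X=S     0.1   0.4
#   X=R     0.4   0.1
joint = {('S', 'Y'): 0.1, ('S', 'N'): 0.4,
         ('R', 'Y'): 0.4, ('R', 'N'): 0.1}

# H(X,Y) = -sum p(x,y) log2 p(x,y)
H_XY = -sum(p * math.log2(p) for p in joint.values() if p > 0)
print(round(H_XY, 4))   # ≈ 1.7219 bits
```

With these numbers H(X,Y) ≈ 1.72 bits, while H(X) = H(Y) = 1 bit, so H(X,Y) ≤ H(X) + H(Y), as property 3 requires.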



Conditional entropy
• Conditional entropy quantifies the amount of information needed to describe the outcome of a random variable Y
given that the value of another random variable X is known.
• Entropy of Y when X is known H(Y|X).

$$H(Y|X) = -\sum_{x \in X} \sum_{y \in Y} p(x,y)\,\log p(y|x) = -\sum_{x \in X} \sum_{y \in Y} p(x,y)\,\log \frac{p(x,y)}{p(x)}$$

• Entropy of X when Y is known H(X|Y).

$$H(X|Y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y)\,\log p(x|y) = -\sum_{x \in X} \sum_{y \in Y} p(x,y)\,\log \frac{p(x,y)}{p(y)}$$
• NOTE (Bayes’ theorem / product rule):

$$p(x,y) = p(x|y)\,p(y) = p(y|x)\,p(x)$$



Conditional entropy
Properties:
1) H(Y|X) = 0 if the value of Y is completely determined by the value of X.
2) H(Y|X) = H(Y) and H(X|Y) = H(X) if X and Y are independent random variables.
3) Chain rule of conditional entropy: H(Y|X) = H(X,Y) - H(X) and H(X|Y) = H(X,Y) - H(Y).
4) H(Y|X) ≤ H(Y)
5) H(X,Y) = H(Y|X) + H(X|Y) + I(X;Y)
6) H(X,Y) = H(X) + H(Y) - I(X;Y)
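
As a quick illustration (not from the slides), the sketch below checks the chain rule and properties 4, 5 and 6 numerically for a small, hypothetical joint distribution:

```python
import math

def h(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y), used only to check the identities above.
joint = [[0.1, 0.4],
         [0.4, 0.1]]

px = [sum(row) for row in joint]                                   # marginal of X
py = [sum(joint[i][j] for i in range(2)) for j in range(2)]        # marginal of Y

H_XY = h([p for row in joint for p in row])
H_X, H_Y = h(px), h(py)

H_Y_given_X = H_XY - H_X          # property 3 (chain rule)
H_X_given_Y = H_XY - H_Y
I_XY = H_X + H_Y - H_XY           # property 6 rearranged

assert H_Y_given_X <= H_Y + 1e-12                                  # property 4
assert abs(H_XY - (H_Y_given_X + H_X_given_Y + I_XY)) < 1e-12      # property 5
print(H_Y_given_X, H_X_given_Y, I_XY)
```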



Chain rule
Chain rule: proof
Chain rule: proof
Conditional entropy: example
Relative entropy
Relative entropy: example
Mutual Information
It is a measure of the mutual dependence between the two variables. More specifically, it quantifies the "amount of
information" obtained about one random variable through observing the other random variable.

I(X;Y) = H(X) – H(X|Y) = H(Y) – H(Y|X) bits/symbol

Properties:
1) I(X;Y) = I(Y;X)
2) I(X;Y) ≥ 0
3) I(X;Y) = H(X) + H(Y) – H(X,Y)
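
A small sketch (hypothetical joint distribution, not from the slides) that evaluates I(X;Y) directly from its double-sum definition and checks properties 1) and 2):

```python
import math

# Hypothetical joint distribution p(x, y), used only to check the properties above.
joint = [[0.3, 0.2],
         [0.1, 0.4]]
px = [sum(row) for row in joint]                                   # marginal of X
py = [sum(joint[i][j] for i in range(2)) for j in range(2)]        # marginal of Y

def mutual_information(joint, px, py):
    """I(X;Y) = sum_x sum_y p(x,y) * log2( p(x,y) / (p(x) p(y)) )."""
    return sum(joint[i][j] * math.log2(joint[i][j] / (px[i] * py[j]))
               for i in range(len(px)) for j in range(len(py))
               if joint[i][j] > 0)

transpose = [[joint[i][j] for i in range(2)] for j in range(2)]    # p(y, x)
I_xy = mutual_information(joint, px, py)
I_yx = mutual_information(transpose, py, px)

assert abs(I_xy - I_yx) < 1e-12      # property 1: symmetry
assert I_xy >= 0                     # property 2: nonnegativity
print(round(I_xy, 4))                # ≈ 0.1245 bits
```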



Mutual Information
Prove:
1) I(X;Y) = I(Y;X)
2) I(X;Y) ≥ 0



Discrete Memoryless Channel (DMC)
• A DMC is a statistical model with an input A and an output B.
• The channel accepts an input symbol from A and responds with an output symbol from B.
• The channel is ‘discrete’ in the sense that the numbers of symbols in A and B are finite.
• Memoryless: the current output depends only on the current input, not on previous inputs.
• P(ai) is assumed to be known.
• Each possible input-output path can be represented by a conditional probability P(bj | ai), the channel transition probability.
• The conditional probabilities that describe an information channel can be represented conveniently using a matrix:
Discrete Memoryless Channel (DMC)
P is the channel matrix; for notational convenience we may sometimes write

Pij = P(bj | ai)

The channel matrix exhibits the following properties and structure:


• Each row of P contains the probabilities of all possible outputs from the same input
to the channel.
• Each column of P contains the probabilities of all possible inputs to a particular
output from the channel.
• If we transmit the symbol ai we must receive an output symbol with probability 1, that is,

$$\sum_{j} P(b_j \mid a_i) = 1 \quad \text{for every input } a_i,$$

i.e. the probability terms in each row must sum to 1.
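
A minimal sketch of this row-sum check, using a hypothetical 2-input, 3-output channel matrix (values chosen for illustration only):

```python
# A channel matrix must have each row summing to 1.
P = [[0.7, 0.2, 0.1],   # P(b1|a1), P(b2|a1), P(b3|a1)
     [0.1, 0.3, 0.6]]   # P(b1|a2), P(b2|a2), P(b3|a2)

for i, row in enumerate(P):
    assert abs(sum(row) - 1.0) < 1e-9, f"row {i} does not sum to 1"
print("valid channel matrix")
```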
Discrete Memoryless Channel (DMC)
Noiseless: If the channel is noiseless there will be no error in transmission, and the channel matrix (for a binary channel) is the identity matrix:

$$P = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Noisy: If the channel is noisy and introduces a bit inversion 1% of the time, the channel matrix is

$$P = \begin{bmatrix} 0.99 & 0.01 \\ 0.01 & 0.99 \end{bmatrix}$$
Binary symmetric channel (BSC)

• In a BSC the input to the channel is the binary digits {0, 1}.

• The channel is assumed memoryless.

• Ideally, if there is no noise, a transmitted 0 is detected by the receiver as a 0, and a transmitted 1 is detected by the receiver as a 1.

• The most common effect of noise is to force the detector to detect the wrong bit (bit inversion), that is, a 0 is detected as a 1, and a 1 is detected as a 0. Here q is the probability of error (also called the bit error probability, bit error rate (BER), or “crossover” probability) and p = 1 - q is the probability of correct detection.

• The BSC is an important channel for digital communication systems, since the noise present in physical transmission media commonly causes bit inversions.
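
As a small illustration, the BSC channel matrix can be built directly from the crossover probability q (a sketch; bsc_matrix is a name chosen here, not one used on the slides):

```python
def bsc_matrix(q):
    """Channel matrix of a binary symmetric channel with crossover probability q."""
    p = 1 - q                 # probability of correct detection
    return [[p, q],           # row for input 0: P(0|0), P(1|0)
            [q, p]]           # row for input 1: P(0|1), P(1|1)

print(bsc_matrix(0.01))       # the 1% bit-inversion channel from the earlier slide
```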
Binary erasure channel (BEC)
• Another effect that noise (or, more usually, loss of signal) may have is to prevent the receiver from deciding whether the transmitted symbol was a 0 or a 1.

• In this case the output alphabet includes an additional symbol, ?, called the “erasure” symbol, which denotes a bit that could not be detected.

• Strictly speaking, a BEC does not model the effect of bit inversion; a transmitted bit is either received correctly (probability p) or received as an “erasure” (probability q = 1 - p).

• The BEC is becoming an increasingly important model for wireless mobile and satellite communication channels, which suffer mainly from dropouts and loss of signal, leading to the receiver failing to detect any signal.
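
A similar sketch for the BEC (output symbols ordered 0, ?, 1); for a uniform input, the computed I(X;Y) comes out as 1 - q, which is a useful sanity check:

```python
import math

def bec_matrix(q):
    """Channel matrix of a binary erasure channel with erasure probability q.
    Output symbols are ordered (0, ?, 1)."""
    p = 1 - q
    return [[p, q, 0.0],      # row for input 0
            [0.0, q, p]]      # row for input 1

def mutual_information(px, P):
    """I(X;Y) = H(Y) - H(Y|X) for input distribution px and channel matrix P."""
    def h(probs):
        return -sum(v * math.log2(v) for v in probs if v > 0)
    py = [sum(px[i] * P[i][j] for i in range(len(px))) for j in range(len(P[0]))]
    return h(py) - sum(px[i] * h(P[i]) for i in range(len(px)))

q = 0.2
print(mutual_information([0.5, 0.5], bec_matrix(q)))   # 0.8 = 1 - q bits for uniform input
```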
Jensen-Shannon Divergence (JSD)
The Jensen-Shannon Divergence (JSD) is a widely used method for measuring the similarity or dissimilarity
between two probability distributions. Unlike other divergence measures, such as the Kullback-Leibler divergence
(D_KL), the JSD is symmetric and always yields a finite value. This makes it particularly suitable for comparing
distributions in real-world applications.
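
For reference, the standard definition (with M the midpoint distribution and D_KL the Kullback-Leibler divergence, i.e. the relative entropy discussed earlier):

$$\mathrm{JSD}(P \parallel Q) = \tfrac{1}{2} D_{KL}(P \parallel M) + \tfrac{1}{2} D_{KL}(Q \parallel M), \qquad M = \tfrac{1}{2}(P + Q)$$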
Jensen-Shannon Divergence (JSD): Numerical
Find the similarity between X=[0.2,0.28,0.14,0.25,0.14] and Y=[0.25,0.25,0.1,0.3,0.1] using Jensen-Shannon Divergence
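
A minimal Python sketch of this computation (note that the given X sums to 1.01, so both vectors are renormalized first, which is an assumption about the intended data):

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence between distributions p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

X = [0.2, 0.28, 0.14, 0.25, 0.14]   # sums to 1.01 as printed on the slide
Y = [0.25, 0.25, 0.1, 0.3, 0.1]

sx, sy = sum(X), sum(Y)
X = [x / sx for x in X]             # renormalize (assumption)
Y = [y / sy for y in Y]

print(jsd(X, Y))                    # small value => the distributions are similar
```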
BSC: Numerical
For the given binary channel, with P(y1|x1) = 0.9 and P(y2|x2) = 0.8 (the transition probabilities shown on the channel diagram):
a) Find the channel matrix.
b) Find P(y1) and P(y2) if P(x1) = P(x2) = 0.5.
c) Find the joint probabilities P(x1, y2) and P(x2, y1) when P(x1) = P(x2) = 0.5.
d) Find the mutual information I(Y;X).
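
A sketch of the full calculation, assuming the diagram gives P(y1|x1) = 0.9 and P(y2|x2) = 0.8 (so the off-diagonal entries are 0.1 and 0.2):

```python
import math

def h(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed channel matrix read from the diagram:
# P(y1|x1)=0.9, P(y2|x1)=0.1, P(y1|x2)=0.2, P(y2|x2)=0.8
P = [[0.9, 0.1],
     [0.2, 0.8]]
px = [0.5, 0.5]

# b) output probabilities P(yj) = sum_i P(xi) P(yj|xi)
py = [sum(px[i] * P[i][j] for i in range(2)) for j in range(2)]      # [0.55, 0.45]

# c) joint probabilities P(xi, yj) = P(xi) P(yj|xi)
p_x1_y2 = px[0] * P[0][1]                                            # 0.05
p_x2_y1 = px[1] * P[1][0]                                            # 0.10

# d) I(Y;X) = H(Y) - H(Y|X)
I = h(py) - sum(px[i] * h(P[i]) for i in range(2))
print(py, p_x1_y2, p_x2_y1, round(I, 4))                             # I ≈ 0.3973 bits
```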
BSC: Numerical
BSC: Numerical
Let X and Y be two independent random variables with probabilities P(X) = {0.2, 0.25, 0.2, 0.2, 0.15} and P(Y) = {0.1, 0.25, 0.25, 0.4}.
Find the joint entropy H(X,Y).

Since independent H(X,Y)=H(X)+H(Y)
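
A minimal check of this sum (the entropy helper below is not from the slides):

```python
import math

def entropy(p):
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

PX = [0.2, 0.25, 0.2, 0.2, 0.15]
PY = [0.1, 0.25, 0.25, 0.4]

# For independent X and Y: H(X,Y) = H(X) + H(Y)
print(entropy(PX) + entropy(PY))   # ≈ 2.304 + 1.861 ≈ 4.16 bits
```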


BSC: Numerical
BSC: Numerical
For the given BSC, with P(x1) = α and P(x2) = 1 - α, crossover probability p (P(y2|x1) = P(y1|x2) = p) and P(y1|x1) = P(y2|x2) = 1 - p:
1) Show that the mutual information is given by I(X;Y) = H(Y) + p log2 p + (1-p) log2 (1-p).
2) Calculate I(X;Y) for α = 0.5 and p = 0.1.
3) Calculate I(X;Y) for α = 0.5 and p = 0.5, and comment on the results.
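
A sketch of parts 2) and 3), using the identity from part 1) with H(Y) computed from the output distribution:

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(alpha, p):
    py1 = alpha * (1 - p) + (1 - alpha) * p   # P(y1)
    # I(X;Y) = H(Y) - H(Y|X) = H(Y) + p*log2(p) + (1-p)*log2(1-p)
    return h2(py1) - h2(p)

print(bsc_mutual_information(0.5, 0.1))   # ≈ 0.531 bits
print(bsc_mutual_information(0.5, 0.5))   # 0 bits: the output carries no information about the input
```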
BSC: Numerical
A binary channel has the following noise characteristics:
If the input symbols are transmitted with probabilities ¾ & ¼ respectively, find H(X), H(Y),
H(X,Y), H(Y|X), H(X|Y).
BSC: Numerical
The joint probability matrix for a channel is given. Compute H(X), H(Y), H(X,Y), H(X|Y) and H(Y|X).
BSC: Numerical
A source delivers the binary digits 0 and 1 with equal probability into a noisy channel at a
rate of 1000 digits / second. Owing to noise on the channel the probability of receiving a
transmitted ‘0’ as a ‘1’ is 1/16, while the probability of transmitting a ‘1’ and receiving a
‘0’ is 1/32. Determine the rate at which information is received.
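
A sketch of the computation: the information rate is the symbol rate multiplied by I(X;Y) per transmitted digit.

```python
import math

def h(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

symbol_rate = 1000                # binary digits per second
px = [0.5, 0.5]                   # equally likely 0s and 1s
# Transition probabilities: P(receive 1 | send 0) = 1/16, P(receive 0 | send 1) = 1/32
P = [[15/16, 1/16],
     [1/32, 31/32]]

py = [sum(px[i] * P[i][j] for i in range(2)) for j in range(2)]
I = h(py) - sum(px[i] * h(P[i]) for i in range(2))   # bits per transmitted digit
print(symbol_rate * I)                               # ≈ 730 bits/second
```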
BSC: Numerical
A transmitter produces three symbols A, B, C whose joint probabilities are shown. Calculate H(X,Y).
BSC: Numerical (Assignment)
BSC: Numerical (Assignment)
