Information Theory

The document discusses information theory and key concepts like information sources, entropy, and channel capacity. Information sources can be analog or discrete and can have memory or be memoryless. Entropy measures the uncertainty in an information source and is related to the probabilities of messages. The channel capacity is the maximum rate at which information can be transmitted over a channel without error.


Syllabus:

Information theory: discrete and continuous messages, message source, zero-memory source, discrete
memoryless source, extension of zero-memory source, Markov source and their entropy, channel
with and without memory, Hartley and Shannon's law.
Introduction to Information Theory
The fundamental problem of communication is that of reproducing at one point either exactly or approximately a
message selected at another point.
(Claude Shannon, 1948)

Throughout this book we have studied electrical communication primarily in terms of signals: desired information-bearing
signals corrupted by noise and interference signals. Although signal theory has proved to be a valuable tool, it does not
come to grips with the fundamental communication process of information transfer. Recognizing the need for a broader
viewpoint, Claude Shannon drew upon the earlier work of Nyquist and Hartley and the concurrent investigations of Wiener
to develop and set forth in 1948 a radically new approach he called "A Mathematical Theory of Communication."
Shannon's paper isolated the central task of communication engineering in this question: Given a message-producing
source, not of our choosing, how should the messages be represented for reliable transmission over a communication
channel with its inherent physical limitations? To address that question, Shannon concentrated on the message
information per se rather than on the signals. His approach was soon renamed information theory, and it has
subsequently evolved into a hybrid mathematical and engineering discipline. Information theory deals with three basic
concepts: the measure of source information, the information capacity of a channel, and coding as a means of utilizing
channel capacity for information transfer. The term coding is taken here in the broadest sense of message representation,
including both discrete and continuous waveforms.

If the rate of information from a source does not exceed the capacity of a communication channel then there exists a
coding technique such that the information can be transmitted over the channel with an arbitrarily small frequency of
errors despite the presence of noise.
(Claude Shannon, 1948)
The surprising, almost incredible aspect of this statement is its promise of error-free transmission on a noisy channel, a
condition achieved with the help of coding. The coding process generally involves two distinct encoding/decoding
operations, portrayed diagrammatically by Fig. below. The channel encoder/decoder units perform the task of error
control coding. Information theory asserts that optimum channel coding yields an equivalent noiseless channel with a well-
defined capacity for information transmission. The source encoder/decoder units then match the source to the equivalent
noiseless channel, provided that the source information rate falls within the channel capacity.

The information source emits a number of discrete message symbols with probabilities P1, P2, ..., PQ such that

P1 + P2 + ... + PQ = 1, i.e. the symbol probabilities sum to unity.

Information sources may take a variety of different forms. For example, in radio broadcasting, the source is
generally an audio source (voice or music). In TV broadcasting, the information source is a video source whose output is
a moving image. The outputs of these sources are analog signals, and hence the sources are called analog sources. In
contrast, computers and storage devices (magnetic or optical disks) produce discrete outputs (usually binary or
ASCII) and hence they are called discrete sources. Whether a source is analog or digital, a digital communication
system is designed to transmit information in digital form. Consequently, the output of the source must be converted
to a format that can be transmitted.
The simplest type of discrete source is one that emits a binary sequence of the form 1010101100..., where the
alphabet consists of the two letters {1, 0}. In general, a discrete information source with an alphabet of Q possible
symbols, say {x1, x2, ..., xQ}, emits a sequence of letters selected from the alphabet. In statistical terms, we assume
that each letter in the alphabet {x1, x2, x3, ..., xQ} has a given probability Pk, that is
Pk = P(X = xk), 1 ≤ k ≤ Q, where P1 + P2 + ... + PQ = 1.

Information sources can be classified as having memory or being memoryless. A source with memory is one for
which the current symbol depends on the previous symbols. A memoryless source is one for which each symbol is
independent of the previous symbols: the symbols are chosen for transmission independently of one another, i.e.
emission of one symbol does not depend on any other symbol in the same alphabet. Such an information source does
not need to remember the symbols sent before any given symbol and is therefore called a zero-memory information
source or memoryless information source.

INFORMATION MEASURE
We begin our study of information theory with the measure of information. Then we apply the information measure to
determine the information rate of discrete sources. Particular attention will be given to binary coding for discrete
memoryless sources.
Here we use information as a technical term, not to be confused with "knowledge" or "meaning", concepts that defy
precise definition and quantitative measurement. In the context of communication, information is simply the
commodity produced by the source for transfer to some user at the destination.
Suppose, for instance, that you're planning a trip to a distant city. To determine what clothes to pack, you might hear
one of the following forecasts:
 The sun will rise.
 There will be scattered rainstorms.
 There will be a tornado.
The first message conveys virtually no information since you are quite sure in advance that the sun will rise. The
forecast of rain, however, provides information not previously available to you. The third forecast gives you more
information, tornadoes being rare and unexpected events; you might even decide to cancel the trip!
Notice that the messages have been listed in order of decreasing likelihood and increasing information. The
less likely the message, the more information it conveys. We thus conclude that information measure must be
related to uncertainty, the uncertainty of the user as to what the message will be.
Whether you prefer the source or user viewpoint, it should be evident that information measure involves the
probability of a message. If xi denotes an arbitrary message and P(xi) = Pi is the probability of the event that xi is
selected for transmission, then the amount of information associated with xi should be some
function of Pi. Specifically, Shannon defined the information measure by the logarithmic function

I(xi) = log(1/Pi) = -log Pi

The amount of information I(xi) is also called the self-information.


If the base of log is 2 the unit is called “BITS”.
If the base of log is e the unit is called “NATS”.
If the base of log is 10 the unit is called “DECITS”.
If the base of log is r the unit is called “r-ary units”.
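As a quick illustration (not part of the original notes), the Python sketch below evaluates the self-information of one message in the units listed above; the probability 0.25 and the helper name self_information are assumed for the example.

import math

def self_information(p, base=2):
    # Self-information I = log_base(1/p) of a message with probability p
    return math.log(1.0 / p, base)

p = 0.25  # assumed example probability of a message
print(self_information(p, 2))        # 2.0    bits
print(self_information(p, math.e))   # ~1.386 nats
print(self_information(p, 10))       # ~0.602 decits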

Reasons for using the logarithm in the definition of self-information


i) The self-information should be non-negative

ii) The lowest possible information must be zero which occurs for a sure message

iii) More information should be carried if the message is less likely one

iv) For independent message symbols, the total self-information should be equal to the sum of the individual self-
information.
Proof: Let Si and Sj be two consecutive independent symbols chosen for transmission with probabilities Pi and Pj respectively.
Then the total self-information contained in both Si and Sj is given by

Iij = log 1/(PiPj) = log 1/Pi + log 1/Pj = Ii + Ij

Total self-information = sum of the individual self-information.
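A quick numeric check of property (iv), using assumed probabilities Pi = 0.5 and Pj = 0.1 for two independent symbols:

import math

Pi, Pj = 0.5, 0.1                 # assumed example probabilities
I_i = math.log2(1 / Pi)           # self-information of Si
I_j = math.log2(1 / Pj)           # self-information of Sj
I_ij = math.log2(1 / (Pi * Pj))   # self-information of the independent pair (Si, Sj)
print(I_ij, I_i + I_j)            # both ~4.3219 bits
assert abs(I_ij - (I_i + I_j)) < 1e-12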

Entropy of zero memory information source (Average Self-Information)


Let us consider a long sequence of L message symbols drawn from a source alphabet S1, S2, ..., SQ.
This long sequence contains approximately
P1L messages of type S1
P2L messages of type S2
...
PQL messages of type SQ
Self-information of S1 = I1 = log 1/P1 bits, so the P1L messages of type S1 carry P1L log 1/P1 bits.
Similarly, the P2L messages of type S2 carry P2L log 1/P2 bits,
...
and the PQL messages of type SQ carry PQL log 1/PQ bits.

The total self-information = Itotal = P1L log 1/P1 + P2L log 1/P2 + ... + PQL log 1/PQ bits

Average self-information = Itotal/L = P1 log 1/P1 + P2 log 1/P2 + ... + PQ log 1/PQ bits/message symbol = H(S) = entropy of the source

The amount of information produced by the source during an arbitrary symbol interval is a discrete random variable having the
possible values I1, I2, ..., IQ. The expected information per symbol is then given by the statistical average H(X) = Σ Pi Ii = Σ Pi log 1/Pi,
which is called the source entropy.
But we will interpret the above equation from the more pragmatic observation that when the source emits a sequence of n >> 1
symbols, the total information to be transferred is about nH(X) bits. Since the source produces r symbols per second on average,
the time duration of this sequence is about n/r. The information must therefore be transferred at the average rate nH(X)/(n/r) =
rH(X) bits per second. Formally, we define the source information rate R = rH(X) bits/sec, a critical quantity relative to transmission.
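A minimal sketch of the two quantities just defined, using an assumed four-symbol alphabet and an assumed symbol rate r; the helper name entropy is ours:

import math

def entropy(probs):
    # H(S) = sum Pi * log2(1/Pi) in bits/symbol (terms with Pi = 0 contribute nothing)
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]   # assumed symbol probabilities, summing to 1
r = 1000                            # assumed symbol rate in symbols/second
H = entropy(probs)
print(H)        # 1.75 bits/symbol
print(r * H)    # information rate R = rH(S) = 1750 bits/second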

Properties of Entropy
The value of H(X) for a given source depends upon the symbol probabilities Pi and the alphabet size M.
Nonetheless, the source entropy always falls within the limits

0 ≤ H(X) ≤ log2 M

Lower bound: if Pi = 0 the term Pi log 1/Pi equals 0, and if Pi = 1 the term Pi log 1/Pi equals 0,

so H(S) = 0 when one symbol is certain and the rest are impossible.
The lower bound corresponds to no uncertainty or freedom of choice, which occurs when one symbol has
probability Pj = 1 while Pi = 0 for i ≠ j, so the source almost always emits the same symbol.

Upper bound: The upper bound corresponds to maximum uncertainty or freedom of choice, which occurs when
Pi = 1/M for all i, so the symbols are equally likely.

To illustrate the variation of H(X) between these extremes, take the special but important case of a binary
source (M = 2) with
P1 = p and P2 = 1 - p
Substituting these probabilities into the equation above yields the binary entropy

H(X) = p log2(1/p) + (1 - p) log2(1/(1 - p))

A plot of this function displays a rather broad maximum centered at p = 1 - p = 1/2, where H(X) = log2 2 = 1
bit/symbol; H(X) then decreases monotonically to zero as p → 1 or 1 - p → 1.
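The binary entropy function can be tabulated directly; the sample points in the loop and the helper name binary_entropy are assumed for illustration.

import math

def binary_entropy(p):
    # H(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)), with H(0) = H(1) = 0
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):   # assumed sample points
    print(f"p = {p:4.2f}   H = {binary_entropy(p):.4f} bit/symbol")
# The maximum H = 1 bit/symbol occurs at p = 0.5, as stated above.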

The entropy function is continuous on the interval (0, 1), since the logarithm is itself a continuous function.
Consider the difference

log2 Q - H(S) = Σ Pi log2 Q - Σ Pi log2(1/Pi) = Σ Pi log2(Q Pi)

Let us consider the straight line y = v - 1 and the curve y = ln v plotted on the same graph. It is clear from the graph that the curve always lies
below the straight line except at v = 1, that is,

ln v ≤ v - 1

Multiplying both sides by -1 gives -ln v ≥ 1 - v, i.e. ln(1/v) ≥ 1 - v. Putting v = 1/(Q Pi) gives ln(Q Pi) ≥ 1 - 1/(Q Pi).
Multiplying by Pi, taking the summation, and then multiplying by log2 e on both sides, we get

Σ Pi log2(Q Pi) ≥ log2 e · Σ Pi (1 - 1/(Q Pi)) = log2 e · (Σ Pi - Σ 1/Q) = log2 e · (1 - 1) = 0

so that H(S) ≤ log2 Q, with equality when the symbols are equiprobable, i.e. when Pi = 1/Q for all i = 1, 2, 3, ..., Q.

The maximum value of the entropy is therefore H(S)max = log2 Q, attained when all the symbols are equiprobable.
Extension of a zero memory source
Let us consider a binary source emitting symbols S1 and S2 with probabilities P1 and P2 respectively such that P1 + P2 = 1.
Then the 2nd extension of the basic binary source will have 2² = (number of basic source symbols)^(extension order) = 4
symbols, given by
S1S1 occurring with probability P1P1 = P1²
S1S2 occurring with probability P1P2
S2S1 occurring with probability P2P1 = P1P2
S2S2 occurring with probability P2²
-------------------------------
Total probability = (P1 + P2)² = 1
The entropy of the 2nd-order extension source is given by
H(S²) = 2H(S)

Similarly, the 3rd extension of the source will have 2³ = 8 symbols, given by
S1S1S1 occurring with probability P1³
S1S1S2 occurring with probability P1²P2
S1S2S1 occurring with probability P1²P2
S1S2S2 occurring with probability P1P2²
S2S1S1 occurring with probability P1²P2
S2S1S2 occurring with probability P1P2²
S2S2S1 occurring with probability P1P2²
S2S2S2 occurring with probability P2³
-------------------------------
Total probability = (P1 + P2)³ = 1
Similarly, the entropy of the 3rd extension is H(S³) = 3H(S).
In general, the nth extension of the source will have 2ⁿ symbols, and the entropy of the nth extended source is given by
H(Sⁿ) = nH(S)
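A sketch verifying H(Sⁿ) = nH(S) by explicitly forming the nth extension of an assumed binary source (P1 = 0.7, P2 = 0.3); requires Python 3.8+ for math.prod.

import math
from itertools import product

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

P = [0.7, 0.3]   # assumed probabilities of the binary source {S1, S2}
n = 3            # order of the extension
# Each extension symbol is an n-tuple; its probability is the product of the
# component probabilities, because the source is memoryless.
ext_probs = [math.prod(combo) for combo in product(P, repeat=n)]
print(len(ext_probs))       # 2**n = 8 extension symbols
print(entropy(ext_probs))   # H(S^n)
print(n * entropy(P))       # n * H(S) -- the two values agree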

Sources with finite memory: Markov source


In a memoryless source there is no intersymbol influence, and in a long sequence the symbols occur independently. In real-life
sources, however, intersymbol influence does exist, and the occurrence of any symbol indeed depends on the previous symbols emitted
by the source. Specifically, the occurrence of a source symbol Si may depend on a finite number r of the previous symbols.
Such a source is known as a "Markov source of rth order". These sources are generally specified by conditional probabilities.



{P(Si | Sj1, Sj2, Sj3, ..., Sjr)},   i = 1, 2, 3, ..., Q;   jk = 1, 2, ..., Q
where Si is the symbol in the zeroth position. The time sequence of the emitted symbols is
Sj1, Sj2, Sj3, ..., Sjr, Si
Thus for an rth-order Markov source, the probability of emitting a given symbol is known if we know the r preceding symbols.
The conditional probabilities can be shown by state diagrams.

Entropy of Markov Source:


The joint probability of being in the state specified by {Sj1, Sj2, Sj3, ..., Sjr} and then emitting Si is
P(Sj1, Sj2, Sj3, ..., Sjr, Si) = P(Sj1, Sj2, Sj3, ..., Sjr) P(Si | Sj1, Sj2, Sj3, ..., Sjr)
Then the amount of information we receive is
I(Si | Sj1, Sj2, Sj3, ..., Sjr) = log 1/P(Si | Sj1, Sj2, Sj3, ..., Sjr)
Hence the average amount of information per symbol in this state is

H(S | Sj1, Sj2, Sj3, ..., Sjr) = Σi P(Si | Sj1, Sj2, Sj3, ..., Sjr) log 1/P(Si | Sj1, Sj2, Sj3, ..., Sjr)

The entropy of the source is then the average entropy over the states, i.e. we have to average the above equation over the Q^r possible
states:

H(S) = Σ P(Sj1, Sj2, Sj3, ..., Sjr) H(S | Sj1, Sj2, Sj3, ..., Sjr)

For a zero-memory source rather than a Markov source, P(Si | Sj1, Sj2, Sj3, ..., Sjr) = P(Si), and the expression reduces to the ordinary entropy H(S) = Σ Pi log 1/Pi.
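A minimal sketch of this averaging for a first-order (r = 1) Markov source with an assumed 2x2 transition matrix: the stationary state probabilities are found first, then the source entropy is the average of the per-state entropies as in the equation above. The helper names stationary and state_entropy are ours.

import math

# Assumed first-order Markov source: T[i][j] = P(next symbol Sj | current state Si)
T = [[0.9, 0.1],
     [0.4, 0.6]]

def stationary(T, iters=1000):
    # Stationary state probabilities by repeated application of T (power iteration)
    pi = [1.0 / len(T)] * len(T)
    for _ in range(iters):
        pi = [sum(pi[i] * T[i][j] for i in range(len(T))) for j in range(len(T))]
    return pi

def state_entropy(row):
    return sum(p * math.log2(1 / p) for p in row if p > 0)

pi = stationary(T)
H = sum(pi[i] * state_entropy(T[i]) for i in range(len(T)))
print(pi)   # ~[0.8, 0.2]
print(H)    # entropy of the Markov source, ~0.57 bits/symbol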

Some Probability theory


If one random variable has an effect on the other, then we define conditional probability. Let A and B be two events; then the
probability of event A given that B has occurred is given by

P(A|B) = P(AB)/P(B)

Similarly, P(B|A) = P(AB)/P(A) is the probability of event B given that A has occurred.

From this it can be concluded that

P(AB) = P(A|B) P(B) = P(B|A) P(A)

Two events A and B are statistically independent if

P(A|B) = P(A)

Then P(AB) = P(A)·P(B)
Bayes' Rule:
If B1, B2, B3, ..., Bn are mutually exclusive events and event A occurs when any one of B1, B2, B3, ..., Bn occurs, then as per
Bayes' rule

P(Bi|A) = P(A|Bi) P(Bi) / [P(A|B1) P(B1) + P(A|B2) P(B2) + ... + P(A|Bn) P(Bn)]
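A small numeric sketch of Bayes' rule with assumed values: three mutually exclusive causes B1, B2, B3 with assumed priors P(Bi) and likelihoods P(A|Bi).

prior = [0.5, 0.3, 0.2]   # assumed P(B1), P(B2), P(B3); they sum to 1
like = [0.1, 0.6, 0.9]    # assumed P(A|B1), P(A|B2), P(A|B3)

P_A = sum(p * l for p, l in zip(prior, like))            # total probability of A
posterior = [p * l / P_A for p, l in zip(prior, like)]   # Bayes' rule: P(Bi|A)
print(P_A)          # 0.41
print(posterior)    # the posteriors sum to 1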



Several Probabilities:
P(xk) is the probability that the source selects symbol xk for transmission;
P(yj) is the probability that symbol yj is received at the destination;
P(xk,yj) is the joint probability that xk is transmitted and yj is received;
P(xk | yj) is the conditional probability that xk was transmitted given that yj is received;
P(yj|xk) is the conditional probability that yj is received given that xk was transmitted.

Joint entropy: Consider two random variables X and Y with m possibilities for x and n possibilities for y. If P(xi, yj) is the
joint probability, P(xi) is the input probability and P(yj) is the output probability, then the entropy of the joint event, called the
joint entropy, is defined as

H(X, Y) = Σi Σj P(xi, yj) log 1/P(xi, yj)

Conditional Entropy:
From the definition of conditional probability we have
P(xk|yj) = P(xk, yj)/P(yj)
Then the set

[X|yj] = { x1|yj , x2|yj , ....... xm|yj }
P(X|yj) = { P(x1|yj) , P(x2|yj) , ....... P(xm|yj) }

forms a complete finite scheme, for which an entropy function may therefore be defined as

H(X|yj) = Σk P(xk|yj) log 1/P(xk|yj)

Taking the average of the above entropy function over all admissible characters received, we have the average "conditional
entropy"

H(X|Y) = Σj P(yj) H(X|yj) = Σj Σk P(xk, yj) log 1/P(xk|yj)


All five entropies defined so far are interrelated; for example,

H(X, Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

Note: For the CPM (conditional probability matrix) P(X|Y), if you add all the elements in any column the sum
should be equal to unity. Similarly, if you add all the elements along any row of the CPM P(Y|X),
the sum shall be unity.
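A sketch that evaluates H(X), H(Y), H(X,Y), H(X|Y) and H(Y|X) for an assumed 2x2 joint probability matrix and checks the relation H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y) stated above; the matrix values are made up for the example.

import math

# Assumed joint probability matrix: P[i][j] = P(x_i, y_j); all entries sum to 1
P = [[0.3, 0.1],
     [0.2, 0.4]]

def H(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

Px = [sum(row) for row in P]                              # marginal P(x_i)
Py = [sum(P[i][j] for i in range(2)) for j in range(2)]   # marginal P(y_j)

H_XY = H([p for row in P for p in row])                   # joint entropy H(X,Y)
H_X, H_Y = H(Px), H(Py)
# Conditional entropies computed directly from P(x|y) = P(x,y)/P(y), etc.
H_X_given_Y = sum(P[i][j] * math.log2(Py[j] / P[i][j])
                  for i in range(2) for j in range(2) if P[i][j] > 0)
H_Y_given_X = sum(P[i][j] * math.log2(Px[i] / P[i][j])
                  for i in range(2) for j in range(2) if P[i][j] > 0)
print(H_XY, H_X + H_Y_given_X, H_Y + H_X_given_Y)   # all three values agree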

Channel
A communication channel is a medium through which the symbols generated by the source flow to the receiver. A discrete
memoryless channel is a statistical model with an input xk and output yj, which accepts symbols from source X and generates an
output symbol Y, as shown in the figure.



[Figure: Discrete memoryless channel with source symbols X = {x1, x2, ..., xk, ..., xm}, received symbols Y = {y1, y2, ..., yj, ..., yn}, and transition probabilities P(yj|xk).]

If the alphabets of X and Y are infinite, then the channel is a continuous channel, whereas if the alphabets of X and Y are finite,
the channel is a discrete channel. It is memoryless when the current output depends only on the current input symbol and not on
any previous input symbols. Such a discrete memoryless channel is shown above, with m inputs generated by X and n outputs
that are received by the receiver.
The channel is represented by the conditional probability P(yj|xk), which is the probability of obtaining an output yj
given that the input is xk, and is called the channel transition probability.

In fact, a channel can be characterized completely by means of a channel matrix: a matrix of channel
transition probabilities [P(Y|X)], represented by

           P(y1|x1)  P(y2|x1)  .......  P(yn|x1)
           P(y1|x2)  P(y2|x2)  .......  P(yn|x2)
P(Y|X) =      .         .                  .
              .         .                  .
           P(y1|xm)  P(y2|xm)  .......  P(yn|xm)

The conditional probabilities P(yj|xk) then have special significance as the channel's forward transition
probabilities. By way of example, Fig. below depicts the forward transitions for a noisy channel with two source
symbols and three destination symbols.
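A sketch showing how the channel matrix is used: given assumed source probabilities P(xk) and an assumed 2-input, 3-output channel matrix (as in the figure referred to above), the joint probabilities P(xk, yj) = P(xk) P(yj|xk) and the output probabilities P(yj) follow directly.

Px = [0.6, 0.4]                 # assumed source probabilities P(x1), P(x2)
PYgX = [[0.8, 0.15, 0.05],      # assumed channel matrix: row k = [P(y1|xk), P(y2|xk), P(y3|xk)]
        [0.1, 0.20, 0.70]]      # each row sums to 1

# Joint probabilities P(xk, yj) = P(xk) * P(yj|xk)
Pxy = [[Px[k] * PYgX[k][j] for j in range(3)] for k in range(2)]

# Output (destination) probabilities P(yj) = sum over k of P(xk, yj)
Py = [sum(Pxy[k][j] for k in range(2)) for j in range(3)]
print(Py)   # [0.52, 0.17, 0.31]; the output probabilities also sum to 1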



Mutual Information
Our quantitative description of information transfer on a discrete memoryless channel begins with the mutual
information

I(xk; yj) = log [P(xk|yj)/P(xk)]

which measures the amount of information transferred when xk is transmitted and yj is received.
On average we require H(X) bits of information to specify one input symbol. However, if we are allowed to observe the
output symbol produced by that input, we then require only H(X|Y) bits of information to specify the input symbol.
Accordingly we come to the conclusion that, on average, observation of a single output provides us with [H(X) - H(X|Y)]
bits of information. This difference is called the "mutual information". The quantity I(X; Y) represents the average amount of
source information gained per received symbol, as distinguished from the average information per source symbol
represented by the source entropy H(X).

I(X; Y) = H(X) - H(X|Y)
The equation above says that the average information transfer per symbol equals the source entropy minus the
equivocation H(X|Y). Correspondingly, the equivocation represents the information lost in the noisy channel.
Alternatively, I(X; Y) = H(Y) - H(Y|X)
This form says that the information transferred equals the destination entropy H(Y) minus the noise
entropy H(Y|X) added by the channel.

P(xk) is the source probability, while P(xk|yj) is computed at the receiver end.
The difference between the initial uncertainty of the source symbol xk, log 1/P(xk), and the final uncertainty about the same
symbol xk after receiving yj, log 1/P(xk|yj), is the information gained through the channel. This difference is called the mutual
information between the symbols xk and yj:

I(xk; yj) = log 1/P(xk) - log 1/P(xk|yj) = log [P(xk|yj)/P(xk)]

It is clear that

I(xk; yj) = log [P(xk|yj)/P(xk)] = log [P(xk, yj)/(P(xk)P(yj))] = log [P(yj|xk)/P(yj)] = I(yj; xk)

so the mutual information is symmetrical.

Averaging the above equation over all admissible characters xk and yj, we obtain the average information gain of the receiver,

I(X; Y) = Σk Σj P(xk, yj) I(xk; yj) = Σk Σj P(xk, yj) log [P(xk|yj)/P(xk)]

where the summation subscripts indicate that the statistical average is taken over both alphabets.



[Figure: Relationship among the entropies. H(X,Y) comprises H(X|Y), I(X;Y) and H(Y|X); H(X) = H(X|Y) + I(X;Y) and H(Y) = H(Y|X) + I(X;Y).]

For a channel with independent input and output, we have H(X|Y) = H(X) and H(Y|X) = H(Y), so
I(X; Y) = H(X) - H(X|Y) = H(X) - H(X) = 0
i.e. no information is transferred through the channel. Hence such a channel has the largest internal loss (a lossy channel), as compared to a noise-free
channel, which is a lossless network.
Properties of Mutual Information
• I(X; Y) = I(Y; X)
• I(X; Y) ≥ 0
• I(X; Y) = H(Y) - H(Y|X)
• I(X; Y) = H(X) - H(X|Y)
• I(X; Y) = H(X) + H(Y) - H(X, Y)
• I(X; Y) = H(X) = H(Y) (for a noise-free channel)
• I(X; Y) = 0 (for a channel with independent input and output, i.e. a completely lossy channel)
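A sketch computing I(X;Y) for the assumed source and channel of the earlier channel-matrix example, cross-checking two of the identities listed above: I(X;Y) = H(Y) - H(Y|X) and I(X;Y) = H(X) + H(Y) - H(X,Y).

import math

Px = [0.6, 0.4]                  # assumed source probabilities
PYgX = [[0.8, 0.15, 0.05],       # assumed channel transition matrix
        [0.1, 0.20, 0.70]]
Pxy = [[Px[k] * PYgX[k][j] for j in range(3)] for k in range(2)]
Py = [sum(Pxy[k][j] for k in range(2)) for j in range(3)]

def H(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

H_X, H_Y = H(Px), H(Py)
H_XY = H([p for row in Pxy for p in row])
# Noise entropy H(Y|X), computed directly from the channel matrix
H_YgX = sum(Px[k] * sum(PYgX[k][j] * math.log2(1 / PYgX[k][j])
                        for j in range(3) if PYgX[k][j] > 0)
            for k in range(2))
I1 = H_Y - H_YgX          # I(X;Y) = H(Y) - H(Y|X)
I2 = H_X + H_Y - H_XY     # I(X;Y) = H(X) + H(Y) - H(X,Y)
print(I1, I2)             # the two values agree and are non-negative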

Discrete Channel Capacity


We've seen that discrete memoryless channels transfer a definite amount of information I(X; Y), despite corrupting
noise. A given channel usually has fixed source and destination alphabets and fixed forward transition probabilities,
so the only variable quantities in I(X; Y) are the source probabilities P(xk). Consequently, maximum information
transfer requires specific source statistics, obtained, perhaps, through source encoding. Let the resulting maximum
value of I(X; Y) be denoted by
CS = max[I(X; Y)] bits/symbol
i.e. the maximum rate at which information is transferred, on average, over the channel.
This is the channel capacity per symbol of a DMC, where the maximization is over all possible input probability
distributions {P(xk)} on X.
C = r·CS bits/sec, where r is the number of symbols transmitted per second, is the channel capacity per second, which represents the maximum rate of
information transfer.

For a lossless channel H(X|Y) = 0 and hence I(X; Y) = H(X) - H(X|Y) = H(X), i.e. the mutual information (information transfer) is equal to
the source entropy and no source information is lost in transmission. Consequently the channel capacity per symbol is given by
CS = max H(X) = log2 m, where m is the number of symbols in X.
For a deterministic channel H(Y|X) = 0 and I(X; Y) = H(Y), so
CS = max[H(Y)] = log2 n bits/symbol
C = r log2 n bps, where n is the number of symbols in Y.
For a noiseless channel we have I(X; Y) = H(X) = H(Y), since H(Y|X) = H(X|Y) = 0.
The channel capacity is then given by
CS = log2 m = log2 n
C = r log2 m = r log2 n
For a BSC with transition probability P,
I(X; Y) = H(Y) + P log2 P + (1 - P) log2(1 - P)
and the channel capacity is given by
CS = 1 + P log2 P + (1 - P) log2(1 - P)
C = r[1 + P log2 P + (1 - P) log2(1 - P)]
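A sketch that evaluates the BSC capacity formula CS = 1 + P log2 P + (1 - P) log2(1 - P) and confirms it by brute-force maximization of I(X;Y) over the input probability p; the transition probability 0.1 is an assumed example value.

import math

def binary_entropy(q):
    if q in (0.0, 1.0):
        return 0.0
    return q * math.log2(1 / q) + (1 - q) * math.log2(1 / (1 - q))

def bsc_mutual_info(p, alpha):
    # I(X;Y) = H(Y) - H(Y|X) for a BSC with input P(x1) = p and transition probability alpha
    py1 = p * (1 - alpha) + (1 - p) * alpha   # P(y1)
    return binary_entropy(py1) - binary_entropy(alpha)

alpha = 0.1                                   # assumed transition (error) probability
Cs_formula = 1 - binary_entropy(alpha)        # same as 1 + P log2 P + (1-P) log2(1-P)
Cs_search = max(bsc_mutual_info(p / 1000, alpha) for p in range(1001))
print(Cs_formula, Cs_search)                  # both ~0.531 bits/symbol, attained at p = 0.5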
Binary Symmetric Channel
A BSC has two inputs (x1 = 0, x2 = 1) and two outputs (y1 = 0, y2 = 1), with a channel matrix given by

P(Y|X) =   1-α    α
            α    1-α

where α is the transition probability.

There are two source symbols with probabilities P(x1) = p and P(x2) = 1 - p,
and two destination symbols with forward transition probabilities
P(y1|x2) = P(y2|x1) = α,   P(y1|x1) = P(y2|x2) = 1 - α
This model could represent any binary transmission system in which errors are statistically independent and
the error probabilities are the same for both symbols, so the average error probability is

Pe = P(x1)P(y2|x1) + P(x2)P(y1|x2) = pα + (1 - p)α = α

We know that I(X; Y) = H(Y) - H(Y|X). For the BSC, the noise entropy is
H(Y|X) = α log2(1/α) + (1 - α) log2(1/(1 - α))
which does not depend on the source probability p, so I(X; Y) is maximized when H(Y) is maximized, i.e. when the inputs are
equally likely (p = 1/2). The capacity is therefore
CS = 1 + α log2 α + (1 - α) log2(1 - α)
in agreement with the BSC capacity formula quoted earlier.
Shannon-Hartley law: for a fixed mean-square value (power), the entropy is maximum when the distribution is Gaussian (proof in Lathi).

The equation C = B log2(1 + S/N) is the famous Hartley-Shannon law. When coupled with the fundamental theorem, it establishes
an upper limit for reliable information transmission on a band-limited AWGN channel, namely, R ≤ B log2(1 + S/N)
bits/sec. Additionally, since bandwidth and signal-to-noise ratio are basic transmission parameters, this equation
establishes a general performance standard for the comparison of communication systems.
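A direct numeric sketch of C = B log2(1 + S/N), using assumed values roughly typical of a voice-grade telephone channel (B = 3.1 kHz, S/N = 30 dB):

import math

B = 3100.0                   # assumed channel bandwidth in Hz
snr_db = 30.0                # assumed signal-to-noise ratio in dB
snr = 10 ** (snr_db / 10)    # convert dB to a power ratio (= 1000)
C = B * math.log2(1 + snr)   # Hartley-Shannon capacity in bits/second
print(C)                     # ~30,898 bits/second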
BW to SNR trade-off
From the Shannon-Hartley law we observe that by increasing B (in Hz) and decreasing S/N, the channel capacity can be maintained
constant. Also, by decreasing B and increasing S/N the channel capacity can again be kept constant. Such an adjustment
between B and S/N is defined as the bandwidth-to-SNR trade-off.
Keeping in mind that noise power varies with bandwidth as N = N0B, we explore the trade-off between
bandwidth and signal power by writing

C = B log2(1 + S/(N0B))

Thus, if N0 and R have fixed values, information transmission at the rate R ≤ C requires

S/(N0R) ≥ (B/R)(2^(R/B) - 1)

This relation (plotted against B/R) reveals that bandwidth compression (B/R < 1) demands a dramatic increase of signal power, while
bandwidth expansion (B/R > 1) reduces S/(N0R) asymptotically toward a distinct limiting value of about -1.6 dB
as B/R → ∞.



From Shannon's theorem it appears that as B is increased, C will also increase to a great extent; but as B increases, N also
increases, so the product of B and log2(1 + S/N) increases only up to a certain limit, known as Shannon's limit.
In fact, an ideal system with infinite bandwidth has a finite channel capacity given by Shannon's limit as

C∞ = lim(B→∞) B log2(1 + S/(N0B)) = (S/N0) log2 e ≈ 1.44 S/N0
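A sketch illustrating the Shannon limit above: with the signal-to-noise-density ratio S/N0 held fixed (an assumed value), C = B log2(1 + S/(N0B)) is evaluated for increasing B and compared against the limiting value 1.44 S/N0.

import math

S_over_N0 = 1000.0                        # assumed S/N0, in W/(W/Hz) = Hz
for B in (100.0, 1e3, 1e4, 1e5, 1e6):     # increasing bandwidth in Hz
    C = B * math.log2(1 + S_over_N0 / B)  # capacity with N = N0*B
    print(f"B = {B:>9.0f} Hz   C = {C:8.1f} bits/s")
print("Shannon limit:", S_over_N0 * math.log2(math.e))   # = 1.44*(S/N0) ~ 1443 bits/s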

Problems

Problem 1:

Problem 2:

Problem 3:

Problem 4:

Problem 5:

Problem 6:

Problem 7:

Problem 8:

Problem 9:

Problem 10:

Problem 11:

Problem 12:

Problem 13:

Problem 14:

Problem 15:
