Information Theory
1- Information theory
2- Detection of digital signals in noise
3- Source coding of discrete sources
4- Channel coding
5- Introduction to digital signal processing (DSP)
6- Digital filter design
7- Selected digital communication systems
INFORMATION THEORY
Def: Information theory is a subject that deals with information and data
transmission from one point to another. The block diagram of any digital
communication system is shown below:
[Block diagram: information source, transmitter, channel (with noise + jamming), receiver, destination.]
Self information:
Suppose that the source of information produces a finite set of messages x1, x2, x3, …, xn with prob. p(x1), p(x2), p(x3), …, p(xn) such that:
p(x1) + p(x2) + … + p(xn) = 1
The function that relates p(xi) with the information of xi is denoted by I(xi) and is called the self information of xi. It must satisfy the following three points:
1- I(xi) = 0 if p(xi) = 1 (a certain event carries no information).
2- I(xi) increases as p(xi) decreases (the less probable the message, the more information it carries).
3- I(xi xj) = I(xi) + I(xj) if xi and xj are statistically independent.
The log function shown satisfies all previous three points, hence:
I(xi) = -log_a p(xi)
where a = 2 gives the unit of bits and a = e gives nats.
Recall that:
log_a(y) = ln(y)/ln(a), so that log2(y) = ln(y)/ln(2).
Ex: A fair die is thrown, find the amount of information gained if you are
told that 4 will appear.
Solution:
Since fair, die then, p(1)=p(2)=…….=p(6)=1/6, then:
I(4)=-log2 p(4)=-log2(1/6)=ln6/ln2=2.5849 bits.
(note that if "a" is not given then a=2)
Ex: A biased coin has p(Head)=0.3. Find the amount of information gained
if you are told that a tail will appear.
Solution:
P(tail)=1-p(Head)=1-0.3=0.7, then
I(tail)=-log2(0.7)=-ln0.7/ln2=0.5145 bits.
Ex: Find the amount of information contained in a black & white (B/W) TV
picture if we assume that each picture has 2×10^5 dots (pixels or picture
elements) and each pixel has 8 equiprobable and distinguishable levels of
brightness.
Solution:
P(each level)=1/8 since equiprobable levels
Information/pixel=-log2(1/8)=3 bits
Information/picture = Information/pixel × no. of pixels
= 3 × 2×10^5 = 600 kbits/picture.
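As a numerical illustration (not part of the original notes), the short Python sketch below reproduces the self-information calculations of this example:

import math

def self_information(p, base=2):
    """Self information I = -log_a(p); base 2 gives bits."""
    return -math.log(p) / math.log(base)

# B/W TV picture: 8 equiprobable brightness levels per pixel
p_level = 1 / 8
info_per_pixel = self_information(p_level)      # 3 bits
info_per_picture = info_per_pixel * 2e5         # 600,000 bits = 600 kbits
print(info_per_pixel, info_per_picture)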
Homework:
Repeat the previous example for a color TV with 16 equiprobable colors and 8
equiprobable levels of brightness.
Source Entropy
If the source produces messages that are not equiprobable, then I(xi), i = 1, 2, …, n, are
different. The statistical average of I(xi) over i gives the average
amount of uncertainty (information) associated with the source X. This average is
called the source entropy and is denoted by H(X). This H(X) is given by:
H(X) = Σ p(xi) I(xi), i = 1, 2, …, n
or
H(X) = -Σ p(xi) log2 p(xi) bits/symbol
Ex: Find the entropy of the source producing the messages x1, x2, x3 and x4 with prob. 0.25, 0.1, 0.15 and 0.5 respectively.
Solution:
H(X)=-[0.25ln0.25+ 0.1ln0.1+ 0.15ln0.15+0.5ln0.5]/ln2
H(X)=1.7427 bits/symbol
Note: From our previous study of logic and digital electronics, we are not
used to dealing with fractions of bits. Here in communications, these fractions
occur due to averaging, i.e., for the previous example the value 1.7427 is an
average: if the source produces, say, 100000 messages, then the amount of
information produced is about 174270 bits.
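A small Python sketch (illustrative only, not part of the original notes) that computes the source entropy for any list of probabilities, e.g. the four-message source of the previous example:

import math

def entropy(probs):
    """Source entropy H(X) = -sum p(xi) log2 p(xi) in bits/symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.1, 0.15, 0.5]))   # ~1.7427 bits/symbol
print(entropy([1/6] * 6))                # fair die: log2(6) ~ 2.585 bits/symbol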
Ex: A binary source produces the two messages 0 and 1 with prob. p(0T) and p(1T) respectively. Find its entropy.
Solution:
p(0T) + p(1T) = 1, hence:
H(X) = -p(0T) log2 p(0T) - [1 - p(0T)] log2[1 - p(0T)] bits/symbol
This is maximum (= 1 bit/symbol) when p(0T) = p(1T) = 0.5.
Notes:
1- In general H(X) ≤ H(X)|max = log2 n bits/symbol, with equality if all messages are
equiprobable, i.e. p(xi) = 1/n, then:
H(X) = -Σ (1/n) log2(1/n) = log2 n bits/symbol
2- If the symbols have time durations, the average source entropy rate (average information rate) is R(X) = H(X)/τ̄ bits/sec, where τ̄ is the average symbol duration; equivalently R(X) = H(X) × (symbols or blocks produced per second).
Ex: A source produces dots "." and dashes "—" with p(dot)=0.65. If the time
duration of a dot is 200 ms and that of a dash is 800 ms, find the average
source entropy rate.
Solution:
P(dot) = 0.65, then p(dash) = 1 - p(dot) = 1 - 0.65 = 0.35.
H(X) = -[0.65 log2 0.65 + 0.35 log2 0.35] = 0.934 bits/symbol
The average symbol duration is τ̄ = 0.65 × 0.2 + 0.35 × 0.8 = 0.41 sec, then:
R(X) = H(X)/τ̄ = 0.934/0.41 = 2.278 bits/sec
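The entropy-rate calculation above can be checked with a short Python sketch (an illustration only; the probabilities and the 0.2 s and 0.8 s durations are those of the example):

import math

p = {'dot': 0.65, 'dash': 0.35}
tau = {'dot': 0.2, 'dash': 0.8}                     # symbol durations in seconds

H = -sum(pi * math.log2(pi) for pi in p.values())   # ~0.934 bits/symbol
tau_avg = sum(p[s] * tau[s] for s in p)             # ~0.41 sec/symbol
R = H / tau_avg                                     # ~2.278 bits/sec
print(H, tau_avg, R)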
Ex: A source produces blocks of 8 characters each; the first character of each block is a synchronization (synch) character, while each of the remaining 7 characters is one of the 26 English alphabet letters with equal probability. If the source produces 400 blocks per second, find the average information rate.
Solution:
Each of the 7 positions behaves as a source that may produce randomly one
of the English alphabet letters with prob. of 1/26, hence:
Information/position = -log2(1/26) = log2 26 = 4.7 bits
Information/block = 7 × 4.7 = 32.9 bits (we exclude the 1st character since it
carries no information, being a certain event that contains synch only).
Then:
R(X) = Information/block × blocks/sec
= 32.9 × 400 = 13160 bits/sec
MUTUAL INFORMATION
[Figure: a channel with inputs x1, x2, …, xn produced by the transmitter Tx and outputs y1, y2, …, ym observed at the receiver Rx; the channel is corrupted by noise and jamming.]
Consider the set of symbols x1, x2, …, xn that the transmitter Tx may produce.
The receiver Rx may receive y1, y2, …, ym. Theoretically, if the noise and
jamming are zero, then the set X = the set Y and n = m. However, due to noise
and jamming, there will be a conditional prob. p(yj/xi) of receiving yj when xi was transmitted.
Define:
1- p(xi) to be what is called the apriori prob. of the symbol xi, which is the
prob. of selecting xi for transmission.
2- p(xi/yj) to be what is called the aposteriori prob. of the symbol xi after the
reception of yj.
The amount of information that yj provides about xi is called the mutual
information between xi and yj. This is given by:
I(xi,yj) = log2[p(xi/yj)/p(xi)] = log2[aposteriori prob./apriori prob.] bits
Properties of I(xi,yj):
1- It is symmetric, i.e. I(xi,yj) = I(yj,xi).
2- I(xi,yj) > 0 if the aposteriori prob. > apriori prob.: yj provides +ve information
about xi.
3- I(xi,yj) < 0 if the aposteriori prob. < apriori prob.: yj provides -ve information (ambiguity) about xi.
The statistical average of I(xi,yj) over all i and j gives the average mutual information (transinformation) between X and Y:
I(X,Y) = ΣΣ p(xi,yj) I(xi,yj) = ΣΣ p(xi,yj) log2[p(xi/yj)/p(xi)] bits/symbol
or
I(X,Y) = ΣΣ p(xi,yj) log2[p(xi,yj)/(p(xi) p(yj))] bits/symbol
Marginal Entropies
H(X) = -Σ p(xi) log2 p(xi) bits/symbol and H(Y) = -Σ p(yj) log2 p(yj) bits/symbol
Joint and conditional entropies:
H(X,Y) = -ΣΣ p(xi,yj) log2 p(xi,yj) bits/symbol (joint or system entropy)
H(X/Y) = -ΣΣ p(xi,yj) log2 p(xi/yj) bits/symbol (losses entropy)
H(Y/X) = -ΣΣ p(xi,yj) log2 p(yj/xi) bits/symbol (noise entropy)
A useful identity: H(X,Y) = H(X) + H(Y/X)
This is a very useful identity to ease calculations in problem solving.
To prove it, we know that:
H(X,Y) = -ΣΣ p(xi,yj) log2 p(xi,yj)
But p(xi,yj) = p(xi) p(yj/xi); putting this inside the log term only, then:
H(X,Y) = -ΣΣ p(xi,yj) log2[p(xi) p(yj/xi)]
Then:
H(X,Y) = -ΣΣ p(xi,yj) log2 p(xi) - ΣΣ p(xi,yj) log2 p(yj/xi)
In the above equation, the 1st term is in fact H(X) (since summing p(xi,yj) over j gives p(xi)) and the 2nd term is H(Y/X), then:
H(X,Y) = H(X) + H(Y/X)
Homework:
Show that H(X,Y)=H(Y)+H(X/Y)
Note:
Since I(X,Y) = H(X) - H(X/Y), the above identities indicate that the transinformation I(X,Y) is in fact the net
average information obtained at the receiver: the difference between the
original information produced by the source, H(X), and the information lost
in the channel, H(X/Y) (the losses entropy), due to noise and jamming.
Homework:
Show that I(X,Y)=H(Y)-H(Y/X)
Ex: The joint prob. matrix p(X,Y) of a channel with three inputs x1, x2, x3 and two outputs y1, y2 is:
        y1        y2
x1      0.5       0.25
x2      0         0.125
x3      0.0625    0.0625
Find: 1- the marginal entropies H(X) and H(Y), 2- the joint entropy H(X,Y), 3- the conditional entropies H(Y/X) and H(X/Y), 4- the mutual information I(x1,y2), 5- the transinformation I(X,Y), and 6- draw the channel model.
Solution:
1- First we find p(X) and p(Y) from p(X,Y) by summing the rows and
columns respectively, then:
p(X) = [0.75  0.125  0.125] and p(Y) = [0.5625  0.4375]
H(X) = -[0.75 ln0.75 + 2×0.125 ln0.125]/ln2 = 1.06127 bits/symbol
H(Y) = -[0.5625 ln0.5625 + 0.4375 ln0.4375]/ln2 = 0.9887 bits/symbol
2- H(X,Y) = -[0.5 log2 0.5 + 0.25 log2 0.25 + 0.125 log2 0.125 + 2×0.0625 log2 0.0625]
H(X,Y) = 1.875 bits/symbol
3- H(Y/X) = H(X,Y) - H(X) = 1.875 - 1.06127 = 0.8137 bits/symbol
H(X/Y) = H(X,Y) - H(Y) = 1.875 - 0.9887 = 0.8863 bits/symbol
4- I(x1,y2) = log2[p(x1/y2)/p(x1)], but p(x1/y2) = p(x1,y2)/p(y2) = 0.25/0.4375 = 0.5714, then:
I(x1,y2) = log2[0.5714/0.75] = -0.392 bits
The negative sign means that y2 gives ambiguity about x1.
5-I(X,Y)=H(X)-H(X/Y)=1.06127-0.8863=0.17497 bits/symbol
6- To draw the channel model, we find p(Y/X) matrix from p(X,Y) matrix
by dividing its rows by the corresponding p(xi):
p(Y/X):
        y1     y2
x1      2/3    1/3
x2      0      1
x3      0.5    0.5
[Channel model: x1→y1 with 2/3 and x1→y2 with 1/3; x2→y2 with 1; x3→y1 with 0.5 and x3→y2 with 0.5.]
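As a numerical check, the following Python sketch (not part of the original notes; it assumes the joint matrix reconstructed above) computes the marginal, joint and conditional entropies and the transinformation:

import numpy as np

# Joint probability matrix p(X,Y): rows = x1..x3, columns = y1..y2 (assumed values)
Pxy = np.array([[0.5, 0.25],
                [0.0, 0.125],
                [0.0625, 0.0625]])

def H(p):
    """Entropy in bits of a probability vector/matrix, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

Px = Pxy.sum(axis=1)           # p(X)
Py = Pxy.sum(axis=0)           # p(Y)
Hx, Hy, Hxy = H(Px), H(Py), H(Pxy)
H_y_given_x = Hxy - Hx         # noise entropy H(Y/X)
H_x_given_y = Hxy - Hy         # losses entropy H(X/Y)
I_xy = Hx - H_x_given_y        # transinformation I(X,Y)
print(Hx, Hy, Hxy, H_y_given_x, H_x_given_y, I_xy)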
Ex: Find and plot the transinformation for a binary symmetric channel (BSC)
shown if p(0T)=p(1T)=0.5
[BSC model: 0T→0R and 1T→1R with prob. 1-pe; 0T→1R and 1T→0R with prob. pe.]
Solution: This BSC is a very well known channel with practical values of
pe << 1. If we denote 0T = x1, 1T = x2, 0R = y1, 1R = y2, then p(y1) = p(y2) = 0.5 by symmetry, and:
I(X,Y) = H(Y) - H(Y/X) = 1 + pe log2 pe + (1 - pe) log2(1 - pe) bits/symbol
This is maximum (= 1 bit/symbol) when pe = 0 or pe = 1, and drops to zero when pe = 0.5, where the channel is useless.
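The sketch below (an illustration, not from the original notes) evaluates this BSC transinformation for a few values of pe, assuming equiprobable inputs as in the example:

import numpy as np

def bsc_transinformation(pe):
    """I(X,Y) for a BSC with equiprobable inputs: 1 - H(pe) bits/symbol."""
    if pe in (0.0, 1.0):
        return 1.0
    return 1.0 + pe * np.log2(pe) + (1 - pe) * np.log2(1 - pe)

for pe in (0.0, 0.1, 0.3, 0.5):
    print(pe, bsc_transinformation(pe))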
[Symmetric TSC model: p(yi/xi) = 1-2pe for i = 1, 2, 3 and p(yj/xi) = pe for every j ≠ i, i.e.
        y1       y2       y3
x1      1-2pe    pe       pe
x2      pe       1-2pe    pe
x3      pe       pe       1-2pe ]
This TSC (ternary symmetric channel) is symmetric but not very practical, since in practice x1 and x3
are not affected by noise as much as x2 is. In fact, the interference between x1 and x3 is much
less than the interference between x1 & x2 or between x2 & x3.
Hence, the more practical but nonsymmetric channel has the trans. prob.:
        y1      y2       y3
x1      1-pe    pe       0
x2      pe      1-2pe    pe
x3      0       pe       1-pe
Special channels:
1- Lossless channel: This has only one nonzero element in each column of the
transitional matrix p(Y/X), for example (an illustrative matrix; any arrangement with one nonzero entry per column will do):
        y1     y2     y3
x1      0.5    0.5    0
x2      0      0      1
For such a channel H(X/Y) = 0, so I(X,Y) = H(X) with zero losses entropy. (Homework: draw the channel model
of this channel.)
2- Deterministic channel: This has only one nonzero element in each row of
the transitional matrix p(Y/X), that element therefore being 1, for example (illustrative):
        y1   y2
x1      1    0
x2      1    0
x3      0    1
Here each input produces a known output, so H(Y/X) = 0 and I(X,Y) = H(Y) with zero noise entropy.
3- Noiseless channel: This has only one nonzero element in each row and
column of the transitional matrix p(Y/X), i.e. it is an identity matrix. As an
example:
        y1   y2   y3
x1      1    0    0
x2      0    1    0
x3      0    0    1
Here H(X/Y) = H(Y/X) = 0 and I(X,Y) = H(X) = H(Y).
Definitions:
Source efficiency and redundancy:
Source efficiency η = H(X)/H(X)|max = H(X)/log2 n
Source redundancy R = 1 - η = 1 - [H(X)/log2 n]
Both can also be given as percentages.
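A quick Python check of these two definitions (illustrative only), using the four-message source of the earlier entropy example:

import math

probs = [0.25, 0.1, 0.15, 0.5]                 # the 4-message source used earlier
H = -sum(p * math.log2(p) for p in probs)      # ~1.7427 bits/symbol
H_max = math.log2(len(probs))                  # 2 bits/symbol
efficiency = H / H_max                         # ~0.871
redundancy = 1 - efficiency                    # ~0.129
print(H, efficiency, redundancy)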
[Figure: relation between the entropies: H(X) source entropy, H(Y) receiver entropy, H(X,Y) system entropy, H(Y/X) noise entropy, H(X/Y) losses entropy, and I(X,Y) transinformation.]
Examples:
1- The channel with
        y1      y2
x1      1-pe    pe
x2      pe      1-pe
is a BSC where n = m = 2 and the 1st row is a permutation of the 2nd row.
For a symmetric channel, every row of p(Y/X) contains the same set of values, so the noise entropy H(Y/X) does not depend on the input prob. Hence:
I(X,Y) = H(Y) - H(Y/X) = H(Y) + K
where K = Σj p(yj/xi) log2 p(yj/xi), evaluated over any row of p(Y/X) (K is negative).
Hence: I(X,Y) = H(Y) + K for symmetric channels only.
Now to find the maximum: C = max[I(X,Y)] = max[H(Y) + K] = max[H(Y)] + K = log2 m + K,
since max[H(Y)] = log2 m, obtained when Y has equiprobable symbols.
Notes:
1- I(X,Y) becomes maximum (equal to C) only if the condition for
maximization is satisfied, i.e. only if Y has equiprobable symbols. This
condition yields that X also has equiprobable symbols, since if the output
of a symmetric channel is equiprobable, then its input X is also
equiprobable.
2- For symmetric channels only, and to ease calculations, we can use the
formula I(X,Y) = H(Y) + K.
Ex: A BSC has the transitional prob. p(y1/x1) = p(y2/x2) = 0.7 and p(y2/x1) = p(y1/x2) = 0.3.
Find the channel capacity and the channel efficiency if I(x1) = 2 bits.
Solution:
The channel is symmetric, so K = 0.7 log2 0.7 + 0.3 log2 0.3 = -0.88129 and
C = log2 m + K = log2 2 - 0.88129 = 0.1187 bits/symbol.
I(x1) = 2 bits gives p(x1) = 2^-2 = 0.25, so p(x2) = 0.75, then:
p(y1) = 0.25 × 0.7 + 0.75 × 0.3 = 0.4 and p(y2) = 0.6
H(Y) = -[0.4 log2 0.4 + 0.6 log2 0.6] = 0.97095 bits/symbol
I(X,Y) = H(Y) + K = 0.97095 - 0.88129 = 0.0896 bits/symbol.
Then: channel efficiency η = I(X,Y)/C = 0.0896/0.1187 = 75.6%.
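The following Python sketch (illustrative; it assumes the 0.7/0.3 BSC reconstructed above) verifies the capacity and efficiency figures:

import numpy as np

P = np.array([[0.7, 0.3],        # p(Y/X) of the assumed BSC
              [0.3, 0.7]])
K = np.sum(P[0] * np.log2(P[0]))             # same for every row of a symmetric channel
C = np.log2(P.shape[1]) + K                  # C = log2 m + K
px = np.array([0.25, 0.75])                  # from I(x1) = 2 bits
py = px @ P                                  # output probabilities
Hy = -np.sum(py * np.log2(py))
I = Hy + K                                   # transinformation I(X,Y)
print(C, I, I / C)                           # capacity, I(X,Y), efficiency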
Channel capacity of nonsymmetric channels:
For a nonsymmetric channel, C is found by maximizing I(X,Y) with respect to the input prob. (a numerical sketch of this maximization is given after the procedure below).
Procedure:
1- First, we find I(X,Y) as a function of the input prob.:
I(X,Y) = f(p(x1), p(x2), …, p(xn)) subject to the constraint p(x1) + p(x2) + … + p(xn) = 1,
i.e. use this constraint to reduce the number of variables by 1.
2- Partially differentiate I(X,Y) with respect to the (n-1) input prob., then
equate these partial derivatives to zero.
3- Solve the (n-1) equations simultaneously to find the p(x1), p(x2), …, p(xn)
that give maximum I(X,Y). (Note that the maximizing distribution here is not necessarily
equiprobable since the channel is not symmetric.)
4- Put the resulting values of the input prob. into the function f given in step 1 above to
find C = max[I(X,Y)].
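As a numerical sketch of this procedure (not from the original notes), the Python code below replaces the differentiation of steps 2-3 by a brute-force search over p(x1) for a hypothetical two-input channel:

import numpy as np

# Hypothetical nonsymmetric binary-input channel p(Y/X) (for illustration only)
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])

def transinformation(p, P):
    """I(X,Y) in bits for input prob. [p, 1-p] and transition matrix P."""
    px = np.array([p, 1 - p])
    pxy = px[:, None] * P                    # joint prob. p(xi, yj)
    py = pxy.sum(axis=0)
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log2(pxy[mask] / (px[:, None] * py[None, :])[mask]))

# Search over p(x1) = p instead of solving dI/dp = 0 analytically
ps = np.linspace(0.001, 0.999, 999)
I = np.array([transinformation(p, P) for p in ps])
print("C ~", I.max(), "at p(x1) ~", ps[I.argmax()])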
Ex: Find the channel capacity for the channel having transitional prob:
Solution: Note that the channel is not symmetric since the 1st row is not a
permutation of the 2nd row. Now let p(x1)=p , then p(x2)=1-p, hence instead
of having two variables, we will have only one variable p:
Then:
Notes:
1- If the channel is such that x1 is affected by noise in a way similar to x2, then we can assume that p(x1) = p(x2) = 0.5 and use this to find
I(X,Y), which then represents C. In such a case there is no need for differentiation, but be
careful not to mix this special case with the symmetric case: the use
of the formula C = log2 m + K is not correct here and gives a wrong answer.
[Example of such a channel: two inputs x1, x2 and three outputs y1, y2, y3, with p(y1/x1) = 0.9 and p(y3/x2) = 0.9.]
Cascading of channels:
If two channels are cascaded as shown, then the overall transition matrix of
the equivalent channel is the matrix multiplication of the transitional prob.
matrices of the two cascaded channels:
p(Z/X) = p(Y/X) p(Z/Y)
[Figure: channel 1 with input X (n symbols) and output Y (m symbols), described by p(Y/X) of size n×m, cascaded with channel 2 with input Y and output Z (r symbols), described by p(Z/Y) of size m×r.]
Ex: Find the transition matrix p(Z/X) for the cascaded channel shown:
[Figure: channel 1 with inputs X1, X2 and outputs Y1, Y2, Y3, cascaded with channel 2 with inputs Y1, Y2, Y3 and outputs Z1, Z2; the labeled transition probs. include 0.8, 0.7 and 0.7.]
Solution:
Homework: For previous example, find p(Y) and p(Z) if p(X)=[ 0.7 0.3]
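A short Python sketch of the cascading rule (illustrative only; the matrices below are hypothetical since the example's figure values are not fully given, and p(X) = [0.7 0.3] is taken from the homework):

import numpy as np

# Hypothetical transition matrices for a 2-input, 3-intermediate, 2-output cascade
P_yx = np.array([[0.8, 0.1, 0.1],      # p(Y/X), size n x m = 2 x 3
                 [0.1, 0.1, 0.8]])
P_zy = np.array([[0.7, 0.3],           # p(Z/Y), size m x r = 3 x 2
                 [0.5, 0.5],
                 [0.3, 0.7]])

P_zx = P_yx @ P_zy                     # overall p(Z/X) = p(Y/X) p(Z/Y)

px = np.array([0.7, 0.3])              # input distribution p(X)
py = px @ P_yx                         # p(Y) = p(X) p(Y/X)
pz = px @ P_zx                         # p(Z) = p(X) p(Z/X)
print(P_zx, py, pz)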
Entropies of continuous (analog) sources:
For continuous X and Y with prob. density functions f(x), f(y) and joint density f(x,y), the entropies are defined as before with the sums replaced by integrals:
H(X) = -∫ f(x) log2 f(x) dx in bits/sample
H(Y) = -∫ f(y) log2 f(y) dy in bits/sample
H(X,Y) = -∫∫ f(x,y) log2 f(x,y) dx dy in bits/sample
H(X/Y) = -∫∫ f(x,y) log2 f(x/y) dx dy in bits/sample
Note that all the above entropies are differential entropies and not an absolute
measure of information, since all the probabilities are in fact prob. density functions.
Let the noise n(t) have mean value m and variance σ² (the variance of n(t)). If n(t) is thermal noise, then we can assume that m = 0, and
the frequency spectrum of this noise is flat over a wide range of frequencies as
shown.
[Figure: two-sided noise power spectral density Gn(f) = η/2, flat over all frequencies f.]
This noise has a two-sided power spectral density Gn(f) = η/2 W/Hz and a one-sided
power spectral density Gn(f) = η W/Hz.
[Figure: one-sided noise power spectral density Gn(f) = η over the bandwidth BW.]
From Gn(f), we can find the noise power as N = η×BW in watts.
Since the spectrum is flat, we call this noise white noise. This white noise
affects the signal x(t) as an additive term, i.e. the received signal is y(t) = x(t) + n(t).
The name Additive White Gaussian Noise (AWGN) is very popular
for such thermal noise. The figure below shows how this AWGN affects
an equiprobable bipolar ±A signal.
[Figure: a bipolar signal x(t) taking the values +A and -A with prob. 0.5 each, and the received signal y(t) = x(t) + n(t), whose density is spread around ±A by the Gaussian noise.]
For zero-mean Gaussian noise with variance σ², f(n) = (1/√(2πσ²)) exp(-n²/(2σ²)), and the differential entropy of such noise is:
H(n) = (1/2) ln(2πeσ²) nats/sample
But:
log2(y) = ln(y)/ln(2), then:
H(n) = (1/2) log2(2πeσ²) bits/sample
As the bandwidth B → ∞, the channel capacity C = B log2(1 + S/(ηB)) approaches the limit (S/η) log2 e = 1.44 S/η bits/sec.
Note:
The result above indicates that the channel capacity C
approaches a limit of 1.44 S/η even if B is very large. This result is very
important for channels that are bandwidth-unlimited but power-limited, such
as satellite channels, where the bandwidth may be large but the signal
power is a very important parameter.
Ex: Find the maximum theoretical information rate that can be transmitted
over a telephone channel having 3.5 kHz bandwidth and 15 dB SNR.
Solution:
C is the maximum theoretical information rate. Using the Shannon equation:
C = B log2(1 + SNR), where SNR = 15 dB; converting into an absolute ratio,
SNR = 10^(0.1×15) ≈ 31, then:
C = 3500 log2(1 + 31) = 17500 bps.
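The Shannon formula and its inverse can be evaluated with the short Python sketch below (illustrative only; it keeps the exact SNR of 31.6 rather than the rounded value 31 used above, and the second call anticipates the video example that follows):

import math

def shannon_capacity(bandwidth_hz, snr_db):
    """C = B log2(1 + SNR) with SNR given in dB."""
    snr = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr)

def required_snr_db(bit_rate, bandwidth_hz):
    """Minimum SNR (in dB) such that C equals the given bit rate."""
    snr = 2 ** (bit_rate / bandwidth_hz) - 1
    return 10 * math.log10(snr)

print(shannon_capacity(3500, 15))     # telephone channel: ~17.6 kbps (17500 bps with SNR rounded to 31)
print(required_snr_db(27e6, 5e6))     # compressed video over 5 MHz: ~16.1 dB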
Ex:
A source produces 16 equiprobable symbols at a rate of 500 symbols/sec,
check the possibility of transmitting this rate over the telephone channel of
previous example.
Solution:
First, we find the rate of information from the source, which is the source
entropy rate R(X):
R(X)= H(X)* rate of symbols.
H(X) = H(X)|max = log2 16 = 4 bits (equiprobable case)
Then: R(X)=4 * 500= 2000 bps. Now since R(X) < 17500, then yes it is
possible to transmit source output over this channel.
Ex:
Find the minimum theoretical time it would take to transmit 2500 octal digits
over the telephone channel of the previous example.
Solution:
From the previous example, C = 17500 bps. The minimum theoretical
transmission time is obtained if the channel operates at the maximum rate,
which is C. The amount of information is 2500 × log2 8 = 2500 × 3 = 7500 bits, then:
T(min) = 7500/17500 = 0.4286 sec.
Ex:
Find the minimum theoretical SNR required to transmit a compressed video
information at a rate of 27Mbps over a channel having 5MHz bandwidth.
Solution:
For the minimum theoretical SNR, put C = source bit rate = 27 Mbps, then:
C = B log2(1 + SNR)
27×10^6 = 5×10^6 log2(1 + SNR), or
log2(1 + SNR) = 5.4, so 1 + SNR = 2^5.4 = 42.2, giving SNR = 41.2 (absolute) or SNR = 16.1 dB.