Information Theory
1.1 Review of probability
Probability: If an experiment has the outcomes A1, A2, A3, ..., An, then:

$$P(A_i) = \lim_{N \to \infty} \frac{n(A_i)}{N}$$

Note that:

$$0 \le P(A_i) \le 1 \qquad \text{and} \qquad \sum_{i=1}^{n} P(A_i) = 1$$

$$\sum_{j=1}^{m} \sum_{i=1}^{n} P(A_i, B_j) = 1$$
Note that:

$$P(A_i, B_j) = P(A_i)\,P(B_j/A_i) = P(B_j)\,P(A_i/B_j)$$

$$\sum_{i=1}^{n} P(A_i/B_j) = 1 \qquad \text{and} \qquad \sum_{j=1}^{m} P(B_j/A_i) = 1$$
Ex: For the joint probability matrix

$$P(A_i, B_j) = \begin{bmatrix} 0.1 & 0.25 \\ 0 & 0.2 \\ 0.25 & 0.2 \end{bmatrix}$$

find the marginal probabilities P(Ai), P(Bj) and the conditional probability matrices P(Bj/Ai) and P(Ai/Bj).

Sol:

$$P(A_1) = \sum_{j=1}^{2} P(A_1, B_j) = 0.1 + 0.25 = 0.35$$

$$P(A_2) = \sum_{j=1}^{2} P(A_2, B_j) = 0 + 0.2 = 0.2$$

$$P(A_3) = \sum_{j=1}^{2} P(A_3, B_j) = 0.25 + 0.2 = 0.45$$

$$P(B_1) = \sum_{i=1}^{3} P(A_i, B_1) = 0.1 + 0 + 0.25 = 0.35$$

$$P(B_2) = \sum_{i=1}^{3} P(A_i, B_2) = 0.25 + 0.2 + 0.2 = 0.65$$
$$P(B_j/A_i) = \frac{P(A_i, B_j)}{P(A_i)} = \begin{bmatrix} \frac{0.1}{0.35} & \frac{0.25}{0.35} \\ \frac{0}{0.2} & \frac{0.2}{0.2} \\ \frac{0.25}{0.45} & \frac{0.2}{0.45} \end{bmatrix} = \begin{bmatrix} 2/7 & 5/7 \\ 0 & 1 \\ 5/9 & 4/9 \end{bmatrix}$$

$$P(A_i/B_j) = \frac{P(A_i, B_j)}{P(B_j)} = \begin{bmatrix} \frac{0.1}{0.35} & \frac{0.25}{0.65} \\ \frac{0}{0.35} & \frac{0.2}{0.65} \\ \frac{0.25}{0.35} & \frac{0.2}{0.65} \end{bmatrix} = \begin{bmatrix} 2/7 & 5/13 \\ 0 & 4/13 \\ 5/7 & 4/13 \end{bmatrix}$$
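A quick numerical check of these marginal and conditional probabilities (a Python/NumPy sketch; the array names are only illustrative):

```python
import numpy as np

# Joint probability matrix P(Ai, Bj) from the example (rows A1..A3, columns B1, B2)
P_AB = np.array([[0.10, 0.25],
                 [0.00, 0.20],
                 [0.25, 0.20]])

P_A = P_AB.sum(axis=1)                 # P(Ai) = sum over j  -> [0.35 0.2 0.45]
P_B = P_AB.sum(axis=0)                 # P(Bj) = sum over i  -> [0.35 0.65]

P_B_given_A = P_AB / P_A[:, None]      # P(Bj/Ai) = P(Ai,Bj)/P(Ai); each row sums to 1
P_A_given_B = P_AB / P_B[None, :]      # P(Ai/Bj) = P(Ai,Bj)/P(Bj); each column sums to 1

print(P_A, P_B)
print(P_B_given_A)
print(P_A_given_B)
```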
b. Continuous R.V.: Here X can take any real value (not just discrete ones), and P(X) is called the probability density function (PDF). It gives the probability that X lies between any two points X1 and X2:

$$P(X_1 < X < X_2) = \int_{X_1}^{X_2} P(X)\,dX$$

Note that:

$$\int_{-\infty}^{\infty} P(X)\,dX = 1$$

$$\bar{X} = \int_{-\infty}^{\infty} X\,P(X)\,dX \qquad \overline{X^2} = \int_{-\infty}^{\infty} X^2\,P(X)\,dX$$

$$\sigma^2 = \overline{X^2} - \bar{X}^2$$
Sol: (the PDF here is triangular, P(X) = K(1 − |X|/2) for |X| ≤ 2 and zero elsewhere)

a.
$$\int_{-\infty}^{\infty} P(X)\,dX = 1 \;\Rightarrow\; \text{area of the triangle} = \tfrac{1}{2}\,K\,(4) = 2K = 1 \;\Rightarrow\; K = \tfrac{1}{2}$$

b.
$$P(X > 1) = \int_{1}^{2} P(X)\,dX = \int_{1}^{2}\left(\tfrac{1}{2} - \tfrac{X}{4}\right)dX = \left[\tfrac{X}{2} - \tfrac{X^2}{8}\right]_{1}^{2} = \tfrac{1}{8}$$

c.
$$\bar{X} = \int_{-2}^{0} X\left(\tfrac{1}{2} + \tfrac{X}{4}\right)dX + \int_{0}^{2} X\left(\tfrac{1}{2} - \tfrac{X}{4}\right)dX = 0$$

$$\overline{X^2} = 2\int_{0}^{2} X^2\left(\tfrac{1}{2} - \tfrac{X}{4}\right)dX = \tfrac{2}{3}$$

$$\sigma^2 = \overline{X^2} - \bar{X}^2 = \tfrac{2}{3} - 0 = \tfrac{2}{3}$$
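The results above can be checked numerically. The sketch below assumes the triangular PDF P(X) = K(1 − |X|/2) on [−2, 2] used in the solution and approximates the integrals with simple Riemann sums:

```python
import numpy as np

# Triangular PDF assumed in the solution: P(x) = K*(1 - |x|/2) for |x| <= 2, with K = 1/2
K = 0.5
dx = 1e-4
x = np.arange(-2, 2, dx)
p = K * (1 - np.abs(x) / 2)

area   = np.sum(p) * dx                  # total area, should be ~1 (this fixes K)
p_gt_1 = np.sum(p[x >= 1]) * dx          # P(X > 1), expected 1/8
mean   = np.sum(x * p) * dx              # expected 0
msq    = np.sum(x**2 * p) * dx           # mean square, expected 2/3
var    = msq - mean**2                   # variance, expected 2/3

print(area, p_gt_1, mean, msq, var)
```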
H.W.2: Two dice are thrown; the sum of the points appearing on the two dice is a random variable X. Find the values taken by X and the corresponding probabilities.

H.W.3: If $P(X) = \frac{a}{2}\,e^{-a|x|}$, find $\bar{X}$, $\overline{X^2}$ and $\sigma^2$.
Self Information:

Suppose that the source of information produces a finite set of messages X1, X2, ..., Xn with probabilities P(X1), P(X2), ..., P(Xn), such that $\sum_{i=1}^{n} P(X_i) = 1$.

The amount of information I(Xi) gained from knowing that the source produced the message Xi behaves as follows:

1. Information is zero if P(Xi) = 1.
2. Information increases as P(Xi) decreases.
3. Information is a positive quantity.

These requirements are satisfied by the self information

$$I(X_i) = -\log_2 P(X_i) \;\text{bits}$$
Note that:

$$\log_a p = \frac{\ln p}{\ln a}$$
Ex: A fair die is thrown. Find the amount of information gained if you are told that a 4 will appear.

Sol:

Fair die ⇒ $P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = \frac{1}{6}$

$$I(4) = -\log_2 P(4) = -\log_2 \tfrac{1}{6} = \log_2 6 = \frac{\ln 6}{\ln 2} = 2.585 \;\text{bits}$$
Ex: A picture consists of 2 × 10^5 pixels, and each pixel takes one of 8 equally likely brightness levels. Find the information per pixel and per picture.

Sol:

Information/pixel = $-\log_2 P(\text{level}) = -\log_2 \tfrac{1}{8} = 3$ bits

Information/picture = $3 \times 2 \times 10^5 = 600$ kbits
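Both examples follow directly from I = −log2 P; a small Python sketch (the function name is just for illustration):

```python
import math

def self_information(p):
    """I(x) = -log2 P(x), in bits."""
    return -math.log2(p)

# Fair die: probability of any face is 1/6
print(self_information(1 / 6))          # ~2.585 bits

# Picture example: 8 equally likely levels per pixel, 2e5 pixels
bits_per_pixel = self_information(1 / 8)   # 3 bits
print(bits_per_pixel * 2e5)                # 600000 bits = 600 kbits
```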
Source Entropy:

If the I(Xi), i = 1, 2, ..., n are different, i.e. the source produces symbols with unequal probabilities, then the statistical average of I(Xi) gives the average amount of uncertainty associated with the source X. This average is called the source entropy, denoted H(X), and is measured in bits per symbol:

$$H(X) = \sum_{i=1}^{n} P(X_i)\,I(X_i) = -\sum_{i=1}^{n} P(X_i)\log_2 P(X_i)$$
Ex: A source produces four symbols with probabilities 0.25, 0.1, 0.15 and 0.5. Find the source entropy H(X).

Sol:

$$H(X) = -\sum_{i=1}^{n} P(X_i)\log_2 P(X_i) = -\frac{1}{\ln 2}\left[0.25\ln 0.25 + 0.1\ln 0.1 + 0.15\ln 0.15 + 0.5\ln 0.5\right]$$

$$H(X) = 1.7427 \;\text{bits/symbol}$$
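A short entropy helper (a Python sketch, not part of the notes) reproduces this value and also illustrates the limiting cases noted below:

```python
import math

def entropy(probs):
    """H(X) = -sum P(Xi)*log2 P(Xi) in bits/symbol, taking 0*log2(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.1, 0.15, 0.5]))   # ~1.7427 bits/symbol, as in the solution
print(entropy([0.25] * 4))               # equiprobable symbols: log2(4) = 2
print(entropy([1.0, 0.0, 0.0]))          # one certain symbol: 0
```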
Sol:

P(0) + P(1) = 1 ⇒ P(1) = 1 − P(0)

$$H(X) = -\sum_{i=1}^{2} P(X_i)\log_2 P(X_i) = -P(0)\log_2 P(0) - \left[1 - P(0)\right]\log_2\left[1 - P(0)\right]$$
Note that:

H(X) = log2 n if the n symbols X1, X2, ..., Xn are equally probable.
H(X) = 0 if one of the symbols has probability 1.
Rate of producing symbols = $\dfrac{1}{\bar{\tau}}$

$$\bar{\tau} = \sum_{i=1}^{n} \tau_i\,P(X_i) = \text{average time duration of the symbols}, \qquad \tau_i = \text{time duration of } X_i$$

$$R(X) = \frac{H(X)}{\bar{\tau}} \;\text{bits/sec}$$
Ex: A source produces dots "•" and dashes "−" with probability P(dot) = 0.65. If the time duration of a dot is 200 ms and that of a dash is 800 ms, find the average information rate R(X).

Sol:

$$\bar{\tau} = \sum_{i=1}^{2} \tau_i\,P(X_i) = 200 \times 0.65 + 800 \times 0.35 = 410 \;\text{ms}$$

$$H(X) = -\left[0.65\log_2 0.65 + 0.35\log_2 0.35\right] = 0.934 \;\text{bits/symbol}$$

$$R(X) = \frac{H(X)}{\bar{\tau}} = \frac{0.934}{410 \times 10^{-3}} = 2.278 \;\text{bits/sec}$$
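A numerical check of the rate calculation (Python sketch; variable names are illustrative only):

```python
import math

# Dot/dash source from the example: P(dot) = 0.65, P(dash) = 0.35
p   = [0.65, 0.35]
tau = [200e-3, 800e-3]                             # symbol durations in seconds

H = -sum(pi * math.log2(pi) for pi in p)           # ~0.934 bits/symbol
tau_bar = sum(pi * ti for pi, ti in zip(p, tau))   # ~0.410 s
R = H / tau_bar                                    # ~2.278 bits/sec
print(H, tau_bar, R)
```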
Mutual Information:

Consider a set of symbols X1, X2, ..., Xn that the source can produce, while the receiver may receive Y1, Y2, ..., Ym. If the noise and jamming are zero, then the set Y equals the set X (and n = m); however, due to noise and jamming there will be a conditional probability P(Y/X).

Definition:

P(Xi) is called the a priori probability of the symbol Xi, which is the probability of selecting Xi for transmission.

P(Xi/Yj) is known as the a posteriori probability of Xi after the reception of Yj.

$$I(X_i, Y_j) = \log_2 \frac{\text{a posteriori prob.}}{\text{a priori prob.}} = \log_2 \frac{P(X_i/Y_j)}{P(X_i)}$$

Note that:

$$I(X_i, Y_j) = I(Y_j, X_i) = \log_2 \frac{P(Y_j/X_i)}{P(Y_j)}$$
Marginal Entropy:

A term usually used to denote both the source entropy H(X) and the receiver entropy H(Y):

$$H(Y) = -\sum_{j=1}^{m} P(Y_j)\log_2 P(Y_j) \;\text{bits/symbol}$$
The average amounts of information associated with the pairs (Xi/Yj) and (Yj/Xi) are called conditional entropies:

$$H(Y/X) = -\sum_{j=1}^{m}\sum_{i=1}^{n} P(X_i, Y_j)\log_2 P(Y_j/X_i) \qquad \text{(noise entropy)}$$

$$H(X/Y) = -\sum_{j=1}^{m}\sum_{i=1}^{n} P(X_i, Y_j)\log_2 P(X_i/Y_j) \qquad \text{(losses entropy)}$$
Transinformation:

The average mutual information, i.e. the statistical average of I(Xi, Yj) over all pairs:

$$I(X,Y) = \sum_{j=1}^{m}\sum_{i=1}^{n} P(X_i, Y_j)\,I(X_i, Y_j)$$

$$= \sum_{j=1}^{m}\sum_{i=1}^{n} P(X_i, Y_j)\log_2 \frac{P(X_i/Y_j)}{P(X_i)} = \sum_{j=1}^{m}\sum_{i=1}^{n} P(X_i, Y_j)\log_2 \frac{P(Y_j/X_i)}{P(Y_j)}$$

It is measured in bits/symbol.
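All of these quantities can be computed directly from the joint matrix P(Xi, Yj). The sketch below (function names are only illustrative) uses the identities H(X/Y) = H(X,Y) − H(Y), H(Y/X) = H(X,Y) − H(X) and I(X,Y) = H(X) + H(Y) − H(X,Y), which follow from the chain-rule derivation in the solution below:

```python
import numpy as np

def plog2p(p):
    """Elementwise -p*log2(p) with the convention 0*log2(0) = 0."""
    p = np.asarray(p, dtype=float)
    out = np.zeros_like(p)
    nz = p > 0
    out[nz] = -p[nz] * np.log2(p[nz])
    return out

def channel_entropies(P_xy):
    """Return H(X), H(Y), H(X,Y), H(X/Y), H(Y/X), I(X,Y) for a joint matrix P(Xi,Yj)."""
    P_xy = np.asarray(P_xy, dtype=float)
    P_x = P_xy.sum(axis=1)                 # marginal P(Xi)
    P_y = P_xy.sum(axis=0)                 # marginal P(Yj)
    H_x, H_y = plog2p(P_x).sum(), plog2p(P_y).sum()
    H_xy = plog2p(P_xy).sum()
    return H_x, H_y, H_xy, H_xy - H_y, H_xy - H_x, H_x + H_y - H_xy

# Example use with a noiseless 2x2 channel: I(X,Y) = H(X) = 1 bit
print(channel_entropies([[0.5, 0.0],
                         [0.0, 0.5]]))
```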
Sol:

$$H(X,Y) = -\sum_{j=1}^{m}\sum_{i=1}^{n} P(X_i, Y_j)\log_2 P(X_i, Y_j)$$

$$= -\sum_{j=1}^{m}\sum_{i=1}^{n} P(X_i, Y_j)\log_2\left[P(X_i)\,P(Y_j/X_i)\right]$$

$$= -\sum_{j=1}^{m}\sum_{i=1}^{n} P(X_i, Y_j)\log_2 P(X_i) - \sum_{j=1}^{m}\sum_{i=1}^{n} P(X_i, Y_j)\log_2 P(Y_j/X_i)$$

$$= -\sum_{i=1}^{n} P(X_i)\log_2 P(X_i) - \sum_{j=1}^{m}\sum_{i=1}^{n} P(X_i, Y_j)\log_2 P(Y_j/X_i)$$

$$= H(X) + H(Y/X)$$

Similarly, H(X,Y) = H(Y) + H(X/Y), so that

$$I(X,Y) = H(X) - H(X/Y) = H(Y) - H(Y/X) = H(X) + H(Y) - H(X,Y)$$

The proof above shows that the transinformation I(X,Y) is the net average information gained at the receiver: the difference between the information produced by the source, H(X), and the information lost in the channel, H(X/Y), due to noise and jamming.
Sol: (the joint probability matrix used here is $P(X_i, Y_j) = \begin{bmatrix} 0.5 & 0.25 \\ 0 & 0.125 \\ 0.0625 & 0.0625 \end{bmatrix}$, rows Xi, columns Yj)

1.
$$P(X_i) = \sum_{j=1}^{2} P(X_i, Y_j) = [0.75 \;\; 0.125 \;\; 0.125]$$

$$P(Y_j) = \sum_{i=1}^{3} P(X_i, Y_j) = [0.5625 \;\; 0.4375]$$

$$H(X) = -\sum_{i=1}^{3} P(X_i)\log_2 P(X_i) = -\frac{1}{\ln 2}\left[0.75\ln 0.75 + 2 \times 0.125\ln 0.125\right] = 1.06127 \;\text{bits/symbol}$$

$$H(Y) = -\sum_{j=1}^{2} P(Y_j)\log_2 P(Y_j) = -\frac{1}{\ln 2}\left[0.5625\ln 0.5625 + 0.4375\ln 0.4375\right] = 0.9887 \;\text{bits/symbol}$$

2.
$$H(X,Y) = -\sum_{j=1}^{2}\sum_{i=1}^{3} P(X_i, Y_j)\log_2 P(X_i, Y_j) = -\frac{1}{\ln 2}\left[0.5\ln 0.5 + 0.25\ln 0.25 + 0.125\ln 0.125 + 2 \times 0.0625\ln 0.0625\right]$$

$$= 1.875 \;\text{bits/symbol}$$

4.
$$I(X_1, Y_2) = \log_2 \frac{P(X_1/Y_2)}{P(X_1)} = \log_2 \frac{P(X_1, Y_2)}{P(X_1)\,P(Y_2)} \qquad \left(\text{since } P(X_1/Y_2) = \frac{P(X_1, Y_2)}{P(Y_2)}\right)$$

$$= \log_2 \frac{0.25}{0.75 \times 0.4375} = -0.3923 \;\text{bits}$$
$$P(Y_j/X_i) = \frac{P(X_i, Y_j)}{P(X_i)} = \begin{bmatrix} \frac{0.5}{0.75} & \frac{0.25}{0.75} \\ \frac{0}{0.125} & \frac{0.125}{0.125} \\ \frac{0.0625}{0.125} & \frac{0.0625}{0.125} \end{bmatrix} = \begin{bmatrix} 2/3 & 1/3 \\ 0 & 1 \\ 1/2 & 1/2 \end{bmatrix}$$

[Channel diagram: X1 → Y1 with prob. 2/3 and X1 → Y2 with 1/3; X2 → Y2 with 1; X3 → Y1 with 1/2 and X3 → Y2 with 1/2]
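The figures in this solution can be verified numerically (a Python/NumPy sketch; the joint matrix is the one implied by the solution above):

```python
import numpy as np

# Joint probability matrix implied by the solution (rows X1..X3, columns Y1, Y2)
P_xy = np.array([[0.5,    0.25],
                 [0.0,    0.125],
                 [0.0625, 0.0625]])

P_x = P_xy.sum(axis=1)                     # [0.75  0.125 0.125]
P_y = P_xy.sum(axis=0)                     # [0.5625 0.4375]

def H(p):
    p = p[p > 0]                           # drop zero entries (0*log2(0) = 0)
    return -(p * np.log2(p)).sum()

print(H(P_x), H(P_y), H(P_xy))             # ~1.0613, ~0.9887, 1.875 bits

# Conditional matrix P(Yj/Xi) and the mutual information of the single pair (X1, Y2)
print(P_xy / P_x[:, None])                 # rows: [2/3 1/3], [0 1], [1/2 1/2]
print(np.log2(P_xy[0, 1] / (P_x[0] * P_y[1])))   # I(X1,Y2) ~ -0.392 bits
```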
Ex: Find and plot the transinformation for the binary symmetric channel (BSC) shown below if P(0T) = P(1T) = 0.5.

[BSC diagram: 0T → 0R and 1T → 1R with prob. 1 − Pe; 0T → 1R and 1T → 0R with prob. Pe]

Sol:

I(X,Y) = H(Y) − H(Y/X)

Let 0T = X1, 1T = X2, 0R = Y1 and 1R = Y2. Then

$$P(Y/X) = \begin{bmatrix} 1-P_e & P_e \\ P_e & 1-P_e \end{bmatrix}$$

$$P(X_i, Y_j) = P(X_i)\,P(Y_j/X_i) = \begin{bmatrix} \frac{1-P_e}{2} & \frac{P_e}{2} \\ \frac{P_e}{2} & \frac{1-P_e}{2} \end{bmatrix}$$

$$P(Y_j) = [0.5 \;\; 0.5] \;\Rightarrow\; H(Y) = 1 \;\text{bit/symbol}$$

$$H(Y/X) = -\sum_{j=1}^{2}\sum_{i=1}^{2} P(X_i, Y_j)\log_2 P(Y_j/X_i) = -\frac{1}{\ln 2}\left[2 \times \frac{1-P_e}{2}\ln(1-P_e) + 2 \times \frac{P_e}{2}\ln P_e\right]$$

$$= -\frac{1}{\ln 2}\left[(1-P_e)\ln(1-P_e) + P_e\ln P_e\right]$$

$$I(X,Y) = 1 + \frac{1}{\ln 2}\left[(1-P_e)\ln(1-P_e) + P_e\ln P_e\right]$$

Pe      I(X,Y)
0       1
0.5     0
1       1
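The table (and the full curve to plot) can be generated from the closed-form expression; a short Python sketch under the same assumption P(0T) = P(1T) = 0.5:

```python
import math

def bsc_transinformation(pe):
    """I(X,Y) = 1 + (1-pe)*log2(1-pe) + pe*log2(pe) for a BSC with equiprobable inputs."""
    terms = sum(q * math.log2(q) for q in (pe, 1 - pe) if q > 0)
    return 1.0 + terms

for pe in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(pe, round(bsc_transinformation(pe), 4))
# Pe = 0 or Pe = 1 give 1 bit/symbol; Pe = 0.5 gives 0, matching the table above.
```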
For the ternary symmetric channel (TSC):

$$P(Y/X) = \begin{bmatrix} 1-2P_e & P_e & P_e \\ P_e & 1-2P_e & P_e \\ P_e & P_e & 1-2P_e \end{bmatrix}$$

[Channel diagram: Xi → Yi with prob. 1 − 2Pe for i = 1, 2, 3; every other Xi → Yj transition has prob. Pe]

This TSC is symmetric but not practical, since in a real channel the noise rarely pushes a symbol across to the opposite extreme level: there is no chance that X1 is received as Y3, or X3 as Y1. Hence a non-symmetric but more practical channel is shown below.
Transition channel matrix:

$$P(Y/X) = \begin{bmatrix} 1-P_e & P_e & 0 \\ P_e & 1-2P_e & P_e \\ 0 & P_e & 1-P_e \end{bmatrix}$$

[Channel diagram: X1 → Y1 with 1 − Pe, X1 → Y2 with Pe; X2 → Y1 with Pe, X2 → Y2 with 1 − 2Pe, X2 → Y3 with Pe; X3 → Y2 with Pe, X3 → Y3 with 1 − Pe]
Another example of a channel transition matrix:

$$P(Y/X) = \begin{bmatrix} \tfrac{3}{4} & \tfrac{1}{4} & 0 & 0 & 0 \\ 0 & 0 & \tfrac{1}{3} & \tfrac{2}{3} & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$
[Venn diagrams illustrating the relations between H(X), H(Y), H(X,Y), H(X/Y), H(Y/X) and I(X,Y)]
Channel Capacity C:

The channel capacity is the maximum of the transinformation over all possible input probability distributions:

$$C = \max_{P(X_i)} I(X,Y) \;\text{bits/symbol}$$

For a symmetric channel, every row of P(Y/X) contains the same set of probabilities, so

$$\sum_{j=1}^{m} P(Y_j/X_i)\log_2 P(Y_j/X_i) = K = \text{constant (independent of } i\text{)}$$

Then H(Y/X) = −K and I(X,Y) = H(Y) + K, which is maximum when H(Y) = log2 m, giving

$$C = \log_2 m + K$$

$$\text{Channel efficiency} = \frac{I(X,Y)}{C} \qquad \text{Channel redundancy } R = 1 - \frac{I(X,Y)}{C}$$
Ex: Find the channel capacity for the BSC shown, then find the channel redundancy if I(X1) = 2 bits.

$$P(Y/X) = \begin{bmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{bmatrix}$$

[BSC diagram: X1 → Y1 and X2 → Y2 with prob. 0.7; X1 → Y2 and X2 → Y1 with prob. 0.3]

Sol:

$$K = \sum_{j=1}^{2} P(Y_j/X_i)\log_2 P(Y_j/X_i) = 0.7\log_2 0.7 + 0.3\log_2 0.3 = -0.8813$$

$$C = \log_2 m + K = 1 - 0.8813 = 0.1187 \;\text{bits/symbol}$$

I(X1) = 2 bits ⇒ P(X1) = 2^(−2) = 0.25 and P(X2) = 0.75, so

$$P(Y) = [0.25 \times 0.7 + 0.75 \times 0.3 \;\;\; 0.25 \times 0.3 + 0.75 \times 0.7] = [0.4 \;\; 0.6]$$

$$H(Y) = -\frac{1}{\ln 2}\left[0.4\ln 0.4 + 0.6\ln 0.6\right] = 0.97095 \;\text{bits/symbol}$$

$$I(X,Y) = H(Y) + K = 0.97095 - 0.8813 = 0.0897 \;\text{bits/symbol}$$

$$R = 1 - \frac{I(X,Y)}{C} = 1 - \frac{0.0897}{0.1187} = 24.5\%$$
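A numerical check of this example, keeping every quantity in bits (Python sketch; variable names are illustrative):

```python
import math

log2 = math.log2
pe = 0.3                                           # crossover probability of this BSC

K = (1 - pe) * log2(1 - pe) + pe * log2(pe)        # row constant, ~ -0.8813 bits
C = log2(2) + K                                    # capacity, ~ 0.1187 bit/symbol

P_x = [2 ** -2, 1 - 2 ** -2]                       # I(X1) = 2 bits  =>  P(X1) = 0.25
P_y = [P_x[0] * (1 - pe) + P_x[1] * pe,            # [0.4, 0.6]
       P_x[0] * pe + P_x[1] * (1 - pe)]

H_y = -sum(p * log2(p) for p in P_y)               # ~0.971 bit/symbol
I_xy = H_y + K                                     # ~0.0897 bit/symbol
R = 1 - I_xy / C                                   # redundancy, ~0.245 (24.5%)
print(C, I_xy, R)
```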
For a non-symmetric channel, the capacity is found as follows:

Find the input probabilities P(X1), P(X2), ..., P(Xn) that make I(X,Y) maximum. To maximize I(X,Y), differentiate I(X,Y) with respect to P(X1), P(X2), ..., P(Xn) and equate to zero, using the constraint

$$\sum_{i=1}^{n} P(X_i) = 1$$

to reduce the number of variables by one.
Ex: Find the channel capacity of the non-symmetric channel shown:

$$P(Y/X) = \begin{bmatrix} 0.7 & 0.3 \\ 0.1 & 0.9 \end{bmatrix}$$

[Channel diagram: X1 → Y1 with 0.7, X1 → Y2 with 0.3; X2 → Y1 with 0.1, X2 → Y2 with 0.9]
Sol:

Let P(X1) = p, then P(X2) = 1 − p, and

$$P(Y_1) = 0.7p + 0.1(1-p) = 0.1 + 0.6p, \qquad P(Y_2) = 0.3p + 0.9(1-p) = 0.9 - 0.6p$$

$$H(Y) = -\frac{1}{\ln 2}\left[(0.1+0.6p)\ln(0.1+0.6p) + (0.9-0.6p)\ln(0.9-0.6p)\right]$$

$$H(Y/X) = -\sum_{j=1}^{m}\sum_{i=1}^{n} P(X_i, Y_j)\log_2 P(Y_j/X_i) = -\frac{1}{\ln 2}\left[0.7p\ln 0.7 + 0.3p\ln 0.3 + 0.1(1-p)\ln 0.1 + 0.9(1-p)\ln 0.9\right]$$

$$\frac{\partial H(Y/X)}{\partial p} = -\frac{1}{\ln 2}\left[0.7\ln 0.7 + 0.3\ln 0.3 - 0.1\ln 0.1 - 0.9\ln 0.9\right] = \frac{1}{\ln 2}\,(0.285781)$$
$$\frac{\partial H(Y)}{\partial p} = -\frac{1}{\ln 2}\left[0.6 + 0.6\ln(0.1+0.6p) - 0.6 - 0.6\ln(0.9-0.6p)\right] = -\frac{1}{\ln 2}\left[0.6\ln\frac{0.1+0.6p}{0.9-0.6p}\right]$$

$$\frac{\partial I(X,Y)}{\partial p} = \frac{\partial H(Y)}{\partial p} - \frac{\partial H(Y/X)}{\partial p} = 0$$

$$\Rightarrow\; 0.6\ln\frac{0.1+0.6p}{0.9-0.6p} + 0.285781 = 0$$

$$\ln\frac{0.1+0.6p}{0.9-0.6p} = -\frac{0.285781}{0.6}$$

$$\frac{0.1+0.6p}{0.9-0.6p} = e^{-0.285781/0.6} = 0.621 \;\Rightarrow\; p = 0.4719$$

At p = 0.4719:

H(Y) = 0.96021 bits/symbol
H(Y/X) = 0.66354 bits/symbol

$$C = I(X,Y)_{\max} = H(Y) - H(Y/X) = 0.96021 - 0.66354 = 0.2967 \;\text{bits/symbol}$$
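The same maximum can be found by a brute-force search over p, which is a useful check on the analytic result (a Python/NumPy sketch; names are illustrative only):

```python
import numpy as np

# Transition matrix of the non-symmetric channel from the example
P_y_given_x = np.array([[0.7, 0.3],
                        [0.1, 0.9]])

def transinformation(p):
    """I(X,Y) in bits for the input distribution [p, 1-p]."""
    P_x = np.array([p, 1 - p])
    P_xy = P_x[:, None] * P_y_given_x      # joint matrix P(Xi,Yj)
    P_y = P_xy.sum(axis=0)
    def H(q):
        q = q[q > 0]
        return -(q * np.log2(q)).sum()
    return H(P_x) + H(P_y) - H(P_xy)

ps = np.linspace(0.001, 0.999, 9981)       # coarse grid search over p
vals = np.array([transinformation(p) for p in ps])
print(ps[vals.argmax()], vals.max())       # p ~ 0.472, C ~ 0.297 bit/symbol
```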
NOTE:

Sometimes, to ease the calculation, we are asked to find the channel capacity when the channel is non-symmetric but there are some similarities between some (not all) of the symbols. In such a case we can exploit the similarity by assuming these symbols are equally probable and proceeding as in the previous example.

Ex:

$$P(Y/X) = \begin{bmatrix} 0.9 & 0.1 & 0 \\ 0.1 & 0.8 & 0.1 \\ 0 & 0.1 & 0.9 \end{bmatrix}$$

[Channel diagram: X1 → Y1 with 0.9, X1 → Y2 with 0.1; X2 → Y1 with 0.1, X2 → Y2 with 0.8, X2 → Y3 with 0.1; X3 → Y2 with 0.1, X3 → Y3 with 0.9]

Since X1 and X3 play symmetric roles, we can assume P(X1) = P(X3) = p, and then P(X2) = 1 − 2p.
Cascading of channels:

If two channels are cascaded, then the overall transition matrix is the product of the two transition matrices:

$$P(Z/X) = P(Y/X) \cdot P(Z/Y)$$
Ex: Find the transition matrix P(Z/X) for the cascaded channels shown, with P(X) = [0.7  0.3].

$$P(Y/X) = \begin{bmatrix} 0.8 & 0.2 & 0 \\ 0.3 & 0 & 0.7 \end{bmatrix} \qquad P(Z/Y) = \begin{bmatrix} 0.7 & 0.3 \\ 1 & 0 \\ 1 & 0 \end{bmatrix}$$

[Cascade diagram: X1 → Y1 (0.8), X1 → Y2 (0.2); X2 → Y1 (0.3), X2 → Y3 (0.7); Y1 → Z1 (0.7), Y1 → Z2 (0.3); Y2 → Z1 (1); Y3 → Z1 (1)]

Sol:

$$P(Z/X) = P(Y/X) \cdot P(Z/Y) = \begin{bmatrix} 0.76 & 0.24 \\ 0.91 & 0.09 \end{bmatrix}$$
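The matrix product is easy to verify (a Python/NumPy sketch using the two transition matrices of this example):

```python
import numpy as np

# Transition matrices read from the cascade example above
P_y_given_x = np.array([[0.8, 0.2, 0.0],
                        [0.3, 0.0, 0.7]])
P_z_given_y = np.array([[0.7, 0.3],
                        [1.0, 0.0],
                        [1.0, 0.0]])

# Overall transition matrix of the cascade: P(Z/X) = P(Y/X) . P(Z/Y)
P_z_given_x = P_y_given_x @ P_z_given_y
print(P_z_given_x)          # [[0.76 0.24]
                            #  [0.91 0.09]]
```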
H.W.7: Find the joint probabilities and then find P(Y) and P(Z) for the example above, using

P(X,Y) = P(X) P(Y/X)
P(X,Z) = P(X) P(Z/X)