ITC Term 1
UNIT I
ETEC 304
VI Semester
Learning Objectives
⚫ Introduction to Information Theory
⚫ Modelling of Information Sources
⚫ Entropy (Joint/Conditional)
⚫ Channel Capacity
⚫ Data Compaction
⚫ Markov Sources
[Photograph: Claude Shannon, 1916-2001]
1. The purpose of a communication system is to carry information-bearing baseband
signals from one place to another over a communication channel.
2. Information theory is concerned with the fundamental limits of communication:
What is the ultimate limit to data compression?
What is the ultimate limit of reliable communication over a noisy channel?
3. Information theory is a branch of probability theory that may be applied to the
study of communication systems; it deals with the mathematical modelling and
analysis of a communication system rather than with the physical sources and
physical channels.
4. Two important elements presented in this theory are Binary Source (BS) and the
Binary Symmetric Channel (BSC).
5. A binary source is a device that generates one of the two possible symbols ‘0’
and ‘1’ at a given rate ‘r’, measured in symbols per second.
6. The BSC is a medium through which it is possible to transmit one symbol per
time unit.
What is Information Theory?
▪ The smaller the probability of an event, the more information the occurrence of
that event conveys.
▪ The message associated with the least likely event contains the maximum
information.
Measure of Information
▪ The information (self-information) conveyed by a symbol X with probability P(X) is

I = log_a (1 / P(X)) = − log_a P(X)

➢ The base a of the logarithm fixes the unit: a = 2 gives bits, a = e gives nats,
and a = 10 gives hartleys.
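A minimal Python sketch of this definition (the function name and the example probability are illustrative, not from the slides):

```python
import math

def self_information(p, base=2):
    """Self-information I = -log_a P(X); base 2 -> bits, e -> nats, 10 -> hartleys."""
    return -math.log(p) / math.log(base)

# A symbol that occurs with probability 1/8 carries 3 bits of information.
print(self_information(1/8, base=2))        # 3.0 bits
print(self_information(1/8, base=math.e))   # ~2.079 nats
print(self_information(1/8, base=10))       # ~0.903 hartleys
```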
For a binary source with P(1) = p and P(0) = 1 − p, the entropy is

H = p log2(1/p) + (1 − p) log2(1/(1 − p))

[Plot: H (bits/symbol) versus p; H rises from 0 at p = 0 to a maximum of 1.0 at
p = 0.5 and falls back to 0 at p = 1.]

Entropy is maximized when all the symbols are equiprobable.

▪ For N equiprobable symbols:

H = Σ_{n=1}^{N} (1/N) log2 N = log2 N bits/symbol
Information Rate: if the source X emits symbols at a rate of r symbols/sec, the
information rate is R = r·H(X) bits/sec.
Q) A DMS emits symbols with probabilities 1/2, 1/4, 1/8, 1/16 and 1/16. Find H(X).
Solution
▪ We have

H(X) = (1/2) log2 2 + (1/4) log2 4 + (1/8) log2 8 + 2 × (1/16) log2 16
     = 15/8 bits/sample
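A quick check of this result in Python (the helper name is illustrative):

```python
import math

def entropy(probs, base=2):
    """H(X) = -sum p*log(p); zero probabilities are skipped."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Worked example from the notes: H = 15/8 bits/sample.
print(entropy([1/2, 1/4, 1/8, 1/16, 1/16]))  # 1.875

# Sanity check: N equiprobable symbols give log2 N bits/symbol.
print(entropy([1/8] * 8))                     # 3.0
```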
Properties of Entropy:
➢ 0 ≤ H(X) ≤ log2 N, where N is the number of symbols in the alphabet of source X.
➢ When all the events are equally likely, the average uncertainty takes its largest
value, H(X) = log2 N.
Extension of a DMS:
If symbols are taken 2 at a time, we obtain the second-order extension of the source,
whose symbols are the pairs x1x1, x1x2, x2x1, x2x2, ... For a DMS, H(X^2) = 2·H(X).
Q) Hint: with 3 source symbols there are 9 pair combinations
(x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3), and the probabilities of the
individual symbols multiply. Ans: H(X) = 1.29 bits/symbol, H(X^2) = 2.59 bits/symbol.
(A code sketch reproducing these numbers is given below.)
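The individual symbol probabilities for this question are not given in the excerpt; the sketch below assumes {0.6, 0.3, 0.1}, which reproduces the stated answers:

```python
import math
from itertools import product

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed source probabilities (not stated in the notes); they give H(X) ~ 1.29.
p = [0.6, 0.3, 0.1]

# Second-order extension: pair probabilities multiply for a memoryless source.
p2 = [a * b for a, b in product(p, repeat=2)]

print(round(entropy(p), 2))   # ~1.30 bits/symbol
print(round(entropy(p2), 2))  # 2.59 bits per pair of symbols = 2*H(X)
```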
SOURCE EFFICIENCY: the ratio of the average information conveyed by the source, H(X),
to the maximum average information.
The maximum average information is H(X)max = log2 N, where N is the number of symbols,
assuming equal probability.
γ_X = H(X) / H(X)max × 100%
Redundancy: R_X = 1 − γ_X
Q) A source emits three symbols with probabilities 0.7, 0.15 and 0.15. Calculate the
efficiency and redundancy. Ans: 74.5% and ≈25.5%.
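A short check of these figures (the helper name is illustrative):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

p = [0.7, 0.15, 0.15]
H = entropy(p)                 # ~1.1813 bits/symbol
H_max = math.log2(len(p))      # log2(3) ~ 1.585 bits/symbol

efficiency = H / H_max         # ~0.745 -> 74.5 %
redundancy = 1 - efficiency    # ~0.255 -> 25.5 %
print(f"{efficiency:.1%}, {redundancy:.1%}")
```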
Source Coding
The conversion of the output of a discrete memoryless source (DMS) into a sequence of
binary symbols, i.e. a binary code word, is called source coding.
The code produced for a discrete memoryless source (DMS) has to be efficiently
represented so as to minimize the average bit rate required to represent the source,
which is achieved by reducing the redundancy of the information source.
For example, in telegraphy we use the Morse code, in which the letters are denoted by
marks and spaces. The letter E, which is used most often, is denoted by ".", whereas
the letter Q, which is rarely used, is denoted by "--.-".
Let X be a DMS with finite entropy H(X) and an alphabet of m symbols {x1, ..., xm} with
corresponding probabilities of occurrence P(xi) (i = 1, ..., m). Let the binary code word
assigned to symbol xi by the encoder have length ni, measured in bits. The length of a
code word is the number of binary digits in the code word.
II. Average Code Word Length:
The average code word length L per source symbol is given by

L = Σ_{i=1}^{m} P(xi) · ni

The parameter L represents the average number of bits per source symbol used in the
source coding process.
The source coding theorem states that for a DMS X with entropy H(X), the average
code word length L per symbol is bounded as L ≥ H(X). Further, L can be made as
close to H(X) as desired by a suitably chosen code.
Thus, with
L_min = H(X)
the code efficiency can be written as
η = H(X) / L
The source coding theorem therefore states that, on average, at least H(X) bits per
symbol are needed to represent the symbols emitted by the source if the representation
is to be lossless.
This source coding theorem is called the noiseless coding theorem, as it establishes
error-free encoding. It is also called Shannon's first theorem.
H(X) also represents the minimum rate (in bits/symbol) at which an information source
can be compressed while still allowing reliable reconstruction.
Classification of Codes:
✓ Fixed – Length Codes
✓ Variable – Length Codes
✓ Distinct Codes
✓ Prefix – Free Codes
✓ Uniquely Decodable Codes
✓ Instantaneous Codes
✓ Optimal Codes
Kraft Inequality: for a code with m code words of lengths n_i,

K = Σ_{i=1}^{m} 2^(−n_i) ≤ 1
Q) A DMS is given with 4 symbols. Which code does not satisfy the Kraft inequality?
Show that codes A and D are uniquely decodable. (Code C satisfies the Kraft inequality
but is not uniquely decodable.)

xi    Code A   Code B   Code C   Code D
x1    00       0        0        0
x2    01       10       11       100
x3    10       11       100      110
x4    11       110      110      111
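A short sketch checking the Kraft inequality for the four codes in the table above (code words taken directly from the table):

```python
# K = sum(2**-n_i) for each code; K <= 1 is necessary for unique decodability.
codes = {
    "A": ["00", "01", "10", "11"],
    "B": ["0", "10", "11", "110"],
    "C": ["0", "11", "100", "110"],
    "D": ["0", "100", "110", "111"],
}

for name, words in codes.items():
    K = sum(2 ** -len(w) for w in words)
    print(f"Code {name}: K = {K:.3f} -> {'satisfies' if K <= 1 else 'violates'} Kraft")

# Code B gives K = 1.125 > 1, so it cannot be uniquely decodable.
# Code C gives K = 1.0 yet is still not uniquely decodable (e.g. '110' parses as
# x4 or as x2 x1), showing the Kraft inequality is necessary but not sufficient.
```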
Entropy Coding:
The design of a variable-length code such that its average code word length approaches
the entropy of the DMS is often referred to as entropy coding.
Huffman Code
Sample source with H(X) = 2.36 b/symbol; the Huffman code below gives L = 2.38 b/symbol
and efficiency η = H(X)/L = 0.99.

Source xi   P(xi)   Codeword
x1          0.30    00
x2          0.25    01
x3          0.20    11
x4          0.12    101
x5          0.08    1000
x6          0.05    1001
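A Python sketch of the Huffman construction for this source (the function and its tie-breaking are illustrative; the resulting code words may differ from the table by a relabelling of 0/1, but the lengths and the average length L = 2.38 b/symbol agree):

```python
import heapq
import itertools

def huffman_lengths(probs):
    """Return code word lengths from a binary Huffman construction."""
    counter = itertools.count()            # tie-breaker so probabilities never tie on lists
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)    # merge the two least probable nodes
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:                  # every merge adds one bit to each member's code
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), s1 + s2))
    return lengths

p = [0.30, 0.25, 0.20, 0.12, 0.08, 0.05]
n = huffman_lengths(p)                     # [2, 2, 2, 3, 4, 4]
L = sum(pi * ni for pi, ni in zip(p, n))
print(n, round(L, 2))                      # average length L = 2.38 b/symbol
```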
Shannon-Fano Algorithm
Q) Let there be six (6) source symbols having probabilities x1 = 0.30, x2 = 0.25,
x3 = 0.20, x4 = 0.12, x5 = 0.08 and x6 = 0.05. Obtain the Shannon-Fano coding for the
given source symbols.
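A sketch of the classic Shannon-Fano recursive split for this question (the helper name and the balancing rule at the split are illustrative assumptions):

```python
def shannon_fano(symbols):
    """symbols: list of (name, probability) sorted by probability (descending).
    Returns {name: code word} via the recursive Shannon-Fano split."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total, running, split, best_diff = sum(p for _, p in symbols), 0.0, 1, float("inf")
    for k in range(1, len(symbols)):        # choose the split with the most balanced halves
        running += symbols[k - 1][1]
        diff = abs(2 * running - total)
        if diff < best_diff:
            best_diff, split = diff, k
    left = shannon_fano(symbols[:split])    # upper group gets prefix 0
    right = shannon_fano(symbols[split:])   # lower group gets prefix 1
    return {**{s: "0" + c for s, c in left.items()},
            **{s: "1" + c for s, c in right.items()}}

src = [("x1", 0.30), ("x2", 0.25), ("x3", 0.20), ("x4", 0.12), ("x5", 0.08), ("x6", 0.05)]
code = shannon_fano(src)
print(code)   # {'x1':'00','x2':'01','x3':'10','x4':'110','x5':'1110','x6':'1111'}
L = sum(p * len(code[s]) for s, p in src)
print(round(L, 2))  # average length 2.38 b/symbol for this source
```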
Lempel-Ziv (LZ) Algorithm
Example sequence: 1011010001010...
Given the binary sequence, it is parsed into the phrases: 1, 0, 10, 11, 01, 101, 010, 1011, ...

Notes:
1) Divide the given sequence into phrases (parsing): each new phrase is the shortest
substring not seen before.
2) LZ is a variable-to-fixed-length code.
3) Prior probabilities are not required (they are not given).
4) The tail bit, the last bit of a phrase, is the innovation symbol.
5) The code for a phrase is formed by writing the code number of its head (the
previously occurring phrase) followed by the tail bit.

Phrase   Numbering   Code (head, tail)
1        001         000 1
0        010         000 0
10       011         001 0
11       100         001 1
01       101         010 1
101      110         011 1
010      111         101 0
1011     1000        110 1
Lempel-Ziv Coding (worked example)
Sequence: 1011010100010...
Parsed phrases (numbered 1-7): 1 | 0 | 11 | 01 | 010 | 00 | 10
– Each phrase consists of a previously occurring phrase (head) followed by an
additional 0 or 1 (tail).
– Transmit the code for the head followed by the additional tail bit:
(0,1)(0,0)(1,1)(2,1)(4,0)(2,0)(1,0), i.e. 01 00 11 21 40 20 10 ...
– For the head, use just enough bits to address the maximum phrase number so far:
1 00 011 101 1000 0100 0010 ... = 100011101100001000010...
– The decoder constructs an identical dictionary.
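A sketch of the parsing and encoding just described (the function names are illustrative); for the example sequence it reproduces the (head, tail) pairs and the encoded bit string shown above:

```python
def lz_parse(bits):
    """LZ78-style parsing: each phrase = longest previously seen phrase (head) + one new bit (tail).
    Returns a list of (head_index, tail_bit); head index 0 is the empty phrase."""
    dictionary = {"": 0}          # phrase -> phrase number
    phrases, current = [], ""
    for b in bits:
        if current + b in dictionary:
            current += b          # keep extending while the prefix is already known
        else:
            phrases.append((dictionary[current], b))
            dictionary[current + b] = len(dictionary)
            current = ""
    return phrases

pairs = lz_parse("1011010100010")
print(pairs)   # [(0,'1'), (0,'0'), (1,'1'), (2,'1'), (4,'0'), (2,'0'), (1,'0')]

# Binary encoding: for phrase k the head index is written with just enough bits
# to address the largest phrase number seen so far.
out = []
for k, (head, tail) in enumerate(pairs, start=1):
    width = (k - 1).bit_length()              # 0 bits for the first phrase, then 1, 2, 2, 3, ...
    out.append(format(head, f"0{width}b") if width else "")
    out.append(tail)
print("".join(out))  # 100011101100001000010
```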
Channel matrix (in summary): [P(Y)] = [P(X)][P(Y/X)] and [P(X,Y)] = [P(X)]_d [P(Y/X)].

The channel is described by the channel (transition) matrix

             | P(y1/x1)  P(y2/x1)  ...  P(yn/x1) |
[P(Y/X)]  =  |   ...       ...     ...    ...    |
             | P(y1/xm)  P(y2/xm)  ...  P(yn/xm) |

Since each input to the channel results in some output, each row of the channel matrix
must sum to unity. This means that

Σ_{j=1}^{n} P(yj/xi) = 1   for all i.

Now, if the input probabilities P(X) are represented by the row matrix

[P(X)] = [P(x1)  P(x2)  ...  P(xm)]

and the output probabilities P(Y) are represented by the row matrix

[P(Y)] = [P(y1)  P(y2)  ...  P(yn)],

then

[P(Y)] = [P(X)][P(Y/X)].

Now if P(X) is represented as a diagonal matrix

             | P(x1)   ...   0     |
[P(X)]_d  =  |  ...    ...   ...   |
             |  0      ...  P(xm)  |

then

[P(X,Y)] = [P(X)]_d [P(Y/X)],

where the (i, j) element of the matrix [P(X,Y)] has the form P(xi, yj).
The matrix [P(X, Y)] is known as the joint probability matrix and the element
P(xi, yj) is the joint probability of transmitting xi and receiving yj.
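A small numpy sketch of these matrix relations; the channel matrix and input probabilities below are assumed values for illustration (a BSC with p = 0.1), not taken from the notes:

```python
import numpy as np

p = 0.1
P_YgX = np.array([[1 - p, p],       # rows: inputs x1, x2; columns: outputs y1, y2
                  [p, 1 - p]])      # each row sums to 1
P_X = np.array([0.6, 0.4])          # input (row) probabilities

P_Y  = P_X @ P_YgX                  # [P(Y)] = [P(X)][P(Y/X)]
P_XY = np.diag(P_X) @ P_YgX         # [P(X,Y)] = [P(X)]_d [P(Y/X)]

print(P_Y)                          # [0.58 0.42]
print(P_XY)                         # element (i, j) is P(xi, yj)
```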
Entropy, Conditional Entropy and Mutual Information
[Venn diagram: the total region is H(X,Y); H(X) comprises H(X|Y) and I(X;Y),
H(Y) comprises H(Y|X) and I(X;Y), and the overlap of H(X) and H(Y) is I(X;Y).]
Conditional and Joint Entropies
Marginal entropies:
H(X) = − Σ_i P(xi) log2 P(xi) denotes the average uncertainty of the random variable X,
i.e. of the channel input.
H(Y) = − Σ_j P(yj) log2 P(yj) denotes the average uncertainty of the random variable Y,
i.e. of the channel output.
Conditional entropy:
H(X|Y) = − Σ_j Σ_i P(xi, yj) log2 P(xi|yj) represents the amount of information lost
due to noise etc., i.e. the uncertainty about the input that remains when the output
is observed.
Similarly,
H(Y|X) = − Σ_i Σ_j P(xi, yj) log2 P(yj|xi).
Joint entropy:
H(X,Y) = − Σ_i Σ_j P(xi, yj) log2 P(xi, yj),
and therefore (chain rule)
H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y).
If X and Y are statistically independent, then
H(X,Y) = H(X) + H(Y).
Mutual Information (Transinformation)
▪ Given that
  H(X) is the uncertainty of the random variable X at the input of the information channel, and
  H(X|Y) is the information lost in the channel due to the noise, i.e. the uncertainty
  about X that remains after the random variable Y is known,
▪ then I(X;Y) = H(X) − H(X|Y) denotes the balance of information at the receiver, i.e.
  the amount of uncertainty about X that has been removed given that Y is known.
▪ Definition of mutual information:
  I(X;Y) = H(X) − H(X|Y)
  I(X;Y) = H(Y) − H(Y|X)
  I(X;Y) = H(X) + H(Y) − H(X,Y)
Mutual Information
❖ Mutual Information (MI) of two random variables is a measure of the mutual dependence
between the two variables. More specifically, it quantifies the "amount of information" (in
units such as Shannons, commonly called bits) obtained about one random variable
through observing the other random variable.
❖ It can be thought of as the reduction in uncertainty about one random variable given
knowledge of another.
❖ High mutual information indicates a large reduction in uncertainty; low mutual information
indicates a small reduction; and zero mutual information between two random variables
means the variables are independent.
❖ For two discrete random variables X and Y whose joint probability distribution is
P_XY(x,y), the mutual information between them, denoted I(X;Y), is given by

I(X;Y) = Σ_x Σ_y P_XY(x,y) log2 [ P_XY(x,y) / (P_X(x) P_Y(y)) ]
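A sketch that evaluates this double sum for an assumed joint distribution (a BSC with p = 0.1 and equiprobable inputs; the numbers are illustrative, not from the notes):

```python
import numpy as np

def mutual_information(P_XY):
    """I(X;Y) in bits, computed from a joint probability matrix (rows: x, columns: y)."""
    P_X = P_XY.sum(axis=1, keepdims=True)
    P_Y = P_XY.sum(axis=0, keepdims=True)
    mask = P_XY > 0
    return float(np.sum(P_XY[mask] * np.log2(P_XY[mask] / (P_X @ P_Y)[mask])))

# Assumed joint distribution: BSC with p = 0.1 and P(x1) = P(x2) = 0.5.
P_XY = np.array([[0.45, 0.05],
                 [0.05, 0.45]])
print(mutual_information(P_XY))   # ~0.531 bits, i.e. 1 - H(0.1)
```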
MUTUAL INFORMATION - PROOF:
Starting from I(X;Y) = H(X) − H(X|Y), substitute the definitions of H(X) and H(X|Y),
subtract the two sums, and rearrange the logarithm terms to obtain the double-sum
expression for I(X;Y) given above.
Q) Consider the binary symmetric channel with transition probabilities
P(0/0) = P(1/1) = 1 − p and P(1/0) = P(0/1) = p.
Channel Capacity
▪ The channel capacity is the maximum rate C (in bits/sec) at which information can be
transmitted reliably over a channel.
▪ If R ≤ C, almost error-free transmission is theoretically guaranteed.
▪ If R > C, reliable transmission is impossible.
The capacity per symbol Cs of a discrete memoryless channel is

Cs = max_{P(x)} I(X;Y)   (the maximum is taken over all possible input distributions P(x)).

The channel capacity per second, C:
if r symbols are transmitted per second, then the maximum rate of transmission of
information per second is r·Cs. Hence the maximum capacity per second is
C = r·Cs b/s.
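A sketch tying this definition to the BSC of the previous question: it searches over input distributions [q, 1 − q] to approximate Cs = max I(X;Y); for the BSC the maximum occurs at q = 0.5 and equals 1 − H(p). The crossover probability p = 0.1 is an assumed illustration.

```python
import numpy as np

def mutual_information(P_X, P_YgX):
    """I(X;Y) in bits for input distribution P_X and channel matrix P_YgX."""
    P_XY = np.diag(P_X) @ P_YgX
    P_Y = P_X @ P_YgX
    mask = P_XY > 0
    return float(np.sum(P_XY[mask] * np.log2(P_XY[mask] / np.outer(P_X, P_Y)[mask])))

p = 0.1                                   # illustrative crossover probability
P_YgX = np.array([[1 - p, p], [p, 1 - p]])

# Brute-force search over input distributions [q, 1-q]; the maximum is Cs.
qs = np.linspace(0.001, 0.999, 999)
Cs = max(mutual_information(np.array([q, 1 - q]), P_YgX) for q in qs)
print(Cs)                                 # ~0.531 = 1 - H(0.1), attained at q = 0.5
```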
Capacities of Special Channels:
Q1) Find the capacity of the channel. For the given channel diagram, write the
channel matrix.
Q3)
Types of Codes - Prefix Codes