5CS3
Information Theory & Coding
Unit – 2
Source Coding For Data Compaction
Contents
Prefix code
Huffman code
Shannon-Fano code
Lempel-Ziv coding
Channel capacity
Channel coding theorem
Shannon limit
Prefix Codes
A prefix code is a variable-length code in which no codeword is a prefix of another one.
e.g. a = 0, b = 100, c = 101, d = 11
Such a code can be viewed as a binary trie (branching on 0 and 1 at each node) with the codewords at the leaves.
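Because no codeword is a prefix of any other, a bit stream can be decoded instantaneously, symbol by symbol, without look-ahead. A minimal decoding sketch (not from the notes; names are illustrative) using the example code above:

# Decode a bit string with the prefix code a=0, b=100, c=101, d=11.
code = {"a": "0", "b": "100", "c": "101", "d": "11"}
decode_table = {cw: sym for sym, cw in code.items()}

def decode(bits):
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in decode_table:          # a complete codeword has been read
            symbols.append(decode_table[buffer])
            buffer = ""
    return "".join(symbols)

print(decode("010011101"))                  # "0","100","11","101" -> "abdc"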
Huffman code
The Huffman coding procedure finds the optimum (least rate) uniquely decodable, variable length entropy code associated with a set of events, given their probabilities of occurrence.
Example: a source with symbols X1 … X6 and probabilities 0.30, 0.25, 0.20, 0.12, 0.08, 0.05, for which H(X) = 2.36 bit/symbol.
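Huffman's procedure repeatedly merges the two least probable entries and prepends a bit to each side; the sketch below (not part of the notes; function names and tie-breaking are arbitrary) produces one optimal code for the probabilities above:

import heapq

def huffman(probs):
    # Heap entries: (probability, unique tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)     # two least probable groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        count += 1
        heapq.heappush(heap, (p1 + p2, count, merged))
    return heap[0][2]

probs = {"X1": 0.30, "X2": 0.25, "X3": 0.20, "X4": 0.12, "X5": 0.08, "X6": 0.05}
code = huffman(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
print(code, avg_len)                        # average codeword length = 2.38 bit/symbol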
1. Create a list of probabilities or frequency counts for the given set of symbols so that the relative frequency of occurrence of each symbol is known.
2. Sort the list of symbols in decreasing order of probability, the most probable ones to the left and the least probable to the right.
3. Split the list into two parts, with the total probability of both parts being as close to each other as possible.
4. Assign the value 0 to the left part and 1 to the right part.
5. Repeat steps 3 and 4 for each part, until all the symbols are split into individual subgroups (a code sketch of this recursive splitting follows the list).
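A minimal sketch of the recursive splitting described in steps 1-5 (not from the notes; names are illustrative, and the input must already be sorted in decreasing order of probability):

def split_encode(symbols, code=None):
    # symbols: list of (symbol, probability) pairs sorted by decreasing probability.
    if code is None:
        code = {s: "" for s, _ in symbols}
    if len(symbols) <= 1:
        return code
    total = sum(p for _, p in symbols)
    running, split, best_diff = 0.0, 1, float("inf")
    for i in range(1, len(symbols)):        # choose the split with the most equal halves
        running += symbols[i - 1][1]
        diff = abs(running - (total - running))
        if diff < best_diff:
            best_diff, split = diff, i
    left, right = symbols[:split], symbols[split:]
    for s, _ in left:
        code[s] += "0"                      # 0 for the left (more probable) part
    for s, _ in right:
        code[s] += "1"                      # 1 for the right part
    split_encode(left, code)
    split_encode(right, code)
    return code

table = [("X1", 0.30), ("X2", 0.25), ("X3", 0.20),
         ("X4", 0.12), ("X5", 0.08), ("X6", 0.05)]
print(split_encode(table))                  # X1->00, X2->01, X3->10, X4->110, X5->1110, X6->1111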
Solution:
Let P(x) be the probability of occurrence of symbol x. Sorting the symbols in decreasing order of probability and splitting gives the groups {D, B} and {A, C, E}, with 0 assigned to {D, B} and 1 to {A, C, E}.
In the {D, B} group, P(D) ≈ P(B), so divide {D, B} into {D} and {B} and assign 0 to D and 1 to B.
In the {A, C, E} group, P(A) = 0.22 and P(C) + P(E) = 0.20, so the group is divided into {A} and {C, E}, and they are assigned the values 0 and 1 respectively.
In the {C, E} group, P(C) = 0.15 and P(E) = 0.05, so divide it into {C} and {E} and assign 0 to {C} and 1 to {E}.
Note: The splitting is now stopped, as each symbol is now separated into its own subgroup.
Symbol        A   B   C   D   E
Code length   2   2   3   2   3
Shannon-Fano Encoding

Xi    P(Xi)   Successive splits   Code
X1    0.30    0 0                 00
X2    0.25    0 1                 01
X3    0.20    1 0                 10
X4    0.12    1 1 0               110
X5    0.08    1 1 1 0             1110
X6    0.05    1 1 1 1             1111

H(X) = 2.36 bit/symbol
L = 2.38 bit/symbol
Efficiency = 0.99
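The quoted entropy, average length, and efficiency follow directly from the table; a small check (not part of the notes):

from math import log2

p = [0.30, 0.25, 0.20, 0.12, 0.08, 0.05]           # P(Xi) from the table
lengths = [2, 2, 2, 3, 4, 4]                       # codeword lengths from the table

H = -sum(pi * log2(pi) for pi in p)                # entropy H(X)
L = sum(pi * li for pi, li in zip(p, lengths))     # average codeword length

print(round(H, 2), round(L, 2), round(H / L, 2))   # 2.36 2.38 0.99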
Channel Capacity
Channel capacity, in electrical engineering, computer science and information theory, is the tight upper bound on the rate at which information can be reliably transmitted over a communication channel.
For example, for a channel of bandwidth B = 3400 Hz with signal-to-noise ratio S/N = 1000 (30 dB), the Shannon-Hartley formula C = B log2(1 + S/N) gives:
C = 3400 × log2(1 + 1000)
  = 3400 × 9.97
  ≈ 34 kbps
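The same calculation in code (a minimal sketch; the bandwidth and S/N values are the ones used above):

from math import log2

def channel_capacity(bandwidth_hz, snr):
    # Shannon-Hartley capacity C = B * log2(1 + S/N), in bits per second.
    return bandwidth_hz * log2(1 + snr)

print(channel_capacity(3400, 1000))                # about 33,900 bit/s, i.e. roughly 34 kbps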
Shannon-Hartley Theorem
We cannot prove the theorem, but can partially justify it as follows:
suppose the received signal is accompanied by noise with an RMS voltage of σ, and that the signal has been quantised with levels separated by a = λσ.
If λ is chosen sufficiently large, we may expect to be able to recognize
the signal level with an acceptable probability of error.
Suppose further that each message is to be represented by one voltage
level.
If there are to be M possible messages, then there must be M levels.
The average signal power (for M equally likely levels spaced a apart and centred about zero) is then

S = (M² − 1) a² / 12 = (M² − 1) λ² N / 12,

where N = σ² is the noise power, so that M = √(1 + 12S / (λ²N)). If each message is equally likely, then each carries an equal amount of information,

log2 M = (1/2) log2(1 + 12S / (λ²N)) bits per message.
To find the information rate, we need to estimate how many
messages can be carried per unit time by a signal on the channel.
Since the discussion is heuristic, we note that the response of an ideal low-pass filter (LPF) of bandwidth B to a unit step has a 10–90 percent rise time of τ = 0.44/B. We therefore estimate that, with a signalling interval T = 0.5/B ≈ τ, we should be able to reliably determine the level.
The message rate is then

1/T = 2B messages per second,

so the achievable information rate is

R = 2B log2 M = B log2(1 + 12S / (λ²N)).

This is equivalent to the Shannon-Hartley theorem, C = B log2(1 + S/N), when 12/λ² = 1, i.e. with λ = √12 ≈ 3.5.
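A quick numerical check of this equivalence (a sketch, not from the notes; the bandwidth and S/N values reuse the earlier example):

from math import log2, sqrt

B, snr = 3400, 1000
lam = sqrt(12)                                     # the value of lambda for exact agreement

heuristic = B * log2(1 + 12 * snr / lam**2)        # rate from the level-counting argument
shannon = B * log2(1 + snr)                        # Shannon-Hartley capacity

print(heuristic, shannon)                          # both are about 33,900 bit/s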
Lempel-Ziv Coding

Example of a Lempel-Ziv parsing into numbered phrases:

Position                   1    2    3    4    5    6    7    8
Numerical representation   ∅A   1B   2B   ∅B   2A   5B   4B   3A

Binary sequence: 00111010011001011100101101110
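A minimal sketch of this style of Lempel-Ziv (LZ78) parsing, not from the notes: each new phrase is the longest previously seen phrase extended by one symbol, and is represented by the pair (index of that phrase, new symbol). The input string below is a reconstruction consistent with the numerical representation above, with A and B as the source symbols:

def lz78_parse(sequence):
    dictionary = {"": 0}                  # phrase -> index; index 0 stands for "no prefix"
    output, phrase = [], ""
    for symbol in sequence:
        if phrase + symbol in dictionary:
            phrase += symbol              # keep extending the current phrase
        else:
            output.append((dictionary[phrase], symbol))
            dictionary[phrase + symbol] = len(dictionary)
            phrase = ""
    if phrase:                            # unfinished phrase at the end of the input
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output

print(lz78_parse("AABABBBABAABABBBABBA"))
# [(0,'A'), (1,'B'), (2,'B'), (0,'B'), (2,'A'), (5,'B'), (4,'B'), (3,'A')]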