
5CS3-01

5CS3
Information Theory & Coding
Unit – 2
Source Coding For Data Compaction
Contents
Prefix code
Huffman code
Shannon-Fano code
Lempel-Ziv coding
Channel capacity
Channel coding theorem
Shannon limit
Prefix Codes
A prefix code is a variable length code in which no codeword is a prefix
of another one

e.g. a = 0, b = 100, c = 101, d = 11

Such a code can be viewed as a binary trie: each codeword is the path of 0s and 1s
from the root to a leaf (a = 0, d = 11, b = 100, c = 101 in the example above), so no
symbol lies on the path to another.
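As a quick illustration, the prefix condition can be checked mechanically. The following Python sketch (the function name is_prefix_free is ours, not from the slides) tests the example code above:

def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another codeword."""
    words = sorted(codewords)       # after sorting, any prefix is immediately followed by an extension of it
    return all(not words[i + 1].startswith(words[i]) for i in range(len(words) - 1))

print(is_prefix_free(["0", "100", "101", "11"]))   # True  -> the example code above is a prefix code
print(is_prefix_free(["0", "01", "11"]))           # False -> "0" is a prefix of "01"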
Huffman code
The Huffman coding procedure finds the optimum (minimum average codeword length),
uniquely decodable, variable-length entropy code associated with a set of events,
given their probabilities of occurrence.

Xi X1 X2 X3 X4 X5 X6

P(Xi) 0.30 0.25 0.20 0.12 0.08 0.05


Huffman coding
Xi   P(Xi)
X1   0.30 (00)    0.30 (00)    0.30 (00)    0.45 (1)     0.55 (0)
X2   0.25 (01)    0.25 (01)    0.25 (01)    0.30 (00)    0.45 (1)
X3   0.20 (11)    0.20 (11)    0.25 (10)    0.25 (01)
X4   0.12 (101)   0.13 (100)   0.20 (11)
X5   0.08 (1000)  0.12 (101)
X6   0.05 (1001)

(Each column shows the probabilities after a merging step; the codeword assigned at that stage is given in brackets.)
Xi   Probability   Code    Codeword length
X1   0.30          00      2
X2   0.25          01      2
X3   0.20          11      2
X4   0.12          101     3
X5   0.08          1000    4
X6   0.05          1001    4

Length of Information (average codeword length):
L = 0.30*2 + 0.25*2 + 0.20*2 + 0.12*3 + 0.08*4 + 0.05*4
  = 0.6 + 0.5 + 0.4 + 0.36 + 0.32 + 0.2 = 2.38 bits/symbol

H(X) = 2.36 bits/symbol

Code Efficiency = H(X) / L = 2.36 / 2.38 = 0.991 = 99.1%

Redundancy = 1 – Code Efficiency = 1 – 0.991 = 0.009 = 0.9%
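These numbers can be reproduced with a short priority-queue implementation of the merging procedure. The sketch below is illustrative (the function name huffman_lengths is ours); it returns optimal codeword lengths, which is all that is needed for L, H(X) and the efficiency:

import heapq
from math import log2

def huffman_lengths(probs):
    """Return {symbol: optimal codeword length} for a dict of symbol probabilities."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree)
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)        # the two least probable subtrees ...
        p2, i, s2 = heapq.heappop(heap)
        for s in s1 + s2:                      # ... are merged, pushing their symbols one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, i, s1 + s2))
    return lengths

probs = {"X1": 0.30, "X2": 0.25, "X3": 0.20, "X4": 0.12, "X5": 0.08, "X6": 0.05}
lengths = huffman_lengths(probs)
L = sum(p * lengths[s] for s, p in probs.items())
H = -sum(p * log2(p) for p in probs.values())
print(lengths)                                     # {'X1': 2, 'X2': 2, 'X3': 2, 'X4': 3, 'X5': 4, 'X6': 4}
print(round(L, 2), round(H, 2), round(H / L, 3))   # 2.38  2.36  0.992 (≈ 99%)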


If the probabilities of five variables are given as below, calculate the Length of
Information (average codeword length) using Huffman coding.

P(Xi): 0.2, 0.4, 0.2, 0.1, 0.1


X(i)  P(Xi)
X1    0.4 (1)      0.4 (1)      0.4 (1)     0.6 (0)
X2    0.2 (01)     0.2 (01)     0.4 (00)    0.4 (1)
X3    0.2 (000)    0.2 (000)    0.2 (01)
X4    0.1 (0010)   0.2 (001)
X5    0.1 (0011)
Xi   Probability   Code    Code length
X1   0.4           1       1
X2   0.2           01      2
X3   0.2           000     3
X4   0.1           0010    4
X5   0.1           0011    4
Length of Information L = 0.4*1 + 0.2*2 + 0.2*3 + 0.1*4 + 0.1*4
                        = 0.4 + 0.4 + 0.6 + 0.4 + 0.4 = 2.2 bits/symbol
H(X) = 2.12 bits/symbol
Code Efficiency = H(X) / L = 2.12 / 2.2 = 0.964 = 96.4%
Redundancy = 1 – Code Efficiency = 1 – 0.964 = 0.036 = 3.6%
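Running the same illustrative huffman_lengths sketch from the previous example on this five-symbol source (assuming that snippet has already been run) gives the same average length; the individual codeword lengths may differ, since an optimal Huffman code is not unique:

probs = {"X1": 0.4, "X2": 0.2, "X3": 0.2, "X4": 0.1, "X5": 0.1}
lengths = huffman_lengths(probs)                  # reuses the sketch defined above
L = sum(p * lengths[s] for s, p in probs.items())
H = -sum(p * log2(p) for p in probs.values())
print(round(L, 2), round(H, 2), round(H / L, 2))  # 2.2  2.12  0.96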
Example: p(a) = 0.1, p(b) = 0.2, p(c) = 0.2, p(d) = 0.5

Variable  Probability
d         0.5 (0)     0.5 (0)     0.5 (0)
b         0.2 (11)    0.3 (10)    0.5 (1)
c         0.2 (100)   0.2 (11)
a         0.1 (101)

Variable  Probability   Code   Code Length
d         0.5           0      1
b         0.2           11     2
c         0.2           100    3
a         0.1           101    3

Length of Information L = 0.5*1 + 0.2*2 + 0.2*3 + 0.1*3
                        = 0.5 + 0.4 + 0.6 + 0.3 = 1.8 bits/symbol
Shannon-Fano code
The Shannon-Fano algorithm is an entropy encoding technique for lossless data
compression of multimedia.

Named after Claude Shannon and Robert Fano, it assigns a code to each symbol based
on its probability of occurrence.

It is a variable-length encoding scheme, that is, the codes assigned to the symbols
will be of varying length.
HOW DOES IT WORK?
The steps of the algorithm are as follows:

1. Create a list of probabilities or frequency counts for the given set of symbols so that the
relative frequency of occurrence of each symbol is known.

2. Sort the list of symbols in decreasing order of probability, the most probable ones to the
left and the least probable to the right.

3. Split the list into two parts, with the total probability of both parts being as close to
each other as possible.

4. Assign the value 0 to the left part and 1 to the right part.

5. Repeat steps 3 and 4 for each part, until all the symbols are split into individual
subgroups.
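A recursive Python sketch of these five steps is given below (the function name shannon_fano is ours, not part of the algorithm's definition); it reproduces the codes derived in the worked examples that follow:

def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs.  Returns {symbol: codeword}."""
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):             # step 3: find the most balanced split point
            running += group[i - 1][1]
            diff = abs(2 * running - total)        # |P(left part) - P(right part)|
            if diff < best_diff:
                best_i, best_diff = i, diff
        left, right = group[:best_i], group[best_i:]
        for s, _ in left:
            codes[s] += "0"                        # step 4: 0 for the left part ...
        for s, _ in right:
            codes[s] += "1"                        # ... and 1 for the right part
        split(left)                                # step 5: repeat on each part
        split(right)

    split(sorted(symbols, key=lambda sp: sp[1], reverse=True))   # step 2: sort by decreasing probability
    return codes

probs = [("X1", 0.30), ("X2", 0.25), ("X3", 0.20), ("X4", 0.12), ("X5", 0.08), ("X6", 0.05)]
print(shannon_fano(probs))
# {'X1': '00', 'X2': '01', 'X3': '10', 'X4': '110', 'X5': '1110', 'X6': '1111'}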
Example:
Let P(x) be the probability of occurrence of symbol x.

On arranging the symbols in decreasing order of probability:

P(D) + P(B) = 0.30 + 0.28 = 0.58

P(A) + P(C) + P(E) = 0.22 + 0.15 + 0.05 = 0.42

In the {D, B} group,

P(D) = 0.30 and P(B) = 0.28

This means that P(D) ≈ P(B), so divide {D, B} into {D} and {B} and assign 0 to D and 1 to B.

In the {A, C, E} group,
P(A) = 0.22 and P(C) + P(E) = 0.20
So the group is divided into {A} and {C, E},
and they are assigned the values 0 and 1 respectively.

In the {C, E} group,
P(C) = 0.15 and P(E) = 0.05
So divide them into {C} and {E} and assign 0 to {C} and 1 to {E}.

Note: The splitting is now stopped, as each symbol is separated into its own subgroup.
Variable      A      B      C      D      E
Probability   0.22   0.28   0.15   0.30   0.05
Code          10     01     110    00     111
Code length   2      2      3      2      3

L = 0.22*2 + 0.28*2 + 0.15*3 + 0.30*2 + 0.05*3 = 2.2 bits/symbol
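Applying the same illustrative shannon_fano sketch to this five-symbol source (assuming that snippet has already been run) reproduces the codes and the average length above:

probs = [("A", 0.22), ("B", 0.28), ("C", 0.15), ("D", 0.30), ("E", 0.05)]
codes = shannon_fano(probs)                 # reuses the sketch defined above
print(codes)                                # {'A': '10', 'B': '01', 'C': '110', 'D': '00', 'E': '111'}
L = sum(p * len(codes[s]) for s, p in probs)
print(round(L, 2))                          # 2.2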


Shannon-Fano Encoding
Xi   P(Xi)   Split 1   Split 2   Split 3   Split 4   Code
X1   0.30    0         0                             00
X2   0.25    0         1                             01
X3   0.20    1         0                             10
X4   0.12    1         1         0                   110
X5   0.08    1         1         1         0         1110
X6   0.05    1         1         1         1         1111
Shannon-Fano Encoding
H(X) = 2.36 bit/symbol
L = 2.38 bit/symbol
Efficiency = 0.99
Channel Capacity
Channel capacity, in electrical engineering, computer science and information theory,
is the tight upper bound on the rate at which information can be reliably transmitted
over a communication channel.

The basic mathematical model for a communication system is the familiar block diagram:
information source → transmitter (encoder) → channel (with noise) → receiver (decoder) → destination.
Channel Capacity
According to the channel capacity equation,
C = B log2(1 + S/N),
where
C is the capacity,
B is the bandwidth of the channel,
S is the signal power,
N is the noise power.
When B → ∞ (read: B tends to infinity), the capacity does not grow without bound;
it saturates to C∞ ≈ 1.44 S/η, where η is the noise power spectral density (so that N = ηB).
Channel Capacity
S/N is the signal-to-noise power ratio (SNR). SNR is generally measured in dB using
the formula:

(SNR) dB = 10 log10(Signal Power / Noise Power)
Example: Consider a voice-grade line for which B = 3100 Hz and SNR = 30 dB.
Calculate the channel capacity.

Given:
SNR = 30 dB
B = 3100 Hz

(SNR) dB = 10 log10(S/N)
30 = 10 log10(S/N)
(S/N) = 1000
C = B log2(1 + S/N)
= 3100 * log2(1 + 1000)
= 3100 * log2(1001)
C = 30,894 bps = 30.894 kbps
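The calculation is easy to verify in a few lines of Python (the helper name shannon_capacity is ours):

from math import log2

def shannon_capacity(bandwidth_hz, snr_db):
    """Channel capacity C = B log2(1 + S/N), with the SNR given in dB."""
    snr_linear = 10 ** (snr_db / 10)        # undo (SNR) dB = 10 log10(S/N)
    return bandwidth_hz * log2(1 + snr_linear)

print(shannon_capacity(3100, 30))           # ≈ 30.9e3 bit/s ≈ 30.9 kbps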
Shannon-Hartley Theorem
The channel capacity C, the tightest upper bound on the information rate (excluding
error-correcting codes) of arbitrarily low bit-error-rate data that can be sent with a
given average signal power S through an additive white Gaussian noise channel of
power N, is:

C = B log2(1 + S/N)

where

C is the channel capacity in bits per second


B is the bandwidth of the channel in hertz
S is the total received signal power over bandwidth, in watts
N is the total noise or interference power over bandwidth, in watts
S/N is the signal-to-noise ratio (SNR) expressed as a linear power ratio (not
as logarithmic decibels).
Shannon-Hartley Theorem
Consider the operation of a modem on an ordinary telephone line. The SNR is usually
about 1000. The bandwidth is 3.4 kHz.

Therefore:
C = 3400 * log2(1 + 1000)
= (3400)(9.97)
≈34 kbps
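The same illustrative shannon_capacity helper from the earlier sketch reproduces this figure:

print(shannon_capacity(3400, 30) / 1000)    # ≈ 33.9 kbps, i.e. roughly the 34 kbps quoted above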
Shannon-Hartley Theorem
We cannot prove the theorem here, but we can partially justify it as follows:
suppose the received signal is accompanied by noise with an RMS voltage of σ, and
that the signal has been quantised with levels separated by a = λσ.
If λ is chosen sufficiently large, we may expect to be able to recognize the signal
level with an acceptable probability of error.
Suppose further that each message is to be represented by one voltage level.
If there are to be M possible messages, then there must be M levels.
The average signal power is then
S = (M² − 1)a²/12 = (M² − 1)λ²σ²/12

so the number of distinguishable levels is

M = √(1 + 12S/(λ²N))

where N = σ² is the noise power. If each message is equally likely, then each carries
an equal amount of information, log2 M = ½ log2(1 + 12S/(λ²N)) bits.
Shannon-Hartley Theorem
To find the information rate, we need to estimate how many
messages can be carried per unit time by a signal on the channel.
 Since the discussion is heuristic, we note that the response of an
ideal LPF of bandwidth B to a unit step has a 10–90 percent rise
time of τ = 0.44/B.
We estimate therefore that with T = 0.5/B ≈ τ we should be able
to reliably estimate the level.
The message rate is then
r = 1/T = 2B messages per second,

so the information rate is

R = 2B log2 M = B log2(1 + 12S/(λ²N)) bits per second.

This is equivalent to the Shannon-Hartley theorem with λ ≈ 3.5.

 Note that this discussion has only estimated the rate at which information can be
transmitted with reasonably small error; the Shannon-Hartley theorem indicates that,
with sufficiently advanced coding techniques, transmission at channel capacity can
occur with arbitrarily small error.
Shannon-Hartley Theorem
The expression for the channel capacity of the Gaussian channel makes intuitive sense:
 As the bandwidth of the channel increases, it is possible to make faster changes in
the information signal, thereby increasing the information rate.
 As S/N increases, one can increase the information rate while still preventing errors
due to noise.
 For no noise, S/N → ∞ and an infinite information rate is possible irrespective of
bandwidth.
Shannon-Hartley Theorem
Thus we may trade off bandwidth for SNR. For example, if S/N = 7 and B = 4 kHz,
then the channel capacity is C = 12 × 10³ bits/s.
If the SNR increases to S/N = 15 and B is decreased to 3 kHz, the channel capacity
remains the same.
However, as B → ∞, the channel capacity does not become infinite since, with an
increase in bandwidth, the noise power also increases.
If the noise power spectral density is η/2, then the total noise power is N = ηB, so
the Shannon-Hartley law becomes
C = B log2(1 + S/(ηB))

As B → ∞, this approaches the limiting value

C∞ = (S/η) log2 e ≈ 1.44 S/η

Shannon-Hartley Theorem
This gives the maximum information transmission rate possible for a system of given
power but no bandwidth limitations.

The power spectral density can be specified in terms of an equivalent noise
temperature by η = kTeq.
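A quick numerical check of this limit, using arbitrary example values for S and η, shows the capacity flattening out near 1.44 S/η as B grows:

from math import log2

S = 1e-3      # received signal power in watts (arbitrary example value)
eta = 1e-9    # noise power spectral density in W/Hz (arbitrary example value)

for B in (1e3, 1e5, 1e7, 1e9):
    C = B * log2(1 + S / (eta * B))         # capacity with total noise power N = eta * B
    print(f"B = {B:>10.0f} Hz  ->  C = {C:,.0f} bit/s")

print(f"1.44 * S / eta = {1.44 * S / eta:,.0f} bit/s")   # the limiting value as B -> infinity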

There are literally dozens of coding techniques; entire textbooks are devoted to the
subject, and it is an active research area.

Obviously, all of them obey the Shannon-Hartley theorem.
Some general characteristics of the Gaussian channel can be demonstrated. Suppose we
are sending binary digits at a transmission rate equal to the channel capacity: R = C.
If the average signal power is S, then the average energy per bit is Eb = S/C, since
the bit duration is 1/C seconds.
This relationship is as follows:

C/B = log2(1 + Eb C/(η B)),   i.e.   Eb/η = (2^(C/B) − 1)/(C/B)
Shannon limit
As C/B → 0, Eb/η → ln 2 ≈ 0.693, so the asymptote is at Eb/η = −1.59 dB; below this
value there is no error-free communication at any information rate. This is called
the Shannon limit.
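A short numerical check of that limit (the helper name eb_over_eta_db is ours):

from math import log, log10

def eb_over_eta_db(r):
    """Required Eb/eta in dB when transmitting at R = C with spectral efficiency r = C/B."""
    return 10 * log10((2 ** r - 1) / r)

for r in (2.0, 1.0, 0.1, 0.001):
    print(r, round(eb_over_eta_db(r), 2))     # 1.76, 0.0, -1.44, -1.59 dB
print(round(10 * log10(log(2)), 2))           # ln 2 = 0.693 -> -1.59 dB, the Shannon limit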
Lempel-Ziv Coding
The Lempel-Ziv algorithm is a variable-to-fixed length code.
Basically, there are two versions of the algorithm: LZ77 and LZ78 are the two lossless
data compression algorithms published by Abraham Lempel and Jacob Ziv in 1977 and 1978
respectively. They are also known as LZ1 and LZ2.
These two algorithms form the basis for many variations
including LZW, LZSS, LZMA and others.
Besides their academic influence, these algorithms formed the
basis of several ubiquitous compression schemes, including GIF
and the DEFLATE algorithm used in PNG.
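A minimal Python sketch of LZ78-style phrase parsing, as used in the examples below (the function name lz78_parse is ours; index 0 plays the role of the empty phrase Φ):

def lz78_parse(data):
    """Split data into phrases; each phrase = (position of longest previously seen phrase, new symbol)."""
    dictionary = {"": 0}            # phrase -> dictionary position (0 is the empty phrase)
    pairs, phrase = [], ""
    for symbol in data:
        if phrase + symbol in dictionary:
            phrase += symbol        # keep extending a phrase that has been seen before
        else:
            pairs.append((dictionary[phrase], symbol))
            dictionary[phrase + symbol] = len(dictionary)    # new phrase gets the next position
            phrase = ""
    if phrase:                      # data ended inside an already-known phrase
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs

print(lz78_parse("AABABBBABAABABBBABBABBA"))
# [(0, 'A'), (1, 'B'), (2, 'B'), (0, 'B'), (2, 'A'), (5, 'B'), (4, 'B'), (3, 'A'), (7, 'A')]
# i.e. ΦA, 1B, 2B, ΦB, 2A, 5B, 4B, 3A and finally 7A for the ninth phrase BBA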
Lempel-Ziv Coding
Encode the sequence: AABABBBABAABABBBABBABBA

Parsed phrases:  A   AB   ABB   B   ABA   ABAB   BB   ABBA   BBA
Position:        1   2    3     4   5     6      7    8      9

Position                   1      2      3      4      5      6      7      8
Subsequence                A      AB     ABB    B      ABA    ABAB   BB     ABBA
Numerical representation   ΦA     1B     2B     ΦB     2A     5B     4B     3A
Binary code                0000   0011   0101   0001   0100   1011   1001   0110

(The pointer to the previous phrase is encoded in 3 bits and the innovation symbol in 1 bit,
with A = 0 and B = 1; Φ denotes the empty phrase, pointer 000.)

Encoded stream: 0000 0011 0101 0001 0100 1011 1001 0110
Lempel-Ziv Coding
Parsed phrases: 1, 0, 10, 11, 01, 101, 010, 1010

Dictionary location   001     010     011     100     101     110     111     1000
Content               1       0       10      11      01      101     010     1010
Code                  000 1   000 0   001 0   001 1   010 1   011 1   101 0   110 0
