Unit 2
Source Coding
Where s_k is the output of the discrete memoryless source and b_k is the output of the source encoder, represented as a sequence of 0s and 1s.
The encoded sequence is constructed so that it can be conveniently decoded at the receiver.
Let us assume that the source has an alphabet of M different symbols and that the kth symbol s_k occurs with probability p_k, where k = 1, 2, …, M.
Let the binary code word assigned to symbol s_k by the encoder have length l_k, measured in bits. Hence, we define the average code-word length L of the source encoder as

L = \sum_{k=1}^{M} p_k l_k
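For example, if we assume four equiprobable symbols (p_k = 1/4 for each) and each symbol is assigned a 2-bit code word, then

L = \sum_{k=1}^{4} \frac{1}{4} \times 2 = 2 bits/symbol.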
With L ≥ Lmin we will have the coding efficiency η = Lmin/L ≤ 1. (Source coding aims to give a compact representation: a symbol that would otherwise take 7 bits should, where possible, be represented with only 1 or 2 bits. L should therefore be kept as small as possible, but it must still satisfy L ≥ Lmin; that is the condition for a good or optimal code. The value of Lmin is given by Shannon's first theorem.)
However, the source encoder is considered efficient when η = 1. For this, the value of Lmin has to be determined.
Let us refer to the definition: "Given a discrete memoryless source of entropy H(X), the average code-word length L for any source encoding is bounded as L ≥ H(X)" (Shannon's first theorem).
In simpler words, take the Morse code for the word QUEUE as an example: --.- ..- . ..- . The code word always contains at least as many symbols as the source word it represents; that is, the number of symbols in the code word is greater than or equal to the number of letters in the source word QUEUE.
Hence, with Lmin = H(X), the efficiency of the source encoder in terms of the entropy H(X) may be written as

\eta = \frac{L_{min}}{L} = \frac{H(X)}{L}
Source Coding theorem: noiseless coding theorem / Shannon’s first theorem.
L ≥ H(X)

Proof. We first show that L - H(X) ≥ 0:

L - H(X) = \sum_{k=1}^{M} p_k l_k - \sum_{k=1}^{M} p_k \log_2 \frac{1}{p_k}

Since \log_2 2 = 1, rewriting:

L - H(X) = \sum_{k=1}^{M} p_k l_k \log_2 2 - \sum_{k=1}^{M} p_k \log_2 \frac{1}{p_k}
         = \sum_{k=1}^{M} p_k \left( \log_2 2^{l_k} - \log_2 \frac{1}{p_k} \right)
         = \sum_{k=1}^{M} p_k \log_2 \left( p_k 2^{l_k} \right)

Using the inequality \log_e x \le x - 1 (equivalently, \log_e x \ge 1 - 1/x) with x = p_k 2^{l_k}, and converting to base 2 through the factor 1/\log_e 2:

\sum_{k=1}^{M} p_k \log_2 \left( p_k 2^{l_k} \right) \ge \frac{1}{\log_e 2} \sum_{k=1}^{M} p_k \left( 1 - \frac{1}{p_k 2^{l_k}} \right)

The left-hand side is nothing but L - H(X), so

L - H(X) \ge \frac{1}{\log_e 2} \left( \sum_{k=1}^{M} p_k - \sum_{k=1}^{M} 2^{-l_k} \right)

Since \sum_{k=1}^{M} p_k = 1 (always) and, by the Kraft inequality, \sum_{k=1}^{M} 2^{-l_k} \le 1, the term in brackets is non-negative. Hence

L - H(X) \ge 0, i.e. L \ge H(X).

For the upper bound, choose the integer code-word lengths l_k such that

2^{-l_k} \le p_k        ___ (A)
p_k < 2^{-l_k + 1}      ___ (B)

Condition (A) guarantees that the Kraft inequality is satisfied, so such a code exists. Taking logarithms of (B) gives l_k < 1 + \log_2 \frac{1}{p_k}, and therefore

\sum_{k=1}^{M} p_k l_k < \sum_{k=1}^{M} p_k \left( 1 + \log_2 \frac{1}{p_k} \right)

\sum_{k=1}^{M} p_k l_k < \sum_{k=1}^{M} p_k + \sum_{k=1}^{M} p_k \log_2 \frac{1}{p_k} = 1 + H(X)

Combining the two results, H(X) \le L < H(X) + 1.
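This bound can be checked numerically. The short Python sketch below is an illustration only (the probability set and the helper names shannon_lengths and entropy are just examples): it assigns each symbol the length l_k = ⌈log2(1/p_k)⌉, which satisfies conditions (A) and (B), and verifies both the Kraft inequality and H(X) ≤ L < H(X) + 1.

```python
import math

def shannon_lengths(probs):
    """Code-word lengths l_k = ceil(log2(1/p_k)); these satisfy 2^-l_k <= p_k < 2^-(l_k - 1)."""
    return [math.ceil(math.log2(1.0 / p)) for p in probs]

def entropy(probs):
    """H(X) = sum of p_k * log2(1/p_k), in bits per symbol."""
    return sum(p * math.log2(1.0 / p) for p in probs)

probs = [0.30, 0.28, 0.22, 0.15, 0.05]              # example distribution
lengths = shannon_lengths(probs)
L_bar = sum(p * l for p, l in zip(probs, lengths))  # average code-word length
H = entropy(probs)

print("Kraft sum:", sum(2.0 ** -l for l in lengths))   # <= 1, so a prefix code exists
print("H(X)     :", round(H, 3))
print("L        :", round(L_bar, 3))
print("H <= L < H + 1 :", H <= L_bar < H + 1)          # True
```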
Prefix Code:
A prefix code is a type of code system distinguished by its possession of the "prefix property",
which requires that there is no whole code word in the system that is a prefix (initial segment) of
any other code word in the system.
For example, a code with code words {9, 55} has the prefix property; a code consisting of
{9, 5, 59, 55} does not, because "5" is a prefix of "59" and also of "55". A prefix code is
a uniquely decodable code: given a complete and accurate sequence, a receiver can identify each
word without requiring a special marker between words. However, there are uniquely decodable
codes that are not prefix codes; for instance, the reverse of a prefix code is still uniquely
decodable (it is a suffix code), but it is not necessarily a prefix code.
Examples of prefix codes include:
variable-length Huffman codes
country calling codes
Chen–Ho encoding
the country and publisher parts of ISBNs
the Secondary Synchronization Codes used in the UMTS W-CDMA 3G wireless standard
VCR Plus+ codes
Unicode Transformation Format, in particular the UTF-8 system for encoding Unicode characters, which is both a prefix-free code and a self-synchronizing code
variable-length quantity
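As a quick illustration of the definition above, the following Python sketch (illustrative only, not part of the original notes; the helper is_prefix_code is a name chosen here) tests whether a set of code words has the prefix property, using the two example code sets mentioned earlier.

```python
def is_prefix_code(codewords):
    """Return True if no code word is a prefix of another (the prefix property)."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_code(["9", "55"]))             # True: has the prefix property
print(is_prefix_code(["9", "5", "59", "55"]))  # False: "5" is a prefix of "59" and "55"
```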
Worked example:

Tree: [diagram not reproduced]

Solution: Use the average code-word length

L = \sum_{k=1}^{M} p_k l_k

the Nyquist rate r = 1000 samples/sec, and the entropy

H(S) = \sum_{i} p_i \log_2 \frac{1}{p_i}

Tree: [diagram not reproduced]

In the {A, C, E} group:
1. In the {C, E} group, P(C) = 0.15 and P(E) = 0.05, so divide them into {C} and {E} and assign 0 to {C} and 1 to {E}.

Step: [remaining splitting steps and tree diagram not reproduced]
L = 0.22(2) + 0.28(2) + 0.15(3) + 0.30(2) + 0.05(3)
L = 2.2 bits/symbol
r = Nyquist rate = 1000 samples/sec

Entropy:
H(S) = \sum_{i=1}^{5} p_i \log_2 \frac{1}{p_i} = 0.22\log_2\frac{1}{0.22} + 0.28\log_2\frac{1}{0.28} + 0.15\log_2\frac{1}{0.15} + 0.30\log_2\frac{1}{0.30} + 0.05\log_2\frac{1}{0.05}
H(S) = 2.142 bits/symbol

Information rate:
R = H(S) × r = 2.142 × 1000
R = 2142 bits/sec

Code efficiency (taking Lmin = H(S)):
η = H(S)/L = 2.142/2.2 = 0.974 = 97.4%

Redundancy = 1 - code efficiency = 1 - 0.974 = 0.026
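The same quantities can be reproduced with a few lines of Python. This is an illustrative sketch only, reusing the probabilities and code-word lengths from the example above:

```python
import math

probs   = [0.22, 0.28, 0.15, 0.30, 0.05]   # symbol probabilities
lengths = [2, 2, 3, 2, 3]                  # code-word lengths from the tree
r       = 1000                             # Nyquist rate, samples/sec

L_bar = sum(p * l for p, l in zip(probs, lengths))   # average code-word length
H     = sum(p * math.log2(1.0 / p) for p in probs)   # entropy, bits/symbol
R     = H * r                                        # information rate, bits/sec
eta   = H / L_bar                                    # code efficiency

print(f"L          = {L_bar:.3f} bits/symbol")   # ~2.2
print(f"H(S)       = {H:.3f} bits/symbol")       # ~2.142
print(f"R          = {R:.0f} bits/sec")          # ~2142
print(f"efficiency = {eta:.3f}")                 # ~0.974
print(f"redundancy = {1 - eta:.3f}")             # ~0.026
```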
Huffman coding algorithm:
1. The source symbols are listed in order of decreasing probability.
2. The two source symbols of lowest probability are assigned a 0 and a 1. This part of the step is referred to as the splitting stage.
3. These two source symbols are regarded as being combined into a new source symbol with probability equal to the sum of the two original probabilities. The probability of the new symbol is placed in the list in accordance with its value.
4. The procedure is repeated until we are left with a final list of source statistics of only two, for which a 0 and a 1 are assigned.
Note: when the probability of the combined symbol turns out to be equal to another probability in the list, it may be placed as high as possible or as low as possible in the list. The code word for each symbol is then read off from the last splitting stage back to the first, i.e. starting with the MSB.
Symbol    Probability    Codeword
S0        0.4            1
S1        0.2            01
S2        0.2            000
S3        0.1            0010
S4        0.1            0011
This clearly reflects that the Huffman encoding process is not unique: different placement choices during the combining stages give different sets of code words. For the code above, L = 0.4(1) + 0.2(2) + 0.2(3) + 0.1(4) + 0.1(4) = 2.2 bits/symbol and H(S) ≈ 2.122 bits/symbol, so the code efficiency is about 0.965 and the redundancy about 0.035.
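A minimal Python sketch of the procedure described above is given below (illustrative only; the helper name huffman_code is chosen here). It builds a binary Huffman code with a priority queue for the five-symbol example. The exact code words may differ from the table above, since the Huffman procedure is not unique, but the average length comes out the same.

```python
import heapq

def huffman_code(probs):
    """Build a binary Huffman code; probs maps symbol -> probability."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial code word})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # the two lowest-probability entries
        p2, _, c2 = heapq.heappop(heap)
        # Prepend 0 to one group and 1 to the other, then merge them.
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"S0": 0.4, "S1": 0.2, "S2": 0.2, "S3": 0.1, "S4": 0.1}
code = huffman_code(probs)
L_bar = sum(probs[s] * len(w) for s, w in code.items())
print(code)                        # a valid prefix code; code words may differ from the table
print("average length:", L_bar)    # ~2.2 bits/symbol
```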
Extended Huffman Coding / Adaptive Huffman Coding:
In applications where the alphabet size is large, pmax (the probability of the most likely symbol) is generally quite small, and the amount of deviation from the entropy, especially in terms of a percentage of the rate, is quite small.
However, in cases where the alphabet is small and the probability of occurrence of the different letters is skewed, the value of pmax can be quite large and the Huffman code can become rather inefficient when compared to the entropy.
To overcome this inefficiency we use extended Huffman coding, in which code words are assigned to blocks of source symbols rather than to individual symbols. This can be illustrated with the help of the following example:
Consider a source that puts out iid letters from the alphabet A = {a1, a2, a3} with the
probability model P(a1) = 0.8, P(a2) = 0.02, and P(a3) = 0.18. The entropy for this
source is 0.816 bits/symbol. A Huffman code for this source is shown in Table 1 below.
TABLE 1: The Huffman code.
Letter Codeword
a1 0
a2 11
a3 10
The average length for this code is 1.2 bits/symbol. The difference between the average
code length and the entropy, or the redundancy, for this code is 0.384 bits/symbol,
which is 47% of the entropy. This means that to code this sequence we would need
47% more bits than the minimum required.
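These figures are easy to verify. The following Python sketch (illustrative only, using the probabilities and the code-word lengths from Table 1) recomputes the entropy, the average code length, and the redundancy as a percentage of the entropy:

```python
import math

p = {"a1": 0.8, "a2": 0.02, "a3": 0.18}
lengths = {"a1": 1, "a2": 2, "a3": 2}        # code-word lengths from Table 1

H = sum(q * math.log2(1 / q) for q in p.values())     # entropy
L_bar = sum(p[s] * lengths[s] for s in p)             # average code length
redundancy = L_bar - H

print(f"H          = {H:.3f} bits/symbol")            # ~0.816
print(f"L          = {L_bar:.3f} bits/symbol")        # 1.2
print(f"redundancy = {redundancy:.3f} bits/symbol")   # ~0.384
print(f"as % of H  = {100 * redundancy / H:.0f}%")    # ~47%
```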
Now for the source described in the above example, instead of generating a codeword
for every symbol, we will generate a codeword for every two symbols. If we look at the
source sequence two at a time, the number of possible symbol pairs, or the size of the extended alphabet, is 3² = 9. The extended alphabet, probability model, and Huffman code for this example are shown in Table 2 below.
TABLE 2: The extended alphabet and corresponding Huffman code.
Letter    Probability    Codeword
a1a1      0.64           0
a1a3      0.144          11