
Optimal source code

We have shown that any codeword set satisfying the prefix condition must satisfy the Kraft inequality, and that the Kraft inequality is a sufficient condition for the existence of a codeword set with the specified set of codeword lengths.

We now consider the problem of finding the prefix code with the minimum expected length. This is equivalent to finding the set of lengths l_1, l_2, ..., l_m satisfying the Kraft inequality and whose expected length L = Σ p_i l_i is less than the expected length of any other prefix code.

This is a standard optimization problem: minimize L over all integers l_1, l_2, ..., l_m satisfying the Kraft inequality.
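As an illustration, the following is a minimal sketch (the example distribution and the length cap are assumptions, not from the notes) that solves this integer program by brute force for a small binary-code alphabet:

import itertools

def best_lengths(probs, r=2, max_len=6):
    # search all integer length vectors (l_1, ..., l_m) up to max_len and
    # keep the Kraft-feasible one with minimum expected length
    best = None
    for ls in itertools.product(range(1, max_len + 1), repeat=len(probs)):
        if sum(r ** -l for l in ls) <= 1.0:           # Kraft inequality
            L = sum(p * l for p, l in zip(probs, ls))
            if best is None or L < best[0]:
                best = (L, ls)
    return best

print(best_lengths([0.5, 0.3, 0.1, 0.1]))             # (1.7, (1, 2, 3, 3))

Brute force is of course exponential in the alphabet size; Huffman's algorithm, discussed later, finds the same optimum greedily.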
Bounds on Optimal Code Length

We now demonstrate a code that achieves an expected length L within 1 bit of the lower bound; that is, H(X) ≤ L < H(X) + 1.

We wish to minimize L = Σ p_i l_i subject to the constraint that l_1, l_2, ..., l_m are integers and Σ r^(-l_i) ≤ 1.

We proved that the optimal codeword lengths can be found by finding the probability distribution closest to the distribution of X in relative entropy, namely Q_i = r^(-l_i) / Σ_j r^(-l_j); ignoring the integer constraint, this gives l_i = log_r(1/p_i).

Since log_r(1/p_i) may not be an integer, we round it up to give integer word-length assignments,

l_i = ⌈log_r(1/p_i)⌉

These lengths satisfy the Kraft inequality, since

Σ r^(-⌈log_r(1/p_i)⌉) ≤ Σ r^(-log_r(1/p_i)) = Σ p_i = 1

This choice of codeword lengths satisfies

log_r(1/p_i) ≤ l_i < log_r(1/p_i) + 1

Multiplying by p_i and summing over i, we obtain H_r(X) ≤ L < H_r(X) + 1.

Since an optimal code can only be better than this code, we have the following theorem.

Theorem: Let l_1, l_2, ..., l_m be optimal codeword lengths for a source distribution p and an r-ary alphabet, and let L be the associated expected length of an optimal code (L = Σ p_i l_i).

Then H_r(X) ≤ L < H_r(X) + 1.
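The following minimal sketch (example distribution assumed, not from the notes) computes the lengths l_i = ⌈log_r(1/p_i)⌉ for r = 2 and prints the quantities needed to check the Kraft inequality and the bound of the theorem numerically:

import math

def shannon_lengths(probs, r=2):
    # l_i = ceil(log_r(1/p_i))
    return [math.ceil(math.log(1.0 / p, r)) for p in probs]

probs = [0.5, 0.3, 0.1, 0.1]                       # hypothetical source
lengths = shannon_lengths(probs)
kraft = sum(2.0 ** -l for l in lengths)            # should be <= 1
L = sum(p * l for p, l in zip(probs, lengths))     # expected length
H = sum(p * math.log2(1.0 / p) for p in probs)     # H_2(X)
print(lengths, kraft)
print(H, L, H + 1)                                 # H <= L < H + 1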


Shannon's First Theorem

For a single source symbol, H_r(S) ≤ L < H_r(S) + 1.

Applying this to the n-th extension, H_r(S^n) ≤ L_n < H_r(S^n) + 1; dividing by n and using H_r(S^n) = n H_r(S) for a zero-memory source,

H_r(S) ≤ L_n/n < H_r(S) + 1/n

In the limit, lim_{n→∞} L_n/n = H_r(S).

This is the noiseless coding theorem: by coding the n-th extension of S, one can make the average number of r-ary code symbols per source symbol as small as desired, but not smaller than the entropy of the source. L_n/n is this per-symbol average length, and it approximates H_r(S) more closely as n grows.
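A minimal sketch of this effect (the binary source below is an assumed example, not from the notes): code the n-th extension of a memoryless source with the lengths ⌈log_2(1/P)⌉ and watch L_n/n approach the entropy, staying within 1/n of it.

import itertools
import math

probs = [0.8, 0.2]                                  # hypothetical source S
H = sum(p * math.log2(1.0 / p) for p in probs)      # H_2(S)

for n in (1, 2, 4, 8):
    Ln = 0.0
    for block in itertools.product(probs, repeat=n):
        P = math.prod(block)                        # block probability in S^n
        Ln += P * math.ceil(math.log2(1.0 / P))
    print(n, Ln / n, H)                             # L_n/n tends to H(S)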

For a Markov source, the adjoint source obeys the bound on L, i.e., H_r(S̄) ≤ L.

Combining this with the previous results,

H_r(S) ≤ H_r(S̄) ≤ L, and H_r(S^n) ≤ H_r(S̄^n) ≤ L_n

Now select l_i as the unique integer satisfying log_r(1/P_i) ≤ l_i < log_r(1/P_i) + 1, so that (using H_r(S̄^n) = H_r(S̄) + (n − 1) H_r(S) for a first-order Markov source)

H_r(S) + (H_r(S̄) − H_r(S))/n ≤ L_n/n < H_r(S) + (H_r(S̄) − H_r(S) + 1)/n

Here also, L_n/n can be made as close to H_r(S) as desired by taking n large enough.
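As a numerical check, the sketch below (a hypothetical two-state Markov source; not from the notes) codes the n-th extension with Shannon-Fano lengths and compares L_n/n with the bounds just derived:

import itertools
import math

P = [[0.9, 0.1],                          # hypothetical transition matrix,
     [0.4, 0.6]]                          # P[i][j] = Pr(next = j | current = i)
pi = [P[1][0] / (P[0][1] + P[1][0])]      # stationary distribution (2 states)
pi.append(1.0 - pi[0])

H_adj = sum(p * math.log2(1.0 / p) for p in pi)                 # H(S_bar)
H = sum(pi[i] * P[i][j] * math.log2(1.0 / P[i][j])
        for i in range(2) for j in range(2))                    # H(S)

for n in (1, 2, 4, 8):
    Ln = 0.0
    for block in itertools.product(range(2), repeat=n):
        prob = pi[block[0]]
        for a, b in zip(block, block[1:]):
            prob *= P[a][b]                                     # block probability
        Ln += prob * math.ceil(math.log2(1.0 / prob))
    low = H + (H_adj - H) / n
    print(n, low, Ln / n, low + 1.0 / n)         # low <= L_n/n < low + 1/n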
Shannon-Fano coding scheme

The length assignment described above gives rise to the Shannon-Fano code. So the lengths are chosen as log_r(1/P_i) ≤ l_i < log_r(1/P_i) + 1.

The problem with this scheme is that it does


not consider the relative positions of symbols
with respect to one another.

It only considers absolute probabilities. For example, p_A = 1/2^10 and p_B = 1 − 1/2^10 gives l_A = 10 and l_B = 1.

Common sense suggests that l_A and l_B should both be 1.

However, in the long run the Shannon-Fano assignment would not increase the average length by much, as the check below shows. Nevertheless, we should do better.
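A quick numerical check of this example (binary code assumed): with p_A = 1/2^10 the Shannon-Fano rule indeed gives l_A = 10 and l_B = 1, yet the average length is only about 1.009 bits per symbol, barely above the 1.0 that the common-sense choice l_A = l_B = 1 would give.

import math

pA = 1.0 / 2 ** 10
pB = 1.0 - pA
lA = math.ceil(math.log2(1.0 / pA))     # 10
lB = math.ceil(math.log2(1.0 / pB))     # 1
L = pA * lA + pB * lB                   # about 1.009 bits/symbol
H = pA * math.log2(1.0 / pA) + pB * math.log2(1.0 / pB)   # about 0.011
print(lA, lB, L, H)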
Huffman coding scheme

The efficiency of a code is given by η = H_r(S)/L.

The redundancy of a code is given by 1 − η = (L − H_r(S))/L.

An optimal (shortest expected length) prefix


code for a given distribution can be constructed
by a simple algorithm discovered by Huffman.

First, sort the probabilities P1 ≥ P2 ≥ . . . ≥ Pq

Reduce the source to q − 1 symbols and reorder (sort again); repeat until the number of symbols is 2.

The reduced source S_j has a symbol s_α that corresponds to s_α0 and s_α1 in S_{j−1}.

Then P_α = P_α0 + P_α1.

At every level, if the two symbols having small-
est probabilities are collapsed into a compound
symbol and we go on constructing the coding
tree as a heap, we get the optimal code.

The overall average length then increases according to L_{j−1} = L_j + P_α0 + P_α1, since only these two symbols have their lengths increased.

So, the overhead due to expansion of a com-


pound symbol is minimum if the smallest prob-
ability symbols are chosen for the purpose.

Any other choice of sα0 and sα1 would not be


optimal.

This exhibits a greedy choice and therefore at


every stage of heap formation, we should sort
the symbol probabilities and collapse the two
symbols with smallest probabilities as we go up
towards the root of the coding tree.
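The construction above can be written directly with a binary heap. The sketch below (binary code, r = 2; the helper names and example distribution are illustrative, not from the notes) merges the two least probable symbols at every stage and prepends one bit to each codeword in the merged subtree:

import heapq

def huffman_code(probs):
    # probs: dict symbol -> probability; returns dict symbol -> codeword
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)        # smallest probability
        p1, _, code1 = heapq.heappop(heap)        # second smallest
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1                                # unique tie-breaker for the heap
    return heap[0][2]

probs = {"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.1}
code = huffman_code(probs)
L = sum(probs[s] * len(w) for s, w in code.items())
print(code, L)                                    # expected length 1.7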
Sensitivity of Huffman coding scheme

Suppose the probability assignment used for


code compression is different from what occurs
in real life.

p′_i = p_i + e_i, such that

(1/q) Σ_{i=1}^{q} e_i = 0 and var(e_i) = σ² = (1/q) Σ e_i²

Hence L′ = Σ l_i p′_i = L + Σ l_i e_i.

We should examine (1/q) Σ l_i e_i to get a better feel for how the length is affected by this noise.

Using Lagrange multipliers λ and µ, consider

J = (1/q) Σ l_i e_i − λ (1/q) Σ e_i − µ ((1/q) Σ e_i² − σ²)

The stationarity condition is ∂J/∂e_i = (1/q)(l_i − λ − 2µ e_i) = 0.

Summing over i (and using Σ e_i = 0) gives (1/q) Σ_i l_i − λ = 0, i.e., λ = (1/q) Σ_i l_i.

Multiplying ∂J/∂e_i = 0 by e_i and summing gives (1/q) Σ l_i e_i − (2µ/q) Σ e_i² = 0, so µ = Σ e_i l_i / (2qσ²).

Now multiplying ∂J/∂e_i = 0 by l_i, summing over i, and substituting λ and µ:

(1/q) Σ l_i² − λ (1/q) Σ l_i − 2µ (1/q) Σ e_i l_i = 0

Then we can write ((1/q) Σ e_i l_i)² = ((1/q) Σ l_i² − ((1/q) Σ l_i)²) σ², i.e., ((1/q) Σ e_i l_i)² = Var(l_i) · Var(e_i).

This implies that a high variance of codeword lengths makes the average length of the code more prone to variation with noise.
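The conclusion can also be seen empirically. A minimal sketch (the two length assignments and the noise model are assumptions chosen for illustration): perturb the probabilities with zero-mean noise and compare the spread of the resulting average length for a zero-variance (fixed-length) code and a high-variance one.

import random
import statistics

probs = [0.5, 0.3, 0.1, 0.1]
low_var_lengths = [2, 2, 2, 2]        # fixed-length code: Var(l_i) = 0
high_var_lengths = [1, 3, 4, 4]       # spread-out codeword lengths

def spread_of_avg_length(lengths, sigma=0.01, trials=2000):
    samples = []
    for _ in range(trials):
        e = [random.gauss(0.0, sigma) for _ in probs]
        shift = sum(e) / len(e)
        e = [x - shift for x in e]                     # enforce sum(e_i) = 0
        samples.append(sum((p + ei) * l
                           for p, ei, l in zip(probs, e, lengths)))
    return statistics.pstdev(samples)

print(spread_of_avg_length(low_var_lengths))     # essentially zero
print(spread_of_avg_length(high_var_lengths))    # noticeably larger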
Comparing some coding schemes

Symbol   Prob   Code-I   Code-II   Code-III
A        .5     0        00        0111
B        .3     10       01        011
C        .1     110      10        01
D        .1     111      11        0

Expected length of Code-I = 1.7 (uniquely decodable and instantaneous).

For Code-II it is 2.0 (fixed length, easy to decode).

For Code-III it is 3.2 (uniquely decodable but not instantaneous, and not efficient either).
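These figures are easy to reproduce; a minimal sketch:

probs = {"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.1}
codes = {
    "Code-I":   {"A": "0",    "B": "10",  "C": "110", "D": "111"},
    "Code-II":  {"A": "00",   "B": "01",  "C": "10",  "D": "11"},
    "Code-III": {"A": "0111", "B": "011", "C": "01",  "D": "0"},
}
for name, code in codes.items():
    L = sum(probs[s] * len(w) for s, w in code.items())
    print(name, L)                     # 1.7, 2.0, 3.2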
