Source Coding
EC 303
Recap
• Uncertainty = Information
• Uncertainty is measured in terms of probability of occurrence (determined
experimentally using a sufficiently large number of trials).
• Information is inversely related to the probability of occurrence: the less
likely an outcome, the more information its occurrence conveys.
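• In symbols (a standard way of writing the last point; the formula is not
given explicitly on the slide), the information conveyed by an outcome x_i of
probability p_i is

I(x_i) = \log_2\frac{1}{p_i} = -\log_2 p_i \quad \text{bits}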
Uncertainty, Information and Entropy
Entropy
• The expected information may be quantified over a set of possible outcomes,
or mutually exclusive events.
• Specifically, if event i occurs with probability p_i, 1 <= i <= N, out of a
set of N mutually exclusive events, then the average or expected information
is given by
H(p_1, p_2, \ldots, p_N) = \sum_{i=1}^{N} p_i \log_2\!\left(\frac{1}{p_i}\right)
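• A minimal sketch of this computation in Python (the function name
entropy_bits and the example distributions are illustrative, not from the
slides):

import math

def entropy_bits(probs):
    # Expected information H = sum of p_i * log2(1/p_i), in bits.
    # Outcomes with p_i == 0 contribute nothing (p*log2(1/p) -> 0).
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy_bits([1/6] * 6))    # fair die: log2(6) = 2.585 bits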
Source Coding
• Source coding is taking a set of messages that need to be sent from a sender
and encoding them in a way that is efficient. The notions of information and
entropy will be fundamentally important in this effort.
• E.g. ASCII coding, the .wav format for audio, and digital images storing
three color values using 8 bits each. All these encodings involve a sequence of
fixed-length symbols, each of which can be easily manipulated independently.
For example, to find the 42nd character in a file, one just looks at the 42nd
byte and interprets those 8 bits as an ASCII character (illustrated in the
snippet after this list).
• Do all the characters (ASCII, 128 characters) occur with the same probability?
• Should they all be encoded using the same number of bits?
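• A small illustration of this byte-offset arithmetic for fixed-length (8-bit
ASCII) text; the file name is hypothetical:

# With a fixed-length code the k-th symbol starts at a known offset,
# so no earlier symbols need to be decoded.
with open("message.txt", "rb") as f:   # hypothetical ASCII text file
    f.seek(41)                         # the 42nd character sits at byte offset 41
    ch = f.read(1).decode("ascii")
print(ch)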
Compression
• Specifically, the entropy, defined earlier, tells us
the expected amount of information in a
message, when the message is drawn from a set
of possible messages, each occurring with some
probability.
Lossy Data Compression
• Shannon also developed the theory of lossy data
compression. This is better known as rate-distortion theory. In
lossy data compression, the decompressed data does not
have to be exactly the same as the original data.
Rate-Distortion Theory
– We trade off rate (number of bits per symbol) against distortion; this
trade-off is represented by the rate-distortion function R(D).
Example
• Consider a fair die (each outcome is equally likely). Compute the entropy
of this source.
• Now suppose the die is loaded such that outcomes 5 and 6 are more likely than
the others, with p(X=5) = 1/2 and p(X=6) = 1/3 (the remaining four outcomes
are equally likely). Compute the entropy in this scenario. What can you
conclude from these two results?
Example
• The set of outcomes for the die (the alphabet of the DMS behind it) is
{1, 2, 3, 4, 5, 6}. In the first case, each outcome has the same probability,
1/6.
• Entropy = 6 × (1/6) × log2(6) = log2(6) = 2.585 bits
• Loaded die:
• Entropy:
H = \frac{1}{2}\log_2(2) + \frac{1}{3}\log_2(3) + 4\cdot\frac{1}{24}\log_2(24)
H = 1.7925
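• A quick numerical check of both values (a sketch; the variable names are
illustrative):

import math

fair = [1/6] * 6
loaded = [1/2, 1/3] + [1/24] * 4   # p(5) = 1/2, p(6) = 1/3, remaining outcomes 1/24 each

H = lambda probs: sum(p * math.log2(1 / p) for p in probs)
print(H(fair))     # 2.585 bits
print(H(loaded))   # 1.7925 bits -- the loaded die is the less uncertain source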
Source Coding Theorem
• The source coding theorem establishes a fundamental limit on the rate at
which the output of an information source can be compressed without causing a
large error probability at the receiver. This is one of the fundamental
theorems of information theory.
Source Coding Theorem
• The theorem, first proved by Shannon, only gives a theoretical bound on the
performance of encoders. It does not provide any algorithm for the design of
such optimum codes.
Huffman Coding
• In Huffman coding, fixed length blocks of the source output are mapped to
variable length binary blocks. The idea here is to map the more frequently
occurring fixed length sequences to shorter binary sequences and the less
frequently occurring ones to longer binary sequences.
Example
• Consider the following case where an alphabet of size 5 is
mapped using variable length coding.
Letter   Probability   Code 1   Code 2   Code 3   Code 4
a1       1/2           1        1        0        00
a2       1/4           01       10       10       01
Example
• The code must be uniquely decodable.
• The code should be instantaneous.
• Prefix condition: a code is said to satisfy the prefix condition if no code
word is a prefix of another code word.
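• A small sketch of checking the prefix condition (the function name is
illustrative):

def is_prefix_free(codewords):
    # True if no codeword is a prefix of another (prefix condition).
    for i, a in enumerate(codewords):
        for j, b in enumerate(codewords):
            if i != j and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))   # True  -> instantaneous code
print(is_prefix_free(["1", "01", "00", "000"]))    # False -> "00" is a prefix of "000"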
Compression
• The average length achieved by a code is computed as the sum of the binary
codeword lengths n_i weighted by the probabilities of the corresponding source
symbols.
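• Written out (standard notation, consistent with the entropy expression
earlier; the efficiency and compression-ratio figures quoted on later slides
follow these definitions):

\bar{n} = \sum_{i=1}^{N} p_i n_i, \qquad
\text{efficiency } \eta = \frac{H(X)}{\bar{n}}, \qquad
\text{compression ratio} = \frac{\text{fixed-length bits per symbol}}{\bar{n}}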
Huffman Encoding Algorithm
• Sort the source outputs in the decreasing order of their probability.
• Merge the two least probable outputs into a single output whose
probability is the sum of the corresponding probabilities.
• Continue this till the number of remaining outputs is equal to two.
• Arbitrarily assign 0 and 1 as code words for the two remaining outputs.
• If an output is the result of the merger of two outputs, append 0 and 1 to
the current code word to obtain the next code words. Repeat this step till
there are no mergers (a sketch of the procedure follows below).
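• A minimal sketch of this procedure in Python (the function name huffman_code
is illustrative; ties are broken arbitrarily, so the "bubbling" refinement on
the next slide is not included):

import heapq

def huffman_code(probs):
    # probs: dict symbol -> probability. Returns dict symbol -> binary codeword.
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # least probable group
        p1, _, c1 = heapq.heappop(heap)   # second least probable group
        # Prepend 0 to one group's codewords and 1 to the other's, then merge.
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        count += 1
        heapq.heappush(heap, (p0 + p1, count, merged))
    return heap[0][2]

code = huffman_code({"a1": 1/3, "a2": 1/4, "a3": 1/6, "a4": 1/8, "a5": 1/8})
print(code)   # codeword lengths 2, 2, 2, 3, 3 -> average length 2.25 bits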
Huffman Encoding
• The two entries with the lowest relative frequency are merged to form a new
branch with their combined probability. After every merger, the new branch and
the remaining branches are reordered so that the reduced table preserves the
descending order of probability of occurrence.
• During this reordering, the new branch rises until it can rise no more. This
bubbling results in a code with the lowest code-length variance.
• If bubbling is not done, the average code length stays the same, but the
code-length variance is higher.
Example
• Determine the Huffman code for a source with alphabet A = {a1, a2, a3, a4,
a5} with probabilities 1/3, 1/4, 1/6, 1/8 and 1/8.
• Determine the Huffman code for the outcomes of a fair die, and for the loaded
die of the earlier example (p(X=5) = 1/2, p(X=6) = 1/3).
• How does the length of the Huffman code compare with the entropy of the
source in each case?
Solution
• Problem 1:
– Entropy = 2.2091
– Average length = 27/12 = 2.25
– Efficiency = 98.18%, compression ratio = 3/2.25 = 1.33
• Problem 2:
– Entropy = 2.585
– Average length = 16/6 = 2.667
– Efficiency = 96.94%, compression ratio = 3/2.667 = 1.125
• Problem 3:
– Entropy = 1.7925
– Average length = 11/6 = 1.833
– Efficiency = 97.77%, compression ratio = 3/1.833 = 1.636
Huffman Code
• Determine the Huffman code for a loaded coin with p(X=head) = 0.9. Compare
this with the entropy and determine the efficiency of your code.
Code Extension
• In the first case, the outcome heads is most likely, but it is not possible
to encode an outcome using less than one bit. Although the entropy is 0.4690,
the number of bits needed is 1, so the efficiency is only 46.9%.
• In the second case, the codes are a: 1, b: 01, c: 00. The average length is
1.31 vs. the entropy, which is 0.9443. This code provides good compression,
but its efficiency is low.
• Compression ratio = 2/1.31 = 1.53
• Efficiency = 0.9443/1.31 = 72%
Code Extension
• To improve the code efficiency, we have to redefine the source alphabet.
With a larger source alphabet there is greater variation in the probabilities
of occurrence, which in turn is what allows a reduction in the average code
length.
• How do we increase the source alphabet? Consider using two (or more)
outcomes of the coin flipping experiment at a time. Now the source
alphabet is of size 4.
• p(HH) = 0.81, p(HT)=0.09, p(TH) = 0.09, p(TT)=0.01
• Determine the entropy and average code length in this case and when the
source alphabet is of size 8.
Code Extension
• Alphabet size = 4
• Entropy = 0.9380
• Code: HH: 1, HT: 01, TH: 000, TT: 001
• Average length = 1.29
• Efficiency: 0.9380/1.29 = 72%
• Alphabet size = 8
• Entropy: 1.4070
• Average length: 1.5970
• Efficiency = 1.4070/1.5970=88%
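• A quick check of the size-4 extension numbers (a sketch; the codeword
labeling above is one valid Huffman assignment):

import math

pairs = {"HH": 0.81, "HT": 0.09, "TH": 0.09, "TT": 0.01}
code = {"HH": "1", "HT": "01", "TH": "000", "TT": "001"}

H = sum(p * math.log2(1 / p) for p in pairs.values())   # ~0.938 bits per pair
L = sum(p * len(code[s]) for s, p in pairs.items())     # ~1.29 bits per pair
print(H, L, H / L)                                      # efficiency ~0.727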
Block Coding
• As the block length increases, the code efficiency seems to improve.
This result is known as the "noiseless coding theorem" of Shannon,
which says that as we use the Huffman coding algorithm over longer
and longer blocks of symbols, the average number of bits required to
encode each symbol approaches the entropy of the source.
H(X) \le L(X) \le H(X) + 1
H(X^n) \le L(X^n) \le H(X^n) + 1
H(X^n) = n\,H(X)
H(X) \le L(X^n)/n \le H(X) + 1/n
\lim_{n \to \infty} L(X^n)/n = H(X)
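• For the biased coin above (H(X) ≈ 0.469 bits), the per-symbol averages
already computed illustrate this convergence: 1 bit per symbol for n = 1,
1.29/2 ≈ 0.65 for n = 2, and 1.597/3 ≈ 0.53 for n = 3.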
Extension Codes
• Thus, code extension offers a powerful technique to improve the
efficiency of the code. In the cases discussed here, we assumed that
the individual outcomes were independent. That is not always the case.
• For example, when encoding English text, certain combinations are much more
likely to occur than others. Consider qu, ch, sh, ng and so on. If
probabilities are assigned based on this information, a more efficient
code can be obtained.
• Read: Huffman coding for Fax transmission pages 866-868 from ref [1].
Frequency of Occurrence
Example
• Given a long sequence of ternary symbols "a", "b" and "c", it was observed
that "a", "b" and "c" were equally likely. 50% of the time, a symbol is
followed by the consecutive symbol; a symbol is followed by itself only 20%
of the time (so the remaining symbol follows 30% of the time).
• You are asked to design a binary Huffman code for this source. First
determine the entropy of the source and the efficiency of the generated
Huffman code.
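• A sketch of how the source model and its per-symbol entropy could be set up
(assuming "consecutive" wraps cyclically, c -> a, and that the remaining 30%
goes to the third symbol; these assumptions are not spelled out on the slide):

import math

symbols = "abc"
nxt = {"a": "b", "b": "c", "c": "a"}   # assumed cyclic "consecutive" symbol

def p_next(cur, n):
    # Conditional probability of the next symbol given the current one.
    if n == nxt[cur]:
        return 0.5    # followed by the consecutive symbol 50% of the time
    if n == cur:
        return 0.2    # followed by itself 20% of the time
    return 0.3        # the remaining symbol gets the rest

# Entropy per symbol of the source with memory, H(next | current); the current
# symbol is uniform over {a, b, c}, consistent with "equally likely".
H_cond = sum((1/3) * p_next(c, n) * math.log2(1 / p_next(c, n))
             for c in symbols for n in symbols)
print(H_cond)   # ~1.485 bits/symbol, vs log2(3) ~ 1.585 for a memoryless uniform source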