Lec3 Source Coding Annotated Day4
• Given an information source and a noisy channel, information theory provides limits on the maximum rate at which reliable communication can take place over the noisy channel (the job of channel coding).
What is Information?
• What does “information” mean?
• There is no single exact definition; however:
• Information carries specific knowledge that is definitely new to its recipient;
• Information is always carried by some specific carrier and may take different forms (letters, digits, specific symbols, sequences of digits, letters, and symbols, etc.);
• Information is meaningful only if the recipient is able to interpret it.
Information
• Information, when materialized, becomes a message.
• Information is always about something (size of a parameter, occurrence of an event, etc.).
• Viewed in this manner, information does not have to be accurate; it
may be a truth or a lie.
• Even a disruptive noise used to inhibit the flow of communication and
create misunderstanding would in this view be a form of information.
• However, generally speaking, the greater the amount of information in the received message, the more accurate the message is.
Information Theory Related Questions
• How can we measure the amount of information?
• How can we ensure the correctness of information?
• What to do if information gets corrupted by errors?
• How much memory does it require to store/transmit information?
• Can we reduce the time taken to transfer the information
(compression)?
Information Content
• What is the information content of any message?
• Shannon's answer is: the information content of a message consists simply of the number of 1s and 0s it takes to transmit it. In other words, information can be measured in bits.
• Hence, the elementary unit of information is a binary unit: a bit, which
can be either 1 or 0; “true” or “false”; “yes” or “no”, etc.
Information and Uncertainty
• Zero information
• Sachin Tendulkar retired from Professional Cricket. (celebrity, known fact)
• Narendra Modi is the Prime Minister of India. (Known fact)
• Little information
• It will rain in Bangalore in the month of August. (not much uncertainty since
Aug. is monsoon time)
• Large information
• An earthquake is going to hit Bangalore tomorrow. (are you sure? an unlikely
event)
• Someone solved the world hunger problem. (Seriously?)
Mathematical Model for a Discrete-Time Information Source
• The source output is modeled as symbols $s_0, s_1, \ldots, s_{K-1}$ from a finite alphabet $A$, with probabilities of occurrence $p_0, p_1, \ldots, p_{K-1}$, such that
$$P(S = s_k) = p_k, \quad k = 0, 1, \ldots, K-1, \qquad \text{and} \qquad \sum_{k=0}^{K-1} p_k = 1.$$
Measure of Information
Definition of Information
• Using this intuition, Hartley proposed the following definition of information:
• The amount of information gained after observing the event $s_k$, which occurs with probability $p_k$, is the logarithmic function
$$I(s_k) = \log_2\!\left(\frac{1}{p_k}\right) \ \text{bits.}$$
• Properties:
• $I(s_k) = 0$ for $p_k = 1$.
• No information is gained if we are absolutely certain about the outcome of an event.
• $I(s_k) \ge 0$ for $0 \le p_k \le 1$.
• The occurrence of an event provides some or no information, but never brings about a loss of information.
• $I(s_k) > I(s_i)$ for $p_k < p_i$.
• The less probable an event is, the more information we gain when it occurs.
• $I(s_k s_i) = I(s_k) + I(s_i)$,
• if $s_k$ and $s_i$ are statistically independent.
• Normally, the base of the logarithm is taken as 2 and the unit of information is the bit (binary digit).
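These properties are easy to check numerically. Below is a minimal Python sketch (the helper name self_information is ours, not from the lecture) that computes $I = \log_2(1/p)$:

```python
import math

def self_information(p: float) -> float:
    """Information gained by observing an event of probability p, in bits."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1 / p)

print(self_information(1.0))     # 0.0   -> a certain event carries no information
print(self_information(0.5))     # 1.0   -> one fair-coin flip is worth 1 bit
print(self_information(1/36))    # ~5.17 -> roll of two dice (see the next example)
# Additivity for independent events: I(p1 * p2) == I(p1) + I(p2)
print(self_information(0.5 * 0.25), self_information(0.5) + self_information(0.25))
```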
Example of Amount of Information
• One flip of a fair coin:
• Before the flip, there are two equally probable choices: heads or tails.
P(H)=P(T)=1/2. Amount of information = log2(2/1) = 1 bit
• Roll of two dice:
• Each die has six faces, so in the roll of two dice there are 36 possible
combinations for the outcome. Amount of information = log2(36/1) = 5.2 bits.
• A randomly chosen decimal digit is even:
• There are ten decimal digits; five of them are even (0, 2, 4, 6, 8). Amount of
information = log2(10/5) = 1 bit.
Entropy (Average information per Message)
• Clearly, $I(s_k)$ is a discrete RV that takes the values $I(s_0), I(s_1), \ldots, I(s_{K-1})$ with probabilities $p_0, p_1, \ldots, p_{K-1}$, respectively.
• The mean value of $I(s_k)$ over the source alphabet $A$ is given by
$$H(A) = E[I(s_k)] = \sum_{k=0}^{K-1} p_k I(s_k) = \sum_{k=0}^{K-1} p_k \log_2\!\left(\frac{1}{p_k}\right)$$
• This is called the entropy of a DMS with source alphabet $A$.
• It is the measure of the average information content per source symbol/message.
• It depends only on the probabilities of the source symbols.
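As a quick illustration (not part of the lecture), a small Python sketch of the entropy formula; the helper name entropy is hypothetical:

```python
import math

def entropy(probs):
    """H(A) = sum_k p_k * log2(1/p_k), in bits per symbol."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))        # 1.0 bit    (fair coin)
print(entropy([1/6] * 6))         # ~2.585 bits (fair die)
print(entropy([1.0, 0.0, 0.0]))   # 0.0 bits   (no uncertainty)
```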
Some Properties of Entropy
• $H(A) = 0$ if and only if $p_k = 1$ for some $k$ and the remaining probabilities in the set are all zero → no uncertainty.
• $H(A) = \log_2 K$ (its maximum value) if and only if $p_k = 1/K$ for all $k$ (all the symbols in the set are equiprobable) → maximum uncertainty.
Example: Entropy of Binary Memoryless Source
$$H(A) = -p_0 \log_2 p_0 - p_1 \log_2 p_1 = -p_0 \log_2 p_0 - (1 - p_0)\log_2(1 - p_0) \quad \text{(bits)}$$
[Plot: the binary entropy function $H(p_0)$ versus $p_0$, equal to 0 at $p_0 = 0$ and $p_0 = 1$ and peaking at 1.0 bit at $p_0 = 1/2$.]
• When $p_0 = 0.5$, the source is called a binary symmetric source.
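A quick numerical check of the binary entropy function (a sketch, not from the slides):

```python
import math

def binary_entropy(p0: float) -> float:
    """H(p0) = -p0*log2(p0) - (1 - p0)*log2(1 - p0), in bits."""
    if p0 in (0.0, 1.0):
        return 0.0
    return -p0 * math.log2(p0) - (1 - p0) * math.log2(1 - p0)

for p0 in (0.0, 0.1, 0.25, 0.5, 0.9, 1.0):
    print(f"p0={p0:.2f}  H={binary_entropy(p0):.4f} bits")
# The maximum, 1 bit, occurs at p0 = 0.5 (the binary symmetric source).
```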
Extension of a Discrete Memoryless Source
$$H(A) = \sum_{i=0}^{K-1} p_i \log_2\!\left(\frac{1}{p_i}\right)$$
$$H(A^2) = \sum_{i=0}^{K-1}\sum_{j=0}^{K-1} p_i p_j \log_2\!\left(\frac{1}{p_i p_j}\right) \qquad \text{(since the source is memoryless)}$$
$$\phantom{H(A^2)} = \sum_{i=0}^{K-1}\sum_{j=0}^{K-1} p_i p_j \left[\log_2\!\left(\frac{1}{p_i}\right) + \log_2\!\left(\frac{1}{p_j}\right)\right] = H(A) + H(A) = 2H(A)$$
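A numerical sanity check of $H(A^2) = 2H(A)$ (a sketch; the three-symbol probabilities below are chosen arbitrarily, not taken from the exercise that follows):

```python
import math
from itertools import product

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Example probabilities for a 3-symbol DMS (chosen arbitrarily for illustration).
p = [0.5, 0.25, 0.25]

# Second-order extension: 9 symbol pairs, each with probability p_i * p_j (memoryless source).
p2 = [pi * pj for pi, pj in product(p, p)]

print(entropy(p))    # 1.5 bits
print(entropy(p2))   # 3.0 bits == 2 * H(A)
```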
2. Next consider the 2nd-order extension of the source. Since $A$ has three symbols, it follows that the source alphabet of the extended source has $K^2 = 9$ symbols. Now, find the entropy of the extended source.
Verify that $H(A^2) = 2H(A)$.
3. A DMS has an alphabet of size $K$ and the source outputs are equally likely. Find the entropy of that source.
Ans. $\log_2(K)$
• The original source sequence must be perfectly reconstructed from the encoded
binary sequence (Lossless encoding).
Example
Source Encoder
• Do all the symbols in the source alphabet occur with the same probability?
• Each alphanumeric symbol is mapped to a sequence of dots and dashes.
[Block diagram: discrete memoryless source → $s_k$ → source encoder → $b_k$ → binary sequence]
Average Code-word Length
$$\bar{L} = \sum_{k=0}^{K-1} p_k l_k,$$
where $l_k$ is the length (in bits) of the codeword assigned to symbol $s_k$.
Coding Efficiency
• Let $\bar{L}_{\min}$ denote the minimum possible value of $\bar{L}$. Then, the coding efficiency $\eta$ is defined as
$$\eta = \frac{\bar{L}_{\min}}{\bar{L}}, \qquad \text{where } \bar{L} = \sum_{k=0}^{K-1} p_k l_k.$$
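For instance, a short sketch (the source and code lengths are illustrative, not from the slides) that computes $\bar{L}$ and $\eta$, taking $\bar{L}_{\min} = H(A)$ as justified on the next slide:

```python
import math

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

def average_length(probs, lengths):
    """L_bar = sum_k p_k * l_k, in bits per symbol."""
    return sum(p * l for p, l in zip(probs, lengths))

# Illustrative source and prefix-free code with codeword lengths 1, 2, 3, 3 (not from the slides).
p = [0.5, 0.25, 0.125, 0.125]
l = [1, 2, 3, 3]

L_bar = average_length(p, l)
eta = entropy(p) / L_bar        # taking L_min = H(A)
print(L_bar, eta)               # 1.75, 1.0 -> this particular code is optimal for this source
```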
Intuition behind $\bar{L}_{\min} = H(A)$
• Since the source is memoryless, the probability of a typical sequence $\mathbf{s}$ of length $n$ is
$$P(\mathbf{S} = \mathbf{s}) = \prod_{i=1}^{N} p_i^{\,n p_i} = \prod_{i=1}^{N} 2^{\,n p_i \log_2 p_i} = 2^{\,n \sum_{i=1}^{N} p_i \log_2 p_i} = 2^{-nH(A)}$$
• This means that for large $n$, almost all the output sequences of length $n$ of the source are equally probable, each with probability $\approx 2^{-nH(A)}$.
• Hence there are roughly $2^{nH(A)}$ such typical sequences, and indexing them requires about $nH(A)$ bits, i.e., about $H(A)$ bits per source symbol.
• For a fair die, each outcome has probability $p = 1/N$, so the entropy is $\log_2 6 \approx 2.585$ bits.
• If this die was loaded such that outcomes 6 and 5 are more likely than the others, with $p(X=5) = 0.5$ and $p(X=6) = 1/3$, and the rest of the outcomes occurring with equal probability, compute the entropy in this scenario.
• Loaded die:
• Entropy:
$$H = \frac{1}{2}\log_2(2) + \frac{1}{3}\log_2(3) + 4 \cdot \frac{1}{24}\log_2(24) \approx 1.7925 \ \text{bits}$$
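A quick check of this arithmetic (a sketch):

```python
import math

# Loaded die: p(5) = 1/2, p(6) = 1/3, remaining four faces share the rest equally (1/24 each).
probs = [1/2, 1/3] + [1/24] * 4

H = sum(p * math.log2(1 / p) for p in probs)
print(H)                # ~1.7925 bits
print(math.log2(6))     # ~2.585 bits for the fair die, for comparison
```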
When the Source has Memory
• For a source having memory (example: printed English text has a lot of dependency between letters and words), the outputs of the source are not independent, and previous outputs reveal some information about the future ones.
• This dependency reduces uncertainty, and the average information content per source symbol is lower for such a source.
Source Coding Theorem - Shannon
• Source coding theorem establishes a fundamental limit on the rate at
which the output of an information source can be compressed
without causing large error probability at the receiver.
• This is one of the fundamental theorems of information theory.
Source Coding Algorithms
• The theorem, first proved by Shannon, only gives the theoretical bound for the
performance of the encoders. It does not provide any algorithm for design of
such optimum codes.
• As a result, several algorithms have been developed that try to compress the
information at the source in a fashion such that the data is recoverable at the
receiver without any losses.
Classification of codes
Symbol   Code 1   Code 2   Code 3
x1       0        10       0
x2       010      00       10
x3       01       11       110
x4       10       110      111
Classification of codes
• Non-singular (distinct) codes: each codeword is distinguishable from every other codeword.
• Uniquely decodable codes: a set of codewords that can be decoded in only one way.
• Code 1: if you receive 01010, it can be decoded as x2 x4 or as x1 x4 x4 (not uniquely decodable).
• Instantaneously decodable: a uniquely decodable code is called an instantaneous code if the end of any codeword is recognizable without examining subsequent code symbols.
• Prefix-free code: a code is said to be prefix-free if no codeword is a prefix of another codeword.
• The prefix-free condition is a sufficient but not a necessary condition for a code to be uniquely decodable:
• prefix-free ⇒ uniquely decodable;
• the reverse is not always true.
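These definitions can be tested mechanically. A minimal sketch, using the code assignments from the table above (the helper name is_prefix_free is ours):

```python
def is_prefix_free(codewords) -> bool:
    """True if no codeword is a prefix of another codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

code1 = ["0", "010", "01", "10"]
code2 = ["10", "00", "11", "110"]
code3 = ["0", "10", "110", "111"]

print(is_prefix_free(code1), is_prefix_free(code2), is_prefix_free(code3))
# False False True -> only Code 3 is prefix-free (and hence instantaneously decodable)
```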
Kraft Inequality
• A prefix-free (instantaneous) code with codeword lengths $l_0, l_1, \ldots, l_{K-1}$ exists if and only if
$$\sum_{k=0}^{K-1} 2^{-l_k} \le 1.$$
• Note: however, this does not guarantee that any code satisfying the inequality is automatically uniquely decodable.
• You need to check the U.D. condition separately.
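For the three codes in the table above, the Kraft sums can be checked as follows (a sketch):

```python
def kraft_sum(codewords) -> float:
    """Sum of 2^(-l_k) over the codeword lengths l_k."""
    return sum(2 ** -len(c) for c in codewords)

code1 = ["0", "010", "01", "10"]
code2 = ["10", "00", "11", "110"]
code3 = ["0", "10", "110", "111"]

for name, code in (("Code 1", code1), ("Code 2", code2), ("Code 3", code3)):
    print(name, kraft_sum(code))
# Code 1: 1.125 (> 1, violates the inequality)
# Code 2: 0.875 and Code 3: 1.0 satisfy it, but only Code 3 is actually prefix-free
```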
What do we want?
• Huffman codes are prefix-free codes with minimum average codeword length. In this sense, they are optimal.
Steps
• List the source symbols in order of decreasing probability.
• Combine the two symbols of lowest probability into a single node whose probability is their sum, assigning 0 and 1 to the two branches.
• Repeat until a single node of probability 1 remains; each codeword is read off the path from the root to its symbol.
Huffman Encoding Example 1
• The average codeword length for the Huffman code is $\bar{L} = 2.45$ bits/symbol.
• The performance of the Huffman code is close to the optimum. It is measured by the coding efficiency, which is $2.418/2.45 \approx 0.99$.
• Huffman codes are uniquely decodable: you can verify that the encoded message sequence can be decoded in only one way.
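Below is a compact Python sketch of the Huffman procedure (not the slides' worked example; the probabilities used are those of Problem 1 on the next slide). It repeatedly merges the two least probable nodes and then reads the codewords back:

```python
import heapq
import itertools
import math

def huffman_code(probs):
    """Return {symbol: codeword} for a dict mapping symbol -> probability."""
    counter = itertools.count()                    # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)            # the two least probable nodes...
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}   # ...are merged, prefixing 0 and 1
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

probs = {"a1": 1/3, "a2": 1/4, "a3": 1/6, "a4": 1/8, "a5": 1/8}
code = huffman_code(probs)
L_bar = sum(probs[s] * len(w) for s, w in code.items())
H = sum(p * math.log2(1 / p) for p in probs.values())
print(code)
print(L_bar, H, H / L_bar)      # 2.25, ~2.2091, ~0.98 -- matches the solution on the next slides
```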
More Examples on Huffman Coding
1. Determine the Huffman code for a source with alphabet A = {a1, a2, a3, a4, a5} and probabilities 1/3, 1/4, 1/6, 1/8, and 1/8. How does the average length of the Huffman code compare with the entropy of the source in each case?
Solution
• Problem 1:
• The entropy = 2.2091 bits
• Average length = 27/12 = 2.25 bits
• Coding efficiency = 2.2091/2.25 ≈ 98.18%
• Problem 2:
• The entropy = 2.585 bits
• Average length = 16/6 ≈ 2.667 bits
• Coding efficiency = 2.585/2.667 ≈ 96.93%
More Examples on Huffman Coding
• Determine the Huffman code for a loaded coin with p(X=head) = 0.9. Compare this with the entropy and determine the efficiency of your code.
Solution
• In the first case, the outcome heads is most likely, but it is not possible to encode the outcome using less than one bit. Although the entropy is ≈ 0.469 bits, the number of bits needed is 1, so the efficiency is only ≈ 0.47.
• In the second case, the codes are a: 0, b: 10, c: 11. The average length is 1.27 bits vs. the entropy, which is 0.9443 bits. Efficiency: 0.9443/1.27 ≈ 0.74.
Example: Extended Source
Solution
Block Coding
• As we use the Huffman coding algorithm over
longer and longer blocks of symbols, the average
number of bits required to encode each symbol
approaches the entropy of the source. (See the
previous example)
• If the Huffman code is designed for sequences of source letters of length n (the nth-order extension of the source), we have
$$H(A^n) \le \bar{L}_n < H(A^n) + 1,$$
where $\bar{L}_n$ is the average codeword length for the extended source; thus the codeword length per message is, on average,
$$\bar{L} = \bar{L}_n / n.$$
• If the source is memoryless, $H(A^n) = nH(A)$.
• Therefore,
$$H(A) \le \bar{L} < H(A) + \frac{1}{n}.$$
• Thus, $\bar{L} \to H(A)$ as $n \to \infty$.
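As an illustration of this limit (a sketch; the loaded-coin probabilities echo the earlier example, and huffman_code is the same hypothetical helper as before), Huffman coding blocks of n coin flips drives the per-symbol length toward the entropy:

```python
import heapq
import itertools
import math

def huffman_code(probs):
    """Huffman codewords for a dict of symbol -> probability (same sketch as earlier)."""
    counter = itertools.count()
    heap = [(p, next(counter), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

base = {"H": 0.9, "T": 0.1}                                   # loaded coin
H_A = sum(p * math.log2(1 / p) for p in base.values())        # ~0.469 bits per flip

for n in (1, 2, 3, 4):
    # n-th order extension: blocks of n flips; block probability is the product (memoryless)
    blocks = {"".join(b): math.prod(base[s] for s in b)
              for b in itertools.product(base, repeat=n)}
    code = huffman_code(blocks)
    L_per_flip = sum(blocks[b] * len(w) for b, w in code.items()) / n
    print(n, round(L_per_flip, 4), round(H_A, 4))
# The per-flip average length decreases toward H(A) as the block length n grows.
```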
Shannon-Fano Encoding
In Shannon-Fano encoding, ambiguity may arise in the choice of approximately equiprobable sets.
Drawback of Huffman & Shannon-Fano Coding
• Huffman codes are optimal in the sense that, for a given source, they provide a prefix code with the minimum number of bits per message.
• However, they are not a good choice for a practical source whose statistics are not known in advance.
• The Lempel-Ziv algorithm belongs to the class of universal source coding algorithms, i.e., algorithms that are independent of the source statistics. It is a variable-to-fixed length coding scheme.
Lempel-Ziv (L-Z) Coding
Example 1
• Consider the following sequence: 101011011010101010
• After parsing, the phrases are: 1, 0, 10, 11, 01, 101, 010, 1010
• Each codeword below is a pair: the first entry is the dictionary location in which the longest prefix of the phrase appeared before, and the second entry is the last symbol of the phrase.
Dictionary location Contents Codeword
1 1 (0,1)
2 0 (0,0)
3 10 (1,0)
4 11 (1,1)
5 01 (2,1)
6 101 (3,1)
7 010 (5,0)
8 1010 (6,0)
Example 2
• Consider the following sequence
ABBAABBAABBABAABAA
After Parsing:
A, B, BA, AB, BAA, BB, ABA, ABAA
Dictionary location Contents Codeword
1 A (0,A)
2 B (0,B)
3 BA (2, A)
4 AB (1,B)
5 BAA (3,A)
6 BB (2,B)
7 ABA (4,A)
8 ABAA (7,A)
• Let the number of phrases obtained by parsing a sequence of length n be C(n).
• Total number of symbols in the alphabet = K.
• Then L-Z encoding yields a fixed-length code sequence of length
$$C(n)\left[\log_2 C(n) + \log_2 K\right] \ \text{bits.}$$
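A minimal LZ78-style parser matching the two examples above (a sketch; dictionary locations are 1-based as in the tables, location 0 means the empty prefix, and the final phrase is assumed to complete exactly as it does in both examples):

```python
import math

def lz_parse(sequence):
    """Parse a sequence into LZ78 phrases; return a list of (dictionary location, last symbol)."""
    dictionary = {}      # phrase -> dictionary location (1-based)
    codewords = []
    phrase = ""
    for symbol in sequence:
        if phrase + symbol in dictionary:
            phrase += symbol                          # keep extending the current phrase
        else:
            codewords.append((dictionary.get(phrase, 0), symbol))
            dictionary[phrase + symbol] = len(dictionary) + 1
            phrase = ""
    return codewords

codes = lz_parse("101011011010101010")
print(codes)   # [(0,'1'), (0,'0'), (1,'0'), (1,'1'), (2,'1'), (3,'1'), (5,'0'), (6,'0')]

# Code length from the formula above: C(n) * (log2 C(n) + log2 K)
C, K = len(codes), 2
print(C * (math.log2(C) + math.log2(K)))   # 8 * (3 + 1) = 32 bits
```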