
Source Coding

Chapter 15 (up to Section 15.3) from the B.P. Lathi book.

Chapter 5 of Digital Communication Systems by Simon Haykin.


Review of previous class
• Analog signal to digital signal conversion
• Sampling
• Quantization
• Uniform vs non-uniform
• Digital pulse modulation techniques
• Pulse code modulation (PCM)
• Differential pulse code modulation (DPCM)
• Delta modulation (DM)
Key Issues in Evaluating Performance of Digital Comm. System

• The information from a given source needs to be represented efficiently.


• The rate at which information can be transmitted reliably over a noisy channel.

• Given an information source and a noisy channel, information theory provides limits on:

  • the minimum number of bits per symbol needed to fully represent the source
    → the job of source coding

  • the maximum rate at which reliable communication can take place over the noisy channel
    → the job of channel coding
What is Information?
• What does “information” mean?
• There is no single exact definition; however:
• Information carries new, specific knowledge, which is definitely new
for its recipient;
• Information is always carried by some specific carrier in different
forms (letters, digits, specific symbols, sequences of digits,
letters, and symbols, etc.);
• Information is meaningful only if the recipient is able to interpret it.
Information
• Information, when materialized, is a message.
• Information is always about something (the size of a parameter,
the occurrence of an event, etc.).
• Viewed in this manner, information does not have to be accurate; it
may be a truth or a lie.
• Even a disruptive noise used to inhibit the flow of communication and
create misunderstanding would, in this view, be a form of information.
• However, generally speaking, the more information the received
message carries, the more accurate the message is.
Information Theory Related Questions
• How can we measure the amount of information?
• How can we ensure the correctness of information?
• What can we do if information gets corrupted by errors?
• How much memory does it take to store/transmit information?
• Can we reduce the time taken to transfer the information
(compression)?
Information Content
• What is the information content of any message?
• Shannon’s answer: the information content of a message is simply
the number of 1s and 0s it takes to transmit it. In other words,
information can be measured in bits.
• Hence, the elementary unit of information is a binary unit: a bit, which
can be either 1 or 0; “true” or “false”; “yes” or “no”, etc.

• One of the basic postulates of information theory is that information
can be treated like a measurable physical quantity, such as density or
mass.
Information
• What do we mean by information?
• “A numerical measure of the uncertainty of an experimental outcome”

• How to quantitatively measure and represent information?

• Let us first look at how we assess the amount of information in our
daily lives using common sense.
Information ⇔ Uncertainty
• Zero information
• Sachin Tendulkar retired from Professional Cricket. (celebrity, known fact)
• Narendra Modi is the Prime Minister of India. (Known fact)
• Little information
• It will rain in Bangalore in the month of August. (not much uncertainty since
Aug. is monsoon time)
• Large information
• An earthquake is going to hit Bangalore tomorrow. (are you sure? an unlikely
event)
• Someone solved the world hunger problem. (Seriously?)
Mathematical Model for a Discrete-Time Information Source

• We will study a simple model for an information source,
called the discrete memoryless source (DMS).
• Definition of DMS: A DMS is a discrete-time, discrete-amplitude
random process in which the symbols (letters)
are generated independently and with the same distribution.
• Thus, a DMS generates a sequence of i.i.d. RVs that take
values in a discrete set. The symbol emitted at any
time is independent of previous choices.

(Figure: an information source emitting the symbol sequence S_0, S_1, S_2, …)
Mathematical Model for a Discrete-Time Information Source
• The source output is modeled as a discrete random variable S which takes
on symbols from a fixed finite alphabet

    A = {s_0, s_1, …, s_{K-1}}

with probabilities of occurrence

    P(S = s_k) = p_k,   k = 0, 1, …, K-1,   and   Σ_{k=0}^{K-1} p_k = 1

• The full description of the DMS is given by the set A, called the
alphabet, and the probabilities p_0, p_1, …, p_{K-1}.
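As an illustration (not part of the original slides), here is a minimal sketch of a DMS in Python; the alphabet, the probabilities, and the numpy dependency are assumptions made for the example.

import numpy as np

# Hypothetical alphabet and probabilities, for illustration only
alphabet = np.array(["s0", "s1", "s2"])
probs = np.array([0.25, 0.25, 0.5])   # must sum to 1

rng = np.random.default_rng(0)

def dms_emit(n):
    """Emit n i.i.d. symbols from the discrete memoryless source."""
    return rng.choice(alphabet, size=n, p=probs)

print(dms_emit(10))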
Measure of Information

• Interrelations between information, uncertainty, and surprise:

• No surprise ⇒ no information.

• If the probability p_k = 1 (and p_j = 0 for all j ≠ k), then there is no surprise and,
therefore, no information when symbol s_k is emitted.

• The amount of information may therefore be related to the inverse of
the probability of occurrence (p).
Definition of Information
• Using such intuition, Hartley proposed the following definition of
information:
• The amount of information gained after observing the event s_k, which
occurs with probability p_k, is the logarithmic function

    I(s_k) = log2(1 / p_k)   bits

• This is the minimum number of binary digits required to encode the message.
• Properties:
• I(s_k) = 0 for p_k = 1
• No information is gained if we are absolutely certain about the outcome of an event.

• I(s_k) ≥ 0 for 0 ≤ p_k ≤ 1
• The occurrence of an event provides some or no information, but never brings about a loss of information.

• I(s_k) > I(s_i) for p_k < p_i
• The less probable the event is, the more information we gain when it occurs.

• I(s_k s_l) = I(s_k) + I(s_l), if s_k and s_l are statistically independent.

• Normally, the base of the logarithm is taken as 2, and the unit of information is the bit (binary digit).
Example of Amount of Information
• One flip of a fair coin:
• Before the flip, there are two equally probable choices: heads or tails.
P(H)=P(T)=1/2. Amount of information = log2(2/1) = 1 bit
• Roll of two dice:
• Each die has six faces, so in the roll of two dice there are 36 possible
combinations for the outcome. Amount of information = log2(36/1) = 5.2 bits.
• A randomly chosen decimal digit is even:
• There are ten decimal digits; five of them are even (0, 2, 4, 6, 8). Amount of
information = log2(10/5) = 1 bit.

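These numbers are easy to check with the formula I = log2(1/p); a quick sketch (standard library only, the function name is mine):

from math import log2

def self_information(p):
    """Information in bits conveyed by an event of probability p."""
    return log2(1 / p)

print(self_information(1 / 2))    # fair coin flip        -> 1.0 bit
print(self_information(1 / 36))   # one of 36 dice pairs  -> ~5.17 bits
print(self_information(5 / 10))   # "digit is even"       -> 1.0 bit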
Entropy (Average Information per Message)
• Clearly, I(s_k) is a discrete RV that takes the values I(s_0), …, I(s_{K-1}), respectively.
• The mean value of I(s_k) over the source alphabet is given by

    H(A) = E[I(s_k)]
         = Σ_{k=0}^{K-1} p_k I(s_k)
         = Σ_{k=0}^{K-1} p_k log2(1 / p_k)

• This is called the entropy of a DMS with source alphabet A.
• It is the measure of the average information content per source symbol/message.
• It depends only on the probabilities of the source symbols.
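A minimal sketch of this formula (standard library only; the probability lists are example inputs, not from the slides):

from math import log2

def entropy(probs):
    """Entropy in bits of a DMS with the given symbol probabilities."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.5]))   # 1.5 bits
print(entropy([1 / 6] * 6))         # ~2.585 bits (fair die)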
Some Properties of Entropy

    0 ≤ H(A) ≤ log2 K

where K is the number of symbols in the alphabet of the source.

• H(A) = 0, if and only if the probability p_k = 1 for some k, and the remaining
probabilities in the set are all zero
    → no uncertainty
• H(A) = log2 K, if and only if p_k = 1/K for all k (all the symbols in the set are
equiprobable)
    → maximum uncertainty
Example: Entropy of a Binary Memoryless Source

• For a binary source emitting symbol 0 with probability p_0 and symbol 1
with probability p_1 = 1 - p_0:

    H(A) = - p_0 log2 p_0 - p_1 log2 p_1
         = - p_0 log2 p_0 - (1 - p_0) log2 (1 - p_0)   (bits)

• When p_0 = 0.5, the source is called a binary symmetric source.

(Figure: the binary entropy function H(p_0) versus p_0, rising from 0 at p_0 = 0 to a
maximum of 1.0 bit at p_0 = 1/2 and falling back to 0 at p_0 = 1.)
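A small sketch of the binary entropy function (standard library only; the sample values are my own):

from math import log2

def binary_entropy(p0):
    """H(A) of a binary memoryless source with P(symbol 0) = p0."""
    if p0 in (0.0, 1.0):
        return 0.0
    return -p0 * log2(p0) - (1 - p0) * log2(1 - p0)

print(binary_entropy(0.5))   # 1.0 bit (binary symmetric source)
print(binary_entropy(0.9))   # ~0.469 bits (the loaded coin seen later)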
Extension of a Discrete Memoryless Source

• Sometimes it is useful to consider blocks of symbols rather than individual
symbols, each block consisting of n successive source symbols.

• We view this as an extended source with source alphabet A^n that has K^n distinct
blocks, where K is the number of symbols in the source alphabet of the
original source.

• Since the source symbols of a DMS are statistically independent, the entropy of
the extended source is

    H(A^n) = n H(A)

How?
Proof: Consider the simple case n = 2.

    H(A) = Σ_{i=0}^{K-1} p_i log2(1 / p_i)

    H(A^2) = Σ_{i=0}^{K-1} Σ_{j=0}^{K-1} p_i p_j log2( 1 / (p_i p_j) )      (since the source is memoryless)

           = Σ_{i=0}^{K-1} p_i Σ_{j=0}^{K-1} p_j [ log2(1 / p_i) + log2(1 / p_j) ]

           = H(A) + H(A) = 2 H(A)

We can extend this argument to n > 2 as well.


Example
1. Consider a DMS with alphabet A = {s_0, s_1, s_2} with p_0 = 1/4, p_1 = 1/4, p_2 = 1/2. What is the
entropy of this source?

2. Next consider the 2nd-order extension of the source. Since A has three symbols, it
follows that the source alphabet A^2 of the extended source has 3^2 = 9 symbols. Now,
find the entropy of the extended source.
Verify that H(A^2) = 2 H(A).
3. A DMS has an alphabet of size K and the source outputs are
equally likely. Find the entropy of that source.

Ans. log2(K)

• The uniform distribution has maximum entropy, since the symbols are
equiprobable and the uncertainty is the highest.
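A quick numerical check of Problems 1 and 2 (the helper below is an illustrative sketch, restated so the block stands alone):

from itertools import product
from math import log2

def entropy(probs):
    """Entropy in bits for a list of probabilities."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

probs = [1 / 4, 1 / 4, 1 / 2]
print(entropy(probs))                        # H(A)   = 1.5 bits

# 2nd-order extension: 9 blocks, each with probability p_i * p_j
ext = [pi * pj for pi, pj in product(probs, repeat=2)]
print(entropy(ext))                          # H(A^2) = 3.0 bits = 2 * H(A)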
Source Coding
• Efficient representation of the data generated by a source: the process by which this
representation is done is called source encoding, and the device that performs it
is called a source encoder.

An efficient encoder satisfies two functional requirements:

• The code-words produced by the encoder are typically binary in form, aiming to
minimize the average bit rate required to represent the source by “reducing the
redundancy”.

• The original source sequence must be perfectly reconstructable from the encoded
binary sequence (lossless encoding).
Example
Source Encoder
• Do all the symbols in the source alphabet occur with the same
probability?

• Should all symbols be encoded using the same number of bits?

• Less frequently occurring symbols can be encoded with more bits than more
frequently occurring symbols, to reduce the average number of bits/message.

• Let's see an example.


Morse Code
• Source alphabet: the alphanumeric symbols (letters and digits)
• Code alphabet: {dot, dash}
• Code function: maps each alphanumeric
symbol to a sequence
of dots and dashes
Average Code-word Length

(Figure: a discrete memoryless source emits symbols s_k, which the source encoder maps
to binary code-words b_k, producing the output binary sequence.)

• Let the binary code-word assigned to symbol s_k by the encoder have
length l_k (bits).
• Average code-word length (L̄): the average number of bits per source symbol,

    L̄ = Σ_{k=0}^{K-1} p_k l_k
Coding Efficiency
• Let L̄_min denote the minimum possible value of L̄. Then, the coding efficiency
(η) is defined as

    η = L̄_min / L̄,    where L̄ = Σ_{k=0}^{K-1} p_k l_k

• We clearly have η ≤ 1, since L̄ ≥ L̄_min. The source encoder is said to be efficient
when η → 1.
• What is L̄_min?
Source-Coding Theorem (Shannon, 1948)
• Given a DMS of entropy H(A), the average code-word length L̄ for
any source encoding is bounded by

    L̄ ≥ H(A),   i.e.,   L̄_min = H(A)

• Hence, the entropy represents a fundamental limit on the average
number of bits per source symbol.
• Source coding efficiency is therefore

    η = H(A) / L̄
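A small sketch tying L̄, H(A), and η together; the source probabilities and code-word lengths below are made-up examples:

from math import log2

probs   = [0.5, 0.25, 0.125, 0.125]    # hypothetical source probabilities
lengths = [1, 2, 3, 3]                 # hypothetical binary code-word lengths

H = sum(p * log2(1 / p) for p in probs)           # entropy H(A): 1.75 bits
L = sum(p * l for p, l in zip(probs, lengths))    # average code-word length: 1.75 bits
print(H, L, H / L)                                # efficiency = 1.0 for this dyadic source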
Intuition behind L̄_min = H(A)
• Specifically, the entropy has a very intuitive meaning.
• Let us assume an extended source of length n from a DMS, where n → ∞.
• The total number of symbols in the alphabet A is N.

• With very high probability (as n → ∞):
    symbol s_1 repeats approximately n·p_1 times,
    symbol s_2 repeats approximately n·p_2 times,
    …, symbol s_N repeats approximately n·p_N times.
• All other compositions are extremely unlikely to occur.
• This means that, with very high probability, every such sequence of the source has the same
composition and, therefore, the same probability.
Such a sequence is generally called a typical sequence.
Intuition behind L̄_min = H(A)
• Since the source is memoryless, the probability of a typical sequence s is

    P(S = s) = Π_{i=1}^{N} p_i^(n·p_i)
             = Π_{i=1}^{N} 2^(n·p_i·log2 p_i)
             = 2^(n·Σ_{i=1}^{N} p_i·log2 p_i)
             = 2^(-n·H(A))

• This means that, for large n, almost all the output sequences of length n of the source are
equally probable, each with probability 2^(-n·H(A)). Such sequences are called typical sequences.
• Hence, the number of typical sequences ≈ 2^(n·H(A)).
Intuition behind L̄_min = H(A)
• The total number of sequences of length n from an alphabet of size N is N^n.
• But the effective number of sequences (typical sequences) is ≈ 2^(n·H(A)).

• We mean that almost nothing is lost by neglecting the other
(N^n − 2^(n·H(A))) sequences for very large n.

• Thus, in practice, it is enough to consider the set of typical sequences
rather than the set of all possible outputs. This is the essence of data
compression.
Intuition behind L̄_min = H(A)
• The number of bits required to represent the typical sequences = n·H(A).
• However, these bits represent a source output of length n.
• Therefore, on average, any source output requires

    n·H(A) / n = H(A)

bits per source symbol for an essentially error-free representation.

• This justifies why we use

    L̄_min = H(A)
Exceptional case
• A DMS can be compressed only if its PMF is not the uniform distribution.
• Why?

• For a uniform source, p_i = 1/N, so H(A) = log2 N.

• Then, the number of typical sequences = 2^(n·H(A)) = 2^(n·log2 N) = N^n.

• The total number of possible sequences of length n from an alphabet of size N is also N^n.

• So every sequence is typical, and no compression is possible for a source following the uniform distribution.

Example
• Consider a fair die (each outcome is equally likely). Compute
the entropy of this source.

• Now suppose the die is loaded such that outcomes 5 and 6 are more
likely than the others: p(X=5) = 0.5 and p(X=6) = 1/3, and the rest of the
outcomes occur with equal probability. Compute the entropy
in this scenario.

• What can you conclude from the above two results?
Solution
• The outcomes of the die (the alphabet of the DMS behind it) are {1, 2, 3, 4,
5, 6}. In the first case, the probability of each outcome is
the same, 1/6.
• Entropy = 6 · (1/6) · log2(6) = 2.585 bits

• Loaded die:
• Entropy:

    H = (1/2) log2(2) + (1/3) log2(3) + 4 · (1/24) log2(24)
      = 1.7925 bits

• Conclusion: the loaded (non-uniform) die has lower entropy than the fair (uniform) die;
non-uniform probabilities mean less uncertainty per outcome.
When the Source has Memory
• For a source having memory (for example, printed English text has a lot of
dependency between letters and words), the outputs of the source
are not independent, and previous outputs reveal some information
about future ones.
• This dependency reduces uncertainty, and the average
information content per source symbol is lower for such a source.
Source Coding Theorem - Shannon
• The source coding theorem establishes a fundamental limit on the rate at
which the output of an information source can be compressed
without causing a large error probability at the receiver.
• This is one of the fundamental theorems of information theory.

• Source coding theorem:

• A source with entropy H can be encoded with arbitrarily small error
probability at any rate R (bits/source output) provided R > H.
• If R < H, the error probability will be bounded away from zero, independent
of the complexity of the encoder and decoder employed.
Source Coding Algorithms
• The theorem, first proved by Shannon, only gives the theoretical bound on the
performance of encoders. It does not provide any algorithm for the design of
such optimum codes.

• As a result, several algorithms have been developed that try to compress the
information at the source in such a fashion that the data is recoverable at the
receiver without any losses.
Classification of codes

Alphabet   Code 1                    Code 2                       Code 3
           (non-singular, but not    (uniquely decodable, but     (instantaneous code /
           uniquely decodable)       not instantaneous)           prefix-free code)
x1         0                         10                           0
x2         010                       00                           10
x3         01                        11                           110
x4         10                        110                          111
Classification of codes
• Non-singular (distinct) codes: each codeword is distinguishable from every other codeword.

• Uniquely decodable codes: codes that can be decoded in only one way.
  With Code 1, if you receive 01010, it can be decoded as x2 x4 or as x1 x4 x4 (not uniquely decodable).

• Instantaneously decodable: a uniquely decodable code is called an instantaneous code if the end of
any code word is recognizable without examining subsequent code symbols.

• Prefix-free code: a code is said to be prefix-free if no code word is a prefix of another code word.
• The prefix-free condition is a sufficient but not necessary condition for a code to be uniquely decodable:
• Prefix-free ⇒ uniquely decodable
• The reverse may not always be true.

Prefix-free codes ⇔ instantaneous codes
Kraft Inequality

• A prefix-free (instantaneous) binary code with codeword lengths l_0, l_1, …, l_{K-1} exists
if and only if

    Σ_{k=0}^{K-1} 2^(-l_k) ≤ 1

Note: However, this does not guarantee that any code satisfying the inequality is
automatically uniquely decodable.
You need to check the unique-decodability condition separately.
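A tiny sketch that checks the inequality for the codes in the table above (the function name is mine, not from the slides):

def satisfies_kraft(lengths):
    """Check the Kraft inequality for a set of binary codeword lengths."""
    return sum(2.0 ** -l for l in lengths) <= 1

print(satisfies_kraft([1, 2, 3, 3]))   # Code 3: 0.5 + 0.25 + 0.125 + 0.125 = 1.0 -> True
print(satisfies_kraft([1, 3, 2, 2]))   # Code 1: sums to 1.125                    -> False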
What do we want?

An optimal code satisfies two criteria:

I. It needs to be an instantaneous/prefix-free code, which
is guaranteed to be uniquely decodable.

II. It has the smallest average codeword length among all
prefix-free codes.
1. Huffman Source Coding Algorithm
• Here, fixed-length blocks of the source output are mapped to variable-length
binary blocks. This is called fixed-to-variable length coding.
• The idea is to map the more frequently occurring fixed-length sequences to
shorter binary sequences and the less frequently occurring ones to longer binary
sequences, thus achieving good coding efficiency.

• Huffman codes are prefix-free codes with minimum average codeword length. In
this sense, they are optimal.
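As an illustration (not the slides' worked procedure), here is a compact Huffman construction sketch using Python's heapq; the function name and tie-breaking rule are my own choices. On the probabilities of Problem 1 further below it reproduces the average length of 2.25 bits.

import heapq
from itertools import count

def huffman_code(probs):
    """Build a binary Huffman code for {symbol: probability}.

    Returns {symbol: codeword string}. Ties are broken arbitrarily, so
    individual lengths may differ from a hand-worked table, but the
    average codeword length is still minimal.
    """
    tiebreak = count()   # keeps the heap from ever comparing the dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {"a1": 1/3, "a2": 1/4, "a3": 1/6, "a4": 1/8, "a5": 1/8}
code = huffman_code(probs)
avg_len = sum(p * len(code[s]) for s, p in probs.items())
print(code)
print(avg_len)   # 2.25 bits = 27/12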
Steps
Huffman Encoding Example 1
• The average codeword length for the Huffman code is L̄.

• The performance of the Huffman code is very close to the optimum. It is
measured by the coding efficiency, which here is 2.418/2.45 ≈ 0.99.
• The Huffman code is uniquely decodable. For the message sequence of the example,
the encoded binary sequence
can be verified to be decodable in only one way.
More Examples on Huffman Coding
1. Determine the Huffman code for a source with alphabet A =
{a1, a2, a3, a4, a5} with probabilities 1/3, 1/4, 1/6, 1/8 and 1/8.

2. Determine the Huffman code for the output of a fair die.

How does the length of the Huffman code compare with the
entropy of the source in each case?
Solution
• Problem 1:
• The entropy = 2.2091 bits
• Average length = 27/12 = 2.25 bits
• Coding efficiency = 2.2091/2.25 = 98.18%

• Problem 2:
• The entropy = 2.585 bits
• Average length = 16/6 = 2.667 bits
• Coding efficiency = 2.585/2.667 = 96.93%
More Examples on Huffman Coding
• Determine the Huffman code for a loaded coin with p(X = head) = 0.9. Compare this
with the entropy and determine the efficiency of your code.

• A three-letter alphabet has the following probability distribution:

• p(Xi = a) = 0.73
• p(Xi = b) = 0.25
• p(Xi = c) = 0.02
Solution
• In the first case, the outcome "heads" is most likely, but it is not possible to encode an outcome
using less than one bit. Although the entropy is about 0.469 bits, the number of bits needed is 1, so the
efficiency is only about 0.47.

• In the second case, the codes are a: 0, b: 10, c: 11. The average length = 1.27 bits vs. the entropy,
which is 0.9443 bits. Efficiency: 0.9443/1.27 = 0.74.
Example: Extended Source
Solution
Block Coding
• As we use the Huffman coding algorithm over
longer and longer blocks of symbols, the average
number of bits required to encode each symbol
approaches the entropy of the source. (See the
previous example)

• Thus, as the block length increases, the coding
efficiency improves and approaches 1.

• We will next see why this is so.
• We can show that the average codeword length of a Huffman code satisfies the
following inequality (proof skipped):

    H(A) ≤ L̄ ≤ H(A) + 1

• If the Huffman code is designed for sequences of source letters of length n (the
nth-order extension of the source), we have

    H(A^n) ≤ L̄_n ≤ H(A^n) + 1

where L̄_n is the average codeword length for the extended source; thus the
codeword length per message is, on average,

    L̄ = L̄_n / n

• If the source is memoryless, H(A^n) = n·H(A).

• Therefore,

    H(A) ≤ L̄ ≤ H(A) + 1/n

• Thus, as n → ∞, L̄ → H(A).

• Equivalently, if the source is memoryless, then for Huffman
coding with a sufficiently large block length n, the average
number of bits required to encode each symbol approaches
the entropy of the source.

• Thus, code extension offers a powerful technique for improving
the efficiency of the code.
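As a numerical illustration (not from the slides), the sketch below applies Huffman coding to nth-order extensions of the loaded coin from the earlier example (p(head) = 0.9) and shows the bits per symbol dropping toward the entropy as n grows; all names and the code structure are my own.

import heapq
from itertools import count, product
from math import log2, prod

def huffman_avg_length(probs):
    """Average codeword length (bits per block) of a binary Huffman code."""
    lengths = [0] * len(probs)
    tiebreak = count()
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, m1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, m2 = heapq.heappop(heap)
        for i in m1 + m2:                 # every symbol in the merged subtree gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), m1 + m2))
    return sum(p * l for p, l in zip(probs, lengths))

coin = [0.9, 0.1]                         # the loaded coin from the earlier example
H = -sum(p * log2(p) for p in coin)       # ~0.469 bits/symbol
for n in (1, 2, 3, 4):
    blocks = [prod(c) for c in product(coin, repeat=n)]   # nth-order extension
    print(n, round(huffman_avg_length(blocks) / n, 3), "vs H =", round(H, 3))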
Shannon-Fano Encoding
In Shannon-Fano encoding, ambiguity may arise in the choice of the approximately equiprobable sets.
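To make the remark concrete, here is a hedged sketch of one way Shannon-Fano splitting can be implemented; the split rule chosen below is only one of the "approximately equiprobable" choices the slide alludes to, and the function name is my own.

def shannon_fano(symbols):
    """Shannon-Fano sketch: symbols is a list of (symbol, prob) pairs,
    sorted by decreasing probability. Returns {symbol: codeword}."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    running, split = 0.0, 1
    for i, (_, p) in enumerate(symbols[:-1], start=1):
        running += p
        if running >= total / 2:          # first point where the top group reaches ~half
            split = i
            break
    top, bottom = symbols[:split], symbols[split:]
    code = {s: "0" + w for s, w in shannon_fano(top).items()}
    code.update({s: "1" + w for s, w in shannon_fano(bottom).items()})
    return code

print(shannon_fano([("a", 0.73), ("b", 0.25), ("c", 0.02)]))
# {'a': '0', 'b': '10', 'c': '11'}, matching the earlier Huffman solution for this source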
Drawbacks of Huffman & Shannon-Fano Coding

• The Huffman code is optimal in the sense that, for a given source, it provides a
prefix code with the minimum number of bits per message.

• Both have two disadvantages:

• They depend on the source probabilities: the source statistics need to be known in advance.
• The complexity of the algorithm increases exponentially with the block length.

• So they are not a good choice for a practical source whose statistics are not known in
advance.

• The Lempel-Ziv algorithm belongs to the class of universal source coding algorithms, i.e.,
algorithms that are independent of the source statistics. It is a variable-to-fixed
length coding scheme.
Lempel-Ziv (L-Z) Coding

The steps of the L-Z algorithm are as follows:

• Any sequence of the source output is uniquely parsed into phrases of varying
length, and these phrases are encoded using codewords of equal length.
• Parsing is done by identifying the shortest phrase that has not
appeared before.
• Each new phrase is the concatenation of a previous phrase and one new source
symbol.
• Encoding: the dictionary location (index) of that previous phrase and the new
source symbol are concatenated to form the codeword.
Example 1
• Consider the following sequence:
    101011011010101010
• After parsing, the phrases are:
    1, 0, 10, 11, 01, 101, 010, 1010
• Each codeword is (the dictionary location where the longest prefix of the phrase
appeared before, the last symbol of the phrase):

Dictionary location   Contents   Codeword
1                     1          (0,1)
2                     0          (0,0)
3                     10         (1,0)
4                     11         (1,1)
5                     01         (2,1)
6                     101        (3,1)
7                     010        (5,0)
8                     1010       (6,0)
Example 2
• Consider the following sequence:
    ABBAABBAABBABAABAA
• After parsing, the phrases are:
    A, B, BA, AB, BAA, BB, ABA, ABAA

Dictionary location   Contents   Codeword
1                     A          (0,A)
2                     B          (0,B)
3                     BA         (2,A)
4                     AB         (1,B)
5                     BAA        (3,A)
6                     BB         (2,B)
7                     ABA        (4,A)
8                     ABAA       (7,A)
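A minimal sketch of the LZ78-style parsing used in these two examples (function name and output format are my own illustration):

def lz_parse(sequence):
    """Parse a sequence into L-Z phrases.

    Returns a list of (phrase, (prev_index, new_symbol)) pairs, where
    prev_index is the dictionary location of the longest previously seen
    prefix (0 if none). Assumes the sequence ends exactly on a phrase
    boundary, as in Examples 1 and 2.
    """
    dictionary = {}           # phrase -> dictionary location (1-based)
    output = []
    phrase = ""
    for symbol in sequence:
        if phrase + symbol in dictionary:
            phrase += symbol                  # keep extending until the phrase is new
        else:
            prev = dictionary.get(phrase, 0)  # location of the known prefix
            dictionary[phrase + symbol] = len(dictionary) + 1
            output.append((phrase + symbol, (prev, symbol)))
            phrase = ""
    return output

for phrase, codeword in lz_parse("101011011010101010"):
    print(phrase, codeword)
# Reproduces the dictionary of Example 1; lz_parse("ABBAABBAABBABAABAA") gives Example 2.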
• Let the number of phrases obtained by parsing a source sequence of length n be C(n).
• Let the total number of symbols in the alphabet be K.
• Then L-Z encoding yields a fixed-length code sequence of length

    C(n) · (log2 C(n) + log2 K)   bits

• It can be shown that the above expression approaches n·H(A) for
large values of n (the sequence length), for a stationary and ergodic
source.
• This algorithm is widely used to compress computer files (e.g., .zip).