Module-3 Information Theory: Entropy Source-Coding Theorem
Information theory
Entropy
REFER NOTES
Source-coding Theorem
An important problem in communication is the efficient representation of the data generated by a discrete source. The process by which this representation is accomplished is called source encoding.
The device that performs the representation is called a source encoder. For reasons to be
described, it may be desirable to know the statistics of the source.
In particular, if some source symbols are known to be more probable than others, then we may
exploit this feature in the generation of a source code by assigning short codewords to frequent
source symbols and long codewords to rare source symbols.
We refer to such a source code as a variable-length code. The Morse code, used in telegraphy in
the past, is an example of a variable-length code.
Our primary interest is in the formulation of a source encoder that satisfies two requirements:
1. The codewords produced by the encoder are in binary form.
2. The source code is uniquely decodable, so that the original source sequence can be
reconstructed perfectly from the encoded binary sequence. The second requirement is
particularly important: it constitutes the basis for a perfect source code.
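As a quick illustration of why the second requirement matters, here is a small Python sketch (the four-symbol alphabet and codewords are hypothetical, not taken from the text): code A is prefix-free, which is the standard way of guaranteeing unique decodability, so a left-to-right scan recovers the symbols unambiguously; in code B one codeword is a prefix of another, and the same bit pattern can then be parsed in more than one way.

```python
# Code A is prefix-free: no codeword is a prefix of another.
code_a = {"s0": "0", "s1": "10", "s2": "110", "s3": "111"}

# Code B is not uniquely decodable: "0" is a prefix of "01",
# so the stream "010" could be read as s0 s1 or as s2 s0.
code_b = {"s0": "0", "s1": "10", "s2": "01", "s3": "11"}

def prefix_decode(bits, code):
    """Decode a bit string produced by a prefix-free code via greedy matching."""
    inverse = {word: sym for sym, word in code.items()}
    symbols, buffer = [], ""
    for b in bits:
        buffer += b
        if buffer in inverse:          # a complete codeword has been read
            symbols.append(inverse[buffer])
            buffer = ""
    return symbols

encoded = "".join(code_a[s] for s in ["s2", "s0", "s1", "s3"])
print(prefix_decode(encoded, code_a))   # ['s2', 's0', 's1', 's3']
```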
Consider then the scheme shown in Figure 5.3 that depicts a discrete memoryless source
whose output sk is converted by the source encoder into a sequence of 0s and 1s, denoted by
bk.
We assume that the source has an alphabet with K different symbols, and that the kth symbol sk
occurs with probability pk, k = 0, 1, …, K – 1.
Let the binary codeword assigned to symbol sk by the encoder have length lk, measured in
bits. We define the average codeword length $\bar{L}$ of the source encoder as

$\bar{L} = \sum_{k=0}^{K-1} p_k l_k$

Shannon's first theorem, the source-coding theorem, states that for a discrete memoryless source of entropy H(S), the average codeword length of any uniquely decodable source code is bounded as

$\bar{L} \ge H(S)$   (5.19)

According to this theorem, the entropy H(S) represents a fundamental limit on the average number of
bits per source symbol necessary to represent a discrete memoryless source, in that $\bar{L}$ can be made as
small as, but no smaller than, the entropy H(S).
Denoting by $\bar{L}_{\min}$ the minimum possible value of $\bar{L}$, the efficiency of the source encoder is defined as $\eta = \bar{L}_{\min}/\bar{L}$. With $\bar{L}_{\min} = H(S)$ from (5.19), we may rewrite the efficiency of the source encoder in terms of the entropy H(S) as

$\eta = \dfrac{H(S)}{\bar{L}}$
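As a worked illustration of these definitions, the Python sketch below computes H(S), the average codeword length, and the resulting efficiency for a hypothetical four-symbol source; the probabilities and codeword lengths are assumed values chosen only for the example.

```python
import math

# Hypothetical discrete memoryless source with K = 4 symbols.
p = [0.5, 0.25, 0.125, 0.125]   # symbol probabilities pk
l = [1, 2, 3, 3]                # assumed codeword lengths lk, in bits

# Entropy H(S) = -sum pk*log2(pk), in bits per source symbol.
H = -sum(pk * math.log2(pk) for pk in p)

# Average codeword length L_bar = sum pk*lk.
L_bar = sum(pk * lk for pk, lk in zip(p, l))

# Efficiency eta = H(S)/L_bar; the source-coding theorem guarantees eta <= 1.
eta = H / L_bar
print(f"H(S) = {H:.3f} bits, L_bar = {L_bar:.3f} bits, eta = {eta:.3f}")
```

For this particular (dyadic) choice of probabilities the code is perfectly efficient, with L̄ = H(S) = 1.75 bits and η = 1.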
Prefix Coding
REFER NOTES
Huffman Coding
The basic idea behind Huffman coding is the construction of a simple algorithm that computes
an optimal prefix code for a given distribution, optimal in the sense that the code has the
shortest expected length.
The end result is a source code whose average codeword length approaches the fundamental
limit set by the entropy of a discrete memoryless source, namely H(S).
The essence of the algorithm used to synthesize the Huffman code is to replace the
prescribed set of source statistics of a discrete memoryless source with a simpler one. This
reduction process is continued in a step-by-step manner until we are left with a final set of
only two source statistics (symbols), for which (0, 1) is an optimal code.
To be specific, the Huffman encoding algorithm proceeds as follows (a Python sketch follows these steps):
1. The source symbols are listed in order of decreasing probability. The two source symbols of lowest probability are assigned the bits 0 and 1. This part of the step is referred to as the splitting stage.
2. These two source symbols are then combined into a new source symbol with probability equal to the sum of the two original probabilities. The probability of the new symbol is placed in the list in accordance with its value.
3. The procedure is repeated until we are left with a final list of source statistics (symbols) of only two, for which the symbols 0 and 1 are assigned.
4. The code for each (original) source symbol is found by working backward and tracing the sequence of 0s and 1s assigned to that symbol as well as its successors.
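The sketch below is one possible Python implementation of these steps, using a binary heap to combine the two least probable entries at each stage; the five-symbol source and its probabilities are made up for illustration, and different tie-breaking choices may produce different codewords while still achieving the same (optimal) average codeword length.

```python
import heapq

def huffman_code(probabilities):
    """Build a binary Huffman code for a {symbol: probability} mapping."""
    # Heap entries are (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Combine the two least probable entries, prefixing 0 and 1 (splitting stage).
        p0, _, code0 = heapq.heappop(heap)
        p1, _, code1 = heapq.heappop(heap)
        merged = {sym: "0" + word for sym, word in code0.items()}
        merged.update({sym: "1" + word for sym, word in code1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical five-symbol source (probabilities are illustrative only).
probs = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
print(code, "average codeword length =", avg_len)
```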
Lempel–Ziv Coding
A drawback of the Huffman code is that it requires knowledge of a probabilistic model of the
source; unfortunately, in practice, source statistics are not always known a priori.
Moreover, in the modeling of text we find that storage requirements prevent the Huffman code
from capturing the higher-order relationships between words and phrases because the
codebook grows exponentially fast in the size of each super-symbol of letters (i.e., grouping of
letters); the efficiency of the code is therefore compromised.
To overcome these practical limitations of Huffman codes, we may use the Lempel–Ziv
algorithm, which is intrinsically adaptive and simpler to implement than Huffman coding.
Basically, the idea behind encoding in the Lempel–Ziv algorithm is described as follows: The
source data stream is parsed into segments that are the shortest subsequences not
encountered previously.
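A minimal Python sketch of this parsing rule (in the LZ78 style) is given below; the binary stream is an arbitrary example chosen here, and each new phrase is represented by the index of its longest previously seen prefix together with its final (innovation) bit.

```python
def lz_parse(bits):
    """Parse a binary string into the shortest subsequences not seen before."""
    dictionary = {"": 0}      # phrase -> index; the empty phrase has index 0
    phrases, current = [], ""
    for b in bits:
        current += b
        if current not in dictionary:
            # New phrase: record (index of its longest known prefix, innovation bit).
            dictionary[current] = len(dictionary)
            phrases.append((dictionary[current[:-1]], current[-1]))
            current = ""
    # Trailing bits that repeat an already-known phrase are ignored in this sketch.
    return phrases

# Arbitrary example stream (illustrative only).
stream = "000101110010100101"
print(lz_parse(stream))
# [(0, '0'), (1, '0'), (0, '1'), (1, '1'), (3, '1'), (2, '1'), (4, '0'), (7, '1')]
```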
Discrete Memoryless Channels
A discrete memoryless channel is a statistical model with an input X and an output Y that is a noisy
version of X; both X and Y are random variables.
Every unit of time, the channel accepts an input symbol X selected from an alphabet 𝒳 and, in response,
it emits an output symbol Y selected from an alphabet 𝒴.
The channel is said to be “discrete” when both of the alphabets 𝒳 and 𝒴 have finite sizes.
It is said to be “memoryless” when the current output symbol depends only on the current input symbol
and not on any previous or future symbols.
Figure 5.7a shows a view of a discrete memoryless channel. The channel is described in terms of an input
alphabet 𝒳, an output alphabet 𝒴, and a set of transition probabilities p(y | x), namely the conditional
probability of receiving output symbol y given that input symbol x was sent.
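As a concrete illustration (the numbers are assumed, not taken from a figure in the text), the sketch below writes down the transition probability matrix of a binary symmetric channel, a simple discrete memoryless channel, and checks that each row sums to one, as a set of conditional probabilities must.

```python
import numpy as np

# Binary symmetric channel: X and Y both take values in {0, 1},
# with an assumed crossover (bit-flip) probability eps.
eps = 0.1
P = np.array([[1 - eps, eps],        # row j holds p(y = 0 | x = j), p(y = 1 | x = j)
              [eps, 1 - eps]])

# Each row of the transition matrix must sum to 1.
assert np.allclose(P.sum(axis=1), 1.0)

# Output distribution p(y) for an assumed input distribution p(x).
p_x = np.array([0.5, 0.5])
p_y = p_x @ P
print("p(y) =", p_y)                  # [0.5, 0.5] for this symmetric case
```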
The channel encoder and channel decoder in Figure 5.11 are both under the designer’s control
and should be designed to optimize the overall reliability of the communication system.
The approach taken is to introduce redundancy in the channel encoder in a controlled manner,
so as to reconstruct the original source sequence as accurately as possible.
In a rather loose sense, we may thus view channel coding as the dual of source coding, in that
the former introduces controlled redundancy to improve reliability whereas the latter reduces
redundancy to improve efficiency.
We now state Shannon’s second theorem, the channel-coding theorem, in two parts as follows:
1. Let a discrete memoryless source with an alphabet 𝒮 have entropy H(S) for random variable
S and produce symbols once every Ts seconds. Let a discrete memoryless channel have
capacity C and be used once every Tc seconds. Then, if

$\dfrac{H(S)}{T_s} \le \dfrac{C}{T_c}$   (1.1)
there exists a coding scheme for which the source output can be transmitted over the
channel and be reconstructed with an arbitrarily small probability of error.
2. Conversely, if

$\dfrac{H(S)}{T_s} > \dfrac{C}{T_c}$

it is not possible to transmit information over the channel and reconstruct it with an
arbitrarily small probability of error.
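As a worked numerical check of condition (1.1), the sketch below compares the source information rate H(S)/Ts with the maximum rate C/Tc supported by the channel; all four numbers are assumed values chosen only for illustration.

```python
# Illustrative numbers only: check the condition H(S)/Ts <= C/Tc of (1.1).
H_S = 1.75        # source entropy, bits per source symbol (assumed)
T_s = 1e-3        # one source symbol every 1 ms (assumed)
C = 0.5           # channel capacity, bits per channel use (assumed)
T_c = 0.25e-3     # one channel use every 0.25 ms (assumed)

source_rate = H_S / T_s     # information rate of the source, bits/s -> 1750
channel_rate = C / T_c      # maximum reliable transmission rate, bits/s -> 2000

if source_rate <= channel_rate:
    print("Condition (1.1) holds: arbitrarily reliable transmission is possible.")
else:
    print("Condition (1.1) fails: reliable transmission is not possible.")
```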
The channel-coding theorem is the single most important result of information theory. The
theorem specifies the channel capacity C as a fundamental limit on the rate at which the
transmission of reliable error-free messages can take place over a discrete memoryless channel.
However, it is important to note two limitations of the theorem:
1. The channel-coding theorem does not show us how to construct a good code. Rather, the
theorem should be viewed as an existence proof in the sense that it tells us that if the
condition of (1.1) is satisfied, then good codes do exist.
2. The theorem does not have a precise result for the probability of symbol error after
decoding the channel output. Rather, it tells us that the probability of symbol error tends to
zero as the length of the code increases, again provided that the condition of (1.1) is
satisfied.