
Communication Systems

(ECEg4271)

By: H/MARYAM G.

Yekatit 15, 2014 E.C.

Yekatit 15, 2014 E.C. Communication Systems 1 / 40


CHAPTER II
Information Theory and Coding
▶ Introduction
▶ Measure of Information, Entropy
▶ Source Coding
▶ Huffman Coding, Shannon-Fano Coding

Yekatit 15, 2014 E.C. Communication Systems 2 / 40


Introduction
Father of Digital Communication
Shannon’s Definition of Communication:
"The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point."
Most important Master's thesis of the century:
"A symbolic analysis of relay and switching circuits," M.Sc. thesis, EE Dept, MIT, 1937.
Very influential in digital circuit design.
Claude Elwood Shannon (April 30, 1916 – February 24, 2001)

Yekatit 15, 2014 E.C. Communication Systems 3 / 40


Introduction
▶ Shannon wanted to find a way of "reliably" transmitting data through the channel at the "maximal" possible rate.
▶ Information theory provides a quantitative measure of the information contained in message signals.
▶ It allows us to determine the capacity of a communication system to transfer this information from source to destination.
▶ Information theory deals with mathematical modeling and analysis of a communication system rather than with physical sources and physical channels.
▶ In particular, it provides answers to two fundamental questions:
① What is the irreducible complexity below which a signal cannot be compressed?
(answer: the entropy H).
② What is the ultimate transmission rate for reliable communication over a noisy channel? (answer: the channel capacity C).
Yekatit 15, 2014 E.C. Communication Systems 4 / 40
Measure of Information
▶ An information source is an object that produces an event the outcome
of which is selected at random according to a probability distribution.
▶ A practical source in a communication system is a device that produces
messages and it can be either analog or discrete.
▶ A discrete information source is a source that has only a finite set of symbols
as possible outputs.
▶ The set of source symbols is called the source alphabet and the elements of the set are called symbols or letters.
▶ Information sources can be classified as having memory or being memoryless.

Yekatit 15, 2014 E.C. Communication Systems 5 / 40


Measure of Information
▶ A source with memory is one for which a current symbol depends on the
previous symbols.
▶ A memoryless source is one for which each symbol produced is independent of the previous symbols.
▶ A discrete memoryless source (DMS) can be characterized by the list of the symbols, the probability assignment to these symbols, and the specification of the rate at which the source generates these symbols.
▶ The amount of information contained in an event is closely related to its uncertainty.
▶ Messages about events with a high probability of occurrence convey relatively little information. If an event is certain, it conveys zero information.

Yekatit 15, 2014 E.C. Communication Systems 6 / 40


Measure of Information
▶ Thus, a mathematical measure of information should be a function of the
probability of the outcome and should satisfy the following axioms:
① Information should be proportional to the uncertainty of an outcome.
② Information contained in independent outcomes should add.
▶ Let us consider a DMS, denoted by X, with alphabet {x_1, x_2, ..., x_m} and symbol probabilities P(X = x_i) = p_{x_i}, i = 1, 2, ..., m, which must satisfy the condition:

\sum_{i=1}^{m} p_{x_i} = 1    (1)

▶ The amount of information gained after observing the event X = x_i, which occurs with probability p_{x_i}, denoted by I(x_i), is:

I(x_i) = \log_2(1/p_{x_i}) = -\log_2(p_{x_i})    (2)
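As a quick illustration (a minimal Python sketch, not part of the original slides; the function name self_information is ours), eq. (2) can be evaluated directly:

import math

def self_information(p: float) -> float:
    # I(x) = -log2(p), in bits, for an outcome of probability p.
    return -math.log2(p)

print(self_information(1.0))    # 0.0 bits: a certain event conveys no information
print(self_information(0.5))    # 1.0 bit
print(self_information(0.125))  # 3.0 bits: rarer events convey more information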
Yekatit 15, 2014 E.C. Communication Systems 7 / 40
Measure of Information
▶ Note that I(xi ) satisfies the following properties.
i. I(xi ) = 0 for pxi = 1; If we are absolutely certain of the outcome of an event, even
before it occurs, there is no information gained.
ii. I(xi ) ≥ 0, for 0 ≤ pxi ≤ 1; The occurrence of an event X = xi either provides some
or no information, but never brings about a loss of information.
iii. I(xi ) > I(x j ) if pxi < px j ; The less probable an event is, the more information we gain
when it occurs.
iv. I(xi , x j ) = I(xi ) + I(x j ), If xi and x j are independent.
▶ Why is a logarithmic function used?
• Information cannot be negative.
• The lowest possible self-information is zero.
• The total information of several independent outcomes is the sum of the individual informations.

Yekatit 15, 2014 E.C. Communication Systems 8 / 40


Entropy
▶ In a practical communication system, we usually transmit long sequences of
symbols from an information source. Thus, we are more interested in the
average information that a source produces than the information content
of a single symbol. The entropy will then be measured in bits.
▶ The entropy, H(X), is a measure of the average information content per
source symbol (bits/symbol).
▶ The mean value of I(xi ) over the alphabet of source X with m different
symbols is given by:
H(X) = E[I(x_i)] = \sum_{i=1}^{m} p_{x_i} I(x_i)

H(X) = -\sum_{i=1}^{m} p_{x_i} \log_2 p_{x_i}    (3)
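A short Python sketch of eq. (3) (illustrative only; the probability values below are arbitrary examples, not taken from the slides):

import math

def entropy(probs):
    # H(X) = -sum_i p_i * log2(p_i), in bits/symbol; zero-probability symbols contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))               # 1.0 bit/symbol (equiprobable binary source)
print(entropy([0.9, 0.1]))               # ~0.47 bits/symbol: a skewed source carries less information
print(entropy([0.25, 0.25, 0.25, 0.25])) # 2.0 bits/symbol = log2(4), the upper bound of eq. (4)

r = 1000                                 # hypothetical symbol rate, symbols/second
print(r * entropy([0.5, 0.25, 0.25]))    # information rate R = r*H(X) = 1500.0 b/s, eq. (5)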
Yekatit 15, 2014 E.C. Communication Systems 9 / 40
Entropy
▶ The source entropy H(X) can be considered the average amount of uncertainty within source X that is resolved by use of the alphabet.
▶ The source entropy H(X) satisfies the following relations:

0 ≤ H(X) ≤ log2 m (4)


where m is the size (number of symbols) of the alphabet of source X.
▶ The lower bound corresponds to no uncertainty, while the upper bound corresponds to the maximum uncertainty (all the symbols in the source alphabet are equiprobable).
▶ Information Rate: If the time rate at which source X emits symbols is
r (symbols/second), the information rate R of the source is given by:
R = rH(X) b/s (5)
Yekatit 15, 2014 E.C. Communication Systems 10 / 40
Entropy
▶ Example 1: For a binary source X that generates independent symbols 0 and 1 with equal probability, the source entropy H(X) is:

H(X) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 1 bit/symbol

This is known as the binary entropy; its function, denoted by H_b(p), is shown in Fig. 1.

Figure 1: The binary entropy function.

Yekatit 15, 2014 E.C. Communication Systems 11 / 40


Entropy
▶ Example 2: A source with bandwidth 40 Hz is sampled at the Nyquist rate. Assuming that the resulting sequence can be approximately modeled by a DMS with alphabet X = {−2, −1, 0, 1, 2} and corresponding probabilities {1/2, 1/4, 1/8, 1/16, 1/16}, determine the rate of the source in bits/sec.
Solution: We have

H(X) = \frac{1}{2}\log_2 2 + \frac{1}{4}\log_2 4 + \frac{1}{8}\log_2 8 + 2 \times \frac{1}{16}\log_2 16 = \frac{15}{8} bits/sample

Since we have 80 samples/sec, the source produces information at a rate of R = 80 × 15/8 = 150 bits/sec.
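The arithmetic above can be verified with a few lines of Python (a checking sketch only, not part of the slides):

import math

probs = [1/2, 1/4, 1/8, 1/16, 1/16]
H = -sum(p * math.log2(p) for p in probs)   # 1.875 bits/sample = 15/8
R = 2 * 40 * H                              # Nyquist sampling: 2 x 40 Hz = 80 samples/sec
print(H, R)                                 # 1.875 150.0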

Yekatit 15, 2014 E.C. Communication Systems 12 / 40


Entropy
▶ Exercise 1: A discrete source emits one of three symbols once every millisecond. The symbol probabilities are 1/2, 1/4 and 1/4, respectively. Find the source entropy and information rate.
▶ Exercise 2: The output of an information source consists of 150 symbols, 32 of which occur with probability 1/64 and the remaining 118 with probability 1/236. The source emits 2000 symbols per second. Assuming that symbols are chosen independently, find the average rate of this source.
▶ Exercise 3: A high-resolution black-and-white TV picture consists of about 2 × 10^6 picture elements and 16 different brightness levels. Pictures are repeated at the rate of 32 per second. All picture elements are assumed to be independent and all levels have equal likelihood of occurrence. Calculate the average rate of information conveyed by this TV picture source.

Yekatit 15, 2014 E.C. Communication Systems 13 / 40


Joint and Conditional Entropy
▶ When dealing with two or more random variables, joint and conditional entropies are introduced in exactly the same way that joint and conditional probabilities are.
▶ These concepts are especially important when dealing with sources with memory.
▶ The joint entropy H(X,Y) of a pair of discrete random variables (X,Y) with a joint distribution p(x_i, y_j) is defined as:

H(X,Y) = -\sum_{j=1}^{n}\sum_{i=1}^{m} p(x_i, y_j) \log_2 p(x_i, y_j)    (6)

▶ The amount of uncertainty remaining about the channel input after observing the channel output is called the conditional entropy.

Yekatit 15, 2014 E.C. Communication Systems 14 / 40


Joint and Conditional Entropy
▶ The conditional entropy, H(Y|X), of the random variable Y given the random variable X is defined by:

H(Y|X) = \sum_{i=1}^{m} p(x_i) H(Y|X = x_i) = \sum_{i=1}^{m} p(x_i) \left[ -\sum_{j=1}^{n} p(y_j|x_i) \log_2 p(y_j|x_i) \right]

H(Y|X) = -\sum_{j=1}^{n}\sum_{i=1}^{m} p(x_i, y_j) \log_2 p(y_j|x_i)    (7)

▶ Similarly, the conditional entropy, H(X|Y), of the random variable X given the random variable Y is defined by:

H(X|Y) = -\sum_{j=1}^{n}\sum_{i=1}^{m} p(x_i, y_j) \log_2 p(x_i|y_j)    (8)
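The sketch below (illustrative Python; the joint distribution p_xy is an arbitrary example, not from the slides) computes H(X,Y), H(Y|X), and H(X|Y) from a joint probability table, following eqs. (6)-(8):

import math

# Joint probabilities p(x_i, y_j): rows indexed by x, columns by y (illustrative values).
p_xy = [[0.25, 0.25],
        [0.10, 0.40]]

p_x = [sum(row) for row in p_xy]            # marginal p(x_i)
p_y = [sum(col) for col in zip(*p_xy)]      # marginal p(y_j)

H_XY = -sum(p * math.log2(p) for row in p_xy for p in row if p > 0)     # eq. (6)
H_Y_given_X = -sum(p_xy[i][j] * math.log2(p_xy[i][j] / p_x[i])          # eq. (7), p(y|x) = p(x,y)/p(x)
                   for i in range(len(p_x)) for j in range(len(p_y)) if p_xy[i][j] > 0)
H_X_given_Y = -sum(p_xy[i][j] * math.log2(p_xy[i][j] / p_y[j])          # eq. (8), p(x|y) = p(x,y)/p(y)
                   for i in range(len(p_x)) for j in range(len(p_y)) if p_xy[i][j] > 0)

print(H_XY, H_Y_given_X, H_X_given_Y)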

Yekatit 15, 2014 E.C. Communication Systems 15 / 40


Joint and Conditional Entropy

▶ Example 1: Using the definition of the joint entropy and the chain rule, p(x_i, y_j) = p(x_i) p(y_j|x_i), show that H(X,Y) = H(X) + H(Y|X).

H(X,Y) = -\sum_{j=1}^{n}\sum_{i=1}^{m} p(x_i, y_j) \log_2 p(x_i, y_j)
       = -\sum_{j=1}^{n}\sum_{i=1}^{m} p(x_i, y_j) \log_2 [p(x_i) p(y_j|x_i)]
       = -\sum_{j=1}^{n}\sum_{i=1}^{m} p(x_i, y_j) \log_2 p(x_i) - \sum_{j=1}^{n}\sum_{i=1}^{m} p(x_i, y_j) \log_2 p(y_j|x_i)
       = -\sum_{i=1}^{m} p(x_i) \log_2 p(x_i) - \sum_{j=1}^{n}\sum_{i=1}^{m} p(x_i, y_j) \log_2 p(y_j|x_i)

H(X,Y) = H(X) + H(Y|X)    (9)
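Eq. (9) is easy to verify numerically; the following is a minimal sketch using an arbitrary joint distribution (not from the slides):

import math

p_xy = [[0.125, 0.375],     # illustrative joint probabilities p(x_i, y_j)
        [0.300, 0.200]]

p_x = [sum(row) for row in p_xy]
H_X = -sum(p * math.log2(p) for p in p_x)
H_XY = -sum(p * math.log2(p) for row in p_xy for p in row)
H_Y_given_X = -sum(p_xy[i][j] * math.log2(p_xy[i][j] / p_x[i])
                   for i in range(2) for j in range(2))

# Chain rule check: H(X,Y) = H(X) + H(Y|X)
print(math.isclose(H_XY, H_X + H_Y_given_X))   # True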
Yekatit 15, 2014 E.C. Communication Systems 16 / 40
Mutual Information
▶ The entropy H(X) represents our uncertainty about the channel input before observing the channel output, while the conditional entropy H(X|Y) represents our uncertainty about the channel input after observing the channel output.
▶ It follows that the difference H(X) − H(X|Y) must represent the uncertainty about the channel input that is resolved by observing the channel output.
▶ This is called the mutual information, I(X;Y ), of the channel.

I(X;Y ) = H(X) − H(X|Y ) (10)


▶ I(X;Y ) is interpreted as the reduction in uncertainty of X due to knowledge
of Y or a measure of dependency between X and Y .
▶ Similarly, we may write

I(Y ; X) = H(Y ) − H(Y |X) (11)


Yekatit 15, 2014 E.C. Communication Systems 17 / 40
Mutual Information
▶ From eq. 10, the mutual information of a channel is related to the joint
entropy of the channel input and channel output as:
I(X;Y ) = H(X) + H(Y ) − H(X,Y ) (12)
▶ Derivation of Mutual Information:

I(Y;X) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)
       = -\sum_{i=1}^{m} p_{x_i} \log_2 p_{x_i} - \sum_{j=1}^{n} p_{y_j} \log_2 p_{y_j} - H(X,Y)
       = -\sum_{i=1}^{m}\sum_{j=1}^{n} p(x_i, y_j) \log_2 p_{x_i} - \sum_{i=1}^{m}\sum_{j=1}^{n} p(x_i, y_j) \log_2 p_{y_j} - H(X,Y)

I(Y;X) = \sum_{i=1}^{m}\sum_{j=1}^{n} p(x_i, y_j) \log_2 \frac{p(x_i, y_j)}{p_{x_i} p_{y_j}}    (13)
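A short Python sketch of eq. (13), computing I(X;Y) directly from a joint distribution (the numbers are illustrative only):

import math

p_xy = [[0.30, 0.20],       # illustrative joint probabilities p(x_i, y_j)
        [0.10, 0.40]]

p_x = [sum(row) for row in p_xy]
p_y = [sum(col) for col in zip(*p_xy)]

# I(X;Y) = sum_ij p(x_i,y_j) * log2( p(x_i,y_j) / (p(x_i) * p(y_j)) )
I_XY = sum(p_xy[i][j] * math.log2(p_xy[i][j] / (p_x[i] * p_y[j]))
           for i in range(len(p_x)) for j in range(len(p_y)) if p_xy[i][j] > 0)
print(I_XY)   # non-negative, and zero only when X and Y are independent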
Yekatit 15, 2014 E.C. Communication Systems 18 / 40
Mutual Information
▶ Some additional properties of I(X;Y );
i. I(X; X) = H(X)
ii. I(X;Y ) ≥ 0
iii. I(X;Y ) = I(Y ; X)
iv. I(X;Y ) ≤ min{H(X), H(Y )}

Yekatit 15, 2014 E.C. Communication Systems 19 / 40


Mutual Information
▶ The Venn diagram representation of entropy, joint entropy and conditional
entropy.

Figure 2: Graphical view of entropy, joint entropy and conditional entropy.

Yekatit 15, 2014 E.C. Communication Systems 20 / 40


Channel Capacity
▶ Of practical interest in many communication applications is the number of bits that may be reliably transmitted per second through a given communications channel.
▶ We define the channel capacity of a DMC as the maximum average mutual
information I(X;Y ) in any single use of the channel (i.e., signaling interval),
where the maximization is over all possible input probability distributions
p(xi ) on X.

Figure 3: Noisy channel model.


Yekatit 15, 2014 E.C. Communication Systems 21 / 40
Channel Capacity
▶ Assuming that the output of the channel depends probabilistically on the input, the channel capacity per symbol of a DMC is defined as:

C_s = \max_{p(x_i)} I(X;Y) (bits/symbol)    (14)

where the maximization is over all possible input probability distributions p(x_i) on X. Note that the channel capacity C_s is a function only of the channel transition probabilities that define the channel.
▶ If r symbols are being transmitted per second, then the maximum rate of transmission of information per second is rC_s. This is the channel capacity per second and is denoted by C (b/s):

C = rC_s (b/s)    (15)
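As an illustration only, the maximization in eq. (14) can be approximated by a brute-force search over input distributions. The binary-input, binary-output channel below and its transition probabilities are hypothetical, not taken from the slides:

import math

# Hypothetical transition matrix p(y_j | x_i): rows are inputs, columns are outputs.
p_y_given_x = [[0.9, 0.1],
               [0.2, 0.8]]

def mutual_information(p_x):
    # I(X;Y) for the input distribution p_x and the transition matrix above.
    p_y = [sum(p_x[i] * p_y_given_x[i][j] for i in range(2)) for j in range(2)]
    return sum(p_x[i] * p_y_given_x[i][j] * math.log2(p_y_given_x[i][j] / p_y[j])
               for i in range(2) for j in range(2) if p_x[i] > 0)

# Coarse search over p(x_1) approximates C_s = max over p(x) of I(X;Y).
Cs = max(mutual_information([p, 1 - p]) for p in (k / 1000 for k in range(1001)))
print(Cs)    # channel capacity per symbol, in bits/symbol (approximate)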

Yekatit 15, 2014 E.C. Communication Systems 22 / 40


Source Coding
▶ An important problem in communications is the efficient representation of
data generated by a discrete source.
▶ The process by which this representation is accomplished is called source
encoding. The device that performs the representation is called a source
encoder.
▶ Our primary interest is in the development of an efficient source encoder
that satisfies two functional requirements:
• The code words produced by the encoder are in binary form.
• The source code is uniquely decodable, so that the original source sequence can be
reconstructed perfectly from the encoded binary sequence.
▶ We assume that the source has an alphabet with m different symbols, and that the ith symbol x_i occurs with probability p_i, i = 1, 2, ..., m. Let the binary code word assigned to symbol x_i by the encoder have length l_i, measured in bits.
Yekatit 15, 2014 E.C. Communication Systems 23 / 40
Source Coding
▶ We define the average code word length (the average number of bits per source symbol used in the source encoding process), L, of the source encoder as:

L = \sum_{i=1}^{m} p_{x_i} l_i    (16)

▶ Let L_min denote the minimum possible value of L. We then define the coding efficiency of the source encoder as:

\eta = \frac{L_{min}}{L}    (17)

▶ With L ≥ L_min, we clearly have η ≤ 1. The source encoder is said to be efficient when η approaches unity.
Yekatit 15, 2014 E.C. Communication Systems 24 / 40
Source Coding
▶ But how is the minimum value L_min determined? The answer to this fundamental question is embodied in Shannon's source-coding theorem.
▶ According to the source-coding theorem, the entropy H(X) represents a fundamental limit on the average number of bits per source symbol necessary to represent a discrete memoryless source: the average can be made as small as, but no smaller than, the entropy H(X).
▶ Thus, with L_min = H(X), we may rewrite the efficiency of a source encoder in terms of the entropy H(X) as:

\eta = \frac{H(X)}{L}    (18)
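A brief Python sketch of eqs. (16)-(18). The probabilities and code word lengths are illustrative; they happen to match code III of Table 1 on the following slide, for which the efficiency works out to exactly 1:

import math

probs   = [0.5, 0.25, 0.125, 0.0625, 0.0625]   # symbol probabilities p_i
lengths = [1, 2, 3, 4, 4]                      # code word lengths l_i, in bits

L = sum(p * l for p, l in zip(probs, lengths))   # average code word length, eq. (16)
H = -sum(p * math.log2(p) for p in probs)        # source entropy H(X)
eta = H / L                                      # coding efficiency, eq. (18)
print(L, H, eta)                                 # 1.875 1.875 1.0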

Yekatit 15, 2014 E.C. Communication Systems 25 / 40


Source Coding
▶ Let us assume that the possible outputs of an information source are X = {x_1, x_2, x_3, x_4, x_5}, and consider the following four codes for this source.

Table 1: Sample code words for source X.


Letter Probability Code I Code II Code III Code IV
x1 0.5 1 1 0 00
x2 0.25 01 10 10 01
x3 0.125 001 100 110 10
x4 0.0625 0001 1000 1110 11
x5 0.0625 00001 10000 1111 110
▶ In the first code, each codeword ends with a 1. Therefore, as soon as the
decoder observes a 1, it knows that the codeword has ended and a new
codeword will start. This means that the code is a self-synchronizing code.
Yekatit 15, 2014 E.C. Communication Systems 26 / 40
Source Coding
▶ In the second code, each code word starts with a 1. Therefore, upon observing a 1, the decoder knows that a new code word has started and, hence, the previous bit was the last bit of the previous code word.
▶ Thus the second code is also self-synchronizing, but it is not as desirable as the first code.
▶ The reason is that in this code we have to wait to receive the first bit of
the next code word to recognize that a new code word has started, whereas
in code 1 we recognize the last bit without having to receive the first bit
of the next code word.
▶ Both codes 1 and 2 therefore are uniquely decodable. However, only
code 1 is instantaneous.
▶ Codes 1 and 3 have the nice property that no code word is the prefix of
another code word. It is said that they satisfy the prefix condition.
Yekatit 15, 2014 E.C. Communication Systems 27 / 40
Source Coding
▶ It can be proved that a necessary and sufficient condition for a code to be uniquely decodable and instantaneous is that it satisfy the prefix condition.
▶ This means that both codes 1 & 3 are uniquely decodable & instantaneous.
▶ However, code 3 has the advantage of a smaller average code word length. In fact, for code 1 the average code word length is

E[L] = 1 × 1/2 + 2 × 1/4 + 3 × 1/8 + 4 × 1/16 + 5 × 1/16 = 31/16,

and for code 3

E[L] = 1 × 1/2 + 2 × 1/4 + 3 × 1/8 + 4 × 1/16 + 4 × 1/16 = 30/16.
▶ Code 4 has a major disadvantage. This code is not uniquely decodable.
For example, the sequence 110110 can be decoded in two ways, as x5 x5 or
as x4 x2 x3 .
▶ Codes that are not uniquely decodable are not desirable and should be avoided in practice.
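The prefix condition is easy to test programmatically. The following is an illustrative Python sketch (not part of the slides), applied to the four codes of Table 1:

def satisfies_prefix_condition(codewords):
    # True if no code word is a prefix of another code word.
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

codes = {
    "Code I":   ["1", "01", "001", "0001", "00001"],
    "Code II":  ["1", "10", "100", "1000", "10000"],
    "Code III": ["0", "10", "110", "1110", "1111"],
    "Code IV":  ["00", "01", "10", "11", "110"],
}
for name, code in codes.items():
    print(name, satisfies_prefix_condition(code))
# Codes I and III satisfy the prefix condition (instantaneous); code II fails because
# "1" is a prefix of "10"; code IV fails because "11" is a prefix of "110".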
Yekatit 15, 2014 E.C. Communication Systems 28 / 40
Source Coding
▶ Prefix codes are distinguished from other uniquely decodable codes by the fact that the end of a codeword is always recognizable. Hence, the decoding of a prefix code can be accomplished as soon as the binary sequence representing a source symbol is fully received. For this reason, prefix codes are also referred to as instantaneous codes.
▶ Given a discrete memoryless source of entropy H(X), the average codeword
length L of a prefix code is bounded as follows:

H(X) ≤ L < H(X) + 1 (19)

▶ From the discussion above, it is seen that the most desirable of the four codes is code 3, which is uniquely decodable, instantaneous, and has the least average codeword length. This is an example of a Huffman code.

Yekatit 15, 2014 E.C. Communication Systems 29 / 40


Shannon-Fano Coding
▶ Shannon–Fano coding, named after Claude Elwood Shannon and Robert
Fano, is a technique for constructing a prefix code based on a set of symbols
and their probabilities.
▶ It is suboptimal in the sense that it does not achieve the lowest possible expected codeword length, as Huffman coding does; however, unlike Huffman coding, it does guarantee that all codeword lengths are within one bit of their theoretical ideal.
▶ The Shannon–Fano algorithm passes through the following stages (a sketch in code is given below).
① List the source symbols in order of decreasing probability.
② Partition the set into two sets that are as close to equiprobable as possible, and assign 0 to the upper set and 1 to the lower set.
③ Continue this process, each time partitioning the sets with as nearly equal probabilities as possible, until further partitioning is not possible.
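A compact Python sketch of this recursive partitioning (an illustrative implementation; when two splits are equally balanced, the tie is broken by taking the first, which is one of several reasonable conventions):

def shannon_fano(symbols):
    # symbols: list of (symbol, probability) pairs. Returns {symbol: code word}.
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(p for _, p in group)
        # Choose the split point that makes the two sets as close to equiprobable as possible.
        running, best_k, best_diff = 0.0, 1, float("inf")
        for k in range(1, len(group)):
            running += group[k - 1][1]
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_k, best_diff = k, diff
        upper, lower = group[:best_k], group[best_k:]
        for s, _ in upper:
            codes[s] += "0"          # 0 assigned to the upper (more probable) set
        for s, _ in lower:
            codes[s] += "1"          # 1 assigned to the lower set
        split(upper)
        split(lower)

    split(sorted(symbols, key=lambda sp: -sp[1]))   # list symbols by decreasing probability
    return codes

print(shannon_fano([("a", 0.4), ("b", 0.2), ("c", 0.2), ("d", 0.1), ("e", 0.1)]))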

Yekatit 15, 2014 E.C. Communication Systems 30 / 40


Shannon-Fano Coding
▶ In Shannon–Fano coding, the symbols are arranged in order from most
probable to least probable, and then divided into two sets whose total
probabilities are as close as possible to being equal.
Table 2: Sample Shannon-Fano encoding.

▶ Note that in Shannon-Fano encoding, ambiguity may arise in the choice of the approximately equiprobable sets.
Yekatit 15, 2014 E.C. Communication Systems 31 / 40
Huffman Encoding Algorithm
▶ An optimal (shortest expected length) prefix code for a given distribution
can be constructed by a simple algorithm discovered by Huffman.
▶ The basic idea behind Huffman coding is to assign to each symbol of
an alphabet a sequence of bits roughly equal in length to the amount of
information conveyed by the symbol in question.
▶ The end result is a source code whose average code word length approaches
the fundamental limit set by the entropy of a discrete memoryless source.
▶ The essence of the algorithm used to synthesize the Huffman code is to
replace the prescribed set of source statistics of a discrete memoryless
source with a simpler one.
▶ This reduction process is continued in a step-by-step manner until we are
left with a final set of only two source statistics (symbols), for which (0,
1) is an optimal code.
Yekatit 15, 2014 E.C. Communication Systems 32 / 40
Huffman Encoding Algorithm
▶ Starting from this trivial code, we then work backward and thereby construct the Huffman code for the given source.
▶ Generally, Huffman encoding algorithm proceeds as follows:
① The source symbols are listed in order of decreasing probability. The two source symbols of lowest probability are assigned 0 and 1. This part of the step is referred to as the splitting stage.
② These two source symbols are then combined into a new source symbol with probability equal to the sum of the two original probabilities. (The list of source symbols, and therefore source statistics, is thereby reduced in size by one.) The probability of the new symbol is placed in the list in accordance with its value.
③ The procedure is repeated until we are left with a final list of source statistics (symbols) of only two, for which the symbols 0 and 1 are assigned.
▶ The code word for each (original) source symbol is found by working backward and tracing the sequence of 0s and 1s assigned to that symbol as well as its successors.
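A minimal Python sketch of this procedure, using a priority queue (an illustrative implementation; placement of combined symbols and tie-breaking follow one of several valid conventions, so the individual code words may differ from those in Table 4 even though the average length is the same). The probabilities in the demo match the example on the next slide:

import heapq

def huffman(symbol_probs):
    # symbol_probs: dict {symbol: probability}. Returns {symbol: code word}.
    # Heap entries are (probability, tie_breaker, {symbol: partial code word}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(symbol_probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)     # the two least probable "symbols"
        p1, _, c1 = heapq.heappop(heap)
        for s in c0:
            c0[s] = "0" + c0[s]             # prepend 0 to one group ...
        for s in c1:
            c1[s] = "1" + c1[s]             # ... and 1 to the other, then merge them
        heapq.heappush(heap, (p0 + p1, counter, {**c0, **c1}))
        counter += 1
    return heap[0][2]

probs = {"s0": 0.4, "s1": 0.2, "s2": 0.2, "s3": 0.1, "s4": 0.1}
codes = huffman(probs)
avg_len = sum(probs[s] * len(codes[s]) for s in probs)
print(codes, avg_len)    # average code word length 2.2 bits/symbol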
Yekatit 15, 2014 E.C. Communication Systems 33 / 40
Huffman Encoding Algorithm
▶ Example: The five symbols of the alphabet of a DM source (s_0, s_1, s_2, s_3, s_4) and their probabilities (2/5, 1/5, 1/5, 1/10, 1/10) are shown in Table 3 below.
The average codeword length is therefore:

E[L] = 2 × 2/5 + 2 × 1/5 + 2 × 1/5 + 3 × 1/10 + 3 × 1/10 = 2.2

The entropy of the specified DM source is calculated as follows:

H(X) = 0.4 \log_2(1/0.4) + 2 × 0.2 \log_2(1/0.2) + 2 × 0.1 \log_2(1/0.1) = 2.12193 bits/symbol

The efficiency of the source encoder becomes:

\eta = \frac{H(X)}{L} = \frac{2.12193}{2.2} = 96.46%
Yekatit 15, 2014 E.C. Communication Systems 34 / 40
Huffman Encoding Algorithm
▶ Following through the Huffman algorithm, we reach the end of the computation in four steps, resulting in the Huffman tree shown in Table 3. The codewords of the Huffman code for the source are tabulated in Table 4.
Table 3: Example of the Huffman encoding algorithm.

Table 4: Source code.

Yekatit 15, 2014 E.C. Communication Systems 35 / 40


Huffman Encoding Algorithm
▶ For the example at hand, we may make two observations:
i. The average codeword length L exceeds the entropy H(X) by only 3.67 percent.
ii. The average codeword length L does indeed satisfy Eq. 19 (the prefix code bound).

Yekatit 15, 2014 E.C. Communication Systems 36 / 40


Channel Encoding

Figure 4: Channel encoding - Channel decoding.


Yekatit 15, 2014 E.C. Communication Systems 37 / 40
Channel Encoding
▶ The inevitable presence of noise in a channel causes errors between the output and input data sequences of a digital communication system.
▶ For a relatively noisy channel, the probability of error may have a value higher than 10^{-2}, which means that fewer than 99 out of 100 transmitted bits are received correctly.
▶ For many applications, this level of reliability is far from adequate. Indeed, a probability of error equal to 10^{-6} or even lower is often a necessary requirement.
▶ To achieve such a high level of performance, we may have to resort to the
use of channel coding.
▶ The design goal of channel coding is to increase the resistance of a digital
communication system to channel noise.

Yekatit 15, 2014 E.C. Communication Systems 38 / 40


Assignments
1. Linear block codes
A) Hamming codes
B) Cyclic redundancy check (CRC) codes
C) Bose-Chaudhuri-Hocquenghem (BCH) codes
D) Reed-Solomon codes

2. Convolutional codes
A) Maximum likelihood decoding
B) Viterbi decoding

3. Turbo codes, which are a combination of block and convolutional codes.

Yekatit 15, 2014 E.C. Communication Systems 39 / 40


Any Questions?
END

Yekatit 15, 2014 E.C. Communication Systems 40 / 40
