Digital Communication: Information Theory
Information Theory
Digital communication systems and networks are described by their data rates. The ultimate goal
of a communication system is to transfer information rather than data. Defining what information
is, how to characterize it, and how to improve information transfer rates are the challenges
addressed by information theory. One of the persons who started the digital revolution was
Claude Shannon. Shannon showed in one of his major research papers that all information
sources (people speaking, television pictures, telegraph keys) have a source rate associated
with them which can be measured in bits per second. Similarly, communication channels have a
capacity that can be expressed in the same units. Information can be transmitted over the channel
if and only if the source rate does not exceed the channel capacity.
Hartley proposed measuring the information contained in an outcome x_i, which occurs with probability P{x_i}, as:

I(x_i) = \log_2 \frac{1}{P\{x_i\}} = -\log_2 P\{x_i\}
Eq. (60)

This measure satisfies our requirement that the information contained in an outcome increases
with its uncertainty. If P{x_i} = 1, then I(x_i) = 0, telling us that a certain event
contains no information.
The definition also satisfies the requirement that the total information in independent events
should add. Clearly, a rain forecast for two days contains twice as much information as for one
day. From Eq. (60), for two independent outcomes x_i and x_j:

I(x_i \text{ and } x_j) = \log_2 \frac{1}{P\{x_i \text{ and } x_j\}}
                        = \log_2 \frac{1}{P\{x_i\} P\{x_j\}}
                        = \log_2 \frac{1}{P\{x_i\}} + \log_2 \frac{1}{P\{x_j\}}
                        = I(x_i) + I(x_j)
Eq. (61)
Hartley's measure defines the information in a single outcome. The measure entropy, H(X)
(sometimes called absolute entropy), defines the information content of the source X as a whole. It is
the mean information provided by the source per source output or symbol. From Eq. (60):

H(X) = \sum_i P\{x_i\} I(x_i) = -\sum_i P\{x_i\} \log_2 P\{x_i\}
Eq. (62)
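As a quick numerical check of Eqs. (60) to (62), the Python sketch below (the function names self_information and entropy are chosen here purely for illustration) evaluates the information in a single outcome and the entropy of a simple source:

import math

def self_information(p):
    """I(x) = log2(1/P{x}) in bits, as in Eq. (60)."""
    return math.log2(1.0 / p)

def entropy(probs):
    """H(X) = sum_i P{x_i} log2(1/P{x_i}) in bits/symbol, as in Eq. (62)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# A certain event carries no information.
print(self_information(1.0))                            # 0.0 bits

# Independent events add (Eq. 61): rain on two days, each with P = 0.5.
print(self_information(0.5 * 0.5))                      # 2.0 bits
print(self_information(0.5) + self_information(0.5))    # 2.0 bits

# Entropy of a fair binary source.
print(entropy([0.5, 0.5]))                              # 1.0 bit/symbol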
Example
Consider the case where there is a need to communicate the set of student grades of which there
are four, A, B, C and E. It is a trivial task to map the grades into a binary alphabet as
follows:
A   00
B   01
C   10
E   11
Since there are 50 grades to be communicated, the above coding will require the transfer of 100
binary symbols, or bits. The question arises whether the information could be accurately
transferred using fewer than 100 bits. If the four grades are equiprobable (that is each symbol
has an equally likely chance of occurring) then the probability of any of the grades is 0.25 and
the entropy is:
H = 4 x P(any grade) x log2{1/P(any grade)}
  = 4 x 0.25 x log2(4)
  = 4 x 0.25 x 2
  = 2 bits/symbol
When this is the case, the information cannot be transmitted in fewer than 100 bits.
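A short Python check of this arithmetic (variable names chosen for illustration):

import math

# Four equiprobable grades: entropy is 2 bits/symbol (Eq. 62).
p = 0.25
H = 4 * p * math.log2(1 / p)
print(H)         # 2.0 bits/symbol
print(50 * H)    # 100.0 bits for 50 grades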
However, if the distribution of the grades is as shown below, we may calculate the entropy as follows:

SYMBOL   QUANTITY   PROBABILITY
A        5          PA = 0.10
B        10         PB = 0.20
C        28         PC = 0.56
E        7          PE = 0.14

H = 0.10 x log2(1/0.10) + 0.20 x log2(1/0.20) + 0.56 x log2(1/0.56) + 0.14 x log2(1/0.14)
  = 1.6621 bits/symbol
From this we may infer that the most efficient coding of the information will require, on
average, 1.6621 binary symbols per information symbol. Therefore we should be able to find a way
to code the information with fewer than 100 binary symbols (note 50 x 1.6621 = 83.105).
At the same time we must be cognizant of the fact that it will never be possible to accurately
code the information with fewer than 84 binary symbols.
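The same check for the skewed distribution (a small Python sketch using the probabilities from the table above):

import math

# Grade probabilities for A, B, C, E from the table above.
probs = [0.10, 0.20, 0.56, 0.14]
H = sum(p * math.log2(1 / p) for p in probs)
print(H)         # ~1.6621 bits/symbol
print(50 * H)    # ~83.1 bits, so at least 84 whole bits are needed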
We can easily demonstrate a reduction in the required number of binary symbols by adopting a
variable length coding and assigning shorter codes to more frequently occurring message
symbols. A solution is as follows:
A   010
B   00
C   1
E   011
If the above coding is used, only 84 bits will be required (check this on your own). Note that in
the above assignment, each code is unique and is not a prefix of any other code in the set. This
variable-length coding is known as Huffman coding (after the engineer who designed it) and is a
practical way to design relatively efficient, prefix-free, variable length codes, where the
probability distribution of the source information symbols is known.
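The code above can be reproduced with a standard Huffman construction. The Python sketch below (the helper name huffman_code and the heap-based merging are implementation choices made here for illustration) repeatedly merges the two least probable subtrees; the resulting codeword lengths, and hence the 84-bit total, match the assignment above, though the individual 0/1 labels could come out differently:

import heapq

def huffman_code(probabilities):
    """Build a prefix-free variable-length code for a {symbol: probability} map."""
    # Each heap entry: (probability, tie-breaker, {symbol: code-so-far}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)   # least probable subtree
        p1, _, codes1 = heapq.heappop(heap)   # next least probable subtree
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

grades = {"A": 0.10, "B": 0.20, "C": 0.56, "E": 0.14}
codes = huffman_code(grades)
print(codes)                                        # codeword lengths 3, 2, 1, 3
avg_len = sum(p * len(codes[s]) for s, p in grades.items())
print(avg_len, 50 * avg_len)                        # 1.68 bits/symbol, 84 bits in total

Note that the average codeword length, 1.68 bits/symbol, is slightly above the source entropy of 1.6621 bits/symbol, as it must be.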
As an exercise, write down the binary sequence for the following grades, then starting with the
first bit see why it is possible for the receiver to accurately decode the received binary sequence.
CECCEACCEBBCCECCEEBBCABCA
(Hint)
The binary sequence for the first six grades is:
101111011010
The only codeword that does not start with a 0 is C, and there are no codewords 10 or 101,
so the first symbol must be C. There is no codeword 0 or 01, so the second symbol must be 011,
that is E, and so on. Huffman coding is only one example of many types of compression
coding. One of its drawbacks is that it requires the probabilities of the message elements to be
known beforehand.
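The decoding rule in the hint can be carried out mechanically. A minimal Python sketch (the decode helper is written just for this exercise) scans the bit string and emits a grade each time the accumulated bits match a codeword, which is unambiguous because the code is prefix-free:

def decode(bits, codebook):
    """Decode a bit string produced by a prefix-free code."""
    inverse = {code: sym for sym, code in codebook.items()}
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in inverse:       # a complete codeword has been read
            symbols.append(inverse[current])
            current = ""
    return "".join(symbols)

codebook = {"A": "010", "B": "00", "C": "1", "E": "011"}
print(decode("101111011010", codebook))    # CECCEA, the first six grades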
Entropy then gives us the average information content of the source. For a channel of bandwidth
B Hz over which we signal using n distinguishable levels, the maximum information rate is:
R = 2B log2(n) bits/sec
This formula states that as n tends to infinity, so does the information rate.
Is there a limit on the number of levels? The limit is set by the presence of noise. If we continue
to subdivide the magnitude of the changes into ever decreasing intervals, we reach a point where
we cannot distinguish the individual levels because of the presence of noise. Noise therefore
places a limit on the maximum rate at which we can transfer information. Obviously, what really
matters is the signal-to-noise ratio (SNR). This is defined by the ratio of signal power S to noise
power N, and is often expressed in decibels:
SNR = 10log10(S/N) dB
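Both formulas are straightforward to evaluate; the Python sketch below uses arbitrary example values for the bandwidth, the number of levels, and the signal and noise powers:

import math

def max_rate(bandwidth_hz, levels):
    """R = 2*B*log2(n) in bits/second."""
    return 2 * bandwidth_hz * math.log2(levels)

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels: 10*log10(S/N)."""
    return 10 * math.log10(signal_power / noise_power)

# Example: a 3 kHz channel with 8 distinguishable levels, and a signal
# 1000 times stronger than the noise.
print(max_rate(3000, 8))    # 18000.0 bits/s
print(snr_db(1000, 1))      # 30.0 dB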
Also note that it is common to see the following expressions for power in many texts: