
Digital Communication

Information Theory
Digital communication systems and networks are described by their data rates. The ultimate goal
of a communication system is to transfer information rather than data. Defining what information
is, how to characterize it, and how to improve information transfer rates are the challenges
addressed by information theory. One of the people who started the digital revolution was
Claude Shannon. Shannon showed in one of his major research papers that all information
sources - people speaking, television pictures, telegraph keys - have a source rate associated
with them which can be measured in bits per second. Similarly, communication channels have a
capacity that can be expressed in the same units. Information can be transmitted over the channel
if and only if the source rate does not exceed the channel capacity.

Information Sources and Entropy


Consider a weather forecast for the Sahara desert. This forecast is an information source. The
information source has two main outcomes: rain or no-rain. Clearly, the outcome no-rain
contains little information; it is a highly probable outcome. The outcome rain, however, contains
considerable information; it is a highly improbable event.
In information theory, an information source is a probability distribution, i.e. a set of
probabilities assigned to a set of outcomes. This reflects the fact that the information contained in
an outcome is determined not only by the outcome, but by how uncertain it is. An almost certain
outcome contains little information.
A measure of the information contained in an outcome was introduced by Hartley in 1927. He
defined the information (sometimes called self-information) contained in an outcome xi as:

I(xi) = log2(1/P{xi}) = -log2 P{xi}                                    Eq. (60)

This measure satisfies our requirement that the information contained in an outcome increases
with its uncertainty. If P{xi} = 1, then I(xi) = 0, telling us that a certain event
contains no information.
The definition also satisfies the requirement that the total information in independent events
should add. Clearly, a rain forecast for two days contains twice as much information as for one
day. From Eq. (60), for two independent outcomes xi and xj:

I(xi and xj) = log2(1/P{xi and xj})
             = log2(1/(P{xi} P{xj}))
             = log2(1/P{xi}) + log2(1/P{xj})
             = I(xi) + I(xj)                                           Eq. (61)

Hartley's measure defines the information in a single outcome. The measure entropy, H(X)
(sometimes called absolute entropy), defines the information content of the source X as a whole.
It is the mean information provided by the source per source output, or symbol. Using Eq. (60):

H(X) = Σi P{xi} I(xi) = -Σi P{xi} log2 P{xi}                           Eq. (62)
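To make these definitions concrete, here is a minimal Python sketch (illustrative only; the
function names and the example probabilities for the Sahara forecast are our own assumptions)
that evaluates Eq. (60) and Eq. (62):

import math

def self_information(p):
    # Eq. (60): I(x) = log2(1/P{x}) = -log2 P{x}, measured in bits
    return -math.log2(p)

def entropy(probabilities):
    # Eq. (62): H(X) = -sum_i P{xi} log2 P{xi}, in bits per symbol
    # (outcomes with zero probability contribute nothing to the sum)
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Sahara forecast: "rain" is rare, so it carries far more information.
print(self_information(0.01))   # ~6.64 bits for rain (assumed P = 0.01)
print(self_information(0.99))   # ~0.014 bits for no-rain
print(entropy([0.01, 0.99]))    # ~0.081 bits/symbol for the source overall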

Example
Consider the case where there is a need to communicate a set of student grades, of which there
are four: A, B, C and E. It is a trivial task to map the grades onto a binary alphabet as
follows:
A    00
B    01
C    10
E    11

Since there are 50 grades to be communicated, the above coding will require the transfer of 100
binary symbols, or bits. The question arises whether the information could be accurately
transferred using fewer than 100 bits. If the four grades are equiprobable (that is each symbol
has an equally likely chance of occurring) then the probability of any of the grades is 0.25 and
the entropy is:
H = -4 x P(any grade) x log2{P(any grade)}
  = -4 x 0.25 x (-2)
  = 2 bits/symbol
When this is the case, the information cannot be transmitted in fewer than 100 bits.
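A quick check of this equiprobable case in Python (a sketch only):

import math

p = 0.25                    # four equiprobable grades
H = -4 * p * math.log2(p)   # entropy of the source
print(H)                    # 2.0 bits/symbol
print(50 * H)               # 100.0 bits to convey 50 grades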

However, if the distribution of the grades is as shown below, we may calculate the entropy by:
SYMBOL    QUANTITY    PROBABILITY
A         5           PA = 0.10
B         10          PB = 0.20
C         28          PC = 0.56
E         7           PE = 0.14

H = -(PA log2 PA + PB log2 PB + PC log2 PC + PE log2 PE)
  = -(0.10 x -3.3219 + 0.20 x -2.3219 + 0.56 x -0.8365 + 0.14 x -2.8365)
  = 1.6621 bits/symbol

From this we may infer that the most efficient coding of the information will require, on
average, 1.6621 binary symbols per information symbol. Therefore we should be able to find a way
to code the information with fewer than 100 binary symbols (note 50 x 1.6621 = 83.105).
At the same time, we must be cognizant of the fact that it will never be possible to accurately
code the information with fewer than 84 binary symbols.
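The same calculation can be checked with a short sketch using the probabilities from the table
above:

import math

probs = {'A': 0.10, 'B': 0.20, 'C': 0.56, 'E': 0.14}

# Entropy of the grade source, Eq. (62)
H = -sum(p * math.log2(p) for p in probs.values())
print(H)                   # ~1.6621 bits/symbol
print(50 * H)              # ~83.105 bits for 50 grades
print(math.ceil(50 * H))   # 84 - the smallest whole number of bits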
We can easily demonstrate a reduction in the required number of binary symbols by adopting a
variable length coding and assigning shorter codes to more frequently occurring message
symbols. A solution is as follows:
A    010
B    00
C    1
E    011

If the above coding is used, only 84 bits will be required (check this on your own). Note that in
the above assignment, each code is unique and is not a prefix of any other code in the set. This
variable-length coding is known as Huffman coding (after the engineer who designed it) and is a
practical way to design relatively efficient, prefix-free, variable length codes, where the
probability distribution of the source information symbols is known.
As an exercise, write down the binary sequence for the following grades, then starting with the
first bit see why it is possible for the receiver to accurately decode the received binary sequence.
C E C C E A C C E B B C C E C C E E B B C A B C A

(Hint)
The binary sequence for the first six grades is:
101111011010
The only codeword that does not start with a 0 is that of C, and no codeword is 10 or 101,
therefore the first symbol is C. No codeword is 0 or 01, therefore the second symbol must be
011, i.e. E, and so on. Huffman coding is only one example of many types of compression
coding. One of its drawbacks is that it requires the probabilities of the message symbols to be
known beforehand.
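The prefix-free property is what allows the receiver to decode the bit stream without any
separators between codewords. The following sketch (the encode/decode helpers are our own, but
the code table is the one given above) encodes and decodes the exercise sequence:

# Variable-length, prefix-free code from the example: A=010, B=00, C=1, E=011
CODE = {'A': '010', 'B': '00', 'C': '1', 'E': '011'}

def encode(grades):
    # Concatenate codewords; no separators are needed because no
    # codeword is a prefix of any other.
    return ''.join(CODE[g] for g in grades)

def decode(bits):
    # Read bits left to right, emitting a grade whenever the buffer
    # matches a codeword exactly (greedy matching works for prefix codes).
    inverse = {v: k for k, v in CODE.items()}
    grades, buffer = [], ''
    for b in bits:
        buffer += b
        if buffer in inverse:
            grades.append(inverse[buffer])
            buffer = ''
    return ''.join(grades)

message = 'CECCEACCEBBCCECCEEBBCABCA'
bits = encode(message)
print(bits[:12])                 # 101111011010 - the first six grades, as in the hint
print(decode(bits) == message)   # True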
Entropy then gives us the average information content of the source.

The relationship between information, bandwidth and noise


The most important question associated with a communication channel is the maximum rate at
which it can transfer information.
Information can only be transferred by a signal if the signal is permitted to change. Analogue
signals passing through physical channels may not change arbitrarily fast. The rate at which a
signal may change is determined by its bandwidth. In fact it is governed by the same
Nyquist-Shannon law that governs sampling: a signal of bandwidth B may change at a maximum rate
of 2B changes per second. If each change is used to signify a bit, the maximum information rate
is 2B bits per second.
The Nyquist-Shannon theorem makes no statement about the magnitude of the changes. If
changes of differing magnitude are each associated with a distinct symbol, each change can carry
more than one bit and the information rate may
be increased. Thus, if each time the signal changes it can take one of n levels, the information
rate is increased to:

R = 2B log2(n) bits/sec
This formula states that as n tends to infinity, so does the information rate.
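As a small illustration (the bandwidth and level counts below are arbitrary values chosen for
the example, not figures from the text):

import math

def max_rate(bandwidth_hz, levels):
    # R = 2B log2(n): 2B changes per second, log2(n) bits per change
    return 2 * bandwidth_hz * math.log2(levels)

print(max_rate(3000, 2))    # 6000.0 bits/sec with binary signalling
print(max_rate(3000, 16))   # 24000.0 bits/sec with 16 signalling levels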
Is there a limit on the number of levels? The limit is set by the presence of noise. If we continue
to subdivide the magnitude of the changes into ever decreasing intervals, we reach a point where
we cannot distinguish the individual levels because of the presence of noise. Noise therefore
places a limit on the maximum rate at which we can transfer information. Obviously, what really
matters is the signal-to-noise ratio (SNR). This is defined as the ratio of signal power S to
noise power N, and is often expressed in decibels (dB):

SNR = 10log10(S/N) dB
Also note that it is common to see the following expressions for power in many texts:

PdBW = 10log10(S/1) dBW

PdBm = 10log10(S/0.001) dBm

The first equation expresses power as a ratio to 1 watt and the second expresses power as a
ratio to 1 milliwatt. These are expressions of absolute power and should not be confused with SNR.
There is a theoretical maximum to the rate at which information passes error-free over the
channel. This maximum is called the channel capacity, C. The famous Hartley-Shannon Law
states that the channel capacity is given by:

C = B log2(1 + S/N) bits/sec


Note that S/N is linear (i.e. not expressed in dB) in this expression. For example, a 10 kHz
channel operating at an SNR of 15 dB (a linear ratio of 31.623) has a theoretical maximum
information rate of:

C = 10,000 x log2(1 + 31.623) ≈ 50,278 bits/second
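A sketch that reproduces this calculation, including the conversion of the SNR from dB to the
linear ratio required by the formula:

import math

def channel_capacity(bandwidth_hz, snr_db):
    # Convert SNR from dB to a linear power ratio, then apply
    # the Hartley-Shannon law: C = B log2(1 + S/N)
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

print(channel_capacity(10_000, 15))   # ~50278 bits/second for a 10 kHz channel at 15 dB SNR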
The theorem makes no statement as to how the channel capacity is achieved; in practice, real
channels only approach this limit. Providing high channel efficiency is the goal of coding
techniques. The failure to achieve perfect performance is measured by the bit-error-rate (BER).
Typically, BERs are of the order of 10^-6.
Bit error rate is a measured quantity: it is the number of errored bits received divided by the
number of bits transmitted. If a system has a BER of 10^-6, this means that in a test
measurement one bit error was received for every one million bits transmitted. The probability
of error, P(e), is the theoretical expectation of the bit error rate.
