
Information Theory and Coding

1
Chapter 1
Information Sources
&
Source Coding

2
Reading Material

• Richard B. Wells: Applied Coding and Information Theory for Engineers; Pearson Education (Prentice Hall), 1999. [Chapter 1]

• Simon Haykin: Communication Systems, 4th Ed., John Wiley & Sons, 2001. [Chapter 8]

• Claude Shannon: "A mathematical theory of communication", Bell System Technical Journal, vol. 27, pp. 379-423, July 1948; pp. 623-656, Oct. 1948.

3
Part a
• Introduction
• Information Sources
• Uncertainty and Information
• Amount of Information
• Average Information and Entropy

4
Introduction
Information theory came into existence in 1948, with the publication of Claude Shannon's paper
"A mathematical theory of communication".

The results took the scientific community by surprise.

It was generally believed that increasing the transmission rate of information over a communication channel increased the probability of error.

But Shannon proved that this is not true, as long as the communication rate is below the channel capacity.
5
Information Theory ??
What is Information Theory?
Why study Information Theory?
Information Theory in the context of
* Communication Systems
* Information/Computer Science
* Communication is the transfer of information from one
place to another
* Key feature is the transfer of random signals over
transmission channels with random behaviour

6
Communication System Model

[Block diagram: Information Source → Source coding → Channel coding → Information channel → Channel decoding → Source decoding → Destination]
7
Information Theory ??

This should be accomplished

* as efficiently as possible (bandwidth and power) – bits/sec/Hz for digital communication
* with as much fidelity as possible (for analogue communication) – SNR
* with as much reliability as possible (for digital communication) – BER
* as securely as possible (encryption schemes)

8
Introduction
The purpose of a communication system is to transmit information
from one point to another with high efficiency and reliability.
Information theory provides a quantitative measure of the
information contained in message signals and allows us to
determine the capacity of a communication system to transfer this
information from source to destination.
Through the use of coding, redundancy can be removed from message signals so that channels can be used with improved efficiency. This is called Source Coding.
In addition, systematic redundancy can be introduced into the transmitted signal so that channels can be used with improved reliability. This is called Channel Coding or Error Control Coding.
9
Introduction
• Information theory applies the laws of
probability theory, and mathematics in general,
to study the collection and processing of
information.

• In the context of communication systems,


information theory, originally called the
mathematical theory of communication,
deals with mathematical modelling and analysis
of communication systems, rather than with
physical sources and physical channels.

10
Introduction
In particular, Information Theory provides answers to the following two fundamental questions:
• What is the minimum number of bits per source symbol required to fully represent the source? (This is the entropy of the source.)
• What is the ultimate transmission rate for reliable communication over a noisy channel? (This is the capacity of the channel.)

Information theory is the scientific study of information and of the communication systems designed to handle it.
• Including telegraphy, radio communications, and all other systems
concerned with the processing and/or storage of signals.

11
Information Sources
An information source is an object that produces
an event, the outcome of which is random and in
accordance with some probability distribution.
A practical information source in a communication system is a
device that produces messages. It can be either analogue or digital.
Here, we shall deal mainly with discrete sources, since analogue sources can be transformed into discrete sources through the use of sampling and quantisation techniques.

A discrete information source is a source that has only


a finite set of symbols as possible outputs. The set of
possible source symbols is called the source alphabet,
and the elements of the set are called symbols.
12
Information Sources
Information sources can be classified as
•having memory
•being memory-less
A source with memory is one for which a current symbol
depends on the previous symbols.
A memory-less source is one for which each symbol produced is independent of the previous symbols,
i.e. the symbols emitted during successive signalling intervals are statistically independent.

A source having the properties just described above is termed a


discrete memory-less source, memory-less in the sense that the
symbol emitted at any time is independent of previous choices.
13
Uncertainty and Information
Can we find a measure of how much “information” is produced by such a source?

The idea of information is closely related to that of “uncertainty” or “surprise”.

14
Uncertainty and Information
• Suppose that a probabilistic experiment involves the
observation of the output emitted by a discrete memory-less
source during every unit of time (signalling interval).
• The source output is modelled as a discrete random variable S which takes on symbols from a fixed finite alphabet of K symbols

{s0, s1, s2, …, sK-1}

with probabilities
P(S = sk) = pk,  for k = 0, 1, …, K-1.

Of course, this set of probabilities must satisfy the condition
Σk pk = 1
15
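As an illustration of such a source, here is a minimal Python sketch of a discrete memory-less source; the alphabet, the probabilities and the helper name `emit` are assumptions chosen for this example, not part of the slides.

```python
import random

# A hypothetical discrete memory-less source: a fixed finite alphabet
# {s0, ..., sK-1} with probabilities pk that sum to 1.
alphabet = ["s0", "s1", "s2", "s3"]
probs = [0.4, 0.3, 0.2, 0.1]

assert abs(sum(probs) - 1.0) < 1e-12  # the pk must satisfy sum(pk) = 1

def emit(n):
    """Emit n symbols; each choice is independent of all previous ones (memory-less)."""
    return random.choices(alphabet, weights=probs, k=n)

print(emit(10))  # e.g. ['s0', 's2', 's0', 's1', ...]
```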
Uncertainty and Information
Consider the event S=sk, with probability pk.
It is clear that:
• if the probability pk = 1 and pi = 0 for all i ≠ k,
then there is no “surprise” and therefore no
“information” when symbol sk is emitted.
• if the source symbols occur with different
probabilities, then a low-probability symbol sk brings
more “surprise” and therefore more “information”
when it is emitted than a symbol si (i ≠ k) that the
source emits with higher probability.
16
Uncertainty and Information
Thus, “uncertainty”, “surprise” and “information” are all
related.
• Before the event (S=sk) occurs there is an amount of
uncertainty. When the event occurs there is an amount of
surprise. After the occurrence of the event S=sk there is
gain in the amount of information.
• All three of these amounts are the same.

Moreover, the amount of information is related to


the inverse of the probability of occurrence of the
related event.
17
Amount of Information
The amount of information gained after observing the event S=sk is denoted by I(sk). It must have the following properties:
• I(sk) = 0 for pk = 1.
If we are absolutely certain of the outcome of an event, even before it occurs, then no information is gained after the occurrence of the event.
• I(sk) ≥ 0 for 0 ≤ pk ≤ 1.
In other words, the occurrence of the event S=sk either provides some or no information, but never brings about a loss of information.
• I(sk) > I(si) for pk < pi.
That is, the less probable an event is, the more information we gain when it occurs.
• I(sk sl) = I(sk) + I(sl) if sk and sl are statistically independent.
In other words, information contained in statistically independent outcomes should add.
18
Amount of Information
The amount of information I(sk), associated with a source symbol sk which occurs with probability pk, is defined by the logarithmic function

I(sk) = log(1/pk)
It is standard practice to use a logarithm to base 2. The resulting unit of information is called the bit.
I(sk) = log2(1/pk) = -log2(pk),  for k = 0, 1, 2, …, K-1
• When pk = 1/2, we have I(sk) = 1 bit. Hence, one bit is the amount of information that we gain when one of two possible and equally likely events occurs.

19
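As a quick numerical companion to this definition, the following Python sketch (the function name `self_information` is my own choice) evaluates I(sk) = log2(1/pk) in bits and confirms that one of two equally likely events carries exactly 1 bit.

```python
import math

def self_information(p):
    """I(s) = log2(1/p) = -log2(p), in bits, for an event of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return math.log2(1 / p)

print(self_information(0.5))  # 1.0 bit: one of two equally likely events
print(self_information(1.0))  # 0.0 bits: a certain event conveys no information
```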
Amount of Information
• Note that the information-theoretic concept of “bit” is quite unrelated to the computer science usage of the term “bit”.

• In computer science usage, the term “bit” is used as an abbreviation for the phrase binary digit.
• In information theory, the term “bit” is used as the unit of information measure.

From the context, it should be clear whether the term bit is


intended as an abbreviation of the phrase binary digit or as the
unit of information measure.
20
Amount of Information
Where there is likely to be ambiguity as to whether “bit” is intended as an abbreviation for binary digit or as a unit of information measure, it is customary to refer to a binary digit as a binit.
Note that if the probabilities of the two binits (0 and 1) are not
necessarily equal, one binit may convey more and the other binit
may convey less than 1 bit of information.

I(sk) = log2(1/pk) = -log2(pk) in bits


Example: Consider a diskette storing a data file consisting of 100,000 binary digits (binits), i.e., a total of 100,000 “0”s and “1”s. If the binits 0 and 1 occur with probabilities of ¼ and ¾ respectively, then binit 0 conveys an amount of information equal to log2(4/1) = 2 bits, while binit 1 conveys information amounting to log2(4/3) ≈ 0.42 bit.
21
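The two figures quoted in the diskette example can be checked directly (a small sketch; the variable names are mine):

```python
import math

p0, p1 = 1/4, 3/4            # probabilities of binits 0 and 1
print(math.log2(1 / p0))     # 2.0      bits conveyed by a "0"
print(math.log2(1 / p1))     # 0.415... bits conveyed by a "1" (about 0.42 bit)
```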
Amount of Information
Example 1.1
A source emits one of four possible symbols during
each signalling interval. These symbols occur with
the probabilities p0 = 0.4, p1 = 0.3, p2 = 0.2 and p3 = 0.1.
Find the amount of information gained by observing
the source emitting each of these symbols.

Solution
Let the event S=sk denote the emission of symbol sk by the source.
Hence, I(sk) = log2(1/pk) bits
I(s0) = log2(1/0.4) = 1.322 bits
I(s1) = log2(1/0.3) = 1.737 bits
I(s2) = log2(1/0.2) = 2.322 bits
I(s3) = log2(1/0.1) = 3.322 bits
22
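A quick numerical check of Example 1.1 (an illustrative sketch, not part of the original slides):

```python
import math

probs = {"s0": 0.4, "s1": 0.3, "s2": 0.2, "s3": 0.1}
for symbol, p in probs.items():
    print(f"I({symbol}) = {math.log2(1 / p):.3f} bits")  # I(sk) = log2(1/pk)
# Expected: 1.322, 1.737, 2.322 and 3.322 bits respectively
```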
Average Information and Entropy
• Messages produced by information sources consist of sequences of symbols. While
the receiver of a message may interpret the entire message as a single unit,
communication systems often have to deal with individual symbols.
For example, if we are sending messages in English language, the user at the receiving end is
interested mainly in words, phrases and sentences, whereas the communication system has to
deal with individual letters or symbols.
• Hence it is desirable to know the average information content per source symbol, also known as the entropy, H, of the source.

H is the mean value (expectation) of I(sk), which is a discrete random variable that takes on the values I(s0), I(s1), …, I(sK-1) with probabilities p0, p1, …, pK-1 respectively:

H = E[I(sk)]
  = Σk pk I(sk)
  = Σk pk log2(1/pk)
23
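A minimal Python routine matching this definition (the function name `entropy` is my own); terms with pk = 0 are skipped, following the usual convention that 0 · log2(1/0) = 0.

```python
import math

def entropy(probs):
    """H = sum over k of pk * log2(1/pk), in bits per source symbol."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([0.4, 0.3, 0.2, 0.1]))  # about 1.846 bits/symbol for Example 1.1's source
```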
Average Information and Entropy
The quantity H is called the entropy of a discrete memory-less
source. It is a measure of the average information content per
source symbol. It may be noted that the entropy H depends on the
probabilities of the symbols in the alphabet of the source.

Example 1.2a
Consider a discrete memory-less source with source alphabet {s0, s1, s2} and symbol probabilities p0 = 1/4, p1 = 1/4 and p2 = 1/2. Find the entropy of the source.

Solution
The entropy of the given source is
H = p0 log2(1/p0) + p1 log2(1/p1) + p2 log2(1/p2)
  = ¼ log2(4) + ¼ log2(4) + ½ log2(2)
  = 2/4 + 2/4 + 1/2
  = 1.5 bits
24
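Checking Example 1.2a numerically (a self-contained sketch):

```python
import math

p = [1/4, 1/4, 1/2]
H = sum(pk * math.log2(1 / pk) for pk in p)
print(H)  # 1.5 bits per source symbol
```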
Average Information and Entropy
Example 1.2b
Consider another source X, having an infinitely large set of outputs, with probabilities of occurrence given by {P(xi) = 2^(-i), i = 1, 2, 3, …}. What is the average information, or entropy, H(X) of the source?

H(X) = Σ_{i=1}^{∞} p(xi) log2(1/p(xi))
     = Σ_{i=1}^{∞} 2^(-i) · log2(2^i)
     = Σ_{i=1}^{∞} i · 2^(-i)
     = 2 bits

25
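The infinite series Σ i·2^(-i) converges to 2, which a short numerical sketch of its partial sums makes easy to see:

```python
# Partial sums of H(X) = sum over i >= 1 of i * 2**(-i); they approach 2 bits.
H = 0.0
for i in range(1, 60):
    H += i * 2 ** (-i)
print(H)  # ~2.0 bits
```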
Properties of Entropy
For a discrete memory-less source with a fixed alphabet:
• H=0, if and only if the probability pk=1 for some k,
and the remaining probabilities in the set are all zero.
This lower bound on the entropy corresponds to ‘no
uncertainty’.
• H=log2(K), if and only if pk=1/K for all k (i.e. all the
symbols in the alphabet are equiprobable). This upper
bound on the entropy corresponds to ‘maximum
uncertainty’.
Hence,
0 ≤ H ≤ log2(K)
K is the radix (number of symbols) of the alphabet S of the source.

26
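A small check of the bounds 0 ≤ H ≤ log2(K), using a hypothetical alphabet of K = 4 symbols:

```python
import math

def entropy(probs):
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

K = 4
print(entropy([1.0, 0.0, 0.0, 0.0]))       # 0.0 bits: one certain symbol, no uncertainty
print(entropy([1 / K] * K), math.log2(K))  # 2.0 2.0: equiprobable symbols reach the upper bound
```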
Properties of Entropy

H  log2(K)

Graphs of the functions x  1 and log x versus x.


27
Example 1.3: Entropy of a Binary Memory-less Source

For a binary memory-less source with symbol probabilities p0 and p1 = 1 - p0, the entropy is
H(p0) = -p0 log2(p0) - (1 - p0) log2(1 - p0)

[Figure: the entropy function H(p0), equal to 0 at p0 = 0 and p0 = 1 and reaching its maximum of 1 bit at p0 = 1/2.]
28
Properties of Entropy
Let us examine H under different cases for K = 2:
Case I:   p0 = 0.01, p1 = 0.99, H ≈ 0.08 bit
Case II:  p0 = 0.4,  p1 = 0.6,  H ≈ 0.97 bit
Case III: p0 = 0.5,  p1 = 0.5,  H = 1 bit

29
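The three cases can be reproduced with a binary entropy function (a sketch; `binary_entropy` is my own name):

```python
import math

def binary_entropy(p0):
    """H(p0) = -p0*log2(p0) - (1 - p0)*log2(1 - p0), in bits, for K = 2."""
    if p0 in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -p0 * math.log2(p0) - (1 - p0) * math.log2(1 - p0)

for p0 in (0.01, 0.4, 0.5):
    print(f"p0 = {p0}: H = {binary_entropy(p0):.2f} bits")
# p0 = 0.01 -> 0.08, p0 = 0.4 -> 0.97, p0 = 0.5 -> 1.00
```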
Properties of Entropy
• In Case I, it is very easy to guess whether message s0 (probability 0.01) or message s1 (probability 0.99) will occur: most of the time message s1 will occur. Thus, in this case, the uncertainty is low.
• In Case II, it is somewhat difficult to guess whether s0 or s1 will occur, as their probabilities are nearly equal. Thus, in this case, the uncertainty is higher.
• In Case III, it is extremely difficult to guess whether s0 or s1 will occur, as their probabilities are equal. Thus, in this case, the uncertainty is maximum.

Entropy is low when uncertainty is low, and high when uncertainty is high.
Thus, we can say that entropy is a measure of uncertainty.
30
