CE Notes

Communication system

The function of any communication system is to convey the information from source to
destination.

Discrete message

A message selected from a finite number of predetermined messages.

During one time interval one message from the set is transmitted; during the next time interval the next message from the set is transmitted.

Memory source

A source with memory is one for which each emitted symbol depends on the previous symbols.

Memoryless source

Memoryless in the sense that the symbol emitted at any time is independent of previous
choices.

A probabilistic experiment involves the observation of the output emitted by a discrete source during every unit of time. The source output is modeled as a discrete random variable S, which takes on symbols from a fixed finite alphabet

S = {s0, s1, s2, · · · , sK-1}

with probabilities

P(S = sk) = pk, k = 0, 1, · · · , K - 1

We assume that the symbols emitted by the source during successive signaling intervals are statistically independent. A source having the properties described above is called a discrete memoryless source, memoryless in the sense that the symbol emitted at any time is independent of previous choices.
Source alphabet

The set of source symbols is called the source alphabet.

Symbols or letters

Each element of the set is called a symbol or letter.

Uncertainty

The amount of information contained in each symbol is closely related to its uncertainty or surprise.

If an event is certain (that is, no surprise, probability 1), it conveys zero information.

We can define the amount of information contained in each symbol as

I(sk) = logb(1/pk)

Here we generally use log2, since in digital communications we talk in terms of bits. The above expression also tells us that when there is more uncertainty (lower probability) of a symbol occurring, it conveys more information.

Unit of the information

The unit of the information depends on the base b of the logarithmic function:

UNIT                  b VALUE
bit (binit)           2
decit (or Hartley)    10
natural unit (nat)    e
When pk = ½, we have I(sk) = 1 bit.
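As a quick illustration (a sketch, not part of the original notes), the self-information of a symbol can be evaluated in any of these units simply by changing the base of the logarithm:

```python
import math

def self_information(p, base=2):
    """Self-information I = log_base(1/p) of a symbol with probability p."""
    return math.log(1.0 / p, base)

p = 0.5
print(self_information(p, 2))        # 1.0 bit
print(self_information(p, 10))       # ~0.301 decits (Hartleys)
print(self_information(p, math.e))   # ~0.693 nats
```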

Some properties of information are summarized here:

1. For a certain event, i.e., pk = 1, the information it conveys is zero: I(sk) = 0. We are absolutely certain of the outcome of the event.
2. For events with 0 ≤ pk ≤ 1, the information is always I(sk) ≥ 0. An event either provides some information or none, but it never brings about a loss of information.
3. If pk > pi for two events, the information content is always I(sk) < I(si). The less probable the event is, the more information we gain when it occurs.
4. I(sksi) = I(sk) + I(si) if sk and si are statistically independent.
Proof:
p(sk, si) = p(sk) p(si) if sk and si are statistically independent.
I(sk, si) = log(1 / p(sk, si))
= log(1 / [p(sk) p(si)])
= log(1 / p(sk)) + log(1 / p(si))
= I(sk) + I(si)
Average information or entropy

The amount of information I(sk) produced by the source during an arbitrary signalling
interval depends on the symbol sk emitted by the source at that time. Indeed, I(sk) is a discrete
random variable that takes on the values I(s0), I(s1), · · · , I(sK-1) with probabilities p0, p1, · · · ,
pK-1 respectively.

The mean of I(sk) over the source alphabet S is given by

H(φ) = E[I(sk)] = ∑ pk log2(1/pk), summed over k = 0, 1, · · · , K - 1

This important quantity H(φ) is called the entropy of a discrete memoryless source with source alphabet φ. It is a measure of the average information content per source symbol. Note that the entropy H(φ) depends only on the probabilities of the symbols in the alphabet φ of the source.
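As a numerical sketch (not part of the original notes), H(φ) can be computed directly from a list of symbol probabilities:

```python
import math

def entropy(probs):
    """H = sum of p_k * log2(1/p_k), in bits per source symbol.
    Symbols with p_k = 0 contribute nothing, by the convention 0*log(1/0) = 0."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# illustrative binary source with p0 = 0.25, p1 = 0.75
print(entropy([0.25, 0.75]))           # ~0.811 bits/symbol
# the four-symbol source of the problem below gives ~1.85 bits/symbol
print(entropy([0.40, 0.30, 0.20, 0.10]))
```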

Problem

1. A DMS has four symbols S1, S2, S3, S4 with probabilities 0.40, 0.30, 0.20, 0.10.
a. Calculate H(φ).
b. Find the amount of information contained in the messages S1S2S1S3
and S4S3S3S2, and compare with 4H(φ).
Solution

a. H(φ) = ∑ Pk log2(1/ Pk)


= -0.4 log2 0.4 - 0.3 log2 0.3 - 0.2 log2 0.2 - 0.1 log2 0.1
= 1.85 b/symbol
b. P(S1S2S1S3) = (0.4)(0.3)(0.4)(0.2) = 0.0096
I(S1S2S1S3) = -log2(0.0096) = 6.70 bits
I(S1S2S1S3) < 7.4 bits (= 4 H(φ))
P(S4S3S3S2) = (0.1)(0.2)²(0.3) = 0.0012
I(S4S3S3S2) = -log2(0.0012) = 9.70 bits
I(S4S3S3S2) > 7.4 bits (= 4 H(φ))

Some properties of entropy

The entropy H(φ) of a discrete memoryless source is bounded as follows:

0 ≤ H(φ) ≤ log2(K)

where K is the radix (number of symbols) of the alphabet S of the source.

Furthermore, we may make two statements:

1. H(φ) = 0, if and only if the probability pk = 1 for some k, and the remaining
probabilities in the set are all zero; this lower bound on entropy corresponds to no uncertainty.

2. H(φ) = log2(K), if and only if pk = 1/ K for all k; this upper bound on entropy
corresponds to maximum uncertainty.

Proof:

H(φ) ≥ 0:

Since each probability pk is less than or equal to unity, it follows that each term pk logb(1/pk) is always nonnegative. So H(φ) ≥ 0.

The term pk logb(1/pk) is zero if, and only if, pk = 0 or 1. Hence H(φ) = 0 only when pk = 1 for some k and all the remaining probabilities are zero.

H(φ) ≤ log2(K):

To prove this upper bound, we make use of a property of the natural logarithm:

ln x ≤ x - 1, for x ≥ 0

To proceed with this proof, consider any two probability distributions {p0, p1, · · · , pK-1} and {q0, q1, q2, · · · , qK-1} on the alphabet φ = {s0, s1, s2, · · · , sK-1} of a discrete memoryless source. Then, changing to the natural logarithm, we may write

∑ pk log2(qk / pk) = (1 / ln 2) ∑ pk ln(qk / pk)

Hence, using the inequality ln x ≤ x - 1 with x = qk / pk, we get

∑ pk log2(qk / pk) ≤ (1 / ln 2) ∑ pk (qk / pk - 1) = (1 / ln 2) (∑ qk - ∑ pk) = 0

We thus have the fundamental inequality

∑ pk log2(qk / pk) ≤ 0

where the equality holds only if pk = qk for all k. Suppose we next put

qk = 1 / K, k = 0, 1, · · · , K - 1

So,

∑ pk log2(1 / (K pk)) = H(φ) - log2(K) ≤ 0

Thus H(φ) is always less than or equal to log2(K). The equality holds only if the symbols are equiprobable.

Entropy of a binary memoryless source

Consider a discrete memoryless binary source shown defined on the alphabet φ = {0, 1}.
Let the probabilities of symbols 0 and 1 be p0 and 1- p0 respectively.

The entropy of this source is given by

H(φ) = -p0 log2(p0) - (1 - p0) log2(1 - p0)

From which we observe the following:

1. When p0 = 0, the entropy H(φ) = 0.
2. When p0 = 1, the entropy H(φ) = 0.
3. The entropy attains its maximum value, Hmax = 1 bit, when p0 = p1 = 1/2, that is, when symbols 1 and 0 are equally probable.

Figure - Entropy of a discrete memoryless binary source
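The curve in the figure can be reproduced numerically from the entropy formula above (a small sketch, not part of the original notes):

```python
import math

def binary_entropy(p0):
    """H(p0) = -p0*log2(p0) - (1 - p0)*log2(1 - p0), with H(0) = H(1) = 0."""
    if p0 in (0.0, 1.0):
        return 0.0
    return -p0 * math.log2(p0) - (1 - p0) * math.log2(1 - p0)

for p0 in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(p0, round(binary_entropy(p0), 4))
# the maximum value of 1 bit occurs at p0 = 0.5
```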

Information rate

If the source generates messages at the rate of r messages per second, then the information rate is defined to be

R = rH = average number of bits of information per second.

Example problem

An analog signal is band-limited to B Hz, sampled at the Nyquist rate, and the samples are quantized into four levels. The quantization levels Q1, Q2, Q3, Q4 (messages) are assumed independent and occur with probabilities P1 = P4 = 1/8 and P2 = P3 = 3/8. Find the information rate of the source.

Solution

The average information H is

H = p1 log2(1/p1) + p2 log2(1/p2) + p3 log2(1/p3) + p4 log2(1/p4)

= 1/8 log2(8) + 3/8 log2(8/3) + 3/8 log2(8/3) + 1/8 log2(8)

= 1.8 bits/message

The information rate R is

R = rH = 2B(1.8) = 3.6B bits/s.
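The arithmetic of this example can be checked in a few lines (a sketch; the bandwidth B is kept symbolic by printing only the coefficient of B):

```python
import math

probs = [1/8, 3/8, 3/8, 1/8]                 # P1, P2, P3, P4 of the levels
H = sum(p * math.log2(1 / p) for p in probs)
print(round(H, 2))                            # ~1.81 bits/message
print(round(2 * H, 2), "* B bits/s")          # r = 2B, so R = rH ~ 3.6B bits/s
```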


Shannon source coding theorem
An important problem in communication is the efficient representation of data generated
by a discrete source. The process by which this representation is accomplished is called source

encoding. The device that performs that representation is called a source encoder.

Variable length code

If some source symbols are known to be more probable than others, then the source code is generated by assigning short code words to frequent source symbols and long code words to rare source symbols.

EX: Morse code, in which the letters of the alphabet are encoded into streams of marks and spaces, denoted as dots "." and dashes "-".

Our primary interest is in the development of an efficient source encoder that satisfies
two functional requirements:

1. The code words produced by the encoder are in binary form.

2. The source code is uniquely decodable, so that the original source sequence can be
reconstructed perfectly from the encoded binary sequence.

We define the average code word length, L, of the source encoder as

L = ∑ pk lk, summed over k = 0, 1, · · · , K - 1

where lk is the length (in bits) of the code word assigned to symbol sk.

In physical terms, the parameter L represents the average number of bits per source
symbol used in the source encoding process. Let Lmin denote the minimum possible value of L.
We then define the coding efficiency of the source encoder as

η = Lmin / L

The source encoder is said to be efficient when η approaches unity.

Source coding theorem:

Given a discrete memoryless source of entropy H(φ), the average code-word length L for any distortionless (lossless) source encoding scheme is bounded as

L ≥ H(φ)

According to the source-coding theorem, the entropy H(φ) represents a fundamental limit on the average number of bits per source symbol necessary to represent a discrete memoryless source, in that L can be made as small as, but no smaller than, the entropy H(φ). Thus, with Lmin = H(φ), we may rewrite the efficiency of a source encoder in terms of the entropy H(φ) as

η = H(φ) / L
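As an illustration (a sketch with made-up probabilities and code lengths, not taken from the notes), the average code word length and coding efficiency can be computed as follows:

```python
import math

def coding_efficiency(probs, lengths):
    """eta = H / L, using the source entropy H as the lower bound Lmin."""
    H = sum(p * math.log2(1 / p) for p in probs if p > 0)
    L = sum(p * l for p, l in zip(probs, lengths))   # average code word length
    return H / L

# hypothetical source and code word lengths (e.g. code words 0, 10, 110, 111)
probs = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]
print(coding_efficiency(probs, lengths))   # 1.0 -- this code meets the entropy bound
```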

Data Compaction:

1. Removal of redundant information prior to transmission.

2. Lossless data compaction – no information is lost.

3. A source code which represents the output of a discrete memoryless source should be
uniquely decodable.

Prefix Coding

Consider a discrete memoryless source of alphabet {s0, s1, s2, · · · , sk-1}

and statistics {p0, p1, p2, · · · , pk-1}.

For each finite sequence of symbols emitted by the source, the corresponding sequence of code words is different from the sequence of code words corresponding to any other source sequence. For the symbol sk, let the code word be denoted by

{mk0, mk1, mk2, · · · , mk,n-1}

where the elements are 0s and 1s and n denotes the code word length.

Prefix condition

The initial part of the code word is represented by the elements {mk0, mk1, mk2, · · · , mki} for some i ≤ n - 1. Any sequence made up of the initial part of the code word is called a prefix of the code word.

Prefix code

1. A prefix code is a variable-length source coding scheme in which no code word is the prefix of any other code word.

2. The prefix code is a uniquely decodable code.

3. But, the converse is not true i.e., all uniquely decodable codes may not be prefix
codes.
From the code table we see that Code I is not a prefix code, Code II is a prefix code, and Code III is also uniquely decodable but not a prefix code. Prefix codes also satisfy the Kraft-McMillan inequality, which is given by

∑ 2^(-lk) ≤ 1, summed over k = 0, 1, · · · , K - 1

Code I violates the Kraft-McMillan inequality.

Both Codes II and III satisfy the Kraft-McMillan inequality, but only Code II is a prefix code.
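The code table referred to above is not reproduced in these notes; assuming, for illustration, the commonly used example Code I = {0, 1, 00, 11}, Code II = {0, 10, 110, 111} and Code III = {0, 01, 011, 0111}, the Kraft-McMillan sum and the prefix condition can be checked as follows (a sketch):

```python
def kraft_sum(code):
    """Kraft-McMillan sum: sum of 2^(-l_k) over all code word lengths l_k."""
    return sum(2 ** -len(word) for word in code)

def is_prefix_code(code):
    """True if no code word is a prefix of another code word."""
    return not any(a != b and b.startswith(a) for a in code for b in code)

codes = {
    "Code I":   ["0", "1", "00", "11"],      # assumed example code words
    "Code II":  ["0", "10", "110", "111"],
    "Code III": ["0", "01", "011", "0111"],
}
for name, code in codes.items():
    print(name, kraft_sum(code), is_prefix_code(code))
# Code I gives 1.5 (inequality violated); Codes II and III satisfy it,
# but only Code II is prefix-free.
```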

Decoding procedure

1. The source decoder simply starts at the beginning of the sequence and decodes one code word at a time.
2. The decoder always starts at the initial state of the decision tree.
3. The received bit moves the decoder to a terminal state if it is 0, or else to the next decision point if it is 1.

Given a discrete memoryless source of entropy H(φ), a prefix code can be constructed with an average code-word length L, which is bounded as follows:

H(φ) ≤ L ≤ H(φ) + 1

On the left-hand side, equality holds under the condition that each symbol sk is emitted with probability

pk = 2^(-lk)

where lk is the length of the code word assigned to the symbol sk.

Shannon-Fano coding

Procedure

1. List the symbols in order of decreasing probability.


2. Partition the set into two subsets that are as close to equiprobable as possible, and assign 0 to the upper subset and 1 to the lower subset.
3. Continue the process, each time partitioning the subsets into parts with probabilities as nearly equal as possible, until further partitioning is not possible (a code sketch of this procedure is given after the examples).
Example problems
1. A DMS has six symbols S1, S2, S3, S4, S5, S6 with corresponding probabilities 0.30, 0.25, 0.20, 0.12, 0.08, 0.05. Construct a Shannon-Fano code for S.
Sk    pk      Step 1   Step 2   Step 3   Step 4   Code
S1    0.30    0        0                          00
S2    0.25    0        1                          01
S3    0.20    1        0                          10
S4    0.12    1        1        0                 110
S5    0.08    1        1        1        0        1110
S6    0.05    1        1        1        1        1111

2. A DMS has four symbols S1, S2, S3, S4 with corresponding probabilities 1/2, 1/4, 1/8, 1/8. Construct a Shannon-Fano code for S.
Sk    pk     Step 1   Step 2   Step 3   Code
S1    1/2    0                          0
S2    1/4    1        0                 10
S3    1/8    1        1        0        110
S4    1/8    1        1        1        111
3. A DMS has five equally likely symbols S1, S2, S3, S4, S5. Construct a Shannon-Fano code for S.

Sk    pk      Step 1   Step 2   Step 3   Code
S1    0.20    0        0                 00
S2    0.20    0        1                 01
S3    0.20    1        0                 10
S4    0.20    1        1        0        110
S5    0.20    1        1        1        111
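A compact recursive sketch of the procedure (not from the notes; the split point is simply chosen to minimize the difference between the probabilities of the two parts) reproduces the codes of Example 1:

```python
def shannon_fano(symbols):
    """symbols: list of (name, probability), sorted by decreasing probability.
    Returns a dict mapping each name to its binary code string."""
    codes = {name: "" for name, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        # find the split making the two parts as close to equiprobable as possible
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(running - (total - running))
            if diff < best_diff:
                best_diff, best_i = diff, i
        upper, lower = group[:best_i], group[best_i:]
        for name, _ in upper:
            codes[name] += "0"        # 0 to the upper subset
        for name, _ in lower:
            codes[name] += "1"        # 1 to the lower subset
        split(upper)
        split(lower)

    split(symbols)
    return codes

source = [("S1", 0.30), ("S2", 0.25), ("S3", 0.20),
          ("S4", 0.12), ("S5", 0.08), ("S6", 0.05)]
print(shannon_fano(source))
# {'S1': '00', 'S2': '01', 'S3': '10', 'S4': '110', 'S5': '1110', 'S6': '1111'}
```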

Huffman coding

The idea is to assign to each symbol of an alphabet a sequence of bits roughly equal in length to the amount of information conveyed by the symbol in question.

Algorithm

1. The source symbols are listed in order of decreasing probability. The two source symbols of lowest probability are assigned a 0 and a 1. This part of the step is referred to as the splitting stage.

2. These two source symbols are regarded as being combined into a new source symbol
with probability equal to the sum of the original probabilities. The probability of the new symbol
is placed in the list in accordance with its value.

3. The procedure is repeated until we are left with a final list of source statistics of only
two for which a 0 and a 1 are assigned.

Symbol   Stage I   Stage II   Stage III   Stage IV
S0       0.4       0.4        0.4         0.6
S1       0.2       0.2        0.4         0.4
S2       0.2       0.2        0.2
S3       0.1       0.2
S4       0.1

(At each stage, the two lowest probabilities are assigned a 0 and a 1 and combined into a single entry of the next stage.)

The code for each source symbol is found by working backward and tracing the
sequence of 0s and 1s assigned to that symbol as well as its successors.

Symbol   Probability   Code word
S0       0.4           00
S1       0.2           10
S2       0.2           11
S3       0.1           010
S4       0.1           011
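A minimal Huffman sketch using a priority queue (a standard textbook implementation rather than the notes' tabular bookkeeping; tie-breaking may produce different code words, but the average length of 2.2 bits/symbol matches the table above):

```python
import heapq

def huffman(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> binary code."""
    # heap entries: (probability, unique tie-breaker, {symbol: partial code})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)      # two lowest-probability entries
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}        # prepend 0 to one group
        merged.update({s: "1" + c for s, c in c2.items()})  # and 1 to the other
        heapq.heappush(heap, (p1 + p2, tie, merged))        # combined symbol
        tie += 1
    return heap[0][2]

probs = {"S0": 0.4, "S1": 0.2, "S2": 0.2, "S3": 0.1, "S4": 0.1}
codes = huffman(probs)
print(codes)
print(sum(probs[s] * len(c) for s, c in codes.items()))   # average length 2.2 bits/symbol
```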

Drawbacks:

1. Requires proper statistics.

2. Cannot exploit relationships between words, phrases, etc.

3. Does not consider redundancy of the language.

Lempel-ziv coding

1. Overcomes the drawbacks of Huffman coding

2. It is an adaptive and simple encoding scheme.

3. When applied to English text it achieves a compaction of about 55%, in contrast to Huffman coding, which achieves only about 43%.

4. Encodes patterns in the text. The algorithm parses the source data stream into segments that are the shortest subsequences not encountered previously.

Problems

Let the input sequence be

000101110010100101.........

We assume that the binary symbols 0 and 1 are already known and stored in the codebook.

Subsequences stored: 0, 1

Data to be parsed: 000101110010100101.........

The shortest subsequence of the data stream encountered for the first time and not seen before
is 00

subsequences stored: 0, 1, 00

Data to be parsed: 0101110010100101.........

The second shortest subsequence not seen before is 01; accordingly, we go on to write

Subsequences stored: 0, 1, 00, 01

Data to be parsed: 01110010100101.........

We continue in the manner described here until the given data stream has been completely parsed. The codebook is shown below:

Numerical positions:          1      2      3      4      5      6      7      8      9
Subsequences:                 0      1      00     01     011    10     010    100    101
Numerical representations:                  11     12     42     21     41     61     62
Binary encoded blocks:                      0010   0011   1001   0100   1000   1100   1101
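The parsing step can be mimicked in a few lines (a sketch of the parsing only, seeded with 0 and 1 as in the example; the numerical representations and binary blocks are left out):

```python
def lz_parse(data, seed=("0", "1")):
    """Parse a bit string into the shortest subsequences not seen before."""
    book = list(seed)                 # codebook, seeded with the known symbols
    phrase = ""
    for bit in data:
        phrase += bit
        if phrase not in book:        # first time this subsequence appears
            book.append(phrase)
            phrase = ""
    return book

print(lz_parse("000101110010100101"))
# ['0', '1', '00', '01', '011', '10', '010', '100', '101']
```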

Discrete Memoryless Channels

Let X and Y be the random variables of symbols at the source and destination
respectively. The description of the channel is shown in the Figure
The channel is described by an input alphabet

{x0, x1, x2, · · · , xJ-1}, J – input alphabet size

an output alphabet

{y0, y1, y2, · · · , yK-1}, K – output alphabet size

and a set of transition probabilities

p(yk/xj) = p(Y= yk / X= xj ) for all j and k

Channel matrix or transition matrix

        | p(y0/x0)     p(y1/x0)     · · ·   p(yK-1/x0)   |
P  =    | p(y0/x1)     p(y1/x1)     · · ·   p(yK-1/x1)   |
        |    .            .                     .        |
        | p(y0/xJ-1)   p(y1/xJ-1)   · · ·   p(yK-1/xJ-1) |

Each row corresponds to a fixed channel input, and each column corresponds to a fixed channel output.

For the input probability distribution p(xj), j = 0, 1, · · · , J-1, the event that the channel input X = xj occurs with probability

p(xj) = P(X = xj) for all j.

joint probability distribution

p(xj, yk) = p(X=xj, Y= yk)


= p(yk/xj) p(xj)

The marginal probability distribution of the output random variable Y is obtained by averaging out the dependence of p(xj, yk) on xj:

p(yk) = ∑ p(yk/xj) p(xj), summed over j = 0, 1, · · · , J-1, for k = 0, 1, · · · , K-1.

Binary Symmetric Channel

– A discrete memoryless channel with J = K = 2.

– The channel has two input symbols (x0 = 0, x1 = 1) and two output symbols (y0 = 0, y1 = 1).

– The channel is symmetric because the probability of receiving a 1 if a 0 is sent is the same as the probability of receiving a 0 if a 1 is sent.

– The conditional probability of error is denoted by p. A binary symmetric channel is shown in the Figure, and its transition probability matrix is given by

P = | 1-p    p  |
    |  p   1-p  |
Mutual Information

If the output Y is the noisy version of the channel input X and H(X) is the uncertainty associated with X, then the uncertainty about X after observing Y is given by the conditional entropy

H(X|Y) = ∑j ∑k p(xj, yk) log2(1 / p(xj|yk))

The quantity H(X|Y) is called the conditional entropy. It is the amount of uncertainty remaining about the channel input after the channel output has been observed. Since H(X) is the uncertainty in the channel input before observing the output, H(X) - H(X|Y) represents the uncertainty in the channel input that is resolved by observing the channel output. This uncertainty measure is termed the mutual information of the channel and is denoted by I(X; Y):

I(X; Y) = H(X) - H(X|Y)

Equivalently,

I(X; Y) = H(Y) - H(Y|X)

where H(Y) is the entropy of the channel output and H(Y|X) is the conditional entropy of the channel output given the channel input.
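As a numerical sketch (not part of the original notes), H(X), H(X|Y) and I(X; Y) can be computed from a joint probability matrix p(xj, yk); the binary-symmetric-channel numbers used below are illustrative:

```python
import math

def mutual_information(joint):
    """joint[j][k] = p(x_j, y_k). Returns I(X;Y) = H(X) - H(X|Y) in bits."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    hx = sum(p * math.log2(1 / p) for p in px if p > 0)
    # H(X|Y) = sum over j,k of p(x_j, y_k) * log2( p(y_k) / p(x_j, y_k) )
    hx_given_y = sum(pjk * math.log2(py[k] / pjk)
                     for row in joint
                     for k, pjk in enumerate(row) if pjk > 0)
    return hx - hx_given_y

# binary symmetric channel with error probability p = 0.1 and equiprobable inputs
p = 0.1
joint = [[0.5 * (1 - p), 0.5 * p],
         [0.5 * p, 0.5 * (1 - p)]]
print(mutual_information(joint))   # ~0.531 bits, i.e. 1 - H(0.1)
```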

Properties of Mutual Information

Property 1:

The mutual information of a channel is symmetric, that is

I(X; Y) = I(Y;X)
where the mutual information I(X; Y) is a measure of the uncertainty about the channel input that is resolved by observing the channel output, and the mutual information I(Y; X) is a measure of the uncertainty about the channel output that is resolved by sending the channel input.

Proof:

Substituting Eq.3 and Eq.10 in Eq.4 and then combining, we obtain

From Bayes’ rule for conditional probabilities, we have

Hence, from Eq.11 and Eq.12


Property 2:

The mutual information of a channel is always non-negative, that is, I(X; Y) ≥ 0.

Proof:

We know,

Substituting Eq. 14 in Eq. 13, we get

Using the following fundamental inequality which we derived discussing the properties of
Entropy,

Drawing the similarities between the right hand side of the above inequality and the left hand
side of Eq. 13, we can conclude that

with equality if, and only if,

p(xj, yk) = p(xj) p(yk) for all j and k.

Property 2 states that we cannot lose information, on the average, by observing the
output of a channel. Moreover, the mutual information is zero if, and only if, the input and output
symbols of the channel are statistically independent.

Property 3:

The mutual information of a channel is related to the joint entropy of the channel input and channel output by

I(X; Y) = H(X) + H(Y) - H(X, Y)

where the joint entropy H(X, Y) is defined as

H(X, Y) = ∑j ∑k p(xj, yk) log2(1 / p(xj, yk))

Proof:

Therefore, from Eq. 18 and Eq. 23, we have

Channel Capacity

Channel capacity, C, is defined as "the maximum mutual information I(X; Y) in any single use of the channel (i.e., signaling interval), where the maximization is over all possible input probability distributions {p(xj)} on X".

C is measured in bits/channel-use, or bits/transmission.

Example:

For the binary symmetric channel discussed previously, I(X; Y) is maximum when p(x0) = p(x1) = 1/2. Evaluating the mutual information for these input probabilities and for the transition probability p of the channel, we get

C = 1 - H(p) = 1 + p log2(p) + (1 - p) log2(1 - p)

where H(p) is the entropy function of the transition probability p.

FIGURE - Variation of the channel capacity of a binary symmetric channel with transition probability p.
1. When the channel is noise free, p = 0, the channel capacity C attains its maximum value of one bit per channel use. At this value the entropy function attains its minimum value of zero.
2. When the conditional probability of error p = 1/2 due to noise, the channel capacity C attains its minimum value of zero, whereas the entropy function attains its maximum value of unity; in such a case the channel is said to be useless.
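The capacity curve described above can be generated directly from C = 1 - H(p) (a small sketch):

```python
import math

def bsc_capacity(p):
    """C = 1 - H(p) bits per channel use for a binary symmetric channel."""
    if p in (0.0, 1.0):
        return 1.0
    return 1.0 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5):
    print(p, round(bsc_capacity(p), 4))
# C falls from 1 bit/use at p = 0 to 0 at p = 0.5 (a useless channel)
```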
Channel Coding Theorem:

Goal: Design of channel coding to increase the resistance of a digital communication system to channel noise.

Channel coding

Mapping of the incoming data sequence into channel input sequence. It is performed in
the transmitter by a channel encoder.

Channel decoding (inverse mapping)

Mapping of the channel output sequence into an output data sequence. It is performed in
the receiver by a channel decoder.

Channel coding introduces redundancy in a controlled manner in order to reconstruct the original source sequence as accurately as possible; the channel coding theorem states the condition under which this is possible.

1. Let a discrete memoryless source

– with an alphabet φ

– with an entropy H(φ)

– produce symbols once every Ts seconds

2. Let a discrete memoryless channel

– have capacity C

– be used once every Tc seconds.

3. Then if

H(φ)/Ts ≤ C/Tc

there exists a coding scheme for which the source output can be transmitted over the channel and be reconstructed with an arbitrarily small probability of error. The parameter C/Tc is called the critical rate.

4. Conversely, if

H(φ)/Ts > C/Tc

it is not possible to transmit information over the channel and reconstruct it with an arbitrarily small probability of error.

Example:

Considering the case of a binary symmetric channel, the source entropy H(φ) is 1. Hence, from the above condition, we have

Tc/Ts ≤ C

But the ratio Tc/Ts equals the code rate, r, of the channel encoder:

r ≤ C

Hence, for a binary symmetric channel, if r ≤ C, then there exists a code capable of achieving an
arbitrarily low probability of error.

Information Capacity Theorem:


The information capacity theorem states: "The information capacity of a continuous channel of bandwidth B hertz, perturbed by additive white Gaussian noise of power spectral density N0/2 and limited in bandwidth to B, is given by

C = B log2(1 + P/(N0 B)) bits per second

where P is the average transmitted power."

Proof:

Assumptions:

1. band-limited, power-limited Gaussian channels.

2. A zero-mean stationary process X(t) that is band-limited to B hertz, sampled at the Nyquist rate of 2B samples per second.

3. These samples are transmitted in T seconds over a noisy channel, also band-limited
to B hertz.

The number of samples, K, is given by K = 2BT.

We refer to Xk as a sample of the transmitted signal. The channel output is corrupted by additive white Gaussian noise (AWGN) of zero mean and power spectral density N0/2. The noise is band-limited to B hertz. Let the continuous random variables Yk, k = 1, 2, · · · , K denote samples of the received signal, as shown by

Yk = Xk + Nk, k = 1, 2, · · · , K

The noise sample Nk is Gaussian with zero mean and variance given by

σ² = N0 B

The transmitter power is limited; it is therefore required that

E[Xk²] = P, k = 1, 2, · · · , K

Now, let I(Xk; Yk) denote the mutual information between Xk and Yk. The capacity of the channel is given by

C = max I(Xk; Yk), where the maximization is over all input distributions satisfying the power constraint E[Xk²] = P

The mutual information I(Xk; Yk) can be expressed as

I(Xk; Yk) = h(Yk) - h(Yk|Xk)

which takes the form

I(Xk; Yk) = h(Yk) - h(Nk)

When a symbol is transmitted from the source, noise is added to it, so the total received power is P + σ².

For the evaluation of the information capacity C, we proceed in three stages:

1. The variance of the sample Yk of the received signal equals P + σ². Hence, the differential entropy of Yk is

h(Yk) = 1/2 log2[2πe(P + σ²)]

2. The variance of the noise sample Nk equals σ². Hence, the differential entropy of Nk is given by

h(Nk) = 1/2 log2(2πeσ²)

3. Substituting these two equations into the expression for I(Xk; Yk), the information capacity is

C = 1/2 log2(1 + P/σ²) bits per transmission.

The number K equals 2BT. Accordingly, since the channel is used K times in T seconds, we may express the information capacity in the equivalent form

C = B log2(1 + P/(N0 B)) bits per second.
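The final formula can be evaluated numerically (a sketch; the bandwidth, noise density and signal-to-noise ratio below are illustrative values only):

```python
import math

def information_capacity(B, P, N0):
    """Shannon information capacity C = B * log2(1 + P / (N0 * B)) in bits/s."""
    return B * math.log2(1 + P / (N0 * B))

# illustrative numbers: 3 kHz bandwidth, SNR P/(N0*B) = 1000 (30 dB)
B = 3000.0
N0 = 1e-9
P = 1000 * N0 * B
print(round(information_capacity(B, P, N0)))   # ~29,902 bits/s
```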
