
INFORMATION THEORY AND CODING

UNIT I

ETEC 304

VI Semester
Learning Objective


⚫ Introduction to Information Theory
⚫ Modelling of Information Source
⚫ Entropy (Joint/Conditional)
⚫ Source Coding Theorem
⚫ Discrete memoryless channels
⚫ Channel capacity
⚫ Data compaction
⚫ Markov Sources
Claude E. Shannon (1916-2001)
1. The purpose of a communication system is to carry information-bearing baseband
signals from one place to another over a communication channel.
2. Information theory is concerned with the fundamental limits of communication:
What is the ultimate limit to data compression?
What is the ultimate limit of reliable communication over a noisy
channel?
3. Information Theory is a branch of probability theory which may be applied to
the study of the communication systems that deals with the mathematical
modelling and analysis of a communication system rather than with the
physical sources and physical channels.
4. Two important elements presented in this theory are Binary Source (BS) and the
Binary Symmetric Channel (BSC).
5. A binary source is a device that generates one of the two possible symbols ‘0’
and ‘1’ at a given rate ‘r’, measured in symbols per second.
6. The BSC is a medium through which it is possible to transmit one symbol per
time unit.
What is Information Theory

"A Mathematical Theory of Communication",


Bell System Technical Journal. 1948

All about max-min problems in communications

Information theory deals with fundamental limits on communication:


▪ Channel transmission rate: What is the maximum rate at which
information can be reliably transmitted over a communication
channel?
▪ Source compression rate: What is the minimum rate at which information
can be compressed and still be retrievable with small or no error?

▪ What is the complexity of such optimal schemes?


➢ There are three main concepts in this theory:
1. The first is the definition of a quantity that can be a valid measurement of information
which should be consistent with a physical understanding of its properties.
2. The second concept deals with the relationship between the information and the
source that generates it. This concept will be referred to as the source information.
Compression and encryptions are related to this concept.
3. The third concept deals with the relationship between the information and the
unreliable channel through which it is going to be transmitted. This concept leads to
the definition of a very important parameter called the channel capacity. Error-
correction coding is closely related to this concept.
Information

▪ What is Information and How to measure it?


▪ Examples:
▪ “The sun will rise tomorrow” vs. “The moon will rise at mid-noon”
▪ “It will rain tomorrow” vs. “It will snow in Delhi”

▪ “Everyone got ‘A+’ in the mid-term test”

▪ Information is proportional to the uncertainty in the outcome of an event.


Information of an event depends only on its probability of occurrence
and is not dependent on its content.

▪ The smaller the probability of an event is, the more information the
occurrence of that event will convey.
▪ The message associated with the least likely
event contains the maximum information.
Measure of Information

▪ The information I(p) that a source event can convey and
the probability p of the event satisfy:

I. I(p) = 0 for p = 1: a certain event conveys no information.
II. As P(x) decreases, I(x) increases, and vice versa.
III. For multiple independent events x1, x2, ...:

I[P(x1) P(x2) ...] = I[P(x1)] + I[P(x2)] + ...

If two independent events occur (whose joint probability is the product of their individual
probabilities), then the information we get from observing the events is the sum of the two
informations: I(p1 * p2) = I(p1) + I(p2).

IV. Information is a non-negative quantity: I(p) ≥ 0.

▪ Information of a symbol X:

I(X) = log_a [1 / P(X)] = - log_a P(X)

a = 2 : bit, a = e : nat, a = 10 : hartley (decit)

1 bit = 0.6932 nat = 0.3010 decit
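As a quick illustration (an added Python sketch, not part of the original notes; the probabilities are made up), the self-information formula above can be evaluated directly:

import math

def self_information(p, base=2):
    # I(x) = -log_base P(x); base 2 gives bits, base e gives nats, base 10 gives hartleys/decits
    return -math.log(p, base)

print(self_information(0.99))              # ~0.014 bit: a near-certain event conveys little information
print(self_information(0.01))              # ~6.64 bit: a rare event conveys much more
print(self_information(0.5, base=math.e))  # 1 bit expressed in nats ~ 0.693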


Modeling of Information Source

▪ An information source can be modeled by a random process which
produces a set of source symbols called the SOURCE ALPHABET. The
elements of the set are known as SYMBOLS/LETTERS.

▪ Discrete memoryless source (DMS): each symbol produced is independent
of the previous symbols.
▪ It is a discrete-time, discrete-amplitude random process (a sequence of random variables).
▪ A full description of a DMS:
▪ Alphabet set A = {a1, a2, …, aN} where the random variable A
takes its values
▪ Probabilities {pi}, i = 1, …, N

A discrete memoryless source (DMS) can be characterized by the list of the
symbols, the probability assignment of these symbols and the specification of
the rate at which the source generates these symbols.
Entropy ( Average Information)

▪ Consider a discrete source with N possible symbols


Since long sequences of symbols are transmitted, we are interested in the average information
of a source rather than the information per individual symbol.
▪ Entropy H(.): average amount of information conveyed per
symbol (the mean value of I(xj) over the source alphabet X with N possible symbols):

H(X) = E[ I(xj) ] = Σ_{j=1}^{N} P(xj) log2 [1 / P(xj)]   (bit/symbol)

The entropy, H, of a discrete random variable X is a measure of the amount


of uncertainty associated with the value of X.
For quantitative representation of average information per symbol we make the
following assumptions:
i) The source is stationary so that the probabilities may remain constant with
time.
ii) The successive symbols are statistically independent and come from the
source at an average rate of ‘r’ symbols per second

▪ Example: Consider a discrete memoryless source X having 3
symbols with probabilities p1 = 1/2, p2 = 1/4, p3 = 1/4. Determine
the entropy of the source.
▪ Solution:

H = p1 log2(1/p1) + p2 log2(1/p2) + p3 log2(1/p3)
  = (1/2)(1) + (1/4)(2) + (1/4)(2) = 1.5 bit/symbol
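A short Python sketch (added for illustration, not from the notes) that reproduces the 1.5 bit/symbol result above:

import math

def entropy(probs, base=2):
    # H(X) = sum_j P(x_j) * log_base(1/P(x_j)); zero-probability symbols contribute nothing
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.25]))   # 1.5 bit/symbol, as computed in the example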
Entropy (Cont’d)

▪ What is the maximum entropy?


▪ Consider the binary case with symbol probabilities p and 1 - p:

H = p log2(1/p) + (1 - p) log2(1/(1 - p))

[Plot: binary entropy H versus p; H rises from 0 at p = 0 to its maximum of 1.0 at p = 0.5
and falls back to 0 at p = 1.]

Entropy is maximized when all the symbols are equiprobable.
▪ For N equiprobable symbols:

H = Σ_{n=1}^{N} (1/N) log2 N = log2 N bit/symbol
Information Rate: if the time rate at which source X emits
symbols is r (symbols/sec), then the information rate is

R = r H(X) b/s   [(symbols/second) × (bits/symbol)].

▪ Q1) A source with bandwidth 4 kHz is sampled at the Nyquist rate.

▪ Assuming that the resulting sequence can be modeled by a
discrete memoryless source with probabilities {1/2, 1/4, 1/8, 1/16, 1/16},

▪ what is the information rate of
the source in bit/sec?
Solution

▪ We have

H(X) = (1/2) log2 2 + (1/4) log2 4 + (1/8) log2 8 + 2 × (1/16) log2 16 = 15/8 bits/sample

▪ Since we have 8000 samples/sec, the source produces information at a


rate of 15k bits/sec.
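The same calculation as a Python sketch (added here; the probabilities are the ones assumed in the question):

import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

probs = [1/2, 1/4, 1/8, 1/16, 1/16]
r = 2 * 4000                  # Nyquist rate for a 4 kHz source: 8000 samples/sec
H = entropy(probs)            # 15/8 = 1.875 bits/sample
print(H, r * H)               # 1.875, 15000.0 bits/sec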

Properties of Entropy:
➢ 0 ≤ H(X) ≤ log2 N ; N = number of symbols of
the alphabet of source X.
➢ When all the symbols are equally likely, the average
uncertainty has its largest value, i.e. H(X) = log2 N.
Extension of a DMS:
If symbols are taken 2 at a time, we obtain the second-order extension of the source,
whose symbols are the pairs x1x1, x1x2, x2x1, x2x2, ...

Q) A DMS generates symbols with probabilities p(x1) = 0.6, p(x2) = 0.3, p(x3) = 0.1.

Calculate the entropy of the source and the entropy of the 2nd-order extension of the source. Show that
H(X2) = 2 H(X).

Hint: 3 symbols give 9 combinations; the probabilities of the individual symbols get multiplied. Ans:
H(X) = 1.29 bits/sym, H(X2) = 2.59 bits/sym.
(x1x1, x1x2, x1x3, x2x2, x2x3, x2x1, x3x3, x3x2, x3x1)

SOURCE EFFICIENCY: the ratio of the average information conveyed by the source, H(X), to the maximum
average information.
The maximum average information is H(X)max = log2 N, where N is the number of symbols, assuming
equal probability.

γ_X = H(X) / H(X)max × 100 %

Redundancy: R_X = 1 - γ_X

Q) A source emits three symbols with probabilities 0.7, 0.15 and 0.15. Calculate the efficiency and
redundancy. Ans: 74.5 % and 25.5 %.
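A quick numerical check of these figures in Python (an added sketch):

import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

probs = [0.7, 0.15, 0.15]
H = entropy(probs)                 # ~1.181 bits/symbol
H_max = math.log2(len(probs))      # log2(3) ~ 1.585 bits/symbol
efficiency = H / H_max             # ~0.745  -> 74.5 %
redundancy = 1 - efficiency        # ~0.255  -> 25.5 %
print(efficiency, redundancy)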
Source Coding

The conversion of the output of a discrete memoryless source (DMS) into a sequence of binary
symbols, i.e. binary code words, is called Source Coding.

The output of a discrete memoryless source (DMS) has to be efficiently represented so as to
minimize the average bit rate required to represent the source; this is achieved by reducing
the redundancy of the information source.

For example, in telegraphy we use the Morse code, in which the letters are represented by marks and
spaces. The letter E, which is used most often, is denoted by ".", whereas the
letter Q, which is rarely used, is denoted by "--.-".

Sk denotes the symbols emitted by the source and bk the corresponding coded (binary) output.


I. Code word Length:

Let X be a DMS with finite entropy H(X) and symbols {x1, …, xm} with
corresponding probabilities of occurrence P(xi) (i = 1, …, m). Let the binary code word
assigned to symbol xi by the encoder have length ni, measured in bits. The length of the
code word is the number of binary digits in the code word.
II. Average Code word Length:
The average code word length L per source symbol is given by

L = Σ_{i=1}^{m} P(xi) ni

The parameter L represents the average number of bits per source symbol used in the
source coding process.

III. Code Efficiency:


The code efficiency η is defined as
η = Lmin / L
where Lmin is the minimum value of L. As η approaches unity, the code is said to be
efficient.

IV. Code Redundancy:


The code redundancy γ is defined as
γ = 1 - η
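A small sketch (added, not from the notes) showing how L, η and γ are computed once a probability assignment and the code word lengths are known; the six-symbol source and the lengths 2, 2, 2, 3, 4, 4 anticipate the Huffman example that follows:

import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def code_stats(probs, lengths):
    # L = sum_i P(xi) * ni, eta = H(X)/L (since Lmin = H(X)), gamma = 1 - eta
    L = sum(p * n for p, n in zip(probs, lengths))
    eta = entropy(probs) / L
    return L, eta, 1 - eta

print(code_stats([0.30, 0.25, 0.20, 0.12, 0.08, 0.05], [2, 2, 2, 3, 4, 4]))
# -> (2.38, ~0.99, ~0.01)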
Source coding theorem

The source coding theorem states that for a DMS X, with entropy H (X), the average
code word length L per symbol is bounded as L ≥ H (X). Further, L can be made as
close to H (X) as desired for some suitable chosen code.
Thus, with
𝐿𝑚𝑖𝑛 = 𝐻 (𝑋)
The code efficiency can be rewritten as
𝜂 = 𝐻(𝑋) / 𝐿

The source coding theorem states that at least H(X) bits per symbol are needed, on average,
to represent the symbols emitted by the source if lossless communication is required.
This source coding theorem is called the noiseless coding theorem, as
it establishes error-free encoding. It is also called Shannon's first theorem.

H(X) also represents the minimum rate at which an information source can be compressed for
reliable reconstruction.
Classification of Code:
✓ Fixed – Length Codes
✓ Variable – Length Codes
✓ Distinct Codes
✓ Prefix – Free Codes
✓ Uniquely Decodable Codes
✓ Instantaneous Codes
✓ Optimal Codes

xi Code 1 Code 2 Code 3 Code 4 Code 5 Code 6


x1 00 00 0 0 0 1
x2 01 01 1 10 01 01
x3 00 10 00 110 011 001
x4 11 11 11 111 0111 0001
Fixed – Length Codes:
A fixed – length code is one whose code word length is fixed. Code 1 and Code 2
of above table are fixed – length code words with length 2.
Variable – Length Codes:
A variable – length code is one whose code word length is not fixed. All codes of
above table except Code 1 and Code 2 are variable – length codes.
Distinct Codes:
A code is distinct if each code word is distinguishable from each other. All codes of
above table except Code 1 are distinct codes.
Prefix – Free Codes:
A code in which no code word can be formed by adding code symbols to another
code word is called a prefix- free code. In a prefix – free code, no code word is
prefix of another. Codes 2, 4 and 6 of above table are prefix – free codes.
Uniquely Decodable Codes:
A distinct code is uniquely decodable if the original source sequence can be reconstructed
perfectly from the encoded binary sequence.
A sufficient condition to ensure that a code is uniquely decodable is that no code word is a prefix
of another. Thus the prefix – free codes 2, 4 and 6 are uniquely decodable codes.
A uniquely decodable code is called an instantaneous code if the end of any code word is
recognizable without examining subsequent code symbols. The instantaneous codes have the
property previously mentioned that no code word is a prefix of another code word. Prefix – free
codes are sometimes known as instantaneous codes.
Optimal Codes:
A code is said to be optimal if it is instantaneous and has the minimum average L for a given
source with a given probability assignment for the source symbols.
Kraft Inequality:
Let X be a DMS with alphabet 𝑥𝑖(𝑖= 1,2, …, 𝑚). Assume that the length of the
assigned binary code word corresponding to xi is ni.
A necessary and sufficient condition for the existence of an instantaneous binary code
is
K = Σ_{i=1}^{m} 2^(-ni) ≤ 1
This is known as the Kraft Inequality.


It may be noted that Kraft inequality assures us of the existence of an instantaneously
decodable code with code word lengths that satisfy the inequality.
But it does not show us how to obtain those code words, nor does it say that any code
satisfying the inequality is automatically uniquely decodable.

Q) A DMS is given with 4 symbols. Which code does not satisfy the Kraft inequality? Show that A and D are
UDC. (Code C satisfies the Kraft inequality but is not UDC.)

xi   CODE A   CODE B   CODE C   CODE D
x1   00       0        0        0
x2   01       10       11       100
x3   10       11       100      110
x4   11       110      110      111
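An illustrative sketch (added) that evaluates K = Σ 2^(-ni) for the four codes in the table; only Code B violates the inequality:

def kraft(lengths):
    # K = sum_i 2**(-ni); an instantaneous binary code with these lengths exists iff K <= 1
    return sum(2 ** (-n) for n in lengths)

codes = {
    "A": ["00", "01", "10", "11"],
    "B": ["0", "10", "11", "110"],
    "C": ["0", "11", "100", "110"],
    "D": ["0", "100", "110", "111"],
}
for name, words in codes.items():
    K = kraft([len(w) for w in words])
    print(name, K, "satisfies" if K <= 1 else "violates")
# A: 1.0  B: 1.125 (violates)  C: 1.0  D: 0.875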
Entropy Coding :

The design of a variable – length code such that its average code word
length approaches the entropy of DMS is often referred to as Entropy
Coding.

There are basically two types of entropy coding, viz.


Huffman Coding
Shannon – Fano Coding
HUFFMAN CODING : results in an optimal code. It is
the code that has the highest efficiency.
The Huffman coding procedure is as follows:
1. List the source symbols in order of decreasing probability.
2. Combine the probabilities of the two symbols having the lowest probabilities and reorder the
resultant probabilities, this step is called reduction 1. The same procedure is repeated until there
are two ordered probabilities remaining.
3. Start encoding with the last reduction, which consists of exactly two ordered probabilities. Assign
0 as the first digit in the code word for all the source symbols associated with the first probability;
assign 1 to the second probability.
4. Now go back and assign 0 and 1 as the next digit for the two probabilities that were combined
in the previous reduction step, retaining the digits already assigned: append 0 for the source
symbols associated with the first probability and 1 for those associated with the second.
5. Keep regressing this way until the first column is reached.
6. The code words are obtained by tracing back from right to left.
Huffman Source Coding

▪ Huffman coding is a variable-length binary coding.


▪ The idea is to map the more probable source sequences to shorter
binary codewords
▪ Synchronization is a problem in variable-length coding
▪ Example: Huffman code

Source sample xi   P(xi)   Codeword
x1                 0.30    00
x2                 0.25    01
x3                 0.20    11
x4                 0.12    101
x5                 0.08    1000
x6                 0.05    1001

H(X) = 2.36 b/symbol
L = 2.38 b/symbol
η = H(X)/L = 0.99
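A compact Python sketch of the Huffman procedure (added for illustration; because of ties in the merging order the code words may differ from the table, but the lengths, and hence L and η, are the same):

import heapq, math

def huffman(probs):
    # Repeatedly merge the two least probable subtrees, prefixing '1' to one and '0' to the other.
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)          # least probable subtree
        p2, _, c2 = heapq.heappop(heap)          # second least probable subtree
        merged = {s: "0" + w for s, w in c2.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"x1": 0.30, "x2": 0.25, "x3": 0.20, "x4": 0.12, "x5": 0.08, "x6": 0.05}
code = huffman(probs)
L = sum(probs[s] * len(w) for s, w in code.items())
H = sum(p * math.log2(1 / p) for p in probs.values())
print(code)
print(L, H / L)    # L = 2.38 bits/symbol, efficiency ~0.99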
Shannon Fano Algorithm
Q) Let there be six (6) source symbols having probabilities as x1 = 0.30, x2 =
0.25, x3 = 0.20, x4 = 0.12, x5 = 0.08 x6 = 0.05. Obtain the Shannon – Fano
Coding for the given source symbols.

xi   P(xi)   Step 1   Step 2   Step 3   Step 4   Code
x1   0.30    0        0                          00
x2   0.25    0        1                          01
x3   0.20    1        0                          10
x4   0.12    1        1        0                 110
x5   0.08    1        1        1        0        1110
x6   0.05    1        1        1        1        1111

H (X) = 2.36 b/symbol


L = 2.38 b/symbol
η = H (X)/ L = 0.99
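An illustrative recursive Shannon-Fano sketch (added): sort by decreasing probability, split where the two groups are closest in total probability, assign 0/1, and recurse; for the source above it reproduces the code words in the table:

def shannon_fano(symbols):
    # symbols: list of (symbol, probability) sorted by decreasing probability -> {symbol: codeword}
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    running, best_i, best_diff = 0.0, 1, float("inf")
    for i in range(1, len(symbols)):          # find the most balanced split point
        running += symbols[i - 1][1]
        diff = abs(2 * running - total)
        if diff < best_diff:
            best_i, best_diff = i, diff
    code = {s: "0" + w for s, w in shannon_fano(symbols[:best_i]).items()}
    code.update({s: "1" + w for s, w in shannon_fano(symbols[best_i:]).items()})
    return code

src = [("x1", 0.30), ("x2", 0.25), ("x3", 0.20), ("x4", 0.12), ("x5", 0.08), ("x6", 0.05)]
print(shannon_fano(src))
# {'x1': '00', 'x2': '01', 'x3': '10', 'x4': '110', 'x5': '1110', 'x6': '1111'}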
Q) The following are the probabilities of 6 symbols:
0.30, 0.25, 0.20, 0.12, 0.08, 0.05. Find the efficiency and redundancy using both Shannon-Fano and Huffman coding.

Q) A DMS has four symbols with probabilities:

p(x1) = 1/2   ->  0
p(x2) = 1/4   ->  10
p(x3) = 1/8   ->  110
p(x4) = 1/8   ->  111

Find the Shannon-Fano code and show that the code has the optimum property, with a code efficiency of 100 percent.
[OPTIMUM CODE: ni = I(xi), where ni = length of the code word.]

Q) A DMS has 5 equally likely symbols.

Find the code efficiency using both Shannon-Fano and Huffman coding.

LZ ALGO

Given a binary sequence parsed into the phrases: 1, 0, 10, 11, 01, 101, 010, 1011

Phrases   Numbering   Code
1         001         000 1
0         010         000 0
10        011         001 0
11        100         001 1
01        101         010 1
101       110         011 1
010       111         101 0
1011      1000        110 1

Notes: 1) This is a variable-to-fixed length code.  2) Prior probabilities are not required (not given).

Lempel-Ziv algorithm:
1) Divide the given sequence into phrases; this is known as parsing.

2) The number of numbering bits is decided based on the number of phrases formed.

3) The tail bit, the last bit in the phrase, is the innovation symbol.

4) The code is formed by writing the numbering of the head (the previously seen phrase) followed by the tail bit.
Lempel-Ziv Coding

Memorize previously occurring substrings in the input data


– parse input into the shortest possible distinct ‘phrases’
– number the phrases starting from 1 (0 is the empty string)

1011010100010…  parsed as  1 | 0 | 11 | 01 | 010 | 00 | 10   (phrases numbered 1 to 7)

each phrase consists of a previously occurring phrase
(head) followed by an additional 0 or 1 (tail)

transmit code for head followed by the additional bit for tail

– (head, tail) pairs: (0,1) (0,0) (1,1) (2,1) (4,0) (2,0) (1,0), i.e. 01 00 11 21 40 20 10 …
for head use enough bits for the max phrase number so far:

100011101100001000010…
– decoder constructs an identical dictionary

Input = 1011010100010010001001010010

Dictionary (address : phrase)   Send    Decode
0000 :                          1       1
0001 : 1                        00      0
0010 : 0                        011     11
0011 : 11                       101     01
0100 : 01                       1000    010
0101 : 010                      0100    00
0110 : 00                       0010    10
0111 : 10                       1010    0100
1000 : 0100                     10001   01001
1001 : 01001                    10010   010010

Improvement:
• Each head can only be used twice, so at its second use we can:
  – omit the tail bit
  – delete the head from the dictionary and re-use the dictionary entry
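A sketch of the Lempel-Ziv parsing described above (added for illustration; it emits (head index, tail bit) pairs and leaves the choice of how many bits to spend on the head, fixed or growing, to the encoder):

def lz_parse(bits):
    # Parse a binary string into the shortest distinct phrases; each new phrase is
    # encoded as (index of its head = longest previously seen prefix, tail bit).
    dictionary = {"": 0}          # phrase -> index; index 0 is the empty string
    phrases, codes = [], []
    current = ""
    for b in bits:
        current += b
        if current not in dictionary:
            dictionary[current] = len(dictionary)
            phrases.append(current)
            codes.append((dictionary[current[:-1]], current[-1]))
            current = ""
    # note: an unfinished phrase left over at the end of the input is ignored in this sketch
    return phrases, codes

phrases, codes = lz_parse("1011010100010")
print(phrases)   # ['1', '0', '11', '01', '010', '00', '10']
print(codes)     # [(0,'1'), (0,'0'), (1,'1'), (2,'1'), (4,'0'), (2,'0'), (1,'0')] -> 01 00 11 21 40 20 10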
The Discrete Memoryless Channels (DMC):

Channel Representation: A communication channel may be defined as the path or


medium through which the symbols flow to the receiver end.
A DMC is a statistical model with an input X and output Y. Each possible input-to-
output path is indicated along with a conditional probability P(yj/xi), where P(yj/xi)
is the conditional probability of obtaining output yj given that the input is xi; it is
called a channel transition probability.

A channel is completely specified by the complete set of transition probabilities. The


channel is specified by the matrix of transition probabilities [P(Y/X)]. This matrix
is known as Channel Matrix.
▪ It is characterized by a relationship between its input and output, which is
generally a stochastic relation due to the presence of fading and noise.
Probability Transition Matrix (PTM)
Every possible input-to-output path is indicated by a conditional probability.
Each input results in some output, hence each row adds to unity.

[P(Y)] = [P(X)] [P(Y/X)]

[P(X,Y)] = [P(X)]_d [P(Y/X)]

           | P(y1/x1)  ...  P(yn/x1) |
P(Y/X) =   |    ...    ...     ...   |
           | P(y1/xm)  ...  P(yn/xm) |

Since each input to the channel results in some output, each row of the
channel matrix must sum to unity. This means that

Σ_{j=1}^{n} P(yj/xi) = 1   for all i

Now, if the input probabilities P(X) are represented by the row matrix
[P(X)] = [P(x1) P(x2) … P(xm)]
and the output probabilities P(Y) are represented by the row matrix
[P(Y)] = [P(y1) P(y2) … P(yn)]
then
[P(Y)] = [P(X)] [P(Y/X)]
Now if P(X) is represented as a diagonal matrix, we have

             | P(x1)   ...   0     |
[P(X)]_d =   |   ...   ...   ...   |
             |   0     ...   P(xm) |
Then
[P(X,Y)] = [P(X)]_d [P(Y/X)]
Where the (i, j) element of matrix [P(X,Y)] has the form P(xi, yj).
The matrix [P(X, Y)] is known as the joint probability matrix and the element
P(xi, yj) is the joint probability of transmitting xi and receiving yj.
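A small numerical sketch of the matrix relations above (the input distribution and channel matrix are illustrative, not taken from the notes):

import numpy as np

P_X = np.array([0.5, 0.25, 0.25])          # input probabilities [P(X)] as a row matrix
P_Y_given_X = np.array([[0.8, 0.1, 0.1],   # channel matrix [P(Y/X)]; each row sums to 1
                        [0.1, 0.8, 0.1],
                        [0.1, 0.1, 0.8]])

P_Y  = P_X @ P_Y_given_X                   # [P(Y)]   = [P(X)] [P(Y/X)]
P_XY = np.diag(P_X) @ P_Y_given_X          # [P(X,Y)] = [P(X)]_d [P(Y/X)]

print(P_Y)          # output symbol probabilities
print(P_XY)         # joint probability matrix, element (i, j) = P(xi, yj)
print(P_XY.sum())   # sanity check: the joint probabilities sum to 1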
Entropy, Conditional Entropy and Mutual Information

[Venn diagram: H(X,Y) is the union of H(X) and H(Y); H(X) = H(X|Y) + I(X;Y),
H(Y) = H(Y|X) + I(X;Y), and I(X;Y) is the overlap of H(X) and H(Y).]
Conditional and Joint Entropies

H(X)   (from the marginal probabilities of X) denotes the average uncertainty of the random
       variable X, i.e. of the channel input.
H(Y)   (from the marginal probabilities of Y) denotes the average uncertainty of the random
       variable Y, i.e. of the channel output.
H(X|Y) (a conditional entropy) denotes the uncertainty of random variable X
       after random variable Y is observed.
H(Y|X) (a conditional entropy) denotes the uncertainty of random variable Y
       after random variable X was transmitted.
H(X,Y) (from the joint probabilities) denotes the average uncertainty of the channel as a whole.
1. H (X) is the average uncertainty of the channel input and H (Y) is the average
uncertainty of the channel output.
2. The conditional entropy H (X/Y) is a measure of the average uncertainty remaining
about the channel input after the channel output has been observed. H (X/Y) is also
called equivocation of X w.r.t. Y.
3. The conditional entropy H (Y/X) is the average uncertainty of the channel output given
that X was transmitted.
4. The joint entropy H (X, Y) is the average uncertainty of the communication channel as
a whole. Few useful relationships among the above various entropies are as under:
5. H (X, Y) = H (X/Y) + H (Y)
6. H (X, Y) = H (Y/X) + H (X)
7. I(X;Y) = H(X) - H(X/Y) = H(Y) - H(Y/X)
8. I(X;Y) = 0 if and only if X and Y are statistically independent.
A priori entropies H(X) and H(Y): average information going into and coming out
of the channel (before transmission).

A posteriori / conditional entropies: defined after transmission and reception of a
particular symbol.

Conditional entropy, or the uncertainty of X when receiving Y:

H(X/Y) = - Σ_x Σ_y P(x, y) log2 P(x/y)

It represents the amount of information lost due to noise etc. with respect to the output symbol,
i.e. when the output is observed.
Similarly,

H(Y/X) = - Σ_x Σ_y P(x, y) log2 P(y/x)

JOINT Entropy:

H(X,Y) = - Σ_x Σ_y P(x, y) log2 P(x, y)

From Bayes' theorem, P(x, y) = P(x/y) P(y) = P(y/x) P(x); therefore (chain rule)

H(X,Y) = H(X/Y) + H(Y) = H(Y/X) + H(X)

If X and Y are statistically independent, then
H(X,Y) = H(X) + H(Y)
Mutual Information : Transinformation

▪ Given that
▪ H(X) is the uncertainty of the random variable X at the input of the information channel, and
▪ H(X|Y) is the information lost in the channel due to the noise, i.e. the uncertainty of
random variable X after random variable Y is known,

▪ then I(X;Y) denotes the balance of information at the receiver, or the amount of uncertainty of X that
has been removed given that Y is known.
▪ Definition of mutual information:

I(X;Y) = H(X) - H(X|Y)
I(X;Y) = H(Y) - H(Y|X)
I(X;Y) = H(X) + H(Y) - H(X,Y)
Mutual Information
❖ Mutual Information (MI) of two random variables is a measure of the mutual dependence
between the two variables. More specifically, it quantifies the "amount of information" (in
units such as Shannons, commonly called bits) obtained about one random variable
through observing the other random variable.

❖ It can be thought of as the reduction in uncertainty about one random variable given
knowledge of another.

❖ High mutual information indicates a large reduction in uncertainty; low mutual information
indicates a small reduction; and zero mutual information between two random variables
means the variables are independent.

❖ For two discrete variables X and Y whose joint probability distribution is PXY(x,y), the
mutual information between them, denoted I(X;Y), is given by

I(X;Y) = Σ_x Σ_y PXY(x,y) log2 [ PXY(x,y) / (PX(x) PY(y)) ]
MUTUAL INFORMATION PROOF:

H(X) = Σ_x PX(x) log2 [1/PX(x)] = Σ_x Σ_y PXY(x,y) log2 [1/PX(x)]

H(X|Y) = Σ_x Σ_y PXY(x,y) log2 [1/P(x|y)]

Subtracting: I(X;Y) = H(X) - H(X|Y) = Σ_x Σ_y PXY(x,y) log2 [P(x|y)/PX(x)]

Rearranging, with P(x|y) = PXY(x,y)/PY(y):

I(X;Y) = Σ_x Σ_y PXY(x,y) log2 [ PXY(x,y) / (PX(x) PY(y)) ]
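An illustrative sketch (added) that computes all of these quantities from a joint probability matrix and verifies the identities above; the matrix itself is made up:

import numpy as np

def H(p):
    # entropy of a probability array; zero entries are skipped
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

P_XY = np.array([[0.40, 0.05, 0.05],       # illustrative joint probabilities P(xi, yj)
                 [0.05, 0.20, 0.00],
                 [0.00, 0.05, 0.20]])

P_X = P_XY.sum(axis=1)                     # marginal of X (row sums)
P_Y = P_XY.sum(axis=0)                     # marginal of Y (column sums)

H_X, H_Y, H_XY = H(P_X), H(P_Y), H(P_XY.flatten())
H_X_given_Y = H_XY - H_Y                   # H(X/Y) = H(X,Y) - H(Y)
H_Y_given_X = H_XY - H_X                   # H(Y/X) = H(X,Y) - H(X)
I_XY = H_X - H_X_given_Y                   # I(X;Y) = H(X) - H(X/Y) = H(Y) - H(Y/X)
print(H_X, H_Y, H_XY, H_X_given_Y, H_Y_given_X, I_XY)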
Q)

Q) Given the channel matrix, find H(X), H(Y), H(X,Y), H(Y/X), H(X/Y).


A lossless channel is characterized by the fact that no source information is
lost in transmission.
When a source symbol is sent over a deterministic channel, it is clear which
output will be received.
Both the input and output alphabets are of the same size.
Binary-Symmetric Channel

P(0/0)=1-p

P(1/0)= P(0/1)=p

P(1/1)=1-p

The channel has 2 inputs and 2 outputs: x1 = 0, x2 = 1 and y1 = 0, y2 = 1.

The channel is symmetric because the probability of receiving a 1 if a 0 is
sent is the same as the probability of receiving a 0 if a 1 is sent.

Channel Capacity

▪ If R ≤ C, almost error-free transmission is theoretically guaranteed.
▪ If R > C, reliable transmission is impossible.
Channel capacity: the maximum rate C (in bits/sec) of a channel.
The capacity per symbol Cs of a discrete memoryless channel:

Cs = max over all input distributions P(x) of I(X;Y)   (bits/symbol)

The channel capacity per second, C:
If r symbols are transmitted per second, then the maximum rate of transmission
of information per second is r·Cs.
Hence the maximum capacity per second is C = r·Cs b/s.
The Noisy Channel Coding Theorem

For a discrete memoryless channel of capacity C, as long as the information rate R ≤ C there
exists a coding scheme that achieves an arbitrarily small probability of error; if R > C, reliable
(error-free) transmission is not possible.
Capacities of Special Channels :
Q1) Find the capacity of the channel.
For the given channel diagram, write the channel matrix.

Q2) Given P(X) = [P(x1), P(x2), P(x3)] = [1/2, 1/4, 1/4] and the channel matrix

P(Y/X) = [ 0.2  0.5  0.3
           0.2  0.6  0.2
           ...            ]

find H(X), H(Y), H(Y/X), H(X/Y), H(X,Y).

Q3)
Types of Codes- Prefix Codes
