Institute of Technology
Department of Electrical Engineering
Communication System (EEng4261)
Lecture 2: Information Theory and Coding
By: Amanuel A.
March 2025
What we will cover
Introduction
Information Theory Fundamental Theorems
Measurement of Information
Entropy
Channel Representation (special channels), Channel Capacity
Code length and code efficiency
Source coding theorem
Classification of codes
Entropy coding (Shannon-Fano, Huffman)
Reading Assignment
Review on Probability Theorems and Axioms
♦ Product Rule
♦ Sum Rule
♦ Bayes Theorem
♦ Conditional, Joint Probability
♦ Dependent and Independent Variables, Mutual Probability Distribution
Introduction
Information theory is a field of study that deals with the quantification, storage, and communication of information.
• It was originally proposed by Claude E. Shannon in his famous paper entitled “A Mathematical Theory of Communication”.
• Shannon’s vision (channel coding theorem): “It is possible to achieve near-perfect communication of information over a noisy channel.”
In general, information theory provides:
a quantitative measure of information contained in message signals.
a way to determine the capacity of a communication system to transfer
information from source to destination.
Shannon Theory
Fundamental Theorems
Shannon 1: Error-free transmission is possible if R ≥ H and C ≥ R.
R ≥ H: Source coding theorem (simplified)
C ≥ R: Channel coding theorem (simplified)
Shannon 2: Source coding and channel coding can be optimized independently, and binary symbols can be used as the intermediate format.
Here H is the information content (entropy) of the source, R is the rate out of the source coder, and C is the channel capacity.
Information Source
An information source (can be either analog or discrete) is an object that
produces an event which is selected at random according to a specific probability
distribution.
A discrete information source is a source that has only a finite set of symbols as
possible outputs.
The set of source symbols is called the source alphabet and the elements of the
set are called symbols or letters.
Information sources can be classified as: Sources with memory and Memoryless
sources.
Measure of Information
Information Content of a Symbol:
The amount of information contained in a symbol xi with probability of occurrence P(xi) is defined as
I(xi) = log2(1/P(xi)) = -log2 P(xi)   bits
Note that I(xi) satisfies the following properties:
1. I(xi) = 0 for P(xi) = 1
2. I(xi) ≥ 0
3. I(xi) > I(xj) if P(xi) < P(xj)
4. I(xi xj) = I(xi) + I(xj) if xi and xj are independent
ENTROPY
• Entropy, H(X), is a measure of the average uncertainty of a random variable. It is the number of bits required, on average, to describe the random variable.
• It is also defined as the measure of information in terms of bits.
• Higher entropy (more randomness) means higher information content.
• For a discrete random variable X with probabilities pX(xi), the entropy is defined as
H(X) = -Σi pX(xi) log2 pX(xi)
• For a continuous random variable X with probability density function fX(x), the (differential) entropy H(X) is given by:
H(X) = -∫ fX(x) log2 fX(x) dx
• For a binary source, H(X) = -Σi p(xi) log2 p(xi) = -p(x1) log2 p(x1) - p(x2) log2 p(x2).
• For a fair coin with p(x1) = p(x2) = 0.5:
H(X) = -0.5 log2(0.5) - 0.5 log2(0.5) = 1 bit.
• If we already know that it will be (or was) heads, i.e., P(heads) = 1, then H(X) = 0 and the outcome carries no information.
• For a biased coin with P(heads) = 0.9 and P(tails) = 0.1:
H(X) = -0.9 log2(0.9) - 0.1 log2(0.1) = 0.469 bit.
[Figure: plot of entropy versus probability p over 0 ≤ p ≤ 1, with the maximum of 1 bit at p = 0.5.]
• The amount of information is more than zero but less than one bit.
• The uncertainty (information) is greatest when the events are equiprobable.
Average Information or Entropy of a DMS:
In a practical communication system, we usually transmit long sequences
of symbols from an information source.
Thus, we are more interested in the average information that a source produces than in the information content of a single symbol (the source entropy, H(X)).
The mean value of I(xi) over the alphabet of source X with m different symbols is given by:
H(X) = E[I(xi)] = Σi P(xi) I(xi) = -Σi P(xi) log2 P(xi)   bits/symbol   (sum over i = 1, …, m)
Average Information or Entropy of a DMS…
The quantity H(X) is called the entropy of the source X.
It is a measure of the average information content per symbol for a DMS X.
The source entropy H(X) satisfies the following relation:
0 ≤ H(X) ≤ log2 m
where m is the size of the alphabet of source X.
The lower bound corresponds to no uncertainty, which occurs when one symbol has probability P(xi) = 1 while P(xj) = 0 for all j ≠ i, so X emits xi at all times. The upper bound corresponds to maximum uncertainty, which occurs when P(xi) = 1/m for all i, i.e., when all symbols are equally likely to be generated by X.
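As a quick numerical check of these bounds, the following Python sketch (the probability list is simply an assumed example) computes H(X) for a DMS and compares it with log2 m:

import math

def entropy(probs):
    # H(X) = -sum P(xi) log2 P(xi), in bits/symbol
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.4, 0.3, 0.2, 0.1]               # example source probabilities
print(round(entropy(probs), 2))            # 1.85 bits/symbol
print(round(math.log2(len(probs)), 2))     # upper bound log2(m) = 2.0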
Measure of Information…
Average Information or Entropy of a DMS…
Information Rate:
If the time rate at which the source X emits symbols is r (symbols/sec), the information rate R of the source is given by:
R = r·H(X)   b/s
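For instance (illustrative numbers only): a source with entropy H(X) = 2 bits/symbol emitting r = 1000 symbols/sec has an information rate R = 1000 × 2 = 2000 b/s.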
Discrete Memoryless Channel
Channel Representation:
A communication channel is the path or medium through which the
symbols flow from a source to a receiver.
A discrete memoryless channel (DMC) is a statistical model with an input
X and output Y as shown in the figure below.
Discrete Memoryless Channels…
Channel Representation…
The channel is discrete when the alphabets of X and Y are both finite.
It is memoryless when the current output depends on only the current input and
not on any of the previous inputs.
In the DMC shown above, the input X consists of input symbols x1, x2, …, xm
and the output Y consists of output symbols y1, y2, …, yn.
Each possible input-to-output path is indicated along with a conditional probability P(yj/xi), which is known as a channel transition probability.
Discrete Memoryless Channels…
Special Channels:
1. Lossless Channel
A channel described by a channel matrix with only one non-zero element in
each column is called a lossless channel.
An example of a lossless channel is shown in the figure below.
Discrete Memoryless Channels…
2. Deterministic Channel
A channel described by a channel matrix with only one non-zero element, equal to unity, in each row is called a deterministic channel.
An example of a deterministic channel is shown in the figure below.
Discrete Memoryless Channels…
3. Noiseless Channel
A channel is called noiseless if it is both lossless and deterministic.
The channel matrix has only one non-zero element, equal to unity, in each row and each column, as shown in the figure below.
Discrete Memoryless Channels…
4. Binary Symmetric Channel
The binary symmetric channel (BSC) is defined by the channel matrix and channel diagram given below.
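With crossover probability p (the probability that a transmitted bit is received in error), the BSC channel matrix is

P(Y/X) = [ 1-p    p  ]
         [  p    1-p ]

so each input bit is received correctly with probability 1 - p and flipped with probability p.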
Mutual Information
Conditional and Joint Entropies:
Using the input probabilities P(xi), output probabilities P(yj), transition
probabilities P(yj/xi), and joint probabilities P(xi, yj), we can define the
following various entropy functions for a channel with m inputs and
n outputs.
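These entropy functions are given by the standard definitions (sums taken over all i = 1, …, m and j = 1, …, n):

H(X) = -Σi P(xi) log2 P(xi)                  (entropy of the input)
H(Y) = -Σj P(yj) log2 P(yj)                  (entropy of the output)
H(X/Y) = -Σj Σi P(xi, yj) log2 P(xi/yj)      (conditional entropy of X given Y)
H(Y/X) = -Σi Σj P(xi, yj) log2 P(yj/xi)      (conditional entropy of Y given X)
H(X, Y) = -Σi Σj P(xi, yj) log2 P(xi, yj)    (joint entropy)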
Mutual Information of a Channel:
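The mutual information I(X; Y) of a channel measures the amount of information about the input X conveyed by observing the output Y. In terms of the entropies defined above (standard relations):

I(X; Y) = H(X) - H(X/Y) = H(Y) - H(Y/X) = H(X) + H(Y) - H(X, Y)   bits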
Additive White Gaussian Noise(AWGN) Channel…
In an additive white Gaussian noise (AWGN ) channel, the channel output Y is given
by:
Y= X + n
where X is the channel input and n is an additive band-limited white Gaussian noise with zero mean and variance σ².
The capacity Cs of an AWGN channel, in bits per sample (per channel use), is given by:
Cs = (1/2) log2(1 + S/N)   bits/sample
where S/N is the signal-to-noise ratio.
Additive White Gaussian Noise(AWGN) Channel…
If the channel bandwidth B Hz is fixed, then the output y(t) is also a band-limited
signal completely characterized by its periodic sample values taken at the Nyquist
rate 2B samples/s.
Then the capacity C (in b/s) of the AWGN channel is given by:
C = 2B·Cs   b/s
For a band-limited, power-limited AWGN channel, the channel capacity is therefore (the Shannon-Hartley law):
C = B log2(1 + S/N)   b/s
Example: channel capacity
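As an illustrative worked case (the numbers here are assumed, not taken from the original example): a telephone-grade channel with bandwidth B = 3 kHz and signal-to-noise ratio S/N = 1000 (30 dB) has capacity
C = B log2(1 + S/N) = 3000 × log2(1001) ≈ 3000 × 9.97 ≈ 29.9 kb/s.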
Examples on Information Theory and Coding
Example-1:
A DMS X has four symbols x1, x2, x3, x4 with probabilities
P(x1) = 0.4, P(x2) = 0.3, P(x3) = 0.2 and P(x4) = 0.1.
a. Calculate H(X).
b. Find the amount of information contained in the messages x1x2x3x4 and x4x3x3x2.
Examples on Information Theory and Coding Cont’d…
Solution:
a. H(X) = -Σi P(xi) log2 P(xi)
        = -(0.4 log2 0.4 + 0.3 log2 0.3 + 0.2 log2 0.2 + 0.1 log2 0.1)
        = 1.85 bits/symbol
b. I(x1x2x3x4) = I(x1) + I(x2) + I(x3) + I(x4) = 1.32 + 1.74 + 2.32 + 3.32 = 8.7 bits.
   Similarly, I(x4x3x3x2) = I(x4) + I(x3) + I(x3) + I(x2) = 3.32 + 2.32 + 2.32 + 1.74 = 9.7 bits.
Examples on Information Theory and Coding Cont’d….
Example-2:
Consider the binary channel shown below, with transition probabilities P(y1/x1) = 0.9, P(y2/x1) = 0.1, P(y1/x2) = 0.2, P(y2/x2) = 0.8 and equally likely inputs P(x1) = P(x2) = 0.5.
a. Find the channel matrix of the channel.
b. Find P(y1) and P(y2).
Examples on Information Theory and Coding Cont’d….
Solution:
a. The channel matrix is given by:

   P(Y/X) = [ P(y1/x1)  P(y2/x1) ] = [ 0.9  0.1 ]
            [ P(y1/x2)  P(y2/x2) ]   [ 0.2  0.8 ]

b. P(Y) = P(X) P(Y/X)
        = [0.5  0.5] [ 0.9  0.1 ]
                     [ 0.2  0.8 ]
        = [0.55  0.45] = [P(y1)  P(y2)]

   Hence P(y1) = 0.55 and P(y2) = 0.45.
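A quick numerical check of this matrix product (a minimal Python/NumPy sketch with the probabilities above hard-coded):

import numpy as np

P_X = np.array([0.5, 0.5])              # input probabilities P(x1), P(x2)
P_Y_given_X = np.array([[0.9, 0.1],     # rows: x1, x2; columns: y1, y2
                        [0.2, 0.8]])
P_Y = P_X @ P_Y_given_X                 # P(Y) = P(X) P(Y/X)
print(P_Y)                              # [0.55 0.45]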
Practice yourself
1. An information source can be modeled as a bandlimited process with a bandwidth
of 6kHz. This process is sampled at a rate higher than the Nyquist rate to
provide a guard band of 2kHz. It is observed that the resulting samples take
values in the set {-4, -3, -1, 2, 4, 7} with probabilities 0.2, 0.1, 0.15, 0.05, 0.3 and
0.2 respectively. What is the entropy of the discrete-time source in bits/sample?
What is the entropy in bits/sec?
Practice yourself
2. Let X and Y have the following joint distribution:
4. Additionally, given the data below, compute the conditional entropy H(Y/X).
Source Coding
Code Length and Code Efficiency
Let X be a DMS with finite entropy H(X) and an alphabet {x1, x2, …..xm} with
corresponding probabilities of occurrence P(xi) (i=1, 2, …., m).
An encoder assigns a binary code (sequence of bits) of length ni bits for each
symbol xi in the alphabet of the DMS.
A binary code that represents a symbol xi is known as a code word.
The number of binary digits ni in a code word is called the code word length (code length).
The average code word length L, per source symbol, is given by:
L = Σi P(xi) ni   (sum over i = 1, …, m)
Code Length and Code Efficiency
The parameter L represents the average number of bits per source symbol used in the source coding process.
The code efficiency η is defined as:
η = Lmin / L ≤ 1
where Lmin is the minimum possible value of L. When η approaches unity, the code is said to be efficient.
Source Coding Theorem
The source coding theorem states that for a DMS X with entropy H(X),
the average code word length L per symbol is bounded as:
L ≥ H(X)
Further, L can be made as close to H(X) as desired by employing efficient coding schemes.
Thus, with Lmin = H(X), the code efficiency can be written as:
η = H(X) / L
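For example (an illustrative check using the source of Example 1 above): with H(X) = 1.85 bits/symbol, a fixed-length 2-bit code gives η = 1.85/2 ≈ 0.92, while a Huffman code for that source (code lengths 1, 2, 3, 3) achieves L = 1.9 and η ≈ 0.97.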
Classifications of Codes
There are several types of codes.
Let’s consider the table given below to describe different types of codes.
1. Fixed-Length Codes:
A fixed-length code is one whose code word length is fixed.
Code 1 and Code 2 of the above table are fixed-length codes with
length 2.
Classifications of Codes
2. Variable-Length Codes:
A variable-length code is one whose code word length is not fixed.
All codes of the above table except codes 1 and 2 are variable-length
codes.
3. Distinct Codes:
A code is distinct if each code word is distinguishable from the other
code words.
All codes of the above table except code 1 are distinct codes.
Classifications of Codes
4. Prefix-Free Codes:
A code is said to be a prefix-free code when the code words are distinct
and no code word is a prefix for another code word.
Codes 2, 4 and 6 in the above table are prefix-free codes.
5. Uniquely Decodable Codes:
A distinct code is uniquely decodable if the original source sequence can be
reconstructed perfectly from the encoded binary sequences.
Note that code 3 of the table above is not a uniquely decodable code.
For example, the binary sequence 1001 may correspond to the source sequence x2x3x2 or to x2x1x1x2.
• Consider the alphabet X = {x1, x2, x3} that may be coded as C(x1) = 0, C(x2) = 1, and C(x3) = 01.
• This code is not uniquely decodable, since the string 01 may be decoded as (x1, x2) or as (x3).
• Note that in the above code, the code for x1 is a prefix of the code for x3, and the code is said to be NOT prefix-free.

letter   P(xk)   Code I   Code II   Code III
x1       0.500   1        0         0
x2       0.250   00       10        01
x3       0.125   01       110       011
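A small Python sketch (illustrative) that tests whether a given set of code words is prefix-free:

def is_prefix_free(codewords):
    # True if no code word is a prefix of another code word
    words = sorted(codewords)
    return all(not words[i + 1].startswith(words[i]) for i in range(len(words) - 1))

print(is_prefix_free(["0", "1", "01"]))     # False: "0" is a prefix of "01"
print(is_prefix_free(["0", "10", "110"]))   # True: a prefix-free code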
Entropy Coding
The design of a variable-length code such that its average code word length
approaches the entropy of the DMS is often referred to as Entropy coding.
There are two commonly known types of entropy coding, which are presented in this section:
i. Shannon-Fano coding
ii. Huffman coding
1. Shannon-Fano Coding:
An efficient code can be obtained by the following simple procedure, known as the Shannon-Fano algorithm:
1. List the source symbols in order of decreasing probability.
2. Partition the set into two sets that are as close to equi-probable as possible,
and assign 0 to the symbols in the upper set and 1 to the symbols in the
lower set.
Entropy Coding
3. Continue this process, each time partitioning the sets with as nearly equal probabilities as possible and assigning 0s to the symbols of the upper set and 1s to those of the lower set, until further partitioning is not possible.
4. Collect the 0s and 1s assigned to each symbol, in order from the first step to the last, to form its code word.
Note that in Shannon-Fano encoding, ambiguity may arise in the choice of the approximately equi-probable sets (a sketch of the procedure is given below).
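The following is a minimal Python sketch of the Shannon-Fano procedure just described. The symbol names and probabilities are example values, and the split point is chosen to make the two sets as close to equiprobable as possible (one of several reasonable tie-breaking rules):

def shannon_fano(symbols):
    # symbols: list of (symbol, probability) pairs sorted by decreasing probability
    # returns a dict mapping each symbol to its binary code word
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    running, best_k, best_diff = 0.0, 1, float("inf")
    for k in range(1, len(symbols)):
        running += symbols[k - 1][1]
        diff = abs(2 * running - total)          # |P(upper set) - P(lower set)|
        if diff < best_diff:
            best_diff, best_k = diff, k
    upper, lower = symbols[:best_k], symbols[best_k:]
    codes = {s: "0" + c for s, c in shannon_fano(upper).items()}        # 0 for the upper set
    codes.update({s: "1" + c for s, c in shannon_fano(lower).items()})  # 1 for the lower set
    return codes

source = [("x1", 0.4), ("x2", 0.3), ("x3", 0.2), ("x4", 0.1)]
print(shannon_fano(source))   # {'x1': '0', 'x2': '10', 'x3': '110', 'x4': '111'}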
Shannon-Fano Coding, Example 1
Entropy Coding
2. Huffman Coding:
Huffman encoding employs an algorithm that results in an optimum code.
Thus, it is the code that has the highest efficiency.
The Huffman encoding procedure is as follows:
1. List the source symbols in order of decreasing probability.
2. Combine the probabilities of the two symbols having the lowest probabilities, and reorder the resultant probabilities. This step is known as a reduction.
3. Repeat step 2 until only two ordered probabilities are left.
4. Start encoding with the last reduction, which contains exactly two ordered probabilities: assign 0 to the first and 1 to the second probability.
5. Now go back one step and assign 0 and 1 (as a second digit) to the two probabilities that were combined in the last reduction, keeping the digit assigned in the previous step for the probability that was not combined.
6. Repeat this process back to the first column. The accumulated digits are the code words for the symbols. (A compact Python sketch of this procedure is given below.)
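The sketch below implements the same idea bottom-up with a priority queue (a common way to code Huffman's algorithm; the source probabilities are example values):

import heapq
from itertools import count

def huffman(probabilities):
    # probabilities: dict mapping symbol -> probability
    # returns a dict mapping each symbol to its binary code word
    tiebreak = count()   # avoids comparing dicts when probabilities are equal
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo_p, _, lo_codes = heapq.heappop(heap)   # lowest probability: gets bit 1
        hi_p, _, hi_codes = heapq.heappop(heap)   # second lowest: gets bit 0
        merged = {s: "0" + c for s, c in hi_codes.items()}
        merged.update({s: "1" + c for s, c in lo_codes.items()})
        heapq.heappush(heap, (lo_p + hi_p, next(tiebreak), merged))
    return heap[0][2]

codes = huffman({"x1": 0.4, "x2": 0.25, "x3": 0.19, "x4": 0.16})
print(codes)   # one optimal code, e.g. {'x1': '1', 'x2': '01', 'x3': '000', 'x4': '001'}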
Example 1: Given a DMS with seven source letters x1, x2, …, x7 with probabilities 0.35, 0.30, 0.20, 0.10, 0.04, 0.005, 0.005, respectively:
Letter Prob. I(x) Code
x1 0.35 1.5146 00
x2 0.30 1.7370 01
x3 0.20 2.3219 10
x4 0.10 3.3219 110
x5 0.04 4.6439 1110
x6 0.005 7.6439 11110
x7 0.005 7.6439 11111
• The above code is not necessarily unique
• We can devise an alternative code as shown in the following for the
same source as above
X1 0
X2 10
X3 110
X4 1110
X5 11110
X6 111110
X7 111111
Entropy Coding
2. Huffman Coding continued…
An example of Huffman encoding is shown in the table below.
Practice the following questions
Example 1: Consider a discrete memoryless source X which has six symbols x1, x2,
x3, x4, x5 and x6 with probabilities 0.45, 0.20, 0.12, 0.10, 0.09 and 0.04,
respectively.
1. Construct the Huffman code for X.
2. Calculate the efficiency of the code.
Example 2: A discrete memoryless source X has four symbols x1, x2, x3 and
x4 with probabilities 0.4, 0.25, 0.19 and 0.16, respectively.
1. Construct the Huffman code.
2. Calculate the efficiency of the code.
3. If pairs of symbols are encoded using the Huffman algorithm, what is the efficiency of the new code? Compare the result with the one in part (2).
• X = {1, 2, 3, 4, 5}, pX = (0.25, 0.25, 0.2, 0.15, 0.15)