Week 5: Information Theory, Part 1
(CCE 345)
Introduction (Cont.)
Father of Digital Communication
❑ In 1940, Shannon’s master’s thesis had been on the use of Boolean algebra in the analysis of relay (logic) circuits at Bell Labs. Shannon’s interest in computers overlapped with the problems of communication.
❑ In 1948, the real birth of modern information theory can be traced to the publication of Claude Shannon’s “A Mathematical Theory of Communication.”
Claude Shannon (1916−2001)
Shannon’s Theory
Measurement of Information
1- Common-sense Measure of Information:
❑ Let us consider the following three proposed headlines:
A. There will be daylight tomorrow.
B. There was a serious traffic accident somewhere last night.
C. A large asteroid will hit the Earth in 2 days.
❑ The reader would hardly notice the first headline unless he lives near the North or the South Pole.
❑ The reader may be interested in the second headline.
❑ But the third headline will attract the reader's attention far more than the first and second ones.
Measurement of Information (Cont.)
1- Common-sense Measure of Information (Cont.):
❑ The amount of information carried by a message appears to be related to how much we expect (can predict) that message.
❑ The probability of occurrence of the first event is unity (an
assured event), the second is lower, and the third is practically
zero (an almost impossible event).
❑ If an event of low probability occurs, it causes greater surprise
and, hence, conveys more information than the occurrence of an
event of larger probability.
Measurement of Information (Cont.)
$$I_m \sim \log \frac{1}{P_m}$$
where $P_m$ is the probability of occurrence of a message and $I_m$ is the information contained in the message.
Measurement of Information (Cont.)
2- Engineering Measure of Information:
❑ For efficient transmission, shorter codewords are assigned to letters (messages) that occur more frequently (i.e., with higher probability of occurrence), such as e, t, a, and n, while longer codewords are assigned to letters that occur less frequently (i.e., with lower probability of occurrence), such as x, q, and z.
❑ For example: a combination of two binary digits can form the four codewords 00, 01, 10, 11, which are assigned to the four equiprobable messages $m_1$, $m_2$, $m_3$, and $m_4$, respectively.
Measurement of Information (Cont.)
2- Engineering Measure of Information (Cont.):
❑ From the previous example, let us assume there are $L$ equiprobable messages, each with a probability of occurrence $P = 1/L$. Then $\log_2 L$ binary digits are needed to encode each of the $L$ equiprobable messages.
❑ Hence, to encode each message (with probability 𝑷), we need
𝒍𝒐𝒈𝟐 (𝟏/𝑷) binary digits.
❑ Thus, from the engineering point of view, the information
𝑰 conveyed by a message can be defined as
$$I = \log_2 \frac{1}{P} \ \text{bits}$$
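As a quick numerical illustration of this definition (the probabilities below are assumed examples, not values from the slides), a short Python sketch:

```python
import math

def self_information(p: float) -> float:
    """Self-information I = log2(1/p), in bits, of a message with probability p."""
    return math.log2(1.0 / p)

# Assumed example probabilities: the less likely the message,
# the more bits of information its occurrence conveys.
for p in (0.5, 0.25, 0.01):
    print(f"P = {p}: I = {self_information(p):.2f} bits")
# P = 0.5  -> 1.00 bit
# P = 0.25 -> 2.00 bits
# P = 0.01 -> 6.64 bits
```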
Measurement of Information (Cont.)
2- Engineering Measure of Information (Cont.):
❑ From the previous equation, it is noted that as the probability of the event (message) increases, the information conveyed decreases.
$$I_i = \log_2 \frac{1}{P_i} \ \text{bits}$$
❑ Hence, the mean, or average, information per message (symbol)
emitted by the source is given by
$$H(m) = \sum_{i=1}^{L} P_i I_i = \sum_{i=1}^{L} P_i \log_2 \frac{1}{P_i} = -\sum_{i=1}^{L} P_i \log_2 P_i \ \text{bits/symbol}$$
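A minimal Python sketch of this entropy formula (the probability list below is an assumed example):

```python
import math

def entropy(probs):
    """Source entropy H(m) = -sum(P_i * log2(P_i)), in bits/symbol.
    Terms with P_i = 0 are skipped (0 * log2 0 -> 0 by convention)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed example distribution over L = 4 messages
print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits/symbol
```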
Average Information per Message (Cont.)
❑ The entropy of a source is a function of the message
probabilities. It is required to find the message probability
distribution that produces the maximum entropy.
❑ Because the entropy is a measure of uncertainty, the probability distribution that generates the maximum uncertainty will have the maximum entropy. Note: $P_1 = P_2 = \cdots = P_L = \frac{1}{L}$, so
$$H(m) = \sum_{i=1}^{L} P_i \log_2 \frac{1}{P_i} = L \cdot \frac{1}{L} \cdot \log_2 L \ \text{bits/symbol}$$
❑ Thus, the maximum entropy can be calculated as
$$H_{\max}(m) = \log_2 L \ \text{bits/symbol}$$
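A quick numerical check of this result (the value L = 8 below is an assumed example):

```python
import math

L = 8                      # assumed number of equiprobable symbols
uniform = [1.0 / L] * L    # P_i = 1/L for every symbol
H = -sum(p * math.log2(p) for p in uniform)
print(H, math.log2(L))     # both print 3.0 (bits/symbol), i.e. H_max = log2 L
```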
Average Information per Message (Cont.)
❑ The entropy $H(m)$ of a DMS is bounded by
$$0 \le H(m) \le \log_2 L$$
❑ If the source emits symbols at a rate of $R_s$ (symbols/second), the average source information rate $R_b$ can be calculated as follows:
$$R_b = R_s \cdot H(m) \ \text{bps}$$
where $R_b$ is measured in bits/second, $R_s$ in symbols/second, and $H(m)$ in bits/symbol.
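For instance (assumed numbers, not from the slides), a source emitting $R_s = 1000$ symbols/second with entropy $H(m) = 1.5$ bits/symbol has an average information rate $R_b = 1000 \times 1.5 = 1500$ bits/second.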
Example (1):
Consider a discrete memoryless source that emits two symbols (or
letters) 𝑥1 and 𝑥2 with probabilities 𝑞 and 1−𝑞, respectively. Find
and sketch the entropy of this source as a function of 𝑞.
Average Information per Message (Cont.)
Example (1) (Cont.):
Hint: This source can be a binary source that emits the symbols 0
and 1 with probabilities 𝑞 and 1−𝑞, respectively.
Solution:
$$H(m) = -\sum_{i=1}^{L} P_i \log_2 P_i$$
$$H(q) = -q \log_2 q - (1-q) \log_2 (1-q)$$
This entropy vanishes when $q = 0$ or $q = 1$ because the outcome is certain; it is maximal at $q = 1/2$, where the uncertainty about the outcome is greatest.
Average Information per Message (Cont.)
Solution (Cont.):
The entropy $H(q)$ is sketched below: the binary entropy curve rises from 0 at $q = 0$ to a maximum of 1 bit at $q = 1/2$ and falls back to 0 at $q = 1$.
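A minimal plotting sketch (assuming numpy and matplotlib are available) that reproduces this curve:

```python
import numpy as np
import matplotlib.pyplot as plt

q = np.linspace(1e-6, 1 - 1e-6, 500)            # avoid log2(0) at the endpoints
H = -q * np.log2(q) - (1 - q) * np.log2(1 - q)  # binary entropy H(q)

plt.plot(q, H)
plt.xlabel("q")
plt.ylabel("H(q)  [bits/symbol]")
plt.title("Binary entropy function")
plt.grid(True)
plt.show()
```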
Average Information per Message (Cont.)
Extended DMS:
❑ For a DMS, a combined (block) symbol can be used instead of individual symbols. Each block symbol consists of $k$ successive source symbols. Thus, we have an extended source of order $k$ with $L^k$ block symbols, where $L$ is the number of symbols of the original DMS.
❑ The entropy of the extended DMS is thus equal to $k$ times the entropy of the original (non-extended) source: $H(m^k) = k\,H(m)$.
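A small sketch of this property (the three-symbol distribution below is an assumed example; for a memoryless source the probability of a block symbol is the product of the probabilities of its $k$ constituent symbols):

```python
import math
from itertools import product

def entropy(probs):
    """H = -sum(p * log2 p), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed example DMS with L = 3 symbols
P = [0.7, 0.2, 0.1]
k = 2

# Probabilities of the L**k block symbols of the k-th extension
P_ext = [math.prod(block) for block in product(P, repeat=k)]

print(entropy(P))      # ~1.157 bits/symbol
print(entropy(P_ext))  # ~2.313 bits/block symbol = k * H(m)
```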
Average Information per Message (Cont.)
Example (2):
A DMS generates one of the three symbols $m_1$, $m_2$, and $m_3$ with probabilities $P(m_1) = 0.5$, $P(m_2) = 0.25$, and $P(m_3) = 0.25$. Determine the entropy $H(m^k)$ for $k = 1$ and $k = 2$ (second extension). Comment on the results.
Solution:
▪ The source entropy can be calculated as
$$H(m) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{4}\log_2\frac{1}{4} - \frac{1}{4}\log_2\frac{1}{4} = \frac{3}{2} \ \text{bits/symbol}$$
▪ Since the source has three distinct symbols, the second-order extension of the source has nine block symbols with the following probabilities:
Average Information per Message (Cont.)
Solution (Cont.):
$$P(m_1 m_1) = \frac{1}{4}$$
$$P(m_2 m_2) = P(m_2 m_3) = P(m_3 m_2) = P(m_3 m_3) = \frac{1}{16}$$
$$P(m_1 m_2) = P(m_2 m_1) = P(m_1 m_3) = P(m_3 m_1) = \frac{1}{8}$$
Accordingly, the entropy of the extended source is calculated as
$$H(m^2) = -\frac{1}{4}\log_2\frac{1}{4} + 4\left(-\frac{1}{16}\log_2\frac{1}{16}\right) + 4\left(-\frac{1}{8}\log_2\frac{1}{8}\right) = 3 \ \text{bits/block symbol}$$
Verify: $H(m^k) = k\,H(m) \;\Rightarrow\; H(m^2) = 2\,H(m) = 2 \times \frac{3}{2} = 3$ bits/block symbol.
Source Coding Theory
Coding for DMS
Kinds of source coding:
❑ There are two types of source coding:
1. Fixed-length codewords.
2. Variable-length codewords.
1- Fixed-length codewords:
❑ This is the simplest method: each symbol of a discrete source is encoded into a block of bits, where each block consists of the same number of bits, $n$.
❑ Thus, a block of $n$ bits can form $2^n$ different blocks. Assuming the number of symbols in the source alphabet is $L$ and $L \le 2^n$, a different binary $n$-tuple may be assigned to each symbol.
Coding for DMS (Cont.)
1- Fixed-length codewords (Cont.):
❑ Assuming the decoder in the receiver knows the beginning of the encoded sequence, it can segment the received bits into $n$-bit blocks and then decode each block into the corresponding source symbol.
❑ The encoder in the transmitter and the decoder in the receiver must obviously both work with the same look-up table. Accordingly, the fixed codeword length $R$ can be calculated as
$$R = \lceil \log_2 L \rceil \ \text{bits/symbol}$$
where $\lceil x \rceil$ denotes the least integer greater than or equal to $x$, i.e., ceil($x$).
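A minimal sketch of such a fixed-length assignment (the 5-symbol alphabet below is an assumed example):

```python
import math

def fixed_length_code(symbols):
    """Assign each symbol a distinct n-bit codeword, with n = ceil(log2(L))."""
    n = math.ceil(math.log2(len(symbols)))
    return {s: format(i, f"0{n}b") for i, s in enumerate(symbols)}

# Assumed example: L = 5 symbols -> R = ceil(log2 5) = 3 bits/symbol
print(fixed_length_code(["a", "b", "c", "d", "e"]))
# {'a': '000', 'b': '001', 'c': '010', 'd': '011', 'e': '100'}
```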
Coding for DMS (Cont.)
2- Variable-length codewords:
❑ For a specific source code, the average codeword length $\bar{R}$ is
$$\bar{R} = \sum_{i=1}^{L} P_i n_i \ \text{bits/symbol}$$
where $n_i$ is the length (in bits) of the codeword assigned to the symbol with probability $P_i$.
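A short sketch of this average-length calculation (the probabilities and codeword lengths below are an assumed example, corresponding to the codewords 0, 10, 110, 111):

```python
# Assumed example: symbol probabilities P_i and codeword lengths n_i
probs   = [0.5, 0.25, 0.125, 0.125]   # P_i
lengths = [1, 2, 3, 3]                # n_i for codewords 0, 10, 110, 111

R_bar = sum(p * n for p, n in zip(probs, lengths))
print(R_bar)  # 1.75 bits/symbol, which equals H(m) for this source
```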
Coding for DMS (Cont.)
Example (3):
Find the code efficiency for the fixed-length coder assuming the
following DMSs:
a) 4 equiprobable symbols.
b) 5 equiprobable symbols.
c) 4 symbols with probabilities 0.5, 0.25, 0.125, 0.125.
d) Comment on the results.
Solution:
a) $H_{\max}(m) = \log_2 4 = 2$ bits/symbol
$R = \lceil \log_2 4 \rceil = 2$ bits/symbol
$\eta = H(m)/R = 2/2 = 1$
b) $H_{\max}(m) = \log_2 5 \approx 2.32$ bits/symbol
$R = \lceil \log_2 5 \rceil = 3$ bits/symbol
$\eta = H(m)/R \approx 2.32/3 \approx 0.77$
c) $H(m) = -0.5\log_2 0.5 - 0.25\log_2 0.25 - 2(0.125\log_2 0.125) = 1.75$ bits/symbol
$R = \lceil \log_2 4 \rceil = 2$ bits/symbol
$\eta = H(m)/R = 1.75/2 = 0.875$
d) A fixed-length code achieves 100% efficiency only when the symbols are equiprobable and the number of symbols is a power of 2.
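A brief sketch that reproduces these efficiencies, using $\eta = H(m)/R$ with $R = \lceil \log_2 L \rceil$ as in the solution above:

```python
import math

def efficiency(probs):
    """Code efficiency eta = H(m) / R for a fixed-length code, R = ceil(log2 L)."""
    H = -sum(p * math.log2(p) for p in probs if p > 0)
    R = math.ceil(math.log2(len(probs)))
    return H / R

print(efficiency([0.25] * 4))                  # a) 1.0
print(efficiency([0.2] * 5))                   # b) ~0.774
print(efficiency([0.5, 0.25, 0.125, 0.125]))   # c) 0.875
```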