
电子科技大学格拉斯哥学院 Glasgow College at UESTC

Information Theory

Lossless Source Coding


WU, Gang (武刚), Ph. D., Professor

National Key Laboratory of Science and Technology on Communications
(通信抗干扰技术国家级重点实验室)
[email protected];
[email protected]
Office: B3-607, Main building
Bitmap versus Vector Graph

2
Bitmap versus Vector Graph

3
Source Encoder and Decoder

• Source Encoder
– In digital communication we convert the signal from the source into a digital signal.
– The point to remember is that we would like to use as few binary digits as possible to represent the signal. Such an efficient representation of the source output contains little or no redundancy. This sequence of binary digits is called the information sequence.
– Source Encoding or Data Compression: the process of efficiently converting the output of either an analog or a digital source into a sequence of binary digits is known as source encoding.
• Source Decoder
– At the receiving end, if an analog signal is desired, the source decoder reconstructs the sequence from its knowledge of the encoding algorithm, which results in an approximate replica of the input at the transmitter end.

4
Typical Image Coding (Lossy)
• JPEG/JPEG2000
– Recommendation T.81, International Telecommunication Union (ITU) Std., Sept. 1992, Joint Photographic Experts Group (JPEG). [Online]. Available: http://www.w3.org/Graphics/JPEG/itu-t81.pdf
– A. Skodras, C. Christopoulos, and T. Ebrahimi, "The JPEG 2000 still image compression standard," IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 36–58, 2001. https://www.ece.uvic.ca/~frodo/software.html
• WebP
– Google Developers, "A new image format for the web," https://developers.google.com/speed/webp/, 2010.
– Google Inc., "VP8 data format and decoding guide," https://datatracker.ietf.org/doc/html/rfc6386, 2011.
• FLIF
– J. Sneyers and P. Wuille, "FLIF: Free Lossless Image Format based on MANIAC compression," in Proc. IEEE International Conference on Image Processing (ICIP), 2016, pp. 66–70.
• PixelCNN
– A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu, "Pixel recurrent neural networks," in Proc. International Conference on Machine Learning (ICML), PMLR, 2016, pp. 1747–1756.
– T. Salimans, A. Karpathy, X. Chen, and D. P. Kingma, "PixelCNN++: Improving the PixelCNN with discretized logistic mixture likelihood and other modifications," arXiv preprint arXiv:1701.05517, 2017.

5
Data Compression in Video Chatting

❑ In video chatting, we do not need to transmit every detail of the bodies and the background in a frame; we only need to transmit the image of the head.
❑ Furthermore, we only need to transmit the offset (motion) vector rather than the details of all the pixels.
6
Video Coding Standardization
• ITU-T (International Telecommunication Union – Telecommunication Standardization Sector)
– VCEG (Video Coding Experts Group)
• H.26x
• ISO/IEC (International Organization for Standardization, International Electrotechnical Commission)
– MPEG (Moving Picture Experts Group)
• MPEG-x
• In Jan. 2013, ISO/IEC and ITU-T jointly issued the new HEVC standard (High Efficiency Video Coding)
– H.265 (MPEG-H Part 2)
– 3GPP Release 12 (LTE-Advanced) includes support for HEVC.

7
The updated specifications bringing support
for HEVC into 3GPP (2014)
Specification #   Title of 3GPP Specification
TS 26.114   IP Multimedia Subsystem (IMS); Multimedia telephony; Media handling and interaction
TS 26.140   Multimedia Messaging Service (MMS); Media formats and codecs
TS 26.141   IP Multimedia System (IMS) Messaging and Presence; Media formats and codecs
TS 26.234   Transparent end-to-end Packet-switched Streaming Service (PSS); Protocols and codecs
TS 26.244   Transparent end-to-end packet switched streaming service (PSS); 3GPP file format (3GP)
TS 26.247   Transparent end-to-end Packet-switched Streaming Service (PSS); Progressive Download and Dynamic Adaptive Streaming over HTTP (3GP-DASH)
TS 26.346   Multimedia Broadcast/Multicast Service (MBMS); Protocols and codecs
TR 26.906*  Evaluation of High Efficiency Video Coding (HEVC) for 3GPP services

*https://www.3gpp.org/ftp/Specs/html-info/26906.htm

8
Some Abbreviations
• DASH:
– Dynamic Adaptive Streaming over HTTP
• HEVC:
– High Efficiency Video Coding
• MBMS:
– Multimedia Broadcast/Multicast Service
• MTSI:
– Multimedia Telephony Services over IMS
• PSS:
– Packet-switched Streaming Service

9
Video Streaming Codec in 5G
• 3GPP started developing functionalities for
multimedia services and applications as part
of the Rel-16 and Rel-17 specifications.
• These include enablers for 5G Media
Streaming and extensions such as
– edge processing, analytics and event exposure;
– improvements to LTE-based 5G Broadcast and
hybrid services;
– 5G Multicast Broadcast Services (MBS);
– eXtended Reality (XR) and Augmented Reality (AR)
experiences.

10
Main Content

1 Fundamentals of Source Coding

2 Lossless Source Coding Theorems

3 Classic Source Coding Schemes

11
Communication System

[Block diagram: Information Source → Source Encoder → Encryptor → Channel Encoder → Modulator → Channel → Demodulator → Channel Decoder → Decryptor → Source Decoder → Information Destination; the upper chain forms the Transmitter and the lower chain the Receiver.]

12
Fundamental Concepts

❑ Source coding: reduce the redundancy so that more information is transmitted at a lower rate, improving the transmission efficiency.
❑ Channel coding: increase the redundancy (at the cost of a higher transmitted bit rate or a wider bandwidth) to improve the transmission reliability.
❑ Cryptography: encryption and decryption aim at increasing and reducing the entropy, respectively, to improve the transmission security.
❑ Source coding theorems: the lossless source coding theorem (discrete info source) and the lossy source coding theorem (continuous info source).
❑ Lossless source coding: every symbol generated by the info source is mapped onto a specific output codeword of the encoder.
❑ Lossy source coding: as long as the communication requirement is satisfied, the source coding allows some distortion (rate-distortion theory).

13
Definition of Source Coding
❑ Source Coding: representing the original messages of the info source
by the code sequences, which are more suitable to be transmitted on a
specific medium. The source coding is completed by the encoder.
❑ The random message 𝑿 generated by the info source consists of 𝐿 discrete symbols and is expressed as
𝑿 = 𝑋1 𝑋2 ⋯ 𝑋𝑙 ⋯ 𝑋𝐿 , where 𝑋𝑙 ∈ {𝑎1 , 𝑎2 , ⋯ , 𝑎𝑖 , ⋯ , 𝑎𝑛 }.
Source coding transforms the original random symbol sequence 𝑿 into the following code sequence 𝒀:
𝒀 = 𝑌1 𝑌2 ⋯ 𝑌𝑘 ⋯ 𝑌𝐾 , where 𝑌𝑘 ∈ {𝑏1 , 𝑏2 , ⋯ , 𝑏𝑗 , ⋯ , 𝑏𝑚 }.

❑ Lossless coding: every message of the info source is encoded into a specific code sequence (codeword), and every codeword can be decoded into only one specific message.
14
Source Coding:Example of Weather

• Implementation: Encoder (mapping)
• Mathematical Model
[Diagram: Source {s1 (sunny), s2 (cloudy), s3 (rainy), s4 (snowy)} → Source Encoder → Channel. After encoding, S = {00, 01, 10, 11}; the binary code-symbol set is {0, 1}; the codebook is the set of codewords. Is this code optimum in terms of efficiency and unique decodability?]
• Essence:
– A transformation of the original symbols of the source according to certain mathematical rules
– Single source symbol → code symbol
– Source sequence → code sequence

15
Fixed Length Coding Theorem
❑ The entropy rate of a stationary memoryless info source 𝑿 is 𝐻(𝑋).
Fixed-length coding: 𝑿 = (𝑋1 𝑋2 ⋯ 𝑋𝑙 ⋯ 𝑋𝐿 ), 𝑋𝑙 ∈ {𝑎1 , 𝑎2 , ⋯ , 𝑎𝑖 , ⋯ , 𝑎𝑛 } → 𝒀 = (𝑌1 𝑌2 ⋯ 𝑌𝑘 ⋯ 𝑌𝐾 ), 𝑌𝑘 ∈ {𝑏1 , 𝑏2 , ⋯ , 𝑏𝑗 , ⋯ , 𝑏𝑚 }
❑ Fixed-length coding theorem: for arbitrary 𝜀 > 0 and 𝛿 > 0, if the information rate (bit/symbol) satisfies
𝑅 = (𝐾/𝐿) log2 𝑚 ≥ 𝐻(𝑋) + 𝜀   (Positive Theorem)
then, when 𝐿 is large enough, the decoding error probability can be made lower than 𝛿. Otherwise, if
𝑅 = (𝐾/𝐿) log2 𝑚 ≤ 𝐻(𝑋) − 2𝜀   (Negative Theorem)
then, when 𝐿 is large, the decoding error cannot be avoided.
16
See p. 43 in the Textbook

17
Coding Efficiency and Error Rate
❑ We have a stationary memoryless info source 𝑿, which sequentially sends symbols obeying the following probability distribution:
𝑋:    𝑎1   𝑎2   𝑎3   𝑎4
𝑝(𝑋): 1/2  1/4  1/8  1/8
Coding efficiency: 𝜂 = 𝐻(𝑋) / ((𝐾/𝐿) log2 𝑚).  What is the decoding error rate?
Please calculate the coding efficiency and the decoding error rate for the following cases (a worked sketch follows below):
(1) Encoding the one-symbol messages sent by 𝑿 with a binary code having fixed length 2.
(2) Encoding the two-symbol messages sent by 𝑿 with a binary code having fixed length 3.
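The following is a minimal brute-force sketch of this exercise (Python 3.8+; the helper name fixed_length_case is illustrative, not from the slides). It computes H(X), the rate R = (K/L)·log2 m, the efficiency H(X)/R, and the decoding error rate left after the 2^K available codewords are assigned to the most probable L-symbol messages.

```python
from itertools import product
from math import log2, prod

p = {"a1": 1/2, "a2": 1/4, "a3": 1/8, "a4": 1/8}   # symbol probabilities
H = -sum(q * log2(q) for q in p.values())          # entropy H(X) in bit/symbol

def fixed_length_case(L, K, m=2):
    """L source symbols per message, K m-ary code symbols per codeword."""
    R = (K / L) * log2(m)                          # information rate, bit/symbol
    efficiency = H / R
    # Only m**K codewords exist: assign them to the most probable L-symbol
    # messages; every other message causes a decoding error.
    probs = sorted((prod(p[s] for s in seq) for seq in product(p, repeat=L)),
                   reverse=True)
    p_error = max(0.0, 1.0 - sum(probs[:m ** K]))
    return efficiency, p_error

print("H(X) = %.3f bit/symbol" % H)
print("case (1): efficiency = %.3f, Pe = %.4f" % fixed_length_case(L=1, K=2))
print("case (2): efficiency = %.3f, Pe = %.4f" % fixed_length_case(L=2, K=3))
```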

18
Ideal Encoder
❑ Fixed-length coding theorem (positive): when the information rate is slightly higher than the entropy, we may realise error-free decoding. The length 𝐿 of the message generated by the info source has to satisfy
𝐿 ≥ Var[𝐼(𝑎𝑖 )] / (𝜀² 𝛿).
The decoding error rate is then lower than an arbitrary positive number 𝛿.
❑ Fixed-length coding theorem:
❑ when the information rate 𝑅 is higher than the single-symbol entropy 𝐻(𝑋) by 𝜀, the decoding error rate does not exceed 𝛿;
❑ if 𝑅 is lower than 𝐻(𝑋) by 2𝜀, the decoding error rate must be higher than 𝛿.
❑ When we encode an original message of infinite length (𝐿 → ∞), an ideal encoder whose coding efficiency tends to unity exists, namely
lim_{𝐿→∞} 𝐻(𝑋) / ((𝐾/𝐿) log2 𝑚) = 1,
which is impossible to realise in practice.
19
Discussion on the Ideal Encoder
❑ We have the following single-symbol info source:
𝑋:    𝑎1   𝑎2    𝑎3   𝑎4   𝑎5    𝑎6    𝑎7    𝑎8
𝑝(𝑋): 0.4  0.18  0.1  0.1  0.07  0.06  0.05  0.04
Please encode this info source with a fixed-length code. The required coding efficiency is 90% and the decoding error rate is 10⁻⁶.
➢ Calculate the length of the symbol sequence that needs to be encoded together (a sketch of this calculation follows below).
➢ If we use a binary fixed-length code to encode every single symbol generated by the info source, find the coding efficiency.
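A minimal sketch (Python) of the first calculation, assuming the rate is set to R = H(X) + ε so that the required efficiency η = H(X)/(H(X) + ε) fixes ε; the bound L ≥ Var[I(a_i)]/(ε²δ) from the previous slide then gives the block length.

```python
from math import log2

p = [0.4, 0.18, 0.1, 0.1, 0.07, 0.06, 0.05, 0.04]      # p(a1)..p(a8)
I = [-log2(q) for q in p]                              # self-information of each symbol
H = sum(q * i for q, i in zip(p, I))                   # entropy H(X)
var_I = sum(q * (i - H) ** 2 for q, i in zip(p, I))    # Var[I(a_i)]

eta, delta = 0.90, 1e-6
eps = H * (1 - eta) / eta             # from eta = H / (H + eps)
L_min = var_I / (eps ** 2 * delta)    # fixed-length coding theorem bound on L

print("H(X)   = %.4f bit/symbol" % H)
print("Var[I] = %.4f" % var_I)
print("L must be at least %.3e symbols" % L_min)

# For comparison, encoding single symbols with a binary fixed-length code
# needs 3 bits (8 symbols), giving efficiency H(X)/3.
print("single-symbol efficiency = %.3f" % (H / 3))
```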

20
How an ideal encoder operates?
❑ An ideal encoder operates by obeying the following steps:
(1) Given 𝜀 and 𝛿, calculate the length 𝐿 of the symbol sequence generated by the info source that needs to be encoded together, using 𝐿 ≥ Var[𝐼(𝑎𝑖 )] / (𝜀² 𝛿).

(2) Given 𝐿, calculate the probabilities of all the possible symbol sequences of length 𝐿.

(3) Sort all the possible symbol sequences of length 𝐿 in descending order of their probabilities.

(4) Encode the sorted symbol sequences one by one; the encoded sequences constitute the set 𝐴𝜀 . The encoding process continues until 𝑝(𝐴𝜀 ) ≥ 1 − 𝛿. (A sketch of steps (2)–(4) follows below.)
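A minimal sketch (Python 3.8+; the function name build_A_eps is illustrative) of steps (2)–(4): enumerate all L-symbol sequences, sort them by probability, and collect them into A_ε until their total probability reaches 1 − δ; the size of A_ε then dictates the fixed codeword length K = ⌈log2 |A_ε|⌉.

```python
from itertools import product
from math import ceil, log2, prod

def build_A_eps(p, L, delta):
    """Collect the most probable L-symbol sequences until their total probability >= 1 - delta."""
    seqs = sorted(product(range(len(p)), repeat=L),
                  key=lambda seq: prod(p[s] for s in seq), reverse=True)
    A_eps, total = [], 0.0
    for seq in seqs:
        A_eps.append(seq)
        total += prod(p[s] for s in seq)
        if total >= 1 - delta:
            break
    return A_eps, total

# toy usage: the four-symbol source used earlier, blocks of L = 4, delta = 0.05
p = [1/2, 1/4, 1/8, 1/8]
A_eps, covered = build_A_eps(p, L=4, delta=0.05)
K = ceil(log2(len(A_eps)))     # fixed codeword length needed for the encoded set
print(len(A_eps), "sequences kept, covering %.3f of the probability" % covered)
print("K =", K, "bits for L = 4 symbols ->", K / 4, "bit/symbol")
```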

21
What is a code?
• Definition:
– A code is a mapping from a discrete set of symbols {0, · · · , M − 1} to finite binary sequences
• For each symbol m there is a corresponding finite binary sequence σm
• |σm| is the length of the binary sequence

• Expected number of bits per symbol (bit rate): Σm p(m)·|σm|

– Example for M = 4
• Message from {0, 1, 2, 3}
• Encoded bit stream: (0, 2, 1, 3, 2) → (01|0|10|100100|0)
– Fixed-Length Code: |σm| is constant for all m
– Variable-Length Code: |σm| varies with m
22
Review on Source Coding

• Implementation: Encoder (mapping)
• Mathematical Model
[Diagram: Source {s1 (sunny), s2 (cloudy), s3 (rainy), s4 (snowy)} → Source Encoder → Channel. After encoding, S = {00, 01, 10, 11}; the binary code-symbol set is {0, 1}; the codebook is the set of codewords. Is this code optimum in terms of efficiency and unique decodability?]
• Essence:
– A transformation of the original symbols of the source according to certain mathematical rules
– Single source symbol → code symbol
– Source sequence → code sequence

23
Unique Decodability
• Definition:
– For every string of source letters {𝑥1 , 𝑥2 , … , 𝑥𝑛 }, the encoded output C(𝑥1 )C(𝑥2 ) ⋯ C(𝑥𝑛 ) must be distinct, i.e., it must differ from C(𝑥1′ )C(𝑥2′ ) ⋯ C(𝑥𝑚′ ) for any other source string {𝑥1′ , 𝑥2′ , … , 𝑥𝑚′ }.

• Decoding error:
– If C(𝑥1 )C(𝑥2 ) ⋯ C(𝑥𝑛 ) = C(𝑥1′ )C(𝑥2′ ) ⋯ C(𝑥𝑚′ ), the decoder must fail on one of these inputs.

• Example (see the sketch below):
– Consider a → 0, b → 01, c → 10
– Then ac → 010 and ba → 010
– Not uniquely decodable.
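A tiny sketch (Python) that finds the collision above by brute force: it encodes every source string up to a given length and reports two distinct strings that map to the same bit string, if any exist.

```python
from itertools import product

C = {"a": "0", "b": "01", "c": "10"}    # the code from the example above

def find_collision(code, max_len=3):
    """Return two different source strings whose encodings coincide, if any."""
    seen = {}
    for n in range(1, max_len + 1):
        for word in product(code, repeat=n):
            bits = "".join(code[s] for s in word)
            if bits in seen and seen[bits] != word:
                return seen[bits], word, bits
            seen.setdefault(bits, word)
    return None

print(find_collision(C))    # e.g. (('a', 'c'), ('b', 'a'), '010')
```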

24
Example of Source coding
• Fixed length coding versus variable length coding

Source   Prob. of          Codebook for source coding
symbol   source symbol     Code 1   Code 2   Code 3   Code 4   Code 5
s1       p(s1) = 1/2       00       00       0        1        1
s2       p(s2) = 1/4       01       11       01       10       01
s3       p(s3) = 1/8       10       00       001      100      001
s4       p(s4) = 1/8       11       11       0001     1000     0001

• Is each code sequence decodable uniquely?


• Which has highest efficiency?
– the average number of coded bits per source symbol
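A short sketch (Python) that answers the two questions numerically: it computes the average codeword length of each codebook and its Kraft sum (a necessary condition for a prefix code is a sum ≤ 1), and flags codebooks with repeated codewords, which can never be uniquely decodable.

```python
from fractions import Fraction as F

p = [F(1, 2), F(1, 4), F(1, 8), F(1, 8)]               # p(s1)..p(s4)
codebooks = {
    "Code 1": ["00", "01", "10", "11"],
    "Code 2": ["00", "11", "00", "11"],
    "Code 3": ["0", "01", "001", "0001"],
    "Code 4": ["1", "10", "100", "1000"],
    "Code 5": ["1", "01", "001", "0001"],
}

for name, cw in codebooks.items():
    avg_len = sum(q * len(c) for q, c in zip(p, cw))   # average bits per source symbol
    kraft = sum(F(1, 2 ** len(c)) for c in cw)         # Kraft sum
    distinct = len(set(cw)) == len(cw)                 # repeated codewords are never decodable
    print(f"{name}: average length = {avg_len}, Kraft sum = {kraft}, distinct codewords = {distinct}")
```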

25
Review on Fixed-length Coding Theorem

• What is the coding efficiency for source coding?


Fixed-length coding: 𝑿 = (𝑋1 𝑋2 ⋯ 𝑋𝑙 ⋯ 𝑋𝐿 ), 𝑋𝑙 ∈ {𝑎1 , 𝑎2 , ⋯ , 𝑎𝑛 } → 𝒀 = (𝑌1 𝑌2 ⋯ 𝑌𝑘 ⋯ 𝑌𝐾 ), 𝑌𝑘 ∈ {𝑏1 , 𝑏2 , ⋯ , 𝑏𝑚 }

Coding efficiency: 𝜂 = 𝐻(𝑋) / ((𝐾/𝐿) log2 𝑚). What is the meaning of the denominator, (𝐾/𝐿) log2 𝑚?

❑ We have the following single-symbol info source:
𝑋:    𝑎1   𝑎2    𝑎3   𝑎4   𝑎5    𝑎6    𝑎7    𝑎8
𝑝(𝑋): 0.4  0.18  0.1  0.1  0.07  0.06  0.05  0.04
Please encode this info source by using the fixed-length code.

26
Fixed Length Coding Theorem
❑ The entropy rate of a stationary memoryless info source 𝑿 is 𝐻(𝑋).
Fixed-length coding: 𝑿 = (𝑋1 𝑋2 ⋯ 𝑋𝑙 ⋯ 𝑋𝐿 ), 𝑋𝑙 ∈ {𝑎1 , 𝑎2 , ⋯ , 𝑎𝑖 , ⋯ , 𝑎𝑛 } → 𝒀 = (𝑌1 𝑌2 ⋯ 𝑌𝑘 ⋯ 𝑌𝐾 ), 𝑌𝑘 ∈ {𝑏1 , 𝑏2 , ⋯ , 𝑏𝑗 , ⋯ , 𝑏𝑚 }
❑ Fixed-length coding theorem: for arbitrary 𝜀 > 0 and 𝛿 > 0, if the information rate (bit/symbol) satisfies
𝑅 = (𝐾/𝐿) log2 𝑚 ≥ 𝐻(𝑋) + 𝜀   (Positive Theorem)
then, when 𝐿 is large enough, the decoding error probability can be made lower than 𝛿. Otherwise, if
𝑅 = (𝐾/𝐿) log2 𝑚 ≤ 𝐻(𝑋) − 2𝜀   (Negative Theorem)
then, when 𝐿 is large, the decoding error cannot be avoided.
27
How to do fixed-length encoding?
• Segment source symbols into n-tuples.
• Map each n-tuple into a binary L-tuple, where 2^L ≥ M^n (M is the size of the source alphabet), i.e. L ≥ n log2 M.
• Let L̄ = L/n be the number of bits per source symbol (a small sketch follows below).
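A small sketch (Python) of the relation filled in above: the smallest binary block length L for a given source alphabet size M and tuple length n, and the resulting number of bits per source symbol (the values of M and n below are only illustrative).

```python
from math import ceil, log2

def bits_per_symbol(M, n):
    """Smallest L with 2**L >= M**n, and the per-symbol rate L/n."""
    L = ceil(n * log2(M))
    return L, L / n

for n in (1, 2, 3, 10):
    print("n =", n, "->", bits_per_symbol(M=5, n=n))   # e.g. a 5-symbol source
```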

28
Coding efficiency
❑ We have the following single-symbol info source:
𝑋:    𝑎1   𝑎2    𝑎3   𝑎4   𝑎5    𝑎6    𝑎7    𝑎8
𝑝(𝑋): 0.4  0.18  0.1  0.1  0.07  0.06  0.05  0.04
Please encode this info source by using the fixed-length code.
• Fixed-length codes
– 3 bits for each symbol (L = 3)
– H(X) = 2.55 bit/symbol
– 𝜂 = H(X)/L = 85%
– Could we enhance the coding efficiency?
• Solution
– Variable-length source codes
29
Variable-length source codes
• Motivation
– Probable symbols should have shorter codewords than improbable ones, to reduce the number of bits per source symbol.
• Definition
– A variable-length source code C encodes each symbol x in the source alphabet X to a binary codeword C(x) of length l(x).
• For example, for X = {a, b, c} (see the sketch below)
– C(a) = 0
– C(b) = 10
– C(c) = 11
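A minimal sketch (Python) of an encoder and a decoder for this example code; because no codeword is a prefix of another, the decoder can emit a symbol as soon as the buffered bits match a codeword.

```python
C = {"a": "0", "b": "10", "c": "11"}            # the prefix code above
inv = {bits: sym for sym, bits in C.items()}    # codeword -> symbol

def encode(symbols):
    return "".join(C[s] for s in symbols)

def decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inv:           # a complete codeword has been read
            out.append(inv[buf])
            buf = ""
    assert buf == "", "bit string ended in the middle of a codeword"
    return "".join(out)

msg = "abacab"
print(encode(msg))                 # '010011010'
print(decode(encode(msg)) == msg)  # True
```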
30
Decoder for Variable-length source codes

• Prefix-free codes are always uniquely decodable and can be decoded instantaneously; some non-prefix-free codes are also uniquely decodable, but the decoder may need to look ahead.
• Example
– Consider a → 0, b → 01, c → 11
– Then
• accc → 0111111 (a 0 followed by six 1s)
• bccc → 01111111 (a 0 followed by seven 1s)
– This code can be shown to be uniquely decodable: the parity of each run of 1s following a 0 tells the decoder whether that 0 was an a or the start of a b.
31
Example: Not uniquely decodable

32
Varied Length Coding Theorem
❑ For a memoryless info source 𝑋 having an entropy of 𝐻(𝑋), when encoding its symbols with an 𝑚-ary variable-length code, a lossless encoding scheme exists whose average codeword length K̄ satisfies
𝐻(𝑋)/log2 𝑚 ≤ K̄ < 1 + 𝐻(𝑋)/log2 𝑚.
The average information rate of the encoder satisfies
𝐻(𝑋) ≤ R̄ = (K̄/𝐿) log2 𝑚 < 𝐻(𝑋) + 𝜀,
where 𝐿 = 1 and 𝜀 is an arbitrary positive number.
❑ When the info source sends a symbol sequence of length 𝐿, the average code length satisfies
𝐿𝐻(𝑋)/log2 𝑚 ≤ K̄ < 1 + 𝐿𝐻(𝑋)/log2 𝑚.
33
Proof

34
Proof (cont.)

35
Coding Efficiency
❑ The length 𝐿 of the symbol sequence to be encoded together by a variable-length encoder is far lower than for a fixed-length counterpart. The coding efficiency is lower-bounded by
𝜂 = 𝐻(𝑋)/R̄ > 𝐻(𝑋) / (𝐻(𝑋) + (log2 𝑚)/𝐿).
❑ Encode the following single-symbol source by using a binary variable-length code:
𝑋:    𝑎1   𝑎2    𝑎3   𝑎4   𝑎5    𝑎6    𝑎7    𝑎8
𝑝(𝑋): 0.4  0.18  0.1  0.1  0.07  0.06  0.05  0.04
When the coding efficiency is required to be higher than 90%, calculate the length of the symbol sequences that need to be processed together (a sketch follows below).
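A minimal sketch (Python) for this exercise, assuming the lower bound η > H(X)/(H(X) + (log2 m)/L) from this slide: it solves for the smallest block length L that guarantees the required efficiency with a binary code (m = 2).

```python
from math import floor, log2

p = [0.4, 0.18, 0.1, 0.1, 0.07, 0.06, 0.05, 0.04]
H = -sum(q * log2(q) for q in p)     # entropy of the single-symbol source

def min_block_length(eta, m=2):
    """Smallest integer L with H / (H + log2(m)/L) > eta."""
    # rearranged: L > eta * log2(m) / (H * (1 - eta))
    return floor(eta * log2(m) / (H * (1 - eta))) + 1

print("H(X) = %.3f bit/symbol" % H)
print("L must be at least", min_block_length(0.9))
```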
36
Pros and Cons of Varied-Length Code
❑ Varied-length coding is capable of compressing the information.
❑ Basic principle of varied-length coding: the messages having high sending probabilities are encoded into short codewords, while the messages having low sending probabilities are encoded into long codewords. Therefore, the average codeword length can be minimised in order to increase the communication efficiency.
❑ Varied-length coding increases the complexity of decoding:
❑ The decoder should correctly identify the beginning of the codewords having different lengths (synchronous decoding); since the codewords have various lengths, when a specific symbol is received, the decoder does not always know whether it is the end of a codeword.
❑ It may have to wait for the following symbols before it can decode correctly (decoding delay).

37
Examples of Varied Length Coding

Message of     Sending
info source    probability   Code A   Code B   Code C   Code D
𝑎1             0.5           0        0        0        0
𝑎2             0.25          0        1        01       10
𝑎3             0.125         1        00       011      110
𝑎4             0.125         10       11       0111     1110
Decodability:                Undecodable   Undecodable   Uniquely decodable (decoding delay)   Uniquely decodable (no delay)
❑ A code is called a prefix-free code if no codeword is a prefix of any
other codewords. For brevity, a prefix-free code will be referred to as
a prefix code, which is also regarded as an instantaneous code.
38
Tree Representing code words of Code D

39
Tree Graph of Prefix Code

[Figure: two ternary code trees with branches labelled 0, 1, 2 — a full tree (left) and a non-full tree (right)]

❑ Prefix codes can be constructed from a tree graph. The starting point is the root, and every segment between a pair of nodes is a branch.
❑ A full tree constructs a fixed-length code, while a non-full tree constructs a variable-length code.
40
Necessary and Sufficient Conditions
❑ The necessary and sufficient condition for the existence of an 𝑚-ary prefix-free code with codeword lengths 𝑘1 , 𝑘2 , ⋯ , 𝑘𝑛 is
∑_{i=1}^{n} 𝑚^(−𝑘𝑖) ≤ 1.   (Kraft Inequality)
𝑘𝑖 is the length of the i-th codeword.
Proof: prove the necessity and the sufficiency of the inequality; see Theorem 3.21, page 47, Textbook by Prof. Gallager.
❑ Prefix codes are uniquely decodable.
❑ Given an info source with symbols 𝑎1 , ⋯ , 𝑎𝑖 , ⋯ , 𝑎𝑛 and probabilities 𝑝(𝑎1 ), ⋯ , 𝑝(𝑎𝑖 ), ⋯ , 𝑝(𝑎𝑛 ), if the length 𝑘𝑖 of the codeword of message 𝑎𝑖 satisfies
−log2 𝑝(𝑎𝑖 ) / log2 𝑚 ≤ 𝑘𝑖 < 1 − log2 𝑝(𝑎𝑖 ) / log2 𝑚,
then a prefix code with these lengths exists (see the sketch below).
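A short sketch (Python) tying the two statements together: it chooses the lengths k_i = ⌈−log_m p(a_i)⌉ allowed by the condition above and verifies that their Kraft sum does not exceed 1, so a prefix code with those lengths exists.

```python
from math import ceil, log

def shannon_lengths(probs, m=2):
    """Codeword lengths k_i = ceil(-log_m p_i), which satisfy the condition above."""
    return [ceil(-log(q, m)) for q in probs]

def kraft_sum(lengths, m=2):
    return sum(m ** (-k) for k in lengths)

p = [0.4, 0.18, 0.1, 0.1, 0.07, 0.06, 0.05, 0.04]
for m in (2, 3):
    k = shannon_lengths(p, m)
    print(f"m = {m}: lengths = {k}, Kraft sum = {kraft_sum(k, m):.4f} (<= 1)")
```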
41
Proof on Kraft Inequality
• We prove this by associating
codewords with base 2 expansions
i.e., ‘decimals’ in base 2.

42
Proof on Kraft Inequality (2)

43
Proof on Kraft Inequality (3)

44
Proof on Kraft Inequality (4)

45
Proof on Kraft Inequality (5)

46
Quiz

47
The 2nd Way for Proof of Kraft Inequality

48
49
50
Shannon Code
❑ The binary Shannon code has the following coding steps:
(1) Sort the symbols of the info source in descending order of their sending probabilities, e.g. 𝑝(𝑎1 ) ≥ ⋯ ≥ 𝑝(𝑎𝑛 ).

(2) Let 𝑝(𝑎0 ) = 0 and let 𝑃𝑎 (𝑎𝑗 ) = ∑_{i=0}^{j−1} 𝑝(𝑎𝑖 ), 𝑗 = 1, 2, ⋯ , 𝑛, denote the cumulative probability of all symbols before the 𝑗-th symbol.

(3) Choose the length 𝑘𝑗 of the 𝑗-th codeword to satisfy −log2 𝑝(𝑎𝑗 ) ≤ 𝑘𝑗 < 1 − log2 𝑝(𝑎𝑗 ), i.e. 𝑘𝑗 = ⌈−log2 𝑝(𝑎𝑗 )⌉.

(4) Write 𝑃𝑎 (𝑎𝑗 ) in binary form and take the first 𝑘𝑗 digits after the binary point as the codeword of the symbol 𝑎𝑗 . (A sketch follows below.)
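A minimal sketch (Python) of the four steps above: it sorts the probabilities, accumulates P_a(a_j), expands it in binary, and truncates the expansion to k_j = ⌈−log2 p(a_j)⌉ digits.

```python
from math import ceil, log2

def shannon_code(probs):
    probs = sorted(probs, reverse=True)     # step (1): descending probabilities
    codewords = []
    cumulative = 0.0                        # step (2): P_a(a_1) = 0
    for p in probs:
        k = ceil(-log2(p))                  # step (3): codeword length
        # step (4): binary expansion of the cumulative probability, k digits
        frac, bits = cumulative, ""
        for _ in range(k):
            frac *= 2
            bits += str(int(frac))
            frac -= int(frac)
        codewords.append(bits)
        cumulative += p
    return codewords

p = [0.25, 0.25, 0.2, 0.15, 0.1, 0.05]      # the example on the next slide
codes = shannon_code(p)
avg_len = sum(q * len(c) for q, c in zip(p, codes))
H = -sum(q * log2(q) for q in p)
print(codes)                                 # ['00', '01', '100', '101', '1101', '11110']
print("average length = %.2f, efficiency = %.3f" % (avg_len, H / avg_len))
```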
51
Example of Shannon Code

❑ Please encode the following single symbol info source by the binary
Shannon code and calculate its coding efficiency.
𝑋:    𝑎1   𝑎2   𝑎3   𝑎4   𝑎5   𝑎6
𝑃(𝑋): 0.25 0.25 0.2  0.15 0.1  0.05
• Please compute expected length and coding efficiency.
To convert a fraction to binary, repeatedly multiply the fractional part by 2 until it becomes 0 (or enough digits are obtained) and collect the integer parts in order.

Symbol  𝑝(𝑎𝑖)  −log2 𝑝(𝑎𝑖)  𝑘𝑖  Cumulative 𝑃𝑎(𝑎𝑖)  Binary form → codeword
𝑎1      0.25   2             2   0.00               (0.00)2 → 00
𝑎2      0.25   2             2   0.25               (0.01)2 → 01
𝑎3      0.2    2.3219        3   0.50               (0.100)2 → 100
𝑎4      0.15   2.7370        3   0.70               (0.101…)2 → 101
𝑎5      0.1    3.3219        4   0.85               (0.1101…)2 → 1101
𝑎6      0.05   4.3219        5   0.95               (0.11110…)2 → 11110

52
53
Fano Code
❑ The M-ary Fano code has the following coding steps:
(1) Sort the symbols of the info source in descending order of their sending probabilities, e.g. 𝑝(𝑎1 ) ≥ ⋯ ≥ 𝑝(𝑎𝑛 ).

(2) Divide the symbols into M groups whose sum probabilities are as close to each other as possible.

(3) Assign each group a different code element chosen from the M possible values.

(4) Treat each group as a unit and repeat steps (2) and (3) until every group contains a single symbol. (A sketch follows below.)
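A minimal sketch (Python, binary case M = 2) of the recursive procedure above: each call splits the sorted probabilities at the point that best balances the two halves, appends 0 to one half and 1 to the other, and recurses on both halves.

```python
def fano_code(probs):
    """Binary Fano code for probabilities already sorted in descending order."""
    n = len(probs)
    codes = [""] * n

    def split(lo, hi):                       # work on symbols probs[lo:hi]
        if hi - lo <= 1:
            return
        total = sum(probs[lo:hi])
        acc, best_k, best_diff = 0.0, lo + 1, float("inf")
        for k in range(lo + 1, hi):          # choose the split with closest halves
            acc += probs[k - 1]
            diff = abs(2 * acc - total)
            if diff < best_diff:
                best_diff, best_k = diff, k
        for i in range(lo, hi):
            codes[i] += "0" if i < best_k else "1"
        split(lo, best_k)
        split(best_k, hi)

    split(0, n)
    return codes

p = [0.25, 0.25, 0.2, 0.15, 0.1, 0.05]
print(fano_code(p))      # one possible Fano code for the example below
```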
54
Example of Fano Code

❑ Please encode the following single symbol info source by the binary
Fano code and calculate its coding efficiency.
𝑋:    𝑎1   𝑎2   𝑎3   𝑎4   𝑎5   𝑎6
𝑃(𝑋): 0.25 0.25 0.2  0.15 0.1  0.05
Coding efficiency:
❑ We have a binary discrete info source 𝑋, whose probability distribution
is P(𝑋) = [0.7, 0.3], please encode its expanded source of order three
by the binary Fano code and calculate its coding efficiency.
Coding efficiency: xxxx

55
Huffman Code
❑ The binary Huffman code has the following coding steps:
(1) Sort the symbols of the info source in descending order of their sending probabilities, e.g. 𝑝(𝑎1 ) ≥ ⋯ ≥ 𝑝(𝑎𝑛 ).

(2) Assign 0 and 1 to the symbols 𝑎𝑛−1 and 𝑎𝑛 having the lowest probabilities, respectively. Combine 𝑎𝑛−1 and 𝑎𝑛 into a new symbol whose probability is the sum of the probabilities of 𝑎𝑛−1 and 𝑎𝑛 . The new source 𝑆1 has (𝑛 − 1) symbols.

(3) Sort the symbols of the new source 𝑆1 in descending order of their probabilities. Repeat step (2) to obtain a new reduced source 𝑆2 having (𝑛 − 2) symbols.

(4) Repeat the above steps until the info source is reduced to two symbols. Trace back from these two symbols to each original symbol along the combining path to obtain the codewords. (A sketch follows below.)
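A minimal sketch (Python) of binary Huffman coding using a priority queue instead of explicit re-sorting; repeatedly popping and combining the two least probable entries is equivalent to the steps above.

```python
import heapq
from math import log2

def huffman_code(prob_map):
    """Return {symbol: codeword} for a dict of symbol probabilities."""
    # each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(prob_map.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)     # the two least probable groups
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}   # bits closer to the root come first
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

p = {"a1": 0.25, "a2": 0.25, "a3": 0.2, "a4": 0.15, "a5": 0.1, "a6": 0.05}
code = huffman_code(p)
avg = sum(p[s] * len(w) for s, w in code.items())
H = -sum(q * log2(q) for q in p.values())
print(code)
print("average length = %.2f bit/symbol, efficiency = %.3f" % (avg, H / avg))
```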

56
Examples of Huffman Coding (1)
• X={a1, a2, a3, a4, a5},Pr(ai)={0.3,0.25,0.25,0.1,0.1}

57
Examples of Huffman Coding (2)
• Using ternary code symbols, {0, 1, 2}

58
Examples of Huffman Coding (3)

❑ Please encode the following single symbol info source by the binary
Huffman code and calculate its coding efficiency.
𝑋:    𝑎1   𝑎2   𝑎3   𝑎4   𝑎5   𝑎6
𝑃(𝑋): 0.25 0.25 0.2  0.15 0.1  0.05

Coding efficiency: xxxx


❑ A ternary discrete info source 𝑋 has a probability distribution
𝑝 0 = 0.5, 𝑝 1 = 0.3, 𝑝 2 = 0.2 , please encode its expanded
source of order two by the binary Huffman code and calculate its
coding efficiency.
Coding efficiency: xxxx

59
Two methods for Huffman Coding
❑ Method 1: The combined new symbol is put behind the other
symbols having the same probabilities.
❑ Method 2: The combined new symbol is put ahead of the other
symbols having the same probabilities.
❑ Example: encode the following single-symbol source by the binary Huffman code with these two methods and calculate the variance of the codeword length:
𝑋:    𝑎1   𝑎2   𝑎3   𝑎4   𝑎5
𝑃(𝑋): 0.4  0.2  0.2  0.1  0.1
❑ Conclusion: when sorting the symbols of the reduced source, putting the combined new symbol ahead of the other equally probable symbols makes better use of the short codewords and yields a smaller variance of the codeword length.

60
The paper published by Huffman
• D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, 1952.

61
Example in the paper

62
Derived Coding Requirement
• No message shall be coded in such a
way that its code is a prefix of any other
message, or that any of its prefixes are
used elsewhere as a message code.

63
𝑚-ary Huffman Code
❑ If the codewords of an 𝑚-ary code constitute a tree graph, the number of separable codewords is 𝑚 + 𝑘(𝑚 − 1), where 𝑘 is a positive integer.
❑ To minimise the average codeword length, the last reduced source should have exactly 𝑚 symbols when using the 𝑚-ary Huffman code.
❑ Hence, in the first step of assigning code elements to the symbols having the lowest probabilities, the number of symbols combined may be fewer than 𝑚.
❑ If a source has 𝑛 symbols, let 𝑘 be the minimum integer satisfying 𝑚 + 𝑘(𝑚 − 1) ≥ 𝑛; then 𝑠 = 𝑚 + 𝑘(𝑚 − 1) − 𝑛 codeword slots are left unused, and we combine only (𝑚 − 𝑠) symbols in the first step (see the sketch below).
❑ Encode the source
𝑋:    𝑎1   𝑎2    𝑎3   𝑎4   𝑎5    𝑎6    𝑎7    𝑎8
𝑝(𝑋): 0.4  0.18  0.1  0.1  0.07  0.06  0.05  0.04
by the ternary Huffman code and calculate its coding efficiency.
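A small sketch (Python) of the rule above: for a source alphabet size n and code alphabet size m it computes s and the number (m − s) of symbols to combine in the first merging step.

```python
def first_merge_size(n, m):
    """How many symbols to combine in the first step of m-ary Huffman coding."""
    k = 0
    while m + k * (m - 1) < n:      # smallest k with m + k(m-1) >= n
        k += 1
    s = m + k * (m - 1) - n         # unused codeword slots
    return m - s

print(first_merge_size(n=8, m=3))   # ternary code for the 8-symbol source above
print(first_merge_size(n=5, m=3))
```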
64
𝑚-ary Huffman Code (2)
• It will be noted that the terminating auxiliary ensemble always has one unity-probability message.
• Each preceding ensemble is increased in number by m − 1 until the first auxiliary ensemble is reached.
• Therefore, if N1 is the number of messages in the first auxiliary ensemble, then (N1 − 1)/(m − 1) must be an integer.
• However, N1 = N − n0 + 1, where n0 is the number of the least probable messages combined in a bracket in the original ensemble. Therefore, n0 (which, of course, is at least two and no more than m) must be of such a value that (N − n0)/(m − 1) is an integer.
65
Comparison

❑ The Shannon code, the Fano code and the Huffman code all exploit the statistics of the info source: the symbols frequently sent are encoded with short codewords, while those infrequently sent are encoded with long codewords, in order to reduce the average codeword length.
❑ The Shannon code has a unique coding scheme, but its coding efficiency is not very high. The Fano code and the Huffman code both admit multiple coding schemes.
❑ The Fano code is suitable for info sources whose groups have nearly equal probabilities after each division.
❑ The Huffman code does not have any specific requirement on the info source. It has a high coding efficiency and the complexity of its encoder is low.

66
Optimality of Huffman Codes

67
Assignment

68
Assignment(cont.)

69
Assignment(cont.)

70
Project
Please encode the 26 English letters and the ‘space’ in the English
novel – Game of Thrones by using an arbitrary source coding
technique.
Requirements:
(1) Please choose a source coding technique from Shannon coding,
Fano coding, Huffman coding or any other source coding technique.
(2) Please use at least the first two chapters “Prologue” and “Bran”
as the information source.
(3) Please freely choose a programming platform to complete this project, such as C++, Java, Python, Matlab, etc.
(4) Please freely make a two-person group to complete this project.
(5) Please submit a compressed package named after the name of
the group members, which includes the executable file, the source
file and the English REPORT.
(6) Both Class 1 and Class 2, please submit the packages to me and, at the same time, to my two teaching assistants.
(7) Deadline: November 30, 2023

71
