06 Source Coding WuG 2023 10 10 17 24
Information Theory
2
Bitmap versus Vector Graphics
3
Source Encoder and Decoder
• Source Encoder
– In digital communication we convert the signal from the source into a digital signal.
– The point to remember is that we would like to use as few binary digits as possible
to represent the signal, so that this efficient representation of the source
output contains little or no redundancy. This sequence of binary digits is
called the information sequence.
– Source Encoding or Data Compression: the process of efficiently converting
the output of either an analog or a digital source into a sequence of binary digits is
known as source encoding.
• Source Decoder
– At the receiving end, if an analog signal is desired, the source decoder decodes
the binary sequence using its knowledge of the encoding algorithm, which yields
an approximate replica of the input at the transmitter end.
4
Typical Image Coding (Lossy)
• JPEG / JPEG 2000
– Recommendation T.81, International Telecommunication Union (ITU) Std., Sept. 1992,
Joint Photographic Experts Group (JPEG). [Online]. Available:
http://www.w3.org/Graphics/JPEG/itu-t81.pdf
– Athanassios Skodras, Charilaos Christopoulos, and Touradj Ebrahimi, "The JPEG 2000
still image compression standard," IEEE Signal Processing Magazine, vol. 18, no. 5,
pp. 36–58, 2001. https://www.ece.uvic.ca/~frodo/software.html
• WebP
– Google Developers, "A new image format for the web,"
https://developers.google.com/speed/webp/, 2010.
– Google Inc., "VP8 data format and decoding guide,"
https://datatracker.ietf.org/doc/html/rfc6386, 2011.
• FLIF
– Jon Sneyers and Pieter Wuille, "FLIF: Free lossless image format based on MANIAC
compression," in 2016 IEEE International Conference on Image Processing (ICIP).
IEEE, 2016, pp. 66–70.
• PixelCNN
– Aäron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu, "Pixel recurrent
neural networks," in International Conference on Machine Learning. PMLR, 2016,
pp. 1747–1756.
– Tim Salimans, Andrej Karpathy, Xi Chen, and Diederik P. Kingma, "PixelCNN++:
Improving the PixelCNN with discretized logistic mixture likelihood and other
modifications," arXiv preprint arXiv:1701.05517, 2017.
5
Data Compression in Video Chatting
7
The updated specifications bringing support
for HEVC into 3GPP (2014)
Specification #   Title of 3GPP Specification
TS 26.114         IP Multimedia Subsystem (IMS); Multimedia telephony; Media handling and interaction
TS 26.141         IP Multimedia System (IMS) Messaging and Presence; Media formats and codecs
TS 26.234         Transparent end-to-end Packet-switched Streaming Service (PSS); Protocols and codecs
TS 26.244         Transparent end-to-end packet switched streaming service (PSS); 3GPP file format (3GP)
TS 26.247         Transparent end-to-end Packet-switched Streaming Service (PSS); Progressive Download and Dynamic Adaptive Streaming over HTTP (3GP-DASH)
TS 26.346         Multimedia Broadcast/Multicast Service (MBMS); Protocols and codecs
TR 26.906*        Evaluation of High Efficiency Video Coding (HEVC) for 3GPP services
*https://www.3gpp.org/ftp/Specs/html-info/26906.htm
8
Some Abbreviations
• DASH:
– Dynamic Adaptive Streaming over HTTP
• HEVC:
– High Efficiency Video Coding
• MBMS:
– Multimedia Broadcast/Multicast Service
• MTSI:
– Multimedia Telephony Services over IMS
• PSS:
– Packet-switched Streaming Service
9
Video Streaming Codec in 5G
• 3GPP started developing functionalities for
multimedia services and applications as part
of the Rel-16 and Rel-17 specifications.
• These include enablers for 5G Media
Streaming and extensions such as
– edge processing, analytics and event exposure;
– improvements to LTE-based 5G Broadcast and
hybrid services;
– 5G Multicast Broadcast Services (MBS);
– eXtended Reality (XR) and Augmented Reality (AR)
experiences.
10
Main Content
11
Communication System
[Block diagram of a communication system: at the transmitter, the Info. Source feeds a Source Encoder, Encryptor, Channel Encoder and Modulator before the Channel; the Receiver performs the inverse operations and delivers the message to the Info. Destination. An example message ("red") is shown being mapped to binary sequences such as 0101, 0011 and 00110 along the chain.]
12
Fundamental Concepts
13
Definition of Source Coding
❑ Source Coding: representing the original messages of the info source by code
sequences that are more suitable for transmission over a specific medium.
Source coding is carried out by the encoder.
❑ The random message $\boldsymbol{X}$ generated by the info source consists of $L$
discrete symbols and is expressed as
$\boldsymbol{X} = X_1 X_2 \cdots X_l \cdots X_L$, where $X_l \in \{a_1, a_2, \cdots, a_i, \cdots, a_n\}$.
Source coding transforms the original random symbol sequence $\boldsymbol{X}$
into the following code sequence $\boldsymbol{Y}$:
$\boldsymbol{Y} = Y_1 Y_2 \cdots Y_k \cdots Y_K$, where $Y_k \in \{b_1, b_2, \cdots, b_j, \cdots, b_m\}$.
• Implementation: Encoder (mapping)
• Mathematical Model
[Diagram: the source S = {s1 (sunny), s2 (cloudy), s3 (rainy), s4 (snowy)} is mapped by the source encoder onto the binary set {0, 1}; after encoding, the codebook (set of codewords) is {00, 01, 10, 11}, which is sent over the channel. Is this code optimum in terms of efficiency and unique decodability?]
• Essence:
– A transformation of the original symbols of the source
according to certain mathematical rules
– Single source symbol → code symbol
– Source sequence → code sequence
15
Fixed Length Coding Theorem
❑ The entropy rate of a stationary memoryless info source $\boldsymbol{X}$ is $H(X)$. Fixed length coding maps
$\boldsymbol{X} = (X_1 X_2 \cdots X_l \cdots X_L)$, $X_l \in \{a_1, a_2, \cdots, a_i, \cdots, a_n\}$, onto
$\boldsymbol{Y} = (Y_1 Y_2 \cdots Y_k \cdots Y_K)$, $Y_k \in \{b_1, b_2, \cdots, b_j, \cdots, b_m\}$.
❑ Fixed length coding theorem: for arbitrary $\varepsilon > 0$ and $\delta > 0$, if the information rate (bit/symbol) satisfies
$R = \frac{K}{L} \log_2 m \ge H(X) + \varepsilon$ (Positive Theorem),
then when $L$ is large the decoding error probability can be made lower than $\delta$. Otherwise, if
$R = \frac{K}{L} \log_2 m \le H(X) - 2\varepsilon$ (Negative Theorem),
then when $L$ is large the decoding error cannot be avoided.
16
See p. 43 of the textbook.
17
Coding Efficiency and Error Rate
❑ We have a stationary memoryless info source $\boldsymbol{X}$, which sequentially
sends symbols obeying the following probability distribution:
$\begin{bmatrix} X \\ p(X) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 \\ 1/2 & 1/4 & 1/8 & 1/8 \end{bmatrix}$
Coding efficiency: $\eta = H(X) \Big/ \left(\frac{K}{L}\log_2 m\right)$. What is the decoding error rate?
Please calculate the coding efficiency and the decoding error rate for
the following cases:
(1) Encoding the one-symbol messages sent by $\boldsymbol{X}$ with a binary code
having fixed length 2.
(2) Encoding the two-symbol messages sent by $\boldsymbol{X}$ with a binary code
having fixed length 3.
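The following is a minimal Python sketch (not from the slides) that works through cases (1) and (2). It interprets the decoding error rate as the total probability of the source blocks that receive no codeword, which is an assumed reading of the exercise.

```python
from itertools import product
from math import log2, prod

# Source alphabet and probabilities from the slide.
p = {"a1": 1/2, "a2": 1/4, "a3": 1/8, "a4": 1/8}
H = -sum(q * log2(q) for q in p.values())              # H(X) = 1.75 bit/symbol

def fixed_length_case(L, K, m=2):
    """L-symbol blocks mapped to K-digit m-ary codewords.

    Returns (coding efficiency, decoding error rate), where the error rate is
    taken as the total probability of the blocks that cannot be assigned one
    of the m**K codewords (the least probable blocks are dropped).
    """
    block_probs = sorted(
        (prod(p[s] for s in blk) for blk in product(p, repeat=L)),
        reverse=True,
    )
    R = (K / L) * log2(m)                               # information rate, bit/symbol
    efficiency = H / R                                  # > 1 means error-free coding is impossible
    covered = sum(block_probs[: m ** K])                # most probable blocks get codewords
    return efficiency, 1.0 - covered

print(fixed_length_case(L=1, K=2))   # case (1): (0.875, 0.0)
print(fixed_length_case(L=2, K=3))   # case (2): (~1.17, 0.1875)
```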
18
Ideal Encoder
❑ Fixed length coding theorem (positive): when the information rate is slightly
higher than the entropy, we may realise error-free decoding. The length $L$ of
the message generated by the info source has to satisfy
$L \ge \dfrac{\operatorname{Var}[I(a_i)]}{\varepsilon^2 \delta}$,
so that the decoding error rate is lower than an arbitrary positive number $\delta$.
❑ Fixed length coding theorem:
❑ When the information rate $R$ is higher than the single-symbol entropy $H(X)$ by $\varepsilon$, the
decoding error rate does not exceed $\delta$.
❑ If $R$ is lower than $H(X)$ by $2\varepsilon$, the decoding error rate must be higher than $\delta$.
❑ When we encode messages of infinite length ($L \to \infty$), an ideal encoder
whose coding efficiency tends to unity exists, namely
$\lim_{L \to \infty} H(X) \Big/ \left(\frac{K}{L}\log_2 m\right) = 1$,
which is impossible in practice.
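As a rough numerical companion (an added sketch, not slide content), the bound $L \ge \operatorname{Var}[I(a_i)]/(\varepsilon^2\delta)$ can be evaluated once $\varepsilon$ is fixed; here $\varepsilon$ is derived from a target efficiency via $\eta = H(X)/(H(X)+\varepsilon)$, i.e. the information rate is assumed to be set to exactly $H(X)+\varepsilon$.

```python
from math import log2

def required_block_length(p, efficiency, delta):
    """Block length L suggested by the positive fixed-length coding theorem.

    Uses L >= Var[I(a_i)] / (eps^2 * delta) from the slide, with eps derived
    from the target coding efficiency eta = H(X) / (H(X) + eps).
    A rough sketch; the slide's bound is sufficient, not tight.
    """
    H = -sum(q * log2(q) for q in p)                      # entropy H(X)
    second_moment = sum(q * log2(q) ** 2 for q in p)      # E[I(a_i)^2]
    var_I = second_moment - H ** 2                        # Var[I(a_i)]
    eps = H * (1 - efficiency) / efficiency               # from eta = H / (H + eps)
    return var_I / (eps ** 2 * delta)

# Example with the four-symbol source used earlier on these slides.
print(required_block_length([1/2, 1/4, 1/8, 1/8], efficiency=0.95, delta=1e-6))
```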
19
Discussion on the Ideal Encoder
❑ We have the following single-symbol info source:
$\begin{bmatrix} X \\ p(X) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 & a_5 & a_6 & a_7 & a_8 \\ 0.4 & 0.18 & 0.1 & 0.1 & 0.07 & 0.06 & 0.05 & 0.04 \end{bmatrix}$
Please encode this info source by using the fixed length code. The
coding efficiency should be 90% and the decoding error rate $10^{-6}$.
➢ Please calculate the length of the symbol sequence that needs to be
encoded together.
➢ If we use the binary fixed length code to encode every single
symbol generated by the info source, find the coding efficiency.
20
How does an ideal encoder operate?
❑ An ideal encoder operates by obeying the following steps:
(1) Given $\varepsilon$ and $\delta$, calculate the length $L$ of the symbol sequence generated by
the info source that needs to be encoded together, by exploiting $L \ge \dfrac{\operatorname{Var}[I(a_i)]}{\varepsilon^2 \delta}$.
(2) Given $L$, calculate the probabilities of all the possible symbol sequences
having a length of $L$.
(3) Sort all the possible symbol sequences having a length of $L$ in
descending order of their sending probabilities.
(4) Encode the sorted symbol sequences one by one; the encoded sequences
constitute a set $A_\varepsilon$. The encoding process continues until $p(A_\varepsilon) \ge 1 - \delta$.
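A minimal Python sketch of steps (2)–(4) above (step (1) is the block-length computation shown two slides earlier). It is illustrative only: enumerating all L-symbol sequences is exponential in L, so it is feasible only for tiny examples.

```python
from itertools import product
from math import prod, ceil, log2

def ideal_encoder_codebook(p, L, delta):
    """Steps (2)-(4) from the slide: enumerate all L-symbol sequences,
    sort them by probability, and keep the most probable ones until their
    total probability reaches 1 - delta.  Sequences outside this set A_eps
    are left unencoded (they cause the residual decoding error <= delta)."""
    seqs = sorted(
        ((blk, prod(p[s] for s in blk)) for blk in product(p, repeat=L)),
        key=lambda item: item[1],
        reverse=True,
    )
    A_eps, total = [], 0.0
    for blk, q in seqs:
        A_eps.append(blk)
        total += q
        if total >= 1 - delta:
            break
    K = ceil(log2(len(A_eps)))          # binary codeword length for the kept sequences
    return {blk: format(i, f"0{K}b") for i, blk in enumerate(A_eps)}

codebook = ideal_encoder_codebook({"a1": 1/2, "a2": 1/4, "a3": 1/8, "a4": 1/8}, L=3, delta=0.05)
print(len(codebook), next(iter(codebook.items())))
```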
21
What is a code?
• Definition:
– A code is a mapping from the discrete set of
symbols {0, · · · , M − 1} to finite binary sequences
• For each symbol m there is a corresponding finite binary
sequence σm
• |σm| is the length of the binary sequence
– Example for M = 4
• Message from {0, 1, 2, 3}
• Encoded bit stream: (0, 2, 1, 3, 2) → (01|0|10|100100|0)
– Fixed Length Code: |σm| is constant for all m
– Variable Length Code: |σm| varies with m
22
Review on Source Coding
• Implementation: Encoder (mapping)
• Mathematical Model
[Diagram: the source S = {s1 (sunny), s2 (cloudy), s3 (rainy), s4 (snowy)} is mapped by the source encoder onto the binary set {0, 1}; after encoding, the codebook (set of codewords) is {00, 01, 10, 11}, which is sent over the channel. Is this code optimum in terms of efficiency and unique decodability?]
• Essence:
– A transformation of the original symbols of the source
according to certain mathematical rules
– Single source symbol → code symbol
– Source sequence → code sequence
23
Unique Decodability
• Definition:
– For every string of source letters $\{x_1, x_2, \dots, x_n\}$, the
encoded output $C(x_1)C(x_2)\cdots C(x_n)$ must be distinct,
i.e., it must differ from $C(x_1')C(x_2')\cdots C(x_m')$ for any other
source string $\{x_1', x_2', \dots, x_m'\}$.
• Decoding error:
– If $C(x_1)C(x_2)\cdots C(x_n) = C(x_1')C(x_2')\cdots C(x_m')$, the decoder
cannot tell which source string was sent, so a decoding error may occur.
24
Example of Source coding
• Fixed length coding versus variable length coding
25
Review on Fixed-length Coding Theorem
26
Fixed Length Coding Theorem
❑ The entropy rate of a stationary memoryless info source $\boldsymbol{X}$ is $H(X)$. Fixed length coding maps
$\boldsymbol{X} = (X_1 X_2 \cdots X_l \cdots X_L)$, $X_l \in \{a_1, a_2, \cdots, a_i, \cdots, a_n\}$, onto
$\boldsymbol{Y} = (Y_1 Y_2 \cdots Y_k \cdots Y_K)$, $Y_k \in \{b_1, b_2, \cdots, b_j, \cdots, b_m\}$.
❑ Fixed length coding theorem: for arbitrary $\varepsilon > 0$ and $\delta > 0$, if the information rate (bit/symbol) satisfies
$R = \frac{K}{L} \log_2 m \ge H(X) + \varepsilon$ (Positive Theorem),
then when $L$ is large the decoding error probability can be made lower than $\delta$. Otherwise, if
$R = \frac{K}{L} \log_2 m \le H(X) - 2\varepsilon$ (Negative Theorem),
then when $L$ is large the decoding error cannot be avoided.
27
How to do fixed-length encoding?
• Segment the source symbols into n-tuples.
• Map each n-tuple into a binary L-tuple,
where $2^L \ge M^n$, i.e., $L \ge n \log_2 M$, so that every n-tuple receives a distinct codeword.
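A small sketch (assumed, not from the slides) of such a fixed-length mapping: all n-tuples over an M-symbol alphabet are enumerated and each receives a distinct binary L-tuple with L = ⌈n log₂ M⌉.

```python
from itertools import product
from math import ceil, log2

def fixed_length_codebook(alphabet, n):
    """Enumerate all n-tuples over the alphabet and assign each a distinct
    binary L-tuple with L = ceil(n * log2(M)), i.e. 2**L >= M**n.
    Sketch only; any one-to-one assignment works for fixed-length coding."""
    M = len(alphabet)
    L = ceil(n * log2(M))
    tuples = list(product(alphabet, repeat=n))
    return {t: format(i, f"0{L}b") for i, t in enumerate(tuples)}

book = fixed_length_codebook(["a1", "a2", "a3", "a4"], n=3)
print(book[("a1", "a2", "a4")])   # one of the 64 three-symbol blocks -> a 6-bit codeword
```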
28
Coding efficiency
❑ We have the following single-symbol info source:
$\begin{bmatrix} X \\ p(X) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 & a_5 & a_6 & a_7 & a_8 \\ 0.4 & 0.18 & 0.1 & 0.1 & 0.07 & 0.06 & 0.05 & 0.04 \end{bmatrix}$
Please encode this info source by using a fixed length code.
• Fixed-length codes
– 3 bits for each symbol (L = 3)
– H(X) = 2.55 bit/symbol
– $\eta = \dfrac{H(X)}{L} = 85\%$
– Could we enhance the coding efficiency?
• Solution
– Variable-length source codes
29
Variable-length source codes
• Motivation
– Probable symbols should have shorter
codewords than improbable ones, to reduce
the number of bits per source symbol
• Definition
– A variable-length source code C encodes
each symbol x in the source alphabet X to a
binary codeword C(x) of length l(x).
• For example, for X = {a, b, c}
– C(a) = 0
– C(b) = 10
– C(c) = 11
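A small Python sketch (not part of the slides) that encodes and decodes with exactly this code C; the left-to-right scan works because C happens to be a prefix code, a property discussed a few slides below.

```python
# The example code from the slide: C(a) = 0, C(b) = 10, C(c) = 11.
code = {"a": "0", "b": "10", "c": "11"}

def encode(symbols):
    return "".join(code[s] for s in symbols)

def decode(bits):
    """Decode a prefix code by scanning left to right: because no codeword is
    a prefix of another, the first complete match is always the right one."""
    inverse = {w: s for s, w in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:           # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

bits = encode("abcab")
print(bits, decode(bits))
```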
30
Decoder for Variable-length source codes
31
Example: Not uniquely decodable
32
Varied Length Coding Theorem
❑ For a memoryless info source $X$ having an entropy of $H(X)$, when
encoding its symbols with an $m$-ary variable-length code, a
lossless encoding scheme exists whose average codeword length $\bar{K}$
satisfies:
$\dfrac{H(X)}{\log_2 m} \le \bar{K} < 1 + \dfrac{H(X)}{\log_2 m}$.
The average information rate of the encoder satisfies:
$H(X) \le \bar{R} = \dfrac{\bar{K}}{L}\log_2 m < H(X) + \varepsilon$,
where we have $L = 1$ and $\varepsilon$ is an arbitrary positive number.
❑ When the info source sends a symbol sequence having a length of $L$,
the average code length satisfies:
$\dfrac{L\,H(X)}{\log_2 m} \le \bar{K} < 1 + \dfrac{L\,H(X)}{\log_2 m}$.
33
Proof
34
Proof (cont.)
35
Coding Efficiency
❑ The length $L$ of the symbol sequence to be encoded together by a
variable-length encoder is far lower than that required by a fixed-length
counterpart. The coding efficiency is lower-bounded by
$\eta = \dfrac{H(X)}{\bar{R}} > \dfrac{H(X)}{H(X) + \dfrac{\log_2 m}{L}}$.
❑ Encode the following single-symbol source by using a binary variable-length
code:
$\begin{bmatrix} X \\ p(X) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 & a_5 & a_6 & a_7 & a_8 \\ 0.4 & 0.18 & 0.1 & 0.1 & 0.07 & 0.06 & 0.05 & 0.04 \end{bmatrix}$
When the coding efficiency is required to be higher than 90%,
calculate the length of the symbol sequences that need to be
processed together.
36
Pros and Cons of Varied-Length Code
❑ Varied length coding is capable of compressing the information.
❑ Basic principle of varied-length coding: the messages having high
sending probabilities are encoded into short codewords, while the
messages having low sending probabilities are encoded into long
codewords. Therefore, the average codeword length can be minimised
in order to increase the communication efficiency.
❑ Varied-length coding increases the complexity of decoding:
❑ The decoder should correctly identify the beginning of codewords having
different lengths (synchronous decoding); since the codewords have various lengths,
when a specific symbol is received, the decoder does not know whether it is
the end of a codeword.
❑ It may have to wait for the following symbols before it can decode
correctly (decoding delay).
37
Examples of Varied Length Coding
Message of Info Source   Sending Probability   Code A   Code B   Code C   Code D
a1                       0.5                   0        0        0        0
a2                       0.25                  0        1        01       10
a3                       0.125                 1        00       011      110
a4                       0.125                 10       11       0111     1110
Codes A and B are undecodable; Code C is uniquely decodable but incurs a decoding delay; Code D is uniquely decodable with no delay.
❑ A code is called a prefix-free code if no codeword is a prefix of any
other codeword. For brevity, a prefix-free code will be referred to as
a prefix code, which is also regarded as an instantaneous code.
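A short Python check of this definition (a sketch, using the codes from the table above as test cases):

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of any other codeword
    (the definition of a prefix / instantaneous code above)."""
    words = sorted(codewords)            # a prefix sorts immediately before its extensions
    return all(not words[i + 1].startswith(words[i]) for i in range(len(words) - 1))

# Codes C and D from the table above.
print(is_prefix_free(["0", "01", "011", "0111"]))   # Code C: False (0 is a prefix of 01, ...)
print(is_prefix_free(["0", "10", "110", "1110"]))   # Code D: True
```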
38
Tree Representing code words of Code D
39
Tree Graph of Prefix Code
[Figure: tree graphs of prefix codes; each branch is labelled with a code element (e.g. 0, 1, 2 for a ternary code) and the codewords correspond to paths from the root to the leaves.]
42
Proof on Kraft Inequality (2)
43
Proof on Kraft Inequality (3)
44
Proof on Kraft Inequality (4)
45
Proof on Kraft Inequality (5)
46
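The proof slides above are figures; as a small numerical companion (an added sketch, not slide content), the Kraft inequality $\sum_i m^{-l_i} \le 1$, which these slides prove for prefix codes, can be checked directly from the codeword lengths:

```python
def kraft_sum(codeword_lengths, m=2):
    """Kraft sum  sum_i m**(-l_i)  for an m-ary code.
    A prefix code with these lengths exists iff the sum is <= 1."""
    return sum(m ** (-l) for l in codeword_lengths)

print(kraft_sum([1, 2, 3, 4]))      # lengths of Code D: 0.5 + 0.25 + 0.125 + 0.0625 = 0.9375
print(kraft_sum([1, 2, 3, 3]))      # lengths of an optimal code for {0.5, 0.25, 0.125, 0.125}: exactly 1
```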
Quiz
47
The 2nd Way for Proof of Kraft Inequality
48
49
50
Shannon Code
❑ Binary Shannon code has the following coding steps:
(1) Sort the symbols of the info source in descending order of
their sending probabilities, e.g. $p(a_1) \ge \cdots \ge p(a_n)$.
(2) Let $p(a_0) = 0$ and let $P_a(a_j) = \sum_{i=0}^{j-1} p(a_i)$, $j = 1, 2, \cdots, n$,
denote the cumulative probability before the $j$-th symbol.
(3) Set the codeword length of $a_j$ to $l_j = \lceil -\log_2 p(a_j) \rceil$.
(4) Take the first $l_j$ digits of the binary expansion of $P_a(a_j)$ as the codeword of $a_j$.
❑ Please encode the following single-symbol info source with the binary
Shannon code and calculate its coding efficiency.
$\begin{bmatrix} X \\ P(X) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 & a_5 & a_6 \\ 0.25 & 0.25 & 0.2 & 0.15 & 0.1 & 0.05 \end{bmatrix}$
• Please compute expected length and coding efficiency.
To convert a fraction to binary, keep multiplying the fractional part by 2 until it
becomes 0 (or enough digits have been produced), collecting the integer parts in forward
order; e.g. $0.7 \times 2 = 1.4$, $0.4 \times 2 = 0.8$, $0.8 \times 2 = 1.6$ gives $(0.7)_{10} \approx (0.101)_2$.

Symbol   p(a_j)   −log₂ p(a_j)   Cumulative P_a(a_j)   Binary expansion        Codeword
a1       0.25     2              0.00                  (0.00)₁₀ = (0.00)₂      00
a2       0.25     2              0.25                  (0.25)₁₀ = (0.01)₂      01
a3       0.20     2.3219         0.50                  (0.50)₁₀ = (0.100)₂     100
a4       0.15     2.7370         0.70                  (0.70)₁₀ ≈ (0.101)₂     101
a5       0.10     3.3219         0.85                  (0.85)₁₀ ≈ (0.1101)₂    1101
a6       0.05     4.3219         0.95                  (0.95)₁₀ ≈ (0.11110)₂   11110
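A compact Python sketch of the four Shannon-coding steps, reproducing the worked numbers above (a sketch for checking the exercise, not an official reference implementation):

```python
from math import ceil, log2

def shannon_code(probs):
    """Binary Shannon code: sort by descending probability, compute cumulative
    probabilities, and take ceil(-log2 p) bits of their binary expansions."""
    probs = sorted(probs, reverse=True)
    code, cumulative = [], 0.0
    for p in probs:
        length = ceil(-log2(p))
        # Binary expansion of the cumulative probability, truncated to `length` bits.
        word, frac = "", cumulative
        for _ in range(length):
            frac *= 2
            word += str(int(frac))
            frac -= int(frac)
        code.append(word)
        cumulative += p
    avg_len = sum(p * len(w) for p, w in zip(probs, code))
    H = -sum(p * log2(p) for p in probs)
    return code, avg_len, H / avg_len          # codewords, expected length, efficiency

print(shannon_code([0.25, 0.25, 0.2, 0.15, 0.1, 0.05]))
# -> (['00', '01', '100', '101', '1101', '11110'], 2.7, ~0.897)
```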
52
53
Fano Code
❑ M-ary Fano code has the following coding steps:
(1) Sort the symbols of the info source in descending order
of their sending probabilities, e.g. $p(a_1) \ge \cdots \ge p(a_n)$.
(2) Divide the sorted symbols into M groups whose group probabilities are as close to equal as possible.
(3) Assign a different code element (0, 1, …, M − 1) to each group.
(4) Treat each group as a unit and repeat steps (2) and (3)
until no group can be divided further (each group contains a single symbol).
54
Example of Fano Code
❑ Please encode the following single-symbol info source with the binary
Fano code and calculate its coding efficiency.
$\begin{bmatrix} X \\ P(X) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 & a_5 & a_6 \\ 0.25 & 0.25 & 0.2 & 0.15 & 0.1 & 0.05 \end{bmatrix}$
Coding efficiency:
❑ We have a binary discrete info source $X$ whose probability distribution
is $P(X) = [0.7, 0.3]$. Please encode its third-order extended source
with the binary Fano code and calculate its coding efficiency.
Coding efficiency: xxxx
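A rough Python sketch of binary Fano coding for checking the exercises above; the split that makes the two group probabilities as equal as possible is chosen greedily, and ties can legitimately lead to different (but comparable) Fano codes:

```python
from math import log2

def fano_code(probs):
    """Binary Fano code sketch: recursively split the descending-sorted
    probabilities into two groups with totals as equal as possible,
    assigning 0 to one group and 1 to the other."""
    probs = sorted(probs, reverse=True)
    codes = [""] * len(probs)

    def split(lo, hi):                       # work on symbols lo..hi-1
        if hi - lo <= 1:
            return
        total, running, cut = sum(probs[lo:hi]), 0.0, lo + 1
        best = float("inf")
        for i in range(lo, hi - 1):          # choose the split closest to half/half
            running += probs[i]
            if abs(total - 2 * running) < best:
                best, cut = abs(total - 2 * running), i + 1
        for i in range(lo, hi):
            codes[i] += "0" if i < cut else "1"
        split(lo, cut)
        split(cut, hi)

    split(0, len(probs))
    avg = sum(p * len(c) for p, c in zip(probs, codes))
    H = -sum(p * log2(p) for p in probs)
    return codes, avg, H / avg               # codewords, average length, efficiency

print(fano_code([0.25, 0.25, 0.2, 0.15, 0.1, 0.05]))
```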
55
Huffman Code
❑ Binary Huffman code has the following coding steps:
(1) Sort the symbols of the info source in descending order of their sending
probabilities, e.g. $p(a_1) \ge \cdots \ge p(a_n)$.
(2) Assign 0 and 1 to the symbols $a_{n-1}$ and $a_n$ having the lowest probabilities,
respectively. Combine $a_{n-1}$ and $a_n$ into a new symbol, whose probability is the
sum of the probabilities of $a_{n-1}$ and $a_n$. The new source $S_1$ has $(n-1)$ symbols.
(3) Sort the symbols of the new source $S_1$ in descending order of their
sending probabilities. Repeat step (2) and obtain a new reduced source $S_2$
having $(n-2)$ symbols.
(4) Repeat the above steps until the info source is reduced to two symbols. Trace
back to the original symbols along the combining path to obtain the codewords.
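A compact Python sketch of these steps using a min-heap; the 0/1 assignment inside each merge is arbitrary, so different runs of the procedure can produce different but equally optimal codes:

```python
import heapq
from math import log2

def huffman_code(probs):
    """Binary Huffman code: repeatedly merge the two least probable entries
    and prepend 0/1 to the codewords inside each merged entry."""
    # Heap entries: (probability, tie-breaker, {symbol_index: codeword_so_far})
    heap = [(p, i, {i: ""}) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)      # two lowest-probability groups
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    codes = heap[0][2]
    avg = sum(probs[s] * len(w) for s, w in codes.items())
    H = -sum(p * log2(p) for p in probs)
    return [codes[i] for i in range(len(probs))], avg, H / avg

print(huffman_code([0.25, 0.25, 0.2, 0.15, 0.1, 0.05]))
```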
56
Examples of Huffman Coding (1)
• X = {a1, a2, a3, a4, a5}, Pr(ai) = {0.3, 0.25, 0.25, 0.1, 0.1}
57
Examples of Huffman Coding (2)
• Using the ternary set {0, 1, 2}
58
Examples of Huffman Coding (3)
❑ Please encode the following single-symbol info source with the binary
Huffman code and calculate its coding efficiency.
$\begin{bmatrix} X \\ P(X) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 & a_5 & a_6 \\ 0.25 & 0.25 & 0.2 & 0.15 & 0.1 & 0.05 \end{bmatrix}$
59
Two methods for Huffman Coding
❑ Method 1: The combined new symbol is put behind the other
symbols having the same probabilities.
❑ Method 2: The combined new symbol is put ahead of the other
symbols having the same probabilities.
❑ Example: encode the following single-symbol source by the binary
Huffman code with these two methods and calculate the variance of
the codeword length:
$\begin{bmatrix} X \\ P(X) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 & a_5 \\ 0.4 & 0.2 & 0.2 & 0.1 & 0.1 \end{bmatrix}$
❑ Conclusion: when sorting the symbols of the reduced source, putting
the combined new symbol in front is capable of efficiently using the
short codewords.
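A small sketch for the variance comparison; the two codeword-length profiles below are the assumed outcomes of Method 1 and Method 2 for this source (both are valid Huffman codes with the same average length), so treat them as illustrative rather than the slides' worked answer:

```python
def length_stats(probs, lengths):
    """Average codeword length and its variance for a given length assignment."""
    avg = sum(p * l for p, l in zip(probs, lengths))
    var = sum(p * (l - avg) ** 2 for p, l in zip(probs, lengths))
    return avg, var

p = [0.4, 0.2, 0.2, 0.1, 0.1]
# Assumed length profiles produced by the two methods for this source:
print(length_stats(p, [1, 2, 3, 4, 4]))   # method 1 (combined symbol placed behind)
print(length_stats(p, [2, 2, 2, 3, 3]))   # method 2 (combined symbol placed ahead)
```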
60
The paper published by Huffman
• Huffman, D. A., "A method for the construction of minimum-redundancy codes,"
Proceedings of the IRE, 1952, 40(9): 1098–1101.
61
Example in the paper
62
Derived Coding Requirement
• No message shall be coded in such a
way that its code is a prefix of any other
message, or that any of its prefixes are
used elsewhere as a message code.
63
𝑚-ary Huffman Code
❑ If the codewords of an $m$-ary code constitute a tree graph, the number
of separable codewords is $m + k(m-1)$, where $k$ is a positive integer.
❑ To minimise the average codeword length, the last reduced source
should have exactly $m$ symbols when using an $m$-ary Huffman code.
❑ Hence, in the first step of assigning code elements to the symbols having the
lowest probabilities, the number of symbols processed may be fewer than $m$.
❑ If a source has $n$ symbols, let $k$ be the smallest integer satisfying
$m + k(m-1) \ge n$; then $s = m + k(m-1) - n$ codewords are abandoned.
❑ We then simultaneously process $(m - s)$ symbols in the first step.
❑ Encode
$\begin{bmatrix} X \\ p(X) \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 & a_5 & a_6 & a_7 & a_8 \\ 0.4 & 0.18 & 0.1 & 0.1 & 0.07 & 0.06 & 0.05 & 0.04 \end{bmatrix}$
with the ternary Huffman code and calculate its coding efficiency.
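A sketch of $m$-ary Huffman coding (added, not from the slides): the $s$ abandoned codewords are realised here by padding the source with zero-probability dummy symbols so that every merge combines exactly $m$ entries:

```python
import heapq

def huffman_code_mary(probs, m=3):
    """m-ary Huffman sketch: pad with s = m + k(m-1) - n zero-probability
    dummy symbols, then repeatedly merge the m least probable entries."""
    n = len(probs)
    k = 0
    while m + k * (m - 1) < n:          # smallest k with m + k(m-1) >= n
        k += 1
    s = m + k * (m - 1) - n             # number of abandoned (dummy) codewords
    heap = [(p, i, {i: ""}) for i, p in enumerate(probs)] + \
           [(0.0, n + j, {}) for j in range(s)]
    heapq.heapify(heap)
    counter = n + s
    while len(heap) > 1:
        merged, total = {}, 0.0
        for digit in range(m):          # combine the m least probable groups
            p, _, c = heapq.heappop(heap)
            total += p
            merged.update({sym: str(digit) + w for sym, w in c.items()})
        heapq.heappush(heap, (total, counter, merged))
        counter += 1
    codes = heap[0][2]
    return [codes[i] for i in range(n)]

print(huffman_code_mary([0.4, 0.18, 0.1, 0.1, 0.07, 0.06, 0.05, 0.04], m=3))
```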
64
𝑚-ary Huffman Code (2)
• It will be noted that the terminating
auxiliary ensemble always has one unity-probability message.
• Each preceding ensemble is increased in
number by m − 1 until the first auxiliary
ensemble is reached.
• Therefore, if N1 is the number of messages
in the first auxiliary ensemble, then (N1 −
1)/(m − 1) must be an integer.
• However, N1 = N − n0 + 1, where n0 is the
number of the least probable messages
combined in a bracket in the original
ensemble. Therefore, n0 (which, of course,
is at least two and no more than m) must
be of such a value that (N − n0)/(m − 1) is an
integer.
65
Comparison
❑ Shannon code, Fano code and Huffman code all consider the statistics
of the info source. The symbols frequently sent are encoded by a short
codeword but those infrequently sent are encoded by a long codeword
in order to reduce the average codeword length.
❑ Shannon code has a unique coding scheme but its coding efficiency is
not very high. Fano code and Huffman code both have multiple
coding schemes.
❑ Fano code is suitable for info sources whose group probabilities are
close to each other after the division.
❑ Huffman code does not have any specific requirement on the info
source. It has a high coding efficiency and the complexity of its
encoder is low.
66
Optimality of Huffman Codes
67
Assignment
68
Assignment(cont.)
69
Assignment(cont.)
70
Project
Please encode the 26 English letters and the ‘space’ in the English
novel – Game of Thrones by using an arbitrary source coding
technique.
Requirements:
(1) Please choose a source coding technique from Shannon coding,
Fano coding, Huffman coding or any other source coding technique.
(2) Please use at least the first two chapters “Prologue” and “Bran”
as the information source.
(3) Please freely choose a programming platform to complete this
project, e.g. C++, Java, Python, Matlab, etc.
(4) Please freely form a two-person group to complete this project.
(5) Please submit a compressed package named after the group
members, which includes the executable file, the source
file and the English REPORT.
(6) Both Class 1 and Class 2, please submit the packages to me
and, at the same time, to my two teaching assistants.
(7) Deadline: November 30, 2023
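A possible starting point for the project (a sketch under assumptions: the file name "prologue_bran.txt" is a placeholder for wherever the chosen chapters are stored). It estimates the probabilities of the 26 letters and the space, which can then be fed to any of the source coders above:

```python
from collections import Counter

# Estimate the symbol probabilities of the 26 letters and the space from the
# chosen chapters; "prologue_bran.txt" is a placeholder file name.
with open("prologue_bran.txt", encoding="utf-8") as f:
    text = f.read().lower()

# Keep letters as they are and map any whitespace to a single space symbol.
symbols = [c if c.isalpha() else " " for c in text if c.isalpha() or c.isspace()]
counts = Counter(symbols)
total = sum(counts.values())
probs = {s: c / total for s, c in counts.items()}
print(sorted(probs.items(), key=lambda kv: -kv[1])[:5])   # the five most frequent symbols
```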
71