ECE 867: Information Theory and Coding
Lecture notes by Hayder Radha, © 2005-2007
Entropy Coding

This area represents a (the!) major application venue for many of the information theoretic concepts we have studied so far. We will also develop some new analytical and algorithmic tools under the general heading of entropy coding. Entropy coding is also known as "zero-error coding", "data compression", or "lossless compression". Entropy coding is widely used in virtually all popular international multimedia compression standards such as JPEG and MPEG.

A complete entropy codec, which is an encoder/decoder pair, consists of the process of "encoding" or "compressing" a random source (e.g., an image, video, audio, text, or any random signal/process) and the process of "decoding" or "decompressing" the compressed signal to "perfectly" regenerate the original random source. In other words, there is no loss of information due to the process of entropy coding.

[Figure: Random Source -> Entropy Encoding -> Compressed Source, and Compressed Source -> Entropy Decoding -> Random Source.]


It is important to note that any possible loss of information or distortion that may be introduced in a communication system is not due to entropy encoding/decoding. A typical image compression system, for example, includes a transform process, a quantization process, and an entropy coding stage. In such a system, the distortion is introduced by the quantization. Moreover, for such a system, and from the perspective of the entropy encoder, the input "random source" to that encoder is the quantized transform coefficients.

[Figure: Random Source -> Transform (examples: DCT, wavelets) -> Transform Coefficients -> Quantization -> Quantized Coefficients -> Entropy Coding (examples: Huffman, arithmetic) -> Compressed Source.]
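As a minimal sketch of this separation of roles (my own illustration, not from the notes; zlib merely stands in for a real Huffman or arithmetic coder, and the coefficient values are made up), the following shows that the entropy-coding stage is perfectly invertible, while the quantizer is the only lossy step:

```python
import json
import zlib

def quantize(coeffs, step):
    # Uniform quantizer: the only lossy stage in the chain.
    return [round(c / step) for c in coeffs]

def dequantize(indices, step):
    return [i * step for i in indices]

def entropy_encode(indices):
    # Stand-in for an entropy coder: any lossless compressor illustrates
    # the "zero-error" property of this stage.
    return zlib.compress(json.dumps(indices).encode())

def entropy_decode(bitstream):
    return json.loads(zlib.decompress(bitstream).decode())

coeffs = [12.7, -3.2, 0.4, 5.9]      # hypothetical transform coefficients
q = quantize(coeffs, step=2.0)
bits = entropy_encode(q)
q_hat = entropy_decode(bits)

assert q_hat == q                    # entropy coding/decoding loses nothing
print(dequantize(q_hat, 2.0))        # differs from coeffs only because of quantization
```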
Code Design and Notations

In general, entropy coding (or "source coding") is achieved by designing a code, C, which provides a one-to-one mapping from any possible outcome of a random variable X (the "source") to a codeword.

There are two alphabets in this case. One alphabet is the traditional alphabet A (also written 𝒳) of the random source X, and the second alphabet B is the one that is used for constructing the codewords. Based on the second alphabet B, we can construct and define the set D*, which is the set of all finite-length strings of symbols drawn from the alphabet B, as shown in the following figures.
[Figure: The alphabet A = {a, b, c, ...} of the random source X; the alphabet of code symbols B = {b1, b2, b3, b4} used to construct codewords (in this example |B| = 4); and the set of codewords D* containing finite-length strings over B such as b1, b2b1, b4b2, b3b3b3, and b4b2b2b3. The entropy encoder maps a source sequence to a string of codewords (e.g., b4b2b2b3 b2b1 b4b2), and the entropy decoder maps that string of codewords back to the source sequence.]
[Figure: Binary tree representation of a binary (D-ary, D = 2) prefix code. The alphabet of code symbols used to construct codewords is B = {0, 1}, so |B| = D = 2; the tree nodes 0, 1, 00, 01, 10, 11, 000, 001, ..., 111 are elements of the set of codewords D*.]

Definition

A source code, C, is a mapping from a random variable (source) X with alphabet A to a finite-length string of symbols, where each string of symbols (codeword) is a member of the set D*:

    C : A -> D*
The codewords in D* are formed from an alphabet B that has D elements: |B| = D. We say that we have a D-ary code, or that B is a D-ary alphabet. The most common case is when the alphabet B is the set {0, 1}; therefore, in this case, D = 2 and we have binary codewords.

Example

Let X be a random source with x in A = {1, 2, 3, 4}. Let B = {0, 1}, and hence |B| = D = 2. Then:

    D* = {0, 00, 000, ..., 1, 11, 111, ..., 01, 10, 001, 010, 011, 100, ...}
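As a small illustration of this construction (my own sketch, not part of the notes), the elements of D* up to a chosen length can be enumerated directly from B = {0, 1}:

```python
from itertools import product

def d_star(B=("0", "1"), max_len=3):
    # D* is infinite; we only list its strings up to max_len for illustration.
    return ["".join(t) for n in range(1, max_len + 1) for t in product(B, repeat=n)]

print(d_star())   # ['0', '1', '00', '01', '10', '11', '000', '001', ..., '111']
```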
We can define the code C as follows:

    Codeword                 Codeword length
    C(1) = C(x=1) = 0        L_1 = 1
    C(2) = C(x=2) = 10       L_2 = 2
    C(3) = C(x=3) = 110      L_3 = 3
    C(4) = C(x=4) = 111      L_4 = 3

Definition

For a random variable X with a p.m.f. p_1, p_2, ..., p_m, the expected length of a code C(X) is:

    L(C) = \sum_{i=1}^{m} p_i L_i.
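The expected length is straightforward to evaluate numerically. A small sketch (my own illustration; the probabilities below are assumed for the example, not given in the notes):

```python
def expected_length(pmf, code):
    # pmf: symbol -> probability, code: symbol -> codeword string
    return sum(pmf[x] * len(code[x]) for x in pmf)

code = {1: "0", 2: "10", 3: "110", 4: "111"}
pmf = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}   # assumed p.m.f. for illustration
print(expected_length(pmf, code))              # 1.75 code symbols per source symbol
```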
Code Types

The design of a good code follows the basic notion of entropy: for random outcomes with a high probability, a good code assigns "short" codewords, and vice versa. The overall objective is to make the average length L = L(C) as small as possible.

In addition, we have to design codes that are uniquely decodable. In other words, if the source generates a sequence x_1, x_2, x_3, ... that is mapped into a sequence of codewords C(x_1), C(x_2), C(x_3), ..., then we should be able to recover the original source sequence x_1, x_2, x_3, ... from the codeword sequence C(x_1), C(x_2), C(x_3), ....
In general, and as a start, we are interested in codes that map each random outcome x_i into a unique codeword that differs from the codeword of any other outcome. For a random source with alphabet A = {1, 2, ..., m}, a non-singular code meets the following constraint:

    C(x_i) \neq C(x_j)   for all   i \neq j.

Although a non-singular code is uniquely decodable for a single symbol, it does not guarantee unique decodability for a sequence of outcomes of X.
Example:

    Code C1                  Code C2
    C(x=1) = 1               C(x=1) = 10
    C(x=2) = 10              C(x=2) = 00
    C(x=3) = 101             C(x=3) = 11
    C(x=4) = 111             C(x=4) = 110

In the above example, the code C1 is non-singular; however, it is not uniquely decodable. Meanwhile, the code C2 is both non-singular and uniquely decodable. Therefore, not all non-singular codes are uniquely decodable; however, every uniquely decodable code is non-singular.
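A brute-force check (my own sketch, not from the notes) makes the difference concrete: the string 101 has two valid parses under C1, so a C1 decoder cannot recover the source sequence, while the same check finds no ambiguity for the C2 string tried. (A single-string test like this only demonstrates the problem; proving unique decodability in general requires an argument such as the Sardinas-Patterson test.)

```python
def all_parses(bits, code, prefix=()):
    # Enumerate every way of splitting `bits` into codewords of `code`.
    if not bits:
        return [prefix]
    parses = []
    for symbol, word in code.items():
        if bits.startswith(word):
            parses += all_parses(bits[len(word):], code, prefix + (symbol,))
    return parses

C1 = {1: "1", 2: "10", 3: "101", 4: "111"}
C2 = {1: "10", 2: "00", 3: "11", 4: "110"}

print(all_parses("101", C1))    # [(2, 1), (3,)]: two different source sequences
print(all_parses("1000", C2))   # [(1, 2)]: a single parse
```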
It is important to note that a uniquely decodable code may require the decoding of multiple codewords to uniquely identify the original source sequence. This is the case for the above code C2. (Can you give an example where the C2 decoder needs to wait for more codewords before being able to uniquely decode a sequence?)

Therefore, it is highly desirable to design a uniquely decodable code that can be decoded instantaneously upon receiving each codeword. Such codes are known as instantaneous, prefix-free, or simply prefix codes. In a prefix code, no codeword may be used as a prefix of any other codeword.
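The prefix-free property is easy to test, and it is exactly what makes symbol-by-symbol decoding possible. A sketch of my own (not from the notes), using the code 0, 10, 110, 111 introduced earlier:

```python
def is_prefix_code(code):
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

def decode_instantaneously(bits, code):
    # Each codeword is recognized the moment its last symbol arrives.
    inverse = {w: s for s, w in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    return decoded

C = {1: "0", 2: "10", 3: "110", 4: "111"}
print(is_prefix_code(C))                        # True
print(decode_instantaneously("010110111", C))   # [1, 2, 3, 4]
```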
Example:

In the following example, no codeword is used as a prefix of any other codeword.

    C(x=1) = 0       C(x=2) = 10
    C(x=3) = 110     C(x=4) = 111

It should be rather intuitive that every prefix code is uniquely decodable, but the converse is not always true. In summary, the three major types of codes (non-singular, uniquely decodable, and prefix codes) are related as shown in the following diagram.
[Figure: Nested sets. The set of all possible codes contains the non-singular codes, which contain the uniquely decodable codes, which in turn contain the prefix (instantaneous) codes.]

Kraft Inequality

Based on the above discussion, it should be clear that uniquely decodable codes represent a subset of all possible codes. Also, prefix codes are a subset of uniquely decodable codes. Prefix codes meet a certain constraint, which is known as the Kraft Inequality.
Theorem

For any prefix D-ary code C with codeword lengths L_1, L_2, ..., L_m, the following must be satisfied:

    \sum_{i=1}^{m} D^{-L_i} \le 1.

Conversely, given a set of codeword lengths that meets the inequality \sum_{i=1}^{m} D^{-L_i} \le 1, there exists a prefix code for this set of lengths.

Proof

A prefix code C can be represented by a D-ary tree. Below we illustrate the proof using a binary code and a corresponding binary tree. (The same principles apply to higher-order codes/trees.) For illustration purposes, let us consider the code:

    C(x=1) = 0       C(x=2) = 10
    C(x=3) = 110     C(x=4) = 111

This code can be represented as follows.

[Figure: Binary tree representation of this binary (D-ary, D = 2) prefix code over B = {0, 1}. The codewords 0, 10, 110, and 111 are nodes of a depth-3 binary tree whose leaves are 000, 001, ..., 111.]
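Both directions of the theorem are easy to check numerically. The following sketch (my own; the greedy tree-walking construction is one standard way to realize the converse, not necessarily the one intended in the notes) verifies the inequality for the example lengths and rebuilds a prefix code from them:

```python
from fractions import Fraction

def kraft_sum(lengths, D=2):
    return sum(Fraction(1, D ** L) for L in lengths)

def prefix_code_from_lengths(lengths):
    # Assign codewords in order of increasing length, always taking the next
    # free node of the binary tree; this succeeds whenever Kraft holds.
    words, value, prev_len = [], 0, 0
    for L in sorted(lengths):
        value <<= (L - prev_len)              # descend to depth L
        words.append(format(value, "b").zfill(L))
        value += 1                            # move to the next free node at depth L
        prev_len = L
    return words

lengths = [1, 2, 3, 3]
print(kraft_sum(lengths))                     # 1, so the inequality holds (with equality)
print(prefix_code_from_lengths(lengths))      # ['0', '10', '110', '111']
```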
An important attribute of the above tree representation of codes is the number of leaf nodes (at the maximum depth of the tree) that are associated with each codeword. For example, for the first codeword, C(1) = 0, there are four leaf nodes associated with it. Similarly, the codeword C(2) = 10 has two leaf nodes.

[Figure: The same binary tree, highlighting the group of leaf nodes that descend from codeword 0 (000, 001, 010, 011) and the group that descends from codeword 10 (100, 101).]
The last two codewords are leaf nodes themselves, and hence each of these is associated with a single leaf node (itself).

[Figure: The same binary tree, now showing all four disjoint groups of leaf nodes: {000, 001, 010, 011} for codeword 0, {100, 101} for codeword 10, {110} for codeword 110, and {111} for codeword 111.]
Note that for a prefix code, no codeword can be an ancestor of any other codeword. Let L_max be the maximum length among all codeword lengths of the prefix code. Each codeword with length L_i \le L_max sits at depth L_i of the D-ary tree. Hence, the total number of leaf nodes (at depth L_max) that are descendants of a codeword at level L_i is D^{L_max - L_i}.

Furthermore, since each group of leaf nodes belonging to a codeword with length L_i is disjoint from the group of leaf nodes of any other codeword, and the full tree has only D^{L_max} leaves, we have:

    \sum_{i=1}^{m} D^{L_max - L_i} \le D^{L_max},   which implies   \sum_{i=1}^{m} D^{-L_i} \le 1.
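A quick numerical check of this counting argument for the example code (my own illustration):

```python
code = ["0", "10", "110", "111"]            # the example prefix code
D = 2
lengths = [len(w) for w in code]
L_max = max(lengths)

leaves_per_codeword = [D ** (L_max - L) for L in lengths]
print(leaves_per_codeword)                   # [4, 2, 1, 1], disjoint groups of leaves
print(sum(leaves_per_codeword), D ** L_max)  # 8 <= 8 leaves in total
print(sum(D ** -L for L in lengths))         # 1.0 <= 1: the Kraft inequality
```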
By similar arguments, one can construct a prefix code for any set of lengths that satisfies the above constraint \sum_{i=1}^{m} D^{-L_i} \le 1.

QED

Optimum Codes

Here we address the issue of finding codes of minimum average length L(C), given the constraint imposed by the Kraft inequality. In particular, we are interested in finding codes that satisfy:
    L^* = \min_{L_1, ..., L_m} L(C) = \min_{L_1, ..., L_m} \sum_{i=1}^{m} p_i L_i   such that   \sum_{i=1}^{m} D^{-L_i} \le 1.

If we assume that equality is satisfied, \sum_{i=1}^{m} D^{-L_i} = 1, we can formulate the problem using Lagrange multipliers. Consequently, we can minimize the following objective function:

    J = \sum_{i=1}^{m} p_i L_i + \lambda \sum_{i=1}^{m} D^{-L_i}.

    \frac{\partial J}{\partial L_i} = p_i - \lambda D^{-L_i} \ln D = 0

    \Rightarrow p_i = \lambda (\ln D) D^{-L_i^*}.
    \Rightarrow D^{-L_i^*} = \frac{p_i}{\lambda \ln D}.

Using the constraint \sum_{i=1}^{m} D^{-L_i^*} = 1:

    \Rightarrow \lambda = \frac{1}{\ln D}
    \Rightarrow D^{-L_i^*} = p_i \Rightarrow L_i^* = -\log_D p_i.

Therefore, the average length L^*(C) of an optimum code can be expressed as:

    L^* = \sum_{i=1}^{m} p_i L_i^* = -\sum_{i=1}^{m} p_i \log_D p_i \Rightarrow L^* = H_D(X),
where H_D(X) is the entropy of the original source X (measured with a logarithmic base D). For a binary code, D = 2, the average length is the same as the standard (base-2) entropy measured in bits.

Based on the above derivation, achieving an optimum prefix code C whose average length reaches the entropy, L^* = H_D(X), is only possible when:

    D^{-L_i^*} = p_i \Rightarrow L_i^* = -\log_D p_i.

However, in general, the probability distribution values (p_i) do not necessarily guarantee integer-valued lengths for the codewords.
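A short numerical sketch of this point (my own; the second distribution is an assumed example): for a dyadic source the ideal lengths -log_D(p_i) are integers and the entropy is achievable, while for a general source they are not integers.

```python
from math import log

def ideal_lengths(pmf, D=2):
    return [-log(p, D) for p in pmf]

def entropy(pmf, D=2):
    return -sum(p * log(p, D) for p in pmf)

dyadic = [0.5, 0.25, 0.125, 0.125]
general = [0.4, 0.3, 0.2, 0.1]     # assumed non-dyadic distribution

print(ideal_lengths(dyadic))       # [1.0, 2.0, 3.0, 3.0]: integers, achieved by 0, 10, 110, 111
print(entropy(dyadic))             # 1.75 bits
print(ideal_lengths(general))      # roughly [1.32, 1.74, 2.32, 3.32]: not integers
print(entropy(general))            # roughly 1.85 bits, a bound no prefix code can beat
```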
Below, we state one of the most fundamental theorems in information theory, which relates the average length of any prefix code to the entropy of a random source with general distribution values (p_i). This theorem, commonly known as the entropy bound theorem, shows that no code can have an average length smaller than the entropy of the random source.

Theorem (Entropy Bound)

The expected length L of a prefix D-ary code C for a random source X with entropy H_D(X) satisfies the following inequality:

    L \ge H_D(X),

with equality if and only if D^{-L_i} = p_i for all i.
Proof

We look at the difference between the average length L of the code C with lengths L_1, L_2, ..., L_m and the entropy H_D(X) of the random source X:

    L - H_D(X) = \sum_{i=1}^{m} p_i L_i + \sum_{i=1}^{m} p_i \log_D p_i.

Using L_i = -\log_D D^{-L_i}:

    L - H_D(X) = -\sum_{i=1}^{m} p_i \log_D D^{-L_i} + \sum_{i=1}^{m} p_i \log_D p_i
               = \sum_{i=1}^{m} p_i \log_D \frac{p_i}{D^{-L_i}}
               = \sum_{i=1}^{m} p_i \log_D \frac{p_i}{\left( D^{-L_i} \Big/ \sum_{j=1}^{m} D^{-L_j} \right) \left( \sum_{j=1}^{m} D^{-L_j} \right)}.
Let:

    q_i = \frac{D^{-L_i}}{\sum_{j=1}^{m} D^{-L_j}}   and   d = \sum_{j=1}^{m} D^{-L_j}.

Then:

    L - H_D(X) = \sum_{i=1}^{m} p_i \log_D \frac{p_i}{q_i} + \log_D \frac{1}{d} \sum_{i=1}^{m} p_i;

    L - H_D(X) = D(p || q) + \log_D \frac{1}{d}.

Since D(p || q) \ge 0 and \log_D (1/d) \ge 0 (the Kraft inequality gives d \le 1),

    \Rightarrow L - H_D(X) \ge 0;
    \Rightarrow L \ge H_D(X).

Note that equality takes place if and only if:
    d = \sum_{j=1}^{m} D^{-L_j} = 1   and   p_i = q_i = \frac{D^{-L_i}}{\sum_{j=1}^{m} D^{-L_j}} = D^{-L_i}.

Hence, in this case, \log_D (1/d) = 0 and D(p || q) = 0,

    \Rightarrow L = H_D(X).

Q.E.D.

Observations from the Entropy Bound Theorem

The Entropy Bound Theorem and its proof lead to important observations, which we outline below:

1. For random sources with distributions that satisfy p_i = D^{-L_i}, where L_i is an integer for i = 1, 2, ..., m, there exists a prefix code that achieves the entropy H_D(X). Such distributions are known as D-adic. For the binary case, D = 2, we have a dyadic distribution (or a dyadic code). An example of a dyadic distribution is:

       p_1 = 1/2,  p_2 = 1/4,  p_3 = 1/8,  p_4 = 1/8;   with   L_1 = 1,  L_2 = 2,  L_3 = 3,  L_4 = 3.

2. For general distributions that do not satisfy the relationship p_i = D^{-L_i}, a measure of the "distance" between the average length L and the entropy H_D(X) can be expressed as:

       L - H_D(X) = D(p || q) + \log_D \frac{1}{d}.
This measure of distance can provide some guidelines for a procedure to search for an optimum prefix code, i.e., one that provides a "minimum distance" L - H_D(X) = D(p || q) + \log_D (1/d). In principle, this procedure can be based on finding the distribution

    q_i = \frac{D^{-L_i}}{\sum_{j=1}^{m} D^{-L_j}}

such that q_i is as "close" as possible to the source distribution p_i in a relative-entropy sense.
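The identity can be confirmed numerically. A sketch of my own (the source distribution is assumed for illustration), using the prefix code 0, 10, 110, 111:

```python
from math import log

p = [0.4, 0.3, 0.2, 0.1]                     # assumed non-dyadic source distribution
lengths = [1, 2, 3, 3]                       # lengths of the codewords 0, 10, 110, 111

d = sum(2.0 ** -L for L in lengths)
q = [2.0 ** -L / d for L in lengths]

L_avg = sum(pi * Li for pi, Li in zip(p, lengths))
H = -sum(pi * log(pi, 2) for pi in p)
kl = sum(pi * log(pi / qi, 2) for pi, qi in zip(p, q))

print(L_avg - H)                             # redundancy of this code (about 0.054 bits)
print(kl + log(1 / d, 2))                    # the same value, via D(p||q) + log(1/d)
```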
This approach is not very trivial; consequently, we will focus on more practical approaches for finding prefix codes. We will consider examples of sub-optimum and optimal prefix codes. Prior to that, we address one final important issue regarding constraints for the more general case of uniquely decodable codes.

Constraints on Uniquely Decodable Codes

In this section, we address the following question: if we consider the set of non-prefix uniquely decodable codes, what happens to the Kraft Inequality?
In particular, a non-instantaneous uniquely decodable code does not have to meet the prefix-free constraint that a prefix code has to adhere to. Hence, does this relaxation of the prefix-free constraint translate into a relaxation of the Kraft Inequality? The answer is NO! This is highlighted by the following theorem.

Theorem (McMillan)

For any uniquely decodable D-ary code C with codeword lengths L_1, L_2, ..., L_m, the following must be satisfied:

    \sum_{i=1}^{m} D^{-L_i} \le 1.
Conversely, given a set of codeword lengths that meets the inequality \sum_{i=1}^{m} D^{-L_i} \le 1, there exists a uniquely decodable code for this set of lengths.

Proof

(See either of the two books for the proof.)

The McMillan Theorem basically states that the Kraft Inequality must be met by all uniquely decodable codes (prefix or not). Consequently, moving forward we will continue to focus on prefix codes, which are used by all popular compression standards, such as JPEG and MPEG.
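As a closing illustration (my own check, not from the notes), the earlier code C2 = {10, 00, 11, 110} is not prefix-free yet is uniquely decodable, and it still satisfies the Kraft inequality, as the McMillan Theorem requires:

```python
C2 = ["10", "00", "11", "110"]

not_prefix_free = any(a != b and b.startswith(a) for a in C2 for b in C2)
print(not_prefix_free)                     # True: "11" is a prefix of "110"
print(sum(2 ** -len(w) for w in C2))       # 0.875 <= 1: Kraft still holds
```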
