Lec 03
Zhu Li
Dept of CSEE, UMKC
Office: FH560E, Email: [email protected], Ph: x 2346.
http://l.web.umkc.edu/lizhu
slides created with WPS Office Linux and the EqualX LaTeX equation editor
[Figure: Venn diagram of entropies — H(X) and H(Y) overlap in the mutual information I(X; Y); the non-overlapping parts are H(X | Y) and H(Y | X). Total area: H(X, Y)]

Relative Entropy/Cross Entropy

D(p || q) = Σ_x p(x) log( p(x) / q(x) )
pmf = pmf(pmf > 0);          % keep only nonzero probabilities (0*log 0 = 0)
e = -sum(pmf .* log2(pmf));  % entropy in bits/symbol
return;
[Figure: context template — sample x5 is coded conditioned on its causal neighbors x1, x2, x3, x4, i.e. on the context function f(x4, x3, x2, x1)]
Given a set of integer lengths {l1, l2, …, ln} that satisfy the Kraft inequality Σ_i 2^(-l_i) ≤ 1,
we can always find a prefix code with codeword lengths l1, l2, …, ln.
Example: codeword lengths {1, 2, 3, 4} give the prefix code 1 (a2), 01 (a1), 000 (a3), 0010.
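As a sketch of the existence argument (the function name and the length-sorted assignment below are illustrative, not from the slides): sort the lengths, then hand out consecutive binary values, left-shifting when moving to a longer length. The result is prefix-free whenever the Kraft sum is at most 1.

```python
def prefix_code_from_lengths(lengths):
    """Build a prefix code for integer lengths satisfying Kraft's inequality."""
    assert sum(2.0 ** -l for l in lengths) <= 1.0, "Kraft inequality violated"
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codes = [""] * len(lengths)
    code, prev = 0, 0
    for i in order:
        code <<= lengths[i] - prev        # extend to the new, longer length
        prev = lengths[i]
        codes[i] = format(code, "0%db" % lengths[i])
        code += 1                         # next codeword of this length
    return codes
```

The codewords may differ from the slide's example (0s and 1s swapped), but the lengths, and hence the average rate, are the same.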
Entropy:
H(S) = - (0.2*log2(0.2)*2 + 0.4*log2(0.4)+0.1*log2(0.1)*2)
= 2.122 bits / symbol
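The arithmetic above can be checked directly (a minimal sketch; `entropy` is an illustrative helper name):

```python
from math import log2

def entropy(pmf):
    """Entropy in bits/symbol; zero-probability entries contribute nothing."""
    return -sum(p * log2(p) for p in pmf if p > 0)

H = entropy([0.2, 0.2, 0.4, 0.1, 0.1])   # the source above; H ≈ 2.122
```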
[Figure: two Huffman trees for the same source with probabilities {0.4, 0.4, 0.2} — different codeword assignments (e.g. {00, 01, 1} vs. {10, 11, 0}), same optimal lengths]
The two longest codewords differ only in the last bit and
correspond to the two least probable symbols; otherwise, codewords of
equal length can be exchanged among symbols to achieve this without
changing the average length.
Proof skipped.
Z. Li: ECE/CS 5578 Multimedia Communication p.15
Canonical Huffman Code
The Huffman algorithm is needed only to compute the optimal codeword lengths.
The optimal codewords for a given data set are not unique.
A canonical Huffman code is well structured: given only the codeword lengths,
we can construct a canonical Huffman code.
Also known as slice code or alphabetic code.
[Table: example codeword assignments illustrating canonical ordering — codewords of equal length are consecutive binary numbers]
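One payoff of the canonical structure is that a decoder needs only the codeword lengths. The sketch below (function and variable names are illustrative) rebuilds the first codeword of each length and decodes by checking whether the accumulated bits fall in that length's range:

```python
from collections import defaultdict

def decode_canonical(bits, lengths):
    """Decode a bitstring given only per-symbol canonical codeword lengths."""
    by_len = defaultdict(list)
    for sym, l in enumerate(lengths):
        by_len[l].append(sym)
    first_code, first_idx, syms = {}, {}, []
    code, prev = 0, 0
    for l in sorted(by_len):
        code <<= l - prev                 # first codeword of this length
        prev = l
        first_code[l] = code
        first_idx[l] = len(syms)
        syms.extend(by_len[l])            # symbols sorted by (length, index)
        code += len(by_len[l])
    out, acc, n = [], 0, 0
    for b in bits:
        acc, n = (acc << 1) | int(b), n + 1
        if n in first_code and first_code[n] <= acc < first_code[n] + len(by_len[n]):
            out.append(syms[first_idx[n] + acc - first_code[n]])
            acc, n = 0, 0
    return out
```

For lengths {2, 1, 3, 3} the canonical codes are 10, 0, 110, 111, so the stream 100110 decodes to symbols 0, 1, 2.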
Decoding:
UnaryDecode() {
    n = 0;
    while (ReadBits(1) == 1) {   // count leading 1s,
        n++;                     // stop at the terminating 0
    }
    return n;
}
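In Python, the same scheme can be sketched as follows (names are illustrative; the bitstream is modeled as a string):

```python
def unary_encode(n):
    """Unary code: n ones followed by a terminating zero."""
    return "1" * n + "0"

def unary_decode(bits, pos=0):
    """Count ones until the zero; return (value, position after the code)."""
    n = 0
    while bits[pos] == "1":
        n += 1
        pos += 1
    return n, pos + 1
```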
[Figure: Golomb code groups — the integers are split into groups of size m: {0, …, m−1}, {m, …, 2m−1}, …]

Codeword: (unary, fixed-length)

n = qm + r = ⌊n/m⌋ m + r

q: quotient, coded in unary; r: remainder, coded with a "fixed-length" code —
k bits if m = 2^k
q   Codeword
0   0
1   10
2   110
3   1110
4   11110
5   111110
6   1111110
……

Remainder r:
m = 8: 3-bit fixed-length code: 000, 001, ……, 111
If m ≠ 2^k (not desired): truncated binary code —
⌊log2 m⌋ bits for smaller r, ⌈log2 m⌉ bits for larger r
m = 5: 00, 01, 10, 110, 111
Golomb Code with m = 5 (Golomb-5)
n q r code n q r code n q r code
0 0 0 000 5 1 0 1000 10 2 0 11000
1 0 1 001 6 1 1 1001 11 2 1 11001
2 0 2 010 7 1 2 1010 12 2 2 11010
3 0 3 0110 8 1 3 10110 13 2 3 110110
4 0 4 0111 9 1 4 10111 14 2 4 110111
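The Golomb-5 table can be reproduced with a short encoder (a sketch for m ≥ 2; names are illustrative). The remainder uses the truncated binary code described above: with k = ⌈log2 m⌉ and u = 2^k − m, the first u remainders get k−1 bits and the rest get k bits.

```python
def golomb_encode(n, m):
    """Golomb code: unary quotient + truncated-binary remainder (m >= 2)."""
    q, r = divmod(n, m)
    k = (m - 1).bit_length()          # ceil(log2(m)) for m >= 2
    u = (1 << k) - m                  # number of short (k-1 bit) remainders
    if r < u:
        rem = format(r, "0%db" % (k - 1))
    else:
        rem = format(r + u, "0%db" % k)
    return "1" * q + "0" + rem
```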
Encoding: (remainder bits: RBits = 3 for m = 8)

GolombEncode(n, RBits) {
    q = n >> RBits;
    UnaryCode(q);
    WriteBits(n, RBits);   // output the lower RBits bits of n
}

n q r code
0 0 0 0000
1 0 1 0001
2 0 2 0010
3 0 3 0011
4 0 4 0100
5 0 5 0101
6 0 6 0110
7 0 7 0111

Decoding:

GolombDecode(RBits) {
    q = UnaryDecode();
    n = (q << RBits) + ReadBits(RBits);
    return n;
}
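A direct Python transcription of the two routines (a sketch; the bitstream is modeled as a string rather than a bit reader, and RBits ≥ 1 is assumed):

```python
def golomb_encode_pow2(n, rbits):
    """Golomb code with m = 2^rbits: unary quotient + low rbits of n."""
    q = n >> rbits
    r = n & ((1 << rbits) - 1)
    return "1" * q + "0" + format(r, "0%db" % rbits)

def golomb_decode_pow2(bits, rbits):
    """Inverse of the encoder above; assumes a single codeword in `bits`."""
    q = 0
    while bits[q] == "1":             # unary quotient
        q += 1
    r = int(bits[q + 1:q + 1 + rbits], 2)
    return (q << rbits) + r
```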
In the Exp-Golomb code, the group size increases exponentially.

ExpGolombDecode() {
    GroupID = UnaryDecode();
    if (GroupID == 0) {
        return 0;
    } else {
        Base = (1 << GroupID) - 1;
        Index = ReadBits(GroupID);
        return (Base + Index);
    }
}

n   code     GroupID
0   0        0
1   100      1
2   101      1
3   11000    2
4   11001    2
5   11010    2
6   11011    2
7   1110000  3
8   1110001  3
9   1110010  3
10  1110011  3
11  1110100  3
12  1110101  3
13  1110110  3
14  1110111  3
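The matching encoder pairs with the decoder above (a sketch; names are illustrative). Value n falls in group g = ⌊log2(n+1)⌋, whose 2^g members start at base 2^g − 1:

```python
def exp_golomb_encode(n):
    """Exp-Golomb: unary group id, then a g-bit index within the group."""
    g = (n + 1).bit_length() - 1      # group id: 2^g - 1 <= n < 2^(g+1) - 1
    if g == 0:
        return "0"
    index = n + 1 - (1 << g)          # offset of n inside its group
    return "1" * g + "0" + format(index, "0%db" % g)

def exp_golomb_decode(bits):
    """Inverse of the encoder; assumes a single codeword in `bits`."""
    g = bits.index("0")               # unary-coded group id
    if g == 0:
        return 0
    return (1 << g) - 1 + int(bits[g + 1:2 * g + 1], 2)
```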
[Figure: exponential pdf f(x) = λ e^{−λx}]

Choose probabilities closest to dyadic: 1/2, 1/4, 1/8, ...
lossless_coding.m
Optimal Code for Geometric Distribution

Geometric distribution with parameter ρ: P(X = n) = ρ^n (1 − ρ)

Unary code is the optimal prefix code when ρ ≤ 1/2.
It is also optimal among all entropy coding for ρ = 1/2.

How to design the optimal code when ρ > 1/2?
Find m such that the grouped distribution is geometric with parameter ~ 1/2:
transform into a geometric distribution with ρ^m ≤ 1/2 (as close as possible).
How? By grouping m events together!
Each x can be written as x = x_q m + x_r.

P_Xq(q) = Σ_{r=0}^{m−1} P_X(qm + r) = Σ_{r=0}^{m−1} (1 − ρ) ρ^{qm+r}
        = (1 − ρ) ρ^{qm} (1 − ρ^m)/(1 − ρ) = ρ^{mq} (1 − ρ^m)

x_q has geometric distribution with parameter ρ^m.
Unary code is optimal for x_q if ρ^m ≤ 1/2, i.e.
m ≥ −1/log2 ρ; m = ⌈−1/log2 ρ⌉ is the minimal possible integer.
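The grouping step can be checked numerically (a sketch; `min_group_size` is an illustrative name). It computes the minimal m and verifies that summing the m probabilities of group q indeed gives ρ^{mq}(1 − ρ^m):

```python
from math import ceil, log2

def min_group_size(rho):
    """Smallest m with rho^m <= 1/2, i.e. m = ceil(-1 / log2(rho))."""
    return max(1, ceil(-1.0 / log2(rho)))

rho = 0.9
m = min_group_size(rho)               # m = 7, since 0.9^7 < 1/2 < 0.9^6
# grouped distribution is geometric with parameter rho^m:
q = 3
lhs = sum((1 - rho) * rho ** (q * m + r) for r in range(m))
rhs = rho ** (m * q) * (1 - rho ** m)
```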
Golomb Parameter Estimation (J2K book: pp. 55)

E(x) = ρ/(1 − ρ)

ρ^m ≤ 1/2  ⇒  m = 2^k ≥ (1/2) E(x)

k = max{ 0, ⌈ log2( (1/2) E(x) ) ⌉ }
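A sketch of the rule above (the function name is illustrative): estimate E(x), e.g. by a running sample mean, then pick the smallest k with 2^k ≥ E(x)/2.

```python
from math import ceil, log2

def golomb_k(mean):
    """k = max{0, ceil(log2(E(x)/2))}, so that m = 2^k >= E(x)/2."""
    return max(0, ceil(log2(mean / 2.0))) if mean > 0 else 0

k = golomb_k(10.0)    # E(x) = 10 -> k = 3, i.e. m = 8 >= 5
```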