Types of Coding
- Source Coding - Code data to more efficiently represent the information
  - Reduces size of data
  - Analog - Encode analog source data into a binary format
  - Digital - Reduce the size of digital source data
- Channel Coding - Code data for transmission over a noisy communication channel
  - Increases size of data
  - Digital - Add redundancy to identify and correct errors
  - Analog - Represent digital values by analog signals
Information theory was developed by Claude Shannon.
Entropy
H(X) = -\sum_{i=0}^{M-1} p_i \log_2 p_i = -E[\log_2 p_X(X)]

H(X) has units of bits.
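As a quick illustration (a minimal sketch, not part of the original notes), the entropy of a discrete pmf can be computed numerically as follows:

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete pmf p (zero-probability terms contribute 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 * log2(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))

# Example: a 4-symbol source
print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits
```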
The conditional probability of X given Y is

p(i|j) = \frac{p(i, j)}{\sum_{k=0}^{M-1} p(k, j)}

and the conditional entropy of X given Y is

H(X|Y) = -\sum_{i=0}^{M-1} \sum_{j=0}^{M-1} p(i, j) \log_2 p(i|j) = -E[\log_2 p(X|Y)]
The mutual information between X and Y is given by
I(X; Y) = H(X) - H(X|Y)
The mutual information is the reduction in uncertainty of
X given Y .
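A minimal sketch (not from the notes) of computing I(X; Y) from a joint pmf, using the equivalent form I(X; Y) = \sum_{i,j} p(i,j) \log_2 [p(i,j) / (p(i) p(j))]:

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) = H(X) - H(X|Y) in bits, from a joint pmf given as a 2-D array."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    mask = p_xy > 0
    # Equivalent form: sum_{i,j} p(i,j) log2( p(i,j) / (p(i) p(j)) )
    return float(np.sum(p_xy[mask] *
                        np.log2(p_xy[mask] / np.outer(p_x, p_y)[mask])))

# Example: X and Y perfectly correlated -> I(X;Y) = H(X) = 1 bit
print(mutual_information([[0.5, 0.0],
                          [0.0, 0.5]]))
```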
A Code
Definition: A code is a mapping from the discrete set of symbols {0, ..., M-1} to finite binary sequences.

For each symbol m there is a corresponding finite binary sequence (codeword), denoted here by c_m.

|c_m| is the length of the binary sequence c_m.

The average codeword length is

\bar{n} = \sum_{m=0}^{M-1} |c_m| p_m
Example for M = 4:

  m   c_m      |c_m|
  0   01       2
  1   10       2
  2   0        1
  3   100100   6
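A small sketch of computing the average codeword length \bar{n} = \sum_m |c_m| p_m for the table above, assuming (for illustration only; the notes do not specify them) equal symbol probabilities p_m = 1/4:

```python
# Codewords from the M = 4 example above, with assumed equal
# probabilities p_m = 1/4 (not given in the notes).
codewords = {0: "01", 1: "10", 2: "0", 3: "100100"}
p = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

n_bar = sum(len(codewords[m]) * p[m] for m in codewords)
print(n_bar)   # (2 + 2 + 1 + 6) / 4 = 2.75 bits per symbol
```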
The average number of bits per symbol is

\bar{n} = E[|c_{X_n}|] = \sum_{m=0}^{M-1} |c_m| p_m \ge H(X_n)
Question: Can we achieve this bound?
Answer: Yes! Constructive proof using Huffman codes
Huffman Codes
- Variable length prefix code
- Uniquely decodable
Basic idea: assign shorter codewords to more probable symbols.
Basic algorithm: repeatedly merge the two least probable symbols (nodes) into a single node whose probability is their sum, until only the root remains; codewords are read off the resulting binary tree from root to leaf (see the sketch after the example below).
[Figure: Huffman tree construction example for a source with symbol probabilities 0.4, 0.2, 0.12, 0.08, 0.08, 0.07, 0.04, 0.01. At each stage the two least probable nodes are merged (0.04 + 0.01 = 0.05, 0.05 + 0.07 = 0.12, 0.08 + 0.08 = 0.16, 0.12 + 0.12 = 0.24, 0.16 + 0.2 = 0.36, 0.24 + 0.36 = 0.6, 0.4 + 0.6 = 1.0 at the root). Labeling the branches 0/1 from the root gives codewords such as 1, 001, 010, 0111, 0110, 0001, 00001, 00000.]
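A minimal sketch of the Huffman construction using Python's heapq, applied to the probabilities from the figure above (the symbol labels and the 0/1 branch assignment are arbitrary; only the codeword lengths matter for the rate):

```python
import heapq
import itertools

def huffman_code(probs):
    """Build a Huffman code; probs is a dict {symbol: probability}."""
    counter = itertools.count()          # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, code1 = heapq.heappop(heap)   # two least probable nodes
        p2, _, code2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in code1.items()}
        merged.update({s: "1" + c for s, c in code2.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

# Probabilities from the example above
probs = {1: 0.4, 2: 0.2, 3: 0.12, 4: 0.08, 5: 0.08, 6: 0.07, 7: 0.04, 8: 0.01}
code = huffman_code(probs)
n_bar = sum(len(code[s]) * p for s, p in probs.items())
print(code)
print("average length:", n_bar)   # about 2.53 bits/symbol, close to the entropy of about 2.46
```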
Coding in Blocks
We can code blocks of symbols to achieve a bit rate that
approaches the entropy of the source symbols.
\underbrace{X_0, \ldots, X_{m-1}}_{Y_0}, \underbrace{X_m, \ldots, X_{2m-1}}_{Y_1}, \ldots

So we have that

Y_n = (X_{nm}, \ldots, X_{(n+1)m-1})

where

Y_n \in \{0, \ldots, M^m - 1\}
The number of bits per symbol X_n is given by \bar{n}_x = \bar{n}_y / m, where \bar{n}_y is the number of bits per symbol for a Huffman code of Y_n.

Then we have that

H(Y_n) \le \bar{n}_y < H(Y_n) + 1

\frac{1}{m} H(Y_n) \le \frac{\bar{n}_y}{m} < \frac{1}{m} H(Y_n) + \frac{1}{m}

H(X_n) \le \bar{n}_x < H(X_n) + \frac{1}{m}

(using H(Y_n) = m H(X_n) for i.i.d. X_n)

H(X_n) \le \lim_{m \to \infty} \bar{n}_x \le H(X_n)
A Huffman coder can achieve this performance, but it requires a large block size.
As m becomes large, M^m becomes very large, so large blocks are not practical.
This assumes that Xn are i.i.d., but a similar result holds
for stationary and ergodic sources.
Arithmetic coders can be used to achieve this bitrate in
practical situations.
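A minimal sketch (with a hypothetical three-symbol i.i.d. source, not from the notes) showing that Huffman coding blocks of m symbols drives the bits per symbol toward the entropy:

```python
import heapq, itertools, math

def huffman_avg_length(probs):
    """Average codeword length (bits) of a Huffman code for the pmf 'probs'."""
    # Standard trick: each merge adds the merged probability to the average length,
    # since every symbol under the merged node gains one bit.
    heap = [(p, i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    counter = itertools.count(len(probs))
    avg = 0.0
    while len(heap) > 1:
        p1, _ = heapq.heappop(heap)
        p2, _ = heapq.heappop(heap)
        avg += p1 + p2
        heapq.heappush(heap, (p1 + p2, next(counter)))
    return avg

p = [0.7, 0.2, 0.1]                        # hypothetical i.i.d. source
H = -sum(q * math.log2(q) for q in p)      # entropy per symbol, about 1.157 bits
for m in (1, 2, 3, 4):
    block_pmf = [math.prod(c) for c in itertools.product(p, repeat=m)]
    rate = huffman_avg_length(block_pmf) / m
    print(f"m={m}: {rate:.4f} bits/symbol  (entropy {H:.4f})")
```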
If more than 2^b repetitions occur, then the repetition is broken into segments.

[Figure: a binary sequence segmented into runs (e.g. 00, 111, 00000000), with each run represented by its b-bit run length.]
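A minimal sketch of run-length encoding with a segment limit; here runs longer than 2^b - 1 are split so each length fits in b bits (the notes' exact segmenting convention may differ slightly):

```python
def run_length_encode(bits, b=8):
    """Encode a binary sequence as (bit value, run length) pairs.
    Runs longer than 2**b - 1 are broken into segments so each length fits in b bits."""
    max_run = 2 ** b - 1
    out = []
    i = 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1
        run = j - i
        while run > max_run:                 # split an over-long repetition
            out.append((bits[i], max_run))
            run -= max_run
        out.append((bits[i], run))
        i = j
    return out

seq = [0, 0, 1, 1, 1] + [0] * 300
print(run_length_encode(seq, b=8))   # [(0, 2), (1, 3), (0, 255), (0, 45)]
```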
Algorithm
[Block diagram: Encoder - the input X_s passes through a causal predictor; the prediction is XORed with X_s, and the resulting binary error sequence is run-length encoded and then Huffman coded. Decoder - Huffman decoding, run-length decoding, and XOR with the same causal prediction recover X_s.]
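A toy 1-D sketch of the encoder/decoder chain in the block diagram above, using the previous sample as a hypothetical causal predictor and omitting the Huffman stage:

```python
def encode(x):
    """Sketch of the encoder chain: causal prediction (previous sample),
    XOR to form the prediction-error bits, then run-length encode the errors."""
    pred = 0
    errors = []
    for xs in x:
        errors.append(xs ^ pred)   # XOR with the causal prediction
        pred = xs                  # hypothetical predictor: previous sample
    runs, i = [], 0                # run-length encode the (mostly 0) error sequence
    while i < len(errors):
        j = i
        while j < len(errors) and errors[j] == errors[i]:
            j += 1
        runs.append((errors[i], j - i))
        i = j
    return runs                    # in the full scheme these runs would be Huffman coded

def decode(runs):
    """Inverse chain: run-length decode, then XOR with the same causal prediction."""
    errors = [v for v, n in runs for _ in range(n)]
    pred, x = 0, []
    for e in errors:
        xs = e ^ pred
        x.append(xs)
        pred = xs
    return x

x = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
runs = encode(x)
print(runs)
print(decode(runs) == x)   # True: the decoder recovers x exactly
```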
For each pixel s in S, let x_s be the pixel value and z_s the causal prediction (context), and let h(i, j) count the number of pixels s in S with x_s = i and z_s = j. The conditional probability is then estimated from the histogram as

p(x_s = i | z_s = j) = \frac{h(i, j)}{\sum_{k=0}^{1} h(k, j)} = \frac{h(i, j)}{N(j)}

where N(j) = \sum_{k} h(k, j) is the number of pixels with context j.
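A minimal sketch (with hypothetical data) of estimating p(x_s = i | z_s = j) from the counts h(i, j):

```python
import numpy as np

def conditional_prob(x, z):
    """Estimate p(x_s = i | z_s = j) for binary x from joint counts h(i, j)."""
    x = np.asarray(x)
    z = np.asarray(z)
    n_contexts = int(z.max()) + 1
    h = np.zeros((2, n_contexts))
    for xs, zs in zip(x, z):           # h(i, j) = number of pixels with x_s = i, z_s = j
        h[xs, zs] += 1
    return h / h.sum(axis=0, keepdims=True)   # divide by N(j) = sum_k h(k, j)

# Hypothetical pixel values and their causal predictions (contexts)
x = [0, 0, 1, 1, 0, 1, 0, 0]
z = [0, 0, 1, 1, 0, 0, 1, 1]
print(conditional_prob(x, z))
```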
[Block diagram: the same encoder and decoder structure as above, with a causal histogram estimation block added on both the encoder and decoder sides to drive the coder with the estimated conditional probabilities.]
Distortion

Let X and Z be random vectors in ℝ^M. Intuitively, X is the original image/data and Z is the decoded image/data.

Assume we use the squared error distortion measure given by

d(X, Z) = ||X - Z||^2

Then the distortion is given by

D = E[||X - Z||^2]

This actually applies to any quadratic norm error distortion measure, since we can define \tilde{X} = AX and \tilde{Z} = AZ. So

\tilde{D} = E[||\tilde{X} - \tilde{Z}||^2] = E[(X - Z)^t B (X - Z)]

where B = A^t A.
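A small sketch (with hypothetical random X, Z, and transform A) checking numerically that the transformed squared error equals the weighted quadratic form with B = A^t A:

```python
import numpy as np

rng = np.random.default_rng(0)

M = 3
A = rng.normal(size=(M, M))        # hypothetical transform defining the quadratic norm
B = A.T @ A                        # B = A^t A

X = rng.normal(size=(1000, M))                  # "original" vectors
Z = X + 0.1 * rng.normal(size=(1000, M))        # "decoded" vectors (original plus error)

E = X - Z
# Distortion under the transformed squared error ||AX - AZ||^2 ...
d1 = np.mean(np.sum((E @ A.T) ** 2, axis=1))
# ... equals the weighted quadratic form (X - Z)^t B (X - Z)
d2 = np.mean(np.einsum('ni,ij,nj->n', E, B, E))
print(d1, d2)                      # the two estimates agree
```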
X^{(N)} = (X_0, \ldots, X_{N-1})
Z^{(N)} = (Z_0, \ldots, Z_{N-1})

Encoder function: Y = Q(X_0, \ldots, X_{N-1})
Decoder function: (Z_0, \ldots, Z_{N-1}) = f(Y)

Resulting quantities:

Bit-rate = \frac{K}{N}

Distortion = \frac{1}{N} \sum_{n=0}^{N-1} E[||X_n - Z_n||^2]
Differential Entropy
Notice that the information contained in a Gaussian random variable is infinite, so the conventional entropy H(X)
is not defined.
Let X be a random vector taking values in ℝ^M with density function p(x). Then we define the differential entropy of X as

h(X) = -\int_{\mathbb{R}^M} p(x) \log_2 p(x) \, dx = -E[\log_2 p(X)]

h(X) has units of bits.
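A minimal sketch (not from the notes) comparing the numerical differential entropy of a scalar Gaussian against the known closed form (1/2) log_2(2 pi e sigma^2):

```python
import numpy as np

sigma = 2.0
x = np.linspace(-12 * sigma, 12 * sigma, 200001)
dx = x[1] - x[0]
p = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Numerical h(X) = -integral p(x) log2 p(x) dx for a scalar Gaussian
h_numeric = -np.sum(p * np.log2(p)) * dx
# Known closed form for a Gaussian: (1/2) log2(2 pi e sigma^2)
h_closed = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
print(h_numeric, h_closed)   # both about 3.047 bits
```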
Similarly, the conditional differential entropy of X given Y is

h(X|Y) = -E[\log_2 p(X|Y)]

The mutual information between X and Y is given by

I(X; Y) = h(X) - h(X|Y) = I(Y; X)
Important: The mutual information is well defined for
both continuous and discrete random variables, and it represents the reduction in uncertainty of X given Y .
Bit-rate = \frac{K}{N} \le R

and

Distortion = \frac{1}{N} \sum_{n=0}^{N-1} E[||X_n - Z_n||^2] \le D
Comments:
- One can achieve a bit rate arbitrarily close to R(D) at a distortion D.
- Proof is constructive (but not practical), and uses codes that are randomly distributed in the space ℝ^{MN} of source symbols.
R(\epsilon) = \max\left\{ \frac{1}{2} \log_2 \frac{\sigma^2}{\epsilon}, \, 0 \right\}

D(\epsilon) = \min\{\epsilon, \sigma^2\}

Intuition: [Figure: the resulting rate-distortion curve, plotting distortion versus rate.]
For a Gaussian random vector with independent components of variances \sigma_0^2, \ldots, \sigma_{N-1}^2:

R(\epsilon) = \sum_{n=0}^{N-1} \max\left\{ \frac{1}{2} \log_2 \frac{\sigma_n^2}{\epsilon}, \, 0 \right\}

D(\epsilon) = \sum_{n=0}^{N-1} \min\{\sigma_n^2, \epsilon\}

Intuition: [Figure: the per-component distortion allocation \min\{\sigma_n^2, \epsilon\}.]
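A minimal sketch (with hypothetical component variances) of evaluating the parametric rate and distortion above, i.e. the reverse water-filling solution:

```python
import numpy as np

def gaussian_rate_distortion(variances, eps):
    """Parametric (reverse water-filling) rate and distortion for independent
    Gaussian components with the given variances."""
    variances = np.asarray(variances, dtype=float)
    rate = np.sum(np.maximum(0.5 * np.log2(variances / eps), 0.0))
    dist = np.sum(np.minimum(variances, eps))
    return rate, dist

sigma2 = [4.0, 1.0, 0.25]          # hypothetical component variances
for eps in (0.1, 0.25, 1.0):
    R, D = gaussian_rate_distortion(sigma2, eps)
    print(f"eps={eps}: R={R:.3f} bits, D={D:.3f}")
```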
Analysis: We know that we can always represent the covariance in the form

R = T \Lambda T^t

where the columns of T are the eigenvectors of R, and \Lambda = \mathrm{diag}\{\sigma_0^2, \ldots, \sigma_{N-1}^2\} is a diagonal matrix of eigenvalues. We can then decorrelate the Gaussian random vector with the following transformation:

\tilde{X} = T^t X

From this we can see that \tilde{X} has the covariance matrix given by

E[\tilde{X} \tilde{X}^t] = E[T^t X X^t T] = T^t E[X X^t] T = T^t R T = \Lambda

So therefore, \tilde{X} meets the conditions of Example 2. Also, we see that the component variances of \tilde{X} are the eigenvalues \sigma_0^2, \ldots, \sigma_{N-1}^2.
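A small sketch (with a hypothetical covariance R) verifying numerically that the transform \tilde{X} = T^t X decorrelates the vector, i.e. that the covariance of \tilde{X} is the diagonal matrix of eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a hypothetical symmetric positive definite covariance R
A = rng.normal(size=(4, 4))
R = A @ A.T + 4 * np.eye(4)

# Eigendecomposition R = T Lambda T^t (columns of T are eigenvectors)
eigvals, T = np.linalg.eigh(R)

# Decorrelate samples: X ~ N(0, R), Xtilde = T^t X
X = rng.multivariate_normal(np.zeros(4), R, size=200000)
Xt = X @ T                               # each row is T^t x for that sample

cov_Xt = np.cov(Xt, rowvar=False)        # approximately diag(eigenvalues)
print(np.round(cov_Xt, 2))
print(np.round(eigvals, 2))
```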
D(\epsilon) = \sum_{n=0}^{N-1} \min\{\sigma_n^2, \epsilon\}

where \sigma_0^2, \ldots, \sigma_{N-1}^2 are the eigenvalues of R.
Intuition:
An optimal code requires that the components of a vector be decorrelated before source coding.
For a stationary Gaussian random process with power spectral density S_x(\omega):

D(\epsilon) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \min\{S_x(\omega), \epsilon\} \, d\omega

Intuition: [Figure: distortion versus rate for the process.]
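A minimal sketch of evaluating D(\epsilon) numerically from a power spectral density; the AR(1)-style spectrum used here is a hypothetical example, not from the notes:

```python
import numpy as np

def distortion(eps, omega, Sx):
    """D(eps) = (1/2pi) * integral of min{Sx(w), eps} over [-pi, pi], evaluated numerically."""
    integrand = np.minimum(Sx, eps)
    dw = omega[1] - omega[0]
    return np.sum(integrand) * dw / (2 * np.pi)

# Hypothetical power spectral density of an AR(1)-like process (unit variance)
omega = np.linspace(-np.pi, np.pi, 4001)
rho = 0.9
Sx = (1 - rho**2) / (1 - 2 * rho * np.cos(omega) + rho**2)

for eps in (0.05, 0.2, 1.0):
    print(f"eps={eps}: D={distortion(eps, omega, Sx):.4f}")
```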
The discrete cosine transform (DCT) of an N-point signal f(n) is

F(k) = \frac{c(k)}{\sqrt{N}} \sum_{n=0}^{N-1} f(n) \cos\left( \frac{\pi (2n+1) k}{2N} \right)

with inverse

f(n) = \sum_{k=0}^{N-1} \frac{c(k)}{\sqrt{N}} F(k) \cos\left( \frac{\pi (2n+1) k}{2N} \right)

where

c(k) = \begin{cases} 1 & k = 0 \\ \sqrt{2} & k = 1, \ldots, N-1 \end{cases}

Comments:

In this form, the DCT is an orthonormal transform. So if we define the matrix F such that

F_{n,k} = \frac{c(k)}{\sqrt{N}} \cos\left( \frac{\pi (2n+1) k}{2N} \right),

then F^{-1} = F^H, where F^H denotes the Hermitian (conjugate) transpose; since F is real,

F^{-1} = F^t = F^H
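A small sketch constructing the DCT matrix F with the normalization above and checking that it is orthonormal (so F^{-1} = F^t):

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT matrix with entries F[n, k] = c(k)/sqrt(N) * cos(pi*(2n+1)*k/(2N))."""
    n = np.arange(N).reshape(-1, 1)
    k = np.arange(N).reshape(1, -1)
    c = np.where(k == 0, 1.0, np.sqrt(2.0))
    return (c / np.sqrt(N)) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))

F = dct_matrix(8)
print(np.allclose(F.T @ F, np.eye(8)))   # True: F is orthonormal, so F^{-1} = F^t
```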
The DCT can be computed using the DFT of a zero-padded sequence. Let f_p(n) be f(n) zero-padded to length 2N, and let F_p(k) be its 2N-point DFT. Then

F(k) = \frac{c(k)}{\sqrt{N}} \sum_{n=0}^{N-1} f(n) \cos\left( \frac{\pi (2n+1) k}{2N} \right)

= \frac{c(k)}{\sqrt{N}} \sum_{n=0}^{N-1} f(n) \, \mathrm{Re}\left\{ e^{-j \frac{\pi k}{2N}} e^{-j \frac{2\pi n k}{2N}} \right\}

= \mathrm{Re}\left\{ e^{-j \frac{\pi k}{2N}} \frac{c(k)}{\sqrt{N}} \sum_{n=0}^{2N-1} f_p(n) e^{-j \frac{2\pi n k}{2N}} \right\}

= \frac{c(k)}{\sqrt{N}} \, \mathrm{Re}\left\{ e^{-j \frac{\pi k}{2N}} F_p(k) \right\}

= \frac{c(k)}{2\sqrt{N}} \left( F_p(k) e^{-j \frac{\pi k}{2N}} + \bar{F}_p(k) e^{+j \frac{\pi k}{2N}} \right)

= \frac{c(k) e^{-j \frac{\pi k}{2N}}}{2\sqrt{N}} \left( F_p(k) + \bar{F}_p(k) e^{+j \frac{2\pi k}{2N}} \right)
Example for N = 4:

f(n) = [f(0), f(1), f(2), f(3)]
f_p(n) = [f(0), f(1), f(2), f(3), 0, 0, 0, 0]
f_p(2N-1-n) = [0, 0, 0, 0, f(3), f(2), f(1), f(0)]
f_p(n) + f_p(2N-1-n) = [f(0), f(1), f(2), f(3), f(3), f(2), f(1), f(0)]
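A minimal sketch (not from the notes) checking the identity derived above: the DCT obtained from the 2N-point FFT of the zero-padded sequence matches the direct cosine-sum definition.

```python
import numpy as np

def dct_direct(f):
    """Direct orthonormal DCT: F(k) = c(k)/sqrt(N) * sum_n f(n) cos(pi(2n+1)k/(2N))."""
    N = len(f)
    n = np.arange(N)
    c = np.where(np.arange(N) == 0, 1.0, np.sqrt(2.0))
    return np.array([c[k] / np.sqrt(N) *
                     np.sum(f * np.cos(np.pi * (2 * n + 1) * k / (2 * N)))
                     for k in range(N)])

def dct_via_fft(f):
    """DCT from the 2N-point FFT of the zero-padded sequence, using
    F(k) = c(k)/sqrt(N) * Re{ e^{-j pi k/(2N)} F_p(k) }."""
    N = len(f)
    fp = np.concatenate([f, np.zeros(N)])        # zero-pad to length 2N
    Fp = np.fft.fft(fp)                          # 2N-point DFT
    k = np.arange(N)
    c = np.where(k == 0, 1.0, np.sqrt(2.0))
    return c / np.sqrt(N) * np.real(np.exp(-1j * np.pi * k / (2 * N)) * Fp[:N])

f = np.array([1.0, 2.0, 3.0, 4.0])
print(np.allclose(dct_direct(f), dct_via_fft(f)))   # True
```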