0% found this document useful (0 votes)

130 views87 pages

Fundamentals of Vector Quantization

This document provides an overview of fundamentals of vector quantization. It discusses key concepts such as: - The Shannon compression model of encoding, rate, and distortion. - Properties of optimal quantizers including achieving minimum distortion for a given rate, or minimum rate for a given distortion. - Design of structured vector quantizers using techniques like partitioning the input space into cells and assigning codewords. - Theoretical limits on achievable performance as determined by rate-distortion theory and source coding theory. Key concepts discussed include the rate-distortion function and operational rate-distortion function.

Uploaded by

monacer_ericson

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

130 views87 pages

Fundamentals of Vector Quantization

Uploaded by

monacer_ericson

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 87

Fundamentals of

Vector Quantization
Robert M. Gray
Information Systems Laboratory
Department of Electrical Engineering
http://www-isl.stanford.edu/~gray/compression.html

The Shannon compression model:

codes, rate, and distortion
Optimalility properties and quantizer design
Structured vector quantizers
Optimal achievable performance:
quantization and source coding theory

Part I:
codes, rate, and distortion

Input
Signal

Encoder

Channel

Decoder

Reconstructed
Signal

Figure 1

Classic Shannon model of point-to-point

communication system
General Goal: Given the signal and the
channel, find an encoder and decoder which give
the best possible reconstruction.
To formulate as precise problem, need
probabilistic descriptions of signal and channel
(parametric model or sample training data)
possible structural constraints on form of
codes
(block, sliding block, recursive)

quantifiable notion of what good or bad

reconstruction is (MSE, Pe)
Mathematical: Quantify what best or optimal
achievable performance is.
Practical: How do you build systems that are
good, if not optimal?
How measure performance?
SNR, MSE, Pe, bit rates, complexity, cost

Assumptions:
Signal is discrete time or space (e.g., already
sampled).
Do not separate out signal decompositions,
i.e., assume either done already or to be done
as part of the code.
Consider a code structure that maps blocks or
vectors of input data into possibly variable
length binary strings.
Later consider recursive code structures
Discrete-time random process input signal: {Xn}
Xn A <k ; distribution PX
X k = (X0, X1, . . . , Xk1), Xn <
Usually assume some form of stationarity (strict,
asymptotic, etc.)
If not stationary: handle with universal or
adaptive codes.
4

Encoder
An encoder (or source encoder) is a mapping
of the input vectors into a collection W of finite
length binary sequences.
: A W {0, 1}.
W = channel codebook , members are channel
codewords.
set of binary sequences that will be stored or
transmitted
Assume that W satisfies the prefix condition so
that it is uniquely decodable.
Given an i {0, 1}, define
l(i) = length of binary vector i
instantaneous rate r(i) = l(i)
k bits/input
symbol.
Average rate R(, W) = E[r((X))].

An encoder is said to be fixed length or fixed

rate if all channel codewords have the same
length, i.e., if l(i) = Rk for all i W.
Variable rate codes may require data
buffering, expensive and can overflow and
underflow
Harder to synchronize variable-rate codes.
Channel bit errors can have catastrophic
effects.
But variable rate codes can provide superior
rate/distortion tradeoffs.
E.g., in image compression can use more bits for
edges, fewer for flat areas. In voice compression,
more bits for plosives, fewer for vowels.

Define a source decoder : W A

usually A = A
Decoder is a table lookup.
Define the reproduction codebook
C {(i); i W}
members of C called reproduction codewords or
templates.
Convenient to reindex codebook using integers as
C {l ; l = 0, 1, . . . , M 1}
where M = ||W||=number of reproduction
codewords
A source code or compression code for the
source {Xn} consists of a triple (, W, ).

AWC
or, equivalently,

X i = (X) X = Q(x) = ((X)).

ENCODER
-

in = (Xn)

Xn
DECODER
in

n = (in)
X
General block memoryless source code
Later consider codes with memory, but general
block might operate in local nonmemoryless
fashion.
invertible or noiseless or lossless if ((x)) = x
lossy if it is not lossless.
Require a measure of In this case a measure of
distortion d to quantify how lossy.

Quality and Cost

Distortion
Distortion measure d(x, x) measures the
distortion or loss resulting if an original input x
is reproduced as x.
Mathematically: A distortion measure satisfies
d(x, x) 0
To be useful, d should be
easy to compute
tractable
meaningful for perception or application.
No single distortion measure accomplishes all of
these goals. Most common is MSE:
k1
X
2
|xl yl |2
d(x, y) = ||x y|| =
l=0

Weighted or transform/weighted versions are

used for perceptual coding. In particular:
Input-weighted squared error:

= (X X)
BX (X X),
d(X, X)
BX positive definite.
most common BX = I,
= ||X X||
2 (MSE)
d(X, X)
Other measures: Bx = x2 I
Performance of a compression system measured
by expected values of the distortion and rate.
D(, W, ) = D(, ) = E[d(X, ((X))]
1
R(, W, ) = R(, W) = E(r(X)) = E(l((X)))
k

? Every code yields point in distortion/rate

plane: (R, D).
Average Distortion
6

f
LL
L
L
L

v
L
L

v
L v
L
L
L
L
L
v
L
v
Lv
v
@
v
@
@
v
v
v
@
v
v
@
v
v
@v
PP
PP
v
v
PP
PP
Pv
hhhh
hhh v
hhv
-

1.5 2 2.5 3.0

Average Rate
D(, ) and R(, W) measure costs, want to
minimize both Tradeoff
Given all else equal, one code is better than
another if it has lower rate or lower distortion.
11

Interested in undominated points in D-R plane:

For given rate (distortion) want smallest possible
distortion (rate) optimal codes
Optimization problem:
Given R, what is smallest possible D?
Given D, what is smallest possible R?
Lagrangian formulation: What is smallest
possible D + R?

Make more precise and carefully describe the

various optimization problems.
Rate-distortion approach: Constrain
R(, W) R. Then optimal code (, W, )
minimizes D(, ) over all allowed codes.
operational rate-distortion function

R(, W).
R(D)
=
inf
,W,:D(,)D

Distortion-rate approach: Constrain

D(, ) D. Then optimal code (, W, )
minimizes R(, W) over all allowed codes.
operational distortion-rate function

D(, ).
D(R)
=
inf
,W,:R(,W)R

Lagrangian approach: Fix Lagrangian

multiplier .
Optimal code (, W, ) minimizes
J(, W, ) = D(, ) + R(, W)
over all allowed codes.
operational Lagrangian distortion function
J = inf E(X, ((X))) =
,W,

inf [D(, ) + R(, W)].

,W,

First two problems are duals, all three are

equivalent. E.g., Lagrangian approach yields R-D
for some D or D-R for some R.

Lagrangian approach effectively uncostrained

minimization of modified distortion
J = E(X, (X)) where
(X, (X)) = d(X, ((X)) + l((X))
Extreme points:

As 0 distortion 0, rate R(0)

=??

As rate 0 distortion D(0)

=??

Easy: D(R)
and R(D)
are monontonically
nonincreasing in their arguments.
Note: usually wish to optimize over constrained
subset of computationally reasonable codes, or
implementable codes.
Examples: Fixed rate codes, tree-structured
codes, product codes
Introduce structured codes later, but mention
fixed rate codes now.

Fixed Rate Codes

Important special case: Fixed rate (length)
codes.
Require all words in W = range space of W to
have equal length.
Then r(X) constant and problem simplified.
Eases buffering requirments when use fixed-rate
transmission media
Eases effects of channel errors
In fixed rate case, minimizing modified distortion
equivalent to minimizing ordinary distortion.
Lagrangian = distortion-rate if fixed-rate

Useful to consider an alternative description of a

source code:
Idea: use a neutral index set: set of integers
I = {0, 1, . . . , M 1}
and refer everything to it.
Now:
Encoder
:AI
Decoder : I A
Index decoder : I {0, 1}

E.g., given (, W, ), index W = {i; i I}.

Then (i) i,
are
Representations (, W, ) and (,
, )
equivalent:
(X));
((X)) = (

(X) = ((X))

A
I{z } W | {zI } C.
|

Storage cost: instantaneous rate

1
1

r(x) = l((x)) = l(((x))

k
k
Average rate:
= R(,

R(,
, )
) = k1 E[l(((X))],
(x))
distortion: d(x, (

= d(x, ((x)))
average distortion:
= E[d(X, (
(X))],
D(, W, ) = D(,
)

Encoder partition S = {Si; i I}

= i}; i I.
Si = {x : (x)
Partitions input space into disjoint, exhaustive
collection of cells
Define

1 if x F
1F (x) =
0 otherwise.

Then
(x) =

(i)1Si (x); (x)

i1Si (x)

and
(x))
((x)) = (

S (x)
(i)1
i

Optimalility Properties
and Quantizer Design
Extreme Points
0: all of the emphasis on distortion
force 0 average distortion
Minimize rate only as an afterthought.
(0, R) 0 distortion, corresponds to lossless codes

What is smallest R giving a lossless code? R(0)

Shannons noiseless coding theorem:
1
1
1

H(X) R(0) < H(X) +

k
k
k
Where H(X) = Px Pr(X = x) log2 Pr(X = x)
if X discrete, H(X) = supquantizers Q H(X)
otherwise.
Huffman code achieves minimum
Note: H = if X continuous.
20

All emphasis on rate, force 0 average rate
minimize distortion as an afterthought.
(D, 0): 0 rate, no bits communicated.

What is the best possible 0 rate code? D(0)

(Useless in practice, but provides key ideas for
general case in very simple setting.)
Given 0 rate, channel codeword is empty
sequence, length 0.
Only parameter: decoder

Optimal peformance: D(0)

= inf yA E[d(X, y)]
achieved by codebook with single word
minyA1E[d(X, y)]
Follows from simple inequalities

Example, A = A = <k , squared error distortion,

i.e.,
d(x, y) = ||x y||2 = (x y)t(x y)

Then D(0)
= inf y E[||X y||2] achieved by

miny 1E[||X y||2] = E(X)

Center of mass or centroid.

e.g.,
Input-weighted squared error

= (X X)
BX (X X)
d(X, X)
centroid is E[BX ]1E[BX X].

Empirical Distributions
Training set or learning set
L = {xi; i = 0, 1, . . . , L 1}
Empirical distribution is
1 L1
#i : xi F
X
=
1F (xi)
PL(F ) =
L
L i=0
Find the vector y that minimizes
1 L1
X
2
E[(X y) ] =
||X y||2.
L n=0
Answer = expectation
1 L1
X
xi ,
y=
L n=0
the Euclidean center of gravity of the collection
of vectors in L.
the sample mean or empirical mean.

Problem: Find optimal code for non-extreme .

Approach: find a code improvement algorithm.
or (, W, ) is a source
Suppose that (,
, )
code for a stationary source {X(n)} with
distribution PX .
Fact: If fix two of three components of code
then can describe optimal third
(,
, ),
component.
Yields necessary conditions for an optimal code
and a code improvement algorithm.
Will yield descent design algorithm for complete
code

Optimal Decoder for given encoder and

index decoder
Given
& the optimal decoder is
= min 1E[d(X, y)|(X)

= i]; i I
(i)
yA

Lloyd centroids Si w.r.t. PX .

Proof: Apply 0 rate result to conditional
distribution
(X)))]
+ R(,

= D(,
)
) =
E[(X, (
X

E[(X, (i))|
(X)

= i]PX (Si)+R(,
)

iI
X

PX (Si) min
E[(X, y)|(X)

= i]+R(,
)
y

If squared error distortion conditional mean

E[X|(X)

= i].
If empirical distribution conditional sample
average
For more general input-weighted squared error:

= i]1E[BX X|(X)

= i].
xi = E[BX |(X)
25

Optimal Index Decoder for given

Encoder and Decoder
Given
& optimal is optimal lossless code
for (X),

e.g., a Huffman code.

Proof:
(X)))]
+ R(,

= D(,
)
)
E[(X, (
+ E[l((W ))]
= D(,
)
+ min E[l(((X)))]
= D(,
)

where W = (X)

& minimum is over all

prefix-free lossless codes
Can assume that a Huffman code can
approximately achieve the Shannon bound, i.e.,
that

l(((X)))

log2 Pr((X))
Replace average length by the entropy H((X))

entropy-constrained VQ.

Optimal Encoder given Decoder and

Index Decoder
Encoder is only component depending on all
other components.
Given & , the optimal encoder is the
minimum (modified) distortion encoder
(generalized nearest neighbor)

+ l((i))].
(x)

= min1[d(x, (i))
iI

Proof:
Z

E[(X, ((X)))] =
dPX (x)[d(x, ((x))) + l((x))]
Z
dPX (x) min[d(x, i) + l(i)]
iI

If code fixed rate, then l(i) = R all i and this

reduces to the usual minimum MSE rule
(Euclidean nearest neighbor)

(x)

= min1d(x, (i)).
iI

Optimality properties iterative design

algorithm
can improve it by applying
Given code (,
, ),
three properties:
optimize the encoder for the given decoders
optimize the decoder for the given encoder
optimize the index decoder for the given
encoder
Distortion is nonnegative and nondecreasing
descent algorithm.
In general converges only to a stationary point
no guarantee of global optimality

Variations
Problem with general codes satisfying Lloyd
conditions is complexity:
NN encoder requires computation of distortion
between input and output(s) for all codewords.
Exponentially growing computations and
memory.
Partial solutions:
Allow suboptimal, but good, components.
Decrease in performance may be compensated by
decrease in complexity or memory. May actually
yield a better code in practice.
E.g., by simplifying search and decreasing
storage, might be able to implement an otherwise
unimplemental dimension for a fixed rate R.

Suboptimal channel codebook Constrain

. use any good, if suboptimal, lossless code
(arithmetic, Lempel-Ziv)
Fixed rate codes.
Suboptimal encoder fast searches which
might not yield NN, e.g., greedy tree
structures or search only near to previous
selection
Constrained code structure Insist
codebooks have structure that ease searching
or lessen memory. E.g., lattice codes, product
codes (scalar, gain/shape, mean/shape),
transform codes (transform big vector then
use simple quantizers on output), and
tree-structured codes

Fixed Rate Channel Codebooks

Constrain all channel codewords to have same
length R bits.
in the Lagrangian formulation the codeword
lengths or instantaneous rate does not enter into
consideration and can simply use the
distortion-rate formulation, i.e., try to minimize
the average distortion for the given common rate
R of all codewords.
Here optimal encoder becomes simply Euclidean
nearest neighbor selection.
traditional vanilla VQ
For fixed rate codes, the algorithm is Lloyds
optimal PCM design algorithm extended to
vectors.
Forgeys algorithm, k-means, Isodata, principal
points
(classical clustering algorithms )
31

Other Clustering Algorithms

Most fixed-rate.
Pairwise nearest neighbor (PNN) (Ward,
Equitz)
Classical k-means: incremental update
variation of Lloyd.
k-means Codebook update:
recompute centroid for codeword i:
1 x(n) =
i(n) = n1

1)
+
n i
n
1
i(n 1) + n (x(n) i(n 1)).
Alternative form:
define a(n) = 1/n
k-means update rule is
i(n) = i(n 1) + a(n)(x(n) i(n 1))
if d(x(n), i(n 1)) d(x(n), l (n 1)), all l.
This is idea behind Kohonens self organizing
feature map (SOFM) applied to clustering.
32

Neural net approaches: (all incremental)

back propagation
competitive learning
SOFM
simulated annealing, stochastic relaxation
Randomize and reduce randomization. E.g.,
put in random jumps out of local optima or
add white noise then reduce
(some proven global optima, but enormously
complex)
deterministic annealing: Rose, Gersho et al.
Randomize quantizer, replace by maximum
entropy distribution, then reduce entropy to
zero (to a deterministic quantizer)

Basic idea: Input x is mapped into a channel

codeword i randomly according to a
conditional probability mass function:
pi(x) = Pr( channel codeword = i|X = x)
=

1 d(x,y )

i
e T
P

1 d(x,y )

l
e T

Note this is not Lloyd optimal!

The centroid condition is then replaced by
P
n xnpi(xn)
(i) = P
n pi(xn)
For fixed T , iterate pmf/centroid
constructions to convergence.
Iterates as T 0 to hard quantizer.

Local vs. global optimality

No guarantee globally optimal. Conditions are
necessary not sufficient. No fully successful
approach to global optimization, but several
avenues:
Exhaustive search
Stochastic relaxation
Simulated Annealing
Deterministic Annealing
Multiple initial guesses
Most evidence anecdotal, very few theorems.

There are also non-clustering methods for

constructing fixed rate VQ codebooks.
Random selection or choose from training set.
Pruned random selection: only take words
when no existing words are good enough.
fractal codes: Take codebook from image and
using affine transformations of codewords to
get other codewords. Problem to find good
codebook.
As a VQ, decoding simple. Encoding hard.
Great for ferns and fjords.

Suboptimal Encoders
Several NN search speedups and shortcuts
reported in quantization and pattern recognition
literature. Most ad hoc and help to some degree.
Tree-structured codes
Every channel codebook is assumed to be a prefix
code and hence can be depicted as a binary tree.
Binary prefix-free codes can be depicted as a
binary tree:

1
}
@
@

label

@
R
@

1
0

}

}

X
XXXX

X}

}
HH
HH
HH}

1111
1110

110
terminal node

@
@

@
@}

10 codeword

root node

}
A
A
A

branch

A

A
A
A
A

011

}

}
HH
HH
}

6
HH}

XXX
XXX}

1
1
0
A
0
A
A}
siblings
@
}
@

1

}
@
X
XXXX
@

X}
?
1
0 @
0
@
}
HH
HH

}
1

HH

}
0 * XXXXX
parent
X}

0
child

0101
0100
0011
0010
0001
0000

Can view each channel codeword as pathmap

through the tree.
Can make code progressive and embedded by
producing reproductions with each bit.
Optimally done by centroid of node, conditional
expectation of input given pathmap to that node.
38

Suggests suboptimal greedy encoder: Instead

of finding best terminal node in tree (full-search),
advance through the tree one node at a time,
performing at each node a binary search
involving only two children nodes.
Both choices increase length by one pairwise
Euclidean NN.
Provides an approximate but fast search of the
reproduction codebook: tree-structured vector
quantization (TSVQ)

Can either
1. Begin with codebook and design a tree search
for the codebook.
2. Modify Lloyd to design the tree structured
codebook from scratch.
One way to design TSVQ from scratch:
Step 0 Begin with optimal 0 rate tree.
Step 1 Split node to form rate 1 bit per
vector tree.
Step 2 Run Lloyd.
Step 3 If desired rate achieved, stop. Else either
Split all terminal nodes (balanced tree), or
Split worst terminal node.
Step 4 Run Lloyd.
Step 5 Go to Step 3.

Effect is that once a vector encoded into a node,

it will only be tested against decendents of that
node.
6

v
-

(a) 0 bit resolution

Codeword 0

Codeword 1

f
f

(b) 1 bit resolution

v
I
@
@

Only training
vectors mapped
Aff
A into codeword 0
A
are used
AU
v

v
AK
A

Only training @ff@

vectors mapped @@R v
into codeword 1
are used
(c) 2 bit resolution

TSVQ Summary: balanced vs. unbalanced,

fixed-rate vs. variable rate
HH

1
H

HH
j

J
^
J

J
J
^
J

H 1
HH
j
H

1
J
^
J

J 1
J
^
J

Sequence of binary decisions. (Each node

labeled, do pairwise NN.)
Search is linear in bit rate (not exponential).
Increased storage, possible performance loss
Code is successive approximation (progressive,
embedded)
Table lookup decoder (simple, cheap,
software)
Fixed or variable rate
43

BFOS TSVQ Design

Basic idea is taken from methods in statistics for
designing decision trees: (See, e.g.,
Classification and Regression Trees by
Breiman, Friedman, Olshen, & Stone)
first grow a tree,
then prune it.
trade off average distortion and average rate.
By first growing and then pruning back can get
optimal subtrees of good trees.
Can get benefits of lookahead without the full
computational load.

Growing Trees
Balanced Tree Growing
Split all nodes in a level, cluster labels for child
nodes using distribution conditioned on parent.
Problem: Can get empty nodes.
Unbalanced Tree Growing
Split one node at a time.
Split worst node, one with largest conditional
distortion or partial distortion. (McKhoul,
Roucos, Gish)
Split in a greedy tradeoff fashion: maximize
change in distortion if split node t
|.
(t) = |
change in rate if split node t

0HHHH1H

j
H

0HHHH1H

j
H

JJJ 1

J
^
J

Pruning trees
Prune trees by finding subtree T that minimizes
change in distortion if prune subtree T
(T ) = |
|.
change in rate if prune subtree T
These optimal subtrees are nested
(Generalized BFOS Algorithm, related to
CART algorithm
TM

0HHHH1H

j
H

JJJ 1

0HHHH1H

JJJ 1

J
^
J

JJJ 1

J
^
J

0HHHH1H

j
H

J
^
J

Average Distortion
6

f
LL
L

L
L
L

v
L
L

v
L
L
L
L
L
L

v
v

Lv
@
@

v
v
v

@v
PP
P

v
v

v
v
PP v
PPv
hhhh
hhh v
hhv

1.5

2.5

3.0

Average Rate

Points: distortion-rate pairs of all possible

subtrees of a large tree.

Part III: Structured VQ

Lattice VQ
Codebook is subset of a regular lattice.
A lattice L in <k is the set of all vectors of the
form Pni=1 miui, where the mi are integers and
the ui are linearly independent (usually
nondegenerate, i.e., n = k).
E.g., a uniform scalar quantizer (all bin widths
equal) is a lattice quantizer. A product of k
uniform quantizers is a lattice quantizer.
E.g., hexagonal lattice in 2D, E8 in 8D.
Multidimensional generalization of uniform
quantization. (Rectangular lattice corresponds to
scalar uniform quantization.)

Advantages:
Parameters: scale and support region.
Fast nearest neighbor algorithms known.
Good theoretical approximations for
performance.
Approximately optimal if used with entropy
coding: For high rate uniform quantization
approximately minimizes average distortion
subject to an entropy constraint.

Classified VQ
Switched VQ: Separate codebook for each input
class (e.g., active, inactive, textured, background,
edge with orientation)

xn
?

VQ
Codebook Cin

un Codeword Index

in Codebook Index

Classifier
f

- m
@
@
R
@
f

Requires on-line classification and side

information.

Transform VQ
X

1
X

2
X

..
.

T
..
.

..
.
-

k
X

Multistage VQ
(Multistep or Cascade or Residual VQ)

E2
- +

6

+
X
-

2 +
E
- +
-
X

+ 6
1
X

1
X
2-Stage Encoder

-I
1

-I
2

1
X

E2

-+

+6

-I
3

2
E

E3

-+

+6
E2

3
E

+ - E4

+6
E3

Multistage Encoder

1
X

?
2
E
-+
-
X
6

3
E
3-stage Decoder

Product Codes
Mean-Removed VQ
Shape Codebook
Cr =
j ; j = 1, , Nr }
{U

6

- +

Sample
Mean

Scalar
Quantizer
6

Mean Codebook
Cm =
{m
i ; i = 1, , Nm }

Shape/Gain VQ

Shape Codebook
Cs =
j ; j = 1, , Ns }
{S

Maximize
j
Xt S

Minimize
2
[
g Xt S)]

Gain Codebook
Cg =
{
gi ; i = 1, , Ng }
j
S
?

(i, j)

ROM
gi

Shape/Gain Decoder

Hierarchical VQ

8 bpp

2D VQ

4 bpp

4D VQ

2 bpp

8D VQ

1 bpp

Each arrow is implemented as a table lookup

having 65536 possible input indexes and 256
possible output indexes. The table is populated
by an off-line minimum distortion search.

Recursive VQ
Predictive VQ
Codebook
C=
{i ; i = 1, , 2R }

e
+ n
- +

6

n
e

- +
+

n
X

Vector
Predictor
Encoder

n
e
in

V Q1
6

+
- +

6+

-X

Vector
Predictor

n
X
Codebook
C=
{ei ; i = 1, , 2R }

Decoder

Finite State VQ
Switched Vector Quantizer: Different codebook
for each state + Next State Rule
f

CK
-

..
.

next-state
function

Sn+1

Unit
Delay

xin CSn

Decoder

CK
-

..
.

next-state
function

Sn+1

Unit
Delay

Encoder

Part IV:
Quantization and Source Coding
Theory
X = X k , k allowed to vary.
Can define optima for increasing dimensions:
k (D), and J,k
k (R), R
D
Quantities are subadditive & can define
asymptotic optimal performance

k (R) = lim D
k (R)
D(R) = inf D
k
k

k (D) = lim R
k (D)
R(D)
= inf R
k
k

J = inf J,k = lim J,k .

(1)
(2)
(3)

Shannon coding theorems relate these

operationally optimal performances (impossible
to compute) to information theoretic
minimizations.
60

D(R)
= D(R),
R(D)
= R(D)
i.e., operational DRF (RDF) = Shannon DRF
(RDF)
To define these Shannon quantities need some
definitions from information theory:
Average mutual information between two
discrete random variables X and Y is
I(X; Y ) = H(X) + H(Y ) H(X, Y ) =
Pr(X = x, Y = y)
Pr(X
=
x,
Y
=
y)
log
2
x,y
Pr(X = x) Pr(Y = y)
X

Definition extends to continuous alphabets by

maximizing over all quantized versions:
I(X; Y ) = sup I(1(X); 2(Y ))
,
1

Shannon channel capacity: Channel described by

family of conditional probability distributions
PY k |X k , k = 1, 2, . . .
1
C = lim sup I(X k ; Y k )
k P k k
X

Shannon distortion-rate function: Source

described by family of source probability
distributions PX k , k = 1, 2, . . .
1
D(R) = lim
E[dk (X k , Y k )]
inf
k P k k :I(X k ;Y k )kR k
Y |X
Lagrangian a bit more complicated, equals

D + R, where D = D(R)
at point where is
the magnitude of the slope of the DRF.
Shannons distortion rate-theory is asymptotic:
fixed rate R and asymptotically large block size k

Another approach: Fixed k, asymptotically large

R.
X, fX (x), MSE
= {yi; i = 1, . . . , N },
VQ Q or (,
, ):C
S = {Si; i = 1, . . . , N }
Average distortion:
1
D(Q) = E[||X Q(X)||2]
k
1 N
X Z
2f (x) dx
||x

y
||
=
i
X
k i=1 Si
Average rate: If fixed rate code:
R(Q) = k 1 log N
If variable rate code:
1 N
X
PX (Si) log PX (Si)
R(Q) = Hk (X) =
k i=1

Operational distortion-rate functions:

k,f (R) =
D
k,v (R) =
D

inf

D(Q)

inf

D(Q)

QQf :R(Q)R
QQv :R(Q)R

where Qf and Qv are the sets of all fixed and

variable length quantizers respectively.
Since
1 N
X
Hk (X) =
PX (Si) log PX (Si) k 1 log N,
k i=1
must have
k,f (R) D
k,v (R)
D
I.e., since collection of codes satisfying
variable-rate constraint is bigger and hence
infimum is smaller.

Shannon described the asymptotic behavior of

operational distortion rate functions if R fixed
and k .
High rate quantization theory considers case k
fixed and R
Bennett(1948), Zador (1963), Gersho (1979),
Neuhoff and Na (1995)
Define volume V (Si) of cell Si:
Z
V (Si) = Si dx
If finite, say in granular region. Else in overload
region.
kD(Q) =

i:V (Si)<
X

i:V (Si)=

2f (x) dx+
||x

y
||
i
X
Si

2f (x) dx
||x

y
||
i
X
Si

Granular distortion and overload distortion

Bennett assumptions:
N is very large
fX is smooth (so that Riemann sums
approach Riemann integrals and mean value
theorem of calculus applies)
The total overload distortion is negligible.
E.g., all the probability is on a bounded set.
The volumes V (Si) of all bounded cells are
tiny.
The reproduction codewords are the Lloyd
centroids of their cell.

Since fX is smooth, the cells are small, and

y i Si ,
fX (x) fX (yi); x Si
From the mean value theorem of calculus
Z

PX (Si) = Si fX (x) dx V (Si)fX (yi)

hence
PX (Si)
.
fX (yi)
V (Si)

Z
||x yi||2
1 N
X
dx
PX (Si) Si
D
k i=1
V (Si)

For each i, yi is the centroid of Si and hence

Z
||x yi||2
dx
Si
V (Si)
= minimum MSE for a 0 bit code for a uniformly
distributed random variable on Si = moment of
inertia of the region Si about its centroid
Convenient to use normalized moments of
inertia so that they are invariant to scale.
Define
Z
1
||x y(S)||2
dx
M (S) =
S
2/k
V (S)
kV (S)
where y(S) denotes the centroid of S.

Then if c > 0 and cS = {cx : s S}, then

M (S) = M (cS)
M depends only on shape and not upon scale.
Now:
D
=

N
X
i=1
N
X
i=1

PX (Si)M (Si)V (Si)2/k

fX (yi)M (Si)V (Si)1+2/k

Assume as N , reproduction vectors

C = {yi; i = 1, . . . , N } have a smooth point
density (x) in the sense that
1
( # reproduction vectors in a set S)
N
Z
S (x) dx; all S.
Z

<k (x) dx = 1

Then

1
V (Si)
(yi)
69

(n)

Bi (x) as the cell of the codebook Cn which

contains x. Assume that as N and
(n)
Bi (x) x, then
(n)

M (Bi (x)) m(x),

the inertial profile of the sequence of codebooks.
(assumed smooth)
For large N
m(X) 2R
]2
,
D E[
2/k
(X)
where

1
R = log N
k

This is Bennetts integral or the Bennett

approximation.
Bennett: m = 1/12, k = 1
Gersho: k 1
Na & Neuhoff: general m(x)
Can use to evaluate scalar and vector quantizers,
transform coders, tree-structured quantizers.
Can provide bounds.
Used to derive a variety of approximations in
compression, e.g., bit allocation and optimal
transform codes.

Optimizing m(x)
Gershos conjecture:
If fX (x) is smooth & R is large, the
minimum distortion quantizer has cells Si
that are (approximately) scaled, rotated, &
translated copies of S , the convex
polytope that tesselates Rk with minimum
normalized moments of inertia M (S), i.e.,
m(x) = Ck = min M (S) = M (S )
S

Optimal Ck known only for k = 1 (interval

1 = 0.08333 . . .)
C1 = 12
and k = 2 (regular hexagon,
C2 = 5 = 0.08019 . . .)
36 3
Many guesses and bounds exist for higher k.
(Conway and Sloane)
Sphere bound:
2/k

2 dx
||x||
Vk
S
=
Ck M ( sphere ) =
1+2/k
k+2
kV (S)
R

where Vk is the volume of a sphere in k

dimensions with unit radius:
2 k/2
k/2
=
Vk = k
( 2 + 1) k( k2 )
where

(t) = 0 xt1ex dx
Z

High Rate Entropy Approximation

entropy of the quantized vector is given by
H = H(Q(X)) =

N
X
i=1

PX (Si) log PX (Si),

where PX (Si) = RSi fX (x) dx.

Again make approximation that
PX (Si) fX (yi)/(N (yi)) = fX (yi)V (Si)

1
H h(X) E(log
),
N (X)
where h(X) is the differential entropy.
Thus for large N
1
).
H(Q(X)) h(X) E(log
N (X)
Connection between differential entropy and
entropy!
74

Optimum Quantizer Point Densities

Assume that Gershos conjecture holds. (Or use
bound.)
(Otherwise can at least use same ideas to obtain
lower bounds to operational DRF and describe
point densities which result in lower bounds.)
fixed rate:
1
2R
]2
D Ck E[
(X)2/k
Find the optimum point density (x), i.e., find
which minimizes
E((X)2/k ) =
subject to

fX (x)
dx
2/k
(x)

(x) dx = 1

Holders inequality implies

E((X)2/k ) =

f (x)(x)2/k dx ||fX ||k/(k+2)

with equality if and only if (x) is proportional

to f (x)k/(k+2)
Thus using minimizing point density Gershos
conjecture for N large that
k,f Ck ||fX ||
22R
D
k/(k+2)

These provide an approximation and a bound to

the operational DFR for fixed rate codes.
Variable rate: constrain H(Q(X)) instead of
k 1 log N
First: derive an interesting property for the fixed
rate case.

Partial Distortion Property

Assume Gershos conjecture holds. Then
1
V (Si)(yi)
N

k
k+2
2
||fX || k
Si ||x yi|| fX (x) dx Ck N

k+2

for all i!
In fixed rate case, for asymptotically large N the
optimum quantizer has the property that the
cells contribute approximately equally to the
overall average distortion.

Variable rate codes

Recall that for large N
1
).
H(Q(X)) h(X) E(log
N (X)
Find optimum point density when entropy
constrained instead of N :
Jensens inequality
1
)1/k )
H h(X) kE(log(
N (X)
1
h(X) k log E((
)1/k ).
N (X)
2 (Rh(X))

D(Q) Ck 2 k
,

with equality iff (x) = constant, e.g.,

1
; xA
(x) =
V (A)
Thus under the Bennett conditions
2h(X)
2R
k,v (R) Ck 2
D
2 k
with equality iff (x) = constant.
78

If N very large, then lattice codes chosen so that

Voronoi cell has minimum normalized moment of
inertia are nearly optimal in the ECVQ problem.
Tradeoff: Lattice codes mean simple NN selection
and nearly optimal variable rate performance,
but need to entropy code to attain performance.
ECVQ design can do better, and sometimes
much better if rate not asymptotically big.
Note:
For the fixed rate case, cells should have
roughly equal partial distortion.
For the variable rate case, cells should have
roughly equal volume.
In neither case should you try to make cells have
equal probability (maximum entropy).

Summary
m(X)
2/k
]N
D(Q) E[
(X)2/k
2R
k,f (R) Ck ||fX ||
2
D
k/(k+2)
2h(X)

k,v (R) Ck 2 k 22R

D
Can use high rate quantization theory to
quantify the gains of vector quantization over
scalar quantization as a function of dimension k,
and to separate out the gains as those due
memory, to space filling (moment of inertia), and
to shape (of density function).

Gish & Pierce

Asymptotic theory implies that for iid sources,
high rates (low distortion)
1,v (R)
C1
D

1.533dB

Dk,v (R) C
Or, equivalently, for low distortion
1,v (D) R
k,v (D) 0.254 bits
R
famous quarter bit result.
Suggests at high rates there may be little to be
gained by using vector quantization (but still
need to use vectors to do entropy coding!)
Ziv (1985) showed that for all distortions
k,v (D) 0.754 bits
1,v (D) R
R
= Q(X + Z) Z
using a dithering argument: X
with Z uniform and independent of X.
(subtractive dither)
81

Comment: If fix R and let k , then

k,v (R)
D

D(R)
k

k,f (R)
D

D(R)
k

the Shannon DRF.

Bennett theory recently extended to
input-weighted squared error measures.

Recent Extensions
Can generalize to input-dependent squared error
and to non-difference distortion measures that
behave locally in this way (have a well-behaved
Taylor series expansion)

= (X X)
BX (X X),
d(X, X)
Fixed rate:
k
k2 2
Ck N k
D DL(Qopt) =
k+2
Z

1
1
k+2
1+2/k
[f (x)(det(B(x))) k ]
dx} k

Optimal point density

(x)

1 k
(p(x)(det(B(x))) k ) k+2

Variable rate:
D DL(Qopt)
k2
R
kCk
k2 (HQh(p) 12 log(det(B(x)))f (x)dx)
e
=

k+2

opt(x) =

1
(det(B(x))) 2
1
R
xG(det(B(x))) 2 dx

(4)

Final Comments

The Shannon compression model:

codes, rate, and distortion
Mathematical model of compression system.
Successfully applied to asymptotic
performance bounds and design algorithms.
Optimalility properties and quantizer
design
Lloyd optimal codes. For high rate, lattice
codes + lossless codes nearly optimal (variable
rate)
Rate-distortion ideas can be used to improve
standards-compliant and wavelet codes. (e.g.,
Ramchandran, Vetterli, Orchard)
Multiple distortion measures (include other
signal processing)
MPEG 4 style image structure?
85

Compression on networks, joint source and

channel coding
Structured vector quantizers
Transform/pyramid/subband/wavelet are
currently best (rate, distortion, complexity)
for image compression, especially embedded
zerotrees
Many variations and alternative structures
come and go, some stay.
Optimal achievable performance:
quantization and source coding theory
Very general Shannon theorems exist. Recent
flurry of work on universal and adaptive codes.
Multiple distortion measures, non-difference
(perceptual) distortion measures.
Current work on convergence and rate of
convergence (Vapnic-Cernovenkis theory), use
in classification and regression (Devroye,
Gyorfi, Gabor, Nobel, Olshen)
86

Open problems: Gershos conjecture, correct

combination of wavelet theory + Bennett
(Goldberg did for traditional bit allocation
approach, not yet done for embedded zerotree
codes and related), other improved
combinations of wavelet and compression
theory and design (Mallat, Orchard).
Another issue: Perceived quality of compressed
audio and images.
Different applications have different fundamental
quality requirements:
Entertainment, browsing, screening, diagnostic,
legal, scientific
Quantitative as predictors for subjective and
diagnostic.

Business Mathematics: Quarter 1 - Module 1: New Normal Math For Grade 11
100% (3)
Business Mathematics: Quarter 1 - Module 1: New Normal Math For Grade 11
18 pages
Convolutional Coding and Viterbi Decoding
No ratings yet
Convolutional Coding and Viterbi Decoding
30 pages
Lec40 - 210102096 - VEDIKA GARG
No ratings yet
Lec40 - 210102096 - VEDIKA GARG
5 pages
Lecture 7
No ratings yet
Lecture 7
31 pages
Source and Channel Encoder and Decoder Modeling: S-72.333 Postgraduate Course in Radiocommunications Fall 2000
No ratings yet
Source and Channel Encoder and Decoder Modeling: S-72.333 Postgraduate Course in Radiocommunications Fall 2000
17 pages
Mesleki Yeterlilik
No ratings yet
Mesleki Yeterlilik
106 pages
Lec 7
No ratings yet
Lec 7
26 pages
06 Source Coding WuG 2023 10 10 17 24
No ratings yet
06 Source Coding WuG 2023 10 10 17 24
71 pages
ITC 2020 21 Lecture 7
No ratings yet
ITC 2020 21 Lecture 7
15 pages
RDT Ortego
No ratings yet
RDT Ortego
28 pages
IT1251 Information Coding Techniques
No ratings yet
IT1251 Information Coding Techniques
23 pages
(Lecture Notes) Jyrki Lahtonen - Convolutional Codes (With Exercises in Finnish) (2013) PDF
No ratings yet
(Lecture Notes) Jyrki Lahtonen - Convolutional Codes (With Exercises in Finnish) (2013) PDF
76 pages
Assignment Cyber Security Solved
No ratings yet
Assignment Cyber Security Solved
22 pages
Error Control Coding
No ratings yet
Error Control Coding
76 pages
Elearning - Vtu.ac - in P4 EC63 S11
No ratings yet
Elearning - Vtu.ac - in P4 EC63 S11
76 pages
Channel Coding
No ratings yet
Channel Coding
107 pages
2004 Martinian Wornell Allerton
No ratings yet
2004 Martinian Wornell Allerton
10 pages
Lecture 2
No ratings yet
Lecture 2
49 pages
Lecture Notes Part II
No ratings yet
Lecture Notes Part II
52 pages
Shannon Final SCS
No ratings yet
Shannon Final SCS
32 pages
First Midterm Exam
No ratings yet
First Midterm Exam
10 pages
EC 2214: Coding & Data Compression: Vishwakarma Institute of Technology
No ratings yet
EC 2214: Coding & Data Compression: Vishwakarma Institute of Technology
35 pages
Predictive PDF
100% (1)
Predictive PDF
5 pages
RDT Berger
No ratings yet
RDT Berger
35 pages
Chapter Three Source Coding: 1-Sampling Theorem
No ratings yet
Chapter Three Source Coding: 1-Sampling Theorem
19 pages
Brest Bitflipping
No ratings yet
Brest Bitflipping
87 pages
Principles of Communications: Chapter 4: Analog-to-Digital Conversion
No ratings yet
Principles of Communications: Chapter 4: Analog-to-Digital Conversion
35 pages
Data Compression Basics: Discrete Source
No ratings yet
Data Compression Basics: Discrete Source
34 pages
Convolutional Codes I Algebraic Structure
No ratings yet
Convolutional Codes I Algebraic Structure
19 pages
Convolutional Codes and Their Decoding 2010
No ratings yet
Convolutional Codes and Their Decoding 2010
33 pages
Vector Quantization K Means Nearest Neig
No ratings yet
Vector Quantization K Means Nearest Neig
19 pages
Coding Intro
No ratings yet
Coding Intro
21 pages
Lec06 PDF
No ratings yet
Lec06 PDF
10 pages
Quantization: Prof. Pooja M. Bharti IT Department Laxmi Institute of Technology
No ratings yet
Quantization: Prof. Pooja M. Bharti IT Department Laxmi Institute of Technology
35 pages
Coding Intro
No ratings yet
Coding Intro
21 pages
Chapter 5 Multi
No ratings yet
Chapter 5 Multi
16 pages
Comm-04-Phase and Frequency Modulation
No ratings yet
Comm-04-Phase and Frequency Modulation
37 pages
Midterm 00
No ratings yet
Midterm 00
3 pages
Lecture I: Data Compression Data Encoding: Efficient Information Encoding To
No ratings yet
Lecture I: Data Compression Data Encoding: Efficient Information Encoding To
48 pages
Question Bank: P A RT A Unit I
No ratings yet
Question Bank: P A RT A Unit I
13 pages
EC8002 Multimedia Compression and Communication Notes 2
No ratings yet
EC8002 Multimedia Compression and Communication Notes 2
109 pages
Mohabbat Rang Badalti Hai by Subas Gul Urdu Novels Center
No ratings yet
Mohabbat Rang Badalti Hai by Subas Gul Urdu Novels Center
127 pages
Lecture 6-Lossy Image Compression Techniques
No ratings yet
Lecture 6-Lossy Image Compression Techniques
41 pages
Video Processing Communications Yao Wang Chapter8a
No ratings yet
Video Processing Communications Yao Wang Chapter8a
19 pages
Advances in Source Coding Toby Berger
No ratings yet
Advances in Source Coding Toby Berger
67 pages
Digital Communications I: Modulation and Coding Course
No ratings yet
Digital Communications I: Modulation and Coding Course
29 pages
EC8002 MCC Question Bank Watermark
No ratings yet
EC8002 MCC Question Bank Watermark
109 pages
Lecture 15 (91 Slides)
No ratings yet
Lecture 15 (91 Slides)
91 pages
Audio Coding
No ratings yet
Audio Coding
349 pages
Lossy Compression Iii - 1
No ratings yet
Lossy Compression Iii - 1
21 pages
Department of Information Technology Information Theory and Coding Question Bank Unit-I Part - A
No ratings yet
Department of Information Technology Information Theory and Coding Question Bank Unit-I Part - A
6 pages
Inf The Rev
No ratings yet
Inf The Rev
19 pages
Cst446 May 2024-Scheme
No ratings yet
Cst446 May 2024-Scheme
10 pages
Source Coding: Source Encoder Channel Encoder Digital Source Source Entropy Symbols Binary Sequence Modulator
No ratings yet
Source Coding: Source Encoder Channel Encoder Digital Source Source Entropy Symbols Binary Sequence Modulator
18 pages
2015 Chapter 8 MMS IT
No ratings yet
2015 Chapter 8 MMS IT
12 pages
PIE Cheat Sheet
No ratings yet
PIE Cheat Sheet
4 pages
Module-3 Information Theory: Entropy Source-Coding Theorem
No ratings yet
Module-3 Information Theory: Entropy Source-Coding Theorem
14 pages
Error-Correction on Non-Standard Communication Channels
From Everand
Error-Correction on Non-Standard Communication Channels
Edward A. Ratzer
No ratings yet
Calculus I Essentials
From Everand
Calculus I Essentials
Editors of REA
1/5 (1)
Line Drawing Algorithm: Mastering Techniques for Precision Image Rendering
From Everand
Line Drawing Algorithm: Mastering Techniques for Precision Image Rendering
Fouad Sabry
No ratings yet
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
3.5/5 (8)
Unit 1 Mam181 Question Answers
No ratings yet
Unit 1 Mam181 Question Answers
5 pages
Roch Mmids Intro 5exercises
No ratings yet
Roch Mmids Intro 5exercises
9 pages
As Physics Presentation Coursework
100% (2)
As Physics Presentation Coursework
6 pages
An Improved Algorithm For Matching Large Graphs: L. P. Cordella, P. Foggia, C. Sansone, M. Vento
No ratings yet
An Improved Algorithm For Matching Large Graphs: L. P. Cordella, P. Foggia, C. Sansone, M. Vento
8 pages
CU-2021 B.Sc. (Honours) Mathematics Semester-VI Paper-CC-14P Practical QP
No ratings yet
CU-2021 B.Sc. (Honours) Mathematics Semester-VI Paper-CC-14P Practical QP
2 pages
Mod 1
No ratings yet
Mod 1
66 pages
Mth100 Mid Term by Math Zone For Vu
No ratings yet
Mth100 Mid Term by Math Zone For Vu
166 pages
M.Sc. Mathematics 20 7 21
No ratings yet
M.Sc. Mathematics 20 7 21
97 pages
Mathematics Grade 10 Term 1 Week 7 - 2021
No ratings yet
Mathematics Grade 10 Term 1 Week 7 - 2021
6 pages
Mathxl Online Homework
100% (1)
Mathxl Online Homework
6 pages
ASMO 2019 Maths Grade 10
67% (3)
ASMO 2019 Maths Grade 10
10 pages
Scientific Calculator: Comparison Chart
No ratings yet
Scientific Calculator: Comparison Chart
2 pages
Abbott-TeachYourselfAlgebra Text PDF
100% (1)
Abbott-TeachYourselfAlgebra Text PDF
157 pages
STAT 5 Week 9 Notes Part 1
No ratings yet
STAT 5 Week 9 Notes Part 1
3 pages
College Algebra 8th Edition Ron Larson Instant Download
100% (1)
College Algebra 8th Edition Ron Larson Instant Download
61 pages
Differential Equations of Higher Order: N n-1 n-2 1 2 N N n-1 n-2
No ratings yet
Differential Equations of Higher Order: N n-1 n-2 1 2 N N n-1 n-2
31 pages
CONTROL SYSTEM Assignment and Objective Bits
No ratings yet
CONTROL SYSTEM Assignment and Objective Bits
3 pages
Solution Chemistry Lesson Plan
No ratings yet
Solution Chemistry Lesson Plan
14 pages
The Transfer Function X (S) /F(S)
No ratings yet
The Transfer Function X (S) /F(S)
3 pages
Revision of Halmos: "How To Write Mathematics"
No ratings yet
Revision of Halmos: "How To Write Mathematics"
3 pages
SSRN 4742238
No ratings yet
SSRN 4742238
29 pages
(Math-Urdu-Eng) Worksheets Grade 3
No ratings yet
(Math-Urdu-Eng) Worksheets Grade 3
63 pages
Engineering Graphics Unit I
No ratings yet
Engineering Graphics Unit I
134 pages
System of Linear Equations Review Answers
No ratings yet
System of Linear Equations Review Answers
4 pages
Kernels For Edge Dominating Set: Simpler or Smaller: Abstract. A Kernelization For A Parameterized Computational Problem
No ratings yet
Kernels For Edge Dominating Set: Simpler or Smaller: Abstract. A Kernelization For A Parameterized Computational Problem
2 pages
Antiderivative of Arctan
No ratings yet
Antiderivative of Arctan
4 pages
Maths Worksheet
No ratings yet
Maths Worksheet
58 pages
College Algebra ch01
No ratings yet
College Algebra ch01
26 pages
Structural Dynamics Prof. P. Banerji Department of Civil Engineering Indian Institute of Technology, Bombay Lecture - 3 Dynamics of SDOF Structure
No ratings yet
Structural Dynamics Prof. P. Banerji Department of Civil Engineering Indian Institute of Technology, Bombay Lecture - 3 Dynamics of SDOF Structure
19 pages