RDT Ortego
rate-distortion (R-D) based optimization techniques and their practical application to image and video coding. We begin with a short discussion of classical rate-distortion theory and then we show how in many practical coding scenarios, such as in standards-compliant coding environments, resource allocation can be put in an R-D framework. We then introduce two popular techniques for resource allocation, namely, Lagrangian optimization and dynamic programming. After a discussion of these two techniques as well as some of their extensions, we conclude with a quick review of recent literature in these areas citing a number of applications related to image and video compression and transmission. We also provide a number of illustrative boxes to capture the salient points of the article.

From Shannon to MPEG Coding

Recent years have seen significant research activity in the area of R-D optimized image and video coding (see [1, 2] and many of the references in this article). In this section, we review classical R-D theory, briefly outlining some of its fundamental contributions related to its establishment of the fundamental performance bounds on compression systems for specific sources. This will lead us to show the potential limitations of these theoretical results when it comes to dealing with complex sources such as images or video, thus setting the stage for the more restrictive R-D optimization frameworks to be discussed in later sections.

Classical R-D Theory

The starting point of classical rate-distortion (R-D) theory can be found in Shannon's seminal work [3, 4], whose 50th anniversary is being celebrated this year. Rate-distortion theory comes under the umbrella of source coding or compression, which is concerned with the task of maximally stripping redundancy from a source, subject to a fidelity criterion. In other words, rate-distortion theory is concerned with the task of representing a source with the fewest number of bits possible for a given reproduction quality.
[Figure panel: Original Lena]

Our goal is to enable one to "visualize" the effectiveness of these models by synthesizing "images" derived from typical realizations of these assumed models. The advantages of such an exercise are manifold: it can not only expose the strengths and shortcomings of different attributes of the test models, but it can additionally inspire the creation of new and improved frameworks that overcome some of the drawbacks.
While practical image coding algorithms have been founded on a variety of operational frameworks, simple statistical models have been very popular. For example, early state-of-the-art subband image coding frameworks were based on i.i.d. models for image subbands (based on Gaussian or Laplacian p.d.f.s) and optimal bit-allocation techniques to ensure that bits were optimally distributed among the subbands in proportion to their importance, as gleaned through the variance of their distributions [19]. A coding algorithm based on such a framework would no doubt be very efficient at incorporating the second component mentioned in the previous paragraph, but it raises the obvious question of how accurate an i.i.d. Laplacian or Gaussian subband model might be. Let us try to

[Fig. 2 plot: curves labeled "SPIHT" and "Gaussian Approximation"; horizontal axis: Rate, bpp]

▲ 2. Rate-distortion curves achieved with (i) the SPIHT coder [21]; and (ii) the Shannon R-D bounds corresponding to an i.i.d. zero-mean Gaussian model for each wavelet subband (with empirically measured variances): this results in a Gaussian vector source, and water-pouring arguments are used to find the theoretical R-D bounds [15].
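The water-pouring argument mentioned in the Fig. 2 caption can be sketched for a Gaussian vector source with independent components under MSE (reverse water-filling): for a water level θ, component i is coded to distortion min(θ, σ_i²) at a rate of max(0, ½ log2(σ_i²/θ)) bits. This is a minimal sketch, assuming equally weighted components with known, strictly positive variances; the function names and variance values are hypothetical, and a per-subband bound would additionally weight each subband by its share of the coefficients.

```python
import math


def reverse_water_filling(variances, theta):
    """Total rate (bits) and total distortion for independent Gaussian components.

    Components with variance below the water level theta get no bits (D_i = variance);
    all others are coded to distortion exactly theta at 0.5*log2(variance/theta) bits.
    """
    dists = [min(theta, v) for v in variances]
    rates = [0.5 * math.log2(v / d) for v, d in zip(variances, dists)]
    return sum(rates), sum(dists)


def water_level_for_rate(variances, target_rate, iters=100):
    """Bisection on theta: the total rate decreases as the water level rises."""
    lo, hi = 1e-12, max(variances)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        rate, _ = reverse_water_filling(variances, mid)
        if rate > target_rate:
            lo = mid  # too many bits: raise the water level
        else:
            hi = mid  # within budget: try a lower level (less distortion)
    return hi


variances = [4.0, 1.0, 0.25]  # hypothetical subband variances
theta = water_level_for_rate(variances, target_rate=1.0)
print(theta, reverse_water_filling(variances, theta))
```

With one bit to spend, the water level settles at θ = 1: only the highest-variance component receives bits, which is the "bits in proportion to importance" behavior described above.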
complex to even allow us to find a bound!). This point is illustrated by the example of Box 1. Additional discussions of approaches for practical modeling of complex data can be found in the article by Effros in this special issue [18].

R-D Meets R&D: Operational R-D in Practical Coder Design

As just discussed, R-D performance is the fundamental trade-off in the design of any lossy compression system. We have highlighted how fundamental theoretical research has resulted in the computation of performance bounds but also indicated two major concerns with these R-D theoretical benchmarks:
1. complexity (how much memory, delay or computation is required? Can we construct a practical algorithm to approach the bound?), and
2. model mismatch (how good are our modeling assumptions? Are they too simple to characterize the sources fully?).
These will be addressed in the following sections.

Choosing the Parameters of a Concrete System: Operational R-D

To guarantee that our design will be practical we can abandon our search for the best unconstrained R-D performance of any system. Instead, let us start by choosing a specific coding scheme that efficiently captures the relevant statistical dependencies associated with the source, while also satisfying our system requirements of coding complexity, delay and memory. Then we can search for the best operating points for that specific system.

For example, consider a scalar quantizer followed by an entropy coder. This quantizer is completely defined by its quantization bins, the reproduction level for each bin, and the associated codewords for each reproduction level. Well-known techniques are then available to find the best choice (in an R-D sense) of these parameters for a specific (statistical) source. Similar results are available for other compression schemes (e.g., fixed-rate scalar quantizer, vector quantizer of dimension N, etc.).
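As a concrete, hypothetical instance of the scalar-quantizer-plus-entropy-coder example, the sketch below measures operating points of a uniform quantizer at a few step sizes on synthetic Laplacian data. It assumes an ideal entropy coder whose rate equals the first-order entropy of the quantization indices, and reconstructs each index at its bin center; neither choice is claimed to be the best encoder/decoder pair, and the step sizes are arbitrary.

```python
import numpy as np


def operating_point(x, step):
    """(rate in bits/sample, MSE) of a uniform quantizer with an ideal entropy coder."""
    idx = np.round(x / step).astype(np.int64)   # encoder: real value -> index
    x_hat = idx * step                           # decoder: index -> reproduction level
    _, counts = np.unique(idx, return_counts=True)
    p = counts / counts.sum()
    rate = float(-(p * np.log2(p)).sum())        # first-order entropy (assumed ideal coder)
    mse = float(np.mean((x - x_hat) ** 2))
    return rate, mse


rng = np.random.default_rng(0)
x = rng.laplace(loc=0.0, scale=1.0, size=100_000)  # synthetic "subband" samples
steps = (0.25, 0.5, 1.0, 2.0, 4.0)
points = [operating_point(x, s) for s in steps]
for s, (r, d) in zip(steps, points):
    print(f"step={s:<5} rate={r:.3f} bits/sample  mse={d:.4f}")
```

Each printed pair is an operational R-D point: it is achieved by an actual quantizer on actual data, rather than being a bound.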
For a given system and source, if we consider all possible quantization choices, we can define an operational rate-distortion curve. This curve is obtained by plotting for each rate the distortion achieved by designing the best encoder/decoder pair for that rate. Note that these points are operational in that they are directly achievable with the chosen implementations and for the given set of test data. This bound will allow us to distinguish between the best achievable operating points and those that are suboptimal or unachievable. While the bound given by Shannon's theoretical R-D function gives no constructive procedure for attaining that optimal performance, in the operational R-D case we always deal with achievable points.

A particular case of interest, which we will describe later, is one where the encoder can select among a fixed and discrete set of coding parameters, with each R-D point being obtained through the choice of a specific combination of coding parameters. In that scenario, as illustrated by Fig. 7, we can plot the individual admissible operating points. The boundary between achievable and nonachievable performance is then defined by the convex hull of the set of operating points.

From now on, we will consider optimality in the operational sense; i.e., the best achievable performance for a given source (as described by a training set or by a given statistical model) given our choice of compression framework.

Before we address the issue of modeling, let us first consider in more detail the problem of optimal encoder/decoder design. The basic design mechanism can be summarized as follows. First select a particular compression framework (for example, a scalar quantizer with entropy coding as above), then proceed by alternately designing the encoder (i.e., the rule to map real-valued inputs to quantization indices) and the decoder (i.e., the rule to reproduce a particular quantizer index at the receiver). First the encoder is optimized for the given decoder; i.e., given the reproduction levels at the decoder, an encoder is designed that produces a mapping having the minimum distortion for a given rate, for the given train-
The term transform coding generically describes coding techniques where the source data is first decomposed using a linear transform and where each of the frequency components obtained from the decomposition is then quantized. A typical transform-based image coder comprises the cascade of a front-end linear transform followed by a scalar quantization stage and then an entropy coder. The transform serves the dual roles of (i) energy compaction, so that the bulk of the signal energy is isolated in a small fraction of the transform coefficients, and (ii) signal decorrelation, so that there is little loss in performance due to simple scalar quantization: this is possible because the set of all transform coefficients representing a given frequency can be, to first order, modeled as a memoryless source (e.g., i.i.d. Gaussian or Laplacian) for which efficient simple quantizers can be found. The scalar quantizer is the lossy part of the framework and confines the representation to a discrete set of indices corresponding to discrete quantization levels, while the last-stage entropy coder removes the redundancy in the quantization index stream.

Commercial image and video compression standards are based on the discrete cosine transform (DCT). Figure 8 provides an example of the most popular mode of operation, the so-called "baseline," within the JPEG compression standard [5]. A brief description of the JPEG coding algorithm follows.

[Fig. 8 block diagram labels: Image Component, Block-based Coding, Data Out]

The image is decomposed into 8 × 8 blocks for the purpose of transform, quantization, and entropy coding. Blocks are processed in raster scan order and are transformed independently using a block DCT. After the DCT, each 8 × 8 block is quantized using uniform scalar quantization. Quantization step sizes are defined for each of the 64 frequency coefficients using an 8 × 8 quantization matrix. Typically, a single quantization table is used for each color component; however, up to four different tables may be used if needed. The values of the quantization tables are encoded in the header of the compressed file. Quantization is a lossy step; i.e., the information cannot be recovered perfectly at the decoder. However, it is the quantization operation that allows one to achieve a high compression rate at the price of some quality degradation.

The first quantized frequency coefficient, called the DC coefficient, represents the average sample value in a block and is predicted from the previously encoded block to save bits. Only the difference from the previous DC coefficient is encoded, which typically is much smaller than the absolute value of the coefficient. The remaining 63 frequency coefficients (called AC coefficients) are encoded using only the data of the current block.

The entropy coder constitutes the second basic component in the R-D trade-off, as it determines the number of bits that will be used for a particular image and quantization setting. The entropy coder is lossless and it maps each of the various quantization indices to given codes. A simple way of compacting the quantization index stream would be to assume a memoryless model for the stream. However, the memoryless assumption is typically a bad one, and significant gains can be had by exploiting the memory in the quantized bitstream (e.g., zero values tend to cluster). A simple way to exploit this is through zero run-length coding. JPEG uses a two-dimensional entropy code based on the length of the zero-run and the magnitude of the nonzero coefficient breaking up the run.

The AC coefficients are processed in a "zig-zag" manner (see Fig. 8) that approximately orders coefficients from lowest to highest frequency. Run-length codes represent the sequence of quantized coefficients as (run, value) pairs, where "run" represents the number of zero-valued AC coefficients between the current nonzero coefficient and the previous nonzero coefficient, and "value" is the (nonzero) value of the current coefficient. A special end-of-block (EOB) code signals the end of nonzero coefficients in the current block. For the example in Fig. 8, with three nonzero AC coefficients, the sequence after run-length encoding is (0,5)(0,3)(4,7)(EOB). The sequence of "runs" and "values" is compressed using Huffman or arithmetic codes.

Despite the apparent rigidity of the JPEG syntax, there is a surprising amount of room for gains attainable with clever encoder optimization [39, 40]. The syntax allows the quantization matrix and the entropy coding table to be adapted on a per-image basis, as well as for arbitrary desired compression ratios. A more subtle option available is for the encoder to "dupe" the decoder optimally in a rate-distortion sense while remaining fully syntax-compatible. As an example, small nonzero values that break up potentially long zero-runs are typically very expensive in bit-rate cost in comparison to their relative contribution to reducing quantization distortion. If the encoder can "lie" to the decoder about the magnitude of these coefficients, i.e., call these nonzero values zeroes, then the decoder is none the worse off, while the R-D performance is significantly improved.

A systematic way of doing this optimally in the R-D sense, termed coefficient thresholding, has been described in [41]. The good news is that sizeable R-D performance gains, of the order of 25% in compression efficiency, can be realized while being completely faithful to the JPEG syntax. Another article in this issue [2] will provide further evidence of the practical value of R-D techniques in improving the quality in standards-based video coding, where there is even more flexibility in the choice of operating parameters.
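The zig-zag scan, the (run, value) pairing, and the idea of zeroing out expensive small coefficients can be sketched in a few lines. This is a simplified illustration, not the JPEG reference procedure and not the optimal thresholding algorithm of [41]: it omits JPEG's ZRL symbol for runs longer than 15, always emits EOB, and uses a hypothetical rate model (4 bits per symbol plus magnitude bits) with distortion taken as the squared value removed, in quantizer-step units. `greedy_threshold` is a hypothetical greedy helper.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in JPEG-style zig-zag order."""
    def key(p):
        s = p[0] + p[1]                              # anti-diagonal index
        return (s, -p[0] if s % 2 == 0 else p[0])    # direction alternates per diagonal
    return sorted(((i, j) for i in range(n) for j in range(n)), key=key)


def zigzag_scan(block):
    """Flatten a square 2-D block (list of rows) into zig-zag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]


def run_length(ac):
    """(run, value) symbols for AC coefficients in zig-zag order, ending with 'EOB'.

    Simplification: no ZRL symbol for runs > 15, and EOB is always emitted.
    """
    last_nz = max((k for k, v in enumerate(ac) if v != 0), default=-1)
    symbols, run = [], 0
    for v in ac[:last_nz + 1]:
        if v == 0:
            run += 1
        else:
            symbols.append((run, v))
            run = 0
    symbols.append("EOB")
    return symbols


def rate_bits(ac):
    """Hypothetical rate model: 4 bits per (run, value) symbol plus magnitude bits; 4 for EOB."""
    return sum(4 if s == "EOB" else 4 + abs(s[1]).bit_length() for s in run_length(ac))


def greedy_threshold(ac, lam):
    """Greedily zero coefficients while D + lam * R decreases (illustrative, not optimal).

    D counts only the squared magnitude removed, in quantizer-step units (an assumption).
    """
    ac, added_d = list(ac), 0
    while True:
        best = added_d + lam * rate_bits(ac)
        for k, v in enumerate(ac):
            if v == 0:
                continue
            trial = ac[:k] + [0] + ac[k + 1:]
            cost = added_d + v * v + lam * rate_bits(trial)
            if cost < best:
                ac, added_d = trial, added_d + v * v
                break
        else:
            return ac  # no single zeroing lowers the Lagrangian cost


ac = [5, 3, 0, 0, 0, 0, 7] + [0] * 56     # 63 AC coefficients in zig-zag order
print(run_length(ac))                     # [(0, 5), (0, 3), (4, 7), 'EOB']
print(greedy_threshold(ac, lam=2.0)[:7])  # [5, 0, 0, 0, 0, 0, 7]
```

With λ = 2 the small value 3 is zeroed, merging the two runs into a single (5,7) symbol; with λ = 0 nothing changes, and a large λ eventually zeroes the whole block.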
tive-transform coding framework involves selecting the operating point of the combination of transform, quantizer, and

tions characterizing the transform coefficients, or in the absence of an analytical model, from training over a large class of "typical" possible signals.
As a first step toward attaining an adaptive transform, it is clear that an improvement can be found if we search over the whole set of binary trees for a particular filter set, instead of using the fixed tree of the wavelet transform (see Fig. 9). A fast algorithm, also known as the "single tree" algorithm, to find the best tree (dubbed the "best basis") jointly with the best quantization and entropy coding strategy has been described in [42]. The idea is to search for the best basis (for a rate-distortion cost function as appropriate for compression) for the signal from a library of wavelet packet bases. In order to achieve this, two entities are needed: a cost function for basis comparison and a fast search algorithm. See the section on basic components in image/video coding algorithms and Box 9 for a more detailed treatment of these issues.

[Fig. 9 title: Arbitrary Wave-Packet Trees]

▲ 9. (a) Two-channel decomposition as a node and two branches in the decomposition tree. (b) All possible binary wavelet packet decompositions of depth 2. (c) Some typical depth-3 binary wavelet packet subtree decompositions.
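To see why a fast search algorithm is needed, one can count the candidate bases. Assuming 1-D binary trees in which every node may either be split into two children or left as a leaf, the number of trees of depth at most d satisfies N(d) = 1 + N(d-1)², with N(0) = 1 (the unsplit signal is counted as one basis). The function name below is hypothetical.

```python
def num_wp_trees(depth):
    """Number of 1-D binary wavelet-packet trees (bases) of depth at most `depth`.

    A tree is either a single leaf or a split whose two children are trees of
    depth at most depth - 1, giving N(d) = 1 + N(d - 1)**2 with N(0) = 1.
    """
    n = 1
    for _ in range(depth):
        n = 1 + n * n
    return n


print([num_wp_trees(d) for d in range(6)])  # [1, 2, 5, 26, 677, 458330]
```

The count grows doubly exponentially with depth, so exhaustively evaluating every basis quickly becomes impractical, which is what motivates the tree-pruning search.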
When considering video sources, we will need additional tools to allow us to fully exploit the redundancy between consecutive frames in a video sequence. Motion compensation is the most popular approach to achieve this goal. The encoder computes the motion parameters (for example, block-based motion vectors as in MPEG [32]) and the decoder uses those in the reconstruction framework. A particular framework will specify how the motion information is transmitted and how it is interpreted by the decoder. A more detailed description can be found in the article by Sullivan and Wiegand in this issue [2].

Lack of space prevents us from going into more depth, but we refer the interested reader to recent textbooks for a general description of the more popular methods and algorithms, and of how the transform-coding paradigm is put into practice [5, 31-37]. We will revisit the various building blocks in transform-coding frameworks later in the article, where we will outline examples of R-D optimization applied to these algorithms.
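The block-based motion vectors mentioned above are computed by the encoder; standards such as MPEG specify how vectors are transmitted and used, not how they are found. One common choice is an exhaustive block-matching search that minimizes the sum of absolute differences (SAD). The sketch below assumes that criterion; the block size, search range, and function name are illustrative.

```python
import numpy as np


def block_motion_vector(cur, ref, top, left, bsize=8, search=4):
    """Full-search block matching: ((dy, dx), sad) minimizing SAD within +/-search."""
    block = cur[top:top + bsize, left:left + bsize].astype(np.int64)
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # candidate block would fall outside the reference frame
            cand = ref[y:y + bsize, x:x + bsize].astype(np.int64)
            sad = int(np.abs(block - cand).sum())
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv, best_sad


rng = np.random.default_rng(1)
ref = rng.integers(0, 256, size=(32, 32))        # hypothetical previous frame
cur = np.roll(ref, shift=(2, -1), axis=(0, 1))   # content moved down 2 rows, left 1 column
print(block_motion_vector(cur, ref, top=12, left=12))  # ((-2, 1), 0)
```

The returned vector points from the current block to its best match in the reference frame; the decoder would add the transmitted residual to that displaced block.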
While rate-distortion-based techniques have had major impact on image and video compression frameworks,

[Figure label: Separate Design]

concrete examples of scenarios where this optimization is called for; i.e., where the encoder has to make choices among a finite set of operating modes. We now present a series of generic problem formulations that spell out some of the possible constraints the encoder will have to meet when performing this parameter selection. These problem descriptions are divided into two classes depending on whether compression is targeted for storage or transmission applications. Before presenting these formulations we discuss issues and introduce the notation.

Selection of the basic coding unit. Until now, we have considered generic R-D trade-offs where a coding unit, be it a sample, an image block, or an image, is en-
loss in practice [49] (see the "Dependency Problems" section for a more detailed description of dependent allocation problems). Even if making the independence approximation results in performance loss, the dependency effects are often ignored to speed up the computation. For example, it is common to consider the allocation of bits to frames in a video sequence as if these could be treated independently; however, due to the motion estimation loop, the bit allocation for one frame has the potential to affect subsequent frames.

pretation of the Lagrangian cost. As the quantization index $j$ increases (i.e., the rate decreases and the distortion increases) we have a trade-off between rate and distortion. The Lagrange multiplier allows us to select specific trade-off points. Minimizing the Lagrangian cost $d_{ij} + \lambda \cdot r_{ij}$ when $\lambda = 0$ is equivalent to minimizing the distortion; i.e., it selects the point closest to the y-axis in Fig. 14. Conversely, minimizing the Lagrangian cost when $\lambda$ becomes arbitrarily large is equivalent to minimizing the rate, and thus finding the point closest to the x-axis in Fig. 14. Intermediate values of $\lambda$ determine intermediate operating points.
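The role of λ can be illustrated with a small sketch of independent allocation: each block picks, from its own list of hypothetical (rate, distortion) operating points, the one minimizing d + λr, and a bisection on λ then meets a total rate budget. The operating points, the upper bracket for λ, and the iteration count are assumptions made for illustration.

```python
def choose(points, lam):
    """Index j of the (rate, distortion) point minimizing the Lagrangian cost d + lam * r."""
    return min(range(len(points)), key=lambda j: points[j][1] + lam * points[j][0])


def allocate(blocks, lam):
    """Independent allocation: every block minimizes its own cost for the same lam."""
    choice = [choose(pts, lam) for pts in blocks]
    rate = sum(pts[j][0] for pts, j in zip(blocks, choice))
    dist = sum(pts[j][1] for pts, j in zip(blocks, choice))
    return choice, rate, dist


def allocate_budget(blocks, budget, lam_hi=1e6, iters=60):
    """Bisection on lam for the lowest-distortion Lagrangian solution with rate <= budget.

    Assumes lam_hi is large enough that allocate(blocks, lam_hi) is the minimum-rate choice.
    """
    lo, hi = 0.0, lam_hi
    best = allocate(blocks, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        sol = allocate(blocks, mid)
        if sol[1] <= budget:
            hi, best = mid, sol  # feasible: try a smaller lam (more rate, less distortion)
        else:
            lo = mid             # over budget: penalize rate more heavily
    return best


# Hypothetical (rate in bits, distortion) points for two blocks, one per quantizer index j.
blocks = [
    [(0, 100), (2, 40), (4, 20), (6, 15)],
    [(0, 50), (2, 35), (4, 10), (6, 8)],
]
print(allocate(blocks, 0.0))       # ([3, 3], 12, 23): lam = 0 minimizes distortion
print(allocate(blocks, 1e6))       # ([0, 0], 0, 150): a huge lam minimizes rate
print(allocate_budget(blocks, 8))  # ([2, 2], 8, 30)
```

The point (2, 35) of the second block lies above its convex hull and is never selected for any λ. Likewise, a budget of 6 bits returns the rate-2 solution, because the rate-6 allocation with distortion 50 sits between two hull vertices and is only reachable at the tie λ = 10.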
Lagrangian Optimization

The classical solution for the problem of Formulation 3 is based on the discrete version of Lagrangian optimization

Then the main result states that

Theorem 1 [64, 65]: If the mapping $x^*(i)$, for $i = 1, 2, \ldots, N$, minimizes:

$$\sum_{i=1}^{N} \left( d_{i x(i)} + \lambda \, r_{i x(i)} \right)$$

so that:
the course. As Bob has spent a few more hours on his extra-curricular activities than he probably should have, and he has to devote a considerable amount of his time to the other courses as well, he realizes that he has to budget his study time for Physics very carefully. Suppose he is able to project his expected performance on both the project and the final exam based on how much time he devotes to them, and further he

[Figure label: Slopes at Optimality]
frames; i.e., it would incur a cost $J_1(2)$ for frame 1 and then, given that quantizer 2 was selected for frame 1, would choose the minimum among all $J_2(2, x)$, which turns out to be $J_2(2, 2)$. However, in this particular example, the greedy approach, allocating first for frame 1 and then for frame 2, can be outperformed. Better overall performance can be achieved when quantizer 1 is used for the first frame and quantizer 2 is used for the second. Even though $J_1(2) < J_1(1)$, we have that

$$J_1(2) + J_2(2, 2) > J_1(1) + J_2(1, 2).$$

Several types of dependency scenarios can be identified. Rather than attempt to provide a complete taxonomy of all these schemes, let us consider two concrete examples within the MPEG coding framework that illustrate different forms of dependency.

Trellis-based dependency. The selection of macroblock-level quantization in an MPEG video stream is a dependent problem because the rate for macroblock $i$ and quantizer $j$ depends on the quantizer chosen for macroblock $i-1$. This is because predictive entropy coding of the quantization indices is used to increase the coding efficiency. In this situation it is possible to represent all the possible selections as a trellis where each state represents one quantizer selection for a given macroblock, with each stage of the trellis corresponding to one macroblock. Dynamic programming can then be used to find the minimal cost path in this trellis, where the branch cost is typically defined as the Lagrangian cost introduced above [81-83]. As in the example of Fig. 16, taking the dependency into account avoids "greedy" selection of coding parameters, where the quantizer assignment is optimized for the current coding unit alone.

In general, trellis-based dependencies arise in cases where the underlying structure is such that the memory in the system is finite (i.e., coding choices for $i$ depend only on a finite set of previous coding units) and the number of possible cases is also finite. In other words, in this case, the available coding parameters for a given coding unit depend on the "state" of the system, i.e., the finite set of parameters that completely determine achievable values. As in [81-83], for these types of dependencies one can use a dynamic programming approach, where the trellis state corresponds to the state of the system, and branches (each corresponding to a choice of quantization) have an associated Lagrangian cost that combines the rate and distortion for the given parameter choices.

Tree-based dependency. A second example of dependency can be seen when we analyze the effect of motion compensation in an MPEG framework. After motion compensation, the encoder transmits the difference between the previously decoded frame and the current frame. This difference frame is in turn compressed and used to reconstruct the decoded version of the current frame. It is easy to see that we have a recursive prediction loop, and thus the residue frame will depend on the selection of quantization parameters for all previous frames since the last INTRA-frame [80]. In this case we can observe that all possible combinations generated by successive quantizer choices can be represented as a tree, with the number of branches growing exponentially with the number of levels of dependency (i.e., the number of frames since the last INTRA-frame).

The problem of dependent coding with an application to an MPEG framework is studied in [80]. The main conclusion is that exponential growth in the number of combinations makes the exact solution too complex. However, it is possible to make approximations that simplify the search for the optimal solution. Good heuristics include the use of so-called monotonicity assumptions (a more finely quantized predictor typically results in smaller prediction error) [80], or greedy approaches where, for example, only a few quantization choices are kept at any given stage. The problem can also be alleviated by resorting to models of the dependent R-D characteristics so that not all the operating points in the tree need to be explicitly computed [48, 57], or by considering models of the rate [47, 60] and assuming that the quantization scale provides a good estimate of quality.

we introduced, namely, budget and delay constrained allocations, and refer the reader to Boxes 2, 3, 4.
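The greedy-versus-trellis comparison can be reproduced with a small Viterbi-style dynamic program over quantizer choices with first-order dependency. The cost tables below are hypothetical numbers chosen to have the same structure as the two-frame example in the text ($J_1(2) < J_1(1)$ but $J_1(2) + J_2(2,2) > J_1(1) + J_2(1,2)$); they are not the values of Fig. 16, and the function names are illustrative.

```python
def viterbi(first_cost, trans_cost, n_units, quantizers):
    """Minimum total Lagrangian cost path through a trellis of quantizer choices.

    first_cost(q) is J_1(q); trans_cost(i, p, q) is J_i(p, q), the cost of unit i
    using quantizer q when unit i - 1 used quantizer p.
    """
    cost = {q: first_cost(q) for q in quantizers}
    back = []  # back[k][q]: best predecessor of state q at unit k + 2
    for i in range(2, n_units + 1):
        new_cost, ptr = {}, {}
        for q in quantizers:
            p = min(quantizers, key=lambda p: cost[p] + trans_cost(i, p, q))
            new_cost[q], ptr[q] = cost[p] + trans_cost(i, p, q), p
        cost = new_cost
        back.append(ptr)
    q = min(quantizers, key=lambda q: cost[q])
    total, path = cost[q], [q]
    for ptr in reversed(back):  # trace the surviving path backwards
        q = ptr[q]
        path.append(q)
    return path[::-1], total


def greedy(first_cost, trans_cost, n_units, quantizers):
    """Optimize each unit alone, given the choices already made for earlier units."""
    path = [min(quantizers, key=first_cost)]
    total = first_cost(path[0])
    for i in range(2, n_units + 1):
        q = min(quantizers, key=lambda q: trans_cost(i, path[-1], q))
        total += trans_cost(i, path[-1], q)
        path.append(q)
    return path, total


J1 = {1: 5, 2: 4}                                    # hypothetical J_1(q)
J2 = {(1, 1): 8, (1, 2): 6, (2, 1): 12, (2, 2): 10}  # hypothetical J_2(p, q)
args = (lambda q: J1[q], lambda i, p, q: J2[(p, q)], 2, (1, 2))
print(greedy(*args))   # ([2, 2], 14)
print(viterbi(*args))  # ([1, 2], 11)
```

The dynamic program keeps, for every state, only the best path reaching it, so its cost grows linearly with the number of coding units rather than exponentially.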
We continue the thread from Box 3 in our quest for designing adaptive transforms based on wavelet expansions that are R-D optimized.

Let us first address the cost function, namely the R-D function. Returning to the problem of jointly finding the best combination of wavelet packet (WP) transform (or basis) and the quantization and entropy-coding choices, we assume that arbitrary (finite) quantization choices are available to quantize the WP coefficients in each tree node (see Figs. 9 and 21), with both rate (R) and distortion (D) assumed to be additive cost metrics over the WP tree; i.e., $R(\text{tree}) = \sum R(\text{leaf nodes})$ and $D(\text{tree}) = \sum D(\text{leaf nodes})$. As an example, the commonly used first-order entropy and MSE measures for R and D satisfy this additivity condition.

Turning now to the fast-search problem, one possible approach to finding the best tree is the "greedy tree growing" algorithm, which starts at the root and divides each signal in two if it is profitable to do so (if the cost of the subsignals generated is less than the cost of the signal they come from). It terminates when no more profitable splits remain. It is easy to see, however, that this does not find the globally optimal tree, which is found by starting at the deepest level of the tree and pruning pairs of branches having higher total cost than that of their parent.

We now describe the details using a 1-D case for simplicity (see Fig. 21). The idea is to first grow the full (STFT-like) tree (see Fig. 21(a)) to full depth (or some maximum fixed depth in practice) for the whole signal. Note that due to the tree structure of the bases, we now have available the WP coefficients corresponding to all the bases on our search list. That is, if we grow the coefficients of a depth-5 tree, we know the coefficients associated with all subtrees grown to depth 5 or less. The next step is to populate each WP tree node with the minimum Lagrangian cost over all quantization choices for that tree node. This minimum cost at each node is associated with the quantizer that minimizes the rate-distortion trade-off (for a fixed "quality factor" $\lambda$):

$$J(\text{node}) = \min_{\text{quantizer}} \left[ D(\text{node}) + \lambda R(\text{node}) \right].$$

Note the implication of this step: we do not yet know if an arbitrary tree node will be part of our desired optimal subtree choice, but we do know what quantization choice to use for that node if it is part of the best-basis subtree. This is particularly satisfying because it has enabled us to decouple the best quantizer/basis choice without sacrificing optimality.

We now have remaining only the unfinished business of finding the best basis. The special tree structure of the basis can be exploited in formulating a fast tree-based search strategy. The idea is to use a bottom-up recursive "split-merge" decision at each node, corresponding to whether it is costlier, in the Lagrangian sense, to keep the parent node or its children nodes. This fast dynamic programming (DP) based pruning method is also optimal because the signal subspace spanned by the parent node is the direct sum of the signal subspaces spanned by its children nodes, thanks to the orthogonality of the filter bank. We now describe the details.

Assume known the optimal subtree from a tree node n "onwards" to the full tree-depth log N. Then, by Bellman's optimality principle of DP [74] (see also Box 7), all surviving paths passing through node n must invoke this same optimal "finishing" path. There are only two contenders for the "surviving path" at every node of the tree, the parent and its children, with the winner having the lower Lagrangian cost. That is, starting from the full tree, the leaf nodes are recursively subjected to an optimal split-merge decision, following a policy of:

Prune if: $J(\text{parent node}) \le J(\text{child}_1) + J(\text{child}_2)$
tive-transform counterparts such as those based on wavelet packets or adaptive wavelet packets [102].

Delay-Constrained Allocation Problems

Box 7 introduces the problem of delay-constrained allocation. This class of problems, as also outlined in Formulation 4, is typically encountered in video transmission under delay constraints. The more traditional view of the problem is as a buffer control problem but, as described above and in [44, 49], the delay constraint is more general. Rate-distortion techniques have been applied to rate control under CBR transmission conditions. For example, [46] provides an overall optimal solution using dynamic programming as well as Lagrangian-based approximations. An alternative formulation is to consider the buffering constraints as a set of budget constraints as in [67]. The traditional direct-feedback mechanism used in buffer-control algorithms [104], where the quantization scale is controlled by buffer fullness, is replaced in [105] by a feedback mechanism that instead controls the value of the Lagrange multiplier to be used in the selection of optimal points. Both these approaches concentrated on the independent allocation case; the dependent case was considered by [80] with applications provided to MPEG coding scenarios. More recent work has also considered R-D-optimized MPEG coding using models of the R-D characteristics to reduce the complexity [48]. The rate-control problem can be formulated not only in terms of selection of quantization parameters as in the above reference but also in terms of selection of the best types of frames (I, P or B) as in [106].

A second area in rate-control research is that of control for VBR channels. Here we can consider two classes of problems. First, in some cases it is possible for the encoder to select both the source and the channel rate, where the selection of channel rate may be subject to constraints such as, for example, the policing constraints in an ATM network. Examples of this type of optimization include [49], which employs dynamic programming techniques, and [50], which utilizes multiple budget constraints and a Lagrangian approach. Other approaches include [51, 107].
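The buffer view of delay-constrained allocation can be made concrete with a small simulation of an encoder buffer drained by a CBR channel. This is a simplified model chosen for illustration (frame bits enter, then one frame interval's worth of channel bits leave, and the buffer never goes negative), not the exact formulation of [46] or [67]; the function names are hypothetical. The second helper shows, in the spirit of the budget-constraint formulation, that as long as the buffer never empties, the overflow condition reduces to one cumulative rate budget per frame.

```python
def buffer_trace(frame_bits, channel_bits_per_frame, buffer_size, initial=0):
    """Encoder-buffer occupancy after each frame interval for a CBR channel.

    Assumed model: frame i's bits enter, then channel_bits_per_frame bits are sent;
    occupancy is clipped at zero (the channel idles when the buffer is empty).
    """
    occ, trace = initial, []
    for i, r in enumerate(frame_bits):
        occ = max(0, occ + r - channel_bits_per_frame)
        if occ > buffer_size:
            raise OverflowError(f"buffer overflow after frame {i}")
        trace.append(occ)
    return trace


def cumulative_budgets(n_frames, channel_bits_per_frame, buffer_size):
    """Budget form, valid while the buffer never empties:
    sum(frame_bits[:i + 1]) <= buffer_size + (i + 1) * channel_bits_per_frame."""
    return [buffer_size + (i + 1) * channel_bits_per_frame for i in range(n_frames)]


print(buffer_trace([3000, 500, 2500], channel_bits_per_frame=1000, buffer_size=4000))
# [2000, 1500, 3000]
print(cumulative_budgets(3, channel_bits_per_frame=1000, buffer_size=4000))
# [5000, 6000, 7000]
```

An R-D rate controller would then choose quantizers per frame to minimize total distortion subject to these per-frame budgets, rather than reacting only to the current buffer fullness.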
both the best basis and the best quantization choice are now known!

Of course, this corresponds to a particular choice of $\lambda$, which was fixed during this tree-pruning operation. Unfortunately, this $\lambda$ may not be the correct one: we want the one that corresponds to the target bit budget $R$. However, due to the convexity of the rate-distortion curve, the optimal slope $\lambda^*$ matched to the desired $R$ can be easily obtained using standard convex search techniques, e.g., the bisection method, Newton's method, or other standard root-solving methods. An important point to note is that the Lagrangian method can only obtain solutions that reside on the convex hull of the rate-distortion curve, and, thus, a target rate whose optimal operating point is not on the convex hull will be approximated by the nearest convex-hull rate. In practice, for most coding applications, the convex hull of the R-D curve is dense enough that this approximation is almost exact.

[Fig. 21(b) labels: Parent Node, Left Child, Right Child, Slope λ]

▲ 21. The single-tree algorithm finds the best tree-structured wavelet packet basis for a given signal. (a) The algorithm starts from the full STFT-like tree and prunes back from the leaf nodes to the root node until the best pruned subtree is obtained. (b) At each node, the split-merge decision is made according to the criterion: prune if $J(\text{parent node}) \le J(\text{child}_1) + J(\text{child}_2)$.

We will now summarize the single-tree algorithm:
▲ Grow a full-balanced (STFT-like) tree to some desired fixed depth (i.e., find all the WP coefficients associated with all bases in the library);
▲ For a fixed $\lambda$, populate each node of the full tree with the best Lagrangian cost $D + \lambda R$ over all quantizer choices (i.e., find the best quantizer choice for each node);
▲ Prune the full tree recursively, starting from the leaf nodes (i.e., find the best-basis subtree);
▲ Iterate over $\lambda$ using a convex search method to meet the target bit rate (i.e., match the best subtree/quantizer choice to the desired bit budget).
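The four steps of the single-tree algorithm can be sketched compactly for a 1-D signal. This is a minimal sketch under stated assumptions, not the implementation of [42]: it uses an orthonormal Haar filter bank (so squared error in the coefficient domain equals squared error in the signal domain, which keeps D additive), a hypothetical set of uniform quantizer step sizes, and an ideal entropy coder whose rate is the first-order entropy of each node's indices (which keeps R additive). All names are illustrative.

```python
import numpy as np

STEPS = (0.5, 1.0, 2.0, 4.0, 8.0)  # hypothetical admissible quantizer step sizes


def haar_split(x):
    """One orthonormal Haar analysis stage: (lowpass, highpass) halves of x."""
    pairs = x.reshape(-1, 2)
    return (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2), (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)


def node_cost(c, lam):
    """(J, D, R, step) minimizing J = D + lam * R over STEPS for one node."""
    best = None
    for step in STEPS:
        idx = np.round(c / step)
        d = float(np.sum((c - idx * step) ** 2))      # squared error (additive)
        _, counts = np.unique(idx, return_counts=True)
        p = counts / c.size
        r = float(-(p * np.log2(p)).sum()) * c.size    # ideal entropy coder, in bits
        if best is None or d + lam * r < best[0]:
            best = (d + lam * r, d, r, step)
    return best


def best_basis(x, depth, lam):
    """Grow, populate, and prune bottom-up; tree is a step size (leaf) or a (low, high) pair."""
    J, D, R, step = node_cost(x, lam)
    if depth == 0 or x.size % 2:
        return J, D, R, step
    low, high = haar_split(x)
    jl, dl, rl, tl = best_basis(low, depth - 1, lam)
    jh, dh, rh, th = best_basis(high, depth - 1, lam)
    if J <= jl + jh:                                   # prune: keep the parent node
        return J, D, R, step
    return jl + jh, dl + dh, rl + rh, (tl, th)


def single_tree(x, depth, budget_bits, lam_hi=1e3, iters=50):
    """Bisection on lam to meet budget_bits; if even lam_hi exceeds it, that solution is returned."""
    lo, hi = 0.0, lam_hi
    best = best_basis(x, depth, hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        sol = best_basis(x, depth, mid)
        if sol[2] <= budget_bits:
            hi, best = mid, sol
        else:
            lo = mid
    return best


rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=64))  # hypothetical test signal (random walk)
J, D, R, tree = single_tree(x, depth=3, budget_bits=150)
print(round(R, 1), round(D, 2), tree)
```

Because the recursion returns the optimal subtree below each node before the parent is compared against its children, the pruning is the bottom-up DP described above, and the outer bisection implements the final step of matching λ to the bit budget.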
The second class of problems is that where the channel rate is subject to random variations or can vary from link to link in a networked transmission. In [107] R-D methods are provided to reduce the bit rate of encoded data without requiring that it be decoded and recompressed. In [52, 63, 108, 109] approaches based on dynamic programming and Lagrangian optimization are presented to address the problems of transmission in burst-error channels such as those encountered in a wireless-transmission environment [110].

The Role of R-D Techniques in Joint Source-Channel Coding

We have thus far focused (with the exception of the previous paragraph) on rate-distortion methods for source coding when dealing with image and video sources. We now briefly address the applicability of such methods to the bigger problem of image and video transmission, specifically in the context of joint source-channel coding. Box 5 highlights the essence of the problem. Here, we take a look at some of the applications driving these techniques and provide some pointers to recent activities in the field.

Background and Problem Formulation

With the explosion in applications involving image and video communication, such as those afforded by the boom in multimedia- and Internet-driven applications, as well as those afforded by emerging applications like cable modems and wireless services, the image communication problem has recently assumed heightened interest and importance, as visual data represents by far the largest percentage of multimedia traffic.

A natural question to ask is: why do we need to re-invent data communications just because of the current multimedia explosion? There are several reasons to revisit the existing paradigms and systems. The primary one is that current communication link designs are largely mismatched to image and video sources, as they fail to account for important source considerations such as (i) highly time-varying source and channel characteristics, (ii) high source tolerance to channel loss, and (iii) unequal importance of transmitted bits. This comes from
8. N. Jayant, J. Johnston, and R. Safranek, “Signal compression based on models of human perception,” Proc. of the IEEE, Oct. 1993.
9. T. Berger, Rate-Distortion Theory: A Mathematical Theory of Data Compression. Prentice-Hall, 1971.
10. R.M. Gray, Source Coding Theory. Kluwer Academic Publishers, 1990.
11. W. Bennett, “Spectra of quantized signals,” Bell Sys. Tech. J., vol. 27, pp. 446-472, Jul. 1948.
12. P. Zador, Development and Evaluation of Procedures for Quantizing Multivariate Distributions. PhD thesis, Stanford University, Stanford, CA, 1964.
13. A. Gersho, “Asymptotically optimal block quantization,” IEEE Trans. on Info. Th., vol. IT-25, pp. 373-380, Jul. 1979.
14. D.L. Neuhoff, “The other asymptotic theory of source coding,” in Joint IEEE/DIMACS Workshop on Coding and Quantization, (Rutgers University, Piscataway, NJ), Oct. 1992. Also available as ftp://ftp.eecs.umich.edu/people/neuhoff/OthAsymptThy.ps.
15. T.M. Cover and J.A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
16. N. Farvardin and J.W. Modestino, “Optimum quantizer performance for a class of non-Gaussian memoryless sources,” IEEE Trans. on Info. Th., vol. IT-30, pp. 485-497, May 1984.
17. P.A. Chou, T. Lookabaugh, and R.M. Gray, “Entropy-constrained vector quantization,” IEEE Trans. ASSP, vol. 37, pp. 31-42, Jan. 1989.
18. M. Effros, “Optimal modeling for complex system design,” this issue, pp. 51-73.
19. P.H. Westerink, J. Biemond, and D.E. Boekee, “An optimal bit allocation algorithm for sub-band coding,” Proc. of ICASSP, pp. 757-760, 1988.
20. J.M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Trans. on Signal Proc., vol. 41, pp. 3445-3462, Dec. 1993.
21. A. Said and W.A. Pearlman, “A new fast and efficient image coder based on set partitioning in hierarchical trees,” IEEE Trans. Circuits and Systems for Video Technology, pp. 243-250, June 1996.
22. Z. Xiong, K. Ramchandran, and M.T. Orchard, “Space-frequency quantization for wavelet image coding,” IEEE Trans. on Image Proc., vol. 6, pp. 677-693, May 1997.
23. R.L. Joshi, H. Jafarkhani, J.H. Kasner, T.R. Fischer, N. Farvardin, M.W. Marcellin, and R.H. Bamberger, “Comparison of different methods of clas-
31. A. Gersho and R.M. Gray, Vector Quantization and Signal Compression. Kluwer Academic Publishers, 1992.
32. J. Mitchell, W. Pennebaker, C.E. Fogg, and D.J. LeGall, MPEG Video Compression Standard. New York: Chapman and Hall, 1997.
33. A.N. Netravali and B.G. Haskell, Digital Pictures: Representation, Compression and Standards. New York: Plenum Press, 2nd ed., 1995.
34. R.J. Clarke, Digital Compression of Still Images and Video. Academic Press, 1995.
35. V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms and Architectures. Kluwer Academic Publishers, 1996.
36. K.R. Rao and J.J. Hwang, Techniques & Standards for Image, Video & Audio Coding. Prentice Hall, 1996.
37. B.G. Haskell, A. Puri, and A.N. Netravali, Digital Video: An Introduction to MPEG-2. Chapman and Hall, 1997.
38. ITU-T (formerly CCITT), “Video coding for low bitrate communication,” ITU-T Recommendation H.263, version 1, Nov. 1995; version 2, Jan. 1998.
39. S. Wu and A. Gersho, “Rate-constrained picture-adaptive quantization for JPEG baseline coders,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’93, vol. 5, (Minneapolis, MN), pp. 389-392, April 1993.
40. K. Ramchandran and M. Vetterli, “Rate-distortion optimal fast thresholding with complete JPEG/MPEG decoder compatibility,” IEEE Trans. on Image Proc., vol. 3, pp. 700-704, Sept. 1994.
41. M. Crouse and K. Ramchandran, “Joint thresholding and quantizer selection for transform image coding: Entropy-constrained analysis and applications to baseline JPEG,” IEEE Transactions on Image Processing, vol. 6, pp. 285-297, February 1997.
42. K. Ramchandran and M. Vetterli, “Best wavelet packet bases in a rate-distortion sense,” IEEE Trans. on Image Proc., vol. 2, pp. 160-175, Apr. 1993.
43. M.W. Garrett, “Statistical analysis of a long trace of VBR coded video,” Ph.D. Thesis, Chapter IV, Columbia University, 1993.
44. A.R. Reibman and B.G. Haskell, “Constraints on variable bit-rate video for ATM networks,” IEEE Trans. on CAS for Video Tech., vol. 2, pp. 361-372, Dec. 1992.
45. S.-W. Wu and A. Gersho, “Rate-constrained optimal block-adaptive coding for digital tape recording of HDTV,” IEEE Trans. on Circuits and Sys. for Video Tech., vol. 1, pp. 100-112, Mar. 1991.
97. G.M. Schuster and A.K. Katsaggelos, “An optimal boundary encoding scheme in the rate-distortion sense,” IEEE Trans. on Image Proc., vol. 7, pp. 13-26, Jan. 1998.
98. G.M. Schuster and A.K. Katsaggelos, “A video compression scheme with optimal bit allocation among segmentation, displacement vector field, and displaced frame difference,” IEEE Trans. on Image Proc., vol. 6, pp. 1487-1502, Nov. 1997.
99. C. Herley, J. Kovacevic, K. Ramchandran, and M. Vetterli, “Tilings of the time-frequency plane: construction of arbitrary orthogonal bases and fast tiling algorithms,” IEEE Trans. on Signal Proc., Special Issue on Wavelets, vol. 41, pp. 3341-3359, Dec. 1993.
100. Z. Xiong, K. Ramchandran, C. Herley, and M.T. Orchard, “Flexible time segmentations for time-varying wavelet packets,” IEEE Trans. on Signal Proc., vol. 45, pp. 333-345, Feb. 1997.
101. Z. Xiong, M.T. Orchard, and K. Ramchandran, “Wavelet packet image coding using joint space-frequency quantization,” IEEE Trans. on Image Proc., pp. 892-898, June 1998.
102. K. Ramchandran, M. Vetterli, and C. Herley, “Wavelets, subband coding and best bases,” Proceedings of the IEEE, pp. 541-560, April 1996. Special Issue on Wavelets: Invited Paper.
103. R. Coifman and V. Wickerhauser, “Entropy-based algorithm for best basis selection,” IEEE Trans. Inform. Theory, vol. IT-38, pp. 713-718, Mar. 1992.
104. C.-T. Chen and A. Wong, “A self-governing rate buffer control strategy for pseudoconstant bit rate video coding,” IEEE Trans. on Image Proc., vol. 2, pp. 50-59, Jan. 1993.
105. J. Choi and D. Park, “A stable feedback control of the buffer state using the controlled Lagrange multiplier method,” IEEE Trans. on Image Proc., vol. 3, pp. 546-558, Sept. 1994.
106. J. Lee and B.W. Dickinson, “Rate distortion optimized frame-type selection for MPEG coding,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 7, pp. 501-510, June 1997.
115. W.F. Schreiber, “Considerations in the design of HDTV systems for terrestrial broadcasting,” SMPTE Journal, pp. 668-677, Sept. 1991.
116. K. Ramchandran, A. Ortega, K.M. Uz, and M. Vetterli, “Multiresolution broadcast for digital HDTV using joint source-channel coding,” IEEE J. on Sel. Areas in Comm., vol. 11, pp. 6-23, Jan. 1993.
117. S. McCanne, M. Vetterli, and V. Jacobson, “Low-complexity video coding for receiver-driven layered multicast,” IEEE J. on Sel. Areas in Comm., vol. 15, pp. 983-1001, Aug. 1997.
118. J. Hagenauer, “Rate-compatible punctured convolutional codes (RCPC codes) and their applications,” IEEE Trans. on Comm., vol. COM-36, pp. 389-400, Apr. 1988.
119. K. Ramchandran and M. Vetterli, “Multiresolution joint source-channel coding for wireless channels,” in Wireless Communications: A Signal Processing Perspective, Editors V. Poor and G. Wornell. Prentice-Hall, 1998.
120. B. Schafer, “Terrestrial transmission of DTVB signals: the European specification,” in International Broadcasting Convention, no. 413, September 1995.
121. H. Kumazawa, M. Kasahara, and T. Namekawa, “A construction of vector quantizers for noisy channels,” Electron. Eng. Japan, vol. 67-B, pp. 39-47, 1984.
122. K.A. Zeger and A. Gersho, “Zero redundancy channel coding in vector quantization,” IEEE Electron. Letters, pp. 654-655, June 1987.
123. I. Kozintsev and K. Ramchandran, “Robust image transmission over energy-constrained time-varying channels using multiresolution joint source-channel coding,” IEEE Trans. on Signal Processing, Special Issue on Wavelets and Filter Banks, vol. 46, pp. 1012-1026, April 1998.
124. M.W. Marcellin and T.R. Fischer, “Joint trellis coded quantization and modulation,” IEEE Trans. on Comm., Jan. 1993.
125. I. Kozintsev and K. Ramchandran, “A hybrid compressed-uncompressed framework for wireless image transmission,” in Proc. of ICIP’97, (Santa Barbara, California), Oct. 1997.