
In this article we provide an overview of rate-distortion (R-D) based optimization techniques and their practical application to image and video coding. We begin with a short discussion of classical rate-distortion theory and then we show how, in many practical coding scenarios, such as standards-compliant coding environments, resource allocation can be put in an R-D framework. We then introduce two popular techniques for resource allocation, namely, Lagrangian optimization and dynamic programming. After a discussion of these two techniques as well as some of their extensions, we conclude with a quick review of recent literature in these areas, citing a number of applications related to image and video compression and transmission. We also provide a number of illustrative boxes to capture the salient points of the article.

From Shannon to MPEG Coding

Recent years have seen significant research activity in the area of R-D optimized image and video coding (see [1, 2] and many of the references in this article). In this section, we start by discussing classical rate-distortion theory and briefly outlining some of its fundamental contributions related to its establishment of the fundamental performance bounds on compression systems for specific sources. This will lead us to show the potential limitations of these theoretical results when it comes to dealing with complex sources such as images or video, thus setting the stage for the more restrictive R-D optimization frameworks to be discussed in later sections.

Classical R-D Theory

The starting point of classical rate-distortion (R-D) theory can be found in Shannon's seminal work [3, 4], whose 50th anniversary is being celebrated this year. Rate-distortion theory comes under the umbrella of source coding or compression, which is concerned with the task of maximally stripping redundancy from a source, subject to a fidelity criterion. In other words, rate-distortion theory is concerned with the task of representing a source with the fewest number of bits possible for a given reproduction quality.
Source representation is a rather vague proposition unless we first establish what a "source" is. For example, we can consider a source to be one particular set of data (a text file, a segment of digitized audio, an image, or a video clip). Alternatively, we can consider a class of sources that are characterized by their statistical properties (text files containing C code, speech segments, natural images, or videoconferencing sequences). When one considers a class of sources, it is clear that efficient source coding entails taking advantage of the "typical" behavior within that class. For example, this means that techniques that work well for speech may not get us too far if applied to video. However, even a narrowly defined "class" of inputs will likely show significant variations among inputs (e.g., different scenes in a video sequence), and thus techniques that allow an "input-by-input" parameter selection are likely to be superior to those that result in a "one size fits all" coding for all inputs in the class. In this article, we will present techniques that strive in some sense to attain the best of both worlds: the coding scheme is designed based on typical features of a class of signals, but the coding parameters, within the selected coding framework, are chosen on an input-by-input basis to optimize the particular realization in the statistical class of interest.

Compression can be achieved with "lossless" techniques where the decoded or decompressed data is an exact copy of the original (as is the case in such staple software tools as zip, gzip, or compress). Lossless compression is important where one needs perfect reconstruction of the source. However, this requirement also makes compression performance somewhat limited, especially for applications where the amount of source information is voluminous, bandwidth or storage constraints are severe, and a perfect rendition of the source is overkill. As an example, consider terrestrial broadcast (at about 20 Mb/s) of HDTV (raw bit rate of over 1 Gb/s), which would require a compression ratio exceeding 50:1, at least an order of magnitude in excess of the capacity of the best lossless image compression methods.
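As a quick sanity check of the figures just quoted (a worked ratio, not part of the original article):

    \frac{1\ \text{Gb/s (raw HDTV)}}{20\ \text{Mb/s (broadcast channel)}} = \frac{1000\ \text{Mb/s}}{20\ \text{Mb/s}} = 50,

so a compression ratio of at least 50:1 is indeed required, roughly an order of magnitude beyond the 2:1 to 3:1 that lossless image coders typically achieve on natural images.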
In such scenarios, "lossy" compression is called for. Higher compression ratios are possible at the cost of imperfect source representation. The trade-off between source fidelity and coding rate is exactly the rate-distortion trade-off. Lossy approaches are preferred for coding of images and video (and are used in popular compression algorithms such as JPEG [5]). Compression is lossy in that the decoded images are not exact copies of the originals but, if the properties of the human visual system are correctly exploited, original and decoded images will be almost indistinguishable. In the lossy case one can thus trade off the number of bits in the representation (the rate) against the fidelity of the representation (the distortion). This, as noted by Shannon, is a fundamental trade-off as it states the question: how much fidelity in the representation are we willing to give up in order to reduce the storage (or the number of bits required to transmit the data)?

The main purpose of this article is to survey and overview how these R-D trade-offs are taken into account in practical image and video coders, thus clarifying how these information-theoretic techniques have had an impact in everyday practice. Along the way we will discuss several coding problems that are typically solved using R-D techniques and will introduce the optimization techniques (such as Lagrangian optimization and dynamic programming) that are becoming an essential part of the coder designer's toolbox.

Although R-D theory, as stated earlier, comes under the umbrella of source coding, it is important to note that the theory is applicable also in the more general context of data transmission over a noisy channel. This is due to Shannon's celebrated separation principle of digital communication, where he proved the optimality of dividing the problem of optimal transmission of information (optimal in the sense of most efficient use of available resources such as power, bandwidth, etc.) into that of (i) representing the information efficiently and then (ii) protecting the resulting representation so that it can be transmitted virtually loss-free to a receiver. We will see more about this a little later (see Box 5). The idea seems to be intuitively good, as signal representation issues appear to be inherently different from those involved in efficient digital communication. There are many practical situations in which separation holds (and even in situations where it does not, this "divide and conquer" approach provides a good way to tackle a problem!). The impact of this result is hard to overestimate, as it has set the stage for the design of all current digital communications systems. Still, there are, as we will see in Box 5 and later in this article, important cases in which joint consideration of source and channel (i.e., ignoring separation) may in fact be useful.

Distortion Measures: The Elusive Problem

The issue of what distortion measures are most suitable for speech, audio, images, or video has been the object of continuing study for as long as digital representation of these signals has been considered. Clearly, since these sources are encoded and transmitted to be ultimately played back or displayed for a human listener/observer, a distortion measure should be consistent with what the subject can observe or hear. Thus, distortion measures that correlate well with the perceptual impact of the loss should be favored. Leaving aside obvious differences in perception between individuals, finding a general enough, not to mention easily computable, measure of perceptual quality has proven to be an elusive goal. Thus, in practice, simple and perceptually sound design rules are applied wherever perceptual quality measures are unavailable or too complex. For example, known characteristics of human perception dictate that not all frequencies in an audio signal or an image have the same importance. With these design rules in mind, appropriate frequency weighting can be introduced at the encoder. After the perceptual weighting has been performed, an optimized encoder can still be used to minimize an objective distortion measure, such as, for example, the mean squared error (MSE).

It is worth noting that while it is typical to dismiss MSE as being poorly correlated to human perception, systems built on the above philosophy (i.e., based on a perceptually meaningful framework) can be optimized for MSE performance with excellent results not only in MSE (as one would hope should be the case) but also in terms of perceptual quality. An example of this observation can be found in the current JPEG 2000 image compression standardization process, which seeks to replace the current JPEG standard [6]. The comparisons made at the Sydney meeting in November 1997 showed that coders that incorporated techniques to minimize the MSE were ranked at the top in both perceptual and objective tests [7]!

However, it is also important to realize that significant gains in objective (e.g., average MSE) quality may not translate into comparably significant gains in perceptual quality. Since the success of a particular coding application ultimately does not depend on objective quality measures, it will be necessary to determine at the design stage whether any applicable optimization approaches can be justified in terms of the trade-off between implementation cost and perceptual quality. This further emphasizes the need to incorporate perceptual criteria into the coder design, so that any further optimizations of the encoding have to choose among "perceptually friendly" operating points.

In the remainder of the article we will assume that MSE, or some suitably weighted version of MSE, has been used. We refer to [8] and references therein for a review of perceptual-coding issues for audio, images, and video.
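For concreteness, the objective measure assumed in the rest of the article can be written out explicitly (a standard definition, not spelled out in the original text). For a coding unit with samples x[n], n = 1, ..., N, and reconstruction x-hat[n],

    \mathrm{MSE} = \frac{1}{N}\sum_{n=1}^{N}\bigl(x[n]-\hat{x}[n]\bigr)^{2},
    \qquad
    \mathrm{MSE}_{w} = \frac{1}{N}\sum_{n=1}^{N} w[n]\,\bigl(x[n]-\hat{x}[n]\bigr)^{2},

where the weights w[n] (often defined in the frequency domain) implement the perceptual weighting discussed above.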
Optimality and R-D Bounds

Rate-distortion theory [9] has been actively studied in the information-theory community for the last 50 years. The focus of this study has been to a large extent the derivation of performance bounds; that is, determining the region of achievable points in the rate-distortion (or bits-fidelity) trade-off for certain limited statistical source classes.

One can distinguish between two classes of bounds, those based on Shannon theory [3, 9, 10] and those derived from high-rate approximations [11-13]. The former provides asymptotic results as the sources are coded using longer and longer blocks. The latter assumes fixed input block sizes but estimates the performance as the encoding rate becomes arbitrarily large. A comparison between these two approaches can be found in [14]. Bounds computed with either set of techniques will allow us to determine boundaries between achievable and nonachievable regions. However, the bounds may not be tight for situations of practical relevance (e.g., relatively low rate and small block sizes). Moreover, these bounds are generally not constructive. Still, if bounds could be computed they would provide useful information to benchmark specific applications.

Unfortunately, to derive bounds one needs to first characterize the sources, and this can be problematic for complex sources such as video. Indeed, bounds are likely to be found only for the simpler statistical source models. For example, bounds are known for independent identically distributed (i.i.d.) scalar sources with Gaussian, Laplacian, or generalized Gaussian distributions. The latter distribution is fairly general and can be used to model numerous real-life phenomena; it includes both the Gaussian and Laplacian distributions as special cases and provides a family of probability-density functions where, for a given variance, a shape parameter can be selected to match a range of statistical behaviors, from heavy-tailed to fast-decay probability-density functions. The R-D function itself is known in closed form only for Gaussian sources, while for other distributions one would have to resort to numerical optimization methods, e.g., the Blahut-Arimoto algorithm [15].
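For reference, the one closed-form case just mentioned is worth writing out: for an i.i.d. Gaussian source of variance sigma^2 under the MSE criterion,

    R(D) = \frac{1}{2}\log_{2}\frac{\sigma^{2}}{D}, \quad 0 < D \le \sigma^{2},
    \qquad\text{equivalently}\qquad
    D(R) = \sigma^{2}\,2^{-2R},

which is the origin of the familiar rule of thumb of roughly 6 dB of SNR improvement per additional bit.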


An interesting and useful special case in R-D theory refers to the use of scalar quantizers, where the samples are quantized one at a time rather than as a collection or vector. The theory of optimal scalar quantization (how to design scalar quantizers with performance sufficiently close to the bounds) has been widely studied. For these simple sources, practical quantizers with various degrees of approximation to the optimal values are available [16]. The simplest kind of scalar quantizer is the uniform scalar quantizer, where the quantizer step sizes are uniform. More sophisticated extensions include fixed-rate nonuniform quantizers (where each quantization level is represented with the same number of bits), also known as Lloyd-Max quantizers, and the variable-length nonuniform quantizer dubbed the entropy-constrained scalar quantizer, as well as its extensions to vector quantization [17].

While the above-mentioned techniques deal with optimal quantization strategies for a given source distribution, when dealing with complex sources such as images and video signals the question of what the right source distribution should be involves accurate source modeling. It is therefore important to consider both issues, and optimizing image or video coding performance in fact consists of two steps:
1. Given a particular type of data, say an image, what is the appropriate probabilistic, or other, model for that source?
2. Given the selected model, and any applicable bounds, how close can a practical algorithm come to the optimal performance dictated by the bound?
For image and video coding both steps are equally important, because models that can adequately capture the statistical redundancies may not be available (or may be too complex to even allow us to find a bound!). This point is illustrated by the example of Box 1. Additional discussions of approaches for practical modeling of complex data can be found in the article by Effros in this special issue [18].

Box 1 - Use of Statistical Image Models for Compression

As mentioned in the "Optimality and R-D Bounds" section, devising a good image coding algorithm involves two important intellectual components: (i) selecting a sound operational model or framework, and (ii) striving to optimize coding performance in the selected framework. Our goal here is to enable one to "visualize" the effectiveness of such models by synthesizing "images" derived from typical realizations of these assumed models. The advantages of such an exercise are manyfold: it can not only expose the powers and shortcomings of different attributes of the test models, but it can additionally inspire the creation of new and improved frameworks that remedy some of the drawbacks.

While practical image coding algorithms have been founded on a variety of operational frameworks, simple statistical models have been very popular. For example, early state-of-the-art subband image coding frameworks were based on i.i.d. models for image subbands (based on Gaussian or Laplacian p.d.f.s) and optimal bit-allocation techniques to ensure that bits were optimally distributed among the subbands in proportion to their importance, as gleaned through the variance of their distributions [19]. A coding algorithm based on such a framework would no doubt be very efficient at incorporating the second component mentioned in the previous paragraph, but it raises the obvious question of how accurate an i.i.d. Laplacian or Gaussian subband model might be. Let us try to address this question by taking a subband decomposition of a typical image (such as the Lena image, see Fig. 1), measuring the empirical variances of the different subbands, and modeling these subbands as i.i.d. Gaussian distributions. Figure 2 shows the theoretically attainable rate-distortion performance (using water-pouring concepts from information theory [15]). Note that this is the optimal performance theoretically attainable using infinite Shannon-style complexity involving asymptotic random-coding arguments based on infinitely long vectors of samples [15]. Yet, as seen in Fig. 2, this coder is handsomely outperformed by a low-complexity modern-day wavelet image coder such as Shapiro's EZW coder [20] or its improved variant, the SPIHT coder [21]; for example, at a coding rate of 0.5 bit per pixel, the SPIHT coder outperforms the infinite-complexity i.i.d.-Gaussian based scheme.

Consider now synthesizing images from the subband description. If one assumes a random sign for the magnitudes of the coefficients (i.e., one uses a truly random two-sided Laplacian model), one realizes the "bizarre" image of Fig. 3. If the magnitude is assumed to be Laplacian distributed but the sign of the random variable is known, then one synthesizes the image of Fig. 4, where some of the edge structure becomes faintly exposed. Let us now try a more "local" model that treats the image subbands as Laplacian distributed with spatially varying variances. By interpreting the wavelet data as "space-frequency" sets of information, coders built on such local models derive significant performance gains over the early subband coders that treated the data only as "frequency" sets of information.

Fig. 1. Original Lena image.
Fig. 2. Rate-distortion curves achieved with (i) the SPIHT coder [21] and (ii) the Shannon R-D bounds corresponding to an i.i.d. zero-mean Gaussian model for each wavelet subband (with empirically measured variances); this results in a Gaussian vector source, and water-pouring arguments are used to find the theoretical R-D bounds [15].
Fig. 3. Image synthesized from the statistics of the original image but without sign information and with a single variance assigned to each of the subbands (i.e., no spatially local information is available).
Fig. 4. Here the image is synthesized again from the global variance measurements, but the correct sign information is used.
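As an aside, the Gaussian "water-pouring" bound used for the comparison in Fig. 2 is straightforward to compute from measured subband variances. The sketch below (reverse water-filling with a bisection on the water level) uses made-up variances and subband weights for a three-level dyadic decomposition; in Box 1 the variances would instead be measured from the actual image.

    import math

    def gaussian_waterfilling(variances, weights, target_rate, tol=1e-6):
        """Reverse water-filling R(D) bound for independent Gaussian subbands.

        variances:   per-subband variance estimates.
        weights:     fraction of the image samples in each subband (sums to 1).
        target_rate: desired average rate in bits per pixel.
        Returns (theta, rate, distortion): water level, achieved rate, MSE bound.
        """
        def rate_at(theta):
            return sum(w * 0.5 * math.log2(v / theta)
                       for v, w in zip(variances, weights) if v > theta)

        lo, hi = tol, max(variances)          # theta = max variance gives rate 0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if rate_at(mid) > target_rate:
                lo = mid                       # rate too high: raise the water level
            else:
                hi = mid
        theta = 0.5 * (lo + hi)
        distortion = sum(w * min(theta, v) for v, w in zip(variances, weights))
        return theta, rate_at(theta), distortion

    # Illustrative three-level dyadic decomposition: 10 subbands, coarsest first.
    variances = [900.0, 400.0, 350.0, 300.0, 120.0, 100.0, 90.0, 30.0, 25.0, 20.0]
    weights   = [1/64] * 4 + [1/16] * 3 + [1/4] * 3
    print(gaussian_waterfilling(variances, weights, target_rate=0.5))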


R-D Meets R&D: Operational R-D in Practical Coder Design

As just discussed, R-D performance is the fundamental trade-off in the design of any lossy compression system. We have highlighted how fundamental theoretical research has resulted in the computation of performance bounds, but we have also indicated two major concerns with these R-D theoretical benchmarks:
1. complexity (how much memory, delay, or computation is required? Can we construct a practical algorithm to approach the bound?), and
2. model mismatch (how good are our modeling assumptions? Are they too simple to characterize the sources fully?).
These will be addressed in the following sections.

Choosing the Parameters of a Concrete System: Operational R-D

To guarantee that our design will be practical, we can abandon our search for the best unconstrained R-D performance of any system. Instead, let us start by choosing a specific coding scheme that efficiently captures the relevant statistical dependencies associated with the source, while also satisfying our system requirements of coding complexity, delay, and memory. Then we can search for the best operating points for that specific system.

For example, consider a scalar quantizer followed by an entropy coder. This quantizer is completely defined by its quantization bins, the reproduction level for each bin, and the associated codewords for each reproduction level. Well-known techniques are then available to find the best choice (in an R-D sense) of these parameters for a specific (statistical) source. Similar results are available for other compression schemes (e.g., fixed-rate scalar quantizer, vector quantizer of dimension N, etc.).

For a given system and source, if we consider all possible quantization choices, we can define an operational rate-distortion curve. This curve is obtained by plotting, for each rate, the distortion achieved by designing the best encoder/decoder pair for that rate. Note that these points are operational in that they are directly achievable with the chosen implementations and for the given set of test data. This bound will allow us to distinguish between the best achievable operating points and those that are suboptimal or unachievable. While the bound given by Shannon's theoretical R-D function gives no constructive procedure for attaining that optimal performance, in the operational R-D case we always deal with achievable points.

A particular case of interest, which we will describe later, is one where the encoder can select among a fixed and discrete set of coding parameters, with each R-D point being obtained through the choice of a specific combination of coding parameters. In that scenario, as illustrated by Fig. 7, we can plot the individual admissible operating points. The boundary between achievable and nonachievable performance is then defined by the convex hull of the set of operating points.

From now on, we will consider optimality in the operational sense; i.e., the best achievable performance for a given source (as described by a training set or by a given statistical model) given our choice of compression framework.

Fig. 7. The operational R-D characteristic is composed of all possible operating points obtained by applying admissible coding parameters to each of the elements in a particular set of data; the convex hull of these R-D operating points bounds the achievable region. This is different from the performance bound for a stationary source of known distribution having the same statistical characteristics as the data. (Axes: distortion D versus rate R.)
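A small sketch of how the convex hull of Fig. 7 can be extracted from a set of measured (rate, distortion) operating points is given below; the sample points are invented for illustration. Only hull points can be reached by the constant-slope (Lagrangian) techniques discussed later in the article; points lying above the hull are never selected.

    def rd_convex_hull(points):
        """Return the lower convex hull of a set of (rate, distortion) points."""
        pts = sorted(set(points))              # sort by rate, then distortion
        hull = []
        for p in pts:
            # Pop the last hull point while it lies on or above the segment
            # joining its predecessor to the new point (not on the lower hull).
            while len(hull) >= 2:
                (r1, d1), (r2, d2) = hull[-2], hull[-1]
                r3, d3 = p
                if (d2 - d1) * (r3 - r1) >= (d3 - d1) * (r2 - r1):
                    hull.pop()
                else:
                    break
            hull.append(p)
        # Keep only the decreasing-distortion part: past the minimum-distortion
        # point, extra rate buys nothing.
        best = min(range(len(hull)), key=lambda i: hull[i][1])
        return hull[:best + 1]

    operating_points = [(0.25, 60.0), (0.40, 38.0), (0.50, 35.0),
                        (0.75, 20.0), (1.00, 14.0), (1.20, 13.5)]
    print(rd_convex_hull(operating_points))   # (0.50, 35.0) lies above the hull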


Before we address the issue of modeling, let us first consider in more detail the problem of optimal encoder/decoder design. The basic design mechanism can be summarized as follows. First select a particular compression framework (for example, a scalar quantizer with entropy coding as above), then proceed by alternately designing the encoder (i.e., the rule to map real-valued inputs to quantization indices) and the decoder (i.e., the rule to reproduce a particular quantizer index at the receiver).

First the encoder is optimized for the given decoder; i.e., given the reproduction levels at the decoder, an encoder is designed that produces a mapping having the minimum distortion for a given rate, for the given training source. Then, the decoder is optimized for the given encoder; i.e., once inputs have been assigned to indices, we choose the best reproduction for a particular set of indices. This design process iterates between these two steps until convergence.

Variations of this approach (known as the Lloyd algorithm, which was initially proposed for scalar quantizers [27, 28]) have proved to be very popular, even though global optimality cannot be guaranteed. Examples of scenarios where variations of the Lloyd algorithm have been applied include entropy-constrained quantizers, vector quantizers [29], tree-structured vector quantizers [30], and entropy-constrained vector quantizers [17], among other frameworks. Details can be found in [31], and a more detailed discussion of these types of algorithms and their application is presented in another article in this issue [18].
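As a concrete illustration of the alternation just described, here is a bare-bones sketch of the Lloyd iteration for a fixed-rate scalar quantizer: a nearest-neighbor encoder step followed by a centroid decoder step. The Gaussian training set, the initialization, and the fixed iteration count are illustrative choices, not part of the original description.

    import random

    def lloyd_quantizer(samples, levels, iters=50):
        """Alternate encoder/decoder optimization for a fixed-rate scalar quantizer."""
        codebook = sorted(random.sample(samples, levels))   # initial reproduction levels
        for _ in range(iters):
            # Encoder optimization for the given decoder: nearest-neighbor mapping.
            cells = [[] for _ in codebook]
            for x in samples:
                j = min(range(len(codebook)), key=lambda k: (x - codebook[k]) ** 2)
                cells[j].append(x)
            # Decoder optimization for the given encoder: centroid of each cell.
            codebook = [sum(c) / len(c) if c else codebook[j]
                        for j, c in enumerate(cells)]
        mse = sum(min((x - y) ** 2 for y in codebook) for x in samples) / len(samples)
        return sorted(codebook), mse

    random.seed(0)
    training = [random.gauss(0.0, 1.0) for _ in range(5000)]
    print(lloyd_quantizer(training, levels=4))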
f o r m such as the discrete cosine ti-ansti)rni (1X;T) o r
Choosing a Good Model: The Transform-Coding Paradigm

In our discussion so far, "best" has been defined not only for a particular framework, but also for the given source, as specified by a probabilistic model or by a training set of representative data. Since any applicable probabilistic models will have to be derived in the first place from a training set of data samples from the source, here, without loss of generality, we assume sources to be represented by appropriate training sets. Furthermore, in many cases, e.g., when dealing with multidimensional sources, practical closed-form models may not be available.

It would seem that models are inherent properties of the sources to be compressed and that therefore the coder designer has little flexibility in the model selection. Nothing could be further from the truth. In fact, a fundamental part of designing a coder is the selection of the underlying model, and indeed many choices are typically available. For example, a source can be considered to be scalar, or treated as a set of vectors, or we can model the source after it has been decomposed into its frequency components, etc. (see Box 1). Each of these approaches can model the same original data but provides widely varying compression results.

To a first-order approximation, good compression systems based on complex models tend to be more complex to implement (but may provide better performance) than systems based on simpler models. A simple illustration of this rule can be seen when comparing scalar and vector quantizers. Thus, the main difficulty in achieving good R-D performance for images and video is, as exemplified in Box 1, finding a model that is
▲ simple enough that a compression system matched to this model can achieve good performance with reasonable cost,
▲ but complex enough to capture the main characteristics of the source.
For example, as seen in Box 1, using a simple i.i.d. model for each subband as the basis for our coder design results in very poor coding performance, as it does not exploit existing spatial redundancies (see Figs. 2 and 3). However, the (still simple) models that assume correct local variance and sign information can be seen to capture a great deal of image information (see Fig. 6), and indeed models of this kind underlie many state-of-the-art wavelet image coders.

There are many approaches to achieving this dual goal of "matching without excessive complexity," most of them based on a simple principle: to replace a single complex model by a multitude of simple models. This approach is described in detail in the article by Effros in this special issue [18]. Here we describe one particular instance of this method, namely, transform coding.

The transform-coding paradigm calls for decomposing the source into its frequency components using block transforms such as the discrete cosine transform (DCT), or subband coding using now-popular wavelet filters. It is only after decomposing the source into its frequency components or bands that we apply quantization. Thus, we consider the R-D trade-offs in the transform domain. From the standpoint of modeling, this has the main advantage of allowing us to use simple models, as shown in Box 1.

That this approach is extremely useful can be verified by how widely it is used in recent image and video coding standards, from JPEG to MPEG-1/2/4 and on to JPEG 2000. We refer the reader to Boxes 2 and 3 for two examples of transform coding, based on block transforms such as the DCT and on wavelet transforms, respectively. Our goal is twofold: to provide intuition as to why these approaches are a good idea, and secondly to give examples of the resource allocation issues that arise in transform-coding frameworks.


Box 2 - An Example of Transform Coding: JPEG

The term transform coding generically describes coding techniques where the source data is first decomposed using a linear transform and where each of the frequency components obtained from the decomposition is then quantized. A typical transform-based image coder comprises the cascade of a front-end linear transform followed by a scalar quantization stage and then an entropy coder. The transform serves the dual roles of (i) energy compaction, so that the bulk of the signal energy is isolated in a small fraction of the transform coefficients, and (ii) signal decorrelation, so that there is little loss in performance due to simple scalar quantization; this is possible because the set of all transform coefficients representing a given frequency can be, to first order, modeled as a memoryless source (e.g., i.i.d. Gaussian or Laplacian) for which efficient simple quantizers can be found. The scalar quantizer is the lossy part of the framework and confines the representation to a discrete set of indices corresponding to discrete quantization levels, while the last-stage entropy coder removes the redundancy in the quantization index stream.

Commercial image and video compression standards are based on the discrete cosine transform (DCT). Figure 8 provides an example of the most popular mode of operation, the so-called "baseline," within the JPEG compression standard [5]. A brief description of the JPEG coding algorithm follows.

The image is decomposed into 8 x 8 blocks for the purpose of transform, quantization, and entropy coding. Blocks are processed in raster scan order and are transformed independently using a block DCT. After the DCT, each 8 x 8 block is quantized using uniform scalar quantization. Quantization step sizes are defined for each of the 64 frequency coefficients using an 8 x 8 quantization matrix. Typically, a single quantization table is used for each color component; however, up to four different tables may be used if needed. The values of the quantization tables are encoded in the header of the compressed file. Quantization is a lossy step; i.e., the information cannot be recovered perfectly at the decoder. However, it is the quantization operation that allows one to achieve a high compression rate at the price of some quality degradation.

The first quantized frequency coefficient, called the DC coefficient, represents the average sample value in a block and is predicted from the previously encoded block to save bits. Only the difference from the previous DC coefficient is encoded, which typically is much smaller than the absolute value of the coefficient. The remaining 63 frequency coefficients (called AC coefficients) are encoded using only the data of the current block.

The entropy coder constitutes the second basic component in the R-D trade-off, as it determines the number of bits that will be used for a particular image and quantization setting. The entropy coder is lossless, and it maps each of the various quantization indices to given codes. A simple way of compacting the quantization index stream would be to assume a memoryless model for the stream. However, the memoryless assumption is typically a bad one, and significant gains can be had by exploiting the memory in the quantized bitstream (e.g., zero values tend to cluster). A simple way to exploit this is through zero run-length coding. JPEG uses a two-dimensional entropy code based on the length of the zero-run and the magnitude of the nonzero coefficient breaking up the run.

The AC coefficients are processed in a "zig-zag" manner (see Fig. 8) that approximately orders coefficients from lowest to highest frequency. Run-length codes represent the sequence of quantized coefficients as (run, value) pairs, where "run" represents the number of zero-valued AC coefficients between the current nonzero coefficient and the previous nonzero coefficient, and "value" is the (nonzero) value of the current coefficient. A special end-of-block (EOB) code signals the end of nonzero coefficients in the current block. For the example in Fig. 8, with three nonzero AC coefficients, the sequence after run-length encoding is (0,5)(0,3)(4,7)(EOB). The sequence of "runs" and "values" is compressed using Huffman or arithmetic codes.

Despite the apparent rigidity of the JPEG syntax, there is a surprising amount of room for gains attainable with clever encoder optimization [39, 40]. The syntax allows for the quantization matrix and the entropy coding table to be adapted on a per-image basis, as well as for arbitrary compression ratios. A more subtle option available is for the encoder to "dupe" the decoder optimally in a rate-distortion sense while being fully syntax-compatible. As an example, small nonzero values that break up potentially long zero-runs are typically very expensive in bit-rate cost in comparison to their relative contribution to reducing quantization distortion. If the encoder can "lie" to the decoder about the magnitude of these coefficients, i.e., call these nonzero values zeros, then the decoder is none the worse off, while the R-D performance is significantly increased.

A systematic way of doing this optimally in the R-D sense, termed coefficient thresholding, has been described in [41]. The good news is that sizeable R-D performance gains, on the order of 25% in compression efficiency, can be realized while being completely faithful to the JPEG syntax. Another article in this issue [2] will provide further evidence of the practical value of R-D techniques in improving the quality in standards-based video coding, where there is even more flexibility in the choice of operating parameters.

[Fig. 8: Example of baseline JPEG operation (image component, block-based coding, data out), including the zig-zag scan of a quantized block with three nonzero AC coefficients.]
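To make the (run, value) mechanics concrete, the sketch below reproduces the zig-zag scan and run-length conversion for one quantized 8 x 8 block. The block contents are assumed for illustration; they are chosen to mimic the three-nonzero-AC-coefficient example quoted above, and the Huffman or arithmetic coding of the resulting pairs is omitted.

    def zigzag_order(n=8):
        """Return (row, col) positions of an n x n block in JPEG zig-zag order."""
        return sorted(((r, c) for r in range(n) for c in range(n)),
                      key=lambda rc: (rc[0] + rc[1],
                                      rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

    def run_length_encode(block):
        """Convert the 63 quantized AC coefficients of an 8 x 8 block into
        (run, value) pairs followed by an end-of-block marker, as in baseline
        JPEG. The DC coefficient (position 0) is coded separately by DPCM."""
        ac = [block[r][c] for r, c in zigzag_order()][1:]   # skip the DC term
        pairs, run = [], 0
        for v in ac:
            if v == 0:
                run += 1
            else:
                pairs.append((run, v))
                run = 0
        pairs.append("EOB")
        return pairs

    # Illustrative quantized block: DC = 12 and three nonzero AC coefficients.
    block = [[0] * 8 for _ in range(8)]
    block[0][0] = 12
    block[0][1] = 5      # first AC coefficient in zig-zag order
    block[1][0] = 3      # second
    block[1][2] = 7      # reached after a run of four zeros
    print(run_length_encode(block))   # [(0, 5), (0, 3), (4, 7), 'EOB']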


Box 3 - Adaptive Transforms Based on Wavelet Expansions

Adaptivity in signal processing is one of the most powerful and desirable features, and the compression application is no exception. Here we outline the role of adaptive transforms in wavelet image coding. While the DCT is the transform of choice in commercial compression standards such as JPEG, MPEG, etc., the discrete wavelet transform (DWT) has recently emerged as a superior alternative and looks set to replace the DCT as the next-generation transform in the newly emerging JPEG 2000 image compression standard.

The wavelet or subband decomposition consists of an octave-band frequency decomposition. It is implemented by a tree-structured filter bank consisting of a cascade of lowpass filters (H0(z)) and highpass filters (H1(z)) followed by decimators (see Fig. 9(a)). By recursively splitting the outputs of the lowpass branches, we realize an octave-band frequency decomposition associated with a logarithmic filter-bank tree structure (see Fig. 9(c)). This logarithmic frequency decomposition property of the wavelet transform gives good frequency selectivity at lower frequencies and good time (or spatial) selectivity at higher frequencies. This trade-off is well suited to many "natural" images that exhibit long-duration, low-frequency events (e.g., background events in typical scenes) and short-duration, high-frequency events (e.g., edges). Other choices of subtrees lead to other time-frequency selectivity trade-offs, as seen in Fig. 9(c). These trees, which represent generalizations of the wavelet tree, are dubbed in the literature wavelet packets (see Fig. 9).

The general resource-allocation problem for the adaptive-transform coding framework involves selecting the operating point of the combination of transform, quantizer, and entropy coder in order to realize the best rate-distortion trade-off. Depending on the flexibility (and complexity concerns) of the framework, one or all of the above functional components can be jointly optimized. The traditional approach in compression is to use a fixed transform (like the DCT or the DWT) and then choose a quantization strategy matched to the properties of the input process and the fixed transform. The quantization strategy (bit-allocation policy) is typically based on a model for the probability-density functions characterizing the transform coefficients or, in the absence of an analytical model, on training over a large class of "typical" possible signals.

As a first step toward attaining an adaptive transform, it is clear that an improvement can be found if we search over the whole set of binary trees for a particular filter set, instead of using the fixed tree of the wavelet transform (see Fig. 9). A fast algorithm, also known as the "single tree" algorithm, to find the best tree (dubbed the "best basis") jointly with the best quantization and entropy coding strategy has been described in [42]. The idea is to search for the best basis (for a rate-distortion cost function, as appropriate for compression) for the signal from a library of wavelet packet bases. In order to achieve this, two entities are needed: a cost function for basis comparison and a fast search algorithm. See the section on basic components in image/video coding algorithms and Box 9 for a more detailed treatment of these issues.

Fig. 9. (a) Two-channel decomposition as a node and two branches in the decomposition tree. (b) All possible binary wavelet packet decompositions of depth 2. (c) Some typical depth-3 binary wavelet packet subtree decompositions (e.g., the STFT tree, the wavelet tree, and arbitrary wavelet-packet trees).
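What follows is a minimal, self-contained sketch of the "single tree" pruning idea described above, written for a one-dimensional toy signal: a node is kept as a leaf or split into its two children, whichever has the lower Lagrangian cost D + lambda * R. The Haar split and the leaf cost model (uniform quantization plus a crude one-bit-per-nonzero-coefficient rate estimate) are illustrative assumptions, not the actual coder of [42], which operates jointly with a real quantizer and entropy coder.

    def haar_split(coeffs):
        """One Haar analysis step: (approximation, detail), each at half length."""
        low = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 ** 0.5 for i in range(len(coeffs) // 2)]
        high = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 ** 0.5 for i in range(len(coeffs) // 2)]
        return low, high

    def toy_rd_cost(coeffs, step=1.0):
        """Toy leaf cost: MSE of uniform quantization with the given step, plus a
        crude rate estimate of one bit per nonzero quantized coefficient."""
        q = [round(c / step) for c in coeffs]
        dist = sum((c - step * qi) ** 2 for c, qi in zip(coeffs, q))
        rate = sum(1 for qi in q if qi != 0)
        return rate, dist

    def best_basis(coeffs, lam, depth=0, max_depth=3):
        """'Single tree' pruning: keep a node as a leaf, or split it into its two
        subbands, whichever gives the lower Lagrangian cost D + lam * R."""
        rate, dist = toy_rd_cost(coeffs)
        leaf_cost = dist + lam * rate
        if depth == max_depth or len(coeffs) < 2:
            return leaf_cost, [coeffs]
        low, high = haar_split(coeffs)
        cost_l, basis_l = best_basis(low, lam, depth + 1, max_depth)
        cost_h, basis_h = best_basis(high, lam, depth + 1, max_depth)
        if cost_l + cost_h < leaf_cost:
            return cost_l + cost_h, basis_l + basis_h
        return leaf_cost, [coeffs]

    signal = [float((n % 16) < 8) for n in range(64)]   # toy piecewise-constant signal
    cost, leaves = best_basis(signal, lam=0.5)
    print(cost, [len(leaf) for leaf in leaves])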

When considering video sources, we will need additional tools to allow us to fully exploit the redundancy between consecutive frames in a video sequence. Motion compensation is the most popular approach to achieve this goal. The encoder computes the motion parameters (for example, block-based motion vectors as in MPEG [32]) and the decoder uses those in the reconstruction framework. A particular framework will specify how the motion information is transmitted and how it is interpreted by the decoder. A more detailed description can be found in the article by Sullivan and Wiegand in this issue [2].

Lack of space prevents us from going into more depth, but we refer the interested reader to recent textbooks for a general description of the more popular methods and algorithms and of how the transform-coding paradigm is put into practice [5, 31-37]. We will revisit the various building blocks of transform-coding frameworks later in the article, where we will outline examples of R-D optimization applied to these algorithms.


Standards-Based Coding: Syntax-Constrained R-D Optimization

Thus far we have considered the complete design of a compression system in which, for a given set of constraints, we find encoding and decoding algorithms to suit our needs. Let us assume now that the decoder has been selected; that is, we have a complete specification of the language that can be understood by the decoder, with an accompanying description of how an output (the decoded image or video) is to be produced given a specific input stream. This scenario is precisely that assumed by most recent international compression standards (JPEG, MPEG, H.26x, and so on). Motivated by the desire to maximize interoperability, these standards provide an agreed-upon bitstream syntax that any standard-compliant decoder will use to produce an output signal. Agreeing on such standards allows encoding/decoding products from different vendors to talk to one another and has become the preferred way to achieve affordable, widely available digital image and video compression.

Still, it is not clear a priori how much flexibility the encoder can enjoy in selecting its modes of operation if it is constrained by a particular decoder. It turns out that most of these standards are designed to endow the encoder with a lot of flexibility and creativity in its selection of the system parameters, and there is a big gap in performance between the best choice and the worst. In all standards-based applications the encoder can select parameters that will result in various levels of R-D performance. This leads to a situation where the number of operating points is discrete, and thus the operational R-D bound is determined by the convex hull of the set of all operating points (see Fig. 7). For example, as anybody who has used JPEG compression can verify, one can select different rate-quality targets for a particular image and still guarantee that these images can be decoded. Likewise, in video coding, each frame or scene requires a different rate to achieve a given perceptual quality, and the encoder needs to control the coding parameter selection to enable proper transmission (see Box 4). In typical image and video coding optimization scenarios such as those involving these standards, the encoding task of selecting the best operating point from a discrete set of options agreed upon a priori by a fixed decoding rule is often referred to as syntax-constrained optimization. The selected operating choice is communicated by the encoder to the decoder as side information, typically as part of the header.

Having flexibility at the encoder also increases the robustness of compression against modeling mismatches. That is, a system designed for a particular source model will perform poorly if the source model should differ. A more robust system can be designed if several compression modes are available and the encoder can select among them. It is important to note that this robustness and flexibility are not free: the amount of side information needed to configure the decoder appropriately will increase if more modes are to be accommodated and, more importantly, the computational complexity will increase as well.

We have now set the stage to define the class of problems that will occupy the rest of this article. We consider that a general coding framework has been selected, which can accommodate different types of sources, and thus specific parameters can be selected for each image (and for each desired rate) and sent to the decoder as overhead. Our goal is then:

Formulation 1 - Discrete R-D Optimization: Parameter Selection for a Given Input
Given a specific encoding framework where the decoder is fully defined, optimize the encoding of a particular image or video sequence in order to meet some rate/distortion objectives.

Note that here we are assuming deterministic knowledge of the input, and our goal is to optimize the parameter selection for that input. We no longer seek optimality over an ensemble of inputs, but rather confine ourselves to doing our best for the given input, given the constraints imposed by the coding framework. This is a very realistic scenario, as it can be applied, for example, to most image and video compression standards defined to date (e.g., JPEG [5], MPEG [32], or H.263 [38]), where the encoding mode selection can be optimized for each input. As is shown in the article by Sullivan and Wiegand in this issue, the potential gains when using these optimization techniques are significant [2].

However, the selection of the initial coding framework is key to the system performance. No matter how sophisticated the optimization techniques one is willing to utilize (and we will describe some fairly complex ones!), if the coding framework is inherently limited or flawed, not much improvement will be achievable. Recall that we are placing ourselves in an operational R-D framework and are thus limited to only those R-D operating points that the initial framework can achieve. Thus, a good coding framework without any form of optimization is likely to be superior to a subpar coding approach, no matter whether parameter selection has been R-D optimized in the latter scheme. We refer once more to Box 1 for an example of how the model-selection problem may often be more important than having an optimized algorithm.

In the spirit of operational R-D, we define here the "optimal" solution as that achieving the best objective function among all possible operating points. Note then that we consider finite sets of coding choices at the encoder, and therefore there exists one achievable optimal choice of parameters: if all else fails, one could do an exhaustive search comparison of all possible operating points and choose the best. Obviously, our goal will be to find those operating points without a brute-force search.
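The next sections of the article develop such searches in detail; as a preview, here is a minimal sketch of the Lagrangian shortcut: each coding unit independently minimizes d + lambda * r, and lambda is adjusted (by bisection here) until the total rate meets the budget. The per-unit R-D tables are made-up numbers, and, as noted above, only operating points on the convex hull can be reached this way.

    def allocate(rd_tables, budget, iters=40):
        """Pick one operating point per coding unit so that total rate <= budget.

        rd_tables: for each coding unit, a list of (rate, distortion) choices.
        Each unit independently minimizes d + lam * r; lam is adjusted by
        bisection so that the total rate approaches the budget from below.
        """
        def pick(lam):
            choice = [min(table, key=lambda rd: rd[1] + lam * rd[0])
                      for table in rd_tables]
            return choice, sum(r for r, _ in choice)

        lo, hi = 0.0, 1e9            # lam = 0 is greedy in quality; large lam is cheap
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            _, rate = pick(mid)
            if rate > budget:
                lo = mid             # too many bits: penalize rate more heavily
            else:
                hi = mid
        return pick(hi)

    tables = [[(10, 90.0), (20, 45.0), (40, 25.0), (80, 12.0)],   # unit 1
              [(5, 30.0), (15, 14.0), (30, 8.0)],                 # unit 2
              [(8, 70.0), (25, 30.0), (60, 10.0)]]                # unit 3
    choices, total_rate = allocate(tables, budget=100)
    print(choices, total_rate)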


Box 4 - Delay-Constrained Transmission and Buffer Control

Most practical video compression algorithms exploit spatial and temporal redundancy through transform coding and motion estimation, respectively. However, the degree of redundancy, and therefore the resulting rate for a given distortion, can fluctuate widely from scene to scene. For example, scenes with high motion content will require more bits than more stationary ones. This is illustrated by the sample rate trace [43] of Fig. 10. This trace is obtained from the movie Star Wars, and more specifically its opening 4 minutes (as true fans no doubt will have guessed). It shows rate changes of close to an order of magnitude when the quantization step size is kept constant (a JPEG coder was used to code each frame).

Fig. 10. Bit rate per frame for the initial 4 minutes in the "Star Wars" trace produced by Mark W. Garrett at Bellcore [43]. This trace was computed using Motion JPEG (i.e., each frame is encoded independently using JPEG, without motion estimation and with the same quantization parameters for every frame), and it clearly shows very significant variations in rate between different scenes. Similar variations can be observed when compressing using other algorithms such as MPEG.

Let us now consider a typical real-time transmission as illustrated in Fig. 11. As just described, video frames require a variable bit rate, and thus it will be necessary to have buffers at encoder and decoder to smooth the bit rate variations. Assuming the video input and video output devices capture and display frames at a constant rate, and no frames are dropped during transmission, it is easy to see that the end-to-end delay in the system will remain constant [44].

Let us call ΔT the end-to-end delay: a frame coded at time t has to be decoded at time t + ΔT. This imposes a constraint on the rate that can be used for each frame (it has to be low enough that transmission can be guaranteed within the delay).

Consider the case when transmission takes place over a CBR channel. Of the delay components of Fig. 11, only ΔT_eb and ΔT_db (the time spent in the encoder and the decoder buffer, respectively) will now be variable. Consider, for example, ΔT_eb: this delay will be at most B_max / C, where B_max is the physical buffer size at the encoder and C is the channel rate in bits per second. It is clear that B_max has to be smaller than ΔT · C, or otherwise we could store in the buffer frames that will then experience too much delay.

If we consider the transmission of a sequence such as that of Fig. 10, either (i) we will have to use very large buffers (and correspondingly long end-to-end delays), or (ii) we will have to adjust the source rate, and thus the delivered quality, to make it possible to use a smaller buffer (shorter delay) without losing any data. The delays required for the trace of Fig. 10 would be exceedingly long; therefore, in practical applications it is necessary to perform rate control to adjust the coding parameters and meet the delay constraints.

As shown in Fig. 12, it is possible to adjust the video rate (and the quality) by modifying the quantization step sizes used for each frame. It is therefore easy to see that rate-control problems can be cast as resource-allocation problems, where the goal is to determine how many bits to use on different parts of the video sequence and to do so in such a way as to maximize the quality delivered to the end user. Of course, a natural way to approach these problems is to consider the R-D trade-offs in the allocation, and techniques such as those described in this article have been widely used for rate control [45-48].

Note that even in cases where transmission is performed over a VBR channel, or where the sequence is pre-encoded and stored (e.g., in a digital versatile disk (DVD)), it is also necessary to perform rate allocation. For example, to store a full-length movie in a DVD it may be necessary to preanalyze the movie and then allocate appropriate target rates to the various parts of the movie. In this way, allocating more bits to the more challenging scenes and fewer to the easier ones will result in a globally uniform quality. R-D-based approaches for rate control over VBR channels have also been studied [49-52].

Fig. 11. Delay components in a communication system.
Fig. 12. Block diagram of a typical MPEG coder (DCT, quantization, motion compensation, motion vectors). The quantization parameter can be adjusted to make the rate comply with the channel constraints.
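To make the buffer argument concrete, here is a toy simulation of the mechanics described in Box 4: the encoder buffer fills with each coded frame, drains at the channel rate, and the quantizer step is chosen from a simple one-frame-delayed complexity estimate so that the occupancy (and hence the delay) stays bounded. The rate model R = complexity / q, the controller, and all the numbers are illustrative assumptions, not the rate-control scheme of any particular standard.

    def simulate_rate_control(complexities, channel_rate, frame_rate, buffer_size):
        """Simulate a CBR encoder buffer with a simple one-step-ahead controller.

        complexities: per-frame 'difficulty' numbers; the toy model assumes a
                      frame coded with quantizer step q spends bits = complexity / q.
        Returns a list of (q, bits, buffer_occupancy) tuples, one per frame.
        """
        per_frame_budget = channel_rate / frame_rate
        occupancy, log = 0.0, []
        prev_c = complexities[0]
        for c in complexities:
            # Target the channel budget minus a fraction of the backlog.
            target = max(per_frame_budget - 0.5 * occupancy, 0.1 * per_frame_budget)
            q = prev_c / target          # choose q from the previous frame's complexity
            bits = c / q                 # actual bits spent on the current frame
            occupancy = max(0.0, occupancy + bits - per_frame_budget)
            assert occupancy <= buffer_size, "delay bound would be violated"
            log.append((round(q, 2), round(bits), round(occupancy)))
            prev_c = c
        return log

    # 30 frames: an 'easy' scene followed by a 'hard' one, at 400 kb/s and 25 f/s.
    scene = [12_000.0] * 15 + [40_000.0] * 15
    for entry in simulate_rate_control(scene, channel_rate=400_000,
                                       frame_rate=25, buffer_size=80_000):
        print(entry)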


Box 5 - Rate-Distortion Techniques for Joint Source-Channel Coding

While rate-distortion-based techniques have had a major impact on image and video compression frameworks, their utility extends beyond compression to the bigger framework of image and video transmission systems. The problem of transmitting image and video signals naturally involves both source coding and channel coding. The image or video source has an associated rate-distortion characteristic that quantifies the optimal trade-off between compression efficiency and the resulting distortion. The classical goal of source coding is to operate as closely as possible to this rate-distortion bound. Then comes the task of reliably transmitting this source-coded bitstream over a noisy channel that is characterized by a channel capacity, which quantifies the maximum rate at which information can be reliably transmitted over the channel. The classical goal of channel coding is to deliver information at a rate that is as close to the channel capacity as possible.

For point-to-point communications with no delay constraints, one can theoretically separate the source and channel coding tasks with no loss in performance. This important information-theoretic result, as stated in the introduction, goes back to Shannon's celebrated separation principle, which allows the separate design of a source compression/decompression scheme and a channel coding/decoding scheme, as long as the source code produces a bit rate that can be carried by the channel code. The separation principle is illustrated in Fig. 13.

It is nevertheless important to recall that information theory relies on two important assumptions, namely (i) the use of arbitrarily long block lengths for both source and channel codes and (ii) the availability of arbitrarily high computational resources (and associated delays). It is obvious that such conditions are not met in practice, both because of delay constraints and because of practical limits on computational resources. For certain multiuser communication scenarios such as broadcast [53] and multicast, the separation theorem does not apply even theoretically, and there is a need for closer interaction between the source and channel coding components. A key issue in the design of efficient image and video transmission systems for these cases therefore involves the investigation of joint design of these source and channel coding components.

The "Role of R-D Techniques in Joint Source-Channel Coding" section will take a more detailed look at typical scenarios involving joint source-channel coding and will give pointers to some of the more recent work in the field. In this overview box, we will formulate the basic joint source-channel coding problem at a high level. It is insightful to note that there are many flavors to this problem depending on the number of jointly designed elements involved. If only the source coder and forward error-correction devices are involved, the resource to be allocated is the total bit rate between the source and channel coder. If the modulator-demodulator (modem) is included in the optimization box, then the transmission power or energy can become the constraint. The cost function is typically the end-to-end distortion of the delivered copy of the source: due to the probabilistic nature of the channel, one typically has to abandon the deterministic distortion metrics of the earlier formulations and instead consider expected distortion measures. The distortion is due to both source quantization, which is deterministic in nature for a fixed discrete set of quantizer choices, and channel noise, which is obviously of a probabilistic nature. This is in contrast to the use of R-D methods for pure source coding, where quantization is the only source of distortion.

Formulation 2 - Joint Source-Channel Coding Optimization
Given a specific operational transmission framework involving a source coder (characterized by its collection of possible R-D operating points), a channel coder (characterized by its collection of error-correcting strengths and code rates), and possibly a modem device (characterized by modem parameters like constellation, transmission power, etc.), optimize the expected end-to-end delivered image or video quality subject to a constraint on the total bit rate for source and channel coding, or on the total transmission power or energy (if the modem is included in the optimization box), and possibly subject to other constraints like bandwidth and delay as well. Alternatively, the expected distortion can be fixed and the cost function could involve total bit rate or transmission energy.

The bit-allocation problem in the case of joint source-channel coding can thus be formulated as finding the optimal distribution of the total bit budget, R_budget, between source bits, R_source, in order to reduce quantization distortion, and channel-coding parity bits, R_channel, in order to minimize the expected distortion E(D) due to both the source quantization and the noisy channel:

    min over (source parameters, channel parameters) of E(D)
    subject to R_source + R_channel <= R_budget.

Fig. 13. Separation principle. Optimality is achieved by separate design of the source and channel codecs (source coder and channel coder at the transmitter, channel decoder and source decoder at the receiver, connected by the channel).
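A brute-force sketch of Formulation 2 for its simplest instance (a source coder and a forward-error-correction code sharing one bit budget) is given below. The source R-D points, the channel-code options, the residual-loss model, and the "distortion on failure" value are all invented for illustration; a real design would plug in measured characteristics of the actual codec, channel code, and channel.

    def joint_allocation(source_points, channel_codes, r_budget, d_fail):
        """Exhaustive search over (source operating point, channel code) pairs.

        source_points: list of (source_rate, quantization_distortion).
        channel_codes: list of (code_rate, residual_loss_probability); the
                       transmitted rate is source_rate / code_rate.
        d_fail:        distortion charged when the channel code fails (e.g., the
                       source variance if a lost unit is replaced by its mean).
        Minimizes the expected distortion subject to the total rate budget.
        """
        best = None
        for r_s, d_q in source_points:
            for code_rate, p_loss in channel_codes:
                r_total = r_s / code_rate
                if r_total > r_budget:
                    continue
                expected_d = (1 - p_loss) * d_q + p_loss * d_fail
                if best is None or expected_d < best[0]:
                    best = (expected_d, r_s, code_rate, r_total)
        return best

    source_points = [(0.25, 40.0), (0.5, 22.0), (1.0, 11.0), (2.0, 5.0)]
    channel_codes = [(1.0, 0.10), (0.75, 0.02), (0.5, 0.001)]   # (code rate, loss prob.)
    print(joint_allocation(source_points, channel_codes, r_budget=2.0, d_fail=100.0))

With these made-up numbers the search picks the 1.0 b/sample source point protected by the rate-1/2 code rather than spending the whole budget on source bits: a small example of why joint consideration of source and channel can pay off.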

Typical Allocation Problems

The previous section introduced the framework of discrete R-D optimization, and Boxes 2, 3, 4, and 5 gave us concrete examples of scenarios where this optimization is called for; i.e., where the encoder has to make choices among a finite set of operating modes.

We now present a series of generic problem formulations that spell out some of the possible constraints the encoder will have to meet when performing this parameter selection. These problem descriptions are divided into two classes depending on whether compression is targeted for storage or for transmission applications. Before presenting these formulations, we first discuss some general issues and introduce the notation.


when a particular rate is se- functions can easily be computed for each coding unit,
rio it will be necessary to decidebut when our problem involves deciding on the alloca-
granularity to optimize the en- tion for a set of codng units, defining the overali cost
ple, it is possible to consider function requires some additional thought. For example,
basic coding units in a video coding assume distortion is our objective function; then there are
measure frame-wise rate and several alternatives for defining the overall distortion
the sequence, and then decide measure given the individual distortion measures for each
eratlng points. Alternatively one can oper- of the coding units. For example one can assume that
dmg choices for a single minimizing the uve1cuHe distortion is a desirable objective.
being, for example, the 8 But consider now a long video sequence: is it really true
that an average dstortion measure is appropriate?Would
the viewer find more objectionable a result where both
average quality and peal< distortion are higher, as com-
pared to a scenario where the average quality is lower but
so is the worst-case dstortion? These are valid questions
and they justify the need consider alternatives to aver-
age MSE; for example, imax approaches (where the
worst-case distortion is minimized) [541 or approaches
based on lexicographic optimization, which can be seen
as a more general case of minimax [55, 561.
Perceptually weighted versions of these cost functions
can also be accommodated.As in our earlier dscussion of
distortion measures, we should emphasize that large
Complexity. Complexity is a determining factor in as-
gains in a particular objective function (for example
sessing the practicality of the various R-D optimization
MSE) may not always result in comparably large im-
techniques we will describe. Two major sources of com-
provements in perceptual quality, even if carem percep-
plexity can be identified. First, the R-D data itself may
tual weighting has been introduced.
have to be measured from the images and thus several en-
Notation. Let us consider N coding units where each
code/decode operations may have to be performed to de-
c o l n g unit has M dfferent available operating points.
termine the R-D values. (Note that this is consistent with
For each coding unit i we have information about its rate
our assumption of an operational R-D framework, where
rll and distortion dy when using quantizer j . We make no
the goal is to select the best among those operating points
assumptions of any particular structure for the vI1and dy ;
that are achievable.) In order to reduce the computations
we simply use the convention that quantization indices
required to measure rate and distortion for each coding
are listed in order of increasing “coarseness”; i.e. j = 1is
unit, one could resort to models (as, for example, in [48])
the final quantizer (highest and lowest dz,) and j = M
that would be employed in the optimization algorithm
is the coarsest. There are no other assumptions made: for
instead of the actual values. The second source of com-
example, we do not take into account any possible corre-
plexity comes from the search itself. Even if the R-D data
lation between the rate and distortion characteristics of
is known, or has been adequately modeled, we will have
consecutive coding units or assume any properties for 1c
to search for the best operating point and that in itself
anddll. We consider here that the R-D data is known; it &
could be a complicated task.
possible replace measured lcI1 and dll by values that are es-
In addition, complexity depends not only on the timated based on models but this would not affect the al-
number of operations required but also on the delay in gorithms we propose. Examples of rate-allocation
computing the optimal solution and, related to it, the applications that utilize models instead of actual data can
storage required by the search algorithm. Obviously, be found in [47,48, 57-60].
more complex algorithms can be applied in off-line en-
It would be trivial to achieve minimaldistortionifno con-
coding applications whereas live or real-time encoding straints on the rate were imposed. More interestbig issues
will limit the admissible delay and complexity. Complex arise when one tries to achieve the best performance given
algorithms can also be justified if the quality improve- some constraintson the rate. We will formulatetwo classes of
ments are significant in scenarios where encoding is per- closely related problems where the rate constraintsare dnven
formed just once but decoding is done many times. by (i)total bit budget (e.g., for storage applications) and (ii)
Since standards such as MPEG provide a common de- transmission delay (e.g., for video transmission).
coding framework it is possible to develop encoders cov-
ering a range of scenarios; from high-quality,
high-complexity professional encoding to low-cost, Storage Constraints:
low-complexity consumer products. Budge&Constrained Allocation
Cost function. Both lstortion and rate may be part In the first class of problems we consider, the rate is con-
of the objective functions to be optimized. The objective strained by some restriction on the maximum total num-

IEEE SIGNAL PROCESSING MAGAZINE NOVEMBER 1998


ber of bits that can be used. This total number of bits tions on total rate but also on the rate availablefor subsets
available, or budget R, has to be distributed among the of coding units. Assume, for example, that a set of images
different coding units with the goal of minimizing some has to be placed in a storage device that is physically parti-
overall distortion metric. For example we may want to tioned (e.g., a disk array) and that it is impossible (or un-
use JPEG as in Box 2 to compress the images within an desirable for performance reasons) to split images across
image database so that they all fit in a computer disk one or more devices. In this case, we will have to deal with
(where now we may be concerned with the aggregate partial constraints on the set of images assigned to each
quality over all the images).This problem can be re-stated particular device, in addition to the overall budget con-
as follows: straint. An optimal allocation that considers only the ag-
gregate storage constraint may result in an invalid
Formula 3 - Budget Constrained Allocation distribution between the storage devices.
Find the optimal quantizer, or operating point, x(i)foreach Consider the case where two storage devices, each one
coding unit i such that of size R, 12, are used. We will have then the following
constraint, in addition to the budget constraint ofEq. (1):

'
NI

' ~ (6) RT 12,


i=l
and some metric f (d,x( , ,d ,
(,, ) ,... ,dNxc ) )is minimized.
where N , is the number of coding units that are stored in
For example, if we are interested in a minimum average the first storage device. N , itself may not be given and
distortion (MMSE) problem we have that may have to be determined.
N

Delay-Constrained Allocation and Buffering


A simple storage-constrained allocation such as that in
Alternatively, a minimax (MMAX) approach [54, 611 Formulation 3 cannot encompass situations where the
would be such that coding units (for example, a series of video frames) are
streamed across a link or a network to a receiver. In this
situation, as outlined in Box 4, each coding unit is subject
to a delay constraint; i.e., it has to be available at the de-
coder by a certain time in order to be played back.
Finally, lexicographically optimal (MLEX) approaches
For example, let a coding unit be coded at time t and
[55] have been recently proposed as extensions of the assume that it will have to be available at the decoder at
minimax solution. The MLEX approach compares two time t +AT, where AT is the end-to-end delay of the
solutions by sorting their distortions or, as in [55], their system. If each coding unit lasts t , seconds, then the
quantization indices. For simplicity, assume the end-to-end delay can be expressed as AN = AT / t , in
quantization indices are used in the comparison, with coding units. For example if a video encoder com-
j = 1being the finest quantizer. Then, to compare two so- presses 30 frames per second and the system operates
lutions, we sort the quantization inQces of all the coQng with an end-to-end delay of AT = 2 seconds, then the
units from largest to smallest: we then compare the re- decoder will wait 2 seconds to decompress and display
sulting sorted lists and we say that the one represented by
the first frame (assuming no channel transmission de-
the smallest number is the best in the MLEX sense. For
lay) a n d at any given t i m e t h e r e will be
example, consider four coding units that receive the fol-
AN = 2 / (1/ 30) = 60video frames in the system (stored
lowing two allocations (1,3,4,4) and (3,2,3,2). After
in the encoder or decoder buffers or being transmit-
sorting, we obtain (4,4,3,1) and (3,3,2,2) and given that ted). The video encoder will have to ensure that the rate
we have 3322 < 4431, the second allocation is the better selection for each frame is such that no frames arrive
one in the MLEX sense. Allocations derived under the too late at the decoder.
MLEX constraint have the interesting property of tend- Given the delay constraints for each coding unit, our
ing to equalize the distortion or the quantization scale
problem becomes:
across all coding units.
In the remainder of the article we will concentrate on Formulation 4 - Delay-ConstrainedAllocation
the MMSE since it is by far the most widely used. Exam-
Find the optimal set of quantizers x(i)such that (i) each cod-
ples of schemes based on MMAX and MLEX can be
in8 unit i encoded at time tiis received at the decoder before its
found in [54, 611 and [55, 561, respectively.
"dcadline" ti + 6 and, (ii)agiven distohon metric,for ex-
ample one of those used in Fomulation 3, is minimized.
Allocation Under Multiple Partial Budget Constraints
A more general version of the problem of Formulation 3 This would be an easy problem if there were no con-
may arise in situations where there are not only limita- straints on the transmission bandwidth (e.g. when read-

NOVEMBER 1998 IEEE SIGNAL PROCESSING MAGAZINE 35


ing video data from a DVD, where peak read-out the relationship between buffering and delay constraints,
bandwidth from the drsli exceeds the maximuni coding we refer to [44,49].
rate for any frame). Note, however, that even if users have We call this the effective size because it defines an
access to broadband channels, by the universal maxim imposed constraint regardless of the physical buffer
that expenditures shall always rise to meet the incomes, size. In general, the applicable constraint will be im-
we may assume that limited bandwidth will be die domi- posed by the smallest of B, (i)and B,, . Assuming that
nant scenario for the foreseeablefuture. (We hope readers sufficient physical buffer storage is available (i.e., B,,
will forgive two video compression researchers for not is always larger than B, (i), certainly reasonable with
claiming otherwise. One of our most respected arid senior the constantly decreasing price of memory) our prob-
colleagues in the source-coding community reassures us lem becomes:
that he has heard for over 30 years how bandwidths are
exploding and there is no more need for compression!) Formulation 5 - Buffer-Constrained Allocation
The complexity of this allocation problem depends on Find the optimal set @quantizersx(i)foreach imch that the
the channel characteristics. Specifically we will need to buffey occ%pancy
laow if the channel provides a constant bit rate (CBR) or
a variable bit rate (VBR),if the channel delay is constant, B(i)= max(B(i - 1)+ y W ( , ) - C(i),O),
if the channel is reliable, etc. For simplicity, in what fol-
is S%Ch that
lows let us assume that we have 6, = A T for all i.
In both CBR and VBR cases, as shown in Box 4, data wd B(i)I B, (2)
be stored in buffers at encoder and decoder. Assume a vasi-
able channel rate of C(i)during the i-thcodmg unit interval.
Then, we w d have that the encoder buffer state at time i E
and some metric f (d, x ( ,d, x ( , ,. . . ,dNx( )is minimized.

It is worth noting that the problems of Formulations 3


B(i)= max(B(i - 1)+ Y~,~) - C(i),O), and 5 are very much related. For example, consider the
case where C(i)is constant and equal to C. In this situa-
with B(0)= 0 being the initial state of the buffer. tion the overall allocation is in both cases constrained by
Let us now consider what constraints need to be ap- the same total budget. Therefore, if the solution to For-
plied to the encoder buffer state (it can be shown that mulation 3 meets the constraints of Formulation 5 it is
controlling the encoder buffer suffices to guarantee also the optimal solution to the latter problem. This fact
that the delay constraints are met [44, 491). First, the can be used to find approximate solutions to the problem
buffer state B(i)caiinot grow indefinitely because ofthe of Formulation 5 as shown in [46].
finite physical buffer. If B,, is the physical memory It is also interesting to note that the constraints depend
available then we will need to guarantee that B(i) < B,, on the channel rates. When the channel rates can be cho-
at all times. In addition, in order for the delay con- sen by the user (e.g., tsansmission over a network), this
straint of Formulation 4 not to be violated, we need to leads to interesting questions on which is the best combi-
guarantee that the data corresponding to coding unit i nation of source and channel rates given constraints on
is transmitted before tl + AT; that is, transmission has the channel rates [49-511? In scenarios where the channel
to be completed during the next AN coding unit inter- is unreliable, we cannot deterministically laow what the
vals. Intuitively, in order for this constraint to be met, future rates wdl be, but it is possible, if channel models are
all we need to ensure is that the future channel rates, available, to replace channel rates by their estimated val-
over the next AN units, are sufficient to transmit all the ues in the above formulation [52, 62,631.
data in the buffer.
Let us define the effective b%fferszze B, ( 2 ) as
The R-D Optimization Toolbox
In thls section we describe in more detail some of the ba-
k=ztl sic techniques that can be applied to the problems that we
have just described. Our goal here is to explain these tools
i.e., the sum of future channel rates over the next AN in- in generic terms. Later sections will provide pointers to
tervals. Then it is easy to see [44, 491 that correct trans- specific work where modified versions of these inethods
mission is guaranteed if have been SuccessfLlUy been applied to a varierty of com-
pression scenarios.
B(i)I B, (i), vi.
As an example, consider t h e case where Independent Problems
C(i)= C = R, / N is constant. Then, if the system oper- We consider first the case where die rate vli and drstortion
ates with an end-to-end delay AN the buffer can store no dq can be measured independently for each coding unit;
c
more than AN . bits at time i. For a detailed analysis of i.e., the R-D data for coding unit i can be computed with-

36 IEEE SIGNAL PROCESSING MAGAZINE NOVEMBER 1998


out requiring that other coding units be encoded as well. first introduced by Everett [64]. This approach was first
One example ofthis scenario is tlie allocation of bits to dif- used in a source coding application, following the frame-
ferent bloclis in a l3CT image coder such as that of Box 2 work we describe here, by Shoham and Gersho [65] and
\\.here blocks are indivicluallyquantized and entropy coded by Chou, Lookabaugh and Gray [ 17, 301 in tree pruning
(i.e., inter-block prediction is used). Another example and entropy-constraiiied allocation problems. Since then
would be tlic allocation of bits t o video frames encoded by this approach has been used by numerous authors [ 42,
MPEG in Ih’TKA-only mode, o r using motion Jl’EG. 4 5 , 4 6 , SO, 66-68].
It is also useful to note that scenarios that involve any T h e basic idea ofthis technique is as follows. Introduce
prediction o r contest-based coding are by nature “dc- a Lagrangc multiplier h 2 0, a non-negative real number,
pendent” but caii. sometimes be approximated using “in- a n d l e t us c o n s i d e r t h e L a g r a n g i a n c o s t
ctependent” allocation strategies with little performance J i i ( h )= dV + h . vrl. Refer to Fig. 14 for a graphical inter-

loss in practice [49] (see tlie “lkpendency Problems” sec- pretation of the Lagrangian cost. As the quantization in-
tion for a more detailed description of dependent alloca- dex j increases (i.e., the rate decreases and the distortion
tion problems.) Even if malting the independence increases) we have a trade-off between rate and distor-
apProgimation results in performance loss, the dcpend- tion. The Lagrange multiplier allows lis to select specific
cncy effects ‘ire often ignored to speed LIP the computa- trade-off points. Minimizing the 1,agrangian cost
tion. For esample, it is common to consider the allocation d + h . vri when h = 0, is equivalent to minimizing the
ii
of bits to frames in a video sequence as if these could be distortion; i.e., it selects the point closer to the y-axis in
trcated inctepcndently; however, due to the motion esti- Fig. 14. Conversely, minimizing the Lagrangian cost
mation loop, tlic bit allocation for one frame has the po- when h becomes arbitrarily large is equivalent t o mini-
tentinl to affect subsequent frames. mizing the rate, and thus finding the point closest to the
x-axis in Fig. 14. Intermediate \ d u e s of h determine in-
termediate operating points.
Lagrangian Optimization
Then the main result states that
The classical solution for the problem of Formulation 3 is
based o n the discrete version of Lagrangian optimization
Theorem 1 [ 6 4 , 651 If the mappincq x ’ ( i )for
i = 1,2,. . . ,N , minimizes:
Block i

then it is also the optimal solution to the budpet-constmined


problem ofFomulation 3,fovthepn&cular case wheve the to-
tal budget is:

so that:

fov any x satisfiing Eg. (1) with Rmiven by Eq. (3).

Since we have removed the budget constraint of Eq.


( l ) ,for a given operating “quality” h, Eq. (2) can be re-
written as:

so that the minimum can be computed independently for


each coding unit. Note also that for each coding unit i, the
p o i n t o n t h e R-13 characteristic t h a t m i n i m i z e s
A 14. For each coding unit, minimizing d,,(,, + b,x(,)
for a given h j , ,
dtx, + Arty( is that point at which the line of absolute
is equivalent to finding the point in the R-D characteristic that slope h is tangent t o the convex hull ofthe K-I) character-
is “hit“ first by a ”plane wave“ of slope h istic (see Fig. 14).For this reason we normally refer t o

NOVEMBER 1998 IEEE SIGNAL PROCESSING MAGAZINE 37


h as the slope, and since h is the same for every coding Typically the problems we consider involve significant
unit on the sequence, we can refer to this algorithm as a structure in the constraints, and that can guide the search
“constant slope optimization.” for the vector Lagrange multiplier [50, 67, 71-73]. For
T h e intuitive explanation of the algorithm is simple. example, in some cases, these constraints are embedded-
By considering operating points at constant slope we are if there are N coding units, we have a series of c con-
making all the coding units operate at the same marginal straints where constraint lz is a budget constraint affecting
return for an extra bit in the rate-distortion trade-off. coding units 1through nb;i.e., constraintlz limits the total
Thus, thc MSE reduction in using one extra bit for a given rate allocation for units 1 through n b . T h e other con-
coding unit would be equal to the MSE increase incurred straints likewise affect blocks 1 through n , , n , , ... ,
in using one less bit for another unit (since we need to n , = N , respectively. 111those and similar cases, a search
maintain the same.overal1budget). For this reason, there strategy cati be derived to find the optimal vector A in an
is no allocation that is more efficient fir that particular iterative fashion [ 721.
Dudmet. Box 6 illustrates why this approach is intuitively
sound. This technique was well known in optimization
Dynamic Programming
problems where the cost and objective functions were
As mentioned above, and discussed in more detail in Rox
continuous aiid differentiable. Everett’s contribution
8, Lagrangian techniques have the shortcoming of not
[64] was to demonstrate that the Lagrangian technique
being able to reach points that d o not reside on the convex
could also be used for discrete optimization problems,
hull o f t h e R-1) characteristic. An alternative formulation
with n o loss ofoptimality i f a solution exists with the re-
is then to formulate the allocation as a deterministic dy-
quired budget; i.e., as long as there exists a point in the
namic programming problem.
convex hull that meets the required budget.
I n this case, we create a tree that will represent all possi-
T h e properties of the Lagrange multiplier method are ble solutions. Each stage o f t h e tree corresponds to one of
very appealing in terms of computation. Finding the the coding units j and each node of the tree at a given
best quantizer for a given h is easy and can be d o n e inde- stage represents a possible cumulative rate usage. For ex-
pendently for each coding unit. ( N o t e that here we are ample, as seen in Fig. 15, to the accumulated rate at block
considering that the K-1) data has already been com-
puted and we are discussing the search complexity.
Finding the K-1) data may in itself require substantial
complcxity.)Still, one has t o find the “right” h in order
to achieve the optimal solution at the required rate; i.e.,
find h such that K ( h )as defined above is close o r equal to
the prespecified budget. Finding the correct h can be
done using the bisection search [42, 651 o r alternative
approaches such as those proposed in [69]. N o t e that
the number of iterations required in searching for h cati Nodes
be kept low as long as we d o n o t seek to have an exact
match o f t h e budget rate. Moreover, in scenarios such as
video coding, where we may be performing allocations
on successive frames having similar characteristics, it is
possible to initialize the Lagrange multiplier for a frame
with the values at which convergence was achieved for
previous frames, which will again reduce the Iiumber of
required iterations; i.e. providing a g o o d initial guess of
h leads to reduced complexity.

Generalized Lagrangian Optimization Stage i-1 Stage i Stage i c l


Allocation problems with multiple constraints, such as
those mentioned in Formulation 4 earlier, can also be
solved using Lagraiigian techniques. These approaches
are based on generalized Lagrangian relaxation methods
[70]. T h e basic idea is to introduce a Lagrange multi-
plier for cach of the constraints, which can thus be re-
L i-1 I i+l
Coding Unit
m

A 15. Trellis diagram to be used for the Viterbi algorithm solu-


tion. Each branch corresponds to a quantizer choice for a given
laxed. T h e problem now is that the solution can be found coding unit and has an associated cost, while its length along
o n l y f o r t h e r i g h t vector L a g r a n g e m u l t i p l i e r the vertical axis is proportional to the rate. For instance,
,
A = { h , . . . ,h , }, and the search in a multidimensional quantizer 1 at stage i produces a distortion d,,,and requires
space is not as straightforward as it is when a single rate q , y A path will correspond to a quantizer assignment to all
Lagrange multiplier is used. the coding units in the sequence.

38 IEEE SIGNAL PROCESSING MAGAZINE NOVEMBER 1998


budget constraint, it suffices to
prune the branches that exceed
the desired total rate allocation
(the tree cannot grow above
the “ceiling” specified by the
budget constraint). Similarly, if
a buffering constraint such as
that of Formulation 5 is intro-
duced, then we will need t o
prune o u t branches that exceed
the maximum buffer size at a
given stage [46].
The algorithm can be infor-
mally summarized as follows. At
stage i, for all the surviving
nodes, add the branches corre-
(a) (b) sponding to all quantization
A 16. Operational R-D characteristics of two frames in a dependent coding framework, where choices available at that stage
frame 2 depends on frame 1. (a) Independent frame‘s R-D curve. (b) Dependent frame‘s R-D (the rate Y~ determines the end
curves. Note how each quantizer choice for frame 1 leads to a different R2.D curve. The tiode and the distortion d.. is
Lagrangian costs shown are J=D+hR for each frame. added to the path distortign.)
Prune branches that exceed the
i - 1 we add the rate corresponding to each possible rate constraints, then, for each remaining node at stage i + 1,
quantization choice, thus generating new nodes with the keep only the lowest-cost branch.
appropriate accumulated rate. Each branch has as a cost Given the above discussion one might conclude that
the distortion corresponding to the particular quantizer, Lagrangian optimization is t o be generally preferred
and therefore as we traverse the tree from the root to the g i v e n its c o m p l e x i t y a d v a n t a g e s . H o w e v e r , t h e
leaves we can compute the accumulated distortion for Lagrangian approach does have one drawback in that
each o f t h e solutions. I t should be clear that this is indeed only points in the convex hull of the global operational
a way of representing all possible solutions, since by tra- R-D characteristic can be reached. This is not a problem if
versing the tree we get successive allocations for each of the convex hull is “sufficiently dense”; however, in some
the coding units. applications it may result in significant suboptimality. See
Let us now consider what happens if t w o paths con- also Box 8 for an example of this scenario.
verge into a single node; i.e., t w o alternative solutions
provide the same cumulative rate. It seems intuitive that
Dependency Problems
the solution having higher distortion u p to that point
S o far we have assumed that selection of the coding mode
should be removed (i.e., pruned from the tree), since
can be made independently for each coding unit without
from that stage on both solutions have the same remain- affecting the other units. There exist, however, scenarios
ing bits to use. Those paths that are losers s o f i r will be where this assumption is no longer a valid one.
losers overall. This is the gist o f t h e Optimality Principle This is typically the case in coding schemes based o n
introduced by Bellman [74-761, as it applies to this par- prediction [57, SO]. For example, assume that each cod-
ticular problem. See also Rox 7 for a simple example of ing unit i is predicted from the preceding coding unit
its applicability. This particular brand of dynamic pro- i - 1. T h e predictor is constructed usiFg t h e past
gramming (DI’), which handles deterministic cost func- quantized data, arid thus we code X I - P ( X L +); , i.e., the
tions and helps us find the shortest (in the sense of the prediction error. As we use quantized data, the prediction
branch cost) path in a graph, is also known as the Viterbi error and thus the admissible R-D operating points for i
algorithm [ 761 o r Dykstra’s shortest-path algorithm. In depend on o u r choice of quantizer for i - 1. Each choice
compression applications, dynamic programming is x ( i - 1) results in a different characteristic.
used in the encoder in the trellis coded quantizer (TCQ) One example of this scenario is illustrated by Fig. 16,
[77, 781, as well as in the scalar vector quantizer where we depict all the available R-1) choices for two
( S V Q )[ 791. I t is also used, as will be explained in Box 9, video frames where each frame can be coded using three
to optimally prune trees in applications such a s wave- different quantization settings, and where frame 2 is pre-
let-packet optimization [42], o r tree-structured vector dicted from frame 1 (note that there are nine possible
quantization [ 301. choices for frame 2, since the choices for frame 1 affect the
It will be easy t o incorporate additional constraints to resulting R-D values for frame 2). It should be noted that
the tree growth s o that the problems of Formulations 3 an algorithm that considers the two frames independently
and 5 can be solved. For example, to introduce an overall would select (for the given slope h ) quantizer 2 for both

NOVEMBER 1998 IEEE SIGNAL PROCESSING MAGAZINE 39


-
Box 6 Example of Resource Allocation Using Lagrangian Optimization

vv e show the generality and intuition of Lagrangian opti-


mization by showing an example that is well outside the
scope ofimage coding or even engineering. This will hopefully
study time atid spending it on the term project until lie derives
the same marginal return for the next hour, minute, or second
that he spends on either activity. This is exactly the operating
highlight die general applicability of this class of techniques points A, and B, on the c~irvesthat live on thc same slope of
even to general problems involving resource allocation that we the trade-off characteristics.
are faced with in our day-to-day lives.
Let us take the example ofBob Smart, a physics freshman at
Intuition 01 Lagrange Multipliers
a U S . bur-year college. It is three weeks before Finals Week,
Fis Taken Mf Pts Taken On
and Rob, like all.tnotivated freshmen, is taking five demanding for Prolea for Final Exam
courses, one ofwliich is Physics 101. Rob wakes up to the real-
ization that his grade for the course will be based on (i) a term
project that he has not started o n pet, and (ii) a final exam. 50
T
lSlapel=2 ptslhr
Each ofthese two components is worth 50% of the grade for ISlopel=6 ptslhr

the course. As Bob has spent a few more hours on his ex-
tra-curricular activities than he probably should have, and he
has to devote a considerable amount of his time to the other 30
courses as well, he realizes that he has to budget his study time /
for Physics very carefully. Suppose he is able to project his ex-
pected perforniance on both the project and the final exam
Slopes at
based on how much time he devotes to them, and further he Optimality

c m quantify them using the curves shown in Fig. 17, which


measure his expected deviation from perfection (50 points for
each component) versus the amount oftime spent on the com-
ponent. After carefully surveying his situation, Rob realizes 10 hrs 20 hrs
Time Spent an Proled Time Spent on Final Exam
that he can spare a maximum of 30 hours total between both
the project and the final exam. The question is: how does he al- Problem Max total Score given a 30 hr budget
Solution A1 E1 not optimal by diverting 1 h i from final exam lo project you can gain 4 ptsl
locate his time optimally in order to maximize his score in the Must operate at SAME SLOPE at optlmalityl
course?
One option would be for him to devote 10 hours to the A 1 7. ///usfrationof Lagrangian optimization.
project (Rob was never big on projects!) aiid 20 hours to
studying for the final exam. This would amount to operating This is exactly the constant-slope paradigm alluded to in
on Points A, and H, in Fig. 17. Based on the trade-ofT that the body of the text. In a comprcssion application, the
models Rob‘s propensity with respect to both the project and trade-ofTs involve rate and distortion rather than scores on cx-
the exam, this would result in an expected score of 20 (or a de- ams aiid studying time, but the principles are the same. An im-
viation of30 from 50) on the project (point A , ) and a score of portant point to be emphasized in this example is that the
30 on the final exam (point B , ) for a net of 50 points. This does constant-slope condition holds only under the constraint that
not bode well for Rob, but can he make better use of his time? the rate-distortion (or equivalent trade-off) curves are inde-
The answer lies in the slopes of the trade-off curves that char- pendent. In our example above, this means that we assume that
acterize both the project and the final exam. Operating point the final-exam curve is independent of the project
A , has an absolute slope of 6 points/hour, while operating curve-something that may or may not be true in reality. Ifthe
poitit H ,has a slope of only 2 poiiits/hour. Clearly, Rob could amount of time spent on the project influences Rob Smart’s
help his cause by diverting one hour from the final exam to the preparedness for the final exam, then we have a case of “de-
project: this would increase his performance by 4 points! It is pendent coding” for the comprcssion analogue (see the “1)~-
clear that he should keep stealing time from the final-exam pendency Problems” section).

frames; i.e., it cvould incur a cost 1, (2) for franme 1 and examples within the MPEG coding framework that
then, given that quantizer 2 was selected for frame 1, illustrate different forms of dependent!..
would choose the minimum among a11 J z (2, x),which Trellis-based Dependency. T h e selection of
turns o u t to be J (2,2). However, in this particular esam- macroblock-le~~el quantization in an MPEG video stream
ple, the greedy approach, allocating first for frame 1 and is a d e p e n d e n t p r o b l e m because t h e rate for
then for frame 2, can be outperformed. T h e better overall macroblock i and quantizer j dcpends o n the quantizer
performance can be achieved when quantizer 1 is L I S for ~ ~ chosen for macroblock i - 1. This is because predictive
the first frame and quantizer 2 is used for the second. entropy coding of the quantization indices is used t o in-
Even t h o u g h /,(2)< Jl(l) w e have t h a t crease the coding efficiency. I n this situation it is possible
(2) + J z ( 2 2 )> It(1)+ J z (W.
ll to represent all the possible selections as a trellis where
Several types of dependency scenarios can be identi- each state represents one quantizcr selcction for a given
fied. Rather than attempt to provide a complete taxon- macroblock, with each stage of the trellis corresponding
omy of all these schemes, let us consider t w o coiicretc to one macroblock. Dynamic programming can then be

40 IEEE SIGNAL PROCESSING MAGAZINE NOVEMBER 1998


-
Box 7 Example of Bellman‘s Optimality
Principle of Dynamic Programming
t idea of dynamic programming, or Bellman’s optimality
Th’.‘
principle of dynamic programming, can be captured very
easily with a very simple example that illustrates the basic idea.
Suppose we are interested in finding the shortest auto
route between Los Angeles and New York City (see Fig. 18).
Further suppose that you know that the shortest route be-
tween L.A. and New York goes through Chicago. Then Bell-
man’s optimality principle states the obvious fact that in this
case, the Chicago to New York leg of the shortest journey
from L.A. to New York will be identical to the shortest auto
route between Chicago and New York; i.e. to the shortest
route on a trip that starts at Chicago and ends in New York.
Why is this obvious observation useful?Because it can result in
a lot of computational savings in finding the best path from
L.A. to New York: ifwe find the best path from L.A. to Chi-
cago, then we only need to add on the shortest auto distance A 18. Illustration of dynamic programming.
between Chicago and New York, if we already know the an-
swer to that. above example can be captured as follows: suppose there are
Sophisticated applications of this basic principle can lead two separate paths from L.A. to Chicago, one through St.
to fast optimal algorithms for a variety of problems of inter- Louis and the other through Denver. If the L.A.-Den-
est. A popular incarnation of the above principle in signal ver-Chicago route is longer than the L.A.-St. Louis-Chicago
processing and communications involves the omniscient route, then the former can be thrown out because the best
Viterbi Algorithm, which uses the dynamic programming route from L.A. to New York passing through Chicago can
principle illustrated above in finding the “best path” through never go through Denver. This principle is true for every
a trellis induced by a finite-state machine. The cities in the ex- state in the system and is the guiding principle behind the
ample above are analogous to the “states” in the Viterbi Al- popular Viterbi algorithm. As described in the main body,
gorithm a t various time stages ofthe trellis. Recall that a state the principle of dynamic programming finds application in a
decouples the fiiture from the past; that is, given the state, fu- variety of scenarios in image and video coding based on
ture decisions are not influenced by past ones that led to that rate-distortion considerations like buffer-constrained video
state. Thus, if two paths merge at a system state, then the compression (see also Box 4) and adaptive wavelet-packet
costlier of the two paths can be pruned. The analogy to the based transform coding (see Box 9).

used to find the minimal cost path in this trellis, where the tween the previously decoded frame a i d the ciirrent frame.
branch cost is typically defined as the Lagrangian cost in- This difference frame is in turn compressed and used to re-
troduced above [81-831. As in the example of Fig. 16, construct the decoded version ofthe current frame. It is easy
taking the dependency into account avoids “greedy” se- to see that we have a recursive prediction loop, and thus the
lection of coding parameters, where the quantizer assign- residue frame will depend o n the selection of quantization
ment is optimized for the current coding unit alone. parameters f o r all previous frames since t h e last
In general, trellis-based dependencies arise in cases INTRA-frame [SO]. I n this case we can obsen7e that d possi-
where the undcrlying structure is such that the memory in ble combinations generated by successive quantizer choices
the system is finite (i.e., coding choices for i depend only can be representcd as a tree with the number of branches
on a finite set ofprevious coding units) and the number of growing exponentially with the number of levels ofdepend-
possible cases is also finite. I n other words, in this case, ency (i.e., number of frames since the last INTIU-frame).
the available coding parameters for a given coding unit T h e problem of dependent coding with an application
depend on the “state” of the system-the finite set of pa- to an MPEG framework is studied in [SO]. T h e main con-
rameters that completely determine achievable values. As clusion is that exponential growth in the number ofcom-
in [ 8 1-83], for these types of dependencies one can use a binations makes t h e exact solution too complex.
dynamic programming approach, where the state corre- However it is possible to make approximations that s i n
sponds to the state o f t h e system, and branches (each cor- plify the search for the optimal solution. Good heuristics
responding to a choice of quantization) have associated a include the use of so-called monotonicity assumptions (a
Lagrangian cost that combines the rate and distortion for more finely quantized predictor typically results in
the given parameter choices. smaller prediction error) [ 801, o r greedy approaches
Tree-based dependency. A second example of depend- where, for example, only a few quantization choices are
ency can be seen when we maljze the effect of motion com- kept at any given stage. T h e problem can also be alleviated
pensation in a n MPEG framework. After m o t i o n by resorting to models ofthe dependent R-D characteris-
compensation, the encoder transmits the difference be- tics so that not all the operating points in the tree need t o

NOVEMBER 1998 IEEE SIGNAL PROCESSING MAGAZINE 41


-
Box 8 Comparison of Lagrangian and Dynamic Programming Approaches
he basic difference between the Lagrangian and dynamic nearest convex hull point ( A ) has higher distortion than B.
T Programming (DP) approaches is that the Lagrangian ap- Therefore B would be the optimal solution for the problem at
proach is limited to select only operating points on the convex hand. In fact, any points in the shaded area would be better
hull of the overall R-D characteristic. No such constraint af- than A. However, none ofthem is reachable using Lagrangian
fects the DP approach. techniques, as these points are located “over” the convex hull.
This can be easily seen with the example of Fig. 19, which However, these points would be reachable through dynamic
represents the combined R-D characteristicfor a set of coding programming.
units. The figure shows an instance of the problem of Formu- Note that this situation comes about because we are deal-
lation 3 where rate has to be allocated to coding units in order ing with a discrete allocation and therefore the set of achiev-
to meet a budget constraint. In Fig. 19 the operating point C able points on the convex hull is also discrete (the example of
exceeds the budget and thus is not an admissible solution. The Box 6 assumes a continuous range of choices and thus the
Lagrangian approach is indeed optimal in that case.) In many
instances, in particular when the convex hull is densely popu-
lated, this situation is less likely to arise or in any case the gap
between the best Lagrangian solution and the optimal solu-
tion may be small. Exceptions include scenarios where the
X I number of coding units is small and the convex hull is sparse,
I
I as for example in the coding for scalar vector quantization
(SVQ) [ 791. When that is the case, the performance gap may
be larger and using DP all the more important. However it
may also be possible to use a Lagrangian solution to initialize
the DP search (see, for example, [84]).
In terms of complexity, the Lagrangian approach is prefer-
able, since it can be run independently in each coding unit,
X whereas DP requires a tree to be grown. The complexityof the
DP approaches can grow exponentially with the number of
coding units considered, while the Lagrangian approach’s
X complexity will only grow linearly. Thus, in many situations
x x the Lagrangian approach is a sufficiently good approximation
once computation complexity has been taken into account.
We also refer the reader to examples given in the webpage
at http://sipi.usc.edu/-ortega/RD_Examples/, which dem-
onstrates both the Lagrangian and DP techniques at work.
One of the examples shows how for different values of h, dif-
ferent points are achieved for each of the coding units and in
the overall allocation. We demonstrate how the operating
points can be searched with the bisection algorithm until the
A B C
desired operating point is reached [42,65].A second example
~ ~~ demonstrates the operation of the DP algorithm where a tree
A 19. Comparison between Lagrangian optimization and dy- is grown until the minimum cost path that meets the con-
namic programming. straint is found.

be explicitly computed [48, 571, o r by considering m o d - we introduced, namely, budget a d dclay constrained al-
els of the rate [47,60] and assuming that the quantization locations, and refer the reader to Boxes 2, 3, 4.
scale provides a good estimate of quality.

Budget Constraint Problems


Application to Basic Components Let us n o w visit a few applications involving bud-
get-constrained optimization in the context o f image and
in ImageNideo Coding Algorithms
video coding. D u e to lack o f space, we will omit compre-
In what follows we briefly outline scenarios where varia- hensive coverage of the image and video coding frame-
tions of the generic formulations described in the previous works and algorithms, referring the reader instcad to a
section have been found to be useful. While the formula- number of both old and recent textbooks o n thc subject
tions and algorithms are similar, there are key dfferences [ 1,5,31-37]. W e will dwell only and briefly o n transform
that would make the dscussion too long; we therefore limit coding, which was introduced earlier in the context o f
ourselves to providmg an overview of the applications with DCT and wavelet-based coding in Koxes 2 and 3, respec-
pointers to relevant work in the various areas. W e structure tively, and refer the reader to textbooks that treat this
our discussion along the lines of the two classes of problems topic [85-881.

42 IEEE SIGNAL PROCESSING MAGAZINE NOVEMBER 1998


works such as MPEG and H.263. There is a considerable
amount of spatio-temporal redundancy in typical video
data, and it is particularly important to pay attention to
the temporal dimension, which has the bulk of the inher-
ent redundancy for typical video sequences.
R-D based techniques for variable-size motion com-
pensation can be found in [68], while variable bit-rate
motion-vector encoding based on DPCM can be found in
A 20. Illustration of joint source-channel coding for multicast sce-
[69,90-921. Other examples of applications of R-D tech-
nario.
niques to video coding can be found in [5 1,93-951. A de-
tailed account of the impact of these techniques on
The budget-constrained optimization problem in
state-of-the-art video coders, concentrating specifically
source coding is to minimize the quantization dstortion
on the motion-related issues, can be found in another arti-
subject to a bit-rate constraint. The most general formula-
cle in this issue [2].
tion in the context of transform codng involves selecting
Further, rate-distortion optimization techniques can
the operating point involving the combination of trans-
be applied to shape coding, where trade-offs between the fi-
form, quantizer, and entropy coder in order to realize the
delity of the shape representation versus the bit rate needed
best rate-distortion trade-off. Depending on the flexibil-
to represent the shape can be optimized [ l , 96,971. These
ity (and complexity concerns) of the framework, one or
techniques are likely to be used in newer standards, such as
all of the above functional components can be jointly op-
WEG-4, which introduce support for video objects, rather
timized. Typically, the transform is held fixed (e.g. based
use the video frame as their basic video unit.
on DCT or a discrete wavelet transform) and the
quantization and entropy coder are jointly optimized.
The quantization modes can vary from simple scalar Adaptive Transform-Based R-D Optimization
quantizers to rather sophisticated vector quantizers, but While coding algorithms that use a fixed transformation
by abstracting these quantization modes as constituting a can be useful if the class of signals is well suited in some
discreteset of quantizers,a host of different frameworks can sense (e.g. in time-frequency characterization) to the
be considered under a common conceptual umbrella. fured transform, this may not be adequate for dealing
with arbitrary classes of signals with either unknown or
time-varying characteristics. For example, for images or
Fixed-Transform-Based R-D Optimization image segments having high-frequency stationary com-
A good example of a fured-transform-based application ponents, the wavelet transform is a bad fit. This motivates
involves syntax-constrained optimization of image and us to consider a more p o w e m adaptive framework that
video coding standards like JPEG, where the quantizer can be robust when dealing with a large class of signals of
choice (8 x 8 quantizer matrix for the image) and the en- either unknown or more typically, time (or space-)vary-
tropy coding choice (Huffman table) can be optimized ing characteristics. In this approach, the goal is to make
on a per-image and per-compression-ratiobasis (see Box the transformation signal-adaptive. See Boxes 3 and 9 for
2). The spectrum of general applications for optimizing applications involving wavelet packets, which represent
the quantizer and entropy coding choice can range from generalizations of the wavelet transform.
the selection of completely different quantizers (or The idea behind adaptive-transformframeworks is to
codebooks in a vector quantization scheme) to simply replace the fuced transform with a large library of trans-
scalinga quantizer by a scalingfactor, as is usually done by forms that can additionally be searched efficiently. The li-
users of JPEG. brary of transforms can be fairly general and can include
Another recently popular haven for R-D-based tech- for example, the family of quadtree spatial segmentations
niques is wavelet-based image coding, where several [68,83,98] or variable block-size DCTs (e.g. 4 x4,8 x 8,
state-of-the-art coding algorithms realize their high per- 16 x 16 blocks). Similarly, one can take the standard
formance gains by using a variety of sophisticated wavelet decomposition and customize its various param-
rate-distortion based optimization techniques. A partial eters (filters, tree-structure, including the number of lev-
list of coding algorithms that derive their gains from els of decomposition) to a particular image [42] or to
rate-distortion optimization includes forward-adaptive parts of an image [99]. These techniques have been
frameworks involving subband classification criteria [23, shown to provide substantial gains over nonadaptive de-
891, space-frequency quantization (SFQ) that R-D composition techniques [loo, 101,1031. Examples such
optimizes the popular wavelet zerotree framework [22], as the family of wavelet packet transforms of Box 9 are il-
as well as backward-adaptiveframeworks such as the esti- lustrations of the joint optimization of the transform,
mation-quantization (EQ) framework [24] and back- quantizer, and entropy-coding choice in an im-
ward adaptive quantization 1251. age-adaptive rate-distortion sense. Many of the sophisti-
Similarly, optimization techniques can be applied with cated wavelet-based frameworks described in the
impressive performance gains for video coding frame- previous subsection can be extended to their adap-

NOVEMBER 1998 IEEE SIGNAL PROCESSING MAGAZINE 43


~

Box 9 - Adaptive Transforms Based on Wavelet Expansions

e continue the thread from Box 3 in our quest for de- with the quantizer that minimizes the r”Aistortion trade-off
W s i g ,wing adaptive transforms based on wavelet expan- (for a fixed “quality factor” h ) :
sions that arc K-D optimized.
Let us first address the cost fiinction, namely the R-D func- ](node) = min [D(node)+ ?&(node)].
qtcn,inzcr
tion. Returning to the problem of jointly finding the best
combination of wavelet packet (WP) transform (or basis) and Note the implication ofthis step-we do not yet know if
the cluantization ami entropy-coding choices, we assume that an arbitrary tree node will be part of our desired optimal
arbitrary (finite) quantization choices are assumed available to subtree choice, but we do know what quantization choice to
quantize the W1’ coefficients in each tree node (see Figs. 9 and use for that node $it is part of the best-basis subtree. This is
21), with both rate (K) and distortion (D) being assumed to particularly satisfying because it has eiiabled us to decouple
be additive cost metrics over t h e W P tree: i.e., the best quantizer/ b a s i s choice without sac r i fi c i ng
c
R(tree)= R(1eaf nodes); and D(tree) = D(1eaf nodes). As
an example, the commonly used first-order entropy and MSE
optimality .
We now have remaining only the unfinished business of
measures for R and I)satis@ this additivity condition. finding the best basis. The special trce structure of the basis
Turning now to the fast-search problem, one possible ap- can bc exploited in formulating a fast tree-based search strat-
prvach to finding the best tree is the “grecdy tree growing” al- egy. The idea is to use a bottom-up recursive “split-merge”
gorithm, which starts at the root and divides each signal in two decision at each node, corresponding to whether it is costlier,
if it is profitable to do so (ifthe cost oftlie subsignals generated in the Lagrangian sense, to keep thc parent node or its chil-
is less than the cost ofthe sigiial they come from). It terminates dren nodes. This Fast dynamic programming (DP) based
when no more profitable splits remain. It is easy to determine pruning method is also optimal because the signal subspace
that this, however, does not find the globally optimal tree, spanned by the parent node is the direct sum of the signal
which is found by starting at the deepest level of the tree and subspaces spanned by its children nodes thanks to the
pruning pairs of branches having higher total cost than that of orthogonality of the filter bank. We now describe the details.
their parent. Assume known the optimal subtree from a tree node n “on-
We iiow describe the details using a 1-D case for simplicity wards” to the full tree-depth log N.Then, by Rcllman’s
(see Fig. 21). The idea is to first grow the full (STFT-like) tree optimality principle of DP [74] (see also Box 7), all surviving
(see Fig. 21(a)) to f d l depth (or some maximum fixed depth in paths passing through node n must invoke this same optimal
practice) for the whole signal. Note that due to the “finishing” path. There are only two contenders for the “sur-
tree-structure of the bases, wc now have available the W1’ coef- viving path” at every node of the tree, the parent and its chil-
ficients corresponding to all the bases on our search list. That is, dren, with the winner having the lower Lagrangian cost.
ifwe grow the coefficientsofa depth-5 tree, we know the coetti- That is, starting from the full trce, the leaf nodes arc rccur-
cicnts associated with all subtrees grown to depth-5 or less. sively subjected to an optimal split-mergc decision, follou-
The next step is to populate cach W1’ tree node with the ing a policy of:
minimum Lagrangian cost o\7er all quantization choices for
that tree node. This minimum cost at each node is associated Prune iF: ](parentnode) 5 [ j(child1)+ j(child2)b

tivc-transform counterparts such as those based on wave- sclection o f optimal points. Roth these approaches
let pickets o r adaptive wavelet packets [ 1021. concentrated on the independent allocation case, the
dependent case was considered by [ 801 with applica-
tions provided to M P E G coding scenarios. M o r e re-
Delay-Constrained Allocation Problems
Kox 7 introduces the problem o f delay-constraiiied cent w o r k has also considered R-1)-optimized M P E G
allocation. This class o f problems, a s also outlined in coding using models o f the R-1) characteristics to re-
Formulation 4, is typically encoiintered in video t r a m - duce the complexity [48]. T h e rdte-control problem
mission under delay constraints. T h e m o r e traditional can be fhrmulated n o t o n l y in terms of selection of.
view of the problem is as a bufyev control problem but, quantization parameters as in the <ibo\.ereference but
as described above and in [44, 491, the delay con- also in terms of selection of the best types of frames ( I ,
straint is more general. Rate-distortion techniques 1’ o r K ) as in [ 1061.
have been applied to the rate control under CKR A second area in rate-control research is that of control
transmission conditions. For example, [46] provides for VBK channels. Here we can consider two cl~ssesof
an overall optimal solution using dynamic prograni- problems. First, in some cases it is possible for the en-
m i n g as well as Lagrangian based approximations. A n coder to select both the source and the channel rate;
alternative formulation is to consider the buffering where the selection of channel rate may be subject to con-
constraints as a set of budget constraints a s in [67]. straints, such as, for example, the policing constraints in
T h e tradition a 1 di rec t - feedb ac k mec h a 11 is 111 used in a n ATM network. Examples of this type of optimization
buffer-control algorithms [ 1041 where quantization include [49], which employs dynamic programming
scale is controlled by buffer fullness is replaced in techniques, and [SO],which utilizes multiple budget con-
1051 by a feedback mechanism that controls instead straints and a Lagrangian approach. Other approaches in-
the \ d u e of the Lagrange multiplier to be used in the clude [ S l , 1071.

44 IEEE SIGNAL PROCESSING MAGAZINE NOVEMBER 1998


where J(child node) corresponds to the cost of the cheapest path that "goes through" the child node. Using this, we begin at the complete tree depth n = log N and work our way toward the root of the tree, using the above split/merge criterion at each node, making sure that we record the optimal decisions along the way, until we arrive at the tree root. At this point, the best basis is known by simply backtracking our way down the tree using our recorded decisions at each tree node. In fact, both the best basis and the best quantization choice are now known!
Of course, this corresponds to a particular choice of λ, which was fixed during this tree-pruning operation. Unfortunately, this λ may not be the correct one: we want the one that corresponds to the target bit budget R. However, due to the convexity of the rate-distortion curve, the optimal slope λ* matched to the desired R can be easily obtained using standard convex search techniques, e.g., the bisection method, Newton's method, or other standard root-solving methods. An important point of note is that the Lagrangian method can only obtain solutions that reside on the convex hull of the rate-distortion curve and, thus, a target rate whose optimal operating point is not on the convex hull will be approximated by the nearest convex-hull rate. In practice, for most practical coding applications, the convex hull of the R-D curve is dense enough that this approximation is almost exact.
We will now summarize the single-tree algorithm:
▲ Grow a full-balanced (STFT-like) tree to some desired fixed depth (i.e., find all the WP coefficients associated with all bases in the library);
▲ For a fixed λ, populate each node of the full tree with the best Lagrangian cost D + λR over all quantizer choices (i.e., find the best quantizer choice for each node);
▲ Prune the full tree recursively, starting from the leaf nodes (i.e., find the best-basis subtree);
▲ Iterate over λ using a convex search method to meet the target bit rate (i.e., match the best subtree/quantizer choice to the desired bit budget).

▲ 21. The single-tree algorithm finds the best tree-structured wavelet packet basis for a given signal. (a) The algorithm starts from the full STFT-like tree and prunes back from the leaf nodes to the root node until the best pruned subtree is obtained. (b) At each node, the split-merge decision is made according to the criterion: prune if J(parent node) ≤ [J(child 1) + J(child 2)], i.e., if Dp + λRp ≤ (Dc1 + Dc2) + λ(Rc1 + Rc2). [Figure panels show the parent node and its left child (Rc1) and right child (Rc2), with operating points selected at slope λ.]
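As a concrete illustration of the split/merge pruning for a fixed λ, together with the outer bisection on λ, here is a short sketch. It assumes each tree node already stores its candidate (rate, distortion) pairs over the admissible quantizers; the Node class, its rd_points field, and the bracket values in the bisection are hypothetical, so this is an illustration of the technique rather than a reference implementation.

```python
# Sketch: single-tree split/merge pruning with a Lagrangian cost J = D + lambda*R.
class Node:
    def __init__(self, rd_points, left=None, right=None):
        self.rd_points = rd_points        # candidate (rate, distortion) pairs for this node
        self.left, self.right = left, right
        self.keep_children = False        # split/merge decision recorded by prune()

    def best_rd(self, lmbda):
        # best quantizer choice for this node at the current multiplier
        return min(self.rd_points, key=lambda p: p[1] + lmbda * p[0])

def prune(node, lmbda):
    """Return (rate, distortion) of the cheapest pruned subtree rooted at this node."""
    r_p, d_p = node.best_rd(lmbda)
    if node.left is None:                 # leaf of the full tree
        return r_p, d_p
    r1, d1 = prune(node.left, lmbda)      # leaves are resolved first (bottom-up)
    r2, d2 = prune(node.right, lmbda)
    j_parent = d_p + lmbda * r_p
    j_children = (d1 + d2) + lmbda * (r1 + r2)
    node.keep_children = j_children < j_parent   # otherwise prune back to the parent
    return (r1 + r2, d1 + d2) if node.keep_children else (r_p, d_p)

def match_budget(root, budget, lo=1e-6, hi=1e6, iters=40):
    """Bisection on lambda: a larger lambda yields a smaller rate (convex-hull points only)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        rate, _ = prune(root, mid)
        if rate > budget:
            lo = mid                      # too many bits: penalize rate more heavily
        else:
            hi = mid
    return hi
```

After match_budget() returns, one more call to prune() with the returned multiplier leaves the split/merge decisions (and hence the best basis and quantizer choices) recorded in the tree, ready for backtracking from the root.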

The second class of problems is that where the channel rate is subject to random variations or can vary from link to link in a networked transmission. In [107], R-D methods are provided to reduce the bit rate of encoded data without requiring that it be decoded and recompressed. In [52, 63, 108, 109], approaches based on dynamic programming and Lagrangian optimization are presented to address the problems of transmission in burst-error channels such as those encountered in a wireless-transmission environment [110].

The Role of R-D Techniques in Joint Source-Channel Coding
We have thus far focused (with the exception of the previous paragraph) on rate-distortion methods for source coding when dealing with image and video sources. We now briefly address the applicability of such methods to the bigger problem of image and video transmission, specifically in the context of joint source-channel coding. Box 5 highlights the essence of the problem. Here, we take a look at some of the applications driving these techniques and provide some pointers to recent activities in the field.

Background and Problem Formulation
With the explosion in applications involving image and video communication, such as those afforded by the boom in multimedia- and Internet-driven applications, as well as those afforded by emerging applications like cable modems and wireless services, the image communication problem has recently assumed heightened interest and importance, as visual data represents by far the largest percentage of multimedia traffic.
A natural question to ask is: why do we need to re-invent data communications just because of the current multimedia explosion? There are several reasons to revisit the existing paradigms and systems. The primary one is that current communication link designs are primarily mismatched to image and video sources, as they fail to account for important source considerations such as (i) highly time-varying source and channel characteristics, (ii) high source tolerance to channel loss, and (iii) unequal importance of transmitted bits.



This comes from a long history of data communications, where loss of bits is disastrous (e.g., data files), and where every bit is equally sacred. Some relevant important attributes of the image/video source are summarized below:
▲ The performance metric is the delivered visual quality (e.g., mean-squared error or, more correctly, the perceptual distortion) of the source due to both source quantization and channel distortion under constraints of fixed system resources such as bandwidth and transmission energy. This contrasts with commonly used performance criteria, such as bit-error rates, that are appropriate for traditional data communications.
▲ The unequal error sensitivities of a typical video bitstream (e.g., bits representing motion vectors or synchronization/header information versus bits representing high-frequency motion-compensated error residue or detail in textured image areas) emphasize the desirability of a layered approach to both source and channel coding and call for an overhaul of conventional "single-resolution" digital transmission frameworks with systems that have a multiresolution character to them.
▲ Due to the stringent delay requirements of synchronous video applications, there is a need to include finite buffer constraints (efficient rate-control strategies). These requirements will also influence the choice of error-control coding strategies such as forward error correction (FEC) vs. automatic repeat request (ARQ) techniques, as well as more powerful hybrid FEC/ARQ choices [111].
In Box 5 we motivated the need for joint source-channel coding due to the practical shortcomings of the separation principle (see Fig. 13) as well as its theoretical inapplicability to a number of multiuser communication scenarios of interest such as broadcast [53] and multicast. There is thus the potential for performance gains if there is closer interaction between the source- and channel-coding functions. The understanding of the superiority of a joint approach to source and channel coding in such cases has recently initiated numerous research activities in this area, a partial list of which can be found among [112-114]. Examples of successful deployment of joint source-channel coding principles for multiuser communications frameworks like broadcast and multicast can be found in [115-116] and [117], respectively.

Overview of Applications and Methodologies
Image and video transmission problems of the kind formulated above come in various application-driven flavors. Of particular interest are image and video delivery over heterogeneous packet networks such as the Internet and ATM, as well as wireless video for point-to-point as well as broadcast/multicast scenarios. A key challenge involving video sources is the stringent synchronous-delay requirement. For networking applications there are constraints imposed by the network related to both the average and peak burst rates. The available channel capacity can also fluctuate quite a bit, such as due to network congestion in networking applications, or fading in wireless-communication applications.
Joint source-channel coding schemes studied in the literature have been historically driven by two high-level ideologies. Very crudely, these ideologies may be classified as being inspired by "digital" versus "analog" transmission methods.
The digital class of techniques is based on optimally allocating bits between digital source and channel codes. Source-coding bits correspond to a digitally compressed and entropy-coded stream. Channel-coding bits correspond to the parity information of a digital error-correction code. A popular rate-distortion-based approach to this digitally inspired source-channel coding paradigm consists of minimizing the expected end-to-end source distortion (due to both source quantization and channel transmission impairments) subject to a total rate on both source coding and channel coding. This boils down to an allocation problem not only among source coding elements but also between source coding and channel coding elements. Extensions of the Lagrangian method described earlier can be invoked here, with the twist that the trade-offs involve expected distortion versus total rate, due to the presence of the channel coder. Owing to the typically unequal importance of source bit layers when dealing with image and video sources, as pointed out earlier, these layers are matched with unequal levels of channel protection. This comes under the category of unequal error protection (UEP) channel codes. One of the most popular classes of deployed UEP channel codes is the family of rate-compatible punctured convolutional (RCPC) codes [118], which is promising for a number of recent applications involving layered video coding and streaming for the Internet and other applications. The joint source-channel coding problem becomes one of optimally matching the resolution "trees" for both the source and the channel coders in a rate-distortion sense. A number of researchers have contributed significantly to this class of algorithms; a summary is provided in reference [119]. An example of a modulation-domain-based UEP scheme is described in [116], which has been recently considered for European digital audio and video broadcast [120]. Each layer of different error protection corresponds to a specific type of receiving monitor (typically, there are three layers or resolutions) and has different bit-error-rate requirements.


Thus, the quality of the received video varies gracefully with the receiver type as well as its distance from the transmitter.
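A minimal sketch of the kind of allocation described above, under the assumption that layers (or blocks) can be treated independently: each candidate mode pairs a source quantizer with a channel-code rate, its cost is the expected distortion after both quantization and residual channel errors, and an outer sweep on λ (as in the budget-constrained case) would be used to meet the total source-plus-parity budget. The mode tuple layout and all quantities are illustrative and not taken from any of the cited schemes.

```python
# Sketch: Lagrangian allocation between source and channel coding per layer.
def expected_distortion(d_quant, d_loss, p_residual_error):
    """Distortion if the layer is received intact plus the penalty if it is corrupted."""
    return (1 - p_residual_error) * d_quant + p_residual_error * d_loss

def allocate(layers, lmbda):
    """layers: list of mode lists; each mode is
    (source_rate, parity_rate, d_quant, d_loss, p_residual_error)."""
    chosen = []
    for modes in layers:
        def cost(mode):
            rs, rc, dq, dl, pe = mode
            # expected end-to-end distortion traded against total (source + parity) rate
            return expected_distortion(dq, dl, pe) + lmbda * (rs + rc)
        chosen.append(min(modes, key=cost))
    return chosen
```

Because more important layers typically have a larger loss penalty d_loss, the minimization naturally assigns them modes with stronger channel protection, which is the unequal-error-protection behavior discussed above.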
It should be noted that this "digital" ideology, while allowing higher source compression because of entropy coding, can also lead to an increased risk of error propagation. The popular solution is to insert periodic resynchronization capabilities using packetization. The resulting synchronization and packetization overheads that are needed to increase error resilience obviously eat into the compression efficiency. The problem becomes one of optimizing this balance.
The other ideology has been inspired essentially by the "graceful degradation" philosophy reminiscent of analog transmission. Thus, while the single-resolution digital philosophy adopts an "all or nothing" approach (within the packetization or layering operation) resulting in the well-known "cliff effect," the analog-inspired approach carries a "bend but do not break" motto. The idea is to do intelligent mappings of source codewords into channel constellation points, so as to have a similarity mapping between "distances" in the source-coding domain and "distances" in the channel-modulation domain [113, 121-124]. Thus, large source distortions are effectively mapped to high noise immunity (i.e., to low-probability error events), and vice versa, with intelligently chosen index assignments. The advantages of such an approach are increased robustness and graceful degradation. The disadvantage is the lack of a guaranteed quality of service (there is no notion of "perfect" noise immunity).
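The following toy example illustrates the index-assignment idea rather than any specific scheme from [113, 121-124]: quantizer indices are mapped onto an 8-PAM constellation, and the average squared source error caused by confusing neighboring constellation points (the dominant error event) is compared for a neighbor-preserving mapping and a scrambled one. The reconstruction levels and the scrambled permutation are made up for illustration.

```python
# Sketch: "similarity-preserving" index assignment onto an 8-PAM constellation.
levels = [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5]   # quantizer reconstruction values
constellation = list(range(8))                            # 8-PAM points; neighbors get confused

def neighbor_confusion_cost(assignment):
    """Average squared source error when a constellation point is mistaken
    for its immediate neighbor; assignment[source_index] = constellation point."""
    inv = {c: i for i, c in enumerate(assignment)}        # constellation point -> source index
    errors = [(levels[inv[c]] - levels[inv[c + 1]]) ** 2 for c in constellation[:-1]]
    return sum(errors) / len(errors)

identity = list(constellation)          # source neighbors stay on adjacent constellation points
scrambled = [0, 4, 1, 5, 2, 6, 3, 7]    # a poor assignment that amplifies channel errors
print(neighbor_confusion_cost(identity), neighbor_confusion_cost(scrambled))
```

The neighbor-preserving assignment keeps the most likely channel errors small in the source domain, which is exactly the "similarity mapping" between the two distance measures argued for above.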
It is interesting to note also that hybrid versions of these two philosophies that are aimed at exploiting the "best of both worlds" have been advocated recently [125] with significant performance gains.
If the modem is included in the optimization box, then the standard rate-distortion problem becomes transformed into a power-distortion trade-off problem (where the constraint now becomes the transmission power or energy rather than the bit rate). This leads to interesting extensions of well-known rate-distortion-based optimization algorithms to their power-distortion counterparts [113, 123]. The reader is referred to [119] for a more detailed historical perspective of joint source-channel coding of images and video sources.
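The power-distortion counterpart reuses the same Lagrangian machinery, with transmission energy simply taking the place of rate. A compact sketch, with a hypothetical data layout and illustrative bracket values:

```python
# Sketch: power-distortion allocation via a Lagrangian sweep on the energy constraint.
def allocate_power(units, lmbda):
    """units: list of candidate lists [(transmit_energy, expected_distortion), ...]."""
    return [min(points, key=lambda p: p[1] + lmbda * p[0]) for points in units]

def meet_energy_budget(units, budget, lo=1e-6, hi=1e6, iters=40):
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total_energy = sum(e for e, _ in allocate_power(units, mid))
        lo, hi = (mid, hi) if total_energy > budget else (lo, mid)
    return allocate_power(units, hi)
```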
An example of an area where joint source-channel coding ideas have had an impact is in communicating over heterogeneous networks. In particular, the case of multicast in a heterogeneous environment is well suited for multiresolution source and channel coding. The idea is very simple: give each user the best possible quality by deploying a flexible networking infrastructure that will reach each user at its target bit rate. More precisely, a multicast transmission can be conceptualized as living on a multiresolution tree, which is a set of trees carrying various resolutions. Each user then reaches as many levels of the multiresolution tree as is possible given its access capabilities. Such a scheme was proposed in [117] for a heterogeneous packet environment, as for example the Internet. Figure 20 succinctly captures the basic idea.
While currently mostly wired links are involved, it is clear that mobile components are becoming more and more important as well. Such a scheme would be suitable for such an environment as well, possibly with bridges between wired and wireless components.
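A small sketch of the receiver-driven idea for the multiresolution multicast tree: each receiver simply joins as many cumulative layers as its access rate permits. The layer rates below are invented for illustration and do not correspond to [117].

```python
# Sketch: receiver-driven layer subscription on a multiresolution multicast tree.
layer_rates = [128, 256, 512, 1024]      # kbits/s of the base and enhancement layers

def subscribe(access_rate_kbps):
    """Return how many cumulative layers a receiver can join at its access rate."""
    total, count = 0, 0
    for rate in layer_rates:
        if total + rate > access_rate_kbps:
            break
        total += rate
        count += 1
    return count

for capacity in (100, 400, 2500):
    print(capacity, "kbits/s ->", subscribe(capacity), "layers")
```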
Summary
In this article we have provided an overview of rate-distortion optimization techniques as they are used in practical image and video applications. We started by establishing the link between these techniques and the rate-distortion theory developed in the field of information theory. We argued that standards-based image/video coding can benefit from optimization techniques, as they allow the encoder to optimize the selection of its coding parameters while preserving decoder compatibility. We then defined a generic resource allocation problem and gave two concrete examples, namely, budget-constrained allocation and delay-constrained allocation. We explained in detail the Lagrangian optimization and dynamic programming techniques, which have become essential tools to solve these allocation problems. This allowed us to give an overview of applications where rate-distortion optimization has proven to be useful. We ended by describing how these techniques can also be found useful within joint source-channel coding frameworks.

Acknowledgments
We thank the anonymous reviewers, in particular reviewer 1, and Dr. Jong-Won Kim of USC, for their careful reading of the manuscript and their constructive comments. We would also like to thank Raghavendra Singh and Arif Karu from USC for their implementation of the example webpage referred to in Box 8.
Ortega's work was supported in part by the National Science Foundation under grant MIP-9502227 (CAREER), and by the Integrated Media Systems Center (a National Science Foundation Engineering Research Center), the Annenberg Center for Communication at the University of Southern California, and the California Trade and Commerce Agency. Ramchandran's work was supported in part by the National Science Foundation under grant MIP 97-03181 (CAREER), the Army Research Office under award DAAH04-96-1-0342, and the Office of Naval Research under award N00014-97-1-0864.

Antonio Ortega is an Assistant Professor with the University of Southern California's Integrated Media Systems Center in the Department of Electrical Engineering-Systems in Los Angeles, California, USA (e-mail: [email protected]).



Kannan Ramchandran is an Assistant Professor with the Beckman Institute of the Department of Electrical and Computer Engineering at the University of Illinois, Urbana-Champaign, Illinois, USA (e-mail: [email protected]).

References
1. G.M. Schuster and A.K. Katsaggelos, Rate-Distortion Based Video Compression: Optimal Video Frame Compression and Object Boundary Encoding. Kluwer Academic Publishers, 1997.
2. G. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," this issue, pp. 74-90.
3. C.E. Shannon, "A mathematical theory of communication," Bell Sys. Tech. Journal, vol. 27, pp. 379-423, 1948.
4. C.E. Shannon, "Coding theorems for a discrete source with a fidelity criterion," in IRE National Convention Record, Part 4, pp. 142-163, 1959.
5. W. Pennebaker and J. Mitchell, JPEG Still Image Data Compression Standard. Van Nostrand Reinhold, 1994.
6. D. Lee, "New work item proposal: JPEG 2000 image coding system." ISO/IEC JTC1/SC29/WG1 N390, 1996.
7. Testing Ad Hoc Group, "JPEG2000 testing results." ISO/IEC JTC1/SC29/WG1/N705, Nov. 1997.
8. N. Jayant, J. Johnston, and R. Safranek, "Signal compression based on models of human perception," Proc. of the IEEE, Oct. 1993.
9. T. Berger, Rate-Distortion Theory: A Mathematical Basis for Data Compression. Prentice-Hall, 1971.
10. R.M. Gray, Source Coding Theory. Kluwer Academic Publishers, 1990.
11. W. Bennett, "Spectra of quantized signals," Bell Sys. Tech. J., vol. 27, pp. 446-472, Jul. 1948.
12. P. Zador, Development and Evaluation of Procedures for Quantizing Multivariate Distributions. PhD thesis, Stanford University, Stanford, CA, 1964.
13. A. Gersho, "Asymptotically optimal block quantization," IEEE Trans. on Info. Th., vol. IT-25, pp. 373-380, Jul. 1979.
14. D.L. Neuhoff, "The other asymptotic theory of source coding," in Joint IEEE/DIMACS Workshop on Coding and Quantization, (Rutgers University, Piscataway, NJ), Oct. 1992. Also available as ftp://ftp.eecs.umich.edu/people/neuhoff/OthAsymptThy.ps.
15. T.M. Cover and J.A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
16. N. Farvardin and J.W. Modestino, "Optimum quantizer performance for a class of non-Gaussian memoryless sources," IEEE Trans. on Info. Th., vol. IT-30, pp. 485-497, May 1984.
17. P.A. Chou, T. Lookabaugh, and R.M. Gray, "Entropy-constrained vector quantization," IEEE Trans. ASSP, vol. 37, pp. 31-42, Jan. 1989.
18. M. Effros, "Optimal modeling for complex system design," this issue, pp. 51-73.
19. P.H. Westerink, J. Biemond, and D.E. Boekee, "An optimal bit allocation algorithm for sub-band coding," Proc. of ICASSP, pp. 757-760, 1988.
20. J.M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. on Signal Proc., vol. 41, pp. 3445-3462, Dec. 1993.
21. A. Said and W.A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circuits and Systems for Video Technology, pp. 243-250, June 1996.
22. Z. Xiong, K. Ramchandran, and M.T. Orchard, "Space-frequency quantization for wavelet image coding," IEEE Trans. on Image Proc., vol. 6, pp. 677-693, May 1997.
23. R.L. Joshi, H. Jafarkhani, J.H. Kasner, T.R. Fischer, N. Farvardin, M.W. Marcellin, and R.H. Bamberger, "Comparison of different methods of classification in subband coding of images," IEEE Trans. on Image Processing, vol. 6, pp. 1473-1486, Nov. 1997.
24. S. LoPresto, K. Ramchandran, and M.T. Orchard, "Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework," in Data Compression Conference '97, (Snowbird, Utah), pp. 221-230, 1997.
25. Y. Yoo, A. Ortega, and B. Yu, "Adaptive quantization of image subbands with efficient overhead rate selection," in Proc. of the Intl. Conf. on Image Proc., ICIP'96, vol. 2, (Lausanne, Switzerland), pp. 361-364, Sept. 1996.
26. C. Chrysafis and A. Ortega, "Efficient context-based entropy coding for lossy wavelet image compression," in Data Compression Conference '97, (Snowbird, Utah), pp. 241-250, 1997.
27. S.P. Lloyd, "Least squares quantization in PCM," IEEE Trans. on Info. Th., vol. IT-28, pp. 127-135, Mar. 1982. Also, unpublished Bell Labs Technical Report, 1957.
28. J. Max, "Quantizing for minimum distortion," IEEE Trans. on Info. Th., pp. 7-12, Mar. 1960.
29. Y. Linde, A. Buzo, and R.M. Gray, "An algorithm for vector quantizer design," IEEE Trans. on Comm., vol. COM-28, pp. 84-95, Jan. 1980.
30. P.A. Chou, T. Lookabaugh, and R.M. Gray, "Optimal pruning with applications to tree-structured source coding and modeling," IEEE Trans. on Inform. Theory, vol. 35, pp. 299-315, Mar. 1989.
31. A. Gersho and R.M. Gray, Vector Quantization and Signal Compression. Kluwer Academic Publishers, 1992.
32. J. Mitchell, W. Pennebaker, C.E. Fogg, and D.J. LeGall, MPEG Video Compression Standard. New York: Chapman and Hall, 1997.
33. A.N. Netravali and B.G. Haskell, Digital Pictures: Representation, Compression and Standards. New York: Plenum Press, 2nd ed., 1995.
34. R.J. Clarke, Digital Compression of Still Images and Video. Academic Press, 1995.
35. V. Bhaskaran and K. Konstantinides, Image and Video Compression Standards: Algorithms and Architectures. Kluwer Academic Publishers, 1996.
36. K.R. Rao and J.J. Hwang, Techniques & Standards for Image, Video & Audio Coding. Prentice Hall, 1996.
37. B.G. Haskell, A. Puri, and A.N. Netravali, Digital Video: An Introduction to MPEG-2. Chapman and Hall, 1997.
38. ITU-T (formerly CCITT), "Video coding for low bitrate communication," ITU-T Recommendation H.263, version 1, Nov. 1995; version 2, Jan. 1998.
39. S. Wu and A. Gersho, "Rate-constrained picture-adaptive quantization for JPEG baseline coders," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP'93, vol. 5, (Minneapolis, MN), pp. 389-392, April 1993.
40. K. Ramchandran and M. Vetterli, "Rate-distortion optimal fast thresholding with complete JPEG/MPEG decoder compatibility," IEEE Trans. on Image Proc., vol. 3, pp. 700-704, Sept. 1994.
41. M. Crouse and K. Ramchandran, "Joint thresholding and quantizer selection for transform image coding: Entropy-constrained analysis and applications to baseline JPEG," IEEE Trans. on Image Processing, vol. 6, pp. 285-297, February 1997.
42. K. Ramchandran and M. Vetterli, "Best wavelet packet bases in a rate-distortion sense," IEEE Trans. on Image Proc., vol. 2, pp. 160-175, Apr. 1993.
43. M.W. Garrett, "Statistical analysis of a long trace of VBR coded video," Ph.D. thesis, Chapter IV, Columbia University, 1993.
44. A.R. Reibman and B.G. Haskell, "Constraints on variable bit-rate video for ATM networks," IEEE Trans. on CAS for Video Tech., vol. 2, pp. 361-372, Dec. 1992.
45. S.-W. Wu and A. Gersho, "Rate-constrained optimal block-adaptive coding for digital tape recording of HDTV," IEEE Trans. on Circuits and Sys. for Video Tech., vol. 1, pp. 100-112, Mar. 1991.



46. A. Ortega, K. Ramchandran, and M. Vetterli, "Optimal trellis-based buffered compression and fast approximation," IEEE Trans. on Image Proc., vol. 3, pp. 26-40, Jan. 1994.
47. W. Ding and B. Liu, "Rate control of MPEG video coding and recording by rate-quantization modeling," IEEE Trans. on Circ. and Sys. for Video Tech., vol. 6, pp. 12-20, Feb. 1996.
48. L.-J. Lin and A. Ortega, "Bit-rate control using piecewise approximated rate-distortion characteristics," IEEE Trans. on Circ. and Sys. for Video Tech., vol. 8, pp. 446-459, Aug. 1998.
49. C.-Y. Hsu, A. Ortega, and A. Reibman, "Joint selection of source and channel rate for VBR video transmission under ATM policing constraints," IEEE J. on Sel. Areas in Comm., vol. 15, pp. 1016-1028, Aug. 1997.
50. J.-J. Chen and D.W. Lin, "Optimal bit allocation for coding of video signals over ATM networks," IEEE J. on Sel. Areas in Comm., vol. 15, pp. 1002-1015, Aug. 1997.
51. W. Ding, "Joint encoder and channel rate control of VBR video over ATM networks," IEEE Trans. on Circ. and Sys. for Video Tech., vol. 7, pp. 266-278, April 1997.
52. C.-Y. Hsu, A. Ortega, and M. Khansari, "Rate control for robust video transmission over burst-error wireless channels," IEEE J. on Sel. Areas in Comm., 1998, to appear.
53. T. Cover, "Broadcast channels," IEEE Trans. on Inform. Theory, vol. IT-18, pp. 2-14, Jan. 1972.
54. G.M. Schuster and A.K. Katsaggelos, "The minimum-average and minimum-maximum criteria in lossy compression," Vistas in Astronomy, vol. 41, no. 3, pp. 427-437, 1997.
55. D.T. Hoang, Fast and Efficient Algorithms for Text and Video Compression. PhD thesis, Brown University, 1997.
56. D.T. Hoang, E.L. Linzer, and J.S. Vitter, "Lexicographic bit allocation for MPEG video," Journal of Visual Communication and Image Representation, vol. 8, Dec. 1997.
57. K.M. Uz, J.M. Shapiro, and M. Czigler, "Optimal bit allocation in the presence of quantizer feedback," in Proc. of ICASSP'93, vol. V, (Minneapolis, MN), pp. 385-388, Apr. 1993.
58. E. Frimout, J. Biemond, and R.L. Lagendijk, "Forward rate control for MPEG recording," in Proc. of SPIE Visual Communications and Image Processing '93, (Cambridge, MA), pp. 184-194, Nov. 1993.
59. J. Katto and M. Ohta, "Mathematical analysis of MPEG compression capability and its application to rate control," in Proc. of ICIP'95, vol. II, (Washington, D.C.), pp. 555-559, 1995.
60. H.-M. Hang and J.-J. Chen, "Source model for transform video coder and its application," IEEE Trans. on Circ. and Sys. for Video Tech., vol. 7, pp. 287-311, Apr. 1997.
61. G.M. Schuster and A.K. Katsaggelos, "The min-max approach in video coding," in Proc. of 1997 Intl. Conf. Acoust., Speech and Signal Processing, ICASSP'97, vol. IV, (Munich, Germany), pp. 3105-3108, Apr. 1997.
62. A. Ortega and M. Khansari, "Rate control for video coding over variable bit rate channels with applications to wireless transmission," in Proc. of the 2nd Intl. Conf. on Image Proc., ICIP'95, (Washington, D.C.), Oct. 1995.
63. C.-Y. Hsu, A. Ortega, and M. Khansari, "Rate control for robust video transmission over wireless channels," in Proc. of Visual Comm. and Image Proc., VCIP'97, (San Jose, CA), Feb. 1997.
64. H. Everett, "Generalized Lagrange multiplier method for solving problems of optimum allocation of resources," Operations Research, vol. 11, pp. 399-417, 1963.
65. Y. Shoham and A. Gersho, "Efficient bit allocation for an arbitrary set of quantizers," IEEE Trans. ASSP, vol. 36, pp. 1445-1453, Sep. 1988.
66. S.-Z. Kiang, R.L. Baker, G.J. Sullivan, and C.-Y. Chiu, "Recursive optimal pruning with applications to tree structured vector quantizers," IEEE Trans. on Image Proc., vol. 1, pp. 162-169, Apr. 1992.
67. D.W. Lin, M.-H. Wang, and J.-J. Chen, "Optimal delayed-coding of video sequences subject to a buffer-size constraint," in Proc. of SPIE Visual Communications and Image Processing '93, (Cambridge, MA), Nov. 1993.
68. G.J. Sullivan and R.L. Baker, "Efficient quadtree coding of images and video," IEEE Trans. on Image Proc., vol. 3, pp. 327-331, May 1994.
69. G.M. Schuster and A.K. Katsaggelos, "An optimal quadtree-based motion estimation and motion-based interpolation scheme for video compression," IEEE Trans. on Image Proc., vol. 7, no. 11, pp. 1505-1523, Nov. 1998.
70. M.L. Fisher, "The Lagrangian relaxation method for solving integer programming problems," Management Science, vol. 27, pp. 1-18, Jan. 1981.
71. J.-J. Chen and D.W. Lin, "Optimal coding of video sequences over ATM networks," in Proc. of the 2nd Intl. Conf. on Image Processing, ICIP-95, vol. I, (Washington, D.C.), pp. 21-24, Oct. 1995.
72. A. Ortega, "Optimal rate allocation under multiple rate constraints," in Proc. of the Data Compression Conference, DCC'96, (Snowbird, UT), pp. 349-358, Apr. 1996.
73. J.-J. Chen and D.W. Lin, "Optimal bit allocation for video coding under multiple constraints," in Proc. IEEE Intl. Conf. on Image Proc., ICIP'96, 1996.
74. R. Bellman, Dynamic Programming. Princeton University Press, 1957.
75. D.P. Bertsekas, Dynamic Programming. Prentice-Hall, 1987.
76. G.D. Forney, "The Viterbi algorithm," Proc. of the IEEE, vol. 61, pp. 268-278, March 1973.
77. M.W. Marcellin and T.R. Fischer, "Trellis coded quantization of memoryless and Gauss-Markov sources," IEEE Trans. on Comm., vol. 38, pp. 82-93, Jan. 1990.
78. T.R. Fischer and M. Wang, "Entropy-constrained trellis-coded quantization," IEEE Trans. on Info. Th., vol. 38, pp. 415-426, Mar. 1992.
79. R. Laroia and N. Farvardin, "A structured fixed-rate vector quantizer derived from a variable-length scalar quantizer, part I: memoryless sources," IEEE Trans. on Inform. Theory, vol. IT-39, pp. 851-867, May 1993.
80. K. Ramchandran, A. Ortega, and M. Vetterli, "Bit allocation for dependent quantization with applications to multiresolution and MPEG video coders," IEEE Trans. on Image Proc., vol. 3, pp. 533-545, Sept. 1994.
81. A. Ortega and K. Ramchandran, "Forward-adaptive quantization with optimal overhead cost for image and video coding with applications to MPEG video coders," in Proc. of SPIE, Digital Video Compression: Algorithms & Technologies '95, (San Jose, CA), Feb. 1995.
82. T. Wiegand, M. Lightstone, D. Mukherjee, T. Campbell, and S.K. Mitra, "Rate-distortion optimized mode selection for very low bit-rate video coding and the emerging H.263 standard," IEEE Trans. on Circ. and Sys. for Video Tech., vol. 6, pp. 182-190, Apr. 1996.
83. G.M. Schuster and A.K. Katsaggelos, "A video compression scheme with optimal bit allocation between displacement vector field and displaced frame difference," IEEE J. on Sel. Areas in Comm., vol. 15, pp. 1739-1751, Dec. 1997.
84. Y. Yoo, A. Ortega, and K. Ramchandran, "A novel hybrid technique for discrete rate-distortion optimization with applications to fast codebook search for scalar vector quantization," in Proc. of ICASSP'96, (Atlanta, GA), Apr. 1996.
85. M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. Prentice Hall, 1995.
86. G. Strang and T. Nguyen, Wavelets and Filter Banks. Wellesley-Cambridge Press, 1996.
87. N.S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video. Englewood Cliffs, New Jersey: Prentice-Hall, 1984.
88. A.K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.



89. R.L. Joshi, T.R. Fischer, and R.H. Bamberger, "Optimum classification in subband coding of images," in Proc. of ICIP'94, (Austin, TX), pp. 883-887, Nov. 1994.
90. M.C. Chen and A.N. Willson Jr., "Rate-distortion optimal motion estimation algorithms for motion-compensated transform video coding," IEEE Trans. on Circ. and Sys. for Video Tech., vol. 8, pp. 147-158, Apr. 1998.
91. D. Tzovaras and M. Strintzis, "Motion and disparity field estimation using rate-distortion optimization," IEEE Trans. on Circ. and Sys. for Video Tech., vol. 8, pp. 171-180, Apr. 1998.
92. D.T. Hoang, P.M. Long, and J.S. Vitter, "Efficient cost measures for motion estimation at low bit rates," IEEE Trans. on Circ. and Sys. for Video Tech., vol. 8, pp. 488-500, Aug. 1998.
93. J. Ribas-Corbera and D.L. Neuhoff, "Optimal bit allocations for lossless video coders: Motion vectors vs. difference frames," in Proc. of ICIP'95, vol. 3, pp. 180-183, 1995.
94. J. Ribas-Corbera and D.L. Neuhoff, "Optimizing block size in motion-compensated video coding," Journal of Electronic Imaging, vol. 7, pp. 155-165, Jan. 1998.
95. W.C. Chung, F. Kossentini, and M.J.T. Smith, "A new approach to scalable video coding," in Proc. of DCC'95, (Snowbird, UT), pp. 381-390, Apr. 1995.
96. G.M. Schuster, G. Melnikov, and A. Katsaggelos, "Operationally optimal vertex-based shape coding," this issue, pp. 91-108.
97. G.M. Schuster and A.K. Katsaggelos, "An optimal boundary encoding scheme in the rate-distortion sense," IEEE Trans. on Image Proc., vol. 7, pp. 13-26, Jan. 1998.
98. G.M. Schuster and A.K. Katsaggelos, "A video compression scheme with optimal bit allocation among segmentation, displacement vector field, and displaced frame difference," IEEE Trans. on Image Proc., vol. 6, pp. 1487-1502, Nov. 1997.
99. C. Herley, J. Kovacevic, K. Ramchandran, and M. Vetterli, "Tilings of the time-frequency plane: construction of arbitrary orthogonal bases and fast tiling algorithms," IEEE Trans. on Signal Proc., Special Issue on Wavelets, vol. 41, pp. 3341-3359, Dec. 1993.
100. Z. Xiong, K. Ramchandran, C. Herley, and M.T. Orchard, "Flexible time segmentations for time-varying wavelet packets," IEEE Trans. on Signal Proc., vol. 45, pp. 333-345, Feb. 1997.
101. Z. Xiong, M.T. Orchard, and K. Ramchandran, "Wavelet packet image coding using joint space-frequency quantization," IEEE Trans. on Image Proc., pp. 892-898, June 1998.
102. K. Ramchandran, M. Vetterli, and C. Herley, "Wavelets, subband coding and best bases," Proceedings of the IEEE, pp. 541-560, April 1996. Special Issue on Wavelets: Invited Paper.
103. R. Coifman and M.V. Wickerhauser, "Entropy-based algorithms for best basis selection," IEEE Trans. Inform. Theory, vol. IT-38, pp. 713-718, Mar. 1992.
104. C.-T. Chen and A. Wong, "A self-governing rate buffer control strategy for pseudoconstant bit rate video coding," IEEE Trans. on Image Proc., vol. 2, pp. 50-59, Jan. 1993.
105. J. Choi and D. Park, "A stable feedback control of the buffer state using the controlled Lagrange multiplier method," IEEE Trans. on Image Proc., vol. 3, pp. 546-558, Sept. 1994.
106. J. Lee and B.W. Dickinson, "Rate-distortion optimized frame-type selection for MPEG coding," IEEE Trans. on Circ. and Sys. for Video Tech., vol. 7, pp. 501-510, June 1997.
107. A. Eleftheriadis and D. Anastassiou, "Constrained and general dynamic rate shaping of compressed digital video," in Proc. of ICIP'95, vol. III, (Washington, D.C.), pp. 396-399, 1995.
108. A. Ortega and M. Khansari, "Rate control for video coding over variable bit rate channels with applications to wireless transmission," in Proc. of ICIP'95, (Washington, D.C.), Nov. 1995.
109. C.-Y. Hsu and A. Ortega, "A Lagrangian optimization approach to rate control for delay-constrained video transmission over burst-error channels," in Proc. of ICASSP'98, (Seattle, WA), May 1998.
110. M. Khansari, A. Jalali, E. Dubois, and P. Mermelstein, "Low bit-rate video transmission over fading channels for wireless microcellular systems," IEEE Trans. on Circ. and Sys. for Video Tech., vol. 6, pp. 1-11, Feb. 1996.
111. S. Lin and D.J. Costello, Jr., Error Control Coding: Fundamentals and Applications. Prentice-Hall, 1984.
112. J. Modestino, D.G. Daut, and A. Vickers, "Combined source-channel coding of images using the block cosine transform," IEEE Trans. on Commun., vol. COM-29, pp. 1261-1274, Sept. 1981.
113. N. Farvardin and V. Vaishampayan, "On the performance and complexity of channel-optimized vector quantizers," IEEE Trans. on Inform. Theory, vol. IT-37, Jan. 1991.
114. F.H. Liu, P. Ho, and V. Cuperman, "Joint source and channel coding using a non-linear receiver," in Proc. of ICC'93, June 1993.
115. W.F. Schreiber, "Considerations in the design of HDTV systems for terrestrial broadcasting," SMPTE Journal, pp. 668-677, Sept. 1991.
116. K. Ramchandran, A. Ortega, K.M. Uz, and M. Vetterli, "Multiresolution broadcast for digital HDTV using joint source-channel coding," IEEE J. on Sel. Areas in Comm., vol. 11, pp. 6-23, Jan. 1993.
117. S. McCanne, M. Vetterli, and V. Jacobson, "Low-complexity video coding for receiver-driven layered multicast," IEEE J. on Sel. Areas in Comm., vol. 15, pp. 983-1001, Aug. 1997.
118. J. Hagenauer, "Rate-compatible punctured convolutional codes (RCPC codes) and their applications," IEEE Trans. on Comm., vol. COM-36, pp. 389-400, Apr. 1988.
119. K. Ramchandran and M. Vetterli, "Multiresolution joint source-channel coding for wireless channels," in Wireless Communications: A Signal Processing Perspective, V. Poor and G. Wornell, Eds. Prentice-Hall, 1998.
120. B. Schafer, "Terrestrial transmission of DTVB signals: the European specification," in International Broadcasting Convention, no. 413, September 1995.
121. H. Kumazawa, M. Kasahara, and T. Namekawa, "A construction of vector quantizers for noisy channels," Electron. Eng. Japan, vol. 67-B, pp. 39-47, 1984.
122. K.A. Zeger and A. Gersho, "Zero redundancy channel coding in vector quantization," Electronics Letters, pp. 654-655, June 1987.
123. I. Kozintsev and K. Ramchandran, "Robust image transmission over energy-constrained time-varying channels using multiresolution joint source-channel coding," IEEE Trans. on Signal Processing, Special Issue on Wavelets and Filter Banks, vol. 46, pp. 1012-1026, April 1998.
124. M.W. Marcellin and T.R. Fischer, "Joint trellis coded quantization and modulation," IEEE Trans. on Comm., Jan. 1993.
125. I. Kozintsev and K. Ramchandran, "A hybrid compressed-uncompressed framework for wireless image transmission," in Proc. of ICIP'97, (Santa Barbara, California), Oct. 1997.

