970 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 12, NO. 11, NOVEMBER 2002

A Linear Source Model and a Unified Rate Control Algorithm for DCT Video Coding
Zhihai He, Member, IEEE, and Sanjit K. Mitra, Life Fellow, IEEE

Manuscript received June 5, 2001; revised November 1, 2001. This paper was recommended by Associate Editor H. Sun.
Z. He was with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106 USA. He is now with Sarnoff Corporation, Princeton, NJ 08543-5300 USA (e-mail: [email protected]).
S. K. Mitra is with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106 USA (e-mail: [email protected]).
Digital Object Identifier 10.1109/TCSVT.2002.805511
1051-8215/02$17.00 © 2002 IEEE

Abstract—We show that, in any typical transform coding system, there is always a linear relationship between the coding bit rate and the percentage of zeros among the quantized transform coefficients, denoted by $\rho$. Based on Shannon’s source coding theorem, a theoretical justification is provided for this linear source model. The physical meaning of the model parameter is also discussed. We show that it is directly related to the image content and is a measure of picture complexity. In video coding, we propose an adaptive estimation scheme to estimate this model parameter. Based on the linear source model and the adaptive estimation scheme, a unified rate control algorithm is proposed for various standard video coding systems, such as MPEG-2, H.263, and MPEG-4. Our extensive simulation results show that the proposed rate control outperforms other algorithms reported in the literature by providing much more accurate and robust rate control.

Index Terms—Linear rate model, rate control, rate-distortion control, Shannon’s source coding theorem, video coding.

I. INTRODUCTION

The output bit rate of the video encoder varies dramatically over time due to scene activities. In visual communication over narrow-band or time-varying channels, rate control is very important to ensure the successful transmission of coded video data through communication channels. In real-time video communications, such as video conferencing, video phone, interactive classrooms, and real-time Web cast, the end-to-end delay for video transmission has to be very small, which requires more accurate and robust rate control.

In standard image and video coding, the output bit rate is controlled by the quantization parameter of the video encoder. In this work, we denote the quantization parameter by $q$. For a uniform quantizer, $q$ represents the quantization step size. For a perceptual quantizer with a quantization matrix as in JPEG [1] image and MPEG [2] video coding, $q$ represents the quantization scale factor. The relationship between $R$ and $q$ is described by the rate-quantization ($R$-$q$) function, denoted by $R(q)$. The key issue in rate control is to estimate $R(q)$. Once $R(q)$ is available, to achieve the target coding bit rate $R_T$, we just select the corresponding quantization parameter $q$.

A. Brief Review of Rate Control Algorithms

The rate-distortion ($R$-$D$) formula for a simple quantizer has been established for a long time [3], [4]. However, in a typical transform coding system, for example, JPEG coding [1], this type of analytic entropy formula does not work, especially at low bit rates [5]. In Fig. 1, we plot the actual JPEG coding bit rates and the entropies of the quantized discrete cosine transform (DCT) coefficients for images “Lena” and “Peppers” at different quantization scales. It can be seen that the relative error between them is very large. In the classical $R$-$D$ formula, the only parameter which describes the input source is the variance of the source data. It is well known that variance alone is far from sufficient to characterize the input source data and to determine the final coding bit rate. In addition, the analytic entropy formula does not take into account the coding behavior of a specific coding algorithm. For example, the wavelet-based image compression algorithm proposed in [6] is much more efficient than the JPEG coding [1]. Therefore, even for the same image, different coding algorithms have quite different $R$-$D$ functions.

Since the analytic entropy formula does not work well in practical coding applications, more sophisticated rate formulas have been developed in the literature and applied to rate control for video coding applications [5]. To estimate the rate function more accurately, some operational approaches have also been proposed. In [7], the $R$-$D$ curve is modeled by an exponential formula coupled with several control parameters. The model parameters are then estimated from the coding statistics generated by re-encoding of the input video. Obviously, this algorithm has very high computational complexity. In video coding, the coding statistics of the previous frames can be utilized to estimate the model parameters of the current frame, as in the VM7 rate control algorithm [8], [9]. In this way, the computational complexity is reduced. However, in this approach, it is assumed that the neighboring frames have the same $R$-$D$ characteristics. Unfortunately, this is not true at scene changes. Therefore, this approach often suffers from performance degradation at scene changes [10]. The empirical estimation of the model parameter can also be carried out at the macroblock (MB) level, as in the TMN8 rate control algorithm [10], [11]. With an adaptive quantization scheme at the MB level, the TMN8 algorithm has a superior rate control performance when compared to the rate control in VM7. However, due to the limited accuracy of its source model, it also suffers from performance degradation for videos with high motion. Furthermore, the TMN8 rate control algorithm is mainly designed for P pictures. For I pictures, a very rough approximation for the coding bit rate is used. Therefore, the TMN8 algorithm is mainly used for H.263 video coding. It
does not work well in MPEG-2 coding, which has at least one I picture in each group of pictures (GOP).

Fig. 1. Entropy and the actual JPEG coding bit rate for images (a) “Lena” and (b) “Peppers.”

B. Proposed Source Model and Rate Control Scheme

To the best of our knowledge, all the source models reported in the literature [5], [7], [8], [10] try to find the best expression for the coding bit rate $R$ in terms of the quantization parameter $q$. In other words, the rate function is defined and modeled in the $q$ domain. In order to improve the accuracy of the source model, the expression of $R(q)$ becomes more and more complicated as reported in the literature.

It is well known that zeros play a key role in transform coding of images and videos. The high compression ratio in transform coding is mainly achieved by efficient coding of zeros [1], [2], [6], [13], [15]. Let $\rho$ be the percentage of zeros among the quantized transform coefficients. Note that $\rho$ monotonically increases with $q$, which implies there is a one-to-one mapping between them. (Here, we assume that the transform coefficients have a continuous and positive distribution.) Hence, mathematically, the coding bit rate $R$ is also a function of $\rho$, denoted by $R(\rho)$. In this work we propose to study the rate function in the $\rho$ domain instead of the traditional $q$ domain. Based on our extensive simulations and theoretical justification, we have found that in all typical transform coding systems, such as the wavelet-based image compression [6], [16], [17], JPEG image coding [1], MPEG-2 [2], H.263 [13], and MPEG-4 video coding [15], $R(\rho)$ is always a linear function

$$R(\rho) = \theta\,(1 - \rho) \tag{1}$$

where $\theta$ is a constant. This leads to a unified source model for all typical transform coding systems. We have also observed that the only model parameter $\theta$ is directly related to the image content. Its physical meaning is also discussed in this work. For video coding, we develop an adaptive scheme to estimate the value of $\theta$. Once $\theta$ is estimated, the rate curve can be constructed by (1). As a consequence, a unified linear rate control algorithm is proposed here for MPEG-2, H.263, and MPEG-4 video coding. The proposed algorithm is conceptually simple and has low computational complexity. Nevertheless, it outperforms other rate control algorithms reported in the literature by providing more accurate and robust rate control.

The paper is organized as follows. In Section II, we present the $\rho$-domain $R$-$D$ analysis technique. Based on our extensive simulation results, we introduce the linear rate model in Section III. In Section IV, based on Shannon’s source coding theorem, we provide a theoretical justification for the linear rate model. The physical meaning of the model parameter is discussed in Section V. A unified rate control algorithm for video coding is proposed in Section VI. The experimental results and performance comparison with other rate control algorithms are presented in Section VII. Finally, some concluding remarks are given in Section VIII.

II. $\rho$-DOMAIN RATE ANALYSIS

As discussed in Section I, if we assume the transform coefficients have a positive distribution, there is a one-to-one mapping between $q$ and $\rho$. [Footnote 1: In fact, the distribution of transform coefficients is nonnegative. However, this does not affect the proposed $R$-$D$ analysis framework and rate control algorithm. This is because zero frequency of a sample value means no such transform coefficients exist in the picture data.] Therefore, any function in the $q$ domain can be mapped into the $\rho$ domain and vice versa. This mapping can be easily computed from the distribution of the transform coefficients. Let us consider the H.263 encoder [14] as an example. Let $q$ and $\Delta$ be the step size and dead zone threshold of the quantizer, respectively. In the H.263 coding implementation, $\Delta$ is $2q$ for intra MBs and $2.5q$ for inter MBs. Let $D_0(x)$ and $D_1(x)$ be the distributions of the DCT coefficients in the intra and inter MBs, respectively. (In the H.263 codec implementation, the DCT coefficients have integer values. Therefore, $D_0(x)$ and $D_1(x)$ actually are histograms of
the DCT coefficients.) For any $q$, the corresponding percentage of zeros can be obtained as follows:

$$\rho(q) = \frac{1}{M} \sum_{|x| < 2q} D_0(x) + \frac{1}{M} \sum_{|x| < 2.5q} D_1(x) \tag{2}$$

where $M$ is the number of coefficients in the current video frame. The above computation only involves several additions. In the MPEG-2 quantization scheme, a perceptual quantization matrix, denoted by $W$, is employed [2]. In this case, we first scale each DCT coefficient by its perceptual weight in $W$. After scaling, the perceptual quantization becomes uniform. We then generate the distribution of the scaled DCT coefficients and compute the mapping using a formula similar to (2). In our rate control algorithm, this mapping is stored as a look-up table. The mapping of the rate curve between the $q$ and the $\rho$ domains is performed by table look-up and bi-linear interpolation. In the proposed $\rho$-domain source modeling approach, we first estimate the rate function in the $\rho$ domain, then map it to the $q$ domain to obtain the $R$-$q$ function, if necessary.

III. LINEAR SOURCE MODEL FOR TRANSFORM CODING

The great advantage of $\rho$-domain analysis is that the rate function is linear in the $\rho$ domain. To show this, a series of experiments have been performed as described in this section.

A. Linear Source Model for Wavelet-Based Coding

We randomly select 24 sample images which have a wide range of $R$-$D$ characteristics. The sample images are shown in Fig. 2. Each sample image is first decomposed by a five-level dyadic scheme with the 9/7 wavelet [19]. The decomposed image is uniformly quantized with a step size $q$ and then coded by the set partitioning in hierarchical trees (SPIHT) [6] algorithm. For each $q$, we compute the corresponding percentage of zeros $\rho$ and record the coding bit rate $R$. By varying the quantization step size $q$, we can generate a series of points on the rate curve $R(\rho)$, which are plotted in Fig. 3. It can be seen that $R(\rho)$ is almost a straight line. In addition, this line passes through the point $(1, 0)$. This is because, when $\rho$ is 1.0, all the coefficients are quantized to zeros and the corresponding coding bit rate should also become zero. Fig. 3 shows that the linear source model given by (1) holds for practical wavelet image coding. Experiments over many other images with other wavelet-based coding algorithms, such as the zero-tree [16] and the stack-run (SR) [17] coding algorithms, also yield similar results. Due to page limitation, they are omitted here.

Fig. 2. Sample images for wavelet image coding.

For comparison, in Fig. 4, we plot the rate curve $R(q)$ in the $q$ domain for each sample image. It can be seen that for different images the patterns of $R(q)$ are quite different from each other. In addition, $R(q)$ has a very complex nonlinear behavior. This image-dependent variation and nonlinear behavior make it very hard to develop an accurate and robust source model in the $q$ domain. However, in the $\rho$ domain, the rate curve is a linear function, which is extremely simple. This is the great advantage of the proposed $\rho$-domain rate analysis.

B. Linear Source Model for DCT-Based Video Coding

Next we show that the linear source model also holds in various video coding systems. With the H.263 video codec [14], we encode the test video sequence at a series of quantization step sizes. Let $R$ be the coding bit rate which excludes the motion vectors (MVs) and the header information bits. It should be noted that the amount of bits for MV and header information is already fixed before rate control and quantization. We cannot change it during the rate control process. In Fig. 5, we plot $R(\rho)$ for several frames from the “Foreman” video sequence. It can be seen that $R(\rho)$ is approximately a linear function, as described by (1).

To demonstrate this linear relationship more systematically, we study the correlation coefficient between $R$ and $\rho$. In Fig. 6, we plot its sign-inverted value for each frame in “Akiyo” and “News” coded by MPEG-4, and in “Carphone,” “Salesman,” “Coastguard,” and “Tabletennis” coded by MPEG-2. It should be noted that $R$ in MPEG-4 does not
include the bits for shape information. From Fig. 6, it can be seen that the plotted (sign-inverted) correlation coefficient is always larger than 0.99. For most of the frames, it is even larger than 0.995, which is extremely close to 1. This implies that the linear relationship between $R$ and $\rho$ also holds in MPEG-2 and MPEG-4 video coding. In summary, our extensive simulation results demonstrate that the linear source model given by (1) is a unified source model for all typical transform coding systems, such as SPIHT, zero-tree, and JPEG image coding, and MPEG-2, H.263, and MPEG-4 video coding.

Fig. 3. Linear relationship between the percentage of zeros $\rho$ and the coding bit rate $R$ in wavelet image coding with SPIHT. The x axis represents $\rho$ while the y axis represents $R$. All the plots share the same coordinate system.

Fig. 4. Plot of the rate curve $R(q)$ in the $q$ domain for each sample image coded by SPIHT. The x axis represents $q$, while the y axis represents $R$. All the plots share the same coordinate system.

IV. HEURISTIC JUSTIFICATION

It is well known that the asymptotic $R$-$D$ behavior of a coding system is characterized by Shannon’s source coding theorem [3], [4]. Based on this theorem, we provide a heuristic justification for the $\rho$-domain linear rate model in (1). The literature [20], [21] indicates that transform coefficients have a generalized Gaussian distribution given by

$$p(x) = \frac{\nu\,\eta(\nu,\sigma)}{2\,\Gamma(1/\nu)} \exp\left\{-\left[\eta(\nu,\sigma)\,|x|\right]^{\nu}\right\} \tag{3}$$

where

$$\eta(\nu,\sigma) = \frac{1}{\sigma}\left[\frac{\Gamma(3/\nu)}{\Gamma(1/\nu)}\right]^{1/2} \tag{4}$$

where $\sigma$ is the standard deviation of the transform coefficients and $\nu$ is a model parameter which controls the shape of the
distribution. For example, when $\nu = 2$, $p(x)$ becomes a Gaussian distribution given by

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/(2\sigma^2)} \tag{5}$$

When $\nu = 1$, $p(x)$ becomes a Laplacian distribution given by

$$p(x) = \frac{\lambda}{2}\, e^{-\lambda |x|}, \qquad \lambda = \frac{\sqrt{2}}{\sigma} \tag{6}$$

For DCT-based image/video coding, Lam and Goodman [22] have mathematically shown that the DCT coefficients have a Laplacian distribution. In the following, before studying the generalized Gaussian source, we first consider its two special cases: the Laplacian and the Gaussian sources.

Fig. 5. Linear relationship between the percentage of zeros $\rho$ and the coding bit rate $R$ in video coding with H.263. The test frames are from the “Foreman” QCIF video.

Fig. 6. The correlation coefficient (inverse) of each frame between the coding bit rate $R$ and $\rho$ in MPEG-4 and MPEG-2 video coding.

A. Laplacian Source

For the Laplacian source, let us define the distortion measure as $d(x, \hat{x}) = |x - \hat{x}|$, where $x$ is the input symbol and $\hat{x}$ is the output symbol of the quantizer. According to Shannon’s source coding theorem [4], [23], if a distortion $D$ is allowed, the minimum number of bits needed to represent a symbol from a Laplacian source is given by

$$R(D) = \log_2 \frac{1}{\lambda D} \tag{7}$$
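
As a quick numerical check of the two special cases discussed in this section, the sketch below evaluates a generalized Gaussian density and compares it with the Gaussian ($\nu = 2$) and Laplacian ($\nu = 1$) densities. It assumes the common unit-area parameterization with scale factor $\eta(\nu, \sigma) = \sigma^{-1}\sqrt{\Gamma(3/\nu)/\Gamma(1/\nu)}$. The function names and the test points are illustrative choices, not taken from the paper.

```python
import math

def ggd_pdf(x, sigma, nu):
    """Generalized Gaussian density with standard deviation sigma and shape nu (assumed form)."""
    eta = math.sqrt(math.gamma(3.0 / nu) / math.gamma(1.0 / nu)) / sigma
    return nu * eta / (2.0 * math.gamma(1.0 / nu)) * math.exp(-((eta * abs(x)) ** nu))

def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def laplacian_pdf(x, sigma):
    """Zero-mean Laplacian density; lam is chosen so the standard deviation is sigma."""
    lam = math.sqrt(2.0) / sigma
    return 0.5 * lam * math.exp(-lam * abs(x))

sigma = 10.0
points = [0.0, 0.5, 3.0, 10.0, 25.0]

# Largest absolute difference between the generalized form and each special case.
gauss_gap = max(abs(ggd_pdf(x, sigma, 2.0) - gaussian_pdf(x, sigma)) for x in points)
laplace_gap = max(abs(ggd_pdf(x, sigma, 1.0) - laplacian_pdf(x, sigma)) for x in points)
print(gauss_gap, laplace_gap)  # both should be at floating-point noise level
```

Intermediate shapes such as $\nu = 1.5$ can be evaluated with the same `ggd_pdf` helper.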

First we consider a uniform quantizer. For a given quantization step size $q$, by definition, the corresponding distortion is

$$D(q) = \sum_{k=-\infty}^{\infty} \int_{(k-1/2)q}^{(k+1/2)q} |x - kq|\; p(x)\, dx \tag{8}$$

With (6), from Appendix A we have

$$D(q) = \frac{1}{\lambda} \cdot \frac{1 - e^{-\lambda q/2}}{1 + e^{-\lambda q/2}} \tag{9}$$

Note that the percentage of zeros is given by

$$\rho = \int_{-q/2}^{q/2} p(x)\, dx = 1 - e^{-\lambda q/2} \tag{10}$$

After changing the independent variable from $q$ to $\rho$, (9) becomes

$$D(\rho) = \frac{1}{\lambda} \cdot \frac{\rho}{2 - \rho} \tag{11}$$

With (7) and (11), we have

$$R(\rho) = \log_2 \frac{2 - \rho}{\rho} \tag{12}$$

which is the rate function in the $\rho$ domain. A Taylor expansion of (12) yields

$$R(\rho) = 2 \log_2 e \cdot (1 - \rho) + O\!\left[(1 - \rho)^3\right] \tag{13}$$

In our extensive simulations we observe that, in transform coding of images and videos at low bit rates, the corresponding percentage of zeros is mostly larger than 70%. To show this, in Table I we list the average percentage of zeros among the quantized DCT coefficients for a wide range of coding bit rates. The four test videos are “Carphone,” “Akiyo,” “Foreman,” and “Football” in QCIF format coded at 15 frames per second (fps). The last row of the table lists the corresponding peak signal-to-noise ratio (PSNR) for 960 kbps. We can see that these bit rates and PSNR values are much higher than those required in practical video coding applications. Even for 960 kbps, the average value of $\rho$ is still above 70%. Therefore, for practical purposes, we assume $\rho$ is normally larger than 70%. In other words, $1 - \rho$ is less than 0.3, which implies that the nonlinear term in (13) is less than 0.027, which is negligible when compared with the linear term. Therefore, theoretically, $R(\rho)$ is an approximately linear function.

TABLE I
AVERAGE PERCENTAGE OF ZEROS FOR A WIDE RANGE OF CODING BIT RATES

The above mathematical formulation is for the uniform quantizer with a dead zone threshold of $0.5q$. In image and video coding, a uniform threshold quantizer with a larger dead zone is often used to produce more quantized zeros in order to reduce the coding bit rate. Suppose the dead zone threshold $\Delta$ is set by some positive constant. The corresponding quantization distortion is given by

(14)

In this case, the percentage of zeros is given by

(15)

With (14), (15), and (7), the expression of $R(\rho)$ becomes

(16)

where

(17)

We plot $R(\rho)$ for several settings of this constant, including 0.5 and 0.75, in Fig. 7. It can be seen that the plots are all very close to being straight lines. This justifies the linear rate model given by (1) for the Laplacian source.

Fig. 7. Plots of the function $R(\rho)$ given by (16) for different dead zone threshold values.

B. Gaussian Source

Next we consider the Gaussian source, which is another special case of the generalized Gaussian source. For the Gaussian distribution we need to employ the square error distortion

$$d(x, \hat{x}) = (x - \hat{x})^2 \tag{18}$$

According to Shannon’s source coding theorem [23], if a square distortion given by (18) is allowed, the minimum number of
bits needed to represent a symbol from a Gaussian source is given by

$$R(D) = \frac{1}{2} \log_2 \frac{\sigma^2}{D} \tag{19}$$

For a uniform threshold quantizer with a step size $q$ and dead zone $\Delta$, by definition, the corresponding distortion is given by

$$D(q) = \int_{-\infty}^{\infty} (x - \hat{x})^2\, p(x)\, dx \tag{20}$$

It might be very difficult to derive an explicit expression of $R(\rho)$ in the same way as for the Laplacian distribution. Instead, we evaluate $R(\rho)$ numerically and plot it for different dead zone thresholds in Fig. 8. It can be seen that the rate function in the $\rho$ domain for a Gaussian source is also very close to being a linear function.

Fig. 8. Plots of the function $R(\rho)$ for a Gaussian source at different dead zone threshold values.

C. Generalized Gaussian Distribution

Laplacian and Gaussian sources are two special cases of the generalized Gaussian source. For these two types of sources coupled with appropriate distortion measures, based on the source coding theorem, we have the explicit expressions for their $R$-$D$ functions as given in (7) and (19). However, for a generalized Gaussian distribution, due to the complex nature of the distribution, it is difficult to obtain an explicit expression for the $R$-$D$ function. Instead, we can obtain the lower and upper bounds of its $R$-$D$ function [24]. To be more specific, for a generalized Gaussian source $X$ with zero mean and variance $\sigma^2$, we have

$$h(X) - \frac{1}{2} \log_2 (2 \pi e D) \;\le\; R(D) \;\le\; \frac{1}{2} \log_2 \frac{\sigma^2}{D} \tag{21}$$

Here $h(X)$ is the differential entropy of $X$ given by

$$h(X) = -\int_{-\infty}^{\infty} p(x) \log_2 p(x)\, dx \tag{22}$$

The parameter $D$ in (21) is the square error distortion. Following the same procedure as for the Gaussian source, we numerically evaluate the lower and upper bounds of the rate function in the $\rho$ domain and plot them in Fig. 9. Here the source is set to $\sigma = 10$ and the distribution control parameter is set to $\nu = 1.5$. It can be seen that the lower and the upper bounds of the rate function are very close to each other. In addition, both are approximately linear functions. Since the actual rate function should lie between them, it should therefore also be approximately a linear function. This justifies the linear rate model given in (1) for a generalized Gaussian source.

Fig. 9. Plots of the lower and upper bounds of the $\rho$-domain rate function $R(\rho)$ for a generalized Gaussian source with $\sigma = 10$ and $\nu = 1.5$.

Fig. 10. The slope $\theta$ of each sample image.

V. PHYSICAL UNDERSTANDING OF THE SOURCE MODEL

The only parameter of the proposed source model is the slope $\theta$. Obviously, $\theta$ is related to some characteristic of the input source data, and this characteristic has a deterministic effect on the coding bit rate. In this section, we try to find out what the characteristic is and what its physical meaning is. In Fig. 10, we plot the slope $\theta$ for each sample image listed in Fig. 2. It can be seen that the variation of $\theta$ is very large. We sort all of these sample images by the value of $\theta$. In Fig. 11, the sorted images are listed in a raster scan order with $\theta$ increasing from the smallest to the largest. It can be seen that the images in the first half have
more high-frequency texture than those in the second half. The images in the second half are smoother and more structured. This suggests that the value of $\theta$ is closely related to the amount of texture present in the corresponding image.

Fig. 11. Sample images sorted by $\theta$ and listed in the raster scan order.

In the frequency domain, an image with more texture normally has a relatively larger amount of middle or high-frequency components [18]. In other words, the energy is more distributed to the middle or high-frequency subbands. For smoother and more structured images with less texture, the energy is more concentrated in the lower frequency subbands. Let us consider the wavelet transform as an example. After a five-level dyadic subband decomposition, there are 16 subbands in total, denoted by $\{S_i \mid 1 \le i \le 16\}$. Let $\sigma_i^2$ be the variance of the wavelet coefficients in $S_i$. Let $A$ and $G$ be the arithmetic mean and geometric mean of $\{\sigma_i^2\}$, respectively. We define the energy compaction measure as

$$\gamma = \frac{A}{G} \tag{23}$$

Obviously, larger $\gamma$ corresponds to more compacted energy and fewer texture components. Actually, $\gamma$ is often used as a feature variable for texture analysis. In Fig. 12, we plot the pair $(\gamma, \theta)$ for each sample image listed in Fig. 2. It can be seen that there is a strong linear correlation between $\gamma$ and $\theta$. The correlation coefficient between them is 0.845, which is very high. This strong correlation explains the physical meaning of $\theta$, which is the only parameter of the proposed source model.

Fig. 12. The linear correlation between the coding gain (energy compaction measure) and the slope $\theta$.

VI. UNIFIED RATE CONTROL FOR VIDEO CODING

In Sections II–V, we have developed a linear source model in the $\rho$ domain. The only parameter of the source model is $\theta$. To estimate the rate curve, first we have to accurately estimate the value of $\theta$. Obviously, the physical meaning of $\theta$ provides a way to estimate $\theta$ from the distribution of the subband energy. However, Fig. 10 tells us the estimation is not very accurate. In this section, we propose an adaptive estimation algorithm for $\theta$ in video coding. Based on the linear source model in (1) and the adaptive estimation, a unified rate control algorithm is then proposed for MPEG-2, H.263, and MPEG-4 video coding.

A. Adaptive Estimation of $\theta$

Let $N_m$ be the number of the coded MBs in the current frame. Note that in a 16 × 16 MB, there are a total of 384 luminance and chrominance coefficients. Let $R_m$ be the number of bits used to encode these $N_m$ MBs. We denote the number of zeros in these MBs by $\rho_m$. Based on (1), $\theta$ can be estimated as follows:

$$\theta = \frac{R_m}{384\,N_m - \rho_m} \tag{24}$$

The estimated $\theta$ is then applied to the current MB. We can see that the estimated value of $\theta$ is an accumulative statistic of the coded MBs. In Fig. 13(a), we plot the estimated value of $\theta$ at each MB for Frame 80 in the “Carphone” QCIF video sequence. It can be seen that, as more and more MBs are encoded, the
estimated value of $\theta$ converges to its true value for the current video frame. Let $\sigma_\theta$ be the standard deviation of the estimated value of $\theta$ at each MB. In Fig. 13(b), we plot $\sigma_\theta$ for each coded frame in “Carphone.” It can be seen that $\sigma_\theta$ is mostly small: the relative estimation error is below 5%.

Fig. 13. Analysis of the MB-level adaptive estimation of $\theta$. (a) Estimated value of $\theta$ at each MB. (b) Standard deviation of estimated $\theta$ for each frame.

B. Rate Control Algorithm

With the adaptive estimation of $\theta$ and the proposed linear source model, the rate control for video coding turns out to be very simple and straightforward. Let the target bit rate (in bits) per frame be $R_T$. Let the encoder buffer size be $B_T$ and the number of bits in the buffer be $B_0$. The available bits for coding the current frame is

$$R_t = R_T - B_0 + \alpha\, B_T \tag{25}$$

where the target buffer level $\alpha$ is by default set to 0.2. Let $N$ be the number of MBs in a video frame. For QCIF videos, $N$ is 99. The quantization parameter is determined by the following steps:

Step 1: Initialization. Before encoding the first MB, set $N_m = \rho_m = R_m = 0$. Compute the distributions $D_0(x)$ and $D_1(x)$ for the DCT coefficients in the intra and inter MBs, respectively. Set $\theta$ to its average value for typical video sequences.

Step 2: Determine the quantization parameter $q$. According to (1), the number of zeros to be produced by quantizing the remaining MBs should be

$$\rho = 384\,(N - N_m) - \frac{R_t - R_m}{\theta} \tag{26}$$

Based on the one-to-one mapping between $\rho$ and $q$, the quantization parameter $q$ is determined. The current MB is quantized with $q$ and encoded.

Step 3: Update. Let $\rho_0$ and $R_0$ be the number of zeros and number of bits produced by the current MB, respectively. Set $\rho_m = \rho_m + \rho_0$, $R_m = R_m + R_0$, and $N_m = N_m + 1$. If the update condition is satisfied, update the value of $\theta$ according to (24). At the same time, subtract the frequencies of the DCT coefficients in the current MB from $D_0(x)$ if it is an intra MB or from $D_1(x)$ if it is an inter MB.

Step 4: Loop. Repeat Steps 2 and 3 for the next MB until all the MBs in the current frame are encoded.

This rate control algorithm has very low computational complexity and implementation cost, involving only several additions and a few simple multiplications. It can be seen that the proposed rate control algorithm always divides the video frame into two groups, coded and uncoded MBs, and balances the bit budget between these two groups. This type of rate control mechanism turns out to be very accurate and robust. It should also be noted that, in the proposed algorithm, the rate control of the current frame does not use any information or statistics from its previous frame. Therefore, it does not suffer from performance degradation at scene changes.

VII. EXPERIMENTAL RESULTS

The proposed rate control algorithm has been implemented in the H.263, MPEG-2, and MPEG-4 video coders. In the following experiments, we compare the proposed rate control algorithm with the rate control algorithms developed in TM5 [25] of MPEG-2, TMN8 [10] of H.263+, and VM8 [26] of MPEG-4.

TABLE II
RATE CONTROL RESULTS FOR MPEG-2. $\rho$-RC REPRESENTS THE PROPOSED $\rho$-DOMAIN RATE CONTROL ALGORITHM

A. Rate Control in MPEG-2

In MPEG-2, the video sequence is coded by the unit of GOP. Each GOP consists of at least one intracoded picture (I-frame) and some intercoded pictures (P- and B-frames). We employ the TM5 bit allocation scheme [25] to determine the number of bits assigned to each frame in the current GOP. For each frame, the proposed rate control algorithm is employed to achieve the target bit rate. The test videos are “Foreman,” “Salesman,” “Tabletennis,” and “Coastguard.” The target bit rate for each test is shown in Table II. To measure rate control performance, we define the relative control error as

$$E = \frac{|R_{\mathrm{actual}} - R_{\mathrm{target}}|}{R_{\mathrm{target}}} \times 100\% \tag{27}$$

where $R_{\mathrm{actual}}$ and $R_{\mathrm{target}}$ are the actual and target coding bits of each video frame. We plot $E$ of both rate control algorithms for “Foreman” and “Tabletennis” in Figs. 14 and 15, respectively. It can be seen that the proposed algorithm yields a much smaller control error, which is mostly less than 2%. The PSNR of each frame in “Foreman” and “Tabletennis” is plotted in Figs. 16 and 17, respectively. It can be seen that with the proposed rate control algorithm the picture quality is significantly improved.
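
The MB-level procedure of Steps 1 to 4 in Section VI-B, together with the relative control error used in this section, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every MB is treated as inter-coded with a coefficient zeroed when its magnitude is below $2.5q$, $\theta$ is estimated as coded bits divided by coded nonzero coefficients, $\theta$ starts at 7.0 and is re-estimated only after 10 MBs, and the real encoder is replaced by a toy function whose bit count follows the linear model exactly. All function names, the synthetic Laplacian coefficients, and the 20000-bit frame budget are hypothetical.

```python
import numpy as np

COEFFS_PER_MB = 384   # luminance + chrominance coefficients in one 16x16 MB
DEAD_ZONE = 2.5       # assumption: every MB is inter-coded, zero when |x| < 2.5 * q

def count_zeros(abs_hist, q):
    """Coefficients in abs_hist (indexed by magnitude) that quantize to zero at q."""
    magnitudes = np.arange(len(abs_hist))
    return int(abs_hist[magnitudes < DEAD_ZONE * q].sum())

def pick_q(abs_hist, target_zeros, q_values=range(1, 32)):
    """Smallest q whose zero count over the uncoded MBs reaches target_zeros."""
    for q in q_values:
        if count_zeros(abs_hist, q) >= target_zeros:
            return q
    return q_values[-1]

def rate_control_frame(mbs, frame_bits, encode_mb, theta=7.0, warmup=10):
    """MB-level loop; theta=7.0 and warmup=10 are illustrative defaults."""
    top = max(int(np.abs(mb).max()) for mb in mbs)
    hist = np.zeros(top + 1, dtype=np.int64)
    for mb in mbs:  # Step 1: histogram of all (still uncoded) coefficients
        hist += np.bincount(np.abs(mb), minlength=top + 1)
    n_coded, zeros_coded, bits_coded, qs = 0, 0, 0.0, []
    for mb in mbs:
        # Step 2: zeros the uncoded MBs must produce so theta * nonzeros fits the budget
        uncoded_coeffs = COEFFS_PER_MB * (len(mbs) - n_coded)
        target_zeros = uncoded_coeffs - (frame_bits - bits_coded) / theta
        q = pick_q(hist, target_zeros)
        bits, zeros = encode_mb(mb, q)
        # Step 3: update totals, re-estimate theta, remove this MB from the histogram
        n_coded, zeros_coded, bits_coded = n_coded + 1, zeros_coded + zeros, bits_coded + bits
        nonzeros = COEFFS_PER_MB * n_coded - zeros_coded
        if n_coded > warmup and nonzeros > 0:
            theta = bits_coded / nonzeros
        hist -= np.bincount(np.abs(mb), minlength=top + 1)
        qs.append(q)  # Step 4: continue with the next MB
    return bits_coded, qs

def toy_encode_mb(mb, q, true_theta=6.0):
    """Stand-in encoder whose bit count follows the linear model exactly."""
    zeros = int((np.abs(mb) < DEAD_ZONE * q).sum())
    return true_theta * (mb.size - zeros), zeros

rng = np.random.default_rng(0)
mbs = [np.rint(rng.laplace(scale=12.0, size=COEFFS_PER_MB)).astype(np.int64)
       for _ in range(99)]  # 99 MBs, as in a QCIF frame
target = 20000.0
total, qs = rate_control_frame(mbs, target, toy_encode_mb)
error_pct = abs(total - target) / target * 100.0  # relative control error in percent
print(total, error_pct, min(qs), max(qs))
```

Because the histogram always describes only the uncoded MBs, each decision balances the remaining budget against the remaining coefficients. With a real encoder in place of `toy_encode_mb`, the bit counts would deviate from the linear model and the re-estimated $\theta$ would absorb part of that deviation.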
The picture quality improvement for the other two test videos is listed in Table II. An average PSNR gain of 0.87 dB is achieved. For some video frames, the gain is as high as 2 dB. The significant picture quality improvement is due to our accurate source model and robust rate control.

Fig. 14. Relative bit rate control error in percentage for each frame in “Foreman” when the proposed rate control algorithm (solid–dotted line) and the TM5 algorithm are applied to the MPEG-2 coding.

Fig. 15. Relative bit rate control error in percentage for each frame in “Tabletennis” when the proposed rate control algorithm (solid–dotted line) and the TM5 algorithm are applied to the MPEG-2 coding.

Fig. 16. PSNR of each frame in “Foreman” when the proposed rate control algorithm (solid–dotted line) and the TM5 algorithm are applied to the MPEG-2 coding.

Fig. 17. PSNR of each frame in “Tabletennis” when the proposed rate control algorithm (solid–dotted line) and the TM5 algorithm are applied to the MPEG-2 coding.

TABLE III
DESCRIPTION OF THE RATE CONTROL TESTS WITH THE H.263 CODEC

B. Rate Control in H.263

In real-time video coding with H.263, the time delay should be very small, which imposes strict requirements on the rate control process. In the following experiment, we compare the proposed rate control algorithm with the TMN8 algorithm [10], which is one of the best available rate control algorithms for video coding. The configuration of each test is shown in Table III. The frame rate is fixed at 10 fps. We plot the number of bits in the buffer for each test in Fig. 18. The proposed rate control algorithm maintains a much steadier buffer level than TMN8. The number of bits produced by each frame is plotted in Fig. 19. It can be seen that with the proposed algorithm the output bit rate of the video encoder is well matched to the target bit rate or the channel bandwidth. The average PSNR of the luminance components in each test is listed in Table III. The proposed algorithm achieves a slightly improved picture quality due to its more robust rate control and fewer skipped frames.

C. Rate Control in MPEG-4

We use the MoMuSys MPEG-4 codec [27] with the H.263-type quantization scheme to test the proposed algorithm and the rate control algorithm in VM8. The two test videos are “Carphone” and “News.” We treat the whole scene as one video object. The frame rate is 10 fps. The target bit rate is 64 kbps. The number of bits produced by each frame in “Carphone” and “News” is plotted in Figs. 20 and 21, respectively. It can
be seen that with the proposed algorithm, an almost constant bit rate is achieved. The relative control error is less than 1%. However, with the rate control of VM8, the bit rate variation is very large. On average, PSNR improvements of 0.5 dB and 0.7 dB are achieved by the proposed rate control algorithm for “Carphone” and “News,” respectively. The proposed rate control algorithm has also been tested on other video sequences at different coding bit rates and yields similar results. These simulation results, along with the results presented above, show that the proposed algorithm provides much more robust and accurate rate control than other algorithms reported in the literature.

Fig. 18. Number of bits in the buffer when the proposed algorithm (solid line) and the TMN8 rate control algorithm (dotted line) are applied to the H.263 video coding.

Fig. 19. Number of bits produced by each encoded frame when the proposed algorithm (solid line) and the TMN8 rate control algorithm (dotted line) are applied to the H.263 video coding.

Fig. 20. Number of bits produced by each encoded frame in “Carphone” when the proposed algorithm (solid line) and the VM8 rate control algorithm (dotted line) are applied to the MPEG-4 coding.

Fig. 21. Number of bits produced by each encoded frame in “News” when the proposed algorithm (solid line) and the VM8 rate control algorithm (dotted line) are applied to the MPEG-4 coding.

VIII. CONCLUDING REMARKS

There are two major contributions in this work. First, we have proposed a novel framework for source modeling by studying the rate function in the ρ domain instead of the traditional q domain. A unified linear source model has been developed for typical transform video coding systems. The proposed source model is very simple but very accurate. Based on Shannon’s source coding theorem, we have also developed a theoretical justification for this linear source model. The physical meaning of the model parameter is discussed. Second, a unified rate control algorithm is proposed for various standard video coding systems, which outperforms other algorithms reported in the literature by providing more accurate and robust rate control. The algorithm proposed in this work is conceptually simple. It has very low computational complexity and implementation cost.

APPENDIX

A detailed derivation of (9) is given in this appendix. Note that

(28)

and

(29)

From (8), we have

(30)

(31)

(32)

(33)

(34)

(35)

REFERENCES

[1] G. K. Wallace, “The JPEG still picture compression standard,” Commun. ACM, vol. 34, pp. 30–44, Apr. 1991.
[2] D. LeGall, “MPEG: A video compression standard for multimedia application,” Commun. ACM, vol. 34, pp. 46–58, Apr. 1991.
[3] H. Gish and J. N. Pierce, “Asymptotically efficient quantizing,” IEEE Trans. Inform. Theory, vol. IT-14, pp. 676–683, Sept. 1968.
[4] T. Berger, Rate Distortion Theory. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[5] H.-M. Hang and J.-J. Chen, “Source model for transform video coder and its application—Part I: Fundamental theory,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, pp. 287–298, Apr. 1997.
[6] A. Said and W. A. Pearlman, “A new fast and efficient image codec based on set partitioning in hierarchical trees,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 243–250, June 1996.
[7] W. Ding and B. Liu, “Rate control of MPEG video coding and recording by rate-quantization modeling,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 12–20, Feb. 1996.
[8] T. Chiang and Y.-Q. Zhang, “A new rate control scheme using quadratic rate distortion model,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, pp. 246–250, Feb. 1997.
[9] H.-J. Lee, T. Chiang, and Y.-Q. Zhang, “Scalable rate control for MPEG-4 video,” IEEE Trans. Circuits Syst. Video Technol., vol. 10, pp. 878–894, Sept. 2000.
[10] J. Ribas-Corbera and S. Lei, “Rate control in DCT video coding for low-delay communications,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, pp. 172–185, Feb. 1999.
[11] J. Ribas-Corbera and S. Lei, “Contribution to the rate control Q2 experiment: A quantizer control tool for achieving target bitrates accurately,” in Coding of Moving Pictures and Associated Audio MPEG96/M1812 ISO/IEC JTC/SC29/WG11, Sevilla, Spain, Feb. 1997.
[12] “Video Coding for Low Bit Rate Communications,” in ITU-T Recommendation H.263, version 1: ITU-T, 1995.
[13] G. Cote, B. Erol, M. Gallant, and F. Kossentini, “H.263+: Video coding at low bit rates,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, pp. 849–866, Nov. 1998.
[14] “ITU-T/SG-15, video codec test model, TMN5,” in Telenor Research: Telenor codec, 1995.
[15] T. Sikora, “The MPEG-4 video standard verification model,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, pp. 19–31, Feb. 1997.
[16] J. M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Trans. Signal Processing, vol. 41, pp. 3445–3462, Dec. 1993.
[17] M. J. Tsai, J. D. Villasenor, and F. Chen, “Stack-run image coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 519–521, Oct. 1996.
[18] T. Chang and C.-C. J. Kuo, “Texture analysis and classification with tree-structured wavelet transform,” IEEE Trans. Image Processing, vol. 2, pp. 432–435, Oct. 1993.
[19] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, “Image coding using wavelet transform,” IEEE Trans. Image Processing, vol. 1, pp. 205–220, Apr. 1992.
[20] R. W. Buccigrossi and E. P. Simoncelli, “Image compression via joint statistical characterization in the wavelet domain,” IEEE Trans. Image Processing, vol. 8, pp. 1688–1701, Dec. 1999.
[21] S. M. LoPresto, K. Ramchandran, and M. T. Orchard, “Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework,” in Proc. Data Compression Conf., Snowbird, UT, Mar. 1997, pp. 221–230.
[22] E. Y. Lam and J. W. Goodman, “A mathematical analysis of the DCT coefficient distributions for images,” IEEE Trans. Image Processing, vol. 9, pp. 1661–1666, Oct. 2000.
[23] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding. New York: McGraw-Hill, 1979.
[24] T. G. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[25] “MPEG-2 video test model 5,” in ISO/IEC JTC1/SC29/WG11 MPEG93/457, 1993.
[26] “Text of ISO/IEC 14 496-2 MPEG4 video VM—Version 8.0,” in ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Associated Audio MPEG 97/W1796. Stockholm, Sweden: Video Group, 1997.
[27] “MPEG4 verification model version 7.0,” in ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Associated Audio MPEG97. Bristol, U.K.: MoMuSys codec, 1997.

Zhihai He (M’01) received the B.S. degree from Beijing Normal University, Beijing, China, and the M.S. degree from Institute of Computational Mathematics, Chinese Academy of Sciences, Beijing, China, in 1994 and 1997 respectively, both in mathematics, and the Ph.D. degree from University of California, Santa Barbara, CA, in 2001, in electrical engineering.
In 2001, he joined Sarnoff Corporation, Princeton, NJ, as a Member of Technical Staff. His current research interests include image compression, video coding, network transmission, wireless communication, and embedded system design.
Dr. He received the 2001 IEEE Circuits and Systems Society CSVT Transactions Best Paper Award.

Sanjit K. Mitra (S’59–M’63–SM’69–F’74–LF’01) received the B.Sc. (Hons.) degree in physics in 1953 from Utkal University, Cuttack, India, the M.Sc. (Tech.) degree in radio physics and electronics in 1956 from Calcutta University, Calcutta, India, the M.S. and Ph.D. degrees in electrical engineering from the University of California, Berkeley, in 1960 and 1962, respectively, and an Honorary Doctorate of Technology degree from the Tampere University of Technology, Tampere, Finland, in 1987.
From June 1962 to June 1965, he was with Cornell University, Ithaca, NY, as an Assistant Professor of Electrical Engineering. He was with the AT&T Bell Laboratories, Holmdel, NJ, from June 1965 to January 1967. He has been on the faculty at the University of California since then, first at the Davis campus and then at the Santa Barbara campus since 1977, as a Professor of Electrical and Computer Engineering, where he served as Chairman of the Department from July 1979 to June 1982. He has published over 550 papers in signal and image processing, 12 books, and holds five patents.
Dr. Mitra served as the President of the IEEE Circuits and Systems (CAS) Society in 1986 and as a Member-at-Large of the Board of Governors of the IEEE Signal Processing (SP) Society during 1996–1999. He is currently a member of the editorial boards of four journals. He is the recipient of the 1973 F.E. Terman Award and the 1985 AT&T Foundation Award of the American Society of Engineering Education, the 1989 Education Award, and the 2000 Mac Van Valkenburg Society Award of the IEEE CAS Society, the Distinguished Senior U.S. Scientist Award from the Alexander von Humboldt Foundation of Germany in 1989, the 1996 Technical Achievement Award, and the 2001 Society Award of the IEEE SP Society, the IEEE Millennium Medal in 2000, and the McGraw-Hill/Jacob Millman Award of the IEEE Education Society in 2001. He is the co-recipient of the 2000 Blumlein–Browne–Willans Premium of the Institution of Electrical Engineers (London), the 2001 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY Best Paper Award, and the 2002 Technical Achievement Award of the European Association for Signal Processing (EURASIP). He is an Academician of the Academy of Finland and a Corresponding Member of the Croatian Academy of Sciences and Arts. He is a Fellow of the AAAS, and SPIE, and a member of EURASIP and ASEE.
