P1-11

A Fast Mode Decision Algorithm for Downscaled Transcoding of H.264 Preencoded Video
Matthias von dem Knesebeck, Student Member, IEEE, Panos Nasiopoulos, Member, IEEE
University of British Columbia, Vancouver, Canada
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the British Columbia Innovation Council.

Abstract-- In this paper, we present a fast mode decision method for transcoding pre-encoded H.264 video to a lower resolution. The algorithm reduces the computational load of the mode decision process during transcoding by adaptively considering information from the residual of the pre-encoded video and some intermediate results obtained in the re-encoding process. Compared with the best existing downscaling transcoding method, the area-weighted vector median (AWMVM), a computational saving of about 37% (measured in pixel comparisons per macroblock) has been achieved.

I. INTRODUCTION

With the continuous evolution of versatile mobile devices, mobile video applications are rapidly gaining popularity. At the same time, mobile devices suffer from innate constraints such as limited processing power, energy supply, storage space and transmission bandwidth. Due to these constraints, it is highly desirable to provide the displayed content in a format that is optimized for the target device. However, existing content has often been pre-encoded with a resolution or bitrate exceeding these constraints, aiming at a different market, e.g., home entertainment.

Transcoding converts a bitstream from one format to another, in the present case from a high resolution to a lower resolution. The computationally expensive cascaded transcoding scheme decodes the original pre-encoded stream, downsizes the obtained frames using, for instance, a bilinear filter, and subsequently re-encodes these downsized frames with an H.264 encoder. This process can consume a considerable amount of time and computational resources. Cascaded transcoding does not take advantage of information already present in the pre-encoded stream about compression characteristics such as motion vectors, modes or residual values. Hence, various methods have been proposed that reuse this information in order to speed up the re-encoding process.

Motion search and mode decision account for up to 90% of the computational effort in video encoding. To address the cost of motion search, a number of algorithms have been proposed that re-estimate a suitable (initial) motion vector from the pre-encoded stream and hence reduce the burden of motion search. These include the simple average (SA), the activity-weighted average [1, 2], the area-weighted average (AWA) and the area-weighted vector median (AWMVM) [3]. The motion vectors obtained from these schemes are close approximations to the optimal motion vector. With them as a starting point, an additional 1-pixel motion search with a subsequent quarter-pixel refinement is still required to obtain high compression efficiency. AWMVM gives the best results among the above methods and is used as the basis for our proposed method.

H.264 defines seven block sizes for motion-compensated prediction in temporally predicted (INTER) frames (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4), as well as a SKIP mode, which does not encode a residual signal. For a downscaling ratio of 2:1, four 16x16 macroblocks of the original pre-encoded stream are reduced to one 16x16 macroblock in the transcoded stream. Each macroblock can itself consist of a number of sub-partitions.

To identify the most efficient choice among these block sizes, an H.264 encoder minimizes a Lagrangian cost function for every coding mode:

$$J = D + \lambda R \qquad (1)$$

where D denotes the distortion between the predicted and the original macroblock, R represents the number of bits used to encode the motion information, and λ is the Lagrangian multiplier, which itself depends on the chosen quantization parameter. The coding mode resulting in the lowest cost J is finally selected for encoding.

To reduce the computational burden of mode decision caused by evaluating all possible coding modes, a number of methods have been proposed for encoding from raw video, and these have been applied to transcoding [3-6]. However, improved performance may be achieved when additional information from the pre-encoded video stream is taken into account, as done in the algorithm presented below.

II. PROPOSED ALGORITHM

The goal is to find a measure that can be obtained with little computational effort and that allows an early decision about the optimal mode for the current macroblock.

After transcoding a number of test sequences with the cascaded scheme, we analyzed the distribution of block sizes within every frame of the resulting downsized video. We found a correlation between the distribution of large, medium and small partition sizes within every 16x16 macroblock and the sum of absolute residual values (SAR) found within the areas of the pre-encoded stream that correspond to these macroblocks (for a 2:1 downsizing ratio, each corresponding area contains 32x32 pixels). The correlation indicates that small partition sizes occur in the downscaled stream where the absolute residual values in the pre-encoded video are large, and vice versa. However, this does not hold in all cases.
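The SAR measure described above requires only a summation over the residual that the decoder already produces. The sketch below is a minimal illustration, assuming that the decoded pixel-domain luma residual of each pre-encoded frame is available as a list of rows of integers; the text does not state whether chroma is included or at which stage the residual is taken, and the helper names `sar_for_downscaled_mb` and `sar_map` are hypothetical.

```python
from typing import List

# Assumption: decoded pixel-domain luma residual of one pre-encoded frame, row-major.
Residual = List[List[int]]


def sar_for_downscaled_mb(residual: Residual, mb_x: int, mb_y: int,
                          mb_size: int = 16, ratio: int = 2) -> int:
    """Sum of absolute residual values (SAR) over the pre-encoded area that
    maps onto macroblock (mb_x, mb_y) of the downscaled frame.

    With ratio=2 and mb_size=16 the area is 32x32 pixels, i.e. the four
    pre-encoded 16x16 macroblocks that collapse into one output macroblock.
    """
    area = mb_size * ratio
    x0, y0 = mb_x * area, mb_y * area
    return sum(abs(residual[y][x])
               for y in range(y0, y0 + area)
               for x in range(x0, x0 + area))


def sar_map(residual: Residual, mb_size: int = 16, ratio: int = 2) -> List[List[int]]:
    """SAR for every macroblock of the downscaled frame (rows x columns)."""
    area = mb_size * ratio
    rows = len(residual) // area
    cols = len(residual[0]) // area
    return [[sar_for_downscaled_mb(residual, x, y, mb_size, ratio)
             for x in range(cols)]
            for y in range(rows)]
```

For a CIF frame (352x288) transcoded to QCIF, `sar_map` returns a 9x11 grid, one value per QCIF macroblock. The explicit loops are kept for clarity; a practical transcoder could instead accumulate the sum while the residual is being decoded.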

978-1-4244-4316-1/10/$25.00 ©2010 IEEE
Authorized licensed use limited to: East China Normal University. Downloaded on March 23,2024 at 07:08:51 UTC from IEEE Xplore. Restrictions apply.
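The mode-set assignment that the remainder of this section develops (the combined measure of Eq. (2) followed by the frame-adaptive thresholds of Table 1) can be summarized in a short sketch. It assumes that SAR_i and the 16x16 cost J_i from the 1-pixel search around the AWMVM vector have already been computed for all N macroblocks of a frame. Three details are not specified in the text and are assumptions here: σ is taken as the population standard deviation, values lying exactly on a threshold fall into the middle class, and SKIP is not listed because Table 1 does not mention it. The function and constant names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

# Candidate partition sizes per class, following Table 1.
# Assumption: SKIP handling is not stated in Table 1, so it is omitted here.
ALL_MODES = ("16x16", "16x8", "8x16", "8x8", "8x4", "4x8", "4x4")
MEDIUM_MODES = ("16x16", "16x8", "8x16")
LARGE_ONLY = ("16x16",)


def combined_measure(sar: Sequence[float], j16: Sequence[float],
                     weight: float = 1.25) -> List[float]:
    """Eq. (2): M_i = 1.25 * SAR_i + J_i for every macroblock i of a frame."""
    if len(sar) != len(j16):
        raise ValueError("SAR and J must have one entry per macroblock")
    return [weight * s + j for s, j in zip(sar, j16)]


def assign_mode_sets(m: Sequence[float]) -> List[Tuple[str, ...]]:
    """Map each M_i to its set of valid modes using frame-adaptive thresholds."""
    mu = statistics.mean(m)
    sigma = statistics.pstdev(m)  # assumption: population standard deviation
    upper, lower = mu + 0.5 * sigma, mu - sigma
    sets = []
    for mi in m:
        if mi > upper:
            sets.append(ALL_MODES)      # large M_i: allow small partitions
        elif mi < lower:
            sets.append(LARGE_ONLY)     # small M_i: 16x16 only
        else:
            # assumption: values exactly on a threshold use the middle class
            sets.append(MEDIUM_MODES)
    return sets
```

`assign_mode_sets(combined_measure(sar, j16))` yields one tuple of candidate partition sizes per macroblock; the encoder would then evaluate only those modes with the Lagrangian cost of Eq. (1).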
When re-encoding the downscaled video using AWMVM as the starting point for the 1-pixel motion search, we found that a second measure, the distribution of the cost values J obtained from the 16x16 mode search within a frame, likewise exhibits a significant correlation with the distribution of the final block sizes chosen.

Interestingly, our analysis revealed that in the vast majority of cases in which the first measure failed to indicate the best block size, the second measure made a correct prediction, and vice versa. We concluded that a combination of these two measures might yield the desired predictor that fulfills both goal criteria: low computational effort and an early mode decision. The following linear combination of the two measures, obtained by running an optimization algorithm, minimizes the error between the predicted and the desired block size distribution and yields a very strong overall correlation:

$$M_i = 1.25 \times \mathrm{SAR}_i + J_i, \qquad i \in \{1, \dots, N\} \qquad (2)$$

where $M_i$ denotes the measure used to assign a set of valid modes to the 16x16 macroblock $i$ within a frame of $N$ macroblocks, $\mathrm{SAR}_i$ is the sum of absolute residual values in the corresponding area of the pre-encoded video, and $J_i$ is the cost value obtained from the 16x16 mode search.

With this measure at hand, we need to define the decision criteria that assign the appropriate set of modes to the current macroblock. While the distribution of M remains strongly correlated with the block-size distribution for varying motion content, the mean and the range of the values $M_i$ within a frame change with the content. We therefore base the criteria on the mean (µ) and the standard deviation (σ) of $M_i$ within the frame. A closer analysis showed that the adaptive thresholds in Table 1 are best suited for assigning a set of valid modes to every macroblock i.

TABLE 1: VALID MODES FOR EVERY MACROBLOCK

Threshold for M_i          Valid modes
M_i > µ+0.5σ               all modes
µ-σ < M_i < µ+0.5σ         16x16, 16x8, 8x16
M_i < µ-σ                  16x16 only

III. EXPERIMENTAL RESULTS

The proposed algorithm was implemented in the JM reference software 14.2 on a 3.0 GHz Pentium IV platform. Simulations were performed with the first 100 frames of three standard test sequences (Akiyo, Foreman, Mobile) using an IPPP GOP structure and one reference frame.

Table 2 shows the results of transcoding the three test sequences from CIF to QCIF resolution with the AWMVM method and with the proposed algorithm, listing PSNR, bitrate, encoding time, the number of search points visited per MB (SrchPts/MB) and the number of pixel comparisons performed per MB (CMP/MB). On average, the proposed method avoids 37.68% of the pixel comparisons and reduces encoding time by 22.10%, at the cost of a 0.04 dB decrease in picture quality and a 0.92% increase in bitrate. Fig. 1 illustrates the rate-distortion performance of the proposed algorithm and the AWMVM method for the QCIF resolution stream obtained from the Foreman sequence.

TABLE 2: EXPERIMENTAL RESULTS (CIF -> QCIF), QP 20

Sequence  Method    PSNR (dB)  Bitrate (kbps)  Enc. time (s)  SrchPts/MB  CMP/MB
Foreman   AWMVM     42.44      346.4           808.1          5,148       150,172
Foreman   Proposed  42.40      350.4           636.5          2,070       92,677
Foreman   ∆         -0.04      1.17%           -21.23%        -59.79%     -38.29%
Akiyo     AWMVM     44.49      77.4            577.2          4,395       125,418
Akiyo     Proposed  44.45      77.6            448.3          1,974       81,175
Akiyo     ∆         -0.04      0.30%           -22.34%        -55.09%     -35.28%
Mobile    AWMVM     41.40      1,109.8         647.2          5,209       140,641
Mobile    Proposed  41.35      1,124.1         500.1          2,184       85,124
Mobile    ∆         -0.05      1.28%           -22.74%        -58.07%     -39.47%
Average   ∆         -0.04      0.92%           -22.10%        -57.65%     -37.68%

[Plot: PSNR (dB) versus bitrate (kbps) for the proposed method and AWMVM]
Fig. 1: Foreman (CIF → QCIF)

IV. CONCLUSION

We presented an early-termination mode decision process for efficiently transcoding H.264 video content to half its resolution in each dimension. The method exploits the correlation between the residual of the pre-encoded stream, the final cost value of the 1-pixel full search around the AWMVM vector, and the set of optimal coding modes for a given macroblock. Our performance evaluations have shown that about 37% of the computations (pixel comparisons per macroblock) can be saved compared with the unmodified AWMVM method, while preserving picture quality (-0.04 dB) and incurring a 0.92% increase in bitrate.

REFERENCES

[1] B. Shen, I. K. Sethi and B. Vasudev, "Adaptive motion-vector resampling for compressed video downscaling," IEEE Trans. on Circuits and Systems for Video Technology, vol. 9, pp. 929-936, 1999.
[2] M. Chen, M. Chu and S. Lo, "Motion vector composition algorithm for spatial scalability in compressed video," IEEE Trans. on Consumer Electronics, vol. 47, pp. 319-325, 2001.
[3] Y. Tan and H. Sun, "Fast motion re-estimation for arbitrary downsizing video transcoding using H.264/AVC standard," IEEE Trans. on Consumer Electronics, vol. 50, pp. 887-894, 2004.
[4] C. Liu, W. Wei, N. Yang and C. Kuo, "Motion vector re-estimation for video transcoding with arbitrary downsizing," in Int. Conf. on Intelligent Information Hiding and Multimedia Signal Processing, pp. 806-809, 2008.
[5] H. Shen, X. Sun, F. Wu, H. Li and S. Li, "A fast downsizing video transcoder for H.264/AVC with RD optimal mode decision," in IEEE Int. Conf. on Multimedia and Expo, pp. 2017-2020, 2006.
[6] P. Zhang, Y. Lu, Q. Huang and W. Gao, "Mode mapping method for H.264/AVC spatial downscaling transcoding," in IEEE Int. Conf. on Image Processing, pp. 2781-2784, 2004.
