A High-Level Simulator For The H.264/AVC Decoding Process in Multi-Core Systems
Outline
Introduction
H.264, a new-generation video coding algorithm, is becoming increasingly important for international broadcasting standards such as DVB-H and DMB. H.264 achieves high compression efficiency at the cost of increased computational complexity, which is demanding for mobile devices (embedded processors).
Multi-core systems provide an elegant and power-efficient solution to overcome this performance limitation.
Introduction
Efficiently distributing the video algorithm among multiple processors is a non-trivial task.
It requires detailed knowledge about the algorithmic complexity and the inter-dependencies between functional blocks.
The objective of this paper is to investigate:
the dynamic behavior of the H.264 decoding process
the interaction between the main decoding tasks in multi-core environments
Figure 1. Dynamic variations in the execution times of individual macroblocks in the H.264 decoding process. Histograms are shown for six IPB-coded sequences of 100 frames with a Group of Pictures (GOP) size of 13.
Histogram bins plot the number of macroblocks having similar runtimes. The runtimes of macroblocks vary significantly within a sequence due to different image content. The overall runtime of the decoder therefore depends strongly on the content of the encoded video material.
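The binning behind such a histogram can be sketched in a few lines of Python. This is only an illustration: the function name `runtime_histogram`, the bin width, and the sample runtimes are made up here and are not taken from the paper's measurements.

```python
from collections import Counter

def runtime_histogram(runtimes, bin_width):
    """Count macroblocks per runtime bin; each key is the bin's lower edge."""
    counts = Counter((t // bin_width) * bin_width for t in runtimes)
    return dict(sorted(counts.items()))

# Hypothetical per-macroblock runtimes in cycles (illustrative values only).
sample_runtimes = [120, 135, 410, 95, 430, 128, 880, 101]
hist = runtime_histogram(sample_runtimes, bin_width=100)
print(hist)  # {0: 1, 100: 4, 400: 2, 800: 1}
```

A wide spread of occupied bins, as in this toy input, corresponds to the strong content dependence described for Figure 1.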
We present a high-level simulator for multi-core implementations of the H.264 decoder in this paper.
Figure 2. Concept of the simulator. (a) Profiling data. (b) Underlying hardware. (c) Simulation of a splitting that maps f1 and f2 to the first core and f3 to the second one.
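The splitting in Figure 2(c) can be illustrated with a minimal Python sketch. It assumes that the profiling data is available as one runtime per function and macroblock; the helper `core_loads`, the data layout, and the numbers are hypothetical and only mirror the idea of mapping f1 and f2 to the first core and f3 to the second one.

```python
def core_loads(profile, mapping, num_cores):
    """Sum the profiled time of each function on the core it is mapped to.

    profile: list of dicts, one per macroblock (function name -> runtime)
    mapping: dict (function name -> core index)
    Returns, per macroblock, a list with the time each core needs.
    """
    loads = []
    for mb in profile:
        per_core = [0] * num_cores
        for func, runtime in mb.items():
            per_core[mapping[func]] += runtime
        loads.append(per_core)
    return loads

# Hypothetical profiling data for two macroblocks (illustrative values only).
profile = [{"f1": 30, "f2": 20, "f3": 60}, {"f1": 50, "f2": 10, "f3": 40}]
mapping = {"f1": 0, "f2": 0, "f3": 1}  # f1, f2 -> first core; f3 -> second core
loads = core_loads(profile, mapping, num_cores=2)
print(loads)  # [[50, 60], [60, 40]]
```

Trying a different splitting only requires changing `mapping`, which is the main appeal of a profile-driven, high-level simulation.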
[Figure: decoder block diagram; recoverable labels: Encoded Bitstream, Spatial Prediction, Motion Compensation]
Multi-processor decoding
The parser processor (1st CPU) performs all functions related to bitstream parsing:
Entropy Decoding: the basic entropy decoding of picture data such as motion vectors and DCT residuals
Context Calculation: the prediction step of context-adaptive VLC coding for residuals and motion vector prediction
Init: the memory initialization of macroblock data structures
Multi-processor decoding
The reconstructor processor (2nd CPU) performs all functions related to picture reconstruction:
Intra/Inter Prediction: the intra and inter prediction routines
IDCT: the inverse residual transformation, which is based on multiples of the 4×4 pixel block size
Strength Calculation: the filter strength coefficients for the deblocking process are calculated before the deblocking filter is applied
Deblocking: the deblocking filter, applied as the last step in the macroblock decoding process
SVENm Multimedia Engine
Video / multimedia companion
Targets H.264 encoding / decoding at SD resolution
Simulation result
6 test sequences
Parameters
Figure 5. Two methods for partitioning the H.264 decoder on a dual-core system. (a) Scenario 1: The function Strength Calc. is part of the parsing module. (b) Scenario 2: The function Strength Calc. is part of the reconstructor.
Figure 6. The Foreman sequence at different bitrates. Macroblocks are classified based on the percentage of the overall time that is spent in the parsing module of the decoder. A percentage of 80 means that 80% of the runtime is spent in the parser, while 20% is consumed in the reconstructor. A value of 50% indicates a perfect balance.
Figure 6(b) indicates that the load balancing of the work between the two processors is significantly improved.
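The classification used in Figure 6 can be sketched as follows. The function names in the two sets follow the task lists above, but the dictionary keys, the helper `parser_share`, and the runtimes are assumptions chosen for illustration, not measured values.

```python
PARSER_FUNCS = ("entropy", "context", "init")
RECON_FUNCS = ("prediction", "idct", "deblocking")

def parser_share(mb_times, strength_on_parser):
    """Percentage of a macroblock's runtime that is spent on the parser core.

    strength_on_parser=True models Scenario 1 (Strength Calc. in the parser),
    False models Scenario 2 (Strength Calc. in the reconstructor).
    """
    parser = sum(mb_times[f] for f in PARSER_FUNCS)
    recon = sum(mb_times[f] for f in RECON_FUNCS)
    if strength_on_parser:
        parser += mb_times["strength"]
    else:
        recon += mb_times["strength"]
    return 100.0 * parser / (parser + recon)

# Hypothetical runtimes for one macroblock (illustrative values only).
mb = {"entropy": 40, "context": 15, "init": 5, "strength": 20,
      "prediction": 10, "idct": 5, "deblocking": 5}
share_scenario_1 = parser_share(mb, strength_on_parser=True)
share_scenario_2 = parser_share(mb, strength_on_parser=False)
print(share_scenario_1, share_scenario_2)  # 80.0 60.0
```

For this made-up macroblock, moving the strength calculation to the reconstructor shifts the share from 80% toward the balanced 50% mark.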
Figure 7. Idle time for parser and reconstructor processors for three test sequences. (a) Filter strength calculation is done at the parser side. (b) Filter strength calculation is performed at the reconstructor side.
Figure 7(a): the reconstructor core idle time is approximately 40% in all three test sequences and for all data rates.
Figure 7(b): the reconstructor idle time can be reduced below 15%.
Figure 8. Average idle times of all system cores while decoding (a) intra-coded I-frames. For the simulations, the calculation of the filter strength was assigned to the reconstructor core [Figure 5(b)].
Increasing the PSNR value (and the bitrate) mainly raises the macroblock processing complexity at the parsing core and decreases performance. At a buffer size of one macroblock (1 MB), the Foreman sequence performs best at 35 dB.
At a buffer size of 5 MB, a continuous performance decrease with increasing PSNR values can be observed for the Foreman sequence.
For the Flowergarden and Barcelona sequences, the higher parsing complexity typically results in higher idle times and smaller performance improvements at higher buffer sizes.
Figure 8. Average idle times of all system cores while decoding (b) inter-coded P-frames and (c) inter-coded B-frames of three test sequences.
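How the buffer size between the two cores affects idle time can be explored with a small Python sketch of a two-stage pipeline. The timing model is an assumption made here, not the paper's simulator: the parser may start macroblock i only once the reconstructor has started macroblock i - buffer_size (which frees a buffer slot), and the reconstructor processes macroblocks in order. The function name and all runtimes are hypothetical.

```python
def simulate_pipeline(parse_times, recon_times, buffer_size):
    """Return (total time, parser idle time, reconstructor idle time).

    Requires non-empty input lists of equal length and buffer_size >= 1.
    """
    n = len(parse_times)
    parse_end = [0] * n
    recon_start = [0] * n
    recon_end = [0] * n
    for i in range(n):
        # The parser is free once it has finished the previous macroblock.
        start = parse_end[i - 1] if i > 0 else 0
        # Assumed rule: a slot frees up when macroblock i - buffer_size
        # has been picked up by the reconstructor.
        if i >= buffer_size:
            start = max(start, recon_start[i - buffer_size])
        parse_end[i] = start + parse_times[i]
        # The reconstructor needs the parsed data and must be free itself.
        prev_done = recon_end[i - 1] if i > 0 else 0
        recon_start[i] = max(parse_end[i], prev_done)
        recon_end[i] = recon_start[i] + recon_times[i]
    total = recon_end[-1]
    return total, total - sum(parse_times), total - sum(recon_times)

# Hypothetical per-macroblock runtimes (illustrative values only).
parse_times = [4, 1, 1, 4]
recon_times = [2, 3, 3, 2]
result_1mb = simulate_pipeline(parse_times, recon_times, buffer_size=1)
result_2mb = simulate_pipeline(parse_times, recon_times, buffer_size=2)
print(result_1mb)  # (15, 5, 5)
print(result_2mb)  # (14, 4, 4)
```

In this toy input the larger buffer lets the parser run ahead during cheap macroblocks, which shortens the total time and reduces the idle time of both cores. Whether a real sequence benefits depends on how unevenly the parsing and reconstruction loads are distributed.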
Conclusion
A simulator for mapping the H.264 decoding process onto a hardware architecture has been introduced. We have demonstrated the simulator's ability to analyze the efficiency of a multi-core architecture under various conditions.
References
[4] T.-T. Shih, C.-L. Yang, and Y.-S. Tung, "Workload characterization of the H.264/AVC decoder," in Proc. of the 5th IEEE Pacific-Rim Conference on Multimedia, pp. 957-966, 2004.
[6] F. Seitner, R. Schreier, M. Bleyer, and M. Gelautz, "A macroblock-level analysis on the dynamic behaviour of an H.264 decoder," in Proc. of ISCE 2007, (Dallas), June 2007.
[7] E. B. van der Tol, E. G. Jaspers, and R. H. Gelderblom, "Mapping of H.264 decoding on a multiprocessor architecture," in Proc. of the SPIE, 5022, pp. 707-718, May 2003.
F. Seitner, R. Schreier, M. Bleyer, and M. Gelautz, "Evaluation of data-parallel splitting approaches for H.264 decoding," in Proc. of the 6th International Conference on Advances in Mobile Computing and Multimedia, Linz, November 2008.
http://www.powercam.cc/slide/1580
Florian Seitner, Josef Meser, Gerold Schedelberger, Andreas Wasserbauer, Michael Bleyer, Margrit Gelautz, Markus Schutti, Ralf Schreier, Premysl Vaclavik, Gerald Krottendorfer, Günther Truhlar, Thomas Bauernfeind, Philipp Beham, "Design Methodology for the SVENm Multimedia Engine," Austrochip 2008, Invited Poster.
H264 benchmarks
JM Reference Codec
X264 encoder
FFmpeg H.264 decoder
FFmpeg includes an H.264/AVC decoder that implements most of the features of the main and high profiles of the standard. The code is highly optimized and includes MMX/SSE and AltiVec SIMD instructions for the most time-consuming kernels. It is widely used in free multimedia players such as MPlayer, VLC media player (VideoLAN), and Xine.
http://ffmpeg.org/