
A High-Level Simulator For The H.264/AVC Decoding Process in Multi-Core Systems

This document presents a high-level simulator for mapping an H.264 video decoder to multi-core systems. The simulator models the decoding tasks and their dependencies to analyze partitioning approaches and optimize load balancing across processor cores.


A high-level simulator for the H.264/AVC decoding process in multi-core systems


Florian H. Seitner, Ralf M. Schreier, Michael Bleyer, Margrit Gelautz
Vienna University of Technology, Austria
SPIE IS&T Electronic Imaging Conference, Multimedia on Mobile Devices 2008

Outline

Introduction
Multi-processor decoding
High-level simulator
Simulation result
Conclusion

Introduction

H.264, a new-generation video coding standard, is becoming increasingly important for international broadcasting standards such as DVB-H and DMB.
H.264 achieves high compression efficiency at the cost of increased computational complexity.
Mobile devices (embedded processors) have:
  Low processing (computation) capability
  Limited energy (power)
Multi-core systems provide an elegant and power-efficient solution to overcome this performance limitation.

Introduction

Efficiently distributing the video algorithm among multiple processors is a non-trivial task.

  The decoding load should be distributed equally
  Data dependencies
  Synchronization

It requires detailed knowledge about the algorithmic complexity and the inter-dependencies between functional blocks.
The objective of this paper is to investigate:
  the dynamic behavior of the H.264 decoding process
  the interaction between the main decoding tasks in multi-core environments

Figure 1. Dynamic variations in the execution times of individual macroblocks in the H.264 decoding process. Histograms are shown for six IPB-coded sequences of 100 frames with a Group of Pictures (GOP) size of 13.

Each histogram bin shows the number of macroblocks with similar runtimes. The runtimes of macroblocks vary significantly within a sequence due to different image content. The overall runtime of the decoder strongly depends on the content of the encoded video material.
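The binning behind such a histogram can be sketched as follows. This is a minimal illustration only: the function name `runtime_histogram`, the bin width, and the runtime values are assumptions made for the example and are not taken from the paper's profiling data.

```python
from collections import Counter


def runtime_histogram(runtimes, bin_width):
    """Count macroblocks per runtime bin; keys are the lower bin edges."""
    counts = Counter((t // bin_width) * bin_width for t in runtimes)
    return dict(sorted(counts.items()))


# Hypothetical per-macroblock runtimes in cycles (not measured data).
runtimes = [1200, 1350, 1900, 2100, 2150, 4800, 5100, 1250]
hist = runtime_histogram(runtimes, bin_width=1000)

for lower_edge, count in hist.items():
    print(f"{lower_edge:>6}-{lower_edge + 999:<6} {count} macroblocks")
```

A wide spread of occupied bins corresponds to the strong content dependence noted above.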

Table 1. Six test sequences with normalization to 35 dB.

We present a high-level simulator for multi-core implementations of the H.264 decoder in this paper.

Figure 2. Concept of the simulator. (a) Profiling data. (b) Underlying hardware. (c) Simulation of a splitting that maps f1 and f2 to the first core and f3 to the second one.
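The splitting in Figure 2(c) can be illustrated with a small static load calculation. The sketch below assumes that profiling data is available as one time value per function per macroblock; the function name `core_loads`, the data layout, and the numbers are hypothetical and do not reproduce the simulator described in the paper.

```python
def core_loads(profile, mapping, n_cores):
    """Sum the profiled time of every function on the core it is mapped to.

    profile: list of dicts, one per macroblock, {function name: time}
    mapping: dict {function name: core index}
    """
    loads = [0] * n_cores
    for macroblock in profile:
        for func, time in macroblock.items():
            loads[mapping[func]] += time
    return loads


# Hypothetical profiling data for three macroblocks (arbitrary time units).
profile = [
    {"f1": 30, "f2": 20, "f3": 40},
    {"f1": 50, "f2": 10, "f3": 45},
    {"f1": 20, "f2": 25, "f3": 60},
]

# Splitting from Figure 2(c): f1 and f2 on the first core, f3 on the second.
mapping = {"f1": 0, "f2": 0, "f3": 1}
loads = core_loads(profile, mapping, n_cores=2)
print(loads)
```

This only compares total work per core; it ignores dependencies and waiting between cores, which a dynamic simulation has to model as well.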

Parallel H.264 Decoding


The H.264 Decoder

The H.264 decoding process http://www.powercam.cc/slide/1580


[Figure: block diagram of the H.264 decoder. Blocks: Encoded Bitstream, Stream Parsing, Entropy Decoder, Inverse Quantization, Inverse DCT, Spatial Prediction, Motion Compensation, Reference Frames, Deblocking. Groupings: Parser; Reconstructor (data-parallel processing).]

Multi-processor decoding

Figure 3. Partitioning the H.264 decoder on a dual-core system.

The parser processor (1st CPU) performs all functions related to bitstream parsing:

  Entropy Decoding: the basic entropy decoding of picture data such as motion vectors and DCT residuals
  Context Calculation: the prediction step of context-adaptive VLC coding for residuals and motion vector prediction
  Init: the memory initialization of macroblock data structures

Multi-processor decoding

The reconstructor processor (2nd CPU) handles all pixel-based operations:

  Intra/Inter Prediction: the intra and inter prediction routines
  IDCT: the inverse residual transformation, which is based on multiples of the 4×4 pixel block size
  Strength Calculation: calculates the filter strength coefficients for the deblocking process
  Deblocking: applies the deblocking filter as the last step in the macroblock decoding process

High-level simulator - Austrochip 2008, Invited Poster


CHILI Vector Processor
CHILI Design
  CHILI core with 32-bit / 4 slots / 8 SIMD
  High performance for signal processing and control code
  Compiler-friendly instruction set
  Fully programmable (C / Assembler)
  C compiler (LLVM, GCC) and instruction set simulator available

CHILI Processor Features
  Separate instruction and data path
  16-bit SIMD operands
  64 32-bit general-purpose registers
  128-bit core memory interface
  64 KB instruction cache
  64 KB data SRAM (core memory)
  64-channel data load and store DMA controller
  1.92 GMAC 16-bit operations (@ 240 MHz)

SVENm Multimedia Engine
  Video / multimedia companion
  Targets H.264 encoding / decoding at SD resolution

Simulation result

6 test sequences

Foreman, Flowergarden, Barcelona, Paris, Bus, Mobile


Test sequences are encoded in H.264 main profile using the JM 12.2 encoder.

Parameters
  GOP size = 13 frames
  CIF, IPB, VLC, deblocking active, all prediction modes allowed
  Search range (SR) = ±16 pixels
  3 reference frames
  1 slice per frame

Simulation result (1): Variation of partitioning

Figure 5. Two methods for partitioning the H.264 decoder on a dual-core system. (a) Scenario 1: The function Strength Calc. is part of the parsing module. (b) Scenario 2: The function Strength Calc. is part of the reconstructor.


Figure 6. The Foreman sequence at different bitrates. Macroblocks are classified based on the percentage of the overall time spent in the parsing module of the decoder. A percentage of 80 means that 80% of the runtime is spent in the parser, while 20% is consumed in the reconstructor. A value of 50% indicates a perfect balance.

Figure 6(b) indicates that the load balancing of the work between the two processors is significantly improved.
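The classification used in Figure 6 reduces to one ratio per macroblock. A minimal sketch, assuming per-macroblock parser and reconstructor times are known; the function name `parser_share` and the sample values are illustrative only.

```python
def parser_share(t_parser, t_reconstructor):
    """Percentage of a macroblock's total runtime spent in the parser."""
    return 100.0 * t_parser / (t_parser + t_reconstructor)


# Hypothetical (parser time, reconstructor time) pairs, arbitrary units.
macroblocks = [(80, 20), (50, 50), (30, 70)]
shares = [parser_share(p, r) for p, r in macroblocks]
print(shares)
```

Values near 50 mean the two cores are about equally busy for that macroblock; values far from 50 mean one core waits for the other.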

Figure 7. Idle time for parser and reconstructor cores for three test sequences. (a) Filter strength calculation is done at the parser side. (b) Filter strength calculation is performed at the reconstructor side.

Figure 7(a): the reconstructor processor's idle time is approximately 40% in all three test sequences and for all data rates.
Figure 7(b): the reconstructor idle time can be reduced below 15%.

Simulation result (2): Variation of buffers

Figure 8. Average idle times of all system cores while decoding (a) intra-coded I-frames, (b) inter-coded P-frames and (c) inter-coded B-frames of three test sequences. For the simulations, the calculation of the filter strength was assigned to the reconstructor core [Figure 5(b)].

Increasing the PSNR value (and the bitrate) mainly raises the macroblock processing complexity at the parsing core, and performance decreases.
At a buffer size of one macroblock (1 MB), the Foreman sequence performs best at 35 dB.
At a buffer size of five macroblocks (5 MB), a continuous performance decrease with increasing PSNR values can be observed for the Foreman sequence.
For the Flowergarden and Barcelona sequences, the higher parsing complexity typically results in higher idle times and smaller performance improvements at higher buffer sizes.
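How the buffer size influences idle time can be sketched with a simple two-stage pipeline model. The model below is an assumption made for illustration, not the simulator from the paper: the parser reserves a buffer slot when it starts a macroblock, the slot is freed when the reconstructor starts that macroblock, and all names and time values are hypothetical.

```python
def simulate(parse_times, recon_times, buffer_size):
    """Simulate parser -> buffer -> reconstructor for a list of macroblocks.

    Returns (total time, parser idle time, reconstructor idle time).
    """
    n = len(parse_times)
    parse_end = [0] * n
    recon_start = [0] * n
    recon_end = [0] * n
    for i in range(n):
        # The parser handles macroblocks in order.
        start = parse_end[i - 1] if i > 0 else 0
        # It also needs a free slot: macroblock i - buffer_size must
        # already have been taken out of the buffer by the reconstructor.
        if i >= buffer_size:
            start = max(start, recon_start[i - buffer_size])
        parse_end[i] = start + parse_times[i]
        # The reconstructor needs the parsed macroblock and a free core.
        prev_recon_end = recon_end[i - 1] if i > 0 else 0
        recon_start[i] = max(prev_recon_end, parse_end[i])
        recon_end[i] = recon_start[i] + recon_times[i]
    total = recon_end[-1]
    return total, total - sum(parse_times), total - sum(recon_times)


# Hypothetical per-macroblock times with a strongly varying parser load.
parse_times = [4, 1, 1, 4]
recon_times = [2, 2, 2, 2]

for size in (1, 2):
    total, parser_idle, recon_idle = simulate(parse_times, recon_times, size)
    print(f"buffer={size}: total={total}, "
          f"parser idle={parser_idle}, reconstructor idle={recon_idle}")
```

In this toy input, the larger buffer lets the parser run ahead during cheap macroblocks, which shortens the total time and reduces the idle time of both cores.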

Conclusion

A simulator for mapping the H.264 decoding process onto hardware architectures has been introduced.
We have demonstrated the simulator's ability to analyze the efficiency of a multi-core architecture under various conditions.

References

[4] T.-T. Shih, C.-L. Yang, and Y.-S. Tung, "Workload characterization of the H.264/AVC decoder," in Proc. of the 5th IEEE Pacific-Rim Conference on Multimedia, pp. 957–966, 2004.
[6] F. Seitner, R. Schreier, M. Bleyer, and M. Gelautz, "A macroblock-level analysis on the dynamic behaviour of an H.264 decoder," in Proc. of ISCE 2007, (Dallas), June 2007.
[7] E. B. van der Tol, E. G. Jaspers, and R. H. Gelderblom, "Mapping of H.264 decoding on a multiprocessor architecture," in Proc. of the SPIE, 5022, pp. 707–718, May 2003.
F. Seitner, R. Schreier, M. Bleyer, and M. Gelautz, "Evaluation of data-parallel splitting approaches for H.264 decoding," in Proc. of the 6th International Conference on Advances in Mobile Computing and Multimedia, Linz, November 2008.
http://www.powercam.cc/slide/1580
Florian Seitner, Josef Meser, Gerold Schedelberger, Andreas Wasserbauer, Michael Bleyer, Margrit Gelautz, Markus Schutti, Ralf Schreier, Premysl Vaclavik, Gerald Krottendorfer, Günther Truhlar, Thomas Bauernfeind, Philipp Beham, "Design Methodology for the SVENm Multimedia Engine," Austrochip 2008, Invited Poster.

FFmpeg H.264 decoder

H.264 benchmarks
  JM Reference Codec
  x264 encoder
  FFmpeg H.264 decoder

FFmpeg includes an H.264/AVC decoder that implements most of the features of the main and high profiles of the standard.
The code is highly optimized and includes MMX/SSE and AltiVec SIMD instructions for the most time-consuming kernels.
It is widely used in free multimedia players such as MPlayer, VLC media player (VideoLAN), Xine, etc.
http://ffmpeg.org/
