H.264/AVC Intra-Only Coding (iAVC) Techniques For Video Over Wireless Networks
Ming Yang*a, Monica Trifas*a, Guolun Xiongb, and Joshua Rogersa
aJacksonville State University, 700 Pelham Road North, Jacksonville, AL, USA 36265
bHuanyu Autolighting Co. Ltd., 45 Huanshan Road, Xiangfan, Hubei, China 441021
ABSTRACT
The requirement to transmit video data over unreliable wireless networks (with the possibility of packet loss) is anticipated in the foreseeable future. Significant compression ratio and error resilience are both needed for complex applications including tele-operated robotics, vehicle-mounted cameras, sensor networks, etc. Block-matching based inter-frame coding techniques, including MPEG-4 and H.264/AVC, do not perform well in these scenarios due to error propagation between frames. Many wireless applications therefore use intra-only coding technologies such as Motion-JPEG, which exhibit better recovery from network data loss at the price of higher data rates. In order to address these research issues, an intra-only coding scheme of H.264/AVC (iAVC) is proposed. In this approach, each frame is coded independently as an I-frame, and frame copy is applied to compensate for packet loss. This approach is a good balance between compression performance and error resilience: it achieves compression performance comparable to Motion-JPEG2000 (MJ2) with lower complexity, while accomplishing error resilience similar to Motion-JPEG (MJ). Since intra-frame prediction in iAVC is strictly confined within the range of a slice, memory usage is also extremely low. Low computational complexity and memory usage are crucial for mobile stations and devices in wireless networks.
Keywords: Coding, video, wireless, network, error-resilience, H.264/AVC, Motion-JPEG2000.
1. INTRODUCTION
The requirement to transmit video data over unreliable wireless networks (with the possibility of packet loss) will be present in the foreseeable future. It is necessary to develop advanced video compression techniques that are well suited to the network environment in applications such as tele-operated robotics, vehicle-mounted cameras, sensor networks, etc. Significant compression ratio and error resilience are both needed for such operations. Block-matching based inter-frame coding techniques, including MPEG-4 and H.264, achieve significant compression ratios. However, in these schemes the decoding of P-frames depends on the successful decoding of preceding I- or P-frames, and the decoding of B-frames depends on the successful decoding of both preceding and succeeding I- or P-frames. Thus, any packet loss within the I-frame of a Group-of-Pictures (GOP) will cause disastrous video-quality degradation, an effect called error propagation (Fig. 1). Due to this situation, numerous wireless applications use intra-only coding technologies such as Motion-JPEG, which exhibit better recovery from network data loss at the price of higher data rates. In order to address these research issues, an intra-only coding scheme of H.264/AVC is proposed.
[Fig. 1 (legend): Complete Frame; Damaged Frame due to Packet Loss; Infected Frame]
Multimedia on Mobile Devices 2009, edited by Reiner Creutzburg, David Akopian, Proc. of SPIE-IS&T Electronic Imaging, SPIE Vol. 7256, 725608. © 2009 SPIE-IS&T. doi: 10.1117/12.811423
In this approach, we apply the intra-only mode of H.264/AVC (iAVC) for video compression. In essence, each frame is coded independently as an I-frame, and error concealment techniques are applied to compensate for packet loss. The proposed approach is a good balance between compression performance and error resilience. H.264/AVC intra coding utilizes intra prediction to eliminate spatial correlation within the frame. This scheme achieves compression performance comparable to MJ2 and much better than MJ, with lower complexity. Error resilience similar to MJ is also achieved, due to the same intra-coding strategy used in iAVC. Since intra-frame prediction in iAVC is strictly confined within the range of a slice, memory usage is also extremely low. Low computational complexity and memory usage are crucial for mobile stations and devices in wireless networks.

The major advantages of the proposed approach are: (1) H.264/AVC intra-only mode: performance comparable to MJ2 with much lower complexity; (2) limited error propagation: since there is no inter-frame prediction, error propagation is confined within a slice; (3) fast intra prediction mode decision: reduced complexity; (4) low latency: the decoding of a frame does not depend on other frames; (5) General Markov Model (GMM) based network modeling; (6) frame-copy based error concealment.

In this paper, two coding schemes, iAVC and MJ2, have been adopted for testing. The encoded video has been transmitted through a pseudo lossy channel with 5%, 10%, and 15% packet loss. At the receiver end, the video data has been decoded and error-concealment techniques have been applied to compensate for packet loss. This paper is organized as follows: the method is presented in Section-2; an overview of H.264/AVC and MJ2 is given in Section-3; network modeling is described in Section-4; experimental results are presented in Section-5; and the paper is concluded in Section-6.
2. METHOD DESCRIPTION
H.264/AVC, the newest joint standard of ITU-T and ISO/IEC, has achieved significant improvements in coding efficiency compared to previous standards such as MPEG-1/2/4 and H.261/H.263. H.264/AVC intra coding utilizes intra prediction to eliminate spatial correlation within the frame. The advantages of iAVC over inter-frame coding schemes are listed in Table-1:
Table-1 Comparison of compression schemes
| Compression Scheme | iAVC Coding | Inter-frame Coding |
|---|---|---|
| Compression strategy | Intra-frame correlation (removes spatial redundancy only) | Inter-frame correlation (removes temporal redundancy) |
| Compression ratio | Medium | High |
| Frame-rate suitability | Suitable for low-frame-rate video | Suitable for high-frame-rate video |
| Inter-frame dependence | No dependence | Has dependence |
| Error propagation | Maximum one slice | Multiple frames within the GOP, until the next I-frame |
| Processing latency | Low: no inter-frame dependence | Much higher: reference frames are involved for ME |
| Complexity | Low: no motion estimation (ME) involved | High: motion estimation involved |
| Memory usage | Low: only the data within a slice needs to be stored in memory | Large: reference frames need to be stored for ME |
| Parallel processing | Easy: no inter-frame dependency | Difficult: inter-frame dependency |
As shown in Fig. 2, the video frames will first be divided into slices and then each slice will be compressed with H.264/AVC in intra-mode. Fast mode decision algorithms will be incorporated to speed up the process of mode decision. After that the video data will go through application-layer and network-layer packetization. Then, the video data will go through a pseudo lossy channel with a certain degree of packet loss. At the receiver, the corrupted video data will be decoded with frame copy as the error concealment tool.
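One fast mode decision heuristic of the kind referred to above estimates the current block's intra prediction mode from the already-decided modes of its upper and left neighbours, loosely mirroring H.264's most-probable-mode rule. The sketch below is a hypothetical illustration with our own function name, mode-map layout, and DC fallback value, not the JM implementation:

```python
def predict_intra_mode(mode_map, bx, by):
    """Estimate the optimal 4x4 intra prediction mode of block (bx, by)
    from its already-coded neighbours, skipping a brute-force search.

    mode_map[by][bx] holds the modes decided so far.  Following the
    H.264 most-probable-mode convention, we take the smaller of the
    left and upper neighbours' modes, and fall back to DC (mode 2)
    when a neighbour is unavailable (e.g., outside the slice)."""
    left = mode_map[by][bx - 1] if bx > 0 else -1
    up = mode_map[by - 1][bx] if by > 0 else -1
    if left < 0 or up < 0:
        return 2  # DC prediction when a neighbour is missing
    return min(left, up)
```

An encoder would evaluate only this predicted mode (perhaps plus a cheap fallback) instead of computing the Lagrange cost of every mode.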
[Fig. 2: Proposed system: video frames are divided into slices, compressed, transmitted over a wireless channel (with loss and delay), and analyzed through network modeling]
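The frame-copy concealment step at the receiver can be sketched as follows. This is a minimal illustration with our own function and parameter names (three horizontal slice bands per frame, matching the slice setup used in the experiments), not the reference decoder:

```python
def conceal_with_frame_copy(frames, lost, num_slices=3):
    """Frame-copy error concealment: each lost slice is replaced by the
    co-located slice of the previously reconstructed frame.

    frames     -- list of decoded frames; each frame is a list of rows
    lost       -- set of (frame_index, slice_index) pairs reported lost
    num_slices -- number of horizontal slice bands per frame
    """
    out = [[row[:] for row in frame] for frame in frames]
    band = len(frames[0]) // num_slices  # rows per slice band
    for f_idx, s_idx in sorted(lost):
        if f_idx == 0:
            continue  # the first frame has no reference to copy from
        for r in range(s_idx * band, (s_idx + 1) * band):
            # copy the co-located row from the previous, already-concealed frame
            out[f_idx][r] = out[f_idx - 1][r][:]
    return out
```

Processing losses in frame order lets a concealed slice serve as the reference for a loss in the next frame.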
By subtracting the intra-frame predicted image from the original input image, a difference (residual) image is generated. An integer transform is applied to this residual image, and the resulting coefficients are adaptively quantized and entropy coded. The prediction mode information is stored in the bit stream along with the compressed residual image. Because the amount of data required for the residual image can be reduced by highly accurate intra prediction, higher compression efficiency can be achieved even with intra-only compression. In contrast to inter-frame prediction, spatial intra prediction is performed within a single slice, which means that error propagation is strictly limited to the range of a slice. This is a big advantage compared to inter-frame coding schemes.

In the iAVC scheme, the most time-consuming procedure is the decision of the intra prediction mode. The H.264/AVC reference software JM uses a brute-force search, which calculates the Lagrange cost for each prediction mode and then selects the optimum based on the rate-distortion measure. With this approach, the rate and distortion of every possible prediction mode need to be computed. In order to reduce the complexity and latency of the proposed iAVC video codec, we apply an advanced prediction-mode decision approach. It is well known that strong correlations exist between adjacent regions of a picture, so it is natural to assume that the optimal mode decisions of spatially adjacent macroblocks are strongly correlated. The basic idea is to use the optimal prediction modes of the blocks located above and to the left of the current block to estimate the optimal prediction mode of the current block. With this scheme, the brute-force prediction-mode decision process is skipped.

3.2 Motion-JPEG2000

With the increasing use of multimedia technologies, image compression requires higher performance as well as new features.
To address this need in the specific area of still-image encoding, a new standard, JPEG2000, was developed. It was intended not only to provide rate-distortion and subjective image-quality performance superior to existing standards, but also to provide functionality that current standards either cannot address efficiently or cannot address at all. JPEG2000 supports lossy and lossless compression of single-component (e.g., grayscale) and multi-component (e.g., color) imagery. In addition to this basic compression functionality, numerous other features are provided, including: (1) progressive recovery of an image by fidelity or resolution; (2) region-of-interest coding, whereby different parts of an image can be coded with differing fidelity; (3) random access to particular regions of an image without needing to decode the entire code stream; (4) a flexible file format with provisions for specifying opacity information and image sequences; and (5) good error resilience. Due to its excellent coding performance and many attractive features, there is a very large potential application base for JPEG2000. Possible application areas include image archiving, the Internet, web browsing, document imaging, digital photography, medical imaging, remote sensing, and desktop publishing.

The codec is based on wavelet/subband coding techniques [2, 7]. It handles both lossy and lossless compression using the same transform-based framework, and borrows heavily from the embedded block coding with optimized truncation (EBCOT) scheme [12]. In order to facilitate both lossy and lossless coding in an efficient manner, reversible integer-to-integer [3, 5] and non-reversible real-to-real transforms are employed. To code transform data, the codec makes use of bit-plane coding techniques. For entropy coding, a context-based adaptive arithmetic coder [14] is used; more specifically, the MQ coder from the JBIG2 standard [6].
Two levels of syntax are employed to represent the coded image: a code-stream syntax and a file-format syntax. The code-stream syntax is similar in spirit to that used in the JPEG standard. Motion JPEG 2000 (MJ2), a video stream and file format, was standardized in 2002 as part of ISO/IEC's JPEG 2000 (JP2) standard, with subsequent refinements. Motion JPEG 2000 (often referenced as MJ2 or MJP2) is the leading digital cinema standard currently supported by Digital Cinema Initiatives (a consortium of most major studios and vendors) for the storage, distribution, and exhibition of motion pictures. It is also under consideration as a digital archival format by the Library of Congress. It is an open ISO standard and an advanced update to MJPEG (or MJ), which was based on the legacy JPEG format. Unlike common video codecs such as MPEG-4, WMV, and DivX, MJ2 does not employ temporal or inter-frame compression; instead, each frame is an independent entity encoded by either a lossy or lossless variant of JPEG 2000. Its physical structure does not depend on time ordering, but it does employ a separate profile to complement the data. For audio, it supports LPCM encoding, as well as several MPEG-4 variants, as "raw" or complement data.
Expected applications include:
- Storing video clips taken using digital still cameras
- High-quality frame-based video recording and editing
- Digital cinema
- Medical and satellite imagery

Motion JPEG2000 features are:
- Lossless and lossy compression in one codec
- Scalability in resolution and quality
- Bit depth up to 32 bits/component
- Image width and height up to 2^32 - 1
- Quality-based, VBR, and CBR coding with high efficiency

Motion-image-specific additions are:
- Intra-frame based coding scheme
- MPEG-4 based file format
- Synchronization of audio and video
- Metadata embedding
- Multi-component, multi-sampling formats, e.g., YUV 4:2:2, RGB 4:4:4
4. NETWORK MODELING
4.1 Network Loss and Late Loss Network packet loss behavior modeling is very important to evaluate the Quality-of-Service (QoS) of multimedia communication and the performance of recovery/concealment algorithms. In a typical video streaming application, packet loss results from two situations: network loss and late loss (Fig. 4). In the first case (network loss), a packet is lost during transmission and cannot reach the destination. In the second case (late loss), the arriving packet is discarded because it arrives too late and misses the playback deadline. Different network loss behaviors significantly affect the performance of error recovery algorithms. For example, Forward Error Correction (FEC) has good performance with random packet loss, but not with bursty loss. In this case, the modeling and analysis of network packet loss behavior becomes a very important component of error recovery algorithms for multimedia streaming over IP networks.
[Fig. 4: Network loss and late loss. Legend: Delivered Packet, Delayed Packet, Lost Packet; annotations: packet delay translates to packet loss, inter-loss distance, loss run]
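The two loss types described in Section 4.1 can be distinguished with a simple rule once per-packet arrival delays are known. A minimal sketch, with names and units (milliseconds) of our own choosing:

```python
def classify_packets(arrival_delays, deadline_ms):
    """Classify each packet's outcome for a video streaming session.

    arrival_delays -- per-packet delay in ms, or None if the packet
                      never reached the destination
    deadline_ms    -- playback deadline: later arrivals are discarded
    """
    outcomes = []
    for delay in arrival_delays:
        if delay is None:
            outcomes.append("network loss")   # lost in transit
        elif delay > deadline_ms:
            outcomes.append("late loss")      # arrived, but missed playback
        else:
            outcomes.append("delivered")
    return outcomes
```

From the decoder's point of view both categories are equivalent: the packet's payload is unavailable at playback time.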
4.2 Network Trace Modeling

In order to analyze the network traces obtained through the transmission experiments, the Bernoulli Model, Gilbert Model, Pareto Model, and General Markov Model have been applied to model the loss behavior. Two methods are mainly used to evaluate the precision of the models: (1) loss-run distribution and (2) un-recoverable ratio with FEC. A loss run is a span of consecutive packet losses; the loss-run distributions predicted by the different models are compared with that of the original trace. FEC is applied to the original network trace as well as to the network traces generated by the different models, and the un-recoverable ratios (the ratio of packets that cannot be recovered with FEC) are then compared. The model that best predicts the loss-run distribution and un-recoverable ratio possesses the best modeling precision.

4.3 General Markov Model

The Markov Model is a powerful tool for describing the relationship between events that are temporally related to each other. The reasons to choose the Markov Model for network trace modeling are: (1) Successful arrivals or losses of adjacent packets are temporally related: if the preceding packet is lost, the succeeding packet is also likely to be lost. Wah and Su [13] tested Internet loss behavior by transmitting video packets from Hong Kong to Japan and the USA, and found that Internet packet loss tends to be bursty; that is, consecutive packet losses are more likely than single packet losses. For example, if packet n is discarded because of delay, then packet n+1 is also likely to be delayed and discarded. (2) In the extended Gilbert model, the major assumption is that the loss probability of a certain packet depends on the past n consecutively lost packets. This assumption is not always true. In the General Markov Model, the loss probability of a certain packet depends on the loss behavior of the past n packets, whether lost or delivered. The General Markov Model drops the assumption made by the extended Gilbert model, so it is more general. (3) Since the General Markov Model has more states than the extended Gilbert model, it is able to describe the network loss behavior more precisely [9]. The General Markov Model used to be considered computationally demanding; however, with the advance of modern computing hardware and software, the high-order Markov model, which used to be computationally infeasible, now becomes tractable.
It is also possible to investigate the general Markov model with higher order and more states. When setting up a Markov model, one of the major concerns is determining the appropriate order of the Markov chain. It has been found that an order of 6 is generally sufficient for the prediction of a network traffic trace, so in our experiments we used a General Markov Model of order 6. To parameterize the General Markov Model, we need to generate the transition matrix. In an order-6 General Markov Model, there are 64 (2^6) possible states and 128 (64 x 2) possible transitions: each state corresponds to two transitions, the transition to 0 and the transition to 1. Table-2 shows part of the transition matrix for a particular trace obtained through previous experiments, listing 8 of the possible states and their 16 transitions.
Table-2: Part of the Transition Matrix of a 6-Order Markov Model
| # | State | State # | State % | 0 # | 0 % | 1 # | 1 % |
|---|---|---|---|---|---|---|---|
| 0 | 000000 | 4419 | 0.2210 | 2462 | 0.5571 | 1957 | 0.4427 |
| 1 | 100000 | 2376 | 0.1188 | 1956 | 0.8232 | 420 | 0.1768 |
| 2 | 010000 | 2378 | 0.1189 | 2259 | 0.9499 | 119 | 0.0500 |
| 3 | 110000 | 118 | 0.0059 | 117 | 0.9915 | 1 | 0.0085 |
| 4 | 001000 | 2454 | 0.1227 | 2357 | 0.9604 | 97 | 0.0395 |
| 5 | 101000 | 24 | 0.0012 | 21 | 0.875 | 3 | 0.125 |
| 6 | 011000 | 77 | 0.0038 | 77 | 1.0 | 0 | 0.0 |
| 7 | 111000 | 42 | 0.0021 | 41 | 0.9762 | 1 | 0.0238 |

Legend: # = index of the possible states; State = a possible binary combination of past outcomes; State # = occurrence count of the state; State % = probability of the state; 0 # / 0 % = number/probability of transitions from the current state to 0; 1 # / 1 % = number/probability of transitions from the current state to 1.
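Parameterizing the model amounts to counting, for every length-n history of packet outcomes, how often the next packet is delivered (0) or lost (1). A minimal sketch with our own function name and state encoding, and a generic order parameter in place of the fixed order 6:

```python
from collections import Counter

def markov_transitions(trace, order=6):
    """Estimate the transition probabilities of a general Markov model
    of the given order from a binary loss trace (1 = lost, 0 = delivered).

    Returns {state: (p0, p1)} where `state` is the tuple of the previous
    `order` outcomes and p0/p1 estimate the probability that the next
    packet is delivered or lost, mirroring the 0%/1% columns of Table-2."""
    counts = Counter()   # (state, next_outcome) occurrence counts
    totals = Counter()   # state occurrence counts
    for i in range(order, len(trace)):
        state = tuple(trace[i - order:i])
        totals[state] += 1
        counts[(state, trace[i])] += 1
    return {
        state: (counts[(state, 0)] / n, counts[(state, 1)] / n)
        for state, n in totals.items()
    }
```

With order=6 this yields up to 64 states and two transition probabilities per state, i.e., the 128 possible transitions described above.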
As shown in Fig. 5, the General Markov Model has the best capability to predict the loss-run distributions of the network traces: the loss-run distribution of the trace generated by the Markov Model is always very close to that of the original network trace. One of the major applications of network packet-loss models is predicting the performance of FEC. Here we use a block size of 3: if only one packet within a block is lost, FEC is able to recover it; if more than one packet is lost within the block, the lost packets are un-recoverable by FEC. The number of un-recoverable lost packets and the un-recoverable ratio are computed for each of the ten traces with each of the five models. As can be seen from Fig. 6, the parameterized Markov Model that we have obtained predicts the performance of FEC very precisely. The Bernoulli Model and Gilbert Model tend to over-estimate the performance of FEC because they are incapable of capturing the loss-run distribution of a network trace.
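The FEC recovery rule just described can be sketched as follows. This is a hedged illustration with our own names, assuming one recovery opportunity per block of three packets; normalizing the un-recoverable count by the total number of packets is our assumption, since the paper leaves the denominator implicit:

```python
def fec_unrecoverable_ratio(trace, block=3):
    """Apply block-based FEC to a binary loss trace (1 = lost).

    A single loss inside a block of `block` packets is recoverable;
    two or more losses in the same block are not.  Returns the ratio
    of un-recoverable lost packets over all packets in the trace."""
    unrecoverable = 0
    for i in range(0, len(trace), block):
        chunk = trace[i:i + block]
        losses = sum(chunk)
        if losses > 1:            # FEC cannot repair multiple losses
            unrecoverable += losses
    return unrecoverable / max(1, len(trace))
```

Running this on the original trace and on model-generated traces gives the comparison plotted in Fig. 6: a model that reproduces bursty loss runs will also reproduce the true un-recoverable ratio.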
[Fig. 5: Performance evaluation on loss-run distribution (x-axis: loss-run length, 0-10); Fig. 6: Performance evaluation on FEC (normalized error, x-axis: trace #). Legend for both: Original Trace, Bernoulli Trace, Gilbert Trace, Pareto Trace, Markov Trace]
5. EXPERIMENTAL RESULTS
In the experiments, two coding schemes have been adopted: iAVC and MJ2. The latest version of the H.264/AVC reference software, JM14.2, has been used for the implementation of iAVC, while the open-source Motion-JPEG2000 implementation OpenJPEG has been selected for the testing of MJ2. Four QCIF video sequences (akiyo, coastguard, foreman, and news) have been employed for testing; these sequences have different levels of textural complexity and motion. In the iAVC settings, each frame is divided into three slices and each slice has 33 macroblocks (MBs). In MJ2, each frame is divided into three tiles, each of which has 33 MBs as well. We use a pseudo lossy channel to simulate video packet loss, with the following loss levels: 5%, 10%, and 15%. Frame copy has been used as the error-recovery tool. As we can see in Figs. 8-11, iAVC outperforms MJ2 in the encoding of all four video sequences.
[Figs. 8-11: Rate-distortion curves (PSNR in dB vs. bitrate in kbps) comparing H.264/AVC intra-only coding and Motion-JPEG2000 for the four test sequences under the simulated packet-loss rates]
REFERENCES
[1] Adams, M.D., "The JPEG-2000 still image compression standard," ISO/IEC JTC 1/SC 29/WG 1 N2412 (2002).
[2] Antonini, M., Barlaud, M., Mathieu, P., and Daubechies, I., "Image coding using wavelet transform," IEEE Trans. on Image Processing, vol. 1, no. 2, pp. 205-220 (1992).
[3] Calderbank, A.R., Daubechies, I., Sweldens, W., and Yeo, B.-L., "Wavelet transforms that map integers to integers," Applied and Computational Harmonic Analysis, vol. 5, no. 3, pp. 332-369 (1998).
[4] Cho, S.G., Bojkovic, Z., Milovanovic, D., Lee, J., and Hwang, J.J., "Image quality evaluation: JPEG 2000 vs. intra-only H.264/AVC High Profile," Facta Universitatis, Ser. Elec. Energ., vol. 20, pp. 71-83 (2007).
[5] Gormish, M.J., Schwartz, E.L., Keith, A.F., Boliek, M.P., and Zandi, A., "Lossless and nearly lossless compression of high-quality images," Proc. of SPIE, San Jose, CA, USA, vol. 3025, pp. 62-70 (1997).
[6] ISO/IEC Intl. Std. 15444, "Information technology - JPEG2000 image coding system," particularly Part 3: Motion JPEG2000 (with subsequent amendments) (2002).
[7] Lewis, A.S. and Knowles, G., "Image compression using the 2-D wavelet transform," IEEE Trans. on Image Processing, vol. 1, no. 2, pp. 244-250 (1992).
[8] Lin, J., "Image compression: the mathematics of JPEG 2000," Modern Signal Processing, vol. 46, pp. 185-221 (2003).
[9] Markovski, V., Xue, F., and Trajkovic, L., "Simulation and analysis of packet loss in video transfers using user datagram protocol," The Journal of Supercomputing, vol. 20, no. 2, pp. 175-196 (2001).
[10] Panasonic Technical Report, "AVC-Intra (H.264 Intra) compression: technical overview" (2007).
[11] Qu, Q., Pei, Y., Modestino, J., and Tian, X., "Source-adaptation-based wireless video transport: a cross-layer approach," EURASIP Journal on Applied Signal Processing, vol. 2006, pp. 1-14 (2006).
[12] Taubman, D., "High performance scalable image compression with EBCOT," Proc. of IEEE International Conference on Image Processing, Kobe, Japan, vol. 3, pp. 344-348 (1999).
[13] Wah, B.W. and Su, X., "Streaming video with transformation-based error concealment and reconstruction," Proc. of IEEE Conf. on Multimedia Computing and Systems, vol. 1, pp. 238-243 (1999).
[14] Witten, I.H., Neal, R.M., and Cleary, J.G., "Arithmetic coding for data compression," Communications of the ACM, vol. 30, no. 6, pp. 520-540 (1987).