2004 IEEE International Conference on Multimedia and Expo (ICME)

An Efficient Inter Mode Decision Approach for H.264 Video Coding


Xuan Jing and Lap-Pui Chau
School of Electrical and Electronic Engineering
Nanyang Technological University, Singapore
Email: [email protected], elpchau@ntu.edu.sg
Abstract
Variable size block motion estimation is a very
important technique for video coding. The new H.264
standard employs 7 different block sizes, which
can significantly improve the coding performance
compared with the previous video coding standards.
On the other hand, the computational complexity of
the H.264 encoder increases dramatically due to the
various coding modes used. In this paper, an efficient
inter mode decision approach is presented. The
objective is to reduce the number of candidate block
types in motion estimation while maintaining the
coding efficiency. Experimental results show that the
proposed method can reduce the computation cost by up
to 42% at the same PSNR and bitrate.

1. Introduction

The JVT H.264 video coding standard [1] achieves
much higher coding efficiency than the previous
standards such as H.261 and H.263. This is mainly due
to the fact that the H.264 encoder employs more
complicated approaches in the coding procedure. One
important approach is variable size block motion
estimation and mode decision. Traditionally, motion
estimation is performed only on the macroblock (MB)
level, thus each 16x16 macroblock will be assigned
one motion vector which can lead to a minimum block
matching error. However, when the macroblock
contains multiple objects and every object moves in
different directions, or when the macroblock lies on the
boundary of a moving object, only one motion vector
for the whole block will not be enough to represent
true motions and it will result in serious prediction
error. In order to improve the prediction accuracy,
H.264 enables 7 different block sizes in motion
estimation, which are 16x16, 16x8, 8x16, 8x8, 8x4,
4x8 and 4x4. When conducting mode decision for a
macroblock, firstly, rate distortion optimized (RDO)
motion estimation is carried out for each 8x8
sub-partition and its mode decision is done by minimizing
the Lagrangian functional:

$$J(s, c, \mathrm{MODE} \mid QP, \lambda_{\mathrm{MODE}}) = \mathrm{SSD}(s, c, \mathrm{MODE} \mid QP) + \lambda_{\mathrm{MODE}} \cdot R(s, c, \mathrm{MODE} \mid QP) \tag{1}$$

where QP is the macroblock quantization parameter,
$\lambda_{\mathrm{MODE}}$ is the Lagrange multiplier for mode decision,
SSD is the sum of the squared differences between the
original block s and its reconstruction c and MODE is
one of the potential prediction modes. Similarly,
motion estimation will be performed for 16x16, 16x8
and 8x16 modes and the final mode decision for the
macroblock is made by examining the Lagrangian cost
function (1). Although the rate distortion performance
of the H.264 encoder is much better than the previous
standards, the additional computation burden caused
by exhaustively searching all combinations of different
modes makes it difficult to achieve real time
implementations. Besides various fast motion
estimation algorithms, many fast mode decision
strategies have been proposed recently to further
reduce the computation load of H.264. In [2] Tu et al
proposed a merging procedure for 8x8 blocks by
checking the distances between their motion vectors.
In [3] and [4] early termination was used to reduce the
number of potential prediction modes. With the help of
an edge map, Lim et al in [5] assigned proper potential
modes for each macroblock and its sub-partitions
according to their homogeneity properties. Different
from the above mentioned methods, this paper
proposes a very efficient classification method which
can reduce the average number of block types used
while maintaining the same coding performance.

The rest of the paper is organized as follows. In
Section 2 we will present the proposed inter mode
decision method. Section 3 illustrates the simulation
results for performance comparison. Finally, a
conclusion is given in Section 4.
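
As a rough illustration of the mode decision based on the Lagrangian cost (1), the sketch below picks the candidate mode with the smallest cost. It assumes that R is the number of bits needed to code the block in a given mode. The names `lagrangian_cost` and `best_mode`, as well as the SSD, rate and multiplier values, are hypothetical and chosen for illustration only; they are not taken from the JM encoder.

```python
def lagrangian_cost(ssd, rate_bits, lam):
    """Cost J = SSD + lambda * R as in (1); R is assumed to be a bit count."""
    return ssd + lam * rate_bits


def best_mode(candidates, lam):
    """Return the mode name with the smallest Lagrangian cost.

    candidates maps a mode name to a (ssd, rate_bits) pair.
    """
    return min(candidates, key=lambda m: lagrangian_cost(*candidates[m], lam))


# Hypothetical measurements for one macroblock (illustrative values only).
candidates = {
    "16x16": (5200, 40),
    "16x8": (4700, 70),
    "8x16": (4900, 68),
}
chosen = best_mode(candidates, lam=20.0)
```

With these made-up numbers the 16x16 mode is chosen: its distortion is the highest, but its lower rate outweighs that once the rate is scaled by the multiplier. Setting the multiplier to zero would instead favour the mode with the smallest SSD.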

0-7803-8603-5/04/$20.00 ©2004 IEEE

2. Efficient inter mode decision


Variable size block motion estimation has been
studied for many years. The main concept is to use
large blocks for homogeneous areas and small blocks
for areas containing complex motions such that the
resulting entropy for the residue and motion vectors
can be minimized. This has been successfully
implemented in the H.264 standard recently, which
results in a better rate distortion performance. However,
exhaustively examining all combinations of different
coding modes for every macroblock is not desirable,
although it is very accurate. If a macroblock
belongs to a homogeneous area, there is little chance that
it will be split into smaller blocks according to the
Lagrangian functional. Therefore it is not necessary to
examine smaller block types for these
macroblocks.
Previously, Huang et al [6] classified the
macroblocks into background, shade motion and edge
motion blocks by a two-stage classifier. In their
method, motion vectors for background blocks were
directly set to zero. For shade motion blocks, a pixel
subsampling technique was used to reduce the
computation of motion estimation. The variable size
block matching was only conducted for edge motion
blocks. In order to classify macroblocks as
homogeneous or non-homogeneous, an edge map was
used in [5]. If the amplitude of the edge vectors within
a block is less than a threshold, the block is classified as
homogeneous. Further, modes for smaller size blocks
are disabled for homogeneous blocks.
Without using a multi-stage classifier or edge
detection, our proposed efficient inter mode decision
method only depends on the absolute differences
between consecutive frames. Generally, the absolute
frame difference contains a lot of information about the
motions in successive frames. Large amplitudes will
appear on the moving edges or boundaries of moving
objects, while small amplitudes appear in homogeneous areas.
So if the amplitudes in a macroblock are small, it is
most likely that this macroblock belongs to a
homogeneous region and using only larger block sizes
in motion estimation will be accurate. Otherwise, this
macroblock may contain complex motions and using
more block types can achieve better rate distortion
performance. In the proposed efficient inter mode
decision algorithm, we first subtract the previous frame
from the current frame. Then the sum of absolute
differences (SAD) for the current macroblock can be calculated
using (2).

$$\mathrm{SAD} = \sum_{i=1}^{N} \sum_{j=1}^{N} \left| x_{i,j} - y_{i,j} \right| \tag{2}$$

where $x_{i,j}$ and $y_{i,j}$ denote the gray levels of pixels at
location (i, j) of the current frame and its previous
frame respectively, and N is the macroblock size, which
is 16. If the SAD is less than a threshold (TH), this
macroblock is considered to be in a homogeneous region
and only large block types (16x16, 16x8, 8x16) will
be used in its motion estimation. Otherwise it belongs
to a moving edge region and the additional four block
types (8x8, 8x4, 4x8 and 4x4) will also be enabled.
The proposed algorithm is summarized as follows:
Step 1: Subtract the previous frame from the current
frame and calculate the SAD for the current
macroblock using (2).
Step 2: Compare the SAD with a threshold (TH).
If SAD < TH go to Step 3;
otherwise go to Step 4.
Step 3: Conduct rate distortion optimized (RDO)
motion estimation and mode decision using the
16x16, 16x8 and 8x16 block types.
Step 4: Conduct rate distortion optimized (RDO)
motion estimation and mode decision using all
7 block types: 16x16, 16x8, 8x16, 8x8,
8x4, 4x8 and 4x4.
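
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the JM6.0 implementation: the function names, the toy frames and the threshold value of 1000 are all assumptions made for the example (the value of TH is not stated here), and the RDO motion estimation of Steps 3 and 4 is represented only by the list of block types that would be passed to it.

```python
LARGE_BLOCK_TYPES = ["16x16", "16x8", "8x16"]
ALL_BLOCK_TYPES = LARGE_BLOCK_TYPES + ["8x8", "8x4", "4x8", "4x4"]
N = 16  # macroblock size


def macroblock_sad(cur, prev, top, left, n=N):
    """Sum of absolute differences over one n x n macroblock (Step 1).

    cur and prev are 2-D lists of gray levels; prev is the previous
    ORIGINAL frame, not the reconstructed one.
    """
    return sum(
        abs(cur[top + i][left + j] - prev[top + i][left + j])
        for i in range(n)
        for j in range(n)
    )


def candidate_block_types(cur, prev, top, left, th):
    """Steps 2 to 4: select the block types to hand to RDO motion estimation."""
    if macroblock_sad(cur, prev, top, left) < th:
        return LARGE_BLOCK_TYPES  # homogeneous region (Step 3)
    return ALL_BLOCK_TYPES  # possible complex motion (Step 4)


# Toy 16x32 frames holding two macroblocks side by side (hypothetical data).
prev = [[100] * 32 for _ in range(16)]
cur = [row[:] for row in prev]
for i in range(16):
    for j in range(16, 32):
        cur[i][j] = 110  # every pixel of the right macroblock changes by 10

static_types = candidate_block_types(cur, prev, 0, 0, th=1000)
moving_types = candidate_block_types(cur, prev, 0, 16, th=1000)
```

In this toy case the left macroblock has a SAD of 0 and is restricted to the three large block types, while the right macroblock has a SAD of 2560 and keeps all seven.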
Note that in calculating the SAD, the previous original
frame is used as the reference rather than the previous
reconstructed frame. This is because coding errors in
the reconstructed frame may affect the accuracy of this
classifier. By employing such a classification algorithm,
the computational complexity of the original encoder is
greatly reduced while the coding efficiency is
maintained. In the next section, simulation results are
discussed to demonstrate the performance of our
method.

3. Simulation results
The proposed mode decision approach was tested
using the first 100 frames from four test video
sequences (Foreman, Carphone, Miss America and
Mother & Daughter), all in QCIF format (176x144). Among
these test videos, Foreman and Carphone have
relatively large motions while the other two have
moderate motions. The experiment was carried out on
the JVT JM6.0 encoder [7]. In the motion estimation, 5
reference frames were enabled with a maximum
search range of 32, and the motion vector resolution was
1/4 pixel. CABAC (Context Adaptive Binary
Arithmetic Coding) was adopted as the entropy coding
method. The Hadamard transform was used to
transform DCT coefficients. The frame rate was 30 fps
and the frame coding structure was IBBP. The
experiments were conducted for four quantization
parameters: QP = 28, 32, 36 and 40.
The PSNR and bitrate comparisons between the
original encoder and the one with the proposed
efficient inter mode decision method are tabulated in
Table 2. The coding results are very similar. Compared
with the original encoder, the PSNR degradations of
the proposed algorithm are no more than 0.08 dB and
in most cases bitrate savings are achieved. On average,
the proposed scheme achieved a 0.2% bitrate saving at
the cost of a 0.04 dB PSNR drop. In other words, the
rate distortion performances of the two methods are
practically the same. The computational complexity comparisons are
given in Table 1. Instead of using 7 block types
for every macroblock in motion estimation, only 4 to 5
block types were actually employed on average. Thus,
on average, about 36% of the computation is saved. Figure 1 and Figure
2 give examples of the classification results and the
final mode decisions for the Foreman and Mother &
Daughter sequences. In Figures 1(b) and 2(b), the white
blocks are the macroblocks with SAD values larger
than the threshold (TH); they represent the areas which
may contain complex motions. Figures 1(c) and 2(c) and
Figures 1(d) and 2(d) show the decoded frames of the
original encoder and the modified one, respectively, with their
coding modes.
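
The link between the average number of block types and the computation saving can be checked with a little arithmetic. The sketch below assumes, as a simplification that the paper does not state, that the motion estimation cost is proportional to the number of block types searched. The per-sequence averages are the values listed in Table 1, taken in row order; the function name `estimated_saving_percent` is hypothetical.

```python
TOTAL_BLOCK_TYPES = 7

# Average number of block types per macroblock, as listed in Table 1.
avg_block_types = {
    "Foreman": 5.04,
    "Carphone": 4.48,
    "Miss America": 4.08,
    "Mother & Daughter": 4.24,
}


def estimated_saving_percent(avg_types, total=TOTAL_BLOCK_TYPES):
    """Saving in percent, ASSUMING cost scales with block types searched."""
    return 100.0 * (1.0 - avg_types / total)


savings = {
    name: round(estimated_saving_percent(avg))
    for name, avg in avg_block_types.items()
}
mean_saving = sum(savings.values()) / len(savings)
```

Under this assumption the estimates come out at 28%, 36%, 42% and 39%, with a mean of 36.25%, which is in line with the roughly 36% average saving quoted above and the 42% maximum quoted in the abstract.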

4. Conclusion

In this paper, an efficient inter mode decision
algorithm for the H.264 standard has been proposed. The
motivation is to predict the potential block types before
conducting the rate distortion optimized motion
estimation. A classifier based on the absolute frame
difference has been employed to decide which block
types are to be enabled. Experimental results show that
the proposed method achieves large computation
savings while maintaining the same rate distortion
performance.

5. References
[1] Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T
VCEG, Working Draft Number 2, Mar. 2002.
[2] Y.K.Tu, J.F.Yang, Y.N.Shen, M.T.Sun, Fast variable
size block motion estimation using merging procedure
with an adaptive threshold, Proc. Int. Conf. Multimedia
and Expo, Vol.2, pp.789-792, Jul. 2003.
[3] A.Chang, O.C.Au, Y.M.Yeung, A novel approach to fast
multi-block motion estimation for H.264 video coding,
Proc. Int. Conf. Multimedia and Expo, Vol.1, pp.105-108,
Jul. 2003.
[4] P.Yin, H.C.Tourapis, A.M.Tourapis, J.Boyce, Fast mode
decision and motion estimation for JVT/H.264, Proc.
Int. Conf. Image Processing, Sept. 2003.
[5] K.P.Lim, S.Wu, D.J.Wu, S.Rahardja, X.Lin, F.Pan,
Z.G.Li, Fast inter mode selection, I020, 9th JVT
meeting, San Diego, USA, Sept. 2003.
[6] S.Y.Huang, J.R.Chen, J.S.Wang, K.R.Hsieh, H.Y.Hsieh,
Classified variable block size motion estimation
algorithm for image sequence coding, Proc. Int. Conf.
Image Processing, Vol.3, pp.736-740, Nov. 1994.
[7] JVT Reference Software unofficial version JM6.0
http://bs.hhi.de/-snehnng/nnl/download/jm6O.zip

Table 1. Complexity comparisons

Sequence            Average number of block types   Computation change (%)
Foreman             5.04                            -28
Carphone            4.48                            -36
Miss America        4.08                            -42
Mother & Daughter   4.24                            -39

Table 2. Comparisons for PSNR and Bitrate

Foreman
QP=28  QP=32  QP=36  QP=40
ΔPSNR(dB)
ΔBitrate(%)

-0.04
-0.04
ΔBitrate(%)
0.9
-0.5
Carphone
QP=28  QP=32
ΔPSNR(dB)
-0.08
0.2
Miss America
QP=28  QP=32
-0.1
-0.08
-0.06
-0.04
0.3
-0.01
-0.6
QP=36  QP=40
0
-0.5
-0.01
-1.1
QP=36  QP=40
-0.03

Mother & Daughter
QP=28  QP=32  QP=36
-0.01
-0.07
-0.03
QP=40
-0.08

Figure 1. Samples of Foreman

Figure 2. Samples of Mother & Daughter

(a) Original frame (b) The classification result based on the SAD thresholding
(c) Decoded frame with corresponding macroblock modes by the original H.264 encoder
(d) Decoded frame with corresponding macroblock modes by the proposed method
