LOW COMPLEXITY AND HIGH EFFICIENCY PREDICTION TECHNIQUES FOR

VIDEO CODING

by

Chung-Cheng Lou

A Dissertation Presented to the


FACULTY OF THE USC GRADUATE SCHOOL


UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)

December 2011

Copyright 2011 Chung-Cheng Lou


UMI Number: 3487980

All rights reserved

UMI 3487980
Copyright 2011 by ProQuest LLC.
All rights reserved. This edition of the work is protected against
unauthorized copying under Title 17, United States Code.

ProQuest LLC.
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106 - 1346
Dedication

To my family.

Acknowledgments

I would like to thank my advisor Professor C.-C. Jay Kuo for his guidance and vision
throughout these years. I also thank my mentor, Dr. Szu-Wei Lee, for sharing his insight and
experience.

I would like to thank my colleagues in the Media Communications Lab. You have broadened my
knowledge of exciting research fields and have always been the best consultants during my graduate
study.

Table of Contents

Dedication ii

Acknowledgments iii

List of Tables vi

List of Figures viii

Abstract xii

Chapter 1: Introduction 1

1.1 Significance of the Research . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Review of Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Complexity Reduction of MV Search . . . . . . . . . . . . . . . . . 3
1.2.2 Coding Efficiency Improvement . . . . . . . . . . . . . . . . . . . . 5
1.3 Contributions of the Research . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 Complexity Reduction of Motion Estimation . . . . . . . . . . . . 6
1.3.2 Improvement of Video Coding Efficiency via MORP . . . . . . . . 7
1.3.3 Generalized Line-Based Intra Prediction in MORP . . . . . . . . . 8
1.4 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Chapter 2: Background Review 11


2.1 H.264/AVC Prediction Techniques . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1 Inter Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.2 Intra Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Search Window Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1 Motion Vector Prediction . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.2 Adaptive Search Range (ASR) Algorithms . . . . . . . . . . . . . . 15
2.3 Efficiency Improvement of Video Coding . . . . . . . . . . . . . . . . . . . 17
2.3.1 Prediction Techniques for HEVC . . . . . . . . . . . . . . . . . . . 17
2.3.2 Multi-Order-Residual (MOR) Video Coding . . . . . . . . . . . . . 21

Chapter 3: Adaptive Motion Search Range Prediction for Video Encoding 24


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Enlarged MV Prediction Set and MV Predictor . . . . . . . . . . . . . . . 26

3.2.1 Limitation of Spatial or Temporal MV Prediction Set . . . . . . . 26
3.2.2 Joint Spatial-Temporal MV Prediction Set . . . . . . . . . . . . . 27
3.3 User-Controlled Search Range Selection . . . . . . . . . . . . . . . . . . . 28
3.3.1 Probability Controlled Search Range Selection . . . . . . . . . . . 29
3.3.2 Search Window Size Controlled SR Selection . . . . . . . . . . . . 32
3.4 Joint MVP / SR Selection Algorithm . . . . . . . . . . . . . . . . . . . . . 33
3.4.1 Relationship between Spatial and Temporal MVPs and Optimal MV 34
3.4.2 Proposed Joint MVP / SR Selection Algorithm . . . . . . . . . . . 37
3.5 Outlier Removal in MV Prediction Set . . . . . . . . . . . . . . . . . . . . 38
3.6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

Chapter 4: Video Coding with Multi-Order-Residual Prediction 55


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 H.264 Coding Efficiency Analysis . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.1 Performance Analysis of H.264/H.264 Prediction . . . . . . . . . . 57

4.2.2 Analysis Between Prediction Modes and Coding Efficiency . . . . . 59
4.2.3 Reasons for Higher Intra Mode Percentages . . . . . . . . . . . . . 60
4.3 Prediction Residual Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.4 Proposed MORP Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.4.1 Proposed MORP Coding Architecture . . . . . . . . . . . . . . . . 67
4.4.2 Techniques in the First Order Prediction . . . . . . . . . . . . . . . 68
4.4.3 Techniques in the Second Order Prediction . . . . . . . . . . . . . 69
4.4.4 Rate-Distortion Optimization . . . . . . . . . . . . . . . . . . . . . 70
4.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

Chapter 5: Generalized Line-Based Intra Prediction (GLIP) 79


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

5.2 Prediction Residue Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 80


5.3 Generalized Line-based Intra Prediction (GLIP) . . . . . . . . . . . . . . . 82
5.4 VQ-based GLIP (VQGLIP) . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

Chapter 6: Conclusion and Future Work 102


6.1 Summary of the Research . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2 Future Research Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

Bibliography 106

List of Tables

3.1 The cumulative probability of the normalized modified Cauchy function


with zero mean and unit variance. . . . . . . . . . . . . . . . . . . . . . . 31

3.2 Variance of the joint spatial-temporal MV prediction set $\sigma_{st}^2$ with white
Gaussian noise applied to the CIF Foreman sequence. . . . . . . . . . . . 40

3.3 The distribution of random variable |δ| under different noise variances. . . 42

3.4 Experimental set-up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43


3.5 Comparison of the variance of MVPD with different MVPs as the search
window center. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.1 The bit allocation when the Blue Sky HD (1920x1080) test sequence is
encoded by only inter or intra prediction at QP=20. . . . . . . . . . . . . 58

4.2 Mode distribution and bit rate comparison when various sequences are
encoded at QP=12. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

4.3 The bit reduction comparison of intra and inter modes with Tractor test
sequence of resolution 1920x1080 and QP = 28. . . . . . . . . . . . . . . . 74

4.4 The coding efficiency comparison of the proposed MORP scheme and
H.264/AVC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5.1 Comparison of bit-rate reduction of intra and inter modes of the H.264/AVC
and the MORP scheme for test sequence Tractor of resolution 1920x1080
and QP = 28. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

5.2 Number of possible GLIP pixels and modes for each prediction direction. 84

5.3 The first 9 Exp-Golomb codewords. . . . . . . . . . . . . . . . . . . . . . . 86

5.4 Number of possible combinations for each prediction direction. . . . . . . 87

5.5 Edge directions and their boundaries of H.264/AVC intra prediction modes. 91

5.6 The average bit-rate saving of Fig. 5.14. . . . . . . . . . . . . . . . . . . . 95

5.7 The performance comparison of intra and inter modes when the Foreman
test sequence of CIF resolution is encoded with QP = 12. . . . . . . . . . 98

5.8 The coding efficiency comparison of Fig 5.17 . . . . . . . . . . . . . . . . . 99

List of Figures

2.1 The intra prediction modes for a MB: (a) 4x4 modes, and (b) 16x16 modes. 13

2.2 MV candidates used in the MVP calculation. . . . . . . . . . . . . . . . . 15

2.3 Geometry motion partition. . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.4 Directional subsets for intra coding: (a) the directional subset in H.264/AVC;
(b) the new directional subset after rotating counter-clockwise with 12.5°;
(c) all directions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5 RIP partition and prediction. . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.6 The block diagram of the MOR coding scheme. . . . . . . . . . . . . . . . 21


2.7 The frequency domain prediction for higher order prediction residuals. . . 22

3.1 The 2D plot of histograms of $MVPD_s$ and $MVPD_t$ in logarithmic magnitude,
with the CIF Foreman sequence (300 frames, 30fps and QP = 28):
(a) the x-component and (b) the y-component. . . . . . . . . . . . . . . . 27

3.2 MV candidates used in the proposed MVP calculation. . . . . . . . . . . . 28

3.3 Illustration of the geometrical relationship between the MVPD and the SR. 29

3.4 The contour plot of the cumulative probability of the 2D modified zero-
mean Cauchy function equal to 0.1, 0.2, · · · , 0.9, where the x- and y-axes
denote the SR in the x- and y-directions. . . . . . . . . . . . . . . . . . . . 32

3.5 The MVPD histograms of Football sequence (CIF frame size, 260 frames)
using full search with two initial search candidates: (a) the origin and (b)
the spatial median $MVP_s$. . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.6 The joint histogram of $MVPD_{spat}$ and $MVPD_{temp}$ based on the CIF
Football test sequence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.7 The histogram of the x-component of the value $MV_{final} - 0.5 \times (MVP_{spat} + MVP_{temp})$:
(a) in zone (1) and (b) along line $MVPD_{spat} - MVPD_{temp} = 15$. . . . . . 36

3.8 The plot of the variance of random variable $\delta_x$, which is the x-component
of value $MV_{final} - 0.5 \times (MVPD_{spat} + MVPD_{temp})$, as a function of the
x-component of $MVPD_{spat} - MVPD_{temp}$. . . . . . . . . . . . . . . . . . 36

3.9 The conditional variance of the MVPD with respect to given $\Delta MVP$
with the CIF Foreman sequence (300 frames, 30fps and QP = 28): (a)
the x-component and (b) the y-component. . . . . . . . . . . . . . . . . . 37
3.10 The 2D MV histogram plots of $\sigma_{st}^2$ and $MVPD_{st}$ in logarithmic magnitude
with the CIF Foreman sequence (300 frames, 30fps and QP = 28): (a)
the x-component and (b) the y-component. . . . . . . . . . . . . . . . . . 38

3.11 The variance of $MVPD_{st}$ as a function of $\sigma_{st}^2$ with the CIF Foreman
sequence (300 frames, 30fps and QP = 28): (a) the x-component and (b)
the y-component. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.12 The histogram of MV differences for noisy CIF Foreman sequences of three
noise variance levels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.13 The distribution of $\delta = (MV - \mu_{st})$ for the coding of the CIF Foreman
sequence with additive white Gaussian noise: (a) the x-component and
(b) the y-component. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.14 Performance comparison of the SR selection in 3.3.1 by the Efficiency-


Complexity curve for ’Football’ QCIF sequence. . . . . . . . . . . . . . . . 45

3.15 Performance comparison of joint MVP/SR selection in 3.4 with the Efficiency-
Complexity curve with ’Football’ QCIF sequence as the test sequence. . . 46

3.16 Comparison of (a) the RD performance and (b) the averaged search win-
dow size for the Foreman test sequence with respect to the case with a
fixed search window size. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.17 Comparison of (a) the RD performance and (b) the averaged search win-
dow size for the Crew-QCIF test sequence. . . . . . . . . . . . . . . . . . . 47

3.18 Comparison of (a) the RD performance and (b) the averaged search win-
dow size for the Football-QCIF test sequence. . . . . . . . . . . . . . . . . 48

3.19 Comparison with two types of search window decision constraint. (QCIF
’Foreman’ test sequence) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.20 Comparison of the average SR size (dashed lines) and the RD performance
(solid lines) between the adaptive and fixed SR selection algorithms. . . . 52

3.21 Comparison of the average SR size (dashed lines) and the RD performance
(solid lines) between the adaptive and fixed SR selection algorithms with
IBP as GOP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.22 Comparison of (a) the average SR size and (b) the RD performance for the
CIF Foreman sequence under three probability threshold values $T_{prob}$ =
91%, 92%, or 93%. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.23 Comparison of (a) the average SR size and (b) the RD performance for
the proposed adaptive SR selection algorithm with three benchmarking
algorithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.24 Variation of average SR size along the frame index under Foreman test
sequence with QP = 28. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3.25 Comparison of (a) SR values and (b) the RD performance for $MVP_{st1}$
and $MVP_{st2}$ with and without outlier removal. . . . . . . . . . . . . . . . 54

4.1 A sample prediction residual frame from the Blue Sky HD sequence when
the frame is coded by (a) inter prediction, or (b) intra prediction. . . . . . 58

4.2 Sample frames from four different video sequences. . . . . . . . . . . . . . 61

4.3 Mode distribution and the bit rate of the Rush Hour HD sequence under
different QPs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

4.4 A sample frame and its prediction residual from the Riverbed HD sequence
with QP = 12. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4.5 Histogram of a sample frame from the Riverbed HD sequence with QP = 12. . . 66

4.6 Overview of the MORP coding scheme compared with H.264: (a) H.264,
and (b) MORP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

4.7 The block diagram of the proposed MORP coding scheme. . . . . . . . . . 68

4.8 Block based intra prediction with 2D translation. . . . . . . . . . . . . . . 69

4.9 Rate-distortion comparison between the proposed MORP scheme and the
benchmark codec for the Riverbed sequence with resolution 1920x1080
and intra frame coding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4.10 Rate-distortion curves for different 1920x1080 HD sequences with I frame


encoding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

4.11 Rate-distortion curves for different 1920x1080 HD sequences with I and P
frame encoding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

5.1 Visual comparison of two prediction residuals of test sequence “Foreman”


of the CIF resolution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

5.2 Comparison of histograms of two prediction residuals of test sequence


“Foreman” of the CIF resolution. . . . . . . . . . . . . . . . . . . . . . . . 81

5.3 Close-up view of the inter-prediction residual from Fig. 5.1(b). . . . . . . 82

5.4 H.264/AVC intra prediction samples and directions. . . . . . . . . . . . . 83

5.5 Illustration of the proposed Generalized Line-based Intra Prediction (GLIP)


algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

5.6 The histogram of effective pixel range r for the CIF Foreman test sequence
coded by GLIP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

5.7 (a) Number of GLIP modes for each prediction direction. (b) Histogram of
the prediction direction when the Foreman test sequence of CIF resolution
is encoded by GLIP algorithm. . . . . . . . . . . . . . . . . . . . . . . . . 87

5.8 The procedure to generate a VQGLIP codebook and prediction modes. . . 88


5.9 The codebook for 4 × 4 blocks of size 32. . . . . . . . . . . . . . . . . . . . 90

5.10 An example of the weighted edge direction histogram of a block of size 4 × 4. 92

5.11 The involved pixels along the vertical left direction (mode 7) from neighbor
pixel with index 9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

5.12 The codebook of prediction modes and its 4 × 4 binary VQ patterns in


Fig. 5.9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

5.13 Illustration of the histogram of the VQ modes for the CIF Foreman test
sequence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

5.14 The rate-distortion performance of the VQGLIP algorithm as a function


of codebook sizes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

5.15 Bit allocation and distortion comparison of the CIF Foreman test sequence
(QP = 12). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

5.16 Residual signal serving as the second order input as well as the second
order prediction signal when different prediction methods are adopted. . . 97

5.17 The rate-distortion comparison between H.264/AVC and VQGLIP. . . . . 100

Abstract

Video compression has been extensively studied in the last two decades. The success of a
coding algorithm relies on the effective removal of spatial and temporal redundancies in
input video sequences. On the other hand, effective spatial and temporal prediction techniques
demand high computational complexity, which makes them challenging to implement
on resource-limited mobile devices. This research focuses on two topics: 1) complexity
reduction of temporal prediction without significant rate-distortion (RD) performance
degradation; and 2) the development of a more effective spatial prediction technique to
enhance the RD performance.
For the first topic, complexity reduction in temporal prediction is achieved by the
development of an adaptive motion search range (SR) selection algorithm. A good choice
of the SR size helps reduce memory access bandwidth while maintaining the RD coding
performance. To begin with, we obtain a motion vector predictor (MVP) for a target block
based on the motion vectors (MVs) of its spatially and temporally neighboring blocks, which
form an MV prediction set. Then, we relate the variance of the MV prediction set to the
SR. That is, a larger variance implies lower accuracy of the MVP and a larger SR. Finally,
we derive a probability model for the motion vector prediction difference (MVPD), the
difference between the optimal MV and the MVP, to quantify the probability for a chosen
SR to contain the optimal MV. The superior performance of the proposed SR selection
algorithm is demonstrated by experimental results.
For the second topic, a novel multi-order-residual-prediction (MORP) coding approach
is proposed to improve spatial prediction efficiency in video coding. We observe that the
compression ratio of a video coding algorithm depends on the nature of sequences as
indicated by the ratio between inter and intra blocks in the bit-stream. When the per-
centage of intra blocks increases, the prediction efficiency decreases, thus leading to a
poorer coding gain. In other words, one bottleneck of video coding lies in poor intra
prediction efficiency. To address this issue, we propose an MORP coding scheme that
adopts a second-order prediction scheme after the traditional first-order prediction. Dif-
ferent prediction techniques are adopted in different stages to tailor to the nature of the
corresponding residual signals. The proposed MORP scheme outperforms H.264/AVC
for the intra block coding and, thus, improves the overall coding efficiency.
Finally, we analyze the prediction inefficiency of the proposed MORP scheme and present
an enhanced intra prediction coding called the generalized line-based intra prediction
(GLIP) to improve it. The GLIP allows partial prediction of a coding block by enabling
a subset of the neighboring prediction pixels. The residual signal after the first order
prediction consists of the local line structure while the GLIP is designed to exploit this
feature. The vector quantization (VQ) technique is used to approximate and encode
the shape of the binarized residual signal. The coded patterns of image residuals can
be predicted using partial line structures based on known neighboring pixel values. A
variable length coding scheme is adopted to encode the codewords of the VQ codebook
to reduce the prediction overhead. The proposed GLIP algorithm effectively reduces
the residual bits for both intra- and inter- blocks and outperforms H.264/AVC by a
significant margin.

Chapter 1

Introduction
1.1 Significance of the Research

We have witnessed the rapid growth of digital video applications in the last two decades
in the entertainment, education and telecommunication industries. Video applications
enrich information sharing among people, visualize interpersonal communication and
improve social networking. Advances in digital technologies have stimulated research
on effective coding techniques for recording and transmission of the visual information.
Among various technologies, digital video compression plays a key and fundamental role
since a huge amount of visual data have to be transmitted and stored efficiently.
The demand for high resolution video applications has further pushed the need for
effective compression. For low-resolution mobile applications and video conferencing, the
raw data rate of the 352x288 Common Intermediate Format (CIF) is 36.5 Mbps. This
is already higher than the capacity of most existing network infrastructures. Recently,
we have seen the popularity of the high-definition television (HDTV) of resolution 720P
(1280x720 progressive) and 1080P (1920x1080 progressive) with raw data rates of 331.8
Mbps and 746.5 Mbps, respectively. It is forecast in [1] that video data will soon
occupy most of the network traffic. That is, various forms of video (TV, Video on
Demand, Internet Video, and Peer to Peer) will exceed 91% of the traffic in the global
consumer networks by year 2014. Although the network bandwidth and the storage
capability will continue to increase, the growing rate of video contents appears to be
even faster.
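
The raw data rates quoted above can be reproduced with a short calculation. The sketch below assumes 8-bit 4:2:0 sampling (12 bits per pixel on average) and 30 frames per second. These parameters are not stated in the text; they are chosen here because they reproduce the quoted figures.

```python
def raw_rate_mbps(width, height, fps=30, bits_per_pixel=12):
    """Uncompressed bit rate in Mbps.

    Assumed (not from the text): 8-bit 4:2:0 video, i.e. 12 bits per pixel,
    at 30 frames per second.
    """
    return width * height * bits_per_pixel * fps / 1e6


formats = {"CIF": (352, 288), "720P": (1280, 720), "1080P": (1920, 1080)}
for name, (w, h) in formats.items():
    print(f"{name}: {raw_rate_mbps(w, h):.1f} Mbps")
```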
An emerging video coding standard, H.264/AVC, has been developed jointly by the
ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts
Group (MPEG) [45]. Under the same block-based motion-predictive coding framework
of previous standards, it incorporated several new techniques to improve the coding
gain. It outperforms previous video coding standards in a wide range of applications
and has become the state-of-the-art coding method.
H.264/AVC employs the block-based hybrid coding scheme to achieve high coding
efficiency. A frame is divided into smaller macroblocks (MBs) or blocks of various sizes.
The spatial redundancy is removed using the coded neighbor pixels of the same frame
while the temporal redundancy between frames is removed by motion-compensated pre-
diction (MCP). By subtracting the prediction signal from the input video signal, we
obtain the residual signal with significantly reduced energy. The prediction residue is
further transformed via the discrete cosine transform (DCT), and the DCT coefficients
are quantized and coded by an entropy coder.
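
The prediction, transform and quantization steps described above can be sketched for a single block. This is an illustration only: the 4x4 block size, the floating-point orthonormal DCT-II (standing in for the codec's actual transform), the uniform quantizer and the names `encode_block` and `qstep` are assumptions, and entropy coding is omitted.

```python
import math

N = 4  # illustrative block size (assumption)


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix; row k holds the k-th basis vector."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def encode_block(block, prediction, qstep):
    """Subtract the prediction, apply a 2D DCT, and quantize uniformly."""
    residual = [[block[i][j] - prediction[i][j] for j in range(N)]
                for i in range(N)]
    c = dct_matrix(N)
    coeffs = matmul(matmul(c, residual), transpose(c))  # C * R * C^T
    levels = [[round(v / qstep) for v in row] for row in coeffs]
    return residual, levels


# A flat block predicted almost perfectly: only the DC level is non-zero.
block = [[52] * N for _ in range(N)]
prediction = [[50] * N for _ in range(N)]
residual, levels = encode_block(block, prediction, qstep=4)
print(levels)
```

A good prediction leaves a low-energy residual, so most quantized levels are zero and cost very few bits after entropy coding.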
Due to the success of H.264/AVC, efforts have been made to extend this standard
with three key extensions. Scalable Video Coding (SVC) [34] addresses the need of
scalability among different bit rates of the same video content. Multiview Video Coding
(MVC) [29] targets joint compression of multiple views of the same scene. Fidelity
Range Extension (FR-Ext) [38] supports recent demands for higher resolution video
contents. More recently, High Efficiency Video Coding (HEVC) has been proposed, aiming at
half the bit rate of its predecessor H.264 while offering the same visual quality. Further
discussion on recent video coding technologies will be given in Chapter 2.
Video coding is an essential component in hand-held devices due to the popularity
of phone cameras. Powered by batteries, hand-held devices are limited in their com-
putational resource and memory access bandwidth. Reduction of power consumption
in a mobile communication environment [51] is a critical research problem. It is desirable
to lower the encoder complexity, which increases monotonically with the power consumed
by coding [13].
Among various modules in video codecs, motion estimation (ME) is the most com-
putationally expensive one. It was reported by researchers [11], [13] that about one third
of processor cycles and 90% of memory access are dedicated to the ME. There are broad
research activities in lowering the ME complexity. However, complexity reduction tends
to come along with severe quality degradation, as fewer motion vector (MV) candidates
are examined by the ME. Reducing the computational complexity of the ME while pre-
serving good coding performance is a must. This issue will be addressed in Chapter
3.
We have also witnessed a surging demand for high definition (HD) and high fidelity
video contents. The TV resolution has migrated from the Standard Definition (SD) of
resolution 640x480 to HD. In the near future, the Ultra High Definition (UHD) video
of resolution 7680x4320 will enter the market for better home theater experience. Being
pushed by higher resolution, content fidelity becomes one of the major goals of digital content
delivery in home entertainment systems. New video capture and display technology will
enable a quantum leap in video quality, yet the network infrastructure becomes the main
bottleneck to deliver HD video contents economically to the end user. This demand drives
the need for a better video compression technology to reach a much higher coding gain
than the existing H.264/AVC standard. This topic will be examined in Chapter 4 and
Chapter 5.

1.2 Review of Previous Work

1.2.1 Complexity Reduction of MV Search

Quite a few techniques have been proposed to reduce the computational complexity
of ME in the past [14]. The number of search points was reduced in [41] under the
assumption that the distortion monotonically increases as the search position moves away
from the point corresponding to the minimum distortion. In that case, convergence to
the optimal position can still be achieved without matching all candidates. Computation
is thus significantly reduced by decimation of search positions.
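
To illustrate how the monotonic-distortion assumption reduces the number of search points, the sketch below descends greedily over the four nearest neighbours of the current best position. It is not the specific pattern of [41]; the function name `greedy_search`, the 4-neighbour pattern and the synthetic cost function are assumptions for illustration.

```python
def greedy_search(cost, start=(0, 0), search_range=16):
    """Move to the cheapest 4-neighbour until no neighbour lowers the cost.

    Returns the final position and the number of cost evaluations. Positions
    already visited are evaluated again, since nothing is cached.
    """
    best, best_cost, evaluated = start, cost(start), 1
    while True:
        neighbours = [(best[0] + dx, best[1] + dy)
                      for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if max(abs(best[0] + dx), abs(best[1] + dy)) <= search_range]
        if not neighbours:
            return best, evaluated
        evaluated += len(neighbours)
        step_cost, step = min((cost(p), p) for p in neighbours)
        if step_cost >= best_cost:
            return best, evaluated
        best, best_cost = step, step_cost


# Synthetic distortion that grows monotonically away from the optimum (3, -2).
def distortion(mv):
    return abs(mv[0] - 3) + abs(mv[1] + 2)


mv, evaluated = greedy_search(distortion)
full_search_points = (2 * 16 + 1) ** 2
print(mv, evaluated, full_search_points)
```

When the distortion surface is not monotonic, the same procedure can stop at a local minimum, which is the failure case for fast motion noted later in this section.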

The criterion for the best matched block can also be simplified. In the block matching
algorithm (BMA), all MV candidates are evaluated by their distortion, and the best
MV is transmitted with the prediction residue to the decoder. The distortion is mostly
evaluated by the sum of absolute differences (SAD). In [44], the distortion function
is simplified so that only sub-sampled pixels are taken into account to estimate the
distortion in both horizontal and vertical directions. Besides, pixels to be matched are
pre-selected based on the feature importance in determining a match.
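
As a minimal sketch of the distortion measures discussed above, the function below computes the SAD over all pixels or over a regularly sub-sampled grid. The regular grid and the `step` parameter are assumptions for illustration; the exact sub-sampling pattern and the feature-based pixel selection of [44] are not reproduced here.

```python
def sad(cur, ref, step=1):
    """Sum of absolute differences over every `step`-th pixel in both directions.

    step=1 gives the full SAD; step=2 visits one pixel in four.
    """
    return sum(abs(cur[y][x] - ref[y][x])
               for y in range(0, len(cur), step)
               for x in range(0, len(cur[0]), step))


cur = [[4 * y + x for x in range(4)] for y in range(4)]
ref = [[v + 1 for v in row] for row in cur]  # every pixel differs by 1
print(sad(cur, ref), sad(cur, ref, step=2))
```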
The full search ME algorithm is commonly used in hardware implementation. The
speed-up of full search can be accomplished by hierarchical search [22], also known as
multi-resolution search. The pyramid multi-resolution scheme is based on the idea of
predicting an initial estimate at the coarse level followed by refining the estimate at the
fine level. Usually two- or three-level hierarchical search is adopted, and the search range
at the fine level is much smaller than the original search range.
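
A two-level version of this coarse-to-fine idea can be sketched as follows. The 2x2 averaging filter, the refinement radius of one pixel, the function names and the synthetic test frames are all assumptions made for illustration, not details taken from [22].

```python
def downsample(img):
    """Halve the resolution by averaging 2x2 neighbourhoods."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, len(img[0]) - 1, 2)]
            for y in range(0, len(img) - 1, 2)]


def block_sad(cur, ref, bx, by, size, mvx, mvy):
    """SAD between the block at (bx, by) in cur and its displaced match in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + mvy + y][bx + mvx + x])
               for y in range(size) for x in range(size))


def full_search(cur, ref, bx, by, size, center, radius):
    """Exhaustive search of a (2*radius+1)^2 window centred on `center`."""
    h, w = len(ref), len(ref[0])
    best = None
    for mvy in range(center[1] - radius, center[1] + radius + 1):
        for mvx in range(center[0] - radius, center[0] + radius + 1):
            inside = (0 <= bx + mvx and bx + mvx + size <= w
                      and 0 <= by + mvy and by + mvy + size <= h)
            if inside:
                cost = block_sad(cur, ref, bx, by, size, mvx, mvy)
                if best is None or cost < best[0]:
                    best = (cost, (mvx, mvy))
    return best[1]


def hierarchical_search(cur, ref, bx, by, size, radius):
    """Estimate the MV at half resolution, then refine by +/-1 at full resolution."""
    coarse = full_search(downsample(cur), downsample(ref),
                         bx // 2, by // 2, size // 2, (0, 0), radius // 2)
    return full_search(cur, ref, bx, by, size, (2 * coarse[0], 2 * coarse[1]), 1)


# Synthetic frames: the current frame is the reference displaced by (4, -2).
def texture(x, y):
    return x * x + 50 * y * y


ref = [[texture(x, y) for x in range(32)] for y in range(32)]
cur = [[texture(x + 4, y - 2) for x in range(32)] for y in range(32)]
print(hierarchical_search(cur, ref, bx=8, by=8, size=8, radius=8))
```

The coarse level examines a window of half the radius on blocks with a quarter of the pixels, and the fine level examines only a 3x3 window, which is the source of the speed-up.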
Predictive MV search methods were proposed in [4] and [22], which identified several
MVs as good initial search positions. For video sequences with fast motion, heuristic
search algorithms perform poorly due to the frequent failure of the monotonically increasing
distortion assumption. Predictive methods, on the other hand, utilize the motion
information in the spatial or temporal neighboring blocks to form an initial estimate of
the optimal MV. They can effectively reduce the search area. A good MV predictor can
improve the coding performance since the overhead of the MV prediction difference (the
difference between the final MV and the predictor) is much less than transmitting the
original MV directly.
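
The relation between the predictor and the transmitted difference can be sketched as below. A component-wise median of three neighbouring MVs is used as the predictor; this choice, the neighbour values and the function names are assumptions for illustration rather than the methods of [4] or [22].

```python
from statistics import median


def median_mvp(neighbour_mvs):
    """Component-wise median of the neighbouring blocks' motion vectors."""
    xs, ys = zip(*neighbour_mvs)
    return (median(xs), median(ys))


def mvpd(final_mv, mvp):
    """MV prediction difference: the quantity sent instead of the MV itself."""
    return (final_mv[0] - mvp[0], final_mv[1] - mvp[1])


neighbours = [(4, -2), (5, -1), (3, -2)]  # hypothetical neighbouring MVs
predictor = median_mvp(neighbours)
difference = mvpd((5, -2), predictor)
print(predictor, difference)
```

When the predictor is close to the final MV, the difference has small components, which cost fewer bits than the MV itself and justify a small search window around the predictor.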
To ease the burden of memory access bandwidth caused by reference pixel loading
during full search, data reuse methods were proposed in [43] and [5]. Depending on the
available memory size, four levels of memory reuse can be achieved where reference pixels
are reused among different MBs to be examined by the BMA. This type of algorithm
does not reduce the complexity of the ME. However, the reference frame transmission
bandwidth between the memory and the local cache can be effectively reduced.

1.2.2 Coding Efficiency Improvement

The H.264/AVC video coding standard targets the low bit-rate range in the context
of mobile communication. To achieve better coding performance at higher bit-rates,
new technologies have been proposed in the emerging video coding standard called high
efficiency video coding (HEVC). For higher bit-rates and larger spatial resolutions, one
of the key elements to reach better coding performance is to introduce a larger block size
with a flexible partition scheme. To achieve this goal, HEVC defines coding units (CUs)
of a variable size. The CU replaces the macroblock structure used in the previous video
coding standards.
As for prediction techniques, in addition to the classical horizontal and vertical
partitioning of a MB, the Geometric Block Partitioning (GEO) mode includes another
kind of motion partition [12], which divides a block into two regions by a straight line so
that different prediction methods can be applied to different regions. As an extension of
GEO, a block can be further divided into three or more regions with multiple geometric
lines.
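
A straight-line partition of a block can be sketched as a binary mask. The parameterization by an angle and an offset measured from the block centre, and the names `geo_mask` and `geo_predict`, are assumptions for illustration and may differ from the exact GEO definition in [12].

```python
import math


def geo_mask(size, theta_deg, rho):
    """Binary mask of a size x size block split by a straight line.

    The line is x*cos(theta) + y*sin(theta) = rho, with (x, y) measured from
    the block centre. Pixels on or beyond the line get 1, the others 0.
    """
    c = (size - 1) / 2
    t = math.radians(theta_deg)
    return [[1 if (x - c) * math.cos(t) + (y - c) * math.sin(t) - rho >= 0 else 0
             for x in range(size)] for y in range(size)]


def geo_predict(pred_a, pred_b, mask):
    """Take region 1 from one prediction and region 0 from the other."""
    return [[a if m else b for a, b, m in zip(row_a, row_b, row_m)]
            for row_a, row_b, row_m in zip(pred_a, pred_b, mask)]


mask = geo_mask(4, theta_deg=0, rho=0)  # vertical split through the centre
pred = geo_predict([[10] * 4] * 4, [[20] * 4] * 4, mask)
print(mask[0], pred[0])
```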
As to intra prediction, there are eight directional predictors and one non-directional
predictor in H.264/AVC. To extend the directional predictors further, one can rotate
the predictors of H.264/AVC to obtain multiple predictor sets [25] and improve prediction
accuracy.
HEVC defines two context-adaptive entropy coding schemes [37], one for the lower-
complexity mode and the other for the higher-efficiency mode. The lower-complexity
entropy coder is based on a variable length code (VLC) table selection for all syntax
elements, with a particular code table selected in a context-dependent fashion based on
previously decoded values. The higher-efficiency entropy coder uses a binarization and
context adaptation mechanism similar to the CABAC entropy coder of H.264/AVC. Yet,
it uses a set of variable-length-to-variable-length codes instead of an arithmetic coding
engine. This idea is implemented by employing a bank of parallel VLC coders, each of
which is responsible for a certain range of probabilities of binary events. While its coding
performance is similar to that of CABAC, it can be better parallelized and has higher
throughput per processing cycle in either software or hardware implementation.

1.3 Contributions of the Research

The previous section identified two research problems: ME complexity reduction and
coding efficiency improvement. Complexity reduction is addressed in Chapter 3, while
coding efficiency improvement is discussed in Chapters 4 and 5. The major contributions
of this research are summarized below.

1.3.1 Complexity Reduction of Motion Estimation

Most existing low-complexity ME algorithms aim at reducing the number of search points
through search patterns, or at reducing memory accesses through data reuse. Although the
complexity and the memory bandwidth can be significantly reduced by these techniques,
we address the problem from another angle, i.e., reducing the MV search window size.
Specific contributions include the following.

1. To develop a good motion vector predictor (MVP), existing spatial and temporal
prediction techniques are reviewed, with the conclusion that each method has
its own limitations. However, these two types of predictors can complement
each other in prediction performance.

2. A better MVP is proposed to offer a more accurate estimate of the optimal MV. It
takes both spatial and temporal predictions into account, using nine temporally
related MVs and four spatially related MVs. A more accurate MVP is obtained
by taking their mean value.

3. In contrast to a fixed search range (SR) pre-selected by the user, an automatic SR
selection algorithm is proposed in our work. The 2D MV prediction difference
(MVPD), defined as the difference between the final MV and the MVP, is modeled
by a probability function. By correlating the MVPD distribution with the SR, one can
specify the SR by considering either the probability or the complexity constraint,
depending on the target application. In addition, the SR is no longer a square but
a rectangle so that it can meet different search requirements along the horizontal
and vertical directions.

4. We show that the accuracy of the MVP is related to the SR. Basically, the MVPD
tends to be larger when the variance of the prediction set is larger. Thus, the
variance of the prediction set can be used to update the probability function
parameters of the SR model. By doing so, the SR size adapts to the accuracy of the
MVP to achieve a better balance between the complexity and the RD performance.

5. Noise in the video signal can push predicted MVs far away from the optimal MVs,
which results in a larger SR. To address this problem, outlier detection and removal
are used to make the prediction of the SR more robust.

6. The proposed joint MVP and SR selection algorithm reduces the complexity of
ME significantly with negligible picture quality degradation, as demonstrated by
extensive experimental results. At the same time, the proposed solution allows
flexible user control in finding a proper balance between complexity and performance
degradation.
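
The steps listed above can be sketched in a few lines of code. The example below is only
an illustration under stated assumptions: the median-absolute-deviation outlier rule, the
zero-mean Laplacian model for each MVPD component, the use of the candidate variance as
the model variance, and all function names and default values are hypothetical choices
made here; the actual model and update rules are developed in Chapter 3.

```python
import math
from statistics import mean, median, pvariance

def remove_outliers(values, k=3.0, min_scale=1.0):
    """Drop values farther than k * MAD from the median (hypothetical rule)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = max(mad, min_scale)  # avoid a zero threshold when most values agree
    return [v for v in values if abs(v - med) <= k * scale]

def predict_mv(candidates, k=3.0):
    """Return the mean MV and per-component variance after outlier removal."""
    xs = remove_outliers([mv[0] for mv in candidates], k)
    ys = remove_outliers([mv[1] for mv in candidates], k)
    mvp = (round(mean(xs)), round(mean(ys)))
    return mvp, (pvariance(xs), pvariance(ys))

def search_range(variance, prob=0.95, min_sr=1, max_sr=32):
    """Smallest half-width r with P(|MVPD| <= r) >= prob.

    Assumes a zero-mean Laplacian with variance 2*b*b, so that
    P(|d| <= r) = 1 - exp(-r / b).
    """
    b = math.sqrt(variance / 2.0)
    r = math.ceil(-b * math.log(1.0 - prob))
    return max(min_sr, min(max_sr, r))

# Thirteen candidate MVs (e.g., nine temporal and four spatial), one of them noisy.
candidates = [(4, 0), (5, 1), (4, 0), (6, -1), (5, 0), (4, 1), (5, 0),
              (6, 0), (5, -1), (4, 0), (5, 1), (40, -20), (5, 0)]
mvp, (var_x, var_y) = predict_mv(candidates)
sr_x, sr_y = search_range(var_x), search_range(var_y)
print(mvp, sr_x, sr_y)
```

Because the horizontal and vertical ranges are computed separately, the search window is a
rectangle rather than a square. Raising `prob` trades complexity for a lower risk of missing
the optimal MV, which mirrors the user control mentioned in item 6.
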

1.3.2 Improvement of Video Coding Efficiency via MORP

A novel Multi-Order-Residual-Prediction (MORP) coding scheme is presented in Chapter 4
to reduce the residual energy and improve the overall coding efficiency. The
contributions in this part are given below.

1. To better understand H.264/AVC, we compare the efficiency of inter prediction
and intra prediction and show that the number of bits needed to encode an intra block
is about three times that needed to encode an inter block at the same quality
level. Basically, the coding gain is poorer when the bit stream is dominated by
intra blocks. Furthermore, the reasons for a larger number of intra coded MBs
are discussed, including irregular MVs, the frame resolution and the quantization
parameter.

2. To improve prediction efficiency for intra blocks, the prediction residue after
inter/intra prediction in H.264/AVC is examined. It is found that the residue
still contains some structure that reflects local image features.

3. A novel Multi-Order-Residual-Prediction (MORP) scheme is proposed to reduce
spatial redundancy more effectively. It consists of two cascaded prediction modules.
The first-order predictor targets the prediction of the original video signal, while
the second-order predictor is tailored to the characteristics of the residue of the
first-order predictor.

4. The proposed MORP coding scheme offers an excellent coding gain if the video
sequence is dominated by intra modes. When 1920x1080 HD test sequences are
encoded by intra modes, the MORP scheme reduces the bit rate by 21.64% on
average. When both intra and inter modes are enabled, the average bit rate
reduction is 10.25%.
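
The cascade of two predictors described in item 3 can be illustrated with a toy example.
The sketch below is a simplified, lossless illustration only: the function names and the
sample values are hypothetical, the two predictions are supplied as inputs rather than
derived from neighboring pixels, and transform and quantization are ignored.

```python
def residual(signal, prediction):
    """Element-wise difference between a signal and its prediction."""
    return [s - p for s, p in zip(signal, prediction)]

def energy(values):
    """Sum of squared values, used here as a simple measure of residual energy."""
    return sum(v * v for v in values)

def morp_encode(block, first_pred, second_pred):
    """Return the first-order and second-order residues of a block."""
    r1 = residual(block, first_pred)   # residue of the first-order predictor
    r2 = residual(r1, second_pred)     # residue left after predicting r1
    return r1, r2

def morp_decode(first_pred, second_pred, r2):
    """Rebuild the block from both predictions and the second-order residue."""
    return [p1 + p2 + e for p1, p2, e in zip(first_pred, second_pred, r2)]

block = [52, 55, 61, 66]
first_pred = [50, 50, 50, 50]    # e.g., a flat first-order prediction
second_pred = [3, 6, 10, 15]     # assumed prediction of the remaining structure
r1, r2 = morp_encode(block, first_pred, second_pred)
print(energy(r1), energy(r2))
```

When the first-order residue still carries structure that the second-order predictor can
capture, the energy of the second-order residue is smaller than that of the first-order
residue, and the decoder recovers the block by adding both predictions to it.
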

1.3.3 Generalized Line-Based Intra Prediction in MORP

To further improve the prediction efficiency of MORP in inter-frames, a novel
Generalized Line-Based Intra Prediction (GLIP) is presented in Chapter 5. Specific
contributions include the following.

1. The prediction residue of the intra and inter blocks is further examined, and it
is shown that the residue contains a considerable amount of line-based local image
structure. Yet the line structure within a coding block is often incomplete, and only
a portion of it can be predicted from the neighboring blocks.

2. Based on this observation, a novel Generalized Line-Based Intra Prediction (GLIP)
is proposed to reduce the subtle spatial redundancy. GLIP extends the line-based
intra prediction of H.264/AVC by allowing partial prediction of a coding block,
which is achieved by enabling only a subset of the neighboring prediction pixels.

3. The proposed GLIP algorithm is further refined by Vector Quantization (VQ).
First, VQ is applied to the binarized residual signal to identify the frequent image
patterns. These patterns are then processed by edge detection filters to
obtain the line structures to be predicted from the neighboring pixels.

4. The prediction overhead of the proposed VQ based GLIP (VQGLIP) is further
reduced by variable length coding. Based on the histogram of the refined VQ
codewords, Huffman coding is applied to generate the variable length codes.
The frequent prediction modes are coded with fewer bits, and the overall prediction
overhead is reduced. Experimental results show that the proposed VQGLIP scheme
outperforms H.264/AVC by a bit rate saving of 8.53%.
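
The variable length coding step in item 4 relies on standard Huffman coding. The sketch
below builds a Huffman code from a histogram of mode indices; the histogram values and
the function name are made up for illustration and do not come from the experiments.

```python
import heapq

def huffman_code(histogram):
    """Map each symbol to a prefix-free bit string; frequent symbols get shorter codes."""
    if not histogram:
        return {}
    if len(histogram) == 1:
        return {next(iter(histogram)): "0"}
    # Heap entries are (count, tie_breaker, partial code table); the unique
    # tie_breaker guarantees that the dictionaries are never compared.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(histogram.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c0, _, codes0 = heapq.heappop(heap)
        c1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (c0 + c1, tie, merged))
        tie += 1
    return heap[0][2]

# Hypothetical histogram: prediction mode index -> number of occurrences.
histogram = {0: 8, 1: 4, 2: 2, 3: 2}
codes = huffman_code(histogram)
total_bits = sum(histogram[m] * len(codes[m]) for m in histogram)
print(codes, total_bits)
```

For this histogram a fixed-length code would spend 2 bits on each of the 16 mode
indices (32 bits in total), whereas the Huffman code spends 28 bits because the most
frequent mode receives a one-bit codeword.
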

1.4 Organization of the Thesis


The rest of this thesis is organized as follows. The background of this research is
reviewed in Chapter 2, which covers state-of-the-art complexity reduction techniques for
video coding as well as recent efforts toward more efficient coding techniques. In
Chapter 3, we first analyze the relationship between the MV predictor and the SR. Based
on this analysis, an adaptive SR selection algorithm is proposed to reduce the number of
search points without degrading the rate-distortion performance. Then, in Chapter 4, a new
Multi-Order-Residual-Prediction (MORP) scheme is proposed to enhance the coding
gain of intra blocks and, hence, improve the overall coding performance. In Chapter
5, a new Generalized Line-Based Intra Prediction (GLIP), which is an extension of the
line-based intra prediction in H.264/AVC, is proposed. GLIP can be applied in the
MORP scheme to achieve an even better coding gain for both intra- and inter-frames.
Finally, concluding remarks and future research issues are discussed in Chapter 6.
