
A Study of the Evolution of Video Codec and its Future Research Direction

2020 Third International Conference on Advances in Electronics, Computers and Communications (ICAECC) | 978-1-7281-9183-6/20/$31.00 ©2020 IEEE | DOI: 10.1109/ICAECC50550.2020.9339513

Anitha Kumari R.D
School of Electronics and Communication, REVA University, Bengaluru, Karnataka, India
Email: [email protected]

Dr. Narendranath Udupa
Former Director, Philips Research, Bengaluru, Karnataka, India
Email: [email protected]

Abstract— The underlying principle of video compression is to optimize the data required for the storage, transmission, and processing of video streaming applications without compromising the perceptual quality of the video object. Owing to these potential features, video compression has become a widely adopted, prominent technology among bandwidth-hungry mobile streaming applications, where the prime aim is to reduce the redundant features in the video frame sequence, which in turn optimizes storage and transmission cost to a great extent. This research study explores the traditional, frequently adopted video codecs and also defines their future research direction and evolutionary scope from both a fundamental and a theoretical viewpoint. As its research outcome, it outlines the scope and limitations associated with the traditional approaches to video compression and how they can be improved to meet the demands of future dynamic streaming applications, where the storage and processing complexity of the frame sequence can be reduced while preserving acceptable video quality.

Index Terms— Video Codec, Video Coding, Video Compression, Video Signal Processing, Video Quality.

I. INTRODUCTION

The invention of the camera, or imaging technology, provided a new way to record, store, and transmit information, and a picture says more than words. Imaging technologies produce images such as scene images, document images, sketch images, medical images, etc. Further, the invention of the motion picture as video has provided more meaningful utilities in fields such as online video-based classes, video conferencing, video chatting, and remote robot-guided operations. Tremendously rapid development has taken place to make video suitable for small-scale devices at very high resolutions and for real-time transmission over wireless communication or through the internet [1] [2]. All these possibilities come with technological advancement in video compression techniques, most popularly known as video codecs. The core technological aim of video compression and video codecs is to offer better-quality video in streaming applications while, at the same time, minimizing the complexity of storage, transmission, and processing of the video frame sequence. Compression is the process of representing the video sequence by eliminating redundant information from it. The enormous growth of multimedia data traffic, driven by the unparalleled requirements of dynamic streaming applications, shows the essentiality of video codecs in many other use-cases of modern life beyond video conferencing and chatting, such as video streaming on modern High Definition Smart Televisions (HDTV), in digital cinemas such as Inox and PVR, and on modern smartphones [3][4].

On the other hand, media production and service-provider companies such as Netflix and Hotstar, through their on-demand video streaming services delivered via various mobile and web-based applications, have been generating substantial revenue for the global economy for many years and have gained increasing numbers of end-users with the pace of internet-based application development [5][6]. Online video industries such as YouTube also gain many more users every year and envision potential growth by investing in long-term research and development [7] [8]. The prime challenge in traditional video transmission and processing is that it requires high bandwidth for content variability and must satisfy video resolution constraints while meeting storage criteria. Video compression (V.C.) techniques arise in the solution space to deal with this scenario in restricted-bandwidth environments, where video data can be compressed and represented in a compact manner. The compressed video object frame sequence can then be efficiently stored or transmitted for various use-cases [9]. The International Video Coding Standards, which are frequently adopted and still have scope for improvement, cover both the legacy MPEG series and the H.26x series [10] [11] [12]. This study performs a comprehensive survey of video codec evolution and extracts and identifies the research gaps at subsequent stages of the video codec development timeline. It also provides a methodical course of action to assess most of the traditional legacy video codecs and their design spaces, with a brief discussion of their solution approaches.

II. INSIGHT INTO DIGITAL VIDEO COMPRESSION

The conceptual idea behind video compression is to eliminate unnecessary/redundant information from video frame sequences and make them suitable, with acceptable visual perception, for multimedia streaming applications. At the same time, it should not burden the storage and processing pipeline, and it should ease the transmission process by not consuming excessive network resources such as bandwidth. Regarding redundancy, a digital video signal
carries four types of information, which are discussed as follows:

A. Redundant Information

In the context of video compression (V.C.) and its execution procedure, the objective of video coding can be attained by exploiting the redundant information in a video object frame sequence [13], [14], [15], [16], [17], [18]. A video object can contain different types of redundant attributes, which include the following:

• Colour-spatial Redundant Attributes: The Human Visual System (HVS) is mechanized in a way that is more perceptive towards the luminance components as compared to the chrominance components. A methodical approach called color sub-sampling can be applied over the different channels, which, in the long run, minimizes the resolution required to exhibit the chrominance components. The first step of any image/video compression technique is to transform the digitized attributes/elements of a frame into frequency-domain attributes.

• Redundant Attributes in Spatial Correlation: Spatial redundancy (S.R.) occurs when the spatial correlation between objects within a video frame is explored. There exist different types of spatial-compression algorithmic approaches that are analytically modelled to serve the purpose of video coding during intra-frame compression, such as predictive coding (P.C.), transform coding such as the Discrete Cosine Transform (DCT), quantization, and entropy coding.

• Temporal Redundant Information: Another redundancy aspect of video compression involves redundant temporal attributes, where it is observed that adjacent frames exhibit a high degree of correlation with each other. Therefore, in most cases, a video frame looks similar to its previous frame by temporal correlation measures. This results in redundancy, which can be minimized by applying inter-frame compression methodologies. There exist several approaches to inter-frame compression involving a range of computational complexities. The most popular inter-frame compression methods include sub-sampling coding, difference coding, block-oriented difference coding, and motion compensation [21], [22].

• Statistical Redundant Attributes: In any multimedia data streaming, the prime focus is to use a minimum number of bits to represent the data without compromising or losing any significant information. Several studies claim that minimizing bit redundancy can improve the performance of both intra-frame and inter-frame compression. The popular approaches include entropy coding such as Run Length Coding (RLC), Huffman Coding, and Arithmetic Coding [23].

• Video Coding International Standards: The V.C. procedure basically combines the strength factors of both spatial image compression and temporal motion compensation. However, it is observed that most V.C. techniques in practice adopt the idea of lossy compression. Spatial frame coding can be obtained by incorporating transform coding, while the redundant temporal attributes can be eliminated using block-based motion estimation and compensation methodologies. A brief introduction to the different international video coding standards is presented as follows:

o H.120: The first international video codec standard was officially released as H.120 [19], but its standardization took considerable time. This standard adopted a set of techniques including conditional replenishment of samples, scalar quantization, and differential pulse-code modulation. As the video quality obtained by H.120 was observed to be not good enough, research efforts further emphasized improving its performance with block-based codec designs.

o H.261: Further along in the evolution of video codec design, H.261 appeared as a practical ITU-T video compression standard [20], targeting practical applications of ITU-T video compression. It was designed for transmitting media over digital network lines with fixed data rates in multiples of 64 Kbits/s, the video rate varying between 40 Kbits/s and 2 Mbits/s.

o H.263: The purpose of designing the video compression standard H.263 was to attain low bit-rate video compression, and it is applied in video conferencing [24]. The prime features of H.263 include motion compensation (MC) with different block sizes, including overlapped block-based motion compensation. The compression accomplishes a good bit rate of 18-24 Kbps. H.263 supports variable frame sizes such as 128×96 (Sub-QCIF), 176×144 (QCIF), 352×288 (CIF), 704×576 (4CIF), and 1408×1152 (16CIF).

o H.263+: This video codec standard was approved in the year 1998 by the ITU-T. It facilitates more flexible packet transmission in the context of wireless-based and packet-based networks. It also handles compression errors with a higher degree of resilience.

o H.264: In the study of Wiegand et al. [25], the video coding standard H.264 was introduced and further applied to many use-cases such as wireless broadcasting and mobile networks for on-demand
multimedia streaming/messaging services. The design features associated with this standard are listed as follows:
1. It incorporates the principle of the variable-block-size MC technique.
2. It enhances the efficiency of video coding by incorporating a boundary extrapolation approach.
3. For MC purposes, it also incorporates efficient usage of multiple reference frames.
4. It utilizes the direct approach of MC for B-frame coded vectors.
5. For predictive coding, it uses a deblocking filter, placed inside the MC predictive looping structure.
6. It represents the video signals with an adaptive approach where 4x4 block-based transformation takes place, together with an exact-match inverse transform.
7. This video coding standard also incorporates context-adaptive binary arithmetic coding (CABAC) for entropy encoding purposes, which was later found to be a superior method for enhancing the performance of video coding.
8. It basically supports two different types of entropy coding methods, context-adaptive binary arithmetic coding and variable-length coding, both of which are well suited for video streaming applications.

B. High-efficiency video coding (HEVC)
o The underlying concept of High-efficiency video coding (HEVC) is published in the manuscript by the authors of [26]. The system design of HEVC targets optimized video resolution performance by incorporating a parallel computing pipeline architecture.
o The HEVC architecture aims to increase the supported video resolution with distinct features. Some of the potential features of HEVC include:
o A tree-based codec with a potentially integrated encoder of reduced, justifiable size.
o Coding units are divided into predictive blocks and a tree of transform entities.
o Predictive coding reduces the block size from 64x64 down to 4x4 samples.
o It also adopts the concept of CABAC for entropy encoding purposes.
o Various prediction modes, along with 33 directional intra-prediction modes, are designed to improve prediction performance.
o Motion inference is utilized for motion vector (MV) signaling purposes.
o The quantization mode utilizes uniform-reconstruction quantization modeling to control the computational aspect of quantization.
o It also adopts the distinct features of the deblocking principle to mechanize the inter-frame predictive coding and looping structure.
o To reconstruct the original video signal, adaptive-offset-based coding modules are utilized after the execution of deblocking filtering.

C. Research Evolution Timeline of Video Codecs
The following table shows the trend of evolution of video codec standards since the year 1990.

Table 1 Trend of evolution (year-wise) of video coding standards with down-arrow line

The trend of the research evolution since the year 1990 shows that significant work has been carried out to design and develop codecs that fulfill the requirements of streaming applications. The evolution of video codec development started around the year 1990, when potential endeavors by both the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) initiated the standardizing and prototyping of the design process of reference video codecs. Further joint collaboration between these two working groups came up with the successful release of the H.264/AVC standard, and they later formed entities for the foundation of the Joint Video Exploration Team (JVET) through combined effort. There also exist open-source video delivery protocols, where Dynamic Adaptive Streaming over HTTP (MPEG-DASH) and WebRTC (Real-Time Communication) assist in the development of high-quality video streaming applications. In 2010 and 2013, Google came up with leading standardized video codecs VP8 and VP9, followed by the subsequent development of VP10 in 2015. The Alliance for Open Media (AOM) was formed thereafter and in 2018 introduced AV1, which is claimed to be the best encoder to date [27][28]. JVET has further launched the successor codec of HEVC, known as Versatile Video Coding (VVC), whose supporting software reference model is released under the name VTM (the VVC Test Model); VVC was expected to be finalized in 2020 [31].

III. TYPES OF COMPRESSION

A. Fundamentals of Lossless and Lossy Compression
From both a theoretical and an implementation viewpoint, video coding contains two prime frameworks, i.e., 1) video encoders and 2) video decoders, as appeared in Figure 1. A video encoder comprises three principal functional units: color sub-sampling; a temporal module (inter-frame encoder) or a spatial module (intra-frame encoder); and an entropy encoder. The objective of the encoder is to compress the enormous
measure of data needed to display a video frame sequence so as to accomplish a high compression ratio [29],[30].

B. Quality Measure Aspect (Q.M.) in Video Coding

In the context of video compression, the lossy methodical approach is the primary technique used to accomplish a high compression proportion (C.P.)/compression ratio (C.R.); despite attaining a higher C.P., this methodology results in significant data loss (termed distortion of information) after regeneration of the compacted video. In order to evaluate the quality of the reconstructed video, a few strategies have been developed. One of the simplest and most popular strategies is to compute the Mean Square Error (MSE) for each frame independently and take the arithmetic mean of the computed outcomes. MSE is the average of the squared error, determined by the following mathematical expression:

MSE = (1 / (M × N)) Σ_{i=1..M} Σ_{j=1..N} [ f(i, j) − f^(i, j) ]²  … (e.q.1)

Another significant approach to evaluating quality based on the MSE is the computation of the Peak Signal-to-Noise Ratio (PSNR), which is expressed with the following equation:

PSNR = 10 log10( (fmax)² / MSE ) dB  … (e.q.2)

Here in Eq. (2), fmax refers to the maximum possible pixel value; for example, 255 for an 8-bit representation of pixel attributes. To evaluate the performance of video coding in an application scenario, the PSNR is evaluated for each frame of the original and the regenerated compacted video sequence, and the PSNR values computed for the different frames of the original and reconstructed videos are then averaged arithmetically. A higher PSNR outcome indicates that the retained video quality is superior; similarly, a low PSNR indicates low-quality video retention after compression.

C. Types of Frames

In general, video frame sequences are subjected to compression utilizing different calculations depending on the type of frame. A sequence usually comprises I-frames, P-frames, and B-frames [32], [33], [34].

I-frame, 'Intra-coded Frame': In video coding, these frames are coded autonomously from every other frame. Such a frame is compressed in a way that follows single-image compression techniques, including transform coding, vector quantization, or entropy coding. This sort of frame can be bigger after encoding; however, it is quicker to decompress as compared to the other frame types.

P-frame, 'Predicted Frame': This refers to an inter-coded frame. This type of inter-coded frame is compressed with a forward prediction approach, taking the previous I-frame or P-frame as a reference. It is not feasible to construct an inter-coded frame without a dependency on the previous frames (I-frame or P-frame). P-frames are smaller in size after encoding as compared to I-frames.

There exists another type of frame, the B-frame, often termed 'Bi-predictive frame': it is also an inter-coded current frame (Fc) which performs predictive coding considering bi-directional reference frames (Fr). This means it is bi-directionally predicted on the basis of the previous I-frame and the consecutive P-frames. It is important to note that a B-frame cannot become a reference frame for other B-frames. However, it is observed in various studies that video coding with P-frames and B-frames provides a higher compression rate but lacks computational efficiency, unlike coding an I-frame, which is computationally efficient in reality. This justifies the fact that, to perform coding of P-frames and B-frames, motion estimation and motion compensation procedures are often utilized.

D. Inter-Frame Compression

The concept of inter-frame compression is also well studied, and it shows that during inter-frame compression the coding module basically explores the correlation between successive frame sequences corresponding to a video object. This often applies when the frame rate is quite high, a situation in which temporal redundancy can be observed [35-37]. It basically aims to reduce this data redundancy to optimize compression performance while retaining good-quality video at considerable resolution.
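As a concrete illustration of the MSE and PSNR measures in Eq. (1) and Eq. (2), the following pure-Python sketch computes both for a toy 8-bit frame (frames represented as lists of lists; the function names are illustrative, not part of any codec standard):

```python
import math

def mse(original, reconstructed):
    """Mean squared error over an M x N frame, per Eq. (1)."""
    m, n = len(original), len(original[0])
    total = sum((original[i][j] - reconstructed[i][j]) ** 2
                for i in range(m) for j in range(n))
    return total / (m * n)

def psnr(original, reconstructed, f_max=255.0):
    """Peak signal-to-noise ratio in dB, per Eq. (2); f_max = 255 for 8-bit pixels."""
    e = mse(original, reconstructed)
    return math.inf if e == 0 else 10.0 * math.log10(f_max ** 2 / e)

# Toy 4x4 "frame" and a reconstruction with a single corrupted pixel.
frame = [[128] * 4 for _ in range(4)]
recon = [row[:] for row in frame]
recon[0][0] = 132                     # one pixel off by 4 -> MSE = 16 / 16 = 1.0
print(round(mse(frame, recon), 2))    # 1.0
print(round(psnr(frame, recon), 2))   # 48.13
```

Per-frame PSNR values computed this way would then be averaged arithmetically over the sequence, as described above.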

Figure 1 Design of Inter-frame encoder [35]
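The color sub-sampling unit at the front of the encoder exploits the HVS's lower sensitivity to chrominance. A minimal 4:2:0-style sketch of the idea is shown below; it is an assumption-laden simplification (a real encoder filters or averages the 2×2 neighbourhood rather than simply keeping the top-left sample):

```python
def subsample_420(chroma):
    """Keep one chroma sample per 2x2 block (naive 4:2:0-style decimation).

    Real encoders average/filter the 2x2 neighbourhood; taking the top-left
    sample is the simplest possible illustration of the idea.
    """
    return [row[0::2] for row in chroma[0::2]]

# A 4x4 chroma plane: 16 samples before, 4 samples after -> 4:1 reduction.
cb = [[10, 11, 20, 21],
      [12, 13, 22, 23],
      [30, 31, 40, 41],
      [32, 33, 42, 43]]
print(subsample_420(cb))  # [[10, 20], [30, 40]]
```

The luminance plane is left at full resolution, which is why the scheme is perceptually almost lossless despite discarding three quarters of the chroma samples.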


The evolution of the video codec timeline shows that video codec standards share a set of features that are distinguishable in the majority of the standard video codecs. In each standard a common pattern is mostly observed: after the post-processing of color sub-sampling, there are four stages of inter-frame encoding, which generate the compacted bitstream after the execution of the compression phase. These stages mostly include temporal prediction, transform coding, quantization, and entropy encoding. Figure 1 shows a block-based representation of the inter-frame encoder design with respect to the current frame (Fc) and the reference frame (Fr).

E. Temporal Prediction

In video codec design, the prime aim of temporal prediction analysis is to minimize the redundant temporal attributes which arise with higher frame rates. These occur due to the high degree of correlation among successive and consecutive frame sequences in video coding. Prediction constructs some frames by taking other frames as reference, which helps in reducing the transmission rate of video frame sequences so as to fit the bandwidth requirements of streaming applications. It also efficiently performs compression, where I and P frames are referred to during the prediction of P or B frames.

Temporal prediction can incorporate forward prediction analysis, where past frames in display order are considered as reference frames for the current frame during video coding. Backward prediction, on the other hand, takes as reference frames those that will be displayed after the current frame. The averaged outcome of forward and backward prediction is utilized to construct frames of type B. In any predictive coding analysis, the encoder first encodes the reference frames, followed by computation of the residual frame, which is the difference between the current frame and the reference frames. The residual frame contains less energy; thus, it is encoded instead of the current frame itself [34].

F. Transform Coding (T.C.)

T.C. is one of the most significant tools of video coding design, utilized to minimize the occurrence of redundant spatial attributes. The residual prediction error (RPE), which is the difference between the present frame and the Motion Compensated Prediction (MCP) frame, has a high correlation between neighboring pixels [38-41]. Block-based coding is generally utilized in picture/video coding standard frameworks. In block-based transforms, a frame is split into non-overlapping macroblocks, and for each macroblock the 2-D transform coding is applied. Most transform coding frameworks utilize a macroblock size of 8×8 or 16×16. Note that the two sizes are powers of 2, which reduces the iterative calculation required by the complex transform coding and requires little memory [42-45], [46-47].

G. Quantization (Q)

Quantization refers to a mapping procedure from a large set of potential inputs to a smaller set of potential outputs. Quantization forms the core of lossy compression, and it is an irreversible procedure. The objective of this scheme is to map information from a source into as few bits as possible, such that the information reconstructed from these bits is as close to the original as possible under the circumstances. There are two kinds of quantization: scalar and vector. Since the human eye is more sensitive to low frequencies compared with high frequencies, greater compression can be accomplished by applying a coarser quantization step size at higher frequencies to remove insignificant coefficient values [48], [49], [50], [51].

H. Entropy Coding (E.C.)

The principle of E.C. belongs to the last computational stage of the execution of video codec design and analysis. It refers to a lossless compression scheme which intends to remove statistical redundancy from the video frame sequence. It also aims to construct video frames with the minimum possible bits without losing significant information. E.C. transforms the M.V.s, the quantized transform coefficients, and other data from the intra-prediction process into a packed bitstream appropriate for transmission or storage. The generally utilized entropy coding methods are Variable Length Coding (VLC) and Arithmetic Coding. Arithmetic coding, for the most part, gives better compression efficiency but, at the same time, results in moderately high computational complexity [128], [13], [137].

Figure 2 Design of Interframe decoder

With regard to the decoding of inter-frame compression, the decoder decodes the compressed data streams of the compacted motion vectors and the compacted RPE; the procedure is reversed to recreate the original frame. On the decoder side, the reference frame is reproduced by intra-frame decoding and is prepared to compensate and predict the present frame elements. To decode the residual prediction error, denoted by RPE in Figure 2, entropy decoding is applied, followed by inverse quantization (Q-1), and then inverse transform coding (TC-1). Note that the irreversible quantization process implies that the decoded residual RPE^ is not identical to RPE. Finally, RPE^ is added to the predicted frame to produce the reconstructed frame.
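The irreversibility just noted (RPE^ differing from RPE) is easy to reproduce with a uniform scalar quantizer, the simplest instance of the scheme in Section G; the step size and coefficient values below are made up for illustration:

```python
def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer level."""
    return [round(c / step) for c in coeffs]

def dequantize(levels, step):
    """Inverse quantization (Q^-1): reconstruct approximate coefficients."""
    return [lvl * step for lvl in levels]

rpe = [-7.0, 3.2, 0.4, 12.9]          # residual transform coefficients (made up)
levels = quantize(rpe, step=4)        # [-2, 1, 0, 3] -- what gets entropy coded
rpe_hat = dequantize(levels, step=4)  # [-8, 4, 0, 12] != rpe: information is lost
print(levels, rpe_hat)
```

A coarser step would shrink the level magnitudes (cheaper to entropy code) at the price of a larger gap between RPE^ and RPE, which is exactly the rate-distortion trade-off the text describes.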
IV. CONCEPTUAL PRINCIPLES OF MOTION COMPENSATION AND ESTIMATION

A. Motion Compensation (MC)

Motion Compensation (MC) has been utilized as the primary tool to reduce the temporal redundancy that originates from the small change in content from one frame to the next in video sequences. That is, MC is the way to accomplish high compression ratios in a coding framework. This method goes back to the mid-1970s and has been adopted by all of the current international video coding standards, for example the MPEG series and the H.26x series including H.265 [52], [53], [54], [55], [56], [57]. To accomplish such high coding efficiency, H.264/MPEG-4 AVC utilizes Multiple Reference Frame M.E. (MRFME) with up to five reference frames to predict the present frame [58], [59], [60].

B. Motion Estimation (M.E.)

ME is the initial step of inter-frame encoding and, as a rule, the most computationally intensive part of a video encoder (about half of the whole framework for one reference frame, up to 80% for five) [60], [58], [61], [62], [63], [64]. It is possible to estimate the displacement of each pixel position between successive video frames, creating a field of pixel-flow vectors known as the optical flow. The field is subsampled, and thus just a single vector for every two pixels is shown. Nonetheless, for motion compensation this is not a practical solution strategy, since the estimation of the optical flow is computationally intensive and needs calculations for every pixel.

C. Block Matching Motion Estimation

The block matching algorithm is the most popular procedure utilized for motion estimation, in which the present luminance frame is divided into non-overlapping MacroBlocks (M.B.s) of size N×M. These macroblocks are then compared with the corresponding macroblock and its nearby neighbors in the reference frame. This yields the displacement vectors that specify the movement of the macroblocks from one location to another in the reference frame [26]. For any macroblock in the current frame, the BMA finds the matching macroblock of the same size N×M in the search zone within the reference frame. The position of the matching macroblock gives the Motion Vector (MV) of the current macroblock, as shown in figure 3. The matching measure is generally decided using a Block Distortion Measure (BDM) such as Mean Absolute Difference (MAD), Sum of Absolute Differences (SAD), or Mean Square Error (MSE). The macroblock with the least cost is considered the one matching the current macroblock [65].

Figure 3: Block matching M.E. procedure

D. Block-Size Motion Estimation

Macroblock size is a significant parameter of the BMA. In the BMA, increasing the size of the macroblock implies that more calculations are required per block; however, it also implies that there will be fewer macroblocks per frame, so the total amount of calculation needed to perform motion estimation will be diminished. There is a high chance that a large macroblock will contain various entities moving in various directions. In other words, utilizing a bigger macroblock size lessens the amount of calculation but gives poorer prediction, while smaller macroblock sizes can deliver better motion estimation results and thus decrease the residual energy metric. The default block size for motion compensation is 16×16 elements for the luminance part. Fixed Block-Size Motion Estimation (FBSME) of size 16×16 or 8×8 has been utilized in the original coding standards, while H.264/AVC uses VBSME, which is progressively more complicated. VBSME permits a macroblock of 16×16 samples of the luminance component to be partitioned in 4 different ways, as shown in Figure 4:

Figure 4: Macroblock partitions and sub-macroblock partitions
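The block-matching procedure of Figure 3 can be sketched as an exhaustive (full) search over candidate displacements with SAD as the block distortion measure. The frame contents, block size, and search range below are illustrative toy values, not from any standard:

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of the current
    frame at (bx, by) and the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))

def full_search(cur, ref, bx, by, n, p):
    """Exhaustive search over all displacements in [-p, p]; returns the
    motion vector that minimizes SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= h and
                    0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best[1]

# Toy example: a bright 2x2 patch moves one pixel right between frames.
ref = [[0, 0, 0, 0],
       [0, 9, 9, 0],
       [0, 9, 9, 0],
       [0, 0, 0, 0]]
cur = [[0, 0, 0, 0],
       [0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 0, 0]]
# (-1, 0): the matching block sits one pixel to the left in the reference.
print(full_search(cur, ref, bx=2, by=1, n=2, p=1))  # (-1, 0)
```

The cost of this exhaustive search is what the fast block-matching algorithms surveyed later try to avoid by examining only a subset of the (2p+1)² candidate positions.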


E. Full Search (F.S.)
One of the simplest algorithms used to obtain the optimal motion vectors in motion estimation is the F.S. algorithm. It reliably converges on the best matching block within the search region by moving the correlation window to every candidate position. It can be expressed mathematically as:

SAD(m, n) = Σ_{i=1}^{N} Σ_{j=1}^{N} |c(i, j) − s(i + m, j + n)|, −p ≤ m, n ≤ p … (eq. 3)

MV = {(u, v) | SAD(u, v) ≤ SAD(m, n), −p ≤ m, n ≤ p} … (eq. 4)

Here SAD(m, n) is the distortion of the candidate M.B. at search location (m, n); the search range is [−p, p], and the F.S. algorithm processes blocks of size N×N. It examines every search location and finally attains the best possible match, yielding superior PSNR among the block matching algorithms. Its computational complexity, however, is high, especially when it is applied together with VBSME and MRFME.

V. METHODOLOGY OF THE RESEARCH
o Aim: The prime aim of this phase of the research is to carry out an investigational study of the traditional video codecs and their design modeling. Apart from discussing the fundamental aspects of video compression techniques, as illustrated in the prior sections, the study also provides deeper insight into the most recent and frequently used techniques in video compression, with a brief analysis of each.
o The outcome of the research-based study and analysis: It objectifies the research statistics of the frequently used prominent approaches in video codec design, discusses the related works with a theoretical analysis, and finally identifies the potential gap that exists in the traditional video codec design evolution and how improved techniques can be explored from the perspective of future streaming applications. The research statistics pertain to the domain of video codec design since the year 2010; relevant articles were found through a web-based search in Google Scholar using the keywords video codec design, video compression, and video quality, and the frequency of relevant articles for different prominent publishers is shown in figure 5.
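The F.S. procedure of eq. 3 and eq. 4 can be sketched in a few lines. The following is a minimal pure-Python illustration (our own sketch, not the paper's implementation): it evaluates the SAD of every candidate displacement (m, n) in [−p, p] and returns the motion vector that minimizes it. The caller must ensure the displaced block stays inside the reference frame.

```python
def sad(block, ref, bx, by, m, n):
    """Sum of absolute differences between an N x N current block at
    (bx, by) and the reference-frame block displaced by (m, n)."""
    N = len(block)
    return sum(abs(block[i][j] - ref[by + i + m][bx + j + n])
               for i in range(N) for j in range(N))

def full_search(block, ref, bx, by, p):
    """Exhaustively test all (2p+1)^2 displacements; return the best MV.

    Caller must guarantee that every candidate displacement keeps the
    block inside the reference frame bounds.
    """
    best_mv, best_sad = (0, 0), float("inf")
    for m in range(-p, p + 1):
        for n in range(-p, p + 1):
            cost = sad(block, ref, bx, by, m, n)
            if cost < best_sad:
                best_sad, best_mv = cost, (m, n)
    return best_mv, best_sad
```

For a search range of p = 7, the loops above visit 15 × 15 = 225 candidate positions per block, which is why exhaustive search is so much more expensive than the fast algorithms discussed later.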

Figure 5: Web-based search result of publication count from reputed technical societies (source: Google Scholar)

From figure 5, it can clearly be seen that the publication count is considerably higher for the IEEE and Elsevier publishing websites, whereas the other publishers put comparatively little emphasis on video codec design and analysis. The study further explored research publication statistics in the domain of video compression, indicating the frequency of usage of different Fast Block Matching (FBM) algorithms and their adoption over the last decade towards reducing computational complexity through efficient motion estimation and coding. The study then narrowed the web search and observed the frequency of research publications on the different block matching algorithms (BMA) adopted in efficient video codec design since the year 2010.

Table 1 Total research publications on Block Matching Algorithms in Video Codec Design since 2010
Different Algorithms (Fast Block Matching)       | Total research publications to date
Lossy block matching algorithm                   | 17400
Sub-sampled pixels on matching error computation | 17200
Bandwidth reduction                              | 23000
Lossless block matching algorithms               | 8880
Machine Learning-based Block Matching Algorithms | 5640

Figure 6 Visual graph of total research publications on block matching algorithms (source: Google Scholar)
The closer analysis of fast block matching (F-BMA) algorithms shows that they can be classified into two distinct groups: the first group comprises lossy block matching algorithms, and the second is designed exclusively for lossless block matching. Lossy BMAs minimize the computational cost, but their search quality is not as good as that of the F.S. approach: the peak signal-to-noise ratio (PSNR) obtained with lossy BMAs is found to be inferior, whereas it is consistently good in the case of F.S. On the other hand, lossless BMAs retain much better video quality and, relative to the F.S. algorithm, speed up execution [66],[67],[68]. The closer interpretation of figure 6 also shows that the majority of the work to date has been carried out with the bandwidth-reduction approach, whereas the adoption of Machine Learning-based techniques is comparatively rare despite their higher applicability to future streaming applications.

o Critical analysis of Block Matching Algorithms in the context of Video Codec Design and Compression: A set of different block matching algorithms has been widely studied to date for their functional designs, and from their behavioral aspects it can be concluded that BMAs vary with respect to four prime factors: quality, computation (execution) time, distortion level, and finally the number of points explored during the search. It is observed that most F.S. procedures are extremely resource-intensive, and the motion vector (MV) computation takes a considerable number of execution steps. Computational complexity arises in this phase because, during the F.S. operation, the number of candidate locations grows quadratically with increasing search range, resulting in higher computational cost. F.S.-based solutions mostly render a lossless form of video compression, whereas the fast BMAs developed subsequently are mostly lossy. It can therefore be outlined that the execution of the F.S. technique within an integrated video codec environment largely depends on the hardware constraints under which one operates. Fast BMAs, however, have been designed to offer better computational resource management, which can improve compression efficiency with unnoticeable loss of redundant visual detail. The study also explored different block matching algorithms and their functional procedures in video codec design and compression. A brief discussion of the frequently adopted popular BMAs and their characteristic features is given in the following table 2, together with a further assessment of their performance with respect to different sequences and formats. The table shows the most popular BMAs and their distinct design features with critical inferences.

Table 2 Critical inferencing overview of frequently adopted block-matching algorithms in video compression
Algorithmic Approach                     | Design Features
Full Search (FS) [69][70]                | 1. Simplified design approach.
                                         | 2. Computationally exhaustive during the search procedure.
                                         | 3. Used for benchmarking.
Diamond Search (D.S.) [71]               | 1. Incorporates two distinct search pattern procedures.
New Three-Step Search (NTSS) [72]        | 1. Improves on TSS in terms of the quality of the result.
                                         | 2. A frequently adopted fast block matching algorithm.
Four-Step Search (4SS) [73]              | 1. Halfway-stop design technique with optimized execution in 2 to 4 steps.
                                         | 2. Reduces the computational effort of the F.S. algorithm.
Simple and Efficient TSS (SESTSS) [74]   | 1. Computationally efficient: requires half the computational effort of TSS.
                                         | 2. Consistent regularity of performance, with an improvement over TSS.
Adaptive Rood Pattern Search (ARPS) [75] | 1. Faster computational speed, approximately 2-3 times better than the D.S. approach.
                                         | 2. Exhibits an outcome similar to that of the D.S. [76] algorithm.
o Analysis outcome: The proposed investigational analysis of fast BMAs outlines the fact that there exists competitiveness among the different design models and operational procedures of BMA solutions in addressing the compression problem. The insight also shows that such competitiveness arises with regard to three prime aspects: complexity, processing time, and the levels of distortion observed among variable-length frame sequences. The study also analyzed the above-stated most popular standard BMA procedures to validate their performance with respect to the PSNR outcome. It is indicated that, for a macroblock size of 16×16, the other algorithms exhibit better outcomes than the F.S. algorithm in terms of the average number of search points (per M.B. of size 16×16). It is found that F.S. examines an average of 184.6 search points for the QCIF format and 204.3 for CIF, whereas the numerical outcome is far superior in the case of D.S., NTSS, 4SS, SESTSS and ARPS, which are found to be 11.63, 15.06, 14.77, 16.13 and 5.191 for QCIF and 13.1, 17.07, 17.71, 16.12, 15.73 and 7.74 for CIF, respectively. The study also investigated the PSNR outcome for 50 frames, which is shown in table 3 below.

Table 3 Analysis of the PSNR outcome of different block matching algorithms

Frame Sequence | Format | FS    | DS    | NTSS  | 4SS   | SESTSS | ARPS
Claire         | QCIF   | 38.94 | 38.94 | 38.94 | 38.92 | 38.89  | 38.94
Akiyo          | QCIF   | 39.61 | 39.61 | 39.61 | 39.61 | 39.61  | 39.61
Carphone       | QCIF   | 30.82 | 30.69 | 30.7  | 30.7  | 30.4   | 30.1
News           | CIF    | 33.77 | 33.45 | 33.63 | 33.42 | 33.19  | 33.39
Stefan         | CIF    | 22.16 | 21.49 | 21.81 | 21.51 | 21.04  | 21.82
Coastguard     | CIF    | 26.19 | 25.98 | 26.05 | 26.02 | 25.6   | 26.05
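The PSNR figures in table 3 follow from the mean squared error between the original and the motion-compensated frames. A minimal sketch for 8-bit samples (our own helper, not the paper's code):

```python
import math

def psnr(original, reconstructed, peak=255):
    """PSNR in dB for 8-bit frames: 10 * log10(peak^2 / MSE)."""
    flat_o = [px for row in original for px in row]
    flat_r = [px for row in reconstructed for px in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_r)) / len(flat_o)
    if mse == 0:
        return float("inf")   # identical frames
    return 10 * math.log10(peak ** 2 / mse)
```

For orientation, an MSE of about 8.3 corresponds to roughly 38.9 dB, the order of the Claire results above.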
The outcome shows the PSNR obtained by processing macroblocks (M.B.) of 16×16; it indicates that the outcome of the fast BMAs is comparable with that of the F.S. approach.
In the context of this investigational study, it is mostly found that TSS can be applied in place of the F.S. algorithm for optimal resource management. Studies claim that it can improve on the resource intensity of F.S. while at the same time providing a satisfactory range of fidelity levels within the encoded video sequence. Hence, this solution approach has been widely accepted in various research-based studies to strengthen the motion estimation procedure. In the mid-nineties, centre-based pattern search mechanisms were then designed, integrating NTSS. The concept of NTSS exhibits distortion levels similar to TSS, but its additional functionality provided better solutions in terms of computational complexity and the processing of motion vectors. Further, the 4SS approach yields improved performance compared with NTSS, with operational benefits. A shift in search procedures then evolved with the development of D.S. algorithms, which assume that the majority of motion vectors reside inside a centralized locality. D.S. algorithms are found to be superior and yield more satisfactory outcomes with respect to fidelity levels and distortion calculations compared with NTSS and 4SS. The reduction in the number of distortion calculations made D.S. algorithms popular, but the block-based gradient descent search (BBGDS) subsequently evolved as a more remarkable solution with reduced computational complexity and higher convergence. The limitation of BBGDS arises where the computation of hexagonal motion vector point attributes is concerned; here the standard 3×3 square-based window search approach provided a better outcome.
o Machine Learning (ML) based video codec design approaches: In the current line of research, deep learning (DL) techniques have incorporated convolutional neural networks (CNN) to enhance the traditional video compression architecture towards better compression outcomes with high-efficiency predictive coding in a learning scenario. The approach combines features from both computer vision and video processing. The study by Krizhevsky et al. (2012) [77] introduced a novel 8-layer CNN architecture which obtained better classification performance with a lower error rate than the previous literature. Further, Girshick et al. (2014) applied a CNN-based learning model in video coding for efficient object tracking, and also extracted efficient CNN features for classification [78]. Similar work was carried out by Dong et al. (2014) [79], where a CNN is used for single video frame compression. That study addressed the resolution problem of a single image, and the outcome obtained was found to be quite superior in terms of video frame reconstruction quality at higher computational speed. In 2017, Zhang et al. [80] introduced another CNN-based approach to image compression, applied to image denoising. The outcome shows that the approach can handle the denoising problem for several images and also greatly reduces compression artifacts. Still, experts question whether deep learning can genuinely benefit and assist image/video coding. In the past, various researchers envisioned applying neural network-based image coding and illustrated its potential benefits [81][82], but at that time, between the 1980s and 1990s, networks were shallow and not as dense as modern ones, and CNNs could not provide better outcomes when compression efficiency was checked for both image and video. A better scope for optimizing compression with superior computational performance followed with the approach of Toderici et al. [83]. The authors proposed a computational model for variable-rate image compression using the LSTM framework of deep learning. The outcome of the study illustrates that binary quantization-based coding provides scalable coding performance when integrated with RNN and LSTM models; in the context of convolutional and deconvolutional layers, the approach is also claimed to perform well.
o Research Gap Identified based on study and analysis: The critical analysis shows that block-matching algorithms in the context of video codec design outperform various conventional video compression approaches with respect to computational complexity, as they optimize the cost of computation with a simplified solution approach. However, this does not ensure video frame sequences of high perceptual quality once decoded by the inter-frame decoder. It is observed that machine learning approaches have been applied together with fast block matching algorithms to balance this trade-off between video quality and the computational complexity of predictive motion compensation and estimation, where deep learning has gained much popularity. Still, the solution space indicates that the traditional, frequently adopted approaches have design limitations, and also that there is scope for better design perspectives which can strengthen the line of research in futuristic video codec designs, where video streaming applications are envisioned to be integrated with the 5G communication standard.

The critical analysis also shows that the majority of the deep learning-based approaches are suitable for predictive analysis during video coding, for both intra-frame and inter-frame analysis, but lack computational performance and do not guarantee superior results in terms of accuracy, precision, and F1 score. It is also observed, in section 5.4, that the majority of the research effort still focuses on image compression with CNN- and Recurrent Neural Network (RNN)-based solution approaches, while video codec design using deep learning is yet to receive a fully proven solution.
The study in [83] considered a large-scale dataset of 32×32 thumbnails for validation. Further, the study of Johnston et al. [84] improved the RNN-based approach by adding a new feature of hidden-state priming; it claims better results in terms of MS-SSIM, evaluated on Kodak images. Another study by Covell et al. [85] improves spatially adaptive bit rates with optimized training of stop-code-tolerant learning models.

VI. CONCLUSION
VC techniques have a wide range of application areas and use cases, from multimedia transmission to high definition television, and the current research trend indicates that video traffic might account for 82% of overall Internet traffic by the year 2022. Current multimedia streaming applications require efficient management of both storage and processing, with timely execution and high definition video streaming through

mobile platforms; therefore, to cope with the futuristic requirements of a dynamic user base, highly efficient video coding approaches are envisioned. For many years, the combined research effort of the ISO/IEC MPEG and ITU-T VCEG expert groups strategically designed and developed the successful H.264/AVC standard, in which motion compensation (MC) plays a crucial role in eliminating temporal redundancy from the frame sequence. In H.264/AVC, macroblocks are divided into smaller blocks, which makes the standard more flexible. The idea behind motion compensation has been adopted in most of the legacy international video coding standards: predictive coding locally forms the current frame by taking reference frames into consideration. The study in this research track evaluates the functional components of conventional video codec design and highlights the trend of research evolution over the last decade. It also shows the strengths of traditional block matching algorithms and their scope for improving video compression performance by balancing the complexity and quality aspects. Along with discussing the fundamental design aspects of video codecs, the study outlines the research gap, which indicates the scope for Machine Learning (ML) approaches to take video codec design to another level, more suitable for futuristic streaming applications.

REFERENCES
[1] Tian, L., Li, S., Ahn, J., Chu, D., Han, R., Lv, Q. & Mishra, S. (2013). "Understanding User Behavior at Scale in a Mobile Video Chat Application". In Proc. UbiComp '13 ACM International Joint Conference on Pervasive and Ubiquitous Computing; pp.647-656.
[2] Emarketer (2013). "Mobile, Video Drive Up Digital Ad Investment in the U.K.". [Online]. Available: http://www.emarketer.com/Article/Mobile-Video-Drive-Up-Digital-Ad-Investment-UK/1010097; [Accessed 5.05.2017].
[3] Ohm, J. & Sullivan, G. J. (2013). "High Efficiency Video Coding: The Next Frontier in Video Compression [Standards in a Nutshell]". IEEE Signal Processing Magazine; Vol.30(1); pp.152-158.
[4] Richardson, I. (2003). "H.264 and MPEG-4 Video Compression: Video Coding for Next Generation Multimedia": Wiley.
[5] https://fortune.com/company/netflix/fortune500/
[6] https://www.cnbc.com/2018/07/05/netflix-and-amazon-are-struggling-to-win-over-indian-viewers.html
[7] "Youtube.com Traffic, Demographics and Competitors". www.alexa.com. Retrieved January 11, 2020.
[5] ISO/IEC (1993). "Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s – Part 2: video". ISO/IEC 11172-2.
[6] ISO/IEC (1996). "Information technology – Generic coding of moving pictures and associated audio – Part 2: video". ISO/IEC 13818-2.
[7] ISO/IEC (1999). "Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s". ISO/IEC 11172-3.
[8] ITU-T & ISO/IEC (2003). "Advanced Video Coding for Generic Audiovisual Services". H.264, MPEG, 14496-10.
[9] Sullivan, G., Topiwala, P. & Luthra, A. (2004). "The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions". SPIE conference on Applications of Digital Image Processing XXVII.
[10] Sullivan, G. J. & Wiegand, T. (2005). "Video Compression - From Concepts to the H.264/AVC Standard". Proceedings of the IEEE; Vol.93(1); pp.18-31.
[11] Mir, Junaid, Dumidu S. Talagala, and Anil Fernando. "Optimization of HEVC λ-domain rate control algorithm for HDR video." In 2018 IEEE International Conference on Consumer Electronics (ICCE), pp. 1-4. IEEE, 2018.
[8] Cisco, V. N. I. "Cisco visual networking index: Forecast and trends, 2017–2022," Cisco, San Jose, CA, USA, White Paper 1, 2018.
[9] Alliance for Open Media. Accessed: Sep. 2019. [Online]. Available: https://aomedia.org/
[10] (Feb. 2019). Joint Video Experts Team (JVET): Test Model 3 of Versatile Video Coding (VTM 3). Accessed: Sep. 2019. [Online]. Available: https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM
[11] Bovik, A. C. (2009). "Chapter 1 - Introduction to Digital Video Processing". In: The Essential Guide to Video Processing, 2nd edition; Boston: Academic Press.
[12] Bross, B., Han, W. J., Ohm, J. R., Sullivan, G. J. & Wiegand, T. (2012). "High Efficiency Video Coding (HEVC), text specification draft 6". Doc. JCTVC-H1003, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T VCEG and ISO/IEC MPEG; Vol.21.
[13] Kim, Chanyul. "Complexity adaptation in video encoders for power limited platforms." PhD diss., Dublin City University, 2010.
[14] Al-Mualla, Mohammed, C. Nishan Canagarajah, and David R. Bull. Video Coding for Mobile Communications: Efficiency, Complexity and Resilience. Elsevier, 2002.
[15] Metkar, Shilpa P., and Sanjay N. Talbar. "Fast motion estimation using modified orthogonal search algorithm for video compression." Signal, Image and Video Processing 4, no. 1 (2010): 123-128.
[16] Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s – Part 2, ISO/IEC (1993).
[17] Sayood, K., "Introduction to Data Compression", 3rd edition: Morgan Kaufmann, 2006.
[18] Pohjola, Teemu, Martti Kesaniemi, Kalle Soukka, and Tero Myllymaki. "Video compression method optimized for low-power decompression platforms." U.S. Patent Application 10/792,310, filed September 8, 2005.
[19] "H.120", https://www.en.wikipedia.org/wiki/H.120, retrieved on 10-06-2020.
[20] "H.261", https://www.en.wikipedia.org/wiki/H.261, retrieved on 10-06-2020.
[21] Aguilera, P. Comparison of Different Image Compression Formats. Wisconsin College of Engineering, ECE 533; 2006.
[22] Al-Najdawi, Nijad, M. Noor Al-Najdawi, and Sara Tedmori. "Employing a novel cross-diamond search in a modified hierarchical search motion estimation algorithm for video compression." Information Sciences 268 (2014): 425-435.
[23] Gonzalez, R., Woods, R. & Eddins, S., "Digital Image Processing using MATLAB", 2nd edition: Gatesmark Publishing, 2009.
[24] ITU-T & ISO/IEC (2003). "Advanced Video Coding for Generic Audiovisual Services". H.264, MPEG, 14496-10.
[25] Sullivan, G. J. & Wiegand, T. (2005). "Video Compression - From Concepts to the H.264/AVC Standard". Proceedings of the IEEE; Vol.93(1); pp.18-31.
[26] Ohm, J. & Sullivan, G. J., "High Efficiency Video Coding: The Next Frontier in Video Compression [Standards in a Nutshell]". IEEE Signal Processing Magazine; Vol.30(1); pp.152-158, 2013.
[27] Series, H. "Audiovisual and Multimedia Systems, Infrastructure of audiovisual services—Coding of moving video, Advanced video coding for generic audiovisual services." International Telecommunication Union (2003): 144-153.

[28] Sullivan, G., Topiwala, P. & Luthra, A., "The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions", SPIE conference on Applications of Digital Image Processing XXVII, 2004.
[29] Halbach, Till. "The H.264 video compression standard." In Proceedings of 6th Nordic Signal Processing Symposium, 2003.
[30] Vanne, Jarno Johannes. "Design and Implementation of Configurable Motion Estimation Architecture for Video Encoding", 2011.
[31] Sullivan, G. J. & Wiegand, T., "Video Compression - From Concepts to the H.264/AVC Standard". Proceedings of the IEEE; Vol.93(1); pp.18-31, 2005.
[32] Bhaskaran, V. & Konstantinides, K., "Image and Video Compression Standards: Algorithms and Architecture", 2nd edition: Kluwer Academic Publishers, 1997.
[33] Diepold, Klaus, and Sebastian Moeritz. Understanding MPEG-4. Taylor & Francis, 2005.
[34] Sullivan, Gary J., and Thomas Wiegand. "Video compression - from concepts to the H.264/AVC standard." Proceedings of the IEEE 93, no. 1 (2005): 18-31.
[35] D.S. Huang, The Study of Data Mining Methods for Gene Expression Profiles, Science Press of China, March 2009.
[36] D.S. Huang, "A constructive approach for finding arbitrary roots of polynomials by neural networks," IEEE Transactions on Neural Networks, vol.15, no.2, pp.477-491, 2004.
[37] D.S. Huang, and Wen Jiang, "A general CPL-AdS methodology for fixing dynamic parameters in dual environments," IEEE Trans. on Systems, Man and Cybernetics - Part B, vol.42, no.5, pp.1489-1500, 2012.
[38] Li Shang, D.S. Huang, Ji-Xiang Du, and Chun-Hou Zheng, "Palmprint recognition using FastICA algorithm and radial basis probabilistic neural network," Neurocomputing, vol.69, nos.13-15, pp.1782-1786, 2006.
[39] Zhan-Li Sun, D.S. Huang, Chun-Hou Zheng, and Li Shang, "Optimal selection of time lags for temporal blind source separation based on genetic algorithm," Neurocomputing, vol.69, nos.7-9, pp.884–887, 2006.
[40] Chun-Hou Zheng, D.S. Huang, Zhan-Li Sun, Michael R. Lyu, and Tat-Ming Lok, "Nonnegative independent component analysis based on minimizing mutual information technique," Neurocomputing, vol.69, nos.7-9, pp.878–883, 2006.
[41] Li Shang, D.S. Huang, Chun-Hou Zheng, and Zhan-Li Sun, "Noise removal using a novel non-negative sparse coding shrinkage technique," Neurocomputing, vol.69, nos.7-9, pp.874–877, 2006.
[42] D.S. Huang, Xing-Ming Zhao, Guang-Bin Huang, and Yiu-Ming Cheung, "Classifying protein sequences using hydropathy blocks," Pattern Recognition, vol.39, no.12, pp.2293–2300, 2006.
[43] Chun-Hou Zheng, D.S. Huang, and Li Shang, "Feature selection in independent component subspace for microarray data classification," Neurocomputing, vol.69, nos.16-18, pp.2407-2410, 2006.
[44] Beasley, David, David R. Bull, and Ralph R. Martin. "A sequential niche technique for multimodal function optimization." Evolutionary Computation 1, no. 2 (1993): 101-125.
[45] Han, Fei, Tat-Ming Lok, and Michael R. Lyu. "A new learning algorithm for function approximation incorporating a priori information into extreme learning machine." In International Symposium on Neural Networks, pp. 631-636. Springer, Berlin, Heidelberg, 2006.
[46] Jizheng, X., Feng, W. & Wenjun, Z. (2009). "Intra-Predictive Transforms for Block-Based Image Coding". IEEE Transactions on Signal Processing; Vol.57(8); pp.3030-3040.
[47] Li, Tiejun, Sikun Li, and Chengdong Shen. "A novel configurable motion estimation architecture for high-efficiency MPEG-4/H.264 encoding." In Proceedings of the ASP-DAC 2005 Asia and South Pacific Design Automation Conference, vol. 2, pp. 1264-1267. IEEE, 2005.
[48] Kou, W. (1995). "Digital Image Compression: Algorithms and Standards": Kluwer Academic Publishers.
[49] Pu, I. M. (2005). "Fundamental Data Compression": Butterworth-Heinemann.
[50] Pereira, Fernando C. N., and Touradj Ebrahimi. The MPEG-4 Book. Prentice Hall Professional, 2002.
[51] Yu, Lu, and Jian-peng Wang. "Review of the current and future technologies for video compression." Journal of Zhejiang University SCIENCE C 11, no. 1 (2010): 1.
[52] ISO/IEC (1993). "Information technology – Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s – Part 2: video". ISO/IEC 11172-2.
[53] ISO/IEC (1996). "Information technology – Generic coding of moving pictures and associated audio – Part 2: video". ISO/IEC 13818-2.
[54] ITU-T Rec. H.264: "Advanced video coding for generic audiovisual services." ITU-T Rec. H.264 / ISO/IEC 14496-10 AVC (2005).
[55] Schwarz, Heiko, Detlev Marpe, and Thomas Wiegand. "Overview of the scalable video coding extension of the H.264/AVC standard." IEEE Transactions on Circuits and Systems for Video Technology 17, no. 9 (2007): 1103-1120.
[56] Richardson, Iain E. The H.264 Advanced Video Compression Standard. John Wiley & Sons, 2011.
[57] Correa, Guilherme, Pedro A. Assuncao, Luciano Volcan Agostini, and Luis A. da Silva Cruz. "Pareto-based method for high efficiency video coding with limited encoding time." IEEE Transactions on Circuits and Systems for Video Technology 26, no. 9 (2015): 1734-1745.
[58] Gharavi, H., and Mike Mills. "Blockmatching motion estimation algorithms - new results." IEEE Transactions on Circuits and Systems 37, no. 5 (1990): 649-651.
[59] Ji, Wen, Jiangchuan Liu, Min Chen, and Yiqiang Chen. "Power-efficient video encoding on resource-limited systems: A game-theoretic approach." Future Generation Computer Systems 28, no. 2 (2012): 427-436.
[60] Huang, Yu-Wen, Shyh-Yih Ma, Chun-Fu Shen, and Liang-Gee Chen. "Predictive line search: an efficient motion estimation algorithm for MPEG-4 encoding systems on multimedia processors." IEEE Transactions on Circuits and Systems for Video Technology 13, no. 1 (2003): 111-117.
[61] Horn, B. & Schunck, B. (1981). "Determining Optical Flow". Artificial Intelligence; Vol.17; pp.185-203.
[62] Fei Han, D.S. Huang, "Improved extreme learning machine for function approximation by encoding a priori information," Neurocomputing, vol.69, nos.16-18, pp.2369-2373, 2006.
[63] Richardson, I. E. G. (2010). "The H.264 Advanced Video Compression Standard", 2nd edition; U.K.: John Wiley & Sons Inc.
[64] Beasley, David, David R. Bull, and Ralph R. Martin. "A sequential niche technique for multimodal function optimization." Evolutionary Computation 1, no. 2 (1993): 101-125.
[65] Kamble, Shailesh D., Nileshsingh V. Thakur, and Preeti R. Bajaj. "Modified Three-Step Search Block Matching Motion Estimation and Weighted Finite Automata based Fractal Video Compression." International Journal of Interactive Multimedia & Artificial Intelligence 4, no. 4 (2017).
[66] Zhao, Hui, Xin-bo Yu, Jia-hong Sun, Chang Sun, and Hao-zhe Cong. "An enhanced adaptive rood pattern search algorithm for fast block-matching motion estimation." In 2008 Congress on Image and Signal Processing, vol. 1, pp. 416-420. IEEE, 2008.
[67] Huang, Yu-Wen, Shao-Yi Chien, Bing-Yu Hsieh, and Liang-Gee Chen. "Global elimination algorithm and architecture design for fast block matching motion estimation." IEEE Transactions on Circuits and Systems for Video Technology 14, no. 6 (2004): 898-907.
[68]CAI, C., ZENG, H. & MITRA, S. (2009). "Fast motion
estimation for H.264". Signal Processing: Image Communication.
[69] Blelloch, Guy E. "Introduction to data compression." Computer
Science Department, Carnegie Mellon University (2001).
[70]Srinivasan, R. & RAO, K. R. (1985). "Predictive coding based on
efficient motion estimation". IEEE Transactions on Communications;
Vol.33(8); pp.888– 896.
[71] Shan, Z. & Kai-Kuang, M. (1997). "A new diamond search algorithm for fast block matching motion estimation". In Proc. IEEE International Conference on Information, Communications and Signal Processing (ICICS); Vol.1; pp.292-296.
[72]Reoxiang, L., Bing, Z. & Liou, M. L. (1994). "A new three-step
search algorithm for block motion estimation". IEEE Transactions on
Circuits and Systems for Video Technology; Vol.4(4); pp.438-442
[73] Lai-Man, P. & WING-CHUNG, M. (1996). "A novel four-step
search algorithm for fast block motion estimation". IEEE
Transactions on Circuits and Systems for Video Technology; Vol.6;
pp. 313–317
[74]Jianhua, L. & Liou, M. L. (1997). "A simple and efficient search
algorithm for block-matching motion estimation". IEEE Transactions
on Circuits and Systems for Video Technology; Vol.7(2); pp. 429-
433
[75] NIE, Y. & M.A., K.-K. (2002). "Adaptive rood pattern search
for fast blockmatching motion estimation ". IEEE Trans on Image
Processing; Vol.11(12); pp.1442-1448
[76] Saito, Takahiro, and Takashi Komatsu. "Extending block-
matching algorithms for estimating multiple image motions."
In Proceedings of 1st International Conference on Image Processing,
vol. 1, pp. 735-739. IEEE, 1994.
[77]A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet
classification with deep convolutional neural networks,” in NIPS,
2012, pp. 1097–1105.
[78] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature
hierarchies for accurate object detection and semantic segmentation,”
in CVPR, 2014, pp. 580–587
[79] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep
convolutional network for image super-resolution,” in ECCV.
Springer, 2014, pp. 184–199
[80] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a
Gaussian denoiser: Residual learning of deep CNN for image
denoising,” IEEE Transactions on Image Processing, vol. 26, no. 7,
pp. 3142–3155, 2017.
[81] R. D. Dony and S. Haykin, “Neural network approaches to
image compression,” Proceedings of the IEEE, vol. 83, no. 2, pp.
288–303, 1995.
[82]J. Jiang, “Image compression with neural networks–A survey,”
Signal Processing: Image Communication, vol. 14, no. 9, pp. 737–
760, 1999
[83]G. Toderici, S. M. O’Malley, S. J. Hwang, D. Vincent, D.
Minnen, S. Baluja, M. Covell, and R. Sukthankar, “Variable rate
image compression with recurrent neural networks,” arXiv preprint
arXiv:1511.06085, 2015.
[84]N. Johnston, D. Vincent, D. Minnen, M. Covell, S. Singh, T.
Chinen, S. Jin Hwang, J. Shor, and G. Toderici, “Improved lossy
image compression with priming and spatially adaptive bit rates for
recurrent networks,” in CVPR, 2018, pp. 4385–4393
[85]M. Covell, N. Johnston, D. Minnen, S. J. Hwang, J. Shor, S.
Singh, D. Vincent, and G. Toderici, “Target-quality image
compression with recurrent, convolutional neural networks,” arXiv
preprint arXiv:1705.06687, 2017
