
A Real-Time Low-Power Coding Bit-Rate Control Scheme For High-Efficiency Video Coding in A Multiprocessor System-on-Chip

This document discusses a real-time low-power coding bit-rate control scheme for high-efficiency video coding (HEVC) in a multiprocessor system-on-chip (MPSoC). It presents a hardware-oriented algorithm that uses simple coding bit-rate control functions instead of complex mathematical operations, to enable low-power and real-time HEVC applications on mobile MPSoCs. An adequate low-complexity hardware architecture is also proposed to accomplish the low-power and high-speed VLSI design of a coding bit-rate controller. Experimental results show the design can achieve low power consumption and high speeds using low-complexity hardware under varying video sizes and coding bit-rate constraints.


264 IEEE SYSTEMS JOURNAL, VOL. 16, NO. 1, MARCH 2022

A Real-Time Low-Power Coding Bit-Rate Control Scheme for High-Efficiency Video Coding in a Multiprocessor System-on-Chip
Jui-Hung Hsieh, Member, IEEE, Jing-Cheng Syu, Zhe-Yu Guo, and Zhi-Yu Zhang

Abstract—A real-time high-performance transmission bandwidth-aware (TB-aware) coding bit-rate (CBR) controller design with low power consumption and low hardware complexity is presented in this article for H.265/high-efficiency video coding (HEVC) in a multiprocessor system-on-chip (MPSoC). Previous TB-aware motion estimation designs with CBR-control capability in video coding have focused on algorithm development with precise CBR models, which require a complicated algorithmic derivation according to the system on-demand CBR and are difficult to realize in very large scale integration (VLSI) due to their lack of consideration for hardware implementation and modeling. Consequently, we present a hardware-oriented CBR-control algorithm that uses simple CBR control functions instead of requiring root and exponential operations to realize the real-time low-power design objective for HEVC applications within a mobile MPSoC. Then, an adequate hardware architecture with low hardware complexity is exploited to accomplish a low-power and high-speed VLSI design of a CBR controller for our proposed algorithm. Using diverse video-sized sequences under on-demand system coding-bit-rate constraints, the experimental outcomes demonstrate that the introduced design is capable of low power consumption and high speeds and can utilize low-complexity hardware.

Index Terms—High-efficiency video coding (HEVC), motion estimation (ME), rate control, very large scale integration (VLSI).

Manuscript received June 9, 2020; revised October 18, 2020 and January 27, 2021; accepted March 18, 2021. Date of publication April 21, 2021; date of current version March 24, 2022. This work was supported by the Ministry of Science and Technology, Taiwan, under Grant 107-2221-E-992-082-MY2. (Corresponding author: Jui-Hung Hsieh.)
The authors are with the Department of Computer and Communication Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 82445, Taiwan (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).
Digital Object Identifier 10.1109/JSYST.2021.3069477

I. INTRODUCTION

THE APPLICATION services provided by mobile devices, such as real-time video streaming and video conferences, enrich and facilitate people’s lives, and video coding technology plays an important role in these services. Large amounts of uncompressed original video data are reduced via a video coding technique, thereby realizing substantial data compression. This technique can substantially decrease video encoding information storage and encoded information transmission, thus enabling consumers to view and enjoy real time and high-definition videos.

In recent years, an emerging video coding standard, H.265/high-efficiency video coding (HEVC) [1], has become the principal video coding standard, which yields coding bit-rate (CBR) savings of approximately 50% compared to H.264/AVC [2] for the same perceived quality and at high resolutions, thereby realizing real-time high-definition video coding. The latest video coding standard, H.265/HEVC, preserves most of the beneficial coding tools in H.264/AVC; additionally, H.265/HEVC provides more miscellaneous coding tools, such as new coding hierarchy selections of coding tree units (CTUs), transform units (TUs), and prediction units (PUs), along with diverse coding block size selections, to satisfy the requirements of real-time high-definition video coding. Furthermore, the CTU size selection varies from 64 × 64 to 8 × 8, and each CTU may be further partitioned into small quadtree-based coding units (CUs). In addition, H.265/HEVC adopts distinct TU sizes from 32 × 32 down to 4 × 4 and symmetrical/asymmetrical PUs. H.265/HEVC cooperates with motion search algorithms and motion prediction algorithms within interframe and intraframe predictions to realize the reduction of the CBR and real-time HEVC design objectives. However, the transmission bandwidth (TB) supplied for mobile video coding use varies over time due to simultaneous multiple users and manifold applications; thus, the available TB for mobile video coding use is not consistent. Meanwhile, if the HEVC chip within a mobile MPSoC [3], [4] cannot dynamically adjust the video encoding algorithm during runtime to conform to the time-varying TB constraints, the performance of the HEVC chip will be degraded, and the high-performance HEVC chip design goals will not be realized.

Additionally, regardless of whether the video coding standard is H.264/AVC or H.265/HEVC, motion estimation (ME) is a highly important coding tool within interframe prediction for CBR control and is the most computationally complex and highest power-consuming processing module [3], [5] in video coding, including computing power, external memory accessing power, and data transmission power. Among these three power dissipation sources, the computing power and memory accessing power depend on a motion search algorithm and motion search range, respectively. For the former, superior solutions for both H.264/AVC and H.265/HEVC have been proposed in the literature [6]–[9], in which a fast algorithm is used to terminate the motion search early; for the latter, an exceptional solution is available in which the ME algorithm and ME VLSI architecture are utilized [3], [10]. Regarding the data transmission
1937-9234 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Mahendra Educational Trust. Downloaded on November 04,2023 at 04:21:05 UTC from IEEE Xplore. Restrictions apply.
HSIEH et al.: REAL-TIME LOW-POWER CODING BIT-RATE CONTROL SCHEME FOR HEVC IN A MPSOC 265

power, numerous video coding standards work with bit-rate control capability [11]–[16] and offer solutions, and the auxiliary consequence indicates that the presented ME algorithm can automatically adjust the CBR usage for the data transmission according to the on-demand time-altering TB conditions; at the same time, these designs realize superior video coding performance. In addition, these methods [11]–[16] provide precise and accurate ME algorithm modeling by applying complicated coding models with complicated mathematical formulas; hence, they are difficult to use for the VLSI implementation.

The complex ME algorithm anticipates an increase in the computational power consumption of the HEVC chip and thus is not suitable to be embedded in an MPSoC within battery-powered mobile devices. Therefore, this article presents a codesign of ME algorithms and the VLSI hardware architecture to obtain a TB-aware ME CBR controller design that has the advantages of low hardware cost and low-power dissipation characteristics for the embedding of HEVC chips inside an MPSoC. First, this article introduces a VLSI-hardware-oriented HEVC CBR-control ME algorithm based on a system-level heuristic method, the main objective of which is realized in VLSI by developing an accurate and uncomplicated hardware-accomplishable algorithm model with numerous offline experimental data analyses. Next, we develop a VLSI hardware architecture that is appropriate for the proposed HEVC CBR-control ME algorithm, and the VLSI implementation results demonstrate the power feasibility of the proposed HEVC CBR-controller design.

This article makes three principal contributions. First, the proposed hardware-achievable algorithm and the VLSI implementation of the TB-aware CBR controller design perform real-time HEVC ME rate control with low power and low hardware complexity. The performance of our method is compared with cutting-edge HEVC ME rate control, which requires complicated mathematical modeling and is difficult to implement in VLSI. Second, previous HEVC ME designs with rate-control capability are appropriate for software realization instead of hardware realization, which implies that they are not adequate for implementation inside an MPSoC. Therefore, integrating the presented TB-aware CBR controller design with an up-to-date HEVC ME design can achieve a promising TB-aware HEVC design scheme. Third, this article presents an all-embracing exploration and comparison of the coding quality and CBR under different target coding rates with QP settings equal to 22, 27, 32, and 37 for the latest video coding standard, H.265/HEVC. According to thorough experimental results that are verified using the standard HEVC test sequence, the proposed TB-aware HEVC ME controller design greatly decreases the complexity of the rate-control ME design because of its proposed hardware-oriented rate-control algorithm, which exploits hardware-accomplishable mathematical modeling rather than the conventional complex mathematical modeling. Concretely, the complexity of the VLSI design and hardware cost of the ME rate control are lessened due to our proposed hardware-oriented ME algorithm, which signifies that the presented design can simultaneously attain the low-power dissipation and high-performance VLSI realization objectives as a result of the alleviated hardware complexity and computing power; on the other hand, the inferior power dissipation and inferior hardware cost characteristics make the proposed design appropriate to be amalgamated with the sophisticated HEVC design into a power-on-demand mobile MPSoC for the fulfillment of a truly high-performance low-power TB-aware HEVC design.

The remainder of this article is detailed as follows. Section II concisely explores the previous H.265/HEVC ME designs with and without CBR-control capability. Section III describes our presented hardware-oriented H.265/HEVC ME CBR-control algorithm in detail and presents the homologous simulation results. Section IV presents the suggested VLSI hardware architecture and the implementation results. Finally, Section V concludes this article.

II. RELATED CBR-CONTROL ME WORKS

The existing video coding designs that are capable of CBR control for the H.265/HEVC video coding standards are concisely reviewed and discussed in this section.

A. Previous ME Without CBR-Control Capability in H.265/HEVC

ME operations utilize the interframe correlation to reduce the redundancy of coding data between consecutive video coding frames; then, they economize the amount of video coding data to perform video compression. A large number of cutting-edge studies [5], [17], [18] have discussed and focused on this topic. Sinangil et al. [5] presented a low-bandwidth and low-hardware-area H.265/HEVC ME engine design methodology for algorithm modeling and hardware architecture according to distinct ME design configurations, which included smaller coding block sizes to larger coding block sizes. Their simulation results demonstrated that the introduced ME design achieved prominent coding performance between coding efficiency and hardware cost via quantitative analysis and hardware-oriented algorithms. Additionally, a fast ME design that was obtained via an algorithm and a hardware architecture codesign and based on a predictive integer ME algorithm and a PU size-dependent fractional ME algorithm, along with scheduling and prediction techniques, was introduced in [17]. Xiong et al. [18] introduced a rapid CU determination method according to the sum of the absolute differences and rate-distortion cost for coding time reduction.

However, these ME studies did not consider the time-varying TB limitations that are encountered in mobile video coding operating on an MPSoC and are therefore not conducive to mobile video coding applications and result in coding performance degradation due to insufficient TB.

B. Previous ME With CBR-Control Capability in H.265/HEVC

To overcome the time-varying TB problem of a mobile MPSoC when applying the H.265/HEVC video standard, many state-of-the-art ME designs with CBR-control capability [13]–[16], [19]–[22] have been proposed. Wang et al. [13] presented a low-delay Lagrange multiplier-based rate control framework


that incurred low complexity overhead and maintained a consistent coding quality. Fiengo et al. [14] proposed a low-execution-time recursive rate-distortion model that used a convex optimization method to resolve the frame-level rate allocation problem. For a more accurate bit estimation, a temporal-layer λ-domain picture-level CBR-control design was introduced in [15]. Li et al. [16] presented an optimal bit allocation algorithm for distinct coding levels according to λ-domain R-D theory. In addition, a λ-domain CBR-control scheme based on the R-λ model was proposed in [19]. Van et al. [20] introduced efficient bit-rate transcoding for the CU-level and PU-level complexity reduction. The adaptive quantization parameter cascading scheme for assigning group-of-picture-, frame-, and CU-level quantization parameters for rate control was proposed in [21]. Li et al. [22] presented a CTU-level rate control scheme to perform bit allocation based on the recursive Taylor expansion method.

These designs incorporated the time-varying TB limitations into the ME design considerations during the algorithmic modeling process. The experimental results demonstrated that the aforementioned video coding designs could be realized under limited TB constraints and yield superior coding quality. However, these TB-on-demand ME designs were modeled by means of a highly complex mathematical algorithm and under a lack of hardware implementation considerations, making them difficult to implement in VLSI and not applicable to a mobile MPSoC.

III. PROPOSED HARDWARE-ORIENTED CBR-CONTROL ME ALGORITHM AND SIMULATION RESULTS

This section presents a TB-aware HEVC ME CBR-controller design that simultaneously considers the VLSI complication and video coding performance to realize the real-time low-power mobile design objective for HEVC chips embedded in an MPSoC within mobile devices. The proposed CBR-control ME algorithm implements a video coding procedure under the system CBR constraints while concurrently considering the coding performance and the VLSI implementation complexity to achieve a low hardware cost and an acceptable coding performance. The simulation results demonstrate that the introduced hardware-oriented coding algorithm can realize superior coding quality with a simple hardware architecture under the target CBR constraints ($CBR_T$).

A flowchart of our proposed hardware-oriented HEVC CBR-controlled ME algorithm based on a heuristic method is described in Fig. 1. At the beginning of the design procedure in Fig. 1, we consider various target CBR points (i.e., $CBR_T$), which represent the time-altered TB for video coding use. Then, we begin our TB-aware H.265/HEVC ME algorithmic modeling by partitioning the design procedure into two coding layers: the frame layer (FL) and the CTU layer. The overall hardware-oriented TB-aware ME CBR-control design process is explained in detail and shown in Sections III-A and III-B.

Fig. 1. Overall flowchart of our proposed H.265/HEVC ME with CBR-control capability.

A. Proposed FL TB-Aware ME CBR-Control Algorithm

To determine the FL CBR and ME interrelation, we plot the CBRs as functions of assorted quantization parameters on test video sequences with various video resolutions and frame rates.

Fig. 2. Relationship between CBR and QP for the class-A traffic video sequence.

Fig. 2 presents the CBR (kilobits per second, kbps) versus QP effects when varying QP from 1 to 51 incrementally by one with low-delay P main profile settings and a full search ME algorithm in HEVC reference software (i.e., HM [23]) for a traffic sequence video with a resolution of 2560 × 1600 (i.e., class A). Similar CBR-QP curves can be obtained for sequences [24] of class B, class C, class D, class E, and 4K (i.e., 3840 × 2160-sized) videos. According to Fig. 2, the larger the QP, the lower the CBR. Furthermore, a nonlinear relationship between the CBR and the coding quality has been identified in the literature [2]. Combined


with the CBR-QP and rate-distortion observations, to realize CBR control with an acceptable coding quality under on-demand CBR conditions for video coding, the choice of QP is highly correlated to these conditions, regardless of the video coding standard.

Additionally, the coded I frame typically requires more coding bits (CB) than the P frame [16], [19] in video coding. Moreover, the number of CBRs is inversely proportional to the QP settings according to Fig. 2. Combined with these two observations, we conclude that the CBR can be reduced by coding QP selection if we can decrease the number of coded bits that are indispensable for the I frame. Hence, we simultaneously simulate and analyze a large number of disparate video-sized test sequences [24] with various QP settings for the CBR-QP relationships between the I frame and P frame.

The complete procedure of the FL CBR-control algorithm for all the coded q frames is given by the FRE algorithm, and the corresponding FL CBR-control process for the QP determination of the presently coded jth frame (i.e., $QP_F^j$) is detailed below in the order of the I frame setting and the remaining P frame setting. The QP settings for the initial I frame (i.e., j = 1) with assorted video resolutions in the FRE algorithm are selected according to a heuristic process, and the threshold settings of $TH_H$, $TH_M$, and $TH_L$ denote the high threshold, medium threshold, and low threshold, respectively, with respect to the target coding bit rate (i.e., $CBR_T$) and are set accordingly.

After the initial QP determination for I frame use under a target CBR has been completed, we proceed to develop an FL CBR-control ME model for the remaining (q-1) frames (i.e., j > 1). According to the video compression characteristics [14] among the consecutively coded video frames and the previous CBR-QP observation in Fig. 2, the CBR of video coding highly depends on the QP settings of the initial I frame and the temporary consecutive coding frames; therefore, we perform QP prediction of the currently coded frame in light of the QP settings of the previous coded frame under the current $CBR_T$ setting. According to these observations, the remainder of the coding frames under one $CBR_T$ constraint are evolved and are shown at the bottom of the FRE algorithm, where $QP_F^j$ represents the FL QP determination of the currently coded jth frame, and $CBAT_F$ and $CBAR_F$ are used to represent the accumulated target CB and the accumulated real CB up to the currently coded jth frame, respectively. Meanwhile, $CBAR_F$ gathers the previously used CB of each coded jth frame ($CB_F^j$) up to the presently coded q frames under diverse time alterations $CBR_T$, which is formulated as

$$CBAR_F = \sum_{j=1}^{q} CB_F^j. \tag{1}$$

The QP determination of the FRE algorithm for consecutive video coding frames, except the first I frame, exploits the metric of the CB instead of the CBR to calculate the QP. Moreover, the FL individual target CB ($CBT_F$) for a specified value of the coding frame rate FR (frames/s, FPS) and $CBR_T$, the FL accumulated target CB ($CBAT_F$) and the available FL surviving CB ($CBS_F$) up to the presently coded jth frame are formulated as follows:

$$CBT_F = \frac{CBR_T}{FR} \tag{2}$$

$$CBAT_F = \sum_{j=1}^{q} CBT_F^j \tag{3}$$

$$CBS_F = CBAT_F - CBAR_F \tag{4}$$

where $CBT_F^j$ denotes the target CB of each coded jth frame. Accompanying the preceding analysis and definition, the second part of the FRE algorithm (i.e., j > 1) performs coarse-grained processing with three conditions by comparing $CBAT_F$ and $CBAR_F$ and further performs fine-grained processing by comparing $CBT_F$ and $CBS_F$. Consequently, the final FL QP determination of the currently coded jth frame (i.e., $QP_F^j$) can refer jointly to $CBAT_F$, $CBAR_F$, $CBT_F$, and $CBS_F$ by applying the $QP_F^{j-1}$ decrement, $QP_F^{j-1}$ increment, and $QP_F^{j-1}$ without alteration. It is worth noting that when there is a sufficient CB supply, the coding quality can be improved by decreasing the QP, and vice versa. The first condition occurs when $CBAT_F$ is greater than $CBAR_F$, and this condition can further refer to $CBT_F$ and $CBS_F$ for the $QP_F^j$ designation (i.e., $QP_F^j$ is designated to the $QP_F^{j-1}$ decrement, $QP_F^{j-1}$ increment, and $QP_F^{j-1}$). The second condition (i.e., $QP_F^j$ is designated to the $QP_F^{j-1}$ increment) occurs when $CBAT_F$ is less than $CBAR_F$. The final condition (i.e., $QP_F^j$ is designated to $QP_F^{j-1}$) occurs when $CBAT_F$ is equal to $CBAR_F$. After the FL QP is obtained, we proceed to model the CTU-layer (CTUL) QP determination to further enhance and refine the coding performance of our proposed HEVC ME CBR-control scheme.

B. Proposed CTUL TB-Aware ME CBR-Control Algorithm

In this section, we develop a CTUL QP prediction scheme on the basis of the preceding FL QP determination among all of the k coding CTUs within the currently coded jth frame by applying an auxiliary assessment of the accumulated real coded bits and accumulated target CBs. We first formulate the CTUL target CB ($CBT_{CTU}$) and CTUL accumulated target CB ($CBAT_{CTU}$) for all k coding CTUs within the currently coded jth frame in (5) and (6).

$$CBT_{CTU} = \frac{CBT_F}{CTU_k} \tag{5}$$

$$CBAT_{CTU} = \sum_{l=1}^{k} CBT_{CTU}^l \tag{6}$$

where $CTU_k$ expresses the number of CTUs within the currently coded jth frame and $CBT_{CTU}^l$ expresses the CTUL target CB for the currently coded lth CTU. It is worth noting that the objective of the proposed VLSI-achievable algorithm is to implement an ME CBR controller. Therefore, we need to consider the fixed-/floating-point influence of the division operation in (2) and (5) between the algorithmic modeling and VLSI realization of the heterogeneous design environment. As a consequence, our proposed ME CBR framework aims to realize


VLSI for integration into the HEVC design within a mobile MPSoC; therefore, for practical mobile MPSoC design applications, we should implement a fixed-point divider by considering hardware simplicity and power mitigation instead of utilizing a floating-point divider. According to our offline analysis, we empirically compensate for the fixed- and floating-point division influence in (2) and (5) by rewriting (6) as (7) because the division influence in (5) is large due to its computational errors with respect to $CBT_{CTU}$ and must be taken into design consideration. Consequently, we summarize the preceding hardware implementation perspective and then model a complete CTUL QP-setting procedure for the presently coded lth CTU among all the coded k CTUs within the presently coded jth frame by the CTU Algorithm, where $QP_{CTU}^l$ denotes the CTUL QP determination for use of the presently coded lth CTU, along with $CBCAT_{CTU}$ and $CBAR_{CTU}$, which denote the CTUL compensated accumulated target CBs, as defined in (7), and the CTUL accumulated real coded bits up to the l coded CTUs, as defined in (8), respectively.

$$CBCAT_{CTU} = CBT_F + CBAT_{CTU} \tag{7}$$

$$CBAR_{CTU} = \sum_{l=1}^{k} CBU_{CTU}^l \tag{8}$$

where $CBU_{CTU}^l$ denotes the CB used for the currently coded lth CTU. The CTU algorithm is performed under three conditions by comparing $CBAR_{CTU}$ and $CBCAT_{CTU}$, and the final CTUL QP determination of the currently coded lth CTU (i.e., $QP_{CTU}^l$) can refer jointly to $CBAR_{CTU}$ and $CBCAT_{CTU}$ by applying the $QP_F^j$ decrement, $QP_F^j$ increment, and $QP_F^j$ without alteration. The first condition (i.e., $QP_{CTU}^l$ is designated to the $QP_F^j$ decrement) occurs when $CBCAT_{CTU}$ is greater than $CBAR_{CTU}$, which indicates a sufficient CB supply, and we can enhance the coding quality by decrementing the QP. The second condition (i.e., $QP_{CTU}^l$ is designated to the $QP_F^j$ increment) occurs when $CBCAT_{CTU}$ is less than $CBAR_{CTU}$, which indicates an insufficient CB supply and requires increasing the QP. The final condition (i.e., $QP_{CTU}^l$ is designated to $QP_F^j$) occurs when $CBCAT_{CTU}$ is equal to $CBAR_{CTU}$. Regardless of the QP settings for our proposed FL or CTUL CBR-control ME scheme, the proposed algorithm employs only increment and decrement operators, along with accumulators and fixed-point dividers, instead of the complicated functional elements that were used in previous representative designs [13]–[16], [19]–[22]. Most importantly, our proposed algorithm can realize a negligible coding performance loss and be implemented in VLSI due to its low power dissipation and low hardware cost, in addition to its software fulfillment that truly actualizes the low cost and high-performance HEVC chip design in a mobile MPSoC, which will be discussed in detail in the following sections.

FRE Algorithm: FL CBR Control.
Initialize the FL QP
Input: Target CBR constraints: $CBR_T$; CBR thresholds: $TH_H$, $TH_M$, $TH_L$; FL accumulated target CB: $CBAT_F$; FL accumulated real CB: $CBAR_F$; available FL surviving CB: $CBS_F$; FL individual target CB: $CBT_F$
Output: FL QP: $QP_F^j$
Set the frame index j = 1
1: while j ≤ q do
2:   if j = 1 then
3:     if $CBR_T > TH_L$ then
4:       if $CBR_T > TH_M$ then
5:         if $CBR_T > TH_H$ then
6:           $QP_F^j$ ← 27;
7:         else then
8:           $QP_F^j$ ← 32;
9:         end if
10:      else then
11:        $QP_F^j$ ← 37;
12:      end if
13:    else then
14:      $QP_F^j$ ← 42;
15:    end if
16:  else then
17:    if $CBAT_F > CBAR_F$ then
18:      if $CBT_F > CBS_F$ then
19:        $QP_F^j$ ← $QP_F^{j-1}$ increment;
20:      else if $CBT_F < CB_F^{j-1}$ then
21:        $QP_F^j$ ← $QP_F^{j-1}$;
22:      else then
23:        $QP_F^j$ ← $QP_F^{j-1}$ decrement;
24:      end if
25:    else if $CBAT_F < CBAR_F$ then
26:      $QP_F^j$ ← $QP_F^{j-1}$ increment;
27:    else then
28:      $QP_F^j$ ← $QP_F^{j-1}$;
29:    end if
30:  end if
31:  j = j + 1;
32: end while

CTU Algorithm: CTUL CBR Control.
Initialize the CTUL QP
Input: FL QP: $QP_F^j$; CTUL compensated accumulated target CB: $CBCAT_{CTU}$; CTUL accumulated real CB: $CBAR_{CTU}$
Output: the CTUL QP: $QP_{CTU}^l$
Set the CTU index l = 1
1: while l ≤ k do
2:   if $CBCAT_{CTU} > CBAR_{CTU}$ then
3:     $QP_{CTU}^l$ ← $QP_F^j$ decrement;
4:   else if $CBCAT_{CTU} < CBAR_{CTU}$ then
5:     $QP_{CTU}^l$ ← $QP_F^j$ increment;
6:   else then
7:     $QP_{CTU}^l$ ← $QP_F^j$;
8:   end if
9:   l = l + 1;
10: end while
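The frame-layer and CTU-layer decisions of Sections III-A and III-B can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the QP step is assumed to be 1, the result is clamped to the 0 to 51 QP range, the second inner comparison of the FRE listing is read as a comparison against the bits used by the previous frame, and all per-CTU targets are taken to be equal. Integer division (`//`) stands in for the fixed-point divider applied to (2) and (5).

```python
QP_MIN, QP_MAX = 0, 51  # assumed clamp range; the listings do not state one


def clamp_qp(qp: int) -> int:
    """Keep a QP inside the assumed legal range."""
    return max(QP_MIN, min(QP_MAX, qp))


def frame_target_bits(cbr_t: int, frame_rate: int) -> int:
    """Eq. (2): per-frame target bits, with truncating integer division
    standing in for a fixed-point divider."""
    return cbr_t // frame_rate


def initial_frame_qp(cbr_t: int, th_h: int, th_m: int, th_l: int) -> int:
    """First (I) frame: choose the QP from the target bit rate and the
    three thresholds, following the nested tests of the FRE listing."""
    if cbr_t > th_l:
        if cbr_t > th_m:
            return 27 if cbr_t > th_h else 32
        return 37
    return 42


def next_frame_qp(qp_prev: int, cbat_f: int, cbar_f: int,
                  cbt_f: int, cbs_f: int, cb_prev: int) -> int:
    """Remaining (P) frames: coarse test on the accumulated target and real
    bits, then a fine test using the per-frame target and surviving bits."""
    if cbat_f > cbar_f:
        if cbt_f > cbs_f:
            return clamp_qp(qp_prev + 1)
        if cbt_f < cb_prev:  # assumed reading of this operand
            return qp_prev
        return clamp_qp(qp_prev - 1)
    if cbat_f < cbar_f:
        return clamp_qp(qp_prev + 1)
    return qp_prev


def ctu_budget(cbt_f: int, num_ctus: int) -> tuple:
    """Eqs. (5)-(7), assuming equal per-CTU targets.
    Returns (CBT_CTU, CBAT_CTU, CBCAT_CTU)."""
    cbt_ctu = cbt_f // num_ctus    # Eq. (5), truncating division
    cbat_ctu = cbt_ctu * num_ctus  # Eq. (6), sum of k equal targets
    cbcat_ctu = cbt_f + cbat_ctu   # Eq. (7), compensated accumulated target
    return cbt_ctu, cbat_ctu, cbcat_ctu


def ctu_qp(qp_f: int, cbcat_ctu: int, cbar_ctu: int) -> int:
    """CTU layer: step the frame QP down, up, or keep it."""
    if cbcat_ctu > cbar_ctu:
        return clamp_qp(qp_f - 1)
    if cbcat_ctu < cbar_ctu:
        return clamp_qp(qp_f + 1)
    return qp_f


if __name__ == "__main__":
    # Threshold and bit-rate values below are invented for illustration only.
    qp = initial_frame_qp(cbr_t=3_000_000, th_h=4_000_000,
                          th_m=2_000_000, th_l=1_000_000)
    cbt_f = frame_target_bits(3_000_000, 30)
    print("initial QP:", qp, "per-frame target bits:", cbt_f)
    print("CTU budget:", ctu_budget(cbt_f, 510))
```

Only comparisons, unit increments and decrements, accumulations, and one integer division per layer appear in the sketch, which mirrors the operator set the text lists for the hardware design. The truncation in `ctu_budget` also shows why the per-CTU targets can sum to less than the frame target, the kind of division error that (7) is said to compensate for.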


TABLE I
PERFORMANCE COMPARISON IN TERMS OF BD-PSNR AND BD-RATE FOR THE LOW-DELAY P-MAIN SETTINGS

C. Function Validation and Performance Evaluation

To realize a real-time TB-aware CBR-constrained video coding system, the proposed hardware-oriented H.265/HEVC CBR controller design is integrated into the HEVC reference software HM [23] and verified on a variety of video-sized test sequences [24] under CBR conditions (i.e., under versatile target CBR constraints $CBR_T$). Function validation and performance evaluation are performed on a workstation with 24 CPU cores and 128 GB RAM, in which each core working frequency can reach 2.6 GHz.

Simulation results are acquired under diverse target CBR limitations (i.e., under the target CBR constraint, $CBR_T$, and each $CBR_T$ is set equal to the average of the four QP settings of 22, 27, 32, and 37) by testing various video sequences [24] with six resolutions of 416 × 240 (defined as class D), 832 × 480 (defined as class C), 1280 × 720 (defined as class E), 1920 × 1080 (defined as class B), 2560 × 1600 (defined as class A), and 3840 × 2160 (defined as 4K). We compare the coding efficiency in terms of the Bjøntegaard assessment [26] of the peak signal-to-noise ratio (i.e., BD-PSNR) and CBR (i.e., BD-rate) by applying our proposed HEVC CBR-control ME algorithm and the conventional HEVC CBR-control ME algorithm to the HEVC reference software in HM [23] under the common HEVC test condition of low-delay P-main configurations [24] (i.e., the first I frames and remaining P frames) during the HEVC process. Two representative ME search schemes, the full search (i.e., HM_FS) and a test zonal search (i.e., HM_TZS), are adopted to assess the overall coding efficiency of applying our proposed CBR-control ME algorithm and the conventional CBR-control ME scheme.

According to Table I, when using the proposed HEVC CBR-control scheme under the representative CBR-constrained video coding, the coding quality of BD-PSNR for a high-resolution class-A video test sequence under various $CBR_T$ conditions varies from 0.011 to 0.073 dB, with an average BD-PSNR improvement of 0.042 dB, and from 0.032 to 0.084 dB, with an average BD-PSNR improvement of 0.058 dB, for the FS approach and TZS approach, respectively. In addition, the CBR of the BD-Rate for a class-A video sequence under different $CBR_T$ constraints varies from −0.555% to −1.571%, with an average coding BR cutback of 1.063%, and from −1.218% to −1.807%, with an average coding BR cutback of 1.51%, for the FS approach and TZS approach, respectively. Similar tendencies of the coding quality improvement and the coding BR cutback are observed for the remaining video-sized test sequences (i.e., class B, class C, class D, class E, and 4K). Consequently, the overall average coding quality improvement and CBR reduction with the six video-sized datasets are 0.067 dB and −1.775% for HM_FS and 0.078 dB and −2.147% for HM_TZS, respectively, under diverse CBR constraints. It is worth noting that the simulation results presented in Table I carefully consider the influence of the fixed-point operations in (2) and (5) to be consistent with the proposed VLSI implementation for solving practical design problems because of the low-hardware-complexity design considerations in the resource- and power-limited mobile MPSoC. As a consequence, to evaluate the performance of HM [23], as shown in Table I, we utilize the fixed-point declaration instead of the floating-point declaration to be consistent with the proposed hardware design in our proposed CBR-control ME scheme; in contrast, we exploit the floating-point

Authorized licensed use limited to: Mahendra Educational Trust. Downloaded on November 04,2023 at 04:21:05 UTC from IEEE Xplore. Restrictions apply.
270 IEEE SYSTEMS JOURNAL, VOL. 16, NO. 1, MARCH 2022

Fig. 3. Block diagram of our proposed H.265/HEVC ME design with CBR-control capability.

declaration for the conventional HEVC CBR-control ME algorithm.

D. Hardware Complexity Evaluation of Previous ME Rate Control Schemes

Previous state-of-the-art ME rate control designs for video coding applications exhibit outstanding CBR control performance and acceptable coding quality; however, they are difficult to implement in VLSI because their complicated algorithmic models require many mathematical calculations. Each parameter of the R-λ model, which was proposed in [16] and [19], requires exponential and multiplication operations with floating-point precision to calculate the corresponding values of λ and R, which leads to high hardware complexity. The R-QP model, which was introduced in [27], requires the solution of a quadratic equation to calculate the QP according to R by performing division and root computations, which is difficult to realize in VLSI and results in a high VLSI burden. The ρ-domain model, which was proposed in [25], [28], and [29], also requires multiplication and root computations to calculate the corresponding values of QP and R. Table II exhibits the comparative results of the contemporary representative ME rate-control algorithm designs in terms of the hardware complexity and the difficulty of hardware realization. In view of the VLSI implementation in Table II, the aforementioned rate control designs in the literature for video coding are associated with high hardware complexity, which will lead to hardware cost increases and VLSI performance degradation.

TABLE II
COMPARATIVE RESULTS OF THE CONTEMPORARY ME RATE-CONTROL ALGORITHM DESIGN

IV. PROPOSED LOW-POWER AND LOW-HARDWARE-COMPLEXITY CBR CONTROLLER VLSI ARCHITECTURE AND ITS IMPLEMENTATION RESULTS

This section describes and illustrates the VLSI implementation of the proposed H.265/HEVC ME CBR controller design in detail. The VLSI hardware architecture is devised by simultaneously considering the coding performance, the VLSI implementation speed, and the hardware cost. Most importantly, integrating our proposed H.265/HEVC CBR controller design with the conventional H.265/HEVC ME design can further realize the TB-aware H.265/HEVC encoder design.

A. Mobile MPSoC Architecture

A comprehensive schematic diagram of the mobile MPSoC is shown on the right-hand side of Fig. 3, which involves a crossbar (such as a wide-ranging ARM advanced high-performance bus/advanced system bus, AHB/ASB), a multicore processor (including an n-core ARM CPU, where n denotes the number of CPU cores, and the n-core CPU can be partitioned into distinct clusters such as the ARM big.LITTLE processing architecture [4]), a neural processing unit for artificial intelligence (AI)-affiliated mobile computing, a graphics processing unit (GPU), a modulator–demodulator (Modem), on-chip SRAM memory, an input/output interface (I/O interface, such as an ARM advanced peripheral bus, APB, along with one bus bridge) between the periphery (such as the I/O communication interface of UART, I2C, SPI, USB, GPIO, and video) and crossbar, an intelligent TB-aware H.265/HEVC encoder, and an off-chip SDRAM memory interface (including a powerful SDRAM controller) along with a set of off-chip mobile low-power double data rate (LPDDRx) SDRAM (i.e., LPDDRx SDRAM, where the subscript x denotes the specific DDR SDRAM standard, each of which supplies a different memory width) and a power management IC (PMIC). Note that some mobile MPSoCs contain an on-chip PMIC within the MPSoC instead of an external standalone PMIC for miscellaneous VLSI

HSIEH et al.: REAL-TIME LOW-POWER CODING BIT-RATE CONTROL SCHEME FOR HEVC IN A MPSOC 271

Fig. 4. Block diagram of our proposed H.265/HEVC CBR controller design.
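As a quick cross-check of the per-unit synthesis figures reported for the four computing units of this controller (80, 7797, 850, and 1083 gates for the FLQPCU, TCBCU, CBCU, and QPDU), the following Python sketch sums them and reports each unit's share of the total. Only the gate counts come from the Table III footnote; the names `GATES` and `gate_shares` are illustrative and do not appear in the article.

```python
# Gate counts per computing unit, as listed in the Table III footnote.
GATES = {"FLQPCU": 80, "TCBCU": 7797, "CBCU": 850, "QPDU": 1083}


def gate_shares(gates):
    """Return (total, {unit: fraction of total}) for a gate-count breakdown."""
    total = sum(gates.values())
    return total, {unit: count / total for unit, count in gates.items()}


total, shares = gate_shares(GATES)
print(f"total = {total} gates")
# List the units from largest to smallest share of the logic.
for unit, frac in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{unit:7s} {frac:6.1%}")
```

The TCBCU, which holds the dividers for (2) and (5), accounts for close to four fifths of the logic in this breakdown. That is in line with the article's remark that removing the divider would further reduce the hardware cost.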

considerations. The intelligent TB-aware H.265/HEVC encoder design presented in this article receives video coding information, such as the current pixels, reference pixels, and neighboring MVs, from the off-chip DDR SDRAM by way of an off-chip SDRAM interface and crossbar and sends motion search information to the off-chip DDR SDRAM for coding information retrieval. In addition, diverse TB-constrained circumstances are set in advance and are deposited in the on-chip SRAM. However, it is worth noting that the proposed design furnishes flexibility to handle not only the predefined TB settings but also the immediate time-varied wireless TB for realistic mobile applications.

B. Proposed TB-Aware H.265/HEVC CBR-Control Hardware Architecture

The principal block diagram of a TB-aware H.265/HEVC encoder design is illustrated in the middle of Fig. 3 and outlined in purple. The design includes a high-performance H.265/HEVC encoder outlined in blue and an H.265/HEVC CBR controller outlined in khaki. The internal coding structures of the representative H.265/HEVC encoder VLSI design [30] can be partitioned into two primary design blocks. One design block is the H.265/HEVC ME processor, and a wide range of contemporary H.265/HEVC ME processor designs [5], [17], [31] are designated to fulfill miscellaneous system applications. In addition, the remaining principal H.265/HEVC coder design, which excludes the H.265/HEVC ME processor, includes the coding blocks for transform, quantization, inverse transform, inverse quantization, intraprediction, motion compensation, entropy coder, and filters. It is worth noting that our proposed H.265/HEVC CBR controller can be further combined with a contemporary H.265/HEVC ME processor design [5], [17], [31] to establish a TB-aware H.265/HEVC ME design.

The primary block diagram of our proposed H.265/HEVC CBR controller design for dynamically coding QP decisions is formulated by the FRE algorithm, the CTU algorithm, and (2)–(7) in the previous algorithm modeling section and is shown on the left-hand side of Fig. 3. The block diagram includes the four major CUs of the FL QP calculation unit (FLQPCU) from the upper half of the FRE algorithm; the target CB calculation unit (TCBCU) from (2) and (5); the CB compensation unit (CBCU) from (3), (6), and (7); and the QP decision unit (QPDU) from (4), the lower half of the FRE algorithm, and the entire CTU algorithm. The detailed block diagram of the four CUs within our proposed H.265/HEVC CBR controller design is shown in Fig. 4. In the proposed algorithm, the most timing-sensitive portions are the two divisions that compute CBTF and CBTCTU by means of (2) and (5), respectively, in the TCBCU, along with the accumulators that compute CBATF, CBATCTU, and CBCATCTU in the CBCU by means of (3), (6), and (7). To achieve hardware cost reductions using the resource sharing scheme, we utilize two accumulators instead of three accumulators to implement (3), (6), and (7) since the data are ready and valid at that timing interval. Accordingly, the TCBCU requires two cycles to generate CBTF and CBTCTU. In addition, to avoid long delays due to the accumulators, we assign two pipeline stages to implement the calculation of CBATF, CBATCTU, and CBCATCTU. All four major tasks are pipelined into six stages to shorten the longest path delay and reduce hardware costs.

TABLE III
IMPLEMENTATION RESULTS OF THE PROPOSED H.265/HEVC CBR CONTROLLER

a 9810 gates = 80 gates (FLQPCU) + 7797 gates (TCBCU) + 850 gates (CBCU) + 1083 gates (QPDU).

The principal cycle of our proposed H.265/HEVC CBR controller design used to perform TB-aware H.265/HEVC coding for QP determination is shown in Fig. 5. The controller includes FL coding processes, and each FL coding process includes several CTUL coding processes. The number of CTUL coding processes depends on the coded video size. In this timing diagram, the CUs of FLQPCU and TCBCU activate only once under a single CBRT for the accessible FL target CB calculation and the accessible CTUL target CB calculation. In addition, during the CTUL coding process, the two CUs of CBCU and QPDU are repeatedly activated for CB information updates and


Fig. 5. Timing diagram of our proposed H.265/HEVC CBR controller design.
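The activation pattern in this timing diagram can be approximated with a small Python sketch. It assumes a 64 × 64 CTU (the largest size HEVC allows; the article does not state the CTU size used), one FLQPCU and one TCBCU activation per target-CBR setting, and one CBCU and one QPDU activation per coded CTU. The function names are illustrative, and the cycle budget simply divides the reported 272 MHz clock by the CTU rate at the reported maximum of 3840 × 2160 and 60 FPS.

```python
def ctus_per_frame(width, height, ctu_size=64):
    """Number of CTUs covering one frame (partial CTUs at the edges count)."""
    cols = (width + ctu_size - 1) // ctu_size   # integer ceiling division
    rows = (height + ctu_size - 1) // ctu_size
    return cols * rows


def activation_counts(width, height, frames, ctu_size=64):
    """Count unit activations for one target-CBR setting.

    Assumption: FLQPCU and TCBCU fire once per target-CBR setting, while
    CBCU and QPDU fire once per coded CTU.
    """
    per_ctu = frames * ctus_per_frame(width, height, ctu_size)
    return {"FLQPCU": 1, "TCBCU": 1, "CBCU": per_ctu, "QPDU": per_ctu}


def cycles_per_ctu(clock_hz, fps, width, height, ctu_size=64):
    """Clock cycles available per CTU at a given clock and frame rate."""
    return clock_hz / (fps * ctus_per_frame(width, height, ctu_size))


counts = activation_counts(3840, 2160, frames=60)
budget = cycles_per_ctu(272e6, 60, 3840, 2160)
print(counts)
print(f"about {budget:.0f} cycles available per CTU")
```

Under these assumptions, a little over two thousand clock cycles are available per CTU at the maximum resolution and frame rate. That is large compared with the six pipeline stages of the controller, which is consistent with the article's argument that the controller's latency is short relative to the CTU-level coding time.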

TABLE IV
HARDWARE-COST AND POWER-CONSUMPTION PERCENTAGE COMPARISON FOR THE PROPOSED H.265/HEVC CBR CONTROLLER AND STATE-OF-THE-ART H.265/HEVC ME DESIGN

a Percentage of hardware cost (%) = (logic gate count of our proposed H.265/HEVC CBR controller/logic gate count of reference H.265/HEVC ME design) × 100%.
b Percentage of power consumption (%) = (average core power of our proposed H.265/HEVC CBR controller/average core power of reference H.265/HEVC ME design) × 100%.
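The percentages in this table are read here as the controller's figure relative to a reference ME design, which is the reading under which the sub-2% values quoted in the text make sense. The Python sketch below works one such ratio. The 9810-gate figure comes from Table III, while `HYPOTHETICAL_ME_GATES` is an illustrative placeholder and not a value from Table IV or from any cited design.

```python
def percentage_of_reference(proposed, reference):
    """Express the proposed design's metric as a percentage of a reference."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return 100.0 * proposed / reference


PROPOSED_GATES = 9810               # controller gate count from Table III
HYPOTHETICAL_ME_GATES = 1_000_000   # illustrative only, not from Table IV

pct = percentage_of_reference(PROPOSED_GATES, HYPOTHETICAL_ME_GATES)
print(f"{pct:.3f}% of the reference gate count")
```

The same helper applies to the power row by passing average core power values measured under the same process and clock frequency.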

final QP determination. Notably, the total number of CTUL coding cycles in the H.265/HEVC coding process is far greater than that required by our proposed design. Thus, our proposed CBR controller design can satisfy real-time system requirements due to the short clock latency. Therefore, the introduced H.265/HEVC CBR controller design is characterized by low cost, high performance, and low power consumption because of the modeled hardware-oriented algorithm, which includes an appropriate hardware architecture arrangement that employs pipeline and resource sharing approaches.

C. VLSI Implementation Results

We utilize the Verilog hardware description language to perform the register-transfer level (RTL) design of the presented VLSI hardware architecture of the H.265/HEVC CBR controller, along with the VCS compiler and Design Compiler provided by Synopsys to conduct RTL functional verification and synthesize the generated RTL to a gate-level netlist with the 90-nm standard cell library of the Taiwan Semiconductor Manufacturing Company. The target CBR points are not fixed and can be designated according to the system requirements. The maximum supported video coding resolution is 3840 × 2160, and the maximum supported coding frame rate is 60 FPS. Table III lists the logic synthesis results of the presented H.265/HEVC CBR controller, which consist of the maximum design frequency and the average core power obtained using the Synopsys gate-level timing and power analysis tools, respectively.

According to Table III, the presented design can achieve a performance of up to 272 MHz with a hardware cost of approximately 9.81 kilogates and power dissipation of 2.46 mW. The detailed resource consumption of the four major blocks of the presented design is shown at the bottom of Table III. It is worth noting that storing a predetermined target CBR in the on-chip SRAM memory, instead of allowing the arbitrary setting supported by the proposed work, can further reduce the VLSI hardware cost because the divider components of (2) and (5) can be removed; however, the design flexibility provided by the predetermined scheme is poor when solving real-time TB-aware H.265/HEVC applications that appear in the MPSoC. On the other hand, the critical path inside the proposed H.265/HEVC CBR controller is dominated by the divider component; thus, as the supported video resolution decreases, the maximum operating frequency increases and the hardware costs decrease because of the divider length reduction.

Additionally, the operating frequency of 272 MHz is adequate for integration into the leading-edge HEVC ME designs to attain a TB-aware HEVC encoder design, as evidenced in Table IV. As shown in the bottom two rows of Table IV, the hardware cost of our proposed TB-aware HEVC ME CBR controller design is only 0.25%, 1.26%, and 0.68% of those achieved by the state-of-the-art HEVC ME designs [5], [17], and [31], respectively. Meanwhile, the power consumption of our proposed H.265/HEVC CBR controller design accounts for less than 1.62% of that of the state-of-the-art H.265/HEVC ME design [31] under the same 90-nm CMOS process and the same 250-MHz working frequency. The low hardware cost, high speed, and low power consumption are due to our hardware-oriented algorithm modeling by inferring simple logic elements, such as incremental and decremental elements, along with VLSI architecture planning, rather than


using a complicated mathematical model without considering the VLSI hardware implementation. Due to the lack of hardware implementation in previous H.265/HEVC CBR-control designs, we do not include hardware comparisons in the interest of maintaining a fair comparison.

V. CONCLUSION

A real-time rapid, low-power, and low-VLSI-hardware-cost H.265/HEVC CBR controller, which is suitable for integration into the H.265/HEVC encoder inside a mobile MPSoC for TB-limited mobile video applications, is introduced in this article. The real-time rapid, low-power, and low-VLSI-hardware-cost design is a result of our modeled hardware-oriented algorithm and adequately planned hardware architecture. Moreover, combining our proposed CBR controller design with the conventional H.265/HEVC ME design achieves a TB-aware H.265/HEVC system because it is intelligent enough to adapt to the dynamically time-revised TB variations in a mobile MPSoC. However, sophisticated HEVC ME designs with CBR-control capabilities are difficult to achieve in VLSI. Most importantly, the proposed design considers that the system dynamically alters the TB in real time and generates suitable QP values for HEVC ME use, which yield negligible coding performance influences and a prevailing VLSI performance, making it appropriate for power-limited mobile MPSoC applications.

REFERENCES

[1] G. J. Sullivan, J. R. Ohm, W. J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
[2] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.
[3] J. H. Hsieh, J. A. Cai, Y. N. Wang, and Z. Y. Guo, “ML-assisted DVFS-Aware HEVC motion estimation design scheme for mobile APSoC,” IEEE Syst. J., vol. 13, no. 4, pp. 4464–4473, Dec. 2019.
[4] Y. Ma, J. Zhou, T. Chantem, R. P. Dick, S. Wang, and X. S. Hu, “Online resource management for improving reliability of real-time systems on ‘Big–Little’ type MPSoCs,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 39, no. 1, pp. 88–100, Jan. 2020.
[5] M. E. Sinangil, V. Sze, M. Zhou, and A. P. Chandrakasan, “Cost and coding efficient motion estimation design considerations for high efficiency video coding (HEVC) standard,” IEEE J. Sel. Topics Signal Process., vol. 7, no. 6, pp. 1017–1028, Dec. 2013.
[6] T. H. Tsai and Y. N. Pan, “High efficiency architecture design of real-time QFHD for H.264/AVC fast block motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 11, pp. 1646–1658, Nov. 2011.
[7] J. Zheng, C. Lu, J. Guo, D. Chen, and D. Guo, “A hardware-efficient block matching algorithm and its hardware design for variable block size motion estimation in ultra-high-definition video encoding,” ACM Trans. Des. Autom. Electron. Syst., vol. 24, pp. 1–21, 2019.
[8] N. C. Vayalil, M. Paul, and Y. Kong, “A residue number system hardware design of fast-search variable-motion-estimation accelerator for HEVC/H.265,” IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 2, pp. 572–581, Feb. 2019.
[9] T. S. Kim, C. E. Rhee, H.-J. Lee, and S.-I. Chae, “Fast integer motion estimation with bottom-up motion vector prediction for an HEVC encoder,” IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 12, pp. 3398–3411, Dec. 2018.
[10] J. H. Hsieh and H. R. Wang, “VLSI design of an ML-Based power-efficient motion estimation controller for intelligent mobile systems,” IEEE Trans. Very Large Scale Integr. Syst., vol. 26, no. 2, pp. 262–271, Feb. 2018.
[11] S. Li, M. Xu, Z. Wang, and X. Sun, “Optimal bit allocation for CTU level rate control in HEVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 11, pp. 2409–2424, Nov. 2017.
[12] Y. Li, H. Jia, X. Xie, and T. Huang, “Rate control for consistent video quality with inter-dependent distortion model for HEVC,” in Proc. Vis. Commun. Image Process., 2016, pp. 1–4.
[13] M. Wang, K. N. Ngan, and H. Li, “Low-delay rate control for consistent quality using distortion-based Lagrange multiplier,” IEEE Trans. Image Process., vol. 25, no. 7, pp. 2943–2955, Jul. 2016.
[14] A. Fiengo, G. Chierchia, M. Cagnazzo, and B. Pesquet-Popescu, “Rate allocation in predictive video coding using a convex optimization framework,” IEEE Trans. Image Process., vol. 26, no. 1, pp. 479–489, Jan. 2017.
[15] Y. Gong, S. Wan, K. Yang, H. R. Wu, and Y. Liu, “Temporal-layer-motivated lambda domain picture level rate control for random-access configuration in H.265/HEVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 1, pp. 156–170, Jan. 2019.
[16] L. Li, B. Li, H. Li, and C. W. Chen, “λ-Domain optimal bit allocation algorithm for high efficiency video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 1, pp. 130–142, Jan. 2018.
[17] S. Y. Jou, S. J. Chang, and T. S. Chang, “Fast motion estimation algorithm and design for real time QFHD high efficiency video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 9, pp. 1533–1544, Sep. 2015.
[18] J. Xiong, H. Li, F. Meng, Q. Wu, and K. N. Ngan, “Fast HEVC inter CU decision based on latent SAD estimation,” IEEE Trans. Multimedia, vol. 17, no. 12, pp. 2147–2159, Dec. 2015.
[19] B. Li, H. Li, L. Li, and J. Zhang, “λ-domain rate control algorithm for High efficiency video coding,” IEEE Trans. Image Process., vol. 23, no. 9, pp. 3841–3854, Sep. 2014.
[20] L. P. Van, J. D. Praeter, G. V. Wallendael, S. V. Leuven, J. D. Cock, and R. V. d. Walle, “Efficient bit rate transcoding for high efficiency video coding,” IEEE Trans. Multimedia, vol. 18, no. 3, pp. 364–378, Mar. 2016.
[21] T. Zhao, Z. Wang, and C. W. Chen, “Adaptive quantization parameter cascading in HEVC hierarchical coding,” IEEE Trans. Image Process., vol. 25, no. 7, pp. 2997–3009, Jul. 2016.
[22] S. Li, M. Xu, Z. Wang, and X. Sun, “Optimal bit allocation for CTU level rate control in HEVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 11, pp. 2409–2424, Nov. 2017.
[23] Joint Collaborative Team on Video Coding (JCT-VC), HM 16.6, Reference Software, 2015. [Online]. Available: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-16.6/
[24] F. Bossen, “Common test conditions and software reference configurations,” Document JCTVC-K1100, Shanghai, China, Oct. 2013.
[25] W. Gao, S. Kwong, H. Yuan, and X. Wang, “DCT coefficient distribution modeling and quality dependency analysis based frame-level bit allocation for HEVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 26, no. 1, pp. 139–153, Jan. 2016.
[26] G. Bjontegaard, “Improvements of the BD-PSNR model,” Document VCEG-AI11, 2008.
[27] T. Chiang and Y.-Q. Zhang, “A new rate control scheme using quadratic rate distortion model,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, pp. 246–250, Feb. 1997.
[28] S. Milani, L. Celetto, and G. A. Mian, “An accurate low-complexity rate control algorithm based on (ρ,Eq)-domain,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 2, pp. 257–262, Feb. 2008.
[29] S. Wang, S. Ma, S. Wang, D. Zhao, and W. Gao, “Quadratic ρ-domain based rate control algorithm for HEVC,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2013, pp. 1695–1699.
[30] C.-C. Ju et al., “A 0.5 nJ/pixel 4K H.265/HEVC codec LSI for multi-format smartphone applications,” IEEE J. Solid-State Circuits, vol. 51, no. 1, pp. 56–67, Feb. 2015.
[31] K. Singh and S. R. Ahamed, “Low power motion estimation algorithm and architecture of HEVC/H.265 for consumer applications,” IEEE Trans. Consum. Electron., vol. 64, no. 3, pp. 267–275, Aug. 2018.


Jui-Hung Hsieh (Member, IEEE) received the Ph.D. degree in electronics engineering from National Chiao-Tung University, Hsinchu, Taiwan, in 2012.
From 2002 to 2014, he was a Group Leader, Manager, and Technical Manager with Macronix Inc., Modiotek Inc., and Mediatek Inc., Hsinchu, Taiwan, respectively. In 2014, he joined the Department of Computer and Communication Engineering, National Kaohsiung First University of Science and Technology, Kaohsiung, Taiwan, as an Assistant Professor. In 2017, he joined Johns Hopkins University, Baltimore, MD, USA, as a Visiting Professor. He is currently an Associate Professor with the Department of Computer and Communication Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan. His current research interests include low-power and high-performance system-on-chip design, low-power ML-based VLSI of video, ECG, and breast tumor signal processing.

Jing-Cheng Syu received the B.S. and M.S. degrees in computer and communication engineering from the National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan, in 2018 and 2020, respectively.
His current research interests include VLSI architecture design and HEVC.

Zhe-Yu Guo received the B.S. and M.S. degrees in computer and communication engineering from the National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan, in 2017 and 2019, respectively.
His current research interests include VLSI architecture design and HEVC.

Zhi-Yu Zhang received the B.S. degree in electrical engineering from the National Chin Yi University of Science and Technology, Taichung, Taiwan, in 2019. He is currently working toward the M.S. degree in computer and communication engineering with the National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan.
His current research interests include VLSI architecture design and HEVC.
