A Real-Time Low-Power Coding Bit-Rate Control Scheme For High-Efficiency Video Coding in A Multiprocessor System-on-Chip
IEEE SYSTEMS JOURNAL, VOL. 16, NO. 1, MARCH 2022
Abstract—A real-time high-performance transmission bandwidth-aware (TB-aware) coding bit-rate (CBR) controller design with low power consumption and low hardware complexity is presented in this article for H.265/high-efficiency video coding (HEVC) in a multiprocessor system-on-chip (MPSoC). Previous TB-aware motion estimation designs with CBR-control capability in video coding have focused on algorithm development with precise CBR models, which require a complicated algorithmic derivation according to the system on-demand CBR and are difficult to realize in very large scale integration (VLSI) due to their lack of consideration for hardware implementation and modeling. Consequently, we present a hardware-oriented CBR-control algorithm that uses simple CBR control functions instead of requiring root and exponential operations to realize the real-time low-power design objective for HEVC applications within a mobile MPSoC. Then, an adequate hardware architecture with low hardware complexity is exploited to accomplish a low-power and high-speed VLSI design of a CBR controller for our proposed algorithm. Using diverse video-sized sequences under on-demand system coding-bit-rate constraints, the experimental outcomes demonstrate that the introduced design is capable of low power consumption and high speeds and can utilize low-complexity hardware.

Index Terms—High-efficiency video coding (HEVC), motion estimation (ME), rate control, very large scale integration (VLSI).

Manuscript received June 9, 2020; revised October 18, 2020 and January 27, 2021; accepted March 18, 2021. Date of publication April 21, 2021; date of current version March 24, 2022. This work was supported by the Ministry of Science and Technology, Taiwan, under Grant 107-2221-E-992-082-MY2. (Corresponding author: Jui-Hung Hsieh.)
The authors are with the Department of Computer and Communication Engineering, National Kaohsiung University of Science and Technology, Kaohsiung 82445, Taiwan (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).
Digital Object Identifier 10.1109/JSYST.2021.3069477

I. INTRODUCTION

THE APPLICATION services provided by mobile devices, such as real-time video streaming and video conferences, enrich and facilitate people’s lives, and video coding technology plays an important role in these services. Large amounts of uncompressed original video data are reduced via a video coding technique, thereby realizing substantial data compression. This technique can substantially decrease video encoding information storage and encoded information transmission, thus enabling consumers to view and enjoy real time and high-definition videos.

In recent years, an emerging video coding standard, H.265/high-efficiency video coding (HEVC) [1], has become the principal video coding standard, which yields coding bit-rate (CBR) savings of approximately 50% compared to H.264/AVC [2] for the same perceived quality and at high resolutions, thereby realizing real-time high-definition video coding. The latest video coding standard, H.265/HEVC, preserves most of the beneficial coding tools in H.264/AVC; additionally, H.265/HEVC provides more miscellaneous coding tools, such as new coding hierarchy selections of coding tree units (CTUs), transform units (TUs), and prediction units (PUs), along with diverse coding block size selections, to satisfy the requirements of real-time high-definition video coding. Furthermore, the CTU size selection varies from 64 × 64 to 8 × 8, and each CTU may be further partitioned into small quadtree-based coding units (CUs). In addition, H.265/HEVC adopts distinct TU sizes from 32 × 32 down to 4 × 4 and symmetrical/asymmetrical PUs.

H.265/HEVC cooperates with motion search algorithms and motion prediction algorithms within interframe and intraframe predictions to realize the reduction of the CBR and real-time HEVC design objectives. However, the transmission bandwidth (TB) supplied for mobile video coding use varies over time due to simultaneous multiple users and manifold applications; thus, the available TB for mobile video coding use is not consistent. Meanwhile, if the HEVC chip within a mobile MPSoC [3], [4] cannot dynamically adjust the video encoding algorithm during runtime to conform to the time-varying TB constraints, the performance of the HEVC chip will be degraded, and the high-performance HEVC chip design goals will not be realized. Additionally, regardless of whether the video coding standard is H.264/AVC or H.265/HEVC, motion estimation (ME) is a highly important coding tool within interframe prediction for CBR control and is the most computationally complex and highest power-consuming processing module [3], [5] in video coding, including computing power, external memory accessing power, and data transmission power. Among these three power dissipation sources, the computing power and memory accessing power depend on a motion search algorithm and motion search range, respectively. For the former, superior solutions for both H.264/AVC and H.265/HEVC have been proposed in the literature [6]–[9], in which a fast algorithm is used to terminate the motion search early; for the latter, an exceptional solution is available in which the ME algorithm and ME VLSI architecture are utilized [3], [10]. Regarding the data transmission
1937-9234 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
power, numerous video coding standards work with bit-rate control capability [11]–[16] and offer solutions, and their results indicate that the ME algorithms presented there can automatically adjust the CBR usage for the data transmission according to the on-demand time-altering TB conditions; at the same time, these designs realize superior video coding performance. In addition, these methods [11]–[16] provide precise and accurate ME algorithm modeling by applying complicated coding models with complicated mathematical formulas; hence, they are difficult to use for VLSI implementation.

A complex ME algorithm increases the computational power consumption of the HEVC chip and thus is not suitable to be embedded in an MPSoC within battery-powered mobile devices. Therefore, this article presents a codesign of ME algorithms and the VLSI hardware architecture to obtain a TB-aware ME CBR controller design that has the advantages of low hardware cost and low power dissipation for embedding HEVC chips inside an MPSoC. First, this article introduces a VLSI-hardware-oriented HEVC CBR-control ME algorithm based on a system-level heuristic method, the main objective of which is realized in VLSI by developing an accurate and uncomplicated hardware-accomplishable algorithm model with numerous offline experimental data analyses. Next, we develop a VLSI hardware architecture that is appropriate for the proposed HEVC CBR-control ME algorithm, and the VLSI implementation results demonstrate the power feasibility of the proposed HEVC CBR-controller design.

This article makes three principal contributions. First, the proposed hardware-achievable algorithm and the VLSI implementation of the TB-aware CBR controller design perform real-time HEVC ME rate control with low power and low hardware complexity. The performance of our method is compared with cutting-edge HEVC ME rate control, which requires complicated mathematical modeling and is difficult to implement in VLSI. Second, previous HEVC ME designs with rate-control capability are appropriate for software realization instead of hardware realization, which implies that they are not adequate for implementation inside an MPSoC. Therefore, integrating the presented TB-aware CBR controller design with an up-to-date HEVC ME design can achieve a promising TB-aware HEVC design scheme. Third, this article presents an all-embracing exploration and comparison of the coding quality and CBR under different target coding rates with QP settings equal to 22, 27, 32, and 37 for the latest video coding standard, H.265/HEVC. According to thorough experimental results that are verified using the standard HEVC test sequences, the proposed TB-aware HEVC ME controller design greatly decreases the complexity of the rate-control ME design because of its hardware-oriented rate-control algorithm, which exploits hardware-accomplishable mathematical modeling rather than the conventional complex mathematical modeling. Concretely, the complexity of the VLSI design and the hardware cost of the ME rate control are lessened due to our proposed hardware-oriented ME algorithm, which signifies that the presented design can simultaneously attain the low-power dissipation and high-performance VLSI realization objectives as a result of the alleviated hardware complexity and computing power; on the other hand, the low power dissipation and low hardware cost make the proposed design appropriate to be amalgamated with a sophisticated HEVC design into a power-on-demand mobile MPSoC for the fulfillment of a truly high-performance low-power TB-aware HEVC design.

The remainder of this article is organized as follows. Section II concisely explores the previous H.265/HEVC ME designs with and without CBR-control capability. Section III describes our presented hardware-oriented H.265/HEVC ME CBR-control algorithm in detail and presents the corresponding simulation results. Section IV presents the suggested VLSI hardware architecture and the implementation results. Finally, Section V concludes this article.

II. RELATED CBR-CONTROL ME WORKS

The existing video coding designs that are capable of CBR control for the H.265/HEVC video coding standard are concisely reviewed and discussed in this section.

A. Previous ME Without CBR-Control Capability in H.265/HEVC

ME operations utilize the interframe correlation to reduce the redundancy of coding data between consecutive video coding frames; then, they economize the amount of video coding data to perform video compression. A large number of cutting-edge studies [5], [17], [18] have discussed and focused on this topic. Sinangil et al. [5] presented a low-bandwidth and low-hardware-area H.265/HEVC ME engine design methodology for algorithm modeling and hardware architecture according to distinct ME design configurations, which ranged from smaller to larger coding block sizes. Their simulation results demonstrated that the introduced ME design achieved a prominent tradeoff between coding efficiency and hardware cost via quantitative analysis and hardware-oriented algorithms. Additionally, a fast ME design that was obtained via an algorithm and a hardware architecture codesign and based on a predictive integer ME algorithm and a PU size-dependent fractional ME algorithm, along with scheduling and prediction techniques, was introduced in [17]. Xiong et al. [18] introduced a rapid CU determination method according to the sum of the absolute differences and rate-distortion cost for coding time reduction.

However, these ME studies did not consider the time-varying TB limitations that are encountered in mobile video coding operating on an MPSoC and are therefore not conducive to mobile video coding applications and result in coding performance degradation due to insufficient TB.

B. Previous ME With CBR-Control Capability in H.265/HEVC

To overcome the time-varying TB problem of a mobile MPSoC when applying the H.265/HEVC video standard, many state-of-the-art ME designs with CBR-control capability [13]–[16], [19]–[22] have been proposed. Wang et al. [13] presented a low-delay Lagrange multiplier-based rate control framework
TABLE I
PERFORMANCE COMPARISON IN TERMS OF BD-PSNR AND BD-RATE FOR THE LOW-DELAY P-MAIN SETTINGS
C. Function Validation and Performance Evaluation

To realize a real-time TB-aware CBR-constrained video coding system, the proposed hardware-oriented H.265/HEVC CBR controller design is integrated into the HEVC reference software HM [23] and verified on a variety of video-sized test sequences [24] under CBR conditions (i.e., under versatile target CBR constraints CBR_T). Function validation and performance evaluation are performed on a workstation with 24 CPU cores and 128 GB RAM, in which each core working frequency can reach 2.6 GHz.

Simulation results are acquired under diverse target CBR limitations (i.e., under the target CBR constraint, CBR_T, and each CBR_T is set equal to the average of the four QP settings of 22, 27, 32, and 37) by testing various video sequences [24] with six resolutions of 416 × 240 (defined as class D), 832 × 480 (defined as class C), 1280 × 720 (defined as class E), 1920 × 1080 (defined as class B), 2560 × 1600 (defined as class A), and 3840 × 2160 (defined as 4K). We compare the coding efficiency in terms of the Bjøntegaard assessment [26] of the peak signal-to-noise ratio (i.e., BD-PSNR) and CBR (i.e., BD-rate) by applying our proposed HEVC CBR-control ME algorithm and the conventional HEVC CBR-control ME algorithm to the HEVC reference software in HM [23] under the common HEVC test condition of low-delay P-main configurations [24] (i.e., the first I frames and remaining P frames) during the HEVC process. Two representative ME search schemes, the full search (i.e., HM_FS) and a test zonal search (i.e., HM_TZS), are adopted to assess the overall coding efficiency of applying our proposed CBR-control ME algorithm and the conventional CBR-control ME scheme.

According to Table I, when using the proposed HEVC CBR-control scheme under the representative CBR-constrained video coding, the coding quality of BD-PSNR for a high-resolution class-A video test sequence under various CBR_T conditions varies from 0.011 to 0.073 dB, with an average BD-PSNR improvement of 0.042 dB, and from 0.032 to 0.084 dB, with an average BD-PSNR improvement of 0.058 dB, for the FS approach and TZS approach, respectively. In addition, the CBR of the BD-Rate for a class-A video sequence under different CBR_T constraints varies from −0.555% to −1.571%, with an average coding BR cutback of 1.063%, and from −1.218% to −1.807%, with an average coding BR cutback of 1.51%, for the FS approach and TZS approach, respectively. Similar tendencies of the coding quality improvement and the coding BR cutback are observed for the remaining video-sized test sequences (i.e., class B, class C, class D, class E, and 4K). Consequently, the overall average coding quality improvement and CBR reduction with the six video-sized datasets are 0.067 dB and −1.775% for HM_FS and 0.078 dB and −2.147% for HM_TZS, respectively, under diverse CBR constraints. It is worth noting that the simulation results presented in Table I carefully consider the influence of the fixed-point operations in (2) and (5) to be consistent with the proposed VLSI implementation for solving practical design problems because of the low-hardware-complexity design considerations in the resource- and power-limited mobile MPSoC. As a consequence, to evaluate the performance of HM [23], as shown in Table I, we utilize the fixed-point declaration instead of the floating-point declaration to be consistent with the proposed hardware design in our proposed CBR-control ME scheme; in contrast, we exploit the floating-point
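The BD-rate figure quoted for Table I can be illustrated with a short sketch of the widely used Bjøntegaard delta-rate calculation: fit a cubic of log10(bit rate) against PSNR for each codec, average both fits over the overlapping PSNR range, and convert the difference to a percentage. This is not the authors' evaluation script. The function names and the four rate/PSNR sample points are hypothetical and chosen only so that the result is easy to check by hand.

```python
import math

def cubic_fit(xs, ys):
    """Exact cubic through four (x, y) points; returns [c0, c1, c2, c3]."""
    n = 4
    # Augmented Vandermonde system, solved by Gauss-Jordan elimination.
    a = [[x ** k for k in range(n)] + [y] for x, y in zip(xs, ys)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

def integrate(coeffs, lo, hi):
    """Definite integral of the polynomial with the given coefficients."""
    def primitive(x):
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return primitive(hi) - primitive(lo)

def bd_rate(anchor, test):
    """BD-rate in percent; anchor/test are four (bitrate, psnr) pairs each."""
    lo = max(min(p for _, p in anchor), min(p for _, p in test))
    hi = min(max(p for _, p in anchor), max(p for _, p in test))
    mid = (lo + hi) / 2.0  # centre PSNR values to keep the fit well conditioned

    def mean_log_rate(points):
        xs = [p - mid for _, p in points]
        ys = [math.log10(r) for r, _ in points]
        return integrate(cubic_fit(xs, ys), lo - mid, hi - mid) / (hi - lo)

    return (10 ** (mean_log_rate(test) - mean_log_rate(anchor)) - 1) * 100

# Hypothetical rate-distortion points (kbit/s, dB), one per QP setting.
anchor = [(1000.0, 32.0), (2000.0, 35.0), (4000.0, 38.0), (8000.0, 41.0)]
# A test codec that needs 2% fewer bits at every PSNR point.
test = [(rate * 0.98, psnr) for rate, psnr in anchor]
print(f"BD-rate: {bd_rate(anchor, test):.2f} %")  # BD-rate: -2.00 %
```

A negative BD-rate means the test configuration needs fewer bits than the anchor for the same PSNR, which is the sense in which the text reports values such as −1.775%.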
Fig. 3. Block diagram of our proposed H.265/HEVC ME design with CBR-control capability.
TABLE IV
HARDWARE-COST AND POWER-CONSUMPTION PERCENTAGE COMPARISON FOR THE PROPOSED H.265/HEVC CBR CONTROLLER AND STATE-OF-THE-ART H.265/HEVC ME DESIGNS
a Percentage of hardware cost (%) = (logic gate count of our proposed H.265/HEVC CBR controller / logic gate count of reference H.265/HEVC ME design) × 100%.
b Percentage of power consumption (%) = (average core power of our proposed H.265/HEVC CBR controller / average core power of reference H.265/HEVC ME design) × 100%.
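As a quick sanity check on these percentages, the sketch below reads them the way the body text uses them, namely the proposed controller expressed as a fraction of each reference ME design. It takes the 9.81-kilogate figure and the 0.25%, 1.26%, and 0.68% values quoted in the text and backs out the reference gate counts they imply. The function names are hypothetical, and the implied gate counts are back-of-envelope estimates from rounded percentages, not values taken from Table IV.

```python
def percent_of_reference(proposed, reference):
    """Proposed design's cost as a percentage of a reference design's cost."""
    return proposed / reference * 100.0

def implied_reference(proposed, percent):
    """Reference cost implied by a quoted percentage (inverse of the above)."""
    return proposed / (percent / 100.0)

PROPOSED_KGATES = 9.81  # synthesized gate count quoted for the CBR controller

# Percentages quoted in the text for references [5], [17], and [31].
for ref, pct in (("[5]", 0.25), ("[17]", 1.26), ("[31]", 0.68)):
    kgates = implied_reference(PROPOSED_KGATES, pct)
    # Round-trip check: the implied size reproduces the quoted percentage.
    assert abs(percent_of_reference(PROPOSED_KGATES, kgates) - pct) < 1e-9
    print(f"{ref}: roughly {kgates:.0f} kilogates implied")
```

The implied sizes range from several hundred to a few thousand kilogates, so the controller adds only a small fraction to the area of an ME engine.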
final QP determination. Notably, the total number of CTUL coding cycles in the H.265/HEVC coding process is far greater than that required by our proposed design. Thus, our proposed CBR controller design can satisfy real-time system requirements due to the short clock latency. Therefore, the introduced H.265/HEVC CBR controller design is characterized by low cost, high performance, and low power consumption because of the modeled hardware-oriented algorithm, which includes an appropriate hardware architecture arrangement that employs pipeline and resource sharing approaches.

C. VLSI Implementation Results

We utilize the Verilog hardware description language to perform the register-transfer level (RTL) design of the presented VLSI hardware architecture of the H.265/HEVC CBR controller, along with the VCS compiler and Design Compiler provided by Synopsys to conduct RTL functional verification and synthesize the generated RTL to a gate-level netlist with the 90-nm standard cell library of the Taiwan Semiconductor Manufacturing Company. The target CBR points are not fixed and can be designated according to the system requirements. The maximum supported video coding resolution is 3840 × 2160, and the maximum supported coding frame rate is 60 FPS. Table III lists the logic synthesis results of the presented H.265/HEVC CBR controller, which consist of the maximum design frequency and the average core power obtained using the Synopsys gate-level timing and power analysis tools, respectively.

According to Table III, the presented design can achieve a performance of up to 272 MHz with a hardware cost of approximately 9.81 kilogates and power dissipation of 2.46 mW. The detailed resource consumption of the four major blocks of the presented design is shown at the bottom of Table III. It is worth noting that storing a predetermined target CBR in the on-chip SRAM, instead of allowing the arbitrary setting supported by the proposed work, can further reduce the VLSI hardware cost because the divider component of (2) and (5) can be removed; this means that our proposed design can further decrease the hardware cost, but the design flexibility provided by the predetermined scheme is poor when solving real-time TB-aware H.265/HEVC applications that appear in the MPSoC. On the other hand, the critical path inside the proposed H.265/HEVC CBR controller is dominated by the divider component; thus, as the supported video resolution decreases, the maximum operating frequency increases and the hardware cost decreases because of the divider length reduction.

Additionally, the operating frequency of 272 MHz is adequate for integration into a leading-edge HEVC ME design to attain a TB-aware HEVC encoder design, as evidenced in Table IV. As a consequence, from the bottom two rows of Table IV, the hardware cost of our proposed TB-aware HEVC ME CBR controller design is only 0.25%, 1.26%, and 0.68% of those achieved by the state-of-the-art HEVC ME designs [5], [17], and [31], respectively. Meanwhile, the power consumption of our proposed H.265/HEVC CBR controller design accounts for less than 1.62% of that of the state-of-the-art H.265/HEVC ME design [31] under the same 90-nm CMOS process and the same 250-MHz working frequency. The low hardware cost, high speed, and low power consumption are due to our hardware-oriented algorithm modeling by inferring simple logic elements, such as incrementers and decrementers, along with VLSI architecture planning, rather than
using a complicated mathematical model without considering the VLSI hardware implementation. Due to the lack of hardware implementation in previous H.265/HEVC CBR-control designs, we do not include hardware comparisons in the interest of maintaining a fair comparison.

V. CONCLUSION

A real-time rapid, low-power, and low-VLSI-hardware-cost H.265/HEVC CBR controller, which is suitable for integration into the H.265/HEVC encoder inside a mobile MPSoC for TB-limited mobile video applications, is introduced in this article. The real-time rapid, low-power, and low-VLSI-hardware-cost design is a result of our modeled hardware-oriented algorithm and adequately planned hardware architecture. Moreover, combining our proposed CBR controller design with the conventional H.265/HEVC ME design achieves a TB-aware H.265/HEVC system because it is intelligent enough to adapt to the dynamically time-revised TB variations in a mobile MPSoC. However, sophisticated HEVC ME designs with CBR-control capabilities are difficult to achieve in VLSI. Most importantly, the proposed design considers that the system dynamically alters the TB in real time and generates suitable QP values for HEVC ME use, which yield negligible coding performance influences and a prevailing VLSI performance, making it appropriate for power-limited mobile MPSoC applications.

REFERENCES

[1] G. J. Sullivan, J. R. Ohm, W. J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
[2] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.
[3] J. H. Hsieh, J. A. Cai, Y. N. Wang, and Z. Y. Guo, “ML-assisted DVFS-Aware HEVC motion estimation design scheme for mobile APSoC,” IEEE Syst. J., vol. 13, no. 4, pp. 4464–4473, Dec. 2019.
[4] Y. Ma, J. Zhou, T. Chantem, R. P. Dick, S. Wang, and X. S. Hu, “Online resource management for improving reliability of real-time systems on ‘Big–Little’ type MPSoCs,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 39, no. 1, pp. 88–100, Jan. 2020.
[5] M. E. Sinangil, V. Sze, M. Zhou, and A. P. Chandrakasan, “Cost and coding efficient motion estimation design considerations for high efficiency video coding (HEVC) standard,” IEEE J. Sel. Topics Signal Process., vol. 7, no. 6, pp. 1017–1028, Dec. 2013.
[6] T. H. Tsai and Y. N. Pan, “High efficiency architecture design of real-time QFHD for H.264/AVC fast block motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 11, pp. 1646–1658, Nov. 2011.
[7] J. Zheng, C. Lu, J. Guo, D. Chen, and D. Guo, “A hardware-efficient block matching algorithm and its hardware design for variable block size motion estimation in ultra-high-definition video encoding,” ACM Trans. Des. Autom. Electron. Syst., vol. 24, pp. 1–21, 2019.
[8] N. C. Vayalil, M. Paul, and Y. Kong, “A residue number system hardware design of fast-search variable-motion-estimation accelerator for HEVC/H.265,” IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 2, pp. 572–581, Feb. 2019.
[9] T. S. Kim, C. E. Rhee, H.-J. Lee, and S.-I. Chae, “Fast integer motion estimation with bottom-up motion vector prediction for an HEVC encoder,” IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 12, pp. 3398–3411, Dec. 2018.
[10] J. H. Hsieh and H. R. Wang, “VLSI design of an ML-Based power-efficient motion estimation controller for intelligent mobile systems,” IEEE Trans. Very Large Scale Integr. Syst., vol. 26, no. 2, pp. 262–271, Feb. 2018.
[11] S. Li, M. Xu, Z. Wang, and X. Sun, “Optimal bit allocation for CTU level rate control in HEVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 11, pp. 2409–2424, Nov. 2017.
[12] Y. Li, H. Jia, X. Xie, and T. Huang, “Rate control for consistent video quality with inter-dependent distortion model for HEVC,” in Proc. Vis. Commun. Image Process., 2016, pp. 1–4.
[13] M. Wang, K. N. Ngan, and H. Li, “Low-delay rate control for consistent quality using distortion-based Lagrange multiplier,” IEEE Trans. Image Process., vol. 25, no. 7, pp. 2943–2955, Jul. 2016.
[14] A. Fiengo, G. Chierchia, M. Cagnazzo, and B. Pesquet-Popescu, “Rate allocation in predictive video coding using a convex optimization framework,” IEEE Trans. Image Process., vol. 26, no. 1, pp. 479–489, Jan. 2017.
[15] Y. Gong, S. Wan, K. Yang, H. R. Wu, and Y. Liu, “Temporal-layer-motivated lambda domain picture level rate control for random-access configuration in H.265/HEVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 1, pp. 156–170, Jan. 2019.
[16] L. Li, B. Li, H. Li, and C. W. Chen, “λ-Domain optimal bit allocation algorithm for high efficiency video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 28, no. 1, pp. 130–142, Jan. 2018.
[17] S. Y. Jou, S. J. Chang, and T. S. Chang, “Fast motion estimation algorithm and design for real time QFHD high efficiency video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 9, pp. 1533–1544, Sep. 2015.
[18] J. Xiong, H. Li, F. Meng, Q. Wu, and K. N. Ngan, “Fast HEVC inter CU decision based on latent SAD estimation,” IEEE Trans. Multimedia, vol. 17, no. 12, pp. 2147–2159, Dec. 2015.
[19] B. Li, H. Li, L. Li, and J. Zhang, “λ-domain rate control algorithm for High efficiency video coding,” IEEE Trans. Image Process., vol. 23, no. 9, pp. 3841–3854, Sep. 2014.
[20] L. P. Van, J. D. Praeter, G. V. Wallendael, S. V. Leuven, J. D. Cock, and R. V. d. Walle, “Efficient bit rate transcoding for high efficiency video coding,” IEEE Trans. Multimedia, vol. 18, no. 3, pp. 364–378, Mar. 2016.
[21] T. Zhao, Z. Wang, and C. W. Chen, “Adaptive quantization parameter cascading in HEVC hierarchical coding,” IEEE Trans. Image Process., vol. 25, no. 7, pp. 2997–3009, Jul. 2016.
[22] S. Li, M. Xu, Z. Wang, and X. Sun, “Optimal bit allocation for CTU level rate control in HEVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 11, pp. 2409–2424, Nov. 2017.
[23] Joint Collaborative Team on Video Coding (JCT-VC), HM 16.6, Reference Software, 2015. [Online]. Available: https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/tags/HM-16.6/
[24] F. Bossen, “Common test conditions and software reference configurations,” Document JCTVC-K1100, Shanghai, China, Oct. 2013.
[25] W. Gao, S. Kwong, H. Yuan, and X. Wang, “DCT coefficient distribution modeling and quality dependency analysis based frame-level bit allocation for HEVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 26, no. 1, pp. 139–153, Jan. 2016.
[26] G. Bjontegaard, “Improvements of the BD-PSNR model,” Document VCEG-AI11, 2008.
[27] T. Chiang and Y.-Q. Zhang, “A new rate control scheme using quadratic rate distortion model,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, pp. 246–250, Feb. 1997.
[28] S. Milani, L. Celetto, and G. A. Mian, “An accurate low-complexity rate control algorithm based on (ρ,Eq)-domain,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 2, pp. 257–262, Feb. 2008.
[29] S. Wang, S. Ma, S. Wang, D. Zhao, and W. Gao, “Quadratic ρ-domain based rate control algorithm for HEVC,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2013, pp. 1695–1699.
[30] C.-C. Ju et al., “A 0.5 nJ/pixel 4K H.265/HEVC codec LSI for multi-format smartphone applications,” IEEE J. Solid-State Circuits, vol. 51, no. 1, pp. 56–67, Feb. 2015.
[31] K. Singh and S. R. Ahamed, “Low power motion estimation algorithm and architecture of HEVC/H.265 for consumer applications,” IEEE Trans. Consum. Electron., vol. 64, no. 3, pp. 267–275, Aug. 2018.
Jui-Hung Hsieh (Member, IEEE) received the Ph.D. degree in electronics engineering from National Chiao-Tung University, Hsinchu, Taiwan, in 2012.
From 2002 to 2014, he was a Group Leader, Manager, and Technical Manager with Macronix Inc., Modiotek Inc., and Mediatek Inc., Hsinchu, Taiwan, respectively. In 2014, he joined the Department of Computer and Communication Engineering, National Kaohsiung First University of Science and Technology, Kaohsiung, Taiwan, as an Assistant Professor. In 2017, he joined Johns Hopkins University, Baltimore, MD, USA, as a Visiting Professor. He is currently an Associate Professor with the Department of Computer and Communication Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan. His current research interests include low-power and high-performance system-on-chip design, low-power ML-based VLSI of video, ECG, and breast tumor signal processing.

Zhe-Yu Guo received the B.S. and M.S. degrees in computer and communication engineering from the National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan, in 2017 and 2019, respectively.
His current research interests include VLSI architecture design and HEVC.

Zhi-Yu Zhang received the B.S. degree in electrical engineering from the National Chin Yi University of Science and Technology, Taichung, Taiwan, in 2019. He is currently working toward the M.S. degree in computer and communication engineering with the National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan.
His current research interests include VLSI architecture design and HEVC.