A Progressive Quadric Graph Convolutional Network for 3D Human Mesh Recovery

IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 1, January 2023
Abstract— Human mesh recovery from a single image has achieved rapid progress recently, but many methods suffer from overfitting to image appearance, since the training data are collected along with accurate 3D annotations in controlled settings with monotonous backgrounds or simple clothes. Some methods regress human mesh vertices from poses to tackle this problem. However, the mesh topologies have not been well exploited, and artifacts are often generated. In this paper, we aim to find an efficient, low-cost solution to human mesh reconstruction. To this end, we propose a Progressive Quadric Graph Convolutional Network (PQ-GCN), and design a simple and fast method for 3D human mesh recovery from a single image in the wild. Specifically, we apply quadric-based surface simplification to human meshes and design a progressive graph convolutional network, accompanied by mesh feature up-sampling, to deal with the mesh topologies. We carry out a series of studies to validate our method. The results prove that our method achieves superior performance on a challenging in-the-wild dataset, while using 66% fewer parameters than the existing method Pose2Mesh. Artifacts have also been eliminated, and better visual quality has been obtained without any further post-processing or model fitting. Besides, the recovery can be stopped at an earlier stage by adding a decoder head; consequently, the computational complexity can be reduced greatly.

Index Terms— 3D human mesh, graph convolutional network, deep learning, VR.

I. INTRODUCTION

UNDERSTANDING humans and their behavior is of prime importance in computer vision, and 3D human pose [1], [2] and shape estimation [3], [4] provide a 3D representation, as shown in Fig. 1. This can be used in potential applications such as virtual reality (VR), sports motion analysis, and robotics [5], [6], [7]. 3D recovery from a monocular image or video is more convenient and lower-cost for practical applications than methods based on multi-view cameras, depth cameras, or Inertial Measurement Units (IMUs) [8]. However, this task is difficult because of the complex human articulation and the 2D-to-3D ambiguity.

Recent deep neural networks (DNNs) in this field have made rapid progress, and these methods are either model-based or model-free. The former regress the pose and shape parameters of a human mesh model, such as the popular Skinned Multi-Person Linear (SMPL) model [3]. In these methods, registered datasets are needed, which are generated by fitting model parameters [9], [10], [11], but the limited exemplars constrain the pose and shape spaces [12]. On the contrary, the latter estimate mesh vertex coordinates directly, in which case a Convolutional Neural Network (CNN) has limited applicability.

In view of the above situation, some methods exploit the Graph Convolutional Network (GCN) [13], [14], [15] for 3D human mesh recovery [16], [17]. Kolotouros et al. [16] propose a GCN architecture to regress 3D vertex coordinates from a single image. Since the directly generated meshes have artifacts on the surface, a Multi-layer Perceptron (MLP) is needed to predict SMPL parameters as post-processing. Choi et al. [17] propose Pose2Mesh to recover a 3D human mesh from a 2D human pose using a GCN. They apply the graph coarsening technique [18] to generate eight graphs of different resolutions, with the number of vertices ranging from 96 to 12288. This technique can generate meshes with topologies of higher resolution than the target, which helps to learn more details, but leads to more graph convolutions and excessive computational resources. From coarse to fine, they use a nearest up-sampling algorithm to achieve mesh feature up-sampling with a scale factor of 2, by copying the features of each vertex in a low-resolution graph to two corresponding vertices in a high-resolution graph.
Fig. 2. Framework of the proposed method. Taking a single image as input, the 2D pose is detected first and translated into 3D space when needed. PQ-GCN learns to map the 2D/3D pose to a 3D mesh based on Chebyshev GCN and mesh feature up-sampling on the template topologies progressively. The recovery can be stopped at an early stage by a decoding head.
Transformer [40] has been widely used in natural language processing and recently applied to computer vision tasks [41]. Lin et al. [12] propose a mesh transformer to regress the coordinates of 3D human joints and vertices. They design a progressive dimensionality reduction architecture in a multi-transformer encoder. In our network, we have not used the self-attention mechanism or the transformer encoder, since seventy times more parameters and much more training time would be needed, while we aim to realize a low-cost, lightweight solution.

D. 3D Recovery Methods for Occlusions, Videos, Expressions, or the Clothed

To tackle the partial occlusion or truncation problem, some approaches [42], [43], [44], [45], [46], [47] have been proposed. A part attention regressor is proposed in [44] based on the visibility of individual body parts. To regress multiple people in one stage, a collision-aware body-center-guided representation is proposed in [45] with robustness to person-person occlusions. Some methods have also been introduced based on video frames [48], [49], [50], [51], [52] or the SMPL-X model [30], [53]. Clothed human shape recovery has also been studied. For example, implicit functions are learned to predict the occupancy field [54], and the Pixel-aligned Implicit Function (PIFu) is leveraged for 3D textured human reconstruction [55]. A topologically-aware generative model, SMPLicit, is proposed in [56], and an opacity-aware differentiable rendering is introduced for this task [57]. The method in [8] takes a video as input, and uses SfM for calibration, point cloud reinforcement for shaky body parts, as well as mesh deformation for surface details. These issues are not the focus of this work, since we aim to recover a body mesh from a single image. Instead, we propose a different method by designing a light-weight framework based on a quadric graph convolutional network.

III. PROPOSED METHOD

We propose a progressive method for 3D human mesh recovery. A 2D pose is detected first from an input image and then optionally translated to 3D space. A progressive graph convolutional network is designed to transform the 2D/3D human pose into a 3D human body mesh. We first introduce our model's architecture, then present the mesh generation method. Since the graph convolutions operate on meshes with a 2D/3D pose as initialization, which only has sparse points (i.e., human joints), we generate a set of coarse-to-fine template topologies and progressively achieve the ultimate target. The recovery can also be stopped at an earlier stage to generate the target by adding a decoder head beforehand, and the computational complexity will then be greatly reduced.

A. Model Architecture

The overall framework is illustrated in Fig. 2, which consists of two sub-tasks, pose generation and 3D mesh generation. For the first task, taking a single image as input, a 2D pose is detected and optionally translated to the corresponding root joint-relative 3D pose, as briefly introduced later. In this paper, we focus on 3D mesh generation by our proposed PQ-GCN, which takes the 2D pose, or the concatenated 2D and 3D poses, as input and predicts the 3D mesh vertices' coordinates.

1) 2D/3D Pose Generation: We first detect a 2D pose from the input image as $P_{2D} \in \mathbb{R}^{J \times 2}$, where $J$ is the number of human joints. If needed, $P_{2D}$ is fed into a 3D pose generation network [17], which contains two fully-connected layers and two residual blocks. A residual block consists of 1D batch normalization, ReLU activation, a dropout layer, and a fully-connected layer. The first fully-connected layer
transforms the input 2D pose into a 4096-dimensional feature vector. Then, the vector is fed into the residual blocks, whose output dimension is also 4096. The last fully-connected layer converts the output of the residual blocks into a $3J$-dimensional vector, which is reshaped into the root joint-relative 3D pose $P_{3D} \in \mathbb{R}^{J \times 3}$.
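For concreteness, the following is a minimal PyTorch sketch of this lifting network as described above (an input fully-connected layer to 4096 dimensions, two residual blocks, and an output fully-connected layer to $3J$); the class names, the joint count $J = 17$, and the dropout rate are our own illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """BatchNorm1d -> ReLU -> Dropout -> Linear, with a skip connection."""
    def __init__(self, dim=4096, p=0.5):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm1d(dim),
            nn.ReLU(inplace=True),
            nn.Dropout(p),
            nn.Linear(dim, dim),
        )

    def forward(self, x):
        return x + self.block(x)

class PoseLifter(nn.Module):
    """Lifts a 2D pose (J x 2) to a root joint-relative 3D pose (J x 3)."""
    def __init__(self, num_joints=17, dim=4096):
        super().__init__()
        self.num_joints = num_joints
        self.fc_in = nn.Linear(num_joints * 2, dim)
        self.res1 = ResidualBlock(dim)
        self.res2 = ResidualBlock(dim)
        self.fc_out = nn.Linear(dim, num_joints * 3)

    def forward(self, pose_2d):                              # (B, J, 2)
        x = self.fc_in(pose_2d.flatten(1))                   # (B, 4096)
        x = self.res2(self.res1(x))                          # (B, 4096)
        return self.fc_out(x).view(-1, self.num_joints, 3)   # (B, J, 3)
```

For example, `PoseLifter()(torch.randn(8, 17, 2))` returns an `(8, 17, 3)` batch of root-relative 3D poses.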
2) Progressive GCN: Our target is to learn the 3D coordinates of the human mesh vertices from $P_{2D}$ or $P_{3D}$. The human mesh $M \in \mathbb{R}^{N \times 3}$ has $N$ vertices; since we use the SMPL human template mesh topology, $N = 6890$. Based on graph convolution and mesh feature up-sampling, we construct a progressive mesh processing mechanism, in which the graph convolution unit is designed as a Chebyshev graph convolution. As shown in Fig. 3, our PQ-GCN consists of a Bottleneck Layer, a Progressive Layer, and an Output Layer.

Bottleneck Layer is composed of three graph convolution units, two reshaping layers, and a fully-connected layer. We construct a graph $G_P = (V_P, A_P, F_P)$ from the human skeleton, where $V_P$ denotes the group of $J$ human joints, and $F_P = P$ is the initialization feature map of graph $G_P$. $A_P \in \{0, 1\}^{J \times J}$ is an adjacency matrix that defines the connectivity of those joints. As shown in the left part of Fig. 3, three graph units sequentially perform the Chebyshev graph convolution on $G_P$, and transform the feature map $F_P$ from $\mathbb{R}^{J \times 2}$ or $\mathbb{R}^{J \times 5}$ to $\mathbb{R}^{J \times 64}$. Then, the bottleneck layer up-samples the feature map $F_P$ to the mesh feature $F_4$ of the lowest-resolution human body mesh $M_4$ by reshaping and a fully-connected layer.
Progressive Layer realizes the progressive mesh feature up-sampling. It receives the bottleneck layer's output to initialize the mesh feature $F_4$. Then, five graph convolution blocks with interleaved up-sampling layers generate the 3D mesh feature $F_0$ of the human body mesh $M_0$ in $\mathbb{R}^{6890 \times 128}$. Each graph convolution block consists of two graph convolution units and a residual connection, while each up-sampling layer corresponds to an up-sampling matrix. The set of matrices is $\{U_c \in \mathbb{R}^{|V_c| \times |V_{c+1}|}\}_{c=0}^{C-1}$, where $C = 4$. The up-sampling process is defined as $F_c = U_c F_{c+1}$, where $F_c$ is the first feature map of $M_c$ and $F_{c+1}$ is the last feature map of $M_{c+1}$. The five graph convolution blocks sequentially operate on the progressive mesh group $\{M_c = (V_c, A_c, F_c)\}_{c=0}^{4}$, constituting a coarse-to-fine human mesh processing mechanism. A decoding head can optionally be added at each block; it consists of two layers of Chebyshev graph convolution and one Multi-layer Perceptron (MLP), so that the 3D recovery can be stopped at an earlier stage when needed.
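Under stated assumptions, the progressive layer can be sketched in PyTorch as below; the `GraphConvUnit` here is a simplified first-order stand-in for the Chebyshev unit of Eq. (8) (sketched in Section III-B), and all module names, signatures, and the activation choice are ours.

```python
import torch
import torch.nn as nn

class GraphConvUnit(nn.Module):
    """Stand-in graph convolution: F' = ReLU((A_norm F) W).
    The paper uses Chebyshev graph convolution here (see Eq. (8))."""
    def __init__(self, in_dim, out_dim, adj_norm):
        super().__init__()
        self.register_buffer("adj", adj_norm)    # (V, V) normalized adjacency
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x):                        # (B, V, F)
        return torch.relu(self.lin(self.adj @ x))

class GraphConvBlock(nn.Module):
    """Two graph convolution units plus a residual connection."""
    def __init__(self, dim, adj_norm):
        super().__init__()
        self.conv1 = GraphConvUnit(dim, dim, adj_norm)
        self.conv2 = GraphConvUnit(dim, dim, adj_norm)

    def forward(self, x):
        return x + self.conv2(self.conv1(x))

class ProgressiveLayer(nn.Module):
    """Alternates graph conv blocks with fixed up-sampling matrices U_c,
    growing the mesh from 27 to 6890 vertices at feature width 128."""
    def __init__(self, adjs, ups, dim=128):
        # adjs: 5 normalized adjacencies, coarse to fine (27, ..., 6890)
        # ups:  4 fixed matrices with shapes (108, 27), ..., (6890, 1723)
        super().__init__()
        self.blocks = nn.ModuleList(GraphConvBlock(dim, a) for a in adjs)
        self.ups = list(ups)                     # precomputed, not learned

    def forward(self, f):                        # (B, 27, 128)
        f = self.blocks[0](f)
        for up, block in zip(self.ups, self.blocks[1:]):
            f = up @ f                           # F_c = U_c F_{c+1}
            f = block(f)
        return f                                 # (B, 6890, 128)
```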
Output Layer is composed of two graph convolution units. It receives the progressive layer's output, and the vertices' feature dimension is reduced from 128 to 3 to generate a 3D human mesh in $\mathbb{R}^{6890 \times 3}$.
B. Progressive Quadric Graph Construction

A 3D human mesh $M$ can be represented as a set of vertices, edges, and vertex features, $M = (V, A, F)$, with $|V| = n$ vertices, $V \in \mathbb{R}^{n \times 3}$. $A \in \{0, 1\}^{n \times n}$ is an adjacency matrix that defines the connectivity of the vertices in mesh $M$: $A_{ij} = 1$ when vertices $i$ and $j$ are the same or connected, and otherwise $A_{ij} = 0$. The mesh feature $F \in \mathbb{R}^{n \times |F|}$ is attached to the vertices of $M$, where $|F|$ is the dimension of the feature vector at each vertex. Since we exploit graph convolution to learn the topology features of a mesh, we view a mesh as a graph in the following discussions.

1) Progressive Mesh Generation: We introduce the progressive human mesh generation for a set of coarse-to-fine topologies using down-sampling, the purpose of which is to generate a group of mesh templates with special graph structures (6890, 1723, 431, 108, 27 vertices). It is finished before network training and is only used to generate the target templates. Then we use graph convolutions and up-sampling to progressively generate the target meshes (27, 108, 431, 1723, 6890 vertices) from the 2D/3D pose. According to the weighted graph cuts demonstration in graph clustering [18], the coarsening phase of the multilevel graph clustering (Graclus) algorithm is efficient, and it has been used in graph convolution [58]. Pose2Mesh also adopts the coarsening method of Graclus for graph generation [17]. However, we have found that this is not the most suitable for human mesh recovery. Based on surface simplification [59], our method simplifies a mesh by contracting vertex pairs iteratively, and uses quadric matrices to calculate the contraction cost.

A pair contraction can be defined as $(v_i, v_j) \rightarrow \bar{v}$. The cost of contracting a pair is defined as

$\Delta(\bar{v}) = \bar{v}^{T}(Q_i + Q_j)\,\bar{v}, \qquad (1)$

where $\bar{v} = [\bar{v}_x, \bar{v}_y, \bar{v}_z, 1]^{T}$ represents the vertex's homogeneous coordinates, and $Q_i$ and $Q_j$ are $4 \times 4$ symmetric matrices corresponding to vertices $v_i$ and $v_j$, respectively. A down-sampling matrix $S \in \{0, 1\}^{n \times m}$ can be obtained based on the mesh down-sampling algorithm, where $m$ and $n$ respectively represent the number of vertices before and after mesh down-sampling.

We apply mesh down-sampling to the SMPL model to generate a set of coarse-to-fine meshes $\{M_c = (V_c, A_c, F_c)\}_{c=0}^{C}$, and the corresponding down-sampling matrices are $\{S_c \in \mathbb{R}^{|V_{c+1}| \times |V_c|}\}_{c=0}^{C-1}$, where $C$ is the number of down-sampling steps. A human mesh down-sampling step can be defined as

$V_{c+1} = S_c V_c, \qquad c = 0, \ldots, C-1. \qquad (2)$

We set the down-sampling ratio to 4 in our experiment, i.e.,

$|V_{c+1}| = \left\lfloor |V_c|/4 + 0.5 \right\rfloor, \qquad (3)$

where $\lfloor \cdot \rfloor$ is the floor function and $|V_*|$ denotes the number of vertices in $V_*$. The SMPL model has 6890 vertices, so the numbers of vertices from coarse to fine are

$|V_c| = 27, 108, 431, 1723, 6890. \qquad (4)$

$M_0$ represents the vanilla SMPL human mesh topology. Fig. 4 shows the human body mesh down-sampling process.
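As an illustration of Eq. (1), the following NumPy sketch accumulates the per-vertex quadrics $Q_i$ from the planes of the incident triangles, in the spirit of the quadric error metric of [59], and evaluates the cost of a candidate contraction; the function names and the midpoint choice for $\bar{v}$ are our own simplifications (the full algorithm keeps a heap of candidate pairs, contracts the cheapest one, and solves for the optimal $\bar{v}$).

```python
import numpy as np

def vertex_quadrics(verts, faces):
    """Q_i = sum of K_p = p p^T over the planes p = [a, b, c, d] of the
    triangles incident to vertex i (quadric error metric, cf. [59])."""
    Q = np.zeros((len(verts), 4, 4))
    for tri in faces:
        v0, v1, v2 = verts[list(tri)]
        n = np.cross(v1 - v0, v2 - v0)
        n = n / (np.linalg.norm(n) + 1e-12)    # unit normal (a, b, c)
        p = np.append(n, -n.dot(v0))           # plane: ax + by + cz + d = 0
        K = np.outer(p, p)
        for i in tri:
            Q[i] += K
    return Q

def contraction_cost(Q, i, j, v_bar):
    """Eq. (1): cost of contracting the pair (v_i, v_j) into v_bar."""
    v = np.append(v_bar, 1.0)                  # homogeneous [x, y, z, 1]
    return v @ (Q[i] + Q[j]) @ v

# Toy usage: cost of contracting vertices 0 and 1 to their midpoint.
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
Q = vertex_quadrics(verts, faces)
print(contraction_cost(Q, 0, 1, (verts[0] + verts[1]) / 2))
```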
Fig. 3. Our progressive graph convolutional network. It consists of three parts, i.e. the Bottleneck Layer, Progressive Layer, and Output Layer.
Fig. 4. Quadric graph down-sampling for the human body model in our method. A set of template topologies will be generated.
2) GCN on the Progressive Meshes: Since the 3D body meshes $\{M_c = (V_c, A_c, F_c)\}_{c=0}^{C}$ can be represented by undirected graphs, we can exploit the topology features using GCNs [58], [60], [61]. In our experiment, we use the Chebyshev graph convolution [58], [62] to reduce the computational complexity.

For a 3D mesh $M_c$, its normalized Laplacian matrix is defined as

$L_c = I - D_c^{-\frac{1}{2}} A_c D_c^{-\frac{1}{2}}, \qquad (5)$

where $I$ is the identity matrix, and $D_c$ is the degree matrix with $(D_c)_{ii} = \sum_j (A_c)_{ij}$. Then we can calculate the scaled Laplacian as

$\tilde{L}_c = 2L_c/\hat{\lambda} - I, \qquad (6)$

where $\hat{\lambda}$ is the largest eigenvalue of $L_c$. The Chebyshev polynomial $T_k(x)$ of order $k$ can be computed as

$T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x), \qquad (7)$

where $T_0(x) = 1$ and $T_1(x) = x$. The graph convolution unit performing the spectral graph convolution on the mesh $M_c$ can then be defined as

$F_{out} = \sum_{k=0}^{K-1} T_k(\tilde{L}_c)\, F_{in}\, \Theta_k, \qquad (8)$

with input feature map $F_{in} \in \mathbb{R}^{N \times |F_{in}|}$ and output feature map $F_{out} \in \mathbb{R}^{N \times |F_{out}|}$, where $N = |V|$ is the number of vertices of the mesh $M_c$. The task of the GCN is to train the parameters of the $K$ Chebyshev coefficient matrices $\Theta_k \in \mathbb{R}^{|F_{in}| \times |F_{out}|}$ in the graph convolution unit.

3) Mesh-Feature Upsampling: To recover the final human mesh, we need to up-sample the coarse meshes, accompanied by graph convolutions, progressively. The mesh-feature up-sampling (MF-Upsampling) defines the feature transformation relationship between meshes of adjacent resolutions in the set of progressive meshes $\{M_c = (V_c, A_c, F_c)\}_{c=0}^{C}$. We use the barycentric-based method [37] for our predefined human body meshes.

In $\{M_c = (V_c, A_c, F_c)\}_{c=0}^{C}$, the resolutions of $M_C$ and $M_0$ are the lowest and highest, respectively, and the mesh resolution gradually increases from $M_C$ to $M_0$. Generally, we use $F_{c+1} \in \mathbb{R}^{|V_{c+1}| \times |F|}$ and $F_c \in \mathbb{R}^{|V_c| \times |F|}$ to represent the features of a pair of adjacent-resolution human meshes, with $c = C-1, \ldots, 0$. The target of mesh feature upsampling is to project $F_{c+1}$ to $F_c$. We define $U_c$ as the mesh feature up-sampling matrix, and the transformation can be defined as

$F_c = U_c F_{c+1}, \qquad c = C-1, \ldots, 0. \qquad (9)$
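To make Eqs. (5)-(8) concrete, here is a minimal PyTorch sketch of one Chebyshev graph convolution unit built on the recurrence of Eq. (7); the dense-matrix formulation, the random initialization, and the names are our own simplifications (a practical implementation would use sparse matrices).

```python
import torch
import torch.nn as nn

def scaled_laplacian(A):
    """Eqs. (5)-(6): L = I - D^{-1/2} A D^{-1/2}, then 2 L / lambda_max - I."""
    d = A.sum(dim=1)
    d_inv_sqrt = torch.diag(d.clamp(min=1.0).pow(-0.5))
    I = torch.eye(A.shape[0])
    L = I - d_inv_sqrt @ A @ d_inv_sqrt
    lam_max = torch.linalg.eigvalsh(L).max()
    return 2.0 * L / lam_max - I

class ChebConv(nn.Module):
    """Eq. (8): F_out = sum_{k=0}^{K-1} T_k(L_tilde) F_in Theta_k."""
    def __init__(self, in_dim, out_dim, L_tilde, K=3):
        super().__init__()
        self.K = K
        self.register_buffer("L", L_tilde)                    # (V, V)
        self.theta = nn.Parameter(0.01 * torch.randn(K, in_dim, out_dim))

    def forward(self, x):                                     # (B, V, in_dim)
        T_prev, T_cur = x, self.L @ x                         # T_0 x, T_1 x
        out = T_prev @ self.theta[0]
        if self.K > 1:
            out = out + T_cur @ self.theta[1]
        for k in range(2, self.K):
            T_prev, T_cur = T_cur, 2.0 * (self.L @ T_cur) - T_prev  # Eq. (7)
            out = out + T_cur @ self.theta[k]
        return out                                            # (B, V, out_dim)
```

The mesh-feature up-sampling of Eq. (9) then amounts to one fixed matrix product per stage, `f_c = U_c @ f_cplus1`, with the barycentric weights of [37] stored in `U_c`.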
TABLE I
Accuracy Comparison of Two Different Training Methods on Human3.6M

TABLE II
Complexity and Speed Comparison Between Pose2Mesh [17] and Ours on the Same GPU. Our Method Has About 66% Fewer Parameters and Runs About 50% Faster Than Pose2Mesh
Fig. 6. Qualitative results on in-the-wild datasets, COCO (rows 1-4) and 3DPW (rows 5-7).

B. Evaluation Metrics

We use two metrics, MPJPE and PA-MPJPE, for 3D pose evaluation, and one metric, MPVE, for 3D mesh evaluation, all in millimeters (mm). Mean Per-Joint Position Error (MPJPE) [5] measures the Euclidean distance between the ground-truth and predicted joints. Procrustes-Analysis MPJPE (PA-MPJPE), or Reconstruction Error [70], computes MPJPE after rigidly aligning the predicted 3D pose to the ground truth using Procrustes Analysis (PA). Mean Per-Vertex Error (MPVE) [24] measures the Euclidean distance between the predicted and ground-truth mesh vertices.
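All three metrics have standard closed forms; the following NumPy sketch (the helper names are ours, and the Procrustes step follows the usual similarity-alignment formulation) computes them for a single example.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error in mm; pred, gt: (J, 3)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment (scale, rotation, translation)."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    X, Y = pred - mu_p, gt - mu_g
    U, S, Vt = np.linalg.svd(X.T @ Y)                 # 3x3 covariance SVD
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                                # rotation, det(R) = +1
    s = np.trace(np.diag(S) @ D) / (X ** 2).sum()     # optimal scale
    return mpjpe(s * X @ R.T + mu_g, gt)

def mpve(pred_verts, gt_verts):
    """Mean per-vertex error over the mesh vertices; inputs: (N, 3)."""
    return np.linalg.norm(pred_verts - gt_verts, axis=-1).mean()
```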
D. Comparison With Pose2Mesh

Since our work is based on Pose2Mesh [17], we compare against it in terms of resource consumption, model parameters, inference speed, and reconstruction error.

We first report the complexity comparison between our method and Pose2Mesh in Table II. With the same batch size (i.e., 64), our model reduces the GPU memory consumption of Pose2Mesh [17] by 46%, and the parameter count of our PQ-GCN is 34% of that of Pose2Mesh. Only one GPU is needed for training. Our method runs at 62 fps, 50% faster than Pose2Mesh (41 fps) on the same RTX 3090. Then we compare the performance of our method and Pose2Mesh on Human3.6M and 3DPW.

First, we list a comparison on Human3.6M in Table III. The datasets on the top row of the table are used for training. Table III shows that our method outperforms Pose2Mesh when the training dataset is Human3.6M or Human3.6M+COCO. When we use MuCo-3DHP as an additional training dataset, the performance of our method is slightly worse than that of Pose2Mesh.

Second, we compare our method with Pose2Mesh on 3DPW in Table IV.
TABLE III
Accuracy Comparison Between Our PQ-GCN and Pose2Mesh on Human3.6M. The Dataset(s) on Top Is (Are) Used for Training

TABLE IV
Accuracy Comparison Between Our PQ-GCN and Pose2Mesh on 3DPW. The Dataset(s) on Top Is (Are) Used for Training

TABLE V
Comparison With State-of-the-Art Methods on the 3DPW Dataset

TABLE VI
3D Reconstruction Results With True 2D Pose as Input
Fig. 9. Visual comparison between state-of-the-art methods and ours. From left to right: Input image, HMR [20], GraphCMR [16], SPIN [21], I2L-MeshNet
[29], and ours.
The results demonstrate that our method is obviously much better on MPJPE and MPVE. Our method outperforms the others, including some of the most recent methods. Hence, this proves the efficiency of our proposed PQ-GCN, attributed to the following reasons. First, avoiding fitting to image appearance is important for the model's generalization ability on in-the-wild images. Second, we can benefit from accurate 2D human
TABLE VII
Accuracy Comparison Between Our Method and State-of-the-Art on Human3.6M and 3DPW

TABLE VIII
Complexity Comparison With State-of-the-Art Methods

Fig. 10. Some inaccurate results of our method. From left to right: Input image, 2D pose, recovered mesh. Since our method recovers the mesh from the human pose, inaccurate results will be generated when the 2D human pose prediction is unreasonable.

Fig. 11. Visualization of the reconstruction error map. From left to right: input image, ground truth, our result, error map.
TABLE XI
The Joint Errors on Human3.6M When the Network Is Trained With Various Combinations of Losses

TABLE XII
Losses' Effect on Human3.6M and 3DPW. $L_{surface} = L_{edge}$ & $L_{normal}$

TABLE XV
Comparison Between Direct-GCN and PQ-GCN (Error in mm)

TABLE XVI
Comparison of Progressive Decoding From Different Resolutions on 3DPW. Human3.6M and COCO Are Used for Training
TABLE XVII
Complexity Comparison of Progressive Decoding Structures
Fig. 12. Visual comparison of decoded results from different resolutions. From left to right: Input image, decoded from vertices of 27, 108, 431, 1723, 6890.

Here, Decode-X indicates that a decoder head is added at resolution X. From Table XVI, we can see that the performance gets better when progressively decoding at higher resolutions. At the same time, the recovery can be stopped at an earlier stage when needed. By adding the decoder head at an earlier block, the GPU memory and inference time can be greatly reduced, as shown in Table XVII. A visual comparison of results decoded from different resolutions is shown in Fig. 12.
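As a rough sketch of how such early-exit decoding could be wired (reusing the `blocks` and `ups` structure from the progressive-layer sketch in Section III-A; the stand-in linear layers, names, and widths are our illustrative assumptions, where the paper uses two Chebyshev graph convolutions and an MLP per head):

```python
import torch
import torch.nn as nn

class DecoderHead(nn.Module):
    """Per-resolution early-exit head: two graph convolutions followed by
    an MLP that regresses per-vertex 3D coordinates."""
    def __init__(self, adj_norm, dim=128):
        super().__init__()
        self.register_buffer("adj", adj_norm)        # normalized adjacency
        self.g1 = nn.Linear(dim, dim)
        self.g2 = nn.Linear(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, f):                            # (B, V_c, dim)
        f = torch.relu(self.g1(self.adj @ f))
        f = torch.relu(self.g2(self.adj @ f))
        return self.mlp(f)                           # (B, V_c, 3)

def decode_early(blocks, ups, heads, f, stop_at):
    """Run the progressive blocks through `stop_at` up-samplings only and
    decode there; e.g. stop_at=2 yields a 431-vertex mesh."""
    f = blocks[0](f)
    for c in range(stop_at):
        f = ups[c] @ f                               # F_c = U_c F_{c+1}
        f = blocks[c + 1](f)
    return heads[stop_at](f)
```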
V. CONCLUSION

This paper presents a progressive 3D human mesh recovery method for a single image in the wild. We propose PQ-GCN to learn the mapping from a 2D/3D pose to a 3D mesh. Based on the elaborated template meshes, we construct the corresponding progressive graph convolution. Our method has far fewer parameters (66% down), a faster inference speed (50% up), and lower GPU memory consumption. Experimental results show that it eliminates the artifacts of previous graph-based methods, such as Pose2Mesh and GraphCMR, and outperforms state-of-the-art methods, especially on the in-the-wild dataset 3DPW. Besides, by adding a decoding head in the progressive layer, the recovery can be stopped at an earlier stage, thus effectively decreasing the computational burden.

This paper focuses on the mapping from a 2D/3D pose to a 3D mesh, without fully exploiting shape estimation, which needs other information such as silhouettes. 3D human data with real ground-truth meshes also tend to be deficient, and long-range interactions in the mesh topology have not been well investigated. Our future work will study these questions, build a new dataset based on our structured-light 3D range sensing system, and explore new applications.

REFERENCES

[1] G. Wei, C. Lan, W. Zeng, and Z. Chen, "View invariant 3D human pose estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 12, pp. 4601–4610, Dec. 2020.
[2] R. Gu, G. Wang, Z. Jiang, and J.-N. Hwang, "Multi-person hierarchical 3D pose estimation in natural videos," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 11, pp. 4245–4257, Nov. 2020.
[3] M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black, "SMPL: A skinned multi-person linear model," ACM Trans. Graph., vol. 34, no. 6, pp. 1–16, Oct. 2015.
[4] F. Bogo, A. Kanazawa, C. Lassner, P. Gehler, J. Romero, and M. J. Black, "Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2016, pp. 561–578.
[5] C. Ionescu, D. Papava, V. Olaru, and C. Sminchisescu, "Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 7, pp. 1325–1339, Jul. 2014.
[6] C. Zhu, J. Yang, Z. Shao, and C. Liu, "Vision based hand gesture recognition using 3D shape context," IEEE/CAA J. Autom. Sinica, vol. 8, no. 9, pp. 1600–1613, Sep. 2021.
[7] M. Zhao, G. Xiong, M. Zhou, Z. Shen, and F.-Y. Wang, "3D-RVP: A method for 3D object reconstruction from a single depth view using voxel and point," Neurocomputing, vol. 430, pp. 94–103, Mar. 2021.
[8] H. Zhu, Y. Liu, J. Fan, Q. Dai, and X. Cao, "Video-based outdoor human reconstruction," IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 4, pp. 760–770, Apr. 2017.
[9] M. Loper, N. Mahmood, and M. J. Black, "MoSh: Motion and shape capture from sparse markers," ACM Trans. Graph., vol. 33, no. 6, pp. 1–13, Nov. 2014.
[10] N. Mahmood, N. Ghorbani, N. F. Troje, G. Pons-Moll, and M. Black, "AMASS: Archive of motion capture as surface shapes," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 5441–5450.
[11] B. L. Bhatnagar, C. Sminchisescu, C. Theobalt, and G. Pons-Moll, "LoopReg: Self-supervised learning of implicit surface correspondences, pose and shape for 3D human mesh registration," in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 33, 2020, pp. 12909–12922.
[12] K. Lin, L. Wang, and Z. Liu, "End-to-end human pose and mesh reconstruction with transformers," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 1954–1963.
[13] X. Hong, T. Zhang, Z. Cui, and J. Yang, "Variational gridded graph convolution network for node classification," IEEE/CAA J. Autom. Sinica, vol. 8, no. 10, pp. 1697–1708, Oct. 2021.
[14] X. Liu, M. Yan, L. Deng, G. Li, X. Ye, and D. Fan, "Sampling methods for efficient training of graph convolutional networks: A survey," IEEE/CAA J. Autom. Sinica, vol. 9, no. 2, pp. 205–234, Feb. 2022.
[15] Z. Zhuo, X. Luo, and M. Zhou, "An auxiliary learning task-enhanced graph convolutional network model for highly-accurate node classification on weakly supervised graphs," in Proc. IEEE Int. Conf. Smart Data Services (SMDS), Sep. 2021, pp. 192–197.
[16] N. Kolotouros, G. Pavlakos, and K. Daniilidis, "Convolutional mesh regression for single-image human shape reconstruction," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 4496–4505.
[17] H. Choi, G. Moon, and K. M. Lee, "Pose2Mesh: Graph convolutional network for 3D human pose and mesh recovery from a 2D human pose," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2020, pp. 769–787.
[18] I. S. Dhillon, Y. Guan, and B. Kulis, "Weighted graph cuts without eigenvectors: A multilevel approach," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 11, pp. 1944–1957, Nov. 2007.
[19] T. Marcard, R. Henschel, M. Black, B. Rosenhahn, and G. Pons-Moll, "Recovering accurate 3D human pose in the wild using IMUs and a moving camera," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 601–617.
[20] A. Kanazawa, M. J. Black, D. W. Jacobs, and J. Malik, "End-to-end recovery of human shape and pose," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 7122–7131.
[21] N. Kolotouros, G. Pavlakos, M. Black, and K. Daniilidis, "Learning to reconstruct 3D human pose and shape via model-fitting in the loop," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 2252–2261.
[22] R. A. Guler and I. Kokkinos, "HoloPose: Holistic 3D human reconstruction in-the-wild," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 10876–10886.
[23] M. Omran, C. Lassner, G. Pons-Moll, P. Gehler, and B. Schiele, "Neural body fitting: Unifying deep learning and model based human pose and shape estimation," in Proc. Int. Conf. 3D Vis. (3DV), Sep. 2018, pp. 484–494.
[24] G. Pavlakos, L. Zhu, X. Zhou, and K. Daniilidis, "Learning to estimate 3D human pose and shape from a single color image," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 459–468.
[25] Y. Xu, S.-C. Zhu, and T. Tung, "DenseRaC: Joint 3D pose and shape estimation by dense render-and-compare," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 7759–7769.
[26] Y. Sun, Y. Ye, W. Liu, W. Gao, Y. Fu, and T. Mei, "Human mesh recovery from monocular images via a skeleton-disentangled representation," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 5348–5357.
[27] Y. Rong, Z. Liu, C. Li, K. Cao, and C. C. Loy, "Delving deep into hybrid annotations for 3D human recovery in the wild," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 5339–5347.
[28] A. S. Jackson, C. Manafas, and G. Tzimiropoulos, "3D human body reconstruction from a single image via volumetric regression," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 64–77.
[29] G. Moon and K. M. Lee, "I2L-MeshNet: Image-to-lixel prediction network for accurate 3D human pose and mesh estimation from a single RGB image," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2020, pp. 752–768.
[30] G. Pavlakos et al., "Expressive body capture: 3D hands, face, and body from a single image," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 10967–10977.
[31] T. Luan, Y. Wang, J. Zhang, Z. Wang, Z. Zhou, and Y. Qiao, "PC-HMR: Pose calibration for 3D human mesh recovery from 2D images/videos," in Proc. 35th AAAI Conf. Artif. Intell. (AAAI), 2021, pp. 2269–2276.
[32] H. Zhang et al., "PyMAF: 3D human pose and shape regression with pyramidal mesh alignment feedback loop," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 11446–11456.
[33] K. Li et al., "Image-guided human reconstruction via multi-scale graph transformation networks," IEEE Trans. Image Process., vol. 30, pp. 5239–5251, 2021.
[34] W. Zeng, W. Ouyang, P. Luo, W. Liu, and X. Wang, "3D human mesh regression with dense correspondence," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 7052–7061.
[35] P. Yao, Z. Fang, F. Wu, Y. Feng, and J. Li, "DenseBody: Directly regressing dense 3D human pose and shape from a single color image," 2019, arXiv:1903.10153.
[36] T. Zhang, B. Huang, and Y. Wang, "Object-occluded human shape and pose estimation from a single color image," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 7374–7383.
[37] A. Ranjan, T. Bolkart, S. Sanyal, and M. J. Black, "Generating 3D faces using convolutional mesh autoencoders," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 725–741.
[38] G. Varol et al., "BodyNet: Volumetric inference of 3D human body shapes," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 20–38.
[39] Z. Zheng, T. Yu, Y. Wei, Q. Dai, and Y. Liu, "DeepHuman: 3D human reconstruction from a single image," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 7738–7748.
[40] A. Vaswani et al., "Attention is all you need," in Proc. 31st Int. Conf. Neural Inf. Process. Syst., Long Beach, CA, USA, 2017, pp. 6000–6010.
[41] A. Dosovitskiy et al., "An image is worth 16×16 words: Transformers for image recognition at scale," in Proc. Int. Conf. Learn. Represent. (ICLR), 2021.
[42] M. Wang, F. Qiu, W. Liu, C. Qian, X. Zhou, and L. Ma, "Monocular human pose and shape reconstruction using part differentiable rendering," Comput. Graph. Forum, vol. 39, no. 7, pp. 351–362, Oct. 2020.
[43] C. Rockwell and D. F. Fouhey, "Full-body awareness from partial observations," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2020, pp. 522–539.
[44] M. Kocabas, C.-H.-P. Huang, O. Hilliges, and M. J. Black, "PARE: Part attention regressor for 3D human body estimation," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 11127–11137.
[45] Y. Sun, Q. Bao, W. Liu, Y. Fu, M. J. Black, and T. Mei, "Monocular, one-stage, regression of multiple 3D people," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 11179–11188.
[46] H. Choi, G. Moon, J. Park, and K. M. Lee, "Learning to estimate robust 3D human mesh from in-the-wild crowded scenes," 2021, arXiv:2104.07300.
[47] K. Yang, R. Gu, M. Wang, M. Toyoura, and G. Xu, "LASOR: Learning accurate 3D human pose and shape via synthetic occlusion-aware data and neural mesh rendering," IEEE Trans. Image Process., vol. 31, pp. 1938–1948, 2022.
[48] J. Zhang, P. Felsen, A. Kanazawa, and J. Malik, "Predicting 3D human dynamics from video," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 7113–7122.
[49] A. Kanazawa, J. Y. Zhang, P. Felsen, and J. Malik, "Learning 3D human dynamics from video," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 5607–5616.
[50] M. Kocabas, N. Athanasiou, and M. J. Black, "VIBE: Video inference for human body pose and shape estimation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 5252–5262.
[51] Z. Cao, M. Wang, S. Guan, W. Liu, C. Qian, and L. Ma, "PNO: Personalized network optimization for human pose and shape reconstruction," in Proc. Artif. Neural Netw. Mach. Learn., 2021, pp. 356–367.
[52] S. Zou et al., "EventHPE: Event-based 3D human pose and shape estimation," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 10976–10985.
[53] V. Choutas, G. Pavlakos, T. Bolkart, D. Tzionas, and M. J. Black, "Monocular expressive body regression through body-driven attention," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2020, pp. 20–40.
[54] S. Saito, Z. Huang, R. Natsume, S. Morishima, H. Li, and A. Kanazawa, "PIFu: Pixel-aligned implicit function for high-resolution clothed human digitization," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 2304–2314.
[55] R. Li, Y. Xiu, S. Saito, Z. Huang, K. Olszewski, and H. Li, "Monocular real-time volumetric performance capture," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2020, pp. 49–67.
[56] E. Corona, A. Pumarola, G. Alenyà, G. Pons-Moll, and F. Moreno-Noguer, "SMPLicit: Topology-aware generative model for clothed people," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 11870–11880.
[57] Z. Huang, Y. Xu, C. Lassner, H. Li, and T. Tung, "ARCH: Animatable reconstruction of clothed humans," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 3090–3099.
[58] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in Proc. 30th Int. Conf. Neural Inf. Process. Syst., vol. 29, 2016, pp. 3844–3852.
[59] M. Garland and P. S. Heckbert, "Surface simplification using quadric error metrics," in Proc. 24th Annu. Conf. Comput. Graph. Interact. Techn. (SIGGRAPH), 1997, pp. 209–216.
[60] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, "Spectral networks and locally connected networks on graphs," in Proc. Int. Conf. Learn. Represent. (ICLR), 2014.
[61] F. Wu, A. Souza, T. Zhang, C. Fifty, T. Yu, and K. Weinberger, "Simplifying graph convolutional networks," in Proc. 36th Int. Conf. Mach. Learn., vol. 97, Jun. 2019, pp. 6861–6871.
[62] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. Int. Conf. Learn. Represent. (ICLR), 2017.
[63] T.-Y. Lin et al., "Microsoft COCO: Common objects in context," in Proc. Eur. Conf. Comput. Vis., 2014, pp. 740–755.
[64] D. Mehta et al., "Single-shot multi-person 3D pose estimation from monocular RGB," in Proc. Int. Conf. 3D Vis. (3DV), Sep. 2018, pp. 120–130.
[65] G. Pavlakos, X. Zhou, K. G. Derpanis, and K. Daniilidis, "Coarse-to-fine volumetric prediction for single-image 3D human pose," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1263–1272.
[66] D. Mehta et al., "Monocular 3D human pose estimation in the wild using improved CNN supervision," in Proc. Int. Conf. 3D Vis. (3DV), Oct. 2017, pp. 506–516.
[67] CMU Graphics Lab. (2020). CMU Graphics Lab Motion Capture Database. [Online]. Available: http://mocap.cs.cmu.edu/
[68] X. Sun, B. Xiao, F. Wei, S. Liang, and Y. Wei, "Integral human pose regression," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 536–553.
[69] K. Sun, B. Xiao, D. Liu, and J. Wang, "Deep high-resolution representation learning for human pose estimation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 5686–5696.
[70] X. Zhou, M. Zhu, G. Pavlakos, S. Leonardos, K. G. Derpanis, and K. Daniilidis, "MonoCap: Monocular human motion capture using a CNN coupled with a geometric prior," IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 4, pp. 901–914, Apr. 2019.
[71] K. Lin, L. Wang, and Z. Liu, "Mesh graphormer," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 12919–12928.
[72] A. A. M. Muzahid, W. Wan, F. Sohel, L. Wu, and L. Hou, "CurveNet: Curvature-based multitask learning deep networks for 3D object recognition," IEEE/CAA J. Autom. Sinica, vol. 8, no. 6, pp. 1177–1187, Jun. 2021.
[73] H. Xia, M. A. Khan, Z. Li, and M. Zhou, "Wearable robots for human underwater movement ability enhancement: A survey," IEEE/CAA J. Autom. Sinica, vol. 9, no. 6, pp. 967–977, Jun. 2022.

Lei Wang (Member, IEEE) received the Ph.D. degree in electrical engineering from Xidian University, China, in 2010. From 2011 to 2012, he worked with Huawei Technologies Company Ltd. From 2014 to 2015, he was with the Department of Embedded Systems Engineering, Incheon National University, as a Post-Doctoral Fellow. He is currently an Associate Professor with the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences (CAS). He has authored or coauthored over 50 papers in conferences and journals. His research interests include image processing, transforms, machine learning, computer vision, visual semantic understanding, video analysis, 3D reconstruction, and robotics.

Xunyu Liu is currently pursuing the M.S. degree with the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research interests include computer vision, deep learning, and 3D reconstruction.

Xiaoliang Ma (Member, IEEE) received the Ph.D. degree from the School of Computing, Xidian University, Xi'an, China, in 2014. He is currently an Assistant Professor with the College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China. His research interests include evolutionary computation, multiobjective optimization, and cooperative coevolution.

Jiaji Wu (Member, IEEE) received the B.S. degree in electrical engineering from Xidian University, Xi'an, China, in 1996, the M.S. degree from the National Time Service Center (NTSC), Chinese Academy of Sciences, in 2002, and the Ph.D. degree in electrical engineering from Xidian University in 2005. He is currently a Professor at Xidian University. His current research interests include still image coding, hyperspectral/multispectral image processing, communication, big data, the IoT, and high-performance computing.

Jun Cheng (Member, IEEE) received the B.E. and M.E. degrees from the University of Science and Technology of China, Hefei, China, in 1999 and 2002, respectively, and the Ph.D. degree from The Chinese University of Hong Kong, Hong Kong, in 2006. He is currently with the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, as a Professor and the Director of the Laboratory for Human Machine Control. His current research interests include computer vision, robotics, and machine intelligence and control.

Mengchu Zhou (Fellow, IEEE) received the B.S. degree in control engineering from the Nanjing University of Science and Technology, Nanjing, China, in 1983, the M.S. degree in automatic control from the Beijing Institute of Technology, Beijing, China, in 1986, and the Ph.D. degree in computer and systems engineering from the Rensselaer Polytechnic Institute, Troy, NY, USA, in 1990. He joined the New Jersey Institute of Technology, where he is currently a Distinguished Professor. He has over 1000 publications including 12 books, 700 journal articles (more than 600 in IEEE TRANSACTIONS), 29 patents, and 30 book chapters. His research interests include Petri nets, automation, the Internet of Things, and big data. He is a Life Member of the Chinese Association for Science and Technology, USA, and served as its President in 1999. He is a fellow of the International Federation of Automatic Control (IFAC), the American Association for the Advancement of Science (AAAS), the Chinese Association of Automation (CAA), and the National Academy of Inventors (NAI). He was a recipient of the Excellence in Research Prize and Medal from NJIT, the Humboldt Research Award for U.S. Senior Scientists from the Alexander von Humboldt Foundation, the Franklin V. Taylor Memorial Award and the Norbert Wiener Award from the IEEE SMC Society, the Computer-Integrated Manufacturing University-Lead Award from the Society of Manufacturing Engineers, the Distinguished Service Award from the IEEE Robotics and Automation Society, and the Edison Patent Award from the Research and Development Council of New Jersey.