IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 25, NO. 6, JUNE 2024 5085

Learning Relation in Crowd Using Gated Graph Convolutional Networks for DRL-Based Robot Navigation
Haoge Jiang, Niraj Bhujel, Zhuoyi Lin, Kong-Wah Wan, Jun Li, Senthilnath Jayavelu, Senior Member, IEEE, and Xudong Jiang, Fellow, IEEE

Abstract— Deep reinforcement learning (DRL) frameworks have shown remarkable effectiveness in learning navigation policies for mobile robots navigating in human-crowded environments. Moreover, attention mechanisms coupled with DRL allow the robot to identify neighbors with different levels of influence and incorporate them into the robot's decision. However, as the crowd density increases, attention mechanisms may fail to identify critical neighbors, which can lead to significant drops in navigation efficiency. In this work, we aim to address this limitation by encoding both human-human and human-robot interactions using a special class of Graph Convolutional Networks (GCN) known as Message-Passing GCN (MP-GCN). In contrast to existing methods, where attention between robot and humans is encoded uniformly, the proposed approach, named MP-GatedGCN-RL, encodes asymmetric interactions using a combination of a novel message-passing function and edge-wise gating mechanisms. We evaluate our approach on simulated environments of the ETH/UCY pedestrian datasets, consisting of different scenarios like collision avoidance, group forming, diverging, crossing, and so on. Experimental results demonstrate that our proposed method outperforms the conventional benchmark dynamic avoidance method ORCA with a 20.6% increase in success rate and a 9.1% reduction in navigation time. Moreover, we also achieve a 5.5% enhancement in success rate compared to other state-of-the-art DRL-based methods without any additional labeled expert data or prior supervised learning.

Index Terms— Deep reinforcement learning, graph convolutional network, collision avoidance, motion and path planning, real-time autonomous navigation, autonomous unmanned vehicles.

Manuscript received 10 October 2022; revised 25 March 2023 and 23 September 2023; accepted 27 November 2023. Date of publication 28 December 2023; date of current version 31 May 2024. This work was supported in part by the National Research Foundation, Singapore, under the National Research Foundation (NRF) Medium Sized Centre Scheme (CARTIN); and in part by Agency for Science, Technology and Research (A*STAR) from the National Robotics Programme (NRP), Singapore, under Grant 192 25 00049. The Associate Editor for this article was C. Sundarabalan. (Corresponding author: Xudong Jiang.)
Haoge Jiang and Xudong Jiang are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail: [email protected]; [email protected]).
Niraj Bhujel, Zhuoyi Lin, Kong-Wah Wan, Jun Li, and Senthilnath Jayavelu are with the Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore 138632 (e-mail: bhuj0001@e.ntu.edu.sg; [email protected]; [email protected]; jli@i2r.a-star.edu.sg; [email protected]).
Digital Object Identifier 10.1109/TITS.2023.3343923
1558-0016 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: B.S. ABDUR RAHMAN INSTITUTE OF SCIENCE & TECH. Downloaded on October 28,2024 at 05:51:49 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

MOBILE robot application environments will expand from static indoor settings to outdoor settings, which are often crowded with humans. When a mobile robot has to share an environment with rational humans, safe and effective collision avoidance with socially acceptable behavior becomes a crucial challenge [1], [2]. As such, collision avoidance in human-crowded environments attracts a lot of attention in the robot navigation community [3], [4], [5], [6]. Traditional approaches like DWA [3] and Reciprocal Velocity Obstacle (RVO) methods [4], [7], [8] consider all humans in the environment as static obstacles or make only a one-step prediction to find the next optimal action of the robot. Another line of work [9], [10], [11], [12], [13] focuses on forecasting pedestrian trajectories prior to planning the robot's movements. Unfortunately, as the crowd density increases, the possible movable space of the robot could be fully occupied by the expected pedestrian paths. Meanwhile, these methods are based on mathematical models that may not always yield optimal solutions, or may even "freeze" when the number of humans in motion is very large, which may lead the agent into a dangerous situation [14].
To alleviate the above problems, recent works suggest modeling the robot and human states and their relations (interactions). These works apply a sequence model to encode the interactions between an agent and humans, which only partially models the interactions or lacks spatial relation reasoning, i.e., they only explicitly model the robot-human interactions or use a sequence model to store the crowd's state [5], [15], [16].
Most recently, the graph-based learning methods MP-RGL [17] and G-GCNRL [6] use Graph Convolutional Networks (GCN) [18] to explicitly model all interactions and spatial relationships among agents. These approaches, however, ignore the asymmetric interactions hidden in pedestrian trajectories and assume that interactions between neighboring pedestrians are uniform, i.e., two neighboring pedestrians have equal influence in both directions. Typically, the interactions between the agent and an adjacent pedestrian are likely to be asymmetric. For instance, an agent or pedestrian in the front will have a higher influence than the agent or pedestrian in the back and vice versa [19]. Modeling such information is critical for identifying important neighbors in the crowd. Moreover, these methods are built on the Deep V-learning RL method [5], which necessitates the discretization of the robot's action space and traverses a discrete action space to find the action with the maximum value from its
value network. As a result, the size of the discrete action space affects the final action's optimality, and hence the performance of the navigation policy.
To this end, we propose a bidirectional graph-based Deep Reinforcement Learning (DRL) approach via an actor-critic framework for robot navigation in a continuous action space in a human-crowded environment. In particular, the environment state is represented by a graph where information about the humans and the robot is stored as node attributes. The relation between the robot and the humans is then represented by constructing bidirectional edges on the graph and is stored as edge attributes. Fig. 1 illustrates how we explicitly represent the human-human and human-robot relationships and their asymmetric influence on each other. The thickness of the edges indicates the degree of influence between the agent and humans. In contrast to [6] and [17], we utilize a Message-Passing based GatedGCN (MP-GatedGCN) to learn the influence functions between humans and robot, where the asymmetric interactions are learned by an edge-wise gating mechanism [20] between the nodes of the graph. We propose a novel message passing function for updating the states of humans and robot. The message passing function for humans learns the neighbor influences, whereas for the robot it learns the influences from the human neighbors and the robot's goal simultaneously. During message passing, neighboring nodes exchange their hidden states with each other; during the reduction step, these are aggregated using a permutation-invariant reduce function (i.e., max-pooling) to generate the final hidden state of each node. The node representation is subsequently leveraged by the DRL algorithm to generate continuous actions for each agent. More specifically, the learnable weights of MP-GatedGCN are shared among the agents, which allows the robot node to generate a human-like navigation strategy by considering its own experience and its neighbors' states in the dense crowd.

Fig. 1. An overview of the interaction model used in our approach. Varying levels of interactions between agents are encoded using directed edges between the nodes.

The main contributions of this research work are outlined below:
• In this work, we investigate the importance of asymmetric influence between robot and dynamic humans on robot navigation by creating a novel human-human and human-robot interaction model using the Message-Passing Graph Convolutional Network.
• A novel navigation learning structure is proposed, which is optimized in continuous action space. This optimizes the robot navigation policy by searching for a finer optimal action in a wider range of the action space.
• The proposed method is evaluated on real-world crowd datasets consisting of various situations like collision avoidance, group avoidance, walking parallel, and so on. We demonstrate how the proposed edge-wise gating mechanism enables the robot to reach its destination while paying attention to its neighboring humans in a crowded environment.

II. RELATED WORKS

A. Crowd Navigation
Reaction-based methods like RVO [4] and ORCA [7] solve collision avoidance problems by modeling other agents as dynamic obstacles to find optimal collision-free velocities under reciprocal assumptions. While these models perform adequately where all agents follow the same policy, they are suboptimal in real-world crowd scenes, since the assumption of reciprocity could be violated and the pedestrians' motion is unpredictable and can be uncooperative. The performance can be significantly affected by the invisible setting of the robot and the increased size of the crowd.
Another approach is to use trajectory-based methods, which plan a feasible route for the robot before the motion by predicting other agents' intended paths [9], [10], [11], [12], [21]. Long-term decisions can be made by using the trajectory prediction results in the robot planner. The main problem is that searching for a feasible path online after estimating trajectory sequences can be time-consuming and computationally expensive, so real-time performance is a concern [22]. Even when the search can be completed, all of the robot's available space could be taken up by predicted pedestrian paths, resulting in the robot being overly cautious [23].
Learning-based approaches have been presented recently as a result of the development of deep learning. In particular, Deep Reinforcement Learning (DRL) has found great success in robot continuous control and many other fields [24], [25], [26], [27], [28], [29], [30]. CADRL [15] initially described robot navigation as a sequential decision-making problem constrained by robot kinematics, and then used reinforcement learning to learn the joint state value function. However, it evaluates only the pairwise restrictions between robots and pedestrians and disregards the interactions between pedestrians. This type of end-to-end RL robot navigation strategy has also been utilized in [31], [32], and [33]. To navigate in situations where the number of pedestrians changes dynamically, [16] proposes utilizing an LSTM to model the pedestrians' state as a sequence with fixed length. However, LSTM-RL assigns attention to surrounding humans according to distance, which is not always reasonable or consistent with human decision-making. A typical case pointed out in the related work [6] is that the humans within the perception range should receive higher attention than those outside the perception range, even though the former are farther away. Several self-attention models have been developed in recent years to model the interactions between pedestrians and robots, including LM-SARL [5] and SARL* [34]. However, these approaches only
model pedestrian interactions on a local map, which is just a partial representation of crowd interactions.

B. Graph Representation Learning
Graph convolutional networks (GCN) have shown great promise in learning hidden structure in complex data such as traffic networks [35], [36], social networks [37], physics [38], and medical diagnosis [39]. Graph neural networks learn the node representations by applying a message passing mechanism [40] between graph nodes that aggregates information from the local neighborhood [41], [42]. The fundamental structural characteristics of GCN make it feasible to model spatial interaction between pedestrians and robots in a crowd [6], [17]. Pedestrians and the robot are modeled as graph nodes and their edges represent influences between them. The influence function is then learned through a learnable adjacency matrix [17] or through learnable edges [19]. Other factors like the human gaze can be used to learn the influence function between the nodes [6]. These methods, however, use the high-level graph representation of the crowd interactions instead of fine-grained node-level representations for generating the action of the agents (nodes).

III. PROPOSED APPROACH
We first introduce how the agent decision-making in a crowd is formulated as a reinforcement learning (RL) problem, and then we model the environment state as a bidirectional graph representation, which leads to our proposed network architecture.

A. Overview
The navigation problem in this work is for a mobile robot to navigate across an area with dynamic objects, such as a human-crowded environment, and complete its task of reaching a given target. The state of the environment is given as $s_t = \{s_t^r, s_t^n\}$, $n \in 1 \ldots N$, where $s_t^r$ is the state of the robot and $s_t^n$, $n \in 1 \ldots N$, are the states of the $N$ humans. At each step $t$, the state $s_t$ of the environment is represented as a set of robot and human node states $x_t = \{x_t^r, x_t^n\}$, $n \in 1 \ldots N$. In addition, an edge state $x^e_{i,j}$ is created that stores relational information between agent $i$ and agent $j$. Two linear embedding layers are used to map the robot node state and the human node states to a fixed-dimensional embedding $X = \{E_t^r, E_t^n\}$. Next, we create a bidirectional spatial graph $G_t$ with node attributes $X$ and edge attributes $X^e$, on which message passing is performed to extract hidden states $h_t = \{h_t^r, h_t^n\}$ of each agent. Finally, an MLP layer is used to generate the next action of each agent $a_t = \{a_t^r, a_t^n\}$ from $h_t$. The actions transition the environment into the next state $s_{t+1}$, and the process is repeated for each episode. A reward $R_t$ is computed for each agent by the reward function described in Section III-A.3 based on the action $a_t$ it executed. The experienced state transition of the environment is stored in the prioritized experience replay buffer as a tuple $T_r$ that includes graph representations, action and reward $(G_t, a_t, G_{t+1}, R_t)$, which is used to train the actor-critic network. The episode is terminated when the robot successfully finishes the navigation task, the robot collides, or the maximum number of episode steps $T$ is reached.

1) Action Space: Two motion types are considered for the action $a_t$: non-holonomic robots use linear and angular velocities $[v, \omega]$, whereas holonomic robots use linear velocities $[v_x, v_y]$. In this work, we select the holonomic model, where $a_t = [v_x, v_y]$, for our wheel platform. In contrast to the existing V-learning based methods [5], [6], [15], [16], [17] that are optimized for a discrete action space, the action $a_t$ in our method is optimized in a continuous action space within the range $[-V_{\max}, V_{\max}]$.

2) State Graph Representation: The state of the environment with $N$ humans and a robot is represented by a directed graph $G_t(V, E, X, X^e)$ with nodes $V = \{i \mid i \in \{1, \ldots, N+1\}\}$ and edges $E = \{e_{i,j} \mid i, j \in \{1, \ldots, N+1\}, i \neq j\}$. A node $i \in V$ is connected to node $j \in V$ via an undirected edge $e_{i,j} \in E$ if $\|p_i - p_j\|_2^2 \le 5$, where $p_i$ and $p_j$ are the positions of node $i$ and node $j$ respectively. The node attributes $X \in \mathbb{R}^6$ store state information of the particular node, whereas the edge attributes $X^e \in \mathbb{R}^2$ store relational information between the nodes. Specifically, each edge attribute $x^e_{i,j} \in X^e$ is the difference between the real-world positions of the neighbor nodes $j$ and $i$.
Each element of the node attributes $x_i \in \mathbb{R}^4$, $i = 1 \ldots N+1$, represents the position $\{p_x, p_y\}$ and velocity $\{v_x, v_y\}$. Since the action of a robot depends on its goal position, we added the robot's goal position $\{g_x, g_y\}$ to the attributes of the robot node $x_r \in \mathbb{R}^6$. Next, we utilize two linear projection layers, one for human nodes and the other for the robot node, to embed the node attributes into a vector of fixed dimension $D$. Additionally, we model the attractive influence from the robot's goal by adding a goal node to $G_t$ with node attributes $x_g = \{0, 0, 0, 0, g_x, g_y\}$ and edge attributes $x^e_{r,g} = \{g_x - p_x, g_y - p_y\}$, where $\{p_x, p_y\}$ is the position of the robot. The intuition is to let the directional information flow through the message passing to the robot's action during the node updates.

3) Reward Function: To obtain consistent results, all the algorithms were given the same settings and parameters for the reward function $R_t$. The reward function is composed of four components according to the configuration of [6]:
$$r_t = \begin{cases} r_{\mathrm{fail}} & \text{if crashes or timeout} \\ r_{\mathrm{reach}} & \text{if reaches the target} \\ r_{\mathrm{danger}} & \text{if is too close to humans} \end{cases}$$
where $r_{\mathrm{reach}}$, $r_{\mathrm{fail}}$, and $r_{\mathrm{danger}}$ are defined as follows:
• $r_{\mathrm{reach}} = 1$ is the positive reward for reaching the goal without any collision, otherwise 0.
• $r_{\mathrm{fail}} = -0.25$ is the negative penalty for the robot if it collides with an obstacle, otherwise 0.
• $r_{\mathrm{danger}}$ is set to make the robot keep an appropriate safe distance from the detected obstacle(s). It is defined as:
$$r_{\mathrm{danger}} = \begin{cases} -0.1 + \dfrac{d_t}{2} & \text{if } d_t < r_i \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
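The reward terms above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `RewardConfig`, `danger_penalty`, and `step_reward` are hypothetical, the danger threshold uses the value $r_i = 0.2$ reported in the paper, and the three terms are simply summed as in the paper's total reward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    """Illustrative container; the field names are not from the paper."""
    r_reach: float = 1.0        # reward for reaching the goal
    r_fail: float = -0.25       # penalty for a failure (collision or timeout)
    danger_radius: float = 0.2  # r_i, the danger-layer threshold reported in the paper


def danger_penalty(d_t: float, cfg: RewardConfig) -> float:
    """Eq. (1): -0.1 + d_t / 2 inside the danger layer, 0 outside it."""
    return -0.1 + d_t / 2.0 if d_t < cfg.danger_radius else 0.0


def step_reward(reached: bool, failed: bool, d_t: float,
                cfg: RewardConfig = RewardConfig()) -> float:
    """Sum of the three terms; d_t is the distance to the closest obstacle."""
    r_reach = cfg.r_reach if reached else 0.0
    r_fail = cfg.r_fail if failed else 0.0
    return r_fail + r_reach + danger_penalty(d_t, cfg)
```

With these values the danger term ranges from -0.1 (touching an obstacle) up to just below 0 at the edge of the danger layer, so it acts as a mild shaping signal next to the much larger terminal rewards.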

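Returning to the state graph of Section III-A.2, the construction can also be sketched in plain Python. This is a minimal sketch under assumptions: the function name `build_state_graph`, the string node keys, and the convention that the key `(i, j)` stores $p_j - p_i$ are all illustrative choices, and the connection rule uses the squared-distance threshold of 5 exactly as written in the text.

```python
from itertools import permutations


def build_state_graph(robot, goal, humans, radius_sq=5.0):
    """Build node and edge attribute dictionaries for one time step.

    robot:  (px, py, vx, vy)
    goal:   (gx, gy)
    humans: list of (px, py, vx, vy)
    """
    # Robot node carries its goal (6 values); human nodes carry 4 values.
    nodes = {"robot": (*robot, *goal)}
    for n, human in enumerate(humans):
        nodes[f"human_{n}"] = tuple(human)

    # Directed edges in both directions between agents that are close enough.
    edges = {}
    for i, j in permutations(nodes, 2):
        dx = nodes[j][0] - nodes[i][0]
        dy = nodes[j][1] - nodes[i][1]
        if dx * dx + dy * dy <= radius_sq:
            edges[(i, j)] = (dx, dy)  # relative position of j with respect to i

    # Goal node and the robot-goal edge, always present regardless of distance.
    nodes["goal"] = (0.0, 0.0, 0.0, 0.0, *goal)
    edges[("robot", "goal")] = (goal[0] - robot[0], goal[1] - robot[1])
    return nodes, edges
```

Because each ordered pair gets its own entry, the two directions of an interaction can later receive different learned weights, which is the property the gating mechanism relies on.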

In Eq. (1), $r_i$ denotes the danger layer threshold and $d_t$ is the instantaneous distance between the closest obstacle and the agent; $r_{\mathrm{danger}}$ is 0 whenever $d_t \ge r_i$. When the agent moves into the danger layer of an obstacle, it receives a penalty: the closer the agent is to the pedestrian(s), the larger the penalty. In this paper, $r_i = 0.2$.
The total reward $R_t$ is the sum of all $r_t$ cases:
$$R_t = r_{\mathrm{fail}} + r_{\mathrm{reach}} + r_{\mathrm{danger}} \tag{2}$$

Fig. 2. An overview of our proposed approach. The agent's environment is converted into a graph representation on which message passing is performed to extract node-level state representation. The extracted representations are then fed into the Actor-Critic Networks (right box) to generate action-value pairs $a_t$ and $V$ for each agent in the environment.

4) Actor-Critic Configuration: We use MP-GatedGCN in both actor and critic to model dynamic interactions between robot and humans. Specifically, the message passing mechanism, described in Section III-B, is used to encode both human-human and human-robot interactions. The hidden features of a node after $l$ layers of message passing contain relevant information about the states of the neighboring nodes. The underlying spatial relationships between agents and their influences are encoded by the edge features. An aggregation operation is performed on the node and edge features, which are used to generate action-value pairs for the robot. Specifically, the node and edge features are utilized for the next-layer node feature representation $h_i^{l+1}$, which is then fed to the actor network to generate the action $a_t$ following its learnt policy $\pi(a_t \mid s_t)$. Afterward, the action $a_t$ is passed to the critic network to generate the output $V(s_t)$, which can be regarded as the evaluation metric. The actor-critic networks are trained to maximize the expected value function $V(s_t) = \sum_{t=0}^{\infty} \gamma^t \, \mathbb{E}[R(s_t, a_t)]$, where $\gamma$ is a discount factor that reflects the preference of instant rewards over future rewards. In the training procedure, the DRL algorithm generates two copies of each neural network (online and target network) to make training more stable, for the actor network (parametrized by $\theta^{\pi}$, $\theta^{\pi'}$) and the critic network (parametrized by $\theta^{V}$, $\theta^{V'}$). Note that the target network is updated by a soft update after several update steps of the online network. Besides, a double Q-learning technique is applied to prevent overestimation of the Q-value. More specifically, based on the online and target network framework, the DRL algorithm initializes two critic networks $V_{\theta_1}$, $V_{\theta_2}$ and an actor network $\pi$ with their corresponding parameters $\theta^{V_1}$, $\theta^{V_2}$, $\theta^{\pi}$ respectively. When calculating the Q-value from the critic network, the smaller Q-value of the two critic networks is used, which yields a more precise Q-value estimate. A step-by-step training process is shown in Algorithm 1.

Algorithm 1 Algorithm to Train Our Method
1: Initialize actor network $\pi$ and critic networks $V_{\theta_1}$, $V_{\theta_2}$ with random parameters $\theta^{\pi}$, $\theta^{V_1}$, $\theta^{V_2}$.
2: Initialize target networks $\theta^{V_1'} \leftarrow \theta^{V_1}$, $\theta^{V_2'} \leftarrow \theta^{V_2}$, $\theta^{\pi'} \leftarrow \theta^{\pi}$
3: Initialize replay buffer $B$ with warm up steps $M$
4: Set mini-batch $N$
5: for $t = 1 \ldots M$ do
6:     Build state graph $G_t$ from observation $s_t$.
7:     Sample action $a_t \sim \pi(G_t) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$
8:     Get reward $R_t$ and next state graph representation $G_{t+1}$
9:     Store transition tuple $(G_t, a_t, R_t, G_{t+1}, d)$ in $B$
10:    if $t > M$ then
11:        Sample $N$ set of transitions $(G_t, a_t, R_t, G_{t+1}, d)$ from $B$
12:        Sample action $a_{t+1} \leftarrow \pi'(G_{t+1}) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$
13:        Estimate $V' = r + \gamma \min_{i=1,2} V_{\theta_i}(G_{t+1}, a_{t+1})$
14:        Update critic networks:
15:        $\theta^{V_i} \leftarrow \arg\min_{\theta^{V_i}} N^{-1} \sum \left(V' - V_{\theta_i}(G_t, a_t)\right)^2$
16:        if $t \bmod d$ then
17:            Update $\theta^{\pi}$ by the deterministic policy gradient:
18:            $\nabla_{\theta^{\pi}} L \approx \mathbb{E}\left[\nabla_a V(G_t, a \mid \theta^{V})\big|_{a=\pi(G_t)} \nabla_{\theta^{\pi}} \pi(G_t)\right]$
19:            Update target networks:
20:            $\theta^{V_i'} \leftarrow \tau \theta^{V_i} + (1 - \tau)\theta^{V_i'}$
21:            $\theta^{\pi'} \leftarrow \tau \theta^{\pi} + (1 - \tau)\theta^{\pi'}$
22:        end if
23:    end if
24: end for

Fig. 3. A visualization of the pedestrian environment on our Gazebo simulator. Black, red and white colors represent the robot, the robot's goal and pedestrians respectively.

B. Message Passing GCN With Edge-Wise Gating Mechanism
The vanilla GCN treats the influence between neighboring nodes uniformly. This may not be optimal in the real world. Consider a situation in which the robot is following one pedestrian (the former) while another pedestrian (the latter) is trailing the robot: the former pedestrian will have more influence on the robot than the latter. The robot should find the optimal action to avoid collision by paying more attention to the former pedestrian. Hence, the weights from former pedestrians to the robot must be up-weighted, and the weights of the latter pedestrians need to be down-weighted because their effect is smaller. Similarly, the weights of pedestrians moving in the opposite direction to the robot must be up-weighted, and those of pedestrians moving in the same direction as the robot down-weighted; likewise, the weights of pedestrians near the robot are up-weighted and those of pedestrians far away are down-weighted. In this way, the weights of human-human and human-robot interactions are reasonably distinguished. Hence the crowd feature extraction ability is enhanced and we can direct the robot's attention to the more important pedestrian.
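The up- and down-weighting described above can be illustrated with a small numerical sketch of an edge-wise gate: each directed edge carries its own score, a sigmoid turns the score into a weight, and the weights are normalized over the neighbors of the receiving node, which is the form used in GatedGCN-style layers [20]. The sketch makes simplifying assumptions: the scores are hand-picked scalars rather than learned edge feature vectors, and the names `sigmoid` and `edge_gates` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def edge_gates(edge_scores, eps=1e-6):
    """Normalize sigmoid-activated edge scores over each receiving node.

    edge_scores maps (i, j) to the score of neighbor j as seen by node i.
    Returns a dict with the same keys holding the gate value for each edge.
    """
    totals = {}
    for (i, _j), score in edge_scores.items():
        totals[i] = totals.get(i, 0.0) + sigmoid(score)
    return {
        (i, j): sigmoid(score) / (totals[i] + eps)
        for (i, j), score in edge_scores.items()
    }


# Made-up scores: the robot weighs the pedestrian ahead more than the one behind,
# while the pedestrian ahead barely registers the robot.
gates = edge_gates({
    ("robot", "front"): 2.0,
    ("robot", "rear"): -2.0,
    ("front", "robot"): -1.0,
})
```

Since the scores for `("robot", "front")` and `("front", "robot")` are independent, the resulting gates differ in the two directions, which is what lets the model express asymmetric influence.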
To this end, we adopt GatedGCN [20], which controls the flow of information between the nodes through an edge-wise gating mechanism. GatedGCN has been found particularly suitable for modeling asymmetric interactions among pedestrians [19]. Other, more complex methods such as the self-attention network GAT [42] can also be used, but in our experiments we did not observe significant differences between GAT and GatedGCN [20], while the latter is faster. Since the GCN architecture of [19] is not specifically designed for robot behavior, we extend their architecture to take into account the robot's goal during the update operation of the robot node. The node update operation at layer $l$ for human and robot nodes is as follows, where the robot node is specified with subscript $i$ equal to $r$:
$$h_i^{l+1} = h_i^l + \mathrm{ReLU}\Big(U^l h_i^l + \sum_{j \in N_i} \eta_{ij}^l \odot V^l h_j^l\Big) \tag{3}$$
$$h_r^{l+1} = h_r^l + \mathrm{ReLU}\Big(U^l h_r^l + \sum_{j \in \{N_r, D_r\}} \eta_{rj}^l \odot V^l h_j^l\Big) \tag{4}$$
where $U^l, V^l \in \mathbb{R}^{6 \times 256}$ are the learnable matrices shared between human and robot nodes, $\odot$ denotes the Hadamard product, ReLU denotes the non-linear activation function and $D_r$ denotes the node representing the destination of the robot. The term $\eta_{ij}^l$ is a gating function that controls how much information from neighbor node $j$ should be used for updating the hidden state of node $i$:
$$\eta_{ij}^l = \frac{\sigma\big(e_{ij}^l\big)}{\sum_{j' \in N_i} \sigma\big(e_{ij'}^l\big) + \epsilon}, \qquad \eta_{rj}^l = \frac{\sigma\big(e_{rj}^l\big)}{\sum_{j' \in \{N_r, D_r\}} \sigma\big(e_{rj'}^l\big) + \epsilon} \tag{5}$$
where $\sigma$ is the standard sigmoid activation function, $\epsilon$ is a small fixed constant for numerical stability, and $e_{ij}^l$ is the edge feature, which is obtained by:
$$e_{ij}^l = e_{ij}^{l-1} + \mathrm{ReLU}\big(A^l h_i^{l-1} + B^l h_j^{l-1} + C^l e_{ij}^{l-1}\big) \tag{6}$$
$$e_{rj}^l = e_{rj}^{l-1} + \mathrm{ReLU}\big(A^l h_r^{l-1} + B^l h_j^{l-1} + C^l e_{rj}^{l-1}\big) \tag{7}$$
where $A^l, B^l \in \mathbb{R}^{D \times D}$, $C^l \in \mathbb{R}^{D \times D}$ are the learnable matrices. The relational edge features combined with the gating mechanism facilitate the learning of asymmetric influences among nodes with different levels of field of view (FOV).

IV. SIMULATION ENVIRONMENT
To train the policy and evaluate the performance of the approaches in the crowd scene, we build a pedestrian simulator based on the Gazebo platform [43] and utilize the Turtlebot3 robot as our wheeled evaluation platform. The Gazebo pedestrian simulator with imported real-world pedestrian trajectories from the datasets is shown in Fig. 4. These figures demonstrate the density level in the simulator. More specifically, according to the pedestrian data from the datasets, we let human models (square cylinders) be spawned and deleted on the ground plane to simulate a real-world scene (e.g., a shopping mall center or a shopping mall street) of humans walking into and out of a scene. An example is shown in Fig. 3(b). A video demo of our simulator can be viewed in the supplementary video materials (https://youtu.be/CsaVqC4YS-E). Every person with its specified ID has a full trajectory with its appearing time interval and time stamps. We load these data from the datasets, and spawn and delete the cylinders that represent the humans frame by frame according to their time stamps. We then re-load the human dataset frames from scratch after rendering all data frames. The robot is updated at a fixed time step of 0.25 seconds in this paper. As the keyframe interval of the original dataset is not the same as the robot's time step, we use an interpolation function and insert consecutive frames between every pair of keyframes to make the trajectory smoother and to synchronize the execution time step of humans and robot. The node information of an inserted frame is based on the node information of the last frame, where $[p_x, p_y]$ is calculated by the interpolation function and $[v_x, v_y]$ is along the direction of the line connecting the last keyframe to the next keyframe.
The UCY and ETH datasets consist of tracked video segments of pedestrian crowds recorded from an elevated bird's-eye camera in five different environments. An example is shown in Fig. 3(a). The environments cover different crowd levels and varying interaction scenarios such as collision avoidance, crossing, merging, group formation, and so on. The UCY dataset [44] includes four scenes: STUDENTS001 (∼ 40 pedestrians/frame), STUDENTS003 (∼ 26 pedestrians/frame), ZARA01 (∼ 26 pedestrians/frame), and ZARA2
(∼ 7 pedestrians/frame). The ETH dataset [45] includes the HOTEL (∼ 6 pedestrians/frame) and ETH (∼ 6 pedestrians/frame) scenes. The trajectories of all pedestrians are manually extracted every 0.4 s, i.e., at 2.5 Hz. We smooth the trajectories by interpolating at 10 Hz before feeding them to the simulation environment.
The simulation runs on a server with an Intel Core i9-9900k and an NVIDIA GeForce RTX2080Ti. The training parameters and simulation parameters are shown in Table I and Table II respectively.

Fig. 4. Visualization of crowd density on the simulated environment of ETH/UCY datasets at a particular frame.

TABLE I. HYPER PARAMETERS

TABLE II. SIMULATION PARAMETERS

V. SIMULATION RESULTS AND ANALYSIS
This section compares the proposed approach to other methods. The generated pedestrian information is from the aforementioned real-world human trajectory datasets: the STUDENTS001 dataset is used for policy training, and the remaining datasets are used for evaluation. In the training process, the starting point is randomly generated on a circle of 4 m radius whose center corresponds to the distribution center of the pedestrians. The goal and the starting point are symmetric with respect to the center of the circle, so the total navigation length is 8 m. Each policy is trained for $1 \times 10^6$ timesteps in the training phase. In the evaluation process, the reward function is given the same setting for all evaluated approaches to maintain consistency. The start point and goal point in evaluation follow the rule described for training: a total length of 8 m with a fixed start and goal point is set in each scene. To fully show the method's efficacy, the robot in the simulated situation is made invisible to pedestrians. The robot therefore has no effect on the pedestrians' behavior: the pedestrians do not perceive the robot and hence do not avoid it. Besides, this alleviates the problem of the robot learning an aggressive, invasive policy, in which the robot invades human social space and tries to force pedestrians to avoid it. For the statistics, each dataset is evaluated more than 250 times.

A. Comparison With ORCA
The conventional dynamic avoidance method ORCA, which is based on optimization [7] under the reciprocity assumption, performs well in multi-agent navigation scenarios in which all the agents execute the same ORCA policy. However, it requires hand-tuned parameters and an optimization function. In addition, ORCA presupposes that agents cooperate for half of the distance necessary for avoidance, while pedestrian policy in real crowd scenes can be affected by various unknown factors; hence the reciprocity assumption may be violated, resulting in poor performance. We evaluate the ORCA and RL models by success rate and navigation time on the datasets with different densities. As indicated in Table III, RL approaches show a large improvement in success rate and navigation time compared to ORCA. Particularly in the scenario of the STUDENTS003 dataset, the navigation success rate reaches around 0.8 thanks to the attention weights extracted by our MP-GatedGCN. In addition, the prolonged navigation time demonstrates that ORCA is too cautious in a dense crowd.

B. Comparison With the State-of-the-Art Methods
We first compare our proposed model with the state-of-the-art RL method G-GCNRL [6]. Note that the result of
Authorized licensed use limited to: B.S. ABDUR RAHMAN INSTITUTE OF SCIENCE & TECH. Downloaded on October 28,2024 at 05:51:49 UTC from IEEE Xplore. Restrictions apply.
JIANG et al.: LEARNING RELATION IN CROWD USING GATED GCNs FOR DRL-BASED ROBOT NAVIGATION 5091

TABLE III
QUANTITATIVE COMPARISON OF SUCCESS RATE AND NAVIGATION TIME WITH THE STATE-OF-THE-ART METHOD

TABLE IV
COMPARISON OF THE PROPOSED METHOD AGAINST OUR BASELINE MODELS

G-GCNRL is taken from the original paper since the source code has not been released. We attempt to reproduce the experimental environment of the original paper as closely as possible in our Gazebo simulator (e.g., trajectory datasets, robot preferred velocity, robot state representation, etc.). In contrast to G-GCNRL, which uses all pedestrians in the environment, we use only pedestrians within a radius of 5 meters of the robot position, as a real-world environment can rarely be fully observed. The models are compared using two criteria: success rate and navigation time. Table III shows the experimental results for four datasets. The success rate of our approach is higher than that of G-GCNRL and more than 6% higher than that of the U-GCNRL model.

In the STUDENTS003 dataset, our MP-GatedGCN model outperforms U-GCNRL and G-GCNRL in terms of success rate and navigation time. However, the U-GCNRL model performs better in navigation time on the ZARA2, HOTEL, and ETH datasets, and also achieves a satisfactory success rate. These datasets share a common characteristic: a low density level (under 10 pedestrians per frame). A possible explanation is that when the crowd is sparse (lower than 8 pedestrians per frame), the specific attention mechanism is less important. Placing less focus on the crowd's attention weights shortens the navigation time. As for crowd navigation in the STUDENTS003 dataset (density of over 26 pedestrians per frame), our method has the best success rate and the shortest navigation time among all methods. Even though the improvement is slight, it is worth noting that we do not need additional information such as expert human gaze data. The G-GCNRL paper describes that the expert human gaze data is extracted with the Tobii Pro X60 remote eye tracker, and the way the expert provides the gaze data is similar to work that collects gaze data while humans play Atari games [46]. This means that the gaze data collection requires an expert to have a full observation view of the environment (a top view of the fully observed environment), which is very difficult to build in the real world using only the robot's local sensors, so that work is hard to extend to real-world experiments. Moreover, comparing G-GCNRL with U-GCNRL in Table III, we also notice that the improvement of G-GCNRL is mainly on the STUDENTS003 dataset, the densest environment, which indicates the importance of directing correct attention to the proper subjects in a crowded environment. In contrast, our method uses a gate mechanism and considers the asymmetric influence of nodes. The attention is obtained from end-to-end training, with no need for additional information. This demonstrates that the attention extracted by our data-driven method, MP-GatedGCN with the gate mechanism, is more promising than that of the supervised method G-GCNRL, which learns attention from expert human gaze data.

C. Ablation Studies

We build an ablation model that represents the state as a 1-D sequence processed by a DNN, and name this model DNN-RL. This model concatenates the pedestrian states after the robot state. The number of surrounding pedestrians is fixed at 40, so the sequence is a vector with a fixed length of (1 + 40) × 6. We also implemented a GCN-RL model in our environment that uses a graph representation to model the state. In this model, uniform attention is applied to all sensed pedestrian nodes (within 5 meters), so the structure and function of GCN-RL are similar to those of U-GCNRL. From the results in Tables III and IV, we can see that the performance of U-GCNRL and GCN-RL differs only slightly. The navigation time across the different densities shows a similar trend as well. However, U-GCNRL achieves a slightly lower success rate. The reason is that U-GCNRL is largely based on the Deep V-learning code from [5], and the Deep V-learning method uses a discrete action space that divides the action space into 5 samples. We use a continuous action space, which provides a wider range in which to search for the optimal action and results in better navigation performance.

The DNN-RL model performs poorly in our evaluation, as it is unable to capture spatial relationships in state modeling. Since the state vector has a fixed length in a sparse


Fig. 5. Visualization of attention weights. The direction of influences from the neighboring pedestrians on the robot is shown by the black arrows. The number
represents the degree of influence by each pedestrian on the robot i.e. attention of the robot to the corresponding pedestrians. Short red arrows represent the
direction of the corresponding agent.

environment, the pedestrians' information will be empty or 0 in the sequence. As a result, its performance is not stable across datasets. In addition, we observe a significant decrease in DNN-RL performance in the extremely crowded scene (STUDENTS003 dataset). This is because the robot is unable to find the most appropriate action for collision avoidance, as it does not generate a proper attention distribution over the surrounding pedestrians.

We also introduce a new metric, named dangers per episode, to evaluate the effectiveness of our method, which considers the asymmetric influence of humans on the robot. Whenever the robot comes within 0.2 meters of a human, it is counted as one instance of danger. The metric is computed as the total number of danger counts divided by the total number of episodes. It evaluates whether the robot learns safe and efficient behavior to avoid obstacles while moving to the target. From Table V, we can see that constructing the spatial relationship of the crowd helps the robot learn optimized actions. Compared to the DNN-RL and GCN-RL models, the proposed MP-GatedGCN-RL model performs better, generating fewer danger cases. It is worth noting that our method achieves the best performance on this metric, which demonstrates the importance of considering the asymmetric influence between agents.

TABLE V
DANGERS/EPISODE METRIC IN ABLATION STUDY

D. Attention Visualization

To analyze how the robot's attention to humans changes, we show the attention values learned by the model using Eq. (5); this process is illustrated in Fig. 5. The gating value $\eta_{ij}^{l}$ plays the role of a soft attention mechanism [47], assigning attention values of different magnitudes to the human/goal objects. Higher attention values indicate a more pronounced influence of the human/goal object on the robot. For better visualization, we display only the attention on human-robot edges: because we use fixed trajectories from the datasets, the attention between humans is unnecessary for display. Note that the human-human attention is still used by the robot to learn the influence function during training; we only hide it in the visualization. In Fig. 5, the influence exerted by neighboring pedestrians on the robot is illustrated by the black arrows. The numerical values indicate the extent to which each pedestrian affects the robot, i.e., the robot's attention towards the respective pedestrians/goal. Meanwhile, short red arrows depict the direction in which the robot is headed.

As the figures illustrate, the robot's attention priorities change with the state. The robot initially concentrates on the pedestrians approaching from the top left, since they share the same moving trend and a collision is possible, so they receive the highest attention weight in Fig. 5(a). After several steps, in Fig. 5(b), the pedestrian in front has passed and a group of humans approaches from the right side. The attention on the right side increases due to the potential collision risk. The robot then turns left to avoid colliding with the humans on the right. After that, as the pedestrians come closer, the attention weights of the right-side group become correspondingly higher in Figs. 5(c-d). Over time, the crowds on the left and right sides converge in Figs. 5(e-f), severely impeding the robot's forward path. The robot concentrates precisely on the most critical pedestrian with the correct attention priority and takes the optimal action to avoid collisions. After navigating out of the crowd, the robot's attention shifts to the goal, the goal's attention priority increases, and the robot then proceeds directly there in Figs. 5(g-h).
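To make the idea of a gating value acting as soft attention concrete, the following is a minimal NumPy sketch. It is not the paper's Eq. (5), which is not reproduced in this section: the sigmoid gate, the normalization over neighbors, and all names (`edge_gates`, `aggregate`, `W_dst`, `W_src`) are assumptions chosen for illustration. The point it shows is that using separate weights for the receiving and sending node yields gates that differ between the two directions of an edge, i.e., an asymmetric influence.

```python
import numpy as np

def edge_gates(h, W_dst, W_src, adjacency, eps=1e-6):
    """Illustrative soft-attention gates; eta[i, j] is the influence of node j on node i.

    h:         (N, D) node features (hypothetical robot/human states)
    W_dst:     (D, D) weights applied to the receiving node i (assumed)
    W_src:     (D, D) weights applied to the sending node j (assumed)
    adjacency: (N, N) 1 where an edge j -> i exists, else 0
    """
    # Score for every ordered pair (i, j); different weights make it direction-dependent.
    scores = (h @ W_dst)[:, None, :] + (h @ W_src)[None, :, :]  # (N, N, D)
    gates = 1.0 / (1.0 + np.exp(-scores))                       # element-wise sigmoid
    gates = gates * adjacency[:, :, None]                       # zero out non-edges
    # Normalize over the senders j of each receiver i so the gates act like attention.
    return gates / (gates.sum(axis=1, keepdims=True) + eps)

def aggregate(h, gates):
    """Gated neighborhood sum: out[i] = sum_j eta[i, j] * h[j]."""
    return (gates * h[None, :, :]).sum(axis=1)

# Small demo: node 0 plays the robot, nodes 1..3 play pedestrians (random features).
rng = np.random.default_rng(0)
N, D = 4, 3
h = rng.normal(size=(N, D))
W_dst = rng.normal(size=(D, D))
W_src = rng.normal(size=(D, D))
adjacency = np.ones((N, N)) - np.eye(N)   # fully connected, no self-loops
eta = edge_gates(h, W_dst, W_src, adjacency)
out = aggregate(h, eta)
robot_attention = eta[0].mean(axis=-1)    # one scalar per neighbor, as in a Fig. 5 style plot
```

Averaging the gate over the feature dimension, as done for `robot_attention`, is only one possible way to obtain a single number per pedestrian for display; the paper may compute the displayed values differently.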


Fig. 6. Visualization of robot trajectories during the evaluation phase. The sizes of the circles represent the timestamps.

It is worth noting that in Figs. 5(c-d), the converging crowd that tends to block the robot's way forward may lead to a "freezing" problem. However, our method distinguishes the attention more finely, and the robot always focuses on the most critical human in the crowd. The robot therefore still moves left even though this creates another potential collision risk with the human on the left. After that, the attention weight on this human increases as the left human moves closer. The robot then changes its collision avoidance priority and prioritizes the calculation of a collision-free action with respect to this human. Our method is thus more adaptive in extremely crowded environments than previous works and prevents the robot from being trapped in the "freezing" danger.

As previously demonstrated, the attention mechanism can assist the robot in identifying targets that require higher attention during navigation. Especially in crowded situations, the attention mechanism helps the robot alleviate the computational load and avoid potential risks.

E. Trajectories Visualization

Fig. 6 shows the trajectories that our method generated on the evaluation datasets. The circles of different sizes and colors correspond to the various timestamps. The robot's path is denoted by a red line, while the start point and destination are represented by a blue outer circle and a star, respectively. In sparse environments such as the ETH, HOTEL, and ZARA2 datasets, our method successfully avoids approaching pedestrians and reaches the goal while generating a smooth and straight path for the robot. In the dense STUDENTS003 dataset, the robot successfully performs the navigation tasks, and the path taken by our method indicates that even in the most congested area, where the timestamp is between 6 and 10 seconds, the robot continues to move smoothly and does not pause or hesitate in place. More simulation cases can be found in the supplementary video materials (https://youtu.be/CsaVqC4YS-E).

VI. DISCUSSION

Similar to other deep learning approaches, the current model has some limitations, such as the inability to decelerate or accelerate abruptly to avoid colliding with pedestrians who suddenly emerge from a corner or suddenly deviate from their normal path. In addition, in the simulation environment, the robot's motion seems to synchronize perfectly with the human movement, whereas in the real datasets, we have observed that its motion is quite slow compared with that of pedestrians, which may result in the robot being kicked off by the pedestrians. Besides, we infer the pedestrians' intentions using only past observations of pedestrian position and velocity. This may not be enough for an accurate prediction of future intention. Other factors, such as the hidden intentions of humans and sudden changes of decision, are not captured in the observation history. Thus, creating robot behavior that adapts to pedestrian preferences and behaviors is still an open challenge that is worthwhile to address in future work.

VII. CONCLUSION

In this work, we propose a crowd navigation policy with relation graph learning and investigate the importance of the asymmetric influence between dynamic humans and the robot. In the state processing stage, the human/robot states are represented by a bidirectional graph, with their spatial-temporal interactions stored as the node and edge attributes. Then an MP-GatedGCN model learns the asymmetric influence of the robot and pedestrians by using an edge-wise gating mechanism between the nodes of the graph to distribute proper attention to the adjacent pedestrians. This attention is finally utilized by the DRL network to optimize the robot's action to derive


safe and efficient navigation. We show that our model outperforms the baselines and illustrate its capability to handle challenging crowd navigation scenes on real-world pedestrian trajectory datasets with varying crowd densities.

ACKNOWLEDGMENT

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.

REFERENCES

[1] T. Kruse, A. K. Pandey, R. Alami, and A. Kirsch, “Human-aware robot navigation: A survey,” Robot. Auto. Syst., vol. 61, no. 12, pp. 1726–1743, Dec. 2013.
[2] P. Du et al., “Online monitoring for safe pedestrian-vehicle interactions,” in Proc. IEEE 23rd Int. Conf. Intell. Transp. Syst. (ITSC), Sep. 2020, pp. 1–8.
[3] D. Fox, W. Burgard, and S. Thrun, “The dynamic window approach to collision avoidance,” IEEE Robot. Autom. Mag., vol. 4, no. 1, pp. 23–33, Mar. 1997.
[4] J. van den Berg, M. Lin, and D. Manocha, “Reciprocal velocity obstacles for real-time multi-agent navigation,” in Proc. IEEE Int. Conf. Robot. Autom., May 2008, pp. 1928–1935.
[5] C. Chen, Y. Liu, S. Kreiss, and A. Alahi, “Crowd-robot interaction: Crowd-aware robot navigation with attention-based deep reinforcement learning,” in Proc. Int. Conf. Robot. Autom. (ICRA), May 2019, pp. 6015–6022.
[6] Y. Chen, C. Liu, B. E. Shi, and M. Liu, “Robot navigation in crowds by graph convolutional networks with attention learned from human gaze,” IEEE Robot. Autom. Lett., vol. 5, no. 2, pp. 2754–2761, Apr. 2020.
[7] J. V. D. Berg, S. J. Guy, M. Lin, and D. Manocha, “Reciprocal n-body collision avoidance,” in Robotics Research. Berlin, Germany: Springer, 2011, pp. 3–19.
[8] J. Snape, J. V. D. Berg, S. J. Guy, and D. Manocha, “The hybrid reciprocal velocity obstacle,” IEEE Trans. Robot., vol. 27, no. 4, pp. 696–706, Aug. 2011.
[9] G. S. Aoude, B. D. Luders, J. M. Joseph, N. Roy, and J. P. How, “Probabilistically safe motion planning to avoid dynamic obstacles with uncertain motion patterns,” Auto. Robots, vol. 35, no. 1, pp. 51–76, Jul. 2013.
[10] H. Kretzschmar, M. Spies, C. Sprunk, and W. Burgard, “Socially compliant mobile robot navigation via inverse reinforcement learning,” Int. J. Robot. Res., vol. 35, no. 11, pp. 1289–1307, Sep. 2016.
[11] P. Trautman, J. Ma, R. M. Murray, and A. Krause, “Robot navigation in dense human crowds: The case for cooperation,” in Proc. IEEE Int. Conf. Robot. Autom., May 2013, pp. 2153–2160.
[12] M. Kuderer, H. Kretzschmar, C. Sprunk, and W. Burgard, “Feature-based prediction of trajectories for socially compliant navigation,” Robot., Sci. Syst., vol. 8, pp. 193–200, Jul. 2012.
[13] V. V. Unhelkar, C. Pérez-D’Arpino, L. Stirling, and J. A. Shah, “Human–robot co-navigation using anticipatory indicators of human walking motion,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2015, pp. 6183–6190.
[14] P. Trautman and A. Krause, “Unfreezing the robot: Navigation in dense, interacting crowds,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct. 2010, pp. 797–803.
[15] Y. F. Chen, M. Liu, M. Everett, and J. P. How, “Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2017, pp. 285–292.
[16] M. Everett, Y. F. Chen, and J. P. How, “Motion planning among dynamic, decision-making agents with deep reinforcement learning,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Oct. 2018, pp. 3052–3059.
[17] C. Chen, S. Hu, P. Nikdel, G. Mori, and M. Savva, “Relational graph learning for crowd navigation,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Oct. 2020, pp. 10007–10013.
[18] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” 2016, arXiv:1609.02907.
[19] N. Bhujel, Y. W. Yun, H. Wang, and V. P. Dwivedi, “Self-critical learning of influencing factors for trajectory prediction using gated graph convolutional network,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Sep. 2021, pp. 7904–7910.
[20] X. Bresson and T. Laurent, “Residual gated graph ConvNets,” 2017, arXiv:1711.07553.
[21] C. Cao, P. Trautman, and S. Iba, “Dynamic channel: A planning framework for crowd navigation,” in Proc. Int. Conf. Robot. Autom. (ICRA), May 2019, pp. 5551–5557.
[22] K. Driggs-Campbell, V. Govindarajan, and R. Bajcsy, “Integrating intuitive driver models in autonomous planning for interactive maneuvers,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 12, pp. 3461–3472, Dec. 2017.
[23] K. Driggs-Campbell, R. Dong, and R. Bajcsy, “Robust, informative human-in-the-loop predictions via empirical reachable sets,” IEEE Trans. Intell. Vehicles, vol. 3, no. 3, pp. 300–309, Sep. 2018.
[24] V. Mnih et al., “Playing Atari with deep reinforcement learning,” 2013, arXiv:1312.5602.
[25] H. Jiang, H. Wang, W.-Y. Yau, and K.-W. Wan, “A brief survey: Deep reinforcement learning in mobile robot navigation,” in Proc. 15th IEEE Conf. Ind. Electron. Appl. (ICIEA), Nov. 2020, pp. 592–597.
[26] K. Wu, H. Wang, M. Abolfazli Esfahani, and S. Yuan, “BND-DDQN: Learn to steer autonomously through deep reinforcement learning,” IEEE Trans. Cognit. Develop. Syst., vol. 13, no. 2, pp. 249–261, Jun. 2021.
[27] Y. F. Chen, M. Everett, M. Liu, and J. P. How, “Socially aware motion planning with deep reinforcement learning,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Sep. 2017, pp. 1343–1350.
[28] H. Jiang et al., “ITD3-CLN: Learn to navigate in dynamic scene through deep reinforcement learning,” Neurocomputing, vol. 503, pp. 118–128, Sep. 2022.
[29] H. Zhang, J. Cheng, L. Zhang, Y. Li, and W. Zhang, “H2GNN: Hierarchical-hops graph neural networks for multi-robot exploration in unknown environments,” IEEE Robot. Autom. Lett., vol. 7, no. 2, pp. 3435–3442, Apr. 2022.
[30] Z. Rao et al., “Visual navigation with multiple goals based on deep reinforcement learning,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 12, pp. 5445–5455, Dec. 2021.
[31] P. Long, W. Liu, and J. Pan, “Deep-learned collision avoidance policy for distributed multiagent navigation,” IEEE Robot. Autom. Lett., vol. 2, no. 2, pp. 656–663, Apr. 2017.
[32] J. Jin, N. M. Nguyen, N. Sakib, D. Graves, H. Yao, and M. Jagersand, “Mapless navigation among dynamics with Social-safety-awareness: A reinforcement learning approach from 2D laser scans,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2020, pp. 6979–6985.
[33] H. Shi, L. Shi, M. Xu, and K.-S. Hwang, “End-to-end navigation strategy with deep reinforcement learning for mobile robots,” IEEE Trans. Ind. Informat., vol. 16, no. 4, pp. 2393–2402, Apr. 2020.
[34] K. Li, Y. Xu, J. Wang, and M. Q.-H. Meng, “SARL: Deep reinforcement learning based human-aware navigation for mobile robot in indoor environments,” in Proc. IEEE Int. Conf. Robot. Biomimetics (ROBIO), Dec. 2019, pp. 688–694.
[35] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting,” 2017, arXiv:1709.04875.
[36] W. Zhang, H. Liu, Y. Liu, J. Zhou, and H. Xiong, “Semi-supervised hierarchical recurrent graph neural network for city-wide parking availability prediction,” in Proc. AAAI Conf. Artif. Intell., vol. 34, 2020, pp. 1186–1193.
[37] F. Monti, F. Frasca, D. Eynard, D. Mannion, and M. M. Bronstein, “Fake news detection on social media using geometric deep learning,” 2019, arXiv:1902.06673.
[38] V. Bapst et al., “Unveiling the predictive power of static structure in glassy systems,” Nature Phys., vol. 16, no. 4, pp. 448–454, Apr. 2020.
[39] N. Xu, P. Wang, L. Chen, J. Tao, and J. Zhao, “MR-GNN: Multi-resolution and dual graph neural network for predicting structured entity interactions,” 2019, arXiv:1905.09558.
[40] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, “Neural message passing for quantum chemistry,” in Proc. Int. Conf. Mach. Learn., 2017, pp. 1263–1272.
[41] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–11.
[42] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” Stat, vol. 1050, p. 20, Oct. 2017.


[43] N. Koenig and A. Howard, “Design and use paradigms for gazebo, an open-source multi-robot simulator,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Sep. 2004, pp. 2149–2154.
[44] A. Lerner, Y. Chrysanthou, and D. Lischinski, “Crowds by example,” Comput. Graph. Forum, vol. 26, no. 3, pp. 655–664, 2007.
[45] S. Pellegrini, A. Ess, K. Schindler, and L. van Gool, “You’ll never walk alone: Modeling social behavior for multi-target tracking,” in Proc. IEEE 12th Int. Conf. Comput. Vis., Sep. 2009, pp. 261–268.
[46] R. Zhang et al., “AGIL: Learning attention from human for visuomotor tasks,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 663–679.
[47] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” 2014, arXiv:1409.0473.

Haoge Jiang received the B.S. degree in information and telecommunications engineering from Ming Chuan University, Taiwan, in 2019. He is currently pursuing the Ph.D. degree with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. His research focuses on the intersection of deep learning, reinforcement learning, autonomous navigation, and robotics.

Niraj Bhujel received the B.E. degree in electronics and communication engineering from Pokhara University, Nepal, and the Ph.D. degree in electrical and electronics engineering from Nanyang Technological University, Singapore, in July 2022. He is currently pursuing the Master of Science degree in communication networks with the Asian Institute of Technology, Thailand. He is also a Research Scientist with the Institute for Infocomm Research (I²R), A*STAR. He has published his research works in multiple human tracking and path predictions in respected journals and conferences, such as RAL and IROS. His research focus lies at the intersection of deep learning, robotics, and computer vision. He was a recipient of the A*STAR Singapore International Graduate Award. He was a recipient of the Asian Development Bank Scholarships (ADB-JSP).

Zhuoyi Lin received the Ph.D. degree in computer science and engineering from Nanyang Technological University, Singapore, in 2022. Currently, he is a Scientist with the Institute for Infocomm Research (I²R), Agency for Science, Technology and Research (A*STAR), Singapore. His research has been published in leading conferences and journals in related domains (e.g., IJCAI, ICDM, IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, and IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS). His research interests include machine learning and data mining and their applications.

Kong-Wah Wan received the B.Sc. and M.Sc. degrees from the National University of Singapore and the Ph.D. degree from Nanyang Technological University, Singapore. A Principal Scientist with the Institute for Infocomm Research (I²R), he is currently heading the Perception Laboratory, Robotics Department. He has authored numerous patents and technical publications in these fields, including best paper awards in international scientific conferences. His research interests include video/image analysis, machine learning, and robotics navigation. He is also actively collaborating with the local industry and government agencies to deploy technologies, such as robust ID recognition, robotic conveying systems, and visual navigation for UAVs. His accolades include the PST Distinguished ExCEL Innovation Project in 2018 for Project “Automated Passenger In-Car Clearance System,” the GovInsider “Best Drone and Robotics Project” Award in 2018 for Project “Outdoor Robotic Surveillance Systems,” the MHA Firefly Silver Award in 2021 for Project MATAR, and the Dunman Partnership Award (Silver) in 2023.

Jun Li received the B.S. and M.Eng. degrees from the University of Science and Technology of China (USTC) in 1997 and 2002, respectively, and the Ph.D. degree from Nanyang Technological University, Singapore, in 2007. He is currently a Senior Scientist with the Institute for Infocomm Research (I²R), Singapore. His research interests include biometrics/soft biometrics, AI-enabled robotic perception, and cognitive navigation without precise maps. He has authored a number of high quality papers in these areas. One of these papers was featured on the cover of Nature Machine Intelligence in June 2022, which he has coauthored for the contribution of perception modules to the detection and analysis of unseen tools. He has also translated his research into commercial solutions and granted several software licenses and patents.

Senthilnath Jayavelu (Senior Member, IEEE) received the Ph.D. degree in aerospace engineering from the Indian Institute of Science (IISc), India. He is currently a Senior Scientist with the Institute for Infocomm Research (I²R), Agency for Science, Technology and Research (A*STAR), Singapore. He has published over 100 high-quality papers and won five best paper awards. His current research interests include artificial intelligence, multi-agent systems, generative models, online learning, reinforcement learning, and optimization. He is a member of Artificial Intelligence, Analytics and Informatics (AI3), A*STAR. He has been serving as a Guest Editor/organizing chair/co-chair/PC member in leading AI and data analytics journals and conferences.

Xudong Jiang (Fellow, IEEE) received the B.E. and M.E. degrees from the University of Electronic Science and Technology of China (UESTC) and the Ph.D. degree from Helmut Schmidt University, Hamburg, Germany. From 1986 to 1993, he was a Lecturer with UESTC. From 1998 to 2004, he was with the Institute for Infocomm Research, A*STAR, Singapore, as a Lead Scientist and the Head of the Biometrics Laboratory. He joined Nanyang Technological University (NTU), Singapore, as a Faculty Member in 2004, where he was the Director of the Centre for Information Security from 2005 to 2011. He is currently a Professor with the School of EEE, NTU, and also the Director of the Centre for Information Sciences and Systems. He has authored over 200 articles with over 50 articles in IEEE journals, including nine articles in IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE and 18 articles in IEEE TRANSACTIONS ON IMAGE PROCESSING. His current research interests include pattern recognition, computer vision, machine learning, image processing, and biometrics. He served as an IFS TC Member for IEEE Signal Processing Society and an Associate Editor for IEEE SIGNAL PROCESSING LETTERS and IEEE TRANSACTIONS ON IMAGE PROCESSING. He serves as a Senior Area Editor for IEEE TRANSACTIONS ON IMAGE PROCESSING and the Editor-in-Chief for IET Biometrics.
