Learning Relation in Crowd Using Gated Graph Convolutional Networks For DRL-Based Robot Navigation
Abstract— Deep reinforcement learning (DRL) frameworks have shown remarkable effectiveness in learning navigation policies for a mobile robot navigating in a human crowded environment. Moreover, attention mechanisms coupled with DRL allow the robot to identify neighbors with different levels of influence and incorporate them into the robot’s decision. However, as the crowd density increases, attention mechanisms may fail to identify critical neighbors, which can lead to significant drops in navigation efficiency. In this work, we aim to address this limitation by encoding both human-human and human-robot interaction using a special class of Graph Convolutional Networks (GCN) known as Message-Passing GCN (MP-GCN). In contrast to existing methods, where attention between robot and humans is encoded uniformly, the proposed approach named MP-GatedGCN-RL encodes asymmetric interactions using the combination of a novel message-passing function and edge-wise gating mechanisms. We evaluate our approach on the simulated environments of the ETH/UCY pedestrian datasets, consisting of different scenarios like collision avoidance, group forming, diverging, crossing, and so on. Experimental results demonstrate that our proposed method outperforms the conventional benchmark dynamic avoidance method ORCA with a 20.6% increase in success rate and a 9.1% reduction in navigation time. Moreover, we also achieve a 5.5% enhancement in success rate compared to other state-of-the-art DRL-based methods without any additional labeled expert data or prior supervised learning.

Index Terms— Deep reinforcement learning, graph convolutional network, collision avoidance, motion and path planning, real-time autonomous navigation, autonomous unmanned vehicles.

I. INTRODUCTION

humans. When a mobile robot has to share an environment with rational humans, safe and effective collision avoidance with socially-acceptable behavior becomes a crucial challenge [1], [2]. As such, collision avoidance in a human crowded environment has attracted considerable attention in the robot navigation community [3], [4], [5], [6]. Traditional approaches like DWA [3] and Reciprocal Velocity Obstacle (RVO) methods [4], [7], [8] consider all humans in the environment as static obstacles or make only a one-step prediction to find the next optimal action of the robot. Another line of work [9], [10], [11], [12], [13] focuses on forecasting pedestrian trajectories prior to planning the robot’s movements. Unfortunately, as the crowd density increases, the possible movable space of the robot could be fully occupied by the expected pedestrian paths. Meanwhile, these methods are based on mathematical models that may not always yield optimal solutions and may even cause “freezing” when the number of humans in motion is very large, which may lead the agent into a dangerous situation [14].

To alleviate the above problems, recent works suggest modeling the robot and human states and their relations (interactions). As such, they applied a sequence to encode the interactions between an agent and humans, which only partially models the interactions or lacks spatial relation reasoning, i.e., they only consider explicitly modeling the robot-human interactions or using a sequence model to store the crowd’s state [5], [15], [16].

Most recently, graph-based learning methods MP-RGL [17]
Authorized licensed use limited to: B.S. ABDUR RAHMAN INSTITUTE OF SCIENCE & TECH. Downloaded on October 28,2024 at 05:51:49 UTC from IEEE Xplore. Restrictions apply.
5086 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 25, NO. 6, JUNE 2024
JIANG et al.: LEARNING RELATION IN CROWD USING GATED GCNs FOR DRL-BASED ROBOT NAVIGATION 5087
model pedestrian interactions on a local map, which is just a partial representation of crowd interactions.

B. Graph Representation Learning

Graph convolutional networks (GCN) have shown great promise in learning hidden structure in complex data such as traffic networks [35], [36], social networks [37], physics [38], and medical diagnosis [39]. Graph neural networks learn the node representations by applying a message passing mechanism [40] between graph nodes that aggregates information from the local neighborhood [41], [42]. The fundamental structural characteristics of GCN make it feasible to model spatial interaction between pedestrians and robots in a crowd [6], [17]. Pedestrians and the robot are modeled as graph nodes and their edges represent influences between them. The influence function is then learned through a learnable adjacency matrix [17] or through learnable edges [19]. Other factors like the human gaze can be used to learn the influence function between the nodes [6]. These methods, however, use the high-level graph representation of the crowd interactions instead of fine-grained node-level representations for generating the actions of the agents (nodes).

III. PROPOSED APPROACH

We first introduce how the agent’s decision-making in a crowd is formulated as a reinforcement learning (RL) problem, and then we model the environment state as a bidirectional graph representation, which leads to our proposed network architecture.

A. Overview

The navigation problem in this work is for a mobile robot to navigate across an area with dynamic objects, such as a human crowded environment, and complete its task of reaching a given target. The state of the environment is given as $s_t = \{s_t^r, s_t^n\}$, $n \in 1 \ldots N$, where $s_t^r$ is the state of the robot and $s_t^n$, $n \in 1 \ldots N$, are the states of the $N$ humans. At each step $t$, the state $s_t$ of the environment is represented as a set of robot and human node states $x_t = \{x_t^r, x_t^n\}$, $n \in 1 \ldots N$. In addition, an edge state $x_{i,j}^e$ is created that stores relational information between agent $i$ and agent $j$. Two linear embedding layers are used to map the robot node state and the human node states to a fixed dimensional embedding $X = \{E_t^r, E_t^n\}$. Next, we create a bidirectional spatial graph $G_t$ with node attributes $X$ and edge attributes $X^e$ on which message passing is performed to extract the hidden states $h_t = \{h_t^r, h_t^n\}$ of each agent. Finally, an MLP layer is used to generate the next action of each agent $a_t = \{a_t^r, a_t^n\}$ from $h_t$. The actions transition the environment state into the next state $s_{t+1}$ and the process is repeated for each episode. A reward $R_t$ is computed for each agent by the reward function described in Section III-A.3 based on the action $a_t$ it executed. The experienced state transition of the environment is stored in the prioritized experience replay buffer as a tuple Tr that includes the graph representations, action and reward $(G_t, a_t, G_{t+1}, R_t)$, which is used to train the actor-critic network. The episode is terminated when the robot successfully finishes the navigation task, the robot collides, or the maximum number of episode steps $T$ is reached.

1) Action Space: Two motion types of action $a_t$ are considered, one for non-holonomic robots and one for holonomic robots: the former uses linear and angular velocities $[v, \omega]$ and the latter uses linear velocities $[v_x, v_y]$. In this work, we select the holonomic model where $a_t = [v_x, v_y]$ for our wheeled platform. In contrast to the existing V-learning based methods [5], [6], [15], [16], [17] that are optimized for a discrete action space, the action $a_t$ in our method is optimized in a continuous action space within the range of $[-V_{max}, V_{max}]$.

2) State Graph Representation: The state of the environment with $N$ humans and a robot is represented by a directed graph $G_t(V, E, X, X^e)$ with nodes $V = \{i \mid i \in \{1, \ldots, N+1\}\}$ and edges $E = \{e_{i,j} \mid i, j \in \{1, \ldots, N+1\}, i \neq j\}$. A node $i \in V$ is connected to node $j \in V$ via an undirected edge $e_{i,j} \in E$ if $\|p_i - p_j\|_2^2 \le 5$, where $p_i$ and $p_j$ are the positions of node $i$ and node $j$, respectively. The node attributes $X \in \mathbb{R}^6$ store state information of the particular node, whereas the edge attributes $X^e \in \mathbb{R}^2$ store relational information between the nodes. Specifically, each edge attribute $x_{i,j}^e \in X^e$ is the difference between the real-world positions of the neighbor nodes $j$ and $i$.

Each element of the node attributes $x^i \in \mathbb{R}^4$, $i = 1 \ldots N+1$, represents the position $\{p_x, p_y\}$ and velocity $\{v_x, v_y\}$. Since the action of a robot depends on its goal position, we add the robot’s goal position $\{g_x, g_y\}$ to the attributes of the robot node $x^r \in \mathbb{R}^6$. Next, we utilize two linear projection layers, one for human nodes and the other for the robot node, to embed the node attributes into a vector of fixed dimension $D$. Additionally, we model the attractive influence from the robot’s goal by adding a goal node to $G_t$ with node attributes $x^g = \{0, 0, 0, 0, g_x, g_y\}$ and edge attributes $x_{r,g}^e = \{g_x - p_x, g_y - p_y\}$, where $\{p_x, p_y\}$ is the position of the robot. The intuition is to let directional information flow through message passing to the robot’s action during the node updates.

3) Reward function: To obtain consistent results, all the algorithms were given the same settings and parameters for the reward function $R_t$. The reward function is composed of four components according to the configuration of [6]:

$$r_t = \begin{cases} r_{fail} & \text{if crashes or timeout} \\ r_{reach} & \text{if reaches the target} \\ r_{danger} & \text{if is too close to humans} \end{cases}$$

where $r_{reach}$, $r_{fail}$, and $r_{danger}$ are defined as follows:

• $r_{reach} = 1$ is the positive reward for reaching the goal without any collision, otherwise 0.
• $r_{fail} = -0.25$ is the negative penalty for the robot if it collides with an obstacle, otherwise 0.
• $r_{danger}$ is set to make the robot keep an appropriate safe distance from the detected obstacle(s). It is defined as:

$$r_{danger} = -0.1 + \frac{d_t}{2} \quad \text{if } (d_t < r_i) \tag{1}$$
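The reward terms listed above, together with their sum in Eq. (2), can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function and constant names are hypothetical, the danger layer uses the value $r_i = 0.2$ that the paper reports, and the penalty is taken to be 0 outside the danger layer.

```python
R_REACH = 1.0        # reward for reaching the goal (value from the text)
R_FAIL = -0.25       # penalty for a crash or timeout (value from the text)
DANGER_LAYER = 0.2   # r_i, the danger layer reported in the paper


def danger_penalty(d_t, r_i=DANGER_LAYER):
    """Eq. (1): -0.1 + d_t / 2 inside the danger layer, assumed 0 outside it."""
    return -0.1 + d_t / 2.0 if d_t < r_i else 0.0


def step_reward(reached, failed, d_t):
    """Eq. (2): R_t = r_fail + r_reach + r_danger for one time step.

    reached / failed are hypothetical boolean flags supplied by the simulator;
    d_t is the distance to the closest obstacle.
    """
    r_reach = R_REACH if reached else 0.0
    r_fail = R_FAIL if failed else 0.0
    return r_fail + r_reach + danger_penalty(d_t)
```

With these values the danger penalty grows from 0 at the edge of the danger layer to -0.1 at contact, which matches the statement that a closer approach is penalized more.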
Fig. 2. An overview of our proposed approach. The agent’s environment is converted into a graph representation on which message passing is performed to extract node-level state representations. The extracted representations are then fed into the Actor-Critic Networks (right box) to generate the action and action-value pair $a_t$ and $V$ for each agent in the environment.
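The conversion of the environment into a graph, described in Section III-A.2, can be sketched as follows. This is a hedged illustration under several assumptions that the text does not fix: the node ordering, the sign convention of the edge attribute (taken as $p_j - p_i$), the single robot-to-goal edge, and the function name `build_state_graph` are all choices made here. The connection test uses the squared distance with threshold 5 exactly as printed in the text.

```python
import numpy as np


def build_state_graph(robot, humans, goal, threshold=5.0):
    """Build node and edge attributes for one time step.

    robot: (px, py, vx, vy); humans: N rows of (px, py, vx, vy); goal: (gx, gy).
    """
    robot = np.asarray(robot, dtype=float)
    humans = np.asarray(humans, dtype=float).reshape(-1, 4)
    goal = np.asarray(goal, dtype=float)

    x_robot = np.concatenate([robot, goal])        # x^r in R^6
    x_goal = np.concatenate([np.zeros(4), goal])   # x^g = {0, 0, 0, 0, gx, gy}

    # Assumed node order: 0 = robot, 1..N = humans, N + 1 = goal.
    positions = np.vstack([robot[:2], humans[:, :2]])
    n_agents = len(positions)

    edges, edge_attr = [], []
    for i in range(n_agents):
        for j in range(n_agents):
            if i == j:
                continue
            diff = positions[j] - positions[i]     # assumed sign: p_j - p_i
            if diff @ diff <= threshold:           # ||p_i - p_j||_2^2 <= 5 as printed
                edges.append((i, j))
                edge_attr.append(diff)

    # Robot-to-goal edge carrying the offset (gx - px, gy - py).
    edges.append((0, n_agents))
    edge_attr.append(goal - robot[:2])

    return {
        "x_robot": x_robot,
        "x_humans": humans,
        "x_goal": x_goal,
        "edges": edges,
        "edge_attr": np.array(edge_attr),
    }
```

Keeping the robot and human attributes in separate arrays mirrors the two linear projection layers mentioned in the text, which would map the 6-dimensional and 4-dimensional inputs to a common dimension $D$.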
In Eq. (1), $r_{danger}$ is 0 otherwise; $r_i$ is the danger layer and $d_t$ is the instantaneous distance between the closest obstacle and the agent. When the agent moves into the danger layer of an obstacle, it receives a penalty; the closer the agent is to the pedestrian(s), the larger the penalty. In this paper, $r_i = 0.2$.

The total reward $R_t$ is the sum of all $r_t$ cases, as in the equation below:

$$R_t = r_{fail} + r_{reach} + r_{danger} \tag{2}$$

4) Actor-Critic configuration: We use MP-GatedGCN in both the actor and the critic to model dynamic interactions between the robot and humans. Specifically, the message passing mechanism, described in Section III-B, is used to encode both human-human and human-robot interactions. The hidden features of a node after $l$ layers of message passing carry relevant information about the states of the neighboring nodes. The underlying spatial relationships between agents and their influences are encoded by the edge features. An aggregation operation is performed on the node and edge features, which are used to generate action-value pairs for the robot. Specifically, the node and edge features are utilized for the next-layer node feature representation $h_i^{l+1}$, which is then fed to the actor network to generate the action $a_t$ following its learnt policy $\pi(a_t \mid s_t)$. Afterward, the action $a_t$ is passed to the critic network to generate the output $V(s_t)$, which can be regarded as the evaluation metric. The actor-critic networks are trained to maximize the expected value function $V(s_t) = \sum_{t=0}^{\infty} \gamma^t E[R(s_t, a_t)]$, where $\gamma$ is a discount factor that reflects the preference for instant rewards over future rewards. In the training procedure, the DRL algorithm generates two copies of the neural network (online and target network) to make training more stable for the actor network (parametrized by $\theta^{\pi}$, $\theta^{\pi'}$) and the critic network (parametrized by $\theta^{V}$, $\theta^{V'}$); note that the target network is updated by a soft update after several update steps of the online network. Besides, a double Q-learning technique is applied to prevent overestimation of the Q-value. More specifically, based on the online and target network framework, the DRL algorithm initializes two critic networks $V_{\theta_1}$, $V_{\theta_2}$ and an actor network $\pi$ with their corresponding parameters $\theta^{V_1}$, $\theta^{V_2}$, $\theta^{\pi}$, respectively. When calculating the Q-value from the critic networks, the smaller of the two critics’ Q-values is used. The Q-value is estimated more precisely in this way. A step-by-step training process is shown in Algorithm 1.

B. Message Passing GCN With Edge-Wise Gating Mechanism

The vanilla GCN considers the influence between neighboring nodes uniformly. This may not be optimal in the real world. Consider a situation in which the robot is following one pedestrian (the former) while another pedestrian (the latter) is trailing the robot: the former pedestrian will have more influence on the robot than the latter. The robot should find the optimal action to avoid collision by paying more attention

Algorithm 1 Algorithm to Train Our Method
1: Initialize actor network $\pi$ and critic networks $V_{\theta_1}$, $V_{\theta_2}$ with random parameters $\theta^{\pi}$, $\theta^{V_1}$, $\theta^{V_2}$.
2: Initialize target networks $\theta^{V_1'} \leftarrow \theta^{V_1}$, $\theta^{V_2'} \leftarrow \theta^{V_2}$, $\theta^{\pi'} \leftarrow \theta^{\pi}$
3: Initialize replay buffer $B$ with warm up steps $M$
4: Set mini-batch $N$
5: for $t = 1 \ldots M$ do
6:     Build state graph $G_t$ from observation $s_t$.
7:     Sample action $a_t \sim \pi(G_t) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$
8:     Get reward $R_t$ and next state graph representation $G_{t+1}$
9:     Store transition tuple $(G_t, a_t, R_t, G_{t+1}, d)$ in $B$
10:    if $t > M$ then
11:        Sample $N$ set of transitions $(G_t, a_t, R_t, G_{t+1}, d)$ from $B$
12:        Sample action $a_{t+1} \leftarrow \pi'(G_{t+1}) + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma)$
13:        Estimate $V' = r + \gamma \min_{i=1,2} V_{\theta_i'}(G_{t+1}, a_{t+1})$
14:        Update critic networks:
15:        $\theta^{V_i} \leftarrow \arg\min_{\theta^{V_i}} N^{-1} \sum \left(V' - V_{\theta_i}(G_t, a_t)\right)^2$
16:        if $t \bmod d$ then
17:            Update $\theta^{\pi}$ by the deterministic policy gradient:
18:            $\nabla_{\theta^{\pi}} L \approx E\left[\nabla_a V(G_t, a \mid \theta^{V})|_{a=\pi(G_t)} \nabla_{\theta^{\pi}} \pi(G_t)\right]$
19:            Update target networks:
20:            $\theta^{V_i'} \leftarrow \tau \theta^{V_i} + (1 - \tau)\theta^{V_i'}$
21:            $\theta^{\pi'} \leftarrow \tau \theta^{\pi} + (1 - \tau)\theta^{\pi'}$
22:        end if
23:    end if
24: end for
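Three numerical steps of Algorithm 1 (the minimum over the two target critics in step 13, the squared-error critic objective in step 15, and the soft target updates in steps 20 and 21) follow the same pattern as twin delayed DDPG (TD3). The sketch below illustrates only these steps on plain arrays and is not the authors' code: the function names, the default values of `gamma` and `tau`, and the optional terminal mask `done` are assumptions introduced here.

```python
import numpy as np


def clipped_double_q_target(reward, q1_next, q2_next, gamma=0.99, done=None):
    """Step 13: V' = r + gamma * min(V1', V2'), using the target critics' outputs."""
    reward = np.asarray(reward, dtype=float)
    min_q = np.minimum(np.asarray(q1_next, dtype=float),
                       np.asarray(q2_next, dtype=float))
    if done is not None:
        # Assumption: no bootstrapping from terminal transitions.
        min_q = min_q * (1.0 - np.asarray(done, dtype=float))
    return reward + gamma * min_q


def critic_loss(target, q_pred):
    """Step 15: mean squared error between V' and one online critic's estimate."""
    target = np.asarray(target, dtype=float)
    q_pred = np.asarray(q_pred, dtype=float)
    return float(np.mean((target - q_pred) ** 2))


def soft_update(target_params, online_params, tau=0.005):
    """Steps 20-21: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

Taking the smaller of the two critic estimates biases the target downward, which is how this family of methods counteracts the overestimation mentioned in the text; a small `tau` keeps the target networks changing slowly.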
Fig. 4. Visualization of crowd density on the simulated environment of ETH/UCY datasets at a particular frame.
TABLE III
QUANTITATIVE COMPARISON OF SUCCESS RATE AND NAVIGATION TIME WITH THE STATE-OF-THE-ART METHOD
TABLE IV
COMPARISON OF THE PROPOSED METHOD AGAINST OUR BASELINE MODELS
G-GCNRL is from the original paper since the source code is not released. We attempt to reproduce the experimental environment of the original paper as closely as possible in our Gazebo simulator (e.g., trajectory datasets, robot preferred velocity, robot state representation, etc.). In contrast to G-GCNRL, which uses all pedestrians in the environment, we use only pedestrians that are within a radius of 5 meters from the robot position, as the environment in the real world can rarely be fully observed. The models are compared using two criteria: success rate and navigation time. Table III shows the experimental results for four datasets. The success rate of our approach is greater than that of G-GCNRL and exceeds that of the U-GCNRL model by more than 6%.

In the STUDENTS003 dataset, our MP-GatedGCN model outperforms U-GCNRL and G-GCNRL in terms of success rate and navigation time. However, the U-GCNRL model performs better in navigation time in the ZARA2, HOTEL, and ETH datasets, and also gets a satisfactory result in success rate. We can notice that these datasets share a common characteristic, namely a low density level (under 10 pedestrians per frame). A possible explanation is that when the crowd is sparse (lower than 8 pedestrians per frame), the specific attention mechanism is less important. It shortens navigation time by showing less focus on the crowd’s attention weights. As for crowd navigation in the STUDENTS003 dataset (density of over 26 pedestrians per frame), our method has the best success rate and the shortest navigation time among all methods. Even though the improvement is slight, it is worth noting that we do not need additional information such as human gaze expert data. The G-GCNRL paper describes that the human gaze expert data is extracted by the Tobii Pro X60 remote eye tracker. The way the expert provides the human gaze data is similar to work that collects gaze data while humans play Atari games [46]. This means that the human gaze data collection requires an expert to have a full observation view of the environment (a top view of the fully observed environment), which is very difficult to build in the real world using only the local sensors on the robot, so this work is hard to extend to real-world experiments. What’s more, comparing G-GCNRL with U-GCNRL in Table III, we can also notice that the improvement of G-GCNRL is mainly on the dataset STUDENTS003, the densest environment, which indicates the importance of distributing correct attention to the proper subject in the crowd environment. In contrast, our method uses a gate mechanism and considers the asymmetric influence of nodes. The attention is obtained from end-to-end training, with no need for additional information. This demonstrates that the attention extracted by our data-driven method, MP-GatedGCN with gate mechanism, is more promising than that of the supervised method G-GCNRL, which learns the attention from expert human gaze data.

C. Ablation Studies

We generate an ablation model that models the state as a DNN 1-D sequence representation and name this model DNN-RL. This model concatenates the pedestrian states after the robot state. The number of surrounding pedestrians is fixed at 40, so the sequence is a vector with a fixed length of (1+40) x 6. We also implemented a GCN-RL model in our environment that uses a graph representation to model the state. Besides, uniform attention is applied to all sensed pedestrian nodes (within 5 meters) in this model, so the structure and function of GCN-RL are similar to those of U-GCNRL. From the results in Tables III and IV, we can see that the performance of U-GCNRL and GCN-RL differs only slightly. The navigation time across the different density levels presents a similar trend and character as well. However, U-GCNRL gets a slightly lower success rate. The reason is that the U-GCNRL work is largely based on the Deep V-learning code from [5], and the Deep V-learning method uses a discrete action space that divides the action space into 5 samples. We use a continuous action space, which provides a wider range in which to search for the optimal action, resulting in better navigation performance.

The DNN-RL model performs poorly in our evaluation as it is unable to capture spatial relationships in state modeling. Since the state vector has a fixed length in a sparse
Fig. 5. Visualization of attention weights. The direction of influence from the neighboring pedestrians on the robot is shown by the black arrows. The numbers represent the degree of influence of each pedestrian on the robot, i.e., the attention of the robot to the corresponding pedestrians. Short red arrows represent the direction of the corresponding agent.
environment, the pedestrians’ information will be none or 0 in the sequence. As a result, its performance is not stable across datasets. In addition, we observe a significant decrease in DNN-RL performance in the extreme crowd scene (STUDENTS003 dataset). This is because the robot is unable to find the most appropriate action to perform collision avoidance, as it does not generate a proper attention distribution for the surrounding pedestrians.

We also introduce a new metric named dangers per episode to evaluate the effectiveness of our method, which considers the asymmetric influence of humans on the robot. Each time the robot moves within 0.2 meters of a human, it is counted as one instance of danger. The metric is computed as the total number of danger counts divided by the total number of episodes. It evaluates whether the robot learns safe and efficient behavior to avoid obstacles while moving to the target. From Table V we can see that constructing the spatial relationship of the crowd helps the robot learn optimized actions. Compared to the DNN-RL model and the GCN-RL model, the proposed MP-GatedGCN-RL model performs better, generating fewer danger cases. It is worth noting that the best performance on this metric, achieved by our method, demonstrates the importance of considering the asymmetric influence between agents.

TABLE V
DANGERS/EPISODE METRIC IN ABLATION STUDY

D. Attention Visualization

In order to analyze how the robot’s attention to humans changes, we show the attention values learned by the model using Eq. (5), and this process is illustrated in Fig. 5. The gating value $\eta_{ij}^l$ plays the role of a soft attention mechanism [47], manifesting different magnitudes of attention values for the human/goal objects. Elevated attention values indicate a more pronounced influence of the human/goal object on the robot. For better visualization, we only display the attention on human-robot edges because we are using the fixed trajectories from datasets and thus the attention between humans is unnecessary. Note that the human-human attention is used for the robot to learn the influence function during training; we only make it invisible in the visualization. As shown in the panels of Fig. 5, the influence exerted by neighboring pedestrians on the robot is illustrated by the black arrows. The numerical values indicate the extent to which each pedestrian affects the robot, signifying the robot’s attention towards the respective pedestrians/goal. Meanwhile, short red arrows depict the direction in which the robot is headed.

As the figures illustrate, the robot’s priority of attention weights changes with the state. The robot initially concentrates on the pedestrians approaching from the top left, since they share its moving trend and a collision is possible, so they receive the highest-priority attention weight in Fig. 5 (a). Then, after several steps, in Fig. 5 (b), the front pedestrian has passed and a group of humans approaches from the right side. The attention on the right side increases due to the potential collision risk. The robot then turns left to avoid colliding with the humans on the right. After that, as the pedestrians approach closer, the attention weights of the right-side group of humans become higher accordingly in Figs. 5 (c-d). Then the crowds on the left and right sides converge over time in Figs. 5 (e-f), severely impeding the robot’s forward path. The robot precisely concentrates on the most critical pedestrian with correct attention priority and takes the optimal action to avoid collisions. After moderately navigating out of the crowd, the robot’s attention shifts to the goal, the goal’s attention priority increases, and the robot then proceeds directly there in Figs. 5 (g-h).
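To make concrete how a gate can act as soft attention, the sketch below computes sigmoid gates from per-edge features, normalizes them over the robot's incoming edges, and uses them to weight neighbor messages. It is written in the spirit of gated graph ConvNets [20] and is not the paper's Eq. (5): the normalization, the omission of learned weight matrices, the residual sum, and every name in the code are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_aggregate(h_robot, h_neighbors, e_edges, eps=1e-6):
    """Weight neighbor features by normalized edge gates.

    h_robot: (D,) robot node feature.
    h_neighbors: (K, D) features of the K neighbors (humans or goal).
    e_edges: (K, D) features of the edges pointing at the robot.
    Returns the updated robot feature and one attention value per neighbor.
    """
    h_robot = np.asarray(h_robot, dtype=float)
    h_neighbors = np.asarray(h_neighbors, dtype=float)
    gates = sigmoid(np.asarray(e_edges, dtype=float))        # (K, D), each in (0, 1)
    eta = gates / (gates.sum(axis=0, keepdims=True) + eps)   # normalize over neighbors
    message = (eta * h_neighbors).sum(axis=0)                # gated sum of neighbor features
    attention = eta.mean(axis=1)                             # scalar per neighbor, for display
    return h_robot + message, attention
```

Because each edge has its own gate, two neighbors at the same distance can still receive different weights, which is the asymmetry that a uniform aggregation cannot express. The per-neighbor scalar returned here is only one possible way to obtain numbers like those drawn in Fig. 5.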
Fig. 6. Visualization of robot trajectories during the evaluation phase. The sizes of the circles represent the timestamps.
It is worth noting that in Figs. 5 (c-d), the converging crowd that tends to block the robot’s way forward may lead to a “freezing” problem. However, our method distinguishes the attention more finely and the robot always focuses on the most critical human in the crowd. So the robot still takes the left action even though there will be another potential collision risk with the left human. After that, the attention weight on this human increases as the left human moves closer. The robot then changes the collision avoidance priority and prioritizes the calculation of the collision-free action for this human. So our method is more adaptive in extremely crowded environments than previous works and prevents the robot from being trapped in “freezing” danger.

As previously demonstrated, the attention mechanism can assist the robot in identifying targets that require higher attention during navigation. Especially in crowded situations, the attention mechanism helps the robot alleviate its computational load and avoid potential risks.

E. Trajectories Visualization

Fig. 6 shows the trajectories that our method generated in the evaluation datasets. The circles of different sizes and colors correspond to the various timestamps. The robot’s path is denoted by a red line, while the start point and destination are represented by a blue outer circle and a star, respectively. In sparse environments like the ETH, HOTEL, and ZARA2 datasets, our method successfully avoids approaching pedestrians and reaches the goal while generating a smooth and straight path for the robot. In the dense STUDENTS003 dataset, while successfully performing the navigation tasks, the path taken by our method indicates that even in the most congested area, where the timestamp is between 6 and 10 seconds, the robot continues to move smoothly and does not pause in place to hesitate. More simulation cases can be found in the supplementary video materials (https://youtu.be/CsaVqC4YS-E).

VI. DISCUSSION

Similar to other deep learning approaches, the current model has some limitations, such as the inability to decelerate or accelerate abruptly to avoid colliding with pedestrians suddenly emerging from a corner or suddenly deviating from their normal path. In addition, in the simulation environment, the robot’s motion seems to synchronize perfectly with the human movement, whereas in the real datasets, we have observed that its motion is quite slow compared with pedestrians, which may result in the robot being kicked off by the pedestrians. Besides, we infer the pedestrians’ intention using only the past observations of pedestrian position and velocity. This may not be enough for an accurate prediction of future intention. Other factors like the hidden intentions of humans and sudden changes of decisions are not captured in the observation history. Thus, creating robot behavior that adapts to pedestrian preferences and behaviors is still an open challenge that is worthwhile to address in future works.

VII. CONCLUSION

In this work, we propose a crowd navigation policy with relation graph learning and investigate the importance of asymmetric influence between dynamic humans and the robot. In the state processing stage, the human/robot states are represented by a bidirectional graph, with their spatial-temporal interactions stored as the node and edge attributes. Then an MP-GatedGCN model learns the asymmetric influence of the robot and pedestrians by using an edge-wise gating mechanism between the nodes of the graph to distribute proper attention to the adjacent pedestrians. This attention is finally utilized for the DRL network to optimize the robot’s action to derive
safe and efficient navigation. We show that our model outperforms the baselines and illustrate its capability to handle challenging crowd navigation scenes on real-world pedestrian trajectory datasets with varying crowd densities.

ACKNOWLEDGMENT

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.

REFERENCES

[1] T. Kruse, A. K. Pandey, R. Alami, and A. Kirsch, “Human-aware robot navigation: A survey,” Robot. Auto. Syst., vol. 61, no. 12, pp. 1726–1743, Dec. 2013.
[2] P. Du et al., “Online monitoring for safe pedestrian-vehicle interactions,” in Proc. IEEE 23rd Int. Conf. Intell. Transp. Syst. (ITSC), Sep. 2020, pp. 1–8.
[3] D. Fox, W. Burgard, and S. Thrun, “The dynamic window approach to collision avoidance,” IEEE Robot. Autom. Mag., vol. 4, no. 1, pp. 23–33, Mar. 1997.
[4] J. van den Berg, M. Lin, and D. Manocha, “Reciprocal velocity obstacles for real-time multi-agent navigation,” in Proc. IEEE Int. Conf. Robot. Autom., May 2008, pp. 1928–1935.
[5] C. Chen, Y. Liu, S. Kreiss, and A. Alahi, “Crowd-robot interaction: Crowd-aware robot navigation with attention-based deep reinforcement learning,” in Proc. Int. Conf. Robot. Autom. (ICRA), May 2019, pp. 6015–6022.
[6] Y. Chen, C. Liu, B. E. Shi, and M. Liu, “Robot navigation in crowds by graph convolutional networks with attention learned from human gaze,” IEEE Robot. Autom. Lett., vol. 5, no. 2, pp. 2754–2761, Apr. 2020.
[7] J. V. D. Berg, S. J. Guy, M. Lin, and D. Manocha, “Reciprocal n-body collision avoidance,” in Robotics Research. Berlin, Germany: Springer, 2011, pp. 3–19.
[8] J. Snape, J. V. D. Berg, S. J. Guy, and D. Manocha, “The hybrid reciprocal velocity obstacle,” IEEE Trans. Robot., vol. 27, no. 4, pp. 696–706, Aug. 2011.
[9] G. S. Aoude, B. D. Luders, J. M. Joseph, N. Roy, and J. P. How, “Probabilistically safe motion planning to avoid dynamic obstacles with uncertain motion patterns,” Auto. Robots, vol. 35, no. 1, pp. 51–76,
[19] N. Bhujel, Y. W. Yun, H. Wang, and V. P. Dwivedi, “Self-critical learning of influencing factors for trajectory prediction using gated graph convolutional network,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Sep. 2021, pp. 7904–7910.
[20] X. Bresson and T. Laurent, “Residual gated graph ConvNets,” 2017, arXiv:1711.07553.
[21] C. Cao, P. Trautman, and S. Iba, “Dynamic channel: A planning framework for crowd navigation,” in Proc. Int. Conf. Robot. Autom. (ICRA), May 2019, pp. 5551–5557.
[22] K. Driggs-Campbell, V. Govindarajan, and R. Bajcsy, “Integrating intuitive driver models in autonomous planning for interactive maneuvers,” IEEE Trans. Intell. Transp. Syst., vol. 18, no. 12, pp. 3461–3472, Dec. 2017.
[23] K. Driggs-Campbell, R. Dong, and R. Bajcsy, “Robust, informative human-in-the-loop predictions via empirical reachable sets,” IEEE Trans. Intell. Vehicles, vol. 3, no. 3, pp. 300–309, Sep. 2018.
[24] V. Mnih et al., “Playing Atari with deep reinforcement learning,” 2013, arXiv:1312.5602.
[25] H. Jiang, H. Wang, W.-Y. Yau, and K.-W. Wan, “A brief survey: Deep reinforcement learning in mobile robot navigation,” in Proc. 15th IEEE Conf. Ind. Electron. Appl. (ICIEA), Nov. 2020, pp. 592–597.
[26] K. Wu, H. Wang, M. Abolfazli Esfahani, and S. Yuan, “BND-DDQN: Learn to steer autonomously through deep reinforcement learning,” IEEE Trans. Cognit. Develop. Syst., vol. 13, no. 2, pp. 249–261, Jun. 2021.
[27] Y. F. Chen, M. Everett, M. Liu, and J. P. How, “Socially aware motion planning with deep reinforcement learning,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Sep. 2017, pp. 1343–1350.
[28] H. Jiang et al., “ITD3-CLN: Learn to navigate in dynamic scene through deep reinforcement learning,” Neurocomputing, vol. 503, pp. 118–128, Sep. 2022.
[29] H. Zhang, J. Cheng, L. Zhang, Y. Li, and W. Zhang, “H2GNN: Hierarchical-hops graph neural networks for multi-robot exploration in unknown environments,” IEEE Robot. Autom. Lett., vol. 7, no. 2, pp. 3435–3442, Apr. 2022.
[30] Z. Rao et al., “Visual navigation with multiple goals based on deep reinforcement learning,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 12, pp. 5445–5455, Dec. 2021.
[31] P. Long, W. Liu, and J. Pan, “Deep-learned collision avoidance policy for distributed multiagent navigation,” IEEE Robot. Autom. Lett., vol. 2, no. 2, pp. 656–663, Apr. 2017.
[32] J. Jin, N. M. Nguyen, N. Sakib, D. Graves, H. Yao, and M. Jagersand, “Mapless navigation among dynamics with Social-safety-awareness: A reinforcement learning approach from 2D laser scans,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2020, pp. 6979–6985.
[33] H. Shi, L. Shi, M. Xu, and K.-S. Hwang, “End-to-end navigation strategy
Jul. 2013. with deep reinforcement learning for mobile robots,” IEEE Trans. Ind.
[10] H. Kretzschmar, M. Spies, C. Sprunk, and W. Burgard, “Socially Informat., vol. 16, no. 4, pp. 2393–2402, Apr. 2020.
compliant mobile robot navigation via inverse reinforcement learning,”
Int. J. Robot. Res., vol. 35, no. 11, pp. 1289–1307, Sep. 2016. [34] K. Li, Y. Xu, J. Wang, and M. Q.-H. Meng, “SARL: Deep reinforcement
learning based human-aware navigation for mobile robot in indoor
[11] P. Trautman, J. Ma, R. M. Murray, and A. Krause, “Robot navigation
environments,” in Proc. IEEE Int. Conf. Robot. Biomimetics (ROBIO),
in dense human crowds: The case for cooperation,” in Proc. IEEE Int.
Dec. 2019, pp. 688–694.
Conf. Robot. Autom., May 2013, pp. 2153–2160.
[12] M. Kuderer, H. Kretzschmar, C. Sprunk, and W. Burgard, “Feature-based [35] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional
prediction of trajectories for socially compliant navigation,” Robot., Sci. networks: A deep learning framework for traffic forecasting,” 2017,
Syst., vol. 8, pp. 193–200, Jul. 2012. arXiv:1709.04875.
[13] V. V. Unhelkar, C. Pérez-D’Arpino, L. Stirling, and J. A. Shah, “Human– [36] W. Zhang, H. Liu, Y. Liu, J. Zhou, and H. Xiong, “Semi-supervised
robot co-navigation using anticipatory indicators of human walking hierarchical recurrent graph neural network for city-wide parking avail-
motion,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2015, ability prediction,” in Proc. AAAI Conf. Artif. Intell., vol. 34, 2020,
pp. 6183–6190. pp. 1186–1193.
[14] P. Trautman and A. Krause, “Unfreezing the robot: Navigation in dense, [37] F. Monti, F. Frasca, D. Eynard, D. Mannion, and M. M. Bronstein, “Fake
interacting crowds,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., news detection on social media using geometric deep learning,” 2019,
Oct. 2010, pp. 797–803. arXiv:1902.06673.
[15] Y. F. Chen, M. Liu, M. Everett, and J. P. How, “Decentralized non- [38] V. Bapst et al., “Unveiling the predictive power of static structure in
communicating multiagent collision avoidance with deep reinforcement glassy systems,” Nature Phys., vol. 16, no. 4, pp. 448–454, Apr. 2020.
learning,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2017, [39] N. Xu, P. Wang, L. Chen, J. Tao, and J. Zhao, “MR-GNN: Multi-
pp. 285–292. resolution and dual graph neural network for predicting structured entity
[16] M. Everett, Y. F. Chen, and J. P. How, “Motion planning among interactions,” 2019, arXiv:1905.09558.
dynamic, decision-making agents with deep reinforcement learning,” [40] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl,
in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Oct. 2018, “Neural message passing for quantum chemistry,” in Proc. Int. Conf.
pp. 3052–3059. Mach. Learn., 2017, pp. 1263–1272.
[17] C. Chen, S. Hu, P. Nikdel, G. Mori, and M. Savva, “Relational graph [41] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation
learning for crowd navigation,” in Proc. IEEE/RSJ Int. Conf. Intell. learning on large graphs,” in Proc. Adv. Neural Inf. Process. Syst.,
Robots Syst. (IROS), Oct. 2020, pp. 10007–10013. vol. 30, 2017, pp. 1–11.
[18] T. N. Kipf and M. Welling, “Semi-supervised classification with graph [42] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and
convolutional networks,” 2016, arXiv:1609.02907. Y. Bengio, “Graph attention networks,” Stat, vol. 1050, p. 20, Oct. 2017.
[43] N. Koenig and A. Howard, “Design and use paradigms for gazebo, an open-source multi-robot simulator,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), Sep. 2004, pp. 2149–2154.
[44] A. Lerner, Y. Chrysanthou, and D. Lischinski, “Crowds by example,” Comput. Graph. Forum, vol. 26, no. 3, pp. 655–664, 2007.
[45] S. Pellegrini, A. Ess, K. Schindler, and L. van Gool, “You’ll never walk alone: Modeling social behavior for multi-target tracking,” in Proc. IEEE 12th Int. Conf. Comput. Vis., Sep. 2009, pp. 261–268.
[46] R. Zhang et al., “AGIL: Learning attention from human for visuomotor tasks,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 663–679.
[47] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” 2014, arXiv:1409.0473.

Haoge Jiang received the B.S. degree in information and telecommunications engineering from Ming Chuan University, Taiwan, in 2019. He is currently pursuing the Ph.D. degree with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore. His research focuses on the intersection of deep learning, reinforcement learning, autonomous navigation, and robotics.

Jun Li received the B.S. and M.Eng. degrees from the University of Science and Technology of China (USTC) in 1997 and 2002, respectively, and the Ph.D. degree from Nanyang Technological University, Singapore, in 2007. He is currently a Senior Scientist with the Institute for Infocomm Research (I²R), Singapore. His research interests include biometrics/soft biometrics, AI-enabled robotic perception, and cognitive navigation without precise maps. He has authored a number of high quality papers in these areas. One of these papers was featured on the cover of Nature Machine Intelligence in June 2022, which he has coauthored for the contribution of perception modules to the detection and analysis of unseen tools. He has also translated his research into commercial solutions and granted several software licenses and patents.