Goal-Driven Autonomous Exploration Through Deep Reinforcement Learning
Abstract—In this letter, we present an autonomous navigation system for goal-driven exploration of unknown environments through deep reinforcement learning (DRL). Points of interest (POI) for possible navigation directions are obtained from the environment and an optimal waypoint is selected, based on the available data. Following the waypoints, the robot is guided towards the global goal and the local optimum problem of reactive navigation is mitigated. Then, a motion policy for local navigation is learned through a DRL framework in a simulation. We develop a navigation system where this learned policy is integrated into a motion planning stack as the local navigation layer to move the robot between waypoints towards a global goal. The fully autonomous navigation is performed without any prior knowledge while a map is recorded as the robot moves through the environment. Experiments show that the proposed method has an advantage over similar exploration methods, without reliance on a map or prior information, in complex static as well as dynamic environments.

Index Terms—AI-enabled robotics, reinforcement learning, sensor-based control.

Manuscript received September 8, 2021; accepted November 25, 2021. Date of publication December 10, 2021; date of current version December 14, 2021. This letter was recommended for publication by Associate Editor S. Cascianelli and Editor M. Vincze upon evaluation of the reviewers' comments. This work was supported in part by the Ministry of Trade, Industry and Energy (MOTIE), through the Technology Innovation Program (Industrial Strategic Technology Development) under Grant 10080638, and in part by the National Research Foundation, Ministry of Science and ICT, South Korea, through the Korea Advanced Research Program under Grant 2020M3H8A1114945. (Corresponding author: Reinis Cimurs.)
Reinis Cimurs is with the Robert Bosch GmbH, Hanyang University, Seoul 04763, South Korea (e-mail: [email protected]).
Il Hong Suh is with COGAPLEX, Seoul 04763, South Korea (e-mail: [email protected]).
Jin Han Lee is with CLE Inc., Hanyang University, Seoul 04763, South Korea (e-mail: [email protected]).
Digital Object Identifier 10.1109/LRA.2021.3133591

I. INTRODUCTION

Over the last couple of decades, the field of simultaneous localization and mapping (SLAM) has been studied extensively. Typically, in SLAM systems a person operates a measuring device and a map is generated from the location and the environmental landmarks [1]. The operator decides which parts of the previously unmapped environment to visit and in which order, even more so if the requirement is to map a path between two locations. Humans can use their best knowledge of their surroundings and their instincts to locate possible pathways to the goal, even when working in unknown environments. Afterward, they can manually guide the mapping robot along the selected path to the global goal. However, it is not always possible to manually operate the mapping device for various reasons: the high cost of labor, physical constraints, limited resources, environmental dangers, and others. Subsequently, autonomous exploration is an area that is garnering a lot of attention, with ongoing research into delegating navigation and mapping tasks to autonomous vehicles. But unlike regular environment exploration, where the task is only to map the surroundings, fully autonomous goal-driven exploration is a twofold problem. First, the exploration robot needs to decide where to go to have the highest possibility of arriving at the global goal. Without prior information or the global goal in sight, the system needs to identify possible navigation directions directly from the sensor data. From such points of interest (POI), the best one needs to be selected as a waypoint that guides the robot to the global goal in the most optimal way. Second, a robot motion policy for uncertain environments that does not depend on map data needs to be obtained. With the recent advances in deep reinforcement learning (DRL) for robot navigation, high-precision decision-making has become feasible for autonomous agents. Using DRL, an agent control policy can be learned to achieve the target task in an unknown environment [2]. However, due to its reactive nature and lack of global information, it easily encounters the local optimum problem, especially in large-scale navigation tasks [3].

Therefore, we present a fully autonomous exploration system for navigation to a global goal, without the necessity of human control or prior information about the environment. Points of interest are extracted from the immediate vicinity of the robot, evaluated, and one of them is selected as a waypoint. Waypoints guide the DRL-based motion policy to the global goal, mitigating the local optimum problem. Then, motion is performed based on the policy, without requiring a fully mapped representation of the surroundings. The main contributions of this work can be itemized as follows:
• Designed a global navigation and waypoint selection strategy for goal-driven exploration.
• Developed a TD3 architecture based neural network for mobile robot navigation in continuous action space.
• Combined the DRL motion policy with the global navigation strategy to mitigate the local optimum problem for exploration of unknown environments and performed extensive experiments to validate the proposed method.

The remainder of this paper is organized as follows. In Section II related works are reviewed. The proposed method is described in Section III. Experimental results are given in Section IV and conclusions are given in Section V.

II. RELATED WORKS

Environmental exploration and mapping has been a popular study in the field of robotics for decades [4]–[6].
POI are obtained from gaps between sequential laser readings, as such gaps represent free space in the environment. A POI is also placed in the environment if sequential laser readings return a non-numerical value. Examples of POI extraction from the environment are depicted in Fig. 2.

Fig. 2. POI extraction from the environment. Blue circles represent an extracted POI with the respective method. Green circles represent the current waypoint. (a) Blue POI is obtained from a gap between laser readings. (b) Blue POI extracted from non-numerical laser readings.

If in subsequent steps any of the POI are found to be located near an obstacle, they are deleted from the memory. A POI will not be obtained from the laser readings in a place that the robot has already visited. Additionally, if a POI is selected as a waypoint but cannot be reached over a number of steps, it is deleted and a new waypoint is selected.
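To make the extraction rules above concrete, the following sketch shows one possible way to implement them. It is an illustration rather than the authors' implementation; the gap threshold, the maximum range substituted for non-numerical readings, and the function name are assumptions.

```python
import numpy as np

def extract_poi(ranges, angles, robot_pose, gap_threshold=1.0, max_range=10.0):
    """Illustrative POI extraction from one laser scan (not the paper's exact code).

    ranges/angles: 1-D arrays of laser distances and beam angles (robot frame).
    robot_pose: (x, y, theta) of the robot in the map frame.
    Returns candidate POI as (x, y) points in the map frame.
    """
    x_r, y_r, theta = robot_pose
    poi = []
    for i in range(len(ranges) - 1):
        r0, r1 = ranges[i], ranges[i + 1]
        a_mid = 0.5 * (angles[i] + angles[i + 1])
        if np.isfinite(r0) and np.isfinite(r1) and abs(r0 - r1) > gap_threshold:
            # Rule 1: a gap between sequential readings marks free space.
            r_poi = min(r0, r1) + 0.5 * gap_threshold
        elif not (np.isfinite(r0) or np.isfinite(r1)):
            # Rule 2: non-numerical (inf/NaN) readings also indicate open space.
            r_poi = max_range
        else:
            continue
        poi.append((x_r + r_poi * np.cos(theta + a_mid),
                    y_r + r_poi * np.sin(theta + a_mid)))
    return poi
```

Candidates obtained this way would then be pruned by the rules described above (closeness to obstacles, already visited places, unreachable waypoints) before being scored.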
From the available POI, the optimal waypoint at time-step t is selected by using the Information-based Distance Limited Exploration (IDLE) evaluation method [38]. The IDLE method evaluates the fitness of each candidate POI as:

h(c_i) = \tanh\left( \frac{ e^{ \left( \frac{d(p_t, c_i)}{l_2 - l_1} \right)^2 } }{ e^{ \left( \frac{l_2}{l_2 - l_1} \right)^2 } } \right) l_2 + d(c_i, g) + e^{I_{i,t}},    (1)

where the score h of each candidate POI c with index i is a sum of three components. The Euclidean distance component d(p_t, c_i) between the robot's position p at t and the candidate POI is expressed as a hyperbolic tangent (tanh) function:

\tanh\left( \frac{ e^{ \left( \frac{d(p_t, c_i)}{l_2 - l_1} \right)^2 } }{ e^{ \left( \frac{l_2}{l_2 - l_1} \right)^2 } } \right) l_2,    (2)

where e is Euler's number, and l_1 and l_2 are two-step distance limits within which to discount the score. The two-step distance limits are set based on the area size of the DRL training environment. The second component d(c_i, g) represents the Euclidean distance between the candidate and the global goal g. Finally, the map information score at t is expressed as:

e^{I_{i,t}},    (3)

where I_{i,t} is calculated as:

I_{i,t} = \frac{ \sum_{w=-\frac{k}{2}}^{\frac{k}{2}} \sum_{h=-\frac{k}{2}}^{\frac{k}{2}} c_{(x+w)(y+h)} }{ k^2 },    (4)

where k is the size of the kernel used to calculate the information around the candidate point's coordinates x and y, and w and h are the indices over the kernel width and height, respectively.

A POI with the smallest IDLE score from (1) is selected as the optimal waypoint for local navigation.
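A possible NumPy rendering of the IDLE score in (1)-(4) is sketched below. It assumes the map is a 2-D grid of per-cell information values c, the candidate has already been converted to grid coordinates, and no boundary handling is done for the k x k kernel; it is not the authors' code.

```python
import numpy as np

def idle_score(candidate_xy, robot_xy, goal_xy, info_map, cell_xy, l1, l2, k=20):
    """Illustrative IDLE fitness of one candidate POI, following Eqs. (1)-(4)."""
    d_rc = np.linalg.norm(np.subtract(candidate_xy, robot_xy))  # d(p_t, c_i)
    d_cg = np.linalg.norm(np.subtract(candidate_xy, goal_xy))   # d(c_i, g)

    # Eq. (2): distance-limited component; the exponent difference is the
    # numerically safer form of exp(a) / exp(b).
    ratio = np.exp((d_rc / (l2 - l1)) ** 2 - (l2 / (l2 - l1)) ** 2)
    distance_term = np.tanh(ratio) * l2

    # Eqs. (3)-(4): information around the candidate's grid cell (x, y),
    # averaged over a k x k kernel of map cell values c.
    x, y = cell_xy
    half = k // 2
    window = info_map[x - half:x + half, y - half:y + half]
    info_term = np.exp(window.sum() / k ** 2)

    return distance_term + d_cg + info_term  # the smallest score wins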
B. Local Navigation

In a planning-based navigation stack, local motion is performed following the local planner. In our approach, we replace this layer with a neural network motion policy. We employ DRL to train the local navigation policy separately in a simulated environment.

A Twin Delayed Deep Deterministic Policy Gradient (TD3) based neural network architecture is used to train the motion policy [39]. TD3 is an actor-critic network that allows performing actions in continuous action space. The local environment is described by bagged laser readings in a 180° range in front of the robot. This information is combined with the polar coordinates of the waypoint with respect to the robot's position. The combined data is used as the input state s of the actor-network of the TD3. The actor-network consists of two fully connected (FC) layers. A rectified linear unit (ReLU) activation follows each of these layers. The last layer is then connected to the output layer with two action parameters a that represent the linear velocity a_1 and the angular velocity a_2 of the robot. A tanh activation function is applied to the output layer to limit it to the range (−1, 1). Before applying the action in the environment, it is scaled by the maximum linear velocity v_{max} and the maximum angular velocity ω_{max} as follows:

a = \left[ v_{max} \frac{a_1 + 1}{2},\ \omega_{max} a_2 \right],    (5)

Since the laser readings only record data in front of the robot, backward motion is not considered and the linear velocity is adjusted to only be positive.
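A minimal PyTorch sketch of such an actor and of the action scaling in (5) is given below. The hidden-layer widths are illustrative assumptions (the exact sizes appear in Fig. 3), while v_max = 0.5 m/s and ω_max = 1 rad/s match the values reported later in the experimental setup.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """TD3 actor: state -> (linear, angular) action in (-1, 1), as described above.

    state_dim = bagged laser readings + polar coordinates of the waypoint;
    the hidden widths (800, 600) are illustrative, not taken from the paper.
    """
    def __init__(self, state_dim, hidden1=800, hidden2=600):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden1)
        self.fc2 = nn.Linear(hidden1, hidden2)
        self.out = nn.Linear(hidden2, 2)   # a1: linear velocity, a2: angular velocity

    def forward(self, state):
        x = torch.relu(self.fc1(state))
        x = torch.relu(self.fc2(x))
        return torch.tanh(self.out(x))     # both actions limited to (-1, 1)

def scale_action(a, v_max=0.5, w_max=1.0):
    """Eq. (5): map network output to robot commands; linear velocity kept positive."""
    v = v_max * (a[..., 0] + 1.0) / 2.0
    w = w_max * a[..., 1]
    return torch.stack([v, w], dim=-1)
```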
The Q value of the state-action pair Q(s, a) is evaluated by two critic-networks. Both critic-networks have the same structure, but their parameter updates are delayed, allowing for divergence in the parameter values. The critic-networks use a pair of the state s and action a as an input. The state s is fed into a fully connected layer followed by a ReLU activation with output L_s. The output of this layer, as well as the action, are fed into two separate transformation fully connected layers (TFC) of the same size, τ_1 and τ_2, respectively. These layers are then combined as follows:

L_c = L_s W_{\tau_1} + a W_{\tau_2} + b_{\tau_2},    (6)

where L_c is the combined fully connected layer (CFC), W_{\tau_1} and W_{\tau_2} are the weights of τ_1 and τ_2, respectively, and b_{\tau_2} is the bias of layer τ_2. Then a ReLU activation is applied to the combined layer. Afterward, it is connected to the output with one parameter representing the Q value. The minimum Q value of both critic-networks is selected as the final critic output to limit the overestimation of the state-action pair value. The full network architecture is visualized in Fig. 3.
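The combination in (6) and the clipped double-Q output can be sketched as follows; again, only the structure follows the description above, and the layer width is an assumed value.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """One TD3 critic head implementing the combination of Eq. (6)."""
    def __init__(self, state_dim, action_dim=2, hidden=600):
        super().__init__()
        self.state_fc = nn.Linear(state_dim, hidden)            # produces L_s
        self.tfc_state = nn.Linear(hidden, hidden, bias=False)  # tau_1 (no bias term in Eq. (6))
        self.tfc_action = nn.Linear(action_dim, hidden)         # tau_2, supplies b_tau2
        self.q_out = nn.Linear(hidden, 1)                       # single Q value

    def forward(self, state, action):
        l_s = torch.relu(self.state_fc(state))
        # Eq. (6): L_c = L_s W_tau1 + a W_tau2 + b_tau2, followed by ReLU.
        l_c = torch.relu(self.tfc_state(l_s) + self.tfc_action(action))
        return self.q_out(l_c)

def clipped_q(critic1, critic2, state, action):
    """Take the smaller of the two critic estimates to curb overestimation."""
    return torch.min(critic1(state, action), critic2(state, action))
```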
The policy is rewarded according to the following function:

r(s_t, a_t) = \begin{cases} r_g & \text{if } D_t < \eta_D \\ r_c & \text{if collision} \\ v - |\omega| & \text{otherwise,} \end{cases}    (7)

where the reward r of the state-action pair (s_t, a_t) at timestep t depends on three conditions. If the distance to the goal at the current timestep D_t is less than the threshold η_D, a positive goal reward r_g is applied. If a collision is detected, a negative collision reward r_c is applied. If neither of these conditions is present, an immediate reward is applied based on the current linear velocity v and angular velocity ω.
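Read directly, (7) amounts to a few lines of code; the goal threshold η_D and the reward magnitudes r_g and r_c below are placeholder values, since their numbers are not stated in this excerpt.

```python
def reward(dist_to_goal, collided, v, omega, eta_d=0.3, r_goal=100.0, r_collision=-100.0):
    """Per-step reward of Eq. (7); threshold and reward magnitudes are assumed values."""
    if dist_to_goal < eta_d:      # goal reached
        return r_goal
    if collided:                  # collision detected
        return r_collision
    return v - abs(omega)         # otherwise encourage fast, straight motion
```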
Fig. 3. TD3 network structure including the actor and critic parts. Layer type and the number of their respective parameters are described within the layers. TFC
layers refer to transformation fully connected layers τ and CFC layer to combined fully connected layer Lc .
A. System Setup

The learning of local navigation through DRL was performed on a computer equipped with an NVIDIA GTX 1080 graphics card, 32 GB of RAM, and an Intel Core i7-6800K CPU. The TD3 network was trained in the Gazebo simulator for 800 episodes, which took approximately 8 hours. Each training episode concluded when a goal was reached, a collision was detected, or 500 steps were taken. v_{max} and ω_{max} were set as 0.5 meters per second and 1 rad per second, respectively. The delayed rewards were updated over the last n = 10 steps and the parameter update delay was set as 2 episodes. The training was carried out in a simulated 10x10 m-sized environment depicted in Fig. 4.

For the network to work not only in simulation but also in real life, it needs to learn generic obstacle avoidance behaviour from the laser data. In order to facilitate generalization and policy exploration, Gaussian noise was added to the sensor and action values. To help with sim2real transfer, the environment was varied on each episode by randomly changing the locations of the box-shaped obstacles. An example of their changing locations is depicted in Fig. 4(a), (b), and (c). The IDLE method of POI evaluation selects waypoints within the learned distance of the simulation. The same robot model was used in simulation as in the experiments, and its starting position and the goal locations were randomized on every episode.
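As an illustration of the randomization described above, one training episode could inject noise and check the termination conditions roughly as follows; the environment interface (reset_world, step), the actor's act method, and the noise scales are hypothetical, while the 500-step limit matches the stated setup.

```python
import numpy as np

MAX_STEPS = 500  # episode length limit from the setup above

def run_episode(env, actor, laser_noise=0.03, action_noise=0.1):
    """Sketch of one training episode with Gaussian sensor and action noise."""
    state = env.reset_world()  # obstacle, start, and goal locations re-randomized
    for _ in range(MAX_STEPS):
        action = actor.act(state)
        # Exploration noise on the action, clipped back to the tanh output range.
        action = np.clip(action + np.random.normal(0.0, action_noise, size=2), -1.0, 1.0)
        scan, waypoint, reached_goal, collided = env.step(action)
        # Sensor noise on the laser readings to help sim2real transfer.
        scan = scan + np.random.normal(0.0, laser_noise, size=scan.shape)
        state = np.concatenate([scan, waypoint])
        if reached_goal or collided:
            break
```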
Fig. 6. Environment with complex obstacles and resulting navigation examples of the respective methods in the second quantitative experiment. (a) Depiction of the experiment environment. (b) Example of a resulting map of the environment with overlaid path of each method.
TABLE I
RESULTS OF THE FIRST QUANTITATIVE EXPERIMENT
TABLE II
RESULTS OF THE SECOND QUANTITATIVE EXPERIMENT
C. Qualitative Experiments

Additional experiments with the GDAE method were performed in various indoor settings. In Fig. 7, three different environments are depicted, where a local optimum needed to be avoided or navigated out of to arrive at the global goal. The blue dots represent the obtained POI, and the green path represents the robot's path.
In Fig. 7(a) the goal position was located at coordinate (22,
12). A straight path towards the goal led towards a room corner
and a local optimum. Once the robot had obtained the informa-
tion about a corner, it was able to backtrack, find the exit of the
room, and navigate towards the global goal through the hallways.
In Fig. 7(b) the goal position was located at coordinate (23, 2),
and the robot starting location was placed in the local optimum.
By using the global navigation strategy, the local environment
was explored, which allowed the robot to navigate out of the
confined hallway and to the global goal. In Fig. 7(c) the goal
coordinates were (10, 10) and the robot starting location was in the local optimum. The environment was littered with obstacles of various shapes and sizes, such as chairs, signs, potted plants, and others. The robot successfully explored its surroundings, navigated through the clutter using the local navigation, and arrived at the global goal.

Fig. 7. Qualitative experiments. (a) Navigation towards a goal in a hallway. (b) Navigation towards a goal in a hallway, when the starting position is in a local optimum. (c) Navigation towards a goal in a hall through clutter.
Fig. 8. Experimental results of goal-driven exploration and mapping with the goal located at a significant distance from the robot. (a) Navigation towards a goal in a hallway. (b) Navigation towards a goal in a hall with backtracking. (c) Navigation towards a goal in an underground parking lot.
Afterward, navigation and mapping experiments were performed on a larger scale. In Fig. 8 three scenarios are introduced, where the robot navigated towards a goal in hallways and an underground parking lot. In Fig. 8(a) the goal was located at coordinate (12, −90). In Fig. 8(b) the goal coordinate was (60, −7). In Fig. 8(c) the goal was located 100 meters diagonally across a parking lot at coordinate (100, 0). In these experiments, the robot navigated out of the cluttered environment and arrived at a goal located at a significant distance.

From the experiments, we can observe that the navigation system is capable of exploring and navigating in unknown environments of various scale and complexity and reliably finds its way to the global goal. The experimentation code with images and videos¹ of the presented and additional experiments in static and dynamic settings is available from our repository² and the supplementary material.
V. CONCLUSIONS

In this paper, a DRL goal-driven fully autonomous exploration system is introduced. As the experiments show, the system successfully combines reactive local and global navigation strategies. Moreover, the task of introducing a neural network-based module for an end-to-end system proves to be beneficial, as it allows the robot to move without generating an explicit plan, while its shortcomings are alleviated by introducing the global navigation strategy. The obtained experimental results show that the proposed system works reasonably close to the optimal solution obtained by the path planner from an already known environment. Additionally, as the SPL measurement results show, GDAE is more reliable in complex scenarios than map-dependent SLAM methods.

In the current implementation, the motion policy training was performed with the model of the same robot as in the real-life experiments. This allowed for easy transfer of the network parameters to the embedded implementation, as the network parameters were optimized for its specifications. To generalize to various types of robots, system dynamics could be introduced as a separate input state to the neural network, and training performed accordingly. By only providing the robot dynamics, it would be possible to perform local navigation up to its best capabilities.

¹ Video: https://youtu.be/MhuhsSdzZFk
² Repository: https://github.com/reiniscimurs/GDAE

REFERENCES

[1] K. Yousif, A. Bab-Hadiashar, and R. Hoseinnezhad, "An overview to visual odometry and visual SLAM: Applications to mobile robotics," Intell. Ind. Syst., vol. 1, no. 4, pp. 289–311, 2015.
[2] M. Sugiyama, Statistical Reinforcement Learning: Modern Machine Learning Approaches. Boca Raton, FL, USA: CRC Press, 2015.
[3] D. Aberdeen, "Policy-gradient algorithms for partially observable Markov decision processes," Ph.D. dissertation, Australian Nat. Univ., Canberra, ACT, Australia, 2003.
[4] A. Zelinsky, "A mobile robot navigation exploration algorithm," IEEE Trans. Robot. Automat., vol. 8, no. 6, pp. 707–717, Dec. 1992.
[5] H. Surmann, A. Nüchter, and J. Hertzberg, "An autonomous mobile robot with a 3D laser range finder for 3D exploration and digitalization of indoor environments," Robot. Auton. Syst., vol. 45, no. 3-4, pp. 181–198, 2003.
[6] T. Chen, S. Gupta, and A. Gupta, "Learning exploration policies for navigation," CoRR, 2019, arXiv:1903.01959.
[7] R. Mur-Artal and J. D. Tardós, "ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras," IEEE Trans. Robot., vol. 33, no. 5, pp. 1255–1262, Oct. 2017.
[8] C. Yu et al., "DS-SLAM: A semantic visual SLAM towards dynamic environments," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2018, pp. 1168–1174.
[9] L. Cui and C. Ma, "SOF-SLAM: A semantic visual SLAM for dynamic environments," IEEE Access, vol. 7, pp. 166528–166539, 2019.
[10] B. Li, Y. Wang, Y. Zhang, W. Zhao, J. Ruan, and P. Li, "GP-SLAM: Laser-based SLAM approach based on regionalized Gaussian process map reconstruction," Auton. Robots, vol. 44, pp. 947–967, 2020.
[11] G. Jiang, L. Yin, G. Liu, W. Xi, and Y. Ou, "FFT-based scan-matching for SLAM applications with low-cost laser range finders," Appl. Sci., vol. 9, no. 1, pp. 1–18, 2019, Art. no. 41.
[12] Z. Ren, L. Wang, and L. Bi, "Robust GICP-based 3D LiDAR SLAM for underground mining environment," Sensors, vol. 19, no. 13, 2019, Art. no. 2915.
[13] X. Liang, H. Chen, Y. Li, and Y. Liu, "Visual laser-SLAM in large-scale indoor environments," in Proc. IEEE Int. Conf. Robot. Biomimetics, 2016, pp. 19–24.
[14] S.-H. Chan, P.-T. Wu, and L.-C. Fu, "Robust 2D indoor localization through laser SLAM and visual SLAM fusion," in Proc. IEEE Int. Conf. Syst., Man, Cybern., 2018, pp. 1263–1268.
[15] Z. Zhang, S. Liu, G. Tsai, H. Hu, C.-C. Chu, and F. Zheng, "PIRVS: An advanced visual-inertial SLAM system with flexible sensor fusion and hardware co-design," in Proc. IEEE Int. Conf. Robot. Automat., 2018, pp. 1–7.
[16] T. Taketomi, H. Uchiyama, and S. Ikeda, "Visual SLAM algorithms: A survey from 2010 to 2016," IPSJ Trans. Comput. Vis. Appl., vol. 9, no. 1, p. 16, 2017.
[17] M. Filipenko and I. Afanasyev, "Comparison of various SLAM systems for mobile robot in an indoor environment," in Proc. Int. Conf. Intell. Syst., 2018, pp. 400–407.
[18] D. W. Ko, Y. N. Kim, J. H. Lee, and I. H. Suh, "A scene-based dependable indoor navigation system," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2016, pp. 1530–1537.
[19] L. von Stumberg, V. Usenko, J. Engel, J. Stückler, and D. Cremers, "From monocular SLAM to autonomous drone exploration," in Proc. Eur. Conf. Mobile Robots, 2017, pp. 1–8.
[20] H. Gao, X. Zhang, J. Wen, J. Yuan, and Y. Fang, "Autonomous indoor exploration via polygon map construction and graph-based SLAM using directional endpoint features," IEEE Trans. Automat. Sci. Eng., vol. 16, no. 4, pp. 1531–1542, Oct. 2019.
[21] N. Palomeras, M. Carreras, and J. Andrade-Cetto, "Active SLAM for autonomous underwater exploration," Remote Sens., vol. 11, no. 23, 2019, Art. no. 2827.
[22] M. Keidar and G. A. Kaminka, "Efficient frontier detection for robot exploration," Int. J. Robot. Res., vol. 33, no. 2, pp. 215–236, 2014.
[23] W. Gao, M. Booker, A. Adiwahono, M. Yuan, J. Wang, and Y. W. Yun, "An improved frontier-based approach for autonomous exploration," in Proc. 15th Int. Conf. Control, Automat., Robot. Vis., 2018, pp. 292–297.
[24] Y. Tang, J. Cai, M. Chen, X. Yan, and Y. Xie, "An autonomous exploration algorithm using environment-robot interacted traversability analysis," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2019, pp. 4885–4890.
[25] R. Shrestha, F.-P. Tian, W. Feng, P. Tan, and R. Vaughan, "Learned map prediction for enhanced mobile robot exploration," in Proc. Int. Conf. Robot. Automat., 2019, pp. 1197–1204.
[26] D. S. Chaplot, D. Gandhi, S. Gupta, A. Gupta, and R. Salakhutdinov, "Learning to explore using active neural SLAM," CoRR, 2020, arXiv:2004.05155.
[27] S. K. Ramakrishnan, Z. Al-Halah, and K. Grauman, "Occupancy anticipation for efficient exploration and navigation," in Proc. Eur. Conf. Comput. Vis., 2020, pp. 400–418.
[28] S. Gupta, J. Davidson, S. Levine, R. Sukthankar, and J. Malik, "Cognitive mapping and planning for visual navigation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2616–2625.
[29] L. Tai and M. Liu, "A robot exploration strategy based on Q-learning network," in Proc. IEEE Int. Conf. Real-Time Comput. Robot., 2016, pp. 57–62.
[30] L. Xie, S. Wang, A. Markham, and N. Trigoni, "Towards monocular vision based obstacle avoidance through deep reinforcement learning," CoRR, 2017, arXiv:1706.09829.
[31] L. Tai, G. Paolo, and M. Liu, "Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2017, pp. 31–36.
[32] R. Cimurs, J. H. Lee, and I. H. Suh, "Goal-oriented obstacle avoidance with deep reinforcement learning in continuous action space," Electronics, vol. 9, no. 3, pp. 1–16, 2020, Art. no. 411.
[33] Y. F. Chen, M. Everett, M. Liu, and J. P. How, "Socially aware motion planning with deep reinforcement learning," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2017, pp. 1343–1350.
[34] A. Pokle et al., "Deep local trajectory replanning and control for robot navigation," in Proc. Int. Conf. Robot. Automat., 2019, pp. 5815–5822.
[35] A. Faust et al., "PRM-RL: Long-range robotic navigation tasks by combining reinforcement learning and sampling-based planning," in Proc. IEEE Int. Conf. Robot. Automat., 2018, pp. 5113–5120.
[36] Y. Zhu et al., "Target-driven visual navigation in indoor scenes using deep reinforcement learning," in Proc. IEEE Int. Conf. Robot. Automat., 2017, pp. 3357–3364.
[37] A. Devo, G. Mezzetti, G. Costante, M. L. Fravolini, and P. Valigi, "Towards generalization in target-driven visual navigation by using deep reinforcement learning," IEEE Trans. Robot., vol. 36, no. 5, pp. 1546–1561, Oct. 2020.
[38] R. Cimurs, I. H. Suh, and J. H. Lee, "Information-based heuristics for learned goal-driven exploration and mapping," in Proc. 18th Int. Conf. Ubiquitous Robots, 2021, pp. 571–578.
[39] S. Fujimoto, H. van Hoof, and D. Meger, "Addressing function approximation error in actor-critic methods," in Proc. Int. Conf. Mach. Learn., 2018, pp. 1587–1596.
[40] S. Macenski and I. Jambrecic, "SLAM Toolbox: SLAM for the dynamic world," J. Open Source Softw., vol. 6, no. 61, pp. 2783–2790, 2021.
[41] D. L. da Silva Lubanco, M. Pichler-Scheder, and T. Schlechter, "A novel frontier-based exploration algorithm for mobile robots," in Proc. 6th Int. Conf. Mechatronics Robot. Eng., 2020, pp. 1–5.
[42] P. Anderson et al., "On evaluation of embodied navigation agents," CoRR, 2018, arXiv:1807.06757.