ABSTRACT

Autonomous navigation of Unmanned Aerial Vehicles (UAVs) in complex environments remains a challenging field. Recognizing UAV real-time perception as a sequential decision-making challenge, researchers increasingly adopt learning-based methods, leveraging machine learning to enhance navigation in complex environments. In this paper, a novel deep reinforcement learning (DRL) model is proposed for the smooth navigation of a UAV. The paper provides an overview of existing techniques, laying the foundation for our proposed work, which not only addresses certain limitations but also demonstrates superior performance in complex environments. The simulation environment is built using Unreal Engine, and the connections are established using the AirSim APIs. The TD3 algorithm is chosen for its exceptional adaptability in continuous action spaces, owing to its off-policy, value-based approach, which yields improved stability and sample efficiency, whereas the PPO algorithm is chosen for its on-policy method, which leads to stable learning without the need for value function estimation. Our model undergoes training in a customized landscape mountainous environment, and the results, obtained after rigorous training, are thoroughly analyzed. The state-action pairs of our trained TD3 agent are explained using the LIME and SHAP techniques. The paper concludes by presenting promising directions for further exploration and advancement in this evolving field.

Keywords: Unmanned Aerial Vehicle (UAV), Deep Reinforcement Learning (DRL), Twin Delayed DDPG (TD3), Proximal Policy Optimization (PPO), AirSim, Unreal Engine, Explainable Artificial Intelligence (XAI), Local Interpretable Model-agnostic Explanations (LIME), Shapley Additive Explanations (SHAP), Application Program Interface (API).

1. INTRODUCTION

Unmanned Aerial Vehicles (UAVs) have revolutionized tasks from delivery to surveillance, significantly impacted by advancements in Artificial Intelligence (AI) and Information Technology (IT). They offer unparalleled flexibility and time-saving capabilities in applications such as safe drone operation models in urban air traffic flows, Hovermap drone systems, and AI-driven 5G UAV systems [1, 2, 3]. Their autonomous navigation is crucial for path planning, obstacle avoidance, and control, which can be addressed through real-time perception or through the utilization of existing environmental data. To address complex scenarios, the former approach increasingly adopts learning-based techniques such as Machine Learning (ML) [4].

Deep Reinforcement Learning (DRL), a subset of ML, combines Deep Learning (DL) for neural network training with Reinforcement Learning (RL) for sequential decision-making through a Markov Decision Process (MDP). Based on prevalent analysis, DRL algorithms combined with PID controllers reduce collision rates in UAV control but highlight the need for sample-efficient algorithms. Relevant Experience Learning (REL) and non-sparse rewards handle large state and action spaces, while PPO and LSTM networks emphasize the importance of sensor data for accurate models [5]. Other RL algorithms, such as TEXPLORE, have been utilized for autonomous navigation but fell short due to the absence of comparative analysis and scalability issues in real-world applications [6]. The lack of real-world training scenarios poses a hurdle for validating algorithms, as seen in the analysis of incremental learning with PPO [7].

The opacity of DRL methods, which act as black boxes, necessitates explainability to render the model transparent. Explainability can be approached globally or locally.
For complex models, this involves using a model-agnostic approach such as LIME and SHAP, or a model-specific approach such as a Random Forest Regressor, where the former can be applied to any ML model and the latter is tailored to a particular type of model or algorithm [8].

The essence of our contribution can be outlined as follows:

1. Comparison of the TD3 and PPO algorithms with MLP neural networks for drone navigation in complex environments built in Unreal Engine, utilizing connections established through AirSim.

2. Evaluation of the trained TD3 agent's state-action pairs using the SHAP and LIME explainability techniques to address the black-box issue.
2. PRELIMINARIES

A. TD3 Algorithm
Twin Delayed DDPG (TD3) is an off-policy, model-free DRL approach used to train the model that provides smooth control commands for UAV navigation. TD3 improves upon DDPG by introducing three crucial techniques to overcome the Q-value overestimation problem: delayed policy updates, target policy smoothing, and clipped double Q-learning. Unlike the DDPG algorithm, the TD3 algorithm minimizes the mean squared Bellman error while simultaneously learning two Q-functions, Q1 and Q2 [8].
Figure 1 : TD3 Networks

During each learning time step, the attributes of the actor and critic are updated, while a stochastic noise model is applied to perturb the action selected by the policy. TD3's ability to handle continuous action spaces, reduce overestimation bias, and encourage exploration makes it an effective choice for training policies in complex and dynamic environments.
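To make the three techniques above concrete, the following minimal PyTorch-style sketch shows how the clipped double-Q target, target policy smoothing, and delayed policy update are typically computed; the network, optimizer, and replay-buffer handles are illustrative assumptions, not the exact implementation used in this work.

# Sketch of one TD3 update step (illustrative names; networks/optimizers assumed to exist).
import torch
import torch.nn.functional as F

GAMMA, TAU = 0.99, 0.005               # discount factor and soft target-update rate
POLICY_NOISE, NOISE_CLIP = 0.2, 0.5    # target policy smoothing parameters
POLICY_DELAY = 2                       # delayed policy update interval

def td3_update(step, batch, actor, actor_targ, q1, q2, q1_targ, q2_targ,
               actor_opt, critic_opt, max_action=1.0):
    s, a, r, s_next, done = batch      # tensors sampled from the replay buffer

    with torch.no_grad():
        # Target policy smoothing: add clipped noise to the target action.
        noise = (torch.randn_like(a) * POLICY_NOISE).clamp(-NOISE_CLIP, NOISE_CLIP)
        a_next = (actor_targ(s_next) + noise).clamp(-max_action, max_action)
        # Clipped double Q-learning: take the minimum of the two target critics.
        q_next = torch.min(q1_targ(s_next, a_next), q2_targ(s_next, a_next))
        target = r + GAMMA * (1.0 - done) * q_next

    # Both critics minimise the mean squared Bellman error against one shared target.
    critic_loss = F.mse_loss(q1(s, a), target) + F.mse_loss(q2(s, a), target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Delayed policy update: refresh the actor and target networks every POLICY_DELAY steps.
    if step % POLICY_DELAY == 0:
        actor_loss = -q1(s, actor(s)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()
        for net, net_targ in ((actor, actor_targ), (q1, q1_targ), (q2, q2_targ)):
            for p, p_targ in zip(net.parameters(), net_targ.parameters()):
                p_targ.data.mul_(1.0 - TAU).add_(TAU * p.data)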
B. PPO Algorithm
PPO follows the general framework used by many RL algorithms, as shown by Equation 1, where the expected reward (Q-value) for taking action (a) in state (s) and then following a policy (π) sums the expected rewards over all possible future states and actions, weighted by the probabilities defined by the policy. The foundation lies in substituting flexible constraints, regarded as penalties, for rigid ones. An approximation of the second-order optimization of a differential equation is found by solving a first-order differential equation with the help of the new, more manageable constraint.

___________________________________________________
Algorithm : Proximal Policy Optimization (PPO) Pseudocode.
___________________________________________________
Initialize the total number of iterations (I), the number of actors (J), and the time steps (T);
for iteration = 1 to I do
    for actor = 1 to J do
        Run the policy π_θ with the previous parameters θ_old for T time steps;
        Calculate the advantage estimates Â_t for those time steps;
    end
    Optimize the objective function L(θ) with respect to θ and call the result θ_opt;
    Set θ_old = θ_opt;
end
___________________________________________________

The process involves initialization, iterations, and the execution of the policy in the environment to gather data such as states, actions, rewards, and subsequent states. It assesses the benefit of selecting a particular action at a certain time step compared to the average value of actions, then refines the policy parameters by optimizing an objective function. This function, which defines the goal of the optimization problem, aims to maximize the cumulative reward in the context of reinforcement learning. By adjusting the policy parameters, it seeks to maximize expected returns and incrementally enhance the policy's performance.
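As a minimal sketch of the objective L(θ) optimized in the pseudocode above, the snippet below computes PPO's clipped surrogate loss from advantage estimates; it is an illustrative PyTorch fragment under assumed names, not the exact code used in this work.

# Sketch of the PPO clipped surrogate objective (policy network and rollout tensors assumed).
import torch

CLIP_EPS = 0.2   # illustrative clip range

def ppo_policy_loss(policy, states, actions, advantages, logp_old):
    # Probability ratio between the current policy and the one that gathered the data.
    logp_new = policy.log_prob(states, actions)   # assumed helper on the policy object
    ratio = torch.exp(logp_new - logp_old)

    # Clipped surrogate objective: the flexible "penalty" that replaces a hard constraint.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - CLIP_EPS, 1.0 + CLIP_EPS) * advantages
    return -torch.min(unclipped, clipped).mean()  # minimised by a gradient-based optimiser

# Each outer iteration: collect T steps per actor, estimate advantages A_t,
# run several gradient steps on ppo_policy_loss, then set theta_old = theta_opt.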
3. DRL FRAMEWORK

Figure 2 : The Deep Reinforcement Learning workflow involves the agent (Top Block) acquiring the precursory state and reward from the environment (Bottom Block), after which the agent generates actions accordingly.
For the second case, where the episode is still in the running phase, we incorporate multiple factors into the reward function.

R_dist = d_g(t-1) - d_g(t)    (2)

R_dist rewards the change in distance to the goal. It is computed as the difference between the previous and current distances to the goal, i.e., the distance the UAV has moved towards the goal.
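A minimal sketch of how the distance term in Equation (2) can be computed at each step is given below; the position and goal variables are illustrative placeholders, since the remaining terms of the reward function are not reproduced here.

# Sketch: distance-based reward term R_dist = d_g(t-1) - d_g(t).
import numpy as np

def distance_reward(prev_pos, curr_pos, goal):
    """Positive when the UAV moves towards the goal, negative when it moves away."""
    d_prev = np.linalg.norm(goal - prev_pos)   # distance to goal at time t-1
    d_curr = np.linalg.norm(goal - curr_pos)   # distance to goal at time t
    return d_prev - d_curr

# Example: moving 1 unit straight towards the goal yields a reward of +1.
goal = np.array([10.0, 0.0, -5.0])
print(distance_reward(np.array([0.0, 0.0, -5.0]), np.array([1.0, 0.0, -5.0]), goal))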
Figure 3 : Connection Establishment between AirSim and Visual Studio 2022

The establishment of a connection between AirSim and Visual Studio 2022 represents a significant advancement in the field of autonomous navigation. This modern linkage not only introduces a new method for model training but does so with notable efficiency through seamless integration using the AirSim APIs.
Through this connection, the convergence of the AirSim simulation with the VS2022 development environment brings us notably closer to real-world scenarios. This innovative coupling offers a training platform that closely emulates the intricate challenges of genuine environments.
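To illustrate the kind of integration described above, the snippet below connects to an AirSim multirotor from a Python script (such as one developed in VS2022) and issues a velocity setpoint; it uses the standard AirSim Python client and is a sketch rather than this project's actual training script.

# Sketch: connecting to AirSim and issuing a velocity setpoint.
# Requires the `airsim` package and an Unreal Engine environment running AirSim.
import airsim

client = airsim.MultirotorClient()      # connects to the simulator on the local machine
client.confirmConnection()
client.enableApiControl(True)
client.armDisarm(True)

client.takeoffAsync().join()

# Command a velocity setpoint (vx, vy, vz in m/s) for a fixed duration,
# mirroring the action interface used by the DRL agent.
client.moveByVelocityAsync(2.0, 0.0, -1.0, duration=1.0).join()

# Sensor feedback available to the agent: kinematic state and a depth image.
state = client.getMultirotorState()
depth = client.simGetImages([airsim.ImageRequest(
    "0", airsim.ImageType.DepthPerspective, True)])[0]   # pixels_as_float=True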
No   Parameter                                   Value
1    Time step (dt)                              0.1
2    Maximum acceleration in horizontal plane    2.0
3    Maximum velocity in horizontal plane        5.0
4    Minimum velocity in horizontal plane        0.5
5    Maximum velocity in vertical plane          2.0
6    Maximum yaw rate                            50
7    Crash distance                              2
8    Accept radius                               2
Table 2 : Hyperparameters for the DRL Algorithms

No.  Hyperparameter        Value
1    Gamma                 0.99
2    Learning rate         1e-3
3    Learning starts       1000
4    Buffer size           50000
5    Batch size            128
6    Train frequency       100
7    Gradient steps        200
8    Action noise sigma    0.1
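The listing below shows how the hyperparameters in Table 2 map onto a standard off-the-shelf TD3 implementation with an MLP policy; Stable-Baselines3 and the environment handle `drone_env` are illustrative assumptions, not necessarily the exact tooling used in this work.

# Sketch: Table 2 hyperparameters applied to a TD3 + MLP policy (Stable-Baselines3 shown
# as an example; `drone_env` is a Gym-style wrapper around the AirSim scenario and is
# assumed to exist).
import numpy as np
from stable_baselines3 import TD3
from stable_baselines3.common.noise import NormalActionNoise

n_actions = drone_env.action_space.shape[0]
action_noise = NormalActionNoise(mean=np.zeros(n_actions),
                                 sigma=0.1 * np.ones(n_actions))   # row 8

model = TD3(
    "MlpPolicy", drone_env,
    gamma=0.99,            # row 1
    learning_rate=1e-3,    # row 2
    learning_starts=1000,  # row 3
    buffer_size=50_000,    # row 4
    batch_size=128,        # row 5
    train_freq=100,        # row 6
    gradient_steps=200,    # row 7
    action_noise=action_noise,
    verbose=1,
)
model.learn(total_timesteps=100_000)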
The navigation network is trained in AirSim, a simulator built on Unreal Engine. This simulator provides an extremely realistic environment with a ground-truth depth image and a simple controller to keep the UAV stable. A customized landscape mountain environment is created for training, featuring a square area with a side length of 256 units. At the beginning of each episode, the quadrotor takes off from a random start position in the environment. The goal position is set randomly within the boundaries of the rectangular coordinates. The episode ends either when the quadrotor reaches the goal position within an acceptable radius or when it crashes into obstacles. In order to generate the velocity setpoint in the three-dimensional environment, the neural network receives both the quadrotor's state information and the depth image at each time step.
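A compact sketch of the episode logic just described, written as a Gymnasium-style environment, is shown below; the kinematics and collision check are simplified placeholders, and the class is illustrative rather than the environment actually used for training (which steps AirSim and consumes depth images).

# Sketch: episode structure described above as a Gymnasium-style environment.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class LandscapeNavEnv(gym.Env):
    SIDE, ACCEPT_RADIUS, DT, V_MAX = 256.0, 2.0, 0.1, 5.0   # values from the parameter table

    def __init__(self):
        self.action_space = spaces.Box(-self.V_MAX, self.V_MAX, shape=(3,))   # velocity setpoint
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(6,))      # [position, goal]

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Random start and goal positions inside the 256 x 256 square.
        self.pos = self.np_random.uniform(0.0, self.SIDE, size=3)
        self.goal = self.np_random.uniform(0.0, self.SIDE, size=3)
        return np.concatenate([self.pos, self.goal]).astype(np.float32), {}

    def step(self, action):
        prev_dist = np.linalg.norm(self.goal - self.pos)
        self.pos = self.pos + np.clip(action, -self.V_MAX, self.V_MAX) * self.DT
        dist = np.linalg.norm(self.goal - self.pos)
        reward = prev_dist - dist                 # distance term of the reward (Eq. 2)
        reached = dist < self.ACCEPT_RADIUS       # goal reached within the accept radius
        crashed = self.pos[2] < 0.0               # placeholder collision/crash check
        obs = np.concatenate([self.pos, self.goal]).astype(np.float32)
        return obs, float(reward), bool(reached or crashed), False, {}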
Our model was trained using the TD3 reinforcement learning algorithm integrated with MLP deep neural networks. This strategic integration facilitated the sequential decision-making process within the framework of a Markov Decision Process (MDP).

Figure 4 : Customized landscape environment built in AirSim.

6. EXPLAINABILITY

Explainability enhances transparency in DRL models by revealing black-box elements and clarifying the relationships between actions and state features. We employed SHAP and LIME to explain our model.

A. Shapley Additive Explanations (SHAP)
SHAP employs Shapley values to provide a comprehensive understanding of feature importance, both locally and globally. In our XDRL-based autonomous drone navigation model, SHAP enhances interpretability and optimization by offering clear visual insights into model decisions at the individual data point level.

B. Local Interpretable Model-agnostic Explanations (LIME)
LIME generates interpretable surrogate models around specific data points to clarify individual predictions. In the context of autonomous drone navigation using XDRL, LIME offers localized insights into feature contributions, enhancing both model performance and interpretability across dynamic scenarios. Its adaptability to various machine learning techniques ensures precise decision-making in complex environments.
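To indicate how these two techniques are typically applied to a trained agent, the snippet below builds a SHAP and a LIME explainer around one component of the policy's action output; the wrapper function, the `states` array, the `feature_names` list, and the `model` handle (assumed to expose a Stable-Baselines3-style predict()) are illustrative assumptions.

# Sketch: explaining a trained policy's actions with SHAP and LIME.
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer

def predict_vxy(obs_batch):
    """Return the horizontal-velocity component of the policy's action for a batch of states."""
    actions, _ = model.predict(np.asarray(obs_batch), deterministic=True)
    return actions[:, 0]

# SHAP: Shapley values over a background sample give local and global feature importance.
background = shap.sample(states, 100)
shap_values = shap.KernelExplainer(predict_vxy, background).shap_values(states[:200])
shap.summary_plot(shap_values, states[:200], feature_names=feature_names)

# LIME: a local surrogate model explains one state-action pair at a time.
lime_explainer = LimeTabularExplainer(states, feature_names=feature_names, mode="regression")
explanation = lime_explainer.explain_instance(states[0], predict_vxy, num_features=5)
print(explanation.as_list())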
Figure 5A : Graphical user interface obtained in the training process of the TD3 algorithm (Reward per episode)
Figure 5B : Graphical user interface obtained in the training process of the TD3 algorithm (Cumulative reward)
Figure 6A : Graphical user interface obtained in the training process of the TD3 algorithm (Yaw rate)
Figure 6B : Graphical user interface obtained in the training process of the TD3 algorithm (Trajectory of the agent)
7. RESULTS AND DISCUSSIONS

Evaluation metric      TD3             PPO
Mean episode reward    78              72
Crash rate             16%             22%
Convergence time       2400 episodes   3300 episodes
Success rate           77%             68%
After designing the reward function, the model was trained extensively for nearly 100,000 steps with 197,000 updates. The TD3 algorithm converged after 2400 episodes with a success rate of 77%. Compared with the PPO algorithm, we observed several positive aspects, including a higher mean episode reward, a lower crash rate, and a significantly higher success rate.

Figure 5A shows the plot of rewards per episode, where the model assigns rewards based on performance in each episode. Figure 5B presents the plot of cumulative rewards, indicating the consistency in model performance and optimization. Figure 6A illustrates the oscillations in yaw, which are decreasing and approaching a saturation state, demonstrating that the drone achieved stabilization after progressive iterations. Figure 6B depicts the drone's trajectory, showing how efficiently the model avoids obstacles, maintains a safe distance, and moves swiftly towards the destination. Complex environments pose increased challenges for the model, which typically results in longer convergence times compared to training in simpler environments.
Figure 7A : Feature analysis using the LIME method for velocity in the XY plane (Vxy).
Figure 7B : Feature analysis using the SHAP method for velocity in the XY plane (Vxy).
Figure 8A : Feature analysis using the LIME method for vertical speed (Vz).
Figure 8B : Feature analysis using the SHAP method for vertical speed (Vz).
Figure 9A : Feature analysis using the LIME method for yaw rate.
Figure 9B : Feature analysis using the SHAP method for yaw rate.
For the horizontal-velocity action, the major contributors among the state features are the angular velocity and the linear velocity in the XY plane. For the vertical-speed action, the features with the greatest influence are the vertical distance, the relative yaw, and the linear velocity in the Z direction; for the yaw-rate action, the linear velocity in the Z direction, the vertical distance, and the angular velocity show a strong correlation.
8. CONCLUSION

In this paper, autonomous navigation of UAVs is addressed using an explainable deep reinforcement learning technique. The simulation environment was built on AirSim, and connections were established between VS2022 and Unreal Engine. We used the standard TD3 and PPO algorithms and molded them according to the requirements they must serve. A well-crafted reward function was prepared considering all the crucial factors, and the hyperparameters were tuned efficiently to obtain the desired outputs. The training process was intensive, producing a success rate of 77% for the TD3 algorithm, a significant improvement over the PPO algorithm. The use of explainability techniques has greatly improved the transparency of our model: we now understand why certain actions occur in specific states, giving us insight into the decision-making process. The future scope of this work is the integration of defogging techniques, which will make our model robust enough to tackle foggy conditions.
9. REFERENCES

[1] D.D. Nguyen, J. Rohacs, D. Rohacs, "Autonomous Flight Trajectory Control System for Drones in Smart City Traffic Management", International Journal of Geo-Information, 2021, Vol. 10, pp. 338.
[2] E. Jones, J. Sofonia, C. Canales, S. Hrabar, F. Kendoul, "Applications for the Hovermap autonomous drone system in underground mining operations", Journal of the Southern African Institute of Mining and Metallurgy, 2020, Vol. 120, pp. 49-56.
[3] H. Kim, J. Ben-Othman, L. Mokdad, J. Son, C. Li, "Research Challenges and Security Threats to AI-Driven 5G Virtual Emotion Applications Using Autonomous Vehicles, Drones, and Smart Devices", IEEE Network, 2020, Vol. 34, No. 6, pp. 288-294.
[4] A.T. Azar, A. Koubaa, N.A. Mohamed, H.A. Ibrahim, Z.F. Ibrahim, M. Kazim, A. Ammar, B. Benjdira, A.M. Khamis, I.A. Hameed, et al., "Drone Deep Reinforcement Learning: A Review", Electronics, 2021, Vol. 10, pp. 999.
[5] Z. Hu, X. Gao, K. Wan, Y. Zhai, Q. Wang, "Relevant experience learning: A deep reinforcement learning method for UAV autonomous motion planning in complex unknown environments", Chinese Journal of Aeronautics, 2021, Vol. 34, No. 12.
[6] N. Imanberdiyev, C. Fu, E. Kayacan, I.-M. Chen, "Autonomous navigation of UAV by using real-time model-based reinforcement learning", 14th International Conference on Control, Automation, Robotics and Vision (ICARCV), 2016, pp. 1-6.
[7] V.J. Hodge, R. Hawkins, R. Alexander, "Deep reinforcement learning for drone navigation using sensor data", Neural Computing & Applications, 2021, Vol. 33, pp. 2015-2033.
[8] L. He, N. Aouf, B. Song, "Explainable Deep Reinforcement Learning for UAV Autonomous Navigation", Aerospace Science and Technology, 2021, Vol. 118, pp. 107052.
[9] A.T. Azar, F.E. Serrano, N.A. Kamal, A. Kouba, "Robust Kinematic Control of Unmanned Aerial Vehicles with Non-holonomic Constraints", International Conference on Advanced Intelligent System and Informatics, 2020, pp. 839-850.
[10] Y. Song, L.-T. Hsu, "Tightly coupled integrated navigation system via factor graph for UAV indoor localization", Aerospace Science and Technology, 2021, Vol. 108, pp. 106370.
[11] M. Shah, N. Aouf, "3D cooperative Pythagorean hodograph path planning and obstacle avoidance for multiple UAVs", IEEE 9th International Conference on Cybernetic Intelligent Systems, 2010, pp. 1-6.
[12] Y. Shin, E. Kim, "Hybrid path planning using positioning risk and artificial potential fields", Aerospace Science and Technology, 2021, Vol. 112, pp. 106640.
[13] N. Imanberdiyev, C. Fu, E. Kayacan, I.-M. Chen, "Autonomous navigation of UAV by using real-time model-based reinforcement learning", 14th International Conference on Control, Automation, Robotics and Vision (ICARCV), 2016, pp. 1-6.
[14] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, et al., "Human-level control through deep reinforcement learning", Nature, 2015, Vol. 518, pp. 529-533.
[15] A.B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins, et al., "Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI", Information Fusion, 2019, Vol. 58, pp. 82-115.
[16] A. Singla, S. Padakandla, S. Bhatnagar, "Memory-based deep reinforcement learning for obstacle avoidance in UAV with limited environment knowledge", IEEE Transactions on Intelligent Transportation Systems, 2020, Vol. 20, No. 1, pp. 107-118.
[17] O. Bouhamed, H. Ghazzai, H. Besbes, Y. Massoud, "Autonomous UAV navigation: A DDPG-based deep reinforcement learning approach", Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), 2020, pp. 1-5.
[18] U. Challita, W. Saad, C. Bettstetter, "Interference management for cellular-connected UAVs: A deep reinforcement learning approach", IEEE Transactions on Wireless Communications, 2019, Vol. 18, pp. 2125-2140.
[19] C. Yan, X. Xiang, C. Wang, "Towards Real-Time Path Planning through Deep Reinforcement Learning for a UAV in Dynamic Environments", Journal of Intelligent and Robotic Systems, 2019, Vol. 98, pp. 297-309.