Article
Enhancing UAV Swarm Tactics with Edge AI: Adaptive Decision
Making in Changing Environments
Wooyong Jung † , Changmin Park † , Seunghyeon Lee and Hwangnam Kim *
School of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea; [email protected] (W.J.);
[email protected] (C.P.); [email protected] (S.L.)
* Correspondence: [email protected]; Tel.: +82-2-3290-4821
† These authors contributed equally to this work.
Abstract: This paper presents a drone system that uses an improved network topology and MultiA-
gent Reinforcement Learning (MARL) to enhance mission performance in Unmanned Aerial Vehicle
(UAV) swarms across various scenarios. We propose a UAV swarm system that allows drones to
efficiently perform tasks with limited information sharing and optimal action selection through our
Efficient Self UAV Swarm Network (ESUSN) and reinforcement learning (RL). The system reduces
communication delay by 53% and energy consumption by 63% compared with traditional MESH
networks with five drones and achieves a 64% shorter delay and 78% lower energy consumption with
ten drones. Compared with nonreinforcement learning-based systems, mission performance and
collision prevention improved significantly, with the proposed system achieving zero collisions in
scenarios involving up to ten drones. These results demonstrate that training drone swarms through
MARL and optimized information sharing significantly increases mission efficiency and reliability,
allowing for the simultaneous operation of multiple drones.
1. Introduction
In recent years, drone technology, initially developed for military purposes, has significantly expanded to various industrial sectors [1,2]. In agriculture, drones equipped with advanced imaging sensors and AI algorithms monitor crop health with precision, detecting early signs of nutrient deficiencies or water stress to optimize yields [3,4]. By integrating various sensors and working alongside other rescue equipment, drones enhance the effectiveness of disaster response efforts. Drones play a crucial role in search and rescue missions with their real-time mapping capabilities, quickly surveying large areas and creating topographic maps for efficient rescue operations. Surveillance drones provide commanders with real-time information and situational awareness, which is essential for strategic decision making. In military contexts, drones are indispensable for surveillance, reconnaissance, and combat [5]. Armed drones execute precision strikes with minimal collateral damage, enhancing operational efficiency and reducing personnel risk. Additionally, drones equipped with advanced sensors can detect and track hostile forces, enhancing military preparedness and response capabilities [6].
However, as drone operations continue to expand in scale, traditional control methods utilizing Ground Control Stations (GCSs) and remote control reveal significant limitations, particularly in scenarios involving large swarms of drones. These conventional methods are increasingly inadequate in terms of response time and operational efficiency [7]. The reliance on direct communication between an ever-growing number of drones amplifies the risks associated with delays, packet loss, and mission disruptions [8]. Furthermore, the task of programming multiple drones individually proves to be not only highly time-consuming but also inefficient. This inefficiency is starkly illustrated by incidents at drone shows, where numerous drones have collided due to errors in direct programming. Such events highlight the critical need for more advanced and reliable methods of managing and controlling large-scale drone operations to mitigate these risks and enhance overall performance.
To address these challenges, we propose a swarming drone operation system inspired by the Boids algorithm, a flocking model that imitates biological systems such as schools of fish and flocks of birds. These natural systems exhibit swarm intelligence, where simple local interactions among individuals result in sophisticated collective behaviors and efficient task execution. As shown in Figure 1, similar to the operation of the Boids algorithm, our system uses surrounding information to determine appropriate swarm behavior and trains each Unmanned Aerial Vehicle (UAV) to perform optimized behavior through the reinforcement learning (RL) framework we designed. Through this process, each UAV recognizes the situation required to achieve the goal of the entire UAV swarm and, at the same time, takes the optimal action for that situation.
action for that situation. The proposed system leverages sensor data from each drone,
along with information from nearby drones, to facilitate intelligent cooperation. Operating
autonomously, each drone communicates and collaborates with others to seamlessly achieve
collective goals, functioning like a unified organism. This biomimetic approach ensures
that the entire drone fleet can adapt dynamically to changing conditions, optimize task
performance, and mitigate the risks associated with traditional control methods.
The core of our approach is to build a dynamically efficient network between drones,
called the Efficient Self UAV Swarm Network (ESUSN), and to design the MultiAgent
Reinforcement Learning (MARL) system that efficiently improves performance by pro-
viding the necessary information for agents to select an action. ESUSN limits the number
of connections each drone maintains while optimizing overall cooperation. Using the designed MARL system, each drone learns to select the optimal action based on information from surrounding drones and sensor values collected by its Single-Board Computer (SBC). This adaptive framework not only enhances network
stability but also significantly improves the efficiency of critical tasks such as collision
avoidance. By continually adapting to the environment and the state of neighboring drones,
our system ensures robust and efficient drone operations, thereby overcoming the limita-
tions of traditional control methods and paving the way for more advanced and reliable
drone management strategies.
Compared with existing technologies, our proposed system has several advantages,
as listed below:
• Efficient and reliable network algorithms for UAV swarms. We propose a swarm drone
operation system that leverages advanced network algorithms to build a dynamically
efficient network called the ESUSN. This system limits the number of connections each
drone maintains while optimizing overall cooperation to ensure robust and reliable
drone operations.
• Optimal action selection through RL. We design a flexible MARL system that operates
across various scenarios based on specific parameters. In this system, each drone
within the swarm learns to select the optimal behavior using a limited set of inputs
from surrounding drones and sensor values collected by the SBC. This adaptive
framework reduces the need for complex networks and significantly enhances the
efficiency of critical tasks such as reconnaissance and collision avoidance.
• Appropriate method of communication. Previously, when exchanging data between drones in a UAV swarm, various communication problems occurred because the state of each drone was not taken into consideration and an inappropriate routing method was used. Our system improves network performance by considering each drone's environment and configuring an appropriate communication system that ensures data are accurately delivered to their destination.
• Real-time adaptation. Each drone adapts in real time based on environmental changes
and the status of neighboring drones. This dynamic adaptive ability maximizes mission
performance efficiency and overcomes the limitations of traditional control methods.
The rest of this paper is organized as follows. Section 2 details the related research
for autonomous drones, including existing drone-to-drone communication approaches
and drone research related to RL. Section 3 describes the proposed UAV swarm, internal
communication system, and the system architecture designed for swarm drone operations
using MARL. Section 4 presents experimental evaluations verifying the effectiveness of
the swarm drone system, while Section 5 provides concluding insights and outlines future
research directions.
2. Related Work
This section discusses existing approaches related to our proposed system, providing a
comprehensive overview of advancements in UAV swarm network planning and decision
making using RL. We explore foundational research and recent developments that inform
and contrast with our innovative methodologies.
The integration of cognitive radio technologies has also been explored to improve spectrum efficiency and communication reliability in UAV networks [17].
FANETs, in particular, are specialized for UAV communication, leveraging high mobil-
ity and dynamic topology to enable robust network connections [18]. The FANET structure
allows nodes to dynamically join and leave the network, making drone swarms highly scal-
able [19]. Even if some nodes fail, the remaining nodes can reconfigure the network through
a dynamic routing protocol, improving the robustness of the drone swarm [20]. Mutual relaying between nodes also lowers each node's radiated power, reducing the likelihood of the drone swarm being detected and increasing survivability in battlefield environments.
Among the many routing protocols available for FANETs, AODV (Ad hoc On-demand
Distance Vector) is commonly used due to its ability to dynamically establish routes only
when needed. This reactive protocol is well suited for environments with high mobility,
such as drone swarms, where network topology frequently changes. AODV minimizes rout-
ing overhead by discovering routes on-demand and maintaining only active routes, which
is crucial for maintaining communication efficiency in highly dynamic UAV networks [21].
However, while AODV is effective in managing dynamic topology changes, it also has
limitations, such as increased latency during route discovery and higher susceptibility to
link failures during periods of rapid mobility.
FANETs nevertheless face many challenges, and several prior studies have sought to address them. Research by Bekmezci et al. highlighted the unique challenges and solutions in FANETs, emphasizing the importance of adaptive routing protocols to maintain network stability and performance in dynamic environments [22]. Various approaches have been proposed to overcome these shortcomings. Introducing Software-Defined Networking (SDN) and network virtualization into FANETs aims to solve major problems that arise when building a high-performance three-dimensional distributed heterogeneous network [23]. To minimize the periodic transmission of "Hello" messages required for route maintenance, AI-Hello reduces network bandwidth waste and energy consumption by adjusting the transmission interval of these messages [24].
Despite these advancements, both FANETs and MANETs face limitations in scalability
and response times, particularly in large-scale UAV operations. Traditional networks often
struggle with maintaining efficient communication as the number of drones increases, lead-
ing to issues such as network congestion and increased latency. Our proposed algorithm
aims to address these issues by limiting the number of connections each drone maintains,
optimizing overall cooperation, and ensuring reliable communication through an adaptive
network structure. This approach not only enhances the scalability of the network but also
improves its robustness against node failures and dynamic environmental changes.
Each UAV processes the shared data and infers optimized actions based on RL strategies. These
policies allow the UAVs to learn from their environment and improve task execution over
time. The inferred optimized actions are transmitted to the PX4 flight controller, which
executes these commands to control the UAV’s motors and sensors, ensuring stable flight.
Each UAV continuously checks if it has reached its assigned goal and adjusts its actions
as necessary.
The method for configuring the network is decentralized rather than centralized.
This ensures that communication does not overload a particular UAV, forming links that
allow each drone to exchange the minimal necessary information for action selection.
The proposed network topology algorithm, ESUSN, determines the links for sending
and receiving data and updates the network whenever the location changes. The system
architecture enables each UAV to autonomously navigate and share critical data with peers,
enhancing situational awareness and collective decision making. The inclusion of MARL in
each UAV facilitates continuous learning and optimization, allowing the swarm to adapt to
dynamic environments and execute complex missions efficiently. The PX4 flight controller
ensures high reliability and stability, enabling safe operation across various environments.
The modular design supports scalability, allowing more UAVs to be added to the swarm
without significant adjustments to the system.
The MST connects all drones while minimizing the total link distance, optimizing the overall efficiency of the network. This foundational structure
minimizes initial setup costs and forms a backbone for further optimization.
Beyond constructing the MST, the system ensures that each drone maintains at least
two connections to its nearest neighbors. This redundancy enhances network robustness,
allowing the system to tolerate node failures and dynamic changes in drone positions.
The optimization process further refines the network to maintain these connections while
ensuring that no drone exceeds its maximum link limit. This balanced approach helps
achieve minimal packet loss and high energy efficiency, making the ESUSN an optimal
solution for UAV networks.
Furthermore, the ESUSN system can dynamically adjust the network in real time to
accommodate changes in drone positions and mission requirements. This flexibility is
crucial for maintaining network stability and efficiency, enabling high performance across
various mission environments. By adapting to the real-time conditions and demands, the
ESUSN system ensures sustained optimal performance.
Energy efficiency is further enhanced through power-aware routing protocols that
consider the remaining battery life of each drone when establishing and maintaining
connections. By prioritizing routes that involve drones with higher energy reserves, the
ESUSN extends the overall operational time of the network, making it suitable for long-
duration missions. Moreover, the system can implement sleep modes for drones that
are temporarily not needed for communication or sensing tasks, conserving energy and
prolonging the network’s operational lifespan.
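As an illustration of this power-aware weighting, the short Python sketch below computes a link cost that grows as the relaying drone's battery level drops; the weighting function, parameter names, and coefficient values are assumptions for illustration, since the paper does not specify them.

def power_aware_weight(distance, battery_level, alpha=1.0, beta=2.0):
    """Hypothetical link cost penalizing links through low-battery drones.

    distance      -- Euclidean distance between the two drones
    battery_level -- remaining battery of the relaying drone, in [0, 1]
    alpha, beta   -- tunable coefficients (assumed, not taken from the paper)
    """
    # Links through drones with little remaining energy become expensive,
    # so route selection naturally prefers drones with higher reserves.
    energy_penalty = beta * (1.0 - battery_level)
    return alpha * distance * (1.0 + energy_penalty)

# Example: a 30 m link relayed by a drone at 20% battery costs more than
# the same link relayed by a drone at 90% battery.
print(power_aware_weight(30.0, 0.2))  # approx. 78.0
print(power_aware_weight(30.0, 0.9))  # approx. 36.0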
The key processes used in ESUSN are as follows:
1. Euclidean Distance Calculation:

   d(i, j) = √((x_i − x_j)² + (y_i − y_j)²)    (1)

   where d(i, j) is the distance between drone i and drone j, and (x_i, y_i) and (x_j, y_j) are the coordinates of drones i and j, respectively.
2. Minimum Spanning Tree (MST) Calculation using the GHS Algorithm: The MST is constructed so that every drone in the swarm is connected. In our system, the MST is built using the GHS algorithm [39].
T = GHS_MST( G )
where T is the MST, and G is the graph of drone distances.
The algorithm involves the following steps:
(a) Each drone starts as an individual fragment.
(b) Each fragment identifies the smallest edge to another fragment and attempts
to merge.
(c) Repeat the merging process until all drones are connected into a single MST.
3. Network Optimization: To ensure all drones are connected while maintaining the
maximum number of links per drone:
(a) For each drone i, identify the two closest drones and add links such that i has at most two links.
(b) If a drone has more than two connections, remove the excess links and keep only the two shortest:

    remove_links(i) = {(i, j) | i has more than two links; keep the two shortest}.
The final set of connections combines the MST with these additional links:

    final_links = T ∪ links.
The process of the ESUSN algorithm is expressed in Algorithm 1. Firstly, the algorithm
operates by continuously updating the positions of each drone and recalculating the
Euclidean distances between every pair of drones to form a distance matrix. This matrix is
used to construct a graph G where drones are vertices and the distances are edge weights.
Each drone is initially treated as an individual fragment. The GHS algorithm is employed to
construct the MST by identifying and merging the smallest edges between fragments until
all fragments are connected. Subsequently, each drone identifies its two closest neighbors
and establishes links with them. If any drone ends up with more than two connections, the
longest link is removed to ensure optimal network topology. The final network combines
the MST with these additional optimized links, and this process is continuously repeated
to dynamically adjust the network in real time, maintaining robustness and efficiency.
Algorithm 1 Efficient Self UAV Swarm Network (ESUSN) using GHS Algorithm
1: Input: Coordinates of drones ( xi , yi ) for i = 1, 2, . . . , n
2: Output: Optimized network connections
3: while network is active do
4: for each drone i do
5: Update position
6: end for
7: for each pair of drones (i, j) do
8: Calculate d(i, j) = √((x_i − x_j)² + (y_i − y_j)²)
9: end for
10: Construct graph G with d(i, j) as edge weights
11: for each drone i do
12: Initialize as an individual fragment
13: end for
14: while not all fragments are merged do
15: for each fragment do
16: Identify the smallest edge to another fragment
17: Attempt to merge fragments
18: end for
19: end while
20: for each drone i do
21: Identify two closest drones and add links
22: end for
23: while number of links of i > 2 do
24: Remove the longest link
25: end while
26: Combine MST with additional links to form final_links
27: end while
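For concreteness, the following Python sketch re-expresses Algorithm 1 in a centralized form: Kruskal's algorithm (via networkx) stands in for the distributed GHS procedure, each drone then proposes links to its two nearest neighbors, excess links are pruned, and the final set is the union of the MST and the remaining links. The function and variable names are illustrative, not taken from the authors' implementation.

import math
from itertools import combinations
import networkx as nx

def esusn_links(positions, max_links=2):
    """One ESUSN update step for drone coordinates {drone_id: (x, y)}.

    Returns final_links = MST ∪ pruned nearest-neighbor links, following the
    structure of Algorithm 1. Kruskal's algorithm is used here as a
    centralized stand-in for the distributed GHS algorithm.
    """
    # Pairwise Euclidean distances (Equation (1)) as edge weights.
    G = nx.Graph()
    for i, j in combinations(positions, 2):
        (xi, yi), (xj, yj) = positions[i], positions[j]
        G.add_edge(i, j, weight=math.hypot(xi - xj, yi - yj))

    # Minimum spanning tree T over the distance graph.
    T = nx.minimum_spanning_tree(G, algorithm="kruskal")
    mst_edges = {tuple(sorted(e)) for e in T.edges}

    # Each drone proposes links to its two closest neighbors.
    extra = set()
    for i in positions:
        nearest = sorted(G[i], key=lambda j: G[i][j]["weight"])[:max_links]
        extra.update(tuple(sorted((i, j))) for j in nearest)

    # Prune: if a drone has more than max_links proposed links,
    # drop its longest ones first.
    def degree(node):
        return sum(1 for e in extra if node in e)

    for i in positions:
        incident = sorted((e for e in extra if i in e),
                          key=lambda e: G[e[0]][e[1]]["weight"], reverse=True)
        for e in incident:                 # longest link first
            if degree(i) <= max_links:
                break
            extra.discard(e)

    # final_links = T ∪ links
    return mst_edges | extra

# Example with five drones at illustrative coordinates.
pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 1.0), 3: (0.0, 2.0), 4: (3.0, 3.0)}
print(sorted(esusn_links(pos)))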
Including each agent's velocity vector in the state allows the model to account for dynamic changes in velocity, which is crucial for predicting agent motion and optimizing path planning. In addition, direction
vectors toward destinations play an important role in guiding agents toward their goals. By
incorporating these directional signals into the decision-making process, our model ensures
that agents are not only aware of their current location but also move towards achieving
their goals efficiently. Another important aspect of our model design is the use of distances to nearby agents or obstacles obtained from 360-degree LiDAR measurements. These measurements provide detailed information about the spatial relationships between agents in the environment, allowing them to better coordinate and avoid collisions.
The implementation of our model is supported by the Proximal Policy Optimization (PPO) algorithm, which is known to handle complex environments characterized by numerous interacting agents. PPO's
on-policy learning approach allows agents to continuously improve their decision-making
strategies based on real-time data, optimizing their behavior in response to changing
environmental conditions. Notably, the use of a clip epsilon parameter in PPO, set at
0.2, prevents abrupt policy changes. Meanwhile, the discount factor (gamma), crucial
for balancing immediate rewards against future gains, is fixed at 0.8, while λ, which
adjusts action variability, is set to 0.9. The formulation for calculating the PPO loss is given in Equation (2), ensuring precise evaluation and refinement of the model's performance under
varying conditions. This structured approach ensures that our model remains adaptable
and effective across diverse operational environments, promoting reliable decision making
and collaborative synergy among agents.
r_t(θ) = π_θ(a_t | s_t) / π_θ_old(a_t | s_t)    (3)
where D_collision is a distance threshold below which agent i is assumed to have collided, and j is another agent or obstacle.
• S_i is the negative reward that agent i receives when a connected agent is outside a certain range, defined as
S_i = { 0   if a ≤ d(i, j) ≤ b
       −1   otherwise }    (7)
where a and b are the minimum and maximum distance parameters between agents, declared for navigation and isolation prevention, and j ranges over the agents linked to i through ESUSN.
All these reward components are integrated into a weighted sum to determine the
total reward value using Equation (4) for each agent. The weights of each reward com-
ponent are set to reinforce specific behaviors of the agents. This comprehensive reward
structure ensures that agents are motivated to achieve their goals efficiently, avoid colli-
sions, and maintain proper spacing, thus facilitating effective collaboration and optimized
performance in a multiagent system.
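As a concrete illustration of this weighted-sum reward, the sketch below combines a goal-progress term, a collision penalty based on D_collision, and the spacing reward S_i from Equation (7). Only S_i is fully specified above, so the weight values and the exact form of the goal and collision terms are assumptions for illustration.

def spacing_reward(d_ij, a, b):
    """S_i from Equation (7): 0 when a <= d(i, j) <= b, otherwise -1."""
    return 0.0 if a <= d_ij <= b else -1.0

def total_reward(dist_to_goal, prev_dist_to_goal, neighbor_dists, linked_dists,
                 d_collision=0.1, a=0.3, b=1.5,
                 w_goal=1.0, w_collision=5.0, w_spacing=1.0):
    """Weighted sum of reward components for one agent at one time step.

    neighbor_dists -- distances to nearby agents/obstacles (collision check)
    linked_dists   -- distances to agents connected through ESUSN (for S_i)
    Weights and the goal/collision terms are illustrative assumptions.
    """
    # Progress toward the goal: positive when the agent moved closer.
    r_goal = prev_dist_to_goal - dist_to_goal
    # Collision penalty: triggered if any neighbor is closer than D_collision.
    r_collision = -1.0 if any(d < d_collision for d in neighbor_dists) else 0.0
    # Spacing term, summed over the ESUSN-linked agents.
    r_spacing = sum(spacing_reward(d, a, b) for d in linked_dists)
    return w_goal * r_goal + w_collision * r_collision + w_spacing * r_spacing

# Example: the agent moved 0.05 closer to its goal, avoided collisions, and
# kept both linked agents within [a, b], so the total reward is about +0.05.
print(total_reward(1.15, 1.20, neighbor_dists=[0.8, 1.2], linked_dists=[0.8, 1.2]))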
In addition, at each time step, the agent’s action is defined as the acceleration values
along the x and y axes, which are used to compute the UAV’s movement. The policy
network outputs these acceleration values to adjust the UAV’s trajectory. The action
a = [a_x, a_y] is computed as follows:

a = π(s; θ)    (8)

where π(s; θ) is the policy function parameterized by θ, mapping the state s to the corresponding acceleration values a_x and a_y along the x and y axes. At each step, the UAV
updates its position using the computed acceleration, allowing it to move dynamically
toward its target while avoiding obstacles in real time. This continuous adjustment ensures
smooth navigation and efficient task execution in a multiagent environment.
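A minimal sketch of this action step is given below, assuming a small fully connected policy network and a simple double-integrator position update; the network architecture, state dimension, and time step are illustrative choices rather than the authors' exact design.

import torch
import torch.nn as nn

class AccelerationPolicy(nn.Module):
    """pi(s; theta): maps the state vector to accelerations [a_x, a_y]."""

    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 2),  # outputs a_x and a_y
        )

    def forward(self, state):
        return self.net(state)

def step_kinematics(pos, vel, accel, dt=0.1):
    """Update the UAV's velocity and position from the commanded acceleration."""
    vel = vel + accel * dt
    pos = pos + vel * dt
    return pos, vel

# One decision step for a single UAV with an assumed 26-dimensional state.
policy = AccelerationPolicy(state_dim=26)
state = torch.zeros(26)
accel = policy(state)                      # a = pi(s; theta), Equation (8)
pos, vel = step_kinematics(torch.zeros(2), torch.zeros(2), accel)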
In summary, the integration of PPO in multiagent environments not only enhances
optimization through on-policy learning but also fosters adaptive decision-making capabili-
ties critical for navigating complex and dynamic operational settings. The PPO-based MARL procedure, in which each agent shares surrounding agents' information through ESUSN, is summarized in Algorithm 2.
This approach not only promotes robust performance across diverse scenarios but also
emphasizes the efficacy of collaborative synergy among autonomous agents in achieving
shared objectives.
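For completeness, the standard clipped PPO surrogate loss (the form typically used for the loss referenced as Equation (2)) and a generalized advantage estimator with the hyperparameters stated above (clip ε = 0.2, γ = 0.8, λ = 0.9) can be sketched as follows; this is a generic PPO fragment rather than the authors' exact Algorithm 2.

import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss using the ratio r_t(theta) from Equation (3)."""
    ratio = torch.exp(log_probs_new - log_probs_old)   # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

def gae_advantages(rewards, values, dones, gamma=0.8, lam=0.9):
    """Generalized advantage estimation with the discount and lambda used above."""
    advantages = torch.zeros_like(rewards)
    gae, next_value = 0.0, 0.0
    for t in reversed(range(len(rewards))):
        mask = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * mask - values[t]
        gae = delta + gamma * lam * mask * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages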
4. Performance Evaluation
We tested decision making in UAV swarms by implementing our system. To demonstrate our results, we compared the network performance of existing network algorithms with that of our algorithm. Additionally, our system showed superior performance when learning was applied.
4.1. Implementation
For the drone hardware, we used the PX4 flight controller to ensure stable and precise
flight operations, which are critical for the success of our multi-UAV missions. The onboard
computational tasks were handled by a Raspberry Pi, which served as the SBC. For the
MARL aspect, we designed our system using the PPO algorithm due to its efficiency and
reliability in training complex policies for agents in dynamic environments. To facilitate
the training and testing of our MARL algorithms, we utilized the Vectorized MultiAgent
Simulator (VMAS) environment.
Communication between the SBC and the PX4 flight controller was achieved using
MAVSDK, a comprehensive software development kit for MAVLink protocol. MAVSDK
facilitates seamless communication and command execution between the Raspberry Pi and
PX4, allowing us to send flight commands, receive telemetry data, and ensure synchronized
operations across multiple UAVs. This setup allows for real-time data exchange and control,
which is crucial for coordinated multi-UAV missions.
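To illustrate how the SBC can command PX4 through MAVSDK, the following minimal mavsdk-python sketch connects to the flight controller, arms the vehicle, takes off, and reads one position telemetry sample; the connection address and the specific command sequence are illustrative and not necessarily those used in the authors' missions.

import asyncio
from mavsdk import System

async def run():
    drone = System()
    # The PX4 flight controller is reached over MAVLink; this UDP address is an
    # illustrative default for a companion-computer (SBC) setup.
    await drone.connect(system_address="udp://:14540")

    # Wait until the flight controller reports a valid connection.
    async for state in drone.core.connection_state():
        if state.is_connected:
            break

    await drone.action.arm()
    await drone.action.takeoff()

    # Telemetry that the onboard policy could consume as part of its state.
    async for position in drone.telemetry.position():
        print(position.latitude_deg, position.longitude_deg,
              position.relative_altitude_m)
        break

    await drone.action.land()

if __name__ == "__main__":
    asyncio.run(run())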
Our experimental environment includes both static and dynamic obstacles to reflect
real-world UAV operational conditions. Static obstacles, such as buildings and other
immovable structures, represent the fixed elements that drones must navigate around
in structured environments. Additionally, the dynamic aspect is introduced through the
movement of other drones within the swarm, which act as moving obstacles for each other.
This constantly changing environment requires each drone to adapt in real time, adjusting
its behavior to avoid collisions and optimize navigation.
The key distinction between the proposed ESUSN algorithm combined with MARL
and traditional heuristic optimization algorithms lies in their adaptability and cooperative
efficiency. Heuristic methods, which often rely on static, predefined rules, are limited in
their ability to respond effectively to dynamic and unpredictable changes in the environ-
ment. In contrast, the ESUSN algorithm dynamically adjusts network topology in real
time, enabling UAVs to swiftly adapt to unexpected environmental shifts. Moreover, the
integration of MARL fosters cooperative learning among UAVs, allowing them to optimize
their actions collectively. This results in enhanced mission success rates and improved
collision avoidance. These adaptive capabilities are particularly advantageous in dynamic
scenarios where UAVs must operate autonomously under uncertain conditions. Unlike
heuristic methods, which struggle to provide real-time adaptability and collaboration, our
approach significantly enhances flexibility and efficiency through continuous learning
and cooperation.
The network performance experiments were conducted under two different scenar-
ios, each assuming 5 and 10 drones within the swarm, respectively. In a dynamically
updating routing environment, we compared the delay and energy consumption at each
second. In the Mesh configuration, every drone in the swarm communicated with all other
drones, whereas, in the ESUSN configuration, data were exchanged according to the links
determined by the algorithm. Through these experiments, we aimed to analyze the most
suitable network topology for drone environments and compare the performance of the
two network configurations as the network size and complexity increased.
As a result of comparing the MESH and ESUSN simulation datasets in an environment
with five drones in a swarm, as can be seen in Figure 4, the average delay time of the
MESH network is 0.884 s, while that of ESUSN is 0.418 s, approximately 53% shorter. This indicates that the ESUSN network can provide faster and more stable communication. Additionally, the average energy consumption of the MESH
network was 2569.141 Joules, whereas the ESUSN network’s average energy consumption
was 950.605 Joules, about 63% less. Over a communication period of 100 s, the total energy
consumption for the ESUSN network was 2397 Joules, compared with 7490 Joules for the
MESH network.
[Figures 4 and 5: delay (s) and energy consumption versus time (s) for the MESH and ESUSN networks with 5 and 10 drones in the swarm.]
The comparison between ESUSN and AODV-based FANET was conducted to evaluate
ESUSN’s performance in dynamic environments such as drone swarms. The results, as
shown in Figure 6, indicate that ESUSN outperforms AODV-based FANET in key metrics.
In terms of communication delay, ESUSN consistently maintained lower latency compared
with AODV, demonstrating its ability to provide more stable and efficient communication in
dynamic conditions. Additionally, ESUSN exhibited more optimized energy consumption,
maintaining a lower and more stable energy usage pattern compared with AODV-based
FANET. These findings confirm that ESUSN is more suited for drone swarm environments,
offering better resource efficiency and stability in rapidly changing network conditions.
[Figure 6: delay (s) and energy consumption versus time (s) for ESUSN and AODV-based FANET.]
[Figure: training reward versus epochs; (c) MESH network with 5 agents, (d) MESH network with 10 agents, (e) ESUSN network with 5 agents, (f) ESUSN network with 10 agents.]
Figure 8 compares the trajectories of agents based on identical coordinates for agents,
obstacles, and targets. As seen above, when there are 10 agents, the MESH network scenario
does not allow the policy to learn properly, resulting in the agents failing to reach the target
correctly. While the No Link and ESUSN scenarios may appear to have similar performance,
as shown in Figure 9, there is a notable difference when the number of agents is ten. In the
ESUSN scenario, the distance vector from each agent to its target quickly converges to (0, 0) as the steps progress. In contrast, in the No Link scenario, the distance vector to the target does not converge properly to (0, 0). This indicates that, without access to the information of surrounding agents, each agent moves closer to the target but ends up choosing unnecessary actions to avoid collisions.
[Figure 8: agent trajectories (X-position versus Y-position) with obstacles marked; (c) MESH network with 5 agents, (d) MESH network with 10 agents, (e) ESUSN network with 5 agents, (f) ESUSN network with 10 agents.]
Table 1 compares the performance metrics for each scenario in test situations. Accord-
ing to Table 1, the number of steps required for agents to reach the target is relatively lower
in the ESUSN scenario. This indicates that agents in our proposed system can quickly
reach the target by selecting optimal actions. Additionally, unlike the other scenarios, the
ESUSN scenario maintains a collision count of zero, demonstrating that agents can find
safe and efficient actions. Epoch time refers to the time consumed per epoch during policy
training and is measured in minutes. With five agents, the No Link scenario recorded
2 min, the MESH scenario 5 min, and the ESUSN scenario 2 min. With ten agents, the No
Link scenario recorded 8 min, the MESH scenario 15 min, and the ESUSN scenario 6 min.
As the number of links in the network increases, the model’s size also grows, naturally
increasing the time required for training. Additionally, epoch time is related to the number
of steps required to complete an episode; the faster an episode is completed, the quicker
the transition to the next epoch.
[Figure 9: agent-to-goal distance vectors (X-goal_vector versus Y-goal_vector); (c) MESH network with 5 agents, (d) MESH network with 10 agents, (e) ESUSN network with 5 agents, (f) ESUSN network with 10 agents.]
Table 1. Performance comparison of the No Link, MESH, and ESUSN scenarios.

Number of Agents            5                          10
                   No Link  MESH  ESUSN      No Link  MESH  ESUSN
Collision              1      0      0            2      5      0
Epoch Time (min)       2      5      2            8     15      6
The ESUSN scenario shows that efficient link configuration does not significantly
increase training time compared with the No Link scenario. However, due to the higher
computational load, the epoch time is naturally higher than No Link. Nevertheless, by
utilizing information from surrounding agents, agents can reach the target faster and
complete episodes more quickly, thus reducing the overall epoch time. These experimental
results emphasize the efficiency and safety of our proposed system regarding energy and
time consumption compared with other scenarios. By prioritizing the overall reward of the
drone swarm over individual rewards and optimizing actions through shared information
and reinforcement learning, the system achieves higher mission performance and collision
avoidance. This study demonstrates that the selective information sharing of the ESUSN
network and the MARL system effectively and safely achieves overall mission objectives,
even in scenarios where multiple drones operate simultaneously.
To evaluate the performance of the system proposed in this paper, we additionally
conducted comparative experiments with Dijkstra’s algorithm and General RL. Table 2
presents the comparative performance results between the systems. Dijkstra’s algorithm
is commonly used to find the shortest path from a single source node to other nodes in a
graph and is widely applied in path optimization problems. Since Dijkstra’s algorithm is
an algorithm rather than a learning-based model, it does not require training time, which
may give it an advantage over our model at the initial comparison. General RL refers to
reinforcement learning that does not utilize multiagent systems, and we applied the PPO
method to conduct the comparative experiments. The experimental environment was set
up similarly to the experiments in Table 1, dividing the number of agents into 5 and 10 for
the comparative experiments.
Even when the number of agents was 5, Dijkstra’s algorithm and General RL already
showed slightly inferior performance compared with the system proposed in this paper.
Dijkstra’s algorithm reached the destination in 73 steps but had 1 collision, and General
RL reached it in 90 steps but had 2 collisions, failing to match our proposed model’s
performance of reaching the destination in 70 steps with 0 collisions. As the number of
agents increased, this difference became more pronounced. When the number of agents was
10, Dijkstra’s algorithm reached the destination in 96 steps with 2 collisions, and General
RL reached it in 150 steps with 3 collisions. However, our proposed system completed the
scenario in 105 steps with 0 collisions.
These performance records demonstrate that Dijkstra’s algorithm and General RL do not consider the environmental changes that occur as multiple agents select
their actions. In contrast, our proposed system maintains high performance because
it is trained to recognize the environment and dynamically select actions by efficiently
exchanging only the necessary information through interactions among agents.
Table 2. Performance comparison of Dijkstra’s algorithm, General RL, and the proposed system.

Number of Agents              5                                  10
                   Dijkstra  General RL  Proposed     Dijkstra  General RL  Proposed
Steps                  73         90         70           96        150        105
Collision               1          2          0            2          3          0
The dimension of the state values held by each agent stays consistent regardless of the number of UAVs, so the model can be applied even if the number of agents during training differs from the number of agents during inference, as illustrated in Figure 10.
training is deployed to each UAV’s SBC, the UAV collects state values via its sensors, inputs
them into the policy, and operates as expected. This approach theoretically allows for the
infinite expansion of UAV numbers.
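One simple way to realize this fixed-size property is to build each agent's observation vector from fields whose lengths do not depend on the swarm size, as in the sketch below; the number of LiDAR rays and the two-neighbor limit are illustrative assumptions consistent with the ESUSN link limit described earlier.

import numpy as np

def build_observation(own_pos, own_vel, goal_vec, lidar_ranges, neighbor_states,
                      num_lidar=12, max_neighbors=2):
    """Fixed-length observation vector, independent of the swarm size.

    neighbor_states -- list of (rel_x, rel_y, vel_x, vel_y) for ESUSN-linked
                       neighbors; padded with zeros up to max_neighbors so the
                       vector length never changes as the swarm grows.
    """
    lidar = np.asarray(lidar_ranges, dtype=np.float32)[:num_lidar]
    lidar = np.pad(lidar, (0, num_lidar - len(lidar)))

    neighbors = np.zeros((max_neighbors, 4), dtype=np.float32)
    for k, state in enumerate(neighbor_states[:max_neighbors]):
        neighbors[k] = state

    return np.concatenate([own_pos, own_vel, goal_vec, lidar, neighbors.ravel()])

# 2 + 2 + 2 + 12 + 8 = 26 values, whether the swarm has 5, 10, or 15 UAVs.
obs = build_observation([0.0, 0.0], [0.1, 0.0], [1.0, 2.0],
                        lidar_ranges=[3.0] * 12,
                        neighbor_states=[(0.5, 0.2, 0.0, 0.1)])
print(obs.shape)  # (26,)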
Figure 11 compares the performance of each algorithm as the number of UAVs in-
creases to 15. The MESH network was excluded from the comparison due to its structure
requiring retraining based on the number of UAVs. As seen in Figure 11a, when the
number of UAVs expands to 15, both Dijkstra and No Link fail to complete the episode.
This is because the lack of information about neighboring UAVs prevents each UAV from
optimizing its trajectory, making it unable to avoid obstacles and complete the episode.
Additionally, while the general RL model reaches the destination faster than the MARL
model using the ESUSN network, Figure 11b shows that it results in more than five times
the number of collisions compared with the MARL model using ESUSN. Compared with
these three algorithms, the proposed model demonstrates stable performance even as the
number of UAVs increases, effectively avoiding collisions with obstacles and other UAVs,
and successfully completing the episode.
Figure 11. Performance comparison between algorithms as the number of UAVs increases.
Extending the action space to three dimensions would allow UAVs to navigate more complex spaces where height adjustments are critical for optimal
performance and collision avoidance. Furthermore, it is important to test the system’s
scalability and practicality in various real-world scenarios. Although our experiments
include static obstacles such as buildings and dynamic elements like other drones in the
swarm, future work can be focused on testing the system in larger and more intricate
environments. These environments could feature additional dynamic variables, such as
unpredictable external factors or more sophisticated obstacle configurations. Expanding
the scale and complexity of the testing environment can help to validate the system’s
robustness and adaptability in real-world scenarios. Moreover, exploring improvements in
communication protocols and decision-making algorithms could further optimize UAV
swarm performance in environments that require rapid adaptation to changing conditions.
Future research could investigate the integration of more advanced learning algorithms
and the development of enhanced communication systems that can maintain efficiency in
highly dynamic and large-scale scenarios alike.
6. Conclusions
This study introduced an innovative UAV swarm system utilizing the Efficient Self
UAV Swarm Network (ESUSN) and MultiAgent Reinforcement Learning (MARL) to im-
prove mission performance, communication efficiency, and collision avoidance. The results
demonstrated notable advantages, such as reduced communication delay and energy
consumption compared with traditional network configurations, and enhanced mission
success rates, particularly in scenarios involving multiple UAVs. The proposed system
showed significant improvements in reducing communication delay by up to 64% and
energy consumption by 78% compared with MESH networks, while also achieving zero
collisions in test scenarios. These findings emphasize the system’s capability to increase
mission efficiency, making it more reliable in real-time applications involving multiple
drones. The combination of optimized network design and reinforcement learning allows
UAVs to perform complex tasks while sharing minimal data, ensuring both stability and
efficiency. Overall, this research highlights the potential of combining reinforcement learn-
ing and optimized network algorithms to enhance UAV swarm operations. The presented
methodology showcases a promising approach for improving the scalability, efficiency, and
robustness of UAV swarms across various mission-critical applications, from surveillance
to disaster management.
Author Contributions: Conceptualization, W.J., C.P. and H.K.; methodology, W.J. and C.P.; software,
W.J., C.P. and S.L.; validation, W.J., C.P. and H.K.; investigation, W.J., C.P. and S.L.; formal analysis,
W.J., C.P. and H.K.; resources, H.K.; data curation, W.J. and C.P.; writing—original draft preparation,
W.J., C.P., S.L. and H.K.; writing—review and editing, W.J., C.P., S.L. and H.K.; visualization, W.J., C.P.
and S.L.; supervision, H.K.; project administration, H.K.; funding acquisition, H.K. All authors have
read and agreed to the published version of the manuscript.
Funding: This research was supported by Korea Research Institute for defense Technology Planning
and advancement (KRIT) grant funded by the Korea government (DAPA(Defense Acquisition Pro-
gram Administration)) (KRIT-CT-23-041, LiDAR/RADAR Supported Edge AI-based Highly Reliable
IR/UV FSO/OCC Specialized Laboratory, 2024).
Data Availability Statement: The data presented in this study are available on request from the
corresponding author. The data are not publicly provided under the terms of the contract with the
funding agency.
Conflicts of Interest: The authors declare no conflicts of interest. Furthermore, the funders had no
role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of
the manuscript; or in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
References
1. Hayat, S.; Yanmaz, E.; Muzaffar, R. Survey on Unmanned Aerial Vehicle Networks for Civil Applications: A Communications
Viewpoint. IEEE Commun. Surv. Tutor. 2016, 18, 2624–2661. [CrossRef]
2. Shakhatreh, H.; Sawalmeh, A.H.; Al-Fuqaha, A.; Dou, Z.; Almaita, E.; Khalil, I.; Othman, N.S.; Khreishah, A.; Guizani,
M. Unmanned Aerial Vehicles (UAVs): A Survey on Civil Applications and Key Research Challenges. IEEE Access 2019,
7, 48572–48634. [CrossRef]
3. Dutta, G.; Goswami, P. Application of drone in agriculture: A review. Int. J. Chem. Stud. 2020, 8, 181–187. [CrossRef]
4. Veroustraete, F. The rise of the drones in agriculture. EC Agric. 2015, 2, 325–327.
5. Yoo, T.; Lee, S.; Yoo, K.; Kim, H. Reinforcement learning based topology control for UAV networks. Sensors 2023, 23, 921.
[CrossRef] [PubMed]
6. Park, S.; Kim, H.T.; Kim, H. Vmcs: elaborating apf-based swarm intelligence for mission-oriented multi-uv control. IEEE Access
2020, 8, 223101–223113. [CrossRef]
7. Lee, W.; Lee, J.Y.; Lee, J.; Kim, K.; Yoo, S.; Park, S.; Kim, H. Ground control system based routing for reliable and efficient
multi-drone control system. Appl. Sci. 2018, 8, 2027. [CrossRef]
8. Fotouhi, A.; Ding, M.; Hassan, M. Service on Demand: Drone Base Stations Cruising in the Cellular Network. In Proceedings of
the 2017 IEEE Globecom Workshops (GC Wkshps), Singapore, 4–8 December 2017; pp. 1–6. [CrossRef]
9. Shahzad, M.M.; Saeed, Z.; Akhtar, A.; Munawar, H.; Yousaf, M.H.; Baloach, N.K.; Hussain, F. A review of swarm robotics in a
nutshell. Drones 2023, 7, 269. [CrossRef]
10. Yoon, N.; Lee, D.; Kim, K.; Yoo, T.; Joo, H.; Kim, H. STEAM: Spatial Trajectory Enhanced Attention Mechanism for Abnormal
UAV Trajectory Detection. Appl. Sci. 2023, 14, 248. [CrossRef]
11. Park, S.; Kim, H. Dagmap: Multi-drone slam via a dag-based distributed ledger. Drones 2022, 6, 34. [CrossRef]
12. Bekmezci, I.; Sen, I.; Erkalkan, E. Flying ad hoc networks (FANET) test bed implementation. In Proceedings of the 2015 7th
International Conference on Recent Advances in Space Technologies (RAST), Istanbul, Turkey, 16–19 June 2015; pp. 665–668.
13. Corson, S.; Macker, J. Mobile ad Hoc Networking (MANET): Routing Protocol Performance Issues and Evaluation Considerations.
RFC 2501 1999. Available online: https://www.rfc-editor.org/rfc/rfc2501 (accessed on 27 July 2024).
14. Park, C.; Lee, S.; Joo, H.; Kim, H. Empowering adaptive geolocation-based routing for UAV networks with reinforcement learning.
Drones 2023, 7, 387. [CrossRef]
15. Park, S.; La, W.G.; Lee, W.; Kim, H. Devising a distributed co-simulator for a multi-UAV network. Sensors 2020, 20, 6196.
[CrossRef] [PubMed]
16. Nazib, R.A.; Moh, S. Routing protocols for unmanned aerial vehicle-aided vehicular ad hoc networks: A survey. IEEE Access
2020, 8, 77535–77560. [CrossRef]
17. Saleem, Y.; Rehmani, M.H.; Zeadally, S. Integration of cognitive radio technology with unmanned aerial vehicles: issues,
opportunities, and future research challenges. J. Netw. Comput. Appl. 2015, 50, 15–31. [CrossRef]
18. Gupta, L.; Jain, R.; Vaszkun, G. Survey of important issues in UAV communication networks. IEEE Commun. Surv. Tutor. 2015,
18, 1123–1152. [CrossRef]
19. Khan, M.F.; Yau, K.L.A.; Noor, R.M.; Imran, M.A. Routing schemes in FANETs: A survey. Sensors 2019, 20, 38. [CrossRef]
20. Srivastava, A.; Prakash, J. Future FANET with application and enabling techniques: Anatomization and sustainability issues.
Comput. Sci. Rev. 2021, 39, 100359. [CrossRef]
21. Nayyar, A. Flying adhoc network (FANETs): simulation based performance comparison of routing protocols: AODV, DSDV, DSR,
OLSR, AOMDV and HWMP. In Proceedings of the 2018 International Conference on Advances in Big Data, Computing and Data
Communication Systems (icABCD), Durban, South Africa, 6–7 August 2018; pp. 1–9.
22. Bekmezci, I.; Sahingoz, O.K.; Temel, Ş. Flying ad-hoc networks (FANETs): A survey. Ad Hoc Netw. 2013, 11, 1254–1270. [CrossRef]
23. Zhu, L.; Karim, M.M.; Sharif, K.; Xu, C.; Li, F. Traffic flow optimization for UAVs in multi-layer information-centric software-
defined FANET. IEEE Trans. Veh. Technol. 2022, 72, 2453–2467. [CrossRef]
24. Ayub, M.S.; Adasme, P.; Melgarejo, D.C.; Rosa, R.L.; Rodríguez, D.Z. Intelligent hello dissemination model for FANET routing
protocols. IEEE Access 2022, 10, 46513–46525. [CrossRef]
25. Koch, W.; Mancuso, R.; West, R.; Bestavros, A. Reinforcement learning for UAV attitude control. ACM Trans.-Cyber-Phys. Syst.
2019, 3, 1–21. [CrossRef]
26. Azar, A.T.; Koubaa, A.; Ali Mohamed, N.; Ibrahim, H.A.; Ibrahim, Z.F.; Kazim, M.; Ammar, A.; Benjdira, B.; Khamis, A.M.;
Hameed, I.A.; et al. Drone deep reinforcement learning: A review. Electronics 2021, 10, 999. [CrossRef]
27. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep
reinforcement learning. arXiv 2015, arXiv:1509.02971.
28. Tsouros, D.C.; Bibi, S.; Sarigiannidis, P.G. A review on UAV-based applications for precision agriculture. Information 2019, 10, 349.
[CrossRef]
29. Yang, Q.; Zhu, Y.; Zhang, J.; Qiao, S.; Liu, J. UAV air combat autonomous maneuver decision based on DDPG algorithm. In
Proceedings of the 2019 IEEE 15th International Conference on Control and Automation (ICCA), Edinburgh, UK, 16–19 July 2019;
pp. 37–42.
30. Cetin, E.; Barrado, C.; Muñoz, G.; Macias, M.; Pastor, E. Drone navigation and avoidance of obstacles through deep reinforcement
learning. In Proceedings of the 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC), San Diego, CA, USA, 8–12
September 2019; pp. 1–7.
31. Li, K.; Ni, W.; Emami, Y.; Dressler, F. Data-driven flight control of internet-of-drones for sensor data aggregation using multi-agent
deep reinforcement learning. IEEE Wirel. Commun. 2022, 29, 18–23. [CrossRef]
32. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing atari with deep
reinforcement learning. arXiv 2013, arXiv:1312.5602.
33. Hodge, V.J.; Hawkins, R.; Alexander, R. Deep reinforcement learning for drone navigation using sensor data. Neural Comput.
Appl. 2021, 33, 2015–2033. [CrossRef]
34. Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI Conference
on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
35. Tong, G.; Jiang, N.; Biyue, L.; Xi, Z.; Ya, W.; Wenbo, D. UAV navigation in high dynamic environments: A deep reinforcement
learning approach. Chin. J. Aeronaut. 2021, 34, 479–489.
36. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017,
arXiv:1707.06347.
37. Drew, D.S. Multi-agent systems for search and rescue applications. Curr. Robot. Rep. 2021, 2, 189–200. [CrossRef]
38. Gavin, T.; LacroiX, S.; Bronz, M. Multi-Agent Reinforcement Learning based Drone Guidance for N-View Triangulation. In
Proceedings of the 2024 International Conference on Unmanned Aircraft Systems (ICUAS), Chania-Crete, Greece, 4–7 June 2024;
pp. 578–585.
39. Gallager, R.G.; Humblet, P.A.; Spira, P.M. A Distributed Algorithm for Minimum-Weight Spanning Trees. ACM Trans. Program.
Lang. Syst. (TOPLAS) 1983, 5, 66–77. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.