Towards Learning-Based Distributed Task Allocation Approach for Multi-Robot System
Abstract—This paper introduces a novel application of Graph Convolutional Networks (GCNs) for enhancing the efficiency of the Consensus-Based Bundle Algorithm (CBBA) in multi-robot task allocation scenarios. The novelty of the proposed approach lies in the integration of a learning-based strategy to approximate the heuristic methods traditionally used for scoring in the CBBA framework. By employing GCNs, the proposed methodology aims to learn and predict the score function, which is crucial for task allocation decisions in multi-robot systems. This approach not only streamlines the allocation process but also potentially improves the accuracy and efficiency of task distribution among robots. The paper presents a detailed exploration of how GCNs can be effectively tailored for this specific application, along with results demonstrating the advantages of this learning-based approach over conventional heuristic methods in various simulated multi-robot task allocation scenarios.

Keywords—Task Allocation, Multirobot System, Distributed Algorithms, Graph Convolutional Neural Networks

I. INTRODUCTION

The task allocation problem aims to find a globally feasible allocation of tasks to agents while optimizing one or more objectives. For Multi-Robot Systems (MRS) with varied capabilities, two main challenges arise: the high computational complexity of traversal algorithms and the limitations of centralized algorithms, including reduced task range and the risk of a single point of failure [1], [2]. To address these, heuristic algorithms are used as a more efficient, though not always optimal, alternative to traversal algorithms. The effectiveness of a given heuristic depends on various factors, including the constraints and parameters of the problem being solved and the objective being optimized [3]. Additionally, distributed algorithms replace centralized ones, enhancing task range and system robustness by distributing decision-making across agents. Distributed consensus-based algorithms can solve task allocation problems in a cooperative planning process consisting of two phases [4]. In the first phase, an agent constructs a schedule of selected tasks through an internal decision-making process. This process has previously been referred to as a utility function [5], a score function [6], or an objective function [7]. In the second phase, agents communicate bids on their selected task allocation and resolve conflicts by assigning tasks to the agents with the highest bids. Agents perform one task at a time, and each agent can be assigned multiple tasks that they execute based on a schedule. Travel times, task durations, task deadlines, and fuel constraints are factors. Finding the optimal solution to this task allocation problem in real-time environments becomes computationally infeasible as the number of tasks/agents grows. However, distributed algorithms assume ideal communication conditions and rely on consensus for consistent situational awareness (SA). Current state-of-the-art consensus-based task allocation algorithms incorporate heuristics into agent score functions in order to optimize a given objective. While extensive research has been done in the area of multi-agent learning of optimal policies [8], [9], each solution involves trade-offs between efficiency, optimality, and robustness [10], [11], [12], [13].

An alternative approach to resolving this dilemma is the implementation of the auction algorithm. In the auction algorithm, agents bid on individual tasks, and a central system or designated agent acts as an auctioneer to select the winning bids. The bundle algorithm simplifies this by having agents bid on groups of tasks, or bundles, rather than single tasks. While both methods offer dynamic and potentially efficient task allocation, they may not be as robust as consensus algorithms in adapting to changes in communication networks. However, traditional auction algorithms are generally more computationally efficient than consensus algorithms, which excel in robustness but may lack speed, especially in large-scale systems. The choice between these methods depends on the MRS's specific needs for adaptability, efficiency, and communication robustness.

The Consensus-Based Bundle Algorithm (CBBA) [4] discussed in this paper is a hybrid approach for task allocation in MRS, combining auction-based methods and consensus algorithms. It uses auctions to distribute tasks among robots and employs a consensus mechanism to resolve any conflicts arising from overlapping bids or dependencies. CBBA stands out for its efficient convergence, quickly reaching a stable state of task allocation. Additionally, it guarantees at least 50% optimality in its solutions when the bidding price has the diminishing marginal gain (DMG) property [14], thereby balancing speed and efficiency with a reasonable level of accuracy. The CBBA effectively addresses the challenges of distributed task allocation by combining the dynamic nature of auctions with the conflict resolution capabilities of consensus algorithms.

On the other hand, Graph Convolutional Networks (GCNs)
Authorized licensed use limited to: National Univ of Defense Tech. Downloaded on December 15,2024 at 14:06:06 UTC from IEEE Xplore. Restrictions apply.
Algorithm 1 Path Planning Algorithm for agent-i/iteration (t+1)
1: Process: Construct Bundle($\nu_i(t)$, $\sigma_i(t)$, $\zeta_i(t)$, $\eta_i(t)$)
2:   $\nu_i(t+1) = \nu_i(t)$
3:   $\sigma_i(t+1) = \sigma_i(t)$
4:   $\zeta_i(t+1) = \zeta_i(t)$
5:   $\eta_i(t+1) = \eta_i(t)$
6:   while $|\zeta_i| < K_i$ do
7:     $\tilde{S}_{ij}[\zeta_i] = \max_{n \le |\eta_i|} \Theta_i^{\eta_i \oplus_n \{j\}} - \Theta_i^{\eta_i}, \quad j \in I_t \setminus \zeta_i(t)$
8:     $\nu_{ij} = \mathbb{I}(\tilde{S}_{ij} > \omega_{ij}), \quad \forall j \in I_t$
9:     $J_i = \arg\max_j \tilde{S}_{ij}[\zeta_i] \times \nu_{ij}$
10:    $n_{i,J_i} = \arg\max_n \Theta_i^{\eta_i \oplus_n \{J_i\}}$
11:    $\eta_i = \eta_i \oplus_{n_{i,J_i}} \{J_i\}$
12:    $\zeta_i = \zeta_i \oplus_{\mathrm{end}} \{J_i\}$
13:    $\omega_{iJ_i}(t+1) = \tilde{S}_{iJ_i}$
14:    $\sigma_{iJ_i}(t+1) = i$
15: End Process

b) Phase II: Conflict Resolution Procedure: During conflict resolution, agents communicate their bid values and the provisional winners for each task. The task is provisionally awarded to the agent with the highest marginal score for that task. An agent that has been outbid for a task must relinquish the task and any subsequent tasks in its bundle that were dependent on it.

This phase operates under the principle of Diminishing Marginal Gain (DMG), which posits that the marginal score for a task, denoted by $\tilde{S}_{ij}[\eta_i]$, should not increase with the addition of tasks to the agent's bundle. Formally, this is expressed as:

\[ \tilde{S}_{ij}[\eta_i] \ge \tilde{S}_{ij}[\eta_i \oplus_{\mathrm{end}} \{j\}], \tag{3} \]

where $\eta_i$ is the current task bundle for agent $i$, and $j$ is a new task being considered.

Convergence to a stable task allocation and a guarantee of at least 50% optimality are ensured by the CBBA under the DMG condition for the scoring function. Should the scoring function not naturally fulfill the DMG criterion, a warping mechanism is applied. The warping adjusts the score $\tilde{S}_{ij}[\eta_i]$ to:

\[ \tilde{S}_{ij}[\eta_i] = \min\{\tilde{S}_{ij}[\eta]\}, \quad \forall \eta \subseteq \eta_i, \tag{4} \]

which assists in algorithm convergence when the natural scoring function lacks diminishing returns.

B. Learning-based Optimization

In recent years, learning-based optimization has emerged as a frontier in advanced computational methodologies, offering profound insights into complex problem-solving. This paradigm shift has been largely propelled by the advent and subsequent dominance of deep learning techniques, known for their strength in feature extraction and representation learning. Among the array of deep learning architectures, Convolutional Neural Networks (CNNs) have garnered widespread acclaim, especially for their performance in processing data characterized by a Euclidean or grid-like topology.

Despite their success, traditional CNNs encounter significant challenges when confronted with data embedded in non-Euclidean spaces, such as social or information networks, where translation invariance is no longer a given. To bridge this gap, Graph Convolutional Networks (GCNs) have been introduced as a robust method for handling graph-structured data. GCNs make it possible to exploit the information contained within non-Euclidean domains, enabling the extraction of salient features that conventional methods would struggle to discern.

Considering a graph $G = (V, E)$ with $V$ as the set of vertices and $E$ as the edges denoting relationships, graph convolutions can be processed in either the spatial or spectral domains. Spatially, convolutions aggregate feature information from a node's local neighborhood directly, leveraging residual connections for deep memory across layers. Each vertex is equipped with its own neural network, and its activation in the $k$-th layer, denoted by $h_v^{(k)}$, is given by the equation:

\[ h_v^{(k)} = \sigma\left( W^{(k)} x_v + \sum_{u \in N(v)} \theta^{(k)} h_u^{(k-1)} \right) \]

where $W^{(k)}$ and $\theta^{(k)}$ are learned parameters for intra- and inter-nodal connections, respectively, and $\sigma(\cdot)$ represents a nonlinear activation function.

For the spectral domain, graph convolutions apply through the transformation of features into the Fourier space using eigendecomposition of the normalized graph Laplacian $L = I - D^{-1/2} A D^{-1/2} = U \Lambda U^T$. Here, $U$ contains the eigenvectors, $\Lambda$ is a diagonal matrix of eigenvalues, and the Fourier-transformed features are $U^T x$. A filter parameterized by $\theta$ operates on these transformed features, which is expressed as:

\[ g_{\theta'} \star x = U g_{\theta}(\Lambda) U^T x \]

where $g_{\theta'} \star x$ denotes the filtered signal. The adjacency matrix, with self-loops, is denoted by $\tilde{A} = A + I_N$, and the layer-wise propagation in the spectral GCN, which is utilized in this research, follows:

\[ H^{(l+1)} = \sigma\left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} \right) \]

where $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$, $H^{(l)}$ is the matrix of node features at layer $l$, and $W^{(l)}$ is the trainable weight matrix of that layer.

In practical applications, such as multi-robot systems, graph structures capture the complexity of interactions within the system and between agents and environments. The agent-entity graph and task-entity graph encode these interactions. Through machine learning methodologies, specifically graph convolutional networks, we analyze these complex structures. For instance, we encode the position and attributes of tasks in
a vector, apply a GCN to learn meaningful features from these relationships, and use the extracted features to understand the underlying data structure. The proposed distributed task allocation algorithm depicted in Figure 1 demonstrates the application of spectral GCNs for such feature extraction.

In alignment with the CBBA's foundational principles, we utilize $\xi_i(t)$ to signify the task sequence within agent $i$'s bundle at time $t$, where $\xi_i$ represents the ordered set of tasks. Notably, the sequence $\xi_i(t)$ does not necessarily correspond with the order of tasks within the bundle $\zeta_i(t)$. The path length, now interpreted as energy consumption, is represented by $D[\xi_i(t)]$. To adapt CBBA for scenarios with energy constraints, each agent commences by initializing their bundle $\zeta_i(t)$ to include starting point $\mu_i$ and a terminating point $\nu_i$. For an agent $i$'s current bundle $\zeta_i(t)$, the marginal utility $S_{ij}[\zeta_i(t)]$ of appending task $j$ is conceptualized as the task's reward less the incremental energy cost, now expressed as:

\[ S_{ij}[\zeta_i(t)] = \begin{cases} \gamma_{ij} - \Delta E_{ij}[\zeta_i(t)], & \text{if } \Delta E_{ij}[\zeta_i(t)] \le E_i[\zeta_i(t)], \\ 0, & \text{otherwise} \end{cases} \tag{5} \]

Here, $E_i[\zeta_i(t)]$ is the residual energy for agent $i$ post-traversal of $\xi_i(t)$, while $\Delta E_{ij}[\zeta_i(t)]$ denotes the additional energy required if task $j$ is to be included in $\xi_i(t)$.

In the revised marginal utility equation, if the vertex $x$ is in close proximity to $\nu_i$, it is conceivable for $\Delta E_{ij}[\zeta_i(t) \oplus \{x\}]$ to be less than or equal to $\Delta E_{ij}[\zeta_i(t)]$, potentially contravening the DMG principle. To mitigate this and ensure convergence when utilizing non-DMG score functions, a warping mechanism is introduced, adjusting the score to $\min_{\xi \subseteq \zeta_i(t)} S_{ij}[\xi]$, thereby aiding the convergence process where traditional DMG is not inherently present.

The score function's direct correlation to both reward and energy consumption raises concerns about its scale invariance. When the mapping of tasks to agents is scaled linearly, the relative value of the scores, and consequently the task allocation decisions, may be altered. In our proposed methodology, we introduce a suite of heuristic extensions to the CBBA, each characterized by a novel scoring function. The diversity of these heuristic extensions is tailored to address the varying demands of distinct allocation problems, potentially outperforming the application of a single heuristic in all scenarios.

The scoring functions for the heuristics are defined as follows:

\[ \begin{aligned} H_1 &= \gamma_{ij} - \Delta E_{ij}[\zeta_i] \\ H_2 &= \gamma_{ij} \\ H_3 &= \frac{\gamma_{ij}}{\Delta E_{ij}[\zeta_i]} \\ H_4 &= \frac{\gamma_{ij} - \Delta E_{ij}[\zeta_i]}{E_i[\zeta_i]} \end{aligned} \tag{6} \]

• ... layers yielding 32-, 16-, and 8-dimensional feature maps, designed to distill environment-specific information such as task connectivity and site distances.
• A mean pooling layer follows to aggregate node features into a comprehensive graph-level representation.
• Two dense layers, each with eight neurons, to process the pooled graph features.
• The model culminates in an output layer that provides a predictive assessment of the aggregate rewards.

Fig. 2. Proposed Graph Convolutional Network

By integrating the GCN predictions with our heuristic framework, we aim to enhance the decision-making process in the allocation of tasks, ensuring an informed and adaptive approach.

III. RESULTS AND DISCUSSION

Figure 3 presents a series of bar charts comparing the performance of the predictive model across four different heuristic methods: H1, H2, H3, and H4. For each heuristic, Series1 represents the values obtained using the heuristic method, while Series2 represents the predicted values generated by the model. The predictions for H1 are closely aligned with the heuristic values, indicating a high degree of accuracy for this method. This is particularly evident in instances where the two series produce almost identical bar heights (e.g., at intervals 1, 4, 6, and 10). This indicates that the model has learned the pattern for H1 and can replicate its decision-making process
with high fidelity. For H2, the model appears to have greater
variance in its predictive accuracy. While some predictions are
quite close to the heuristic values (e.g., intervals 5 and 8), there
are others where there is a noticeable difference (e.g., intervals
2 and 9). The model demonstrates a similar pattern of accuracy
with the H3 method as with H1, with many of the predictions
being close to the heuristic values. The close correspondence
in intervals 3, 4, and 7 suggests that the model is largely
effective in estimating the H3 heuristic method’s outcomes.
Lastly, H4 shows a mixed pattern where the model accurately
predicts the heuristic values in several intervals (such as 2, 5,
and 7), but also deviates significantly in others (such as 1, 8,
and 10).
Overall, across all four heuristic methods, the model seems
capable of making reasonably accurate predictions and appears
to be a promising tool for replicating the patterns in different
heuristic methods. However, the variations in predictive accuracy across methods and intervals suggest that each heuristic has
characteristics that the model captures only partially. Further analysis would be beneficial to
understand these differences, refine the model accordingly, and
potentially improve its predictive performance.
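
To make the comparison between heuristic values and model predictions concrete, the following minimal sketch evaluates the marginal utility of Eq. (5) and the four scoring functions of Eq. (6) for a single agent-task pair, then measures the gap between a heuristic series and a predicted series. The function names, the use of mean absolute error as the agreement metric, and every numeric input are illustrative assumptions; none of them are taken from the experiments reported in this paper.

```python
def marginal_utility(reward, delta_energy, residual_energy):
    """Eq. (5): reward minus added energy if the agent can afford it, else 0."""
    if delta_energy <= residual_energy:
        return reward - delta_energy
    return 0.0


def heuristic_scores(reward, delta_energy, residual_energy):
    """Eq. (6): the four heuristic scores H1..H4 for one agent-task pair."""
    if delta_energy <= 0 or residual_energy <= 0:
        raise ValueError("energies must be positive to form the ratios")
    return {
        "H1": reward - delta_energy,
        "H2": reward,
        "H3": reward / delta_energy,
        "H4": (reward - delta_energy) / residual_energy,
    }


def mean_absolute_error(heuristic_values, predicted_values):
    """Average absolute gap between heuristic and predicted scores."""
    if len(heuristic_values) != len(predicted_values):
        raise ValueError("series must have the same length")
    gaps = [abs(h - p) for h, p in zip(heuristic_values, predicted_values)]
    return sum(gaps) / len(gaps)


# Placeholder inputs (assumed values, for illustration only).
scores = heuristic_scores(reward=10.0, delta_energy=4.0, residual_energy=20.0)
heuristic_series = [6.0, 5.5, 7.0, 4.0]
predicted_series = [5.8, 5.9, 7.0, 4.6]
mae = mean_absolute_error(heuristic_series, predicted_series)
print(scores, round(mae, 3))
```

A per-heuristic error of this kind is one possible way to quantify how closely the predictions track each heuristic; other metrics, such as relative error, would serve equally well.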
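
Referring back to the spectral GCN of Section II-B, the layer-wise propagation rule and the mean pooling step can be illustrated with a small dependency-free sketch. It shows a single graph convolution followed by mean pooling on a three-node graph. The ReLU activation, the toy graph, the feature values, and the weight matrix are assumptions chosen for illustration; this is not the trained network of Fig. 2, and the layer widths used in the paper are not reproduced here.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D~^(-1/2) (A + I) D~^(-1/2) for a symmetric adjacency matrix."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    return [[inv_sqrt_deg[i] * a_tilde[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def gcn_layer(a_hat, features, weights):
    """One propagation step H' = ReLU(A_hat H W); ReLU is an assumed activation."""
    z = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in z]


def mean_pool(features):
    """Average node features into a single graph-level vector."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


# Hypothetical path graph 0-1-2 with two features per node.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, -1.0], [0.5, 0.5]]  # arbitrary 2 -> 2 weight matrix

a_hat = normalized_adjacency(adjacency)
hidden = gcn_layer(a_hat, node_features, weights)
graph_vector = mean_pool(hidden)
print(graph_vector)
```

Stacking several such layers with decreasing output widths and feeding the pooled vector into dense layers would follow the same pattern by repeated calls to `gcn_layer`.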
Fig. 5. Average Distance vs Tasks

number of tasks and the agents. The first column, representing the minimum time, consistently increases with the number of agents. The average time, denoted in the second column, similarly ascends with the number of agents. Maximal time in the third column escalates as well with agent count. The standard deviation of time, illustrated in the fourth column, indicates a growing variability in the time to complete tasks.

TABLE II
TIME RESULTS SUMMARY [s]

Number of Agents   Min Time   Avg Time   Max Time   Std Dev
2                  0.0488     0.5393     0.9020     0.2682
4                  0.0917     1.0006     1.5168     0.4583
6                  0.1372     1.4613     2.1640     0.6599
8                  0.2154     1.9308     2.8104     0.8517
10                 0.2462     2.3975     3.4392     1.0542

IV. CONCLUSION

In this study, we successfully integrated Graph Convolutional Networks (GCNs) into the Consensus-Based Bundle Algorithm (CBBA) to enhance task allocation in multi-robot systems, creating an AI-enhanced version (AI-CBBA). This integration marks a shift from traditional heuristic methods to a learning-based approach. AI-CBBA outperforms existing algorithms like original CBBA, Improved CBBA (ICBA), and Prim’s algorithm in task allocation efficiency. It excels in managing complex task loads, demonstrating AI’s capability to learn and optimize both task allocation and sequencing. Our findings indicate that AI-CBBA could significantly advance multi-robot system coordination, promising improvements for complex operations across various domains.

ACKNOWLEDGMENT

The research presented in this paper was financed by the European Commission and managed by the European Defense Agency in the framework of the Preparatory Action on Defense Research under Grant Agreement 884866 (AIDED).

REFERENCES

[1] M. Campion, P. Ranganathan, and S. Faruque, “UAV swarm communication and control architectures: a review,” Journal of Unmanned Vehicle Systems, vol. 7, no. 2, pp. 93–106, 2018.
[2] Y. Zhou, B. Rao, and W. Wang, “UAV swarm intelligence: Recent advances and future trends,” IEEE Access, vol. 8, pp. 183856–183878, 2020.
[3] P. E. Hart, N. J. Nilsson, and B. Raphael, “A formal basis for the heuristic determination of minimum cost paths,” IEEE Transactions on Systems Science and Cybernetics, vol. 4, no. 2, pp. 100–107, 1968.
[4] H.-L. Choi, L. Brunet, and J. P. How, “Consensus-based decentralized auctions for robust task allocation,” IEEE Transactions on Robotics, vol. 25, no. 4, pp. 912–926, 2009.
[5] G. A. Korsah, A. Stentz, and M. B. Dias, “A comprehensive taxonomy for multi-robot task allocation,” The International Journal of Robotics Research, vol. 32, no. 12, pp. 1495–1512, 2013.
[6] L. Johnson, H.-L. Choi, S. Ponda, and J. P. How, “Allowing non-submodular score functions in distributed task allocation,” in 2012 IEEE 51st IEEE Conference on Decision and Control (CDC). IEEE, 2012, pp. 4702–4708.
[7] L. B. Johnson, H.-L. Choi, and J. P. How, “The role of information assumptions in decentralized task allocation: A tutorial,” IEEE Control Systems Magazine, vol. 36, no. 4, pp. 45–58, 2016.
[8] L. Busoniu, R. Babuska, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 38, no. 2, pp. 156–172, 2008.
[9] L. Panait and S. Luke, “Cooperative multi-agent learning: The state of the art,” Autonomous Agents and Multi-Agent Systems, vol. 11, pp. 387–434, 2005.
[10] M. Barer, G. Sharon, R. Stern, and A. Felner, “Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem,” in Proceedings of the International Symposium on Combinatorial Search, vol. 5, no. 1, 2014, pp. 19–27.
[11] J. P. Van Den Berg and M. H. Overmars, “Prioritized motion planning for multiple robots,” in 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2005, pp. 430–435.
[12] T. Standley and R. Korf, “Complete algorithms for cooperative pathfinding problems,” in IJCAI. Citeseer, 2011, pp. 668–673.
[13] W. Wu, S. Bhattacharya, and A. Prorok, “Multi-robot path deconfliction through prioritization by path prospects,” in 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 9809–9815.
[14] C. Luo, Q. Huang, F. Kong, S. Khan, and Q. Qiu, “Applying machine learning in designing distributed auction for multi-agent task allocation with budget constraints,” in 2021 20th International Conference on Advanced Robotics (ICAR). IEEE, 2021, pp. 356–363.
[15] J. Blumenkamp, S. Morad, J. Gielis, Q. Li, and A. Prorok, “A framework for real-world multi-robot systems running decentralized GNN-based policies,” in 2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022, pp. 8772–8778.
[16] A. Khan, E. Tolstaya, A. Ribeiro, and V. Kumar, “Graph policy gradients for large scale robot control,” in Conference on Robot Learning. PMLR, 2020, pp. 823–834.
[17] E. Tolstaya, F. Gama, J. Paulos, G. Pappas, V. Kumar, and A. Ribeiro, “Learning decentralized controllers for robot swarms with graph neural networks,” in Conference on Robot Learning. PMLR, 2020, pp. 671–682.
[18] Q. Li, F. Gama, A. Ribeiro, and A. Prorok, “Graph neural networks for decentralized multi-robot path planning,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 11785–11792.
[19] R. Kortvelesy and A. Prorok, “ModGNN: Expert policy approximation in multi-agent systems with a modular graph neural network architecture,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 9161–9167.
[20] E. Ghisoni, S. Govindaraj, A. M. C. Faulí, G. De Cubber, F. Polisano, N. Aouf, D. Rondao, Z. Chekakta, and B. de Waard, “Multi-agent system and AI for explosive ordnance disposal,” CEIA HUMANITARIAN CLEARANCE TEAMWORK, p. 26.