Multi-Agent Pathfinding With Continuous Time


Journal Pre-proof

Multi-Agent Pathfinding with Continuous Time

Anton Andreychuk, Konstantin Yakovlev, Pavel Surynek, Dor Atzmon and Roni Stern

PII: S0004-3702(22)00002-9
DOI: https://doi.org/10.1016/j.artint.2022.103662
Reference: ARTINT 103662

To appear in: Artificial Intelligence

Received date: 11 May 2020


Revised date: 22 October 2021
Accepted date: 7 January 2022

Please cite this article as: A. Andreychuk, K. Yakovlev, P. Surynek et al., Multi-Agent Pathfinding with Continuous Time, Artificial
Intelligence, 103662, doi: https://doi.org/10.1016/j.artint.2022.103662.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and
formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and
review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal
pertain.

© 2022 Published by Elsevier.


Multi-Agent Pathfinding with Continuous Time

Anton Andreychuk (a,d), Konstantin Yakovlev (a,b,c), Pavel Surynek (g), Dor Atzmon (e), Roni Stern (e,f)
a Federal Research Center for Computer Science and Control of Russian Academy of
Sciences, Russia
b National Research University Higher School of Economics, Russia
c Moscow Institute of Physics and Technology, Russia
d Peoples’ Friendship University of Russia (RUDN University), Russia
e Software and Information Systems Eng., Ben Gurion University of the Negev, Israel
f System Sciences Laboratory (SSL), Palo Alto Research Center (PARC), USA
g Faculty of Information Technology (FIT), Czech Technical University (ČVUT), Czechia

Abstract

Multi-Agent Pathfinding (MAPF) is the problem of finding paths for multiple
agents such that each agent reaches its goal and the agents do not collide. In
recent years, variants of MAPF have arisen in a wide range of real-world
applications such as warehouse management and autonomous vehicles. Optimizing
common MAPF objectives, such as minimizing sum-of-costs or makespan, is
computationally intractable, but state-of-the-art algorithms are able to
optimally solve problems with dozens of agents. However, most MAPF algorithms
assume that (1) time is discretized into time steps and (2) the duration of every
action is one time step. These simplifying assumptions limit the applicability
of MAPF algorithms in real-world applications and raise non-trivial questions
such as how to discretize time in an effective manner. We propose two novel
MAPF algorithms for finding optimal solutions that do not rely on any time
discretization. In particular, our algorithms do not require quantization of wait
and move actions’ durations, allowing these durations to take any value required
to find optimal solutions. The first algorithm we propose, called Continuous-
time Conflict-Based Search (CCBS), draws on ideas from Safe Interval Path
Planning (SIPP), a single-agent pathfinding algorithm designed to cope with
dynamic obstacles, and Conflict-Based Search (CBS), a state-of-the-art search-
based MAPF algorithm. The second algorithm, SMT-CCBS, builds on similar ideas,
but is based on a different state-of-the-art MAPF algorithm called SMT-CBS,
which applies a SAT Modulo Theory (SMT) problem-solving procedure. CCBS
guarantees to return solutions that have minimal sum-of-costs, while SMT-CCBS
guarantees to return solutions that have minimal makespan. We implemented CCBS
and SMT-CCBS and evaluated them on grid-based MAPF problems and general
graphs (roadmaps). The results show that both algorithms can efficiently find
optimal solutions to non-trivial MAPF problems.

Keywords: Multi-Agent Pathfinding, Conflict-Based Search, Safe-Interval
Path Planning, SAT Modulo Theory, Heuristic Search

Preprint submitted to Artificial Intelligence Journal January 10, 2022

1. Introduction

MAPF is the problem of finding paths for multiple agents such that each
agent reaches its goal and the agents do not collide. MAPF has topical
applications in warehouse management [1], airport towing [2], autonomous
vehicles, robotics [3], and digital entertainment [4]. A common requirement in
MAPF applications is to minimize either the sum of costs (SOC) of the agents’
plans or their maximum, also known as their makespan. Finding MAPF solutions
with minimal SOC and finding MAPF solutions with minimal makespan are both
NP-hard problems [5, 6]. Nevertheless, AI researchers in the past years have
made substantial progress in finding solutions with minimal SOC or minimal
makespan for a growing number of MAPF problems, including problems with over
one hundred agents [7, 8, 9, 10, 11, 12, 6].
However, most prior work on solving MAPF optimally assumed that (1) time
is discretized into time steps, and (2) the duration of every action is one time
step. These simplifying assumptions limit the applicability of MAPF algorithms
in real-world applications. In this work, we address a more general type of MAPF
problem called MAPF_R [13].¹ In MAPF_R, agents occupy a defined area in a
metric space at any moment of a continuous timeline, and do not rely on any
time discretization.

¹ We discuss later the relation between Walker et al.’s [13] definition of MAPF_R and ours.
Existing algorithms such as E-ICTS [13] or ECBS-CT [14] partially address
the MAPF_R problem, in that they support move actions of non-uniform duration
(cost) and consider the area the agents occupy. However, these algorithms still
rely on discretizing time to define the duration of the wait actions. This can
have a negative effect on both solution quality and runtime. In this work, we
propose two optimal algorithms that solve MAPF_R problems and do not require
any time discretization, allowing wait actions with arbitrary durations.
The first algorithm we propose is called CCBS. CCBS builds on two existing
algorithms: SIPP [15] and CBS [7]. SIPP is a single-agent pathfinding algorithm
designed to find paths that avoid dynamic obstacles. CBS is a state-of-the-art
search-based MAPF algorithm with many extensions and improvements. CCBS
follows the same problem-solving framework as CBS, but uses a variant of SIPP
to plan paths for the individual agents, and imposes a novel type of constraint
to resolve conflicts between the resulting plans.
The second algorithm we propose is called SMT-CCBS. SMT-CCBS is a
SAT-based algorithm, that is, it solves a MAPF_R problem by compiling parts
of the problem to a Boolean Satisfiability (SAT) problem and applying a
state-of-the-art SAT solver to solve it. SAT-based approaches have been applied
successfully to classical MAPF and take advantage of the power of modern
SAT solvers. Technically, SMT-CCBS resolves conflicts between agents using
the same type of constraints as CCBS. However, it does so in the SAT Modulo
Theory (SMT) problem-solving framework used by the SMT-CBS algorithm [16]
to solve MAPF problems. Both CCBS and SMT-CCBS are guaranteed to (1)
only return solutions that are conflict-free, (2) find a solution if one exists, and (3)
return a solution that has a minimal SOC or makespan (for CCBS and
SMT-CCBS, respectively).
We implemented CCBS and SMT-CCBS, and evaluated them on standard
grid-based MAPF benchmarks as well as generated roadmaps. The results show
that both algorithms can solve MAPF problems with dozens of agents optimally.
While we are not the first to study MAPF beyond its basic setting [13, 17, 14],
to the best of our knowledge we are the first to propose MAPF algorithms that
can handle agents with volumes and non-uniform action durations while avoiding
any form of time discretization (see Section 7 for a more detailed discussion).
All our implementations are publicly available, so researchers can reproduce our
results and build on them.
A preliminary version of this work was published in two conference papers
[18, 19]. The main contribution of this journal paper is to present these
works in a unified and coherent manner. In addition, this journal paper extends
these works significantly, by providing:

• A formal proof of completeness and optimality for CCBS and SMT-CCBS.

• A detailed explanation of the algorithms with a running example to
illustrate their behavior.

• A detailed description of how one can implement collision and conflict
detection, and unsafe interval calculation.

• A comprehensive set of experiments, including experiments on roadmaps
and experimental comparison with E-ICTS [13] and ECBS-CT [14].

This paper is structured as follows. Section 2 provides relevant MAPF
definitions and background. Section 3 formally defines the MAPF_R problem
we address. Section 4 presents the CCBS algorithm and Section 5 presents
SMT-CCBS. Results of the empirical evaluation of both algorithms are described in
Section 6. Section 7 discusses the pros and cons of CCBS and SMT-CCBS
and how they relate to other algorithms for solving different variants of the
general MAPF problem. Section 8 concludes with a list of directions for future
work.

2. Background

A classical MAPF problem [20] with k agents is defined by a tuple ⟨G, S, G⟩
where G = (V, E) is an undirected graph, S : [1, . . . , k] → V maps an agent to a
start vertex, and G : [1, . . . , k] → V maps an agent to a goal vertex (see Fig. 1).

Figure 1: An example of a classical MAPF problem instance on a 4-neighborhood grid.

Time is discretized into time steps. For every time step t, each agent occupies
one of the graph vertices, referred to as the location of that agent at time t. An
action in classical MAPF is a function a : V → V such that a(v) = v′ means
the agent’s current location is v and its location in the next time step is v′. In
every time step each agent can choose to perform an action. There are two types
of actions: a wait action, in which the agent stays in its location, and a move
action, in which the agent moves to one of the vertices adjacent to its current
location.
A sequence of actions π_i = (a_1, . . . , a_n) is a single-agent plan for agent i if
a_n(a_{n−1}(· · · (a_1(S(i))) · · · )) = G(i), i.e., if it leads the agent from its start to
its goal. The number of actions in the plan defines its cost, denoted as c(π_i). A
solution to a classical MAPF problem is a set of k single-agent plans, one for
each agent: Π = {π_1, . . . , π_k}.
A solution to a classical MAPF problem is valid if its constituent single-agent
plans do not conflict. There are multiple types of conflicts. A vertex
conflict between single-agent plan π_i for agent i and single-agent plan π_j for
agent j occurs at location v and time step t iff according to these plans agents
i and j plan to occupy v at the same time step t. We represent such a conflict
by the tuple ⟨i, j, v, t⟩. A swapping conflict between single-agent plans π_i and
π_j occurs on edge e at time step t iff according to π_i and π_j both agents plan
to traverse the edge e ∈ E from opposite directions at the same time step t. We
represent such a conflict by the tuple ⟨i, j, e, t⟩. Figure 2 illustrates these
conflicts as well as other types of conflict that arise in classical MAPF. See
Stern et al. [20] for a deeper and more formal discussion of these conflicts.

Figure 2: An illustration from [20] of an edge conflict (a), a vertex conflict (b), a following
conflict (c), a cycle conflict (d), and a swapping conflict (e).
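The two conflict types defined above can be detected by a direct scan over time steps. The following is a minimal sketch rather than the procedure of any particular solver. It assumes, for illustration only, that a single-agent plan is stored as the sequence of vertices it visits (so `paths[i][t]` is the location of agent i at time step t) and that an agent remains at its goal after its plan ends; the names `location_at` and `find_conflicts` are hypothetical.

```python
from itertools import combinations


def location_at(path, t):
    """Vertex occupied at time step t (assumption: the agent rests at its last vertex)."""
    return path[min(t, len(path) - 1)]


def find_conflicts(paths):
    """Return vertex conflicts (i, j, v, t) and swapping conflicts (i, j, e, t).

    For a swapping conflict, t is the time step at which both moves start and
    e is reported as the edge in the direction agent i traverses it.
    """
    conflicts = []
    horizon = max(len(p) for p in paths)
    for i, j in combinations(range(len(paths)), 2):
        for t in range(horizon):
            vi, vj = location_at(paths[i], t), location_at(paths[j], t)
            if vi == vj:
                conflicts.append(("vertex", i, j, vi, t))
            ni, nj = location_at(paths[i], t + 1), location_at(paths[j], t + 1)
            # Both agents traverse the same edge in opposite directions.
            if vi != ni and vi == nj and vj == ni:
                conflicts.append(("swap", i, j, (vi, ni), t))
    return conflicts
```

For instance, two agents that cross a three-vertex line in opposite directions meet in the middle vertex at time step 1, while two agents on adjacent vertices that exchange places produce a swapping conflict at time step 0.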
In classical MAPF, every action incurs a unit cost, and the cost of a
single-agent plan is the number of its constituent actions. Sum-of-costs (SOC) and
makespan are two common ways to define the cost of a MAPF solution. The
former, also known as the flowtime, is the summation over the costs of the
constituent single-agent plans, Σ_{i=1}^{k} c(π_i), and the latter is their max, max_{π∈Π} c(π).
A solution to a MAPF problem is called SOC-optimal if it is valid and there
is no valid solution that has a smaller SOC. A makespan-optimal solution is
defined in a similar way. In general, the problem of finding SOC-optimal or
makespan-optimal solutions for a given MAPF problem is known to be NP-hard
[21, 5].
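As a small illustration of the two objectives, the sketch below computes SOC and makespan for a solution. It assumes (for illustration only) that each single-agent plan is given as its visited vertex sequence, so a plan with n actions has n + 1 entries; the function names are hypothetical.

```python
def plan_cost(path):
    """c(pi): number of unit-cost actions, i.e. visited vertices minus one."""
    return len(path) - 1


def sum_of_costs(paths):
    """SOC (flowtime): sum of the single-agent plan costs."""
    return sum(plan_cost(p) for p in paths)


def makespan(paths):
    """Makespan: cost of the most expensive single-agent plan."""
    return max(plan_cost(p) for p in paths)


# Two agents: the first performs 2 actions, the second 3 (including one wait).
solution = [["A", "B", "C"], ["D", "D", "E", "F"]]
```

Here `sum_of_costs(solution)` is 5 and `makespan(solution)` is 3, which shows that the two objectives can rank solutions differently.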

2.1. Optimal Algorithms for Classical MAPF

Several approaches have been proposed to find either SOC-optimal or
makespan-optimal solutions for classical MAPF (e.g., see the survey by Felner et al. [22]).
In this work, we build on three specific state-of-the-art classical MAPF solvers:
CBS [7], MDD-SAT [23], and SMT-CBS [16]. For completeness, we provide a
brief description of these algorithms here.

2.1.1. Conflict Based Search (CBS) for Finding SOC-Optimal Solutions
CBS [7] is a solution complete algorithm for classical MAPF that is guaranteed
to return a SOC-optimal solution.² It solves a given MAPF problem by
finding a plan for each agent separately, detecting conflicts between these plans,
and resolving them by replanning for the individual agents subject to specific
constraints.
The basic CBS implementation considers two types of conflicts: vertex
conflicts and swapping conflicts. Correspondingly, the basic CBS implementation
considers two types of constraints. A CBS vertex-constraint is defined by a tuple
⟨i, v, t⟩ and means that agent i is prohibited from occupying vertex v at t. A CBS
edge-constraint is defined similarly by a tuple ⟨i, e, t⟩, where e ∈ E. To
guarantee solution completeness and optimality, CBS runs two search algorithms:
a low-level search algorithm that finds plans for individual agents subject to a
given set of constraints, and a high-level search algorithm that chooses which
constraints to add.

² A solution complete algorithm is an algorithm that is guaranteed to find a solution if a
solution exists. However, a solution complete algorithm may not detect unsolvability. That is,
given an unsolvable problem, a solution complete algorithm may run indefinitely [24].

CBS: Low-Level Search. The task of the low-level search in CBS is to find an
optimal plan for an agent that is consistent with a given set of CBS constraints.
Any single-agent pathfinding algorithm that can do this can be used as the
CBS low-level search. To adapt single-agent pathfinding algorithms, such as
A*, to consider CBS constraints, the search space must also consider the time
dimension since a CBS constraint ⟨i, v, t⟩ blocks location v only at a specific
time step t. This means that a state in this single-agent search space is a pair
(v, t), representing that the agent is in location v at time step t. Expanding
such a state generates states of the form (v′, t + 1), where v′ is either equal to v,
representing a wait action, or equal to one of the locations adjacent to v. States
generated by actions that violate the given set of CBS constraints are pruned.
Running A* on this search space returns the lowest-cost plan to the agent’s
goal that is consistent with the given set of CBS constraints, as required. This
adaptation of textbook A* is very simple, and indeed most papers on CBS do
not report it and just mention that the low-level search of CBS is A*.
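Since this adaptation is rarely spelled out, a minimal sketch of A* over (v, t) states may be useful. It is an illustration under stated assumptions, not the implementation used in any CBS paper: the graph is a hypothetical adjacency dictionary, the constraints of the planning agent are a set containing `(v, t)` vertex constraints and `((u, v), t)` edge constraints (forbidding the move from u to v that starts at step t), `max_t` is an arbitrary cutoff added so the sketch always terminates, and the goal test additionally requires that no later vertex constraint blocks the goal.

```python
import heapq
from itertools import count


def low_level_search(neighbors, start, goal, constraints,
                     heuristic=lambda v: 0, max_t=100):
    """A* over (vertex, time step) states that respects CBS constraints."""
    # The agent may only finish at the goal once no later constraint blocks it.
    last_blocked = max((t for x, t in constraints if x == goal), default=-1)
    tie = count()  # tie-breaker so the heap never compares vertices
    open_list = [(heuristic(start), next(tie), start, 0, None)]
    parents = {}   # closed list: state -> parent state
    while open_list:
        _, _, v, t, parent = heapq.heappop(open_list)
        if (v, t) in parents:
            continue
        parents[(v, t)] = parent
        if v == goal and t > last_blocked:
            path, state = [], (v, t)
            while state is not None:
                path.append(state[0])
                state = parents[state]
            return path[::-1]
        if t >= max_t:
            continue
        for u in [v, *neighbors[v]]:  # u == v is the wait action
            if (u, t + 1) in constraints or ((v, u), t) in constraints:
                continue  # prune states generated by constraint-violating actions
            if (u, t + 1) not in parents:
                f = t + 1 + heuristic(u)
                heapq.heappush(open_list, (f, next(tie), u, t + 1, (v, t)))
    return None


line = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
```

On the three-vertex line above, the unconstrained plan from A to C is A, B, C, while the vertex constraint `("B", 1)` forces one wait action at A before the agent proceeds.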

CBS: High-Level Search. The high-level search algorithm in CBS works on
a Constraint Tree (CT), which is a binary tree in which each node N is defined
by a pair (N.const, N.Π) where N.const represents a set of CBS constraints
imposed on the agents and N.Π is a MAPF solution that satisfies these CBS
constraints. A CT node N is generated by first setting its constraints (N.const)
and then computing N.Π by running the low-level solver, which finds a plan
for each agent subject to the constraints relevant to it in N.const. If N.Π does
not contain any conflict, then N.Π is a valid MAPF solution and N is regarded
as a goal node. Otherwise, one of the conflicts ⟨i, j, x, t⟩ (where x is either a vertex
or an edge) in N.Π is chosen and two new CT nodes, N_i and N_j, are generated.
Both nodes have the same set of constraints as N, plus a new constraint
added to resolve the conflict: N_i adds the constraint ⟨i, x, t⟩ and N_j adds the
constraint ⟨j, x, t⟩. CBS searches the CT in a best-first manner, expanding in
every iteration the CT node N with the lowest-cost joint plan. The search halts
when the CT node N chosen for expansion is conflict-free, i.e., when N.Π is a
valid solution.
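The high-level loop can be sketched compactly. The code below is a simplified illustration under several assumptions and is not the implementation evaluated in this paper: plans are vertex sequences, agents rest at their goals, the low level is a plain breadth-first search over (v, t) states (sufficient here because every action has unit cost), edge constraints are stored per direction of travel, and all function names are hypothetical.

```python
import heapq
from collections import deque
from itertools import combinations, count


def shortest_path(neighbors, start, goal, constraints, max_t=50):
    """Breadth-first search over (vertex, time) states; stand-in low-level solver."""
    last_blocked = max((t for x, t in constraints if x == goal), default=-1)
    parents, queue = {(start, 0): None}, deque([(start, 0)])
    while queue:
        v, t = queue.popleft()
        if v == goal and t > last_blocked:
            path, state = [], (v, t)
            while state is not None:
                path.append(state[0])
                state = parents[state]
            return path[::-1]
        if t >= max_t:
            continue
        for u in [v, *neighbors[v]]:
            nxt = (u, t + 1)
            if nxt in parents or nxt in constraints or ((v, u), t) in constraints:
                continue
            parents[nxt] = (v, t)
            queue.append(nxt)
    return None


def first_conflict(paths):
    """Return ((i, constraint_i), (j, constraint_j)) for the earliest conflict, or None."""
    def at(p, t):
        return p[min(t, len(p) - 1)]
    for t in range(max(len(p) for p in paths)):
        for i, j in combinations(range(len(paths)), 2):
            vi, vj = at(paths[i], t), at(paths[j], t)
            if vi == vj:  # vertex conflict
                return (i, (vi, t)), (j, (vj, t))
            ni, nj = at(paths[i], t + 1), at(paths[j], t + 1)
            if vi != ni and vi == nj and vj == ni:  # swapping conflict
                return (i, ((vi, ni), t)), (j, ((vj, nj), t))
    return None


def cbs(neighbors, starts, goals):
    """Best-first search over the Constraint Tree, ordered by sum of costs."""
    def soc(paths):
        return sum(len(p) - 1 for p in paths)
    constraints = [frozenset() for _ in starts]
    paths = [shortest_path(neighbors, s, g, c)
             for s, g, c in zip(starts, goals, constraints)]
    if any(p is None for p in paths):
        return None
    tie = count()
    open_list = [(soc(paths), next(tie), constraints, paths)]
    while open_list:
        _, _, constraints, paths = heapq.heappop(open_list)
        conflict = first_conflict(paths)
        if conflict is None:
            return paths  # goal CT node: N.Pi is a valid solution
        for agent, constraint in conflict:  # generate the two child CT nodes
            child_constraints = list(constraints)
            child_constraints[agent] = constraints[agent] | {constraint}
            new_path = shortest_path(neighbors, starts[agent], goals[agent],
                                     child_constraints[agent])
            if new_path is None:
                continue
            child_paths = list(paths)
            child_paths[agent] = new_path
            heapq.heappush(open_list, (soc(child_paths), next(tie),
                                       child_constraints, child_paths))
    return None


# A line A-B-C with a side vertex D attached to B; the agents must trade places.
graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
plans = cbs(graph, ["A", "C"], ["C", "A"])
```

In this instance one agent has to detour through D while the other waits, so the returned solution has a sum of costs of 7 even though each agent alone could reach its goal in two moves.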

2.1.2. MDD-SAT for Finding Makespan-Optimal Solutions


A Boolean Satisfiability (SAT) problem is defined by a set of Boolean
variables {v_1, . . . , v_n} and a Boolean formula Φ defined over these variables. A
solution to a SAT problem is an assignment to these variables such that Φ is
satisfied, or UNSAT if there is no assignment of these variables that satisfies
Φ. Modern SAT solvers are extremely efficient and can scale to SAT problems
with over a million variables [25, 26]. Thus, a successful approach for solving
many combinatorial search problems is to compile them to a SAT problem and
solve it with a state-of-the-art SAT solver. This approach has also been applied
successfully to classical MAPF [23, 27].

The SAT-based approach for finding makespan-optimal solutions works by
solving a sequence of SAT problems. Each of these SAT problems represents
the decision problem: “is there a valid solution to the given classical MAPF
problem within a makespan of at most µ?” where µ is a parameter. We call this
decision problem the bounded-cost MAPF problem.
Each bounded-cost MAPF problem is encoded as a SAT problem by defining
two types of Boolean variables. The first type, denoted X_v^t(i), is defined for every
discrete time step t ∈ {0, . . . , µ}, agent i, and vertex v the agent may occupy at
time step t. The second type of variables, denoted E_{u,v}^t(i), is defined for every
time step t, agent i, and edge (u, v) the agent may start traversing at time step t.
MAPF movement rules and collision avoidance constraints are encoded on top
of these variables as simple local constraints. The resulting Boolean formula is
given to a SAT solver, which returns either a satisfying assignment or UNSAT.
A satisfying assignment to the decision variables specifies a valid solution for
the given MAPF problem. An UNSAT indicates no valid solution exists with
makespan smaller than or equal to µ. In this case, µ is increased by one. This
process continues until the minimal µ for which a solution exists is found. The
solution found for this µ is guaranteed to be makespan-optimal. The SATPlan
algorithm [28, 29] followed a similar approach for classical planning. While this
SAT-based approach for classical MAPF returns makespan-optimal solutions, it is
also possible, with some additional bookkeeping, to use it to return SOC-optimal
solutions [23].
As expected, the size of the generated Boolean formulas has a great impact
on the overall runtime. To reduce this size, Surynek et al. [28] proposed to
introduce the X_v^t(i) and E_{u,v}^t(i) variables only for reachable vertices and edges
at a given time step t. This reachability analysis can be done by constructing a
Multi-Value Decision Diagram (MDD) [30] for each agent representing all
single-agent plans for that agent up to the makespan bound µ. Each of these MDDs is
a directed acyclic graph with a single source. A node in these MDDs represents
a vertex and time pair (v, t) the agent may occupy in a single-agent plan that
reaches the goal before the makespan bound. Edges represent moves in such a
single-agent plan. The construction of Φ then relies on these MDDs rather than
on the original graph, i.e., it includes only variables that have corresponding
vertices and edges in an MDD. This algorithm is known as MDD-SAT.
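The reachability analysis behind an MDD can be sketched with two breadth-first searches. The code below is an illustrative sketch, not the construction used in MDD-SAT: it assumes an undirected graph given as a hypothetical adjacency dictionary and unit-cost move and wait actions, so that (v, t) belongs to the MDD exactly when v is at most t steps from the start and at most µ − t steps from the goal.

```python
from collections import deque


def bfs_distances(neighbors, source):
    """Hop distances from source to every reachable vertex."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        v = queue.popleft()
        for u in neighbors[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def build_mdd(neighbors, start, goal, mu):
    """Return (layers, edges): layers[t] holds the vertices usable at time step t."""
    from_start = bfs_distances(neighbors, start)
    to_goal = bfs_distances(neighbors, goal)  # valid because the graph is undirected
    layers = [
        {v for v in neighbors
         if v in from_start and v in to_goal
         and from_start[v] <= t and to_goal[v] <= mu - t}
        for t in range(mu + 1)
    ]
    # An MDD edge is a wait (u == v) or a move between consecutive layers.
    edges = [
        {(u, v) for u in layers[t] for v in [u, *neighbors[u]] if v in layers[t + 1]}
        for t in range(mu)
    ]
    return layers, edges


line = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
```

With µ = 2 the MDD from A to C on this line contains only the direct plan, while µ = 3 adds the nodes that allow a single wait. Under this sketch, variables X_v^t(i) would be created only for the pairs (v, t) that appear in the layers of agent i.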

2.1.3. An SMT Approach to Classical MAPF


SMT-CBS [16] is a recently introduced MAPF algorithm that integrates
ideas from CBS and SAT-based MAPF algorithms. Like SAT-based MAPF
algorithms, SMT-CBS finds an optimal solution to the given MAPF problem by
solving a sequence of bounded-cost MAPF problems. However, SMT-CBS solves
each of these bounded-cost problems using a SAT Modulo Theory (SMT) [31,
32, 33] problem-solving framework. For completeness, we provide a brief
background on SMT.
SMT is a problem-solving framework designed to leverage the power of modern
SAT solvers while applying them to a larger set of problems. The basic use
of SMT divides a given decision problem Γ into two parts. The first, called the
Propositional Skeleton (PS), is an abstraction of Γ that keeps only its Boolean
structure. The second, called DECIDE_T, is a decision procedure that accepts
an assignment that satisfies the PS and outputs true if this assignment
corresponds to a solution of the original problem Γ. If DECIDE_T returns false, i.e.,
if it is given a solution to the PS that cannot be mapped to a solution for Γ,
then DECIDE_T also returns a conflict (often called a lemma) that explains why
the solution to the PS is not valid.
The standard SMT solving procedure is iterative. First, find a satisfying
assignment of the PS. Then, call DECIDE_T with this assignment. If DECIDE_T
returns true, the satisfying assignment is returned and the SMT solving
procedure is finished. Otherwise, the PS is extended with new constraints designed to
resolve the conflict returned by DECIDE_T. In more general cases, not only new
constraints are added to resolve a conflict but also new propositional variables.
Appendix A lists a simple example of this problem-solving procedure.
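The iterative procedure can be sketched in a few lines. The example below is a toy illustration and is unrelated to the example in Appendix A: the brute-force `solve_sat`, the clause encoding (a clause is a list of non-zero integers, +i for variable i and −i for its negation), and the weight-based `decide` theory are all hypothetical stand-ins for a real SAT solver and a real DECIDE_T procedure.

```python
from itertools import product


def solve_sat(num_vars, clauses):
    """Brute-force SAT check; returns a satisfying tuple of booleans or None."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None


def smt_solve(num_vars, skeleton, decide):
    """Lazy SMT loop: solve the PS, validate with decide, add returned lemmas."""
    clauses = list(skeleton)
    while True:
        assignment = solve_sat(num_vars, clauses)
        if assignment is None:
            return None            # the extended PS is UNSAT
        lemma = decide(assignment)
        if lemma is None:
            return assignment      # decide accepted the assignment
        clauses.append(lemma)      # block the detected conflict


# Toy theory: the selected items must weigh at most 8 in total.
weights = [3, 4, 5]


def decide(assignment):
    chosen = [i for i, on in enumerate(assignment) if on]
    if sum(weights[i] for i in chosen) <= 8:
        return None
    # Lemma: the chosen items cannot all be selected together.
    return [-(i + 1) for i in chosen]


# Skeleton: at least two of the three items are selected.
skeleton = [[1, 2], [1, 3], [2, 3]]
result = smt_solve(3, skeleton, decide)
```

In this run the first PS solution selects items 2 and 3, which the toy theory rejects with the lemma (¬x_2 ∨ ¬x_3); the next PS solution selects items 1 and 3 and is accepted.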
SMT-CBS follows the same SMT solving procedure for solving bounded-cost
MAPF. The PS initially created in SMT-CBS is the same as the SAT problem
created by SAT-based MAPF algorithms, except that it does not encode
constraints to avoid conflicts between the agents’ plans. Instead, these conflicts
are detected after a solution to this PS is found, in a MAPF-specific DECIDE_T
procedure that we denote by DECIDE_MAPF. That is, whenever the SAT solver
outputs a solution, this solution is checked for conflicts. If no conflicts exist, the
solution is returned. Otherwise, additional constraints are added to all
subsequent PSs to ensure that every previously detected conflict will not occur again.
These constraints are exactly the constraints CBS would impose to resolve the
found conflicts. Theoretically, SMT-CBS can converge to the same formula as
MDD-SAT but it often finishes with a smaller formula due to the iterative/lazy
construction. A similar approach has been used in the Lazy CBS algorithm [34].
SMT-CBS showed impressive performance on standard MAPF benchmarks.

2.2. Limitations of Classical MAPF and Prior Work on General MAPF


Moving a group of mobile robots safely in a shared environment is commonly
considered a primary application of MAPF research. An important
question is therefore how the definition of a classical MAPF problem relates to
real-world MAPF applications in robotics. In particular, consider the following
intrinsic simplifying assumptions of classical MAPF and their implications in
robotics.

• A1. The duration of every move action is one time step. This
means that either all agents move at exactly the same speed and graph edges
represent transitions of the same length, or that graph edges represent
transitions of different lengths and the agents adapt their velocity and
acceleration profile so that all moves take one time step.

• A2. The duration of a wait action is one time step. This means
an agent may wait any discrete number of time steps, as opposed to any
real-valued duration.

In this work, we lift these assumptions. Since we are not the first to do
this, we first discuss prior works and the relation between them. Walker et
al. [13] introduced the MAPF_R problem, which lifts the first classical MAPF
assumption (A1). In MAPF_R, every edge e = (v, v′) in the underlying graph G
is associated with a positive weight w(e) ∈ R_{>0} that represents the duration it
takes an agent to move from v to v′. Every location v ∈ V is associated with a
unique coordinate in a metric space, denoted coord(v). Agents occupy non-zero
volume in this space. When the location of an agent is v, it means that the
reference point of the agent is located at coord(v). When an agent moves along
an edge (v, v′), it means it moves along a straight line from coord(v) to coord(v′)
at a constant velocity. There is a conflict between two single-agent plans
iff the volumes of two or more agents “overlap at the same instant in time” [13].
This can be detected using standard collision-detection techniques [35, 36].
The original definition of MAPF_R only states that action durations are
non-uniform, and does not differentiate between move actions and wait actions.
However, the algorithm Walker et al. proposed in that paper – Extended
Increasing Cost Tree Search (E-ICTS) – relies on wait actions having a fixed,
pre-determined duration. So, while E-ICTS does not lift A2, the definition of
MAPF_R there is ambiguous.
Cohen et al. [14] proposed an extension of classical MAPF called multi-agent
motion planning (MAMP). In MAMP, each agent is associated with a
graph G_i = (V_i, E_i). A vertex in V_i represents a possible state of agent i, where
a state of an agent represents its location as well as other relevant features
such as orientation and steering angle. An edge e = (v, v′) ∈ E_i represents
a kinodynamically feasible motion of agent i from state v to state v′, and the
weight of an edge is the duration of performing this motion. The agents in
MAMP move in an environment represented by a set of grid cells C. Every
state v ∈ V_i of an agent i is associated with a set of cells in C, representing the
cells occupied by that agent when in state v. Every edge e = (v, v′) ∈ E_i is
associated with a multiset of cells in C. Each cell in this multiset is associated
with a time interval indicating the time interval in which this cell is occupied
when agent i moves from v to v′. Like E-ICTS, ECBS-CT lifts A2 only partially,
as the duration of the wait actions is tied to the given time discretization (which
is done by dividing time into timesteps of a given duration).³

2.3. SIPP

The MAPF_R algorithms we propose in this paper are based on the SIPP
algorithm. SIPP [15] is a powerful algorithm for building a plan for a single
agent moving among static and dynamic obstacles. It has also been used
within prioritized MAPF solvers [38] and for solving multi-agent pickup and
delivery (MAPD) problems [39]. SIPP accepts as input a graph, start and
goal vertices in that graph, and trajectories specifying the motion of the dynamic
obstacles over time. The algorithm pre-processes these trajectories to compute
safe intervals for each vertex in the graph. A safe interval is a contiguous
period of time for a vertex, during which if the agent occupies that vertex then
it will not collide with any dynamic obstacle. Safe intervals are assumed to be
maximal, i.e., extending a safe interval is not possible.
For example, consider a case where only one dynamic obstacle is present and
both the agent and the obstacle are disks of radius r. Now, consider a vertex
v such that the distance between this obstacle and v is less than 2r between
time moments t_1 and t_2. The corresponding safe intervals for v are [0, t_1] and
[t_2, +∞). Note that in this example, we assume that (1) a collision happens
only when the distance between two disks is less than the sum of their radii
(when the distance is equal to it, no collision happens); and (2) the collision does
not occur at the specific points t_1 and t_2.
A vertex may have multiple, non-overlapping, safe intervals. In general, the
number of safe intervals is proportional to the number of obstacles that pass
near the vertex. The chronologically last safe interval for a vertex might not
end with ∞. For example, an obstacle may come to that vertex and stay in
it. The set of safe intervals of a vertex might also be empty (∅), e.g., in cases
where an obstacle constantly moves back-and-forth in the vicinity of the vertex.
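Given the time intervals during which a vertex is unsafe, its safe intervals are the maximal gaps between them. The sketch below illustrates this complement computation; it is not the pre-processing routine of SIPP itself. It assumes the unsafe intervals have already been derived from the obstacle trajectories, treats the endpoints of an unsafe interval as safe (matching the convention of the example above), and drops degenerate zero-length gaps; the function name is hypothetical.

```python
import math


def safe_intervals(unsafe, horizon=math.inf):
    """Return the maximal safe intervals of a vertex within [0, horizon].

    unsafe: list of (t1, t2) pairs; a collision occurs strictly between t1 and t2.
    """
    safe, cursor = [], 0.0
    for start, end in sorted(unsafe):
        if start > cursor:
            safe.append((cursor, start))   # gap before this unsafe interval
        cursor = max(cursor, end)          # merges overlapping unsafe intervals
    if cursor < horizon:
        safe.append((cursor, horizon))     # the chronologically last safe interval
    return safe
```

A single obstacle passing between times 2 and 5 yields the safe intervals [0, 2] and [5, +∞), whereas an obstacle that arrives at time 0 and never leaves yields no safe interval at all.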
SIPP performs an A*-like search over the search space in which each node
is a pair of a graph vertex and one of its safe intervals. This means there may be
multiple search nodes for the same vertex but with different safe time intervals.
In the example above, nodes n_1 = (v, [0, t_1]) and n_2 = (v, [t_2, +∞)) correspond
to the same vertex v, but have different safe intervals. When expanding a node
n = (v, T_v), SIPP generates a node for every pair (v′, T_{v′}) where v′ is a neighbor
of v and T_{v′} is a safe interval of v′ in which the agent can safely arrive starting at
a time in T_v. SIPP also considers collisions with obstacles that occur when
the agent moves along an edge. This is done when computing the time it takes
to traverse an edge. When there exists an expected collision on some edge, then
the time it takes to traverse that edge at that time will include staying in the
vertex until it is possible to move along that edge safely (i.e., without colliding).
Guided by a consistent heuristic, SIPP is guaranteed to return optimal solutions.
Sub-optimal and anytime variations of the algorithm are also known [40, 41].

³ These details are listed in Cohen’s Ph.D. thesis [37].

3. Problem Statement

We define a MAPF_R problem by the tuple ⟨G, M, S, G, coord, A⟩ where G =
(V, E) is a graph, M is a metric space, S and G are the start and goal functions,
coord maps every vertex in G to a coordinate in M, and A is a finite set of
possible move actions.
Every action a ∈ A in MAPF_R is defined by a duration a_D and a motion
function a_ϕ. A motion function a_ϕ is a function a_ϕ : [0, a_D] → M that maps
time to metric space. Here a_ϕ(t) is the coordinate of an agent (in M) at
time t while executing action a. Observe that the definition of these motion
functions allows us to model agents that move with a non-constant speed and
follow an arbitrary geometric curve.
There are two types of actions in MAPF_R: move actions and wait actions.
For a move action a ∈ A, we restrict a_ϕ so that it starts in some vertex v, ends
in some other vertex v′, and (v, v′) is an edge in E. That is, there exist v
and v′ such that a_ϕ(0) = coord(v), a_ϕ(a_D) = coord(v′), and (v, v′) ∈ E. We
denote these vertices by from(a) and to(a). Note that by assuming a finite set
of move actions, we limit the completeness and optimality of the algorithms we
propose to be only with respect to the chosen set of move actions. We say
that an algorithm is complete with respect to a set of move actions A if it is
guaranteed to return a valid solution if one exists in which all move actions the
agents perform are from A. Similarly, we say that an algorithm is optimal with
respect to a set of move actions A if the solution it returns is guaranteed to
have the lowest cost among all other valid solutions in which all move actions
the agents perform are from A. Solutions that are optimal or even valid with
respect to one set of move actions can be non-optimal or even invalid under a
different set of move actions.
A wait action a is an action for which there exists a vertex v ∈ V such that
for every t ∈ [0, aD ] we have that aϕ (t) = coord(v). For completeness, we define
from(a) and to(a) for wait actions to be this vertex. Note that while the set of
move actions is given as input (A), the set of wait actions is implicitly defined
365 for every vertex v ∈ V and any positive real number aD . Thus, the set of wait
actions is infinitely large.
In MAPFR , when an agent is at a vertex v it can choose to perform any action
— move or wait — that starts at v, i.e., from(a) = v. A collision between the
agents occurs if their shapes overlap. To detect such an overlap, we assume a
collision-detection method IsCollision : {1, . . . , k} × {1, . . . , k} × M × M →
{true, false} is available where IsCollision(i, j, mi , mj )=true iff when agents i
and j occupy locations mi and mj , respectively, then their shapes overlap. For
example, if the agents are 2D disk-shaped with a radius of r, then a collision
occurs if the distance between the centers of the agents is less than 2r. That is,
in this setting,

\[
\mathrm{IsCollision}(i, j, m_i, m_j) =
\begin{cases}
\text{true} & \text{if } \|m_i - m_j\|_2 < 2r \\
\text{false} & \text{otherwise}
\end{cases}
\tag{1}
\]

Note that our problem definition and the algorithms we propose later are not
restricted to disk-shaped agents and this particular type of IsCollision imple-
mentation.
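As an illustration of Equation 1, the following minimal Python sketch checks whether two equal-radius disks overlap. The function name, the argument order, and the choice to pass the radius explicitly are assumptions made for this example only.

```python
import math


def is_collision(m_i, m_j, r):
    """Equation 1 for two disks of equal radius r centred at m_i and m_j."""
    # The disks overlap iff their centres are strictly closer than 2r.
    return math.dist(m_i, m_j) < 2 * r
```

Note that two disks whose centres are exactly 2r apart only touch, so under the strict inequality of Equation 1 they are not reported as colliding.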

For a sequence of actions π = (a1 , . . . , an ), we denote by π[: j] the prefix of
the first j actions, i.e., π[: j] = (a1 , . . . aj ). The duration and motion function
of π, denoted πD and πϕ , respectively, are defined as follows:
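\[
\pi_D = \sum_{a \in \pi} a_D
\tag{2}
\]

\[
\pi_\phi(t) =
\begin{cases}
a_{1\phi}(t) & t \le a_{1D} \\
\cdots & \cdots \\
a_{j\phi}\bigl(t - (\pi[:j-1])_D\bigr) & (\pi[:j-1])_D < t \le (\pi[:j])_D \\
\cdots & \cdots \\
a_{n\phi}\bigl(t - (\pi[:n-1])_D\bigr) & (\pi[:n-1])_D < t \le (\pi[:n])_D \\
a_{n\phi}(a_{nD}) & (\pi[:n])_D < t
\end{cases}
\tag{3}
\]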

To explain Equation 3, which computes the location at each moment in time
while executing π, observe that motion functions are not defined with respect
to when their respective actions are applied. In other words, motion functions
are defined with respect to relative time and not absolute time. For instance,
for any action a that moves the agent from v to v′, by definition aϕ(0) = coord(v)
and aϕ(aD) = coord(v′). Therefore, to compute πϕ(t) we first need to identify the
action planned to be executed at time t. This can be done by observing that
the ith action in π starts at time (π[: i − 1])D and ends at time (π[: i])D for
every i > 1. Then, we “correct” t by subtracting the starting time of that action,
to obtain the location of the agent during the execution of that action. The last
line in Equation 3 states that the agent is assumed to stay in its last location
after the plan ends.4
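The lookup described by Equation 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: a plan is represented as a list of (duration, motion) pairs, and `straight_line` is a hypothetical helper that builds a constant-speed motion function; neither name comes from the paper.

```python
from bisect import bisect_left
from itertools import accumulate


def straight_line(p, q, duration):
    """Hypothetical helper: constant-speed motion from p to q in relative time."""
    def motion(t):
        s = t / duration
        return (p[0] + s * (q[0] - p[0]), p[1] + s * (q[1] - p[1]))
    return motion


def plan_position(plan, t):
    """Evaluate the plan's motion function at absolute time t >= 0.

    plan is a non-empty list of (duration, motion) pairs, where motion maps
    relative time in [0, duration] to a point.
    """
    # ends[j] is the total duration of the first j + 1 actions.
    ends = list(accumulate(d for d, _ in plan))
    if t > ends[-1]:
        # Last case of Equation 3: the agent stays where the final action ended.
        last_duration, last_motion = plan[-1]
        return last_motion(last_duration)
    j = bisect_left(ends, t)               # first action whose end time is >= t
    start = ends[j - 1] if j > 0 else 0.0  # absolute start time of that action
    return plan[j][1](t - start)           # shift t to the action's relative time
```

For example, with two consecutive moves of duration 2 each, querying t = 2.8 evaluates the second move at relative time 0.8, and any query after t = 4 returns the final location.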
As in classical MAPF, we define a single-agent plan for an agent i to be a
sequence of actions πi = (a1, . . . , an) such that executing it moves agent i from
S(i) to G(i). A conflict between two single-agent plans is naturally defined as
the case where, if the agents execute their respective plans starting at the same
time, there exists a point in time at which a collision occurs.

4 In this work we assume that an agent does not disappear when it reaches its goal. Other
variants also exist [20, 42].

Figure 3: A MAPFR problem instance with 3 agents. (left) The dashed circles mark the goal
location of each agent. (right) The positions of the agents at time t = 2.8 when
following their individual plans. The red and blue agents are in collision.

Definition 1 (Conflict in MAPFR). Two single-agent plans πi and πj have a
conflict iff

\[
\exists t \in [0, \max(\pi_{iD}, \pi_{jD})] : \mathrm{IsCollision}\bigl(i, j, \pi_{i\phi}(t), \pi_{j\phi}(t)\bigr)
\tag{4}
\]

A solution to a MAPFR problem is valid if no two of its constituent single-agent
plans conflict. The cost of a single-agent plan is its duration. Like in classical
MAPF, the sum-of-costs (SOC) of a solution is the sum of the costs of its
constituent single-agent plans, and the makespan of a solution is the maximum
over these costs. Correspondingly, we define two problems: the problem of finding
a SOC-optimal solution to a MAPFR problem and the problem of finding a
makespan-optimal solution to a MAPFR problem. In this work, we consider both
problems and propose algorithms for solving them.

3.1. MAPFR Examples

To highlight the differences between MAPF and MAPFR, we provide here
two examples of MAPFR problems.
Example 1 (Figure 3). The graph vertices and edges are represented by small
circles with letters inside and the straight line segments between them,
respectively. Agents are shown as colored circles. Each agent is a disk with a
radius of 0.5. Initially, the green agent occupies vertex A, the red agent occupies
vertex E, and the blue agent occupies vertex G. Their respective goals are I (green
agent), J (red agent), and D (blue agent). There are 20 move actions in this
example, corresponding to 10 undirected graph edges. We assume the agents
move at a constant speed of one unit, start and stop instantaneously, and
move from one vertex to the other following the straight line segment connecting
them. Thus, the duration of every move action equals the distance between the
vertices that define it. E.g., the duration of the action “move
from A to B”, denoted A → B, is 2. The motion function that describes this
action is \(\overrightarrow{OA} + \frac{1}{\|\overrightarrow{AB}\|} \cdot \overrightarrow{AB} \cdot t\),
which is an analytical way to define a constant-velocity straight-line motion
between A and B. The least-cost individual
plans to the agents’ respective goals are: πgreen = {A → B, B → F, F → I}
for the green agent; πred = {E → F, F → I, I → J} for the red agent; and
πblue = {G → H, H → C, C → D} for the blue one. If the agents start executing
them simultaneously, then at t = 2.8 their positions will be (3, 4.2),
(3 + 0.4·√2, 3 − 0.4·√2), and (3.48, 1.64), respectively (see Figure 3 (right)). The
distance between the positions of the red and blue agents is less than the sum of
their radii; thus, they collide and their plans are in conflict.

Example 2 (Figure 4). In this MAPFR problem there are 3 disk-shaped agents
of size (diameter) √2/2 located in a 4-neighborhood 2×4 grid. Thus, the move
actions correspond to moving along the four cardinal directions. The duration of
every such move is 1 time unit. The start locations of the agents are illustrated
in the grid on the left. The goal locations of the agents are shown in the grid
on the top right. The 3 top-right grids show the solution found by a SOC-optimal
MAPF solver that assumes a minimal wait time of 1 time unit.
The 3 bottom-right grids show key steps in the solution found by a SOC-optimal
MAPFR solver. As can be seen, agents 2 and 3 reach their goals earlier in
the SOC-optimal MAPFR solution, and therefore its overall SOC is lower than
that of the SOC-optimal MAPF solution (8.414 vs. 9). This highlights the advantage
of allowing arbitrary wait durations. An animation of this example is given at
https://tinyurl.com/ccbs-vs-cbs2.

Figure 4: A MAPFR problem instance with 3 agents where non-unit wait actions are needed
to find a SOC-optimal solution.

The first example illustrates that in MAPFR agents may collide even if they
do not occupy the same edge or vertex. The second example illustrates that
allowing arbitrary wait times enables finding solutions with a lower cost. In
particular, consider applying ECBS-CT or E-ICTS to this example. Recall
that both algorithms require determining a priori the duration of wait actions.
However, since the underlying graph here is based on a simple 4-neighborhood
grid, it is not obvious how to determine the appropriate wait action duration so
as to obtain the SOC-optimal MAPFR solution shown in this example. The
algorithms we propose in this work are able to find this solution.
4. Conflict-Based Search with Continuous Time

In this section, we introduce CCBS – an algorithm that finds SOC-optimal
solutions to MAPFR. It is based on two algorithms: CBS [7] and SIPP [15].

4.1. From CBS to CCBS

CCBS follows the CBS framework. The main differences between CCBS and
CBS are:

• To detect conflicts, CCBS uses the given conflict detection mechanism
  InConflict, which, in turn, uses the given geometry-aware collision
  detection mechanism IsCollision.

• To resolve conflicts, CCBS uses a geometry-aware unsafe-interval detection
  mechanism.

• CCBS adds constraints over pairs of actions and time ranges, instead of
  location-time pairs.

• For the low-level search, CCBS uses a version of SIPP adapted to handle
  CCBS constraints.

Next, we explain these differences in detail.

4.1.1. Conflict Detection in CCBS


CCBS is designed for MAPFR, where agents can have any geometric shape,
agents’ actions can have any duration, and agents move continuously in time
following some motion function. Thus, conflicts can occur between agents
traversing different edges, as well as between agents where one is traversing an
edge and the other is waiting at a vertex [43, 17]. Also, an action a initiated at
some time t may conflict with an action a′ initiated at some other time t′, as long
as there is some overlap in their execution times, i.e., as long as
[t, t + aD] ∩ [t′, t′ + a′D] ≠ ∅.
To this end, we define a CCBS conflict with respect to a pair of timed actions.
A timed action is a pair (a, t) where a is an action and t is a point in time.
To execute a timed action (a, t) means to execute action a starting from time
t. A CCBS conflict is a tuple ⟨i, j, (ai, ti), (aj, tj)⟩, representing that if agent i
executes the timed action (ai, ti) and agent j executes the timed action (aj, tj)
then they will collide. Formally:

Definition 2 (CCBS Conflict). ⟨i, j, (ai, ti), (aj, tj)⟩ is a CCBS conflict,
denoted InConflict(i, j, (ai, ti), (aj, tj)), iff

\[
\exists t \in [t_i, t_i + a_{iD}] \cap [t_j, t_j + a_{jD}] :
\mathrm{IsCollision}\bigl(i, j, a_{i\phi}(t - t_i), a_{j\phi}(t - t_j)\bigr)
\tag{5}
\]

Whenever i and j are clear from the context, we omit them and use
IsCollision(mi, mj) and InConflict((ai, ti), (aj, tj)). Observe that a CCBS
conflict is agnostic to the absolute time at which the actions are performed, and
only considers their relative time, that is, for any constant ∆ > 0,

\[
\mathrm{InConflict}\bigl((a_i, t_i), (a_j, t_j)\bigr) \rightarrow
\mathrm{InConflict}\bigl((a_i, t_i + \Delta), (a_j, t_j + \Delta)\bigr)
\tag{6}
\]

The complexity of computing InConflict depends on the shapes of the
agents i and j and the motion functions of the actions ai and aj. For the setting
used in our experiments – disk-shaped agents moving at constant speed along
straight lines – we used a fast closed-form collision detection mechanism [35] that
runs in O(1). In general, computing InConflict for arbitrary agents’ shapes
and motion functions translates to the task of collision detection for
arbitrary-shaped moving objects, which is a non-trivial problem extensively
studied in computer graphics, computational geometry, and robotics [36].
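To illustrate why the disk-shaped, constant-velocity case admits an O(1) check, here is a minimal Python sketch. It is not the formula from [35]; it is a standard closest-approach computation written for this example, assuming both agents share the same radius r and that each timed action is described by a start point, a constant velocity, a start time, and a duration. The vertex coordinates in the usage comment are inferred from the positions reported in Example 1 and are an assumption.

```python
def in_conflict(p_i, v_i, t_i, d_i, p_j, v_j, t_j, d_j, r):
    """True iff two disks of radius r come closer than 2r while both actions run.

    Agent i starts at point p_i at time t_i and moves with constant velocity v_i
    for d_i time units (a wait action has zero velocity); likewise for agent j.
    """
    lo, hi = max(t_i, t_j), min(t_i + d_i, t_j + d_j)
    if lo > hi:
        return False  # the execution intervals do not overlap
    # Relative position at time lo and relative velocity.
    wx = (p_i[0] + v_i[0] * (lo - t_i)) - (p_j[0] + v_j[0] * (lo - t_j))
    wy = (p_i[1] + v_i[1] * (lo - t_i)) - (p_j[1] + v_j[1] * (lo - t_j))
    ux, uy = v_i[0] - v_j[0], v_i[1] - v_j[1]
    uu = ux * ux + uy * uy
    # The squared distance is a convex quadratic in time, so its minimum over
    # the overlap is attained at the vertex clamped to [0, hi - lo].
    s = 0.0 if uu == 0.0 else min(max(-(wx * ux + wy * uy) / uu, 0.0), hi - lo)
    dx, dy = wx + ux * s, wy + uy * s
    return dx * dx + dy * dy < (2 * r) ** 2


# Usage with coordinates inferred from Example 1 (assumption): F = (3, 3),
# I = (5, 1), H = (3, 1), C = (6, 5), unit speed, radius 0.5.
# in_conflict((3, 3), (2 ** -0.5, -(2 ** -0.5)), 2.0, 2 * 2 ** 0.5,
#             (3, 1), (0.6, 0.8), 2.0, 5.0, 0.5)
```

Because the function only depends on time differences, shifting both start times by the same ∆ leaves the result unchanged, which matches Equation 6.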
Any single-agent plan π = (a1, . . . , an) can also be viewed as a set of timed
actions {(a1, t1), . . . , (an, tn), (an+1, tn+1)}, where ti is the time at which ai is
planned to be executed according to π. t1 is equal to zero, and for all other
values i = 2, . . . , n it is the duration of the plan up to action i, that is,
ti = (π[: (i − 1)])D. The last timed action (an+1, tn+1) is a “dummy” wait action
whose duration is infinite. The purpose of this last timed action is to detect
conflicts between an agent that finished its plan and other agents. In CCBS, we
say that single-agent plans πi and πj for agents i and j have a CCBS conflict
if there exists a pair of timed actions (ai, ti) ∈ πi and (aj, tj) ∈ πj such that
InConflict((ai, ti), (aj, tj)) is true. It is straightforward to see that a pair
of single-agent plans has a conflict, as defined in Definition 1, if they have a
CCBS conflict.
In our running example from Figure 3, consider the following single-agent
plans for the green, red, and blue agents, represented as sequences of timed
actions: πgreen = {(A → B, 0), (B → F, 2), (F → I, 4)},
πred = {(E → F, 0), (F → I, 2), (I → J, 2 + 2√2)}, and
πblue = {(G → H, 0), (H → C, 2), (C → D, 7)}. The plans of the red and blue
agents do, indeed, conflict (as was shown in Figure 3 (right)). The CCBS conflict
here is ⟨red, blue, (F → I, 2), (H → C, 2)⟩.
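The conversion from a plan to timed actions is a running sum of durations. The following Python sketch illustrates it under assumptions: actions are represented as (label, duration) pairs, and the label "wait at goal" for the dummy final action is a placeholder chosen for this example.

```python
from itertools import accumulate


def timed_actions(plan):
    """Turn a list of (label, duration) pairs into (label, start_time) pairs.

    A final dummy wait action (assumed label "wait at goal") is appended; it
    starts when the plan ends and conceptually lasts forever.
    """
    # starts[k] is the total duration of the first k actions.
    starts = [0.0] + list(accumulate(d for _, d in plan))
    timed = [(label, starts[k]) for k, (label, _) in enumerate(plan)]
    timed.append(("wait at goal", starts[-1]))
    return timed
```

Applied to the green agent's plan with durations 2, 2, and 2√2 (the last value follows from the start times in the red agent's plan), this yields the start times 0, 2, and 4 listed above, plus a dummy wait starting at 4 + 2√2.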

4.1.2. Resolving Conflicts in CCBS


The high-level search in CCBS runs a best-first search, like CBS for classical
MAPF. In every iteration, a leaf node N in the CT is selected that has the
solution with the smallest cost. The InConflict function is used to check if
N.Π has a CCBS conflict. If no CCBS conflicts are found, N is declared a
goal node and the search halts. Otherwise, the high-level search expands N
by choosing one of the CCBS conflicts detected in N.Π. Let ⟨(ai, ti), (aj, tj)⟩
be the chosen CCBS conflict. CCBS resolves this conflict by generating two
new CT nodes, Ni and Nj. To compute the constraints to add to Ni and Nj,
CCBS computes for each timed action its unsafe interval w.r.t. the other timed
action. The unsafe interval of (ai, ti) w.r.t. (aj, tj) is the maximal contiguous
time interval starting from ti such that if agent i starts performing ai within it,
then it will conflict with the timed action (aj, tj). Formally:

Definition 3 (CCBS Unsafe Interval). [ti, t^u_i) is the unsafe interval for (ai, ti)
with respect to (aj, tj), where

\[
t^u_i = \operatorname*{argmin}_{t \in [t_i,\, t_j + a_{jD}]}
\bigl\{ \mathrm{InConflict}\bigl((a_i, t), (a_j, t_j)\bigr) = \text{false} \bigr\}
\tag{7}
\]

and [tj, t^u_j) is the unsafe interval for (aj, tj) with respect to (ai, ti), where

\[
t^u_j = \operatorname*{argmin}_{t \in [t_j,\, t_i + a_{iD}]}
\bigl\{ \mathrm{InConflict}\bigl((a_i, t_i), (a_j, t)\bigr) = \text{false} \bigr\}
\tag{8}
\]
If there is no time t ∈ [ti, tj + ajD] for which InConflict((ai, t), (aj, tj)) =
false, then t^u_i is not defined mathematically. For such cases, we set t^u_i to be
tj + ajD, indicating that ai must start after aj has already finished. Similarly,
t^u_j can be undefined when there is no t ∈ [tj, ti + aiD] for which
InConflict((ai, ti), (aj, t)) = false, and we set t^u_j to ti + aiD in such cases.
A constraint in CCBS is of the form ⟨i, ai, [ti, t^u_i)⟩, saying that agent i cannot
perform ai in the range [ti, t^u_i). When the agent’s index is clear from the
context, we omit it from the definition of the constraint, i.e., just write
⟨a, [t, t^u)⟩. Constraints over time intervals, also known as “range constraints”,
have already been introduced in the context of robust MAPF [44].
For a CCBS conflict ⟨(ai, ti), (aj, tj)⟩, CCBS adds to Ni the constraint
⟨i, ai, [ti, t^u_i)⟩ and adds to Nj the constraint ⟨j, aj, [tj, t^u_j)⟩. Section 4.3
discusses methods for computing the unsafe intervals.
For example, assume that we are running CCBS on the MAPFR problem
depicted in Figure 3. The high-level search expands a root CT node that
contains individual plans of the agents that were planned independently of each
other. As mentioned above, the plans of the red and blue agents conflict. This
CCBS conflict is ⟨red, blue, (F → I, 2), (H → C, 2)⟩. The unsafe interval of the
action (F → I, 2) w.r.t. action (H → C, 2) is [2.000, 3.743), i.e., the first moment
in time at which the red agent can safely start moving from F to I after time 2 is
3.743. The unsafe interval of the action (H → C, 2) w.r.t. action (F → I, 2) is
[2.000, 3.310).

4.1.3. SIPP for the Low-Level of CCBS


The low-level solver of CCBS is based on SIPP [15]. Originally, SIPP uses
the trajectories of the dynamic obstacles to compute the safe intervals of the
graph vertices. In our case, we do not have dynamic obstacles. Instead, the safe
intervals for every vertex are computed by reasoning over the CCBS constraints
passed to SIPP, as follows. A CCBS constraint ⟨i, ai, [ti, t^u_i)⟩ imposed over a
wait action ai translates to two safe intervals: one that ends at ti and another
that starts at t^u_i.
Consider the following example: two CCBS constraints imposed over the
wait actions associated with the same graph vertex v are passed to SIPP:
⟨i, a1, [t1, t^u_1)⟩ and ⟨i, a2, [t2, t^u_2)⟩ (t1 < t^u_1 < t2 < t^u_2). Then, the safe
intervals for v are the following: [0, t1], [t^u_1, t2], [t^u_2, +∞).5
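The translation from wait-action constraints to safe intervals can be sketched as follows. This Python example is an illustration only: the function name and the (start, end) tuple representation are assumptions, and merging overlapping or touching unsafe intervals is a choice made here for robustness rather than something the text prescribes.

```python
import math


def safe_intervals(unsafe):
    """Compute the safe intervals of one vertex from its unsafe intervals.

    unsafe is a list of (t, t_u) pairs, each standing for a constraint
    [t, t_u) on wait actions at that vertex. The result is a list of
    (start, end) pairs; the last one extends to infinity.
    """
    result = []
    start = 0.0
    for t, t_u in sorted(unsafe):
        if t > start:
            result.append((start, t))  # safe until the next unsafe interval begins
        start = max(start, t_u)        # overlapping unsafe intervals are merged
    result.append((start, math.inf))
    return result
```

For two disjoint constraints such as [1, 2) and [4, 5.5), this returns (0, 1), (2, 4), and (5.5, +∞), mirroring the three intervals in the example above.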
CCBS constraints imposed over move actions are incorporated into SIPP
in a different way. Let ⟨i, ai, [ti, t^u_i)⟩ be a CCBS constraint imposed over a
move action that is defined by the graph edge (v, v′), and consider a SIPP search
node associated with the vertex v. For this search node, we replace the action ai
with an action a′i that starts by waiting in v for duration t^u_i − ti before moving
to v′. Formally, a′i is defined by a duration a′iD = aiD + t^u_i − ti and a motion
function

\[
a'_{i\phi}(t) =
\begin{cases}
\mathrm{coord}(v) & t \le t^u_i \\
a_{i\phi}(t + t^u_i) & t > t^u_i
\end{cases}
\tag{9}
\]

We denote this version of SIPP, which accepts a set of CCBS constraints,
as Constrained Safe Interval Path Planning (CSIPP). CSIPP is similar to Soft
Conflict Interval Path Planning (SCIPP) by Cohen et al. [14], which is also a
variant of SIPP designed to be a low-level search for CBS. Nevertheless, CSIPP
and SCIPP differ in several fundamental aspects. First, SCIPP is designed for
the multi-agent motion planning (MAMP) problem addressed by Cohen et al.,
which is different from MAPFR as explained in Section 2.2. Second, SCIPP
considers constraints over points in time, while CSIPP considers constraints
over time intervals. Third, SCIPP is designed to consider soft constraints so as
to allow bounded-suboptimal solutions, while CSIPP is currently designed to
return optimal solutions.

5 Note that the safe intervals include the boundary time moments t1, t2. The rationale
is the following. CCBS constrains wait actions that have certain durations; that is,
no wait action that starts at t1 or t2 is possible. However, the agent can start moving at
t1 (or t2) and thus leave the vertex immediately after that time moment, without violating
the CCBS constraint.
Algorithm 1: CCBS pseudo-code.
Input: G = (V, E), S, G
 1  foreach agent i do
 2      πi ← A∗(G, S(i), G(i))
 3  N ← (∅, (π1, . . . , πk))
 4  Create Open; Add N to Open
 5  while Open is not empty do
 6      N ← pop N from Open such that cost(N.Π) = min over N′ ∈ Open of cost(N′.Π)
 7      if N.Π has no conflicts then
 8          return N.Π
 9      ⟨i, j, (ai, ti), (aj, tj)⟩ ← FindConflict(N.Π)
10      for l ∈ {i, j} do
11          [tl, t^u_l) ← compute unsafe interval for agent l
12          const ← N.const ∪ {⟨l, al, [tl, t^u_l)⟩}
13          π′l ← CSIPP(G, S(l), G(l), const)
14          Π′ ← (N.Π \ {N.πl}) ∪ {π′l}
15          Nl ← (const, Π′)
16          Add Nl to Open
4.1.4. CCBS Pseudo-code
Algorithm 1 lists the complete pseudo-code of CCBS. First, a simple A∗
search is used to find a single-agent plan for each agent ignoring all other agents
(line 2). Note that SIPP is not needed at this stage, because the optimal plan
for each agent at this stage includes only move actions. This set of single-agent
plans is used to create the root of the CT, which is added to the open list
(Open). Then, in every iteration of the algorithm, we extract the node N from
Open that represents a solution with a minimal cost, compared to solutions
in the other nodes in Open (line 6). If N has no CCBS conflicts, we return
N.Π. Otherwise, we choose one of the CCBS conflicts C = ⟨i, j, (ai, ti), (aj, tj)⟩
detected for N.Π. For each agent in the conflict, i.e., i and j, we compute
the unsafe interval for its action (line 11), and create a new CT node with the
corresponding constraint and a new single-agent plan for that agent. Note that
in the pseudo-code we use N.πl to refer to the single-agent plan of agent l in
N.Π. This CT node is added to Open, so that it may be chosen for expansion
in future iterations.
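The control flow of the high-level search can be sketched in Python as below. This is a schematic illustration, not the authors' C++ implementation: conflict detection, unsafe-interval computation, and the low-level planner are passed in as callables whose names and signatures are assumptions, and skipping a child when the low-level planner returns no plan is an added safeguard that Algorithm 1 does not spell out.

```python
import heapq
from itertools import count


def ccbs(root_plans, plan_cost, find_conflict, unsafe_interval, replan):
    """Best-first search over the constraint tree.

    root_plans: dict mapping each agent to its individually optimal plan.
    plan_cost(plan) -> float.
    find_conflict(plans) -> None or (i, j, (a_i, t_i), (a_j, t_j)).
    unsafe_interval(timed_action, other_timed_action) -> t_u.
    replan(agent, constraints) -> new plan for that agent, or None.
    """
    def soc(plans):
        return sum(plan_cost(p) for p in plans.values())

    tie = count()  # unique counter so the heap never compares plans directly
    open_list = [(soc(root_plans), next(tie), frozenset(), root_plans)]
    while open_list:
        _, _, constraints, plans = heapq.heappop(open_list)
        conflict = find_conflict(plans)
        if conflict is None:
            return plans  # goal node: the cheapest conflict-free solution
        i, j, timed_i, timed_j = conflict
        for agent, mine, other in ((i, timed_i, timed_j), (j, timed_j, timed_i)):
            action, t = mine
            t_u = unsafe_interval(mine, other)
            child_constraints = constraints | {(agent, action, t, t_u)}
            new_plan = replan(agent, child_constraints)
            if new_plan is None:
                continue  # assumption: drop children without a feasible plan
            child_plans = {**plans, agent: new_plan}
            heapq.heappush(
                open_list,
                (soc(child_plans), next(tie), child_constraints, child_plans),
            )
    return None
```

Passing the domain-specific parts as callables keeps the sketch independent of the agents' shapes and motion functions, in the same way that CCBS itself only relies on InConflict and the unsafe-interval mechanism.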

4.2. Theoretical Properties

Next, we prove that CCBS is sound, solution complete, and optimal. Our
analysis is based on the notion of a sound pair of constraints, established by
Atzmon et al. [44].

Definition 4 (Sound Pair of Constraints). For a given MAPFR problem, a pair
of constraints is sound iff in every optimal valid solution at least
one of these constraints holds.

Lemma 1. For any CCBS conflict ⟨(ai, ti), (aj, tj)⟩ and corresponding unsafe
intervals [ti, t^u_i) and [tj, t^u_j), the pair of CCBS constraints ⟨i, ai, [ti, t^u_i)⟩
and ⟨j, aj, [tj, t^u_j)⟩ is a sound pair of constraints.

Proof. By contradiction, assume that there exists a valid solution to the
corresponding MAPFR problem in which action ai is performed at time
ti + ∆i ∈ [ti, t^u_i) and action aj is performed at time tj + ∆j ∈ [tj, t^u_j).
This means that

\[
\mathrm{InConflict}\bigl((a_i, t_i + \Delta_i), (a_j, t_j + \Delta_j)\bigr) = \text{false}
\tag{10}
\]

Case #1: ∆i > ∆j. By definition, ti + ∆i − ∆j is in the unsafe interval
[ti, t^u_i) and therefore:

\[
\mathrm{InConflict}\bigl((a_i, t_i + \Delta_i - \Delta_j), (a_j, t_j)\bigr)
\tag{11}
\]

Due to Equation 6, this means InConflict((ai, ti + ∆i), (aj, tj + ∆j)),
contradicting Equation 10.

Case #2: ∆j ≥ ∆i. Similarly, tj + ∆j − ∆i is in the unsafe interval
[tj, t^u_j) in this case, and thus

\[
\mathrm{InConflict}\bigl((a_i, t_i), (a_j, t_j + \Delta_j - \Delta_i)\bigr)
\tag{12}
\]

Again, using Equation 6 results in InConflict((ai, ti + ∆i), (aj, tj + ∆j)), which
contradicts Equation 10.

Theorem 1. CCBS is sound, solution complete, and is guaranteed to return an
optimal solution if one exists.

Proof. Soundness follows from the fact that CCBS stops only when N.Π
has no conflicts. The proof of solution completeness and optimality is similar to
the completeness and optimality proof for k-robust CBS [44], as follows.
For a CT node N, let π(N) be all valid MAPFR solutions that satisfy
N.constraints, and let N1 and N2 be the children of N. For any N that
is not a goal node, the following two conditions hold.

1. π(N) = π(N1) ∪ π(N2)
2. cost(N.Π) ≤ min(cost(N1.Π), cost(N2.Π))

The first condition holds because N1 and N2 are constrained by a sound pair
of constraints (Lemma 1 and Definition 4). The second condition holds because
N.Π is by construction the lowest-cost solution that satisfies the constraints in
N, and the constraints in N1 and N2 are supersets of N.constraints.
Now let N be the root of the CT. π(N) is the set of all valid solutions,
and thus any valid solution will be either in π(N1) or π(N2). Applying this
reasoning recursively yields that every valid solution is always reachable via one
of the un-expanded CT nodes. By performing a best-first search over the CT,
exploring CT nodes with minimal cost first, CCBS is guaranteed to find an
optimal MAPFR solution.

CCBS, like CBS, is solution-complete but it is not complete. That is, if a
valid solution exists, CCBS will find it, but if a valid solution does not exist then
CCBS will not detect this. In an unsolvable MAPFR problem, CCBS will add
more and more constraints indefinitely, eventually generating single-agent plans
in which the agents go back and forth between locations they already occupied.
In classical MAPF this can be partially overcome by computing an upper bound
on the solution cost of a given MAPF problem and pruning CT nodes whose
cost exceeds this bound. Obtaining an upper bound on the solution cost of a
given MAPF problem can be done in polynomial time using Kornhauser et al.’s
algorithm [45], as long as the underlying graph is undirected [46]. However,
such an upper bound is not known for MAPFR problems.

4.3. Conflict and Unsafe Interval Detection Methods

The soundness, solution completeness, and optimality of CCBS rely on having
accurate collision detection, conflict detection, and unsafe interval detection
mechanisms. That is, we require (1) the collision detection mechanism
(IsCollision) to detect a collision iff one exists, (2) the conflict detection
mechanism (InConflict) to detect a conflict iff one exists, and (3) the unsafe
interval detection mechanism to return the maximal unsafe interval for every
given pair of actions. Constructing such accurate mechanisms for agents with
arbitrary shapes and arbitrary motion functions is not trivial; however, for many
individual cases fast and exact solutions exist.
4.3.1. Implementing Collision Detection
If the agents are modeled as spheres (or disks in 2D), IsCollision is trivially
implemented in O(1) by computing the distance between the centers of the
agents and comparing this distance to the sum of their radii. When the agents
are convex polyhedra, IsCollision can be implemented in O(log(n) log(m)),
where m and n are the numbers of vertices comprising the polyhedra [47]. General
polyhedra are more difficult to handle. In this case the agents’ regions or their
surfaces are typically decomposed into convex parts and then collision detection
is applied to these parts in a systematic fashion; see [48] for example.

4.3.2. Implementing Conflict Detection


A general approach to detect conflicts is to sample the agents’ motion
functions and to apply IsCollision to the sampled positions of the agents. This
approach is widespread in robotics; see [49] for example. However, this approach
may miss conflicts if the sampling strategy is inappropriate. To
mitigate this issue, more advanced approaches to conflict detection have been
proposed – see Jiménez et al. [36] for an overview. Tang et al. [50] proposed a
particular mechanism that accurately solves conflict-detection queries when the
agents are represented as triangle meshes (i.e., their bounding surfaces are
composed of triangles whose coordinates are known). These InConflict detectors
are non-trivial to implement and computationally expensive. Fortunately, in a
number of cases exact and fast InConflict implementations are available.
For example, when agents are represented as disks that move along straight lines
with constant speed, conflict detection can be done in O(1) using a closed-form
formula [35]. Walker and Sturtevant [51] proposed an extension of this formula
that is able to handle accelerated movements.
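The sampling-based approach, and the way it can miss conflicts, can be illustrated with a short Python sketch. The function below is an assumption-laden example rather than a method from the cited works: motion functions are plain callables over relative time, the collision test is injected, and a fixed sampling step is used.

```python
import math


def in_conflict_sampled(motion_i, t_i, d_i, motion_j, t_j, d_j, is_collision, step):
    """Approximate InConflict by testing IsCollision at sampled time points.

    motion_i and motion_j map relative time to a location; each action starts
    at t_i (resp. t_j) and lasts d_i (resp. d_j). is_collision takes the two
    locations. A coarse step can miss short-lived overlaps.
    """
    lo, hi = max(t_i, t_j), min(t_i + d_i, t_j + d_j)
    if lo > hi:
        return False  # the actions never execute at the same time
    samples = int((hi - lo) / step)
    times = [lo + k * step for k in range(samples + 1)] + [hi]
    return any(is_collision(motion_i(t - t_i), motion_j(t - t_j)) for t in times)


# Usage: a unit-speed disk passes a waiting disk (both of radius 0.5).
move = lambda t: (t, 0.0)      # from (0, 0) towards (4, 0)
wait = lambda t: (2.0, 0.9)    # stays at (2, 0.9)
overlap = lambda a, b: math.dist(a, b) < 1.0
fine = in_conflict_sampled(move, 0.0, 4.0, wait, 0.0, 4.0, overlap, 0.01)
coarse = in_conflict_sampled(move, 0.0, 4.0, wait, 0.0, 4.0, overlap, 3.0)
```

In this toy setting the fine step detects the overlap around t = 2, whereas the coarse step only samples t = 0, 3, and 4 and therefore reports no conflict, which is the failure mode described above.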

4.3.3. Implementing Unsafe Interval Detection


Computing the unsafe interval of an action w.r.t. another action also requires
analyzing the kinematics and the geometry of the agents. However, unlike
collision and conflict detection, which have been studied for many years and can
be computed with closed-form formulas in some settings, the problem of computing
an unsafe interval is less investigated. A naïve general method for computing
the unsafe interval for an action ai is to apply the conflict detection mechanism
multiple times, starting from t = ti and incrementing t by some small ∆ > 0
until InConflict returns false, meaning the unsafe interval has ended. This
approach is limited in that the resulting unsafe interval may be larger than the
real one. One can extend this approach to get a more accurate solution, as follows.
Suppose that InConflict returns true when the start moment of the action ai
is ti + (k − 1)∆ and returns false if it is ti + k∆. Obviously, the true endpoint
of the unsafe interval lies in (ti + (k − 1)∆, ti + k∆]. One can now apply binary
search over this interval to identify the unsafe interval endpoint. Theoretically,
this search may not converge due to the unlimited number of time moments
comprising the interval. However, in practice, these moments are represented as
floating-point approximations, so the search will indeed converge. Moreover,
the resulting endpoint will be exact in the sense that a computer program
cannot represent this endpoint with more accuracy. Finally, a recent
investigation by Walker and Sturtevant [51] describes an approach to compute the
“exact minimum delay for collision avoidance” in case agents are disks moving
with constant velocities. This can be straightforwardly transformed to the exact
computation of the CCBS unsafe intervals.
Overall, when agents are represented as disks that move from one location to
the other with constant velocities along straight lines, there exist fast and
exact mechanisms to implement IsCollision and InConflict, and to compute
the endpoints of unsafe intervals.
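The scan-then-bisect procedure described in this subsection can be sketched in Python as follows. The function name, the default step ∆ = 0.1, and the tolerance are assumptions for this example; `conflict_at(t)` stands for InConflict((ai, t), (aj, tj)) and `t_max` for tj + ajD.

```python
def unsafe_interval_end(conflict_at, t_i, t_max, delta=0.1, tol=1e-9):
    """Return t_u such that [t_i, t_u) approximates the unsafe interval.

    conflict_at(t) is assumed to be True at t = t_i. If every sampled start
    time up to t_max is in conflict, t_max is returned, following the
    convention used when the endpoint is otherwise undefined.
    """
    lo = t_i
    # Coarse scan: advance by delta until the first conflict-free start time.
    while True:
        hi = min(lo + delta, t_max)
        if not conflict_at(hi):
            break              # the endpoint lies in (lo, hi]
        if hi >= t_max:
            return t_max       # no safe start time within the allowed range
        lo = hi
    # Binary search: conflict_at(lo) is True and conflict_at(hi) is False.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if conflict_at(mid):
            lo = mid
        else:
            hi = mid
    return hi
```

Like the naïve method, the coarse scan only inspects start times spaced ∆ apart, so it implicitly assumes that the conflict status does not flip back and forth within a single step.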

4.4. Design Choices and Implementation Details

One of the major design choices when implementing an algorithm from the
CBS family is which conflict to choose when processing a high-level node. In
classical MAPF, choosing conflicts in an intelligent manner can significantly
reduce the number of expanded CT nodes and speed up the search by
several orders of magnitude [52, 53]. A particularly effective method for choosing
conflicts is to prefer cardinal conflicts over semi-cardinal conflicts, and
semi-cardinal conflicts over non-cardinal conflicts. A CBS conflict is called cardinal
iff resolving it using either of the corresponding CBS constraints results in
increasing the solution cost. The conflict is semi-cardinal if the solution cost
increases when imposing the corresponding CBS constraint on exactly one of
the agents involved in the conflict. In all other cases, the conflict is non-cardinal.
Effectively implementing this prioritization of cardinal and non-cardinal
conflicts in CCBS is not trivial, since conflict detection in MAPFR is more costly
than in classical MAPF. In our CCBS implementation, we store all detected
conflicts with their types (cardinal, semi-cardinal, or non-cardinal) in the nodes
of the CCBS constraint tree. When a child CT node is generated, it immediately
copies all the conflicts from its parent, except those that were resolved.
Then, we detect conflicts only with the newly constructed plan and identify their
type (cardinal, semi-cardinal, or non-cardinal). This allowed us to choose cardinal
and semi-cardinal conflicts first efficiently, which in turn proved
to be very effective for speeding up CCBS. Note that in a preliminary version
of this work [18], we proposed a heuristic for choosing when to detect cardinal
conflicts and when to avoid doing so. This heuristic does not yield significant
improvement when using the implementation described above for detecting
conflict types.
An additional CCBS design choice that is worth mentioning is the
tie-breaking strategy used to choose which CT node to expand from all CT nodes
with minimal cost. We used the following tie-breaking strategy. If two or more
high-level nodes with the same (minimal) cost exist in the CT, we prefer the
one with the lower number of conflicts. A secondary tie-breaking criterion we
used is the number of constraints, preferring to expand first nodes with more
constraints. Similar tie-breaking techniques for CT nodes were proposed by
Barer et al. [54]. Our implementation of CCBS is in C++ and is available at
github.com/PathPlanning/Continuous-CBS.
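The tie-breaking order described above maps naturally onto a lexicographic priority. The Python sketch below is an illustration of that idea only; the function name and the tuple encoding are assumptions and do not reflect how the C++ implementation stores its nodes.

```python
def node_priority(cost, num_conflicts, num_constraints):
    """Sort key for CT nodes: smaller tuples are expanded first.

    Primary: lower solution cost. First tie-breaker: fewer conflicts.
    Second tie-breaker: more constraints (hence the negation).
    """
    return (cost, num_conflicts, -num_constraints)


# Usage: each entry is (label, cost, conflicts, constraints).
nodes = [("a", 10.0, 3, 2), ("b", 10.0, 1, 2), ("c", 10.0, 1, 5), ("d", 9.5, 7, 0)]
order = [n[0] for n in sorted(nodes, key=lambda n: node_priority(*n[1:]))]
```

Here node d comes first because it is cheapest, and among the equal-cost nodes c precedes b (same number of conflicts, more constraints), with a last.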

5. An SMT-Based Approach for Makespan-Optimal MAPFR

In this section, we present SMT-CCBS, an algorithm that finds makespan-optimal
solutions to MAPFR problems. As its name suggests, SMT-CCBS builds
on the SMT-CBS algorithm [16] for classical MAPF. SMT-CCBS breaks the
problem of finding a valid makespan-optimal solution in MAPFR into a sequence
of bounded-cost MAPFR problems. A bounded-cost MAPFR problem is the
problem of finding a valid solution to a given MAPFR problem with makespan no
greater than a given bound µ. Each of the bounded-cost MAPFR problems created
by SMT-CCBS is solved using an SMT problem-solving procedure. Section 5.1
describes in detail how SMT-CCBS solves each of these bounded-cost problems
and Section 5.2 provides a complete pseudo-code for SMT-CCBS. The relation
between CCBS and SMT-CCBS is discussed later, in Section 7.1.

5.1. Solving Bounded-Cost MAPFR Problems with SMT

730 We refer to the SMT-based bounded-cost MAPFR algorithm we propose


as µSMT-CCBS. Some aspects of µSMT-CCBS are similar to the SMT-based
bounded-cost MAPF algorithm used by SMT-CBS. The propositional skeleton
(PS) in µSMT-CCBS is constructed such that a satisfying assignment to this
PS defines a solution to P with makespan at most µ, where P is the MAPFR
735 problem we are solving. The DECIDET procedure in µSMT-CCBS checks if
this solution is valid, i.e., if it contains any CCBS conflicts. We refer to this
procedure as to DECIDEMAPFR . If the solution is not valid, DECIDEMAPFR
returns one of the detected CCBS conflicts. Then, the PS is updated so that
assignments that satisfy it define a solution in which all the CCBS conflicts
740 returned so far by DECIDEMAPFR are avoided.
Algorithm 2 lists a high-level pseudo-code for µSMT-CCBS. µSMT-CCBS
maintains a set Ψ that contains all the CCBS conflicts returned by DECIDEMAPFR
so far. Initially, Ψ is empty (line 1). In every iteration of µSMT-CCBS, the PS
is created for the current set of conflicts (CreatePS(·) in line 3). A SAT solver
is used to search for a satisfying assignment to this PS (line 4). If no solution

Algorithm 2: The µSMT-CCBS algorithm.
Input: P, the MAPFR problem; µ, the makespan bound
1   Ψ ← ∅
2   while True do
3       PS ← CreatePS(Ψ, P, µ)
4       Π ← Solve(PS)
5       if No solution found (i.e., Π is null) then
6           return null
7       Con ← DECIDEMAPFR(Π, P, µ)
8       if No conflict found (i.e., Con is null) then
9           return Π
10      Add Con to Ψ

exists, the given decision problem is unsolvable, i.e., there are no solutions to P
with makespan equal to or smaller than µ. Otherwise, we apply DECIDEMAPFR
to check if the found solution is a valid MAPFR solution (line 7). If it is a valid
solution, we return it. Otherwise, DECIDEMAPFR returns one of the CCBS
conflicts that exist in this solution. The returned CCBS conflict is added to the
set of conflicts Ψ. This process continues until either a valid solution is found
(line 9) or we establish that no solution exists (line 6).
Our DECIDEMAPFR procedure is exactly the same as the conflict-detection
step in CCBS. It applies a conflict-detection mechanism to check for collisions
between the agents’ plans in the given solution. The process of generating our
PS for a given set of CCBS conflicts, MAPFR problem, and makespan bound –
CreatePS(Ψ, P, µ) – is more involved and we describe it next.
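
The refinement loop of Algorithm 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' C++ implementation: the three callables (create_ps, solve, decide) are hypothetical stand-ins for CreatePS, the SAT solver call, and DECIDEMAPFR, and the problem P and bound µ are assumed to be captured inside them. The toy stand-ins at the bottom exist only to make the sketch runnable.

```python
from typing import Callable, Hashable, Optional, Set


def mu_smt_ccbs(
    create_ps: Callable[[Set[Hashable]], object],
    solve: Callable[[object], Optional[object]],
    decide: Callable[[object], Optional[Hashable]],
) -> Optional[object]:
    """Refinement loop of Algorithm 2; P and mu are assumed to be captured by the callables."""
    psi: Set[Hashable] = set()            # line 1: conflicts returned so far
    while True:                            # line 2
        ps = create_ps(psi)                # line 3: rebuild the PS for the current conflicts
        candidate = solve(ps)              # line 4: ask the solver for a satisfying assignment
        if candidate is None:              # lines 5-6: PS unsatisfiable, nothing within mu
            return None
        conflict = decide(candidate)       # line 7: conflict detection on the candidate
        if conflict is None:               # lines 8-9: candidate is a valid solution
            return candidate
        psi.add(conflict)                  # line 10: refine and try again


# Toy stand-ins (hypothetical): a "solution" is a pair (x, y) with x, y in {0, 1},
# and a pair "conflicts" when both entries are equal.
def toy_create_ps(psi):
    return frozenset(psi)                  # the toy "PS" is just the set of forbidden pairs


def toy_solve(ps):
    candidates = [(x, y) for x in range(2) for y in range(2)]
    return next((c for c in candidates if c not in ps), None)


def toy_decide(candidate):
    return candidate if candidate[0] == candidate[1] else None


result = mu_smt_ccbs(toy_create_ps, toy_solve, toy_decide)
print(result)
```

In the toy run the first candidate is rejected, its conflict is recorded, and the second candidate is returned, which mirrors how conflicts accumulate in Ψ until the PS either yields a valid solution or becomes unsatisfiable.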

5.1.1. Generating the Propositional Skeleton


CreatePS consists of the following steps.

1. Compute a set of CCBS constraints for the set of conflicts in Ψ.

2. Compute for each agent a set of single-agent plans that satisfy these con-
straints.
3. Create a PS for choosing a solution from these sets of single-agent plans.
4. Add constraints to verify the chosen solution avoids all conflicts in Ψ.

Next, we describe the details of each step of CreatePS. As we will see, these
steps are designed to obtain a behavior similar to CCBS. In particular, in the
first step, CreatePS computes all the constraints CCBS would impose to resolve
the conflicts in Ψ, and in the second step, CreatePS efficiently computes all
single-agent plans that CSIPP would generate to satisfy every subset of these
constraints.6
Step 1. Compute the set of CCBS constraints for Ψ. For every CCBS
conflict Con in Ψ, we generate the pair of CCBS constraints CCBS would create
to resolve Con. This pair of CCBS constraints is specified in Lemma 1. Note
that to compute such a pair of constraints, we need to compute the unsafe
intervals for the respective conflict. By const(Con) we denote the pair of CCBS
constraints for a conflict Con ∈ Ψ, and by const(Ψ) the set of all pairs of
CCBS constraints generated this way, i.e.,

\[ \mathrm{const}(Con) = \mathrm{const}(\langle (a_i, t_i), (a_j, t_j) \rangle) = \{ \langle i, a_i, [t_i, t_i^u) \rangle, \langle j, a_j, [t_j, t_j^u) \rangle \} \tag{13} \]

\[ \mathrm{const}(\Psi) = \{ \mathrm{const}(Con) \mid Con \in \Psi \} \tag{14} \]

Step 2. Compute the set of relevant single-agent plans. For each
agent, we compute all single-agent plans CSIPP would generate for that agent
given any subset of constraints in const(Ψ) that involve that agent. To compute
and store this set of single-agent plans efficiently, we create for each agent a
specific Multi-Value Decision Diagram (MDD) [30] that we refer to as MDDR.
An MDD is a directed acyclic graph with a single source and sink.7 MDDs have

6 Note that CreatePS does so without explicitly running CSIPP an exponential number of
times.
7 Technically, an MDD can have multiple sinks, where each sink is labeled as either true

been used in the context of MAPF before, e.g., in the ICTS and E-ICTS
algorithms [8, 13]. Every node in our MDDR is a pair (u, t) where u is a vertex
in G and t is a point in time. An edge ((u, t), (v, t')) corresponds to a timed
action from u to v that starts at t and ends at t'. We generate our MDDR
by performing a variant of Dijkstra’s algorithm that considers all possible wait
actions CSIPP may include to avoid any constraint in const(Ψ). Algorithm 3
describes in detail how this is done.

Algorithm 3: Generate MDDR for agent i.
1   GenerateMddR(Π, const(Ψ), µ, i)
2   X^i ← {(S(i), 0)}; E^i ← ∅; Open ← ∅
3   insert (S(i), 0) into Open
4   while Open ≠ ∅ do
5       (u, t) ← pop (u, t) from Open where t = min_t(Open)
6       if t ≤ µ then
7           foreach a ∈ A such that from(a) = u and t + a_D ≤ µ do
8               insert (to(a), t + a_D) into Open
9               X^i ← X^i ∪ {(to(a), t + a_D)}
10              E^i ← E^i ∪ {[(u, t); (to(a), t + a_D)]}
11              foreach ⟨i, a, [t', t'^u)⟩ ∈ const(Ψ) do
12                  if t ∈ [t', t'^u] and t'^u + a_D ≤ µ then
13                      insert (u, t'^u) into Open
14                      X^i ← X^i ∪ {(u, t'^u)}
15                      E^i ← E^i ∪ {[(u, t); (u, t'^u)]}
16              foreach ⟨i, a', [t', t'^u)⟩ ∈ const(Ψ) where a' is wait at to(a) do
17                  if t < t'^u − a_D and t'^u ≤ µ then
18                      insert (u, t'^u − a_D) into Open
19                      X^i ← X^i ∪ {(u, t'^u − a_D)}
20                      E^i ← E^i ∪ {[(u, t); (u, t'^u − a_D)]}
21  return (X^i, E^i)

or false. This is equivalent to having a single sink that gathers all sinks labeled as true, and
removing all branches that only end up in sinks labeled as false.

The input to Algorithm 3 is a MAPFR problem Π, a set of CCBS constraints
const(Ψ), a makespan bound µ, and an agent i. X^i and E^i in Algorithm 3 are
the nodes and edges of the MDDR it generates for agent i. The root node of this
MDDR is (S(i), 0). Initially, X^i contains only the root node and E^i is empty.
Open is a collection of MDDR nodes ordered by their time points, which is
initialized with the MDDR root node. In every iteration, the best (minimal
time) node (u, t) in Open is removed from Open and expanded. The children
of (u, t) are generated as follows. For every move action a ∈ A that starts at
u and ends no later than the makespan bound (i.e., t + a_D ≤ µ), we generate a
child node (to(a), t + a_D), which represents performing action a at time t
(lines 8-10). If there exists a CCBS constraint that prohibits agent i from
performing a at time t, we create an additional node in the MDDR that
represents the option of waiting at u until a can be performed without
conflicting with this constraint (lines 11-15). If there is a CCBS constraint
that prohibits waiting at to(a), then we create an additional node in the MDDR
to allow the agent to stay at from(a) until it can apply a to reach to(a)
immediately after the constraint on waiting at to(a) ends (lines 16-20). Note
that the generated MDDR includes single-agent plans that violate some of the
given CCBS constraints. For example, for a move action a ∈ A and an MDDR node
(u, t), Algorithm 3 will create a child node (to(a), t + a_D) even if there is
a CCBS constraint that prohibits performing a at that time. This is done so
that the generated MDDR represents all single-agent plans CSIPP would return
given any subset of the given CCBS constraints. The PS created as described
below ensures that the set of single-agent plans chosen comprises a valid
solution, if one exists.
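
To make the expansion rules of Algorithm 3 concrete, the following Python sketch builds the node and edge sets for a single agent. It is a hedged illustration rather than the authors' code, and it makes several assumptions that are not in the pseudo-code: actions are given as a hypothetical dictionary of (neighbor, duration) pairs, constraints as (src, dst, t_lo, t_hi) tuples with src == dst denoting a wait, time points are rounded to three decimals so that equal times merge, nodes already generated are not re-inserted into Open, and the test on line 12 is read as the half-open interval t' ≤ t < t'^u to avoid a zero-length wait edge. Waiting at the goal until µ is not modeled. The usage part replays the red agent of the running example with the durations and the wait-until time 3.743 that appear in Example 4; the bound 9.0 is an arbitrary choice.

```python
import heapq

EPS = 1e-9  # tolerance for floating-point comparisons against the bound (an assumption)


def generate_mdd_r(start, moves, constraints, mu, ndigits=3):
    """Sketch of Algorithm 3 for one agent.

    moves:       {vertex: [(neighbor, duration), ...]}  (hypothetical encoding of A)
    constraints: [(src, dst, t_lo, t_hi), ...] for this agent; src == dst denotes a wait
    Returns (nodes, edges) with nodes of the form (vertex, time).
    """
    def node(v, t):
        return (v, round(t, ndigits))       # rounding merges equal time points

    root = node(start, 0.0)
    nodes, edges = {root}, set()
    open_heap = [(root[1], root[0])]         # ordered by time, like Open in Algorithm 3

    def add_child(parent, child):
        edges.add((parent, child))
        if child not in nodes:               # duplicate detection (not in the pseudo-code)
            nodes.add(child)
            heapq.heappush(open_heap, (child[1], child[0]))

    while open_heap:
        t, u = heapq.heappop(open_heap)
        if t > mu + EPS:
            continue
        for v, dur in moves.get(u, []):
            if t + dur > mu + EPS:
                continue
            add_child((u, t), node(v, t + dur))                   # lines 8-10
            for src, dst, lo, hi in constraints:
                # lines 11-15: wait at u until the constrained move becomes allowed
                if (src, dst) == (u, v) and lo <= t < hi and hi + dur <= mu + EPS:
                    add_child((u, t), node(u, hi))
                # lines 16-20: depart late enough to arrive once waiting at v is allowed
                if src == dst == v and t < hi - dur and hi <= mu + EPS:
                    add_child((u, t), node(u, hi - dur))
    return nodes, edges


# Red agent of the running example, with durations read off Example 4.
red_moves = {"E": [("F", 2.0)], "F": [("I", 2.828)], "I": [("J", 2.0)]}
line_nodes, line_edges = generate_mdd_r("E", red_moves, [], mu=9.0)
red_nodes, red_edges = generate_mdd_r("E", red_moves, [("F", "I", 2.0, 3.743)], mu=9.0)
print(sorted(red_nodes - line_nodes))
```

Without constraints the result is a simple line, and adding the constraint on F → I introduces a second branch that waits at F before moving, which matches the shape of the red agent's MDDR in the lower part of Figure 5.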

Example 3. Figure 5 shows the MDDR structures for the agents in our running
example (Figure 3). The upper part shows the MDDR structures created
for each agent w.r.t. an empty set of constraints. In our example, each agent has
a single shortest single-agent plan, and so the MDDR created for an empty set
of constraints is a line. The red and blue agents’ plans have a conflict between
the timed actions (F → I, 2.000) and (H → C, 2.000). The unsafe intervals for

Figure 5: An illustration of MDDRs. The upper part shows an initial MDDR for the
MAPFR problem in Figure 3. The lower part shows the MDDR for the same problem after
resolving a conflict between the red and blue agents traversing (F, I) and (H, C) respectively.

these actions are [2.000, 3.743) and [2.000, 3.310), respectively. So, the CCBS
constraints generated for this conflict are

\[ \{ \langle red, (F \to I, 2.000), [2.000, 3.743) \rangle, \langle blue, (H \to C, 2.000), [2.000, 3.310) \rangle \} \tag{15} \]

The lower part of Figure 5 shows the MDDR generated for each agent when
considering the pair of CCBS constraints in Equation (15).

Step 3. Create a PS. We create in this step a PS such that every
satisfying assignment to it corresponds to a MAPFR solution in which each
agent follows a single-agent plan from its MDDR. To create such a PS, we
create Boolean variables $E_{v,v'}^{t,t'}(i)$ and $X_v^t(i)$ for every edge and
node in $E^i$ and $X^i$ from MDDR, respectively. Setting $X_v^t(i)$ to true
represents that agent i plans to occupy location v at time t. Setting
$E_{v,v'}^{t,t'}(i)$ to true represents that agent i plans to perform an action
that starts at time t, ends at time t', and moves the agent from v to v'. Wait
actions from the MDDR are represented in our PS by $E_{v,v'}^{t,t'}(i)$
variables with v = v'. To ensure that an assignment to all these variables
constitutes a solution to the given MAPFR problem, we add the following
restrictions over these Boolean variables to the PS.

\[ X_v^t(i) \Rightarrow \bigvee_{(v',t') \,\mid\, [(v,t);(v',t')] \in E^i} E_{v,v'}^{t,t'}(i) \tag{16} \]

\[ \sum_{(v',t') \,\mid\, [(v,t);(v',t')] \in E^i} E_{v,v'}^{t,t'}(i) \le 1 \tag{17} \]

\[ E_{v,v'}^{t,t'}(i) \Rightarrow X_{v'}^{t'}(i) \tag{18} \]

Equations 16 and 17 state that if agent i appears in vertex v at time t then it has
to leave through exactly one edge connected to v. Equation 18 establishes that
once an agent enters an edge in the MDDR it has to leave it at its endpoint.
We also add to the PS clauses that verify every agent starts in its start location
and ends in its goal. Thus, a satisfying assignment to this formula represents a
solution to the given MAPFR problem.

Step 4. Add constraints to verify all conflicts are avoided. To
avoid all the CCBS conflicts in Ψ, we verify that for every conflict Con ∈ Ψ
one of the CCBS constraints in const(Con) is satisfied. This is done by adding
to the PS created in step 3 an appropriate disjunction. That is, for a conflict
⟨(a_i, t_i), (a_j, t_j)⟩, where a_i is the action that moves agent i from v to v'
and a_j is the action that moves agent j from u to u', we add the following clause:

\[ \neg E_{v,v'}^{t_i,\, t_i + (a_i)_D}(i) \lor \neg E_{u,u'}^{t_j,\, t_j + (a_j)_D}(j) \tag{19} \]

This clause represents mutual exclusion between the two conflicting actions.
Thus, having this clause in the PS ensures at most one of these timed actions
will be applied in any satisfying assignment.
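
Steps 3 and 4 can be illustrated with a small clause generator. The sketch below is an assumption-laden illustration, not the encoding used in the authors' solver: variables are mapped to positive integers in the style of DIMACS literals through a hypothetical VarPool helper, the sum in Equation 17 is expanded into pairwise at-most-one clauses, nodes without outgoing edges are skipped in Equation 16 so that final goal nodes are not forced to false, and the unit clauses fixing start and goal nodes are omitted. The usage part encodes the unconstrained red and blue MDDRs from Example 4 and adds the mutual-exclusion clause for their conflicting actions.

```python
from itertools import combinations


class VarPool:
    """Maps symbolic variable names to positive integers (DIMACS-style literals)."""

    def __init__(self):
        self.ids = {}

    def var(self, name):
        return self.ids.setdefault(name, len(self.ids) + 1)


def encode_agent(pool, agent, nodes, edges):
    """Clauses for Equations 16-18 over one agent's MDD nodes and edges."""
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
    clauses = []
    for n in sorted(nodes):
        x = pool.var(("X", agent, n))
        out = [(d, pool.var(("E", agent, n, d))) for d in sorted(successors[n])]
        if out:                                               # sinks are skipped (assumption)
            clauses.append([-x] + [e for _, e in out])         # Eq. 16
        for (_, e1), (_, e2) in combinations(out, 2):
            clauses.append([-e1, -e2])                          # Eq. 17, pairwise form
        for d, e in out:
            clauses.append([-e, pool.var(("X", agent, d))])     # Eq. 18
    return clauses


def mutex_clause(pool, agent_i, edge_i, agent_j, edge_j):
    """Clause of Equation 19 for two conflicting timed actions (given as MDD edges)."""
    return [-pool.var(("E", agent_i, *edge_i)), -pool.var(("E", agent_j, *edge_j))]


def satisfied(clauses, true_vars):
    """Check an assignment given as the set of variables that are true."""
    return all(any((lit > 0) == (abs(lit) in true_vars) for lit in clause)
               for clause in clauses)


def chain(*timed_vertices):
    """Nodes and edges of an MDD that is a single line."""
    nodes = list(timed_vertices)
    return set(nodes), set(zip(nodes, nodes[1:]))


pool = VarPool()
red = encode_agent(pool, 2, *chain(("E", 0.0), ("F", 2.0), ("I", 4.828), ("J", 6.828), ("J", 8.0)))
blue = encode_agent(pool, 3, *chain(("G", 0.0), ("H", 2.0), ("C", 7.0), ("D", 8.0)))
conflict_clause = mutex_clause(pool, 2, (("F", 2.0), ("I", 4.828)), 3, (("H", 2.0), ("C", 7.0)))
all_true = set(pool.ids.values())
print(satisfied(red + blue, all_true), satisfied(red + blue + [conflict_clause], all_true))
```

Setting every variable to true satisfies the clauses of Equations 16-18 for these line-shaped MDDRs, but it violates the added mutual-exclusion clause, so a solver would have to pick a different plan for at least one of the two agents once alternative branches are available.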

Example 4. Consider again our running example from Figure 3, and the three
MDDR structures shown in the upper part of Figure 5. The variables created
for the corresponding PS are:

\[ X_A^{0.000}(1),\ X_B^{2.000}(1),\ X_F^{4.000}(1),\ X_I^{6.828}(1),\ X_I^{8.000}(1), \]
\[ E_{A,B}^{0.000,2.000}(1),\ E_{B,F}^{2.000,4.000}(1),\ E_{F,I}^{4.000,6.828}(1),\ E_{I,I}^{6.828,8.000}(1) \]

for agent 1 (green),

\[ X_E^{0.000}(2),\ X_F^{2.000}(2),\ X_I^{4.828}(2),\ X_J^{6.828}(2),\ X_J^{8.000}(2), \]
\[ E_{E,F}^{0.000,2.000}(2),\ E_{F,I}^{2.000,4.828}(2),\ E_{I,J}^{4.828,6.828}(2),\ E_{J,J}^{6.828,8.000}(2) \]

for agent 2 (red), and

\[ X_G^{0.000}(3),\ X_H^{2.000}(3),\ X_C^{7.000}(3),\ X_D^{8.000}(3), \]
\[ E_{G,H}^{0.000,2.000}(3),\ E_{H,C}^{2.000,7.000}(3),\ E_{C,D}^{7.000,8.000}(3) \]

for agent 3 (blue). These variables are used in the constraints defined by
Equations 16-19. In addition, we establish that each agent starts and ends in its
start and goal location by setting the variables $X_A^{0.000}(1)$, $X_E^{0.000}(2)$, $X_G^{0.000}(3)$
(start positions) and $X_I^{8.000}(1)$, $X_J^{8.000}(2)$, $X_D^{8.000}(3)$ (goal positions) to true.

The resulting formula can be solved easily by setting all variables to true, which
corresponds to the solution in which the agents do not wait in any position and
go directly to their goals: agent 1 (green) goes from A to B to F to I, agent 2
(red) goes from E to F to I to J, and agent 3 (blue) goes from G to H to C to D.
As noted earlier, this solution has a conflict, and the MDDR in the lower part
of Figure 5 is created by considering the CCBS constraints designed to resolve
this conflict. The PS created for the three MDDRs in the lower part of Figure 5
includes additional variables representing new timed actions and locations the
agents may now occupy. These variables are:

Agent 1 (green): \[ X_I^{8.571}(1),\ E_{I,I}^{8.000,8.571}(1) \]

Agent 2 (red): \[ X_F^{3.743}(2),\ X_I^{6.571}(2),\ X_J^{8.571}(2),\ E_{F,F}^{2.000,3.743}(2),\ E_{F,I}^{3.743,6.571}(2),\ E_{I,J}^{6.571,8.571}(2) \]

Agent 3 (blue): \[ X_H^{3.310}(3),\ X_C^{8.310}(3),\ X_D^{9.310}(3),\ E_{H,H}^{2.000,3.310}(3),\ E_{H,C}^{3.310,8.310}(3),\ E_{C,D}^{8.310,9.310}(3) \]

Additional clauses are added for these variables following Equations 16–18. In
addition, the following clause is added as specified in Equation 19

\[ \neg E_{F,I}^{2.000,4.828}(2) \lor \neg E_{H,C}^{2.000,7.000}(3) \tag{20} \]

to verify that the conflict identified in the previous solution will be avoided.

Theorem 2. µSMT-CCBS is a sound and complete algorithm for bounded-cost
MAPFR, i.e., every solution returned by µSMT-CCBS is indeed a solution with
makespan equal to or smaller than the makespan bound, and if such a solution
exists then it will be found.

Soundness is established by the fact that DECIDEMAPFR verifies that the
returned solution is valid. Establishing completeness, however, is less trivial.
The key insight is that performing wait actions is only beneficial if it
eventually enables performing a move action. Therefore, it is sufficient to only
consider the wait durations specified in lines 12 and 17 of Algorithm 3. A
formal proof is given in Appendix B.

5.2. SMT-CCBS

Algorithm 4: Pseudo code for SMT-CCBS
Input: P, the MAPFR problem
1   Π ← {π_1, ..., π_k}, where π_i is the shortest single-agent plan for agent a_i
2   µ ← max_{i=1..k} µ(π_i)
3   while True do
4       Π ← µSMT-CCBS(P, µ)
5       if Π is not null then
6           return Π
7       µ ← Compute next µ

Finally, we can present our SMT-CCBS algorithm. Pseudo code is listed
in Algorithm 4. It starts by setting the makespan bound µ to the duration of
the longest among the agents' shortest single-agent plans. Then, it runs
µSMT-CCBS to try to find a valid solution whose makespan is at most µ. If it
succeeds, it returns that solution.
Otherwise, µ is increased to the next relevant makespan (line 7). The next
relevant makespan is set by adding to µ the minimal amount that enables adding
at least one more node to one of the MDDR structures. We compute this
minimal amount when generating the MDDR for each agent (Alg. 3). There,
whenever an MDD node is pruned for exceeding the makespan bound µ (lines 7,
12, and 17), we keep track of the gap between µ and the minimal makespan
bound in which that node would not have been pruned. The minimal gap over
all pruned nodes and agents is exactly the minimal increase in µ that would
allow an additional action. Appendix C lists the complete pseudo code for
generating an MDDR for an agent and computing the next makespan bound.
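
The outer loop of Algorithm 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from Appendix C: bounded_solve is a hypothetical callable standing in for µSMT-CCBS that returns both a solution (or None) and the minimal gap collected from pruned MDD nodes, and the optional max_mu parameter is an assumed cap on the makespan bound. The toy solver uses integer bounds only to keep the arithmetic exact.

```python
import math


def smt_ccbs(shortest_durations, bounded_solve, max_mu=math.inf):
    """Sketch of Algorithm 4.

    shortest_durations: durations of the agents' shortest single-agent plans
    bounded_solve(mu):  hypothetical stand-in for muSMT-CCBS, returning
                        (solution_or_None, minimal_gap_or_None)
    max_mu:             optional cap on the makespan bound (an assumption of this sketch)
    """
    mu = max(shortest_durations)              # lines 1-2: lower bound on the makespan
    while mu <= max_mu:                       # line 3, with the optional cap
        solution, gap = bounded_solve(mu)     # line 4
        if solution is not None:              # lines 5-6
            return solution, mu
        if gap is None:                       # nothing was pruned, so raising mu cannot help
            return None
        mu += gap                             # line 7: next relevant makespan
    return None


# Toy bounded solver (hypothetical): MDDs change only at these bounds,
# and a valid solution first exists at bound 13.
RELEVANT_BOUNDS = [8, 10, 13]


def toy_bounded_solve(mu):
    if mu >= 13:
        return "plan", None
    return None, min(b - mu for b in RELEVANT_BOUNDS if b > mu)


outcome = smt_ccbs([5, 8, 7], toy_bounded_solve)
print(outcome)
```

Because µ only ever jumps to the next bound at which some MDDR can grow, no intermediate bound that could admit a solution is skipped; passing a finite max_mu corresponds to halting once a known upper bound on the makespan is reached.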

Theorem 3. SMT-CCBS is sound, solution complete, and is guaranteed to
return a makespan-optimal solution.

Proof. Since every solution returned by SMT-CCBS was also returned by
µSMT-CCBS, and µSMT-CCBS is sound (Theorem 2), SMT-CCBS is sound as
well. The initial value of µ is a lower bound on the optimal makespan,
since no agent can reach its goal faster than when it ignores all other agents.
In every subsequent iteration of SMT-CCBS, µ is incremented by the minimal
amount required to allow Algorithm 3 to add at least a single node to one of the
MDDR structures. Let ∆µ be this minimal amount. By definition, increasing
µ by any amount smaller than ∆µ will result in exactly the same MDDR
structures as created for µ. Consequently, µSMT-CCBS will not find a valid solution
for any makespan bound smaller than µ + ∆µ. Thus, solution completeness and
makespan-optimality of SMT-CCBS follow directly from the completeness of
µSMT-CCBS (Theorem 2).

SMT-CCBS, like CCBS, is only solution-complete as opposed to complete.
Given an unsolvable MAPFR problem, SMT-CCBS will continue to increase
its makespan bound indefinitely. However, if an upper bound on the
makespan of the given MAPFR problem is available, then SMT-CCBS can be
modified to be complete by halting when µ reaches that bound.
Moreover, µSMT-CCBS is complete for any input parameter µ, hence we
can easily turn SMT-CCBS into a bounded suboptimal planner by changing
the strategy for increasing µ at the high level. Instead of the minimal increase of
µ that guarantees optimality, a larger increase can be used.

5.3. Design Choices and Implementation Details

For our implementation of SMT-CCBS we used Glucose 3.0 as the underlying
SAT solver [55]. Glucose 3.0, as well as other modern SAT solvers, can be run in
an incremental manner [25]. This means that the SAT solver re-uses information
from its previous call, namely learnt clauses, to speed up its next call. This is
particularly useful in SMT-CCBS, because it calls the SAT solver multiple times
when solving a given MAPFR problem, adding more clauses each time. Moreover, the
instances being sequentially solved by the SAT solver are very similar to each
other, differing only in relatively few new binary clauses, which increases the
value of the clauses learnt in previous calls.
Preliminary experiments with SMT-CBS, the previous SMT-based solver for
the discrete variant of MAPF, showed that it is better to collect and reflect all
conflicts discovered after each plan validation step rather than choosing a subset
of them according to any preference [56]. It also proved beneficial to carry
the set of conflicts over to the next value of the objective in the iterative scheme.
We follow the analogous design choice in SMT-CCBS. That is, we collect
and maintain the set of conflicts discovered by DECIDEMAPFR throughout the
entire course of the SMT-CCBS algorithm. Our implementation of SMT-CCBS
is written in C++ and available at https://github.com/surynek/boOX.

6. Experimental Evaluation

Grid         Source file              Dimensions   Free cells
warehouse    warehouse-10-20-10-2-2   161 × 63     9,776
rooms        room-64-64-8             64 × 64      3,232
den520d      den520d                  256 × 257    28,176
empty16x16   empty-16-16              16 × 16      256

Table 1: Details about the grid graphs used in our experiments, including their source files in
the grid MAPF benchmark, their dimensions, and the numbers of free cells.

We implemented CCBS and SMT-CCBS, and evaluated their performance
on a range of MAPFR problems. In this section, we report the results of this
evaluation.

Figure 6: (Left) Illustration of the 2^k neighborhood for k = 2, 3, 4, and 5. (Right) An invalid
move on a 16-neighborhood grid. Although the source and target cells are unblocked and the
line connecting them does not intersect the blocked cell, a disk-shaped agent of radius √2/4
will run into the latter if the move is executed.

6.1. Experimental Setup

In our experiments, we used two types of graphs: 2^k-neighborhood grids [57]
and graphs that represent roadmaps.

6.1.1. 2^k-Neighborhood Grids


For the 2^k-neighborhood grid graphs, we used four grids from the recently
introduced grid-based MAPF benchmark [20] in the Moving AI repository [58].8
Table 1 provides a visual representation of these grids, as well as other statistics
such as the dimension of each grid and the number of non-obstacle cells. We
chose these specific grids as they represent different settings in which MAPFR
problems arise: video-games (den520d), warehouse logistics (warehouse),
indoor robotics (rooms), and field robotics (empty16×16). The corridors on the
warehouse grid are 2 cells wide, and the entrances on the rooms grid are
1 cell wide.
For each grid we created a graph whose vertices are the grid cells, and edges
connect every grid cell to the 2^k grid cells closest to it, where k varied from 2
to 5 (see Fig. 6). We ignored inertial effects and assumed the moving speed of
each agent is one grid cell per time unit. That is, in one time unit an agent
covers the segment whose length is equal to the distance between two adjacent

8 https://movingai.com/benchmarks/mapf.html

grid cells. This does not mean the duration of all move actions is one time unit. For
example, the duration of a move action from grid cell (x, y) to (x + 1, y + 1) is
√2.
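
The move set of a 2^k-neighborhood grid and the resulting action durations can be generated programmatically. The sketch below uses one common construction, assumed here rather than taken from the text: starting from the two axis-aligned directions of a quadrant, each refinement step inserts between every pair of adjacent directions their component-wise sum, and the result is mirrored into all four quadrants. Durations are Euclidean lengths, following the unit-speed assumption stated above.

```python
import math


def neighborhood_moves(k):
    """Return {(dx, dy): duration} for a 2^k-neighborhood grid at unit speed."""
    # Directions of the first quadrant, ordered from (1, 0) to (0, 1).
    directions = [(1, 0), (0, 1)]
    for _ in range(k - 2):
        refined = [directions[0]]
        for a, b in zip(directions, directions[1:]):
            refined.append((a[0] + b[0], a[1] + b[1]))   # insert the sum of neighbors
            refined.append(b)
        directions = refined
    # Mirror into all four quadrants; the set removes duplicated axis moves.
    moves = set()
    for dx, dy in directions:
        for sx in (1, -1):
            for sy in (1, -1):
                moves.add((sx * dx, sy * dy))
    return {m: math.hypot(*m) for m in sorted(moves)}


for k in (2, 3, 4, 5):
    moves = neighborhood_moves(k)
    print(k, len(moves), round(max(moves.values()), 3))
```

Under this construction the number of moves doubles with each increment of k, and the diagonal move (1, 1) that first appears at k = 3 has duration √2, in line with the example in the text.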

Agents’ shapes were disks of radius √2/4. The shape of an agent was considered
when checking collisions with static obstacles as well as other agents. Thus,
if a move action starts and ends in empty grid cells but the disk representing
the agent intersects with a blocked cell while performing this move action, then
this action is prohibited. An example of this is given in Fig. 6 (right).
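
A check of this kind can be sketched geometrically: a move is prohibited if the distance between the motion segment and a blocked cell is smaller than the agent's radius. The code below is an illustrative sketch under assumptions that the text does not state: cells are unit squares centered at integer coordinates, touching the boundary of a blocked cell with the segment itself counts as a collision, and a disk that merely touches a cell does not. The test configuration is hypothetical and only in the spirit of Fig. 6 (right).

```python
import math


def point_segment_distance(c, p, q):
    """Distance from point c to the segment pq."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length_sq = dx * dx + dy * dy
    s = 0.0 if length_sq == 0 else ((c[0] - p[0]) * dx + (c[1] - p[1]) * dy) / length_sq
    s = max(0.0, min(1.0, s))                       # clamp the projection onto the segment
    return math.hypot(p[0] + s * dx - c[0], p[1] + s * dy - c[1])


def point_box_distance(p, lo, hi):
    """Distance from point p to the axis-aligned box [lo, hi]."""
    dx = max(lo[0] - p[0], 0.0, p[0] - hi[0])
    dy = max(lo[1] - p[1], 0.0, p[1] - hi[1])
    return math.hypot(dx, dy)


def segment_intersects_box(p, q, lo, hi):
    """Slab test: does the segment pq share a point with the box [lo, hi]?"""
    s_min, s_max = 0.0, 1.0
    for axis in range(2):
        d = q[axis] - p[axis]
        if d == 0:
            if p[axis] < lo[axis] or p[axis] > hi[axis]:
                return False
        else:
            s1, s2 = (lo[axis] - p[axis]) / d, (hi[axis] - p[axis]) / d
            s_min, s_max = max(s_min, min(s1, s2)), min(s_max, max(s1, s2))
            if s_min > s_max:
                return False
    return True


def move_hits_cell(p, q, cell, radius):
    """True if a disk of the given radius moving from p to q overlaps the blocked cell."""
    lo, hi = (cell[0] - 0.5, cell[1] - 0.5), (cell[0] + 0.5, cell[1] + 0.5)
    if segment_intersects_box(p, q, lo, hi):
        return True
    corners = [(lo[0], lo[1]), (lo[0], hi[1]), (hi[0], lo[1]), (hi[0], hi[1])]
    # For disjoint convex shapes the closest pair involves a vertex of one of them.
    distance = min(min(point_segment_distance(c, p, q) for c in corners),
                   point_box_distance(p, lo, hi), point_box_distance(q, lo, hi))
    return distance < radius


RADIUS = math.sqrt(2) / 4
print(move_hits_cell((0, 0), (2, 1), (0, 1), RADIUS))   # move (2, 1) next to a blocked cell
print(move_hits_cell((0, 0), (1, 0), (0, 1), RADIUS))   # axis-aligned move along the same cell
```

In the first call the segment itself misses the blocked cell, yet its nearest corner lies closer than √2/4 to the segment, so the move is rejected; the axis-aligned move in the second call keeps a distance of 0.5 and is allowed.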
This grid-based MAPF benchmark [20] also provides scenario files for each
grid. Each scenario file lists start and goal locations for the agents. We used
this list to create MAPFR problem instances as suggested by the maintainers
of the benchmark [20]. That is, we created a MAPFR problem instance with
two agents whose start and goal locations are given in the first two lines of the
scenario file. Then, we created a new MAPFR problem instance by adding a
third agent whose start and goal locations are given in the third line of the
scenario file. This process is repeated to create a sequence of MAPFR problem
instances with more and more agents. To evaluate an algorithm on this sequence
of MAPFR problem instances, we run it until it reaches a MAPFR problem
instance that it cannot solve within the given time limit. In
our experiments, we generated such sequences of MAPFR problem instances for
all the 25 random scenario files available for each grid. Overall, for each grid
and a specific number of agents we ran the evaluated algorithms on 25 different
MAPFR problem instances.
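
The incremental evaluation protocol described above can be sketched as a simple loop. The names below are hypothetical: solve is an assumed callback that returns True when the solver handles the given start-goal pairs within the time limit, and a scenario is assumed to be an ordered list of such pairs. Instances that are never attempted because a smaller instance failed are counted as unsolved when computing the success rate, which is an assumption of this sketch.

```python
def evaluate_on_scenario(scenario, solve, time_limit, min_agents=2):
    """Grow the instance one agent at a time; stop at the first unsolved instance.

    scenario: ordered list of (start, goal) pairs
    solve:    hypothetical callback solve(tasks, time_limit) -> bool
    Returns {number_of_agents: solved?} for the instances that were attempted.
    """
    outcome = {}
    for n in range(min_agents, len(scenario) + 1):
        solved = solve(scenario[:n], time_limit)
        outcome[n] = solved
        if not solved:
            break
    return outcome


def success_rate(outcomes, n):
    """Fraction of scenarios whose n-agent instance was solved (unattempted = unsolved)."""
    return sum(1 for outcome in outcomes if outcome.get(n, False)) / len(outcomes)


# Toy solver (hypothetical): succeeds whenever at most four agents are involved.
def toy_solve(tasks, time_limit):
    return len(tasks) <= 4


toy_scenario = [((i, 0), (i, 9)) for i in range(6)]
outcomes = [evaluate_on_scenario(toy_scenario, toy_solve, time_limit=30.0)]
print(outcomes[0], success_rate(outcomes, 4), success_rate(outcomes, 6))
```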

6.1.2. Roadmap Graphs


The second type of graphs we used in our experiments represents roadmaps.
These graphs were generated by processing the den520d grid graph with a
roadmap-generation tool from the Open Motion Planning Library (OMPL) [59],
which is a widely used tool in the robotics community. Specifically, we created
three graphs which we refer to as the sparse, dense, and mega-dense roadmap
graphs. As their names suggest, these graphs differ in their sizes and density.
Table 2 lists the exact number of vertices and edges in each graph and
shows them visually.9 The finite set of move actions A for each MAPFR
problem instance created for these roadmap graphs comprises two move actions
for each edge in the roadmap graph, corresponding to crossing that edge in
each direction at a fixed speed. We created a suite of such MAPFR problem
instances in a similar way as described above for the grid-based graphs,
creating 25 scenario files for each roadmap graph such that each scenario file
contains non-overlapping start-goal pairs chosen randomly out of the graph
vertices. All the scenario files used in our experiments are publicly available at:
https://tinyurl.com/ccbs-aij-instances.

Roadmap      Vertices   Edges
sparse       169        349
dense        878        7,341
mega-dense   11,342     263,533

Table 2: Details about the roadmap graphs used in our experiments, including the number of
vertices and edges in each graph.

6.1.3. Evaluation Metrics and Time Limit


We considered three main metrics in our evaluation. The first metric is the
success rate, which is the ratio of problem instances solved under a given time
limit. The second and third metrics are the SOC and makespan, respectively,
of the returned solution. These metrics are common in the MAPF literature.
Unless stated otherwise, we set the time limit to 30 seconds. We chose this
timeout to study the applicability of the suggested approaches in applications

9 The mega-dense graph contains so many edges that they visually overlap.

where the time allocated to pathfinding is very limited. Such applications are
prevalent in robotics, digital entertainment, and many other real-world problems.
However, we also performed dedicated experiments with different time limits,
to investigate how the evaluated algorithms scale with time.

6.2. Grid Results

Figure 7 depicts the success rates of CCBS and SMT-CCBS under the 30
second time limit for different numbers of agents (x-axis) in our 2^k-neighborhood
grids. Plots of different colors correspond to results for different values of k
(neighborhood size).
The results show that CCBS and SMT-CCBS can find SOC-optimal and
makespan-optimal solutions, respectively, under the 30 second time limit for
MAPFR problem instances with several dozens of agents. For example, both
CCBS and SMT-CCBS solved 80% of problem instances for 24 agents on the
warehouse grid for all evaluated values of k (2, 3, 4, and 5). Note that
state-of-the-art solvers for classical MAPF can find SOC/makespan-optimal solutions
on similar grids with many more agents. This is expected as these solvers
solve an easier problem (MAPF as opposed to MAPFR), do not perform
time-consuming conflict detection and unsafe-intervals estimation procedures, and
their constraints over vertices/edges are more restrictive.
Consider now the impact of increasing the neighborhood size, i.e., increasing
k. For both algorithms, in general, increasing k leads to lower success rates. This
is an expected effect resulting from the difference in branching factor, which is 5
for k = 2 and goes up to 32 for k = 5. However, for some settings, the difference
in algorithms’ performance for k = 2 and k = 3 is negligible, and setting k to
3 can actually lead to better results. Consider for example the results of CCBS on
den520d or the results of SMT-CCBS on warehouse, when the number of agents
is below 30. In these examples, setting k = 3 improves the success rate in most
cases. A possible explanation is that increasing k also allows the agents to find
shorter single-agent plans. This can result in a faster low-level search for CCBS,
and smaller MDDR structures for SMT-CCBS, both of which may reduce the
Figure 7: Success rate for CCBS (left plots) and SMT-CCBS (right plots) on grid maps.

overall runtime. In other words, increasing k means a search space with a larger
branching factor, but also having a potentially smaller depth.
Relating the SMT-CCBS results to the CCBS results is somewhat problematic,
since the former aims to optimize makespan while the latter aims to
optimize SOC. Nevertheless, one can see that there is no universal winner:
CCBS is able to solve problem instances with more agents in the den520d and
warehouse grids, while SMT-CCBS solves problem instances with more agents in
the empty 16 × 16 grid. These results are consistent with the common observation
that SAT-based methods work well for small and dense graphs while CBS-based
methods excel in larger and sparser maps [56].
To better understand the scalability of CCBS and SMT-CCBS, we repeated
the above experiments with k = 3 and timeouts of 1, 10, 30, 60, 150, and 300
seconds. The results are depicted in Fig. 8. The y-axis shows the total number
of solved instances for each grid across the different numbers of agents. As
expected, increasing the timeout allows both algorithms to solve more instances.
However, CCBS quickly reaches a plateau in which extending the timeout does
not allow solving significantly more instances. The only notable increase in the
performance of CCBS occurs when going from a 1s timeout to 30s. On the
other hand, SMT-CCBS gains more from increasing the timeout: in all
maps except empty, the difference between the number of tasks solved
under a 30s timeout and a 300s timeout is considerable. For example, this increase
in timeout results in SMT-CCBS solving approximately 1.5 times more problem
instances in the den520d and warehouse grids.
Allowing for longer runtime is especially beneficial for SMT-CCBS because
more time can be spent in the SAT solving phase and this time can be utilized by
the SAT solver to fully employ its learning mechanism to prune the search space.
During long runs of the SMT-CCBS solver, the high-level phases that construct
the formula become relatively less time-consuming while the SAT solving phase
is increasingly dominant with respect to the overall runtime. Hence the efficiency
of the SAT solver in pruning the search space represented by formulae modeling
the MAPFR problem becomes more pronounced.

Figure 8: Total number of instances solved by CCBS and SMT-CCBS on different
8-neighborhood grids, depending on the time limit.

Overall, these results suggest that when the time budget is low (1-30s) CCBS
should be preferred, whereas when this budget is high (more than 3 minutes)
preference should be given to SMT-CCBS.

6.2.1. Solution Cost Results


Next, we analyze the costs (makespan and SOC) of the solutions returned
by CCBS and SMT-CCBS. To allow comparing these costs for both algorithms
and between values of k, we considered only MAPFR problem instances that
were successfully solved by both algorithms for all values of k. For each problem
instance, algorithm, and value of k we computed two additional metrics: (1)
SOC gain ratio, which is the ratio between the SOC of the returned solution
and SOC of the solution returned by CCBS (k = 2); and (2) makespan gain
ratio, which is the ratio between the makespan of the returned solution and
makespan of the solution returned by SMT-CCBS (k = 2). For example, if the
SOC gain ratio for an algorithm for a particular MAPFR problem instance is
0.75, it means that the SOC of the solution returned by this algorithm is smaller
Figure 9: The SOC and makespan gain ratios for CCBS and SMT-CCBS on grid maps.

by 25% compared to the SOC of the solution obtained by CCBS with k = 2.
Figure 9 shows the SOC- and makespan-gain ratios as box-and-whisker plots,
for different values of k. Each plot shows the following statistics: minimum
(lower whisker), maximum (upper whisker), median (horizontal line inside the
box), mean (cross sign inside the box), first and third quartiles (box) and also
outliers (bold dots above/below whiskers).
As one can see, increasing k indeed decreases both SOC and makespan.
This effect is most notable when going from k = 2 to k = 3. Consider the den520d
map, for example. Setting k to 3 reduces both makespan and SOC by about
15-17% on average. However, increasing k beyond 3 had a much smaller effect
on cost. In fact, in all our experiments going from k = 4 to k = 5 yielded a
marginal improvement of at most 1% approximately in terms of makespan and
SOC, for both CCBS and SMT-CCBS. It is noteworthy that a similar result
was previously observed in 2^k single-agent pathfinding [57].
Next, consider the makespan results for CCBS. While CCBS is designed to
optimize SOC and not makespan, our results show that very often the solution
returned by CCBS is also optimal w.r.t. makespan. Similarly, while SMT-CCBS
is designed to optimize makespan, it often returns solutions that are
SOC-optimal. To see this, consider the flat bars at 1.0 on both SOC gain and
makespan gain plots for CCBS (k=2) and SMT-CCBS (k=2) on all grids. In some
cases, CCBS returns a solution with suboptimal makespan, and, similarly,
SMT-CCBS returns a solution with suboptimal SOC. This can be seen by observing
the outliers in the room and empty grids. But in general, the shapes of the
CCBS and SMT-CCBS boxes look very similar. This suggests that for our
MAPFR problems a SOC-optimal solution is often makespan-optimal and vice
versa. Moreover, the high-level behavior of both algorithms is similar: both start
with the optimal single-agent plans, identify conflicts, and resolve them in
a similar way. Therefore, the eventual solutions they return tend to be similar.

Figure 10: Results of CCBS (left) and SMT-CCBS (right) for different roadmaps.

6.3. Roadmaps Results
Fig. 10 depicts the success rates for CCBS and SMT-CCBS on our roadmap
graphs. Consider first the impact of the different roadmap types on the success
rates of CCBS and SMT-CCBS. For CCBS, the success rates of the sparse and
mega-dense roadmaps are similar and notably lower than the success rate for
the dense roadmap. This can be explained as follows. In the sparse roadmap,
the number of edges is small and thus agents are likely to conflict by planning
to use the same edges. In the mega-dense roadmap, the edges densely populate
the metric space in the map, and thus agents are likely to conflict by planning
to use different edges that are too close to each other. In other words, in the
sparse roadmap the agents have fewer alternative paths to choose from to avoid
conflicts, while in the mega-dense roadmap agents have many alternative paths
to choose from but a large number of them conflict. In both cases, numerous
conflicts arise, making the problem harder for CCBS. The dense roadmap
provides a reasonable trade-off between the number of alternative paths the
agents have to choose from and the likelihood that such paths conflict, yielding
a notably higher success rate.
The impact of the different roadmaps on success rates is different for SMT-CCBS. There, the success rates monotonically decrease for all numbers of agents when increasing the density of the roadmap graph. This occurs because significantly larger and denser graphs require creating and solving SAT formulae with more variables and constraints. The impact of increasing the density of edges here is compounded by the absence of compression at the MDD_R level, as in the roadmaps there is scarce symmetry in the neighborhood of a position. Therefore, the possibility of generating a single node in MDD_R by different paths is eliminated. This is also shown when comparing the success rates of CCBS and SMT-CCBS. The latter performs better on the sparse roadmap while the former is superior on the moderately sized and larger roadmaps (dense and mega-dense).
Next, consider the SOC and makespan of CCBS and SMT-CCBS in the different roadmaps. Since the results show SOC and makespan over only the solved instances, there are some cases where the average SOC or makespan for the denser roadmaps is slightly higher. However, in general, denser roadmaps allow both algorithms to find solutions with lower costs, as expected. That being said, the actual difference in makespan between dense and mega-dense is very small, suggesting that the dense roadmap provides a sufficient discretization of the continuous space to find high-quality solutions. However, finding a suitable density for a roadmap of a given terrain is beyond the scope of this work.

Figure 11: Total number of solved instances. [Four bar-chart panels: empty16x16, den520d, warehouse, rooms. x-axis: k = 2, 3, 4, 5; y-axis: Solved Instances; bars: CBS-CT, E-ICTS, CCBS, SMT-CCBS.]

6.4. Comparison to Other Solvers

Next, we compare the performance of CCBS and SMT-CCBS to other solvers that also address versions of the MAPF_R problem as explained in Section 2.2. Namely, we compared our algorithms with E-ICTS [13] and ECBS-CT [14]. Both E-ICTS and ECBS-CT are bounded-suboptimal algorithms: they accept a parameter w ≥ 1 that bounds the suboptimality of the solution they return. In our experiments, we set w = 1 to ensure that the returned solution is optimal (see footnote 10). We refer to ECBS-CT with w = 1.0 as CBS-CT. We used the authors' implementations of these algorithms, which are either freely available on GitHub (E-ICTS, see footnote 11) or were shared with us by the ECBS-CT authors. Note that these implementations may not exactly match those used in prior publications about these algorithms, and thus the results we report may differ.
E-ICTS and CBS-CT implementations handle continuous time by discretizing it according to a minimal wait time parameter ∆. We set it to be 1/1000 in our experiments. Since these implementations do not support general graphs, we ran our comparison on 2^k-neighborhood grids only. E-ICTS supports these grids by default. For CBS-CT we designed motion primitives that correspond to moves along the edges of a 2^k-neighborhood grid.
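To make the grid setting concrete, the following is a minimal sketch of one common way to enumerate the move offsets of a 2^k-neighborhood grid: start from the four cardinal moves and, for each increment of k, add the sum of every two angularly adjacent moves. The function names, and the assumption that a move's duration equals its Euclidean length (unit speed), are illustrative and are not taken from the E-ICTS or CBS-CT implementations.

```python
from math import hypot


def moves_2k(k):
    """Return the 2**k move offsets (dx, dy) of a 2^k-neighborhood grid, k >= 2."""
    # k = 2: the four cardinal moves, listed in circular order.
    moves = [(1, 0), (0, 1), (-1, 0), (0, -1)]
    for _ in range(k - 2):
        refined = []
        for i, (dx, dy) in enumerate(moves):
            nx, ny = moves[(i + 1) % len(moves)]
            refined.append((dx, dy))
            # Insert the sum of each pair of angularly adjacent moves.
            refined.append((dx + nx, dy + ny))
        moves = refined
    return moves


def move_durations(k, speed=1.0):
    """Map each offset to its duration, assuming constant speed (an assumption)."""
    return {m: hypot(*m) / speed for m in moves_2k(k)}


for k in (2, 3, 4, 5):
    print(k, len(moves_2k(k)))
```

Under these assumptions, k = 3 adds the diagonal moves and k = 4 adds knight-like moves such as (2, 1), which is why move durations become non-uniform for k > 2.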
Fig. 11 shows the number of solved instances for all algorithms under a time limit of 30s. Each plot corresponds to a particular map from our dataset. On the empty-16-16 grid, SMT-CCBS clearly outperforms the competitors, while the performance of CCBS is similar to that of E-ICTS. On the warehouse map, CCBS dominates for all k; however, SMT-CCBS is very close for k = 3, 4, 5. On rooms for k = 2 E-ICTS is the winner, for k = 2, 3 CCBS is, and for k = 5 CBS-CT solves the most instances. Finally, for the largest map, den520d, CCBS and SMT-CCBS evidently outperform the other algorithms for k = 3, 4, 5. For k = 2 CBS-CT is very close to SMT-CCBS. Overall, our experiments show that there is currently no universal winner. This is aligned with the current state of the art in classical MAPF algorithms, where identifying which MAPF algorithm to use in which domain is an open question.

10 Note that since E-ICTS and ECBS-CT discretize time, the cost of the solutions they return may still be higher than those returned by CCBS and SMT-CCBS. An example where discretizing time yields a suboptimal solution is shown in Figure 4. However, in our experiments we did not observe any notable difference in solution costs.
11 We used the version dated 30 August 2019 from https://github.com/thaynewalker/hog2. Newer versions were unstable in our tests.

7. Related Work and Discussion

In this section, we discuss the pros and cons of the two algorithms we proposed (CCBS and SMT-CCBS), and how they are related to algorithms for solving other variants of the general MAPF problem.

7.1. Discussion: CCBS vs. SMT-CCBS

In this paper, we defined CCBS to optimize SOC and SMT-CCBS to optimize makespan. However, it is possible to adapt them to optimize other objective functions. Adapting SMT-CCBS to optimize SOC requires bounding both SOC and makespan, such that SOC-optimal solutions are not missed. Surynek [60, 61] has recently proposed a method to do so. Adapting CCBS to optimize makespan is simpler, and Appendix D reports on a limited comparison between such a makespan-optimal version of CCBS and SMT-CCBS.
Our current results show that CCBS often outperforms SMT-CCBS. In addition, there are many enhancements to the basic CBS algorithm that can be migrated to CCBS and further improve its performance [62]. Nevertheless, a major benefit of using SMT-CCBS is that improvements in the underlying SAT/SMT solver may automatically result in an improved MAPF solver. We have observed this in a different MAPF-related study. Thus, SMT-CCBS may still yield better results than CCBS in the future, with the advancement of SAT solver technology.
More generally, CCBS and SMT-CCBS embody two standard approaches for solving combinatorial search problems: heuristic search and compilation to SAT. Both approaches have been used to solve different types of planning problems, including classical planning [63, 28, 64] and classical MAPF [65]. Comparisons between MAPF algorithms from these two approaches have been reported in many papers [12, 66], including a recent paper [67] that included a comprehensive evaluation over a wide range of benchmark domains. The results, in general, show that there is no single approach that can solve all problems. Some classical MAPF problems can only be solved with a SAT-based algorithm while others can only be solved by a search-based algorithm. This motivated us to develop both a search-based algorithm (CCBS) and a SAT-based algorithm (SMT-CCBS) for solving MAPF_R problems.

7.2. Discussion: Solvers for Different MAPF Variants

                            Non-uniform     Wait is not   Any-angle   Agents'
Algorithm                   move duration   discretized   moves       volumes   Complete   Opt.   Dist.
CBS-CT [14]                 ✓               ✗             ✗           ✓         ✓          ✓      ✗
E-ICTS [13]                 ✓               ✗             ✗           ✓         ✓          ✓      ✗
AA-SIPP(m) [38]             ✓               ✓             ✓           ✓         ✗          ✗      ✗
MCCBS [17]                  ✗               ✗             ✗           ✓         ✓          ✓      ✗
CBS-CL [68]                 ✓               ✗             ✗           ✗         ✓          ✗      ✗
MAPF-POST [69, 70]          ✓               ✓             ✗           ✓         ✓          ✗      ✗
ORCA [71], ALAN [72]        ✓               ✗             ✓           ✓         ✗          ✗      ✓
dRRT* [73]                  ✓               ✗             ✗           ✓         ✓          ✗?     ✗
CCBS [18], SMT-CCBS [19]    ✓               ✓             ✗           ✓         ✓          ✓      ✗

Table 3: Overview: MAPF research beyond the basic setting.

There exists a vast body of other works that study MAPF beyond its basic, classical, setting. Table 3 provides a differential overview of related work on MAPF beyond its basic setting. Rows correspond to different algorithms or families of algorithms. Columns specify properties that the listed algorithms have or lack. These properties are whether the listed algorithm (1) supports non-uniform durations for move actions ("Non-uniform move duration" column), (2) does not require discretizing the durations of wait actions ("Wait is not discretized"), (3) allows moving from one graph vertex to another via a straight line even if there is no corresponding edge ("Any-angle moves"), (4) considers the agents' geometric shapes ("Agents' volumes"), (5) guarantees solution completeness ("Complete"), (6) guarantees optimal solution cost ("Opt."), and (7) is distributed or not ("Dist."). Completeness and optimality here are with respect to a given movement and time resolution. In the rest of this section, we discuss the related works listed in this table, and others, in more detail.

In Section 2.2, we discussed several algorithms that are closest to our work, including E-ICTS [13] and CBS-CT [14]. As explained there, both E-ICTS and CBS-CT require a minimal wait duration to be given as an input parameter, and the durations of all wait actions are multiples of that parameter. As shown in Example 2, discretizing wait actions in such a way can lead to finding suboptimal solutions. So, while E-ICTS and CBS-CT are optimal with respect to the chosen discretization, the solutions they return may have a higher cost than the solutions returned by CCBS and SMT-CCBS, which do not discretize time.
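The cost of discretizing wait actions can be illustrated with a small sketch. Assume, hypothetically, that an agent must wait 0.7071 time units before a move becomes safe. With a minimal wait duration ∆, the agent can only wait a multiple of ∆, so the wait is rounded up. The function name and the numbers below are illustrative assumptions, not values from our experiments; exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction
import math


def discretized_wait(required_wait, delta):
    """Smallest multiple of delta that is at least required_wait."""
    steps = math.ceil(required_wait / delta)
    return steps * delta


required = Fraction(7071, 10000)  # hypothetical earliest safe departure time
for delta in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 1000)):
    wait = discretized_wait(required, delta)
    print(f"delta={delta}: wait={float(wait)}, extra cost={float(wait - required)}")
```

The extra cost shrinks as ∆ decreases but vanishes only when the required wait happens to be an exact multiple of ∆, whereas an algorithm that does not discretize time can use the required wait directly.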
Yakovlev and Andreychuk [38] proposed AA-SIPP(m), an any-angle MAPF algorithm. AA-SIPP(m) is based on SIPP and adopts a prioritized planning approach. It does not guarantee completeness or optimality. Li et al. [17] proposed Multi-Constraint CBS (MCCBS), a CBS-based algorithm for agents with a geometrical shape that may have different configuration spaces. However, they assumed all actions have a unit duration and did not address continuous time. Walker et al. [68] proposed CBS-CL, a CBS-based algorithm designed to handle non-unit edge costs and a hierarchy of movement abstractions. CBS-CL is solution complete. However, CBS-CL does not allow reasoning about continuous time and does not return provably optimal solutions. Hönig et al. [69, 70] proposed MAPF-POST, which is a post-processing step that adapts a solution to a classical MAPF problem such that it respects a given set of kinematic constraints over the agents' motions. They prove that MAPF-POST can always find a feasible schedule that satisfies the given kinematic constraints, and thus we view MAPF-POST as a complete algorithm. It does not, however, guarantee optimality.
dRRT* is a hybrid between a tree-search algorithm and a sampling-based algorithm [73]. It runs a tree search over locations sampled in the configuration space. dRRT* is solution complete: given enough time it will find a solution if one exists. Regarding solution quality, dRRT* provides a probabilistic type of guarantee: given enough time it will return an optimal solution. However, it does not provide a mechanism for identifying when this occurs. Thus, we do not consider dRRT* an optimal MAPF algorithm, since the cost of the solution it returns when halted may be far from optimal. Also, it is not clear how the durations of wait actions are chosen by dRRT*.
ORCA [74, 71] is a fast and distributed collision avoidance mechanism that has been used to navigate multiple agents in continuous space [75]. It works by computing for each agent in every time step the direction and velocity it should use to avoid the other agents. While fast, using ORCA to navigate multiple agents does not provide any completeness or optimality guarantees. In addition, ORCA requires time to be discretized. ALAN [72] is a multi-agent navigation algorithm that integrates multi-armed bandits with collision avoidance using ORCA and yields lower cost solutions. Like ORCA, ALAN is not complete or optimal, and requires time to be discretized.
All the algorithms mentioned above are designed to solve some variant of the MAPF problem. The Multi-Agent Pickup and Delivery (MAPD) problem is closely related to MAPF; in it, agents need to pick up and deliver packages from one location to another. Techniques proposed to solve MAPF problems have also been adapted to solve MAPD problems [76, 77, 39]. In particular, Ma et al. [39] proposed a MAPD algorithm called TP-SIPPwRT that also uses SIPP to handle continuous movement of non-holonomic agents in this setting. TP-SIPPwRT is not optimal and is only complete for well-formed MAPD problems. Future work may consider integrating ideas from CCBS and SMT-CCBS and applying them to solve MAPD problems as well.

7.3. Discussion: Improvements for CCBS and SMT-CCBS

Since CCBS was introduced, a number of enhancements have been suggested that have the potential to improve its performance. Some of these improvements were designed explicitly for CCBS or SMT-CCBS. Others are more general and can be applied to both MAPF and MAPF_R.
Andreychuk et al. [62] transferred different techniques designed for solving classical MAPF with CBS to MAPF_R and CCBS. These enhancements included disjoint splitting [78], prioritizing conflicts based on their cost impact, and high-level heuristics [11]. The resulting solver, named Improved CCBS (ICCBS), was reported to notably outperform CCBS on both 2^k grids and roadmap graphs.
Walker et al. [24] proposed another powerful CBS enhancement as part of their CBS+TAB algorithm. The key idea of CBS+TAB is to add multiple time-range constraints (similar to the ones proposed in this work) per single CT node of CBS in order to prune the CT. The reported increase in the algorithm's performance is significant. The original implementation of CBS+TAB allows only wait actions of a fixed duration. That being said, one may be able to implement CBS+TAB such that it handles non-discretized wait actions, and one may incorporate the action pruning rules from CBS+TAB in our algorithms. This spatial symmetry-breaking technique of CBS+TAB was later augmented with cost-based symmetry-breaking [79] inspired by techniques used in E-ICTS. The resulting hybrid algorithm, called Conflict-Based Increasing Cost Search (CBICS), was reported to outperform the competitors. Still, its evaluation was carried out with a fixed wait action duration equal to one time step. We believe that implementing a version of CBICS that supports wait actions of arbitrary duration is a promising direction for future work.
Recent development in SMT-CCBS focused on reducing the size of the MDD_R structures generated by the algorithm [80]. The MDD_R size reduction method is based on including promising paths into MDD_R first. An additional loop is needed in the corresponding modification of SMT-CCBS that gradually extends the set of paths represented in MDD_R structures, starting with promising ones and continuing towards full MDD_R structures representing all paths. The preference of paths for inclusion in MDD_R structures leaves room for integrating domain-specific heuristics in SMT-CCBS.

8. Conclusion

We proposed two novel algorithms for solving MAPF_R, which is a form of MAPF in which time is continuous, actions can have a non-uniform duration, and agents, as well as objects, have geometric shapes. The first algorithm we presented is called CCBS. It follows the Conflict Based Search (CBS) framework [7], but uses an adapted version of Safe Interval Path Planning (SIPP) [15] for the low-level search, and a unique type of conflicts and constraints for the high-level search. We prove that CCBS is sound and solution complete, and returns SOC-optimal solutions. The second algorithm we presented is called SMT-CCBS. It follows the same approach as SMT-CBS [16] by breaking the given MAPF_R problem into a sequence of bounded-cost MAPF_R problems. Each of these bounded-cost problems is solved by applying a SAT modulo Theory (SMT) problem-solving framework. That is, a propositional skeleton (PS) is generated and continuously updated until a solution to this PS represents a valid solution to the given bounded-cost problem. We prove that SMT-CCBS is also sound and solution complete, and guarantees that a makespan-optimal solution is returned. To the best of our knowledge, CCBS and SMT-CCBS are the first algorithms to provide optimality guarantees for such a general version of MAPF.
We implemented both algorithms and evaluated them experimentally on a set of benchmarks, including grid-based graphs [20] and roadmaps. Our experimental results showed that while MAPF_R is, in general, more difficult than classical MAPF, using either CCBS or SMT-CCBS enables finding optimal solutions to problems with dozens of agents under a strict time limit of 30 seconds. Nevertheless, there is much room for improvement, and scaling to even larger problems is still an open challenge. One of the bottlenecks we observed in our experimental evaluation is conflict detection, which is more challenging in MAPF_R than in classical MAPF. Future work may apply meta-reasoning techniques to decide when and how much to invest in conflict detection throughout the search. Another direction for future work is to integrate in CCBS the many extensions and improvements that have been proposed over the years. These improvements include disjoint splitting of CBS constraints [78], adding admissible heuristics to the high-level search [11, 81], and novel forms of symmetry-breaking constraints [82]. Some of these improvements may be easy to incorporate in CCBS while incorporating others is more challenging. An initial step in this direction has already been taken [62]. Finally, incorporating recently proposed CBS enhancements, such as those mentioned in Section 7.3, into CCBS and SMT-CCBS is another promising direction for future work.
One possible future research direction for SMT-CCBS is to study its conversion to DPLL(MAPF_R), a solver that would integrate MAPF_R solving and the SAT solver in a more sophisticated way, similar to what is done in DPLL(T) solvers [31]. In contrast to SMT-CCBS, which waits until the SAT solver finds a complete satisfying truth-value assignment and then tests it for collisions, DPLL(MAPF_R) would test even partial truth-value assignments that arise during the search in the SAT solver for collisions. This could prune the search made by the SAT solver using the high-level knowledge of MAPF_R.
Finally, both CCBS and SMT-CCBS rely on a finite set of move actions that is given as input. This limits the completeness of both algorithms, in the sense that a MAPF_R problem may be solvable with one set of move actions and unsolvable with another. Similarly, it limits our claims for optimality: CCBS and SMT-CCBS are both only optimal with respect to the given set of move actions. Future research may investigate developing algorithms that do not accept the set of allowed move actions as input, and instead automatically compute the move actions necessary to find an optimal solution. This research direction is particularly challenging, as there is an infinite number of move actions, which include stopping for an arbitrary amount of time at any point in space and varying the speed at which the agents move between vertices.

Acknowledgments
This research was partially funded by the Israeli Science Foundation (ISF) grant #210/17 to Roni Stern. Konstantin Yakovlev and Anton Andreychuk were supported by Russian Science Foundation (RSF) grant #16-11-00048. Anton Andreychuk was also supported by the "RUDN University Program 5-100". Pavel Surynek was supported by the Czech Science Foundation (GAČR) grant #19-17966S.

Appendix A. An Example of the SMT Problem-Solving Procedure

Consider the following simple example of problem-solving with SMT. The decision problem is to assign integer values to the variables a, b, c, d such that the following formula is true:

$$\Gamma := ((a = b) \land (a = c)) \lor (\lnot(b = c) \land \lnot(a = d)) \tag{A.1}$$

Note that in any solution to $\Gamma$, the common axioms of equality, such as transitivity, must hold. In an SMT approach to solve $\Gamma$, we can define the PS

$$(X_{ab} \land X_{ac}) \lor (\lnot X_{bc} \land \lnot X_{ad}) \tag{A.2}$$

where $X_{ab}$, $X_{ac}$, $X_{bc}$, and $X_{ad}$ are propositional variables such that $X_{ij}$ represents that $i = j$ for $i, j \in \{a, b, c, d\}$. The SAT solver can set $X_{ab} = true$, $X_{ac} = true$, $X_{bc} = false$, and $X_{ad} = false$ to satisfy this PS. The corresponding solution is inconsistent with the semantics of the equality (EQ) theory, due to the transitivity of equality. Namely, if $X_{ab}$ and $X_{ac}$ are both true, then $X_{bc}$ must also be true. Note that the SAT solver does not know this, as it is not represented in the PS. A suitable $DECIDE_T$ is needed to check that equality theory holds and suggest a conflict otherwise. We denote by $DECIDE_{EQ}$ this implementation of $DECIDE_T$. In our case, $DECIDE_{EQ}$ can suggest the conflict $X_{ab} \land X_{ac} \rightarrow X_{bc}$. This conflict is added to the PS, so the SAT solver can give a new assignment. In this way, knowledge from the underlying theory is propagated by the conflict to the PS level.
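The procedure described above can be sketched as a small lazy SMT loop. This is a minimal illustration under stated assumptions, not the solver used in our implementation: the SAT solver is replaced by brute-force enumeration over the four propositional variables, the equality check uses a simple union-find, and all function names (skeleton, sat_solve, decide_eq, solve) are hypothetical. The candidate order is chosen so that the assignment discussed in the text is tried first.

```python
from itertools import product

VARIABLES = ["a", "b", "c", "d"]
# Propositional atoms X_ab, X_ac, X_bc, X_ad of the skeleton (A.2).
ATOMS = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
ALL_ASSIGNMENTS = list(product([True, False], repeat=len(ATOMS)))


def skeleton(x):
    """Propositional skeleton (A.2)."""
    return (x[("a", "b")] and x[("a", "c")]) or (
        not x[("b", "c")] and not x[("a", "d")]
    )


def make_clause(premises, conclusion):
    """Conflict clause: if all premises are true, the conclusion must be true."""
    return lambda x: not all(x[p] for p in premises) or x[conclusion]


def sat_solve(clauses, candidates):
    """Brute-force stand-in for a SAT solver; tries candidates in order."""
    for values in candidates:
        x = dict(zip(ATOMS, values))
        if skeleton(x) and all(clause(x) for clause in clauses):
            return x
    return None


def decide_eq(x):
    """Return None if x respects equality theory, else a conflict."""
    parent = {v: v for v in VARIABLES}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    true_atoms = [atom for atom in ATOMS if x[atom]]
    for p, q in true_atoms:  # merge the classes of variables asserted equal
        parent[find(p)] = find(q)
    for p, q in ATOMS:
        if not x[(p, q)] and find(p) == find(q):
            # The true atoms force p = q, yet X_pq is false.
            return true_atoms, (p, q)
    return None


def solve(candidates=ALL_ASSIGNMENTS):
    """Lazy SMT loop: solve the PS, check the theory, learn conflicts."""
    clauses, learned = [], []
    while True:
        x = sat_solve(clauses, candidates)
        if x is None:
            return None, learned
        conflict = decide_eq(x)
        if conflict is None:
            return x, learned
        learned.append(conflict)
        clauses.append(make_clause(*conflict))


# Try the assignment discussed in the text first.
model, learned = solve([(True, True, False, False)] + ALL_ASSIGNMENTS)
print(learned)  # the learned conflict, here X_ab and X_ac imply X_bc
print(model)
```

In this sketch the premises of a conflict are simply all atoms set to true, which is coarser than the minimal explanations a dedicated theory solver would typically produce.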

Appendix B. Completeness of µSMT-CCBS

In this section, we provide a formal proof for Theorem 2, which states that µSMT-CCBS is sound and complete. Soundness in this context means that every solution returned by µSMT-CCBS is indeed a valid MAPF_R solution with makespan equal to or smaller than the makespan bound. Completeness in this context means that if such a solution exists then it will be found, and if a solution does not exist then µSMT-CCBS will return that no solution exists. Soundness is established by the fact that $DECIDE_{MAPF_R}$ verifies that the returned solution is valid. Establishing completeness, however, is less trivial and is done below.
For µSMT-CCBS to be complete, we require that the set of single-agent plans it considers contains every single-agent plan that CSIPP would return given every subset of the set of all pairs of CCBS constraints ($const(\Psi)$). To state this more formally, let $MDD_R(\mu, const(\Psi))$ be the set of single-agent plans represented by the MDD_R created for $\mu$ and $const(\Psi)$, and let $CSIPP(Const')$ be the single-agent plan created by CSIPP given the set of constraints $Const'$.

Lemma 2. For every set of CCBS constraints $Const' \subseteq const(\Psi)$ it holds that $CSIPP(Const') \in MDD_R(\mu, const(\Psi))$.

Proof. Let $\pi = (a_1, \ldots, a_n)$ be the single-agent plan returned by CSIPP given a set of constraints $Const' \subseteq const(\Psi)$. Let $\{(a_1, t_1), \ldots, (a_n, t_n)\}$ be the corresponding set of timed actions, i.e., $t_j = (\pi[:(j-1)])_D$. To prove Lemma 2, we prove by induction over $j$ that for every timed action $(a_j, t_j)$ in this set there exists an edge $((from(a_j), t_j), (to(a_j), t_j + (a_j)_D))$ in our MDD_R.
Base case. For $j = 1$, we need to show that $((from(a_1), 0), (to(a_1), (a_1)_D))$ is in our MDD_R. There are two cases: either $a_1$ is a wait action or a move action. In case $a_1$ is a move action, the edge $((from(a_1), 0), (to(a_1), (a_1)_D))$ trivially exists in our MDD_R by construction (lines 9-10 in Algorithm 3). Now consider the case where $a_1$ is a wait action. There are only two cases in which the plan returned by CSIPP starts with a wait action:

1. There is a move action $a'$ that starts at $S(i)$, and there is a CCBS constraint $\langle i, a', [0, t_u) \rangle$.
2. There is a move action $a'$ that starts at $S(i)$ and ends in some vertex $v$ which has a safe interval that starts at some time $t$ where $t > a'_D$. In this case, CSIPP may choose to wait at $S(i)$ until it can arrive at $v$ at time $t$.

In the first case, $(a_1)_D = t_u$. In the second case, $(a_1)_D = t - a'_D$. Our MDD_R covers both cases. The first case is explicitly mentioned in lines 11-15. To see why the second case is also covered in our MDD_R, recall that CSIPP creates a new safe interval in a vertex $v$ only when there is a constraint over a wait action at that vertex. Let $\langle i, a', [t, t_u) \rangle$ be this constraint. The new safe interval for CSIPP is designed such that it starts exactly when the unsafe interval of this constraint ends, i.e., at $t_u$. Therefore, to reach $v$ at time $t_u$ with some action $a_{vu}$ we need to wait until exactly time $t_u - (a_{vu})_D$ before executing it. An edge corresponding to such a wait action exists in our MDD_R, as specified in lines 16-20.
Induction step. Now, assume the induction statement holds for $j < m$, and consider the $m$th timed action $(a_m, t_m)$ in the plan CSIPP returned. Let $(v, t)$ be the location and time reached after performing the first $m - 1$ actions in $\pi$. By the induction assumption, the node $(v, t)$ exists in our MDD_R. The same argument used in the base case holds here as well. If $a_m$ is a move action, the edge $((from(a_m), t), (to(a_m), t + (a_m)_D))$ exists in our MDD_R (lines 9-10). If $a_m$ is a wait action, it was created either to avoid a conflict with a subsequent move action or to allow a subsequent move action to reach the start of some safe interval. Both options are covered in our MDD_R generation algorithm, in lines 11-15 and lines 16-20, respectively.

Theorem 4. µSMT-CCBS is a complete algorithm for bounded-cost MAPF_R.

Proof. There are three outcomes for every iteration of µSMT-CCBS: (1) the current PS is not solvable, (2) the solution returned for the current PS has a conflict, and (3) the solution returned for the current PS has no conflict. If the first outcome occurs, then due to Lemma 2 we can conclude that indeed no solution exists. If the second outcome occurs, the added conflict will be avoided in future iterations by adding its corresponding pair of CCBS constraints. No potential solution is lost by this since the CCBS constraints are a sound pair of constraints. If the third outcome occurs, the solution is returned, as required. Thus, in every iteration of µSMT-CCBS we do not lose any possible solutions.
Observe that a given MAPF_R problem can give rise to only finitely many sound pairs of constraints under the given makespan limit µ. Hence µSMT-CCBS terminates, as only finitely many sound pairs of constraints can be found and resolved, and pairs of constraints that have been resolved remain resolved in all future steps of the algorithm. Therefore, after finitely many steps the algorithm either succeeds or fails. In case of failure, the set of sound constraints cannot be satisfied for the current makespan limit µ, which means there is no solution of the input MAPF_R instance for this makespan limit µ.

Appendix C. Generating an MDD_R and Computing the Next Makespan Bound

Here, we provide the complete pseudo-code for generating MDD_R structures, which also integrates the computation of the next makespan bound. The main difference between this pseudo-code and the one given in Algorithm 3 is that it includes the computation of the next makespan bound to consider, denoted in the pseudo-code as µnext. This computation of µnext is given in lines 4, 19, 27, 29, and 31.

Appendix D. Makespan-Optimal CCBS

The experimental results of CCBS and SMT-CCBS are not directly comparable due to the different implementations and different cost objectives. While CCBS optimizes SOC, SMT-CCBS returns makespan-optimal solutions. We have also implemented a version of CCBS that returns makespan-optimal solutions. This involves changing the cost of nodes in the CT to be the makespan of the incumbent solution rather than SOC.
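The change in objective can be sketched as follows. This is a minimal illustration, assuming each CT node stores the durations of its agents' single-agent plans; the function names and the two hypothetical joint plans are ours for illustration and are not part of the CCBS code base.

```python
def sum_of_costs(durations):
    """SOC: the sum of the durations of all single-agent plans."""
    return sum(durations)


def makespan(durations):
    """Makespan: the duration of the longest single-agent plan."""
    return max(durations)


def ct_node_cost(durations, objective="soc"):
    """Cost used to order CT nodes in the high-level search."""
    if objective == "soc":
        return sum_of_costs(durations)
    if objective == "makespan":
        return makespan(durations)
    raise ValueError(f"unknown objective: {objective}")


# Two hypothetical joint plans for two agents.
plan_a = [4.0, 4.0]
plan_b = [1.0, 5.0]
for name, plan in (("A", plan_a), ("B", plan_b)):
    print(name, ct_node_cost(plan, "soc"), ct_node_cost(plan, "makespan"))
```

In this example the SOC objective prefers plan B (6.0 versus 8.0) while the makespan objective prefers plan A (4.0 versus 5.0), which shows that the two objectives can rank the same CT nodes differently.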
We performed a limited set of experiments with this makespan-optimal version of CCBS, on two 2^k-connected grid maps, empty16x16 and den520d, for k = 2, 3, 4, 5. We used the same scenario files and generated instances in the same way as described in Section 6. The time limit was 30 seconds.
Figure D.12 shows the success rate plots comparing the makespan-optimal CCBS with SMT-CCBS. As one can note, in many settings CCBS was able to find more solutions than SMT-CCBS. Still, in some settings (empty-16-16, k=2) the latter outperformed CCBS. Exploring when each algorithm performs best is a topic for future research.

Algorithm 5: Generate MDD_R for agent i and compute next makespan bound µ.
 1 GenerateMddR(Π, const(Ψ), µ, i)
 2   X^i ← {(S(i), 0)}; E^i ← ∅; Open ← ∅
 3   insert (S(i), 0) into Open
 4   µnext ← ∞
 5   while Open ≠ ∅ do
 6     (u, t) ← pop (u, t) from Open where t = min_t(Open)
 7     if t ≤ µ then
 8       foreach a ∈ A such that from(a) = u do
 9         if t + a_D ≤ µ then
10           insert (to(a), t + a_D) into Open
11           X^i ← X^i ∪ {(to(a), t + a_D)}
12           E^i ← E^i ∪ {[(u, t); (to(a), t + a_D)]}
13           foreach ⟨i, a, [t', t'_u)⟩ ∈ const(Ψ) such that t ∈ [t', t'_u] do
14             if t'_u + a_D ≤ µ then
15               insert (u, t'_u) into Open
16               X^i ← X^i ∪ {(u, t'_u)}
17               E^i ← E^i ∪ {[(u, t); (u, t'_u)]}
18             else
19               µnext ← min(µnext, t'_u + a_D)
20           foreach ⟨i, a', [t', t'_u)⟩ ∈ const(Ψ) where a' is wait at to(a) do
21             if t < t'_u − a_D then
22               if t'_u ≤ µ then
23                 insert (u, t'_u − a_D) into Open
24                 X^i ← X^i ∪ {(u, t'_u − a_D)}
25                 E^i ← E^i ∪ {[(u, t); (u, t'_u − a_D)]}
26               else
27                 µnext ← min(µnext, t'_u)
28         else
29           µnext ← min(µnext, t + a_D)
30     else
31       µnext ← min(µnext, t)
32   return (X^i, E^i), µnext
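To illustrate the core of Algorithm 5, the following is a simplified sketch that handles move actions only (lines 1-12 and 28-32) and deliberately omits the constraint-driven wait edges of lines 13-27. The dictionary-based graph, the function name, and the set used to avoid expanding the same (vertex, time) pair twice are assumptions made for this illustration and are not part of the pseudo-code above.

```python
import heapq


def generate_mdd_moves_only(start, moves, mu):
    """Build (nodes, edges) and mu_next for one agent, using move actions only.

    moves maps a vertex to a list of (target_vertex, duration) pairs.
    """
    nodes = {(start, 0)}
    edges = set()
    mu_next = float("inf")
    open_heap = [(0, start)]  # ordered by time, as in line 6
    expanded = set()
    while open_heap:
        t, u = heapq.heappop(open_heap)
        if (u, t) in expanded:  # the same (vertex, time) may be pushed twice
            continue
        expanded.add((u, t))
        if t > mu:
            mu_next = min(mu_next, t)  # line 31
            continue
        for target, duration in moves.get(u, []):
            arrival = t + duration
            if arrival <= mu:  # lines 9-12
                heapq.heappush(open_heap, (arrival, target))
                nodes.add((target, arrival))
                edges.add(((u, t), (target, arrival)))
            else:  # line 29: earliest arrival that the bound cuts off
                mu_next = min(mu_next, arrival)
    return (nodes, edges), mu_next


# Hypothetical three-vertex graph: A-B and B-C take 1 time unit, A-C takes 3.
moves = {"A": [("B", 1), ("C", 3)], "B": [("C", 1)]}
(nodes, edges), mu_next = generate_mdd_moves_only("A", moves, mu=2)
print(sorted(nodes), sorted(edges), mu_next)
```

With µ = 2 the direct move from A to C is cut off, so the sketch reports 3 as the next makespan bound worth trying, which is the role µnext plays in the full algorithm.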

Figure D.12: Comparison of CCBS and SMT-CCBS when both optimize the makespan objective.

References

[1] P. R. Wurman, R. D'Andrea, M. Mountz, Coordinating hundreds of cooperative, autonomous vehicles in warehouses, AI Magazine 29 (1) (2008) 9.

[2] R. Morris, C. S. Pasareanu, K. S. Luckow, W. Malik, H. Ma, T. S. Kumar, S. Koenig, Planning, scheduling and monitoring for airport surface operations, in: AAAI Workshop: Planning for Hybrid Systems, 2016.

[3] M. M. Veloso, J. Biswas, B. Coltin, S. Rosenthal, Cobots: Robust symbiotic autonomous mobile service robots, in: IJCAI, 2015, p. 4423.

[4] H. Ma, J. Yang, L. Cohen, T. K. S. Kumar, S. Koenig, Feasibility study: Moving non-homogeneous teams in congested video game environments, in: Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), 2017, pp. 270–272.

[5] P. Surynek, An optimization variant of multi-robot path planning is intractable, in: AAAI, 2010, pp. 1–3.

[6] J. Yu, S. M. LaValle, Structure and intractability of optimal multi-robot path planning on graphs, in: AAAI, 2013.

[7] G. Sharon, R. Stern, A. Felner, N. R. Sturtevant, Conflict-based search for
optimal multi-agent pathfinding, Artificial Intelligence 219 (2015) 40–66.

[8] G. Sharon, R. Stern, M. Goldenberg, A. Felner, The increasing cost tree


search for optimal multi-agent pathfinding, Artificial Intelligence (2013).

1455 [9] G. Wagner, H. Choset, Subdimensional expansion for multirobot path plan-
ning, Artificial Intelligence 219 (2015) 1–24.

[10] T. S. Standley, Finding optimal solutions to cooperative pathfinding prob-


lems, in: AAAI, 2010.

[11] A. Felner, J. Li, E. Boyarski, H. Ma, L. Cohen, T. S. Kumar, S. Koenig,


1460 Adding heuristics to conflict-based search for multi-agent path finding, in:
ICAPS, 2018.

[12] R. Barták, N.-F. Zhou, R. Stern, E. Boyarski, P. Surynek, Modeling and
solving the multi-agent pathfinding problem in Picat, in: International
Conference on Tools with Artificial Intelligence (ICTAI), 2017.

[13] T. T. Walker, N. R. Sturtevant, A. Felner, Extended increasing cost tree
search for non-unit cost domains, in: IJCAI, 2018, pp. 534–540.

[14] L. Cohen, T. Uras, T. S. Kumar, S. Koenig, Optimal and
bounded-suboptimal multi-agent motion planning, in: Symposium on Combinatorial
Search (SoCS), 2019.

[15] M. Phillips, M. Likhachev, SIPP: Safe interval path planning for dynamic
environments, in: IEEE International Conference on Robotics and Automation
(ICRA), 2011, pp. 5628–5635.

[16] P. Surynek, Unifying search-based and compilation-based approaches to
multi-agent path finding through satisfiability modulo theories, in:
Proceedings of the Twenty-Eighth International Joint Conference on Artificial
Intelligence, IJCAI 2019, ijcai.org, 2019, pp. 1177–1183.

[17] J. Li, P. Surynek, A. Felner, H. Ma, Multi-agent path finding for large
agents, in: AAAI, 2019.

[18] A. Andreychuk, K. Yakovlev, D. Atzmon, R. Stern, Multi-agent pathfinding
with continuous time, in: International Joint Conference on Artificial
Intelligence (IJCAI), 2019, pp. 39–45.

[19] P. Surynek, Multi-agent path finding with continuous time and geometric
agents viewed through satisfiability modulo theories (SMT), in: International
Symposium on Combinatorial Search (SoCS), 2019, pp. 200–201.

[20] R. Stern, N. R. Sturtevant, A. Felner, S. Koenig, H. Ma, T. T. Walker,
J. Li, D. Atzmon, L. Cohen, T. K. S. Kumar, R. Barták, E. Boyarski,
Multi-agent pathfinding: Definitions, variants, and benchmarks, in:
International Symposium on Combinatorial Search (SoCS), 2019, pp. 151–159.

[21] J. Yu, S. M. LaValle, Multi-agent path planning and network flow, in:
Workshop on the Algorithmic Foundations of Robotics (WAFR), 2012, pp.
157–173.

[22] A. Felner, R. Stern, S. E. Shimony, E. Boyarski, M. Goldenberg, G. Sharon,
N. R. Sturtevant, G. Wagner, P. Surynek, Search-based optimal solvers for
the multi-agent pathfinding problem: Summary and challenges, in: the
International Symposium on Combinatorial Search (SoCS), 2017, pp. 29–37.

[23] P. Surynek, A. Felner, R. Stern, E. Boyarski, Efficient SAT approach to
multi-agent path finding under the sum of costs objective, in: European
Conference on Artificial Intelligence (ECAI), Vol. 285, 2016, pp. 810–818.

[24] T. Walker, N. R. Sturtevant, A. Felner, Generalized and sub-optimal
bipartite constraints for conflict-based search, in: AAAI Conference on
Artificial Intelligence, 2020.

[25] G. Audemard, J. Lagniez, L. Simon, Improving Glucose for incremental SAT
solving with assumptions: Application to MUS extraction, in: M. Järvisalo,
A. V. Gelder (Eds.), International Conference Theory and Applications
of Satisfiability Testing (SAT), Vol. 7962 of Lecture Notes in Computer
Science, Springer, 2013, pp. 309–317.

[26] G. Audemard, L. Simon, On the Glucose SAT solver, International Journal
on Artificial Intelligence Tools 27 (01) (2018) 1840001.

[27] P. Surynek, Time-expanded graph-based propositional encodings for
makespan-optimal solving of cooperative path finding problems, Ann.
Math. Artif. Intell. 81 (3-4) (2017) 329–375.

[28] H. A. Kautz, B. Selman, Planning as satisfiability, in: European Conference
on Artificial Intelligence (ECAI), 1992, pp. 359–363.

[29] H. A. Kautz, B. Selman, Unifying SAT-based and graph-based planning, in:
T. Dean (Ed.), Proceedings of the Sixteenth International Joint Conference
on Artificial Intelligence, IJCAI 99, Stockholm, Sweden, July 31 - August
6, 1999. 2 Volumes, 1450 pages, Morgan Kaufmann, 1999, pp. 318–325.

[30] A. Srinivasan, T. Ham, S. Malik, R. K. Brayton, Algorithms for discrete
function manipulation, in: IEEE International Conference on Computer-Aided
Design, 1990, pp. 92–95.

[31] R. Nieuwenhuis, A. Oliveras, C. Tinelli, Solving SAT and SAT modulo
theories: From an abstract Davis–Putnam–Logemann–Loveland procedure to
DPLL(T), J. ACM 53 (6) (2006) 937–977. doi:10.1145/1217856.1217859.
URL https://doi.org/10.1145/1217856.1217859

[32] M. Bofill, M. Palahí, J. Suy, M. Villaret, Solving constraint satisfaction
problems with SAT modulo theories, Constraints 17 (3) (2012) 273–303.

[33] R. Nieuwenhuis, SAT modulo theories: Getting the best of SAT and global
constraint filtering, in: Principles and Practice of Constraint Programming
- CP 2010 - 16th International Conference, CP 2010, Proceedings, 2010,
pp. 1–2.

[34] G. Gange, D. Harabor, P. J. Stuckey, Lazy CBS: Implicit conflict-based
search using lazy clause generation, in: International Conference on
Automated Planning and Scheduling (ICAPS), Vol. 29, 2019, pp. 155–162.

[35] S. J. Guy, I. Karamouzas, Guide to anticipatory collision avoidance, in:
S. Rabin (Ed.), Game AI Pro 2: Collected Wisdom of Game AI Professionals,
2015, Ch. 19, pp. 195–208.

[36] P. Jiménez, F. Thomas, C. Torras, 3D collision detection: a survey,
Computers & Graphics 25 (2) (2001) 269–285.

[37] L. Cohen, Efficient bounded-suboptimal multi-agent path finding and
motion planning via improvements to focal search, Ph.D. thesis (2020).

[38] K. Yakovlev, A. Andreychuk, Any-angle pathfinding for multiple agents
based on SIPP algorithm, in: International Conference on Automated Planning
and Scheduling (ICAPS), 2017, p. 586.

[39] H. Ma, W. Hönig, T. S. Kumar, N. Ayanian, S. Koenig, Lifelong path
planning with kinematic constraints for multi-agent pickup and delivery,
in: AAAI Conference on Artificial Intelligence, Vol. 33, 2019, pp. 7651–7658.

[40] V. Narayanan, M. Phillips, M. Likhachev, Anytime safe interval path
planning for dynamic environments, in: IEEE/RSJ International Conference
on Intelligent Robots and Systems, 2012, pp. 4708–4715.

[41] K. Yakovlev, A. Andreychuk, R. Stern, Revisiting bounded-suboptimal safe
interval path planning, in: Accepted to the International Conference on
Automated Planning and Scheduling (ICAPS), 2020.

[42] J. Švancara, M. Vlk, R. Stern, D. Atzmon, R. Barták, Online multi-agent
pathfinding, in: AAAI Conference on Artificial Intelligence, Vol. 33, 2019,
pp. 7732–7739.

[43] W. Hönig, J. A. Preiss, T. S. Kumar, G. S. Sukhatme, N. Ayanian,
Trajectory planning for quadrotor swarms, IEEE Transactions on Robotics 34 (4)
(2018) 856–869.

[44] D. Atzmon, R. Stern, A. Felner, G. Wagner, R. Barták, N. Zhou, Robust
multi-agent path finding and executing, Journal of Artificial Intelligence
Research (JAIR) 67 (2020) 549–579.

[45] D. Kornhauser, G. Miller, P. Spirakis, Coordinating pebble motion on
graphs, the diameter of permutation groups, and applications, in: Symposium
on Foundations of Computer Science, IEEE, 1984, pp. 241–250.

[46] B. Nebel, On the computational complexity of multi-agent pathfinding on
directed graphs, in: Accepted to the International Conference on Automated
Planning and Scheduling (ICAPS), 2020.

[47] D. P. Dobkin, D. G. Kirkpatrick, Determining the separation of
preprocessed polyhedra—a unified approach, in: International Colloquium on
Automata, Languages, and Programming (ICALP), Springer, 1990, pp.
400–413.

[48] S. Gottschalk, M. C. Lin, D. Manocha, OBBTree: A hierarchical structure
for rapid interference detection, in: Proceedings of the 23rd annual
conference on Computer graphics and interactive techniques, 1996, pp. 171–180.

[49] S. Cameron, A study of the clash detection problem in robotics, in: IEEE
International Conference on Robotics and Automation (ICRA), Vol. 2,
IEEE, 1985, pp. 488–493.

[50] M. Tang, R. Tong, Z. Wang, D. Manocha, Fast and exact continuous
collision detection with Bernstein sign classification, ACM Transactions on
Graphics (TOG) 33 (6) (2014) 186.

[51] T. T. Walker, N. R. Sturtevant, Collision detection for agents in multi-agent
pathfinding (2019). arXiv:1908.09707.

[52] E. Boyarski, A. Felner, R. Stern, G. Sharon, O. Betzalel, D. Tolpin,
E. Shimony, ICBS: The improved conflict-based search algorithm for multi-agent
pathfinding, in: International Joint Conference on Artificial Intelligence
(IJCAI), 2015, pp. 740–746.

[53] E. Boyarski, A. Felner, G. Sharon, R. Stern, Don’t split, try to work it
out: Bypassing conflicts in multi-agent pathfinding, in: ICAPS, 2015, pp.
47–51.

[54] M. Barer, G. Sharon, R. Stern, A. Felner, Suboptimal variants of the
conflict-based search algorithm for the multi-agent pathfinding problem,
in: Symposium on Combinatorial Search (SoCS), 2014.

[55] G. Audemard, L. Simon, On the Glucose SAT solver, International Journal
on Artificial Intelligence Tools 27 (1) (2018) 1840001:1–1840001:25.

[56] P. Surynek, Lazy compilation of variants of multi-robot path planning with
satisfiability modulo theory (SMT) approach, in: IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS), 2019, pp. 3282–3287.

[57] N. Rivera, C. Hernández, J. A. Baier, Grid pathfinding on the 2^k
neighborhoods, in: AAAI, 2017, pp. 891–897.

[58] N. R. Sturtevant, Benchmarks for grid-based pathfinding, IEEE Transactions
on Computational Intelligence and AI in Games 4 (2) (2012) 144–148.

[59] I. A. Şucan, M. Moll, L. E. Kavraki, The Open Motion Planning Library,
IEEE Robotics & Automation Magazine 19 (4) (2012) 72–82,
http://ompl.kavrakilab.org. doi:10.1109/MRA.2012.2205651.

[60] P. Surynek, Multi-agent path finding modulo theory with continuous
movements and the sum of costs objective, in: U. Schmid, F. Klügl, D. Wolter
(Eds.), KI 2020: Advances in Artificial Intelligence, Vol. 12325 of Lecture
Notes in Computer Science, Springer, 2020, pp. 219–232.

[61] P. Surynek, Logic-based multi-agent path finding with continuous
movements and the sum of costs objective, in: S. O. Kuznetsov, A. I. Panov,
K. S. Yakovlev (Eds.), Artificial Intelligence - 18th Russian Conference,
RCAI 2020, Proceedings, Vol. 12412 of Lecture Notes in Computer Science,
Springer, 2020, pp. 85–99.

[62] A. Andreychuk, K. Yakovlev, E. Boyarski, R. Stern, Improving
continuous-time conflict based search, in: AAAI Conference on Artificial
Intelligence, 2021.

[63] B. Bonet, H. Geffner, Planning as heuristic search, Artificial Intelligence
129 (1-2) (2001) 5–33.

[64] J. Rintanen, K. Heljanko, I. Niemelä, Planning as satisfiability: parallel
plans and algorithms for plan search, Artificial Intelligence 170 (12-13)
(2006) 1031–1080.

[65] P. Surynek, A. Felner, R. Stern, E. Boyarski, An empirical comparison of
the hardness of multi-agent path finding under the makespan and the sum
of costs objectives, in: J. A. Baier, A. Botea (Eds.), Proceedings of the
Ninth Annual Symposium on Combinatorial Search, SOCS 2016, AAAI
Press, 2016, pp. 145–147.

[66] P. Surynek, A. Felner, R. Stern, E. Boyarski, An empirical comparison of
the hardness of multi-agent path finding under the makespan and the sum
of costs objectives, in: Symposium on Combinatorial Search (SoCS), 2016,
pp. 145–147.

[67] O. Kaduri, E. Boyarski, R. Stern, Analysis of classical multi agent path
finding algorithms, in: Accepted to the International Symposium on
Combinatorial Search (SoCS), 2021.

[68] T. T. Walker, D. Chan, N. R. Sturtevant, Using hierarchical constraints to
avoid conflicts in multi-agent pathfinding, in: International Conference on
Automated Planning and Scheduling (ICAPS), 2017.

[69] W. Hönig, T. S. Kumar, L. Cohen, H. Ma, H. Xu, N. Ayanian, S. Koenig,
Multi-agent path finding with kinematic constraints, in: ICAPS, 2016, pp.
477–485.

[70] W. Hönig, T. Kumar, L. Cohen, H. Ma, H. Xu, N. Ayanian, S. Koenig,
Summary: multi-agent path finding with kinematic constraints, in:
International Joint Conference on Artificial Intelligence (IJCAI), 2017, pp.
4869–4873.

[71] J. Snape, J. Van Den Berg, S. J. Guy, D. Manocha, The hybrid reciprocal
velocity obstacle, IEEE Transactions on Robotics 27 (4) (2011) 696–706.

[72] J. Godoy, T. Chen, S. J. Guy, I. Karamouzas, M. Gini, ALAN: adaptive
learning for multi-agent navigation, Autonomous Robots (2018) 1–20.

[73] R. Shome, K. Solovey, A. Dobson, D. Halperin, K. E. Bekris, dRRT*:
Scalable and informed asymptotically-optimal multi-robot motion planning,
Autonomous Robots 44 (3) (2020) 443–467.

[74] J. P. Van Den Berg, M. H. Overmars, Prioritized motion planning for
multiple robots, in: IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS), 2005, pp. 430–435.

[75] J. Snape, J. Van Den Berg, S. J. Guy, D. Manocha, Smooth and
collision-free navigation for multiple robots under differential-drive
constraints, in: IEEE/RSJ International Conference on Intelligent Robots and
Systems, 2010, pp. 4584–4589.

[76] H. Ma, J. Li, T. K. S. Kumar, S. Koenig, Lifelong multi-agent path finding
for online pickup and delivery tasks, in: Conference on Autonomous Agents
and MultiAgent Systems (AAMAS), 2017, pp. 837–845.

[77] M. Liu, H. Ma, J. Li, S. Koenig, Task and path planning for multi-agent
pickup and delivery, in: Conference on Autonomous Agents and MultiAgent
Systems (AAMAS), 2019, pp. 1152–1160.

[78] J. Li, D. Harabor, P. J. Stuckey, H. Ma, S. Koenig, Disjoint splitting for
multi-agent path finding with conflict-based search, in: International
Conference on Automated Planning and Scheduling (ICAPS), Vol. 29, 2019,
pp. 279–283.

[79] T. T. Walker, N. R. Sturtevant, A. Felner, H. Zhang, J. Li, T. S. Kumar,
Conflict-based increasing cost search, in: Proceedings of the 31st
International Conference on Automated Planning and Scheduling (ICAPS 2021),
2021, pp. 385–395.

[80] P. Surynek, Sparse real-time decision diagrams for continuous multi-robot
path planning, in: International Conference on Tools with Artificial
Intelligence (ICTAI), IEEE, in press, 2021.

[81] J. Li, A. Felner, E. Boyarski, H. Ma, S. Koenig, Improved heuristics for
multi-agent path finding with conflict-based search, in: International Joint
Conference on Artificial Intelligence (IJCAI), 2019, pp. 442–449.

[82] J. Li, G. Gange, D. Harabor, P. J. Stuckey, H. Ma, S. Koenig, New
techniques for pairwise symmetry breaking in multi-agent path finding, in:
International Conference on Automated Planning and Scheduling (ICAPS),
2020.

Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.

☐ The authors declare the following financial interests/personal relationships which may be considered
as potential competing interests:
