Multi-Agent Pathfinding With Continuous Time
Anton Andreychuk, Konstantin Yakovlev, Pavel Surynek, Dor Atzmon and Roni Stern
PII: S0004-3702(22)00002-9
DOI: https://doi.org/10.1016/j.artint.2022.103662
Reference: ARTINT 103662
Please cite this article as: A. Andreychuk, K. Yakovlev, P. Surynek et al., Multi-Agent Pathfinding with Continuous Time, Artificial
Intelligence, 103662, doi: https://doi.org/10.1016/j.artint.2022.103662.
Abstract
1. Introduction
Multi-agent pathfinding (MAPF) is the problem of finding paths for multiple agents
such that each agent reaches its goal and the agents do not collide. MAPF has topical
applications in warehouse management [1], airport towing [2], autonomous vehicles,
robotics [3], and digital entertainment [4]. A common requirement in MAPF
applications is to minimize either the sum of costs (SOC) of the agents' plans or
their maximum, also known as the makespan. Finding a MAPF solution with minimal SOC
and finding one with minimal makespan are both NP-hard problems [5, 6]. Nevertheless,
AI researchers have made substantial progress in recent years in finding solutions
with minimal SOC or minimal makespan for a growing number of MAPF problems, including
problems with over one hundred agents [7, 8, 9, 10, 11, 12, 6].
However, most prior work on solving MAPF optimally assumed that (1) time is
discretized into time steps, and (2) the duration of every action is one time step.
These simplifying assumptions limit the applicability of MAPF algorithms in
real-world applications. In this work, we address a more general type of MAPF problem
called MAPFR [13].1 In MAPFR, agents occupy a defined area in a metric space at any
moment of a continuous timeline, and do not rely on any time discretization.
1 We discuss later the relation between Walker et al.'s [13] definition of MAPFR and ours.
Existing algorithms such as E-ICTS [13] and ECBS-CT [14] partially address the MAPFR
problem, in that they support move actions of non-uniform duration (cost) and
consider the area the agents occupy. However, these algorithms still rely on
discretizing time to define the duration of wait actions. This can have a negative
effect on both solution quality and runtime. In this work, we propose two optimal
algorithms that solve MAPFR problems and do not require any time discretization,
allowing wait actions with arbitrary durations.
The first algorithm we propose is called CCBS. CCBS builds on two existing
algorithms: SIPP [15] and CBS [7]. SIPP is a single-agent pathfinding algorithm
designed to find paths that avoid dynamic obstacles. CBS is a state-of-the-art
search-based MAPF algorithm with many extensions and improvements. CCBS
follows the same problem-solving framework as CBS, but uses a variant of SIPP
to plan paths for the individual agents, and imposes a novel type of constraint
to resolve conflicts between the resulting plans.
The second algorithm we propose is called SMT-CCBS. SMT-CCBS is a SAT-based
algorithm, that is, it solves a MAPFR problem by compiling parts of the problem to a
Boolean Satisfiability (SAT) problem and applying a state-of-the-art SAT solver to
solve it. SAT-based approaches have been applied successfully to classical MAPF and
take advantage of the power of modern SAT solvers. Technically, SMT-CCBS resolves
conflicts between agents using the same type of constraints as CCBS. However, it does
so in the SAT Modulo Theory (SMT) problem-solving framework used by the SMT-CBS
algorithm [16] to solve MAPF problems. Both CCBS and SMT-CCBS are guaranteed to (1)
only return solutions that are conflict-free, (2) find a solution if one exists, and
(3) return a solution that has a minimal SOC or makespan (for CCBS and SMT-CCBS,
respectively).
We implemented CCBS and SMT-CCBS, and evaluated them on standard
grid-based MAPF benchmarks as well as generated roadmaps. The results show
that both algorithms can solve MAPF problems with dozens of agents optimally.
While we are not the first to study MAPF beyond its basic setting [13, 17, 14],
to the best of our knowledge we are the first to propose MAPF algorithms that
can handle agents with volumes and non-uniform action durations while avoiding
any form of time discretization (see Section 7 for a more detailed discussion).
All our implementations are publicly available, so researchers can reproduce our
results and build on them.
A preliminary version of this work was published in two conference papers [18, 19].
The main contribution of this journal paper is to present these
works in a unified and coherent manner. In addition, this journal paper extends
these works significantly, by providing:
2. Background
Figure 1: An example of a classical MAPF problem instance on a 4-neighborhood grid.
Time is discretized into time steps. For every time step t, each agent occupies
one of the graph vertices, referred to as the location of that agent at time t. An
action in classical MAPF is a function a : V → V such that a(v) = v′ means
the agent's current location is v and its location in the next time step is v′. In
every time step each agent can choose to perform an action. There are two types
of actions: a wait action, in which the agent stays in its location, and a move
action, in which the agent moves to one of the vertices adjacent to its current
location.
A sequence of actions π_i = (a_1, ..., a_n) is a single-agent plan for agent i if
a_n(a_{n−1}(··· (a_1(S(i))) ···)) = G(i), i.e., if it leads the agent from its start
to its goal. The number of actions in the plan defines its cost, denoted c(π_i). A
solution to a classical MAPF problem is a set of k single-agent plans, one for
each agent: Π = {π_1, ..., π_k}.
A solution to a classical MAPF problem is valid if its constituent single-agent
plans do not conflict. There are multiple types of conflicts. A vertex conflict
between a single-agent plan π_i for agent i and a single-agent plan π_j for agent j
occurs at location v and time step t iff according to these plans agents i and j plan
to occupy v at the same time step t. We represent such a conflict by the tuple
Figure 2: An illustration from [20] of an edge conflict (a), a vertex conflict (b), a following
conflict (c), a cycle conflict (d), and a swapping conflict (e).
2.1.1. Conflict Based Search (CBS) for Finding SOC-Optimal Solutions
CBS [7] is a solution complete algorithm for classical MAPF that is guaranteed
to return a SOC-optimal solution. It solves a given MAPF problem by
finding a plan for each agent separately, detecting conflicts between these plans,
and resolving them by replanning for the individual agents subject to specific
constraints.
The basic CBS implementation considers two types of conflicts: vertex conflicts
and swapping conflicts. Correspondingly, the basic CBS implementation considers two
types of constraints. A CBS vertex-constraint is defined by a tuple ⟨i, v, t⟩ and
means that agent i is prohibited from occupying vertex v at time step t. A CBS
edge-constraint is defined similarly by a tuple ⟨i, e, t⟩, where e ∈ E. To guarantee
solution completeness and optimality, CBS runs two search algorithms: a low-level
search algorithm that finds plans for individual agents subject to a given set of
constraints, and a high-level search algorithm that chooses which constraints to add.
CBS: Low-Level Search. The task of the low-level search in CBS is to find an
optimal plan for an agent that is consistent with a given set of CBS constraints.
Any single-agent pathfinding algorithm that can do this can be used as the
CBS low-level search. To adapt single-agent pathfinding algorithms, such as
A∗, to consider CBS constraints, the search space must also consider the time
dimension, since a CBS constraint ⟨i, v, t⟩ blocks location v only at a specific
time step t. This means that a state in this single-agent search space is a pair
(v, t), representing that the agent is in location v at time step t. Expanding
such a state generates states of the form (v′, t + 1), where v′ is either equal to v,
representing a wait action, or equal to one of the locations adjacent to v. States
generated by actions that violate the given set of CBS constraints are pruned.
Running A∗ on this search space returns the lowest-cost plan to the agent's
goal that is consistent with the given set of CBS constraints, as required. This
adaptation of textbook A∗ is very simple, and indeed most papers on CBS do
not report it and just mention that the low-level search of CBS is A∗.
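The adaptation described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in any CBS paper: the function name `low_level_astar`, the `max_t` horizon cut-off, and the representation of vertex constraints as a set of `(v, t)` pairs are all assumptions made for this example, and edge constraints are omitted.

```python
import heapq

def low_level_astar(neighbors, start, goal, constraints, heuristic, max_t=1000):
    """Space-time A* over states (v, t) with unit-duration wait and move actions.

    neighbors:   dict mapping each vertex to a list of adjacent vertices
    constraints: set of (v, t) pairs the agent must not occupy (vertex constraints)
    heuristic:   admissible estimate h(v) of the remaining number of time steps
    max_t:       horizon cut-off (an assumption of this sketch, to force termination)
    """
    # The agent stays at its goal, so it may only finish after the last
    # constraint that blocks the goal vertex.
    last_blocked = max((t for v, t in constraints if v == goal), default=-1)
    open_list = [(heuristic(start), 0, start, [start])]
    closed = set()
    while open_list:
        _, t, v, path = heapq.heappop(open_list)
        if v == goal and t > last_blocked:
            return path
        if (v, t) in closed or t >= max_t:
            continue
        closed.add((v, t))
        for u in [v, *neighbors[v]]:           # u == v is the wait action
            if (u, t + 1) in constraints:      # prune constraint-violating states
                continue
            heapq.heappush(open_list, (t + 1 + heuristic(u), t + 1, u, path + [u]))
    return None

# Hypothetical three-vertex corridor A - B - C.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
dist_to_goal = {"A": 2, "B": 1, "C": 0}
path_free = low_level_astar(neighbors, "A", "C", set(), dist_to_goal.get)
path_blocked = low_level_astar(neighbors, "A", "C", {("B", 1)}, dist_to_goal.get)
```

With the constraint on (B, 1), the only optimal plan waits one step at A before moving, which is the behaviour the time dimension is there to capture.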
The SAT-based approach for finding makespan-optimal solutions works by
solving a sequence of SAT problems. Each of these SAT problems represents
the decision problem: “is there a valid solution to the given classical MAPF
problem within a makespan of at most µ?” where µ is a parameter. We call this
decision problem the bounded-cost MAPF problem.
Each bounded-cost MAPF problem is encoded as a SAT problem by defining
two types of Boolean variables. The first type, denoted X_v^t(i), is defined for
every discrete time step t ∈ {0, ..., µ}, agent i, and vertex v the agent may occupy
at time step t. The second type of variables, denoted E_{u,v}^t(i), is defined for
every time step t, agent i, and edge (u, v) the agent may start traversing at time
step t. MAPF movement rules and collision avoidance constraints are encoded on top
of these variables as simple local constraints. The resulting Boolean formula is
given to a SAT solver, which returns either a satisfying assignment or UNSAT.
A satisfying assignment to the decision variables specifies a valid solution for
the given MAPF problem. An UNSAT answer indicates that no valid solution exists with
makespan smaller than or equal to µ. In this case, µ is increased by one. This
process continues until the minimal µ for which a solution exists is found. The
solution found for this µ is guaranteed to be makespan-optimal. The SATPlan
algorithm [28, 29] followed a similar approach for classical planning. While this
SAT-based approach for classical MAPF returns makespan-optimal solutions, it is also
possible, with some additional bookkeeping, to use it to return SOC-optimal
solutions [23].
As expected, the size of the generated Boolean formulas has a great impact
on the overall runtime. To reduce this size, Surynek et al. [28] proposed to
introduce the X_v^t(i) and E_{u,v}^t(i) variables only for vertices and edges that
are reachable at the given time step t. This reachability analysis can be done by
constructing a Multi-Value Decision Diagram (MDD) [30] for each agent representing
all single-agent plans for that agent up to the makespan bound µ. Each of these MDDs
is a directed acyclic graph with a single source. A node in these MDDs represents
a vertex and time pair (v, t) the agent may occupy in a single-agent plan that
reaches the goal before the makespan bound. Edges represent moves in such a
single-agent plan. The construction of the Boolean formula Φ then relies on these
MDDs rather than on the original graph, i.e., it includes only variables that have
corresponding vertices and edges in an MDD. This algorithm is known as MDD-SAT.
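The reachability analysis behind an MDD can be illustrated as follows. This is a sketch under simplifying assumptions (an undirected, unit-cost graph with wait actions allowed); the helper names `bfs_distances` and `mdd_nodes` are hypothetical and not taken from MDD-SAT. A pair (v, t) is kept iff the agent can reach v by time t and can still reach its goal by time µ.

```python
from collections import deque

def bfs_distances(neighbors, source):
    """Shortest number of moves from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in neighbors[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def mdd_nodes(neighbors, start, goal, mu):
    """Set of (v, t) pairs lying on some single-agent plan that reaches goal by time mu."""
    from_start = bfs_distances(neighbors, start)
    to_goal = bfs_distances(neighbors, goal)   # valid because the graph is undirected
    return {
        (v, t)
        for v in neighbors
        if v in from_start and v in to_goal
        # Waiting is allowed, so every t between the earliest arrival time and
        # the latest time from which the goal is still reachable is feasible.
        for t in range(from_start[v], mu - to_goal[v] + 1)
    }

# Hypothetical corridor A - B - C with start A and goal C.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
tight = mdd_nodes(neighbors, "A", "C", mu=2)
loose = mdd_nodes(neighbors, "A", "C", mu=3)
```

Only the pairs returned here would receive X variables in the encoding, which is why a tight makespan bound yields a much smaller formula.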
constraints to avoid conflicts between the agents' plans. Instead, these conflicts
are detected after a solution to this PS is found, in a MAPF-specific DECIDE_T
procedure that we denote by DECIDE_MAPF. That is, whenever the SAT solver
outputs a solution, this solution is checked for conflicts. If no conflicts exist, the
solution is returned. Otherwise, additional constraints are added to all subsequent
PS to ensure that every previously detected conflict will not occur again.
These constraints are exactly the constraints CBS would impose to resolve the
found conflicts. Theoretically, SMT-CBS can converge to the same formula as
MDD-SAT, but it often finishes with a smaller formula due to the iterative/lazy
construction. A similar approach has been used in the Lazy CBS algorithm [34].
SMT-CBS showed impressive performance on standard MAPF benchmarks.
• A1. The duration of every move action is one time step. This
  means either that all agents move at exactly the same speed and graph edges
  represent transitions of the same length, or that graph edges represent
  transitions of different lengths and the agents adapt their velocity and
  acceleration profile so that all moves take one time step.
• A2. The duration of a wait action is one time step. This means
  an agent may wait any discrete number of time steps, as opposed to any
  real-valued duration.
In this work, we lift these assumptions. Since we are not the first to do
this, we first discuss prior works and the relation between them. Walker et
al. [13] introduced the MAPFR problem, which lifts the first classical MAPF
assumption (A1). In MAPFR, every edge e = (v, v′) in the underlying graph G
is associated with a positive weight w(e) ∈ R>0 that represents the time it
takes an agent to move from v to v′. Every location v ∈ V is associated with a
unique coordinate in a metric space, denoted coord(v). Agents occupy non-zero
volume in this space. When the location of an agent is v, it means that the
reference point of the agent is located at coord(v). When an agent moves along
an edge (v, v′), it moves along a straight line from coord(v) to coord(v′)
at a constant velocity. There is a conflict between two single-agent plans
iff the volumes of two or more agents “overlap at the same instant in time” [13].
This can be detected using standard collision-detection techniques [35, 36].
The original definition of MAPFR only states that action durations are
non-uniform, and does not differentiate between move actions and wait actions.
However, the algorithm Walker et al. proposed in that paper, Extended Increasing
Cost Tree Search (E-ICTS), relies on wait actions having a fixed,
pre-determined duration. So, while E-ICTS does not lift A2, the definition of
MAPFR there is ambiguous.
Cohen et al. [14] proposed an extension of classical MAPF called multi-agent
motion planning (MAMP). In MAMP, each agent is associated with a
graph G_i = (V_i, E_i). A vertex in V_i represents a possible state of agent i, where
a state of an agent represents its location as well as other relevant features
such as orientation and steering angle. An edge e = (v, v′) ∈ E_i represents
a kinodynamically feasible motion of agent i from state v to state v′, and the
weight of an edge is the duration of performing this motion. The agents in
MAMP move in an environment represented by a set of grid cells C. Every
state v ∈ V_i of an agent i is associated with a set of cells in C, representing the
cells occupied by that agent when in state v. Every edge e = (v, v′) ∈ E_i is
associated with a multiset of cells in C. Each cell in this multiset is associated
with a time interval indicating when this cell is occupied while agent i moves
from v to v′. Like E-ICTS, ECBS-CT, the algorithm of Cohen et al. [14], lifts A2
only partially, as the duration of the wait actions is tied to the given time
discretization (which is done by dividing the timeline into time steps of a given
duration).
The MAPFR algorithms we propose in this paper are based on the SIPP
algorithm. SIPP [15] is a powerful algorithm for building a plan for a single
agent moving among static and dynamic obstacles. It has also been used
within prioritized MAPF solvers [38] and for solving multi-agent pickup and
delivery (MAPD) problems [39]. SIPP accepts as input a graph, start and
goal vertices in that graph, and trajectories specifying the motion of the dynamic
obstacles over time. The algorithm pre-processes these trajectories to compute
safe intervals for each vertex in the graph. A safe interval is a contiguous
period of time for a vertex, during which if the agent occupies that vertex then
it will not collide with any dynamic obstacle. Safe intervals are assumed to be
maximal, i.e., extending a safe interval is not possible.
For example, consider a case where only one dynamic obstacle is present and
both the agent and the obstacle are disks of radius r. Now, consider a vertex
v such that the distance between this obstacle and v is less than 2r between
time moments t_1 and t_2. The corresponding safe intervals for v are [0, t_1] and
[t_2, +∞). Note that in this example, we assume that (1) a collision happens
only when the distance between two disks is less than the sum of their radii
(when the distance is equal to that sum, no collision happens); and (2) no collision
occurs at the specific time points t_1 and t_2.
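The pre-processing step in this example amounts to taking the complement of the collision intervals on the timeline. The following is a minimal sketch, assuming the collision intervals for a vertex have already been computed; the function name `safe_intervals` and the `horizon` parameter are illustrative and not part of SIPP itself.

```python
import math

def safe_intervals(unsafe, horizon=math.inf):
    """Complement of the collision intervals (lo, hi) of one vertex within [0, horizon].

    Boundary points are treated as safe, matching the convention in the
    example above (no collision exactly at t_1 or t_2).
    """
    result = []
    cursor = 0.0
    for lo, hi in sorted(unsafe):
        if lo > cursor:
            result.append((cursor, lo))    # gap before this collision interval
        cursor = max(cursor, hi)           # also merges overlapping intervals
    if cursor < horizon:
        result.append((cursor, horizon))
    return result

single = safe_intervals([(3.0, 5.0)])              # one obstacle passing by
merged = safe_intervals([(1.0, 4.0), (2.0, 6.0)])  # two overlapping passes
```

An obstacle that arrives and never leaves corresponds to a collision interval ending at the horizon, in which case no unbounded safe interval is produced.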
A vertex may have multiple, non-overlapping, safe intervals. In general, the
number of safe intervals is proportional to the number of obstacles that pass
near the vertex. The chronologically last safe interval for a vertex might not
end with ∞. For example, an obstacle may come to that vertex and stay in
it. The set of safe intervals might be empty as well, e.g., in cases where an
obstacle constantly moves back and forth in the vicinity of the vertex.
SIPP performs an A∗-like search over a search space in which each node
is a pair of a graph vertex and one of its safe intervals. This means there may be
multiple search nodes for the same vertex but with different safe time intervals.
In the example above, nodes n_1 = (v, [0, t_1]) and n_2 = (v, [t_2, +∞)) correspond
to the same vertex v, but have different safe intervals. When expanding a node
n = (v, T_v), SIPP generates a node for every pair (v′, T_{v′}) where v′ is a neighbor
of v and T_{v′} is a safe interval of v′ in which the agent can safely arrive starting
at a time in T_v. SIPP also considers collisions with obstacles that occur when
the agent moves along an edge. This is done when computing the time it takes
to traverse an edge. When there exists an expected collision on some edge,
the time it takes to traverse that edge at that time will include staying in the
vertex until it is possible to move along that edge safely (i.e., without colliding).
Guided by a consistent heuristic, SIPP is guaranteed to return optimal solutions.
Sub-optimal and anytime variations of the algorithm are also known [40, 41].
of move actions, we limit the completeness and optimality of the algorithms we
propose to be only with respect to the chosen set of move actions. We say
that an algorithm is complete with respect to a set of move actions A if it is
guaranteed to return a valid solution if one exists in which all move actions the
agents perform are from A. Similarly, we say that an algorithm is optimal with
respect to a set of move actions A if the solution it returns is guaranteed to
have the lowest cost among all other valid solutions in which all move actions
the agents perform are from A. Solutions that are optimal or even valid with
respect to one set of move actions can be non-optimal or even invalid under a
different set of move actions.
A wait action a is an action for which there exists a vertex v ∈ V such that
for every t ∈ [0, a_D] we have that a_ϕ(t) = coord(v). For completeness, we define
from(a) and to(a) for wait actions to be this vertex. Note that while the set of
move actions is given as input (A), the set of wait actions is implicitly defined
for every vertex v ∈ V and any positive real number a_D. Thus, the set of wait
actions is infinitely large.
In MAPFR, when an agent is at a vertex v it can choose to perform any action,
move or wait, that starts at v, i.e., from(a) = v. A collision between
agents occurs if their shapes overlap. To detect such an overlap, we assume a
collision-detection method IsCollision : {1, ..., k} × {1, ..., k} × M × M →
{true, false} is available, where IsCollision(i, j, m_i, m_j) = true iff the shapes of
agents i and j overlap when they occupy locations m_i and m_j, respectively. For
example, if the agents are 2D disks with a radius of r, then a collision
occurs if the distance between the centers of the agents is less than 2r. That is,
in this setting,
\[
\mathrm{IsCollision}(i, j, m_i, m_j) =
\begin{cases}
\text{true} & \text{if } \|m_i - m_j\|_2 < 2r \\
\text{false} & \text{otherwise}
\end{cases}
\tag{1}
\]
Note that our problem definition and the algorithms we propose later are not
restricted to disk-shaped agents or to this particular type of IsCollision
implementation.
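For disk-shaped agents, Equation 1 is a one-line distance test. The sketch below is only an illustration: the function name `is_collision` and the per-agent radius parameters are assumptions, whereas the paper's IsCollision takes agent indices and may encode arbitrary shapes.

```python
import math

def is_collision(m_i, m_j, r_i=0.5, r_j=0.5):
    """Disk-vs-disk overlap test: True iff the centres are closer than r_i + r_j.

    The comparison is strict, so two disks that merely touch do not collide.
    """
    return math.dist(m_i, m_j) < r_i + r_j

overlapping = is_collision((0.0, 0.0), (0.9, 0.0))   # centres 0.9 apart, 2r = 1
touching = is_collision((0.0, 0.0), (1.0, 0.0))      # centres exactly 2r apart
```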
For a sequence of actions π = (a_1, ..., a_n), we denote by π[: j] the prefix of
the first j actions, i.e., π[: j] = (a_1, ..., a_j). The duration and motion function
of π, denoted π_D and π_ϕ, respectively, are defined as follows:
\[
\pi_D = \sum_{a \in \pi} a_D \tag{2}
\]
\[
\pi_\varphi(t) =
\begin{cases}
a_{1\varphi}(t) & t \le a_{1D} \\
\cdots & \cdots \\
a_{j\varphi}\bigl(t - (\pi[:j-1])_D\bigr) & (\pi[:j-1])_D < t \le (\pi[:j])_D \\
\cdots & \cdots \\
a_{n\varphi}\bigl(t - (\pi[:n-1])_D\bigr) & (\pi[:n-1])_D < t \le (\pi[:n])_D \\
a_{n\varphi}(a_{nD}) & (\pi[:n])_D < t
\end{cases}
\tag{3}
\]
To explain Equation 3, which computes the location at each time moment
while executing π, observe that the motion functions are not defined with respect
to when their respective actions are applied. In other words, motion functions
are defined with respect to relative time and not absolute time. For instance,
for any action a that moves the agent from v to v′, by definition a_ϕ(0) = v and
a_ϕ(a_D) = v′. Therefore, to compute π_ϕ(t) we need to first identify the action
planned to be executed at time t. This can be computed by observing that
the i-th action in π starts at time (π[: i − 1])_D and ends at time (π[: i])_D for
every i > 1 (the first action starts at time 0). Then, we “correct” t by the starting
time of that action, to obtain the location of the agent during the execution of
that action. The last line in Equation 3 defines that the agent is assumed to stay
in its last location after the plan ends.4
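Equations 2 and 3 translate directly into code. The following sketch assumes constant-velocity, straight-line motion for every action and a non-empty plan; the `Action` class and the function names are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Action:
    start: tuple      # coord(from(a))
    end: tuple        # coord(to(a)); equal to start for a wait action
    duration: float   # a_D

    def position(self, tau):
        """Motion function a_phi at relative time tau in [0, duration]."""
        if self.duration == 0:
            return self.end
        s = min(max(tau / self.duration, 0.0), 1.0)
        return tuple(a + (b - a) * s for a, b in zip(self.start, self.end))

def plan_duration(plan):
    """Equation 2: the duration of a plan is the sum of its action durations."""
    return sum(a.duration for a in plan)

def plan_position(plan, t):
    """Equation 3: locate the action active at absolute time t, then shift t
    by that action's start time before evaluating its motion function."""
    elapsed = 0.0
    for a in plan:
        if t <= elapsed + a.duration:
            return a.position(t - elapsed)
        elapsed += a.duration
    last = plan[-1]
    return last.position(last.duration)    # the agent stays at its last location

# Hypothetical plan: move right for 2, wait for 1, move up for 3.
plan = [
    Action((0.0, 0.0), (2.0, 0.0), 2.0),
    Action((2.0, 0.0), (2.0, 0.0), 1.0),
    Action((2.0, 0.0), (2.0, 3.0), 3.0),
]
```

Evaluating `plan_position(plan, 4.5)` falls into the third action at relative time 1.5, which is the "correction" of t described above.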
As in classical MAPF, we define a single-agent plan for an agent i to be a
sequence of actions π_i = (a_1, ..., a_n) such that executing it moves agent i from
S(i) to G(i). A conflict between two single-agent plans is naturally defined as
the case where if the agents execute their respective plans starting at the same
time then there exists a point in time in which a collision occurs.
4 In this work we assume that an agent does not disappear when it reaches its goal. Other
variants also exist [20, 42].
Figure 3: A MAPFR problem instance with 3 agents. (left) The dashed circles mark the goal
location of each agent. (right) The positions of the agents at time moment t = 2.8 when
following their individual plans. Red and blue agents are in a collision.
Example 1 (Figure 3). The graph vertices and edges are represented by small
circles with letters inside and the straight line segments between them,
respectively. Agents are shown as colored circles. Each agent is a disk with a radius
of 0.5. Initially, the green agent occupies vertex A, the red agent occupies vertex
E, and the blue agent occupies vertex G. Their respective goals are I (green
agent), J (red agent), and D (blue agent). There are 20 move actions in this
example, corresponding to 10 undirected graph edges. We assume the agents
move at a constant speed of one unit; they start and stop instantaneously; and
move from one vertex to the other following the straight line segment connecting
them. Thus, the duration of every move action equals the distance between the
vertices that define it. E.g., the duration of the action “move
from A to B”, denoted as A → B, is 2. The motion function that describes this
action is $\vec{OA} + \frac{1}{\|\vec{AB}\|} \cdot \vec{AB} \cdot t$, which is an
analytical way to define a constant-velocity straight-line motion between A and B.
The least-cost individual plans to the agents' respective goals are:
π_green = {A → B, B → F, F → I} for the green agent;
π_red = {E → F, F → I, I → J} for the red agent;
π_blue = {G → H, H → C, C → D} for the blue one. If the agents start executing
them simultaneously, then at t = 2.8 their positions will be (3, 4.2),
(3 + 0.4·√2, 3 − 0.4·√2), and (3.48, 1.64), respectively (see Figure 3 (right)). The
distance between the positions of the red and blue agents is less than the sum of
their radii; thus, they are colliding and their plans are in conflict.
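As a quick check of the collision claimed in Example 1, the distance between the red and blue agents at t = 2.8 can be worked out from the positions stated above (intermediate values are rounded):

```latex
\[
\begin{aligned}
d_{\text{red,blue}}
  &= \sqrt{\bigl(3 + 0.4\sqrt{2} - 3.48\bigr)^2 + \bigl(3 - 0.4\sqrt{2} - 1.64\bigr)^2} \\
  &\approx \sqrt{0.0857^2 + 0.7943^2} \approx 0.799 < 1 = 2r .
\end{aligned}
\]
```

The same computation gives roughly 1.85 for the green and red agents and roughly 2.60 for the green and blue agents, so only the red and blue disks overlap at that moment.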
Example 2 (Figure 4). In this MAPFR problem there are 3 disk-shaped agents
of size (diameter) √2/2 located in a 4-neighborhood 2×4 grid. Thus, the move
actions correspond to moving along the four cardinal directions. The duration of
every such move is 1 time unit. The start locations of the agents are illustrated
in the grid on the left. The goal locations of the agents are shown in the grid
on the top right. The 3 top-right grids show the solution found by a SOC-optimal
MAPF solver which assumes that the minimal wait time equals 1 time unit.
The 3 bottom-right grids show key steps in the solution found by a SOC-optimal
MAPFR solver. As can be seen, agents 2 and 3 reach their goals earlier in
the SOC-optimal MAPFR solution, and therefore its overall SOC is lower than
that of the SOC-optimal MAPF solution (8.414 vs. 9). This highlights the advantage
of allowing arbitrary wait durations. An animation of this example is given at
https://tinyurl.com/ccbs-vs-cbs2.
Figure 4: A MAPFR problem instance with 3 agents where non-unit wait actions are needed
to find a SOC-optimal solution.
The first example illustrates that in MAPFR agents may collide even if they
do not occupy the same edge or vertex. The second example illustrates that
allowing arbitrary wait times enables finding solutions with a lower cost. In
particular, consider applying ECBS-CT or E-ICTS to this example. Recall
that both algorithms require determining a priori the duration of wait actions.
However, since the underlying graph here is based on a simple 4-neighborhood
grid, it is not obvious how to determine the appropriate wait action duration so
as to obtain the SOC-optimal MAPFR solution shown in this example. The
algorithms we propose in this work are able to find this solution.
4. Conflict-Based Search with Continuous Time
CCBS follows the CBS framework. The main differences between CCBS and
CBS are:
• CCBS adds constraints over pairs of actions and time ranges, instead of
location-time pairs.
• For the low-level search, CCBS uses a version of SIPP adapted to handle
CCBS constraints.
t. A CCBS conflict is a tuple ⟨i, j, (a_i, t_i), (a_j, t_j)⟩, representing that if
agent i executes the timed action (a_i, t_i) and agent j executes the timed action
(a_j, t_j) then they will collide. Formally:
Whenever i and j are clear from the context, we omit them and use IsCollision(m_i, m_j)
and InConflict((a_i, t_i), (a_j, t_j)). Observe that a CCBS conflict is agnostic to
the absolute time the actions are performed, and only considers their relative
time, that is, for any constant number ∆ > 0,
\[
\mathrm{InConflict}\bigl((a_i, t_i), (a_j, t_j)\bigr) \rightarrow
\mathrm{InConflict}\bigl((a_i, t_i + \Delta), (a_j, t_j + \Delta)\bigr) \tag{6}
\]
if there exists a pair of timed actions (a_i, t_i) ∈ π_i and (a_j, t_j) ∈ π_j such that
InConflict((a_i, t_i), (a_j, t_j)) is true. It is straightforward to see that a pair
of single-agent plans have a conflict, as defined in Definition 1, if they have a
CCBS conflict.
In our running example from Figure 3, consider the following single-agent
plans for the green, red, and blue agents, represented as sequences of timed
actions: π_green = {(A → B, 0), (B → F, 2), (F → I, 4)},
π_red = {(E → F, 0), (F → I, 2), (I → J, 2 + 2√2)},
π_blue = {(G → H, 0), (H → C, 2), (C → D, 7)}. The plans of the red and blue agents
do, indeed, conflict (as was shown in Figure 3 (right)). The CCBS conflict here is
⟨red, blue, (F → I, 2), (H → C, 2)⟩.
Definition 3 (CCBS Unsafe Interval). [t_i, t_i^u) is the unsafe interval for (a_i, t_i)
with respect to (a_j, t_j), where
and [t_j, t_j^u) is the unsafe interval for (a_j, t_j) with respect to (a_i, t_i), where
If there is no time t ∈ [t_i, t_j + a_{jD}] for which InConflict((a_i, t), (a_j, t_j)) =
false, then t_i^u is not defined mathematically. For such cases, we set t_i^u to be
t_j + a_{jD}, indicating that a_i must start after a_j has already finished. Similarly,
t_j^u can be undefined when there is no t ∈ [t_j, t_i + a_{iD}] for which
InConflict((a_i, t_i), (a_j, t)) = false, and we set t_j^u to t_i + a_{iD} in such cases.
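To make the unsafe interval concrete, the sketch below approximates t_i^u for disk-shaped agents by scanning candidate start times. It is not one of the paper's methods for computing unsafe intervals. It assumes that two timed actions are in conflict iff the disks overlap at some moment while both actions are executing, that motion is at constant velocity, and that actions are given as hypothetical `(start_xy, end_xy, duration)` triples with positive duration. The result is only accurate up to the chosen `step`.

```python
import math

def in_conflict(a_i, t_i, a_j, t_j, r=0.5):
    """True iff two disks of radius r overlap while both timed actions execute."""
    lo, hi = max(t_i, t_j), min(t_i + a_i[2], t_j + a_j[2])
    if lo > hi:
        return False                       # the actions never run at the same time
    v_i = [(a_i[1][k] - a_i[0][k]) / a_i[2] for k in range(2)]
    v_j = [(a_j[1][k] - a_j[0][k]) / a_j[2] for k in range(2)]
    # Relative position in absolute time: d(t) = d0 + dv * t.
    d0 = [a_i[0][k] - v_i[k] * t_i - a_j[0][k] + v_j[k] * t_j for k in range(2)]
    dv = [v_i[k] - v_j[k] for k in range(2)]
    speed2 = dv[0] ** 2 + dv[1] ** 2
    # Time of closest approach, clamped to the window in which both actions run.
    if speed2 == 0:
        t_star = lo
    else:
        t_star = min(max(-(d0[0] * dv[0] + d0[1] * dv[1]) / speed2, lo), hi)
    return math.hypot(d0[0] + dv[0] * t_star, d0[1] + dv[1] * t_star) < 2 * r

def unsafe_interval_end(a_i, t_i, a_j, t_j, r=0.5, step=0.01):
    """Approximate t_i^u: the first start time >= t_i at which a_i no longer conflicts."""
    limit = t_j + a_j[2]
    k = 0
    while t_i + k * step <= limit:
        if not in_conflict(a_i, t_i + k * step, a_j, t_j, r):
            return t_i + k * step
        k += 1
    return limit                           # fallback: start once a_j has finished

# Hypothetical instance: agent i crosses the spot where agent j waits for 3 time units.
move = ((-2.0, 0.0), (2.0, 0.0), 4.0)
wait = ((0.0, 0.0), (0.0, 0.0), 3.0)
t_u_move = unsafe_interval_end(move, 0.0, wait, 0.0)
t_u_wait = unsafe_interval_end(wait, 0.0, move, 0.0)
```

In this toy instance the move must be delayed until roughly time 2 and the wait until roughly time 3, so the two resulting constraints would be ⟨move, [0, 2)⟩ and ⟨wait, [0, 3)⟩.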
A constraint in CCBS is of the form ⟨i, a_i, [t_i, t_i^u)⟩, saying that agent i cannot
perform a_i in the range [t_i, t_i^u). When the agent's index is clear from the
context, we omit it from the definition of the constraint, i.e., just write
⟨a, [t, t^u)⟩. Constraints over time intervals, also known as “range constraints”,
have already been introduced in the context of robust MAPF [44].
For a CCBS conflict ⟨(a_i, t_i), (a_j, t_j)⟩, CCBS adds to N_i the constraint
⟨i, a_i, [t_i, t_i^u)⟩ and adds to N_j the constraint ⟨j, a_j, [t_j, t_j^u)⟩.
Section 4.3 discusses methods for computing the unsafe intervals.
For example, assume that we are running CCBS over the MAPFR problem
depicted in Figure 3. The high-level search expands a root CT node that
contains individual plans of the agents that were planned agnostic to each other.
As mentioned above, the plans of the red and blue agents conflict. This CCBS
conflict is ⟨red, blue, (F → I, 2), (H → C, 2)⟩. The unsafe interval of the action
(F → I, 2) w.r.t. action (H → C, 2) is [2.000, 3.743), i.e., the first moment
of time the red agent might safely start moving from F to I after time 2 is
3.743. The unsafe interval of the action (H → C, 2) w.r.t. action (F → I, 2) is
[2.000, 3.310).
Consider the following example: two CCBS constraints imposed over the
wait actions associated with the same graph vertex v are passed to SIPP:
⟨i, a_1, [t_1, t_1^u)⟩ and ⟨i, a_2, [t_2, t_2^u)⟩ (t_1 < t_1^u < t_2 < t_2^u). Then, the
safe intervals for v are the following: [0, t_1], [t_1^u, t_2], [t_2^u, +∞).5
CCBS constraints imposed over move actions are incorporated into SIPP
in a different way. Let ⟨i, a_i, [t_i, t_i^u)⟩ be a CCBS constraint imposed over a
move action that is defined by the graph edge (v, v′) and a SIPP search node
associated with the vertex v. For this search node, we replace the action a_i with
an action a′_i that starts by waiting in v for duration t_i^u − t_i before moving to v′.
Formally, a′_i is defined by a duration a′_{iD} = a_{iD} + t_i^u − t_i and a motion function
\[
a'_{i\varphi}(t) =
\begin{cases}
\mathrm{coord}(v) & t \le t_i^u \\
a_{i\varphi}(t + t_i^u) & t > t_i^u
\end{cases}
\tag{9}
\]
5 Note that the safe intervals include the boundary time moments t_1, t_2. The rationale
behind this is the following. CCBS constrains wait actions that have certain durations, that
is, no wait action is possible that starts at t_1 or t_2. However, the agent can start moving at
t_1 (or t_2) and thus leave the vertex immediately after that time moment, without violating
the CCBS constraint.
Algorithm 1: CCBS pseudo code.
input: (G = (V, E), S, G)
1 foreach agent i do
2     π_i ← A∗(G, S(i), G(i))
3 N ← ∅, (π_1, ..., π_k)
4.1.4. CCBS Pseudo-code
Algorithm 1 lists the complete pseudo-code of CCBS. First, a simple A∗
search is used to find a single-agent plan for each agent ignoring all other agents
(line 2). Note that SIPP is not needed at this stage, because the optimal plan
for each agent at this stage includes only move actions. This set of single-agent
plans is used to create the root of the CT, which is added to the open list
(Open). Then, in every iteration of the algorithm, we extract the node N from
Open that represents a solution with a minimal cost, compared to solutions
in the other nodes in Open (line 6). If N has no CCBS conflicts, we return
N.Π. Otherwise, we choose one of the CCBS conflicts C = ⟨i, j, (a_i, t_i), (a_j, t_j)⟩
detected for N.Π. For each agent in the conflict, i.e., i and j, we compute
the unsafe interval for its action (line 11), and create a new CT node with the
corresponding constraint and a new single-agent plan for that agent. Note that
in the pseudo-code we use N.π_l to refer to the single-agent plan of agent l in
N.Π. This CT node is added to Open, so that it may be chosen for expansion
in future iterations.
Next, we prove that CCBS is sound, solution complete, and optimal. Our
analysis is based on the notion of a sound pair of constraints, established by
Atzmon et al. [44].
Lemma 1. For any CCBS conflict $\langle (a_i, t_i), (a_j, t_j) \rangle$ and corresponding unsafe
intervals $[t_i, t_i^u)$ and $[t_j, t_j^u)$, the pair of CCBS constraints $\langle i, a_i, [t_i, t_i^u) \rangle$ and
$\langle j, a_j, [t_j, t_j^u) \rangle$ is a sound pair of constraints.
Proof. By contradiction, assume that there exists a valid solution to the corresponding
MAPFR problem in which action $a_i$ is performed at time $t_i + \Delta_i \in [t_i, t_i^u)$
and action $a_j$ is performed at time $t_j + \Delta_j \in [t_j, t_j^u)$. This means that
Proof. Soundness follows from the fact that CCBS stops only when N.Π
has no conflicts. The proof of solution completeness and optimality is similar to the
completeness and optimality proof for k-robust CBS [44], as follows.
For a CT node N, let π(N) be all valid MAPFR solutions that satisfy
N.constraints, and let N1 and N2 be the children of N. For any N that
is not a goal node, the following two conditions hold.
The first condition holds because N1 and N2 are constrained by a sound pair
of constraints (Lemma 1 and Definition 4). The second condition holds because
N.Π by construction is the lowest cost solution that satisfies the constraints in
N, and the constraints in N1 and N2 are a superset of N.constraints.
Now let N be the root of the CT. π(N) is the set of all valid solutions,
and thus any valid solution will be either in π(N1) or π(N2). Applying this
reasoning recursively yields that every valid solution is always reachable via one
of the un-expanded CT nodes. By performing a best-first search over the CT,
exploring CT nodes with minimal cost first, CCBS is guaranteed to find an
optimal MAPFR solution.
The soundness, solution completeness, and optimality of CCBS rely on having
accurate collision detection, conflict detection, and unsafe interval detection
mechanisms. That is, we require (1) the collision detection mechanism
(IsCollision) to detect a collision iff one exists, (2) the conflict detection
mechanism (InConflict) to detect a conflict iff one exists, and (3) the unsafe
interval detection mechanism to return the maximal unsafe interval for every
given pair of actions. Constructing such accurate mechanisms for agents with
arbitrary shapes and arbitrary motion functions is not trivial; however, for many
individual cases fast and exact solutions exist.
4.3.1. Implementing Collision Detection
If the agents are modeled as spheres (or disks in 2D), IsCollision is trivially
implemented in O(1) by computing the distance between the centers of the
agents and comparing this distance to the sum of their radii. When the agents
are convex polyhedrons, IsCollision can be implemented in O(log(n) log(m)),
where m and n are the numbers of vertices comprising the polyhedrons [47]. General
polyhedra are more difficult to handle. In this case the agents’ regions or their
surfaces are typically decomposed into convex parts and then collision detection
is applied to these parts in a systematic fashion; see [48] for example.
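For the disk case, the constant-time check described above can be sketched as follows. The function name and argument layout are illustrative assumptions, and treating two disks that merely touch as non-colliding (strict inequality) is also an assumption of this sketch.

```python
import math

def is_collision(center_a, radius_a, center_b, radius_b):
    """Return True if two disks overlap.

    center_a, center_b: (x, y) positions of the disk centers.
    Touching disks (distance equal to the sum of radii) are treated
    as not colliding; this boundary convention is an assumption.
    """
    distance = math.hypot(center_a[0] - center_b[0], center_a[1] - center_b[1])
    return distance < radius_a + radius_b
```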
computed with closed-form formulas in some settings, the problem of computing
an unsafe interval is less investigated. A naïve general method for computing
the unsafe interval for an action $a_i$ is to apply the conflict detection mechanism
multiple times, starting from $t = t_i$ and incrementing t by some small ∆ > 0
until InConflict returns false, meaning the unsafe interval has ended. This approach
is limited in that the resulting unsafe interval may be larger than the real
one. One can extend this approach to get a more accurate solution, as follows.
Suppose that InConflict returns true when the start moment of the action $a_i$
is $t_i + (k - 1)\Delta$ and returns false when it is $t_i + k\Delta$. The true endpoint
of the unsafe interval then lies in $(t_i + (k - 1)\Delta, t_i + k\Delta]$. One can now apply binary
search over this interval to identify the unsafe interval endpoint. Theoretically,
this search may not converge, because the interval comprises infinitely many time
moments. However, in practice, these moments are represented as
floating-point approximations, and thus the search will converge. Moreover,
the resultant endpoint will be exact in the sense that a computer program
is not able to represent this endpoint with more accuracy. Finally, a recent
investigation by Walker and Sturtevant [51] describes an approach to compute the
“exact minimum delay for collision avoidance” in case agents are disks moving
with constant velocities. This can be straightforwardly transformed to the exact
computation of the CCBS unsafe intervals.
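The stepping-plus-binary-search procedure described above can be sketched as follows. This is a hedged illustration, not the authors' code: `in_conflict` stands in for the InConflict mechanism as a callable over start times, `horizon` is an added safeguard against unbounded stepping, and the sketch assumes the action is in conflict at `t_start` and that the unsafe interval is contiguous.

```python
def unsafe_interval_end(in_conflict, t_start, step, horizon):
    """Locate the end of the unsafe interval that begins at t_start.

    in_conflict(t) -> bool reports whether starting the action at time t
    conflicts with the other agent's action (assumed True at t_start).
    """
    lo, hi = t_start, t_start + step
    # Phase 1: step forward until the conflict disappears.
    while in_conflict(hi):
        if hi >= horizon:
            return horizon  # sketch-only safeguard: give up at the horizon
        lo, hi = hi, hi + step
    # Phase 2: bisect (lo, hi]; in_conflict(lo) is True, in_conflict(hi) is False.
    while True:
        mid = (lo + hi) / 2.0
        if mid <= lo or mid >= hi:
            # No representable float lies strictly between lo and hi.
            return hi
        if in_conflict(mid):
            lo = mid
        else:
            hi = mid
```

The loop terminates because each bisection step shrinks the gap between two floating-point numbers, which matches the convergence argument given in the text.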
Overall, when agents are represented as disks that move from one location to
the other with constant velocities along straight lines, there exist fast and
exact mechanisms to implement IsCollision and InConflict, and to compute
the endpoints of unsafe intervals.
One of the major design choices when implementing an algorithm from the
CBS family is which conflict to choose when processing a high-level node. In
classical MAPF, choosing conflicts in an intelligent manner can significantly
reduce the number of expanded CT nodes and speed up the search by
several orders of magnitude [52, 53]. A particularly effective method for choosing
conflicts is to prefer cardinal conflicts over semi-cardinal conflicts, and
semi-cardinal conflicts over non-cardinal conflicts. A CBS conflict is called cardinal
iff resolving it using either of the corresponding CBS constraints results in
increasing the solution cost. The conflict is semi-cardinal if the solution cost
increases when the corresponding CCBS constraint is imposed on only one of
the agents involved in the conflict. In all other cases, the conflict is non-cardinal.
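The three conflict types can be told apart from the solution costs obtained after replanning each agent under its constraint. The following sketch illustrates the definition only; the function name, its arguments, and the tolerance `eps` for comparing floating-point costs are assumptions.

```python
def classify_conflict(parent_cost, cost_if_i_constrained, cost_if_j_constrained,
                      eps=1e-9):
    """Classify a conflict as cardinal, semi-cardinal, or non-cardinal.

    parent_cost: solution cost of the CT node that contains the conflict.
    cost_if_*_constrained: solution cost after replanning the respective
    agent under its constraint. eps is an illustrative float tolerance.
    """
    increased = [cost > parent_cost + eps
                 for cost in (cost_if_i_constrained, cost_if_j_constrained)]
    if all(increased):
        return "cardinal"
    if any(increased):
        return "semi-cardinal"
    return "non-cardinal"
```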
Effectively implementing this prioritization of cardinal and non-cardinal
conflicts in CCBS is not trivial, since conflict detection in MAPFR is more costly
than in classical MAPF. In our CCBS implementation, we store all detected
conflicts with their types (cardinal, semi-cardinal, or non-cardinal) in the nodes
of the CCBS constraint tree. When a child CT node is generated, it immediately
copies all the conflicts from its parent, except those that were resolved.
Then, we detect conflicts only with the newly constructed plan and identify their
type (cardinal, semi-cardinal, or non-cardinal). This allows choosing cardinal
and semi-cardinal conflicts first at low overhead, which in turn proved
to be very effective for speeding up CCBS. Note that in a preliminary version
of this work [18], we proposed a heuristic for choosing when to detect cardinal
conflicts and when to avoid doing so. This heuristic does not yield significant
improvement when using the implementation described above for detecting
conflict types.
An additional CCBS design choice that is worth mentioning is the tie-breaking
strategy used to choose which CT node to expand from all CT nodes
with minimal cost. We used the following tie-breaking strategy. If two or more
high-level nodes with the same (minimal) cost exist in the CT, we prefer the
one with the lower number of conflicts. A secondary tie-breaking criterion we
used is the number of constraints, preferring to expand first nodes with more
constraints. Similar tie-breaking techniques for CT nodes were proposed by
Barer et al. [54]. Our implementation of CCBS is in C++ and is available at
github.com/PathPlanning/Continuous-CBS.
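The tie-breaking order described above maps naturally onto a lexicographic priority key. The sketch below is in Python for brevity and is not taken from the C++ implementation; the names `ct_priority` and `pop_order` and the tuple layout of the nodes are illustrative assumptions.

```python
import heapq

def ct_priority(cost, num_conflicts, num_constraints):
    """Priority key for a CT node: lower tuples are expanded first.

    Primary: lower cost. First tie-break: fewer conflicts.
    Second tie-break: more constraints (hence the negation).
    """
    return (cost, num_conflicts, -num_constraints)

def pop_order(nodes):
    """Return node names in expansion order.

    nodes: iterable of (name, cost, num_conflicts, num_constraints).
    """
    heap = [(ct_priority(cost, conflicts, constraints), name)
            for name, cost, conflicts, constraints in nodes]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```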
5. An SMT-Based Approach for Makespan-Optimal MAPFR
Algorithm 2: The µSMT-CCBS algorithm.
Input: P, the MAPFR problem; µ, the makespan bound
1 Ψ ← ∅
2 while True do
3     PS ← CreatePS(Ψ, P, µ)
4     Π ← Solve(PS)
5     if No solution found (i.e., Π is null) then
6         return null
10    Add Con to Ψ
exists, the given decision problem is unsolvable, i.e., there are no solutions to P
with makespan equal to or smaller than µ. Otherwise, we apply DECIDEMAPFR
to check if the found solution is a valid MAPFR solution (line 7). If it is a valid
solution, we return it. Otherwise, DECIDEMAPFR returns one of the CCBS
conflicts that exist in this solution. The returned CCBS conflict is added to the
set of conflicts Ψ. This process continues until either a valid solution is found
(line 9) or we establish that no solution exists (line 6).
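The refinement loop described above can be sketched with the three sub-procedures passed in as callables. This is an illustrative skeleton under stated assumptions: `create_ps`, `solve`, and `find_conflict` are hypothetical stand-ins for CreatePS, Solve, and DECIDEMAPFR, and termination relies on every added conflict ruling out the solution that produced it.

```python
def mu_smt_ccbs(problem, mu, create_ps, solve, find_conflict):
    """Bounded-makespan refinement loop.

    create_ps(conflicts, problem, mu) builds the propositional skeleton,
    solve(ps) returns a candidate solution or None, and
    find_conflict(solution) returns a conflict or None if the solution is valid.
    """
    conflicts = []  # the set of conflicts collected so far
    while True:
        ps = create_ps(conflicts, problem, mu)
        solution = solve(ps)
        if solution is None:
            return None  # no solution within the makespan bound
        conflict = find_conflict(solution)
        if conflict is None:
            return solution  # valid solution found
        conflicts.append(conflict)
```

As a toy usage, the "problem" can be a list of candidate plans, `create_ps` can filter out the candidates already recorded as conflicting, and `solve` can return the first remaining candidate.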
Our DECIDEMAPFR procedure is exactly the same as the conflict-detection
step in CCBS. It applies a conflict-detection mechanism to check for collisions
between the agents’ plans in the given solution. The process of generating our
PS for a given set of CCBS conflicts, MAPFR problem, and makespan bound,
CreatePS(Ψ, P, µ), is more involved and we describe it next.
2. Compute for each agent a set of single-agent plans that satisfy these con-
straints.
3. Create a PS for choosing a solution from these sets of single-agent plans.
4. Add constraints to verify the chosen solution avoids all conflicts in Ψ.
Next, we describe the details of each step of CreatePS. As we will see, these
steps are designed to obtain a behavior similar to CCBS. In particular, in the
first step, CreatePS computes all the constraints CCBS would impose to resolve
the conflicts in Ψ, and in the second step, CreatePS efficiently computes all
single-agent plans that CSIPP would generate to satisfy every subset of these
constraints.⁶
Step 1. Compute the set of CCBS constraints for Ψ. For every CCBS
conflict Con in Ψ, we generate the pair of CCBS constraints CCBS would create
to resolve Con. This pair of CCBS constraints is specified in Lemma 1. Note
that to compute such a pair of constraints, we need to compute the unsafe
intervals for the respective conflict. By const(Con) we denote the pair of CCBS
constraints for a conflict Con ∈ Ψ, and by const(Ψ) the set of all pairs of
CCBS constraints generated this way, i.e.,
6 Note that CreatePS does so without explicitly running CSIPP an exponential number of
times.
7 Technically, an MDD can have multiple sinks, where each sink is labeled as either true
or false. This is equivalent to having a single sink that gathers all sinks labeled as true, and
removing all branches that only end up in sinks labeled as false.
been used in the context of MAPF before, e.g., in the ICTS and E-ICTS algorithms
[8, 13]. Every node in our MDDR is a pair $(u, t)$ where $u$ is a vertex
in G and $t$ is a point in time. An edge $((u, t), (v, t'))$ corresponds to a timed
action from $u$ to $v$ that starts at $t$ and ends at $t'$. We generate our MDDR
by performing a variant of Dijkstra’s algorithm that considers all possible wait
actions CSIPP may include to avoid any constraint in const(Ψ). Algorithm 3
describes in detail how this is done.
21 return $(X^i, E^i)$
The input to Algorithm 3 is a MAPFR problem Π, a set of CCBS constraints
const(Ψ), a makespan bound µ, and an agent i. $X^i$ and $E^i$ in Algorithm 3 are
the nodes and edges of the MDDR it generates for agent i. The root node of this
MDDR is (S(i), t). Initially, $X^i$ contains only the root node and $E^i$ is empty.
Open is a collection of MDDR nodes ordered by their time points, which is
initialized with the MDDR root node. In every iteration, the best (minimal
time) node (u, t) in Open is removed from Open and expanded. The children
of (u, t) are generated as follows. For every move action a ∈ A that starts at
u and ends before the makespan bound (i.e., $t + a_D \le \mu$), we generate a child
node $(to(a), t + a_D)$, which represents performing action a at time t (lines 9-10).
If there exists a CCBS constraint that prohibits agent i from performing a at
time t, we create an additional node in the MDDR that represents the option
of waiting at u until a can be performed without conflicting with this constraint
(lines 11-15). If there is a CCBS constraint that prohibits waiting at to(a) then
we create an additional node in the MDDR to allow the agent to stay at from(a)
until it can apply a to reach to(a) immediately after the constraint on waiting at
to(a) ends (lines 16-20). Note that the generated MDDR includes single-agent
plans that violate some of the given CCBS constraints. For example, for a move
action a ∈ A and an MDDR node (u, t), Algorithm 3 will create a child node
$(to(a), t + a_D)$ even if there is a CCBS constraint that prohibits performing a at
that time. This is done so that the generated MDDR represents all single-agent
plans CSIPP would return given any subset of the given CCBS constraints. The
PS created as described below will ensure that the set of single-agent plans chosen
will comprise a valid solution if possible.
Example 3. Figure 5 shows the MDDR structures for the agents in our running
example (Figure 3). The upper part shows the MDDR structures created
for each agent w.r.t. an empty set of constraints. In our example, each agent has
a single shortest single-agent plan, and so the MDDR created for an empty set
of constraints is a line. The red and blue agents’ plans have a conflict between
the timed actions (F → I, 2.000) and (H → C, 2.000). The unsafe intervals for
Figure 5: An illustration of MDDRs. The upper part shows an initial MDDR for the
MAPFR problem in Figure 3. The lower part shows the MDDR for the same problem after
resolving a conflict between the red and blue agents traversing (F, I) and (H, C) respectively.
these actions are [2.000, 3.743) and [2.000, 3.310), respectively. So, the CCBS
constraints generated for this conflict are
agent i plans performs an action that starts at time $t$, ends at time $t'$, and
moves the agent from $v$ to $v'$. Wait actions from the MDDR are represented
in our PS by $E_{v,v'}^{t,t'}(i)$ variables with $v = v'$. To ensure that an assignment to all
these variables constitutes a solution to the given MAPFR problem, we add the
following restrictions over these Boolean variables to the PS.

$$X_v^t(i) \Rightarrow \bigvee_{(v',t') \,\mid\, [(v,t);(v',t')] \in E^i} E_{v,v'}^{t,t'}(i), \tag{16}$$

$$\sum_{(v',t') \,\mid\, [(v,t);(v',t')] \in E^i} E_{v,v'}^{t,t'}(i) \le 1 \tag{17}$$

$$E_{v,v'}^{t,t'}(i) \Rightarrow X_{v'}^{t'}(i) \tag{18}$$

Equations 16 and 17 state that if agent i appears in vertex v at time t then it has
to leave through exactly one edge connected to v. Equation 18 establishes that
once an agent enters an edge in the MDDR it has to leave it at its endpoint.
We also add to the PS clauses that verify every agent starts in its start location
and ends in its goal. Thus, a satisfying assignment to this formula represents a
solution to the given MAPFR problem.
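To make the role of Equations 16-18 concrete, the sketch below checks whether a given Boolean assignment satisfies them for a single agent. It is an illustration only, not part of the authors' encoding: the function name and the dictionary-based representation are assumptions, and Equation 16 is applied only to nodes that have outgoing edges, which is an assumption made so that goal nodes can be set to true.

```python
def satisfies_eq_16_18(edges, X, E):
    """Check Equations 16-18 for one agent.

    edges: list of ((v, t), (v2, t2)) pairs, the edges of the agent's MDD.
    X: dict mapping each node (v, t) to a bool.
    E: dict mapping each edge to a bool.
    """
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge[0], []).append(edge)
    for node, out_edges in outgoing.items():
        chosen = sum(1 for e in out_edges if E[e])
        if X[node] and chosen == 0:  # Eq. 16: an occupied node must be left
            return False
        if chosen > 1:               # Eq. 17: at most one outgoing edge is used
            return False
    for edge in edges:               # Eq. 18: a used edge leads to an occupied node
        if E[edge] and not X[edge[1]]:
            return False
    return True
```

For a line-shaped MDD, such as one containing a single plan, setting every node and edge variable to true satisfies all three conditions.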
Step 4. Add constraints to verify all conflicts are avoided. To
avoid all the CCBS conflicts in Ψ, we verify that for every conflict Con ∈ Ψ
one of the CCBS constraints in const(Con) is satisfied. This is done by adding
to the PS created in step 3 an appropriate disjunction. That is, for a conflict
$\langle (a_i, t_i), (a_j, t_j) \rangle$, where $a_i$ is the action that moves agent i from $v$ to $v'$ and
$a_j$ moves agent j from $u$ to $u'$, we add the following clause:
This clause represents mutual exclusion between the two conflicting actions.
Thus, having this clause in the PS ensures at most one of these timed actions
will be applied in any satisfying solution.
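The general clause itself is not reproduced in this text. Judging from the concrete instance given as Equation 20 and from the mutual-exclusion reading above, it plausibly has the following shape; this is a reconstruction under that assumption, with $a_{iD}$ and $a_{jD}$ denoting the durations of the two actions, and should not be read as the authors' exact formula.

```latex
% Assumed form of the mutual-exclusion clause (Equation 19), inferred from Equation 20:
% at most one of the two conflicting timed actions may be selected.
\neg E_{v,v'}^{t_i,\, t_i + a_{iD}}(i) \;\lor\; \neg E_{u,u'}^{t_j,\, t_j + a_{jD}}(j)
```

With $t_i = t_j = 2.000$, durations 2.828 and 5.000, and the edges $(F, I)$ and $(H, C)$, this form yields exactly the clause shown in Equation 20.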
Example 4. Consider again our running example from Figure 3, and the three
MDDR structures shown in the upper part of Figure 5. The variables created
for the corresponding PS are:

$X_A^{0.000}(1), X_B^{2.000}(1), X_F^{4.000}(1), X_I^{6.828}(1), X_I^{8.000}(1),$
$E_{A,B}^{0.000,2.000}(1), E_{B,F}^{2.000,4.000}(1), E_{F,I}^{4.000,6.828}(1), E_{I,I}^{6.828,8.000}(1)$

for agent 1 (green),

$X_E^{0.000}(2), X_F^{2.000}(2), X_I^{4.828}(2), X_J^{6.828}(2), X_J^{8.000}(2),$
$E_{E,F}^{0.000,2.000}(2), E_{F,I}^{2.000,4.828}(2), E_{I,J}^{4.828,6.828}(2), E_{J,J}^{6.828,8.000}(2)$

for agent 2 (red), and

$X_G^{0.000}(3), X_H^{2.000}(3), X_C^{7.000}(3), X_D^{8.000}(3),$
$E_{G,H}^{0.000,2.000}(3), E_{H,C}^{2.000,7.000}(3), E_{C,D}^{7.000,8.000}(3)$

for agent 3 (blue). These variables are used in the constraints defined by Equations
16-19. In addition, we establish that each agent starts and ends in its
start and goal location by setting the variables $X_A^{0.000}(1)$, $X_E^{0.000}(2)$, $X_G^{0.000}(3)$
(start positions) and $X_I^{8.000}(1)$, $X_J^{8.000}(2)$, $X_D^{8.000}(3)$ (goal positions) to true.
The resulting formula can be solved easily by setting all variables to true, which
corresponds to the solution in which the agents do not wait in any position and
go directly to their goals: agent 1 (green) goes from A to B to F to I, agent 2
(red) goes from E to F to I to J, and agent 3 (blue) goes from G to H to C to D.
As noted earlier, this solution has a conflict, and the MDDRs in the lower part
of Figure 5 are created by considering the CCBS constraints designed to resolve
this conflict. The PS created for the three MDDRs in the lower part of Figure 5
includes additional variables representing new timed actions and locations the
agents may now occupy. These variables are:
Agent 1 (green): $X_I^{8.571}(1)$, $E_{I,I}^{8.000,8.571}(1)$

Agent 2 (red): $X_F^{3.743}(2)$, $X_I^{6.571}(2)$, $X_J^{8.571}(2)$, $E_{F,F}^{2.000,3.743}(2)$, $E_{F,I}^{3.743,6.571}(2)$, $E_{I,J}^{6.571,8.571}(2)$

Additional clauses are added for these variables following Equations 16–18. In
addition, the following clause is added as specified in Equation 19

$$\neg E_{F,I}^{2.000,4.828}(2) \lor \neg E_{H,C}^{2.000,7.000}(3) \tag{20}$$

to verify that the conflict identified in the previous solution will be avoided.
eventually enables performing a move action. Therefore, it is sufficient to only
consider the wait actions’ duration specified in lines 12 and 17. A formal proof
is given in Appendix B.
7 µ ← Compute next µ
Theorem 3. SMT-CCBS is sound, solution complete, and is guaranteed to
return a makespan-optimal solution.
Proof. Since every solution returned by SMT-CCBS was also returned by
µSMT-CCBS, and µSMT-CCBS is sound (Theorem 2), SMT-CCBS is sound as
well. The initial value of µ is set as a lower bound on the optimal makespan,
since no agent can reach its goal faster than when it ignores all other agents.
In every subsequent iteration of SMT-CCBS, µ is incremented by the minimal
amount required to allow Algorithm 3 to add at least a single node to one of the
MDDR structures. Let $\Delta_\mu$ be this minimal amount. By definition, increasing
µ by any amount smaller than $\Delta_\mu$ will result in exactly the same MDDR structures
as created for µ. Consequently, µSMT-CCBS will not find a valid solution
for any makespan bound smaller than $\mu + \Delta_\mu$. Thus, solution completeness and
makespan-optimality of SMT-CCBS directly follow from the completeness of
µSMT-CCBS (Theorem 2).
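The outer loop that this proof reasons about can be sketched as follows. The callables `solve_bounded` (standing in for a call to µSMT-CCBS with a fixed bound) and `next_mu` (standing in for the computation of the next makespan bound), as well as the `max_mu` cut-off, are assumptions introduced for illustration.

```python
def smt_ccbs(initial_mu, solve_bounded, next_mu, max_mu):
    """Increase the makespan bound until a solution is found.

    initial_mu: a lower bound on the optimal makespan.
    solve_bounded(mu): returns a valid solution with makespan <= mu, or None.
    next_mu(mu): returns the smallest bound above mu that changes the MDDs.
    max_mu: sketch-only cut-off so that the loop always terminates.
    """
    mu = initial_mu
    while mu <= max_mu:
        solution = solve_bounded(mu)
        if solution is not None:
            return solution, mu  # the first feasible bound is makespan-optimal
        mu = next_mu(mu)
    return None
```

Because the bounds are visited in increasing order and no feasible bound is skipped, the first bound for which `solve_bounded` succeeds is the optimal makespan, mirroring the argument above.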
when solving a given MAPFR problem, adding more clauses. Moreover, the
instances being sequentially solved by the SAT solver are very similar to each
other, differing only in relatively few new binary clauses, which strengthens the
role of clause learning from previous calls.
Preliminary experiments with SMT-CBS, the previous SMT-based solver for
the discrete variant of MAPF, showed that it is better to collect and reflect all
conflicts discovered after each plan validation step rather than choosing a subset
of them according to any preference [56]. It also turned out to be beneficial to transfer
the set of conflicts to the next value of the objective in the iterative scheme.
We follow the analogous design choice in SMT-CCBS. That is, we collect
and maintain the set of conflicts discovered by DECIDEMAPFR throughout the
entire course of the SMT-CCBS algorithm. Our implementation of SMT-CCBS
is written in C++ and available at https://github.com/surynek/boOX.
6. Experimental Evaluation
Table 1: Details about the grid graphs used in our experiments, including their source files in
the grid MAPF benchmark, their dimensions, and the numbers of free cells.
Figure 6: (Left) Illustration of the $2^k$ neighborhood for k = 2, 3, 4, and 5. (Right) An invalid
move on a 16-neighborhood grid. Although the source and target cells are unblocked and the
line connecting them does not intersect the blocked cell, a disk-shaped agent of radius $\sqrt{2}/4$
will run into the latter in case the move is executed.
8 https://movingai.com/benchmarks/mapf.html
grid cells. This does not mean the duration of all move actions is unit. For
example, the duration of a move action from grid cell (x, y) to (x + 1, y + 1) is
$\sqrt{2}$.
Agents’ shapes were disks of radius $\sqrt{2}/4$. The shape of an agent was considered
when checking collisions with static obstacles as well as other agents. Thus,
if a move action starts and ends in empty grid cells but the disk representing
the agent intersects with a blocked cell while performing this move action, then
this action is prohibited. An example of this is given in Fig. 6 (right).
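The move offsets of a $2^k$ neighborhood and their durations can be generated as sketched below. The construction used here (repeatedly inserting the sum of each pair of adjacent offsets) is a common way to build such neighborhoods and is an assumption of this sketch, as are the function names and the unit movement speed; the check of the agent's disk against blocked cells is not included.

```python
import math

def neighborhood_moves(k):
    """Return the 2**k move offsets (dx, dy) of a 2^k-neighborhood grid.

    Starts from the 4 cardinal offsets and, k - 2 times, inserts the sum
    of every pair of angularly adjacent offsets between them.
    """
    moves = [(1, 0), (0, 1), (-1, 0), (0, -1)]
    for _ in range(k - 2):
        refined = []
        for idx, current in enumerate(moves):
            following = moves[(idx + 1) % len(moves)]
            refined.append(current)
            refined.append((current[0] + following[0], current[1] + following[1]))
        moves = refined
    return moves

def move_duration(dx, dy):
    """Duration of a straight-line move, assuming unit speed."""
    return math.hypot(dx, dy)
```

For k = 3 this produces the usual 8-connected offsets, including the diagonal (1, 1) whose duration is $\sqrt{2}$, in line with the example above.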
This grid-based MAPF benchmark [20] also provides scenario files for each
grid. Each scenario file lists start and goal locations for the agents. We used
this list to create MAPFR problem instances as suggested by the maintainers
of the benchmark [20]. That is, we created a MAPFR problem instance with
two agents whose start and goal locations are given in the first two lines of the
scenario file. Then, we created a new MAPFR problem instance by adding a
third agent whose start and goal locations are given in the third line of the
scenario file. This process is repeated to create a sequence of MAPFR problem
instances with more and more agents. To evaluate an algorithm on this sequence
of MAPFR problem instances, we run it until a MAPFR problem instance is
reached that the evaluated algorithm cannot solve in the given time limit. In
our experiments, we generated such sequences of MAPFR problem instances for
all the 25 random scenario files available for each grid. Overall, for each grid
and a specific number of agents we ran the evaluated algorithms on 25 different
MAPFR problem instances.
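The incremental evaluation protocol described above can be sketched as follows. The names `instance_sequence`, `max_agents_solved`, and `solve_within_limit` are hypothetical; the last one stands in for running an algorithm under the time limit, and parsing of the benchmark's scenario files is omitted.

```python
def instance_sequence(start_goal_pairs):
    """Yield instances with 2, 3, ... agents, taken in scenario-file order."""
    for num_agents in range(2, len(start_goal_pairs) + 1):
        yield start_goal_pairs[:num_agents]

def max_agents_solved(start_goal_pairs, solve_within_limit):
    """Run the solver on growing instances until the first failure.

    solve_within_limit(instance) -> bool is a stand-in for running the
    evaluated algorithm under the time limit. Returns the number of agents
    in the largest instance solved, or 0 if even the first one failed.
    """
    solved = 0
    for instance in instance_sequence(start_goal_pairs):
        if not solve_within_limit(instance):
            break
        solved = len(instance)
    return solved
```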
Roadmap      Vertices   Edges
sparse       169        349
dense        878        7,341
mega-dense   11,342     263,533

Table 2: Details about the roadmap graphs used in our experiments, including the number of
vertices and edges in each graph.
sity. Table 2 lists the exact number of vertices and edges in each graph and
shows them visually.⁹ The finite set of move actions A for each MAPFR
problem instance created for these roadmap graphs comprises two move actions
for each edge in the roadmap graph, corresponding to crossing that edge in
each direction at a fixed speed. We created a suite of such MAPFR problem
instances in a similar way as described above for the grid-based graphs,
creating 25 scenario files for each roadmap graph such that each scenario file
contains non-overlapping start-goal pairs chosen randomly out of the graph
vertices. All the scenario files used in our experiments are publicly available at:
https://tinyurl.com/ccbs-aij-instances.
9 The mega-dense graph contains so many edges that they visually overlap.
where the time allocated to pathfinding is minimal. Such applications are
prevalent in robotics, digital entertainment, and many other real-world problems.
However, we also performed dedicated experiments with different time limits,
to investigate how the evaluated algorithms scale with time.
Figure 7 depicts the success rates of CCBS and SMT-CCBS under the 30
second time limit for different numbers of agents (x-axis) in our $2^k$-neighborhood
grids. Plots of different colors correspond to results for different values of k
(neighborhood size).
The results show that CCBS and SMT-CCBS can find SOC-optimal and
makespan-optimal solutions, respectively, under the 30 second time limit for
MAPFR problem instances with several dozens of agents. For example, both
CCBS and SMT-CCBS solved 80% of problem instances for 24 agents on the
warehouse grid for all evaluated values of k (2, 3, 4, and 5). Note that
state-of-the-art solvers for classical MAPF can find SOC/makespan-optimal solutions
on similar grids with many more agents. This is expected as these solvers
solve an easier problem (MAPF as opposed to MAPFR), do not perform
time-consuming conflict detection and unsafe-interval estimation procedures, and
their constraints over vertices/edges are more restrictive.
Consider now the impact of increasing the neighborhood size, i.e., increasing
k. For both algorithms, in general, increasing k leads to lower success rates. This
is an expected effect resulting from the difference in branching factor, which is 5
for k = 2 and goes up to 32 for k = 5. However, for some settings, the difference
in algorithms’ performance for k = 2 and k = 3 is negligible, and setting k to
3 can actually lead to better results. Consider for example the results of CCBS on
den520d or the results of SMT-CCBS on warehouse, when the number of agents
is below 30. In these examples, setting k = 3 improves the success rate in most
cases. A possible explanation is that increasing k also allows the agents to find
shorter single-agent plans. This can result in a faster low-level search for CCBS,
and smaller MDDR structures for SMT-CCBS, both of which may reduce the
Figure 7: Success rate for CCBS (left plots) and SMT-CCBS (right plots) on grid maps.
overall runtime. In other words, increasing k means a search space with a larger
branching factor, but also having a potentially smaller depth.
Relating the SMT-CCBS results to the CCBS results is somewhat problematic,
since the former aims to optimize makespan while the latter aims to
optimize SOC. Nevertheless, one can see that there is no universal winner:
CCBS is able to solve problem instances with more agents in the den520d and
warehouse grids, while SMT-CCBS solves problem instances with more agents in
empty 16 × 16 grids. These results are consistent with the common observation
that SAT-based methods work well for small and dense graphs while CBS-based
methods excel in larger and sparser maps [56].
To better understand the scalability of CCBS and SMT-CCBS, we repeated
the above experiments with k = 3 and timeouts of 1, 10, 30, 60, 150, and 300
seconds. The results are depicted in Fig. 8. The y-axis shows the total number
of solved instances for each grid across the different numbers of agents. As
expected, increasing the timeout allows both algorithms to solve more instances.
However, CCBS quickly reaches a plateau in which extending the timeout does
not allow solving significantly more instances. The most notable increase in the
performance of CCBS is visible only when going from a 1s timeout to 30s. On the
other hand, SMT-CCBS gains more from increasing the timeout: in all
maps except empty, the difference between the number of tasks solved
under the 30s timeout and the 300s timeout is considerable. For example, this increase
in timeout results in SMT-CCBS solving approximately 1.5 times more problem
instances in the den520d and warehouse grids.
Allowing for longer runtime is especially beneficial for SMT-CCBS because
more time can be spent in the SAT solving phase and this time can be utilized by
the SAT solver to fully employ its learning mechanism to prune the search space.
During long runs of the SMT-CCBS solver, the high-level phases that construct
the formula become relatively less time-consuming while the SAT solving phase
is increasingly dominant with respect to the overall runtime. Hence the efficiency
of the SAT solver in pruning the search space represented by formulae modeling
the MAPFR problem becomes more pronounced.
Figure 8: Total number of instances solved by CCBS and SMT-CCBS on different
8-neighborhood grids depending on the time limit.
Overall, these results suggest that when the time budget is low (1-30s) CCBS
should be preferred, whereas when this budget is high (more than 3 minutes)
preference should be given to SMT-CCBS.
Figure 9: The SOC- and makespan-gain ratios for CCBS and SMT-CCBS on grid maps.
by 25% compared to the SOC of the solution obtained by CCBS with k = 2.
Figure 9 shows the SOC- and makespan-gain ratios as box-and-whisker plots,
for different values of k. Each plot shows the following statistics: minimum
(lower whisker), maximum (upper whisker), median (horizontal line inside the
box), mean (cross sign inside the box), first and third quartiles (box) and also
outliers (bold dots above/below whiskers).
As one can see, increasing k indeed decreases both SOC and makespan.
This effect is most notable when going from k = 2 to k = 3. Consider the den520d
map, for example. Setting k to 3 reduces both makespan and SOC by about
15-17% on average. However, increasing k beyond 3 had a much smaller effect
on cost. In fact, in all our experiments going from k = 4 to k = 5 yielded a
marginal improvement of at most approximately 1% in terms of makespan and
SOC, for both CCBS and SMT-CCBS. It is noteworthy that a similar result
was previously observed in $2^k$ single-agent pathfinding [57].
Next, consider the makespan results for CCBS. While CCBS is designed to
optimize SOC and not makespan, our results show that very often the solution
returned by CCBS is also optimal w.r.t. makespan. Similarly, while SMT-CCBS
is designed to optimize makespan, it often returns solutions that are
SOC-optimal. To see this, consider the flat bars at 1.0 on both SOC gain and
makespan gain plots for CCBS (k=2) and SMT-CCBS (k=2) on all grids. In some
cases, CCBS returns a solution with suboptimal makespan, and, similarly,
SMT-CCBS returns a solution with suboptimal SOC. This can be seen by observing
the outliers in the room and empty grids. But in general, the shapes of the
CCBS and SMT-CCBS boxes look very similar. This suggests that for our
MAPFR problems a SOC-optimal solution is often makespan-optimal and vice
versa. Moreover, the high-level behavior of both algorithms is similar: both start
with the optimal single-agent plans, identify conflicts, and resolve them in
a similar way. Therefore, the eventual solutions they return tend to be similar.
Figure 10: Results of CCBS (left) and SMT-CCBS (right) for different roadmaps.
6.3. Roadmaps Results
Fig. 10 depicts the success rates for CCBS and SMT-CCBS on our roadmap
graphs. Consider first the impact of the different roadmap types on the success
rates of CCBS and SMT-CCBS. For CCBS, the success rates of the sparse and
mega-dense roadmaps are similar and notably lower than the success rate for
the dense roadmap. This can be explained as follows. In the sparse roadmap,
the number of edges is small and thus agents are likely to conflict by planning
to use the same edges. In the mega-dense roadmap, the edges densely populate
the metric space in the map, and thus agents are likely to conflict by planning
to use different edges that are too close to each other. In other words, in the
sparse roadmap the agents have fewer alternative paths to choose from to avoid
conflicts, while in the mega-dense roadmap agents have many alternative paths
to choose from but a large number of them conflict. In both cases, numerous
conflicts arise, making the problem harder for CCBS. The dense roadmap
provides a reasonable trade-off between the number of alternative paths the
agents have to choose from and the likelihood that such paths conflict, yielding
a notably higher success rate.
The impact of the different roadmaps on success rates is different for SMT-CCBS.
There, the success rates decrease monotonically for all numbers of agents
as the density of the roadmap graph increases. This occurs because significantly
larger and denser graphs require creating and solving SAT formulae with
more variables and constraints. The impact of increasing the density of edges
here is amplified by the absence of compression at the MDDR level, as in the
roadmaps there is scarce symmetry in the neighborhood of a position. Therefore,
the possibility of generating a single node in MDDR by different paths
is eliminated. This is also shown when comparing the success rates of CCBS
and SMT-CCBS. The latter performs better on the sparse roadmap while the
former is superior on the moderately sized and larger roadmaps (dense and
mega-dense).
Next, consider the SOC and makespan of CCBS and SMT-CCBS in the
different roadmaps. Since the results show SOC and makespan over only the
solved instances, there are some cases where the average SOC or makespan for
the denser roadmaps is slightly higher. However, in general, denser roadmaps
allow both algorithms to find solutions with lower costs, as expected. That
being said, the actual difference in makespan between dense and mega-dense
is very small, suggesting that for finding optimal solutions, the dense roadmap
provides sufficient discretization of the continuous space to find high quality
solutions. However, finding the suitable density for a roadmap of a given terrain
is a topic that is beyond the scope of this work.
Figure 11: Number of solved instances for CBS-CT, E-ICTS, CCBS, and SMT-CCBS with k = 2, 3, 4, 5 on the empty16x16, den520d, warehouse, and rooms maps.
Next, we compare the performance of CCBS and SMT-CCBS to other solvers
that also address versions of the MAPFR problem as explained in Section 2.2.
Namely, we compared our algorithms with E-ICTS [13] and ECBS-CT [14].
Both E-ICTS and ECBS-CT are bounded-suboptimal algorithms: they accept
a parameter w ≥ 1 that bounds the suboptimality of the solution they
return. In our experiments, we set w = 1 to ensure that the returned solution
is optimal (see footnote 10). We refer to ECBS-CT with w = 1 as CBS-CT. We used the
authors’ implementations of these algorithms, which are either freely available on
GitHub (E-ICTS; see footnote 11) or were shared with us by the ECBS-CT authors. Note that
these implementations may not exactly match those used in prior publications
about these algorithms, and thus the results we report may differ.
E-ICTS and CBS-CT implementations handle continuous time by discretizing
it according to a minimal wait time parameter ∆. We set it to 1/1000 in
our experiments. Since these implementations do not support general graphs,
we ran our comparison on 2^k-neighborhood grids only. E-ICTS supports these
grids by default. For CBS-CT we designed motion primitives that correspond
to moves along the edges of a 2^k-neighborhood grid.
Fig. 11 shows the number of solved instances for all algorithms under a time
limit of 30s. Each plot corresponds to a particular map from our dataset. On the
empty-16-16 grid, SMT-CCBS clearly outperforms the competitors, while the
performance of CCBS is similar to that of E-ICTS. On the warehouse map,
CCBS dominates for all k; however, SMT-CCBS is very close for k = 3, 4, 5.
On rooms, E-ICTS wins for k = 2, CCBS wins for k = 3, 4, and CBS-CT solves
the most instances for k = 5. Finally, on the largest map, den520d, CCBS
and SMT-CCBS clearly outperform the other algorithms for k = 3, 4, 5. For
k = 2, CBS-CT is very close to SMT-CCBS. Overall, our experiments show that
there is currently no universal winner. This is aligned with the current state of
the art in classical MAPF algorithms, where identifying which MAPF algorithm
to use in which domain is an open question.
10 Note that since E-ICTS and ECBS-CT discretize time, the cost of the solutions they
return may still be higher than those returned by CCBS and SMT-CCBS. An example where
discretizing time yields a suboptimal solution is shown in Figure 4. However, in our experiments
we did not observe any notable difference in solution costs.
11 We used the version dated 30 August 2019 from https://github.com/thaynewalker/hog2.
7. Related Work and Discussion
In this section, we discuss the pros and cons of the two algorithms we proposed
(CCBS and SMT-CCBS), and how they are related to algorithms for
solving other variants of the general MAPF problem.
while others can only be solved by a search-based algorithm. This motivated us
to develop both a search-based algorithm (CCBS) and a SAT-based algorithm
(SMT-CCBS) for solving MAPFR problems.
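Table 3: Overview of related work on MAPF beyond its basic setting (✓ = yes, ✗ = no).
Algorithm | Non-uniform move durations | Wait is not discretized | Any-angle moves | Agents’ volume | Complete | Opt. | Dist.
CBS-CT [14] | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗
E-ICTS [13] | ✓ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗
AA-SIPP(m) [38] | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗
MCCBS [17] | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✗
CBS-CL [68] | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗
MAPF-POST [69, 70] | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ | ✗
ORCA [71], ALAN [72] | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ | ✓
dRRT* [73] | ✓ | ✗ | ✗ | ✓ | ✓ | ✗? | ✗
CCBS [18], SMT-CCBS [19] | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✗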
There exists a vast body of other works that study MAPF beyond its basic,
classical setting. Table 3 provides a differential overview of related work
on MAPF beyond its basic setting. Rows correspond to different algorithms
or families of algorithms. Columns specify properties that the listed
algorithms may or may not have. These properties are whether the listed algorithm (1)
supports non-uniform durations for move actions (“Non-uniform move durations”
column), (2) avoids discretizing the durations of wait actions (“Wait
is not discretized”), (3) allows moving from one graph vertex to another along
a straight line even if there is no corresponding edge (“Any-angle moves”), (4)
considers the agents’ geometric shapes (“Agents’ volume”), (5) guarantees solution
completeness (“Complete”), (6) guarantees optimal solution cost (“Opt.”),
and (7) is distributed (“Dist.”). Completeness and
optimality here are with respect to a given movement and time resolution. In
the rest of this section, we discuss the related works listed in this table, and
others, in more detail.
In Section 2.2, we discussed several algorithms that are closest to our work,
including E-ICTS [13] and CBS-CT [14]. As explained there, both E-ICTS and
CBS-CT require a minimal wait duration to be given as an input parameter,
and the durations of all wait actions are multiples of that parameter. As
shown in Example 2, discretizing wait actions in such a way can lead to finding
suboptimal solutions. So, while E-ICTS and CBS-CT are optimal with respect
to the chosen discretization, the solutions they return may have a higher cost
than the solutions returned by CCBS and SMT-CCBS, which do not discretize
time.
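The effect of restricting wait durations to multiples of a minimal wait parameter can be illustrated with a minimal sketch. The function name discretized_wait and the required wait of sqrt(2) - 1 are hypothetical and chosen only for illustration; the value 1/1000 mirrors the ∆ used in the experiments above. This is not the E-ICTS or CBS-CT implementation.

```python
import math
from fractions import Fraction


def discretized_wait(required_wait: float, delta: Fraction) -> Fraction:
    """Smallest multiple of delta that is at least required_wait."""
    steps = math.ceil(Fraction(required_wait) / delta)
    return steps * delta


# Hypothetical scenario: an agent must wait sqrt(2) - 1 time units before
# it can move safely. A solver without time discretization can use exactly
# this duration; a discretized solver must round up to a multiple of delta.
required = math.sqrt(2) - 1
delta = Fraction(1, 1000)
wait = discretized_wait(required, delta)
overhead = float(wait) - required
print(f"required={required:.6f} discretized={float(wait):.6f} overhead={overhead:.6f}")
```

The overhead per wait action is always below ∆, which is consistent with the observation in footnote 10 that no notable cost difference was seen in the experiments.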
Yakovlev and Andreychuk [38] proposed AA-SIPP(m), an any-angle MAPF
algorithm. AA-SIPP(m) is based on SIPP and adopts a prioritized planning
approach. It does not guarantee completeness or optimality. Li et al. [17]
proposed Multi-Constraint CBS (MCCBS), a CBS-based algorithm for agents
with a geometrical shape that may have different configuration spaces. However,
they assumed all actions have a unit duration and did not address continuous
time. Walker et al. [68] proposed CBS-CL, a CBS-based algorithm designed to
handle non-unit edge costs and hierarchy of movement abstractions. CBS-CL is
solution complete. However, CBS-CL does not allow reasoning about continuous
time and does not return provably optimal solutions. Hönig et al. [69, 70]
proposed MAPF-POST, which is a post-processing step that adapts a solution
to a classical MAPF problem such that it respects a given set of kinematic constraints
over the agents’ motions. They prove that MAPF-POST can always find a
feasible schedule that satisfies the given kinematic constraints, and thus we
view MAPF-POST as a complete algorithm. It does not, however, guarantee
optimality.
dRRT* is a hybrid between a tree-search algorithm and a sampling-based
algorithm [73]. It runs a tree search over locations sampled in the configuration
space. dRRT* is solution complete – given enough time it will find a solution
if one exists. Regarding solution quality, dRRT* provides a probabilistic type
of guarantee: given enough time it will return an optimal solution. However,
it does not provide a mechanism for identifying when this occurs. Thus, we do
not consider dRRT* an optimal MAPF algorithm, since the cost of the solution
it returns when halted may be far from optimal. Also, it is not clear how the
durations of wait actions are chosen by dRRT*.
ORCA [74, 71] is a fast and distributed collision avoidance mechanism that
has been used to navigate multiple agents in continuous space [75]. It works by
computing for each agent in every time step the direction and velocity it should
use to avoid the other agents. While fast, using ORCA to navigate multiple
agents does not provide any completeness or optimality guarantees. In addition,
ORCA requires time to be discretized. ALAN [72] is a multi-agent navigation
algorithm that integrates multi-armed bandits with collision avoidance using
ORCA, which yields lower-cost solutions. Like ORCA, ALAN is neither complete nor
optimal, and requires time to be discretized.
All the algorithms mentioned above are designed to solve some variants of
the MAPF problem. The Multi-Agent Pickup and Delivery (MAPD) problem is
a problem that is closely related to MAPF, in which agents need to pick up and
deliver packages from one location to another. Techniques proposed to solve
MAPF problems have also been adapted to solve MAPD problems [76, 77, 39].
In particular, Ma et al. [39] proposed a MAPD algorithm called TP-SIPPwRT
that also uses SIPP to handle continuous movement of non-holonomic agents in
this setting. TP-SIPPwRT is not optimal and is only complete for well-formed
MAPD problems. Future work may consider integrating ideas from CCBS and
SMT-CCBS and applying them to solve MAPD problems as well.
level heuristics [11]. The resulting solver, named Improved CCBS (ICCBS), was
reported to notably outperform CCBS on both 2^k grids and roadmap graphs.
Walker et al. [24] proposed another powerful CBS enhancement as part of
their CBS+TAB algorithm. The key idea of CBS+TAB is to add multiple
time-range constraints (similar to the ones proposed in this work) per single CT
node of CBS in order to prune the CT. The reported increase in the algorithm’s
performance is significant. The original implementation of CBS+TAB allows
only wait actions of a fixed duration. That being said, one may be able to
implement CBS+TAB such that it handles non-discretized wait actions, and one
may incorporate the action pruning rules from CBS+TAB in our algorithms.
This spatial symmetry-breaking technique of CBS+TAB was later augmented
with a cost-based symmetry-breaking technique [79] inspired by techniques used in
E-ICTS. The resulting hybrid algorithm, called Conflict-Based Increasing Cost
Search (CBICS), was reported to outperform the competitors. Still, their evaluation
was carried out with the fixed duration of the wait action equal to one
time step. We believe that implementing a version of CBICS that supports wait
actions of arbitrary duration is a promising direction of future work.
Recent development in SMT-CCBS focused on reducing the size of the MDDR
structures generated by the algorithm [80]. The MDDR size reduction
method is based on including promising paths in the MDDR first. An additional
loop is needed in the corresponding modification of SMT-CCBS that gradually
extends the set of paths represented in the MDDR structures, starting with promising
ones and continuing towards full MDDR structures representing all paths.
The preference of paths for inclusion in MDDR structures leaves room
for integrating domain-specific heuristics in SMT-CCBS.
8. Conclusion
presented is called CCBS. It follows the Conflict Based Search (CBS) framework [7],
but uses an adapted version of Safe Interval Path Planning (SIPP) [15]
for the low-level search, and a unique type of conflicts and constraints for the
high-level search. We prove that CCBS is sound and solution complete, and that it
returns SOC-optimal solutions. The second algorithm we presented is called SMT-CCBS.
It follows the same approach as SMT-CBS [16] by breaking the given
MAPFR problem into a sequence of bounded-cost MAPFR problems. Each of
these bounded-cost problems is solved by applying a SAT modulo theories (SMT)
problem-solving framework. That is, a propositional skeleton (PS) is generated
and continuously updated until a solution to this PS represents a valid solution
to the given bounded-cost problem. We prove that SMT-CCBS is also sound
and solution complete, and guarantees that a makespan-optimal solution is returned.
To the best of our knowledge, CCBS and SMT-CCBS are the first algorithms
to provide optimality guarantees for such a general version of MAPF.
We implemented both algorithms and evaluated them experimentally on a
set of benchmarks, including grid-based graphs [20] and roadmaps. Our experimental
results showed that while MAPFR is, in general, more difficult than
classical MAPF, using either CCBS or SMT-CCBS enables finding optimal solutions
to problems with dozens of agents under a strict time limit of 30 seconds.
Nevertheless, there is much room for improvement, and scaling to even larger
problems is still an open challenge. One of the bottlenecks we observed in
our experimental evaluation is conflict detection, which is more challenging in
MAPFR than in classical MAPF. Future work may apply meta-reasoning techniques
to decide when and how much to invest in conflict detection throughout
the search. Another direction for future work is to integrate in CCBS the many
extensions and improvements that have been proposed over the years. These
improvements include disjoint splitting of CBS constraints [78], adding admissible
heuristics to the high-level search [11, 81], and novel forms of
symmetry-breaking constraints [82]. Some of these improvements may be easy to
incorporate in CCBS while incorporating others is more challenging. An initial step in
this direction has already been taken [62]. Finally, incorporating recently proposed
CBS enhancements, such as those mentioned in Section 7.3, into CCBS
and SMT-CCBS, is another prominent prospect of future work.
One possible future research direction for SMT-CCBS is to study its conversion
to DPLL(MAPFR), a solver that would integrate MAPFR solving and
the SAT solver in a more sophisticated way, similar to how it is done in DPLL(T)
solvers [31]. In contrast to SMT-CCBS, which waits until the SAT solver finds
a complete satisfying truth-value assignment and then tests it for collisions,
DPLL(MAPFR) would test even partial truth-value assignments
that arise during the search in the SAT solver for collisions. This could prune the search
made by the SAT solver using the high-level knowledge of MAPFR.
Finally, both CCBS and SMT-CCBS rely on a finite set of move actions
that is given as input. This limits the completeness of both algorithms, in the
sense that a MAPFR problem may be solvable with one set of move actions and
unsolvable with another. Similarly, it limits our claims for optimality: CCBS and
SMT-CCBS are both only optimal with respect to the given set of move actions.
Future research may investigate developing algorithms that do not accept the
set of allowed move actions as input, and instead automatically compute the
necessary move actions to find an optimal solution. This research direction is
particularly challenging, as there are an infinite number of move actions, which
include stopping for an arbitrary amount of time at any point in space and
varying the speed at which the agents move between vertices.
Acknowledgments
This research was partially funded by the Israeli Science Foundation (ISF)
grant #210/17 to Roni Stern. Konstantin Yakovlev and Anton Andreychuk
were supported by Russian Science Foundation (RSF) grant #16-11-00048. Anton
Andreychuk was also supported by the “RUDN University Program 5-100”.
Pavel Surynek was supported by the Czech Science Foundation (GAČR) grant
#19-17966S.
Appendix A. An Example of the SMT Problem-Solving Procedure
Note that in any solution to Γ, the common axioms of equality, such as transitivity,
must hold. In an SMT approach to solve Γ, we can define the PS
where X_ab, X_ac, X_bc, and X_ad are propositional variables such that X_ij
represents that i = j for i, j ∈ {a, b, c, d}. The SAT solver can set X_ab = true,
X_ac = true, X_bc = false, and X_ad = false to satisfy this PS. The corresponding
solution is inconsistent with the semantics of the equality (EQ) theory, due to the
transitivity of equality. Namely, if X_ab and X_ac are both true, then X_bc must also
be true. Note that the SAT solver does not know this, as it is not represented in
the PS. A suitable DECIDE_T is needed to check that equality theory holds and
suggest a conflict otherwise. We denote by DECIDE_EQ this implementation of
DECIDE_T. In our case, DECIDE_EQ can suggest the conflict X_ab ∧ X_ac → X_bc.
This conflict is added to the PS, so the SAT solver can give a new assignment.
In this way, knowledge from the underlying theory is propagated by the conflict
to the PS level.
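The refinement loop described above can be sketched in a few lines of Python. This is a toy illustration, not the SMT-CCBS implementation: the SAT solver is replaced by brute-force enumeration, and the propositional skeleton PS used here, X_ab ∧ X_ac ∧ (¬X_bc ∨ X_ad), is an assumption made for the example, since the formula Γ is not reproduced in this section. The names solve_sat, decide_eq, and smt_solve are likewise hypothetical.

```python
from itertools import combinations, product

VARS = ["ab", "ac", "bc", "ad"]  # "ij" stands for the variable X_ij (i = j)


def solve_sat(clauses):
    """Brute-force stand-in for a SAT solver: first satisfying assignment or None."""
    for values in product([False, True], repeat=len(VARS)):
        assignment = dict(zip(VARS, values))
        if all(clause(assignment) for clause in clauses):
            return assignment
    return None


def decide_eq(assignment):
    """Toy DECIDE_EQ: return (premise, premise, conclusion) if transitivity
    of equality is violated among the available variables, else None."""
    elements = sorted({ch for name in VARS for ch in name})
    for i, j, k in combinations(elements, 3):
        names = ["".join(sorted(pair)) for pair in ((i, j), (j, k), (i, k))]
        if any(name not in assignment for name in names):
            continue  # no variable for one of the three pairs
        for c in range(3):
            premises = [n for idx, n in enumerate(names) if idx != c]
            if all(assignment[p] for p in premises) and not assignment[names[c]]:
                return premises[0], premises[1], names[c]
    return None


def smt_solve(ps_clauses):
    """Alternate between SAT solving and theory checking until consistent."""
    clauses = list(ps_clauses)
    iterations = 0
    while True:
        iterations += 1
        assignment = solve_sat(clauses)
        if assignment is None:
            return None, iterations
        conflict = decide_eq(assignment)
        if conflict is None:
            return assignment, iterations
        p, q, r = conflict
        # Add the conflict clause (X_p and X_q) -> X_r to the skeleton.
        clauses.append(lambda m, p=p, q=q, r=r: not (m[p] and m[q]) or m[r])


# Assumed skeleton for illustration: X_ab and X_ac and (not X_bc or X_ad).
PS = [
    lambda m: m["ab"],
    lambda m: m["ac"],
    lambda m: (not m["bc"]) or m["ad"],
]
first_guess = solve_sat(PS)
model, iterations = smt_solve(PS)
print(first_guess, model, iterations)
```

Under this assumed skeleton, the first assignment matches the one in the text, and one added conflict clause is enough for the second SAT call to return an assignment consistent with equality.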
In this section, we provide a formal proof for Theorem 2, which states that
µSMT-CCBS is sound and complete. Soundness in this context means that
every solution returned by µSMT-CCBS is indeed a valid MAPFR solution with
makespan equal to or smaller than the makespan bound. Completeness in this
context means that if such a solution exists then it will be found, and if a solution
does not exist then µSMT-CCBS will return that no solution exists. Soundness
is established by the fact that DECIDE_MAPFR verifies that the returned solution
is valid. Establishing completeness, however, is less trivial and is done below.
For µSMT-CCBS to be complete, we require that the set of single-agent
plans it considers contains every single-agent plan that CSIPP would return
given any subset of the set of all pairs of CCBS constraints (const(Ψ)). To
state this more formally, let MDDR(µ, const(Ψ)) be the set of single-agent plans
represented by the MDDR created for µ and const(Ψ), and let CSIPP(Const′)
be the single-agent plan created by CSIPP given the set of constraints Const′.
Lemma 2. For every set of CCBS constraints Const′ ⊆ const(Ψ) it holds that
CSIPP(Const′) ∈ MDDR(µ, const(Ψ)).
1. There is a move action a′ that starts at S(i), and there is a CCBS constraint
⟨i, a′, [0, t_u)⟩.
2. There is a move action a′ that starts at S(i) and ends in some vertex v
that has a safe interval starting at some time t, where t > a′_D. In this
case, CSIPP may choose to wait at S(i) until it can arrive at v at time t.
In the first case, (a_1)_D = t_u. In the second case, (a_1)_D = t − a′_D. Our MDDR
covers both cases. The first case is explicitly mentioned in lines 11-15. To
see why the second case is also covered in our MDDR, recall that CSIPP
creates a new safe interval in a vertex v only when there is a constraint over a wait
action at that vertex. Let ⟨i, a′, [t, t_u)⟩ be this constraint. The new safe interval
for CSIPP is designed such that it starts exactly when the unsafe interval of this
constraint ends, i.e., at t_u. Therefore, to reach v at time t_u with some action
a_vu we need to wait at v exactly t_u − a_vu. An edge corresponding to such a wait
action exists in our MDDR, as specified in lines 16-20.
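The two wait durations in the base case can be written as small helper functions. This is only an illustrative sketch: the function names and the numeric values are hypothetical, and the general form with a current time `now` is an assumption that reduces to the base case when now = 0.

```python
def wait_until_action_allowed(now: float, unsafe_end: float) -> float:
    """Case 1: a constraint forbids starting the move during [t_lo, unsafe_end),
    so the agent waits until the unsafe interval ends."""
    return max(0.0, unsafe_end - now)


def wait_to_hit_safe_interval(now: float, move_duration: float,
                              interval_start: float) -> float:
    """Case 2: wait so that a move of length move_duration, started right
    after the wait, arrives exactly when the safe interval begins."""
    return max(0.0, interval_start - now - move_duration)


# Base case of the proof (now = 0), with hypothetical numbers:
# constraint <i, a', [0, 0.5)>  ->  wait t_u = 0.5
case1 = wait_until_action_allowed(0.0, 0.5)
# safe interval at v starts at t = 2.5 and a'_D = 1.0  ->  wait t - a'_D = 1.5
case2 = wait_to_hit_safe_interval(0.0, 1.0, 2.5)
print(case1, case2)
```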
Induction step. Now, assume the induction statement holds for j < m,
and consider the mth timed action (a_m, t_m) in the plan CSIPP returned. Let
(v, t) be the location and time reached after performing the first m − 1 actions
in π. By the induction assumption, the node (v, t) exists in our MDDR. The
same argument used in the base step holds here as well. If a_m is a move action,
the edge ((from(a_m), t), (to(a_m), t + (a_m)_D)) exists in our MDDR (lines 9-10).
If a_m is a wait action, it was created either to avoid a conflict with a subsequent
move action or to allow a subsequent move action to reach the start of some
safe interval. Both options are covered in our MDDR generation algorithm, in
lines 11-15 and lines 16-20, respectively.
Proof. There are three outcomes for every iteration of µSMT-CCBS: (1) the
current PS is not solvable, (2) the solution returned for the current PS has a
conflict, and (3) the solution returned for the current PS has no conflict. If the
first outcome occurs, then due to Lemma 2 we can conclude that no solution
indeed exists. If the second outcome occurs, the added conflict will be avoided
in future iterations by adding its corresponding pair of CCBS constraints. No
potential solution is lost by this since the CCBS constraints are a sound pair of
constraints. If the third outcome occurs, the solution is returned, as required.
Thus, in every iteration of µSMT-CCBS we do not lose any possible solutions.
Observe that a given MAPFR problem can give rise to only finitely many sound
pairs of constraints under the given makespan limit µ. Hence µSMT-CCBS
terminates, as only finitely many sound pairs of constraints can be found and
resolved, while pairs of constraints that have been resolved remain resolved in all future
steps of the algorithm. Therefore, after finitely many steps the algorithm either
succeeds or fails. In case of failure, the set of sound constraints cannot be
satisfied for the current makespan limit µ, which means there is no solution of
the input MAPFR instance for this makespan limit µ.
The experimental results of CCBS and SMT-CCBS are not directly comparable
due to the different implementations and different cost objectives. While
CCBS optimizes SOC, SMT-CCBS returns makespan-optimal solutions. We
have also implemented a version of CCBS that returns makespan-optimal solutions.
This involves changing the cost of nodes in the CT to be the makespan
of the incumbent solution rather than SOC.
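The change of objective amounts to swapping the function that scores a CT node. The following is a minimal sketch under the assumption that a CT node's solution is summarized by the durations of its single-agent plans; the node contents are hypothetical numbers, and the code is not taken from the CCBS implementation.

```python
from typing import Sequence


def sum_of_costs(plan_durations: Sequence[float]) -> float:
    """SOC: total duration of all single-agent plans."""
    return sum(plan_durations)


def makespan(plan_durations: Sequence[float]) -> float:
    """Makespan: duration of the longest single-agent plan."""
    return max(plan_durations)


# Two hypothetical CT nodes, each holding the plan durations of three agents.
ct_nodes = {
    "node_a": [4.0, 4.0, 4.0],  # SOC 12.0, makespan 4.0
    "node_b": [5.0, 2.0, 2.0],  # SOC 9.0,  makespan 5.0
}

# Best-first expansion picks the node with the lowest cost under the objective.
best_by_soc = min(ct_nodes, key=lambda name: sum_of_costs(ct_nodes[name]))
best_by_makespan = min(ct_nodes, key=lambda name: makespan(ct_nodes[name]))
print(best_by_soc, best_by_makespan)
```

The two objectives can order the same CT nodes differently, which is why a SOC-optimal solution is not guaranteed to be makespan-optimal.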
We performed a limited set of experiments with this makespan-optimal version
of CCBS on two 2^k-connected grid maps (empty16x16 and den520d)
for k = 2, 3, 4, 5. We used the same scenario files and generated instances in the
same way as described in Section 6. The time limit was 30 seconds.
Figure D.12 shows the success rate plots comparing the makespan-optimal
CCBS with SMT-CCBS. In many settings, CCBS was able to
find more solutions than SMT-CCBS. Still, in some settings (empty-16-16,
k=2) the latter outperformed CCBS. Exploring when each algorithm
performs best is a topic for future research.
Algorithm 5: Generate MDDR for agent i and compute next
makespan bound µ.
 1 GenerateMddR(Π, const(Ψ), µ, i)
 2   X^i ← {(S(i), 0)}; E^i ← ∅; Open ← ∅
 3   insert (S(i), 0) into Open
 4   µ_next ← ∞
 5   while Open ≠ ∅ do
 6     (u, t) ← pop (u, t) from Open where t = min_t(Open)
 7     if t ≤ µ then
 8       foreach a ∈ A such that from(a) = u do
 9         if t + a_D ≤ µ then
10           insert (to(a), t + a_D) into Open
11           X^i ← X^i ∪ {(to(a), t + a_D)}
12           E^i ← E^i ∪ {[(u, t); (to(a), t + a_D)]}
13           foreach ⟨i, a, [t′, t′_u)⟩ ∈ const(Ψ) such that t ∈ [t′, t′_u] do
14             if t′_u + a_D ≤ µ then
15               insert (u, t′_u) into Open
16               X^i ← X^i ∪ {(u, t′_u)}
17               E^i ← E^i ∪ {[(u, t); (u, t′_u)]}
18             else
19               µ_next ← min(µ_next, t′_u + a_D)
28         else
29           µ_next ← min(µ_next, t + a_D)
30     else
31       µ_next ← min(µ_next, t)
32   return (X^i, E^i), µ_next
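The branches of Algorithm 5 that are visible above can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: lines 20-27 of the listing are not available here and are therefore not covered, the tuple encodings of actions and constraints are assumptions, the constraint test uses the half-open interval [t′, t′_u) to avoid a zero-length wait edge, and each node is inserted into Open only once.

```python
import heapq
import math


def generate_mdd_r(start, actions, constraints, mu):
    """Expand a time-extended graph for one agent within makespan bound mu.

    actions:     list of (src, dst, duration) move actions (assumed encoding)
    constraints: list of ((src, dst, duration), t_lo, t_hi) for this agent,
                 forbidding the action from starting during [t_lo, t_hi)
    Returns (nodes, edges, mu_next), where mu_next is the smallest time
    beyond mu at which a pruned node would have been reached.
    """
    nodes = {(start, 0.0)}
    edges = set()
    open_heap = [(0.0, start)]  # ordered by time, as in line 6
    mu_next = math.inf

    def add_edge(parent, child):
        edges.add((parent, child))
        if child not in nodes:  # push each node at most once
            nodes.add(child)
            heapq.heappush(open_heap, (child[1], child[0]))

    while open_heap:
        t, u = heapq.heappop(open_heap)
        if t > mu:  # lines 30-31
            mu_next = min(mu_next, t)
            continue
        for action in actions:
            src, dst, duration = action
            if src != u:
                continue
            if t + duration > mu:  # lines 28-29
                mu_next = min(mu_next, t + duration)
                continue
            add_edge((u, t), (dst, t + duration))  # lines 10-12
            for constrained, t_lo, t_hi in constraints:
                if constrained != action or not (t_lo <= t < t_hi):
                    continue
                if t_hi + duration <= mu:  # lines 14-17: wait out the unsafe interval
                    add_edge((u, t), (u, t_hi))
                else:  # lines 18-19
                    mu_next = min(mu_next, t_hi + duration)
    return nodes, edges, mu_next


# Hypothetical instance: path A -> B -> C with unit-duration moves, and a
# constraint forbidding the move A -> B from starting during [0, 0.5).
actions = [("A", "B", 1.0), ("B", "C", 1.0)]
constraints = [(("A", "B", 1.0), 0.0, 0.5)]
nodes, edges, mu_next = generate_mdd_r("A", actions, constraints, mu=1.2)
print(sorted(nodes), mu_next)
```

With the bound µ = 1.2, the delayed move would finish at 1.5, so the sketch reports 1.5 as the next makespan bound to try.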
Figure D.12: Comparison of CCBS and SMT-CCBS, both optimizing the makespan objective.
References
[7] G. Sharon, R. Stern, A. Felner, N. R. Sturtevant, Conflict-based search for
optimal multi-agent pathfinding, Artificial Intelligence 219 (2015) 40–66.
[9] G. Wagner, H. Choset, Subdimensional expansion for multirobot path planning,
Artificial Intelligence 219 (2015) 1–24.
[15] M. Phillips, M. Likhachev, SIPP: Safe interval path planning for dynamic
environments, in: IEEE International Conference on Robotics and Automation
(ICRA), 2011, pp. 5628–5635.
[17] J. Li, P. Surynek, A. Felner, H. Ma, Multi-agent path finding for large
agents, in: AAAI, 2019.
[19] P. Surynek, Multi-agent path finding with continuous time and geometric
agents viewed through satisfiability modulo theories (SMT), in: International
Symposium on Combinatorial Search (SOCS), 2019, pp. 200–201.
[21] J. Yu, S. M. LaValle, Multi-agent path planning and network flow, in:
Workshop on the Algorithmic Foundations of Robotics (WAFR), 2012, pp.
157–173.
A. V. Gelder (Eds.), International Conference Theory and Applications
of Satisfiability Testing (SAT), Vol. 7962 of Lecture Notes in Computer
Science, Springer, 2013, pp. 309–317.
[29] H. A. Kautz, B. Selman, Unifying SAT-based and graph-based planning, in:
T. Dean (Ed.), Proceedings of the Sixteenth International Joint Conference
on Artificial Intelligence, IJCAI 99, Stockholm, Sweden, July 31 - August
6, 1999. 2 Volumes, 1450 pages, Morgan Kaufmann, 1999, pp. 318–325.
[33] R. Nieuwenhuis, SAT modulo theories: Getting the best of SAT and global
constraint filtering, in: Principles and Practice of Constraint Programming
- CP 2010 - 16th International Conference, CP 2010, Proceedings, 2010,
pp. 1–2.
[34] G. Gange, D. Harabor, P. J. Stuckey, Lazy CBS: Implicit conflict-based
search using lazy clause generation, in: International Conference on Automated
Planning and Scheduling (ICAPS), Vol. 29, 2019, pp. 155–162.
[37] L. Cohen, Efficient bounded-suboptimal multi-agent path finding and motion
planning via improvements to focal search, Ph.D. thesis (2020).
[43] W. Hönig, J. A. Preiss, T. S. Kumar, G. S. Sukhatme, N. Ayanian, Trajectory
planning for quadrotor swarms, IEEE Transactions on Robotics 34 (4)
(2018) 856–869.
[49] S. Cameron, A study of the clash detection problem in robotics, in: IEEE
International Conference on Robotics and Automation (ICRA), Vol. 2,
IEEE, 1985, pp. 488–493.
[50] M. Tang, R. Tong, Z. Wang, D. Manocha, Fast and exact continuous collision
detection with Bernstein sign classification, ACM Transactions on
Graphics (TOG) 33 (6) (2014) 186.
[52] E. Boyarski, A. Felner, R. Stern, G. Sharon, O. Betzalel, D. Tolpin, E. Shimony,
ICBS: The improved conflict-based search algorithm for multi-agent
pathfinding, in: International Joint Conference on Artificial Intelligence
(IJCAI), 2015, pp. 740–746.
[55] G. Audemard, L. Simon, On the glucose SAT solver, International Journal
on Artificial Intelligence Tools 27 (1) (2018) 1840001:1–1840001:25.
[60] P. Surynek, Multi-agent path finding modulo theory with continuous movements
and the sum of costs objective, in: U. Schmid, F. Klügl, D. Wolter
(Eds.), KI 2020: Advances in Artificial Intelligence, Vol. 12325 of Lecture
Notes in Computer Science, Springer, 2020, pp. 219–232.
[61] P. Surynek, Logic-based multi-agent path finding with continuous movements
and the sum of costs objective, in: S. O. Kuznetsov, A. I. Panov,
K. S. Yakovlev (Eds.), Artificial Intelligence - 18th Russian Conference,
RCAI 2020, Proceedings, Vol. 12412 of Lecture Notes in Computer Science,
Springer, 2020, pp. 85–99.
[67] E. B. Omri Kaduri, R. Stern, Analysis of classical multi agent path finding
algorithms, in: Accepted to the International Symposium on Combinatorial
Search (SOCS), 2021.
[69] W. Hönig, T. S. Kumar, L. Cohen, H. Ma, H. Xu, N. Ayanian, S. Koenig,
Multi-agent path finding with kinematic constraints, in: ICAPS, 2016, pp.
477–485.
[71] J. Snape, J. Van Den Berg, S. J. Guy, D. Manocha, The hybrid reciprocal
velocity obstacle, IEEE Transactions on Robotics 27 (4) (2011) 696–706.
[74] J. P. Van Den Berg, M. H. Overmars, Prioritized motion planning for multiple
robots, in: IEEE/RSJ International Conference on Intelligent Robots
and Systems (IROS), 2005, pp. 430–435.
[75] J. Snape, J. Van Den Berg, S. J. Guy, D. Manocha, Smooth and collision-free
navigation for multiple robots under differential-drive constraints, in:
IEEE/RSJ International Conference on Intelligent Robots and Systems,
2010, pp. 4584–4589.
[76] H. Ma, J. Li, T. K. S. Kumar, S. Koenig, Lifelong multi-agent path finding
for online pickup and delivery tasks, in: Conference on Autonomous Agents
and MultiAgent Systems (AAMAS), 2017, pp. 837–845.
[77] M. Liu, H. Ma, J. Li, S. Koenig, Task and path planning for multi-agent
pickup and delivery, in: Conference on Autonomous Agents and MultiAgent
Systems (AAMAS), 2019, pp. 1152–1160.
[78] J. Li, D. Harabor, P. J. Stuckey, H. Ma, S. Koenig, Disjoint splitting for
multi-agent path finding with conflict-based search, in: International Conference
on Automated Planning and Scheduling (ICAPS), Vol. 29, 2019,
pp. 279–283.
[82] J. Li, G. Gange, D. Harabor, P. J. Stuckey, H. Ma, S. Koenig, New techniques
for pairwise symmetry breaking in multi-agent path finding, in: International
Conference on Automated Planning and Scheduling (ICAPS),
2020.
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.