A Parallel Monte-Carlo Tree Search-Based Metaheuristic For Optimal Fleet Composition Considering Vehicle Routing Using Branch & Bound
2023 IEEE Intelligent Vehicles Symposium (IV) | 979-8-3503-4691-6/23/$31.00 ©2023 IEEE | DOI: 10.1109/IV55152.2023.10186562
Authorized licensed use limited to: Universitas Brawijaya. Downloaded on November 29,2023 at 13:02:45 UTC from IEEE Xplore. Restrictions apply.
1 Tren Baltussen, Mithun Goutham and Stephanie Stockar are with the Center for Automotive Research, The Ohio State University, Columbus, OH 43212, USA. {baltussen.1, goutham.1, stockar.1}@osu.edu
2 Meghna Menon, Sarah Garrow and Mario Santillo are with the Ford Motor Company, Dearborn, MI 48109 USA, {mmenon8, sgarrow1, msantil3}@ford.com

Abstract: Autonomous mobile robots enable increased flexibility of manufacturing systems. The design and operating strategy of such a fleet of robots requires careful consideration of both fixed and operational costs. In this paper, a Monte-Carlo Tree Search (MCTS)-based metaheuristic is developed that guides a Branch & Bound (B&B) algorithm to find the globally optimal solution to the Fleet Size and Mix Vehicle Routing Problem with Time Windows (FSMVRPTW). The metaheuristic and exact algorithms are implemented in a parallel hybrid optimization algorithm where the metaheuristic rapidly finds feasible solutions that provide candidate upper bounds for the B&B algorithm. The MCTS additionally provides a candidate fleet composition to initiate the B&B search. Experiments show that the proposed approach results in significant improvements in computation time and convergence to the optimal solution.
Keywords: Fleet composition, Vehicle Routing, Branch & Bound, Monte-Carlo Tree Search, Metaheuristic

I. INTRODUCTION
In the industrial sector, reconfigurable manufacturing systems are increasingly being adopted because of their ability to scale and diversify production by supporting the adaptability of process controls, functions, and operations [1]. A key enabler is the added production flexibility provided by the adoption of fleets of autonomous mobile robots (AMRs) that move material within a plant [2]. In particular, multi-load AMRs enhance efficiency by picking up and dropping off multiple items in a single mission [3]. The design of such a fleet is a strategic problem and involves considerable capital investment [4]. Therefore, all costs related to the acquisition and operation should be considered. Although [5] and [6] have recently shown the relevance of combining vehicle routing and component design of the vehicles in the fleet, the combined vehicle routing and fleet composition problem has generally received insufficient attention [4]. In this paper, the Vehicle Routing Problem with Time Windows (VRPTW) with capacity constraints on the cargo mass, volume and vehicle range is used to obtain operational costs. The combination of the VRPTW with the heterogeneous fleet composition problem is called the Fleet Size and Mix Vehicle Routing Problem with Time Windows (FSMVRPTW). This problem accommodates a heterogeneous fleet and considers both fixed and operational costs [4].
Fleet composition optimization problems are typically posed as a capacitated VRPTW where the fleet size can be varied [7]. Exact algorithms that guarantee optimality for this combinatorial optimization problem structure the problem as a tree exploration problem and solve it using Branch & Bound (B&B) methods [7]. However, due to the NP-hard nature of the problem, the application of exact algorithms is restricted to small problem instances [8]. Real-life VRPTW applications are considerably larger in scale [8] and finding the optimal solution to such a problem is computationally expensive. Therefore, most VRPTWs are solved using metaheuristic methods due to their ability to find near-optimal solutions in a limited time [7], [8]. However, such approximate methods do not provide guarantees on the optimality of the solution [7].
Hybrid optimization methods can improve the performance and efficiency of the optimizer by combining the strengths of metaheuristics and exact algorithms. Successful metaheuristics provide a balance between exploration and exploitation of the search space [9]. As such, Monte-Carlo Tree Search (MCTS) is a reinforcement learning algorithm that balances this exploration and exploitation and it is well suited to large-scale combinatorial optimization problems [7], [10], [11]. In fact, MCTS has already been used in literature as a metaheuristic that guides a CPLEX solver toward the optimal solution [12]. Moreover, it is frequently hybridized with other optimization algorithms [11]. MCTS has been found to obtain state-of-the-art results in resource allocation problems (RAP) [13] and in single vehicle instances of the VRPTW, called Travelling Salesperson Problems with Time Windows (TSPTW) [14]. It has also been used to solve VRPs with variable fleet sizes [13]–[15]. However, MCTS has not yet been used to solve FSMVRPTWs that permit different types of vehicles.
The first contribution of this paper is the development of an exact incremental B&B algorithm for the FSMVRPTW. This algorithm employs a divide and conquer approach where the VRPTW is partitioned into an RAP that first assigns tasks to each robot using a parallel B&B algorithm, and then finds the optimal sequence in which the assigned tasks are completed by solving a nested TSPTW, using another B&B algorithm. The second contribution is a hybrid MCTS-based metaheuristic (UCT-MH) that uses the Upper Confidence bounds applied to Trees algorithm [16] in the fleet composition levels to guide its search and solves the nested TSPTW using a B&B algorithm. The third contribution presented in this paper is the hybrid optimization framework where the UCT-MH guides the incremental B&B to find the optimal solution to the FSMVRPTW. When possible, this B&B is initialized with a fleet composition identified by the rapid search space exploration enabled by the UCT-MH. Additionally, the best solutions found by the UCT-MH update the upper bound used by the incremental B&B, which allows sub-optimal solutions
to be pruned earlier. The performance of the proposed method is verified on various real-life case studies. Results show a significant reduction in computation time when the incremental B&B algorithm is guided by the proposed UCT-MH, especially for large problem sizes.

II. PROBLEM FORMULATION & METHODOLOGY
Consider a manufacturing plant with a known layout that comprises various spatial constraints, and a set of material handling tasks $T$. Each task involves picking up certain cargo items at inventory locations and dropping them off at their designated drop-off locations within defined time windows. The objective of the optimization is to find the optimal fleet of multi-load capacitated AMRs that completes all the defined tasks $T$ while minimizing fixed and operational costs.
Let the set $H := \{1, 2, ..., h\}$ identify $h$ different AMR types available, each with specific traveling speeds, energy efficiency, cargo capacity, driving range, charge-time etc. Let $k_i \le k_i^{\max} : i \in H$ denote the number of each type of AMR that forms a fleet so that any fleet composition can be fully defined by a vector $k \in \mathbb{N}_0^h$. This fleet is associated with a fixed cost $J^f(k)$ composed of purchase costs, depreciation, etc., that can be captured by $J^f(k) = c^\top k$ for some $c \in \mathbb{R}^h$. For completing all the tasks in $T$, the operational cost $J^o(k)$ can be any combination of relevant metrics to be minimized such as energy, slack time, number of turns, asset depreciation, etc. [17]–[19]. The total cost to be minimized is:

$$\min_{k} J = c^\top k + J^o(k) \tag{1}$$

The fleet operational cost $J^o(k)$ is posed as an RAP that finds the optimal partition of tasks to be assigned to AMRs that minimizes total operational cost. If the total number of robots in the heterogeneous fleet $k$ is given by $m = \sum_{i=1}^{h} k_i$, every robot in this fleet can be identified by $r \in R_k := \{1, 2, ..., m\}$. Let the set $T_r \subseteq T$ denote the tasks assigned to robot $r$ by the partitioning of $T$, denoted by $\mathcal{T} := \{T_r : r \in R_k\}$, meaning $\bigcup_{r \in R_k} T_r = T$ and $\forall r, s \in R_k : r \ne s,\ T_r \cap T_s = \emptyset$. The optimal partition of task set $T$ minimizes $J^o(k)$ in Eq. (2).

$$J^o(k) = \min_{\mathcal{T}} \sum_{r \in R_k} J^r(r, T_r) \tag{2}$$

The minimum operational cost $J^r(r, T_r)$ for each robot in fleet $k$ is dependent on the AMR type, and is also affected by the sequence with which task locations are visited, as it is possible for the robot to pick up multiple items before dropping them off so long as each pickup is visited before the corresponding drop-off. The objective function and constraints that yield $J^r(r, T_r)$ are defined in Eq. (3-15).
Let robot $r$ of the fleet be assigned $n_r = |T_r|$ tasks. The sets of pickup and drop-off locations are defined as $V_P := \{1, 2, ..., n_r\}$ and $V_D := \{n_r + 1, n_r + 2, ..., 2n_r\}$ respectively, so that an item picked up at location $i$ must be dropped off at location $n_r + i$. The origin and final destination locations of the robot are identified by $\{0, 2n_r + 1\}$. Let $V := \{V_P \cup V_D \cup \{0, 2n_r + 1\}\}$ be the set of all locations in a graph representation $G := (V, A)$ where $A := \{(i, j) \in V \times V\}$ is the arc set. Between every pair of nodes $(i, j) \in A$, the operational costs $D_{ij} \in \mathbb{R}^+$, energy consumed $\delta e_{ij} \in \mathbb{R}^+$ and travel time $\delta t_{ij} \in \mathbb{R}^+$ are pre-computed before initializing the optimization by solving a path planning problem between every two locations $i, j \in V : (i, j) \in A$.

$$J^r(r, T_r) = \min_{x_{ij}\ \forall ij \in A} \sum_{i=0}^{2n_r} \sum_{j=1}^{2n_r+1} D_{ij} x_{ij} \tag{3}$$
$$\text{s.t.} \quad x_{ij} \in \{0, 1\} \quad \forall (i, j) \in A \tag{4}$$
$$\sum_{j=1}^{n_r} x_{0j} = 1 \tag{5}$$
$$\sum_{i=0}^{2n_r} x_{il} = \sum_{j=1}^{2n_r+1} x_{lj} = 1, \quad \forall l \in \{V \setminus \{0, 2n_r + 1\}\} \tag{6}$$
$$\sum_{i=n_r+1}^{2n_r} x_{i,2n_r+1} = 1 \tag{7}$$
$$z_j = \begin{cases} z_i - \delta e_{ij} & \text{if } x_{ij} = 1 \wedge z_i - \delta e_{ij} > 0 \\ 1 - \delta e_{0j} & \text{if } x_{ij} = 1 \wedge z_i - \delta e_{ij} \le 0 \end{cases} \quad \forall (i, j) \in A \tag{8}$$
$$z_0 = 1; \quad 0 \le z_i \le 1 \quad \forall i \in V \tag{9}$$
$$T_{ij} = \begin{cases} \delta t_{ij} & \text{if } x_{ij} = 1 \wedge z_i - \delta e_{ij} > 0 \\ \delta t_{0i} + (1 - z_i - \delta e_{i0})\, p^{-1} + \delta t_{0j} & \text{if } x_{ij} = 1 \wedge z_i - \delta e_{ij} \le 0 \end{cases} \quad \forall (i, j) \in A \tag{10}$$
$$x_{ij} = 1 \rightarrow t_i + s_i + T_{ij} \le t_j \quad \forall (i, j) \in A \tag{11}$$
$$t_i + s_i + T_{i,n+i} \le t_{n+i} \quad \forall i \in V \tag{12}$$
$$e_i \le t_i \le l_i \quad \forall i \in V \tag{13}$$
$$x_{ij} = 1 \rightarrow y_j = y_i + q_j \quad \forall (i, j) \in A \tag{14}$$
$$y_0 = 0; \quad 0 \le y_i \le Q_r \quad \forall i \in V \tag{15}$$

The binary flow variable $x_{ij} = 1$ signifies that the robot uses directed arc $(i, j) \in A$. Constraints related to the robot starting from the depot 0, visiting every location once and terminating the sequence at $2n_r + 1$ are enforced by Eq. (4-7).
The battery states of charge $z_j$ in Eq. (8-10) are updated as the robot goes about its mission. Whenever the battery is depleted, the robot heads to the depot where it is fully charged up with a constant recharging rate $p$. The variable $T_{ij}$ in Eq. (10-12) updates the travel time between locations $i$ and $j$ based on whether a recharge is required between the two locations. Time variables $t_i$ in Eq. (11-13) denote the arrival time of the robot at location $i \in V$. Each location is associated with a time $s_i$ for material handling and a time window $[e_i, l_i]$ which represents the earliest and latest time at which material handling can start. Cargo constraints are captured in Eq. (14, 15) where payload variables $y_i$ capture the cargo mass being carried by the robot as it leaves location $i \in V$. Each robot $r$ has a cargo capacity limitation of $Q_r$ and each location $i \in V$ is associated with a cargo load $q_i \in \mathbb{R}$ such that $q_i + q_{n+i} = 0$. Volumetric constraints are modeled similarly.
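To make the constraint set concrete, the following minimal Python sketch simulates one fixed visiting order and checks it against the state-of-charge, time-window and cargo rules of Eq. (8)-(15). It is an illustration only, not the authors' implementation: the names `Stop`, `evaluate_route`, `travel_time` and `charge_rate`, and the toy instance, are hypothetical. The recharge amount is assumed to be the charge missing when the robot reaches the depot, which is one reading of the second case of Eq. (10), and the depot is assumed to be reachable from every location.

```python
from dataclasses import dataclass


@dataclass
class Stop:
    service: float   # s_i, material-handling time at the location
    earliest: float  # e_i, time window opens
    latest: float    # l_i, time window closes
    load: float      # q_i, positive at pickups, negative at drop-offs


def evaluate_route(route, stops, energy, travel_time, cost, capacity, charge_rate):
    """Simulate one route; return (cost, finish_time) or None if infeasible."""
    soc = 1.0                         # z_0 = 1, Eq. (9)
    clock = stops[route[0]].earliest  # arrival time t_0 at the depot
    payload = 0.0                     # y_0 = 0, Eq. (15)
    total = 0.0
    for i, j in zip(route, route[1:]):
        if soc - energy[i, j] > 0:    # first case of Eq. (8) and Eq. (10)
            leg_time = travel_time[i, j]
            soc -= energy[i, j]
        else:                         # second case: detour to depot 0 and recharge
            # Assumption: restore the charge missing on arrival at the depot.
            missing = 1.0 - (soc - energy[i, 0])
            leg_time = travel_time[i, 0] + missing / charge_rate + travel_time[0, j]
            soc = 1.0 - energy[0, j]
        # Eq. (11): arrival after service and travel; waiting for e_j is allowed.
        clock = max(clock + stops[i].service + leg_time, stops[j].earliest)
        if clock > stops[j].latest:   # Eq. (13)
            return None
        payload += stops[j].load      # Eq. (14)
        if not 0.0 <= payload <= capacity:  # Eq. (15)
            return None
        total += cost[i, j]           # objective of Eq. (3)
    return total, clock


# Hypothetical toy instance: one task on a line (0 = depot, 1 = pickup,
# 2 = drop-off, 3 = final destination co-located with the depot).
position = {0: 0.0, 1: 2.0, 2: 5.0, 3: 0.0}
pairs = [(i, j) for i in position for j in position]
cost = {(i, j): abs(position[i] - position[j]) for i, j in pairs}
travel_time = dict(cost)
energy = {arc: 0.05 * d for arc, d in cost.items()}
stops = {0: Stop(0.0, 0.0, 100.0, 0.0), 1: Stop(1.0, 0.0, 10.0, 1.0),
         2: Stop(1.0, 0.0, 20.0, -1.0), 3: Stop(0.0, 0.0, 100.0, 0.0)}

print(evaluate_route([0, 1, 2, 3], stops, energy, travel_time, cost,
                     capacity=1.0, charge_rate=0.5))
```

A search procedure would call such a check on partial routes to discard infeasible branches early; here it is applied to a complete route for brevity.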
A. Exact Algorithm: Incremental Branch & Bound
The incremental B&B systematically partitions the search space into subsets that are arranged in a tree structure. The root of the tree is the original problem and the leaves of the tree are its individual candidate solutions. Between the root and the leaves are intermediate nodes that represent subproblems obtained by recursively partitioning the original problem by a process called branching. B&B algorithms are used to solve these subproblems. The order according to which these subproblems are examined is determined by a best-first selection criterion, i.e. exploitation, that first explores the problem with the cheapest cost.
For minimization problems, the upper bound is the incumbent solution which is the cheapest candidate solution to the original problem found at the leaf node. The upper bound is continuously updated as the tree is explored, and is used to prune sub-optimal branches without recursively evaluating their solutions up to the leaf node. Thus, as the algorithm searches from the root to the leaves, branching is conducted only if the cost at the node is lower than the incumbent solution, and branching can potentially find a better solution than the incumbent solution. Following this process, the B&B algorithm recursively decomposes the original problem until further branching is futile when the solution cannot be improved, or until the original problem has been solved when every feasible branch has been evaluated.
The NP-hard RAP described by Eq. (2) is solved by the B&B algorithm implemented in a parallel framework that uses $p$ processing cores, as shown in Fig. 1, where robots in the fleet are identified by subscript $r \in R := \{1, 2, ..., m\}$. Thus, by splitting the arborescence at some task assignment level and assigning the emanating sub-trees to the available processors, several subproblems are explored simultaneously. During each processor's exploration, updated incumbent solutions are instantaneously made available to every processor in an asynchronous information sharing method using a shared work pool. For each processor, this RAP B&B algorithm is implemented by a recursive function to minimize memory and computational requirements as the tree is explored. Further, since the computation time of B&B algorithms increases with the number of feasible branches at each node, the fleet is initiated with a smaller candidate fleet $f^1 \in \mathbb{N}_0^h$ than the maximal fleet $k^{\max} \in \mathbb{N}_0^h$. After evaluating the total cost of this candidate fleet, the number of robots is incrementally raised until further increments do not reduce the total cost or additional robots remain idle. For each fleet increment, only RAP subproblems that include at least one of the newly added robots are evaluated since other solutions are guaranteed to have been evaluated already.

[Figure 1: arborescence of the parallel RAP B&B, from the root through the task levels (Task 1, Task 2, ...), branching at each level over robots $r_1, r_2, ..., r_m$.]

For $h$ different AMR types available, the fleet is initiated with a candidate fleet $f^1 \le k^{\max}$, which is chosen based on problem parameters and prior experience so that feasible solutions exist. The RAP of fleet $f^1$ is then solved using the described parallel B&B algorithm, and its minimum total cost $J^1$ is found, which utilizes robots $k^1 \in \mathbb{N}_0^h : k^1 \le f^1$. In the increment step, only robot types $i$ that satisfy $k_i^1 = f_i^1$ are incremented by 1 for the next candidate fleet $f^2$. These increments are conducted so long as both $J^{i+1} \le J^i$ and $\exists\, i : k_i^1 = f_i^1$. The optimal fleet that completes all tasks while minimizing total cost is then $k^i$ when $J^{i+1} > J^i$ or when $\nexists\, i : k_i^{i+1} = f_i^{i+1}$, i.e., the additional robots were idle.
At each instance that the RAP subproblem is solved at a node in the arborescence shown in Fig. 1, the TSPTW problem defined by Eq. (3-15) is solved to find the cost at that node. This TSPTW problem is solved using recursive Algorithm 1 that employs another B&B to find the optimal sequence of task completion for each robot. In summary, the B&B incrementally increases the fleet size while minimizing total cost $J$ of Eq. (1), which includes operational cost $J^o(k)$ of Eq. (2) found using the RAP B&B algorithm and the cost $J^r(r, T_r)$ of Eq. (3-15) found using the TSPTW B&B algorithm.

B. Metaheuristic: Monte-Carlo Tree Search
Each iteration of MCTS involves four steps [20]: Selection: at every node $v$ in the arborescence, the tree policy selects the next node $v'$. This node selection is initiated at the root node $v_0$ and is used for navigation until the leaf node $v_l$ is reached. Expansion: at the leaf node $v_l$, a random action is taken to expand the tree. Simulation: a Monte-Carlo simulation is performed starting from the expansion node to complete the solution. Backpropagation: the cost/reward of the expansion and simulation is propagated back to the root node $v_0$.
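The four steps above can be sketched as a short generic loop. This is a minimal illustration under stated assumptions, not the UCT-MH itself: the `Node` class, the `mcts` function and the two-level toy problem are hypothetical, the tree policy is assumed to be the standard UCB1 rule, and rewards are assumed to lie in [0, 1].

```python
import math
import random


class Node:
    def __init__(self, state, parent=None):
        self.state = state        # tuple of decisions taken so far
        self.parent = parent
        self.children = {}        # action -> Node
        self.visits = 0           # N(v)
        self.total_reward = 0.0   # Q(v)


def ucb1(child):
    """Mean reward plus exploration bonus (standard UCB1, assumed here)."""
    mean = child.total_reward / child.visits
    return mean + math.sqrt(2.0 * math.log(child.parent.visits) / child.visits)


def mcts(root_state, actions, is_terminal, reward, iterations, rng):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1) Selection: follow the tree policy while the node is fully expanded.
        while (not is_terminal(node.state)
               and len(node.children) == len(actions(node.state))):
            node = max(node.children.values(), key=ucb1)
        # 2) Expansion: add one untried action, chosen at random.
        if not is_terminal(node.state):
            untried = [a for a in actions(node.state) if a not in node.children]
            action = rng.choice(untried)
            child = Node(node.state + (action,), parent=node)
            node.children[action] = child
            node = child
        # 3) Simulation: random playout until the solution is complete.
        state = node.state
        while not is_terminal(state):
            state = state + (rng.choice(actions(state)),)
        value = reward(state)
        # 4) Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += value
            node = node.parent
    return root


# Hypothetical two-level toy problem: pick a in {1, 2, 3}, then b in {1, 2, 3}.
def actions(state):
    return [1, 2, 3]


def is_terminal(state):
    return len(state) == 2


def reward(state):
    a, b = state
    return 1.0 - (4 * abs(a - 2) + abs(b - 3)) / 6.0  # cost scaled into [0, 1]


root = mcts((), actions, is_terminal, reward, iterations=2000, rng=random.Random(0))
best_first_action = max(root.children, key=lambda a: root.children[a].visits)
print(best_first_action)
```

In the toy problem every completion that starts with a = 2 earns a higher reward than any completion that starts otherwise, so the visit counts at the root concentrate on that branch as iterations accumulate.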
Algorithm 1 Recursive TSPTW B&B
Cost = B&B(State, taskList, currentLocation)
    Find feasible next locations based on completed pickups, time, cargo, battery constraints
    Sort feasible next locations by operational cost of branching to that location (Best First Search)
    for i in feasibleLocations do
        branchCost = tourCost + operationalCost(i)
        if branchCost ≥ State.bestCost then
            continue {skip to next location}
        else
            State+ = Update stateOfTime, stateOfCharge, finalPosition, remainingLocations
            if remainingLocations > 0 then
                Cost = B&B(State+, taskList, location(i))
            else
                Cost = branchCost
                State.bestCost = Cost
            end if
        end if
    end for

[...] size and composition. This estimate serves as a measure for the quality of that branch and can be used by the MCTS to navigate the search. MCTS is most effective as a heuristic at the early stages of the decision problem [12]. Moreover, for smaller problem instances, B&B algorithms are often more suitable than MCTS [14]. As such, the proposed hybrid MCTS algorithm is aimed to utilize the strengths of the different algorithms and combine them into an effective hybrid MCTS-based metaheuristic.
Although MCTS was originally designed to solve Markov Decision Processes, without loss of generality, MCTS can be used to solve a design problem by formulating it as a deterministic Markov Decision Process [11]. The optimization problem is modeled as a 3-tuple $\langle S, A, g \rangle$, where $S$ is a set of states, $A$ is a set of actions and $g(s, a) : S \times A \rightarrow [0, g_{\max}]$ is a scalar cost function for taking action $a$ at state $s$. The state $s(v)$ contains the parameters that follow from the decisions up to node $v$. At the root node $v_0$, the fleet size $m$ is determined by action $a_0$, where $g_1(s_0(v_0), a_0) := 0$, for the fleet cost is determined by its composition. Subsequently, the fleet composition $k$ is determined by $a_1 \in A_1(m)$, with fixed cost $g_2(s_1(m), a_1) = J^f(k)$. Fig. 2 provides a schematic overview of the problem and the proposed metaheuristic.

Figure 2. Overview of the multi-stage design problem, with the FSMVRPTW (red) and the nested VRPTW (blue), and the proposed UCT-MH Algorithm.

At the fleet sizing and composition stages, the UCT-MH utilizes the UCB1 tree policy [16] for the selection step at node $v$ of the search tree:

$$\mathrm{UCB1}(v) = \arg\max_{v' \in \text{children of } v} \frac{Q(v')}{N(v')} + \sqrt{\frac{2 \ln N(v)}{N(v')}} \tag{16}$$

Here, $Q(v')$ is the total reward of all plays through child node $v'$, $N(v')$ denotes the number of visits of child node $v'$, and $N(v)$ is the number of visits of the parent node $v$. The policy function is dependent on the quality of the node being considered as well as the number of evaluations of that node, balancing the exploration and exploitation of the search space [20]. In order to apply the UCB1 policy and have a proper balance between exploration and exploitation, the problem is transformed such that the stage reward $R_i(v) \in [0, 1]$ [16]:

$$R_i(v') = 1 - \frac{g_i(v')}{g_{\max}} \tag{17}$$

where $R_i(v')$ is the reward of the transition from state $s_{i-1}(v)$ to state $s_i(v')$ and $v' \in$ children of $v$. It follows that $Q(v')$ is the sum of all rewards of all $N(v')$ plays through node $v'$ back to the root node $v_0$:

$$Q(v') = \sum_{i=1}^{N(v')} R_i(v') + R_i(v) + ... + R_i(v_0) \tag{18}$$

Considering that the number of permutations of the RAP is exponential with the number of tasks, it is deemed sufficient to determine the task assignment by a random rollout $(\xi_1, ..., \xi_n)$. In order to prevent any bias toward another fleet size, it is ensured that the full fleet size is utilized, i.e. each AMR in the fleet will have at least one assignment. The assigned tasks do not have any associated costs/rewards.
Since many of the TSPTW instances encountered are small problem instances, it is advantageous to use the same recursive B&B algorithm for TSPTW as described in Section II-A to find the optimal sequence in which the assigned tasks are completed by each robot. Each TSPTW B&B is terminated after a one-second time cap since the metaheuristic is not aimed at local convergence. Considering the best-first order of exploration, this still finds reasonably good estimates for the operational cost $\tilde{J}^o(k)$. The cost that is obtained through the rollout of the RAP and the TSPTW is backpropagated through the tree and assigned to $Q(v)$ at node $v$ that is associated with a particular fleet size or composition. This is in turn used by the UCB1 policy function to determine the decisions in the next iteration. As a result, at the root node, the term $\frac{Q(v')}{N(v')}$ in (16) is proportional to the total mean cost-to-go for a given fleet size or composition at node $v'$. As the total number of plays at the root node $N(v_0)$ grows to infinity, the UCB1 function converges to the expected value of the total cost for a given fleet size.
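The random task-assignment rollout and the reward transformation of Eq. (17) can be sketched as follows. This is a minimal sketch with hypothetical names (`random_assignment`, `stage_reward`). The paper only states that every AMR receives at least one task; seeding each robot with one task from a shuffled list is one simple way to achieve that and is an assumption here, not necessarily the authors' sampling scheme.

```python
import random


def random_assignment(num_tasks, num_robots, rng):
    """Randomly assign tasks 0..n-1 to robots 0..m-1 with no idle robot."""
    if num_robots > num_tasks:
        raise ValueError("every robot needs at least one task")
    tasks = list(range(num_tasks))
    rng.shuffle(tasks)
    assignment = [[] for _ in range(num_robots)]
    # Seed each robot with one task so the full fleet size is utilized.
    for robot in range(num_robots):
        assignment[robot].append(tasks[robot])
    # Distribute the remaining tasks uniformly at random.
    for task in tasks[num_robots:]:
        assignment[rng.randrange(num_robots)].append(task)
    return assignment


def stage_reward(stage_cost, g_max):
    """Eq. (17): map a stage cost in [0, g_max] to a reward in [0, 1]."""
    if not 0.0 <= stage_cost <= g_max:
        raise ValueError("stage cost must lie in [0, g_max]")
    return 1.0 - stage_cost / g_max


rollout = random_assignment(num_tasks=7, num_robots=3, rng=random.Random(1))
print(rollout)
print(stage_reward(2.5, 10.0))
```

Each inner list would then be passed to the time-capped TSPTW solver, and the resulting cost, scaled by `stage_reward`, would be backpropagated to the fleet size and composition nodes.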
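For readers who want to experiment with the recursive TSPTW B&B of Algorithm 1, a simplified runnable sketch is given below. It is an assumption-laden illustration rather than the authors' code: the function name `tsptw_bnb` and the toy instance are hypothetical, battery constraints and the one-second time cap are omitted for brevity, and arc costs are assumed to be non-negative so that pruning on the partial tour cost is valid.

```python
import math


def tsptw_bnb(n, cost, travel_time, window, service, load, capacity):
    """Best-first recursive B&B for one robot with n pickup/drop-off pairs.

    Locations: 0 = depot, 1..n = pickups, n+1..2n = drop-offs, 2n+1 = end.
    Returns (best_cost, best_route); best_cost is math.inf if infeasible.
    """
    end = 2 * n + 1
    best = {"cost": math.inf, "route": None}

    def feasible_next(current, visited, clock, payload):
        options = []
        for j in range(1, 2 * n + 1):
            if j in visited:
                continue
            if j > n and (j - n) not in visited:
                continue  # a drop-off requires its pickup first
            arrival = max(clock + service[current] + travel_time[current][j],
                          window[j][0])
            if arrival > window[j][1]:
                continue  # time window missed
            new_payload = payload + load[j]
            if new_payload > capacity:
                continue  # cargo capacity exceeded
            options.append((cost[current][j], j, arrival, new_payload))
        options.sort()  # best-first: cheapest branch is explored first
        return options

    def branch(current, visited, clock, payload, tour_cost, route):
        if len(visited) == 2 * n:
            # All tasks served: close the tour at the final destination.
            arrival = max(clock + service[current] + travel_time[current][end],
                          window[end][0])
            total = tour_cost + cost[current][end]
            if arrival <= window[end][1] and total < best["cost"]:
                best["cost"], best["route"] = total, route + [end]
            return
        for step_cost, j, arrival, new_payload in feasible_next(
                current, visited, clock, payload):
            branch_cost = tour_cost + step_cost
            if branch_cost >= best["cost"]:
                continue  # prune: cannot improve on the incumbent
            branch(j, visited | {j}, arrival, new_payload, branch_cost, route + [j])

    branch(0, frozenset(), window[0][0], 0.0, 0.0, [0])
    return best["cost"], best["route"]


# Hypothetical toy instance with two tasks located on a line.
position = [0, 1, 4, 2, 5, 0]
dist = [[abs(a - b) for b in position] for a in position]
wide = [(0, 100)] * 6
best_cost, best_route = tsptw_bnb(n=2, cost=dist, travel_time=dist, window=wide,
                                  service=[0] * 6, load=[0, 1, 1, -1, -1, 0],
                                  capacity=2)
print(best_cost, best_route)
```

Because the incumbent is stored outside the recursion, every improvement immediately tightens the bound used by all later branches, which mirrors the role of `State.bestCost` in Algorithm 1.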
C. Hybrid Optimization: Guiding B&B with the UCT-MH
[...]
Two smaller problems are studied in detail to illustrate the behavior of the UCT-MH in Fig. 3-4. The best-found cost by each algorithm is summarized for all case studies in Table I.

B. Case Studies
1) $n = 10$ and $m_{\max} = 6$: Figure 3a shows the UCT-MH exploration of the various fleet sizes, where the mean of the cost-to-go starts to converge and the algorithm gains more confidence in particular solutions as the number of evaluations increases. The guiding UCT-MH finds that $m = 6$ is the best candidate and dedicates more visits to these branches as shown in Fig. 3b. As a result, the guided B&B quickly focuses on local convergence (Fig. 3c). As the entire search space is [...]
[...] Here, an increase in fleet size results in an incremental increase of the mean cost-to-go which is associated with the fleet cost. Remarkably, Fig. 4b shows that the standalone B&B is initially [...]

[Figure 4: (a) The quality of candidate fleet sizes as determined by the UCT-MH, first 60 minutes of simulation (x-axis: Time [min.]; one curve per candidate fleet size $m$). (b) Performance of UCT-MH and B&B algorithm and their parallelization (y-axis: Best cost found [-]).]
Figure 4. Case Study 2: The UCT-MH and B&B algorithm, number of tasks $n = 20$, maximum number of AMRs: $m_{\max} = 21$, $k_{\max}^\top = [7, 7, 7]$.
Table I. Experimental Results - B&B and UCT-MH

[...] more difficult to solve. Consequently, the guided B&B discards suboptimal fleets and focuses on local convergence thereby reducing the overall computation time of the guided B&B.

C. Discussion
The time taken to initialize the parallel B&B algorithm is sufficient for the guiding UCT-MH to find a strong candidate fleet that warm starts the guided B&B. The UCT-MH provides a reduction of computation time ranging from 38.3% up to 86.5%. The local convergence of the UCT-MH is dependent on the problem size due to the time cap imposed at the TSPTW level. As seen in Table I, for a higher number of tasks where the TSPTW is larger, the gap with the best-known solution is greater (∼40%). However, the guided B&B is able to close this gap since it conducts local searches systematically. Further, for the case with 100 tasks, the standalone B&B was unable to find any feasible solution in 24 hours while the UCT-MH provided multiple solutions through its efficient stochastic exploration of the design space.

IV. CONCLUSIONS
In this paper, a hybrid optimization algorithm was developed that uses a Monte-Carlo Tree Search-based metaheuristic (UCT-MH) to guide an exact incremental Branch & Bound algorithm, which solves a real-life Fleet Size and Mix Vehicle Routing Problem with Time Windows. The UCT-MH yields a significant improvement in the computation time and convergence of the B&B by constantly sharing the expected optimal fleet composition as well as the upper bound on the cost. Although in this study MCTS was only employed at the fleet sizing and composition level, future research needs to determine to what depth MCTS can be effective. Moreover, modifications to the selection policy as well as bi-directional communication between the UCT-MH and the B&B algorithm could further improve computation times.

ACKNOWLEDGMENTS
This research was supported by the Ford Motor Company as part of the Ford-OSU Alliance Program.

REFERENCES
[1] J. Morgan, M. Halton, Y. Qiao, and J. G. Breslin, “Industry 4.0 smart reconfigurable manufacturing machines,” pp. 481–506, Apr. 2021.
[2] Z. Ghelichi and S. Kilaru, “Analytical models for collaborative autonomous mobile robot solutions in fulfillment centers,” Applied Mathematical Modelling, vol. 91, pp. 438–457, Mar. 2021.
[3] R. Yan, L. Jackson, and S. Dunnett, “A study for further exploring the advantages of using multi-load automated guided vehicles,” Journal of Manufacturing Systems, vol. 57, pp. 19–30, Oct. 2020.
[4] A. Hoff, H. Andersson, M. Christiansen, G. Hasle, and A. Løkketangen, “Industrial aspects and literature survey: Fleet composition and routing,” Computers and Operations Research, vol. 37, no. 12, pp. 2041–2061, Dec. 2010.
[5] F. Paparella, T. Hofman, and M. Salazar, “Joint optimization of number of vehicles, battery capacity and operations of an electric autonomous mobility-on-demand fleet,” in IEEE 61st Conference on Decision and Control (CDC), 2022, pp. 6284–6291.
[6] A. Wallar, W. Schwarting, J. Alonso-Mora, and D. Rus, “Optimizing multi-class fleet compositions for shared mobility-as-a-service,” in IEEE Intelligent Transportation Systems Conference, 2019, pp. 2998–3005.
[7] G. Desaulniers, O. B. G. Madsen, and S. Ropke, “The Vehicle Routing Problem with Time Windows,” in Vehicle Routing: Problems, Methods, and Applications, 2nd ed., 2014, pp. 119–159.
[8] R. Elshaer and H. Awad, “A taxonomic review of metaheuristic algorithms for solving the vehicle routing problem and its variants,” Computers and Industrial Engineering, vol. 140, Feb. 2020.
[9] I. Boussaïd, J. Lepagnot, and P. Siarry, “A survey on optimization metaheuristics,” Information Sciences, vol. 237, pp. 82–117, Jul. 2013.
[10] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016.
[11] M. Świechowski, K. Godlewski, B. Sawicki, and J. Mańdziuk, “Monte Carlo Tree Search: a review of recent modifications and applications,” Artificial Intelligence Review, 2022.
[12] A. Sabharwal, H. Samulowitz, and C. Reddy, “Guiding Combinatorial Optimization with UCT,” in International Conference on Integration of Artificial Intelligence (AI) and Operations Research (OR) Techniques in Constraint Programming. Springer, Jun. 2012, pp. 356–361.
[13] B. Kartal, E. Nunes, J. Godoy, and M. Gini, “Monte Carlo Tree Search for Multi-Robot Task Allocation,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2016, pp. 4222–4223.
[14] S. Edelkamp, M. Gath, C. Greulich, M. Humann, O. Herzog, and M. Lawo, “Monte-Carlo Tree Search for Logistics,” in Lecture Notes in Logistics. Springer Cham, 2015, pp. 427–440.
[15] C. Barletta, W. Garn, C. Turner, and S. Fallah, “Hybrid fleet capacitated vehicle routing problem with flexible Monte–Carlo Tree search,” International Journal of Systems Science: Operations and Logistics, 2022.
[16] L. Kocsis and C. Szepesvári, “Bandit based Monte-Carlo Planning,” in European Conference on Machine Learning, Sep. 2006, pp. 282–293.
[17] J. Lu, Y. Chen, J.-K. Hao, and R. He, “The time-dependent electric vehicle routing problem: Model and solution,” Expert Systems with Applications, vol. 161, p. 113593, 2020.
[18] I. Kucukoglu, R. Dewil, and D. Cattrysse, “The electric vehicle routing problem and its variations: A literature review,” Computers & Industrial Engineering, vol. 161, p. 107650, 2021.
[19] M. Goutham, S. Boyle, M. Menon, S. Mohan, S. Garrow, and S. Stockar, “Optimal path planning through a sequence of waypoints,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 8566–8573, 2022.
[20] C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton, “A survey of Monte Carlo tree search methods,” pp. 1–43, Mar. 2012.
[21] O. S. Center, “Ohio supercomputer center,” 1987. [Online]. Available: http://osc.edu/ark:/19495/f5s1ph73