Harvesting Efficient On-Demand Order Pooling From Skilled Couriers - Enhancing Graph Representation Learning For Refining Real-Time Many-to-One Assignments
ABSTRACT
The recent past has witnessed a notable surge in on-demand food delivery (OFD) services, which fulfill deliveries within dozens of minutes after an order is placed. In OFD, pooling multiple orders for simultaneous delivery during real-time order assignment is a pivotal source of efficiency, although it may in turn extend delivery time. Constructing high-quality order pools that harmonize platform efficiency with the experiences of consumers and couriers is crucial to OFD platforms. However, the complexity and real-time nature of order assignment make extensive calculations impractical and significantly limit the potential for order consolidation. Moreover, the offline environment is frequently riddled with unknown factors, posing challenges for the platform's perceptibility and pooling decisions.
Nevertheless, the delivery behaviors of skilled couriers (SCs), who know the environment well, can improve system awareness and effectively inform decisions. Hence an SC delivery network (SCDN) is constructed, based on an enhanced attributed heterogeneous network embedding approach tailored for OFD. It aims to extract features from rich temporal and spatial information, and to uncover the latent potential for order combinations embedded within SC trajectories. Accordingly, the vast search space of order assignment can be effectively pruned through scalable similarity calculations of low-dimensional vectors, making comprehensive and high-quality pooling outcomes more easily identified in real time. In addition, the acquired embedding outcomes highlight promising subspaces embedded within this space, i.e., scale-effect hotspot areas, which can offer significant potential for elevating courier efficiency.
SCDN has now been deployed in the Meituan dispatch system. Online tests reveal that with SCDN, the pooling quality and extent have been greatly improved. Our system can boost couriers' efficiency by 45-55% during noon peak hours, while upholding the timely delivery commitment.

KEYWORDS
on-demand food delivery, order pooling, many-to-one assignment problem, graph representation learning

∗ Corresponding author.
† This work was fulfilled when Chen Zhang interned at Meituan.
Conference acronym ’XX, June 03–05, 2018, Woodstock, NY

1 INTRODUCTION
1.1 Backgrounds
In recent years, there has been a remarkable upsurge in the widespread adoption of on-demand food delivery (OFD) services worldwide. With a mere few clicks, consumers can enjoy delicious meals without stepping out, all delivered right to their doorstep within just a few dozen minutes. This trend is attributable to the overarching shifts in technological innovation, including the popularity of apps and online platforms, and the growing dependence on third-party services for OFD. Global revenues for the OFD sector were about $90 billion in 2018, rose to $294 billion in 2021, and are expected to exceed $466 billion by 2026 [16]. Meituan Waimai, China's pioneering OFD platform, has witnessed remarkable growth over the last decade. In 2023, the platform handled over 70 million orders daily, reaching almost 3,000 cities, counties and regions throughout China. 6.24 million couriers earned income via Meituan, with over 1 million actively engaged daily.
In OFD, orders are placed continuously by consumers from various locations. In response, the platform promptly gathers these newly initiated orders, channels them to merchants, and assigns dedicated couriers for pick-up and delivery within the promised delivery time. The platforms act as intermediaries, linking a multitude of consumers, merchants and couriers within the ecosystem, and strike a balance between gains and losses among these stakeholders to achieve sustained growth and prosperity [14]. Among these, consumers desire prompt services, merchants seek to maintain food freshness, couriers aim to fulfill enough orders to earn a
Conference acronym ’XX, June 03–05, 2018, Woodstock, NY Yile Liang et al.
$(C_{3000}^{1} + C_{3000}^{2} + C_{3000}^{3} + C_{3000}^{4} + C_{3000}^{5}) \times 100$. On the other hand, the MOA problem itself is categorized as an NP-hard integer programming problem, known for its extremely vast search space. Crafting online algorithms that perform effectively for the MOA is an exceptionally challenging task [3, 15, 35]. Moreover, the fast movement of couriers requires that assignment decisions be made within a mere 10 seconds. This tight time frame ensures the consistency of courier status between the information acquisition phase and the actual assignment moment.
Consequently, the platform tends to favor one(order)-to-one(courier) assignments during each dispatch cycle, a strategy that reduces computational volume and complexity, albeit at the expense of comprehensive order pooling.

Figure 3: Calculation volume and search space for modeling and solving MOA problems in each dispatch cycle.

(2) Limited system awareness on the "last mile" offline environment. In OFD, the "last mile" offline environment is highly intricate and dynamic [34], encompassing unforeseen road closures, unknown natural obstacles, and pandemic-related lockdowns. OFD platforms are unable to fully access these extensive, finely-detailed spatiotemporal data during large-scale decision-making, due to insufficient map precision and digital capabilities, along with computational and storage constraints. Consequently, order pooling decisions based on coarse data and limited awareness may not be reasonable, potentially harming courier experiences, causing delivery delays, and reducing delivery efficiency.

1.3 Related Work
Prior research on order pooling algorithms primarily focused on batching issues in traditional warehouse management [1, 19, 30]. However, warehouse batching algorithms work under more relaxed time constraints, typically minutes or even hours, and are not well suited to the urgency required in OFD.
In recent years, research pertaining to OFD has gradually gained traction. The prevalent method for order pooling batches orders based on geographical proximity and closeness of their promised delivery time [22]. However, while these criteria-based batching rules are straightforward, they limit the scope for consolidation. An exact algorithm for order batching and assignment is proposed in [31], under the unrealistic assumption of perfect information about the arrival of orders. The study in [9] produces monthly OFD task groupings offline to facilitate order consolidation. However, their effectiveness is heavily reliant on order structure stability. The work in [10, 11] achieves order consolidation using iterative clustering on an order graph, but the batching algorithm's complexity and computational load hinder real-time processing. Similar work in [24] leverages additional decomposition mechanisms to reduce computational cost, yet it falls short of enabling real-time application despite notable performance gains. To satisfy the need for solutions within seconds, XGBoost models are built through supervised learning on historical order assignment results in [29, 32], to promote combined order assignments. However, the consolidation results struggle to break through the constraints of historical decisions, resulting in limited effectiveness.

1.4 Motivations
In light of the limitations present in existing work, it's worth noting that OFD platforms are equipped with a vast fleet of couriers and extensive data on courier behaviors, especially from the skilled ones, which offer insights for high-efficiency and quality delivery services and enhance system intelligence. Skilled couriers (SCs) often possess a comprehensive grasp of the offline environment, including order distribution and road logistics, and continually improve their delivery skills to adapt to complex conditions. Moreover, our couriers can reject or transfer system-assigned orders, leveraging their expertise to optimize routes, minimizing detours and overtime. Additionally, the platform gathers courier preferences for pick-up and delivery locations via their apps, promoting efficient operations with fewer bottlenecks. Thus SCs' behaviours of order selection, route sequence and feedback can provide the system superior courier-oriented pooling outcomes and help improve decision quality.
In the past decade, the work on word representation learning has achieved cutting-edge results [7, 17, 20, 25]. Neural language models replace traditional high-dimensional and sparse word vectors with low-dimensional and dense embeddings, which assume that frequently co-occurring words share stronger statistical dependencies. Recently, graph representation learning (GRL) methods [4, 13] have increasingly been applied in various fields, including e-commerce [6, 8, 28], job search [12, 21], and ride-sharing [26, 27], to discover diverse types of recommendations on the Web. These approaches have had a major impact in both academia and industry.
Drawing on prior achievements and the principle that orders frequently combined together in SCs' routes tend to yield top-tier pooling results, this paper aims to use GRL methods to uncover the latent potential for order pooling embedded within the SCs' behaviour data. Therefore, through scalable low-dimension vector calculations, instead of massive and time-consuming PDRP computations, we effectively prune the MOA problem's search space, shown in Figure 3, and meanwhile extract small-scale and isolated subspaces promising for high-quality order consolidation results, facilitating real-time, effective order pooling.

1.5 Contributions
Accordingly, a systemic solution framework, named SC delivery network (SCDN), is proposed. The novel contributions are:
(1) Graph Modelling: We construct a delivery network from SC route sequences, with flow units (FUs) as nodes linked by SC behavior sequences. An FU is a directed vector from a pick-up area of interest (AOI²) [36] to a delivery AOI. Orders of an FU share the same pick-up and delivery AOIs. The network is formulated as an attributed multiplex heterogeneous network (AMHEN), with FU nodes featuring multiple attributes for temporal and spatial information, and links representing two different types of courier behaviors, namely pick-up and delivery.
(2) Learning Algorithm: Based on GATNE [2], an effective GRL method for AMHEN, an enhanced attributed heterogeneous network embedding (EATNE) approach tailored for OFD is derived to obtain FU embeddings. First, given the fact that couriers move within a confined region³ in a city, a region-congregated negative sampling mechanism is proposed as an enhancement over traditional randomized negative sampling to improve algorithm performance. Second, we employ a customized margin ranking loss instead of the cross-entropy used by GATNE, aiming to refine embedding quality. Last, to address dispersed order distribution and limited FU coverage in SC behaviors, we build a cold start mitigation mechanism, using geographic information to generate embeddings of previously unseen FUs, thus broadening coverage.
(3) MOA Search Space Refinement and OFD Application: Utilizing FU embeddings, we reconstruct the order combination and courier recall mechanisms within Meituan's dispatch system, facilitating superior real-time order pooling. Our use of SCDN refines order structure profiles and pinpoints scale-effect hotspots within MOA's vast search space, uncovering independent and small-scale subspaces for thorough and high-quality order pooling. Accordingly, an innovative delivery mode is developed to enhance courier efficiency without compromising service reliability.
To our knowledge, this is the first application of GRL methods in achieving real-time order pooling in OFD, now deployed in Meituan Waimai's dispatch system. Online tests show significant improvement in order pooling. The total MD score of the MOA problem is improved by 5.3%, indicating more efficient order assignments with reduced detours and overtime risks. The newly-built mode cut the average incremental pick-up time for couriers⁴ during noon peak by 51% and delivery time by 21%. These enhancements have led to a 45-55% boost in efficiency, maintaining consistent work hours and on-time delivery standards.

² AOIs are defined as non-overlapping irregular polygons that comprehensively divide and cover the space.

2 GRAPH REPRESENTATION LEARNING APPROACH
In this section, we will detail the step-by-step process by which the FU embeddings are acquired.

participate in both pick-up and delivery actions during order fulfilment, there are two kinds of FU sequences: one based on pick-up behavior and the other on delivery, as shown in Figure 4. Diverse couriers' FU sequences may incorporate some common FUs.

Figure 4: Illustration of AMHEN construction, including 2 sessions. Session A contains 3 orders for FUs DE, FB and FC. The pick-up FU sequence is DE->FC->FB, and the delivery FU sequence is DE->FB->FC. Session B follows the same process.

To capture shared experiences of SCs, by treating FUs as nodes and their connections in the FU sequence as links, we can integrate all the FU sequences into a unified yet heterogeneous graph. Moreover, it is crucial to utilize the rich temporal and spatial information to enhance learning accuracy, e.g. the average historical order amount and delivery distance of each FU, which makes the above graph an AMHEN. More about node attributes is in Appendix C.
Denote the AMHEN by $G = (V, E, A)$, where $V$ is the FU node set and $A$ is the attribute set for all nodes. FU node $v_i \in V$ owns rich attributes $\mathbf{x}_i \in A$ to describe its crucial characteristics. $E = (E_p, E_d)$ is the set of edges, which contains two types: pick-up and delivery. Specifically, there may be two types of edges between the FU nodes $v_i$ and $v_j$, where $e_{ij}^{p} \in E_p$ indicates a pick-up edge and $e_{ij}^{d} \in E_d$ a delivery one. If two orders, belonging to FU nodes $v_i$ and $v_j$, are successively picked up by the same SC, there exists a pick-up edge $e_{ij}^{p}$ connecting $v_i$ and $v_j$. Similarly, a delivery edge $e_{ij}^{d}$ indicates there exist orders of FU nodes $v_i$ and $v_j$ that are consecutively delivered by the same SC. Hence, an AMHEN is constructed by merging massive records from tens of thousands of SCs.
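The edge construction described for the AMHEN can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `build_amhen_edges`, the session field names `"pickup"` and `"delivery"`, the directed (ordered-pair) edges, and the co-occurrence counts used as edge weights are all hypothetical choices. The example data is Session A from the Figure 4 caption.

```python
from collections import defaultdict


def build_amhen_edges(sessions):
    """Collect pick-up and delivery edges from SC sessions.

    `sessions` is a list of dicts holding two FU sequences under the
    (hypothetical) keys "pickup" and "delivery". Two FUs that are
    consecutive in a sequence yield one edge of the matching type.
    Counting co-occurrences as edge weights is an assumption.
    """
    edges = {"pickup": defaultdict(int), "delivery": defaultdict(int)}
    for session in sessions:
        for edge_type in ("pickup", "delivery"):
            seq = session[edge_type]
            for u, v in zip(seq, seq[1:]):
                if u != v:  # skip self-loops from repeated FUs
                    edges[edge_type][(u, v)] += 1
    return edges


# Session A from Figure 4: three orders on FUs DE, FB and FC.
session_a = {"pickup": ["DE", "FC", "FB"], "delivery": ["DE", "FB", "FC"]}
edges = build_amhen_edges([session_a])
print(dict(edges["pickup"]))    # {('DE', 'FC'): 1, ('FC', 'FB'): 1}
print(dict(edges["delivery"]))  # {('DE', 'FB'): 1, ('FB', 'FC'): 1}
```

Merging the sessions of many SCs into the same two edge dictionaries gives the multiplex structure: the same FU pair can carry a pick-up edge, a delivery edge, or both.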
where $\tau \in \{p, d\}$ indicates the edge type, $s$ is the dimension of edge embeddings, and $\mathcal{N}_{i,\tau}$ is the set of neighbors of node $v_i$ on edge type $\tau$. The initial edge embedding $\mathbf{u}_{i,\tau}^{(0)}$ is parameterized as a function of the attributes $\mathbf{x}_i$: $\mathbf{u}_{i,\tau}^{(0)} = \mathbf{g}_\tau(\mathbf{x}_i)$, where $\mathbf{g}_\tau$ is a transformation function. The aggregator function is the mean operation in practice.
We denote the $K$-th level edge embedding $\mathbf{u}_{i,\tau}^{(K)}$ by $\mathbf{u}_{i,\tau}$. Then the pick-up edge embedding $\mathbf{u}_{i,p}$ and the delivery edge embedding $\mathbf{u}_{i,d}$ of node $v_i$ are combined as $\mathbf{U}_i = (\mathbf{u}_{i,p}, \mathbf{u}_{i,d})$. Given that the pick-up edge and delivery edge have different impacts, a self-attention mechanism is used to calculate the weights $\mathbf{a}_{i,\tau} \in \{\mathbf{a}_{i,p}, \mathbf{a}_{i,d}\}$:

$\mathbf{a}_{i,\tau} = \mathrm{softmax}\left(\mathbf{w}_\tau^{\top} \tanh(\mathbf{W}_\tau \mathbf{U}_i)\right)^{\top}$,  (2)

where $\mathbf{w}_\tau \in \mathbb{R}^{d_a}$ and $\mathbf{W}_\tau \in \mathbb{R}^{d_a \times s}$ are trainable parameters for edge type $\tau$. Thus, the overall embeddings of node $v_i$ for the pick-up edge, $\mathbf{v}_{i,p}$, and the delivery edge, $\mathbf{v}_{i,d}$, can be computed as:

$\mathbf{v}_{i,p} = \mathbf{h}(\mathbf{x}_i) + \alpha_p \mathbf{a}_{i,p} \mathbf{M}_p^{\top} \mathbf{u}_{i,p} + \beta_p \mathbf{g}_p(\mathbf{x}_i)$,  (3)
$\mathbf{v}_{i,d} = \mathbf{h}(\mathbf{x}_i) + \alpha_d \mathbf{a}_{i,d} \mathbf{M}_d^{\top} \mathbf{u}_{i,d} + \beta_d \mathbf{g}_d(\mathbf{x}_i)$,  (4)

where $\alpha_p$ and $\alpha_d$ indicate the importance of pick-up and delivery edge embeddings, respectively, characterizing how pick-up and delivery behaviors affect courier efficiency. $\mathbf{M}_p, \mathbf{M}_d \in \mathbb{R}^{s \times d}$ are trainable parameters. $\beta_p$ and $\beta_d$ control the importance of node attributes. The FU embedding $\mathbf{v}_i$ is the average of $\mathbf{v}_{i,p}$ and $\mathbf{v}_{i,d}$. The detailed implementation of EATNE can be found in Appendix D.

are constructed by random sampling from pick-up and delivery FU pairs in the same delivery region but excluding positive pairs, respectively. In other words, we select k-hop (k>2) neighbors of the FU node that share the same confined region as the challenging negative samples to enable the effective training of the proposed model. Traditional GATNE uses randomized negative sampling, yet ignores the regional effects in OFD. We find that the performance of GATNE decreases as the negative sampling scope expands, and the effect becomes almost random as it reaches the city size.
Margin Ranking Loss. The learning task is to make the representations of positive FU pairs lie nearby in the embedding space, and those of negative pairs differ. However, achieving this with cross-entropy can be challenging. Therefore, a customized optimization objective based on margin ranking loss is proposed to maximize the distance between positive and negative samples in Equation (5), where $\gamma_p^{P}$, $\gamma_d^{P}$, $\gamma_p^{N}$ and $\gamma_d^{N}$ are hyperparameters representing the weights of the various data sets, $m_p$ and $m_d$ are the minimum distances between negative pairs for pick-up and delivery, and $\cos$ represents the cosine similarity between FU embeddings.

$L = \frac{\gamma_p^{P}}{|D_p^{P}|} \sum_{(v_i, v_j) \in D_p^{P}} \left(1 - \cos(\mathbf{v}_{i,p}, \mathbf{v}_{j,p})\right)$
$\quad + \frac{\gamma_d^{P}}{|D_d^{P}|} \sum_{(v_i, v_j) \in D_d^{P}} \left(1 - \cos(\mathbf{v}_{i,d}, \mathbf{v}_{j,d})\right)$
$\quad + \frac{\gamma_p^{N}}{|D_p^{N}|} \sum_{(v_i, v_j) \in D_p^{N}} \max\left(0, \cos(\mathbf{v}_{i,p}, \mathbf{v}_{j,p}) - m_p\right)$
$\quad + \frac{\gamma_d^{N}}{|D_d^{N}|} \sum_{(v_i, v_j) \in D_d^{N}} \max\left(0, \cos(\mathbf{v}_{i,d}, \mathbf{v}_{j,d}) - m_d\right)$,  (5)
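To make the four terms of Equation (5) concrete, the following is a minimal pure-Python sketch of the loss evaluated on fixed embedding vectors. It is an illustration rather than the training code: the function names `cosine`, `edge_type_loss` and `eatne_loss` are hypothetical, the toy vectors are made up, and the defaults (margins of 0.3 and all weights equal to 1) follow the values reported in the paper's parameter table.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def edge_type_loss(pos_pairs, neg_pairs, margin, gamma_pos=1.0, gamma_neg=1.0):
    """Positive and negative terms of Eq. (5) for one edge type."""
    # Positive pairs are pulled together: penalty 1 - cos.
    pos_term = sum(1.0 - cosine(u, v) for u, v in pos_pairs) / len(pos_pairs)
    # Negative pairs are penalised only when cos exceeds the margin.
    neg_term = sum(max(0.0, cosine(u, v) - margin) for u, v in neg_pairs) / len(neg_pairs)
    return gamma_pos * pos_term + gamma_neg * neg_term


def eatne_loss(pickup, delivery, m_p=0.3, m_d=0.3):
    """Sum of the pick-up and delivery parts; each argument is (pos_pairs, neg_pairs)."""
    pos_p, neg_p = pickup
    pos_d, neg_d = delivery
    return edge_type_loss(pos_p, neg_p, m_p) + edge_type_loss(pos_d, neg_d, m_d)


# Toy 2-d embeddings (made up for illustration).
a, b, c = [1.0, 0.0], [2.0, 0.0], [0.0, 1.0]
good = eatne_loss(([(a, b)], [(a, c)]), ([(a, b)], [(a, c)]))  # aligned positives, orthogonal negatives
bad = eatne_loss(([(a, c)], [(a, b)]), ([(a, c)], [(a, b)]))   # the reverse situation
print(good, bad)
```

In the first call every positive pair is perfectly aligned and every negative pair is orthogonal, so no term contributes; in the second call both the positive and the negative terms are active, which is the situation training is meant to move away from.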
partition the city network into separate regional groups for parallel training at regional group level.
The models are trained using 4 weeks of data across the country. They are trained for less than 2 weeks on 4 NVIDIA Tesla V100 GPUs with 32GB of memory each, and the models get updated every 2 weeks.

3.2 Information Mining
Leveraging the FU embeddings, we've created a set of indices.
(1) High-quality pooling probability (HPP) quantifies how well multiple orders can be consolidated together, sharing common pick-up and delivery times and travel distances. Since two FUs that consecutively appear in the SC behavior sequence often possess the above traits, this metric is calculated by the cosine similarity between the FU embeddings of these orders, reflecting the frequency of consecutive co-occurrence of the two FUs in SC behavior data.

$p_{ij} = \cos(\mathbf{v}_i, \mathbf{v}_j), \quad \forall i, j \in V$  (6)

Orders with high HPP values can be consolidated and assigned to the same courier to achieve efficient delivery.
(2) FU efficiency indicator (FEI) measures how much an order in this FU improves efficiency, based on how likely it is to be combined with orders from other FUs to form an efficient delivery sequence. It is calculated as the weighted aggregate of HPPs for the FU and its neighbouring FUs that share the same or nearby pick-up or delivery AOIs. The weights are determined by the order volume of those neighboring FUs.

$\eta_i = \sum_{j \in V_i} p_{ij} \times w_{ij}, \quad \forall i \in V$  (7)

The higher the FEI value, the more likely the order is to be efficiently pooled with other orders, thus improving courier efficiency. FEI values are normalized at the city level for ease of comparison.
(3) Scale-effect hotspot (SEH) for OFD refers to a local network of geographically proximate FUs, wherein the marginal cost and time of delivery for couriers fulfilling orders in this network progressively diminish, allowing for comprehensive order consolidation within the promised delivery time. Accordingly, FUs in an SEH should have high FEI values, and any pair of FUs in the same SEH should exhibit a relatively high HPP. In addition, the total order volume for each SEH should exceed certain criteria.

$S = \left\{ i \in V \;\middle|\; \eta_i > Thre_\eta; \;\; p_{ij} > Thre_p, \ \forall i, j \in S \right\}$  (8)

Figure 7: The main execution process of the dispatch system in each dispatch cycle.

3.3.1 Order Combination and Courier Recall. The MOA problem of our system is now solved by well-crafted constructive heuristics, i.e. the imitation learning-enhanced iterated matching algorithm (ILIMA) [3], since metaheuristic algorithms with in-depth search fail to meet the real-time requirements [35]. Meanwhile, a few orders are combined in mutually exclusive groups based on the closeness of their origins and destinations, as well as promised delivery time, before MD score evaluation. However, the real-time requirement severely restricts the search depth of the algorithm, resulting in insufficient and suboptimal order pooling.
With SCDN, we develop scalable mechanisms for courier recall and order combination, which can cut down the MOA search space and let us focus our limited computation time on promising areas. Generally, orders with high HPP are formed as favorable combinations in advance, which can greatly expand the proportion of combined orders. Order combinations with low HPP, and couriers whose on-hand orders mostly share low HPP with the new order, are filtered out. Hence, we can facilitate high-quality order pooling in real time, without an obvious increase in score calculation volume and computation time.
Order Combination. Based on HPP, high-quality order combinations can be identified and incorporated into ILIMA as expanded decision entities rather than single orders. As illustrated in Figure 8, on one hand, order combinations with very low HPP can be pruned to avoid unnecessary score calculation. On the other hand, since top-tier order combinations found by high HPP should be pooled to the same courier, other combinations containing only part of their orders, and the conflicting orders themselves, can be removed from the search space. This guides ILIMA to search deeply and effectively without obviously increasing score calculation volume.
Courier Recall. When retrieving available couriers for an order (combination), we calculate the average value of HPP between it and the courier's on-hand orders, to quickly estimate the MD between the order (combination) and the courier, instead of time-consuming score calculations. For an on-hand order already picked up by the courier, its FU can be considered as the FU starting from the AOI where the courier is currently located and ending at its delivery AOI. This further helps to prune the MOA search space and reduce real-time computational pressure while maintaining solution quality, as shown in Figure 9.

Figure 10: Promising MOA search subspace described by SEH.
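The courier recall rule (average HPP between a new order's FU and a courier's on-hand orders) can be sketched as follows. This is a simplified illustration under assumptions, not the deployed mechanism: the names `hpp` and `recall_couriers`, the toy embeddings, the 0.5 cut-off, and the choice to keep couriers with no on-hand orders are all hypothetical.

```python
import math


def hpp(u, v):
    """High-quality pooling probability as cosine similarity, as in Eq. (6)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def recall_couriers(order_fu, couriers, fu_emb, threshold):
    """Keep couriers whose mean HPP with the new order's FU reaches `threshold`.

    `couriers` maps a courier id to the FUs of its on-hand orders.
    Idle couriers (no on-hand orders) are kept; this is an assumption.
    """
    kept = []
    for courier_id, on_hand_fus in couriers.items():
        if not on_hand_fus:
            kept.append(courier_id)
            continue
        mean_hpp = sum(hpp(fu_emb[order_fu], fu_emb[f]) for f in on_hand_fus) / len(on_hand_fus)
        if mean_hpp >= threshold:
            kept.append(courier_id)
    return kept


# Toy FU embeddings (made up): B points almost the same way as A, C the opposite way.
fu_emb = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [-1.0, 0.0]}
couriers = {"r1": ["B"], "r2": ["C"], "r3": []}
kept = recall_couriers("A", couriers, fu_emb, threshold=0.5)
print(kept)  # ['r1', 'r3']
```

Only the couriers that survive this cheap vector comparison need a full MD score calculation, which is where the reduction in real-time computation comes from.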
Figure 11: The convergence curve for different algorithms.

4.1.2 FU Embedding Effectiveness. To evaluate the effectiveness of FU embeddings, we examine the training results via the data of the same district in Beijing. First, by performing DBSCAN clustering on learned embeddings, we evaluate if geographical similarity is encoded. Figure 12, which shows the resulting 33 clusters, confirms that FUs from close locations are clustered together in the hidden space.

Figure 12: FU embedding clusters of a district in Beijing on map (left) and after T-SNE (right).

Next we demonstrate that high-quality pooling potential can be captured by FU embedding similarity, i.e. HPP. Figure 13(a) shows four cases of FU pairs with high HPP, including (1) FU pair with pick-up and delivery AOIs located closely, (2) nearby parallel FU pair, (3) FU pair where one runs alongside the other, and (4) head-to-tail FU pair, with the tail one pointing to high-order-density AOIs, leading to less courier empty run time⁶ after completing deliveries. Orders in these FU pairs can be pooled for simultaneous delivery to improve courier efficiency. Meanwhile, we also identify FU pairs with low HPP. Figure 13(b) illustrates four cases of this situation, including (1) FU pair with the same delivery AOI but pick-up AOIs located far apart, (2) reverse parallel FU pair, (3) FU pair where one FU runs alongside the other but points to a low-order-density AOI, leading to longer courier empty run time, and (4) head-to-tail FU pair that also leads to a low-order-density area. These FU pairs are unlikely to be efficiently pooled together and may undermine courier efficiency.

⁶ Empty run time refers to the empty cruising time before couriers deliver their next orders.

4.2 Order Combination and Courier Recall
The proposed method, ILIMA + SCDN, is evaluated against the current online implementation, which utilizes ILIMA with a rule-based batching method, and MNDS, a metaheuristic algorithm used in [3]. Experiments are conducted in a mid-sized Chinese city, involving around 500 orders and 2,500 couriers in a dispatch cycle during noon peak.
The comparison results on both computational cost and solution quality are presented in Table 2. The ILIMA+SCDN approach enhances the total MD score of MOA solutions by 5.3% compared to the ILIMA+Rule method, without incurring a significant increase in time consumption. However, it lags 1.2 pp behind MNDS. Despite this, MNDS requires exploration of a much larger search space and massive PDRP calculations, which take over 20 seconds on average, making it unsuitable for online use. Hence, the proposed method excels at balancing computational time and solution quality, securing better MOA solutions in real time. Moreover, Figure 14 illustrates that the overall combination level grows as the percentage of couriers assigned only one order decreases by 16.3 pp. This shift results in increasing order consolidation. Online A/B tests show that, while maintaining delivery experience, courier efficiency, i.e. orders completed per hour, is increased by 3.7%.
Table 3 presents the results of offline experiments conducted with varying order volumes. In different order size scenarios, the proposed ILIMA+SCDN method significantly enhances the MD score over the existing ILIMA+Ruled method. Regarding PDRP calculations, for order volumes below 400, the proposed ILIMA+SCDN method requires fewer PDRP calculations than the ILIMA+Ruled method. Nevertheless, as the order volume escalates, the computational burden of both methods exhibits nearly linear growth, aligning with the online time requirements.

Figure 14: Combination level distribution. (a) ILIMA+Rule. (b) ILIMA+SCDN. (c) MNDS.

Figure 15: SEHs over time, with each image capturing half an hour. Colors denote different areas, bold lines for internal SEH FUs, and thin lines for external FUs.

been reduced by 51% and delivery time by 21%. These enhancements lead to a 45-55% boost in courier efficiency, i.e. orders completed per hour, while maintaining consistent work hours and on-time delivery standards. Figure 16 illustrates the superior performance of SEH mode against the city average level in noon peak, where each bar corresponds to the trial performance of a specific courier.

Method         (0, 200]   (200, 400]   (400, 600]   (600, 800]   (800, 1000]
MD Score Improvement
ILIMA+Ruled    0%         0%           0%           0%           0%
ILIMA+SCDN     1.0%       4.0%         4.4%         5.5%         3.7%
MNDS           1.7%       5.3%         5.3%         6.9%         5.6%
PDRP Calculations
ILIMA+Ruled    4,285      21,910       37,323       57,700       79,596
ILIMA+SCDN     4,250      20,589       38,358       65,011       94,292
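As a quick arithmetic check on the PDRP rows above, the relative change of ILIMA+SCDN against ILIMA+Ruled can be computed per order-volume bucket. The snippet below only re-uses the numbers from the table; the variable and function names are illustrative.

```python
buckets = ["(0, 200]", "(200, 400]", "(400, 600]", "(600, 800]", "(800, 1000]"]
ruled = [4285, 21910, 37323, 57700, 79596]   # ILIMA+Ruled PDRP calculations
scdn = [4250, 20589, 38358, 65011, 94292]    # ILIMA+SCDN PDRP calculations


def relative_change(new, base):
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


changes = [relative_change(s, r) for s, r in zip(scdn, ruled)]
for bucket, change in zip(buckets, changes):
    print(f"{bucket:>12}: {change:+.1f}%")
```

The change is negative for the two buckets up to 400 orders and positive for the larger ones, which matches the statement that ILIMA+SCDN needs fewer PDRP calculations only at lower order volumes.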
After getting all these MD scores, the MOA problem can be formulated into an integer programming problem in Equation (9). The objective function is to minimize the total MD scores for different goals, where $f_{t,r}^{g,\bar{o}}$ is the MD score of assigning order combination $\bar{o}$ to courier $r$ at time $t$ for goal $g$, $comb(O_t)$ refers to all the possible combinations constructed by orders in $O_t$, and $\eta_t^{g}$ is the weight of goal $g$ in the objective function at time $t$. The constraint is to make sure each combination $\bar{o}$ can only be assigned to one courier and only one combination of each order can be selected. $\bar{o}(o)$ represents the order combination containing order $o$.

B DEFINITION OF SKILLED COURIER AND SELECTION CRITERIA OF ROUTE SESSIONS
As mentioned above, SC refers to the couriers with relatively high efficiency, currently set as those ranked in the top 5%-35% in a delivery region. It should be noted that, in order to prevent extreme cases from affecting the validity of the learning outcomes, the top 5% of couriers have been excluded.
The SC route sessions of both pick-up and delivery type used for constructing the network are selected based on the following criteria:
(1) time interval between the execution of two consecutive orders less than 30 minutes;
(2) no overtime orders;
(3) no speeding behaviours;
(4) no orders with negative feedback reported.
Then based on the carefully selected sessions of SCs, we construct the corresponding AMHEN using the method outlined in Section 2.

C FU NODE ATTRIBUTES IN AMHEN
We incorporate rich spatial and temporal information as attributes of an FU node, for a specific scenario (i.e., weekday/weekend, peak/idle time), mainly including:
(1) average order volume of the FU, and of the corresponding pick-up and delivery AOIs, in the scenario for the last 30 days;
(2) average meal-waiting and pick-up time duration of the corresponding pick-up AOI in the scenario for the last 30 days;
(3) average delivery time duration of the corresponding delivery AOI in the scenario for the last 30 days;
(4) average delivery distance of the FU;
(5) average FU delivery period of time since consumers order in the scenario for the last 30 days;
(6) type and number of natural barriers (e.g. bridge, river, highway) along the FU path;
(7) latitudes and longitudes of the center points of the corresponding pick-up and delivery AOIs;
(8) the proportion of SCs who chose the corresponding pick-up and delivery AOIs as their preferred locations for the scenario in the past 30 days.

D IMPLEMENTATION OF EATNE ALGORITHM
The proposed EATNE algorithm is summarized in Algorithm 1.

Algorithm 1 EATNE for OFD
Input: Network $G$; embedding dimension $d$; edge embedding dimension $s$; window size $c$; learning rate $\eta$; margin loss min distance $m_p$, $m_d$; coefficients $\alpha$, $\beta$, $\gamma_p^{P}$, $\gamma_d^{P}$, $\gamma_p^{N}$, $\gamma_d^{N}$.
Output: Embedding $\mathbf{v}_i$, and embeddings $\mathbf{v}_{i,p}$ and $\mathbf{v}_{i,d}$ on the pick-up and delivery edge for all $v_i \in V$.
1: Initialize all the model parameters $\theta$.
2: Generate positive data sets $D_p^{P}$ and $D_d^{P}$ by random walk on the pick-up and delivery edge, respectively.
3: Randomly sample FU pairs within the same delivery region, then add to negative data sets $D_p^{N}$ and $D_d^{N}$.
4: while not converged do
5:   for each FU pair in $D_p^{P}$, $D_d^{P}$ do
6:     Calculate $\mathbf{v}_{i,p}$ and $\mathbf{v}_{i,d}$ using Equations (3) and (4), respectively;
7:     Sample $m$ negative samples and calculate the loss value using Equation (5).
8:     Update model parameters $\theta$ by $\partial L / \partial \theta$.
9:   end for
10: end while
11: Set $\mathbf{v}_i$ as the average of $\mathbf{v}_{i,p}$ and $\mathbf{v}_{i,d}$.

E EATNE MODEL PARAMETER CONFIGURATION
The detailed parameter setting is shown in Table 4. We employ the Adam optimizer with default settings for training. The model implements early stopping if there's no improvement in the ROC-AUC on the validation set within a single training epoch.

Table 4: Parameter configuration of EATNE model.

Notation                                                  Description                  Setting Value
$d$                                                       base embedding dimension     200
$s$                                                       edge embedding dimension     20
$l$                                                       random walk length           10
$c$                                                       sampling window size         3
$m_p$, $m_d$                                              margin loss min distance     0.3
$\eta$                                                    learning rate                0.001
$\alpha_p$, $\alpha_d$, $\beta_p$, $\beta_d$              edge weights                 1
$\gamma_p^{P}$, $\gamma_d^{P}$, $\gamma_p^{N}$, $\gamma_d^{N}$   weights in loss objective    1

F IMPLEMENTATION DETAILS OF ORDER COMBINATION AND COURIER RECALL.
The MOA problem in our system is now solved using a constructive heuristic framework. The process during each dispatch cycle may require multiple iterations. Let $O^k$ denote the set of pending orders during iteration $k$, with $O^0 = O$ initially, where $O$ represents all pending orders during this dispatch cycle. At iteration $k$:
(1) Evaluation stage: For the pending orders $O^k$ and their associated recalled courier candidates $R_o^k$, $o \in O^k$, MD scores $\{\{f_{or}\}_{r \in R_o^k}\}_{o \in O^k}$ are calculated.
(2) Matching stage: Based on current MD scores, a one(order)-to-one(courier) assignment decision is made following a greedy policy (aiming to optimize the sum of MD scores for all matching relations at the current iteration). This may result in only a subset of $O^k$ being successfully assigned.
(3) Termination condition: Denote the remaining unassigned orders as $\overline{O}^k$.
If $\overline{O}^k = \emptyset$, stop the iterations. Otherwise, update the state of
couriers by including the newly assigned orders, let $O^{k+1} = \overline{O}^k$,
$k = k + 1$, and proceed to Step (1).

F.1 Order Combination Mechanism.
Although the above algorithm performs well in solving the problem, it tends to promote
one-to-one assignment results, which is not conducive to sufficient order pooling. To
facilitate many-to-one assignments, high-quality and mutually exclusive order
combinations are identified based on HPP and incorporated into the algorithm as expanded
entities rather than single orders. The evaluation stage at iteration $k$ is executed as
follows:
(1) For pending orders $O^k$, calculate the HPPs between the FUs of any two orders and
denote the combination set as $C^k$. Set the order combination set preserved for MD
evaluation at iteration $k$ as $\widehat{C}^k = \emptyset$.
(2) Prune two-order combinations with low HPPs ($p_{o_1,o_2} < P_1$), i.e., let
$C^k = C^k - C^k_{low}$.
(3) Repeat this step until $C^k = \emptyset$: pick the combination
$c = \{o_1, o_2\} \in C^k$ with the highest HPP value and let
$\widehat{C}^k = \widehat{C}^k + \{c\}$. Then remove its related entries in $C^k$ and
$O^k$, i.e., let $C^k = C^k - \{c' \in C^k \mid o_1 \in c' \text{ or } o_2 \in c'\}$,
$O^k = O^k - \{o_1\} - \{o_2\}$.
(4) Use $\widehat{C}^k$ and $O^k$ as decision entities and calculate the MD scores with
their associated couriers.
The above process is illustrated in Figure 8. In practice, $P_1$ is set to 0.6.

F.2 Courier Recall Mechanism.
To reduce the volume of MD score calculations, we further refine the courier candidates
recalled for each order/order combination using HPP. For the evaluation stage at
iteration $k$, the pending entity sets are $\widehat{C}^k$ and $O^k$, and the courier
recall mechanism is executed as follows:
(1) For $o \in O^k$, denote the corresponding courier candidate set as $R_o^k$. For
$r \in R_o^k$, if the on-hand order set $O_r^k \neq \emptyset$, calculate the average HPP
of $o$ and the orders in $O_r^k$ as an estimate of the MD score, i.e.,
$\widetilde{f}_{or} = \frac{1}{|O_r^k|} \sum_{o' \in O_r^k} p_{o,o'}$.
For an on-hand order already picked up by courier $r$, its FU can be considered as the
FU starting from the AOI where the courier is currently located and ending at its
delivery AOI.
For on-hand orders whose FU embedding is absent, the associated HPP is set to 0.
If $\widetilde{f}_{or}$ is lower than the threshold $P_2$, courier $r$ is removed from
the candidate set, i.e., $R_o^k = R_o^k - \{r\}$.
(2) For $c = \{o_1, o_2\} \in \widehat{C}^k$, denote the corresponding courier candidate
set as $R_c^k$, which is the intersection of the courier candidate sets of $o_1$ and
$o_2$. For $r \in R_c^k$, calculate the average HPP for $o_1$ and for $o_2$ as in
Step (1), respectively.
If either $\widetilde{f}_{o_1 r}$ or $\widetilde{f}_{o_2 r}$ is lower than the threshold
$P_2$, courier $r$ is removed from the candidate set, i.e., $R_c^k = R_c^k - \{r\}$.
(3) For the orders in $O^k$ and the combinations in $\widehat{C}^k$, calculate the MD
scores with their refined couriers.
The above process is illustrated in Figure 9. In practice, $P_2$ is set to 0.5.

G SEH IDENTIFICATION APPROACH
We utilize BP to identify SEHs during each time interval from FUs with high FEI in a
city or nearby areas. In this section, we introduce the variable definitions, objective
function, and constraints of the model.
The decision variable $x_f^g$ represents whether FU $f$ belongs to SEH $g$. To calculate
the average HPP in each SEH, we introduce a binary auxiliary variable $y_{f,f'}^g$,
which indicates whether FUs $f$ and $f'$ belong to SEH $g$ simultaneously. The objective
function in Equation (10) maximizes the average HPP in each SEH, where $p_{f,f'}$ is the
HPP between FUs $f$ and $f'$.
The constraint in Equation (11) limits each FU to appear in only one SEH. Equation (12)
limits the minimum and maximum number of FUs in each SEH. Equation (13) limits the
minimum number of orders in each SEH, where $n_f$ is the number of orders of FU $f$.
Equation (14) and Equation (15) ensure that $y_{f,f'}^g = 1$ if and only if
$x_f^g = x_{f'}^g = 1$. Equation (16) constrains the minimum average HPP in each SEH $g$.
Equation (17) and Equation (18) ensure that all the decision variables are binary.

\begin{align}
\max \quad & \sum_{g \in G} \frac{\sum_{f \in F} \sum_{f' \in F, f' \neq f} p_{f,f'} \times y_{f,f'}^g}{\sum_{f \in F} \sum_{f' \in F, f' \neq f} y_{f,f'}^g} \tag{10} \\
\text{s.t.} \quad & \sum_{g \in \bar{G}} x_f^g = 1, \quad \forall f \in F \tag{11} \\
& |g|_{\min} \leq \sum_{f \in F} x_f^g \leq |g|_{\max}, \quad \forall g \in G \tag{12} \\
& \sum_{f \in F} n_f \times x_f^g \geq N, \quad \forall g \in G \tag{13} \\
& y_{f,f'}^g \geq x_f^g + x_{f'}^g - 1 \tag{14} \\
& y_{f,f'}^g \leq x_f^g, \quad y_{f,f'}^g \leq x_{f'}^g \tag{15} \\
& \sum_{f \in F} \sum_{f' \in F, f' \neq f} (p_{f,f'} - P) \times y_{f,f'}^g \geq 0, \quad \forall g \in G \tag{16} \\
& x_f^g \in \{0, 1\}, \quad \forall f \in F, \forall g \in G \tag{17} \\
& y_{f,f'}^g \in \{0, 1\}, \quad \forall f, f' \in F, f \neq f', \forall g \in G \tag{18}
\end{align}
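As a rough illustration of the SEH identification model in Section G, the sketch below checks a candidate grouping of FUs against Equations (11)-(13) and (16) and, if it is feasible, returns the Equation (10) objective. It does not solve the BP; it only evaluates one given grouping. The function name, the symmetric HPP lookup keyed by unordered FU pairs, and all numbers are assumptions made for the example.

```python
from itertools import permutations


def evaluate_grouping(groups, hpp, n_orders, size_min, size_max,
                      min_orders, min_avg_hpp):
    """Return the Eq. (10) objective of a candidate grouping, or None if infeasible.

    groups      : list of lists of FU ids, one inner list per SEH g
    hpp         : dict mapping frozenset({f, f2}) to p_{f,f'} (assumed symmetric)
    n_orders    : dict mapping FU id to its order count n_f
    size_min/max: |g|_min and |g|_max in Eq. (12)
    min_orders  : N in Eq. (13)
    min_avg_hpp : P in Eq. (16)
    """
    seen = set()
    objective = 0.0
    for group in groups:
        members = set(group)
        # Eq. (11), as described in the text: an FU may appear in only one SEH.
        if len(members) != len(group) or seen & members:
            return None
        seen |= members
        # Eq. (12): bounds on the number of FUs in the SEH.
        if not size_min <= len(group) <= size_max:
            return None
        # Eq. (13): minimum number of orders in the SEH.
        if sum(n_orders[f] for f in group) < min_orders:
            return None
        # Ordered pairs (f, f') with f != f' are exactly the pairs where y = 1.
        pair_hpps = [hpp[frozenset(pair)] for pair in permutations(group, 2)]
        # Eq. (16): sum of (p - P) over the selected pairs must be non-negative.
        if sum(p - min_avg_hpp for p in pair_hpps) < 0:
            return None
        # One term of Eq. (10): the average HPP inside this SEH
        # (taken as 0 for a single-FU group, which is an assumption).
        objective += sum(pair_hpps) / len(pair_hpps) if pair_hpps else 0.0
    return objective


# Illustrative data: four FUs with two strongly related pairs.
demo_hpp = {
    frozenset(("a", "b")): 0.8, frozenset(("c", "d")): 0.7,
    frozenset(("a", "c")): 0.2, frozenset(("a", "d")): 0.2,
    frozenset(("b", "c")): 0.2, frozenset(("b", "d")): 0.2,
}
demo_orders = {"a": 5, "b": 5, "c": 5, "d": 5}
score = evaluate_grouping([["a", "b"], ["c", "d"]], demo_hpp, demo_orders,
                          size_min=2, size_max=3, min_orders=8, min_avg_hpp=0.6)
```

An evaluator like this can be used to sanity-check solver output or to score candidates in a simple local search. The pairwise variables $y_{f,f'}^g$ never appear explicitly, because for a fixed grouping Equations (14) and (15) fix them to 1 exactly for pairs inside the same SEH.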