Using A Multi-Armed Bandit With Thompson Sampling To Identify Responsive Dashers
Arjun Sharma
ML approaches we considered
One possible approach is to treat this as a supervised classification
problem. We can use labeled historical data (for example, whether a
Dasher signed on to dash after being invited) to train a model that
predicts a Dasher’s probability of dashing when sent a message, given
a set of features.
https://doordash.engineering/2022/03/15/using-a-multi-armed-bandit-with-thompson-sampling-to-identify-responsive-dashers/ 2/11
6/30/24, 12:05 PM Using a Multi-Armed Bandit with Thompson Sampling to Identify Responsive Dashers
Another problem with this approach is that it only optimizes for the
probability of dashing when a message is sent. We would only be sending
messages to Dashers we already know are likely to convert. There would
be no basis for messaging other Dashers, so they would never get a
chance to self-identify as responsive Dashers.
Given our constraints and what we are optimizing for, a bandit
algorithm has several advantages over supervised learning. We can
construct a bandit-like procedure that explores which Dashers to
message and, over time, identifies and prioritizes those who respond
to messages. This lets us dynamically allocate traffic to the Dashers
who are more responsive.
Epsilon-greedy
• Pros:
  • Easy to understand and implement
  • Makes it easier to prioritize known Dashers based on their likelihood
    to respond to messages
• Cons:
  • Because epsilon is a constant percentage that we have to define, the
    exploration rate does not improve over time. We can explore too
    little early on and too much later in the process
  • Experimentation is not dynamic; no matter what we have learned
    about Dashers’ preferences, we are always exploring at a fixed
    percentage
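The fixed exploration rate described above can be illustrated with a minimal epsilon-greedy sketch. The function name, the `conversion_rates` input, and the Dasher IDs are hypothetical and chosen only for illustration; this is not the production implementation.

```python
import random


def epsilon_greedy_select(conversion_rates, n_to_message, epsilon, rng):
    """Choose Dashers to message with a fixed exploration rate.

    conversion_rates: dict of dasher_id -> observed conversion rate
    (a hypothetical input for this sketch).
    With probability epsilon a random remaining Dasher is picked (explore);
    otherwise the Dasher with the best observed rate is picked (exploit).
    """
    remaining = dict(conversion_rates)
    chosen = []
    while remaining and len(chosen) < n_to_message:
        candidates = sorted(remaining)  # sorted for reproducible behavior
        if rng.random() < epsilon:
            pick = rng.choice(candidates)
        else:
            pick = max(candidates, key=lambda d: remaining[d])
        chosen.append(pick)
        del remaining[pick]
    return chosen


rates = {"dasher_a": 0.40, "dasher_b": 0.10, "dasher_c": 0.25}
greedy_only = epsilon_greedy_select(rates, 2, epsilon=0.0, rng=random.Random(0))
explore_only = epsilon_greedy_select(rates, 2, epsilon=1.0, rng=random.Random(0))
```

Note that `epsilon` never changes inside the loop, which is exactly the weakness listed in the cons: the share of exploratory messages stays the same no matter how much has been learned.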
Upper Confidence Bound (UCB)
• Pros:
  • Finds the best-performing Dashers quickly
  • Once there’s enough data, starts to optimize sending messages to
    responsive Dashers instead of exploring
• Cons:
  • Difficult to explain to stakeholders why a specific action was
    taken
  • When there is an excess of new Dashers, this method could end up
    only messaging new Dashers until enough signal is received
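Assuming the list above describes an upper confidence bound (UCB) rule, the last point can be seen in a small sketch of the textbook UCB1 score. The `stats` layout and the function names are assumptions made for this example, not the author's code.

```python
import math


def ucb1_score(successes, sends, total_sends, c=2.0):
    """Textbook UCB1: observed mean plus an exploration bonus.

    A Dasher who has never been messaged gets an infinite score,
    so they are always tried before anyone with history.
    """
    if sends == 0:
        return math.inf
    mean = successes / sends
    bonus = math.sqrt(c * math.log(total_sends) / sends)
    return mean + bonus


def rank_by_ucb(stats):
    """stats: dict of dasher_id -> (successes, sends), hypothetical layout."""
    total = sum(sends for _, sends in stats.values())
    return sorted(
        stats,
        key=lambda d: ucb1_score(stats[d][0], stats[d][1], total),
        reverse=True,
    )


stats = {"veteran": (8, 10), "new_1": (0, 0), "new_2": (0, 0)}
ranking = rank_by_ucb(stats)
```

In this ranking both never-messaged Dashers come ahead of the veteran with an 80% observed conversion rate, which is how a large batch of new Dashers can absorb all of the messages.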
Thompson sampling
• Pros:
  • Intuitive approach that counts the successes and failures of each
    message sent to a Dasher
  • Depending on the probability distribution used, we can take
    advantage of the conjugate relationship between prior and
    posterior probabilities and use a simple update rule to get the
    posterior probability
  • Easy to implement
  • Finds best-performing Dashers quickly
• Cons:
  • Requires manually setting priors for new Dashers; by contrast, an
    approach like UCB always includes Dashers we have not previously
    messaged
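The "simple update rule" from the conjugate relationship can be sketched as follows, assuming each message has a yes/no outcome (Bernoulli) and each Dasher's conversion probability has a Beta prior. The class name and the default Beta(1, 1) prior are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta(alpha, beta) belief about one Dasher's conversion probability.

    Assumed defaults: Beta(1, 1), which is the uniform distribution.
    """

    alpha: float = 1.0
    beta: float = 1.0

    def update(self, converted: bool) -> None:
        # Conjugate update: a success adds 1 to alpha, a failure adds 1 to beta.
        if converted:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


belief = BetaPosterior()
for outcome in (True, False, True):
    belief.update(outcome)
```

Because the posterior is again a Beta distribution, the whole update reduces to counting successes and failures, which is why the approach is described as intuitive and easy to implement.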
Our last decision when defining the prior was whether to start at pure
exploration (a uniform distribution) or to use past data to inform our
prior. We chose to inform each Dasher’s prior with previous message and
conversion data to speed up the convergence of the distributions. We
apply a weight-decay parameter to previous observations to favor
recent data over older observations. This way, when we start the
experiment, the bandit has a head start on Dasher preferences without
leaning too heavily on old, potentially stale data.
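One way to build such a prior is sketched below. The exponential form of the decay, the `decay` value, and the Beta(1, 1) starting point are all assumptions; the post only states that a weight-decay parameter favors recent observations.

```python
def decayed_prior(history, decay=0.9, base_alpha=1.0, base_beta=1.0):
    """Turn past message outcomes into Beta prior parameters.

    history: list of booleans, oldest first (True = Dasher signed on).
    Each observation is weighted by decay ** age, where the most recent
    observation has age 0, so older outcomes count for less.
    """
    alpha, beta = base_alpha, base_beta
    n = len(history)
    for i, converted in enumerate(history):
        weight = decay ** (n - 1 - i)
        if converted:
            alpha += weight
        else:
            beta += weight
    return alpha, beta


# Oldest outcome was a conversion, most recent one was not.
prior = decayed_prior([True, False], decay=0.5)
```

With `decay=1.0` every past observation counts fully; smaller values shrink the influence of old data, so the prior stays close to uniform for a Dasher whose history is stale.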
send out a message, we let the bandit sample all prior distributions to
give us the probability of converting when messaged. We then rank the
Dashers in descending order by sampled value and keep the top Dashers
whose sampled value exceeds a predetermined threshold, so that we
don’t message Dashers whom the bandit has determined won’t convert. To
decide how many Dashers to contact, we first determine how many are
needed to resolve the current shortage, then divide that number by the
average conversion rate for Dashers in that location. The bandit can
then message the Dashers it has determined are most likely to get on
the road.
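The selection steps above can be sketched as follows, assuming each Dasher's belief is a Beta distribution stored as an `(alpha, beta)` pair. The function names, the threshold value, and the sample data are hypothetical.

```python
import math
import random


def messages_needed(shortage, avg_conversion_rate):
    """Dashers to contact = shortage / average conversion rate, rounded up."""
    if avg_conversion_rate <= 0:
        raise ValueError("average conversion rate must be positive")
    return math.ceil(shortage / avg_conversion_rate)


def select_dashers(beliefs, shortage, avg_conversion_rate, threshold, rng):
    """Sample every belief, rank by sampled value, apply threshold and cap.

    beliefs: dict of dasher_id -> (alpha, beta), an assumed layout.
    """
    n = messages_needed(shortage, avg_conversion_rate)
    sampled = {d: rng.betavariate(a, b) for d, (a, b) in beliefs.items()}
    ranked = sorted(sampled, key=sampled.get, reverse=True)
    # Keep only Dashers whose sampled value clears the threshold.
    return [d for d in ranked if sampled[d] > threshold][:n]


beliefs = {"responsive": (50, 1), "unresponsive": (1, 50), "unknown": (2, 2)}
picked = select_dashers(
    beliefs, shortage=1, avg_conversion_rate=0.5, threshold=0.5,
    rng=random.Random(0),
)
```

For example, a shortage of 5 Dashers with an average conversion rate of 25% calls for 20 messages. Because the values are sampled rather than taken as means, a Dasher with little history such as `unknown` can still be selected occasionally, which is where the exploration comes from.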
Results
Currently, we are running experiments to test this bandit framework
against our previous random sampling method. We are using a
switchback experiment to measure the impact that improved message
targeting has on the overall supply/demand balance for a given location.
Using this testing framework, we can see not only whether more
Dashers respond to messages, but also what effect these additional
Dashers have on the market supply. So far, we have seen an improvement
in the conversion rate of messages sent in the bandit framework, which
has allowed us to send fewer messages than required by our control
variant. We are experimenting further to confirm the impact.
Conclusion
While we have tailored Thompson Sampling to a specific Dasher
scenario, this solution can work in many different scenarios. Companies
seeking to provide a personalized experience to all of their customers
may have limited data to figure out how to best accomplish that.
Thompson Sampling can help demonstrate which options give the
greatest reward in a non-stationary environment. The method works well
in a quickly changing business environment where there’s a need to
While what we have done to date works well, there are many ways we
can improve upon this approach. Currently, we only consider whether a
Dasher signed on after receiving a message. But additional data shows
that Dashers’ preferences change based on their location, time of
day, day of week, and much more. Over time, we can encode this
information as contextual features so that the bandit can make even
smarter decisions.
Acknowledgements
This post is based in large part on the great work of our intern Hamlin Liu.
We are excited to have him join us full time in August!