6/30/24, 12:05 PM Using a Multi-Armed Bandit with Thompson Sampling to Identify Responsive Dashers

Using a Multi-Armed Bandit with Thompson Sampling to Identify Responsive Dashers
March 15, 2022 · 10 Minute Read · Machine Learning

Arjun Sharma

Maintaining Dasher supply to meet consumer demand is one of the most important problems for DoorDash to resolve in order to offer timely deliveries. When too few Dashers are on the road to fulfill orders, we take reactive actions to persuade more Dashers to begin making deliveries. One of the most effective things we can do is to message Dashers that there are a lot of orders in their location and that they should sign on to start dashing. Dashing during peak hours can mean a more productive shift, higher earnings, and more flexibility in choosing which offers to accept.

We need to optimize which Dashers to target with our messages because approaching Dashers with no interest in dashing at that time can create a bad user experience. Here we describe a bandit-like framework that dynamically learns and ranks Dasher preferences as we send out messages, so that we can optimize our decisions about whom to message at a given time.
https://doordash.engineering/2022/03/15/using-a-multi-armed-bandit-with-thompson-sampling-to-identify-responsive-dashers/ 1/11

Finding the best way to alert Dashers about low supply
Currently we select Dashers to message by identifying who has been
active in a given location and then selecting recipients at random. While
this approach doesn’t overload specific Dashers with messages, it
doesn’t improve the conversion rate of Dashers coming onto the
platform after receiving a push notification.

We need to find a methodology that uses our information about Dasher preferences while avoiding spamming Dashers who wouldn't be interested in receiving notifications at that time. This problem statement lends itself to a machine learning approach that can:

• Identify current responsive Dashers who are more likely to convert when asked to dash now
• Identify Dashers who aren't interested in these messages so we can avoid spamming them
• Identify new responsive Dashers so that we don't overtax our existing responsive Dashers
• Rank Dashers by their willingness to dash when contacted so we know how to prioritize who to message at each send

ML approaches we considered
One possible approach is to treat this as a supervised classification problem. We can use labeled historical data (for example, whether a Dasher has signed on to dash when invited in the past) to train a model that predicts a Dasher's probability of dashing now when sent a message, given a set of features.


While this approach is easy to frame as a binary classification model, it has some issues. What if Dasher preferences change over time? For example, a Dasher who is enrolled in college could be very responsive during breaks but largely unavailable once school resumes. The model trainer would have to handle this type of non-stationary behavior by retraining and weighting recent observations more heavily.

Another problem with this approach is that it only optimizes for the probability of dashing when a message is sent. We would only send messages to Dashers we already know are likely to convert, with no basis for messaging other Dashers and giving them a chance to self-identify as responsive.

Because of our constraints and what we are optimizing for, there are
multiple benefits to using a bandit algorithm instead of supervised
learning. We can construct a bandit-like procedure that allows us to
dynamically explore Dashers to message, over time identifying and
optimizing on Dashers who respond to messages. This approach would
allow us to dynamically allocate traffic to Dashers who are more
responsive.

As Dasher preferences change over time, the algorithm can relearn dynamically which Dashers would be most likely to convert. We can even easily extend this framework to use a contextual bandit; if Dasher preferences change based on time of day or day of week, the bandit can be given these features as context to make more accurate decisions.

Next, we need to select which bandit framework to use in order to allocate traffic to Dashers dynamically.

A trio of possible bandits


There are multiple factors involved in determining which bandit to use. We want an algorithm that explores enough to adjust to changing Dasher preferences and yet still sends messages to Dashers who we already know are responsive. Several algorithms come to mind as possible choices:

The Epsilon-Greedy algorithm defines a parameter, epsilon, that determines how often we explore by messaging Dashers about whom we know less.

• Pros:
• Easy to understand and implement
• Makes it easier to prioritize known Dashers based on their likelihood
to respond to messages
• Cons:
• Because epsilon is a fixed constant, the exploration rate does not adapt over time: we can explore too little early on and too much later in the process.
• Experimentation is not dynamic; no matter what we have learned
about Dashers’ preferences, we are always exploring at a fixed
percentage
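To make the fixed-exploration drawback concrete, here is a minimal Python sketch of epsilon-greedy selection. It assumes we already have an estimated conversion rate per Dasher; the `rates` dictionary, the Dasher IDs, and the function name are hypothetical and not part of DoorDash's system.

```python
import random

def epsilon_greedy_select(rates, k, epsilon, rng):
    """Pick up to k distinct Dashers. Each slot explores with probability
    epsilon; otherwise it exploits the best remaining estimated rate.

    rates: dict mapping a hypothetical Dasher ID to an estimated conversion rate.
    """
    remaining = dict(rates)
    chosen = []
    while remaining and len(chosen) < k:
        if rng.random() < epsilon:
            # Explore: a uniformly random Dasher, regardless of past performance.
            dasher = rng.choice(sorted(remaining))
        else:
            # Exploit: the Dasher with the highest estimated conversion rate.
            dasher = max(remaining, key=remaining.get)
        chosen.append(dasher)
        del remaining[dasher]
    return chosen

rates = {"d1": 0.30, "d2": 0.05, "d3": 0.12, "d4": 0.01}
print(epsilon_greedy_select(rates, k=2, epsilon=0.0, rng=random.Random(0)))
```

With `epsilon=0.0` the call returns `['d1', 'd3']`, the two highest estimated rates. Any positive `epsilon` swaps some of those picks for random ones at the same rate indefinitely, no matter how much data has accumulated.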

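The Upper Confidence Bound (UCB) bandit algorithm is based on the principle of “optimism in the face of uncertainty,” which translates to selecting the action with the highest upper confidence bound on its reward: its estimated reward plus a bonus that grows with our uncertainty about that estimate.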

• Pros:
• Finds the best-performing Dashers quickly
• Once there’s enough data, starts to optimize sending messages to
responsive Dashers instead of exploring
• Cons:
• Difficult to explain to stakeholders why a specific action was taken
• When there is an excess of new Dashers, this method could end up
only messaging new Dashers until enough signal is received
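The sketch below uses the classic UCB1 score (an illustrative choice; the post does not say which UCB variant was considered). The `stats` format and Dasher IDs are hypothetical. A Dasher who has never been messaged gets an infinite score, which is why a large influx of new Dashers can crowd out proven ones until each has been tried.

```python
import math

def ucb1_scores(stats, total_sends, c=2.0):
    """UCB1-style score per Dasher: mean conversion rate plus an exploration bonus.

    stats: dict mapping a hypothetical Dasher ID to (conversions, sends).
    total_sends: total messages sent so far across all Dashers.
    """
    scores = {}
    for dasher, (conversions, sends) in stats.items():
        if sends == 0:
            # Never-messaged Dashers are maximally uncertain, so they rank first.
            scores[dasher] = math.inf
        else:
            mean = conversions / sends
            bonus = math.sqrt(c * math.log(total_sends) / sends)
            scores[dasher] = mean + bonus
    return scores

stats = {"veteran": (30, 100), "new_a": (0, 0), "new_b": (0, 0)}
scores = ucb1_scores(stats, total_sends=100)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # → ['new_a', 'new_b', 'veteran']
```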

Thompson Sampling takes a Bayesian approach to the problem. We assign a prior probability distribution to each Dasher, which is updated to a posterior distribution as we observe the outcomes of the messages we send.

• Pros:
• Intuitive approach that counts the successes and failures of each
message sent to a Dasher
• Depending on the probability distribution used, we can take
advantage of the conjugate relationship between prior and
posterior probabilities and use a simple update rule to get the
posterior probability
• Easy to implement
• Finds best-performing Dashers quickly
• Cons:
• Requires manually setting priors for new Dashers; an approach like
UCB always includes Dashers we have not previously messaged

Why we chose Thompson Sampling

Given these three frameworks, we selected Thompson Sampling for its
intuitive approach and ease of implementation.

We started by defining our target: the probability that a Dasher who receives a message will convert and sign on to DoorDash immediately. Next, we needed a prior for each Dasher from which we could sample to decide whom to message. A prior is a probability distribution over the chance that a given Dasher will respond when messaged. Along with choosing an appropriate prior, we also needed a method for updating it as new information arrives. We used a beta distribution because it is parameterized directly by the number of successes (alpha) and the number of failures (beta). Because the beta distribution is the conjugate prior for these success/failure outcomes, the posterior is also a beta distribution, which gives an intuitive update rule: add to alpha if a Dasher converts or, if not, add to beta. As we update the distribution following each message, its variance shrinks as we become more certain of the outcome.
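A minimal sketch of this bookkeeping for a single Dasher, assuming increments of exactly 1 per outcome and a uniform Beta(1, 1) starting point; the class name and those defaults are illustrative assumptions. It shows the conjugate update, the shrinking variance, and the single draw Thompson Sampling uses.

```python
import random
from dataclasses import dataclass

@dataclass
class DasherBelief:
    """Beta(alpha, beta) belief about one Dasher's conversion probability."""
    alpha: float = 1.0  # pseudo-count of conversions; Beta(1, 1) is uniform
    beta: float = 1.0   # pseudo-count of non-conversions

    def update(self, converted: bool) -> None:
        # Conjugate update: a success/failure outcome keeps the posterior a Beta.
        if converted:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def variance(self) -> float:
        # Var = alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1))
        n = self.alpha + self.beta
        return self.alpha * self.beta / (n * n * (n + 1))

    def sample(self, rng: random.Random) -> float:
        # One Thompson draw: a plausible conversion probability for this Dasher.
        return rng.betavariate(self.alpha, self.beta)

belief = DasherBelief()
print(belief.variance())  # uniform prior: 1/12
for converted in [True, False, False, False]:
    belief.update(converted)
print(belief.alpha, belief.beta, belief.mean(), belief.variance())  # Beta(2, 4), mean 1/3
```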

Our last decision when defining the prior was whether to start from pure exploration (a uniform distribution) or use past data to inform it. We chose to inform each Dasher's prior with previous messaging and conversion data to speed up the convergence of the distributions. We apply a weight-decay parameter to previous observations to favor recent data over historical observations. This way, when we start the experiment, the bandit has a head start on Dasher preferences without biasing too heavily toward old, potentially stale, data.
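The post does not specify the form of the weight decay. The sketch below assumes exponential decay with a half-life, so an outcome `half_life_days` old counts half as much as one from today. The history format, parameter names, and Beta(1, 1) base are hypothetical.

```python
def decayed_prior(history, now, half_life_days, base_alpha=1.0, base_beta=1.0):
    """Build Beta(alpha, beta) from past messages, down-weighting old outcomes.

    history: list of (timestamp_days, converted) pairs for one Dasher (hypothetical format).
    now: current time in days.
    half_life_days: hypothetical decay parameter (assumed exponential decay).
    """
    alpha, beta = base_alpha, base_beta
    for ts, converted in history:
        # Weight halves every half_life_days; a message sent "now" has weight 1.
        weight = 0.5 ** ((now - ts) / half_life_days)
        if converted:
            alpha += weight
        else:
            beta += weight
    return alpha, beta

# One old conversion and two recent non-conversions for a single Dasher.
alpha, beta = decayed_prior([(0, True), (28, False), (30, False)], now=30, half_life_days=14)
print(alpha, beta)
```

Here the 30-day-old conversion adds only about 0.23 to alpha, while the two recent misses add almost 2 to beta, so the prior leans toward the Dasher's current behavior.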

Next, we needed to tune a set of hyperparameters vital to modeling the situation accurately. Among the questions we considered were:

• Observation length: over what time period should each observation be measured? If it's too short, we can't accumulate enough reward or penalty for each run. If it's too long, the algorithm takes extra time to update and find high-performing Dashers.
• How stationary is the problem? Dasher behavior changes over time, so we must give greater weight to recent observations than to older ones. If a previously responsive Dasher stops responding, we need to update our probability distribution quickly.
• What prior should we give new Dashers? It's important to add new Dashers to the algorithm without degrading performance while still giving them a chance to be selected, so that we can learn whether they are high-performing Dashers.
• Given the imbalance in the data (many more Dashers choose not to dash when messaged than choose to dash), how much weight should we give to successes versus failures?
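One common way to encode the stationarity, new-Dasher, and imbalance questions above is a discounted beta update: shrink the existing counts by a factor before adding each new outcome, and weight successes and failures differently. This is a minimal sketch; the discounting scheme and every value in the hypothetical `BanditConfig` are illustrative assumptions, not the settings DoorDash tuned.

```python
from dataclasses import dataclass

@dataclass
class BanditConfig:
    """Hypothetical hyperparameters mirroring the questions above (illustrative values)."""
    discount: float = 0.95       # < 1 shrinks old evidence before each update (non-stationarity)
    success_weight: float = 1.0  # amount one conversion adds to alpha
    failure_weight: float = 0.5  # amount one non-conversion adds to beta (class imbalance)
    new_alpha: float = 1.0       # prior for Dashers with no history...
    new_beta: float = 9.0        # ...mean 0.1, but wide enough to win some draws

def update_belief(alpha, beta, converted, cfg):
    """Discounted Beta update: decay existing counts, then add the weighted outcome."""
    alpha *= cfg.discount
    beta *= cfg.discount
    if converted:
        alpha += cfg.success_weight
    else:
        beta += cfg.failure_weight
    return alpha, beta

cfg = BanditConfig()
alpha, beta = cfg.new_alpha, cfg.new_beta  # a brand-new Dasher
for outcome in [True, True, False]:
    alpha, beta = update_belief(alpha, beta, outcome, cfg)
print(alpha, beta)
```

Because every update multiplies the old counts by `discount`, a once-responsive Dasher who stops converting sees their alpha fade geometrically instead of persisting forever.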

After defining our beta distribution, update rule, and these hyperparameters, we are ready to use the bandit procedure to decide which Dashers to message. In our experiment, whenever we are ready to send out a message, we let the bandit draw a sample from each Dasher's current distribution, giving us a sampled probability of converting when messaged. We then rank the Dashers in descending order by their sampled value and take the top Dashers whose sampled value is greater than a predetermined threshold, so that we don't message Dashers who the bandit has determined are unlikely to convert. We define the number of Dashers to contact by first determining how many are needed to resolve the current shortage. We then divide that number by the average conversion rate for Dashers in that location. The bandit can then message the Dashers it has determined are most likely to get on the road.
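The selection step can be sketched as follows. It assumes each Dasher's belief is stored as an `(alpha, beta)` pair, rounds the message count up with `math.ceil` (the post does not say how fractional counts are handled), and treats the function names and threshold value as hypothetical.

```python
import math
import random

def dashers_to_message(shortage, avg_conversion_rate):
    """Messages needed: Dashers required divided by the local average conversion rate."""
    return math.ceil(shortage / avg_conversion_rate)

def select_dashers(beliefs, shortage, avg_conversion_rate, threshold, rng):
    """Thompson Sampling selection.

    beliefs: dict mapping a hypothetical Dasher ID to (alpha, beta).
    """
    n = dashers_to_message(shortage, avg_conversion_rate)
    # One draw per Dasher from its current Beta distribution.
    samples = {d: rng.betavariate(a, b) for d, (a, b) in beliefs.items()}
    # Drop Dashers whose draw is at or below the threshold, then rank the rest.
    eligible = [d for d, s in samples.items() if s > threshold]
    eligible.sort(key=lambda d: samples[d], reverse=True)
    return eligible[:n]

beliefs = {"hot": (90.0, 10.0), "warm": (50.0, 50.0), "cold": (1.0, 99.0)}
print(select_dashers(beliefs, shortage=1, avg_conversion_rate=0.5,
                     threshold=0.2, rng=random.Random(7)))
```

For example, a shortage of 5 Dashers in a location whose average conversion rate is 0.25 means messaging up to ceil(5 / 0.25) = 20 Dashers; fewer are contacted if not enough draws clear the threshold. Because the ranking uses random draws rather than posterior means, a Dasher with little history still occasionally lands near the top, which is how the bandit keeps exploring.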

Results
Currently, we are running experiments to test this bandit framework
against our previous random sampling method. We are using a
switchback experiment to measure the impact that improved message
targeting has on the overall supply/demand balance for a given location.
Using this testing framework, we not only see if there is an increase in
Dashers who respond to messages, but we can also see what effect
these additional Dashers have on the market supply. So far, we have seen
an improvement in the conversion rate of messages sent in the bandit
framework, which has allowed us to send fewer messages than required
by our control variant. We are experimenting further to prove the
impact.
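In a switchback experiment, each location alternates between treatment and control over successive time windows, so every message sent in a given location and window uses the same variant. The sketch below shows one way to make that assignment deterministic by hashing the location and window; the hashing scheme, the salt string, and the one-hour window length are illustrative assumptions, not DoorDash's experimentation setup.

```python
import hashlib

def window_start(ts_seconds, window_seconds=3600):
    """Start of the time window containing ts_seconds (hypothetical 1-hour windows)."""
    return ts_seconds - ts_seconds % window_seconds

def switchback_arm(location_id, window, salt="switchback-v1"):
    """Deterministically assign a (location, time window) unit to 'bandit' or 'control'.

    Every message in that location during that window uses the same arm, so the
    effect on the local supply/demand balance can be compared across units.
    """
    key = f"{salt}:{location_id}:{window}".encode()
    digest = hashlib.sha256(key).digest()
    return "bandit" if digest[0] % 2 == 0 else "control"

now = 1_700_000_000  # example Unix timestamp
print(switchback_arm("location-123", window_start(now)))
```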

Conclusion
While we have tailored Thompson Sampling to a specific Dasher scenario, this solution can work in many different scenarios. Companies seeking to provide a personalized experience to all of their customers may have limited data to figure out how best to accomplish that. Thompson Sampling can help identify which options give the greatest reward in a non-stationary environment. The method works well in a quickly changing business environment where there's a need to dynamically optimize traffic. With a single model, we get the advantages of velocity, dynamic traffic allocation, and a solution that handles changing behavior over time.

While what we have done to date works well, there are many ways we
can improve upon this approach. Currently, we only consider whether a
Dasher signed on after receiving a message. But additional data lets us
know that Dashers’ preferences change based on their location, time of
day, day of week, and much more. Over time, we can encode this
information as contextual features so that the bandit can make even
smarter decisions.
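The simplest version of this idea, short of a full contextual bandit such as linear Thompson Sampling over feature vectors, is to keep a separate beta distribution per Dasher and context bucket. A minimal sketch follows; the bucket definition (weekday versus weekend, plus 6-hour blocks, with days numbered 0 = Monday through 6 = Sunday) is a hypothetical choice. Finer buckets capture more preference structure but spread each Dasher's data thinner.

```python
import random
from collections import defaultdict

def context_key(day_of_week, hour):
    """Hypothetical coarse context: weekday vs weekend (0=Mon..6=Sun), plus a 6-hour block."""
    return ("weekend" if day_of_week >= 5 else "weekday", hour // 6)

class ContextualBeliefs:
    """Keeps a separate Beta(alpha, beta) per (Dasher, context) pair."""

    def __init__(self, prior=(1.0, 1.0)):
        # Unseen (Dasher, context) pairs start from a fresh copy of the prior.
        self.table = defaultdict(lambda: list(prior))

    def update(self, dasher, ctx, converted):
        counts = self.table[(dasher, ctx)]
        counts[0 if converted else 1] += 1

    def sample(self, dasher, ctx, rng):
        alpha, beta = self.table[(dasher, ctx)]
        return rng.betavariate(alpha, beta)

beliefs = ContextualBeliefs()
saturday_evening = context_key(5, 19)
beliefs.update("d1", saturday_evening, converted=True)
print(beliefs.sample("d1", saturday_evening, random.Random(0)))
```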

Acknowledgements
This post is based in large part on the great work of our intern Hamlin Liu.
We are excited to have him join us full time in August!
