Learning in Repeated Auctions
Abstract
Online auctions are one of the most fundamental facets of the modern economy
and power an industry generating hundreds of billions of dollars a year in revenue.
Auction theory has historically focused on the question of designing the best way to
sell a single item to potential buyers, with the concurrent objectives of maximizing
revenue generated or welfare created. Theoretical results in this area have typically
relied on some Bayesian prior knowledge that agents were assumed to have about each other.
This assumption is no longer satisfied in new markets such as online advertising:
similar items are sold repeatedly, and agents are unaware of each other or might try to
manipulate each other. On the other hand, statistical learning theory now provides
tools to supplement those missing pieces of information given enough data, as agents
can learn from their environment to improve their strategies.
This survey covers recent advances in learning in repeated auctions, starting from
the traditional economic study of optimal one-shot auctions with a Bayesian prior.
We then focus on the question of learning optimal mechanisms from a dataset of
bidders’ past values. The sample complexity as well as the computational efficiency
of different methods will be studied. We will also investigate online variants where
gathering data has a cost to be accounted for, either by seller or buyers ("earning while
learning"). Later in the survey, we will further assume that bidders are also adaptive
to the mechanism as they interact repeatedly with the same seller. We will show how
strategic agents can actually manipulate repeated auctions, to their own advantage.
A particularly interesting example is that of reserve price improvements for strategic
buyers in second price auctions.
All the questions discussed in this survey are grounded in real-world applications
and many of the ideas and algorithms we describe are used every day to power the
Internet economy.
1 Introduction: scope and motivation
The main purpose of auction theory is to construct a set of rules that will be used by a
seller to sell one or several items to a group of potential buyers, who will send messages
(or bids) to the seller – usually indicating how much they value the item or how much they
are willing to pay to acquire it. In almost all cases, it is sufficient to define only two rules.
First, the allocation rule describes which buyer wins the auction (if a unique non-divisible
item is sold), depending on the different messages received; if the item is divisible, the
allocation rule describes how the item is shared between winners. Second, the payment
rule indicates to buyers how much they are going to pay to the seller, again based on the
different messages. Those rules are known publicly before the auction starts, and they
influence the behavior, or strategy, of the different buyers.
When choosing an allocation and a payment rule, the seller might have several objectives and
constraints to respect: 1) maximizing the revenue she is getting from the auction (revenue
maximization); 2) ensuring that buyers have an incentive to participate in the auction
(individual rationality); 3) ensuring that, given the rules of
the auction, it is in the best interest of buyers to reveal how much they truly value an item
(incentive compatibility), as it may make revenue maximization easier. On the other side of
the game, the buyers strategically adapt the bids sent to the seller depending on the auction
rules in order to maximize their own utility.
Historically, auctions have often been designed so that buyers have an incentive to
bid in a way that reflects how much they truly value the items that are for sale. This
constraint still leaves plenty of choices for auction design, and a large part of the literature
has focused on designing auctions that maximize the seller’s revenue, assuming buyers
are rational. However, with the advent of the Internet and the automation of auctions, the
landscape of possible applications has changed drastically, necessitating more complex
settings to accurately study the incentives and behaviors at play. More recently, the auction
literature has aimed at understanding how the design of an auction platform impacts
seller’s revenue, the global welfare and the behavior of buyers and sellers in contexts where
sellers (and sometimes buyers) participate in a very large number of auctions each day.
These setups reflect situations appearing in modern online marketplaces.
owned privately by the buyers and the information that the seller has on each buyer. This
information owned privately by the buyers is the value they give to the item, i.e., the highest
price they are willing to pay to get the item. The uncertainty about these different values
lies at the heart of the seller’s optimization problem: otherwise, she would just have to sell
the item to the buyer with the highest value, at this price or infinitesimally less.
To handle this lack of information about buyers, it is standard to take a “Bayesian”
viewpoint and assume that the seller has some probabilistic prior on the values given to
the item by each bidder. This prior distribution is usually called the value distribution and
it encompasses the seller’s uncertainty on a specific bidder’s values. There are of course
several possibilities for how this value distribution is constructed. For instance, in wine or
art auctions, it often comes from expert knowledge about an admissible price for a good
wine bottle or for an important piece of art.
improve their bidding strategies against automated mechanisms. This flood of data and the
associated paradigm shift it constitutes opens many new interesting practical problems,
new theoretical questions and new interesting games to study.
The first natural repeated game setting consists in understanding how the seller can
learn a revenue-maximizing auction mechanism from a dataset of bids or values. In the
example of the Ebay marketplace, the seller (Ebay) observes numerous auctions a day for
similar items. Hence, from her point of view, the mechanism is repeated and she can
aim at optimizing some long-term revenue. In contrast, buyers are individuals who
participate in a few auctions at best, if not a single one. From their point of view, the
mechanism thus still looks like a one-shot auction and they are bound to implement myopic
short-term strategies, optimizing their utility auction by auction (as opposed to over the long term and
effectively in expectation). Let us consider the simplifying assumption where bidder values
on the platform are sampled from a certain unknown distribution, which encompasses the
variability in their readiness to pay a certain price. Assuming the bidders actually bid their
true value (for instance, if the mechanism chosen is fixed and “incentive-compatible”, i.e.,
bidding one’s value is optimal for buyers), the seller then has access, at the end of the day,
to a dataset of buyer values.
Inspired by the computational learning formalism, Elkind (2007), Balcan et al. (2008),
and Cole and Roughgarden (2014a) initiated a line of research aiming at finding approxi-
mations of the revenue-maximizing auction, if possible, efficiently, with approximation
guarantees depending on the size of the dataset gathered (a.k.a., the sample complexity).
This setting is called the batch learning setting. A variant considers the case where buyers
arrive continuously on the platform and the seller can continuously update
her mechanism. This is the online learning setting introduced by Cesa-Bianchi et al. (2014).
In all these problems, it is crucial that the samples gathered in the dataset do have the
same distribution as the samples that will be gathered and treated in the future.
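To make the batch learning setting concrete, the following Python sketch computes an empirical revenue maximizer in the simplest possible case: a single buyer facing a posted price. The Uniform[0, 1] value distribution, the sample size and the function names are illustrative assumptions and are not taken from the works cited above.

```python
import random

def empirical_revenue(price, values):
    """Average revenue of posting `price` to each sampled buyer value."""
    return price * sum(v >= price for v in values) / len(values)

def erm_price(values):
    """Price maximizing the empirical revenue; it suffices to try the sampled values."""
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    # Posting the k-th largest value sells to at least k of the n sampled buyers.
    best_k = max(range(1, n + 1), key=lambda k: ordered[k - 1] * k / n)
    return ordered[best_k - 1]

random.seed(0)
samples = [random.random() for _ in range(5000)]  # assumed Uniform[0, 1] values
p_hat = erm_price(samples)
print(p_hat, empirical_revenue(p_hat, samples))
```

For the uniform distribution the revenue curve p(1 - p) is maximized at p = 1/2, so the learned price should approach 1/2 as the dataset grows; how fast it does so is precisely the sample complexity question.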
Indeed, most companies willing to display ads actually rely on third parties, demand-
side platforms (DSPs), that buy and display ads for them (because of technical
constraints, even sending bids in real time might actually be quite complex). These aggregated
bidders are repeatedly interacting with the (same) seller, billions of times a day.
Consequently, this type of buyer can also optimize for long-term utility and need not
be myopic. Thus, even if the seller is using one-shot incentive-compatible auctions (for
instance to gather data in order to later design and switch to a revenue-maximizing mechanism),
the bidder might have an interest in not bidding “truthfully”, even though classical theory
would suggest that truthful bidding is optimal for them. Indeed, if buyers do not bid their values, this will
modify the distribution of “values” observed by the seller. Subsequently, the mechanism
chosen to optimize her revenue will be different from what it would have been had bidders
been naïve, to the advantage of the buyers (Tang and Zeng, 2018; Nedelec et al., 2019b).
Intuitively, this is possible because the information asymmetry that arose in the Ebay
example between the seller and the bidders – one optimizing over the long-term, the other
over the short-term – is almost reversed. If the seller must commit to a specific mechanism
or a family of mechanisms, for instance for contractual reasons, and buyers have this
information, they can strategically leverage it by e.g. changing their bidding behavior.
In the end, the respective utilities of the seller and buyers will somehow depend on the
underlying amount of asymmetry between them. Several works have started studying
various intermediate settings, for example when bidders are (almost) identical (Kanoria
and Nazerzadeh, 2014), or are patient, but not as patient as the seller (Amin et al., 2013),
etc.
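A stylized illustration of this reversal, under assumptions that are ours and not those of the cited papers: a single buyer with Uniform[0, 1] values bids alpha times his value, and a naive seller treats the observed bids as values and posts the corresponding monopoly price as a reserve. The sketch below compares the buyer's expected utility with and without shading.

```python
def learned_reserve(alpha):
    """Monopoly price for bids alpha * x, x ~ Uniform[0, 1]: argmax_r r * (1 - r / alpha)."""
    return alpha / 2

def buyer_utility(alpha, grid=100_000):
    """Expected utility of a buyer bidding alpha * x against the reserve learned from his bids."""
    r = learned_reserve(alpha)
    total = 0.0
    for k in range(grid):
        x = (k + 0.5) / grid      # midpoint rule over values in [0, 1]
        if alpha * x >= r:        # the bid clears the reserve
            total += x - r        # he wins and pays the reserve
    return total / grid

truthful = buyer_utility(1.0)     # alpha = 1: truthful bidding
shaded = buyer_utility(0.5)       # alpha = 1/2: bids are halved
print(truthful, shaded)
```

In this toy model the buyer still wins exactly when his value exceeds 1/2, but he pays a lower reserve, so shading increases his utility at the expense of the seller.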
is the revenue-maximizing auction once the seller has a prior on bidder’s valuations and
introduce some approximations of the revenue-maximizing auction when the seller must
use simpler auctions. In Chapter 3, we focus on the setting derived from the Ebay use
case and tackle both the batch learning setting and the online learning setting. We recall
some key concepts of statistical learning theory, derive the sample complexity of some
of the learning algorithms used to compute a revenue-maximizing auction and show
their computational complexity. In Chapter 4, we focus on the less studied but crucially
important setting where bidders can be strategic regarding the mechanism itself since
they have multiple interactions with the seller. We review some of the main methods
that have been devised to keep bidders from being strategic in that context, show their
limitations and introduce some very new results and approaches developed for bidders to
take advantage of the seller’s learning process.
This survey only assumes basic familiarity with standard notions of Machine Learning,
Statistics and Data Science and is written with a reader having this background in mind.
We hope our survey will be useful to engineers and researchers looking for an introduction
to the beautiful and fast developing topics of modern auction theory and applications.
References
Amin, K., A. Rostamizadeh, and U. Syed. 2013. “Learning prices for repeated auctions
with strategic buyers”. In: Proceedings of the 26th International Conference on Neural
Information Processing Systems-Volume 1. 1169–1177.
Balcan, M.-F., A. Blum, J. D. Hartline, and Y. Mansour. 2008. “Reducing mechanism design
to algorithm design via machine learning”. Journal of Computer and System Sciences.
74(8): 1245–1270.
Cesa-Bianchi, N., C. Gentile, and Y. Mansour. 2014. “Regret minimization for reserve prices
in second-price auctions”. IEEE Transactions on Information Theory. 61(1): 549–564.
Cole, R. and T. Roughgarden. 2014a. “The sample complexity of revenue maximization”. In:
Proceedings of the forty-sixth annual ACM symposium on Theory of computing. 243–252.
Elkind, E. 2007. “Designing and learning optimal finite support auctions”. In: Proceedings
of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. 736–745.
Kanoria, Y. and H. Nazerzadeh. 2014. “Dynamic Reserve Prices for Repeated Auctions:
Learning from Bids”. In: Web and Internet Economics: 10th International Conference.
Vol. 8877. Springer. 232.
Myerson, R. B. 1981. “Optimal auction design”. Mathematics of operations research. 6(1):
58–73.
Nedelec, T., N. El Karoui, and V. Perchet. 2019b. “Learning to bid in revenue-maximizing
auctions”. In: International Conference on Machine Learning. PMLR. 4781–4789.
Ostrovsky, M. and M. Schwarz. 2011. “Reserve prices in internet advertising auctions: A
field experiment”. In: Proceedings of the 12th ACM conference on Electronic commerce.
59–60.
Riley, J. G. and W. F. Samuelson. 1981a. “Optimal auctions”. The American Economic Review.
71(3): 381–392.
Tang, P. and Y. Zeng. 2018. “The price of prior dependence in auctions”. In: Proceedings of
the 2018 ACM Conference on Economics and Computation. 485–502.
Vickrey, W. 1961. “Counterspeculation, auctions, and competitive sealed tenders”. In: The
Journal of finance. Vol. 16. No. 1. Wiley Online Library.
2 Bayesian mechanism design
First read of this chapter, key concepts and ideas
Auction mechanisms involve many different agents, sellers and/or buyers, with possibly
different and conflicting objectives as they all seek to optimize their own utility functions.
These interactions can be modeled using game-theoretic concepts. More specifically, we
are going to focus on a specific type of game with incomplete information, called
mechanisms. In those games, each player has some private information (i.e., unknown to
everyone else) and sends a message to a central authority. Based on those gathered messages,
the latter decides on the final outcome. The utility of each player then solely depends on
this outcome.
Mechanisms model appropriately many practical situations such as the celebrated
problems of assigning students to schools, or matchings in organ-transplant applications.
In these problems, a central authority forms pairs between schools and students or donors
and recipients. In selling mechanisms, the central authority is the seller of a specific item
and the players are the buyers. Auctions are mechanisms used to sell a particular item. In
the case of a sale of a single non-divisible item, they have the following specific features.
Auctions are games of incomplete information as each buyer has some private valuation
for the item to be sold, i.e., the highest price they are willing to pay to acquire this item.
This valuation might be different from one buyer to another. We denote by N the set of
buyers, of cardinality n ∈ N, by Xi ⊂ R the set of possible private values of bidder i ∈ N ,
by xi ∈ Xi the actual private value of bidder i and by $\mathbf{x} = (x_1, \dots, x_n) \in \mathcal{X} := \prod_{i} \mathcal{X}_i \subset \mathbb{R}^n$ the
so-called profile of private values; the bold notation will refer to vectors for the sake of
clarity. The possible messages (or actions) of buyer i ∈ N are called “bids” and the set of
bids of buyer i is denoted by Bi . The outcome of an auction mechanism is defined by two
different rules:
We will denote by A the set of all auction mechanisms, i.e., the set of pairs of alloca-
tion/payment rules.
The utility of bidder i is simply the difference between the item value (if he won the
auction, and 0 otherwise) and his payment (that can be positive even if the auction is lost).
If we denote by b = (b1 , . . . , bn ) ∈ B the vector of bids of the bidders, the expected utility of
bidder i ∈ N , given bids and values , is then defined as
On the other hand, the seller aims at maximizing the expectation of her revenue defined,
given the bids, by:
$$\sum_{i=1}^{n} p_i(\mathbf{b}) .$$
to get optimality results (and not just ε-optimality). A crucial assumption throughout this
survey is that, unless otherwise noted, the values xi are drawn independently for different
i’s and hence they are statistically independent as random variables. We shall denote by
F = F1 ⊗ . . . ⊗ Fn the joint (and hence product) distribution of x = (x1 , . . . , xn ), the vector of
values.
Examples Typical examples of value distributions are the uniform distribution, which is
widely used in textbook examples due to its simplicity, the exponential and the log-normal
distributions as they are similar to some empirical distributions encountered on modern
internet platforms. Power law distributions (also known as Pareto distributions) are also
widely used, as they capture the idea that the value in real time bidding and online advertis-
ing comes from few matches of very high quality, such as consumers who recently viewed
a product (Arnosti et al., 2016) (this situation where 20% of individuals own/generate
80% of wealth is also referred to as the Pareto principle in economics). Generalized Pareto
distributions are also often used as examples because their virtual value - an important
concept we will define later - is linear.
Definition 2.1.1 (Symmetric setting). The auction setting is called symmetric if all bidders
have the same value distribution.
For the sake of clarity, we use pronouns her/she for the seller and he/his for one specific
bidder.
Gi is differentiable and its density is strictly positive except possibly on a set of measure
zero. We also use the standard convention that a function β is increasing, if for all l < u,
β(l) < β(u); on the other hand, a non-decreasing function is such that if l < u, β(l) ≤ β(u).
We use ∂f to denote the subdifferential of a convex function f (see Hiriart-Urruty and
Lemaréchal, 2001, Chapter D). Finally, ⟨·, ·⟩ is the standard dot product between two vectors
and for any integer N ∈ N, we define [N ] = {1, . . . , N }.
• For ex-ante properties, bidders do not know yet their own value for the item; i.e.,
they only know Fi and F−i .
• For interim properties, bidders know their valuation but do not know the values of
other players, i.e., they know xi and F−i .
• For ex-post properties, bidders know both their own and the other players’ valuations, i.e.,
they know xi and x−i .
In the next sections, we will mostly focus on interim properties. It is the starting point of
most of the auction literature: we assume that value distributions are common knowledge
and that exact valuations are private information to each bidder. We will mention explicitly
when we refer to ex-ante or ex-post properties.
We are also assuming that bidders are risk-neutral. In other words, they seek to maxi-
mize their expected utility and use utility-maximizing strategies: Given the strategy β −i of
other players, the expected utility of the strategy βi given bidder i’s value xi is denoted by
$$U_i(\beta_i, \beta_{-i}, x_i) = \mathbb{E}_{x_{-i} \sim F_{-i}}\Big[u_i\big((\beta_1(x_1), \dots, \beta_{i-1}(x_{i-1}), \beta_i(x_i), \beta_{i+1}(x_{i+1}), \dots, \beta_n(x_n)), x_i\big)\Big] .$$
Optimality and characterization of strategies Maybe one of the most central concepts
in game theory is (Bayesian) Nash equilibrium. At a Bayesian Nash Equilibrium, for any
bidder i, his strategy βi maximizes his expected utility, given his valuation distribution Fi ,
and given the strategies of his opponents and their valuation distributions, i.e., β −i and F−i .
A stronger concept is that of weak dominance: a strategy βi is weakly dominant when it is
optimal in terms of expected utility of bidder i against any strategies used in β −i and not
only those at a Bayesian Nash Equilibrium. The strongest concept is ex-post dominance,
where optimality is achieved at any possible profile of valuations.
∀β ∈ Bi , ∀xi ∈ Xi , ∀β −i ∈ B−i , ∀x−i ∈ X −i , ui ((βi (xi ), β −i (x−i )), xi ) ≥ ui ((β(xi ), β −i (x−i )), xi ) .
Those properties of strategies are classical concepts in game theory. On the other hand,
it is also possible to introduce and study different properties of mechanisms. Some of them
require the concept of “truthful bidding”, which corresponds to the specific strategy βi (x) = x.
We will denote by βi,tr this truthful strategy.
Characterization of mechanisms
(Standard) if it allocates the item to the buyer with the highest bid.
(Efficient) if it allocates the item to the buyer with the highest valuation (at least at some
equilibrium).
A DSIC mechanism is obviously a BIC mechanism. More generally, Incentive Compati-
ble (IC) auctions have the nice property of being “simple” for the buyers from a strategic
standpoint: bidding their (known in the interim setting) valuation is optimal for them. No-
tice that this unfortunately does not ensure the uniqueness of the equilibrium where each
bidder bids truthfully (we will call this equilibrium the truthful equilibrium). See Section
2.6 for more details. Like most authors, we restrict attention to the truthful equilibrium
from now on and leave more pathological equilibria aside. As we will see later, being DSIC is
one of the main reasons explaining the tremendous success of second-price auctions in
practice. Another reason is that if bidders are bidding truthfully, then the seller can, in
a first step, elicit their value distributions through a DSIC mechanism and then move to
another mechanism that maximizes her revenue (this is detailed in Section 2.4).
Finally, before presenting and analyzing two classical types of auctions, we indicate
that individual rationality simply ensures that bidders have an interest in taking part in
these auctions.
Theorem 2.3. The second-price auction is DSIC. In other words, bidding truthfully is
weakly dominant.
Proof. Let us denote by $x_i$ the private value of bidder $i$ and by $b^*_{-i}$ the highest bid of the
competition. We are going to compare the utility of bidding $b$ instead of $x_i$.
• Case $b > x_i$. The only case where his (ex-post) utility is changed is when $x_i < b^*_{-i} < b$.
With a bid $b$, he now wins the auction but his utility is negative since $x_i - b^*_{-i} < 0$.
• Case $b < x_i$. The only case where his (ex-post) utility is changed is when $x_i > b^*_{-i} > b$.
With a bid $b$, he now loses a profitable (i.e., with a positive utility) auction.
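The case analysis in this proof can be checked by brute force on a grid. The Python sketch below is only an illustration: the grid, the tie-breaking rule (ties are lost) and the function name are assumptions made for the example.

```python
def second_price_utility(bid, value, top_competing_bid):
    """Ex-post utility of one bidder in a second-price auction (ties are lost, by assumption)."""
    if bid > top_competing_bid:
        return value - top_competing_bid   # the winner pays the highest competing bid
    return 0.0

grid = [k / 20 for k in range(21)]         # values, bids and competing bids in [0, 1]

# Collect every (value, bid, competing bid) triple where deviating beats truthful bidding.
violations = [
    (x, b, y)
    for x in grid for b in grid for y in grid
    if second_price_utility(b, x, y) > second_price_utility(x, x, y) + 1e-12
]
print(len(violations))
```

An empty list of violations means that, on this grid, no deviation from truthful bidding is ever strictly profitable, whatever the competing bid.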
The classical second-price auction was used until recently (Feng et al., 2021) by most of
the biggest online platforms to sell ad placements on publishers’ websites. Another widely
used and studied sealed-bid auction is the first-price auction.
The first-price auction allocates the item to the highest bidder who pays his own bid.
Before studying Nash equilibria of symmetric first-price auction, we derive the general
best reply of player i to the bid distribution of the competition, specifically the distribution
of the maximum bid of the competition. This result is of increasing interest to practitioners
as many online auctions are now first price auctions.
Proposition 2.4. Let $G_i$ be the cdf of the highest bid of the competition of bidder $i$, i.e.,
$\max_{j \neq i} b_j$. In a sealed-bid first price auction, a best response of bidder $i$ to $G_i$ is any mapping
$\beta_i$ satisfying
$$\beta_i(x_i) \in \operatorname*{argmax}_{b \in \mathbb{R}} \; G_i(b)(x_i - b) .$$
When $G_i$ is log-concave and $G_i(x_i) > 0$, the best response is unique. If we further assume
that $G_i$ has a pdf $g_i$, first order conditions also give, if $G_i(x_i) > 0$,
$$\beta_i(x_i) \text{ is a solution (in } b \text{) of } \quad b + \frac{G_i(b)}{g_i(b)} = x_i .$$
If $G_i(x_i) = 0$, a best response is $\beta_i(x_i) = x_i$.
Calling $Y_{-i}$ a random variable with cdf $G_i$, it can also be shown under mild technical
conditions that $\beta_i$ is increasing and satisfies the equation $\beta_i(x) = \mathbb{E}\left[\beta_i^{-1}(Y_{-i}) \mid Y_{-i} \le \beta_i(x)\right]$.
We also have the following interesting corollary.
Corollary 2.5. Proposition 2.4 implies that the first price auction is in general not BIC.
The corollary simply follows by showing that the best response of bidder i when all
other bidders bid truthfully (and hence the top bid of the competition is the largest value
of the other bidders) consists in bidding something other than xi . If Gi (xi ) > 0, it follows
immediately that there are better strategies than bidding bi = xi : for instance, take any
b̃i such that Gi (b̃i ) = Gi (xi )/2 (b̃i exists by continuity of Gi and Gi (ai ) = 0). The utility of
bidder i is strictly positive at b̃i and is 0 at xi .
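As a numerical illustration of the proposition and of this corollary, the following Python sketch maximizes the expected utility, i.e., the cdf of the highest competing bid evaluated at b multiplied by (x - b), on a grid of bids. The setting (two competitors bidding truthfully with Uniform[0, 1] values, so that the cdf of their highest bid is b squared) and the function names are assumptions chosen for the example.

```python
def best_response(value, cdf, grid_size=10_000):
    """Grid search for argmax_b cdf(b) * (value - b) over bids b in [0, value]."""
    bids = [value * k / grid_size for k in range(grid_size + 1)]
    return max(bids, key=lambda b: cdf(b) * (value - b))

def G(b):
    """Assumed cdf of the highest of two Uniform[0, 1] competing bids."""
    return min(max(b, 0.0), 1.0) ** 2

x = 0.9
b_star = best_response(x, G)
utility_best = G(b_star) * (x - b_star)
utility_truthful = G(x) * (x - x)   # bidding one's value yields zero utility
print(b_star, utility_best, utility_truthful)
```

Here the first order condition reads b + b/2 = x, so the grid search should return a bid close to 2x/3, strictly below the value and with a strictly positive expected utility.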
Proof. Let us denote by Y−i the random variable corresponding to the maximum bid of the
competition of bidder i, so that its cdf and pdf are Gi and gi .
When bidder i has private value xi and bids bi , the utility he derives from the auction
is ui (bi , xi ) = (xi − bi )1{bi > Y−i }; in other words, it is his value minus his cost when he wins
the auction and zero otherwise.
We denote by Ui (bi , xi ) : R+ × R+ → R the associated expected utility of bidder i when
his private value is xi and he bids bi . We have
$$U_i(b_i, x_i) = \mathbb{E}_{Y_{-i}}\left[(x_i - b_i)\mathbb{1}\{b_i > Y_{-i}\}\right] = G_i(b_i)(x_i - b_i) .$$
A best response is therefore any b(xi ) ∈ argmaxb∈R+ Gi (b)(xi − b). Note that when Gi is
log-concave and Gi (xi ) > 0, we can verify by inspection that Ui (bi , xi ) is strictly log-concave
in bi on the support of Fi and therefore it has a unique maximum smaller than xi (Boyd
and Vandenberghe, 2004). This property follows also immediately from the definition of a
strictly concave function.
Since $G_i$ has a pdf, it is continuous and differentiable and therefore so is $U_i(b_i, x_i)$ as a
function of $b_i$. The derivative with respect to $b_i$ is then equal to
$$\frac{\partial U_i}{\partial b_i}(b_i, x_i) = g_i(b_i)(x_i - b_i) - G_i(b_i) .$$
Let us assume that $G_i(x_i) > 0$; then $U_i(x_i, x_i) = 0$ but $\left.\frac{\partial U_i(b_i, x_i)}{\partial b_i}\right|_{b_i = x_i} < 0$. This first implies by
continuity that there exist bids where the utility is positive. Rolle’s theorem applied to
$t \mapsto G_i(t)(x_i - t)$ also gives the existence of a stationary point $b_i^* \le x_i$ where $\frac{\partial U_i}{\partial b_i}(b_i^*, x_i) = 0$,
as $U_i(0, x_i) = 0$, too (since we assumed non-negative bids). Since we showed above that
$U_i(b_i, x_i)$ is positive somewhere in a neighborhood of $x_i$, then necessarily $U_i(b_i^*, x_i) > 0$.
Finally, if $g_i(b_i^*) = 0$ then this would imply that $G_i(b_i^*) = 0$ to satisfy the first order condition
and therefore $U_i(b_i^*, x_i)$ would be equal to 0 which is impossible.
The case where Gi (xi ) = 0 is trivial as bidder i cannot have a positive utility (recall that
Gi is a non-decreasing and non-negative function, so 0 ≤ Gi (bi ) ≤ Gi (xi ) if bi ≤ xi ). Bidding
xi is then optimal.
There is a very rich line of work focusing on deriving Nash equilibria in first-price
auctions when bidders have different value distributions. This involves solving complex
systems of coupled first-order differential equations (at least with continuous value distri-
butions, see Krishna, 2009, Section 4.3; see also p. 18). On the other hand, with symmetric
bidders, i.e., with identical value distributions, it is possible to solve explicitly this system
of equations and to derive the unique symmetric Nash equilibrium with increasing strategy.
From now on, we will call a Nash equilibrium increasing if the strategies are all increasing
mappings.
Theorem 2.6. In the symmetric case, if the common pdf $f$ is such that $f(x) > 0$ (except on
a set of Lebesgue measure 0 within the support of $F$), there exists a symmetric increasing
Nash equilibrium whose strategy is described by:
$$\beta(x) = \mathbb{E}\left[x^{(1)}_{-i} \,\middle|\, x^{(1)}_{-i} < x\right] ,$$
where $x^{(1)}_{-i}$ is the highest value among bidders except bidder $i$.
This bidding strategy can be interpreted as bidding the expectation of the largest value
of the competition, conditionally on the fact that this value is smaller than bidder i’s value.
We note that this bidding strategy can be derived from the proof of the revenue-equivalence
Theorem 2.8, and specifically the expected payment formula. This is another common
method for finding equilibrium bidding strategies. The proof presented below might lead
more directly to the solution.
Example. Suppose there are $n \ge 2$ bidders, and they all have a uniform [0, 1] value distribution,
i.e., $F(x) = x$ on $[0, 1]$. Then a symmetric increasing Nash equilibrium exists in first-price
auctions where all bidders bid using the strategy
$$\beta(x) = \frac{n-1}{n}\, x .$$
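The uniform example can be checked against the conditional expectation of Theorem 2.6 by simulation. The Python sketch below is a Monte Carlo estimate; the number of draws, the seed, the function name and the particular choice of n = 3 bidders and value x = 0.8 are arbitrary assumptions.

```python
import random

def equilibrium_bid_mc(x, n, draws=200_000, seed=0):
    """Estimate E[highest competing value | it is below x] for n Uniform[0, 1] bidders."""
    rng = random.Random(seed)
    kept = []
    for _ in range(draws):
        top = max(rng.random() for _ in range(n - 1))  # highest value among the competition
        if top < x:
            kept.append(top)
    return sum(kept) / len(kept)

n, x = 3, 0.8
estimate = equilibrium_bid_mc(x, n)
closed_form = (n - 1) / n * x
print(estimate, closed_form)
```

Up to simulation noise, the estimate should agree with the closed-form bid (n - 1) x / n.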
Proof. We assume that n ≥ 2 and that all bidders are using the function β described
above. As we will show below, this function is increasing on the support of F under our
assumptions. Furthermore, when all bidders are using the same increasing strategy on the
union of the support of their value distributions, the probability that bidder i wins the
auction is the same as the probability that he has the highest value; this would not always
be true if the strategy were only non-decreasing.
Furthermore, elementary properties of conditional expectations give, if $G_i$ and $g_i$ are
the cdf and pdf of $x^{(1)}_{-i}$,
$$\beta(x) = \beta_i(x) = \frac{\int_0^x y\, g_i(y)\,dy}{G_i(x)} = x - \frac{\int_0^x G_i(u)\,du}{G_i(x)} , \quad \text{whenever } G_i(x) > 0 .$$
Under our assumptions, $G_i = F^{n-1}$, thus $g_i(x) = (n-1) f(x) F^{n-2}(x)$ and $g_i(x) > 0$ on the
support of $F$, except possibly on a set of Lebesgue measure 0. As a consequence,
$$\beta'(x) = \frac{g_i(x) \int_0^x G_i(u)\,du}{[G_i(x)]^2}$$
and therefore, restricted to the support of F, β is a non-decreasing function whose derivative
is 0 only on a set of Lebesgue measure 0. We conclude that β is actually increasing on the
support of F. Since the latter is supposed to be [0, H], with H possibly infinite, bidder i
has no incentive to bid higher than β(H). Then, any other bid b will satisfy b ∈ [0, β(H)]
and since β(0) = 0 and β is continuous, because F is continuous, there must exist z ∈ [0, H]
such that b = β(z). Finally, note that the probability that bidder i wins the auction when
bidding β(z) is just Gi (z), since β is increasing on the support of F. Therefore,
$$U_i(\beta(z), x_i) = G_i(z)\,(x_i - \beta(z)) = G_i(z)\,(x_i - z) + \int_0^z G_i(u)\,du ,$$
so that
$$U_i(\beta(x_i), x_i) - U_i(\beta(z), x_i) = G_i(z)\,(z - x_i) - \int_{x_i}^{z} G_i(u)\,du \ge 0 ,$$
since $G_i$ is non-decreasing.
This shows that β is the best response, and thus (β, . . . , β) is a symmetric (increasing)
Nash equilibrium. We will prove uniqueness of this increasing differentiable symmetric Nash
equilibrium in symmetric first-price auctions in Section 2.3.
Unlike second-price auctions, first price auctions are not incentive compatible (see
e.g. Corollary 2.5). As a consequence, the strategy of a bidder at an equilibrium depends
on the bidding strategy of the other bidders, and ultimately on other bidders’ valuation
distributions (see Proposition 2.4). So, in practice, computing a good or optimal bidding
strategy would require estimating the distribution of the highest bid of the competition,
which can be very challenging. Nevertheless, because of their relative transparency for
bidders (who know ahead of time what they might pay if they win), first-price auctions
are increasingly used in online advertising auctions (Feng et al., 2021). However, optimal
bidding becomes much more complex for bidders than it is in second price or other
BIC/DSIC/“truthful” auctions.
Nash equilibrium in the asymmetric case Asymmetric first price auctions are much
more intricate than symmetric ones as the equilibrium strategy of each bidder depends in
a very subtle manner on the other bidders’ strategies (Lebrun, 1999). Indeed, let us assume
that the distributions F1 , . . . , Fn are supported on [h, H], have a density bounded away from
0 on (h, H) and possibly have a point mass at h.
Theorem 2.7 (Lebrun, 1999). Under these assumptions, there exist deterministic Nash
equilibrium strategies that are increasing. Let us denote them by $\beta_i$ and by $x_i(b) = \beta_i^{-1}(b) \ge b$
their inverse, i.e., the value inducing the bid $b$. Then the increasing functions $b \mapsto x_i(b)$
solve the system of differential equations:
$$\forall i \in \mathcal{N}, \quad \frac{\partial}{\partial b} \log\big(F_i(x_i(b))\big) = \frac{-1}{x_i(b) - b} + \frac{1}{n-1} \sum_{j=1}^{n} \frac{1}{x_j(b) - b} . \tag{2.1}$$
When the distributions are without atoms, the boundary conditions are $x_i(h) = h$ for all
$i \in \mathcal{N}$, and there exists $\eta > 0$ such that for all $i$, $\beta_i(H) = \eta$.
The boundary conditions mean that bidders bid their value at h and have a common
maximal bid (Fibich and Gavious, 2003).
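As a sanity check of system (2.1) in the one case where the solution is known in closed form, the Python sketch below plugs in the symmetric uniform equilibrium, whose inverse bid function is x(b) = n b / (n - 1), and compares both sides numerically. It reads the right-hand side of (2.1) as minus 1/(x_i(b) - b) plus 1/(n - 1) times the sum over all bidders j of 1/(x_j(b) - b); the function names and the finite-difference step are assumptions.

```python
import math

def inverse_bid(b, n):
    """x(b) = n b / (n - 1): inverse of beta(x) = (n - 1) x / n for Uniform[0, 1] values."""
    return n * b / (n - 1)

def lhs(b, n, eps=1e-6):
    """Central finite difference of b -> log F(x(b)), with F(x) = x on [0, 1]."""
    return (math.log(inverse_bid(b + eps, n)) - math.log(inverse_bid(b - eps, n))) / (2 * eps)

def rhs(b, n):
    """Right-hand side of (2.1) when all n inverse bid functions coincide."""
    gap = inverse_bid(b, n) - b
    return -1 / gap + (1 / (n - 1)) * sum(1 / gap for _ in range(n))

residuals = [abs(lhs(b, n) - rhs(b, n)) for n in (2, 3, 5) for b in (0.1, 0.2, 0.4)]
print(max(residuals))
```

In this symmetric case both sides reduce to 1/b, and the boundary conditions hold with h = 0 and a common maximal bid (n - 1)/n; the asymmetric case has no such closed form in general.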
Numerical issues in computing Nash equilibrium Finding the solution to the differen-
tial system (2.1) is considered hard essentially because the solutions are unstable near the
boundary (Marshall et al., 1994). The case of n = 2 bidders has received a fair amount
of attention from both theoretical and numerical perspectives (Fibich and Gavish,
2012). Another approach finds yet another form of the differential system of equations
and expands the functions appearing in it in a fixed polynomial basis (Gayle and Richard,
2008).
Remark. The revenue equivalence Theorem 2.8 applies in particular to first and second
price auctions, in the symmetric case where all bidders have the same value distribution
(Myerson, 1981) and use an increasing strategy at equilibrium. The symmetry assumption
matters: there exist asymmetric cases where the first-price auction brings more revenue
than the second-price auction and vice-versa (Krishna, 2009, Section 4.3.2). We give in
Section 2.6 an example showing that the assumption that β is increasing is crucial and
cannot be dispensed with.
when he bids β(t) and the other bidders are using the same strategy β. As before, we denote
by $G_i$ the distribution of $x_{-i}^{(1)}$, the highest value among all the bidders except bidder i. Using
the fact that β is increasing, that all players use this strategy - at the Nash equilibrium -
and the fact that the auction is standard, the probability that bidder i wins when he bids
β(t) is Gi (t).
We can still assume that a deviation of bidder i consists in bidding β(z) instead of β(xi )
because the auction is standard and β is increasing. In particular, the expected utility for
bidder i of this deviation can be written as
We now introduce the mapping $V_i(x_i) = U_i(x_i, x_i)$ that is convex, as the maximum of
linear functions, and hence almost everywhere differentiable (Hiriart-Urruty and
Lemaréchal, 2001). Recall also that (Lemma 4.4.1 in (Hiriart-Urruty and Lemaréchal,
2001)) the subdifferential at some x of a supremum of convex functions contains the
convex hull of the subdifferentials of the functions achieving this supremum (and is empty
if the supremum is not achieved). Before proving formally the result, let us give some
intuitions. The function $V_i$ we consider is the maximum (over z) of linear mappings in x,
whose derivatives are simply $G_i(z)$. As a consequence, when $V_i$ is differentiable at x it holds
that $V_i'(x) = G_i(z^*(x))$, where $z^*(x) = \arg\max_z U_i(z, x)$, which suggests that “$V_i'(x) = G_i(x)$”.
This intuition is formalized thanks to the envelope theorem (Milgrom and Segal, 2002)
that holds because $U_i(z, \cdot)$ is linear for all z and hence differentiable and therefore absolutely
continuous. Furthermore, $\frac{\partial U_i(z, x)}{\partial x} = G_i(z) \in [0, 1]$ for all z. Since the maximum of $z \mapsto U_i(z, x_i)$
is attained for $z = x_i$ at the Nash equilibrium, the envelope theorem finally states
\[
V_i(x) - V_i(0) = \int_0^x G_i(u)\,du .
\]
However, it also holds that $V_i(x) - V_i(0) = x G_i(x) - P_i(x) + P_i(0)$. Thus, since $P_i(0) = 0$ because
of 0-rationality, we finally get that, integrating by parts,
\[
P_i(x) = P_i(0) + x G_i(x) - \int_0^x G_i(u)\,du = \int_0^x t\,dG_i(t) = G_i(x)\, E\big[x_{-i}^{(1)} \,\big|\, x_{-i}^{(1)} < x\big] .
\]
Hence the expected payment of bidder i is independent of the specific auction format. As
a consequence, so are the expected seller’s revenue and the expected utility of bidder i,
because the auction is standard.
Remark. This proof shows a principled way to get necessary conditions on bidding strate-
gies forming an increasing Nash equilibrium, through the payment formula derived above.
The proof of Theorem 2.8 is a bit formal and technical as it relies on convex analysis
arguments; however, it provides insight on how to easily and informally derive symmetric
equilibrium strategies. Denote by β the common strategy of an increasing Nash equilibrium
of the first-price auction, postulated at this point for this informal derivation to exist. Note
that by symmetry, Gj = Gi = G for all i, j and similarly for the pdfs. So we use the notations
G and g for cdf and pdf below. Conducting the same computations as in the proof of
Theorem 2.8, assume that all bidders but i follow β, i.e., bidder $j \ne i$ bids $\beta(x_j)$ if $x_j$ is his
value, and that bidder i is bidding β(z) instead of $\beta(x_i)$. Note that because all players are
using the strategy β and β is increasing, the probability that bidder i wins the auction is
exactly the probability that z is higher than the largest value of the competition. In other
words, the probability that he wins the auction is G(z). His utility in the specific case of a
first price auction can then be written as
\[
U_i(\beta(z), x_i) = (x_i - \beta(z))\, G(z) .
\]
At equilibrium this utility is maximized at $z = x_i$, so the first-order condition reads
\[
0 = \frac{\partial U_i}{\partial z}(\beta(z), x_i)\Big|_{z = x_i} = (x_i - \beta(x_i))\, g(x_i) - \beta'(x_i)\, G(x_i) .
\]
Notice that the above equation can be rewritten in the more compact form
\[
(\beta(x) G(x))' = x g(x) .
\]
Integrating the above equation and using the fact that G(0) = 0, we can compute explicitly
the symmetric equilibrium strategy
\[
\beta(x) = \frac{\int_{s=0}^{x} s\, g(s)\,ds}{G(x)} = E\big[X_{-1}^{(1)} \,\big|\, X_{-1}^{(1)} < x\big] .
\]
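As a sanity check of this formula, the following sketch evaluates $\beta(x) = \int_0^x s g(s)\,ds / G(x)$ with a midpoint rule for n i.i.d. U([0, 1]) bidders, for which $G(x) = x^{n-1}$. The function name and the number of integration steps are illustrative choices, not part of the text; in this uniform case the integral can also be computed by hand and gives $\beta(x) = (n-1)x/n$.

```python
def equilibrium_bid(x, n, steps=10_000):
    """Symmetric first-price equilibrium bid for n i.i.d. U([0, 1]) bidders.

    Evaluates beta(x) = (integral_0^x s g(s) ds) / G(x) with a midpoint rule,
    where G(s) = s**(n - 1) is the cdf of the highest competing value and
    g(s) = (n - 1) * s**(n - 2) is its density (uniform values are assumed).
    """
    if x <= 0.0:
        return 0.0
    h = x / steps
    integral = 0.0
    for k in range(steps):
        s = (k + 0.5) * h                       # midpoint of the k-th sub-interval
        integral += s * (n - 1) * s ** (n - 2) * h
    return integral / x ** (n - 1)


# Compare the numerical value with the hand-computed (n - 1) / n * x.
for n in (2, 3, 5):
    print(n, equilibrium_bid(0.8, n), (n - 1) / n * 0.8)
```

Bidders shade their bid more when there are few competitors: with two bidders the bid is half the value, and it approaches the value as n grows.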
2.4 Deriving revenue-maximizing auctions
We now focus on how the seller can design her auction system to maximize her revenue. A
large part of the recent literature on auctions has focused on this objective since most
auctioneers have a choice in designing the rules of their respective auction platforms.
To compute the revenue-maximizing auction, we assume that the seller has
prior knowledge of the distribution $F_i$ of each bidder’s valuation. These value distributions
quantify the information that the seller has on each bidder.
2.4.2 The posted price setting: monopoly pricing
In this particularly simple though practically very common setting, designing an auction
simply reduces to a take-it-or-leave-it offer, also called posted price. In other words, a fixed
selling price is offered and a rational bidder will accept to pay it to acquire the item if and
only if the price is smaller than his valuation.
Lemma 2.10. In a posted price setting, the seller’s expected revenue is
\[
\Pi(r) = r(1 - F(r)) . \tag{2.2}
\]
In the same setting, when F is differentiable and has finite expectation, the optimal reserve
price, called the monopoly price, is a solution of:
\[
0 = -[r(1 - F(r))]' = r f(r) - (1 - F(r)) = f(r) \left( r - \frac{1 - F(r)}{f(r)} \right) .
\]
Proof. In expectation, the seller’s revenue is just the fixed price r multiplied by the
probability that the buyer buys the item. This latter probability is just the probability that
the value of the buyer is above r. Formally, if x is the value of the item for the buyer, the
revenue of the seller Π(r) when selling at the fixed price r is
\[
\Pi(r) = r\, P\{x \ge r\} = r \int_r^{+\infty} f(x)\,dx = r(1 - F(r)) .
\]
The result on the monopoly price follows from differentiating the previous relation, since
\[
\frac{d\Pi(r)}{dr} = (1 - F(r)) - r f(r) .
\]
Furthermore, choosing r = 0 or r arbitrarily high gives 0 revenue, the latter because F is
assumed to have finite expectation, which then implies that $\lim_{t \to \infty} t(1 - F(t)) = 0$ by the
dominated convergence theorem. Indeed, if x has distribution F, $t(1 - F(t)) \le E_{x \sim F}[x 1\{x \ge t\}] \le E_{x \sim F}[x] < \infty$.
Hence a maximum of the function Π exists among its stationary points,
finishing the proof.
Notice that without the assumption that F has a finite expectation, the optimal reserve
price could be arbitrarily high: take for instance $F(r) = 1 - r^{-\alpha}$ with 0 < α < 1. Such an
example might actually be relevant in luxury items markets.
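To make Lemma 2.10 concrete, here is a minimal sketch that locates the monopoly price by a grid search on $\Pi(r) = r(1 - F(r))$. The function names, the grid and the three test distributions are illustrative assumptions; in particular the heavy-tailed cdf is taken to be $1 - r^{-1/2}$ on $[1, \infty)$, one possible reading of the counter-example above.

```python
import math


def monopoly_price(cdf, r_max, steps=100_000):
    """Grid-search maximiser of the posted-price revenue r * (1 - F(r)) on [0, r_max]."""
    best_r, best_rev = 0.0, 0.0
    for k in range(steps + 1):
        r = r_max * k / steps
        rev = r * (1.0 - cdf(r))
        if rev > best_rev:
            best_r, best_rev = r, rev
    return best_r, best_rev


def uniform_cdf(r):
    """cdf of U([0, 1])."""
    return min(max(r, 0.0), 1.0)


def exponential_cdf(r):
    """cdf of the exponential distribution with rate 1."""
    return 1.0 - math.exp(-r)


def heavy_tail_cdf(r, alpha=0.5):
    """cdf 1 - r**(-alpha) on [1, inf): infinite mean when alpha < 1 (assumed support)."""
    return 0.0 if r < 1.0 else 1.0 - r ** (-alpha)


print(monopoly_price(uniform_cdf, 1.0))        # revenue peaks in the interior
print(monopoly_price(exponential_cdf, 20.0))   # revenue peaks in the interior
print(monopoly_price(heavy_tail_cdf, 100.0))   # revenue keeps growing up to r_max
```

For the first two distributions the maximiser is the root of the virtual value (1/2 and 1 respectively); for the heavy-tailed one the search always returns the right end of the grid, whatever `r_max` is.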
One of the purposes of discussing the single bidder case was to introduce organically
the crucial concept of virtual value (Myerson, 1981).
Definition 2.11. The virtual value function $\psi : X \to \mathbb{R}$ of a distribution F (with pdf f) is:
\[
\psi(x) = x - \frac{1 - F(x)}{f(x)} . \tag{2.3}
\]
The virtual value function can be either positive or negative, irrespective of the support
of the value distribution. The expectation under F of the virtual value, i.e., Ex∼F [ψ(x)],
is actually equal to the infimum of the support of F, when F has finite expectation. In
particular, it is equal to 0 if the support of F “starts" at 0.
The virtual value ψ(x) is a crucial concept that can be interpreted as virtual payment, as
we explain now. If the bidder has value x and decides to buy the item, so x ≥ r, his (virtual)
payment can be thought of as ψ(x), independently of the price r set by the seller. Indeed,
the revenue generated by such a price r is
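\begin{align*}
E_{x \sim F}[\psi(x) 1\{x \ge r\}] &= \int_r^\infty \psi(x) f(x)\,dx = \int_r^\infty \big( x f(x) - (1 - F(x)) \big)\,dx \\
&= -\int_r^\infty \big( x(1 - F(x)) \big)'\,dx = r(1 - F(r)) = \Pi(r) .
\end{align*}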
The last equality comes from the definition of Π(r), see Equation (2.2). As a consequence,
even though it is traditionally called virtual value, ψ(x) could rather be understood as a
virtual payment: the buyer pays on average ψ(x) when his value is x and he buys/wins
the item, i.e., x ≥ r. See also Proposition 2.29 for an explanation of why this interpretation
holds for general auction systems and buyers optimizing their expected utility.
• if x ∼ U([0, 1]), the uniform distribution over [0, 1], then ψ(x) = 2x − 1 for x ∈ [0, 1].
Lemma 2.10 states that the optimal reserve price against a single bidder (which was
called the monopoly price) is, in the case where the virtual value/payment function ψ is
increasing and changes sign, necessarily the root of ψ (or the point where the sign changes
if ψ is not continuous): if Π(t) is the expected revenue of the seller at reserve price t (see
Equation (2.2)),
\[
\Pi'(t) = -f(t)\psi(t) .
\]
If ψ is strictly positive everywhere, which can happen if the infimum of the support of $F_i$
is positive, then the optimal reserve price is that infimum (or equivalently 0). Quite
interestingly, even with multiple other bidders, the optimal reserve price for bidder i is
still i’s monopoly price, i.e., the same as if he were the only bidder. This is illustrated
in the following Section 2.4.3 under the same assumptions of ψ being increasing and/or
changing sign once.
As a consequence, in the following, our main focus will be on these distributions, which
are called regular. The results we will prove can be generalized to non-regular distributions
with a technique called ironing, see Section 2.4.6.
The uniform, exponential and generalized Pareto distributions with ξ < 1 are all
regular distributions.
We focus in this section on second-price auctions with reserve prices (Riley and Samuelson,
1981b) where n buyers are asymmetric, i.e., their value distribution can be different. As
a consequence, the seller might also set different, personalized, reserve prices so as to
increase her revenue. When introducing the concept of reserve price, we mentioned that
a bidder can only win the auction if his bid was higher than his reserve price and that
the latter is the minimal payment that bidder might pay. There however remains some
ambiguity on how the auction unfolds (depending on which condition “highest bidders"
or “bid above reserve price" is checked first). As a consequence, there exist at least two
different types of second-price auction with reserve prices.
“Lazy” 2nd-price auction: The winner can only be the highest bidder. He gets the item
only if he clears his reserve price (i.e., he bids above it), and pays the maximum
between his reserve price and the second highest bid overall (regardless of whether
the second highest bid cleared its reserve).
“Eager” 2nd-price auction: Bidders that have not cleared their respective reserve price
are disregarded. Thus the winner is the highest bidder amongst those that have
cleared their reserve price and he pays the maximum between his reserve price and
the second highest cleared bid.
First of all, notice that if the reserve prices are anonymous, i.e., the same for all bidders,
as they should be in the symmetric case for instance, both types of auctions coincide.
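The two rules can be written down in a few lines. The sketch below is illustrative only: the function names, the tie-breaking by list order, and the reading of "clearing a reserve" as bid ≥ reserve are assumptions, not conventions fixed by the text.

```python
def lazy_second_price(bids, reserves):
    """Lazy rule: only the highest bidder can win. Returns (winner, payment)."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    top = order[0]
    if bids[top] < reserves[top]:
        return None, 0.0                       # highest bidder misses his reserve: no sale
    second = bids[order[1]] if len(bids) > 1 else 0.0
    return top, max(reserves[top], second)     # second-highest bid overall


def eager_second_price(bids, reserves):
    """Eager rule: bidders below their reserve are discarded first. Returns (winner, payment)."""
    cleared = [i for i in range(len(bids)) if bids[i] >= reserves[i]]
    if not cleared:
        return None, 0.0
    cleared.sort(key=lambda i: bids[i], reverse=True)
    top = cleared[0]
    second = bids[cleared[1]] if len(cleared) > 1 else 0.0
    return top, max(reserves[top], second)     # second-highest cleared bid


# Personalized reserves: the two formats can differ in allocation and in payment.
print(lazy_second_price([0.9, 0.6], [1.0, 0.5]), eager_second_price([0.9, 0.6], [1.0, 0.5]))
print(lazy_second_price([0.9, 0.6], [0.5, 0.7]), eager_second_price([0.9, 0.6], [0.5, 0.7]))
# Anonymous reserve: both formats coincide.
print(lazy_second_price([0.9, 0.6], [0.5, 0.5]), eager_second_price([0.9, 0.6], [0.5, 0.5]))
```

In the first example the lazy auction does not sell while the eager one sells to the second bidder; in the second both sell to the same bidder but at different prices.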
Optimal reserve prices are easy to compute in lazy auctions, as they have an explicit
form. They are on the other hand hard to compute for eager auctions. Moreover, the eager
2nd-price auction is also not the revenue-maximizing auction for the seller. So this concept
is neither simple (as is the lazy auction) nor optimal (as is the Myerson auction, see Section
3.3.1). As a consequence, we will not put too much emphasis on eager auctions. In practice,
if one wishes to implement eager 2nd-price auctions, a good idea would be to use the
reserve prices of the corresponding lazy 2nd-price auction.
It is quite immediate to see that lazy and eager second price auctions are still DSIC
mechanisms (the proof follows the exact same lines as that without reserve prices), hence we
shall again only consider the truthful equilibrium. We now derive the expected payment
of a bidder at this equilibrium.
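Theorem 2.13. Let $P_i$ be the expected payment of bidder i facing reserve price $r_i$ at the
truthful equilibrium of a lazy second-price auction. Then
\[
P_i = E_{x_i \sim F_i}\big[ \psi_i(x_i)\, G_i(x_i)\, 1\{x_i \ge r_i\} \big] ,
\]
where $G_i$ is the cdf of the highest value among the bidders other than i.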
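Proof. Let us introduce the notation $y_{-i} = \max_{j \ne i} x_j$ so that the pointwise payment of bidder
i, when he has value $x_i$, given all the values $x_j$, is equal to:
\[
\max\{r_i, y_{-i}\}\, 1\{x_i \ge y_{-i}\}\, 1\{x_i \ge r_i\} .
\]
We note that
\[
\max\{r_i, y_{-i}\}\, 1\{x_i \ge y_{-i}\}\, 1\{x_i \ge r_i\} = 1\{x_i \ge r_i\} \big( r_i 1\{y_{-i} \le r_i\} + y_{-i} 1\{x_i \ge y_{-i}\} 1\{y_{-i} \ge r_i\} \big) .
\]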
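As a consequence,
\begin{align*}
P_i &= E_F\big[ \max\{r_i, y_{-i}\}\, 1\{x_i \ge y_{-i}\}\, 1\{x_i \ge r_i\} \big] \\
&= \int_{x_i} \int_{y_{-i}} \max\{r_i, y_{-i}\}\, 1\{x_i \ge y_{-i}\}\, 1\{x_i \ge r_i\}\, f_i(x_i)\, g_i(y_{-i})\,dy_{-i}\,dx_i \\
&= \big(1 - F_i(r_i)\big) G_i(r_i)\, r_i + \int_{x_i = r_i}^{+\infty} \int_{y_{-i} = r_i}^{+\infty} y_{-i}\, g_i(y_{-i})\, f_i(x_i)\, 1\{x_i \ge y_{-i}\}\,dy_{-i}\,dx_i \\
&= \big(1 - F_i(r_i)\big) G_i(r_i)\, r_i + \int_{y_{-i} = r_i}^{+\infty} y_{-i}\, g_i(y_{-i}) \int_{x_i = y_{-i}}^{+\infty} f_i(x_i)\,dx_i\,dy_{-i} && \text{(by Fubini)} \\
&= \big(1 - F_i(r_i)\big) G_i(r_i)\, r_i + \int_{y_{-i} = r_i}^{+\infty} y_{-i}\, g_i(y_{-i}) \big(1 - F_i(y_{-i})\big)\,dy_{-i} \\
&= \int_{y_{-i} = r_i}^{+\infty} \big( f_i(y_{-i})\, y_{-i} - (1 - F_i(y_{-i})) \big) G_i(y_{-i})\,dy_{-i} && \text{(by integration by parts)} \\
&= E_{F_i}\big[ \psi_i(x)\, G_i(x)\, 1\{x \ge r_i\} \big] .
\end{align*}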
We can therefore easily derive the optimal reserve prices, as a function of $\psi_i$, at least for
regular distributions.
Theorem 2.14. If the $F_i$ are regular, the optimal reserve prices in a lazy second-price auction
are:
\[
(r_1, \dots, r_n) = \big( \psi_1^{-1}(0), \dots, \psi_n^{-1}(0) \big) ,
\]
with the convention that $\psi_i^{-1}(0)$ is the minimum of the support of $F_i$ if $\psi_i$ is positive
everywhere and the point where $\psi_i$ changes its sign if it is discontinuous.
The proof indicates that in a lazy second-price auction, the seller can safely maximize
the payment of each bidder one by one independently. Indeed, in a lazy second price
auction, changing the reserve price of one specific bidder does not change the probability
of winning and the payment of the other bidders. This is not the case for eager second-
price auctions on the other hand, which explains the complexity of computing the optimal
reserve prices in them. Finally, the optimal reserve prices in a lazy second-price auction
correspond to the monopoly prices of each bidder.
Corollary 2.15. The optimal reserve price for a bidder in a lazy second-price auction is
independent of the presence, or not, of other bidders. In particular, it is the same as in the
situation where he is the only bidder.
[Plot: bidder payment against the reserve price r, with one curve each for 1, 2, 3 and 4 bidders; the monopoly price is marked.]
Figure 2.1: Bidder’s payment as a function of the reserve price, depending on the number of players in a
second-price auction with bidders all having a uniform value distribution U([0, 1]).
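The curves of Figure 2.1 can be recomputed exactly from Theorem 2.13. The following sketch assumes n i.i.d. U([0, 1]) bidders, so that $\psi(x) = 2x - 1$ and $G(x) = x^{n-1}$, and integrates $\psi(x) G(x)$ over $[r, 1]$ in closed form; the function name and the grid of reserve prices are illustrative choices.

```python
from fractions import Fraction


def expected_payment(r, n):
    """Expected payment of one bidder among n i.i.d. U([0, 1]) bidders facing reserve r.

    Closed form of integral_r^1 (2x - 1) * x**(n - 1) dx, i.e. E[psi(x) G(x) 1{x >= r}]
    with psi(x) = 2x - 1 and G(x) = x**(n - 1) (uniform values are assumed).
    """
    r = Fraction(r)
    return 2 * (1 - r ** (n + 1)) / (n + 1) - (1 - r ** n) / n


grid = [Fraction(k, 100) for k in range(101)]
for n in (1, 2, 3, 4):
    best = max(grid, key=lambda r: expected_payment(r, n))
    print(n, best, expected_payment(best, n))
```

The maximiser printed for each n is the same reserve price, which is the content of Corollary 2.15: adding competitors changes the level of the payment curve but not the location of its maximum.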
The seller’s revenue increases using personalized reserve prices when bidders have
very different value distributions. Intuitively, it is in her best interest to set a high reserve
price for bidders with high values most of the time (or very high values sometimes) and low
reserve prices for bidders with low values most of the time.
So far, we have only focused on the seller’s revenue when designing auctions. An
alternative objective can be the maximization of the global welfare of the system, which is
the sum of the seller’s revenue and all bidders’ utility.
Even though reserve prices largely increase the seller’s revenue, they actually signif-
icantly decrease the expected total welfare, as the item will sometimes not be allocated.
This happens when all bidders (or at least the highest one in lazy auctions) have values
below their reserve prices.
Example. To illustrate this decrease in welfare, we are going to consider a simple example.
There are n = 2 symmetric bidders with a value drawn uniformly over [0, 1]. Because
of the symmetry, the optimal reserve price is the same for both bidders hence lazy and
eager auctions coincide (and we do not need to specify the rule). In this simple case,
$r^* = \psi^{-1}(0) = 1/2$. As a consequence, the item is allocated as soon as one bidder bids above
1/2, which happens with probability 3/4. Otherwise, the item is not sold (which obviously
happens with probability 1/4). Simple computations show that changing the design from a
second-price auction without reserve price to a second-price auction with optimal reserve
price yields
• a 25 % increase of the seller’s revenue (from 1/3 to 5/12).
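The figures of this example can be reproduced with a short simulation. The sketch below is illustrative (function name, sample size and seed are arbitrary choices): it draws two U([0, 1]) values, runs a second-price auction with an anonymous reserve, and averages the seller's revenue and the total welfare, taken here to be the winner's value whenever the item is sold. A direct calculation that is not part of the text above suggests that welfare drops from 2/3 to 7/12 while revenue rises from 1/3 to 5/12.

```python
import random


def simulate(reserve, n_rounds=200_000, seed=0):
    """Monte Carlo estimate of (revenue, welfare) with two U([0, 1]) truthful bidders."""
    rng = random.Random(seed)
    revenue = welfare = 0.0
    for _ in range(n_rounds):
        hi, lo = sorted((rng.random(), rng.random()), reverse=True)
        if hi >= reserve:                      # item sold to the highest bidder
            revenue += max(reserve, lo)        # he pays max(reserve, second bid)
            welfare += hi                      # seller revenue + winner utility = hi
    return revenue / n_rounds, welfare / n_rounds


print("no reserve  :", simulate(0.0))
print("reserve 1/2 :", simulate(0.5))
```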
Since Vi is convex, this means that Qi (xi ) belongs to the subdifferential of Vi at xi , i.e.,
and ∇Vi (xi ) = Qi (xi ) if Vi differentiable at xi . Therefore, using Theorem D.2.3.4 in (Hiriart-
Urruty and Lemaréchal, 2001),
\[
V_i(x_i) - V_i(0) = \int_0^{x_i} Q_i(z)\,dz .
\]
The result follows from the fact that Pi (0) = 0 since the auction is 0-rational.
The last equality comes from the fact that Qi (xi ) = Ex∼F [qi (x)|xi ] and the tower property
of conditional expectations.
Remark. Theorem 2.13 is a direct consequence of this result with the specific choice of
Qi (xi ) = Gi (xi )1{xi ≥ ri }.
Myerson’s lemma indicates that the expected payment of a 0-rational BIC auction
only depends on the allocation rule and the virtual value; the proof actually gives a
characterization of any incentive-compatible auction. However, this characterization is
slightly different for BIC and DSIC auctions.
Corollary 2.17 (Myerson, 1981). Using the notations of Theorem 2.16, an auction is
0-rational and BIC if and only if
i) the allocation rule is monotone, i.e., the probability of winning, as a function of the
bid, is non-decreasing (for any fixed bids of other bidders) and
ii) the expected payment verifies
\[
P_i(x_i) = Q_i(x_i)\, x_i - \int_0^{x_i} Q_i(z)\,dz .
\]
Remark. Using the fact that $Q_i(x_i) = E_{x \sim F}[q_i(x) | x_i]$, we see that given an allocation rule
$q_i(x)$, the expected payment requirement can be fulfilled by requiring, auction by auction,
a payment, given the vector of bids/values x, of $p_i(x) = x_i q_i(x) - \int_0^{x_i} q_i(t, x_{-i})\,dt$.
(In the last integral all the bids $x_{-i}$ are fixed and the integral is performed over bidder i’s
own bid t, which varies from 0 to $x_i$.)
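The pointwise payment formula of this remark can be evaluated numerically for any monotone allocation rule. The sketch below is a hedged illustration: `payment_from_allocation` and the two allocation rules are hypothetical helpers, the integral is approximated with a midpoint rule, and ties are ignored.

```python
def payment_from_allocation(q, i, bids, steps=10_000):
    """Approximate p_i(x) = x_i q_i(x) - integral_0^{x_i} q_i(t, x_{-i}) dt (midpoint rule)."""
    x_i = bids[i]
    h = x_i / steps
    trial = list(bids)                 # copy: only bidder i's entry is varied
    integral = 0.0
    for k in range(steps):
        trial[i] = (k + 0.5) * h
        integral += q(i, trial) * h
    return x_i * q(i, bids) - integral


def highest_bid_wins(i, bids):
    """Allocation rule of a second-price auction without reserve."""
    return 1.0 if bids[i] == max(bids) else 0.0


def highest_bid_wins_with_reserve(i, bids, reserve=0.5):
    """Same rule with an anonymous reserve price (the value 0.5 is an arbitrary example)."""
    return 1.0 if bids[i] == max(bids) and bids[i] >= reserve else 0.0


print(payment_from_allocation(highest_bid_wins, 0, [0.8, 0.5, 0.3]))
print(payment_from_allocation(highest_bid_wins, 1, [0.8, 0.5, 0.3]))
print(payment_from_allocation(highest_bid_wins_with_reserve, 0, [0.8, 0.2]))
```

With these rules the formula returns, up to discretisation error, the smallest bid with which the winner would still have won (the second-highest bid, or the reserve), and zero for a losing bidder.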
Proof. The proof of Theorem 2.16 gives the first implication. For the reverse, let us assume
that all bidders except i bid truthfully; and let us show that i has an incentive to also bid
truthfully. This will show that truthful bidding constitutes a Nash equilibrium and hence
the auction is BIC.
Note that because we have assumed that all other bidders bid truthfully, if bidder i
bids z, the probability that he wins is Qi (z). Hence, the expected utility derived by bidder i
when bidding z and his value is xi is
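\[
U_i(z, x_i) = x_i Q_i(z) - P_i(z) = (x_i - z)\, Q_i(z) + \int_0^z Q_i(t)\,dt .
\]
The second equality comes from assumption ii). Let us call $b_i^*$ the optimal bid of bidder i.
Let us now show that $b_i^* = x_i$. To do so, we simply need to establish that
\[
\forall z \in \mathbb{R}_+, \quad U_i(x_i, x_i) = \int_0^{x_i} Q_i(t)\,dt \ge (x_i - z)\, Q_i(z) + \int_0^z Q_i(t)\,dt .
\]
This is equivalent to showing that
\[
\forall z \in \mathbb{R}_+, \quad \int_z^{x_i} Q_i(t)\,dt \ge (x_i - z)\, Q_i(z) .
\]
Since $Q_i$ is non-decreasing by assumption i), if $z \ge x_i$ we have $\int_{x_i}^{z} Q_i(t)\,dt \le (z - x_i)\, Q_i(z)$,
and multiplying this inequality by (−1) on both sides gives the desired inequality. If $z \le x_i$,
monotonicity of $Q_i$ also gives
\[
\int_z^{x_i} Q_i(t)\,dt \ge (x_i - z)\, Q_i(z) .
\]
So we have shown that
\[
\forall z \in \mathbb{R}_+, \quad U_i(x_i, x_i) \ge U_i(z, x_i) .
\]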
Therefore, bidding truthfully is an optimal strategy for bidder i and the auction is BIC.
ii) the payment of the winning bidder is the minimum bid guaranteeing that he would
still have won the auction.
Given a monotone allocation rule and assuming 0-rationality, the payment rule is unique.
Proof. The proof is almost identical, one just needs to make the various computations
pointwise (for any vector x−i ) instead of in expectation.
This characterization can be extended to very general mechanisms (Archer and Tardos,
2001).
Definition 2.19. The Myerson auction, for regular value distribution Fi with associated
virtual value ψi , is defined by the two following rules:
Allocation rule: Given the bids b = (b1 , . . . , bn ), the winner is the bidder with the highest
non-negative virtual value ψi (bi ), i.e.,
\[
q_i(b) = 1\Big\{ i = \arg\max \big\{ \psi_j(b_j) \,;\, j \text{ s.t. } \psi_j(b_j) \ge 0 \big\} \Big\}
\]
with the convention that if all virtual values are negative, then the item is not
allocated and qi (b) = 0. Ties are broken arbitrarily.
This auction amounts to running a second price auction with reserve price 0 among
the virtualized bids $\psi_k(b_k)$ and converting back this “virtual cost” in the original bid space
of the winner i through the function $\psi_i^{-1}$.
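This description translates directly into code. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, ties are broken by index, and the two bidders are assumed to have values distributed as U([0, 1]) and U([0, 2]), whose virtual values are $2x - 1$ and $2x - 2$.

```python
def myerson_auction(bids, psi, psi_inv):
    """Return (winner, payment); winner is None when every virtual bid is negative.

    psi[i] and psi_inv[i] are bidder i's virtual value function and its inverse,
    assumed increasing (regular distributions).
    """
    virtual = [psi[i](b) for i, b in enumerate(bids)]
    winner = max(range(len(bids)), key=lambda i: virtual[i])
    if virtual[winner] < 0:
        return None, 0.0
    others = [v for j, v in enumerate(virtual) if j != winner]
    threshold = max([0.0] + others)            # second price with reserve 0 in virtual space
    return winner, psi_inv[winner](threshold)  # converted back to the winner's bid space


# Assumed example: bidder 0 ~ U([0, 1]), bidder 1 ~ U([0, 2]).
psi = [lambda x: 2 * x - 1, lambda x: 2 * x - 2]
psi_inv = [lambda v: (v + 1) / 2, lambda v: (v + 2) / 2]

print(myerson_auction([0.9, 1.2], psi, psi_inv))   # lower bid wins: higher virtual value
print(myerson_auction([0.6, 0.5], psi, psi_inv))   # winner pays his monopoly price
print(myerson_auction([0.4, 0.9], psi, psi_inv))   # all virtual values negative: no sale
```

The first call shows the typical asymmetry of the optimal auction: the bidder with the lower bid can win because bids are compared after virtualization.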
Theorem 2.20. If F1 , . . . , Fn are regular, the Myerson auction maximizes seller’s revenue
among all BIC and interim-IR auctions.
Proof. The Myerson auction is BIC as it verifies the condition of Corollary 2.17. Since ψi
are non-decreasing (as Fi are regular), the probability of winning is non-decreasing.
To show individual-rationality, we remark that since the auction is BIC,
\[
V_i(x_i) = V_i(0) + \int_0^{x_i} Q_i(s_i)\,ds_i = -P_i(0) + \int_0^{x_i} Q_i(s_i)\,ds_i \ge 0 ,
\]
because $P_i(0) = 0$. Thanks to Myerson’s lemma, Theorem 2.16, the payment of each BIC
auction is equal to
\[
P_i(0) + E_{x_i \sim F_i}[\psi_i(x_i) Q_i(x_i)] .
\]
The Myerson auction maximizes the two terms of this expression since for any rational
auction, $P_i(0) \le 0$. Since the winner in the Myerson auction is the bidder who verifies
\[
\psi_i(x_i) = \max_{j \in S} \psi_j(x_j) ,
\]
and the item is not allocated when all $\psi_i$ are negative, the second term is also maximized
pointwise. Indeed, note that, summing over bidders, we can rewrite this second term as
\[
\sum_{i} E_{x_i \sim F_i}[\psi_i(x_i) Q_i(x_i)] = E_{x \sim F}[\langle \psi(x), q(x) \rangle] ,
\]
where $\langle \cdot, \cdot \rangle$ is the standard inner product.
Corollary 2.21. In the symmetric case, the second-price auction with reserve prices set to
monopoly prices is the revenue-maximizing auction.
Remark. The seller can increase her revenue if the mechanism is only required to satisfy
ex-ante rationality instead of interim rationality, as shown in (Cremer and McLean, 1988).
Indeed, there exists a BIC auction that is ex ante individually rational that accomplishes
full-surplus extraction for the seller. In other words, the utility of bidders in this auction is
equal to zero. This auction is not interim individually-rational since the expected utility
when the bidder’s value is zero is strictly negative. This setting of ex-ante individual
rationality only makes sense when bidders have to decide to take part in the auction before
understanding their value for the item. We shall come back in more detail to this setting
in Section 4.2.2.
Theorem 2.22. Given a mechanism and a specific Nash equilibrium for this mechanism,
there exists another BIC mechanism where the bidders’ expected utility and seller’s revenue
at the truthful equilibrium are equal to the ones at the original Nash equilibrium.
Corollary 2.23. If value distributions are regular, the Myerson auction is the revenue-
maximizing mechanism among all individually-rational mechanisms which have a Nash
equilibrium.
We now extend the Myerson auction to cases where value distributions are not regular.
Definition 2.24. For a function h defined on some set $E \subset \mathbb{R}^d$, we call cav(h) the
concavification of the function h, which is its smallest concave majorant, i.e., the smallest concave
function above h: its hypograph is the convex hull of the hypograph of h. Moreover, this
function is defined pointwise as
\[
\mathrm{cav}(h)(x) = \sup\big\{ E_\mu[h(z)] \,;\, \mu \text{ is a probability distribution on } E \text{ such that } E_\mu[z] = x \big\} .
\]
We refer to (Rockafellar, 1970), p. 36, (Hiriart-Urruty and Lemaréchal, 2001) pp.98-102
and (Groeneboom and Jongbloed, 2014) pp.55-57 for properties of least concave majorant,
greatest convex minorant and convex hull of functions. In particular, if h is bounded and
attains its maximum, cav (h) has the same maximum attained (at least) on the convex hull
of the set of maximizers of h. Moreover, h and cav (h) are equal on the extreme points of the
definition set of h; this implies that if h is defined on [a, b], then necessarily cav (h) (a) = h(a)
and cav (h) (b) = h(b).
We can now define the ironed virtual value.
Definition 2.25. For any non-regular distribution F, the ironed virtual value of ψ, denoted
by $\tilde\psi$, is defined by
\[
\tilde\psi(x) = \partial\big( -\mathrm{cav}(\Pi \circ F^{-1}) \big)(F(x)) , \quad \text{where } \partial \text{ denotes the subdifferential.}
\]
We refer to (Fu, 2016) for more technical details. In particular, the ironed virtual value,
since it is defined as a sub-differential, is not a function but a multi-valued mapping.
On the other hand, selecting the aforementioned ψ̃(x) as ψ(x) when the sub-differential
is not reduced to a singleton is also perfectly valid and implicitly used as convention
from now on. With this latter expression, either ψ̃ is equal to ψ or it is constant on some
interval around x. It is non-decreasing everywhere and intervals where ψ is decreasing are
“flattened”, as illustrated in Figure 2.2.
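Ironing can be carried out numerically on a grid of quantiles. The sketch below is one possible discretisation, not the construction used by the authors: it assumes an equal-weight mixture of U(0, 0.5) and U(0, 1), tabulates the revenue curve $\Pi \circ F^{-1}$, computes its least concave majorant with a standard upper-hull scan, and reads the ironed virtual value as minus the slope of each hull segment. All names and the grid size are illustrative.

```python
def quantile(p):
    """Inverse cdf of an equal-weight mixture of U(0, 0.5) and U(0, 1) (weights assumed)."""
    return p / 1.5 if p <= 0.75 else 2.0 * p - 1.0


def concave_majorant(ps, rs):
    """Indices of the points of (ps, rs) on the least concave majorant (ps increasing)."""
    hull = []
    for k in range(len(ps)):
        while len(hull) >= 2:
            i, j = hull[-2], hull[-1]
            cross = (ps[j] - ps[i]) * (rs[k] - rs[i]) - (rs[j] - rs[i]) * (ps[k] - ps[i])
            if cross >= 0:             # point j lies on or below the chord from i to k
                hull.pop()
            else:
                break
        hull.append(k)
    return hull


n = 1000
ps = [k / n for k in range(n + 1)]
rs = [quantile(p) * (1.0 - p) for p in ps]     # revenue curve Pi(F^{-1}(p))
hull = concave_majorant(ps, rs)

# Each hull segment gives (value range start, value range end, ironed virtual value).
ironed = [
    (quantile(ps[a]), quantile(ps[b]), -(rs[b] - rs[a]) / (ps[b] - ps[a]))
    for a, b in zip(hull, hull[1:])
]
flat = max(ironed, key=lambda seg: seg[1] - seg[0])
print("ironed interval in value space:", flat[:2])
print("constant ironed virtual value :", flat[2])
```

Outside the widest segment the hull follows the revenue curve grid point by grid point, so the ironed and original virtual values agree there; on the widest segment the virtual value is replaced by a constant, which is the flattening described above.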
Recall that the purpose of ironing is to replace the - possibly somewhere decreasing -
virtual value function ψ in the Myerson auction (that might then not be BIC) by ψ̃. We now
show that ironing the virtual value does not decrease the revenue of the Myerson auction.
[Three plots. Left, “Concavification of the revenue curve”: $\Pi \circ F^{-1}$ and $\mathrm{cav}(\Pi \circ F^{-1})$ against the quantile p. Center: the monopoly revenue Π(x) against x. Right, “Original and ironed virtual values”: ψ(x) and ψ̃(x) against x.]
Figure 2.2: Left: Concavification of $\Pi \circ F^{-1}$ for a mixture of U(0, 0.5) and U(0, 1). Right: Associated virtual
value (plain black) and ironed one (dashed orange). Center: Original (plain black) and ironed (dashed orange)
monopoly revenue.
Lemma 2.27. The payment of bidder i at the truthful equilibrium of any BIC auction
satisfies
\[
E_{x_i \sim F_i}[P_i(x_i)] \le P_i(0) + E_{x_i \sim F_i}\big[ Q_i(x_i)\, \tilde\psi_i(x_i) \big] ,
\]
where $\tilde\psi_i$ is his ironed virtual value.
Our assumptions that $f_i > 0$ almost everywhere, and hence that $F_i$ is increasing, are important;
so is the assumption that $F_i$ has a finite first moment, which implies that $\psi_i(x_i)$ and $\tilde\psi_i(x_i)$ have
finite mean. Note that this assumption is really minimal as it just means that the expected
payment under $Q_i(x_i) = 1$ is finite.
The papers (Myerson, 1981) and (Fu, 2016) implicitly assume differentiability of Qi
in the previous lemma without stating it explicitly. In the case of differentiable Qi the
proof boils down to integration by parts applied twice and the fact that cav (h) ≥ h for
any function h. At the level of generality of our statement, which is needed for the most
important applications, it is more technical and we give the proof in Subsection 2.6.4.
We now generalize Theorem 2.20 to the case of non-regular value distributions.
Example. We consider the case of the Myerson auction with symmetric players but non-
regular value distributions in Subsection 2.6.3, where we derive the payment and allocation
rules. With non-regular value distributions the Myerson auction in the symmetric case is
not a second price auction with reserves anymore.
Proposition 2.29. Suppose bidder i participates in an auction such that the probability of
winning the auction when bidding b is $G_i(b)$ and the corresponding expected payment is
$P_i(b)$. Let $\beta_i$ be an optimal strategy for bidder i in this setup that maximizes his utility. Then
the seller revenue coming from bidder i is
\[
E_{x_i \sim F_i}[P_i(\beta_i(x_i))] - P_i(\beta_i(0)) = E_{x_i \sim F_i}\big[ G_i(\beta_i(x_i))\, \psi_i(x_i) \big] . \tag{2.4}
\]
Remark. This proposition helps explain how our interpretation of $\psi_i(x_i)$ as a virtual
payment makes sense for utility-maximizing bidders in general and not only in the case of
BIC or DSIC auctions we encountered previously. It can also be seen as a more quantitative
version of the revelation principle.
Let us call
\[
W(x) = x\, G_i(\beta_i(x)) - P_i(\beta_i(x)) .
\]
The envelope theorem in the form of Theorem 2 of (Milgrom and Segal, 2002) applies since
$0 \le G_i(t) \le 1$ for all t and hence
\[
W(x) - W(0) = \int_0^x G_i(\beta_i(t))\,dt .
\]
Therefore, the expected payment satisfies
\[
P_i(\beta_i(x)) - P_i(\beta_i(0)) = x\, G_i(\beta_i(x)) - \int_0^x G_i(\beta_i(t))\,dt .
\]
Taking expectation with respect to $x_i$ with cdf $F_i$ in the previous equation gives
\begin{align*}
E_{x_i \sim F_i}[P_i(\beta_i(x_i))] - P_i(\beta_i(0)) &= \int_0^\infty x\, G_i(\beta_i(x))\, f_i(x)\,dx - \int_0^\infty \int_0^\infty f_i(x)\, 1\{t \le x\}\, G_i(\beta_i(t))\,dt\,dx \\
&= \int_0^\infty x\, G_i(\beta_i(x))\, f_i(x)\,dx - \int_0^\infty (1 - F_i(t))\, G_i(\beta_i(t))\,dt \\
&= E_{x_i \sim F_i}\big[ G_i(\beta_i(x_i))\, \psi_i(x_i) \big] .
\end{align*}
In the case of first price auctions, we have $P_i(\beta_i(x_i)) = \beta_i(x_i)\, G_i(\beta_i(x_i))$, since the
probability of winning is $G_i(\beta_i(x_i))$. So the arguments given in the proof above also imply the
following result (Kirkegaard, 2009): the optimal strategy for bidder i when the top bid of
the competition has cdf $G_i$ and $P_i(\beta_i(0)) = 0$ satisfies
\[
\beta_i(x) = x - \frac{\int_0^x G_i(\beta_i(t))\,dt}{G_i(\beta_i(x))} . \tag{2.5}
\]
The expected utility of player i at x is then equal to $U_i(\beta_i(x), x) = \int_0^x G_i(\beta_i(t))\,dt$ and it
also holds that $E_{x_i \sim F_i}\big[ U_i(\beta_i(x_i), x_i) \big] = E_{x_i \sim F_i}\big[ G_i(\beta_i(x_i)) (x_i - \psi_i(x_i)) \big]$.
Corollary 2.30. In a first price auction, when bidder i uses the strategy implicitly defined
in Equation (2.5), the seller revenue coming from bidder i is
\[
E_{x_i \sim F_i}\big[ G_i(\beta_i(x_i))\, \psi_i(x_i) \big] . \tag{2.6}
\]
The corollary follows from Proposition 2.29 after noticing that Pi (βi (0)) = βi (0)Gi (βi (0)) =
0 when bidder i uses βi defined in Equation (2.5).
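Corollary 2.30 can be checked by simulation in the symmetric uniform case. The sketch below assumes n i.i.d. U([0, 1]) bidders who all bid $\beta(x) = (n-1)x/n$, a strategy that satisfies Equation (2.5) with $G_i(\beta(t)) = t^{n-1}$; the function name, sample size and seed are arbitrary. It compares the average payment actually made by bidder 0 with the average of $G_i(\beta(x))\psi(x) = x^{n-1}(2x - 1)$.

```python
import random


def first_price_check(n=3, n_rounds=200_000, seed=1):
    """Return (average payment of bidder 0, average of G(beta(x)) * psi(x))."""
    rng = random.Random(seed)
    paid = virtual = 0.0
    for _ in range(n_rounds):
        values = [rng.random() for _ in range(n)]
        x = values[0]
        if x == max(values):                   # bids are increasing in values: highest value wins
            paid += (n - 1) / n * x            # first price: the winner pays his own bid
        virtual += x ** (n - 1) * (2 * x - 1)  # G_i(beta_i(x)) * psi(x) for U([0, 1])
    return paid / n_rounds, virtual / n_rounds


print(first_price_check())
```

Both averages estimate the same quantity, which a direct integration gives as $(n-1)/(n(n+1))$ per bidder under these assumptions.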
Good or optimal reserve prices Proposition 2.29 and Corollary 2.30 suggest that from
a seller revenue standpoint it would be good to avoid bids corresponding to values that
have negative virtual values. In other words, setting individual reserve values at $\psi_i^{-1}(0)$ for
bidders may have a positive impact on seller revenue. However, this interpretation ignores
the impact of setting such reserve values on the strategic response of the bidder that is
optimizing his utility.
Finding optimal reserve prices for first price auctions is much more complicated than
finding them for second price auctions, even with n = 2 symmetric bidders (Kotowski,
2018), for the following two reasons. First, if the distribution F is not regular and its density
is discontinuous at the monopoly price, giving two different reserve prices to the two
symmetric buyers actually increases the seller revenue at equilibrium, at least when $r_1$
and $r_2$ are close. Second, in the specific case where those two reserve prices $r_1$ and $r_2$ are
sufficiently close, the equilibrium bid distribution of the player with the lower reserve
price becomes discontinuous.
Many further difficulties arise when the seller tries to learn good reserve prices for first
price auctions from data and does not have access to the bidders’ value distributions. We
further detail them in Section 3.4.4.
Theorem 2.31 (Bulow and Klemperer, 1996). In the symmetric setting and with regular
distributions, the revenue of the Myerson auction with n bidders is lower than the revenue
of the Vickrey auction with n + 1 bidders.
Figure 2.3: Illustration of the Bulow-Klemperer theorem for the case with value distributions U([0, 1]).
Lemma 2.32. In the symmetric case and with regular distributions, the Vickrey auction is
revenue-maximizing in the class of individually-rational auctions where the item is always
attributed.
The proof of the Bulow-Klemperer theorem can now be derived from the previous
lemma.
Proof. Let us assume that there are n + 1 bidders, and consider the following mechanism.
First, the seller runs a Myerson auction on n bidders (chosen arbitrarily). If the item is
not allocated by the Myerson auction, it is allocated for free (i.e., without any payment)
to the (n + 1)-th bidder. The revenue of this auction is equal to the revenue of the Myerson
auction with n bidders, yet it is an auction that always allocates the item amongst n + 1 bidders.
Lemma 2.32 implies that the revenue of this auction is smaller than that of the Vickrey
auction. This gives the result.
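The theorem can be illustrated exactly for U([0, 1]) values. The sketch below relies on two facts stated earlier for the symmetric regular case: the Myerson auction is the second-price auction with the monopoly reserve 1/2, and each bidder's expected payment is $E[\psi(x) G(x) 1\{x \ge r\}]$. The function names are illustrative, and the Vickrey revenue is the expected second-highest of m uniform values, $(m-1)/(m+1)$.

```python
from fractions import Fraction


def myerson_revenue(n):
    """Revenue of the second-price auction with reserve 1/2, n i.i.d. U([0, 1]) bidders."""
    r = Fraction(1, 2)
    # Per-bidder payment: integral_r^1 (2x - 1) * x**(n - 1) dx in closed form.
    per_bidder = 2 * (1 - r ** (n + 1)) / (n + 1) - (1 - r ** n) / n
    return n * per_bidder


def vickrey_revenue(m):
    """Expected second-highest of m i.i.d. U([0, 1]) values."""
    return Fraction(m - 1, m + 1)


for n in range(1, 6):
    print(n, float(myerson_revenue(n)), float(vickrey_revenue(n + 1)))
```

For every n printed, recruiting one extra bidder and dropping the reserve yields at least as much revenue as running the optimal auction.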
Nonetheless, let us assume that a specific mechanism has been chosen, independently
of the value distributions (which are unknown in this setup). A crucial question that
remains is the evaluation of this specific choice of mechanism in the worst-case analysis.
We will restrict ourselves to DSIC auctions. We will use the notion of competitive ratio
defined as the infimum, over all possible value distributions, of the revenue of this auction
divided by the optimal revenue of the Myerson auction for these distributions.
The competitive ratio is at most 1 by definition, and the larger the better. Unfortunately,
if the class of value distributions is not restricted when computing this infimum,
there does not exist any auction with a positive competitive ratio (Allouah and Besbes,
2020). As a consequence, we will use different types of restrictions to achieve non-trivial
approximation results. Based on the Bulow-Klemperer theorem, we can derive some revenue
guarantees on the second price auction without reserve price.
= Pi (Myersonn+1 ),
We proved in Corollary 2.33, through the Bulow-Klemperer theorem, that the competitive
ratio of the Vickrey auction is at least 0.5 when the distributions are restricted to
regular ones. Interestingly, it is possible to do better with a slightly different auction.
Theorem 2.34 (Fu et al., 2015). There exists an incentive-compatible auction with a
competitive ratio of 0.512 against regular value distributions.
The mechanism considered was the first with a higher competitive ratio than 0.5 against
regular distributions. It is a slight modification of the Vickrey auction where the seller
inflates the second highest bid. Formally, the mechanism is the following: with probability
1 − ε, a second price auction without reserve price is run. With probability ε, the
mechanism allocates the object to the bidder with the highest valuation, but only if his
valuation is greater than 1 + δ times the valuation of the second highest bidder, in which
case he pays (1 + δ) times the second highest bid. Otherwise, the mechanism does not
allocate the item.
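The mechanism described above can be sketched in a few lines. This is only an illustration under stated assumptions: bids are taken to be truthful, there are at least two bidders, and the values of `eps` and `delta` are left as free parameters since the text does not specify them. The function name is hypothetical.

```python
import random


def randomized_inflated_second_price(bids, eps, delta, rng=random):
    """One run of the randomized mechanism described in the text.

    Returns (winner_index or None, payment). Assumes len(bids) >= 2.
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    top, second = order[0], order[1]
    if rng.random() < eps:
        # With probability eps: inflate the second highest bid by (1 + delta).
        threshold = (1 + delta) * bids[second]
        if bids[top] > threshold:
            return top, threshold
        return None, 0.0  # the item is not allocated
    # With probability 1 - eps: plain second-price auction without reserve.
    return top, bids[second]


if __name__ == "__main__":
    print(randomized_inflated_second_price([10.0, 5.0, 3.0], eps=0.2, delta=0.5))
```

Setting `eps=0` recovers the Vickrey auction, which makes the mechanism easy to compare against its baseline in simulations.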
The idea behind this theorem is the following. The Bulow-Klemperer bound of 0.5 on
the competitive ratio is rather tight for regular distributions that would induce a high
optimal reserve price. On the other hand, it is rather loose for regular distributions with a
low associated reserve price. Inflating the second highest bid has a positive effect for the
former type of distribution (as it somehow emulates a high reserve price) and a negative
effect for the latter type (because it induces some reserve price, bigger than what it should
be). As a consequence, the ratio of revenues increases in the first case, and decreases in the
second one; but thanks to the looseness in the Bulow-Klemperer bound of 0.5, the infimum
globally increases.
On the other hand, when restricted to MHR distributions, the ratio is equal to 0.7153
and this result is tight.
For regular distributions, this result was then improved up to 0.519 (Allouah and Besbes,
2020) and further improved in (Hartline et al., 2020) which identified the optimal prior-
independent mechanism. The optimal mechanism is a mixture between a second price
auction and the same auction where the prices are scaled up by a factor of about 2.5. The
authors find the worst-case family of distributions and use these distributions to derive
the optimal mechanism and solve the problem.
As we have explained while presenting them, second price auctions are DSIC, and hence
there exists a truthful equilibrium.
There however exist many other equilibria such as the following one. Suppose for
concreteness that all the value distributions of the bidders are supported on [0, 1]. Suppose
now that every bidder always bids 0 except one of them - say, bidder 1 - bids arbitrarily
high, say 1. Clearly for bidder 1 this is a best response to having a competition of 0 since
he wins all auctions and pays 0. For bidders other than 1, winning entails bidding more
than 1 and paying 1; but their values are less than 1. So, the utility of winning any auction
is non-positive and negative as soon as their value is strictly less than 1 and the maximum
utility they can expect is 0. And this is achieved by many strategies but in particular by
bidding 0 all the time, which is then clearly a best response.
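The best-response claims in this example can be checked numerically on a grid of deviations. The sketch below assumes that a deviating bidder loses ties (consistent with "winning entails bidding more than 1") and uses hypothetical helper names.

```python
def second_price_utility(value, bid, other_bids):
    """Utility of a bidder in a second-price auction without reserve.

    Assumption: the bidder wins only if his bid is strictly above all others.
    """
    highest_other = max(other_bids)
    if bid > highest_other:
        return value - highest_other
    return 0.0


def is_best_response(value, bid, other_bids, deviations):
    """True if no deviation in `deviations` strictly improves on `bid`."""
    baseline = second_price_utility(value, bid, other_bids)
    return all(
        second_price_utility(value, d, other_bids) <= baseline + 1e-12
        for d in deviations
    )


if __name__ == "__main__":
    grid = [k / 10 for k in range(21)]  # candidate bids in [0, 2]
    values = [k / 10 for k in range(11)]  # values in [0, 1]
    # Bidder 1 bids 1 against competitors who all bid 0.
    print(all(is_best_response(v, 1.0, [0.0, 0.0], grid) for v in values))
    # Another bidder bids 0 against bidder 1 (bid 1) and a third bidder (bid 0).
    print(all(is_best_response(v, 0.0, [1.0, 0.0], grid) for v in values))
```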
2.6.2 No revenue equivalence when β is only non-increasing
We are grateful to an anonymous referee for bringing up this example while discussing
Theorem 2.8. Suppose bidders have the same value distribution on say [0, 1] and that the
value distribution admits a density.
Consider the following 0-rational auction with standard allocation rule, i.e., the winner
is the highest bidder; ties are broken at random among top bidders. Bidders pay 0 when
they bid 0 and pay 1 otherwise. Note that, for any bidder, bidding x > 0 results in
non-positive expected utility: either they win and their utility is negative, or they lose
and their utility is 0. Hence an equilibrium is for all bidders to bid 0. This equilibrium is
symmetric.
The seller’s expected revenue is therefore 0. However, Theorem 2.8 states that the expected
payment for this type of auctions is independent of the payment rule at an increasing
symmetric equilibrium. The issue in this very interesting example is that the symmetric
equilibrium strategy described here is not increasing: it is in fact constant, since it maps all
values to 0. Looking at the proof of Theorem 2.8 it is clear that the expected utility $U_i(z, x_i)$
is then not $x_i G_i(z) - P_i(z)$, as was key to the proof. The utility when bidding b when the
other players use this strategy is $x_i - 1$ if $b > 0$ and $\frac{1}{n} x_i$ if $b = 0$.
2.6.3 Example: the Myerson auction in the symmetric case with non-regular value
distribution
We consider the symmetric case where the n independent bidders still have a value
distribution denoted by F with a density f with f > 0. In particular, with probability 1 the
values they draw are all different. For simplicity we suppose that there is a single interval
[α, β] on which ψ requires ironing. In this case, using the remark following Corollary 2.17,
an optimal auction is the following: suppose bidder i's value is such that
$\tilde\psi(x_i) \ge \max_j \{\tilde\psi(x_j)\}$ and $\tilde\psi(x_i) \ge 0$, so that he might win the auction.
As before, we call $\tilde\psi^{-1}(0) = \inf_t \{t : \tilde\psi(t) \ge 0\}$.
1. if $\max_{j \ne i} \tilde\psi(x_j) < 0$, bidder i wins the auction and pays $\tilde\psi^{-1}(0)$. For the other cases
below, we assume that $\max_{j \ne i} \tilde\psi(x_j) \ge 0$.
2. if $\max_{j \ne i} x_j > \beta$, then bidder i wins the auction and pays second price, i.e., $\max_{j \ne i} x_j$.
3. if $\max_{j \ne i} x_j < \alpha$, then bidder i wins the auction and pays $\max_{j \ne i} x_j$, i.e., second price.
4. if $\max_{j \ne i} x_j \in (\alpha, \beta)$, let us call K the number of bidders in i's competition who have
$x_j \in (\alpha, \beta)$. Then two situations arise:
4a) either $x_i > \beta$, in which case bidder i wins the auction and pays $\beta - \frac{\beta - \alpha}{K+1}$. An
equivalent payment scheme would be to draw K i.i.d. uniform random variables
$u_k$ on $[\alpha, \beta]$ and to charge their maximal value $Y = \max_{1 \le k \le K} u_k$, since
$E[Y] = \beta - \frac{\beta - \alpha}{K+1}$;
4b) or $x_i \in (\alpha, \beta)$, in which case the winner is chosen uniformly at random among
the K + 1 bidders having value in $(\alpha, \beta)$, and hence having the same virtualized
bid $\tilde\psi(x_k)$. When bidder i wins, he pays $\alpha$.
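The identity used in case 4a for the expected maximum of K i.i.d. uniform variables on $[\alpha, \beta]$ follows from a short computation, spelled out here for completeness:

```latex
\begin{aligned}
\mathbb{P}(Y \le y) &= \Bigl(\frac{y-\alpha}{\beta-\alpha}\Bigr)^{K}, \qquad y \in [\alpha,\beta],\\
\mathbb{E}[Y] &= \alpha + \int_{\alpha}^{\beta} \Bigl(1 - \Bigl(\frac{y-\alpha}{\beta-\alpha}\Bigr)^{K}\Bigr)\,dy
 = \alpha + (\beta-\alpha) - \frac{\beta-\alpha}{K+1}
 = \beta - \frac{\beta-\alpha}{K+1}.
\end{aligned}
```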
Recall that when ψ is regular, the optimal auction is a second price auction with
monopoly reserve.
Before starting the main argument of the proof, we now show that $E_{x_i \sim F_i}[|\psi_i(x_i)|]$ and
$E_{x_i \sim F_i}[|\tilde\psi_i(x_i)|]$ are finite when $F_i$ has one moment. We simply note that
$$E_{x_i \sim F_i}[|\psi_i(x_i)|] = \int_0^\infty |x_i f_i(x_i) - (1 - F_i(x_i))|\,dx_i \le \int_0^\infty \bigl(x_i f_i(x_i) + (1 - F_i(x_i))\bigr)\,dx_i = 2 E_{x_i \sim F_i}[x_i] .$$
Now $\tilde\psi_i = \psi_i$ except on intervals where $\tilde\psi_i$ is constant and equal to the mean of $\psi_i$ on those
intervals. It follows that on those intervals, the mean of $|\tilde\psi_i|$ is less than the mean of $|\psi_i|$.
Hence we also have
$$E_{x_i \sim F_i}[|\tilde\psi_i(x_i)|] \le 4 E_{x_i \sim F_i}[x_i] .$$
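i.e.,
$$\int_0^\infty Q_i(x)\psi_i(x)f_i(x)\,dx \le \int_0^\infty Q_i(x)\tilde\psi_i(x)f_i(x)\,dx .$$
We have
$$
\begin{aligned}
\int_0^\infty Q_i(x)\psi_i(x)f_i(x)\,dx
&= \sum_{k=0}^{N} \int_{x_{k,N}}^{x_{k+1,N}} Q_i(x)\psi_i(x)f_i(x)\,dx \\
&= \sum_{k=0}^{N} \frac{k}{N} \int_{x_{k,N}}^{x_{k+1,N}} \psi_i(x)f_i(x)\,dx
 + \sum_{k=0}^{N} \int_{x_{k,N}}^{x_{k+1,N}} \psi_i(x)f_i(x)\Bigl(Q_i(x) - \frac{k}{N}\Bigr)\,dx \\
&= \sum_{k=0}^{N} \frac{k}{N} \bigl(\Pi(x_{k,N}) - \Pi(x_{k+1,N})\bigr)
 + \sum_{k=0}^{N} \int_{x_{k,N}}^{x_{k+1,N}} \psi_i(x)f_i(x)\Bigl(Q_i(x) - \frac{k}{N}\Bigr)\,dx \\
&= \frac{1}{N} \sum_{k=0}^{N} \Pi(x_{k,N})
 + \sum_{k=0}^{N} \int_{x_{k,N}}^{x_{k+1,N}} \psi_i(x)f_i(x)\Bigl(Q_i(x) - \frac{k}{N}\Bigr)\,dx \\
&\le \frac{1}{N} \sum_{k=0}^{N} \Pi(x_{k,N}) + \frac{1}{N} \int_0^\infty |\psi_i(x)| f_i(x)\,dx \\
&\le \frac{1}{N} \sum_{k=0}^{N} \Pi(x_{k,N}) + \frac{1}{N} \int_0^\infty \bigl(x f_i(x) + (1 - F_i(x))\bigr)\,dx \\
&\le \frac{1}{N} \sum_{k=0}^{N} \Pi(x_{k,N}) + \frac{2 E_{x_i \sim F_i}[x_i]}{N} .
\end{aligned}
$$
Using the exact same argument for $\tilde\Pi$ defined above, which is a primitive of $-\tilde\psi f$ that
upper-bounds $\Pi$, we finally get
$$
\begin{aligned}
\int_0^\infty Q_i(x)\psi_i(x)f_i(x)\,dx
&\le \frac{1}{N} \sum_{k=0}^{N} \Pi(x_{k,N}) + \frac{2 E_{x_i \sim F_i}[x_i]}{N} \\
&\le \frac{1}{N} \sum_{k=0}^{N} \tilde\Pi(x_{k,N}) + \frac{2 E_{x_i \sim F_i}[x_i]}{N} \\
&\le \int_0^\infty Q_i(x)\tilde\psi_i(x)f_i(x)\,dx + \frac{6 E_{x_i \sim F_i}[x_i]}{N} .
\end{aligned}
$$
Hence,
$$\int_0^\infty Q_i(x)\psi_i(x)f_i(x)\,dx - \int_0^\infty Q_i(x)\tilde\psi_i(x)f_i(x)\,dx \le \frac{6 E_{x_i \sim F_i}[x_i]}{N} .$$
As the left-hand side does not depend on N we can take the limit as N → ∞ to conclude
that
$$\int_0^\infty Q_i(x)\psi_i(x)f_i(x)\,dx - \int_0^\infty Q_i(x)\tilde\psi_i(x)f_i(x)\,dx \le 0 .$$
This concludes the proof.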
References
Allouah, A. and O. Besbes. 2020. “Prior-independent optimal auctions”. Management
Science. 66(10): 4417–4432.
Archer, A. and É. Tardos. 2001. “Truthful mechanisms for one-parameter agents”. In:
Proceedings 2001 IEEE International Conference on Cluster Computing. IEEE.
Arnosti, N., M. Beck, and P. Milgrom. 2016. “Adverse selection and auction design for
internet display advertising”. American Economic Review. 106(10): 2852–66.
Balseiro, S. R., O. Candogan, and H. Gurkan. 2020. “Multistage Intermediation in Display
Advertising”. Manufacturing & Service Operations Management.
Boyd, S. and L. Vandenberghe. 2004. Convex Optimization. USA: Cambridge University
Press. isbn: 0521833787.
Bulow, J. and P. Klemperer. 1996. “Auctions Versus Negotiations”. The American Economic
Review. 86(1): 180–194.
Cremer, J. and R. P. McLean. 1988. “Full extraction of the surplus in Bayesian and dominant
strategy auctions”. In: Econometrica: Journal of the Econometric Society. JSTOR. 1247–
1257.
Feng, Z., S. Lahaie, J. Schneider, and J. Ye. 2021. “Reserve Price Optimization for First
Price Auctions in Display Advertising”. International Conference on Machine Learning:
3230–3239.
Fibich, G. and A. Gavious. 2003. “Asymmetric First-Price Auctions: A Perturbation Ap-
proach”. Mathematics of Operations Research. 28(4): 836–852.
Fibich, G. and N. Gavish. 2012. “Asymmetric First-Price Auctions—A Dynamical-Systems
Approach”. Mathematics of Operations Research. 37(2): 219–243.
Fu, H. 2016. “Notes on Myerson’s Revenue Optimal Mechanisms”. http://fuhuthu.com/
notes/iron.pdf. Accessed: 2021-08-25.
Fu, H., N. Immorlica, B. Lucier, and P. Strack. 2015. “Randomization beats second price
as a prior-independent auction”. In: Proceedings of the Sixteenth ACM Conference on
Economics and Computation. 323–323.
Gayle, W.-R. and J. F. Richard. 2008. “Numerical Solutions of Asymmetric, First-Price,
Independent Private Values Auctions”. Computational Economics. 32(3).
Groeneboom, P. and G. Jongbloed. 2014. Nonparametric Estimation under Shape Constraints.
Cambridge University Press.
Hartline, J., A. Johnsen, and Y. Li. 2020. “Benchmark design and prior-independent op-
timization”. 2020 IEEE 61st Annual Symposium on Foundations of Computer Science
(FOCS): 294–305.
Hartline, J. D. et al. 2013. “Bayesian mechanism design”. Foundations and Trends® in
Theoretical Computer Science. 8(3): 143–263.
Hiriart-Urruty, J.-B. and C. Lemaréchal. 2001. Fundamentals of Convex Analysis. isbn: 978-
3-540-42205-1. doi: 10.1007/978-3-642-56468-0.
Kirkegaard, R. 2009. “Asymmetric first price auctions”. Journal of Economic Theory. 144(4):
1617–1635. issn: 0022-0531.
Kotowski, M. H. 2018. “On asymmetric reserve prices”. Theoretical Economics. 13(1): 205–
237.
Krishna, V. 2009. Auction Theory.
Lebrun, B. 1999. “First Price Auctions in the Asymmetric N Bidder Case”. International
Economic Review. (1).
Marshall, R., M. Meurer, J. Richard, and W. Stromquist. 1994. “Numerical analysis of
asymmetric first price auctions”. Games and Economic Behavior. (2). issn: 0899-8256.
Milgrom, P. 2004. Putting auction theory to work. Cambridge University Press.
Milgrom, P. and I. Segal. 2002. “Envelope theorems for arbitrary choice sets”. Econometrica.
70(2): 583–601.
Myerson, R. B. 1981. “Optimal auction design”. Mathematics of operations research. 6(1):
58–73.
Riley, J. G. and W. F. Samuelson. 1981b. “Optimal auctions”. The American Economic Review.
71(3): 381–392.
Rockafellar, R. T. 1970. Convex Analysis. Princeton University Press.
Vickrey, W. 1961. “Counterspeculation, auctions, and competitive sealed tenders”. In: The
Journal of finance. Vol. 16. No. 1. Wiley Online Library.
3 Repeated auctions from a seller’s standpoint
First read of this chapter, key concepts and ideas
3.1 Motivation
The first large-scale field experiment in production showed how engineers at Yahoo could
handle their huge datasets to learn an optimal reserve price per keyword (Ostrovsky
and Schwarz, 2011). Bidders were assumed to be non-strategic and to bid truthfully on
their platform. In the eBay case, as buyers are different from one auction to the other,
the seller knows that, by running an incentive-compatible auction, bidders will bid
truthfully. Hence, the online platform is able to learn an optimal reserve price per object
and pursue a revenue-maximizing objective. In these two examples, the seller has access
to samples from bidders’ past values and aims at exploiting this information to learn a
revenue-maximizing auction. The value distributions encompass the variability of values
between bidders or between objects sold on the platform.
The emergence of this setting created numerous bridges between statistical learning
and auction theory (Bar-Yossef et al., 2002; Blum et al., 2004; Lavi and Nisan, 2004), the
former being used to estimate the quantities (e.g. value distributions) needed to compute
solutions for the latter. This chapter casts some light on these links, and on how far the
underlying problem of learning revenue-maximizing auctions has been tackled.
A learner is given a set of hypotheses A: it is a set of possible auctions to run – e.g.,
second-price auctions with a set of possible reserve prices. She is also given a set of
observations ST = {x1 , . . . , xT }, sampled independently from the joint product distribution
F = F1 ⊗ . . . ⊗ Fn and belonging to a set of distributions D on a domain X . We emphasize
again that xt is a vector that corresponds to all bidders’ values and is sampled according
to a distribution F whose marginals correspond respectively to every bidder’s value
distribution F1 , . . . , Fn . For each value vector x ∈ X , and each IC auction a ∈ A, we denote by
ra (x) the revenue of the auction at the truthful equilibrium.
A classical assumption is to consider that for a given distribution F, there exists an
optimal hypothesis (i.e. an optimal auction or an optimal vector of reserve prices). This
hypothesis is called the target hypothesis or optimal Bayes hypothesis. For the auction setting,
the optimal Bayes hypothesis is defined as
$$a^*_A(F) = \operatorname*{argmax}_{a \in A} R(a) \quad \text{where} \quad R(a) = E_{x \sim F}[r_a(x)] , \qquad (3.1)$$
which is, by definition, the Myerson auction run on F if A represents the whole set of
auctions denoted by A. As it does not depend on a particular class of auctions, we simply
denote it a∗ (F). The practical objective of the learner is to optimize R(a) accessing only the
empirical distribution described by ST rather than the true distribution F.
A popular approach is to replace the true distribution F in Equation (3.1), by its
empirical counterpart. This is referred to as Empirical Revenue Maximization (ERM)
principle in statistical learning:
$$\hat a_A(S_T) = \operatorname*{argmax}_{a \in A} \hat R_{S_T}(a) \quad \text{where} \quad \hat R_{S_T}(a) = \frac{1}{T} \sum_{t=1}^{T} r_a(x_t) .$$
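To make the ERM principle concrete, here is a minimal sketch for one particular hypothesis class. It assumes that A is a finite grid of second-price auctions with an anonymous reserve price and that bids are truthful, so that the revenue on a value vector is max(second highest value, reserve) when the highest value clears the reserve and 0 otherwise. The grid and the function names are hypothetical choices made for illustration.

```python
def revenue_second_price(values, reserve):
    """Revenue r_a(x) of a second-price auction with anonymous reserve `reserve`
    on the value vector `values` (truthful bids, at least one bidder)."""
    ordered = sorted(values, reverse=True)
    highest = ordered[0]
    second = ordered[1] if len(ordered) > 1 else 0.0
    if highest < reserve:
        return 0.0  # nobody clears the reserve: no sale
    return max(second, reserve)


def empirical_revenue(samples, reserve):
    """Empirical revenue: average of r_a(x_t) over the dataset."""
    return sum(revenue_second_price(x, reserve) for x in samples) / len(samples)


def erm_reserve(samples, candidate_reserves):
    """ERM hypothesis over a finite grid of candidate reserve prices."""
    return max(candidate_reserves, key=lambda r: empirical_revenue(samples, r))


if __name__ == "__main__":
    dataset = [(0.9, 0.2), (0.8, 0.1), (0.3, 0.25)]
    print(erm_reserve(dataset, [0.0, 0.5, 0.8]))
```

On this toy dataset the two high-value auctions dominate the average, so the ERM reserve is the largest candidate that still sells to them.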
The goal is to provide error guarantees on the ERM hypothesis $\hat a_A(S_T)$ against the Myerson
auction $a^*(F)$ depending on the number of samples T and some relevant complexity
measure of the hypothesis class A. Indeed, to make the problem tractable and to avoid
overfitting, the learner often restricts the complexity of the hypothesis space A. This leads to a
classical bias/variance trade-off that can be materialized by the following decomposition
of the excess risk between $\hat a_A(S_T)$ and $a^*(F)$:
$$R(a^*(F)) - R(\hat a_A(S_T)) = \underbrace{R(a^*(F)) - R(a^*_A(F))}_{\text{approximation error}} + \underbrace{R(a^*_A(F)) - R(\hat a_A(S_T))}_{\text{estimation error}}$$
The challenge for the learner is, given the knowledge of a set of possible distributions
D and the sample size T , to choose a family of auctions A that balances these two
error terms. In the remainder of this section, we briefly describe classical tools to derive
theoretical guarantees on the estimation error. We also describe why guarantees are not
provided for any arbitrarily complex distribution F as it would make worst-case guarantees
mostly void. Approximation error is usually handled in a more ad-hoc way, as it is very
dependent on the hypothesis class.
The rates of convergence of approximation and/or estimation error are formalized
through the notion of sample complexity of an algorithm, i.e., a mapping from the class of
finite datasets into A.
Definition 3.1. Given ε ∈ [0, 1] and δ > 0, the sample complexity of a batch learning
algorithm alg, against a class of joint distributions F is the smallest number of samples
T such that for all distributions F ∈ F , if alg learns from a dataset $S_T \sim F^{\otimes T}$ of T samples,
the following holds
$$P\bigl\{ R(\mathrm{alg}(S_T)) \ge (1 - \varepsilon) R(a^*(F)) \bigr\} \ge 1 - \delta ;$$
Stated otherwise, alg is (1 − ε)-optimal with probability at least 1 − δ.
Proof. Consider the family of value distributions $\mathcal{F} = \{F_z \mid z \in \mathbb{R}_+^*\}$ with
$$F_z = \begin{cases} z^2, & \text{with probability } 1/z, \\ 0, & \text{with probability } 1 - 1/z. \end{cases}$$
The optimal price of $z^2$ gives an expected revenue of z. For any number of samples T ,
and any δ > 0, if $z \ge (1 - (1 - \delta)^{1/T})^{-1}$ then with probability at least 1 − δ the dataset will be
composed of only 0. Let $z_T$ be the price posted by the algorithm in that case; the expected
revenue of the algorithm is therefore $z_T / z$. As a consequence, if $z \ge \sqrt{z_T / \varepsilon}$ then the algorithm
is only ε-optimal.
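The two thresholds appearing in this proof are easy to check numerically. The sketch below assumes z ≥ 1 (so that 1/z is a probability) and uses hypothetical helper names; it only illustrates the arithmetic of the argument.

```python
def prob_all_zero(z, T):
    """Probability that T i.i.d. samples from F_z are all equal to 0."""
    return (1.0 - 1.0 / z) ** T


def z_threshold(T, delta):
    """Smallest z for which the dataset is all zeros with probability >= 1 - delta."""
    return 1.0 / (1.0 - (1.0 - delta) ** (1.0 / T))


def revenue_ratio(posted_price, z):
    """Revenue of posting `posted_price` under F_z, divided by the optimal revenue z."""
    # A sale happens only if the value equals z^2 and the price does not exceed it.
    sale_prob = 1.0 / z if 0.0 < posted_price <= z * z else 0.0
    return posted_price * sale_prob / z


if __name__ == "__main__":
    T, delta = 100, 0.05
    z0 = z_threshold(T, delta)
    print(z0, prob_all_zero(z0, T))
    # If the algorithm posts z_T = 10 and z = sqrt(z_T / eps) with eps = 0.001:
    print(revenue_ratio(10.0, 100.0))
```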
Even though the counter-example distribution Fz involved in the proof does not satisfy
the basic assumption of continuity, it is easy to see that smoothing it won’t really change
the proof (except for additional technicalities). Proposition 3.3 implies that some restrictive
assumptions on the joint distribution F are required, such as regularity of the marginals
F1 , . . . , Fn . A stronger requirement than an increasing virtual value is a monotone
hazard rate.
Definition 3.4. A distribution F has Monotonic Hazard Rate if the hazard rate $\frac{f(x)}{1 - F(x)}$ is
non-decreasing over its support.
Uniform, exponential and normal distributions satisfy the MHR condition and all MHR
distributions are regular distributions. The converse is not true since the
distribution F(z) = 1 − 1/z (for z ≥ 1) is regular but not MHR. Intuitively MHR distributions have
thinner tails than general regular distributions.
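The counter-example can be verified directly from the definitions, taking the support to be $[1, \infty)$ and using the virtual value $\psi(z) = z - \frac{1 - F(z)}{f(z)}$:

```latex
\begin{aligned}
f(z) &= \frac{1}{z^{2}}, \qquad 1 - F(z) = \frac{1}{z}, \qquad z \ge 1,\\
\frac{f(z)}{1-F(z)} &= \frac{1}{z} \quad \text{(decreasing, so $F$ is not MHR)},\\
\psi(z) &= z - \frac{1-F(z)}{f(z)} = z - z = 0 \quad \text{(constant, hence non-decreasing, so $F$ is regular)}.
\end{aligned}
```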
Theorem 3.5 (Dhangwatnotai et al., 2015; Huang et al., 2018). The sample complexity of
the empirical monopoly price is of order
To get some intuition, we provide a simple, but sub-optimal, proof for the case of
bounded distributions. But first, let us explain why bounded distributions are assumed
to lie on [1, H] and not [0, H]; this will not transpire in the proof, as we prove a weaker
statement (with a quadratic dependency in H but for any bounded distributions). The
reason is that usual techniques do not control an error of ε, but an error of εΠ(r ∗ ), which
can be arbitrarily smaller if Π(r ∗ ) is close to 0. The assumption that the support is included
in [1, H] ensures that Π(r ∗ ) ≥ 1. With a simple renormalisation, we can show that the
sample complexity of the monopoly price scales as $\rho\,\varepsilon^{-2} \log(\rho/\varepsilon) \log(1/\delta)$ if the distribution
is supported on [a, b], with ρ = b/a.
Proof. Let F be a bounded distribution whose support is included in [1, H] and let us
denote by $r^* = \operatorname{argmax}_r\, r(1 - F(r))$ the monopoly price, by $\hat F$ the empirical CDF and by
$\hat r^* = \operatorname{argmax}_r\, r(1 - \hat F(r))$ the empirical monopoly price.
For the optimal proof, see Huang et al., 2018. These sample complexities match the
lower bounds provided in (Huang et al., 2018) up to logarithmic factors.
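For readers who want to experiment, the empirical monopoly price can be computed by a single pass over the sorted samples. This sketch assumes the convention that a buyer with value v accepts any price r ≤ v, so that the empirical demand at price r is the fraction of samples greater than or equal to r; the maximizer can then be searched among the sample points. The function name is hypothetical.

```python
def empirical_monopoly_price(samples):
    """Return (price, empirical revenue) maximizing r * (fraction of samples >= r).

    Assumption: demand at price r counts samples with value >= r, so an
    optimal price can be found among the sample values themselves.
    """
    ordered = sorted(samples, reverse=True)
    T = len(ordered)
    best_price, best_revenue = None, float("-inf")
    for k, price in enumerate(ordered, start=1):
        # At least k samples are >= price; with duplicates, the last copy
        # of a value sees the full count and therefore the largest revenue.
        revenue = price * k / T
        if revenue > best_revenue:
            best_price, best_revenue = price, revenue
    return best_price, best_revenue


if __name__ == "__main__":
    print(empirical_monopoly_price([1, 2, 3, 10]))
    print(empirical_monopoly_price([2, 2, 2, 3]))
```

The first call illustrates how a single large sample can drive the empirical monopoly price, which is the phenomenon behind the difficulties with heavy-tailed distributions.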
Unfortunately, this simple approach does not generalize to regular distributions,
especially to heavy-tailed distributions. Intuitively, there is a constant probability that
a few outliers generate an arbitrarily large empirical monopoly price. This intuition is
formalized in the following proposition.
Proposition 3.6. There exists a regular distribution F and two constants η0 , δ0 > 0 such
that, for any sample size T ,
$$P_{S_T \sim F^{\otimes T}}\bigl( \Pi(\hat r^*) < (1 - \eta_0)\, \Pi(r^*) \bigr) > \delta_0 .$$
Proof. Consider F(x) = 1−1/x for x < 2 and F(x) = 1−1/(2(x −1)) for x > 2. Then F is regular
since ψ(x) = 0 for x < 2 and ψ(x) = 1 for x > 2 and the monopolistic revenue is equal to 1.
On the other hand, for any sample size T ∈ N, the following holds
This problem is related to the estimation of the mean of heavy tailed distributions.
We refer the interested reader to (Lugosi and Mendelson, 2019) for a precise survey on
algorithms used to estimate the mean of heavy-tailed distributions.
To handle heavy-tailed regular distributions, a solution introduced in (Dhangwatnotai
et al., 2015) consists in removing the largest samples.
Definition 3.7. Given a dataset ST = {xt | t ∈ [T ]}, assuming the xt are ordered so that
x1 ≤ x2 ... ≤ xT , and an accuracy parameter κ > 0, we denote by
Theorem 3.8 (Dhangwatnotai et al. (2015)). The sample complexity of the guarded empirical
monopoly price with κ = ε is of order $\Theta(\varepsilon^{-3} \log(1/\varepsilon) \log(1/\delta))$ for regular distributions.
Figure 3.1: Monopolistic revenue r(1 − F(r)) as a function of the reserve price r, for several possible value
distributions. The dotted yellow curve corresponds to a log normal distribution with mean 0.5 and scale 0.5,
the dash-dotted green curve corresponds to a Kumaraswamy distribution with parameters a = 2 and b = 10,
and the dashed blue curve corresponds to a mixture of 7 Gaussians with means equal respectively to
(0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4) and standard deviation 0.001.
expect that under mild assumptions, the estimation error reduces to 0 when T → ∞, the
question being the dependency of the speed of convergence in the size of the hypotheses
class A and on the sample size T . When the class of hypotheses is finite, this estimation
error can be controlled with a union bound on some basic concentration inequalities.
Proposition 3.9. Consider a finite class of auctions A and $S_T \sim F^{\otimes T}$ a dataset of T value
vectors drawn i.i.d. from F. Assume that the support of F is included in $[0, H]^n$. Then, for
all ε > 0 and δ > 0, if $T \ge \frac{H^2}{2\varepsilon^2}\bigl(\ln\frac{4}{\delta} + \ln|A|\bigr)$, then
$$P_{S_T \sim F^{\otimes T}}\bigl( R(a^*_A(F)) - R(\hat a_A(S_T)) \le \varepsilon \bigr) \ge 1 - \delta$$
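$$
\begin{aligned}
R(a) - R(\hat a_A(S_T)) &= R(a) - \hat R_{S_T}(\hat a_A(S_T)) + \hat R_{S_T}(\hat a_A(S_T)) - R(\hat a_A(S_T)) \\
&\le R(a) - \hat R_{S_T}(a) + \hat R_{S_T}(\hat a_A(S_T)) - R(\hat a_A(S_T)) \quad \text{(by definition of ERM)} \\
&\le 2 \max_{a \in A} |\hat R_{S_T}(a) - R(a)|
\end{aligned}
$$
Then,
$$
P\bigl\{ \exists a \in A,\ |\hat R_{S_T}(a) - R(a)| > \varepsilon \bigr\}
\le \sum_{a \in A} P\bigl\{ |\hat R_{S_T}(a) - R(a)| > \varepsilon \bigr\}
\le 2|A| \exp\Bigl( \frac{-2T\varepsilon^2}{H^2} \Bigr)
$$
Setting the right-hand side to δ/2 finishes the proof.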
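The sample size required by Proposition 3.9 is straightforward to evaluate. The following sketch (hypothetical function names) computes the bound and the right-hand side of the union bound above, so that one can check that the latter indeed drops below δ/2.

```python
import math


def finite_class_sample_size(H, eps, delta, class_size):
    """Smallest integer T with T >= H^2 / (2 eps^2) * (ln(4 / delta) + ln |A|)."""
    bound = H**2 / (2 * eps**2) * (math.log(4 / delta) + math.log(class_size))
    return math.ceil(bound)


def union_bound_failure_prob(T, H, eps, class_size):
    """Right-hand side 2 |A| exp(-2 T eps^2 / H^2) of the union bound."""
    return 2 * class_size * math.exp(-2 * T * eps**2 / H**2)


if __name__ == "__main__":
    T = finite_class_sample_size(H=1.0, eps=0.1, delta=0.05, class_size=100)
    print(T, union_bound_failure_prob(T, H=1.0, eps=0.1, class_size=100))
```

The dependency on the class size is only logarithmic, which is why discretizing a continuous family of reserve prices on a fine grid remains affordable.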
This generalization bound is uninformative if the size of the hypothesis space is infinite,
which happens for large families of value distributions. This simple proof can nonetheless
be extended to an infinite set of hypotheses using standard statistical learning tools. The
general idea is to reduce the analysis of an infinite class of auctions to a finite set of
hypotheses. First, we introduce below different notions to quantify the complexity of a
hypothesis set.
Definition 3.10. Let A be a class of auctions and ST = (x1 , . . . , xT ) be a fixed dataset of value
vectors. The empirical Rademacher complexity of A with respect to ST is
$$\widehat{\mathrm{Rad}}_{S_T}(A) = E_\sigma\Bigl[ \sup_{a \in A} \frac{1}{T} \sum_{t=1}^{T} \sigma_t\, r_a(x_t) \Bigr]$$
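For a finite class, the empirical Rademacher complexity can be approximated by Monte Carlo. The sketch below assumes, as in the standard definition, that the σ_t are independent signs uniform on {−1, +1}; it also assumes that the class is a finite grid of second-price auctions with anonymous reserve prices and truthful bids. The function names and the number of draws are hypothetical choices.

```python
import random


def revenue_second_price(values, reserve):
    """Revenue of a second-price auction with anonymous reserve (truthful bids)."""
    ordered = sorted(values, reverse=True)
    second = ordered[1] if len(ordered) > 1 else 0.0
    return max(second, reserve) if ordered[0] >= reserve else 0.0


def empirical_rademacher(samples, reserves, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ sup_a (1/T) sum_t sigma_t r_a(x_t) ]."""
    rng = random.Random(seed)
    T = len(samples)
    # Pre-compute the revenue of every auction of the class on every sample.
    revenues = [[revenue_second_price(x, r) for x in samples] for r in reserves]
    total = 0.0
    for _ in range(n_draws):
        sigma = [rng.choice((-1, 1)) for _ in range(T)]
        total += max(sum(s * v for s, v in zip(sigma, row)) for row in revenues) / T
    return total / n_draws


if __name__ == "__main__":
    dataset = [(0.9, 0.2), (0.8, 0.1), (0.3, 0.25)]
    print(empirical_rademacher(dataset, [0.0, 0.5, 2.0]))
```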
challenging. So weaker concepts, the Vapnik-Chervonenkis (VC) dimension (for the binary-
class problem) and the pseudo-dimension (for the real-valued hypotheses classes), were
introduced. They can be used to establish generalization bounds and are easier to compute
than the Rademacher complexity since they are pure combinatorial notions.
The following result from learning theory gives a uniform generalization bound in
terms of the pseudo-dimension.
This result is derived from an upper bound of the Rademacher complexity, as a function
of the pseudo-dimension. It can be translated in terms of sample complexity.
$$P_{S_T \sim F^{\otimes T}}\bigl( \exists a \in A,\ |\hat R_{S_T}(a) - R(a)| > \varepsilon \bigr) \le \delta$$
This result links the sample size T and the "richness" of the auction class A to the
estimation error. It was first applied to auction learning in (Morgenstern
and Roughgarden, 2015), before being extended (Devanur et al., 2016; Gonczarowski and
Nisan, 2017; Guo et al., 2019). In Sections 3.3 and 3.4, we will use the pseudo-dimension
to quantify the estimation error through this result.
3.3 Auctions with Asymptotically No Approximation Error
We first present families of auctions without asymptotic (as T → ∞) approximation error.
Said otherwise, either there is no approximation error at all or the approximation error
can be sent to 0 by controlling a parameter dependent on T .
Algorithmic Complexity. The running time of the empirical Myerson auction is also
quite high. It takes O(nT log T ) operations to compute the empirical cdf. Then, each time
the auction is run, computing the allocation and the payment takes O(nT ) operations.
The winner of the auction is the bidder with the highest non-negative index: if all
bidders have an index equal to −1, the item is not allocated. Ties are broken in favor
of the highest bidder among those with the highest index, or by standard decision rules
(such as uniformly at random).
The payment rule is defined according to Corollary 2.18, and to ensure that the auction
is DSIC, it is the lowest winning bid. Formally, with the above tie-breaking rule, the
payment of bidder i if he wins is equal to
• ri0 if all other bidders have index -1,
• max{riτ , bj } where τ > −1 is the index of the bidder j that would have won without
bidder i
This class of L-level auctions interpolates between the eager second-price auction and
the Myerson auction. Indeed, the 1-level auction is equivalent to the eager second-price
auction. When L → ∞, the Myerson auction can be approximated with appropriate reserve
prices – i.e. the approximation error can be made arbitrarily small by taking T arbitrarily
large.
Theorem 3.16 (Morgenstern and Roughgarden, 2015). Let F be the class of distributions
whose support is included in $[1, H]^n$ and $L = \Theta(\frac{1}{\varepsilon} + \log_{1+\varepsilon} H)$, then for any F ∈ F , there
exists an L-level auction with a revenue higher than 1 − ε times the revenue of the Myerson
auction.
The idea of this theorem is quite simple: the class of L-level auctions, for the above
well-chosen value of L, is an ε-net for the class of regular value distributions (Morgenstern
and Roughgarden, 2015). Moreover, the estimation error can be controlled by computing
its pseudo-dimension.
Theorem 3.17 (Morgenstern and Roughgarden, 2015). Let A be the class of L-level
auctions with n bidders, then its pseudo-dimension satisfies
$$\mathrm{Pdim}(A) = \Theta_{L \to \infty}(nL \log(nL)) .$$
Remark. Theorem 3.17 only provides a scaling of Pdim(A) with respect to nL for the sake
of simplicity. A more precise relation can be derived (Morgenstern and Roughgarden,
2015) since $2^{\mathrm{Pdim}(A)} \le (n\,\mathrm{Pdim}(A) + nL)^{3nL}$. This will be useful to derive a pseudo-dimension
for the class of second-price auctions later (i.e. L = 1).
The approximation error will be smaller than ε by setting $L = \Theta(\frac{1}{\varepsilon} + \log_{1+\varepsilon} H)$ thanks
to Prop. 3.14. Similarly, the estimation error will be smaller than ε with
$$T = \Theta\Bigl( H^2 \varepsilon^{-2} \Bigl( \mathrm{Pdim}(A) \log\bigl(\tfrac{H}{\varepsilon}\bigr) + \log(1/\delta) \Bigr) \Bigr)$$
samples, where $\mathrm{Pdim}(A) = \Theta_{L \to \infty}(nL \log(nL))$. Combining these two claims gives the
following.
Corollary 3.18. Let F be the class of distributions with support included in $[1, H]^n$. For
ε > 0 and δ > 0, the sample complexity of L-level auctions with $L = \Theta(\frac{1}{\varepsilon} + \log_{1+\varepsilon} H)$ is of
order
$$T = \Theta\Bigl( \frac{H^2 n}{\varepsilon^3} \log\frac{1}{\delta} \Bigr) .$$
Table 3.1: Current status of sample complexity bounds in the batch setting depending on the class of value
distributions. Table taken from (Guo et al., 2019).
The question of finding the optimal sample complexity is more or less settled for
different interesting classes of distributions. On the other hand, most of the “optimal” (in
the sense that some upper bound matches the associated lower bound, up to logarithmic
terms) techniques suffer from the computational complexity of running the optimal
auction (and not learning it), as with the empirical Myerson auction method. Indeed, learning
the optimal auction has a cost of O(nT log T ), but running it requires O(nT )
operations to compute each allocation and each payment. There is a clear tension: the
smaller the error ε (large values of T ), the larger the running time of the optimal auction.
This is simply impractical for large-scale auction systems such as the eBay example. The
next subsection describes how to handle revenue maximization on more tractable auctions,
at the cost of keeping a bounded, yet incompressible, approximation error.
First, let us consider the simple family of second-price auctions with an anonymous reserve
price (which contains the Myerson auction in the symmetric case). From the learning point
of view, only one parameter must be learned.
Proposition 3.20. Let A be the class of second-price auctions with anonymous reserve
prices. For n ≥ 2, the pseudo-dimension of A is
Pdim (A) = 2 .
Proof. First, an auction in A is defined by only one parameter, thus we identify it with its
anonymous reserve price and we denote it by a for simplicity. Finding a set of cardinality
2 that can be pseudo-shattered by A is trivial. We only need to prove that any set of
cardinality 3 or higher cannot be pseudo-shattered.
We recall that a dataset $S_T = (x_1, \ldots, x_T)$ of size T is pseudo-shattered by A if there
exists $\theta \in \mathbb{R}^T$ such that for any $c \in \{-1, 1\}^T$, there exists a ∈ A such that
$\forall t \in [T],\ \mathrm{sign}(r_a(x_t) - \theta_t) = c_t$.
Regardless of the number of bidders n, the function $a \mapsto r_a(x)$ is quasi-concave in the
reserve price a and thus can cross (strictly) any threshold $\theta_t$ at most twice. Hence, a dataset
$S_T$ of size T can only generate a subset of $\{-1, +1\}^T$ of size at most 2T + 1: when a ranges from 0 to
∞, the vector $(\mathrm{sign}(r_a(x_t) - \theta_t))_{t \in [T]}$ changes value at most twice per point in $S_T$. Thus, a set $S_T$
can be pseudo-shattered only if $2^T \le 2T + 1$, which means that necessarily T ≤ 2.
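The counting argument can be explored numerically by sweeping the reserve price and collecting the sign patterns that appear. The sketch below is an illustration under assumptions: bids are truthful, each value vector has at least two bidders, and a point is labeled +1 when its revenue is strictly above its threshold and −1 otherwise. Function names are hypothetical.

```python
import random


def revenue_second_price(values, reserve):
    """Revenue of a second-price auction with anonymous reserve (n >= 2, truthful bids)."""
    ordered = sorted(values, reverse=True)
    if ordered[0] < reserve:
        return 0.0
    return max(ordered[1], reserve)


def sign_patterns(samples, thresholds):
    """Distinct sign vectors obtained when the reserve sweeps over [0, +inf)."""
    # Labels can only change at a value or at a threshold, so it is enough to
    # test these breakpoints, one point between each pair, and one point beyond.
    points = {0.0}
    for values, theta in zip(samples, thresholds):
        points.update(values)
        points.add(theta)
    grid = sorted(p for p in points if p >= 0.0)
    candidates = list(grid)
    candidates += [(a + b) / 2 for a, b in zip(grid, grid[1:])]
    candidates.append(grid[-1] + 1.0)
    patterns = set()
    for reserve in candidates:
        patterns.add(tuple(
            1 if revenue_second_price(x, reserve) > theta else -1
            for x, theta in zip(samples, thresholds)
        ))
    return patterns


if __name__ == "__main__":
    # Two points that are pseudo-shattered: all 4 sign patterns appear.
    print(len(sign_patterns([(1.0, 0.1), (0.75, 0.1)], [0.5, 0.3])))
    # Three random points: never more than 2 * 3 + 1 = 7 < 8 patterns.
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(3)]
    print(len(sign_patterns(pts, [rng.random() for _ in range(3)])))
```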
Proposition 3.14 yields that the sample complexity is $T = \Theta\bigl( H^2 \varepsilon^{-2} ( \log(\tfrac{H}{\varepsilon}) + \log(1/\delta) ) \bigr)$
for distributions with bounded support on [1, H]. Unfortunately, the approximation power
of such a simple class of auctions is poor and the approximation error remains large.
Theorem 3.21 (Hartline and Roughgarden, 2009). With regular value distributions, the
anonymous second-price auctions are a 4-approximation of the Myerson auction.
Sketch of proof: The proof relies on the following Bulow-Klemperer variant lemma.
Lemma 3.22 (Hartline and Roughgarden, 2009). Consider the following two settings. In
the first one, there are n bidders with value distribution Fi and, in the second one, there
are 2n bidders, the original ones and one independent copy of each one of them. The
second price auction without reserve price in the second setting is a 2-approximation of
the Myerson auction in the first setting.
Proof. Let R2n be the revenue generated by the second price auction without reserve price
with the original set of n bidders (of values denoted by xi ) and their n copies (whose
values are denoted by yi ). Since a bidder is identical to his copy, they both generate the
same revenue to the seller. As a consequence, in this auction, the revenue of the original n
bidders is equal to the revenue of their n copies, hence equal to R2n /2. Lemma 3.22 states
that R2n is bigger than half the revenue of the Myerson auction. This implies that, overall,
the original n bidders generate 1/4 of the Myerson auction revenue.
In this auction, one of the original n bidders gets the item only if his bid is the highest
and, in that case, he pays the second highest bid amongst the 2n ones, which is equal to
the maximum between the second highest bid of the original bidders and the highest bid
of the copies. As a consequence, the allocation and payment of that bidder are exactly the
same as in an auction with only the n original bidders and a random reserve price set as
the maximum bid of the copies, i.e., $y^{(1)} := \max\{y_i ; i \in [n]\}$. So there is a second-price
auction with random reserve price that is a 1/4 approximation of the Myerson revenue. It
is crucial to notice that this random reserve price $y^{(1)}$ is independent of the highest and
second highest values, denoted respectively by $x^{(1)}$ and $x^{(2)}$, so that the expected revenue at
the truthful equilibrium satisfies
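\[
\mathbb{E}_{y_i \sim F_i}\Big[\mathbb{E}_{x_i \sim F_i}\big[\max\{x^{(2)}, y^{(1)}\}\,\mathbf{1}\{x^{(1)} \ge y^{(1)}\}\big]\Big]
\le \max_{y^*}\, \mathbb{E}_{x_i \sim F_i}\big[\max\{x^{(2)}, y^*\}\,\mathbf{1}\{x^{(1)} \ge y^*\}\big]
\]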
by the pigeon-hole principle. As a consequence, setting as reserve price the value $y^*$ that attains the
maximum on the right-hand side generates at least 1/4 of the Myerson auction revenue.
Sketch of proof. This is a direct extension of the proof for the second-price auction with
anonymous reserve prices as the n estimation problems are independent.
Proposition 3.23 states that the estimation problem is not much harder than with anonymous
reserve prices. Using again Proposition 3.14, the associated sample complexity is
$T = \Theta\big(H^2 \varepsilon^{-2} (n \log(H/\varepsilon) + \log(1/\delta))\big)$ for distributions with bounded support on $[0, H]$.
However, this simple modification already greatly improves the guarantee on the approximation
error.
Theorem 3.24 (Dhangwatnotai et al., 2015). With regular value distributions, the lazy
second-price with monopoly reserve prices is a 2-approximation of the Myerson auction.
Proof. We divide the revenue of the Myerson auction in two parts and bound each term by
the revenue of the lazy second price with monopoly reserve.
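First, based on the Myerson lemma, recalling that $\psi_i$ stands for the virtual value
function associated to the distribution $F_i$, see Definition 2.11, we get
\[
\begin{aligned}
R(\mathrm{lazy}(F)) &= \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^n \psi_i(x_i)\,\mathbf{1}\{i \text{ is winning lazy 2nd-price}\}\Big] \\
&\ge \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^n \psi_i(x_i)\,\mathbf{1}\{i \text{ is winning Myerson auction \& lazy 2nd-price}\}\Big]
\end{aligned}
\]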
since if i is winning the lazy second-price auction, his virtual value is non-negative.
The revenue of the lazy auction can also be compared to the revenue of the Vickrey
auction (i.e. the second-price auction with no reserve price), as follows
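\[
\begin{aligned}
\mathbb{E}_{x \sim F}\Big[\sum_{i=1}^n \psi_i(x_i)\,\mathbf{1}\{i \text{ wins Myerson auction \& not lazy 2nd-price}\}\Big]
&= \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^n \psi_i(x_i)\,\mathbf{1}\{i \text{ wins Myerson auction \& not Vickrey auction}\}\Big] \\
&\le \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^n x_i\,\mathbf{1}\{i \text{ wins Myerson auction \& not Vickrey auction}\}\Big] \\
&\le R(\mathrm{Vickrey}(F)) \\
&\le R(\mathrm{lazy}(F)) .
\end{aligned}
\]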
Indeed, the first equality is a consequence of the fact that if a bidder wins the Myerson
auction, he has a non-negative virtual value. Hence, if he does not win the lazy second
price with monopoly reserve, he does not win the Vickrey auction. The second inequality
is deduced from the definition of the virtual value, the third one from the payment of the
Vickrey auction and the last one from the Myerson lemma that shows that the monopoly
reserve prices are the optimal ones for a lazy second-price auction.
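As a consequence,
\[
\begin{aligned}
2R(\mathrm{lazy}(F)) &\ge \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^n \psi_i(x_i)\,\mathbf{1}\{i \text{ is winning Myerson auction \& lazy 2nd price}\}\Big] \\
&\quad + \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^n \psi_i(x_i)\,\mathbf{1}\{i \text{ is winning Myerson auction \& not lazy 2nd price}\}\Big] \\
&= R(a^*(F))
\end{aligned}
\]
The last equality comes from the Myerson lemma, and this concludes the proof.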
First, in terms of estimation, the problem of optimizing the eager second price auction is
not much harder than the lazy one, as the pseudo-dimensions are rather similar: they only
differ by a factor log(n).
Proposition 3.25. Let A be the class of second-price auctions with personalized reserve
prices; its pseudo-dimension satisfies
Theorem 3.26 (Hartline and Roughgarden, 2009). With regular value distributions, the
optimal eager second price is a 2-approximation of the Myerson auction.
Proof. We prove that the eager second price with monopoly reserve is a 2-approximation of
the Myerson auction. The proof is very similar to the one of Theorem 3.24. We divide the revenue
of the Myerson auction in two parts and bound each term by the revenue of the eager
second price auction.
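First, based on the Myerson lemma,
\[
\begin{aligned}
R(\mathrm{eager}(F)) &= \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^n \psi_i(x_i)\,\mathbf{1}\{i \text{ wins eager 2nd-price}\}\Big] \\
&\ge \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^n \psi_i(x_i)\,\mathbf{1}\{i \text{ wins Myerson auction \& eager 2nd-price}\}\Big]
\end{aligned}
\]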
since if i is winning the eager second-price auction, his virtual value is non-negative.
The item is allocated in the Myerson auction if and only if the item is allocated in the
eager second-price auction. Hence, there exists a one-to-one mapping between a winner in
the eager second-price auction and a winner in the Myerson auction.
Consider the case where these two winners are different and denote by i the winner of
the eager second-price auction with monopoly reserve and by j the winner of the Myerson
auction. Denote by x the vector of values corresponding to this case and by $p^{\mathrm{eager}}(x)$ the
payment of the eager second price with monopoly reserve for this specific vector of values.
By definition of the payment rule of the eager second price auction and by definition of
the virtual value,
\[
p^{\mathrm{eager}}(x) \ge x_j \ge \psi_j(x_j).
\]
We conclude the proof with the same reasoning as in Theorem 3.24. Since the eager second
price with monopoly reserve is a 2-approximation, so is the eager second price with optimal
reserve.
Algorithmic complexity. Similarly to the lazy version, running the eager second-price
auction has a complexity of O(n). The main difference lies in the complexity of learning
the set of optimal reserve prices. For the lazy version, the optimal reserve prices are the
monopoly prices, which can be computed in O(nT log(m)). This is no longer true for the
eager version, for which finding the optimal reserve prices is NP-hard (Paes Leme et al., 2016;
Roughgarden and Wang, 2016), which explains why the more general T-level auctions
have the same limitation. This seems to contradict the objective of accepting a
non-zero approximation error in order to get tractable learning and running complexities. So the
question we investigate in the following section is the performance of the eager second-price
auctions, but with sub-optimal reserve prices set as the computable monopoly prices.
of bidders who cleared their reserve price. This second highest bid can be lower than the
second highest bid in general which is the one paid in the lazy version.
Theorem 3.27 (Fu, 2013). With regular value distributions, the revenue of the eager
second-price auction with monopoly reserve price is higher than the revenue of the lazy
second-price auction.
Proof. We denote by $r_i$ the monopoly price corresponding to bidder $i$. We will compare the
expected payment of bidder $i$, in the lazy or eager auction, conditioned on the values of all
other bidders. First, we define $x_{-i}^{\mathrm{lazy}} = \max_{j \ne i}\{x_j\}$ and $x_{-i}^{\mathrm{eager}} = \max_{j \ne i,\, x_j \ge r_j}\{x_j\}$. In particular,
this implies that $x_{-i}^{\mathrm{lazy}} \ge x_{-i}^{\mathrm{eager}}$. Moreover, if $x_{-i}^{\mathrm{lazy}} \le r_i$, the two auctions are identical for
bidder $i$, who pays $r_i$ if $x_i \ge r_i$.
To compare the two auctions, we can therefore restrict ourselves to the case where
$x_{-i}^{\mathrm{lazy}} \ge r_i$. The expected payment of bidder $i$ in a lazy second-price auction is in this case
equal to
\[
x_{-i}^{\mathrm{lazy}} \big(1 - F_i(x_{-i}^{\mathrm{lazy}})\big).
\]
For the eager second-price auction, the payment is equal to $\tilde{x}_{-i}^{\mathrm{eager}} = \max\{x_{-i}^{\mathrm{eager}}, r_i\}$ and the
expected payment of bidder $i$ is
\[
\tilde{x}_{-i}^{\mathrm{eager}} \big(1 - F_i(\tilde{x}_{-i}^{\mathrm{eager}})\big).
\]
Since $F_i$ is regular, $x(1 - F_i(x))$ is non-increasing for $x \ge \psi_i^{-1}(0)$. As
$r_i \le \tilde{x}_{-i}^{\mathrm{eager}} \le x_{-i}^{\mathrm{lazy}}$, the
expected payment in the lazy second price is lower than the expected payment in the eager
second-price auction with monopoly reserve prices.
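The comparison in Theorem 3.27 can be checked numerically on a toy instance. The sketch below is only an illustration under stated assumptions: two bidders with exponential value distributions (for which the monopoly price equals the scale parameter), truthful bids, and hypothetical helper names `lazy_revenue` and `eager_revenue`.

```python
import numpy as np

def lazy_revenue(bids, reserves):
    """Lazy second price: select the highest bidder first, then apply his reserve."""
    rows = np.arange(bids.shape[0])
    order = np.argsort(bids, axis=1)
    winner = order[:, -1]
    top = bids[rows, winner]
    second = bids[rows, order[:, -2]]
    r_win = reserves[winner]
    # The winner pays max(second highest bid, own reserve) if he clears his reserve.
    return np.where(top >= r_win, np.maximum(second, r_win), 0.0)

def eager_revenue(bids, reserves):
    """Eager second price: discard bids below their reserve, then run a second price."""
    rows = np.arange(bids.shape[0])
    active = np.where(bids >= reserves, bids, -np.inf)
    order = np.argsort(active, axis=1)
    winner = order[:, -1]
    top = active[rows, winner]
    second = active[rows, order[:, -2]]
    # The winner pays max(second highest surviving bid, own reserve).
    pay = np.maximum(second, reserves[winner])
    return np.where(np.isfinite(top), pay, 0.0)

rng = np.random.default_rng(0)
scales = np.array([1.0, 4.0])          # assumed exponential scales; monopoly price = scale
values = rng.exponential(scales, size=(200_000, 2))
rev_lazy = lazy_revenue(values, scales).mean()
rev_eager = eager_revenue(values, scales).mean()
print(f"lazy: {rev_lazy:.3f}  eager: {rev_eager:.3f}")
```

The eager auction is not better on every sample (it can charge less when the runner-up fails his own reserve); the theorem only concerns expected revenue, which is what the averages estimate.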
3.4.3 The boosted second-price auction
A simple extension to the eager second-price auction has been proposed to empirically
improve the seller’s revenue by reducing the approximation error: the boosted second-price
auction (Golrezaei et al., 2017). It relies on the following point of view: the eager second-price
auction can be seen as a Myerson auction with approximated virtual value functions
$\tilde{\psi}_i(x) = x - r_i$, where $r_i$ is the personalized reserve price.
Practically, a drastic improvement in terms of approximation error can be made by
adding a slope parameter, different for each bidder. The result is the boosted second-price
auction, which is a Myerson auction with approximated virtual value functions
$\tilde{\psi}_i(x) = \beta_i x - r_i$, where $r_i$ is the personalized reserve price and $\beta_i$ the "boost". It turns out that
for certain families of distributions (e.g., generalized Pareto distributions) the virtual value
is affine, making the boosted second-price auction coincide with the Myerson auction.
It retains the following two good properties of second-price auctions with personalized
reserve prices: 1) it is parametric and thus has a reasonable pseudo-dimension and 2) it
has a running time independent of the sample complexity T. It also has a lower
approximation error, as the auction class is strictly larger. Actually, for some families
of distributions, it has 0 approximation error even in an asymmetric setting. The
main caveat is the computational complexity of the training. As for the eager second-price
auction, the global optimization is NP-hard to solve. However, initializing with an
eager second-price auction (βi = 1) with monopoly reserve prices ri and launching an
optimization from there proved to perform very well empirically (Golrezaei et al., 2017).
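The claim that generalized Pareto distributions have an affine virtual value can be verified directly. The following sketch assumes a generalized Pareto law with location 0, scale `sigma` and shape `xi` (illustrative parameter names and values), for which $(1 - F(x))/f(x) = \sigma + \xi x$ and hence $\psi(x) = (1 - \xi)x - \sigma$.

```python
import numpy as np

def gpd_virtual_value(x, sigma, xi):
    """psi(x) = x - (1 - F(x)) / f(x) for a generalized Pareto law with location 0."""
    survival = (1.0 + xi * x / sigma) ** (-1.0 / xi)                      # 1 - F(x)
    density = (1.0 / sigma) * (1.0 + xi * x / sigma) ** (-1.0 / xi - 1.0)  # f(x)
    return x - survival / density

sigma, xi = 2.0, 0.25            # assumed scale and shape, with 0 < xi < 1
x = np.linspace(0.0, 10.0, 50)
psi = gpd_virtual_value(x, sigma, xi)
beta, r = 1.0 - xi, sigma        # slope and offset of the affine form beta * x - r
print(np.max(np.abs(psi - (beta * x - r))))
```

In this parametrization the affine form $\beta x - r$ is matched with $\beta = 1 - \xi$ and $r = \sigma$, so the virtual value vanishes at $\sigma / (1 - \xi)$.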
This approach appeals to stationarity assumptions, a potential drawback. In practice
issues may also arise from the fact that bidders and seller may have different estimates of
the bid distribution the bidders are facing. Then the seller’s estimate of the optimization
problem solved by the bidders could be inaccurate.
Another major difficulty in setting optimal asymmetric reserve prices is that they have a
somewhat complex and non-linear impact on the optimal bidding strategies of the buyers
and, as a consequence, will affect the allocation probability qi (x) in a potentially complex
fashion at equilibrium.
However, a natural, yet possibly suboptimal, choice is to set $r_i = \psi_i^{-1}(0)$, i.e. setting
the reserve value at the monopoly price $\psi_i^{-1}(0)$, at least for regular distributions. This
guarantees that the term under the expectation in Equation (2.6) is always non-negative
(in a first-price auction, bidders never bid above their value). This principle is quite similar
to the one studied in Section 3.4.2 for eager second-price auctions, as finding optimal reserve
prices is NP-hard (Paes Leme et al., 2016) for that type of auction. However, in first price
auctions and other non-DSIC auctions, setting this reserve may induce a change of optimal
bidding strategy and hence a different βi , making the evaluation of the impact of such
choice of reserve price on seller revenue theoretically delicate. To compute the monopoly
price, the seller needs to estimate the value distribution of bidder i from his bids, a task
we now turn to.
Setting optimal reserve prices in first-price auctions cannot be done by naive ERM
In first price auctions, the bidder requires much information about the competition to
compute best responses. Even if he knew perfectly the value distribution of the other
buyers, it would still be numerically challenging to bid optimally and reach the Nash
equilibrium. The situation is even worse when buyers have to estimate the distribution of
the competition while bidding.
On the other hand, setting the reserve prices is also more complicated for the seller.
With second price auctions, she could gather data and form a dataset of bids whose
distribution should be close to the value distribution (assuming myopic and non-strategic
agents). It is then possible to run an ERM based on this dataset.
On the contrary, with first price auctions, each bid received has a distribution that
depends on the reserve price chosen at the time. And, in the future, choosing another
reserve price will induce yet another distribution of bids. As a consequence, the data
from the “training” set (past bids) and the “test” set (future bids) have different statistical
properties and naïve empirical risk minimization will not work.
A seller can however use the theory discussed above to account for the impact of the reserve
price on bid distributions, then simulate from these new reserve-price-dependent
distributions and measure the effect of different reserve prices in an unbiased way. The
difficulty of solving for the Nash equilibrium nonetheless creates a hurdle to the practical
implementation of such ideas (Feng et al., 2021).
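The shift between "training" and "test" bids can be made concrete in the one case where the equilibrium is explicit. The sketch below assumes $n$ symmetric bidders with i.i.d. Uniform[0, 1] values, for which the textbook first-price equilibrium bid with reserve $r$ is $b(x) = x - (x^n - r^n)/(n x^{n-1})$ for $x \ge r$; the function name and the chosen reserves are illustrative.

```python
import numpy as np

def equilibrium_bid(x, n, reserve):
    """Symmetric first-price equilibrium bid for i.i.d. Uniform[0, 1] values.

    Bidders whose value is below the reserve do not bid (encoded as NaN).
    """
    x = np.asarray(x, dtype=float)
    safe_x = np.maximum(x, 1e-12)          # avoid dividing by zero at x = 0
    bid = x - (x**n - reserve**n) / (n * safe_x ** (n - 1))
    return np.where(x >= reserve, bid, np.nan)

rng = np.random.default_rng(0)
n = 2
values = rng.uniform(size=100_000)
bids_r0 = equilibrium_bid(values, n, reserve=0.0)   # bids observed with no reserve
bids_r5 = equilibrium_bid(values, n, reserve=0.5)   # bids the same buyers would place at r = 0.5
served = values >= 0.5
print(np.mean(bids_r0[served]), np.mean(bids_r5[served]))
```

The same values produce systematically higher bids once the reserve is raised, so an empirical revenue curve built from `bids_r0` alone would misjudge the revenue at reserve 0.5.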
To ensure the IC constraint, two different approaches have been considered. The first one is
a hard constraint implemented by defining an architecture which is DSIC by design, called
MyersonNet. Myerson’s lemma is used to design this architecture that learns the optimal
DSIC auction in the single-item setting. However, for each new setting of the problem, a
new architecture must be designed (Shen et al., 2019b).
The second approach, the RegretNet architecture, uses a soft constraint in a Lagrangian
corresponding to the incentive-compatibility objective. The empirical
ex-post regret for bidder i is defined as
\[
\widehat{\mathrm{Reg}}_i(\omega) = \frac{1}{T} \sum_{t=1}^{T} \Big( \max_{b_t^*} u_i^{\omega}(b_t^*, b_{-i,t}) - u_i^{\omega}(b_{i,t}, b_{-i,t}) \Big)
\]
This regret is the difference between the maximum utility bidder i can get by optimizing
his bids $b_t^* \in \mathbb{R}^{n \times 2^m}$ and the utility he gets when bidding truthfully (somewhat similarly to the
regret introduced in Section 3.6.1, yet the maximum is inside the sum instead of being
outside). This quantity serves as a proxy for how untruthful an auction is: the higher
the regret, the less truthful the auction is, as bidders can largely increase their utility by
deviating. The augmented Lagrangian method is then used to optimize the Lagrangian
function defined as:
\[
L(\omega, \lambda) = L_{\mathrm{Rev}} + \sum_{i=1}^{n} \lambda_i \widehat{\mathrm{Reg}}_i(\omega) + \frac{\rho}{2} \Big( \sum_{i=1}^{n} \widehat{\mathrm{Reg}}_i(\omega) \Big)^2
\]
This Lagrangian function is the sum of the negated actual revenue of the mechanism with
two penalties which quantify the lack of incentive compatibility, thus ensuring that the
learned mechanism is approximately DSIC. The bids $b_i$ maximizing $\widehat{\mathrm{Reg}}_i(\omega)$
are optimized through gradient descent, making the optimization unfortunately very
slow. In some multi-item instances, this approach actually recovers the optimal
revenue-maximizing mechanisms (when the latter is known theoretically).
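To make the empirical ex-post regret tangible, the sketch below evaluates it for two fixed single-item auctions instead of a learned network. It is an illustration under several assumptions: truthful bid profiles drawn uniformly at random, a grid search over misreports standing in for the gradient-based inner maximization, and hypothetical function names.

```python
import numpy as np

def first_price_utility(value, bid, other_bids):
    """Utility of one bidder in a single-item first-price auction (ties lose)."""
    return (value - bid) * (bid > np.max(other_bids))

def second_price_utility(value, bid, other_bids):
    """Utility of one bidder in a single-item second-price auction (ties lose)."""
    top_other = np.max(other_bids)
    return (value - top_other) * (bid > top_other)

def empirical_regret(utility, values, i, grid):
    """Average over samples of (best utility over misreports) - (truthful utility)."""
    total = 0.0
    for v in values:                       # v is one profile of values, bid truthfully
        others = np.delete(v, i)
        truthful = utility(v[i], v[i], others)
        best = max(utility(v[i], b, others) for b in grid)
        total += max(best - truthful, 0.0)  # the truthful bid is always a feasible report
    return total / len(values)

rng = np.random.default_rng(0)
values = rng.uniform(size=(500, 3))        # assumed: 3 bidders, Uniform[0, 1] values
grid = np.linspace(0.0, 1.0, 101)          # candidate misreports
reg_first = empirical_regret(first_price_utility, values, i=0, grid=grid)
reg_second = empirical_regret(second_price_utility, values, i=0, grid=grid)
print(reg_first, reg_second)
```

The second-price auction is DSIC, so its estimated regret is zero, while the first-price auction leaves room for profitable bid shading and has positive regret.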
This approach can be complemented by introducing a network encompassing the best
bids for one specific bidder, avoiding running a gradient descent for each specific value
and trying to take advantage of the continuity of the problem (Rahme et al., 2020). The
idea is to leverage the fact that if two valuations are close to each other, their optimal bids
should also be close to each other.
These numerical approaches can help theoreticians to identify some good candidates
for the revenue-maximizing auctions in more exotic cases, such as when bidders have some budget
constraints (Feng et al., 2018a). However, a general theory for designing optimal mechanisms
in the general case of multi-item auctions is still out of reach.
repeated auctions like internet advertising, the items sold differ from one another,
at least partly. For instance, successive ad slots sold may have the same size but different
placements, the same placement but different sizes, or may be on different pages of the same
website, or on different websites. An easy solution would be to consider all these different
items separately, but it would mean only having a small number of samples per item,
which would prevent accurate estimation of the monopoly price. To address this issue,
some underlying structure and some regularity are required to formalize the idea that
samples obtained for one item also inform on the distributions of similar items.
In this section (only), we will assume that an item is described by a public set of d features
$z \in \mathbb{R}^d$ such that similar items have similar features (for some distance on $\mathbb{R}^d$). By public, we
mean that z is available to both the seller and the bidders. The bidders use this information
to estimate their values x = (x1 , . . . , xn ) more accurately; in particular the distribution of
values now depends on z. For simplicity, we assume in this section that the seller only
estimates an anonymous reserve price, the same for all bidders, but that it may depend
on the available information z. The extension to personalized reserve prices is straight-
forward when they are independent, like in the case of the lazy second-price auction.
Formally, the seller aims at learning a reserve price as a function of z, hence, a mapping
$r^* : \mathbb{R}^d \to \mathbb{R}_+$. For learnability reasons, we will restrict $r^*$ to belong to some compact
sub-class of hypotheses $\mathcal{R} \subset (\mathbb{R}^d \to \mathbb{R}_+)$. The learning of this contextual optimal reserve
price relies on the observations of samples from the distribution of values of the bidders.
In fact, only the observation of the highest and the second highest value is necessary, so
we denote by $x^{(1)}$ the highest value amongst the bidders and by $x^{(2)}$ the second highest
value. Further, we denote by F the joint distribution of $(x^{(1)}, x^{(2)}, z)$. Then, $F(\cdot|z)$ is the
distribution of the two highest values conditionally on the contextual information z. In the
end, finding the reserve function can be written as follows:
\[
r^* = \operatorname*{argmax}_{r \in \mathcal{R}} \ \mathbb{E}_{(x^{(1)}, x^{(2)}, z) \sim F}\Big[ \phi\big(r(z), x^{(1)}, x^{(2)}\big) \Big] \tag{3.3}
\]
where $\phi(\rho, x^{(1)}, x^{(2)}) = x^{(2)} \mathbf{1}\{x^{(2)} > \rho\} + \rho\, \mathbf{1}\{x^{(2)} \le \rho \le x^{(1)}\}$.
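The objective (3.3) is easy to evaluate empirically once a hypothesis class is fixed. The sketch below is a toy illustration, assuming a one-dimensional feature, two bidders whose values are uniform on $[0, z]$, and a linear reserve $r(z) = \max(\langle \theta, z \rangle, 0)$ optimized by brute force over a grid; none of these choices come from the text.

```python
import numpy as np

def phi(rho, x1, x2):
    """Second-price revenue with reserve rho, highest value x1 and second highest x2."""
    return x2 * (x2 > rho) + rho * ((x2 <= rho) & (rho <= x1))

def empirical_revenue(theta, z, x1, x2):
    """Empirical counterpart of (3.3) for the assumed linear reserve r(z) = max(<theta, z>, 0)."""
    reserve = np.maximum(z @ theta, 0.0)
    return np.mean(phi(reserve, x1, x2))

# Toy data (assumption): values scale with a single positive feature.
rng = np.random.default_rng(0)
z = rng.uniform(0.5, 1.5, size=(5000, 1))
vals = z * rng.uniform(size=(5000, 2))           # two bidders, values Uniform[0, z]
x1, x2 = vals.max(axis=1), vals.min(axis=1)
thetas = np.linspace(0.0, 1.0, 21)
scores = [empirical_revenue(np.array([t]), z, x1, x2) for t in thetas]
best_theta = thetas[int(np.argmax(scores))]
print(best_theta, max(scores))
```

In this toy model the monopoly price of a Uniform$[0, z]$ value is $z/2$, so the grid search is expected to land near $\theta = 0.5$; with richer classes $\mathcal{R}$ a grid is no longer practical, which is the difficulty discussed next.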
This problem is quite difficult, both because of the optimization over a set of functions
$\mathcal{R}$, and because of the quite complex objective function. Similarly to the monopolistic profit
function, for a given fixed z, if the distribution $F(\cdot|z)$ is regular or MHR, the function
$\rho \mapsto \mathbb{E}_{(x^{(1)}, x^{(2)}) \sim F(\cdot|z)}[\phi(\rho, x^{(1)}, x^{(2)})]$ is pseudo-concave or log-concave, but not concave. Unfortunately,
a sum of pseudo-concave or log-concave functions is in general not pseudo-concave. Thus,
whenever considering a parametric class of functions $\mathcal{R}$ strictly smaller than the whole set
of functions, such as linear mappings, the loss marginalized over z may not be unimodal
in the parameters, leading to hard optimization problems. In the following, we present a
high-level overview of methods proposed to solve this learning problem.
Theorem 3.5.1 (Mohri and Medina, 2014). Let $\tilde{\phi} : [0, 1] \times [0, 1] \to \mathbb{R}$ be a bounded function,
concave in its first argument. If $\tilde{\phi}$ is consistent with $\phi_0$, then $\tilde{\phi}(\cdot, x)$ is a constant function
for any $x \in [0, 1]$.
This theorem indicates that learning methods, even based on linear functions for R,
need to rely on non-concave maximization methods. We detail two examples of such
methods in the following.
A first method is based on the following piece-wise linear surrogate (Mohri and Medina,
2014):
\[
\tilde{\phi}(\rho, x^{(1)}, x^{(2)}) = \phi(\rho, x^{(1)}, x^{(2)}) - \frac{1}{\gamma}\big(\rho - (1 + \gamma)x^{(1)}\big)\,\mathbf{1}\{\rho > x^{(1)}\}\,\mathbf{1}\{\rho \le (1 + \gamma)x^{(1)}\}, \quad \text{for } \gamma > 0 .
\]
While using a surrogate introduces a bias, as the maximizer in expectation of $\tilde{\phi}$ will not
maximize exactly the expected monopoly revenue, this bias can be made small by taking
γ close to 0. However, this comes at the cost of making the Lipschitz constant 1/γ of
the surrogate grow significantly. The key idea behind this surrogate $\tilde{\phi}$ is that, because
it is piece-wise linear, it is possible to explicitly decompose it as a difference of convex
functions. This means that the empirical risk $\frac{1}{T}\sum_{t=1}^{T} \tilde{\phi}(r(z_t), x_t^{(1)}, x_t^{(2)})$ can, in turn, be
explicitly decomposed as a difference of convex functions as soon as r is a linear function.
Then, it is possible to optimize this empirical objective on classes $\mathcal{R}$ of linear functions by
using DC-programming algorithms, such as DCA (Le Thi et al., 2014).
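The effect of the surrogate can be seen on a single sample. The sketch below implements the ramp with the $1/\gamma$ scaling, an assumption made here so that the surrogate is continuous at $\rho = x^{(1)}$ and has the Lipschitz constant $1/\gamma$ mentioned in the text; the numerical values are arbitrary.

```python
import numpy as np

def phi(rho, x1, x2):
    """Second-price revenue with reserve rho, highest value x1 and second highest x2."""
    return x2 * (x2 > rho) + rho * ((x2 <= rho) & (rho <= x1))

def phi_surrogate(rho, x1, x2, gamma):
    """phi plus a linear ramp going from x1 down to 0 on (x1, (1 + gamma) * x1]."""
    ramp = ((1.0 + gamma) * x1 - rho) / gamma
    on_ramp = (rho > x1) & (rho <= (1.0 + gamma) * x1)
    return phi(rho, x1, x2) + ramp * on_ramp

x1, x2, gamma = 0.8, 0.3, 0.1          # arbitrary sample and smoothing parameter
rho = np.linspace(0.0, 1.2, 1201)
values = phi_surrogate(rho, x1, x2, gamma)
# Slopes between neighbouring grid points on the ramp, expected to be close to 1 / gamma.
ramp_part = (rho > x1) & (rho <= (1.0 + gamma) * x1)
slopes = np.abs(np.diff(values[ramp_part]) / np.diff(rho[ramp_part]))
print(slopes.max(), 1.0 / gamma)
```

The surrogate removes the downward jump of $\phi$ at $\rho = x^{(1)}$ and replaces it by a steep linear piece, which is what makes the difference-of-convex decomposition explicit.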
Solution based on Objective Variables
Another method exploits the idea of introducing objective variables in a Bayesian
framework (Rudolph et al., 2016). First, the objective φ is smoothed using a Gaussian to
define the following surrogate:
\[
\tilde{\phi}(\rho, x_t^{(1)}, x_t^{(2)}) = \log \mathbb{E}_{\epsilon \sim \mathcal{N}(0, \sigma^2)}\Big[ \exp\big(\phi(\rho + \epsilon, x_t^{(1)}, x_t^{(2)})\big) \Big], \quad \text{for some } \sigma > 0.
\]
This additional variable intuitively represents how satisfying the revenue is for a given
auction (a sample) and thus is aimed to be put to 1. Hence, a dataset $\{x_t^{(1)}, x_t^{(2)}, z_t, \eta_t = 1\}_{t \in [T]}$,
where $(x_t^{(1)}, x_t^{(2)}, z_t)_{t \in [T]} \sim F^{\otimes T}$, is used to estimate the parameters of the reserve price
function r to fit this probabilistic model. The key point is that the maximum a posteriori
(MAP) estimation recovers the parameter that maximizes the initial smoothed objective,
the expectation of $\tilde{\phi}$. The MAP estimation under this model is performed using the
Expectation-Maximization (EM) algorithm. As such, the guarantee is only to improve the
solution at every step, but there is no global convergence guarantee. However, it exhibits
significant empirical improvements (in reasonable learning time) over the previous method
based on DC programming.
x̂(.). Then, the reserve price function r is defined in a piece-wise manner on this partition.
Denoting $r_k$ the empirical monopoly price computed on the restriction of the dataset to $C_k$,
the reserve price is defined as
\[
r(z) = \sum_{k=1}^{K} r_k\, \mathbf{1}\{\hat{x}(z) \in C_k\} \tag{3.5}
\]
Theorem 3.5.2 (Medina and Vassilvitskii, 2017). For δ > 0, with probability at least (1 − δ)
over the learning samples, it holds
\[
\mathbb{E}_F\big[r(z)\,\mathbf{1}\{r(z) \le x\}\big] \ge \mathbb{E}_H[x] - O\big(K^{-2/3} + \eta^{2/3} + T^{-1/6}\big).
\]
The intuition behind this result is clear: the higher the prediction of the value x̂(z),
the higher the revenue extracted. However, it only provides a guarantee relatively to the
expected value EF [x] rather than the optimal revenue that could be extracted.
Overall, these three algorithms are costly to run on big datasets, highlighting the
complexity of the underlying problem. There exist more efficient ways to compute
reserve prices, but usually at the price of trading the objective of revenue maximization for weaker
objectives (Shen et al., 2019a).
A crucial implicit assumption made for these arguments to hold is that, no matter
the auction mechanism chosen at each stage, the seller gets to observe perfectly a sample
of the value distribution of each (or at least one in the symmetric case) buyer. In many
applications, this is unfortunately not true. Consider for instance the posted price
mechanism: the feedback actually received is only whether the value is above or below
the current price. Similarly, if reserve prices in second price auctions are too high, bidders
might decide to opt out of the current auction (as in posted price, see also the discussions on
lazy vs eager auctions) and/or the seller might only have her revenue as feedback, because
she is using some black-box tool to actually run the auction.
This setting is said to have partial feedback and is closely related to the multi-armed
bandit scenario; similar techniques (quickly recalled in the following section,
see Bubeck and Cesa-Bianchi, 2012; Lattimore and Szepesvári, 2020; Slivkins et al., 2019
for more details) can therefore be used.
The multi-armed bandit literature focuses on finding algorithms that provably control the
regret with sub-linear growth (both in T and K). As mentioned before, the techniques
differ quite a lot from stochastic to adversarial data.
Before giving the proof, we should mention that the different universal constants can
be improved with slightly more involved proofs. Similarly, it is possible to replace the
log(T) term in the definition of UCB by 2 log(t). Yet this result is sufficient for us.
Proof. First, let us recall Hoeffding's inequality. It states that $\frac{1}{t}\sum_{s=1}^{t} X_{k,s} \le \mu_k + \varepsilon$ with proba-
which implies that, on an event of probability at least $1 - K/T$,
\[
\mu^\star \le \mu_k + 2\varepsilon_{k_t, t} = \mu_k + 2\sqrt{\frac{\log(T)}{N_k(t)}} .
\]
Inverting the above equation gives that necessarily, on that event, $N_k(T) \le \frac{4\log(T)}{\Delta_k^2} + 1$, and
summing over the different actions k gives the second bound.
For the first bound, recall that $R_T = \sum_k N_k(T)\Delta_k$. Since we proved above that, on an event
of probability at least $1 - K/T$, $\Delta_k \le \sqrt{\frac{4\log(T)}{N_k(T) - 1}}$, we get that on this event (using the fact that
$\Delta_k$ is also smaller than 1 to avoid dividing by 0)
\[
\begin{aligned}
R_T = \sum_k \Delta_k N_k(T) &\le \sum_{k : N_k(T) \ge 2} \sqrt{\frac{4\log(T)}{N_k(T) - 1}}\, N_k(T) + K \\
&\le 2\sqrt{2\log(T)} \sum_k \sqrt{N_k(T)} + K \\
&\le 2\sqrt{2KT\log(T)} + K,
\end{aligned}
\]
where the second inequality comes from the fact that $\frac{N}{\sqrt{N-1}} \le \sqrt{2N}$ as soon as $N \ge 2$ and
the last one is a consequence of the Cauchy-Schwarz inequality.
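The bound can be compared with a simulation. The sketch below assumes the UCB index is the empirical mean plus the confidence radius $\sqrt{\log(T)/N_k(t)}$ that appears in the proof, and uses Bernoulli arms with arbitrary means; both are assumptions of this illustration.

```python
import numpy as np

def ucb(means, horizon, rng):
    """UCB with index = empirical mean + sqrt(log(T) / N_k), on Bernoulli arms."""
    n_arms = len(means)
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    for t in range(horizon):
        if t < n_arms:
            arm = t                                    # pull each arm once
        else:
            index = sums / counts + np.sqrt(np.log(horizon) / counts)
            arm = int(np.argmax(index))
        reward = float(rng.random() < means[arm])      # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
    return counts

means = np.array([0.8, 0.5, 0.2])                      # assumed arm means
T = 5000
counts = ucb(means, T, np.random.default_rng(0))
gaps = means.max() - means
pseudo_regret = float(np.sum(counts * gaps))           # sum_k N_k(T) * Delta_k
bound = 2 * np.sqrt(2 * len(means) * T * np.log(T)) + len(means)
print(counts, pseudo_regret, bound)
```

On such an easy instance the realized pseudo-regret is typically far below the distribution-free bound, which is designed for the worst case over the gaps.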
The UCB algorithm has been generalized and extended in many different variants to
improve the different dependencies. For our purpose, we might only consider the MOSS
algorithm (Audibert and Bubeck, 2009; Degenne and Perchet, 2016), whose expected regret
scales as $O(\sqrt{KT})$.
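this estimate has the good properties of being both unbiased and always smaller than 1
(even if possibly arbitrarily small). The EXP.3 algorithm is defined by
\[
p_{k,t+1} = \frac{\exp\big(\eta \sum_{s=1}^{t} \hat{X}_{k,s}\big)}{\sum_{k'} \exp\big(\eta \sum_{s=1}^{t} \hat{X}_{k',s}\big)} ,
\]
\[
\Phi(W_{t+1}) - \Phi(W_t) = \frac{1}{\eta} \log \sum_k p_{k,t+1} \exp\big(\eta \hat{X}_{k,t+1}\big) .
\]
Using the facts that $\exp(\eta x) \le 1 + \eta x + \eta^2 x^2$ if $x \le 1$, that $\eta \hat{X}_{k,t} \le 1$ and finally that
$\log(1 + x) \le x$, we get
\[
\Phi(W_{t+1}) - \Phi(W_t) \le \sum_k p_{k,t+1} \hat{X}_{k,t+1} + \eta \sum_k p_{k,t+1} \hat{X}_{k,t+1}^2 .
\]
In particular, plugging back the definition $\hat{X}_{k,t} = 1 - \frac{1 - X_{k,t}}{p_{k,t}} \mathbf{1}\{k_t = k\}$, we get
\[
\Phi(W_{t+1}) - \Phi(W_t) \le \sum_k X_{k,t+1} \mathbf{1}\{k_{t+1} = k\} + \eta \sum_k p_{k,t+1} \Big(1 - \frac{1 - X_{k,t+1}}{p_{k,t+1}} \mathbf{1}\{k_{t+1} = k\}\Big)^2 .
\]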
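Taking expectation, conditionally on the past history, gives that this term is controlled as
\[
\mathbb{E} \sum_k p_{k,t+1} \Big(1 - \frac{1 - X_{k,t+1}}{p_{k,t+1}} \mathbf{1}\{k_{t+1} = k\}\Big)^2
\le \sum_k (1 - X_{k,t+1})^2 - 2 \sum_k (1 - X_{k,t+1})\, p_{k,t+1} + 1
\]
and the latter is always smaller than K. To see this, assume, without loss of generality, that
$X_{1,t+1} \ge X_{k,t+1}$ for all $k \in [K]$ so that
\[
\begin{aligned}
\mathbb{E} \sum_k p_{k,t+1} \Big(1 - \frac{1 - X_{k,t+1}}{p_{k,t+1}} \mathbf{1}\{k_{t+1} = k\}\Big)^2
&\le \sum_{k \ge 2} (1 - X_{k,t+1})^2 + (1 - X_{1,t+1})^2 - 2(1 - X_{1,t+1}) + 1 \\
&\le \sum_{k \ge 2} (1 - X_{k,t+1})^2 + X_{1,t+1}^2 \\
&\le K .
\end{aligned}
\]
Plugging this back in the definition of $\Phi(W_{t+1}) - \Phi(W_t)$, and taking the expectation
conditionally on the past history, gives
\[
\mathbb{E}\big[\Phi(W_{t+1}) - \Phi(W_t)\big] \le \sum_k X_{k,t+1}\, p_{k,t+1} + \eta K .
\]
Summing over t, and using the fact that $\Phi(0) = \frac{\log(K)}{\eta}$ and $\Phi(Z) \ge \max_k Z_k$, we finally get
\[
\mathbb{E} \max_k \sum_{t=1}^{T} \hat{X}_{k,t} - \mathbb{E} \sum_{t=1}^{T} X_{k_t,t} \le \frac{\log(K)}{\eta} + \eta K T
\]
which gives the result, as $\mathbb{E} \max_k \sum_{t=1}^{T} \hat{X}_{k,t} \ge \max_k \mathbb{E} \sum_{t=1}^{T} X_{k,t}$.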
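A compact implementation helps to see how the estimate and the exponential weights interact. The sketch below uses the estimate $\hat{X}_{k,t} = 1 - \frac{1 - X_{k,t}}{p_{k,t}} \mathbf{1}\{k_t = k\}$ from the proof and the learning rate $\eta = \sqrt{\log(K)/(KT)}$ that balances the two terms of the bound; the reward sequence is a toy assumption.

```python
import numpy as np

def exp3(rewards, eta, rng):
    """EXP.3 with the estimate X_hat = 1 - (1 - X) / p * 1{k_t = k}; rewards in [0, 1]."""
    horizon, n_arms = rewards.shape
    cum_est = np.zeros(n_arms)             # cumulative estimated rewards
    total = 0.0
    for t in range(horizon):
        logits = eta * cum_est
        p = np.exp(logits - logits.max())  # subtract the max for numerical stability
        p /= p.sum()
        arm = rng.choice(n_arms, p=p)
        x = rewards[t, arm]                # only the played arm is observed
        total += x
        est = np.ones(n_arms)
        est[arm] = 1.0 - (1.0 - x) / p[arm]
        cum_est += est
    return total, p

T, K = 5000, 3
rewards = np.zeros((T, K))
rewards[:, 0] = 1.0                        # assumed toy sequence: arm 0 always pays 1
eta = np.sqrt(np.log(K) / (K * T))         # balances log(K) / eta and eta * K * T
total, p_final = exp3(rewards, eta, np.random.default_rng(0))
regret = rewards.sum(axis=0).max() - total
print(regret, 2 * np.sqrt(K * T * np.log(K)))
```

With this choice of $\eta$ the bound $\log(K)/\eta + \eta K T$ equals $2\sqrt{KT\log(K)}$; it holds in expectation, so a single run may fluctuate around it.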
The EXP.3 algorithm is a standard building block of many online learning algorithms
with adversarial data. Like UCB, it has been improved in many directions, notably to get
rid of the sub-optimal $\sqrt{\log(K)}$ term in the regret bound (yet at the cost of a much more
intricate proof). It is also possible to estimate $X_{k,t}$ with $\frac{X_{k,t}}{p_{k,t}} \mathbf{1}\{k_t = k\}$. However, this estimate
can be arbitrarily large and the variance of EXP.3 cannot be directly controlled. The trick
is then to add a forced exploration term, i.e., to play uniformly at random with probability
$\sqrt{K \log(K)/T}$ at each round.
Similarly to the stochastic case, it is possible to get rid of the $\sqrt{\log(K)}$ term that arises
in the EXP.3 regret analysis with a more involved algorithm (and proof techniques). It is
then “optimal” in the sense that any learning algorithm must (in some difficult problem
instances) have a regret of order at least $\sqrt{KT}$.
3.6.2 Auctions learning with partial feedback
As mentioned before, there are many instances where a seller only has incomplete data
on auctions run in the past (in posted price, lazy/eager second price, with a black-box
selling mechanism, etc.). However, it is still possible for the seller to learn the optimal
mechanism in many different cases, at a very small extra cost compared to the
batch approach.
Consider for instance the online posted price problem. A seller repeatedly posts a
price $p_t \in [0, 1]$ to sell identical items and buyers arrive sequentially, with private value
$x_t \in [0, 1]$. The buyer t buys the item if $x_t \ge p_t$, without revealing the true value. As a
consequence, the partial feedback available to the seller, before fixing the next price $p_{t+1}$,
consists of all the indicators $\mathbf{1}\{x_s \ge p_s\}$ for $s \in [t]$. As before, the objective of the seller is to find, as
quickly as possible, the best price $p^*$ or to minimize the regret
\[
\max_{p \in [0,1]} \sum_{t=1}^{T} p\, \mathbf{1}\{x_t \ge p\} - \sum_{t=1}^{T} p_t\, \mathbf{1}\{x_t \ge p_t\}
\]
We recall that if data are stochastic, i.e., i.i.d. with cdf F, then $p^*$ is a root of the virtual
value function (and the root if F is regular). Yet, the analysis carries over to adversarial
sequences.
In sequential learning, a first and naive possibility is quite often to discretize the
decision space (here [0, 1]) and to run a UCB or EXP.3 algorithm on the discretization
(depending on whether data are stochastic or adversarial), agnostically to the structure at hand.
Given some ε > 0, the size of the discretization in the posted price problem is 1/ε, leading
to a global regret of the order of
\[
O(T\varepsilon) + O\big(\sqrt{T/\varepsilon}\big) = O\big(T^{2/3}\big) \quad \text{with the choice of } \varepsilon = T^{-1/3} .
\]
In the above equation, the first term corresponds to the approximation error due to the
discretization and the second to the estimation (or learning) error of the optimal price in
the discretized set.
On the other hand, the regret bound can be largely improved by leveraging the structure
of the problem, at least in the stochastic case, when the generating distribution behaves
nicely enough (the worst-case learning cost being indeed $T^{2/3}$, see (Kleinberg and Leighton,
2003)). For instance, a typical assumption is that the monopoly profit function $\Pi(p) =
p(1 - F(p))$ is approximately quadratic around $p^*$, i.e., that $\Pi(p^*) - \Pi(p) = \Omega((p - p^*)^2)$. Using
a UCB algorithm with a uniform ε-discretization then yields a total regret of the order of
\[
O(T\varepsilon^2) + O\Big(\sum_{k=1}^{1/\varepsilon} \frac{\log(T)}{k^2 \varepsilon^2}\Big) = O\big(\sqrt{T\log(T)}\big) \quad \text{with the choice of } \varepsilon = \Big(\frac{T}{\log(T)}\Big)^{-1/4} .
\]
This simple technique cannot be improved, even with a stronger assumption: indeed,
if the approximation term is of order T∆, the estimation error is at least of order log(T)/∆.
However, the problem has a stronger property that can be further leveraged: if a price
pt is accepted, then any lower price would also have been accepted (and reciprocally). In
particular, this can be used in the following simplest possible problem, whose
solution is nonetheless highly counter-intuitive.
Suppose that each buyer has the same exact value xt = x. Then it is clear that the optimal
price is p∗ = x; the remaining question is the learning cost. As mentioned before, the
feedback is in that case binary: either “x is greater than pt ” or “x is smaller than pt ”. In
order to find x with the fewest queries possible, a binary search is optimal. However,
the binary search is exponentially sub-optimal in terms of learning cost.
Proposition 3.30. The regret of a binary search can be as large as Ω(log(T)). On the other
hand, there exists a more cautious search whose regret is O(log log(T)).
Proof. Assume that x = 1/2. Then a binary search will use log(1/ε) posted (and refused)
prices to reach the precision ε. Even with the optimal choice of ε = 1/T, this gives a log(T)
regret.
The cautious search works in epochs ` ∈ {0, 1, . . . , log2 log2 (T )} – let us assume for
simplicity here that log2 log2 (T ) is an integer. At the `-th epoch, the prices posted increase
` (`)
by 1/22 until such a price is refused and the next epoch begins. Let p? be the last accepted
(`+1) (`+1) (`) j−1
price at epoch ` and pj the j-th price posted at epoch ` + 1, then pj = p? + 2`+1 . At
2
the end of the epoch log2 log2 (T ), the cautious binary search posts the last accepted price
until the final stage T .
To compute the regret of the cautious binary search, notice that at each epoch $\ell$,
only one price is rejected, and this rejection has a cost smaller than 1. Moreover, since
$p^* \in [p_\star^{(\ell)}, p_\star^{(\ell)} + \frac{1}{2^{2^{\ell}}})$, then $p^* - p_1^{(\ell+1)} \le \frac{1}{2^{2^{\ell}}}$ and more generally, as long as
posted prices are not rejected,
\[ p^* - p_j^{(\ell+1)} \le \frac{1}{2^{2^{\ell}}} - \frac{j-1}{2^{2^{\ell+1}}}. \]
Since there are at most $2^{2^{\ell}}$ posted prices in epoch $\ell+1$, the cumulative cost of errors in
that epoch is bounded by
\[ \sum_{j=1}^{2^{2^{\ell}}} \Big( \frac{1}{2^{2^{\ell}}} - \frac{j-1}{2^{2^{\ell+1}}} \Big) = \sum_{j=1}^{2^{2^{\ell}}} \frac{j}{2^{2^{\ell+1}}} \le 1. \]
As a consequence, each epoch has a bounded cost of (at most) 2, which gives the result as
only $\log_2\log_2(T)$ epochs are needed to get an error on $p^*$ smaller than $1/T$.
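The gap between the two searches can be checked numerically. The following Python sketch simulates both procedures for a fixed valuation. It is an illustration under simplifying assumptions rather than the exact procedure of the proof: the function names are hypothetical, a price is accepted whenever it is at most $x$, each epoch starts one step above the last accepted price, and exploration stops once the remaining uncertainty is at most $1/T$.

```python
def binary_search_regret(x, T):
    """Regret of bisection on [0, 1] followed by posting the best known price."""
    lo, hi = 0.0, 1.0
    regret = 0.0
    for _ in range(T):
        exploring = (hi - lo) > 1.0 / T
        price = (lo + hi) / 2 if exploring else lo
        sold = price <= x
        # A sale earns `price`; a refusal earns nothing and costs the full value x.
        regret += (x - price) if sold else x
        if exploring:
            if sold:
                lo = price
            else:
                hi = price
    return regret


def cautious_search_regret(x, T):
    """Regret of the cautious search with step 2^(-2^epoch) in each epoch."""
    accepted = 0.0  # last accepted price; a price of 0 is always accepted
    gap = 1.0       # upper bound on x - accepted
    epoch = 0
    regret = 0.0
    for _ in range(T):
        exploring = gap > 1.0 / T
        step = 2.0 ** -(2 ** epoch)
        price = accepted + step if exploring else accepted
        sold = price <= x
        regret += (x - price) if sold else x
        if exploring:
            if sold:
                accepted = price
            else:
                # One refusal ends the epoch; x is now known up to `step`.
                gap = step
                epoch += 1
    return regret


if __name__ == "__main__":
    T = 2 ** 16
    print(binary_search_regret(0.5, T), cautious_search_regret(0.5, T))
```

With $x = 1/2$, the bisection pays for roughly $\log_2(T)$ refusals while the cautious search pays for roughly $\log_2\log_2(T)$ of them, which is the separation stated in Proposition 3.30.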
The crucial property to obtain $\log\log(T)$ regret (Kleinberg and Leighton, 2003) is that
the cost function $\Pi(p^*) - \Pi(p)$ is asymmetric and decreases much more slowly on the left
than on the right. This property was later used in the stochastic case to generalize
Proposition 3.30 when the support of the distribution is finite, giving a worst-case bound
of $\sqrt{KT}$ (Cesa-Bianchi et al., 2019), and in the adversarial case to lower the parameter
dependency in front of the $T^{2/3}$ term (Bubeck et al., 2017).
The fact that the reward mapping $\Pi(p)$ cannot be an arbitrary function, but must belong
to a specific family, can also be leveraged to learn the optimal reserve price in symmetric
(repeated) second-price auctions (Cesa-Bianchi et al., 2014). Consider for instance the more
complex case where there is not one but $n$ bidders at each auction, and the feedback
to the seller is, as in posted price, the revenue of the auction and not the true values.
In this specific case, the revenue is either the current reserve price $p_t$ (if the highest bid
is above it and the second highest below) or the second highest bid. In both cases, the
learner gets information not only on $\Pi(p_t)$ but on the whole function $\Pi(\cdot)$. The learning
algorithm combines the ideas behind the cautious binary search and UCB. It proceeds by
epochs: at each stage of epoch $k$ the proposed reserve price $p_k$ is the same and always
smaller than $p^*$ (at least with arbitrarily high probability). At the end of an epoch, a
confidence interval for $\Pi(\cdot)$ is constructed from the data collected, based on the
Dvoretzky-Kiefer-Wolfowitz inequality; this is possible because, in the symmetric case,
$\Pi(\cdot)$ only depends on the distribution of the second highest bid, and only bids above the
current reserve price matter. Epoch after epoch, the error on the optimal reserve price
decreases and it is possible to control the regret (at the cost of intensive computations).
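The confidence-interval step can be illustrated with the Dvoretzky-Kiefer-Wolfowitz inequality in the form with Massart's tight constant, $\mathbb{P}\{\sup_x |F_n(x) - F(x)| > r\} \le 2e^{-2nr^2}$. The Python sketch below only computes the resulting uniform band around an empirical CDF; it is not the algorithm of Cesa-Bianchi et al. (2014), and the function names are hypothetical.

```python
import math


def dkw_radius(n, delta):
    """Half-width r such that sup_x |F_n(x) - F(x)| <= r with probability >= 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def empirical_cdf(samples, x):
    """Fraction of samples that are at most x."""
    return sum(s <= x for s in samples) / len(samples)


def cdf_band(samples, x, delta):
    """Lower and upper confidence bounds on F(x), clipped to [0, 1]."""
    radius = dkw_radius(len(samples), delta)
    estimate = empirical_cdf(samples, x)
    return max(0.0, estimate - radius), min(1.0, estimate + radius)
```

Because the band is uniform in $x$, a single batch of observed second-highest bids controls the estimation error at every candidate reserve price simultaneously, which is what allows an epoch-based scheme to compare all of them at once.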
References
Albert, M., V. Conitzer, and P. Stone. 2017. “Automated design of robust mechanisms”. In:
Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 31. No. 1.
Armstrong, M. 1996. “Multiproduct nonlinear pricing”. Econometrica: Journal of the Econo-
metric Society: 51–75.
Athey, S. and P. A. Haile. 2007. “Chapter 60 Nonparametric Approaches to Auctions”. In:
Handbook of Econometrics.
Audibert, J.-Y. and S. Bubeck. 2009. “Minimax policies for adversarial and stochastic
bandits”. In: Proceedings of COLT.
Bar-Yossef, Z., K. Hildrum, and F. Wu. 2002. “Incentive-compatible online auctions for
digital goods.” In: SODA. Vol. 2. 964–970.
Bartlett, P. L., S. Boucheron, and G. Lugosi. 2002. “Model selection and error estimation”.
Machine Learning. 48(1-3): 85–113.
Blum, A., V. Kumar, A. Rudra, and F. Wu. 2004. “Online learning in online auctions”.
Theoretical Computer Science. 324(2-3): 137–146.
Bubeck, S. and N. Cesa-Bianchi. 2012. “Regret Analysis of Stochastic and Nonstochastic
Multi-armed Bandit Problems”. In: Machine Learning. Vol. 5. No. 1. 1–122.
Bubeck, S., N. R. Devanur, Z. Huang, and R. Niazadeh. 2017. “Multi-scale Online Learning
and its Applications to Online Auctions”. Proceedings of the Eighteenth ACM Conference
on Economics and Computation.
Cesa-Bianchi, N., T. Cesari, and V. Perchet. 2019. “Dynamic pricing with finitely many
unknown valuations”. In: Algorithmic Learning Theory. PMLR. 247–273.
Cesa-Bianchi, N., C. Gentile, and Y. Mansour. 2014. “Regret minimization for reserve prices
in second-price auctions”. IEEE Transactions on Information Theory. 61(1): 549–564.
Choi, H., C. F. Mela, S. R. Balseiro, and A. Leary. 2020. “Online display advertising markets:
A literature review and future directions”. Information Systems Research. 31(2): 556–
575.
Cole, R. and T. Roughgarden. 2014b. “The sample complexity of revenue maximization”.
In: Proceedings of the forty-sixth annual ACM symposium on Theory of computing. 243–
252.
Conitzer, V. and T. Sandholm. 2002. “Complexity of mechanism design”. In: Proceedings of
the Eighteenth conference on Uncertainty in artificial intelligence. 103–110.
Daskalakis, C., A. Deckelbaum, and C. Tzamos. 2013. “Mechanism design via optimal
transport”. In: Proceedings of the fourteenth ACM conference on Electronic commerce.
269–286.
Degenne, R. and V. Perchet. 2016. “Anytime optimal algorithms in stochastic multi-armed
bandits”. In: International Conference on Machine Learning. 1587–1595.
Devanur, N. R., Z. Huang, and C.-A. Psomas. 2016. “The sample complexity of auctions
with side information”. In: Proceedings of the forty-eighth annual ACM symposium on
Theory of Computing. 426–439.
Dhangwatnotai, P., T. Roughgarden, and Q. Yan. 2015. “Revenue maximization with a
single sample”. Games and Economic Behavior. 91: 318–333.
Drutsa, A. 2020. “Reserve pricing in repeated second-price auctions with strategic bidders”.
In: International Conference on Machine Learning. PMLR. 2678–2689.
Dütting, P., Z. Feng, H. Narasimhan, D. Parkes, and S. S. Ravindranath. 2019. “Optimal
auctions through deep learning”. In: International Conference on Machine Learning.
PMLR. 1706–1715.
Feng, Z., S. Lahaie, J. Schneider, and J. Ye. 2021. “Reserve Price Optimization for First
Price Auctions in Display Advertising”. International Conference on Machine Learning:
3230–3239.
Feng, Z., H. Narasimhan, and D. C. Parkes. 2018a. “Deep learning for revenue-optimal auc-
tions with budgets”. In: Proceedings of the 17th International Conference on Autonomous
Agents and Multiagent Systems. 354–362.
Fu, H. 2013. “VCG auctions with reserve prices: Lazy or eager”. In: Proceedings of the
Fourteenth ACM Conference on Economics and Computation.
Golowich, N., H. Narasimhan, and D. C. Parkes. 2018. “Deep learning for multi-facility
location mechanism design”. In: Proceedings of the 27th International Joint Conference on
Artificial Intelligence. 261–267.
Golrezaei, N., M. Lin, V. Mirrokni, and H. Nazerzadeh. 2017. “Boosted Second-price
Auctions for Heterogeneous Bidders”. In: Management Science.
Gonczarowski, Y. A. and N. Nisan. 2017. “Efficient empirical revenue maximization
in single-parameter auction environments”. In: Proceedings of the 49th Annual ACM
SIGACT Symposium on Theory of Computing.
Guerre, E., I. Perrigne, and Q. Vuong. 2000. “Optimal Nonparametric Estimation of First-
price Auctions”. Econometrica. 68(3): 525–574.
Guo, C., Z. Huang, and X. Zhang. 2019. “Settling the sample complexity of single-
parameter revenue maximization”. In: Proceedings of the 51st Annual ACM SIGACT
Symposium on Theory of Computing.
Hartline, J. D. and T. Roughgarden. 2009. “Simple versus optimal mechanisms”. In: Pro-
ceedings of the 10th ACM conference on Electronic commerce. 225–234.
Haussler, D. 1992. “Decision theoretic generalizations of the PAC model for neural net and
other learning applications”. In: Information and computation.
Huang, Z., Y. Mansour, and T. Roughgarden. 2018. “Making the most of your samples”.
SIAM Journal on Computing. 47(3): 651–674.
Kleinberg, R. and T. Leighton. 2003. “The value of knowing a demand curve: Bounds on re-
gret for online posted-price auctions”. In: 44th Annual IEEE Symposium on Foundations
of Computer Science, 2003. Proceedings. IEEE. 594–605.
Koltchinskii, V., D. Panchenko, et al. 2002. “Empirical margin distributions and bounding
the generalization error of combined classifiers”. The Annals of Statistics. 30(1): 1–50.
Lattimore, T. and C. Szepesvári. 2020. Bandit algorithms. Cambridge University Press.
Lavi, R. and N. Nisan. 2004. “Competitive analysis of incentive compatible on-line auc-
tions”. Theoretical Computer Science. 310(1-3): 159–180.
Le Thi, H. A., V. N. Huynh, and T. P. Dinh. 2014. “DC Programming and DCA for General
DC Programs”. In: Advanced Computational Methods for Knowledge Engineering. Ed. by
T. van Do, H. A. L. Thi, and N. T. Nguyen. Cham: Springer International Publishing.
15–35.
Lugosi, G. and S. Mendelson. 2019. “Mean estimation and regression under heavy-tailed
distributions: A survey”. Foundations of Computational Mathematics. 19(5): 1145–1190.
Manelli, A. M. and D. R. Vincent. 2007. “Multidimensional mechanism design: Revenue
maximization and the multiple-good monopoly”. Journal of Economic theory. 137(1):
153–185.
Massart, P. 1990. “The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality”. In:
The Annals of Probability. Vol. 18. No. 3. Institute of Mathematical Statistics. 1269–1283.
Medina, A. M. and S. Vassilvitskii. 2017. “Revenue optimization with approximate bid
predictions”. In: Proceedings of the 31st International Conference on Neural Information
Processing Systems. 1856–1864.
Mohri, M. and A. M. Medina. 2014. “Learning theory and algorithms for revenue opti-
mization in second price auctions with reserve”. In: International Conference on Machine
Learning. PMLR. 262–270.
Morgenstern, J. and T. Roughgarden. 2015. “The pseudo-dimension of near-optimal auc-
tions”. In: Proceedings of the 28th International Conference on Neural Information Process-
ing Systems-Volume 1. 136–144.
Ostrovsky, M. and M. Schwarz. 2011. “Reserve prices in internet advertising auctions: A
field experiment”. In: Proceedings of the 12th ACM conference on Electronic commerce.
59–60.
Paes Leme, R., M. Pál, and S. Vassilvitskii. 2016. “A field guide to personalized reserve
prices”. In: Proceedings of the 25th international conference on world wide web. 1093–1102.
Rahme, J., S. Jelassi, and S. M. Weinberg. 2020. “Auction learning as a two-player game”.
arXiv preprint arXiv:2006.05684.
Roughgarden, T. and O. Schrijvers. 2016. “Ironing in the dark”. In: Proceedings of EC. 1–18.
Roughgarden, T. and J. R. Wang. 2016. “Minimizing Regret with Multiple Reserves”. In:
Proceedings of the 2016 ACM Conference on Economics and Computation. 601–616.
Rudolph, M. R., J. G. Ellis, and D. M. Blei. 2016. “Objective variables for probabilistic
revenue maximization in second-price auctions with reserve”. In: Proceedings of the
25th International Conference on World Wide Web. 1113–1122.
Shen, W., S. Lahaie, and R. P. Leme. 2019a. “Learning to clear the market”. In: International
Conference on Machine Learning. PMLR. 5710–5718.
Shen, W., P. Tang, and S. Zuo. 2019b. “Automated mechanism design via neural networks”.
In: Proceedings of the 18th International Conference on Autonomous Agents and Multiagent
Systems. 215–223.
Slivkins, A. et al. 2019. “Introduction to Multi-Armed Bandits”. Foundations and Trends®
in Machine Learning. 12(1-2): 1–286.
Yao, A. C.-C. 2017. “Dominant-strategy versus bayesian multi-item auctions: Maximum
revenue determination and comparison”. In: Proceedings of the 2017 ACM Conference on
Economics and Computation. 3–20.
4 Adaptive and strategic learning agents
First read of this chapter, key concepts and ideas
Online auctions are one of the most fundamental tools of the modern economy. A crucial
assumption behind the results of Chapter 3 is that the seller has access to a large
batch of samples of bidders’ valuations. The objective was to learn the optimal mechanism
(or at least the best one in some class) from this dataset. This model particularly
fits problems where the bidders differ from one auction to the next (as is typical
on eBay), so that it is legitimate to assume that they bid their values (as long as they
face an incentive compatible auction) myopically, i.e., non-strategically. Unfortunately,
this assumption of facing new bidders for each item no longer holds in many
important economic situations, such as online advertising. Indeed, in that market the
so-called demand-side platforms (DSPs), which aggregate bidders, repeatedly interact
with a single seller (called a supply-side platform (SSP)) billions of times a day (Choi et al.,
2020).
The global objective of the seller remains identical: maximize the total revenue,
typically by learning (or trying to learn) the value distributions of the bidders. Indeed, it
is still worthwhile for an SSP to optimize reserve prices (personalized per DSP) since the
number of participants per auction is relatively low (the median number of bidders is equal
to 6 (Celis et al., 2014)). The difficulty is that bidders are now also present in the game for a
long period and optimize their cumulative utility, instead of best-replying myopically to the
seller's mechanism. Thanks to this long-term optimization, a bidder can sequentially learn
his “optimal” bidding strategy when the mechanism, his opponents’ strategies, or even his
own distribution of valuations are unknown at first. Such a bidder is
adaptive to his environment, but he could even be strategic. Intuitively, and this will be
detailed later on, a strategic buyer might be tempted to modify his bids if he knows that
the seller uses them to set reserve prices. It can be much more profitable to face a low
reserve price by bidding non-truthfully than to face a high reserve price with truthful bids.
It is worth mentioning that a bidder might have different values for different
auctions because the intrinsic value of an ad is strongly related to the probability that
the user seeing it clicks on it. As a consequence, the value distribution of a bidder in this
chapter represents the variability over time of the valuations of one specific bidder for the
different items that are sold successively. Notice, on the contrary, that in the previous
chapter the value of a buyer was fixed but unknown to the seller, who only had some prior
(the distribution) on its realization.
For simplicity, we shall assume in this section that values always belong to [0, 1]; this
assumption can be weakened, but at the cost of technicalities.
1. Bidding without knowing one's own value: the bidder does not know the expected value
he gives to the item but can learn his own value distribution by gathering a new
value sample each time he wins an auction. This is particularly relevant in online
advertising, where advertisers have to show ads to potential buyers to understand
their propensity to buy a certain product (or at least, their propensity to click on an
ad).
2. Bidding without knowing the mechanism: a strategic bidder does not know precisely the
mechanism used by the seller (or alternatively, has little or no information on the other
bidders’ valuation distributions). The bidder can sequentially and incrementally
adapt his bidding strategy during the T successive auctions to maximise his expected
cumulative utility, hopefully achieving (almost) the same performance as if he knew
the whole mechanism in advance.
3. Bidding with budget constraints: a strategic bidder could have a pre-specified
constraint on the budget he can spend during the T successive auctions. In this
setting, at each round, he should bid (and spend some of his budget if he wins the
auction) without knowing the exact valuations he will get in future rounds. The
bidder can sequentially and incrementally adapt his bidding strategy during the T
successive auctions to maximise his expected cumulative utility while respecting
the constraint on his budget.
The simplest model to study this sequential learning problem is the following (Weed et
al., 2016). Bidder $i$ participates in a sequence $t = 1, \ldots, T$ of second price auctions (without
reserve prices) where the realized values $v_t \in \{0, 1\}$ are i.i.d. with unknown expectation $x$. Of
course, $v_t$ is not observed before participating in auction $t$, and it is observed afterwards only
if that auction is won. Let us denote by $b^{(1)}_{-i,t}$ the maximal bid of the opponents (which could
actually include a reserve price) and by $b_{i,t}$ the bid of bidder $i$ at that stage. Then the
performance of the optimal truthful strategy is $\sum_{t=1}^{T} \mathbb{E}\big[x\,\mathbf{1}\{x \ge b^{(1)}_{-i,t}\}\big]$ while the learning to
bid policy has gathered $\sum_{t=1}^{T} \mathbb{E}\big[x\,\mathbf{1}\{b_{i,t} \ge b^{(1)}_{-i,t}\}\big]$. As a consequence, the cost of learning to
bid is measured in terms of regret
\[ R_T = \sum_{t=1}^{T} \mathbb{E}\big[x\,\mathbf{1}\{x \ge b^{(1)}_{-i,t}\}\big] - \mathbb{E}\big[x\,\mathbf{1}\{b_{i,t} \ge b^{(1)}_{-i,t}\}\big]. \]
As in multi-armed bandits, the empirical average
\[ \bar{x}_t = \frac{\sum_{s \le t} v_s \mathbf{1}\{b_{i,s} \ge b^{(1)}_{-i,s}\}}{\#\{s \le t \,;\, b_{i,s} \ge b^{(1)}_{-i,s}\}} \]
is negatively biased. As a consequence, the algorithm UCBid biases it slightly upwards by
adding a small error term to its bid
\[ b_{i,t+1} = \bar{x}_t + 2\sqrt{\frac{\log(T)}{\omega_{i,t}}} \quad \text{where} \quad \omega_{i,t} = \#\{s \le t \,;\, b_{i,s} \ge b^{(1)}_{-i,s}\}. \]
Proposition 4.1. The UCBid algorithm has a sublinear regret against any sequence of
opponent bids $b^{(1)}_{-i,t}$, as long as they are independent of $v_t$ (conditionally on the history), as
\[ R_T \le O\big(\sqrt{T\log(T)}\big). \]
Moreover, if the sequence $b^{(1)}_{-i,t}$ is also i.i.d. (with a law unknown to bidder $i$), then the regret
grows even much more slowly, as
\[ R_T \le \begin{cases} c_\alpha T^{\frac{1-\alpha}{2}} \log^{\frac{1+\alpha}{2}}(T) & \text{if } \alpha < 1, \\ c_\alpha \log^2(T) & \text{if } \alpha = 1, \\ c_\alpha \log(T) & \text{if } \alpha > 1, \end{cases} \]
for some constants $c_\alpha$ independent of $T$ and where $\alpha$ is some regularity parameter called
“margin” defined as follows. There exists some $C > 0$ such that, for any $\varepsilon > 0$,
\[ \mathbb{P}\big\{ b^{(1)}_{-i,t} \in (x, x + \varepsilon) \big\} \le C\varepsilon^{\alpha}. \]
Proof. The proof is a bit technical and only sketched; the main ingredients are fairly
classical multi-armed bandit techniques. First of all, notice that, as a direct consequence
of Hoeffding's inequality, with probability of the order of $1 - 1/T$, all bids are bigger than $x$. As
a consequence, we shall only focus on this event, where regret is only incurred on auctions
such that
\[ x < b^{(1)}_{-i,t} < \bar{x}_t + 2\sqrt{\frac{\log(T)}{\omega_{i,t}}}. \]
Indeed, the optimal bid $b_{i,t} = x$ loses this auction while UCBid overbids and wins it. It
unfortunately pays more than its expected value. The net cost of this specific auction is
\[ b^{(1)}_{-i,t} - x \le \bar{x}_t + 2\sqrt{\frac{\log(T)}{\omega_{i,t}}} - x \le 4\sqrt{\frac{\log(T)}{\omega_{i,t}}}, \]
where the last inequality holds for all auctions on the event considered (which holds with
probability of the order of $1 - 1/T$). Summing over all the auctions $t = 1, \ldots, T$ gives the first bound.
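The other bounds are derived from careful computations of
\begin{align*}
\mathbb{E}\Big[ (b^{(1)}_{-i,t} - x)\,\mathbf{1}\Big\{ x \le b^{(1)}_{-i,t} \le \bar{x}_t + 2\sqrt{\tfrac{\log(T)}{\omega_{i,t}}} \Big\} \Big]
&= \mathbb{E}\Big[ (b^{(1)}_{-i,t} - x)\,\mathbf{1}\Big\{ 0 \le b^{(1)}_{-i,t} - x \le \bar{x}_t - x + 2\sqrt{\tfrac{\log(T)}{\omega_{i,t}}} \Big\} \Big] \\
&\le C\,\mathbb{E}\Big[ \Big( \bar{x}_t - x + 2\sqrt{\tfrac{\log(T)}{\omega_{i,t}}} \Big)_+^{1+\alpha} \Big],
\end{align*}
where the last inequality is a consequence of the margin definition and $x_+ = \max\{x, 0\}$.
An important fact is that, if some regret is incurred at auction $t \in \mathbb{N}$, then this auction is
necessarily won, hence the counter $\omega_{i,t}$ increases by one. As a consequence, the overall
regret can be controlled by
\[ \sum_{s=1}^{T} C\,\mathbb{E}\Big[ \Big( \bar{X}_s - x + 2\sqrt{\tfrac{\log(T)}{s}} \Big)_+^{1+\alpha} \Big] \]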
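where $\bar{X}_s$ is the average value of the first $s$ auctions won. Hoeffding's inequality implies that
$\mathbb{P}\{\bar{X}_s - x \ge \varepsilon\} \le e^{-2s\varepsilon^2}$, hence we get that the total regret is smaller than
\begin{align*}
\sum_{s=1}^{T} C\,\mathbb{E}\Big[ \Big( \bar{X}_s - x + 2\sqrt{\tfrac{\log(T)}{s}} \Big)_+^{1+\alpha} \Big]
&\le C(1+\alpha) \sum_{s=1}^{T} \int_{-2\sqrt{\log(T)/s}}^{\infty} \Big( \varepsilon + 2\sqrt{\tfrac{\log(T)}{s}} \Big)^{\alpha} e^{-2s\varepsilon^2}\, d\varepsilon \\
&= C(1+\alpha) \sum_{s=1}^{T} \frac{1}{(2s)^{\frac{1+\alpha}{2}}} \int_{-4\sqrt{\log(T)}}^{\infty} \big( u + 4\sqrt{\log(T)} \big)^{\alpha} e^{-\frac{u^2}{2}}\, du \\
&\le C(1+\alpha) \sum_{s=1}^{T} \Big( \frac{16\log(T)}{s} \Big)^{\frac{1+\alpha}{2}} + C\,\frac{(1+\alpha)}{2} \sum_{s=1}^{T} \frac{1}{s^{\frac{1+\alpha}{2}}} \int_{4\sqrt{\log(T)}}^{\infty} u^{\alpha} e^{-\frac{u^2}{2}}\, du.
\end{align*}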
The result follows from instantiating the above sum over different values of α. For α > 1,
the first sum is controlled using the fact that all terms are necessarily smaller than 1.
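The behaviour described by Proposition 4.1 can be explored with a small simulation. The Python sketch below implements the bid rule $\bar{x}_t + 2\sqrt{\log(T)/\omega_{i,t}}$ under assumptions that are ours and not stated in the text: the bid is capped at 1, the bidder bids 1 until a first value has been observed, and the pseudo-regret is measured with the net second-price utility $x - b^{(1)}_{-i,t}$, in line with the "net cost" used in the proof. All names are hypothetical.

```python
import math
import random


def ucbid_bid(mean_estimate, wins, horizon):
    """Optimistic bid: empirical mean plus 2*sqrt(log(T)/wins), capped at 1."""
    if wins == 0:
        return 1.0  # assumption: bid the maximum until a first value is observed
    bonus = 2.0 * math.sqrt(math.log(horizon) / wins)
    return min(1.0, mean_estimate + bonus)


def simulate_ucbid(x, horizon, draw_rival_bid, rng):
    """Run UCBid against rival bids drawn by `draw_rival_bid`; return (pseudo-regret, wins)."""
    wins, value_sum, regret = 0, 0.0, 0.0
    for _ in range(horizon):
        mean_estimate = value_sum / wins if wins else 0.0
        bid = ucbid_bid(mean_estimate, wins, horizon)
        rival = draw_rival_bid(rng)
        # Expected net utilities of bidding x (oracle) and of bidding `bid` (learner).
        oracle_utility = (x - rival) if x >= rival else 0.0
        learner_utility = (x - rival) if bid >= rival else 0.0
        regret += oracle_utility - learner_utility
        if bid >= rival:
            # The Bernoulli value is only revealed when the auction is won.
            value_sum += 1.0 if rng.random() < x else 0.0
            wins += 1
    return regret, wins


rng = random.Random(0)
regret, wins = simulate_ucbid(0.5, 5000, lambda r: r.random(), rng)
print(f"pseudo-regret: {regret:.2f}, auctions won: {wins}")
```

With uniform rival bids the margin condition holds with $\alpha = 1$, so rerunning the simulation for increasing horizons is one way to compare the observed growth with the $\log^2(T)$ rate of the proposition.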
These results only hold if the realized values $v_t$ are i.i.d. with unknown expectation
$x$; they rely on the basic ideas of stochastic multi-armed bandits and on a new bidding
algorithm based on UCB. If the values $v_t$ can be any sequence, then it is also possible to
achieve non-trivial regret bounds by using the EXP.3 algorithm instead of UCB as a
building block (Weed et al., 2016). Those results and techniques can also be exported to
auction settings other than Vickrey (Feng et al., 2018b).
4.1.2 Adaptivity to the mechanism and other bidders
If a bidder does not have, at first, enough information on the auction mechanism and/or
the distributions of values of his competitors, he cannot compute an appropriate bidding
strategy even if he knows perfectly the values he gives to the item. The repeated auction
setting can then help him learn a strategy on the fly. A possible approach is again to
use ideas from multi-armed bandits, and more precisely from contextual bandits.
For simplicity, assume that the value distribution of the bidder has a finite support
included in $[0, 1]$, denoted by $\{x_1^*, \ldots, x_L^*\}$. Then a bidding strategy consists in finding, for
each possible value $x_\ell^*$, a corresponding bid. A classical contextual bandit technique
consists in discretizing the set of bids $[0, 1]$ into $\{b_1^*, \ldots, b_K^*\}$ and in running independent
versions of a base bandit algorithm (such as EXP.3 for instance), one for each possible
value, where the set of arms is the discrete set of bids. Such regret minimizing algorithms
have indeed been reported in the online advertising industry (Nekipelov et al., 2015).
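The construction above can be sketched in a few lines of Python: one EXP.3 learner per possible value, all sharing the same grid of bids as arms. This is a minimal illustration, not the implementation used in the cited works. The class names, the exploration rate `gamma`, the regular grid and the rescaling of utilities from $[-1, 1]$ to $[0, 1]$ are assumptions made for the example.

```python
import math
import random


class Exp3:
    """EXP3 with uniform exploration, for rewards in [0, 1]."""

    def __init__(self, n_arms, gamma, rng):
        self.n_arms = n_arms
        self.gamma = gamma
        self.rng = rng
        self.weights = [1.0] * n_arms

    def probabilities(self):
        total = sum(self.weights)
        return [(1.0 - self.gamma) * w / total + self.gamma / self.n_arms
                for w in self.weights]

    def draw(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # Importance-weighted estimate: only the pulled arm is updated.
        estimate = reward / prob
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        top = max(self.weights)  # rescale to keep the weights bounded
        self.weights = [w / top for w in self.weights]


class PerValueBidder:
    """One independent EXP3 instance per possible value; arms are the bids of a grid."""

    def __init__(self, values, grid_size, gamma, rng):
        self.bids = [k / grid_size for k in range(grid_size + 1)]
        self.learners = {v: Exp3(len(self.bids), gamma, rng) for v in values}

    def bid(self, value):
        arm, prob = self.learners[value].draw()
        return arm, prob, self.bids[arm]

    def learn(self, value, arm, prob, utility):
        # Assumption: utilities lie in [-1, 1] and are rescaled to [0, 1].
        self.learners[value].update(arm, prob, (utility + 1.0) / 2.0)


rng = random.Random(1)
support = [0.25, 0.5, 1.0]
bidder = PerValueBidder(values=support, grid_size=10, gamma=0.1, rng=rng)
for _ in range(3000):
    value = rng.choice(support)
    arm, prob, bid = bidder.bid(value)
    # Toy environment: first-price auction against a rival bid fixed at 0.3.
    utility = (value - bid) if bid >= 0.3 else 0.0
    bidder.learn(value, arm, prob, utility)
```

Keeping the learners independent means that the feedback gathered for one value never informs the others, which is why the estimation error grows with the size of the support.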
In this setting, we can even assume that the opponents change through time, so that
their sequence of bids might be arbitrary (but, conditionally on the past and on the actual
value $x_\ell$, their bids at some given auction $t \in \mathbb{N}$ are independent of the bid of player $i$).
The oracles to which algorithms are compared are stationary strategies,
i.e., fixed mappings from values to bids; denote their set by $\mathcal{B}_i$. Then the maximal expected
cumulative utility bidder $i$ can get in this class of strategies is
\[ \max_{\beta_i \in \mathcal{B}_i} \sum_{t=1}^{T} \mathbb{E}\big[ u_i((\beta_i(x_t), b_{-i,t}), x_t) \big], \]
where $x_t$ is the value of bidder $i$ for item $t$, $b_{-i,t}$ is the vector of bids of his opponents and
$\beta_i$ is a possible strategy. This quantity serves as a benchmark for a learning to bid policy.
Notice that if the mechanism is DSIC, then the optimal mapping in the above equation
is $\beta(x) = x$, i.e., bidding truthfully. On the other hand, the realized cumulative utility of
bidder $i$ is equal to
\[ \sum_{t=1}^{T} \mathbb{E}\big[ u_i((b_{i,t}, b_{-i,t}), x_t) \big], \]
where $b_{i,t}$ is his bid at stage $t$, after observing the value $x_t$. As a consequence, the overall
regret of bidder $i$ is the difference between those two terms, the benchmark and the
cumulative utility, and reads as
\[ R_T = \max_{\beta_i \in \mathcal{B}_i} \sum_{t=1}^{T} \mathbb{E}\big[ u_i((\beta_i(x_t), b_{-i,t}), x_t) \big] - \sum_{t=1}^{T} \mathbb{E}\big[ u_i((b_{i,t}, b_{-i,t}), x_t) \big]. \]
Section 3.6.1, we get that
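\[ R_T \le \sum_{\ell=1}^{L} \Big( 2\sqrt{K\log(K)\,T_\ell} + \max_{b_\ell \in [0,1]} \sum_{t : x_t = x_\ell^*} \mathbb{E}\big[ u_i((b_\ell, b_{-i,t}), x_\ell^*) \big] - \max_{b_\ell^* \in \{b_1^*, \ldots, b_K^*\}} \sum_{t : x_t = x_\ell^*} \mathbb{E}\big[ u_i((b_\ell^*, b_{-i,t}), x_\ell^*) \big] \Big) \tag{4.1} \]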
where $T_\ell = \#\{t \le T : x_t = x_\ell^*\}$ is the number of times the value was equal to $x_\ell^*$. The first
term of Equation 4.1 corresponds to the estimation error, and the second term to the
approximation error, because the EXP.3 instances are restricted to bids in the discretization
of $[0, 1]$ while the optimal bidding strategy does not have this restriction.
We will need some regularity assumption on the mechanism used by the seller; we will
assume that it is “C-almost Lipschitz”, for some constant $C > 0$, in the following sense. For
any value $x_i \in [0, 1]$, any bids $b_{-i}$ of the opponents and any bid $b_i$ of bidder $i$, there
exists a point $b_k^\varepsilon$ in the $\varepsilon$-regular grid of $[0, 1]$ (i.e., $b_k^\varepsilon = k\varepsilon$ for some integer $k$) such that
$u_i((b_i, b_{-i}), x_i) \le u_i((b_k^\varepsilon, b_{-i}), x_i) + C\varepsilon$. Classical auction mechanisms (first and second price
auctions, with or without reserve prices, Myerson auction...) all satisfy this assumption,
which ensures that the approximation error is of order $T\varepsilon$ if $\{b_1^*, \ldots, b_K^*\}$ is the $\varepsilon$-regular
grid (in particular, this implies that $K = 1/\varepsilon$). We emphasize here that auction mechanisms
are usually not Lipschitz (as there are discontinuities around the smallest winning bid).
Proposition 4.2. If the mechanism is C-almost Lipschitz (with $C$ unknown) and the value
distribution has a finite support of size $L$, then there exists a learning to bid policy whose
regret with respect to the optimal bidding strategy in hindsight is smaller than
\[ R_T \le (2 + C)\big( L T^2 \log(T) \big)^{\frac{1}{3}}. \]
Proof. One just needs to plug the definition of C-almost Lipschitzness into Equation 4.1, as
this gives that the regret scales as
\[ R_T \le 2\sqrt{\frac{1}{\varepsilon}\log\Big(\frac{1}{\varepsilon}\Big) L T} + C T \varepsilon \le (2 + C)\big( L T^2 \log(T) \big)^{\frac{1}{3}} \]
with the specific choice of $\varepsilon = (L\log(T)/T)^{1/3}$.
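For the reader who wants to see where the exponent $1/3$ comes from, the balancing step can be unpacked as follows. This is a sketch under two assumptions that the text leaves implicit: the sum over $\ell$ in Equation 4.1 is bounded with the Cauchy-Schwarz inequality, and $L\log(T) \ge 1$, so that $1/\varepsilon \le T$ and hence $\log(1/\varepsilon) \le \log(T)$.

```latex
\begin{align*}
\sum_{\ell=1}^{L} 2\sqrt{K\log(K)\,T_\ell}
  &\le 2\sqrt{K\log(K)\,L\,T}
   = 2\sqrt{\tfrac{1}{\varepsilon}\log\big(\tfrac{1}{\varepsilon}\big)\,L\,T}
  && \text{since } \textstyle\sum_{\ell} T_\ell = T \text{ and } K = 1/\varepsilon, \\
\tfrac{1}{\varepsilon}\log\big(\tfrac{1}{\varepsilon}\big)\,L\,T
  &\le \Big(\tfrac{T}{L\log(T)}\Big)^{1/3} L\,T\log(T)
   = \big(L\,T^{2}\log(T)\big)^{2/3}
  && \text{for } \varepsilon = (L\log(T)/T)^{1/3}, \\
C\,T\,\varepsilon
  &= C\,\big(L\,T^{2}\log(T)\big)^{1/3}.
\end{align*}
```

Taking the square root of the second line and adding the third one gives the bound $(2 + C)(LT^2\log(T))^{1/3}$.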
This result only holds for distributions with finite support; for continuous distributions,
the trick consists in bucketing the support of the value distribution into small bins of size $\varepsilon$
and using, as before, an independent version of EXP.3 per bin. This only works under some
regularity assumption on each bin. Specifically, given a small bin $I$, we shall assume that there
exists a constant bid $b$ that is $\varepsilon$-optimal on the set of stages where the value $x_s$ belongs to $I$,
i.e.,
\[ \max_{\beta : I \to [0,1]} \frac{\sum_{s \le T : x_s \in I} u_i((\beta(x_s), b_{-i,s}), x_s)}{\#\{s \le T : x_s \in I\}} \le \max_{b \in [0,1]} \frac{\sum_{s \le T : x_s \in I} u_i((b, b_{-i,s}), x_s)}{\#\{s \le T : x_s \in I\}} + C'\varepsilon. \]
In particular, this assumption is satisfied if $u_i$ is $C'$-Lipschitz and the bids $b_{-i,t}$ do
not depend (too much, at most in a Lipschitz fashion) on $x_t$. Once again, balancing the
approximation errors (both in the bid and the value spaces) and the estimation error gives the
optimal choice of $\varepsilon = (\log(T)/T)^{1/4}$ for a regret scaling as
\[ R_T \le (2 + C + C')\big( T^3 \log(T) \big)^{\frac{1}{4}}. \]
If the utility function $u_i$ (and/or the opponents' bids $b_{-i,t}$) is not Lipschitz but less regular
(such as $\beta$-Hölder, which means that $|u_i(x) - u_i(y)| \le L_\beta \|x - y\|^{\beta}$ for some constant $L_\beta$),
then the rate of regret growth is impacted, as one should find a different tradeoff
between approximation and estimation errors. This would typically lead to a regret scaling as
$(T\log(T))^{b}$ where $b \in [\frac{1}{2}, 1)$ is some parameter depending on the different regularities of
the mappings at hand.
Under the assumption that bids are much smaller than the total budget, the problem loses
much of its stochastic component and is often approximated by its so-called fluid
approximation (which replaces both the objective and the constraint by their expectations,
effectively appealing to uniform laws of large numbers (Dudley, 2014)), turning the problem into
\[ \max_{\{b_t\}_{t=1}^{T}} \sum_{t=1}^{T} \mathbb{E}\big[ \mathbf{1}\{b_t > b^{(1)}_{-i,t}\}\,(x_t - b^{(1)}_{-i,t}) \big], \]
\[ \text{subject to} \quad \sum_{t=1}^{T} \mathbb{E}\big[ \mathbf{1}\{b_t > b^{(1)}_{-i,t}\}\, b^{(1)}_{-i,t} \big] \le B. \]
Under mild assumptions, and as a consequence of some strong duality properties, an
optimal bidding strategy is
\[ b_t = \frac{x_t}{1 + \mu^*}, \]
where $\mu^*$ is the optimal solution of the dual problem associated with the constrained
optimization problem above (Proposition 3.1, (Balseiro et al., 2015)). This bidding strategy
is similar to the optimal bidding strategy in a second price auction; however, the value of
each item is discounted by a factor accounting for the constraint.
This result can be extended to a much broader set of auctions, such as first price,
generalized second price, etc., where optimal bidding turns out to be of a similar form to
optimal bidding without constraint, the value of each item being linearly discounted by a
constant accounting for the budget constraint (Gummadi et al., 2012).
Three important conceptual ideas emerge from this type of problem. The first one
concerns the question of pacing. For a wide variety of auctions (and payment rules), the
optimal strategy to maximize the purchased inventory amounts to spending one’s budget
smoothly, i.e., at the rate of arrival of auction requests (Fernandez-Tapia, 2015; Aström and
Murray, 2008). However, a crucial assumption for this result to hold is that the price paid
as a function of win rate does not depend on time, and hence that the environment is stationary.
Not surprisingly, when this assumption does not hold, “smooth spending” at the rate of
arrival of auction requests is no longer optimal. The same general techniques can however be
used in that case to understand the optimal rate of winning auctions and of spending.
To be implemented in practice, these ideas require of course a forecast for both the
arrival rate of auction requests and the price paid at a certain win rate. When the bidder
has access to such information, ideas of model-predictive control and re-optimization can
be used (Ciocan and Farias, 2012).
A third important line of work concerns online estimation of the parameter $\mu^*$ mentioned
above, possibly without forecast (Balseiro and Gur, 2019). The essential idea is to use the
fact that $\mu^*$ is the solution of an optimization problem and to solve this
optimization problem online, using online gradient descent (Shalev-Shwartz and Ben-David,
2014). To be slightly more specific, in the problem mentioned above, the optimal
solution for $\mu$ in hindsight is determined through
\[ \inf_{\mu \ge 0} \sum_{t=1}^{T} \big( x_t - (1 + \mu)\, b^{(1)}_{-i,t} \big)_+ + \mu B \;=:\; \inf_{\mu \ge 0} \sum_{t=1}^{T} \ell_t(\mu). \]
The functions $\ell_t$ are observable at time $t$ and form a sequence of functions arriving in a
streaming fashion. As a consequence, the estimate of the parameter $\mu^*$ can be updated
in an online fashion, without a forecast, by using for instance the online (sub)gradient
descent rule
\[ \mu_{t+1} = \mu_t - \gamma\, \partial_\mu \ell_t(\mu_t), \]
where ∂ is the (sub)gradient operator and γ is the stepsize in the online gradient descent
algorithm. Various theoretical guarantees about this scheme, under a variety of optimistic
and pessimistic assumptions about the amount of information that is known about the
environment in which the bidder evolves, can be proved (Balseiro and Gur, 2019).
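The update rule can be made concrete with a short Python sketch in the spirit of this adaptive pacing idea. It rests on several assumptions that go beyond the text: the term $\mu B$ is split evenly across rounds, so that a subgradient of $\ell_t$ at $\mu_t$ is the target spend $B/T$ minus the amount actually spent; the multiplier is projected onto $\mu \ge 0$ after each step; and the bid is capped by the remaining budget. The function names and the toy data are hypothetical.

```python
import random


def paced_bid(value, mu):
    """Bid of the fluid-optimal form: the value shaded by 1 / (1 + mu)."""
    return value / (1.0 + mu)


def adaptive_pacing(values, rival_bids, budget, step_size, mu_init=0.0):
    """Second-price bidding with an online (sub)gradient update of the multiplier mu."""
    horizon = len(values)
    target_spend = budget / horizon  # per-round share of the budget
    mu, remaining, utility = mu_init, budget, 0.0
    for value, rival in zip(values, rival_bids):
        # Assumption: never bid more than what is left in the budget.
        bid = min(paced_bid(value, mu), remaining)
        spend = 0.0
        if bid > rival:
            spend = rival  # second-price payment
            utility += value - rival
            remaining -= spend
        # Subgradient step on l_t, followed by a projection onto mu >= 0.
        mu = max(0.0, mu - step_size * (target_spend - spend))
    return utility, budget - remaining, mu


rng = random.Random(0)
values = [rng.random() for _ in range(1000)]
rivals = [rng.random() for _ in range(1000)]
utility, spent, mu = adaptive_pacing(values, rivals, budget=20.0, step_size=0.05)
print(f"utility {utility:.2f}, spent {spent:.2f} of 20.00, final mu {mu:.2f}")
```

The multiplier rises whenever a round costs more than the per-round target and decays otherwise, so spending is steered towards the smooth rate $B/T$ without any forecast of future auctions.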
The literature on this topic is very large, with many different variations (Ghosh et al.,
2009; Choi and Mela, 2018; Lee et al., 2013; Yuan et al., 2013; Xu et al., 2015). The question
of handling the situation where the bid to budget ratio is not close to 0, and hence the
fluid approximation is not well justified, is quite open and appears to be more of a genuine
stochastic control problem. An interesting approach is to use ideas coming out of the
analysis of the online knapsack and related problems (Arlotto and Gurvich, 2019).
where they are using algorithms such as EXP.3, which base their decisions on the mean
reward observed. Their particularity is that they rarely pick an arm whose current mean is
significantly worse than the current highest mean; for instance, if during the first $t$ stages
an arm $k$ has generated an average reward (denoted by $\bar{X}_{k,t}$) that is smaller than that
of arm $j$, then the probability of choosing $k$ over $j$ is exponentially small. More precisely,
the difference of these log-probabilities scales linearly with $\bar{X}_{k,t} - \bar{X}_{j,t}$. We will consider in
the following a general class of algorithms that exhibit similar behavior, called
$\eta$-mean-based, which is more general than just EXP.3.
Definition 4.3. In the standard multi-armed bandit problem, an algorithm is $\eta$-mean-based, for
some $\eta \in (0, 1)$, if at any stage $t \in \mathbb{N}$ and for any pair of arms $(k, \ell)$, if $\bar{X}_{k,t} < \bar{X}_{\ell,t} - \eta$ then
the probability $p_{k,t+1}$ that the algorithm pulls arm $k$ at time $t + 1$ is smaller than $\eta$. An
algorithm is (asymptotically) mean-based if $\eta = o_{T \to \infty}(1)$.
In particular, EXP.3 is asymptotically mean-based, and so is $\varepsilon$-greedy (for $\eta = \varepsilon$).
To simplify the following statements, we are going to assume that there is only one
bidder, whose value distribution $F$ is known beforehand (an assumption that can be
substantially weakened (Deng et al., 2019)). If the seller were using the same mechanism at each
auction $t \in \{1, \ldots, T\}$, then the optimal one would obviously be to post the monopoly price. This
generates a total revenue of $T$ times the monopoly revenue (because the bidder will quickly
learn the optimal strategy). On the other hand, if the seller knows that the bidder is using
a mean-based algorithm, she can generate a much higher revenue (Braverman et al., 2018),
almost as high as the total welfare denoted by $W(F) = \mathbb{E}_{x \sim F}[x]$; with $n$ bidders, the total
welfare would be $W(F) = \mathbb{E}_{x \sim F}[\max_i x_i]$.
To achieve this, the mechanism must change through time and possibly be itself
adaptive to the sequence of bids of the buyer. We therefore introduce the concept of
dynamic mechanism, in which the actual auction rules may change from stage to stage.
Recall that at each stage $t \in \{1, \ldots, T\}$, bidders' valuations for the item are sampled
from distributions $F_i$. These distributions are fixed from one auction to the other. We
denote by $a_t \in \mathcal{A}$ the auction mechanism chosen at time $t$ by the seller and by $b_{i,t}$ the bid of
buyer $i$. We also denote by $H_t = \{a_1, b_{1,1}, \ldots, b_{n,1}, \ldots, a_{t-1}, b_{1,t-1}, \ldots, b_{n,t-1}\}$ the finite history at
stage $t$, which consists of past auctions and past buyers' bids.
S
Definition 4.4. A dynamic mechanism DM : t Ht → A is a mapping that associates to
any finite history Ht an auction at =: DM(Ht ). A bidder’s dynamic strategy, S, is a mapping
from Ht to the set B of strategies (i.e., functions from values to bids).
The following theorem states that a seller can extract the full surplus of the system, if
bidders are using naïve learning algorithms.
Theorem 4.5. If the bidder is running a mean-based algorithm, for any ε > 0, there exists
a dynamic selling mechanism such that the seller can get (1 − ε)W(F)T − o(T ).
The intuition behind this result is that the seller can lure the naïve algorithms such
as EXP.3 by setting low prices during a first (large) period of time and then by increasing
drastically the reserve price during a second stage. This is illustrated in the following
example with n = 1 bidder (Braverman et al., 2018).
Assume the bidder value $x_t$ has the following distribution:
$$x_t = \begin{cases} 1/4 & \text{with probability } 1/2, \\ 1/2 & \text{with probability } 1/4, \\ 1 & \text{with probability } 1/4. \end{cases}$$
Simple computations show that the corresponding monopoly price is 1/4 and that setting it at
each time generates a revenue of T /4 after T auctions.
To fool a mean-based algorithm, the seller can use the following scheme. At any stage,
it will only allocate the item if the bid is exactly 1 (any other bid gives a utility of 0). It
remains to define the payment associated to a winning bid of 1. During the first T /2 stages,
it is equal to 0, while it will be equal to 1 during the last T /2 stages.
Recall that the bidder runs multiple independent instances of EXP.3, one for each
possible value (1/4, 1/2 and 1), so we can focus independently on the set of stages where
the value is constant. For the sake of simplicity, we are going to assume that the value
equals 1/2 (resp. 1) exactly T /8 times during each half of the game, and hence equals 1/4
exactly T /4 times during each half.
• On the set of stages where the value is 1, EXP.3 quickly learns that bidding 1 is
optimal during the first half of the game. This generates a cumulative utility of T /8
to the buyer. As a consequence, EXP.3 will keep bidding 1 when the value is 1 at each
stage of the second half with exponentially high probability. The revenue generated
on those stages by the seller is then approximately T /8.
• When the value is 1/2, bidding 1 during the first half generates a utility of T /16
to the buyer. During the second half, bidding 1 generates a negative utility of −1/2
per stage, so that the cumulative utility of bidding 1 decreases, but remains non-negative
during the whole process (and EXP.3 will keep bidding 1 with arbitrarily high
probability for almost all stages). The revenue generated on those stages by the seller
is then also approximately T /8.
• When the value is 1/4, bidding 1 during the first half generates a utility of T /16
to the buyer. During the second half, bidding 1 generates a negative utility of −3/4
per stage, so that the cumulative utility of bidding 1 decreases, but remains positive
during T /12 additional stages where EXP.3 will bid 1 with high probability (and
afterwards stop bidding 1 as the cumulative utility of this bid is negative). The
revenue generated on those stages by the seller is then approximately T /12.
At the end, the total revenue of the seller is therefore of the order of T /8 + T /8 + T /12 =
T /3 which is much bigger than T /4. The trick for the seller was to make the bidder
overpay on many auctions by exploiting the behavior of mean-based algorithms that keep
bidding 1 even when instantaneous negative utilities occur. Unfortunately for the seller,
this theorem only holds for mean-based buying algorithms. Even worse, for any dynamic
selling mechanism, there exists a buyer’s strategy such that he does not pay more than T
times the monopoly price (Braverman et al., 2018).
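The arithmetic of this example can be replayed with exact fractions. The sketch below is only an idealization and not the analysis of Braverman et al. (2018): it assumes a buyer who bids 1 exactly as long as the cumulative utility of that bid is non-negative, and the helper name `seller_revenue_share` is hypothetical.

```python
from fractions import Fraction as Fr

def seller_revenue_share(value, prob):
    """Share of T collected by the seller on stages where the value is `value`.

    Assumption: an idealized mean-based buyer keeps bidding 1 as long as the
    cumulative utility of that bid is non-negative, and stops afterwards.
    """
    stages_per_half = prob / 2          # share of T with this value, in each half
    banked = value * stages_per_half    # utility earned while the payment is 0
    loss = 1 - value                    # per-stage loss once the payment is 1
    if loss == 0:
        paying = stages_per_half        # utility never decreases, buyer never stops
    else:
        paying = min(stages_per_half, banked / loss)
    return paying                       # every paying stage yields a payment of 1

dist = {Fr(1, 4): Fr(1, 2), Fr(1, 2): Fr(1, 4), Fr(1): Fr(1, 4)}
shares = {v: seller_revenue_share(v, p) for v, p in dist.items()}
total = sum(shares.values())
# Best fixed posted price: p times the probability that the value is at least p.
monopoly = max(p * sum(q for v, q in dist.items() if v >= p) for p in dist)
print(shares, total, monopoly)
```

Under these assumptions the three shares add up to the T /3 of the text, to be compared with the T /4 obtained by a fixed posted price.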
4.2.2 Trading off ex-post individual rationality for full surplus extraction
If the buyers are using naïve algorithms, and the seller knows this, then we proved that
she can extract (almost) the full surplus from the system. This was possible because of
the asymmetry of information between agents. There are other settings where this full
surplus extraction by the seller is possible. The first example we consider is the case where
the individual-rationality assumption of the mechanism is removed. This will induce
another strong asymmetry between agents, as bidders are somehow “forced” to participate
in auctions. The idea is to consider the weaker concept of ex-ante, instead of interim,
individual-rationality (see Section 2.1.1 for more details on the differences). In the ex-ante
setting, the bidder does not know the value he will give to the item before he agrees to
take part in the auction - he therefore has the same information as the seller on his private
valuation. An ex-ante individually rational mechanism must give a non-negative expected
utility to the bidder.
Theorem 4.6 (Cremer and McLean, 1988). There exists an ex-ante individually rational
and incentive-compatible auction where the bidders’ utilities are all equal to zero and the
seller extracts the full surplus.
Proof. Since the seller knows the bidders’ value distributions, she can compute their
expected utilities in a second price auction.
The mechanism constructed is simple. It consists of an entry fee that must be paid
before participating in the auction (stated otherwise, the bidder must pay this amount no
matter the outcome of the auction); afterwards, a standard second price auction without
reserve price is run. Choosing for the entry fee the expected utility in the second price
auction gives an expected utility of zero to each buyer and the seller extracts the full
surplus of the game. This mechanism is of course ex-ante incentive-compatible.
This mechanism is neither interim nor ex-post individually rational since, for all valuation
vectors, the utility of all losing bidders is negative. In order to partially overcome this
issue, Balseiro et al. (2018) and Mirrokni et al. (2016) refined this mechanism to ensure that
bidders' utilities are not too negative at any point in the game; the trick is to spread the
fee over the different time steps instead of charging it at the beginning of the game. They
also generalize the original setting to more complex dynamic auctions.
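As an illustration of the entry-fee construction in the proof of Theorem 4.6, the sketch below works out the case of n i.i.d. U[0, 1] bidders, which is an assumption made here for concreteness and not a case treated in the text. It relies on the standard fact that the k-th smallest of n independent uniform draws has expectation k/(n + 1); the function name is hypothetical.

```python
from fractions import Fraction as Fr

def entry_fee_mechanism(n):
    """Entry fee followed by a second-price auction, n >= 2 i.i.d. U[0,1] bidders."""
    welfare = Fr(n, n + 1)             # E[highest value]
    sp_revenue = Fr(n - 1, n + 1)      # E[second-highest value], second-price revenue
    # By symmetry, each bidder gets an equal share of the expected surplus.
    bidder_surplus = (welfare - sp_revenue) / n
    fee = bidder_surplus               # entry fee = expected second-price utility
    seller_revenue = sp_revenue + n * fee
    return fee, seller_revenue, welfare

fee, seller_revenue, welfare = entry_fee_mechanism(3)
print(fee, seller_revenue, welfare)
```

With this fee, each bidder has zero expected utility and the seller's expected revenue equals the welfare, while a losing bidder ends up with a utility of minus the fee, which is why the mechanism is only ex-ante individually rational.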
Considering the ex-ante setting makes sense only in auctions where the buyers do
not know their own valuation (but only its distribution) before participating. It is quite
unrealistic in many single-item auctions, but it could make some sense when T successive
auctions are run, as in online ad markets: indeed, the fee must be paid before taking part
in any of the T auctions. While bidders can compute the distribution of their future
values, they do not know in advance the exact future realizations. This
unfortunately requires the bidders to also know perfectly the number T of future auctions.
This assumption has been weakened by Agrawal et al. (2018), who adapted the above
mechanisms to bidders that do not believe that there will be T auctions. In this case, they
are quite likely to refuse to pay a fee computed on T auctions early in the game.
A crucial assumption of this line of work is that bidders' value distributions
are known to the seller beforehand; this enables her to compute precisely the extra fees
that can be charged to the bidders without breaking the ex-ante individual rationality
assumption. Similarly, it implicitly assumes that bidders are able to compute best responses
to the dynamic mechanism (and that they implement them); this assumption is weakened in
the following section.
A first attempt to remove the prior knowledge of bidder’s value distributions, and instead
to learn them (Amin et al., 2014; Mohri and Medina, 2015; Golrezaei et al., 2021), is
to consider mechanisms that are incentive compatible (up to a small number of bids)
under the assumption that bidders are almost myopic or impatient – i.e., they have a fixed
discount on future utilities. This again introduces an asymmetry between the bidders with
a discounted long-term utility, and the seller with an undiscounted long-term revenue
(infinitely patient).
To simplify the exposition, we will focus on the posted price case. Formally, let us denote
by $p_t$ the price of the item at time t chosen by the mechanism and by $d_t$ the decision of the
buyer to buy ($d_t = 1$) or to refuse the item ($d_t = 0$). Since the distribution of values is not
known beforehand, $p_{t+1}$ can only depend on the finite history $H'_t = \{p_1, d_1, \dots, p_t, d_t\}$. The
discounted bidder utility is $\sum_{t=1}^{T} \gamma_t d_t (x_t - p_t)$, where $\gamma = (\gamma_t)_t$ is a sequence of
non-negative weights. In this section, we shall assume for simplicity that values are
uniformly bounded by 1.
The objective of the seller is to choose a dynamic selling mechanism DM that
maximizes her revenue against buyers that know DM and respond optimally in
the long run, i.e., with the objective of maximizing their expected discounted utilities. Let
us denote by $d_t^*(DM)$ the optimal strategy of the bidder and by $p_t(DM)$ the price posted.
The performance of a dynamic mechanism will be measured in terms of “regret”, whose
definition is slightly different than in the previous section.
Definition 4.7. Given a dynamic mechanism DM, the discount sequence γ and the value
distribution F , the regret of the seller is
$$R_T(DM, \gamma, F) = \mathbb{E}\left[\sum_{t=1}^{T} d_t^*(DM^*)\, p_t(DM^*) - \sum_{t=1}^{T} d_t^*(DM)\, p_t(DM)\right].$$
In this setting, DM∗ consists in posting the monopoly price corresponding to the value
distribution F at each round. We emphasize here that the dependencies in γ and F in the
regret definition are hidden in the best responses dt∗ (·).
Theorem 4.8 (Amin et al., 2013). Let $\gamma_t$ be any positive non-increasing sequence and DM
be any dynamic selling mechanism. Then, there exists a buyer value distribution F such
that the regret satisfies $R_T(DM, \gamma, F) \ge \frac{1}{12} \sum_{t=1}^{T} \gamma_t$. In particular, sublinear regret is
impossible to achieve if $\sum_{t=1}^{T} \gamma_t = \Theta(T)$.
This theorem states that if the buyer is patient enough, the seller cannot learn the
monopoly price quickly enough to reach a sublinear regret. However, when the sequence
$\gamma_t$ decreases geometrically, i.e., $\gamma_t = \gamma^t$ for some γ ∈ (0, 1), sublinear regret is possible as
$\sum_t \gamma_t = o(T)$. In words, this means that if the buyer is much more impatient than the seller,
the latter can extract surplus; moreover, this can be achieved with a simple two-phased
dynamic mechanism (Amin et al., 2014).
1. Phase 1 (of length: αT ) : post prices drawn i.i.d. uniformly on [0, 1].
2. Phase 2 (of length: (1−α)T ) : compute the optimal price using some robust estimation
procedure and post it until the end.
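The two phases can be sketched as follows. This is a simplified illustration, not the mechanism of Amin et al. (2014): the buyer is assumed truthful (he buys whenever his value is at least the price), so the robust estimation step is replaced by a plain binned estimate of the revenue curve, and the function name, the number of bins and the seeds are arbitrary choices.

```python
import random

def two_phase_posted_price(values, alpha, n_bins=20, seed=0):
    """Explore with uniform prices, then commit to the best empirical price.

    Assumption: the buyer is truthful (buys iff value >= price), so a plain
    (non-robust) estimate of the revenue curve is enough for this sketch.
    """
    rng = random.Random(seed)
    explore = int(alpha * len(values))
    offers = [0] * n_bins    # number of prices posted in each bin
    sales = [0] * n_bins     # number of accepted prices in each bin
    revenue = 0.0
    for x in values[:explore]:               # Phase 1: i.i.d. uniform prices
        p = rng.random()
        k = min(int(p * n_bins), n_bins - 1)
        offers[k] += 1
        if x >= p:
            sales[k] += 1
            revenue += p
    centers = [(k + 0.5) / n_bins for k in range(n_bins)]
    # Empirical revenue of each candidate price: price times acceptance rate.
    est = [c * sales[k] / offers[k] if offers[k] else 0.0
           for k, c in enumerate(centers)]
    p_hat = centers[max(range(n_bins), key=est.__getitem__)]
    for x in values[explore:]:               # Phase 2: post p_hat until the end
        if x >= p_hat:
            revenue += p_hat
    return p_hat, revenue

value_rng = random.Random(1)
values = [value_rng.random() for _ in range(100_000)]   # U[0,1] values
p_hat, revenue = two_phase_posted_price(values, alpha=0.2)
print(p_hat, revenue)
```

For U[0, 1] values the committed price should land near the monopoly price 0.5; a strategic buyer would distort the acceptance counts of the first phase, which is what the robust estimation step is there to handle.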
The formal proof of this statement is a bit long and technical, but the main ingredients
are quite easy to understand. First, we are going to assume that the horizon T ∈ N is known
beforehand, otherwise one could just use the doubling trick.
The key idea is to bound the number of times the buyer can lie by not being truthful,
i.e., either by buying the item at a price higher than his value or refusing a lower price.
Notice that the net cost of a lie at stage t ∈ N, if the stage valuation is $x_t$ and the price
posted $p_t$, is exactly equal to $|x_t - p_t|$. Since the prices $p_t$ are i.i.d. and uniformly drawn on
[0, 1], the potential costs $|x_t - p_t|$ are smaller than ε only 2εαT times during the first
phase (at least in expectation, but we are going to neglect the deviations in this sketch of
proof). As a consequence, if the buyer lies L times during this phase, then at least L/2 of
those lies must have a cost of at least L/(4αT ). Recall that the buyer puts weight $\gamma_t = \gamma^t$ on
the t-th stage, so that the cumulative, discounted cost of those L lies is at least
$$\sum_{t=\alpha T - L/2}^{\alpha T} \gamma^t\, \frac{L}{4\alpha T} = \frac{L}{4\alpha T}\, \frac{\gamma^{\alpha T+1}}{1-\gamma}\left(\gamma^{-L/2-1} - 1\right) \ge \frac{L}{8\alpha T}\left(\frac{1}{\gamma}\right)^{L/2+1} \frac{\gamma^{\alpha T+1}}{1-\gamma}.$$
It remains to control the total gain of those L lies. At best, they will induce a posted price
of $p_t = 0$ during the second phase, and a per-stage gain of at most 1 for the buyer. As a
consequence, the total cumulative gain of the buyer is at most
$$\sum_{t=\alpha T+1}^{T} \gamma^t = \frac{\gamma^{\alpha T+1}}{1-\gamma}\left(1 - \gamma^{T(1-\alpha)}\right) \le \frac{\gamma^{\alpha T+1}}{1-\gamma}.$$
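All things put together, the cumulative discounted net gain of lying L times is upper-
bounded by
$$\frac{\gamma^{\alpha T+1}}{1-\gamma} - \frac{L}{8\alpha T}\left(\frac{1}{\gamma}\right)^{L/2+1} \frac{\gamma^{\alpha T+1}}{1-\gamma} = \frac{\gamma^{\alpha T+1}}{1-\gamma}\left(1 - \frac{L}{8\alpha T}\left(\frac{1}{\gamma}\right)^{L/2+1}\right).$$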
A direct consequence of the above inequality is that the number of lies L, for them to be
profitable, must satisfy
$$\frac{L}{8\alpha T}\left(\frac{1}{\gamma}\right)^{L/2+1} \le 1 \implies L \le 2\, \frac{\log(8\alpha T)}{\log(1/\gamma)}.$$
As a consequence, this gives a simple upper-bound on the number of lies that can be seen
as “outliers” from the point of view of the seller, when trying to estimate the optimal
price.
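The implication above can be checked numerically. The sketch below, with hypothetical function names and arbitrary parameter values, searches for the largest L satisfying the profitability condition and compares it with the closed-form bound.

```python
import math

def max_profitable_lies(alpha, T, gamma):
    """Largest integer L with (L / (8 alpha T)) * (1 / gamma) ** (L / 2 + 1) <= 1."""
    def condition(L):
        return (L / (8 * alpha * T)) * (1 / gamma) ** (L / 2 + 1) <= 1
    L = 0
    # The left-hand side increases with L, so a linear scan finds the threshold.
    while condition(L + 1):
        L += 1
    return L

def closed_form_bound(alpha, T, gamma):
    """The bound 2 log(8 alpha T) / log(1 / gamma) from the text."""
    return 2 * math.log(8 * alpha * T) / math.log(1 / gamma)

alpha, T, gamma = 0.1, 10_000, 0.9      # illustrative values only
L_max = max_profitable_lies(alpha, T, gamma)
bound = closed_form_bound(alpha, T, gamma)
print(L_max, bound)
```

The number of profitable lies grows only logarithmically with the length αT of the first phase, which is what makes them negligible "outliers" for the estimation step.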
From the point of view of the seller, the regret can be decomposed into the cost of the
first phase, bounded by its length αT , and the cost of the second phase, bounded by T η,
where η is the error on the optimal price computed during the first phase. The remaining
question consists in bounding this error; standard robust estimation techniques (such as
median of means (Lecué and Lerasle, 2020), see footnote 1) or gradient descents with outliers
indicate that η is of the order of $\sqrt{L/(\alpha T)}$. Adding both terms and considering the previous
bound on L gives a regret scaling as, up to multiplicative constants,
$$\alpha T + \sqrt{\frac{\log(T)}{\log(1/\gamma)}}\, \sqrt{\frac{T}{\alpha}} \le T^{2/3} \sqrt{\frac{\log(T)}{\log(1/\gamma)}}$$
with the choice of $\alpha = T^{-1/3}$. The simple idea behind this algorithm was then refined
(Mohri and Medina, 2015) and extended to the case of K bidders (Golrezaei et al., 2021).
1 This technique consists in dividing the full dataset of size αT in 2L different datasets and estimating
the optimal price on each of them. There necessarily exists a majority of small datasets without outliers that
estimate correctly the optimal price. Hence taking the median value is a robust procedure as long as L = o(αT ).
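For completeness, here is a short worked computation of how the two terms of the regret bound trade off in α. It is only a sketch under the scalings stated above: L is treated as a fixed quantity of order log(T)/log(1/γ) and all multiplicative constants are ignored.

```latex
\begin{align*}
g(\alpha) &= \alpha T + \sqrt{\frac{T L}{\alpha}},
  \qquad L = O\!\left(\frac{\log T}{\log(1/\gamma)}\right), \\
g'(\alpha) &= T - \tfrac{1}{2}\sqrt{T L}\,\alpha^{-3/2} = 0
  \iff \alpha^\star = \left(\frac{L}{4T}\right)^{1/3}, \\
g(\alpha^\star) &= \left(2^{-2/3} + 2^{1/3}\right) L^{1/3}\, T^{2/3}
  = 3 \cdot 2^{-2/3}\, L^{1/3}\, T^{2/3}, \\
g\!\left(T^{-1/3}\right) &= T^{2/3}\left(1 + \sqrt{L}\right).
\end{align*}
```

Both choices give the same $T^{2/3}$ rate; the simpler choice $\alpha = T^{-1/3}$ only changes the dependence on L.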
Remark. Once again, this surplus extraction is possible only because there is an (artificial)
asymmetry between the seller and the buyer, preventing the latter from being too strategic.
This can also be enforced through another approach, yet it is valid only with several (almost)
symmetric bidders – leading to another type of asymmetry between the seller and buyers.
The idea is quite simple: make the computations required (e.g., to determine a reserve price)
not as a function of the buyer's bids, but as a function of his competitors' bids (Ashlagi et al.,
2016; Kanoria and Nazerzadeh, 2014; Epasto et al., 2018). Unfortunately, this approach
cannot handle the existence of any dominant buyer, i.e., a buyer with much higher values
than the other bidders (Epasto et al., 2018). Therefore, the impact of this technique is
quite limited since revenue-optimizing mechanisms are mostly important when the buyers
are heterogeneous. Moreover, in the main real-world application of online advertising,
with asymmetric bidders and no specific asymmetry between seller and buyers on future
utilities, none of these mechanisms ends up being able to enforce truthful bidding.
time t, where $q_t = q(b, H_t) \in \Delta_n$ is the assignment at time t resulting from the sequential
assignment function $q : B \times \bigcup_t H_t \to \Delta_n$. We will explicitly denote it $H_t(b, c)$ if/when
we want to emphasize that $H_t$ is a function of the bids b and of the click (potential)
realizations $c \in \{0, 1\}^{T \times n}$. Finally, because the bidders only give one bid at the beginning,
we can consider that the payment $p : B \times H_T \to \mathbb{R}^n$ is made at the very end (after t = T ),
as a sort of final billing.
In fact, this setting can be viewed as a multi-armed bandit (MAB) with an "unusual"
way to define the reward, the assignment function q being the bandit algorithm. So here,
the question will be whether it is possible to recover the performance of an optimal MAB
algorithm (KL-UCB) or whether restricting to assignments q for which it is possible to find
a payment that makes the mechanism incentive compatible will lead to a degradation
of the performance. Following different objectives of performance, the pseudo-regret can
be defined in terms of seller's revenue (Devanur and Kakade, 2009), or in social welfare
(Babaioff et al., 2014). In both cases, the comparator of the regret is a weighted second-price
auction for which the $\rho_i$'s are known. Recalling that we denote by $i_t$ the winning bidder at
time t and by smax the second-highest element ("second max"), we have
$$R^W_T = T \max_i \alpha_i \rho_i - \sum_{t=1}^{T} \alpha_{i_t} \rho_{i_t}, \qquad R^p_T = T\, \mathrm{smax}_i\, \alpha_i \rho_i - p(b, H_T).$$
Theorem 4.10 (Babaioff et al., 2014). For n = 2, given a scale-free (see footnote 3) and
deterministic dynamic allocation q, there exists a payment p such that the resulting dynamic
mechanism is 0-rational and DSIC iff
1. (pointwise-monotone) for any bid profile and for any realization of the history, if bidder i
wins at round t, he would still win by bidding higher, i.e.,
$\forall t \in [T]$, $\forall c \in \{0, 1\}^{t \times n}$, $\forall b \in B$, if bidder i wins the auction at time t,
then for $\tilde{b}_i > b_i$, we have $q_i(\tilde{b}_i, b_{-i}, H_t(\tilde{b}_i, b_{-i}, c)) \ge q_i(b, H_t(b, c))$.
2 This assumption is technical and allows us to avoid stating results that hold almost surely
w.r.t. Lebesgue measure.
3 The scale-free property just means that rescaling the bids doesn't change the outcome of the assignment –
e.g. it does not depend on the currency.
Proof. To simplify notation for the proof, we denote by $q(b, c) \in \mathbb{R}^{n \times T}$ the matrix with
columns $q_t(b, c)^\top = q(b, H_t(b, c))$. Further, we recall that the payment for a truthful
mechanism is defined by
$p_i(b, c) = \langle b_i c_i, q_i(b, c)\rangle - \int_0^{b_i} \langle c_i, q_i(u, b_{-i}, c)\rangle\, du$ (Archer and Tardos, 2001). We
break the proof in three steps, proving first that 0-rationality and DSIC imply monotonicity,
then exploration-separation and finally the converse statement.
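As a sanity check on the payment formula recalled above, the sketch below evaluates it numerically in the simplest possible case: a single round with one guaranteed click and one competitor with a fixed bid. The allocation rule, the function names and the numerical values are illustrative assumptions; in that case the formula should return the competitor's bid, i.e., the second-price payment.

```python
def truthful_payment(bid, allocation, n_grid=10_000):
    """p(b) = b * q(b) - integral_0^b q(u) du, computed with the midpoint rule.

    `allocation(u)` is the (expected) number of clicks won when bidding u,
    the other bids and the click realizations being held fixed.
    """
    h = bid / n_grid
    integral = sum(allocation((k + 0.5) * h) for k in range(n_grid)) * h
    return bid * allocation(bid) - integral

def wins(u, competitor_bid=0.5):
    """One round, one guaranteed click: the bidder wins iff he outbids the competitor."""
    return 1.0 if u > competitor_bid else 0.0

winning_payment = truthful_payment(0.8, wins)
losing_payment = truthful_payment(0.3, wins)
print(winning_payment, losing_payment)
```

The winning bidder pays (approximately) the competing bid and a losing bidder pays nothing, which is the behavior the integral term is designed to produce.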
DSIC ⇒ monotone. The proof is by contradiction. Assume there exist $t, c, b, b_i^+$ such
that $b_i < b_i^+$ and $q_{i,t}(b, c) > q_{i,t}(b_i^+, b_{-i}, c)$. W.l.o.g. we can assume there are no clicks at any
time $t' \ge t$ (as they do not affect the assignment at time t) and we denote $c' = c \oplus 1\{(i, t)\}$,
where ⊕ denotes the bit change, i.e., the addition modulo 2. Because buyer i does not win at
step t by bidding $b_i^+$, we should have $p_i(b_i^+, b_{-i}, c) = p_i(b_i^+, b_{-i}, c')$. The contradiction will
come from proving that they are not equal.
We can focus on the integral term of the payment, as the first one does not change
between c and c'.
$$\forall u \in [0, \alpha_{\max}], \quad \langle c_i, q_i(u, b_{-i}, c)\rangle \le \langle c_i, q_i(u, b_{-i}, c')\rangle \quad \text{(no clicks after time } t\text{)}$$
$$\Rightarrow \int_0^{b_i^+} \langle c_i, q_i(u, b_{-i}, c)\rangle\, du \le \int_0^{b_i^+} \langle c_i, q_i(u, b_{-i}, c')\rangle\, du$$
$$c_i^\top q_i(u, b_{-i}, c) = \langle c_i, q_i(b_i, b_{-i}, c)\rangle < \langle c_i, q_i(b_i, b_{-i}, c')\rangle = \langle c_i, q_i(u, b_{-i}, c')\rangle$$
Then, it means $\int_0^{b_i^+} \langle c_i, q_i(u, b_{-i}, c)\rangle\, du < \int_0^{b_i^+} \langle c_i, q_i(u, b_{-i}, c')\rangle\, du$, which is a contradiction
with the payments being equal.
DSIC ⇒ exploration-separated. The proof is again by contradiction. Assume there
exist $t < t'$, c, b such that
2. $q_{t'}(b, c) \ne q_{t'}(b, c')$ with $c' = c \oplus 1\{(2, t)\}$, i.e., time $t'$ is influenced by the output of time t,
5. there is no click after $t'$ (again, w.l.o.g.).
Since q is scale-free, for $b_1^+ = \frac{\tilde{b}_1}{\tilde{b}_2} b_2$, we have $q_{1,t}(b_1^+, b_2, c) = 1$, thus, as q is pointwise-
monotone, $b_1^+ > b_1$. Because the difference between c and c' is on buyer 2 at time t,
$p(b_1^+, b_2, c) = p(b_1^+, b_2, c')$. The contradiction will come from proving that they are not equal.
We focus on the payment of buyer 1 and again, on the integral terms
$\int_0^{b_1^+} \langle c_1, q_1(u, b_2, c)\rangle\, du$ vs $\int_0^{b_1^+} \langle c_1, q_1(u, b_2, c')\rangle\, du$. Assume w.l.o.g. that
$q_{1,t'}(b, c) < q_{1,t'}(b, c')$, then by pointwise monotonicity, we have
$\forall u < b_1^+$, $q_{1,t'}(u, b_2, c) \le q_{1,t'}(u, b_2, c')$. Then, since q is non-degenerate,
the strict inequality holds on a non-degenerate interval, hence
$$\int_0^{b_1^+} \langle c_1, q_1(u, b_2, c)\rangle\, du < \int_0^{b_1^+} \langle c_1, q_1(u, b_2, c')\rangle\, du,$$
As q is exploration-separated, $q_t(b, c)$ only depends on two elements. The first one
is obviously b, and the second one is a subset $E_t(c)$ of the set of histories $H_t(b, c)$ that is
independent of b. Consequently, we can write $q_t(b, H_t(b, c)) = q_t(b, E_t(c))$ and thus
$$p_i(b, c) = b_i \langle c_i, q_i(b, c)\rangle - \int_0^{b_i} \sum_{t=1}^{T} c_{i,t}\, q_{i,t}(u, b_{-i}, E_t(c))\, du.$$
Note that Theorem 4.10 characterizes the dynamic assignment, as the payment is
derived as in Corollary 2.18 (Archer and Tardos, 2001). This theorem confirms earlier
results (Devanur and Kakade, 2009), based on a dynamic selling mechanism with an
explore-then-commit structure (ETC, Perchet and Rigollet, 2013). We will describe this
specific algorithm for the welfare regret (Babaioff et al., 2014) and provide upper bounds for
it; guarantees can be derived both in terms of welfare and revenue for this algorithm, as the
seller's revenue regret can be handled quite similarly (Devanur and Kakade, 2009).
For this algorithm, it turns out the final payment is very naturally decomposed as the
sum of per-step payments, so we describe it this way. For the first τ steps, the assignment
is a round-robin over the bidders, leading each bidder to win $\lfloor \tau/n \rfloor$ times and paying 0 each
time. This exploration phase makes it possible to build an unbiased estimate $\hat{\rho}_i$ of $\rho_i$.
Further, it ensures that
$$\mathbb{P}\Bigg(\exists i \in N,\ |\hat{\rho}_i - \rho_i| \ge \underbrace{\sqrt{\frac{2n}{\tau \wedge T} \log \frac{n}{\delta}}}_{=:\, r}\Bigg) \le \delta. \qquad (4.2)$$
Theorem 4.11 (Devanur and Kakade, 2009; Babaioff et al., 2014). The algorithm described
above guarantees that $R^W_T = O(n^{1/3} T^{2/3} \sqrt{\log(nT)})$ and $R^p_T = O(n^{1/3} T^{2/3} \sqrt{\log(nT)})$.
Proof. Before beginning, as τ is a variable to optimize over, we need to handle the case τ > T
properly, in order to avoid vacuous upper-bounds. Indeed, the length of the exploration
stage is not τ, but rather τ ∧ T , while there are (T − τ)+ remaining steps. Further, we will
make extensive use of the concentration (4.2).
We denote $i^* = \arg\max_i \alpha_i \rho_i$ and $i^+ = \arg\max_i \alpha_i \hat{\rho}_i$. Then with probability at least 1 − δ,
we have,
$$\rho_{i^*} \alpha_{i^*} - \rho_{i^+} \alpha_{i^+} \le 2 \alpha_{\max} r \qquad (4.4)$$
Choosing $\delta = 1/T$ and $\tau = n^{1/3} T^{2/3} \sqrt{\log nT}$ finishes the proof for $R^W_T$.
Choosing again $\delta = 1/T$ and $\tau = n^{1/3} T^{2/3} \sqrt{\log nT}$ finishes the proof for $R^p_T$.
Lower-bounds. These rates of $T^{2/3}$ for both regrets are tight, as shown by the following
result.
Theorem 4.12 (Devanur and Kakade, 2009; Babaioff et al., 2014). For any deterministic,
scale-free sequential assignment q and payment p such that (q, p) is DSIC, there exists a set
of bids and distributions over c such that $R^W_T, R^p_T = \Omega(n^{1/3} T^{2/3})$.
Comparison to MAB. As the social welfare coincides with the reward from a MAB point
of view, we can compare this performance to the optimal performance on MAB problems. It
turns out the DSIC constraint is actually strong, as it implies a degradation of the regret
by a factor $T^{1/6}$ – "the cost of (ex-post) truthfulness" – from $O(T^{1/2})$ for optimal MAB
algorithms to $\Omega(T^{2/3})$ when ensuring incentive compatibility. To understand intuitively
where this degradation comes from, it is possible to focus on explore-then-commit (ETC)
types of algorithms for the case n = 2, as an exploration-separated assignment rule is a
special case of ETC algorithm. In a pure bandit setting, two types of ETC algorithms can
have a regret of order $O(T^{1/2})$: either an adaptive ETC that eliminates arms as soon as
they are detected to be sub-optimal, or a fixed-design ETC, provided the gap ∆ of
performance between both arms is known in advance. Unfortunately, neither of them is
exploration-separated. For adaptive ETC, this is because, during the exploration step, the
decision (taken at each time step) to eliminate an arm or to keep it depends on the estimated
reward of the arm and thus on the bids. For a fixed-design ETC, the problem comes from
the need to know the gap in advance: whatever the value of ∆, choosing an exploration
period of length $\Delta^{-2} \wedge T$ ensures a regret upper bounded by $O(T^{1/2})$. However, because the
length of the exploration period depends on ∆ (the gap), which in our case is a function
of the bids, such a choice of the length of the exploration period makes the assignment not
exploration-separated. Making an ETC algorithm exploration-separated requires it to
be fixed-design with the length of the exploration period set independently from the gap,
which is known to cause a degradation of the regret, the so-called "cost of truthfulness".
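The effect of fixing the exploration length independently of the gap can be illustrated numerically. The sketch below does not come from the text: it assumes the standard textbook upper bound m∆ + T∆ exp(−m∆²/4) on the regret of a two-armed explore-then-commit strategy that pulls each arm m times (1-sub-Gaussian rewards, gap ∆ ≤ 1), takes the worst case over ∆ on a grid, and estimates how that worst case grows with T for several gap-independent schedules m = T^a.

```python
import math

def etc_bound(T, m, gap):
    """Assumed textbook bound for two-armed ETC with m exploration pulls per arm."""
    return m * gap + T * gap * math.exp(-m * gap ** 2 / 4)

def worst_case(T, m, n_grid=2000):
    """Worst case of the bound over gaps in (0, 1], on a regular grid."""
    return max(etc_bound(T, m, (k + 1) / n_grid) for k in range(n_grid))

def growth_exponent(schedule, T_small=10**3, T_large=10**6):
    """Empirical exponent e such that the worst-case bound scales like T**e."""
    r_small = worst_case(T_small, schedule(T_small))
    r_large = worst_case(T_large, schedule(T_large))
    return math.log(r_large / r_small) / math.log(T_large / T_small)

exponents = {a: growth_exponent(lambda T, a=a: T ** a) for a in (0.5, 2 / 3, 0.75)}
print(exponents)
```

Among these gap-independent schedules, m = T^(2/3) gives the smallest growth exponent, close to 2/3, which matches the rate of the exploration-separated mechanisms discussed above.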
Randomized Assignment. From the previous result, it would seem that the "cost of
truthfulness" may come from the strong requirement of ex-post incentive compatibility.
However, this lower bound can be circumvented by considering non-deterministic sequential
assignments, which make it possible to ensure ex-post incentive compatibility without
restricting the algorithm to be exploration-separated (Babaioff et al., 2010). Consequently, a
randomized dynamic mechanism, ex-post DSIC (see footnote 5), with a $O(T^{1/2})$ regret
guarantee in welfare can be constructed. It relies on two ingredients:
2. a sampling procedure that modifies the bids that are input to the mechanism
Using these properties, it is possible to obtain a regret in terms of welfare that matches the
one of the underlying MAB algorithm and as we mentioned previously, an adaptive ETC
reaches the rate of $O(T^{1/2})$, which is optimal (Babaioff et al., 2010).
5 Here, ex-post is related to the realization of $c_{i,t}$, but in expectation over the randomness of the algorithm.
Extensions. Several other directions can be explored to better model the underlying
applications or to slightly relax the mechanism design constraints. A way to relax the
constraint of ex-post DSIC is to consider an asymptotic version, where the benefit of not
being truthful vanishes over time (Nazerzadeh et al., 2008; Kandasamy et al., 2020). It
turns out this allows neither avoiding the explore-then-commit structure of algorithms for
deterministic assignments nor avoiding the degradation of the regret to $T^{2/3}$, even for more
complicated mechanisms than auctions (Kandasamy et al., 2020).
This problem has been tackled under the assumption of perfect knowledge of the opti-
mization algorithm used by the seller (Kanoria and Nazerzadeh, 2014; Tang and Zeng,
2018; Nedelec et al., 2019a). It exploits a conceptual opening in most automatic mecha-
nism design works, i.e., the breakdown of incentive compatibility for the buyer when the
seller optimizes over incentive compatible auctions. In some sense, the computation of
bid distributions FBi instead of value distributions Fi can be seen as an “attack” of the
optimization algorithm of the seller (possibly based on deep learning for complex auction
systems (Dütting et al., 2019)). However, we point out now that those attacks differ from
the celebrated adversarial attacks in computer vision. Indeed, the latter generally rely
on the lack of local robustness of a classifier. Two other major differences are also quite
important: these “attacks” do not necessarily yield lower revenues for the seller (Nedelec
et al., 2019a); and they are also part of a dynamic game between buyers and seller and as
such have a dynamic component that is absent from classical and static machine learning
frameworks, such as image classification.
For concreteness, consider the case of second price auctions. In the classical setting of
auction theory, the buyer is asked to reveal their bid distribution first; facing “truthful
auctions”, they reveal their value distribution. The seller then optimizes their mechanism
based on this information, finding an optimal reserve price for this buyer. This is a
Stackelberg game, as the two players do not play at the same time. In this instance, the seller
is the leader and the buyer is the follower. Most of the literature on optimal auctions is
focused on this version of the Stackelberg game.
However, if the bidder knows that the seller is going to find an optimal mechanism,
and hence that she will optimize the auction based on the information given by his bid
distribution, he can anticipate this optimization to increase his utility. The order of the
Stackelberg game is then reversed. The bidder becomes the leader and the seller the
follower: he reveals his bid distribution knowing the optimization problem that she will
solve. In second price auctions with reserve prices, the bidder has an incentive to disclose
a bid distribution that may be different from his value distribution as he then might be
facing a more favorable reserve price.
More formally, the timing of the game we consider is the following:
1. the seller chooses a mapping M : F → A, from the set of bid distributions to the set
of auction mechanisms,
3. bidder i’s utility is computed in expectation when xi ∼ Fi , he bids βi (xi ) and the
outcome of the auction (allocation and payment) is defined by M(FB1 , . . . , FBn ). With
a slight abuse of notations, we will denote by Ui (βi ) his expected utility, assuming
the other bidders strategies are fixed.
Let us first consider the posted price setting where n = 1 bidder plays against one seller. We
assume, for simplicity of this introductory example, that the bidder's value distribution $F_i$ is
U[0, 1], i.e., uniform on the interval [0, 1]. Let us initially consider that the bidder is bidding
truthfully, i.e., $\beta_i = \mathrm{Id}$. In this case, $F_{B_i} = F_i$ and the seller will set as reserve price the
monopoly price by maximizing the monopoly revenue $r(1 - F_i(r))$. This monopoly price is
equal to 0.5 in the case of U[0, 1]. Note that this maximization problem is computationally
simple as the monopoly revenue is a concave function if the value distribution is regular.
The bidder can obviously do better. If he always bids zero (or ε arbitrarily close to
zero), $F_{B_i}$ will be equal to a point mass at zero. When computing the optimal reserve
price corresponding to $F_{B_i}$, the seller chooses zero, obviously maximizing the bidder's utility.
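This introductory example is easy to reproduce numerically. The sketch below makes two simplifying assumptions that are not in the text: U[0, 1] is replaced by a deterministic grid of values, and the seller picks the reserve price maximizing r times the fraction of observed bids at or above r; the function names are hypothetical.

```python
def monopoly_price(bids):
    """Reserve price maximizing r * P(bid >= r) on an empirical sample of bids."""
    bids = sorted(bids)
    n = len(bids)
    # Posting the k-th smallest bid sells to the n - k bids at or above it.
    best = max(range(n), key=lambda k: bids[k] * (n - k))
    return bids[best]

def buyer_utility(values, strategy):
    """Average utility of the buyer when the seller optimizes on his bids."""
    bids = [strategy(x) for x in values]
    price = monopoly_price(bids)
    return sum(x - price for x, b in zip(values, bids) if b >= price) / len(values)

n = 10_000
values = [(k + 0.5) / n for k in range(n)]   # deterministic grid mimicking U[0, 1]
truthful_price = monopoly_price(values)
truthful_utility = buyer_utility(values, lambda x: x)
zero_bid_utility = buyer_utility(values, lambda x: 0.0)
print(truthful_price, truthful_utility, zero_bid_utility)
```

Truthful bidding leads to a price close to 0.5 and a utility close to 1/8, whereas always bidding zero drives the price to zero and the utility to the mean value 1/2.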
Strategy | Optimal reserve price (K=1, K=2, K=3, K=4) | Utility K=1 | Utility K=2 | Utility K=3 | Utility K=4
---------|---------------------------------------------|-------------|-------------|-------------|------------
Truthful bidding | 0.5, 0.5, 0.5, 0.5 | 1/8 | 1/12 | 11/192 | 13/320
Zero bidding | 0.0, 0.0, 0.0, 0.0 | 1/2 (+300%) | 0.0 (-100%) | 0.0 (-100%) | 0.0 (-100%)
Divide values by 2 | 0.25, 0.25, 0.25, 0.25 | 1/4 (+100%) | ≈ 0.094 (+13%) | ≈ 0.036 (-37%) | ≈ 0.015 (-63%)
Thresholded at the monopoly price (Theorem 4.13) | 0.25, 0.25, 0.25, 0.25 | 1/4 (+100%) | ≈ 0.132 (+57%) | ≈ 0.076 (+33%) | ≈ 0.048 (+20%)
Optimal regularity-preserving strategies (Theorem 4.17) | 0.0, 0.162, 0.204, 0.22 | 1/2 (+300%) | ≈ 0.147 (+76%) | ≈ 0.079 (+38%) | ≈ 0.049 (+21%)
Table 4.1: Comparison of the utility of the strategic bidder between the truthful strategy, the strategy
corresponding to bidding zero for any values, the linear strategy dividing values by two, the strategy introduced
in Theorem 4.13 and the optimal regularity-preserving strategies for each number of competitors (derived
from Theorem 4.17). Percentages in parentheses give the change in utility relative to truthful bidding.
The first four strategies are fixed and do not require knowledge of the competition to
be computed. The last one is competition-specific and exact knowledge of the distribution followed by the
highest bid of the competition is needed to compute it. For this example, bidders' value distributions are
U[0, 1] and opponents are assumed to bid truthfully.
The problem we consider derives from a simple extension of this example to the case of n
bidders. In a lazy second price auction, the optimal reserve price for each bidder is still the
monopoly price. Yet, as soon as there is some competition, bidders cannot bid zero as they
get zero utility in this case. They have to trade off between beating the competition and
decreasing their reserve price.
4.3.3 Improving the truthful strategy for any distributions of the competition
Nedelec et al. (2019a) derive a simple strategy which guarantees the bidder an increase
in utility compared to the truthful strategy for any distribution of the competition. This
increase depends on the distribution of the competition. Yet, by playing this strategy, the
bidder is sure to do better than by bidding truthfully. This is an important practical result
as in many ad platforms, bidders have to bid without knowing the distribution of the
competition. This strategy, which they call thresholding at the monopoly price, also has the key
property of keeping the optimization problem of the seller simple, i.e., if $F_i$ is regular, the
bid distribution $F_{B_i}$ induced by this strategy on $F_i$ is also regular.
Definition 4.3.1. Consider a bidder with a regular value distribution Fi . A bidding strategy
βi is regularity-preserving if the bid distribution FBi induced by βi on Fi is a regular
distribution.
When the reserve price is computed from FBi - the bid distribution induced by using β
on Fi - a distinction between the reserve price rβ and the reserve value xβ must be made.
Definition 4.3.2. Given a non-decreasing strategy β, the reserve value xβ is the smallest
value above which the seller accepts bids. In particular, if the bidder bids truthfully,
his reserve value is equal to his reserve price; on the other hand, if β is continuous and
increasing, and rβ is the reserve price associated with the strategy β, then xβ = β −1 (rβ ).
Consider, for instance, F = U[0, 1] and the bidding strategy β(x) = x/2. Bids are then
uniform on [0, 0.5], so rβ = 0.25 and xβ = 0.5. By dividing bids by two, the strategic bidder
decreases their reserve price but does not change the reserve value: it is the same as if
they were bidding truthfully.
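The distinction between reserve price and reserve value in this example can be checked numerically. The following Python sketch assumes F = U[0, 1] and β(x) = x/2 as in the text; the helper names (`monopoly_price`, `bid_survival`) are illustrative and not taken from the source.

```python
# Sketch: reserve price vs. reserve value for F = U[0, 1] and beta(x) = x / 2.

def monopoly_price(survival, lo, hi, steps=100_000):
    """Grid search for the maximizer of r * P(B >= r) over [lo, hi]."""
    best_r, best_rev = lo, lo * survival(lo)
    for k in range(1, steps + 1):
        r = lo + (hi - lo) * k / steps
        rev = r * survival(r)
        if rev > best_rev:
            best_r, best_rev = r, rev
    return best_r

def beta_inverse(b):
    """Inverse of the shading strategy beta(x) = x / 2."""
    return 2.0 * b

def bid_survival(b):
    """P(B >= b) when bids are uniform on [0, 0.5]."""
    return max(0.0, min(1.0, 1.0 - 2.0 * b))

reserve_price = monopoly_price(bid_survival, 0.0, 0.5)  # seller's reserve on bids
reserve_value = beta_inverse(reserve_price)             # smallest value that clears it
print(reserve_price, reserve_value)
```

The reserve price is halved by the shading, while the set of values that clear the reserve is unchanged.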
Theorem 4.13. Suppose the value distribution F has a density f, with f > 0 on the support
of F, that the left end point of its support is 0, and that the other bidders’ strategies are
fixed. Let βr be an increasing strategy with associated reserve value r > 0 in a lazy second
price auction such that the bid distribution associated with βr has a virtual value. Then
there exists another bidding strategy β̃r such that:
1. A reserve value associated with β̃r is 0 and β̃r is increasing.
2. Ui (β̃r ) ≥ Ui (βr ), i.e., the utility of bidder i is higher.
3. Pi (β̃r ) ≥ Pi (βr ), i.e., the payment of bidder i to the seller is also higher.
The following continuous function fulfills these conditions:
\[
\tilde\beta_r(x) = \frac{\beta_r(r)\,(1 - F_i(r))}{1 - F_i(x)}\,\mathbf{1}\{x < r\} + \beta_r(x)\,\mathbf{1}\{x \ge r\}.
\]
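As an illustration, the construction can be instantiated on the truthful strategy with values drawn from U[0, 1] and threshold r = 0.5. This is a minimal sketch under those assumptions; the function name `thresholded` is hypothetical.

```python
# Sketch: the strategy of Theorem 4.13 built from a base strategy beta_r.

def thresholded(beta_r, cdf, r):
    """Return x -> beta_r(r)(1 - F(r)) / (1 - F(x)) below r, beta_r(x) above."""
    anchor = beta_r(r) * (1.0 - cdf(r))

    def beta_tilde(x):
        if x < r:
            return anchor / (1.0 - cdf(x))
        return beta_r(x)

    return beta_tilde

def uniform_cdf(x):
    """cdf of U[0, 1] (assumed value distribution)."""
    return min(1.0, max(0.0, x))

def truthful(x):
    return x

beta_tilde = thresholded(truthful, uniform_cdf, 0.5)

for x in (0.0, 0.25, 0.5, 0.75):
    print(x, beta_tilde(x))
```

In this instance the lowest bid is 0.25, the strategy is continuous at the threshold, and it overbids for every value below 0.5.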
A reserve value equal to zero means that the seller accepts all bids of the strategic
bidder. It also means that the reserve price is equal to the minimum bid of the strategic
bidder. This result can be applied to improve any preexisting shading strategy. A very
important case is to apply this theorem to the truthful strategy, showing that there exists a
strategy improving the truthful strategy regardless of the competition distribution. We now
explain why we can improve any strategy in this setting without knowing the distribution
of the competition. Myerson’s Lemma is a key element in this understanding.
In this setting, it is optimal for the seller to choose as reserve price for bidder i the
monopoly price corresponding to the bidder's bid distribution, and Myerson's lemma implies
that the expected payment of bidder i in the optimized lazy second price auction is equal to
\[
P_i(\beta_i) = \mathbb{E}_{b \sim F_{B_i}}\Big( \psi_{F_{B_i}}(b)\, G_i(b)\, \mathbf{1}\{b \ge \psi_{B_i}^{-1}(0)\} \Big).
\]
In order to simplify the computation of the expectation and remove the dependence on Bi ,
we rewrite this expected payment in the space of values, using the fact that the strategic
bidder is using an increasing strategy βi . We will only consider increasing strategies in the
remainder of the survey, and so we define:
\[
h_{\beta_i}(x_i) = \psi_{F_{B_i}}(\beta_i(x_i)).
\]
With this new notation, the expected payment of the strategic bidder i rewrites as
\[
P_i(\beta_i) = \mathbb{E}_{x_i \sim F_i}\Big( h_{\beta_i}(x_i)\, G_i(\beta_i(x_i))\, \mathbf{1}\{x_i \ge x_\beta\} \Big),
\]
where xβ is the reserve value. If hβi crosses 0 exactly once and is positive beyond that crossing
point, then $x_\beta = h_{\beta_i}^{-1}(0)$. If we call $r_i = \psi_{F_{B_i}}^{-1}(0)$ the reserve price of bidder i and βi is
increasing, the reserve value is equal to $\beta_i^{-1}(r_i)$.
If we consider only increasing differentiable strategies, and we denote by I the class of
such functions, the problem of the strategic bidder is therefore to solve supβ∈I U (β) with
U defined in Equation (4.7). This equation is crucial, as it indicates that optimizing over
bidding strategies can be reduced to finding a distribution with a well-specified hβ (·). Our
results extend to the case where the strategies are increasing and differentiable except at
finitely many points, as we only need bFB (b) to be absolutely continuous for the previous
result to go through.
A crucial difference between the long-term vision and the classical, myopic (or one-shot)
auction theory is that in this setup bidders maximize expected utility globally over the
full support of the value distribution. In the classical myopic setting, bidders determine
their bids to maximize their expected utility at each value. In our setup, the strategic
bidder also accounts for the computation of the reserve price, a function of her global bid
distribution. She might therefore be willing to sometimes overbid (incurring a negative
utility at some specific auctions/values) or underbid (losing some auctions that she would
have won otherwise) if this reduces her reserve price. Indeed, a lower reserve price
increases the utility obtained in other auctions: lose small to win big. In other words, the
strategy trades off ex-post individual rationality (IR) for higher utility (of course, ex-ante IR
still holds). This reasoning makes sense only with multiple interactions between bidders and
seller.
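The "lose small to win big" trade-off can be made concrete with a small computation. The Python sketch below assumes the setting used in the examples of this section: values drawn from U[0, 1], one truthful opponent with U[0, 1] bids, and the strategy of Theorem 4.13 applied to truthful bidding with threshold 0.5, so that the reserve price equals the minimum bid 0.25. The function names are illustrative.

```python
# Sketch: ex-post losses vs. ex-ante gain in a lazy second price auction.

def thresholded_bid(x, r=0.5):
    """Theorem 4.13 applied to truthful bidding on U[0, 1] with threshold r."""
    return r * (1.0 - r) / (1.0 - x) if x < r else x

def utility_given_value(x, reserve_price=0.25):
    """Expected utility at value x against one truthful U[0, 1] opponent.

    The bidder wins when the opponent bid y is below b = thresholded_bid(x)
    and pays max(y, p) with p the reserve price, hence (for b >= p)
    integral_0^b (x - max(y, p)) dy = x*b - p**2 - (b**2 - p**2) / 2.
    """
    b, p = thresholded_bid(x), reserve_price
    return x * b - p * p - (b * b - p * p) / 2.0

def ex_ante_utility(n=20_000):
    """Midpoint-rule average of utility_given_value over x ~ U[0, 1]."""
    return sum(utility_given_value((k + 0.5) / n) for k in range(n)) / n

print("expected utility at value 0.1:", utility_given_value(0.1))
print("ex-ante expected utility:", ex_ante_utility())
```

Under these assumptions the expected utility is negative at low values, while the average over all values stays positive and exceeds the truthful benchmark of 1/12.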
[Figure 4.1 plots. Left panel: "Virtual value of the truthful bidder (r=0.5)". Right panel: "Thresholding the virtual value". Horizontal axis: bid b. Annotations: "no bid", "bid distribution support".]
Figure 4.1: Virtual value of truthful bidder vs. strategic bidder. The value distribution of the bidder is
U[0, 1], the standard textbook example used for the sake of illustration. Her virtual value is therefore equal
to ψ(x) = 2x − 1, and is represented by the blue line. The dashed red vertical line corresponds to the current
reserve price. The green area corresponds to the bidder’s payment if we picked G = 1, i.e., no competition,
for the sake of clarity of the plot. The left-hand side corresponds to truthful bidding, the right-hand side to
strategic behavior. In both cases, the blue line corresponds to ψB .
U([0, 1]), uniformly between 0 and 1. With truthful bidding, the associated virtual value
is negative below 1/2 and positive above, so the optimal reserve price is 1/2 and no
auction is won if the value is smaller than 1/2. On the other hand, if the strategic bidder
were able to send bids such that the virtual value (of bids) below 1/2 is exactly 0, then,
by Myerson's lemma, the seller would have no incentive to choose a reserve price.
Myerson's lemma also implies that, since the virtual value is zero below 1/2, the seller
receives exactly the same expected payment as with a truthful bidder.
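The seller's indifference can be seen on the monopoly revenue curve p · P(B ≥ p). The sketch below assumes no competition (G = 1), values drawn from U[0, 1], and a bidder who bids 0.25/(1 − x) below 1/2 and truthfully above, which is the construction of Theorem 4.13 applied to truthful bidding; the function names are illustrative.

```python
# Sketch: monopoly revenue curves for truthful and thresholded bids.

def survival_truthful(p):
    """P(B >= p) for truthful bids, B ~ U[0, 1]."""
    return max(0.0, min(1.0, 1.0 - p))

def survival_thresholded(p):
    """P(B >= p) when B = 0.25 / (1 - X) for X < 0.5 and B = X otherwise."""
    if p < 0.25:
        return 1.0        # every bid is at least 0.25
    if p < 0.5:
        return 0.25 / p   # B >= p  <=>  X >= 1 - 0.25 / p
    return max(0.0, 1.0 - p)

def revenue(p, survival):
    """Expected revenue of posting reserve price p against one bidder."""
    return p * survival(p)

for p in (0.1, 0.25, 0.4, 0.5, 0.6):
    print(p, revenue(p, survival_truthful), revenue(p, survival_thresholded))
```

With thresholded bids the revenue curve is flat between 0.25 and 0.5, at the level 0.25 that the seller obtains from a truthful bidder with reserve price 0.5.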
This technique is called thresholding the virtual value. We now show formally how
to find a bidding strategy such that the virtual value of the induced bid distribution is
equal to zero below a certain threshold. Before carrying on with reasoning on the virtual
value, such as in our motivating example, we need to ensure we can find the corresponding
strategy βi that will expose a bid distribution FBi with the corresponding virtual value to
the seller. The two following technical lemmas show how to deduce βi from a given hβi .
Lemma 4.14. Suppose bi = βi (xi ), where βi is increasing and differentiable and xi is a
random variable with cdf Fi and pdf fi , with fi > 0 on the support of Fi . Then
\[
h_{\beta_i}(x) = \beta_i(x) - \beta_i'(x)\,\frac{1 - F_i(x)}{f_i(x)} = \psi_{F_{B_i}}(\beta_i(x)). \tag{4.8}
\]
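Identity (4.8) can be checked on an example where the bid distribution is known in closed form. The sketch below assumes values drawn from U[0, 1] and the illustrative strategy β(x) = x², which is not taken from the source; bids then have cdf √b and pdf 1/(2√b).

```python
import math

# Sketch: numerical check of h_beta(x) = psi_B(beta(x)) for beta(x) = x**2.

def h(beta, cdf, pdf, x, eps=1e-6):
    """beta(x) - beta'(x) * (1 - F(x)) / f(x), with a central finite difference."""
    slope = (beta(x + eps) - beta(x - eps)) / (2.0 * eps)
    return beta(x) - slope * (1.0 - cdf(x)) / pdf(x)

def beta(x):
    return x * x

def value_cdf(x):
    return x

def value_pdf(x):
    return 1.0

def psi_bid(b):
    """Virtual value b - (1 - F_B(b)) / f_B(b) of the bid B = X**2."""
    root = math.sqrt(b)
    return b - (1.0 - root) * 2.0 * root

for x in (0.2, 0.5, 0.9):
    print(x, h(beta, value_cdf, value_pdf, x), psi_bid(beta(x)))
```

Both columns agree with 3x² − 2x, the common closed form in this example.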
The above results hold when β is increasing, continuous, and differentiable except
at finitely many points. The second lemma shows that for any function g, there exists a
function β such that hβ = g.
Lemma 4.15. Let X be a random variable with cdf F and pdf f , with f > 0 on the support
of F. Let x0 be in the support of X, C ∈ R and g : R → R. Define the function βg by
\[
\beta_g(x) = \frac{C\,(1 - F(x_0)) - \int_{x_0}^{x} g(u) f(u)\,du}{1 - F(x)}, \tag{4.9}
\]
then,
\[
h_{\beta_g}(x) = g(x) \quad \text{and} \quad \beta_g(x_0) = C .
\]
Moreover, if for some t ∈ R such that x0 ≤ t, g is non-decreasing on [x0 , t], then
$\beta_g'(x) \ge (C - g(x))(1 - F(x_0)) f(x)/(1 - F(x))$ for x ∈ [x0 , t]. Hence βg is increasing on [x0 , t] if g is
non-decreasing and g < C.
Proof. The result follows by simply differentiating the expression for βg , and plugging-in
the expression for hβg obtained in Lemma 4.14. The result on the derivative is simple
algebra.
The two technical lemmas 4.14 and 4.15 show that for any non-decreasing function g,
we can find a strategy βi such that the bid distribution induced by using βi on FXi verifies
ψBi (βi (x)) = g(x) for all x in the support of FXi .
We explained why sending to the seller a virtual value equal to zero when the initial one
was negative increases the bidder’s expected utility. To derive the corresponding bidding
strategy β from the virtual value, the strategic bidder only needs to solve the simple ODE
defined in Lemma 4.14.
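For a given target virtual value g, the solution of this ODE is the explicit formula (4.9), which can be evaluated by numerical integration. The sketch below assumes values drawn from U[0, 1], the target g(x) = max(0, 2x − 1) (the truthful virtual value thresholded at zero), x0 = 0 and C = 0.25; the function name `beta_from_h` is hypothetical.

```python
# Sketch: recover a bidding strategy from a target virtual value via (4.9).

def beta_from_h(g, cdf, pdf, x0, C, x, n=2000):
    """Evaluate (C (1 - F(x0)) - integral_{x0}^{x} g f) / (1 - F(x))."""
    width = (x - x0) / n
    integral = 0.0
    for k in range(n):
        u = x0 + (k + 0.5) * width          # midpoint rule
        integral += g(u) * pdf(u) * width
    return (C * (1.0 - cdf(x0)) - integral) / (1.0 - cdf(x))

def g(x):
    """Target virtual value: zero below 1/2, truthful virtual value above."""
    return max(0.0, 2.0 * x - 1.0)

def cdf(x):
    return x

def pdf(x):
    return 1.0

for x in (0.0, 0.3, 0.5, 0.8):
    print(x, beta_from_h(g, cdf, pdf, 0.0, 0.25, x))
```

Under these assumptions the formula returns 0.25/(1 − x) below 1/2 and the truthful bid x above, i.e., the thresholded version of truthful bidding.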
This improvement of the bidder’s utility does not depend on the estimation of the
competition and thus can easily be implemented in practice. We plot in Figure 4.3 the bidding
strategy $\tilde\beta^{(0)}_{0.5}$ when the initial value distribution is U[0, 1], together with the virtual value
of the bid distribution induced by $\tilde\beta^{(0)}_{0.5}$ on U[0, 1]. We recall that the monopoly price
corresponding to U[0, 1] is equal to 0.5. The strategy consists in overbidding below the
monopoly price of the initial value distribution. The strategic bidder is ready to increase
her payment pointwise when she wins auctions with low values in order to get a large
decrease of the reserve price (going from 0.5 to 0.25). Globally, the payment of the bidder
remains unchanged compared to when the bidder was bidding truthfully with a reserve
price equal to 0.5. Thresholding the virtual value at the monopoly price amounts to
overbidding below the monopoly price, effectively providing over the course of the auctions an
extra payment to the seller in exchange for lowering the reserve price/value faced by the
strategic bidder. This strategy unlocks a very substantial utility gain for the bidder.
[Figure 4.2 plots. Left panel: "Thresholded strategy", horizontal axis: value x. Right panel: "Virtual value of the bid distribution", horizontal axis: bid b. Annotation: "bid distribution support".]
Figure 4.2: The value distribution is U[0, 1]. Left: Thresholded strategy $\tilde\beta^{(0)}_{0.5}$ compared to the traditional
truthful strategy. Right: virtual value of the bid distribution induced by the thresholded strategy. The
optimal reserve price of the thresholded strategy is equal to 0.25 (corresponding to a reserve value of 0)
whereas the reserve price of the truthful strategy is equal to 0.5 (corresponding to a reserve value of 0.5). The
green area represents the expected payment corresponding to the thresholded strategy (we assumed G = 1 for
the sake of clarity).
Naturally, a key question is to understand the impact of this new strategy on the utility
of the strategic bidder. We compare the situation with two bidders bidding truthfully
against an optimal reserve price and the new situation with one bidder using the
thresholded strategy and the second one bidding truthfully. We assume, as is standard in
numerical examples from textbooks and research papers, that their value distribution is U[0, 1].
Then, elementary computations show that in this specific illustrative example, the
strategic bidder's utility increases by 57%, from 1/12 to 1/12 + (log(2) − 1/2)/4 ≈ 0.132, and
the welfare increases by 8%, from 7/12 to 7/12 + (log(2) − 1/2)/4 ≈ 0.632.
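These closed forms can be reproduced by integrating (x − h(x)) · G(β(x)) over the values that clear the reserve, i.e., value times winning probability minus the expected payment written in value space. The Python sketch below assumes values drawn from U[0, 1] and a truthful U[0, 1] opponent, so that G(b) = b on [0, 1]; the helper names are illustrative.

```python
import math

# Sketch: utility of the truthful and thresholded strategies, two bidders, U[0, 1].

def midpoint(fn, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of fn over [a, b]."""
    width = (b - a) / n
    return sum(fn(a + (k + 0.5) * width) for k in range(n)) * width

def psi(x):
    """Virtual value of U[0, 1]."""
    return 2.0 * x - 1.0

def utility(bid, h, reserve_value):
    """E[(x - h(x)) * G(bid(x)) * 1{x >= reserve_value}] with G(b) = min(1, b)."""
    return midpoint(lambda x: (x - h(x)) * min(1.0, bid(x)), reserve_value, 1.0)

# Truthful bidding: h = psi, reserve value 0.5.
u_truthful = utility(lambda x: x, psi, 0.5)

# Thresholded bidding: h = 0 below 0.5, psi above, reserve value 0.
u_thresholded = utility(
    lambda x: 0.25 / (1.0 - x) if x < 0.5 else x,
    lambda x: max(0.0, psi(x)),
    0.0,
)

closed_form = 1.0 / 12.0 + (math.log(2.0) - 0.5) / 4.0
print(u_truthful, u_thresholded, closed_form)
```

The gain of the strategic bidder, (log(2) − 1/2)/4, comes entirely from the auctions won with values below 1/2, and it is also the welfare gain reported above.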
with $h_{\beta_i}(x) = \beta_i(x) - \beta_i'(x)\,\frac{1 - F_i(x)}{f_i(x)}$ and $x_{\beta_i} = h_{\beta_i}^{-1}(0)$. In this section, we assume that the bidder
now has access to the distribution of the highest bid of the competition, denoted by
Gi , with associated pdf gi , and that he optimizes his utility among the strategies with
thresholded virtual values introduced in Subsection 4.3.3.
[Figure 4.3 plots. Left panel: "Thresholded strategies". Middle panel: "Virtual value of the bid distribution". Right panel: "Monopoly revenue function of the reserve price". Legend: "Truthful strategy".]
Figure 4.3: The value distribution is U[0, 1]. Left: Thresholded strategy compared to the traditional truth-
ful strategy. Middle: virtual value of the bid distribution induced by the thresholded strategy. Right:
monopoly revenue of the induced bid distribution as function of the reserve price.
Definition 4.16. A bidding strategy β is thresholded if there exists r > 0 such that for all
x < r, hβ (x) = ψB (β(x)) = 0. This family of functions can be parametrized as
\[
\beta_r^{\gamma}(x) = \frac{\gamma(r)\,(1 - F(r))}{1 - F(x)}\,\mathbf{1}\{x < r\} + \gamma(x)\,\mathbf{1}\{x \ge r\},
\]
with r ∈ R and γ : R → R some continuous and increasing mapping.
This class of continuous bidding strategies has two degrees of freedom: the threshold r
such that for all x < r, hβ (x) = 0 and the strategy γ used beyond the threshold. We do not
restrict the functions γ that can be used beyond the threshold (besides being continuous
and increasing). All the strategies defined in this class have the property that their reserve
value is equal to zero, i.e., their reserve price is equal to their minimum bid, when the seller
is welfare benevolent and the virtual value of γ is positive beyond r. We can prove that the
optimal regularity-preserving strategy belongs to the class of thresholded strategies.
The following result states that there exists an optimal threshold r for the strategic
bidder that depends on the competition and that the optimal strategy to use for x > r is to
be truthful. It is derived by computing the directional derivatives of the utility function
defined in Equation (4.10).
In Theorem 4.13, we proved that when the strategic bidder does not know the distri-
bution of the highest bid of the competition, he can use the thresholded strategy at his
monopoly price and increase his utility compared to truthful bidding. Theorem 4.17 gives
the optimal threshold when the strategic bidder knows G.
Some numerical results We consider the situation where we have 1 strategic bidder and
1 non-strategic one, both with value distribution U[0, 1]. The strategy introduced in
Subsection 4.3.3 was to bid truthfully beyond the monopoly price (r = 0.5 here) and to use the
thresholding of Theorem 4.13 below it. This strategy yields a utility of 0.1316, a
57% increase over the utility of standard truthful bidding. The optimal strategy coming out
of Theorem 4.17 consists in bidding truthfully beyond r ≃ 0.8 and using the thresholding
completion below it. The utility is then around 0.1468, a 76% increase in bidder
utility compared to bidding truthfully (truthful bidding yields a utility of 1/12 ≃ 0.083).
This second strategy yields a higher utility for the strategic bidder but requires some
knowledge of the competition. The optimal strategy in Theorem 4.17 overbids on small
values, underbids on intermediate values and is truthful on high values. We also recover
that with no competition, the optimal strategy is to bid zero for any possible valuations.
In Table 4.1, we also notice that the difference in utility is decreasing with the number of
players since, with increasing competition, the strategic bidder cannot lower his bid for
values above his monopoly price.
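The two utility figures above can be recovered with a one-dimensional search over the threshold r. The sketch below is a consistency check rather than an implementation of Theorem 4.17. It assumes values drawn from U[0, 1], one truthful U[0, 1] opponent (so G(b) = b), a strategy that is thresholded below r and truthful above, r ≥ 0.5, and a seller who sets the reserve price at the minimum bid. The closed form follows from integrating (x − h(x)) · G(β(x)) with h = 0 below r and h(x) = 2x − 1 above.

```python
import math

# Sketch: utility of "threshold below r, truthful above r" as a function of r.

def utility_for_threshold(r):
    """Expected utility for r in [0.5, 1), two bidders, U[0, 1] values."""
    # integral_0^r x * r (1 - r) / (1 - x) dx
    below = r * (1.0 - r) * (-r - math.log(1.0 - r))
    # integral_r^1 (x - (2x - 1)) * x dx
    above = 1.0 / 6.0 - r ** 2 / 2.0 + r ** 3 / 3.0
    return below + above

grid = [0.5 + k / 1000.0 for k in range(500)]   # r in [0.5, 0.999]
best_r = max(grid, key=utility_for_threshold)

print("utility at r = 0.5:", utility_for_threshold(0.5))
print("best threshold on the grid:", best_r)
print("utility at the best threshold:", utility_for_threshold(best_r))
```

Under these assumptions the search returns a threshold close to 0.8, in line with the values quoted in the text.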
at symmetric equilibrium bidders recover the same utility as in a second price auction with
no reserves arguably makes it an even more natural class of bidding strategies to consider
from the bidder standpoint.
Theorem 4.19 (Tang and Zeng, 2018; Abeille et al., 2018). In the Myerson auction, the
symmetric equilibrium strategy βeq satisfies
\[
\beta_{eq}(x) + \beta_{eq}'(x)\,(\psi(x) - x) = \beta^{I}(x) ,
\]
where $\beta^{I}(x)$ is the symmetric equilibrium strategy in a first price auction with no reserve
price. A solution of this equation is
At the equilibrium, the bidders’ expected utilities are the same as in a first price auction
without reserve price; in particular, they are strictly greater than the expected payoffs the
bidders would have obtained had they bid truthfully.
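The equilibrium condition can be illustrated on the two-bidder uniform case. The sketch below assumes values drawn from U[0, 1], for which ψ(x) = 2x − 1 and the first price equilibrium bid is x/2. The candidate (1 + x)/4 is obtained here from formula (4.9) with g = β^I and the constant chosen so that the bid stays finite at x = 1; it is a worked example under these assumptions and is not presented as the closed form given in the cited papers.

```python
# Sketch: check beta_eq(x) + beta_eq'(x) (psi(x) - x) = beta_I(x) on U[0, 1].

def beta_first_price(x):
    """Symmetric first price equilibrium bid, two bidders, U[0, 1] values."""
    return x / 2.0

def beta_eq(x):
    """Candidate equilibrium strategy for the uniform two-bidder case."""
    return (1.0 + x) / 4.0

def beta_eq_prime(x):
    """Derivative of the candidate strategy."""
    return 0.25

def psi(x):
    """Virtual value of U[0, 1]."""
    return 2.0 * x - 1.0

def ode_residual(x):
    """Left-hand side minus right-hand side of the equilibrium condition."""
    return beta_eq(x) + beta_eq_prime(x) * (psi(x) - x) - beta_first_price(x)

n = 10_000
# Expected utility when both bidders use beta_eq: the bidder with value x wins
# with probability x and pays beta_first_price(x) in expectation when winning.
equilibrium_utility = sum(
    ((k + 0.5) / n - beta_first_price((k + 0.5) / n)) * ((k + 0.5) / n)
    for k in range(n)
) / n

print(max(abs(ode_residual(k / 100.0)) for k in range(101)))
print(equilibrium_utility)
```

In this example the equilibrium utility is 1/6, the first price value, compared with 1/12 under truthful bidding against the monopoly reserve.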
Discussion The intuition behind this result is quite clear. In the Myerson auction, the
expected utility of a bidder is the same as in a first price auction where her bids have been
transformed through her virtual value function. We call the corresponding pseudo-bids
“virtualized” bids. Hence, if the bidders can bid in such a way that their virtualized bids are
equal to their symmetric equilibrium first price bids, the situation is completely equivalent
to a first price auction. Their equilibrium strategy in virtualized bid space should therefore
be the strategy they use in a standard first price auction with no reserve price.
be plugged in any modern bidding algorithms learning distribution of the highest bid of
the competition and we test it on other classes of mechanism without any known closed
form optimal bidding strategies.
The major and prohibitive drawback of these approaches is that they require that strate-
gic bidders perfectly know the underlying mechanism design problem (i.e., the revenue
maximization problem) solved by the seller, leading to a strong asymmetry between the
bidders and the seller, this time in favor of the former.
It is nonetheless possible to remove the prior knowledge on the exact algorithmic proce-
dure used by the seller to optimize her mechanism by a classical exploration/exploitation
trade-off, inspired by reinforcement learning techniques, thus reducing this asymmetry
(Nedelec et al., 2021).
References
Abeille, M., C. Calauzènes, N. E. Karoui, T. Nedelec, and V. Perchet. 2018. “Explicit shading
strategies for repeated truthful auctions”. In: arXiv preprint arXiv:1805.00256.
Agrawal, S., C. Daskalakis, V. S. Mirrokni, and B. Sivan. 2018. “Robust Repeated Auctions
under Heterogeneous Buyer Behavior”. In: Proceedings of the 2018 ACM Conference on
Economics and Computation. 171–171.
Amin, K., A. Rostamizadeh, and U. Syed. 2013. “Learning prices for repeated auctions
with strategic buyers”. In: Proceedings of the 26th International Conference on Neural
Information Processing Systems-Volume 1. 1169–1177.
Amin, K., A. Rostamizadeh, and U. Syed. 2014. “Repeated contextual auctions with strate-
gic buyers”. In: Proceedings of the 27th International Conference on Neural Information
Processing Systems-Volume 1. 622–630.
Archer, A. and É. Tardos. 2001. “Truthful mechanisms for one-parameter agents”. In:
Proceedings 2001 IEEE International Conference on Cluster Computing. IEEE.
Arlotto, A. and I. Gurvich. 2019. “Uniformly Bounded Regret in the Multisecretary Prob-
lem”. Stochastic Systems. 9(3): 231–260.
Ashlagi, I., C. Daskalakis, and N. Haghpanah. 2016. “Sequential mechanisms with ex-post
participation guarantees”. In: Proceedings of the 2016 ACM Conference on Economics and
Computation. 213–214.
Aström, K. J. and R. M. Murray. 2008. Feedback Systems: An Introduction for Scientists and
Engineers. Princeton University Press.
Babaioff, M., R. D. Kleinberg, and A. Slivkins. 2010. “Truthful Mechanisms with Implicit
Payment Computation”. In: Proceedings of the 11th ACM Conference on Electronic
Commerce. EC ’10. Cambridge, Massachusetts, USA: Association for Computing
Machinery. 43–52. isbn: 9781605588223. doi: 10.1145/1807342.1807349. url:
https://doi.org/10.1145/1807342.1807349.
Babaioff, M., Y. Sharma, and A. Slivkins. 2014. “Characterizing Truthful Multi-armed
Bandit Mechanisms”. SIAM Journal on Computing. 43(1): 194–230. doi:
10.1137/120878768.
Balseiro, S. R., O. Besbes, and G. Y. Weintraub. 2015. “Repeated auctions with budgets in
ad exchanges: Approximations and design”. Management Science. 61(4): 864–884.
Balseiro, S. R. and Y. Gur. 2019. “Learning in repeated auctions with budgets: Regret
minimization and equilibrium”. Management Science. 65(9): 3952–3968.
Balseiro, S. R., V. S. Mirrokni, and R. P. Leme. 2018. “Dynamic mechanisms with martingale
utilities”. Management Science. 64(11): 5062–5082.
Braverman, M., J. Mao, J. Schneider, and M. Weinberg. 2018. “Selling to a no-regret buyer”.
In: Proceedings of the 2018 ACM Conference on Economics and Computation. 523–538.
Celis, L. E., G. Lewis, M. Mobius, and H. Nazerzadeh. 2014. “Buy-it-now or take-a-chance:
Price discrimination through randomized auctions”. Management Science. 60(12): 2927–
2948.
Choi, H., C. F. Mela, S. R. Balseiro, and A. Leary. 2020. “Online display advertising markets:
A literature review and future directions”. Information Systems Research. 31(2): 556–
575.
Choi, H. and C. F. Mela. 2018. “Display advertising pricing in exchange markets”. Working
paper.
Ciocan, D. F. and V. Farias. 2012. “Model Predictive Control for Dynamic Resource Alloca-
tion”. Mathematics of Operations Research.
Cremer, J. and R. P. McLean. 1988. “Full extraction of the surplus in Bayesian and dominant
strategy auctions”. In: Econometrica: Journal of the Econometric Society. JSTOR. 1247–
1257.
Deng, Y., J. Schneider, and B. Sivan. 2019. “Prior-Free Dynamic Auctions with Low Regret
Buyers”. In: Advances in Neural Information Processing Systems. 4804–4814.
Devanur, N. R. and S. M. Kakade. 2009. “The Price of Truthfulness for Pay-per-Click
Auctions”. In: Proceedings of the 10th ACM Conference on Electronic Commerce. EC
’09. Stanford, California, USA: Association for Computing Machinery. 99–106. isbn:
9781605584584. doi: 10.1145/1566374.1566388. url:
https://doi.org/10.1145/1566374.1566388.
Dudley, R. M. 2014. Uniform Central Limit Theorems. Cambridge Studies in Advanced Mathe-
matics. Cambridge University Press.
Dütting, P., Z. Feng, H. Narasimhan, D. Parkes, and S. S. Ravindranath. 2019. “Optimal
auctions through deep learning”. In: International Conference on Machine Learning.
PMLR. 1706–1715.
Epasto, A., M. Mahdian, V. Mirrokni, and S. Zuo. 2018. “Incentive-aware learning for large
markets”. In: Proceedings of the 2018 World Wide Web Conference. 1369–1378.
Feng, Z., C. Podimata, and V. Syrgkanis. 2018b. “Learning to bid without knowing your
value”. In: Proceedings of the 2018 ACM Conference on Economics and Computation. 505–
522.
Fernandez-Tapia, J. 2015. “An analytical solution to the budget-pacing problem in pro-
grammatic advertising”. Journal of Information and Optimization Sciences. 40.
Fernandez-Tapia, J., O. Guéant, and J.-M. Lasry. 2016. “Optimal Real-Time Bidding Strate-
gies”. Applied Mathematics Research eXpress.
Ghosh, A., B. I. Rubinstein, S. Vassilvitskii, and M. Zinkevich. 2009. “Adaptive Bidding for
Display Advertising”. In: Proceedings of the 18th International Conference on World Wide
Web. WWW ’09. 251–260.
Golrezaei, N., A. Javanmard, and V. Mirrokni. 2021. “Dynamic incentive-aware learning:
Robust pricing in contextual auctions”. Operations Research. 69(1): 297–314.
Gummadi, R., P. Key, and A. Proutiere. 2012. Optimal Bidding Strategies and Equilibria in
Dynamic Auctions with Budget Constraints.
Kandasamy, K., J. E. Gonzalez, M. I. Jordan, and I. Stoica. 2020. “Mechanism Design with
Bandit Feedback”. arXiv: 2004.08924 [stat.ML].
Kanoria, Y. and H. Nazerzadeh. 2014. “Dynamic Reserve Prices for Repeated Auctions:
Learning from Bids”. In: Web and Internet Economics: 10th International Conference.
Vol. 8877. Springer. 232.
Lecué, G. and M. Lerasle. 2020. “Robust machine learning by median-of-means: theory
and practice”. The Annals of Statistics. 48(2): 906–931.
Lee, K.-C., A. Jalali, and A. Dasdan. 2013. “Real time bid optimization with smooth budget
delivery in online advertising.” Proceedings of the Seventh International Workshop on
Data Mining for Online Advertising.
Mirrokni, V. S., R. P. Leme, P. Tang, and S. Zuo. 2016. “Dynamic Auctions with Bank
Accounts.” In: Proceedings of IJCAI. 387–393.
Mohri, M. and A. M. Medina. 2015. “Revenue optimization against strategic buyers”.
Advances in Neural Information Processing Systems. 2015: 2530–2538.
Nazerzadeh, H., A. Saberi, and R. Vohra. 2008. “Dynamic Cost-per-Action Mechanisms and
Applications to Online Advertising”. In: Proceedings of the 17th International Conference
on World Wide Web. WWW ’08. Beijing, China: Association for Computing Machinery.
179–188. isbn: 9781605580852. doi: 10.1145/1367497.1367522. url:
https://doi.org/10.1145/1367497.1367522.
Nedelec, T., M. Abeille, C. Calauzènes, N. E. Karoui, B. Heymann, and V. Perchet. 2019a.
“Thresholding at the monopoly price: an agnostic way to improve bidding strategies in
revenue-maximizing auctions”. In: The Workshop on Learning in the Presence of Strategic
Behavior, EC.
Nedelec, T., J. Baudet, V. Perchet, and N. E. Karoui. 2021. “Adversarial learning for revenue-
maximizing auctions”. In: 20th International Conference on Autonomous Agents and
Multiagent Systems.
Nedelec, T., N. El Karoui, and V. Perchet. 2019b. “Learning to bid in revenue-maximizing
auctions”. In: International Conference on Machine Learning. PMLR. 4781–4789.
Nekipelov, D., V. Syrgkanis, and E. Tardos. 2015. “Econometrics for learning agents”. In:
Proceedings of the Sixteenth ACM Conference on Economics and Computation. 1–18.
Perchet, V. and P. Rigollet. 2013. “The multi-armed bandit problem with covariates”. The
Annals of Statistics. 41(2): 693–721.
Shalev-Shwartz, S. and S. Ben-David. 2014. Understanding Machine Learning: From Theory
to Algorithms. Cambridge University Press.
Tang, P. and Y. Zeng. 2018. “The price of prior dependence in auctions”. In: Proceedings of
the 2018 ACM Conference on Economics and Computation. 485–502.
Weed, J., V. Perchet, and P. Rigollet. 2016. “Online learning in repeated auctions”. In:
Conference on Learning Theory. PMLR. 1562–1583.
Xu, J., K.-c. Lee, W. Li, H. Qi, and Q. Lu. 2015. “Smart pacing for effective online ad cam-
paign optimization”. In: Proceedings of the 21th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining. 2217–2226.
Yuan, S., J. Wang, and X. Zhao. 2013. “Real-time bidding for online advertising: measure-
ment and analysis”. Proceedings of the Seventh International Workshop on Data Mining
for Online Advertising.
Bibliography
Abeille, M., C. Calauzènes, N. E. Karoui, T. Nedelec, and V. Perchet. 2018. “Explicit shading
strategies for repeated truthful auctions”. In: arXiv preprint arXiv:1805.00256.
Agrawal, S., C. Daskalakis, V. S. Mirrokni, and B. Sivan. 2018. “Robust Repeated Auctions
under Heterogeneous Buyer Behavior”. In: Proceedings of the 2018 ACM Conference on
Economics and Computation. 171–171.
Albert, M., V. Conitzer, and P. Stone. 2017. “Automated design of robust mechanisms”. In:
Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 31. No. 1.
Allouah, A. and O. Besbes. 2020. “Prior-independent optimal auctions”. Management
Science. 66(10): 4417–4432.
Amin, K., A. Rostamizadeh, and U. Syed. 2013. “Learning prices for repeated auctions
with strategic buyers”. In: Proceedings of the 26th International Conference on Neural
Information Processing Systems-Volume 1. 1169–1177.
Amin, K., A. Rostamizadeh, and U. Syed. 2014. “Repeated contextual auctions with strate-
gic buyers”. In: Proceedings of the 27th International Conference on Neural Information
Processing Systems-Volume 1. 622–630.
Archer, A. and É. Tardos. 2001. “Truthful mechanisms for one-parameter agents”. In:
Proceedings 2001 IEEE International Conference on Cluster Computing. IEEE.
Arlotto, A. and I. Gurvich. 2019. “Uniformly Bounded Regret in the Multisecretary Prob-
lem”. Stochastic Systems. 9(3): 231–260.
Armstrong, M. 1996. “Multiproduct nonlinear pricing”. Econometrica: Journal of the Econo-
metric Society: 51–75.
Arnosti, N., M. Beck, and P. Milgrom. 2016. “Adverse selection and auction design for
internet display advertising”. American Economic Review. 106(10): 2852–66.
Ashlagi, I., C. Daskalakis, and N. Haghpanah. 2016. “Sequential mechanisms with ex-post
participation guarantees”. In: Proceedings of the 2016 ACM Conference on Economics and
Computation. 213–214.
Aström, K. J. and R. M. Murray. 2008. Feedback Systems: An Introduction for Scientists and
Engineers. Princeton University Press.
Athey, S. and P. A. Haile. 2007. “Chapter 60 Nonparametric Approaches to Auctions”. In:
Handbook of Econometrics.
Audibert, J.-Y. and S. Bubeck. 2009. “Minimax policies for adversarial and stochastic
bandits”. In: Proceedings of COLT.
Babaioff, M., R. Kleinberg, and A. Slivkins. 2013. “Multi-Parameter Mechanisms with
Implicit Payment Computation”. In: Proceedings of the Fourteenth ACM Conference on
Electronic Commerce. EC ’13. Philadelphia, Pennsylvania, USA: Association for
Computing Machinery. 35–52. isbn: 9781450319621. doi: 10.1145/2482540.2482602. url:
https://doi.org/10.1145/2482540.2482602.
Babaioff, M., R. D. Kleinberg, and A. Slivkins. 2010. “Truthful Mechanisms with Implicit
Payment Computation”. In: Proceedings of the 11th ACM Conference on Electronic
Commerce. EC ’10. Cambridge, Massachusetts, USA: Association for Computing
Machinery. 43–52. isbn: 9781605588223. doi: 10.1145/1807342.1807349. url:
https://doi.org/10.1145/1807342.1807349.
Babaioff, M., Y. Sharma, and A. Slivkins. 2014. “Characterizing Truthful Multi-armed
Bandit Mechanisms”. SIAM Journal on Computing. 43(1): 194–230. doi:
10.1137/120878768.
Balcan, M.-F., A. Blum, J. D. Hartline, and Y. Mansour. 2008. “Reducing mechanism design
to algorithm design via machine learning”. Journal of Computer and System Sciences.
74(8): 1245–1270.
Balseiro, S. R., O. Besbes, and G. Y. Weintraub. 2015. “Repeated auctions with budgets in
ad exchanges: Approximations and design”. Management Science. 61(4): 864–884.
Balseiro, S. R., O. Candogan, and H. Gurkan. 2020. “Multistage Intermediation in Display
Advertising”. Manufacturing & Service Operations Management.
Balseiro, S. R. and Y. Gur. 2019. “Learning in repeated auctions with budgets: Regret
minimization and equilibrium”. Management Science. 65(9): 3952–3968.
Balseiro, S. R., V. S. Mirrokni, and R. P. Leme. 2018. “Dynamic mechanisms with martingale
utilities”. Management Science. 64(11): 5062–5082.
Bar-Yossef, Z., K. Hildrum, and F. Wu. 2002. “Incentive-compatible online auctions for
digital goods.” In: SODA. Vol. 2. 964–970.
Bartlett, P. L., S. Boucheron, and G. Lugosi. 2002. “Model selection and error estimation”.
Machine Learning. 48(1-3): 85–113.
Blum, A., V. Kumar, A. Rudra, and F. Wu. 2004. “Online learning in online auctions”.
Theoretical Computer Science. 324(2-3): 137–146.
Boyd, S. and L. Vandenberghe. 2004. Convex Optimization. USA: Cambridge University
Press. isbn: 0521833787.
Braverman, M., J. Mao, J. Schneider, and M. Weinberg. 2018. “Selling to a no-regret buyer”.
In: Proceedings of the 2018 ACM Conference on Economics and Computation. 523–538.
Bubeck, S. and N. Cesa-Bianchi. 2012. “Regret Analysis of Stochastic and Nonstochastic
Multi-armed Bandit Problems”. In: Machine Learning. Vol. 5. No. 1. 1–122.
Bubeck, S., N. R. Devanur, Z. Huang, and R. Niazadeh. 2017. “Multi-scale Online Learning
and its Applications to Online Auctions”. Proceedings of the Eighteenth ACM Conference
on Economics and Computation.
Bulow, J. and P. Klemperer. 1996. “Auctions Versus Negotiations”. The American Economic
Review. 86(1): 180–194.
Celis, L. E., G. Lewis, M. Mobius, and H. Nazerzadeh. 2014. “Buy-it-now or take-a-chance:
Price discrimination through randomized auctions”. Management Science. 60(12): 2927–
2948.
Cesa-Bianchi, N., T. Cesari, and V. Perchet. 2019. “Dynamic pricing with finitely many
unknown valuations”. In: Algorithmic Learning Theory. PMLR. 247–273.
Cesa-Bianchi, N., C. Gentile, and Y. Mansour. 2014. “Regret minimization for reserve prices
in second-price auctions”. IEEE Transactions on Information Theory. 61(1): 549–564.
Choi, H., C. F. Mela, S. R. Balseiro, and A. Leary. 2020. “Online display advertising markets:
A literature review and future directions”. Information Systems Research. 31(2): 556–
575.
Choi, H. and C. F. Mela. 2018. “Display advertising pricing in exchange markets”. Working
paper.
Ciocan, D. F. and V. Farias. 2012. “Model Predictive Control for Dynamic Resource Alloca-
tion”. Mathematics of Operations Research.
Cole, R. and T. Roughgarden. 2014a. “The sample complexity of revenue maximization”. In:
Proceedings of the forty-sixth annual ACM symposium on Theory of computing. 243–252.
Cole, R. and T. Roughgarden. 2014b. “The sample complexity of revenue maximization”.
In: Proceedings of the forty-sixth annual ACM symposium on Theory of computing. 243–
252.
Conitzer, V. and T. Sandholm. 2002. “Complexity of mechanism design”. In: Proceedings of
the Eighteenth conference on Uncertainty in artificial intelligence. 103–110.
Cremer, J. and R. P. McLean. 1988. “Full extraction of the surplus in Bayesian and dominant
strategy auctions”. In: Econometrica: Journal of the Econometric Society. JSTOR. 1247–
1257.
Daskalakis, C., A. Deckelbaum, and C. Tzamos. 2013. “Mechanism design via optimal
transport”. In: Proceedings of the fourteenth ACM conference on Electronic commerce.
269–286.
Degenne, R. and V. Perchet. 2016. “Anytime optimal algorithms in stochastic multi-armed
bandits”. In: International Conference on Machine Learning. 1587–1595.
Deng, Y., J. Schneider, and B. Sivan. 2019. “Prior-Free Dynamic Auctions with Low Regret
Buyers”. In: Advances in Neural Information Processing Systems. 4804–4814.
Devanur, N. R., Z. Huang, and C.-A. Psomas. 2016. “The sample complexity of auctions
with side information”. In: Proceedings of the forty-eighth annual ACM symposium on
Theory of Computing. 426–439.
Devanur, N. R. and S. M. Kakade. 2009. “The Price of Truthfulness for Pay-per-Click
Auctions”. In: Proceedings of the 10th ACM Conference on Electronic Commerce. EC
’09. Stanford, California, USA: Association for Computing Machinery. 99–106. isbn:
9781605584584. doi: 10.1145/1566374.1566388. url: https://doi.org/10.1145/
1566374.1566388.
Dhangwatnotai, P., T. Roughgarden, and Q. Yan. 2015. “Revenue maximization with a
single sample”. Games and Economic Behavior. 91: 318–333.
Drutsa, A. 2020. “Reserve pricing in repeated second-price auctions with strategic bidders”.
In: International Conference on Machine Learning. PMLR. 2678–2689.
Dudley, R. M. 2014. Uniform Central Limit Theorems. Cambridge Studies in Advanced
Mathematics. Cambridge University Press.
Dütting, P., Z. Feng, H. Narasimhan, D. Parkes, and S. S. Ravindranath. 2019. “Optimal
auctions through deep learning”. In: International Conference on Machine Learning.
PMLR. 1706–1715.
Elkind, E. 2007. “Designing and learning optimal finite support auctions”. In: Proceedings
of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. 736–745.
Epasto, A., M. Mahdian, V. Mirrokni, and S. Zuo. 2018. “Incentive-aware learning for large
markets”. In: Proceedings of the 2018 World Wide Web Conference. 1369–1378.
Feng, Z., S. Lahaie, J. Schneider, and J. Ye. 2021. “Reserve Price Optimization for First
Price Auctions in Display Advertising”. International Conference on Machine Learning:
3230–3239.
Feng, Z., H. Narasimhan, and D. C. Parkes. 2018a. “Deep learning for revenue-optimal
auctions with budgets”. In: Proceedings of the 17th International Conference on Autonomous
Agents and Multiagent Systems. 354–362.
Feng, Z., C. Podimata, and V. Syrgkanis. 2018b. “Learning to bid without knowing your
value”. In: Proceedings of the 2018 ACM Conference on Economics and Computation. 505–
522.
Fernandez-Tapia, J. 2015. “An analytical solution to the budget-pacing problem in
programmatic advertising”. Journal of Information and Optimization Sciences. 40.
Fernandez-Tapia, J., O. Guéant, and J.-M. Lasry. 2016. “Optimal Real-Time Bidding
Strategies”. Applied Mathematics Research eXpress.
Fibich, G. and A. Gavious. 2003. “Asymmetric First-Price Auctions: A Perturbation
Approach”. Mathematics of Operations Research. 28(4): 836–852.
Fibich, G. and N. Gavish. 2012. “Asymmetric First-Price Auctions—A Dynamical-Systems
Approach”. Mathematics of Operations Research. 37(2): 219–243.
Fu, H. 2013. “VCG auctions with reserve prices: Lazy or eager”. In: Proceedings of the
Fourteenth ACM Conference on Economics and Computation.
Fu, H. 2016. “Notes on Myerson’s Revenue Optimal Mechanisms”. http://fuhuthu.com/
notes/iron.pdf. Accessed: 2021-08-25.
Fu, H., N. Immorlica, B. Lucier, and P. Strack. 2015. “Randomization beats second price
as a prior-independent auction”. In: Proceedings of the Sixteenth ACM Conference on
Economics and Computation. 323–323.
Gayle, W.-R. and J. F. Richard. 2008. “Numerical Solutions of Asymmetric, First-Price,
Independent Private Values Auctions”. Computational Economics. 32(3).
Ghosh, A., B. I. Rubinstein, S. Vassilvitskii, and M. Zinkevich. 2009. “Adaptive Bidding for
Display Advertising”. In: Proceedings of the 18th International Conference on World Wide
Web. WWW ’09. 251–260.
Golowich, N., H. Narasimhan, and D. C. Parkes. 2018. “Deep learning for multi-facility
location mechanism design”. In: Proceedings of the 27th International Joint Conference on
Artificial Intelligence. 261–267.
Golrezaei, N., M. Lin, V. Mirrokni, and H. Nazerzadeh. 2017. “Boosted Second-price
Auctions for Heterogeneous Bidders”. In: Management Science.
Golrezaei, N., A. Javanmard, and V. Mirrokni. 2021. “Dynamic incentive-aware learning:
Robust pricing in contextual auctions”. Operations Research. 69(1): 297–314.
Gonczarowski, Y. A. and N. Nisan. 2017. “Efficient empirical revenue maximization
in single-parameter auction environments”. In: Proceedings of the 49th Annual ACM
SIGACT Symposium on Theory of Computing.
Groeneboom, P. and G. Jongbloed. 2014. Nonparametric Estimation under Shape Constraints.
Cambridge University Press.
Guerre, E., I. Perrigne, and Q. Vuong. 2000. “Optimal Nonparametric Estimation of First-
price Auctions”. Econometrica. 68(3): 525–574.
Gummadi, R., P. Key, and A. Proutiere. 2012. Optimal Bidding Strategies and Equilibria in
Dynamic Auctions with Budget Constraints.
Guo, C., Z. Huang, and X. Zhang. 2019. “Settling the sample complexity of single-
parameter revenue maximization”. In: Proceedings of the 51st Annual ACM SIGACT
Symposium on Theory of Computing.
Hartline, J., A. Johnsen, and Y. Li. 2020. “Benchmark design and prior-independent
optimization”. 2020 IEEE 61st Annual Symposium on Foundations of Computer Science
(FOCS): 294–305.
Hartline, J. D. et al. 2013. “Bayesian mechanism design”. Foundations and Trends® in
Theoretical Computer Science. 8(3): 143–263.
Hartline, J. D. and T. Roughgarden. 2009. “Simple versus optimal mechanisms”. In:
Proceedings of the 10th ACM conference on Electronic commerce. 225–234.
Haussler, D. 1992. “Decision theoretic generalizations of the PAC model for neural net and
other learning applications”. In: Information and computation.
Hiriart-Urruty, J.-B. and C. Lemaréchal. 2001. Fundamentals of Convex Analysis. isbn:
978-3-540-42205-1. doi: 10.1007/978-3-642-56468-0.
Huang, Z., Y. Mansour, and T. Roughgarden. 2018. “Making the most of your samples”.
SIAM Journal on Computing. 47(3): 651–674.
Kandasamy, K., J. E. Gonzalez, M. I. Jordan, and I. Stoica. 2020. “Mechanism Design with
Bandit Feedback”. arXiv: 2004.08924 [stat.ML].
Kanoria, Y. and H. Nazerzadeh. 2014. “Dynamic Reserve Prices for Repeated Auctions:
Learning from Bids”. In: Web and Internet Economics: 10th International Conference.
Vol. 8877. Springer. 232.
Kirkegaard, R. 2009. “Asymmetric first price auctions”. Journal of Economic Theory. 144(4):
1617–1635. issn: 0022-0531.
Kleinberg, R. and T. Leighton. 2003. “The value of knowing a demand curve: Bounds on
regret for online posted-price auctions”. In: 44th Annual IEEE Symposium on Foundations
of Computer Science, 2003. Proceedings. IEEE. 594–605.
Koltchinskii, V., D. Panchenko, et al. 2002. “Empirical margin distributions and bounding
the generalization error of combined classifiers”. The Annals of Statistics. 30(1): 1–50.
Kotowski, M. H. 2018. “On asymmetric reserve prices”. Theoretical Economics. 13(1): 205–
237.
Krishna, V. 2009. Auction Theory.
Lattimore, T. and C. Szepesvári. 2020. Bandit algorithms. Cambridge University Press.
Lavi, R. and N. Nisan. 2004. “Competitive analysis of incentive compatible on-line
auctions”. Theoretical Computer Science. 310(1-3): 159–180.
Le Thi, H. A., V. N. Huynh, and T. P. Dinh. 2014. “DC Programming and DCA for General
DC Programs”. In: Advanced Computational Methods for Knowledge Engineering. Ed. by
T. van Do, H. A. L. Thi, and N. T. Nguyen. Cham: Springer International Publishing.
15–35.
Lebrun, B. 1999. “First Price Auctions in the Asymmetric N Bidder Case”. International
Economic Review. (1).
Lecué, G. and M. Lerasle. 2020. “Robust machine learning by median-of-means: theory
and practice”. The Annals of Statistics. 48(2): 906–931.
Lee, K.-C., A. Jalali, and A. Dasdan. 2013. “Real time bid optimization with smooth budget
delivery in online advertising.” Proceedings of the Seventh International Workshop on
Data Mining for Online Advertising.
Lugosi, G. and S. Mendelson. 2019. “Mean estimation and regression under heavy-tailed
distributions: A survey”. Foundations of Computational Mathematics. 19(5): 1145–1190.
Manelli, A. M. and D. R. Vincent. 2007. “Multidimensional mechanism design: Revenue
maximization and the multiple-good monopoly”. Journal of Economic theory. 137(1):
153–185.
Marshall, R., M. Meurer, J. Richard, and W. Stromquist. 1994. “Numerical analysis of
asymmetric first price auctions”. Games and Economic Behavior. (2). issn: 0899-8256.
Massart, P. 1990. “The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality”. In:
The Annals of Probability. Vol. 18. No. 3. Institute of Mathematical Statistics. 1269–1283.
Medina, A. M. and S. Vassilvitskii. 2017. “Revenue optimization with approximate bid
predictions”. In: Proceedings of the 31st International Conference on Neural Information
Processing Systems. 1856–1864.
Milgrom, P. 2004. Putting auction theory to work. Cambridge University Press.
Milgrom, P. and I. Segal. 2002. “Envelope theorems for arbitrary choice sets”. Econometrica.
70(2): 583–601.
Mirrokni, V. S., R. P. Leme, P. Tang, and S. Zuo. 2016. “Dynamic Auctions with Bank
Accounts.” In: Proceedings of IJCAI. 387–393.
Mohri, M. and A. M. Medina. 2015. “Revenue optimization against strategic buyers”.
Advances in Neural Information Processing Systems. 2015: 2530–2538.
Mohri, M. and A. M. Medina. 2014. “Learning theory and algorithms for revenue
optimization in second price auctions with reserve”. In: International Conference on Machine
Learning. PMLR. 262–270.
Morgenstern, J. and T. Roughgarden. 2015. “The pseudo-dimension of near-optimal
auctions”. In: Proceedings of the 28th International Conference on Neural Information
Processing Systems-Volume 1. 136–144.
Myerson, R. B. 1981. “Optimal auction design”. Mathematics of operations research. 6(1):
58–73.
Nazerzadeh, H., A. Saberi, and R. Vohra. 2008. “Dynamic Cost-per-Action Mechanisms and
Applications to Online Advertising”. In: Proceedings of the 17th International Conference
on World Wide Web. WWW ’08. Beijing, China: Association for Computing Machinery.
179–188. isbn: 9781605580852. doi: 10.1145/1367497.1367522. url: https://doi.org/
10.1145/1367497.1367522.
Nedelec, T., M. Abeille, C. Calauzènes, N. E. Karoui, B. Heymann, and V. Perchet. 2019a.
“Thresholding at the monopoly price: an agnostic way to improve bidding strategies in
revenue-maximizing auctions”. In: The Workshop on Learning in the Presence of Strategic
Behavior, EC.
Nedelec, T., J. Baudet, V. Perchet, and N. E. Karoui. 2021. “Adversarial learning for revenue-
maximizing auctions”. In: 20th International Conference on Autonomous Agents and
Multiagent Systems.
Nedelec, T., N. El Karoui, and V. Perchet. 2019b. “Learning to bid in revenue-maximizing
auctions”. In: International Conference on Machine Learning. PMLR. 4781–4789.
Nekipelov, D., V. Syrgkanis, and E. Tardos. 2015. “Econometrics for learning agents”. In:
Proceedings of the Sixteenth ACM Conference on Economics and Computation. 1–18.
Ostrovsky, M. and M. Schwarz. 2011. “Reserve prices in internet advertising auctions: A
field experiment”. In: Proceedings of the 12th ACM conference on Electronic commerce.
59–60.
Paes Leme, R., M. Pál, and S. Vassilvitskii. 2016. “A field guide to personalized reserve
prices”. In: Proceedings of the 25th international conference on world wide web. 1093–1102.
Perchet, V. and P. Rigollet. 2013. “The multi-armed bandit problem with covariates”. The
Annals of Statistics. 41(2): 693–721.
Rahme, J., S. Jelassi, and S. M. Weinberg. 2020. “Auction learning as a two-player game”.
arXiv preprint arXiv:2006.05684.
Riley, J. G. and W. F. Samuelson. 1981a. “Optimal auctions”. The American Economic Review.
71(3): 381–392.
Riley, J. G. and W. F. Samuelson. 1981b. “Optimal auctions”. The American Economic Review.
71(3): 381–392.
Rockafellar, R. T. 1970. Convex Analysis. Princeton University Press.
Roughgarden, T. and O. Schrijvers. 2016. “Ironing in the dark”. In: Proceedings of EC. 1–18.
Roughgarden, T. and J. R. Wang. 2016. “Minimizing Regret with Multiple Reserves”. In:
Proceedings of the 2016 ACM Conference on Economics and Computation. 601–616.
Rudolph, M. R., J. G. Ellis, and D. M. Blei. 2016. “Objective variables for probabilistic
revenue maximization in second-price auctions with reserve”. In: Proceedings of the
25th International Conference on World Wide Web. 1113–1122.
Shalev-Shwartz, S. and S. Ben-David. 2014. Understanding Machine Learning: From Theory
to Algorithms. Cambridge University Press.
Shen, W., S. Lahaie, and R. P. Leme. 2019a. “Learning to clear the market”. In: International
Conference on Machine Learning. PMLR. 5710–5718.
Shen, W., P. Tang, and S. Zuo. 2019b. “Automated mechanism design via neural networks”.
In: Proceedings of the 18th International Conference on Autonomous Agents and Multiagent
Systems. 215–223.
Slivkins, A. et al. 2019. “Introduction to Multi-Armed Bandits”. Foundations and Trends®
in Machine Learning. 12(1-2): 1–286.
Tang, P. and Y. Zeng. 2018. “The price of prior dependence in auctions”. In: Proceedings of
the 2018 ACM Conference on Economics and Computation. 485–502.
Vickrey, W. 1961. “Counterspeculation, auctions, and competitive sealed tenders”. In: The
Journal of finance. Vol. 16. No. 1. Wiley Online Library.
Weed, J., V. Perchet, and P. Rigollet. 2016. “Online learning in repeated auctions”. In:
Conference on Learning Theory. PMLR. 1562–1583.
Xu, J., K.-c. Lee, W. Li, H. Qi, and Q. Lu. 2015. “Smart pacing for effective online ad
campaign optimization”. In: Proceedings of the 21th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining. 2217–2226.
Yao, A. C.-C. 2017. “Dominant-strategy versus bayesian multi-item auctions: Maximum
revenue determination and comparison”. In: Proceedings of the 2017 ACM Conference on
Economics and Computation. 3–20.
Yuan, S., J. Wang, and X. Zhao. 2013. “Real-time bidding for online advertising:
measurement and analysis”. Proceedings of the Seventh International Workshop on Data Mining
for Online Advertising.