Learning in repeated auctions

Thomas Nedelec (ENS Paris Saclay; Criteo AI Lab)
Clément Calauzènes (Criteo AI Lab)
Noureddine El Karoui (work done while at UC Berkeley and Criteo AI Lab)
Vianney Perchet (ENSAE; Criteo AI Lab)

arXiv:2011.09365v2 [cs.GT] 22 Sep 2021

September 23, 2021

Abstract
Online auctions are one of the most fundamental facets of the modern economy
and power an industry generating hundreds of billions of dollars a year in revenue.
Auction theory has historically focused on the question of designing the best way to
sell a single item to potential buyers, with the concurrent objectives of maximizing
revenue generated or welfare created. Theoretical results in this area have typically
relied on some prior Bayesian knowledge that agents were assumed to have about each other.
This assumption is no longer satisfied in new markets such as online advertising:
similar items are sold repeatedly, and agents are unaware of each other or might try to
manipulate each other. On the other hand, statistical learning theory now provides
tools to supplement those missing pieces of information given enough data, as agents
can learn from their environment to improve their strategies.
This survey covers recent advances in learning in repeated auctions, starting from
the traditional economic study of optimal one-shot auctions with a Bayesian prior.
We then focus on the question of learning optimal mechanisms from a dataset of
bidders’ past values. The sample complexity as well as the computational efficiency
of different methods will be studied. We will also investigate online variants where
gathering data has a cost to be accounted for, either by the seller or by the buyers (“earning while
learning”). Later in the survey, we will further assume that bidders are also adaptive
to the mechanism as they interact repeatedly with the same seller. We will show how
strategic agents can actually manipulate repeated auctions, to their own advantage.
A particularly interesting example is that of reserve price improvements for strategic
buyers in second price auctions.
All the questions discussed in this survey are grounded in real-world applications
and many of the ideas and algorithms we describe are used every day to power the
Internet economy.

Contents

1 Introduction: scope and motivation
1.1 Bayesian mechanism design
1.2 Learning theory and auction design
1.3 Organization of the survey
References

2 Bayesian mechanism design
2.1 The Bayesian setting
2.2 Sealed-bid auctions
2.3 A revenue equivalence theorem
2.4 Deriving revenue-maximizing auctions
2.5 Prior-independent optimal auctions
2.6 Advanced material: non-unicity of Nash equilibria and related complications
References

3 Repeated auctions from a seller’s standpoint
3.1 Motivation
3.2 Statistical Learning Theory Tools for Revenue Maximization
3.3 Auctions with Asymptotically No Approximation Error
3.4 Tractability at the Cost of Approximation Error
3.5 Contextual Estimation of Reserve Prices
3.6 Cost & Online Estimation of Auctions
References

4 Adaptive and strategic learning agents
4.1 Adaptive bidders - Online learning to bid
4.2 Mechanism design in front of adaptive bidders & Full surplus extraction
4.3 Reversing the asymmetry: Strategic buyer vs. myopic seller
References

Bibliography

1 Introduction: scope and motivation
The main purpose of auction theory is to construct a set of rules that a seller uses
to sell one or several items to a group of potential buyers, who send messages
(or bids) to the seller, usually indicating how much they value the item or how much they
are willing to pay to acquire it. In almost all cases, it is sufficient to define only two rules.
First, the allocation rule describes which buyer wins the auction (if a unique non-divisible
item is sold), depending on the different messages received; if the item is divisible, the
allocation rule describes how the item is shared between winners. Second, the payment
rule indicates to buyers how much they are going to pay to the seller, again based on the
different messages. Those rules are known publicly before the auction starts, and they
influence the behavior, or strategy, of the different buyers.
When choosing an allocation and a payment rule, the seller might have several objectives
and constraints to respect: 1) maximizing the revenue she gets from the auction (revenue
maximization); 2) ensuring that buyers have an incentive to participate in the auction
(individual rationality); 3) ensuring that, given the rules of
the auction, it is in the best interest of buyers to reveal how much they truly value an item
(incentive compatibility), as it may make revenue maximization easier. On the other side of
the game, the buyers strategically adapt the bids they send to the seller to the auction
rules in order to maximize their own utility.
Historically, auctions have often been designed so that buyers have an incentive to
bid in a way that reflects how much they truly value the items that are for sale. This
constraint still leaves plenty of choices for auction design, and a large part of the literature
has focused on designing auctions that maximize the seller’s revenue, assuming buyers
are rational. However, with the advent of the Internet and the automation of auctions, the
landscape of possible applications has changed drastically, necessitating more complex
settings to accurately study the incentives and behaviors at play. More recently, the auction
literature has aimed at understanding how the design of an auction platform impacts
the seller’s revenue, global welfare and the behavior of buyers and sellers in contexts where
sellers (and sometimes buyers) participate in a very large number of auctions each day.
These setups reflect situations appearing in modern online marketplaces.

1.1 Bayesian mechanism design

Auction theory has focused for a long time on the simplest case: there is a single, non-
divisible item to be sold to a set of predefined buyers in a one-shot auction. The chosen
mechanism indicates which buyer (if any) gets the item and at which price. The seminal
works of Vickrey (1961), Myerson (1981) and Riley and Samuelson (1981a) emphasize the
importance of the information structure of an auction system. It consists of the information
owned privately by the buyers and the information that the seller has on each buyer. The
information owned privately by the buyers is the value they give to the item, i.e., the highest
price they are willing to pay to get the item. The uncertainty about these values
lies at the heart of the seller’s optimization problem: otherwise, she would just have to sell
the item to the buyer with the highest value, at this price or infinitesimally less.
To handle this lack of information about buyers, it is standard to take a “Bayesian”
viewpoint and assume that the seller has some probabilistic prior on the values given to
the item by each bidder. This prior distribution is usually called the value distribution and
it encompasses the seller’s uncertainty on a specific bidder’s values. There are of course
several possibilities for how this value distribution is constructed. For instance, in wine or
art auctions, it often comes from expert knowledge about an admissible price for a good
wine bottle or for an important piece of art.

1.2 Learning theory and auction design

It is now possible for Internet platforms to run billions of auctions a day and store most
of the historical data coming from them. This digitization of auction mechanisms was
the first step toward gathering data to optimize selling mechanisms. Auctions are now used
in most Internet platforms to organize interactions between the different stakeholders.
eBay was one of the first big online platforms to use ascending auctions to sell objects
on the platform. Google and most search engine companies started to use auctions to
sell ad opportunities on their front page. For instance, they let advertisers bid on some
keywords to get sponsored links above the first results for a certain user query. Nowadays,
Facebook and LinkedIn also use them to determine which ad to display, Amazon
and most e-commerce marketplaces decide which products are going to be sponsored
(and/or advertised) through an auction mechanism, and auctions are also used to sell
carbon permits by the European Union or to run large electricity markets.
To exploit this new source of available information (i.e., enormous datasets of past
bids), practitioners used advanced statistical learning algorithms in connection with the
classical Bayesian theory. Indeed, beyond the AI hype, machine learning algorithms are
now widely applied in the industry for numerous applications: the value distribution
is no longer coming from some given and fixed prior, but learned (hopefully accurately
and efficiently) from historical bidding data. The first large-scale field experiment in
production showed how engineers at Yahoo could handle their huge datasets to learn
an optimal reserve price per keyword (Ostrovsky and Schwarz, 2011). This results in
data-driven mechanisms whose design uses techniques coming from a large variety of fields,
including statistics, machine learning, game theory and economics. Similarly, bidders
on these online platforms also gather data and use new statistical learning techniques to
improve their bidding strategies against automated mechanisms. This flood of data and the
associated paradigm shift open many new interesting practical problems,
new theoretical questions and new interesting games to study.

1.2.1 Repeated auctions only from a seller’s standpoint

The first natural repeated game setting consists in understanding how the seller can
learn a revenue-maximizing auction mechanism from a dataset of bids or values. In the
example of the eBay marketplace, the seller (eBay) observes numerous auctions a day for
similar items. Hence, from her point of view, the mechanism is repeated and she can
aim at optimizing some long-term revenue. In contrast, buyers are individuals who
participate in a few auctions at most, if not a single one. From their point of view, the
mechanism therefore still looks like a one-shot auction and they are bound to implement myopic
short-term strategies, optimizing their utility auction by auction (as opposed to over the
long term and effectively in expectation). Let us make the simplifying assumption that bidder values
on the platform are sampled from a certain unknown distribution that captures the
variability in their readiness to pay a certain price. Assuming the bidders actually bid their
true value (for instance, if the mechanism chosen is fixed and “incentive-compatible”, i.e.,
bidding one’s value is optimal for buyers), the seller then has access at the end of the day
to a dataset of buyer values.
Inspired by the computational learning formalism, Elkind (2007), Balcan et al. (2008),
and Cole and Roughgarden (2014a) initiated a line of research aiming at finding approxi-
mations of the revenue-maximizing auction, if possible, efficiently, with approximation
guarantees depending on the size of the dataset gathered (a.k.a., the sample complexity).
This setting is called the batch learning setting. A variant considers the case where
buyers arrive continuously on the platform and the seller can continuously update
her mechanism. This is the online learning setting introduced by Cesa-Bianchi et al. (2014).
In all these problems, it is crucial that the samples gathered in the dataset do have the
same distribution as the samples that will be gathered and treated in the future.
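
As a rough illustration of the batch learning setting, the sketch below picks a posted price that maximizes empirical revenue on a sample of values, for the simplest case of a single buyer. The function name `empirical_best_price` and the toy sample are hypothetical, and this is not the algorithm of any of the papers cited above; it only shows how revenue might be estimated from a dataset of values.

```python
def empirical_best_price(values):
    """Return (price, average_revenue) maximizing empirical posted-price revenue.

    Assumption for this sketch: a single buyer with value v buys at price r
    if and only if v >= r, so the empirical revenue of r is
    r * (fraction of sampled values >= r). It is enough to try the sampled
    values themselves as candidate prices.
    """
    if not values:
        raise ValueError("need at least one sampled value")
    ordered = sorted(values)
    m = len(ordered)
    best_price, best_revenue = ordered[0], 0.0
    for k, r in enumerate(ordered):
        # ordered[k:] are all >= r, i.e. at least m - k samples would buy at r.
        revenue = r * (m - k) / m
        if revenue > best_revenue:
            best_price, best_revenue = r, revenue
    return best_price, best_revenue


# Toy dataset of past values (made up for the example).
past_values = [4.0, 1.0, 3.0, 2.0]
print(empirical_best_price(past_values))
```

The guarantee mentioned in the text relies on future values having the same distribution as the sample; if they do not, the price returned here can be arbitrarily far from optimal.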

1.2.2 Repeated auctions from seller and bidder standpoint

The crucial assumption of myopic/short-sighted/impatient bidders facing a patient seller
is unfortunately not necessarily satisfied, depending on the setting. In modern-day practice,
large online ad platforms, such as Google DoubleClick or AppNexus, typically sell
ad opportunities for large publishers such as some of the biggest online newspapers. The
main difference with the aforementioned eBay example is that only a few companies
are actually bidding in these auctions. They are furthermore doing so repeatedly and
participating in a massive number of auctions.
Indeed, most companies willing to display ads actually rely on third parties, called
demand-side platforms (DSPs), that buy and display ads for them (because of technical
constraints, even sending bids in real time might actually be quite complex). These
aggregated bidders interact repeatedly with the (same) seller, billions of times a day.
Consequently, buyers of this type can also optimize for long-term utility and need not
be myopic. Thus, even if the seller is using one-shot incentive-compatible auctions (for
instance to gather data in order to later design and switch to a revenue-maximizing
mechanism), the bidder might have an interest in not bidding “truthfully”, even though
classical theory would suggest that doing so is optimal for them. Indeed, if buyers do not bid
their values, this will
modify the distribution of “values” observed by the seller. Subsequently, the mechanism
chosen to optimize her revenue will be different from what it would have been had bidders
been naïve, to the advantage of the buyers (Tang and Zeng, 2018; Nedelec et al., 2019b).
Intuitively, this is possible because the information asymmetry that arose in the eBay
example between the seller and the bidders (one optimizing over the long term, the other
over the short term) is almost reversed. If the seller must commit to a specific mechanism
or a family of mechanisms, for instance for contractual reasons, and buyers have this
information, they can strategically leverage it, e.g., by changing their bidding behavior.
In the end, the respective utilities of the seller and buyers will somehow depend on the
underlying amount of asymmetry between them. Several works have started studying
various intermediate settings, for example when bidders are (almost) identical (Kanoria
and Nazerzadeh, 2014), or are patient, but not as patient as the seller (Amin et al., 2013),
etc.

1.3 Organization of the survey

In this survey, our overarching objective is to provide a widely accessible introduction
to the fascinating topics of classical and modern auction theory, viewed through
the lenses of statistics and machine learning. We will clearly state
the differences between the information-asymmetry settings we review, and
point to cutting-edge theoretical and practical solutions adapted to them. We will also show
how new statistical tools can be used to tackle some important and well-known problems
from Economics. Furthermore, those questions open many new interesting problems in
Economics since algorithms are replacing classical sellers and buyers. We believe that
modern auction theory offers a nice framework to understand what data and Computer
Science can bring to modern Economics.
In Chapter 2, we survey the main results of the Bayesian auction literature, initiated
with the seminal works of Vickrey and Myerson. Those results form some of the backbone
of classical auction theory and are widely used in Internet practice. We will recall what
the revenue-maximizing auction is once the seller has a prior on bidders’ valuations and
introduce some approximations of the revenue-maximizing auction when the seller must
use simpler auctions. In Chapter 3, we focus on the setting derived from the eBay use
case and tackle both the batch learning setting and the online learning setting. We recall
some key concepts of statistical learning theory, derive the sample complexity of some
of the learning algorithms used to compute a revenue-maximizing auction and show
their computational complexity. In Chapter 4, we focus on the less studied but crucially
important setting where bidders can be strategic regarding the mechanism itself since
they have multiple interactions with the seller. We review some of the main methods
that have been devised to keep bidders from being strategic in that context, show their
limitations and introduce some very new results and approaches developed for bidders to
take advantage of the seller’s learning process.
This survey only assumes basic familiarity with standard notions of Machine Learning,
Statistics and Data Science and is written with a reader having this background in mind.
We hope our survey will be useful to engineers and researchers looking for an introduction
to the beautiful and fast developing topics of modern auction theory and applications.

References
Amin, K., A. Rostamizadeh, and U. Syed. 2013. “Learning prices for repeated auctions
with strategic buyers”. In: Proceedings of the 26th International Conference on Neural
Information Processing Systems-Volume 1. 1169–1177.
Balcan, M.-F., A. Blum, J. D. Hartline, and Y. Mansour. 2008. “Reducing mechanism design
to algorithm design via machine learning”. Journal of Computer and System Sciences.
74(8): 1245–1270.
Cesa-Bianchi, N., C. Gentile, and Y. Mansour. 2014. “Regret minimization for reserve prices
in second-price auctions”. IEEE Transactions on Information Theory. 61(1): 549–564.
Cole, R. and T. Roughgarden. 2014a. “The sample complexity of revenue maximization”. In:
Proceedings of the forty-sixth annual ACM symposium on Theory of computing. 243–252.
Elkind, E. 2007. “Designing and learning optimal finite support auctions”. In: Proceedings
of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. 736–745.
Kanoria, Y. and H. Nazerzadeh. 2014. “Dynamic Reserve Prices for Repeated Auctions:
Learning from Bids”. In: Web and Internet Economics: 10th International Conference.
Vol. 8877. Springer. 232.
Myerson, R. B. 1981. “Optimal auction design”. Mathematics of operations research. 6(1):
58–73.
Nedelec, T., N. El Karoui, and V. Perchet. 2019b. “Learning to bid in revenue-maximizing
auctions”. In: International Conference on Machine Learning. PMLR. 4781–4789.
Ostrovsky, M. and M. Schwarz. 2011. “Reserve prices in internet advertising auctions: A
field experiment”. In: Proceedings of the 12th ACM conference on Electronic commerce.
59–60.
Riley, J. G. and W. F. Samuelson. 1981a. “Optimal auctions”. The American Economic Review.
71(3): 381–392.
Tang, P. and Y. Zeng. 2018. “The price of prior dependence in auctions”. In: Proceedings of
the 2018 ACM Conference on Economics and Computation. 485–502.
Vickrey, W. 1961. “Counterspeculation, auctions, and competitive sealed tenders”. The
Journal of Finance. 16(1): 8–37.

2 Bayesian mechanism design
First read of this chapter: key concepts and ideas

This chapter introduces the Bayesian setting of auction theory. For
a reader who is new to the topic, we think the key results and ideas
of this chapter are: second price auctions and associated optimal
bidding strategy (Theorem 2.3); the notion of “truthful” bidding
(Section 2.1.1); first price auctions and associated bidding strategy at a
symmetric equilibrium (Theorem 2.6); the revenue equivalence
theorem (Theorem 2.8); the notion of virtual value (Definition
2.11) and its implications for optimal reserve prices in second
price auctions (Theorem 2.14), optimal “truthful”/Myerson auctions
(Definition 2.19 and Theorem 2.20) and the conceptually
fundamental Myerson Lemma (Theorem 2.16). We also recommend
focusing on the symmetric setting on first reading.

Auction mechanisms involve many different agents, sellers and/or buyers, with possibly
different and conflicting objectives as they all seek to optimize their own utility functions.
These interactions can be modeled using game-theoretic concepts. More specifically, we
are going to focus on a specific type of games with incomplete information that are called
mechanisms. In those games, each player has some private information (i.e., unknown to
everyone else) and sends a message to a central authority. Based on the messages gathered,
the latter decides on the final outcome. The utility of each player then solely depends on
this outcome.
Mechanisms appropriately model many practical situations such as the celebrated
problems of assigning students to schools, or matchings in organ-transplant applications.
In these problems, a central authority forms pairs between schools and students or donors
and recipients. In selling mechanisms, the central authority is the seller of a specific item
and the players are the buyers. Auctions are mechanisms used to sell a particular item. In
the case of a sale of a single non-divisible item, they have the following specific features.
Auctions are games of incomplete information as each buyer has some private valuation
for the item to be sold, i.e., the highest price they are willing to pay to acquire this item.
This valuation might be different from one buyer to another. We denote by $N$ the set of
buyers, of cardinality $n \in \mathbb{N}$, by $X_i \subset \mathbb{R}$ the set of possible private values of bidder $i \in N$,
by $x_i \in X_i$ the actual private value of bidder $i$ and by $\mathbf{x} = (x_1, \dots, x_n) \in X := \prod_i X_i \subset \mathbb{R}^n$ the
so-called profile of private values; the bold notation will refer to vectors for the sake of
clarity. The possible messages (or actions) of buyer $i \in N$ are called “bids” and the set of
bids of buyer $i$ is denoted by $B_i$. The outcome of an auction mechanism is defined by two
different rules:

1. an allocation rule $q : B = B_1 \times \dots \times B_n \to \Delta_n$, where $\Delta_n$ is the set of probability
distributions over the set of $n$ buyers. It specifies the probability that each player gets the
item.

2. a payment rule $p : B = B_1 \times \dots \times B_n \to \mathbb{R}^n$. It specifies the expected payment of each
player, whether or not they get the item.

We will denote by A the set of all auction mechanisms, i.e., the set of pairs of alloca-
tion/payment rules.
The utility of bidder i is simply the difference between the item value (if he won the
auction, and 0 otherwise) and his payment (that can be positive even if the auction is lost).
If we denote by $\mathbf{b} = (b_1, \dots, b_n) \in B$ the vector of bids of the bidders, the expected utility of
bidder $i \in N$, given bids and values, is then defined as

$$u_i(\mathbf{b}, x_i) = q_i(\mathbf{b})\, x_i - p_i(\mathbf{b})\,;$$

on the other hand, the seller aims at maximizing the expectation of her revenue defined,
given the bids, by:

$$\sum_{i=1}^{n} p_i(\mathbf{b})\,.$$
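
The two rules and the resulting utility and revenue can be sketched in a few lines of Python. The mechanism used here, `first_price_outcome` (highest bid wins and pays its own bid, ties broken in favor of the lowest index), is only a hypothetical stand-in chosen for illustration; any function returning a pair (allocation probabilities, expected payments) could be plugged in instead.

```python
def first_price_outcome(bids):
    """Illustrative mechanism: highest bid wins (ties -> lowest index), winner pays own bid.

    Returns (q, p): allocation probabilities q_i(b) and expected payments p_i(b).
    """
    n = len(bids)
    winner = max(range(n), key=lambda i: bids[i])  # max() keeps the first maximal index
    q = [1.0 if i == winner else 0.0 for i in range(n)]
    p = [bids[i] if i == winner else 0.0 for i in range(n)]
    return q, p


def utility(i, bids, x_i, mechanism):
    """Utility of bidder i: u_i(b, x_i) = q_i(b) * x_i - p_i(b)."""
    q, p = mechanism(bids)
    return q[i] * x_i - p[i]


def revenue(bids, mechanism):
    """Seller revenue: the sum over bidders of the payments p_i(b)."""
    _, p = mechanism(bids)
    return sum(p)


bids = [3.0, 5.0, 4.0]
print(utility(1, bids, 7.0, first_price_outcome), revenue(bids, first_price_outcome))
```

Keeping the mechanism as a plain function argument mirrors the definition of an auction as a pair of allocation and payment rules.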

2.1 The Bayesian setting

The auction literature has often considered a Bayesian setting where the value xi of bidder
i ∈ N is random, drawn according to his value distribution that “represents the seller
assessment of the probability of bidder i having a value estimate of xi or less” (Myerson,
1981). The value distribution quantifies the uncertainty of the seller on the maximum
price that one buyer is willing to pay for the item and is represented as a Bayesian prior on
the private information of each buyer.
A widely made assumption is that value distributions are common knowledge among
bidders and seller. As a consequence, we denote by $F_i$ the value distribution (i.e., the
cumulative distribution function of the value) of buyer $i \in N$ and by $f_i$ its corresponding
density function, assuming it exists, an assumption we are going to make repeatedly
(removing this assumption is almost always a matter of technicalities and has little to
no impact on conceptual questions). We implicitly identify the distribution $F_i$ with its
cumulative distribution function (cdf) and use both terms interchangeably. We are also going
to assume that values $x_i \in X_i$ are non-negative, i.e., the support of $F_i$ is included in (but not
necessarily equal to) $\mathbb{R}_+ = [0, \infty)$. Unless otherwise noted, we assume that $\mathbb{E}_{x_i \sim F_i}[x_i] < +\infty$
to get optimality results (and not just $\varepsilon$-optimality). A crucial assumption throughout this
survey is that, unless otherwise noted, the values $x_i$ are drawn independently for different
$i$'s and hence they are statistically independent as random variables. We shall denote by
$F = F_1 \otimes \dots \otimes F_n$ the joint (and hence product) distribution of $\mathbf{x} = (x_1, \dots, x_n)$, the vector of
values.

Examples Typical examples of value distributions are the uniform distribution, which is
widely used in textbook examples due to its simplicity, the exponential and the log-normal
distributions as they are similar to some empirical distributions encountered on modern
internet platforms. Power law distributions (also known as Pareto distributions) are also
widely used, as they capture the idea that the value in real time bidding and online advertis-
ing comes from few matches of very high quality, such as consumers who recently viewed
a product (Arnosti et al., 2016); this situation, where 20% of individuals own/generate
80% of wealth, is also referred to as the Pareto principle in economics. Generalized Pareto
distributions are also often used as examples because their virtual value (an important
concept we will define later) is linear.
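
To get a feel for these families, the following sketch draws samples from each of them with Python's standard `random` module. The helper `sample_values` and all parameter choices (unit scale, Pareto shape 2.0, and so on) are arbitrary assumptions made for this illustration, not values taken from the survey.

```python
import random


def sample_values(dist, m, rng):
    """Draw m i.i.d. non-negative values from a named distribution.

    The parameters are arbitrary illustration choices, not calibrated values.
    """
    samplers = {
        "uniform": lambda: rng.uniform(0.0, 1.0),
        "exponential": lambda: rng.expovariate(1.0),
        "lognormal": lambda: rng.lognormvariate(0.0, 1.0),
        "pareto": lambda: rng.paretovariate(2.0),  # shape 2.0: finite mean, heavy tail
    }
    draw = samplers[dist]
    return [draw() for _ in range(m)]


rng = random.Random(0)
names = ("uniform", "exponential", "lognormal", "pareto")
samples = {name: sample_values(name, 10_000, rng) for name in names}
for name, xs in samples.items():
    print(name, "mean:", sum(xs) / len(xs), "max:", max(xs))
```

Comparing the printed maxima with the means gives an informal view of the heavy tail of the Pareto family, where a few very large values account for much of the total.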

Definition 2.1.1 (Symmetric setting). The auction setting is called symmetric if all bidders
have the same value distribution.

For the sake of clarity, we use pronouns her/she for the seller and he/his for one specific
bidder.

Assumptions on Value Distributions and Notations

If $v$ is a vector (of scalars or functions) in $\mathbb{R}^n$, we call $v_{-i}$ the vector in $\mathbb{R}^{n-1}$ that contains
all entries of $v$ except the $i$-th one, $v_i$. In other words, $v_{-i} = (v_1, \dots, v_{i-1}, v_{i+1}, \dots, v_n)$. With
a slight abuse of notation, the vector $(v_i, v_{-i})$ is identified with the vector $v$. Similarly, the
notation $F_{-i}$ will denote the product distribution of values of all bidders except $i$ and $X_{-i}$
is the Cartesian product of all $X_j$ except the $i$-th one.
We are always going to make the following key default assumption on distributions,
unless explicitly noted. Each value distribution $F_i$ is assumed to be continuous, supported
on some interval $[0, H]$, with $H$ being possibly infinite. Moreover, we assume that $f_i > 0$ on
$(0, H)$, except possibly on a set of Lebesgue measure 0. Finally, unless otherwise noted, $F_i$
is assumed to have at least one moment (i.e., the associated random variable $x_i$ has finite
expectation). This will also imply that $\psi_i$, the function defined later on in Definition 2.11 and
called the virtual value function associated with $F_i$, also has one moment.
We recall that if $G_i$ is the cdf of the largest value of a vector $\mathbf{x}_{-i}$ drawn according to
$F_{-i}$, then $G_i(t) = \prod_{j \neq i} F_j(t)$. In particular, with our key default assumption on distributions,
$G_i$ is differentiable and its density is strictly positive except possibly on a set of measure
zero. We also use the standard convention that a function $\beta$ is increasing if, for all $l < u$,
$\beta(l) < \beta(u)$; on the other hand, a non-decreasing function is such that if $l < u$, then $\beta(l) \le \beta(u)$.
We use $\partial f$ to denote the subdifferential of a convex function $f$ (see Hiriart-Urruty and
Lemaréchal, 2001, Chapter D). Finally, $\langle \cdot, \cdot \rangle$ is the standard dot product between two vectors
and for any integer $N \in \mathbb{N}$, we define $[N] = \{1, \dots, N\}$.
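
The product formula for the cdf of the largest opposing value is easy to check numerically. The sketch below compares it with a Monte Carlo estimate for two hypothetical opponents, one with uniform values on [0, 1] and one with exponential values of rate 1; the function names and the choice of distributions are assumptions made only for this example.

```python
import math
import random


def cdf_largest_opponent(t, opponent_cdfs):
    """Product of the opponents' cdfs at t: cdf of the largest of independent values."""
    prob = 1.0
    for cdf in opponent_cdfs:
        prob *= cdf(t)
    return prob


def uniform_cdf(t):
    """cdf of the uniform distribution on [0, 1]."""
    return min(max(t, 0.0), 1.0)


def exponential_cdf(t):
    """cdf of the exponential distribution with rate 1."""
    return 1.0 - math.exp(-t) if t > 0 else 0.0


def empirical_cdf_of_max(t, samplers, m, rng):
    """Fraction of m independent draws in which the largest opponent value is <= t."""
    hits = sum(1 for _ in range(m) if max(draw(rng) for draw in samplers) <= t)
    return hits / m


rng = random.Random(0)
exact = cdf_largest_opponent(0.5, [uniform_cdf, exponential_cdf])
estimate = empirical_cdf_of_max(
    0.5, [lambda r: r.random(), lambda r: r.expovariate(1.0)], 50_000, rng
)
print(exact, estimate)
```

The agreement between the two numbers relies on the independence assumption; with correlated values the product formula would no longer apply.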

2.1.1 Properties of auction systems

A bidder’s strategy is the mapping indicating which bids he sends to the seller to buy one
specific item, conditional on his private information, a.k.a., his value. Stated otherwise,
a strategy for bidder i is a mapping that maps private values in Xi to bids in Bi . We will
denote by Bi the set of all strategies of player i, βi the strategy chosen by i and β −i the
profile of strategies corresponding to all bidders except bidder i.
In Bayesian games, it is important to differentiate between ex ante, interim and ex post
properties (Hartline et al., 2013). These notions depend on the information available to
buyers when they decide to participate in the game and choose their strategy.

• For ex-ante properties, bidders do not yet know their own value for the item, i.e.,
they only know $F_i$ and $F_{-i}$.

• For interim properties, bidders know their valuation but do not know the values of
other players, i.e., they know $x_i$ and $F_{-i}$.

• For ex-post properties, bidders know both their own and the other players’ valuations, i.e.,
they know $x_i$ and $\mathbf{x}_{-i}$.

In the next sections, we will mostly focus on interim properties. It is the starting point of
most of the auction literature: we assume that value distributions are common knowledge
and that exact valuations are private information to each bidder. We will mention explicitly
when we refer to ex-ante or ex-post properties.
We are also assuming that bidders are risk-neutral. In other words, they seek to maximize
their expected utility and use utility-maximizing strategies. Given the strategy profile $\boldsymbol{\beta}_{-i}$ of
the other players, the expected utility of the strategy $\beta_i$ given bidder $i$'s value $x_i$ is denoted by

$$U_i(\beta_i, \boldsymbol{\beta}_{-i}, x_i) = \mathbb{E}_{\mathbf{x}_{-i} \sim F_{-i}}\Big[ u_i\big( (\beta_1(x_1), \dots, \beta_{i-1}(x_{i-1}), \beta_i(x_i), \beta_{i+1}(x_{i+1}), \dots, \beta_n(x_n)),\, x_i \big) \Big]\,.$$
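
The interim expected utility can be approximated by simulation: fix the value of bidder $i$, draw the opponents' values, apply every strategy, and average the resulting utilities. The sketch below does this under assumptions chosen purely for illustration: a pay-your-bid rule (`first_price_utility`), two bidders with uniform values on [0, 1], and an opponent who bids half of his value. None of these choices is claimed to be an equilibrium here.

```python
import random


def first_price_utility(i, bids, x_i):
    """Ex-post utility of bidder i under a pay-your-bid rule (ties -> lowest index)."""
    winner = max(range(len(bids)), key=lambda j: bids[j])
    return x_i - bids[i] if winner == i else 0.0


def interim_utility(i, strategies, x_i, value_samplers, ex_post_utility, m, rng):
    """Monte Carlo estimate of the interim expected utility of bidder i.

    Bidder i's value is fixed to x_i; each opponent j draws a value from
    value_samplers[j] independently, and every bidder j bids strategies[j](x_j).
    """
    n = len(strategies)
    total = 0.0
    for _ in range(m):
        values = [x_i if j == i else value_samplers[j](rng) for j in range(n)]
        bids = [strategies[j](values[j]) for j in range(n)]
        total += ex_post_utility(i, bids, x_i)
    return total / m


def uniform(r):
    return r.random()


def half(x):
    return 0.5 * x


def truthful(x):
    return x


rng = random.Random(0)
u_shade = interim_utility(0, [half, half], 0.8, [uniform, uniform],
                          first_price_utility, 100_000, rng)
u_truth = interim_utility(0, [truthful, half], 0.8, [uniform, uniform],
                          first_price_utility, 100_000, rng)
print(u_shade, u_truth)
```

In this toy setting, bidding one's value under a pay-your-bid rule always wins but leaves no surplus, whereas shading the bid trades a lower winning probability for a positive margin.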

Optimality and characterization of strategies. One of the most central concepts
in game theory is the (Bayesian) Nash equilibrium. At a Bayesian Nash equilibrium, for any
bidder $i$, his strategy $\beta_i$ maximizes his expected utility, given his valuation distribution $F_i$,
and given the strategies of his opponents and their valuation distributions, i.e., $\boldsymbol{\beta}_{-i}$ and $F_{-i}$.
A stronger concept is that of weak dominance: a strategy $\beta_i$ is weakly dominant when it is
optimal in terms of expected utility of bidder $i$ against any strategies used in $\boldsymbol{\beta}_{-i}$ and not
only those at a Bayesian Nash equilibrium. The strongest concept is ex-post dominance,
where optimality is achieved at any possible profile of valuations.

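Definition 2.1. Let $\mathcal{B}_i$ denote the set of strategies of player $i$ and $\mathcal{B}_{-i}$ the set of strategy profiles of the other players.

• Nash equilibrium: A profile of strategies $(\beta_1, \dots, \beta_n)$ is a Bayesian Nash
equilibrium if for all players $i \in N$,

$$\forall \beta \in \mathcal{B}_i,\ \forall x_i \in X_i, \quad U_i(\beta_i, \boldsymbol{\beta}_{-i}, x_i) \ge U_i(\beta, \boldsymbol{\beta}_{-i}, x_i)\,.$$

• Weak dominance: A strategy $\beta_i$ is weakly dominant for player $i \in N$ if

$$\forall \beta \in \mathcal{B}_i,\ \forall x_i \in X_i,\ \forall \boldsymbol{\beta}_{-i} \in \mathcal{B}_{-i}, \quad U_i(\beta_i, \boldsymbol{\beta}_{-i}, x_i) \ge U_i(\beta, \boldsymbol{\beta}_{-i}, x_i)\,.$$

• Ex-post weak dominance: A strategy $\beta_i$ is ex-post weakly dominant for player $i \in N$ if

$$\forall \beta \in \mathcal{B}_i,\ \forall x_i \in X_i,\ \forall \boldsymbol{\beta}_{-i} \in \mathcal{B}_{-i},\ \forall \mathbf{x}_{-i} \in X_{-i}, \quad u_i\big((\beta_i(x_i), \boldsymbol{\beta}_{-i}(\mathbf{x}_{-i})), x_i\big) \ge u_i\big((\beta(x_i), \boldsymbol{\beta}_{-i}(\mathbf{x}_{-i})), x_i\big)\,.$$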
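
The ex-post condition can be explored by brute force on a finite grid: since it must hold for every opposing strategy and value profile, it is enough to range over all opposing bid profiles. The sketch below assumes the usual second-price rule (highest bid wins and pays the highest competing bid) and, for contrast, a pay-your-bid rule; both helper functions, the tie-breaking rule (lowest index) and the grid are assumptions of this example, and a grid check is of course not a proof.

```python
import itertools


def second_price_utility(i, bids, x_i):
    """Ex-post utility: highest bid wins (ties -> lowest index), pays highest other bid."""
    winner = max(range(len(bids)), key=lambda j: bids[j])
    if winner != i:
        return 0.0
    return x_i - max(b for j, b in enumerate(bids) if j != i)


def first_price_utility(i, bids, x_i):
    """Ex-post utility: highest bid wins (ties -> lowest index), pays own bid."""
    winner = max(range(len(bids)), key=lambda j: bids[j])
    return x_i - bids[i] if winner == i else 0.0


def truthful_never_beaten(utility_fn, grid, n_opponents):
    """Check on a grid that bidding x_i is never strictly worse than any other bid.

    Bidder 0 plays against every profile of opposing bids taken from the grid.
    """
    for x_i in grid:
        for opp in itertools.product(grid, repeat=n_opponents):
            u_truth = utility_fn(0, [x_i, *opp], x_i)
            for deviation in grid:
                if utility_fn(0, [deviation, *opp], x_i) > u_truth + 1e-12:
                    return False
    return True


grid = [0.0, 0.25, 0.5, 0.75, 1.0]
print(truthful_never_beaten(second_price_utility, grid, 2))
print(truthful_never_beaten(first_price_utility, grid, 2))
```

Passing the utility function as an argument makes it straightforward to test other payment rules against the same definition.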

Those properties of strategies are classical concepts in game theory. On the other hand,
it is also possible to introduce and study different properties of mechanisms. Some of them
require the concept of “truthful bidding”, which corresponds to the specific strategy βi (x) = x.
We will denote by βi,tr this truthful strategy.

Characterization of mechanisms

Definition 2.2. A mechanism is

(BIC) Bayesian Incentive-Compatible: if bidding truthfully for all bidders is a Bayesian Nash equilibrium.

(DSIC) Dominant Strategy Incentive-Compatible: if bidding truthfully is a weakly dominant strategy for all bidders.

(Standard) if it allocates the item to the buyer with the highest bid.

(Efficient) if it allocates the item to the buyer with the highest valuation (at least at some
equilibrium).

(IR) interim Individually-Rational: if

∀ i ∈ N , ∀xi ∈ Xi , Ui (βi,tr , β −i,tr , xi ) ≥ 0 .

and ex-post Individually-Rational: if

∀ i ∈ N , ∀xi ∈ Xi , ∀x−i ∈ X −i , ui ((xi , x−i ), xi ) ≥ 0 .

A DSIC mechanism is obviously a BIC mechanism. More generally, Incentive Compati-
ble (IC) auctions have the nice property of being “simple” for the buyers from a strategic
standpoint: bidding their (known in the interim setting) valuation is optimal for them. No-
tice that this unfortunately does not ensure the uniqueness of the equilibrium where each
bidder bids truthfully (we will call this equilibrium the truthful equilibrium). See Section
2.6 for more details. Like most authors, we restrict attention to the truthful equilibrium
from now on and leave more pathological equilibria aside. As we will see later, being DSIC is
one of the main reasons explaining the tremendous success of second-price auctions in
practice. Another reason is that if bidders are bidding truthfully, then the seller can, in
a first step, elicit their value distributions through a DSIC mechanism and then move to
another mechanism that maximizes her revenue (this is detailed in Section 2.4).
Finally, before presenting and analyzing two classical types of auctions, we indicate
that individual rationality simply ensures that bidders have an interest in taking part in
these auctions.

2.2 Sealed-bid auctions

In a sealed-bid auction bidders privately send their bid to the seller. We present below two
of the most well-known sealed-bid auctions, the second-price and the first-price auctions.
This class of auctions does not include the well-known - in popular culture - ascending
auction, a.k.a., English auction, where bidders can observe bids from other bidders and
progressively choose to increase their bids until only one bidder remains. On the other
hand, if bidders’ valuations are independent, there exists a strategic-equivalence between
the ascending and the second-price auction.

2.2.1 The sealed-bid second-price auction

The second-price auction allocates the item to the highest bidder who pays the highest
bid among other bidders, i.e., the second highest bid. A key property of this auction is the
following result (Vickrey, 1961), which does not require any assumption on bidders’ value
distributions.

Theorem 2.3. The second-price auction is DSIC. In other words, bidding truthfully is
weakly dominant.

Proof. Let us denote by $x_i$ the private value of bidder i and by $b^*_{-i}$ the highest bid of the
competition. We are going to compare the utility of bidding b instead of $x_i$.

• Case $b > x_i$. The only case where his (ex-post) utility is changed is when $x_i < b^*_{-i} < b$.
With a bid b, he now wins the auction but his utility is negative since $x_i - b^*_{-i} < 0$.

• Case $b < x_i$. The only case where his (ex-post) utility is changed is when $x_i > b^*_{-i} > b$.
With a bid b, he now loses a profitable (i.e., with a positive utility) auction.

Hence, bidder i has no incentive to deviate from truthful bidding.
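
The case analysis above can be checked by brute force on a small grid. The following Python sketch is only an illustration: the function name `second_price_utility`, the grid, and the tie-breaking rule (ties resolved against the bidder) are assumptions that do not come from the text.

```python
from itertools import product

def second_price_utility(bid, value, highest_other_bid):
    """Ex-post utility of one bidder in a sealed-bid second-price auction.

    Assumption: ties are resolved against the bidder (the text does not
    specify a tie-breaking rule).
    """
    if bid > highest_other_bid:
        return value - highest_other_bid  # the winner pays the highest competing bid
    return 0.0

grid = [k / 10 for k in range(11)]  # values and bids in {0.0, 0.1, ..., 1.0}

# Truthful bidding is never worse than any deviation, whatever the competition bids.
truthful_is_weakly_dominant = all(
    second_price_utility(value, value, other) >= second_price_utility(bid, value, other)
    for value, bid, other in product(grid, repeat=3)
)
print(truthful_is_weakly_dominant)
```

The check ranges over every (value, deviation, competing bid) triple of the grid, which mirrors the "for any strategies of the others" quantifier in the definition of weak dominance.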

The classical second-price auction was used until recently (Feng et al., 2021) by most of
the biggest online platforms to sell ad placements on publishers’ websites. Another widely
used and studied sealed-bid auction is the first-price auction.

2.2.2 The sealed-bid first-price auction

The first-price auction allocates the item to the highest bidder who pays his own bid.
Before studying Nash equilibria of symmetric first-price auctions, we derive the general
best reply of player i to the bid distribution of the competition, specifically the distribution
of the maximum bid of the competition. This result is of increasing interest to practitioners
as many online auctions are now first price auctions.

Proposition 2.4. Let $G_i$ be the cdf of the highest bid of the competition of bidder i, i.e., of
$\max_{j \neq i} b_j$. In a sealed-bid first-price auction, a best response of bidder i to $G_i$ is any mapping
$\beta_i$ satisfying
$$\beta_i(x_i) \in \operatorname*{argmax}_{b \in \mathbb{R}} \; G_i(b)(x_i - b).$$
When $G_i$ is log-concave and $G_i(x_i) > 0$, the best response is unique. If we further assume
that $G_i$ has a pdf $g_i$, first order conditions also give, if $G_i(x_i) > 0$, that $\beta_i(x_i)$ is a solution (in b) of
$$b + \frac{G_i(b)}{g_i(b)} = x_i.$$
If $G_i(x_i) = 0$, a best response is $\beta_i(x_i) = x_i$.
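
As an illustration of Proposition 2.4, here is a minimal Python sketch that finds a best response by grid search. The helper names `best_response` and `uniform_cdf`, the grid resolution, and the choice of a uniform [0, 1] distribution for the highest competing bid are assumptions made for the example.

```python
def best_response(value, cdf, grid_size=10_000, max_bid=1.0):
    """Grid search for a maximizer of G(b) * (value - b) over [0, max_bid]."""
    best_bid, best_utility = 0.0, float("-inf")
    for k in range(grid_size + 1):
        bid = max_bid * k / grid_size
        utility = cdf(bid) * (value - bid)
        if utility > best_utility:
            best_bid, best_utility = bid, utility
    return best_bid

def uniform_cdf(b):
    """Assumed example: highest competing bid uniform on [0, 1], so G(b) = b."""
    return min(max(b, 0.0), 1.0)

bid = best_response(0.8, uniform_cdf)
# With G(b) = b and g(b) = 1, the first-order condition b + G(b)/g(b) = x reads 2b = x.
print(bid, 2 * bid)
```

A grid search is used rather than a root finder on the first-order condition because it also covers the cases where $G_i$ has no density.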

Calling $Y_{-i}$ a random variable with cdf $G_i$, it can also be shown under mild technical
conditions that $\beta_i$ is increasing and satisfies the equation $\beta_i(x) = \mathbb{E}\big[\beta_i^{-1}(Y_{-i}) \mid Y_{-i} \le \beta_i(x)\big]$.
We also have the following interesting corollary.

Corollary 2.5. Proposition 2.4 implies that the first price auction is in general not BIC.

The corollary simply follows by showing that the best response of bidder i when all
other bidders bid truthfully (and hence the top bid of the competition is the largest value
of the other bidders) consists in bidding something other than xi . If Gi (xi ) > 0 it follows
immediately that there are better strategies than bidding bi = xi : for instance, take any
b̃i such that Gi (b̃i ) = Gi (xi )/2 (b̃i exists by continuity of Gi and Gi (ai ) = 0). The utility of
bidder i is strictly positive at b̃i and is 0 at xi .

Proof. Let us denote by Y−i the random variable corresponding to the maximum bid of the
competition of bidder i, so that its cdf and pdf are Gi and gi .
When bidder i has private value $x_i$ and bids $b_i$, the utility he derives from the auction
is $u_i(b_i, x_i) = (x_i - b_i)\mathbf{1}\{b_i > Y_{-i}\}$; in other words, it is his value minus his cost when he wins
the auction and zero otherwise.
We denote by $U_i(b_i, x_i) : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}$ the associated expected utility of bidder i when
his private value is $x_i$ and he bids $b_i$. We have
$$U_i(b_i, x_i) = \mathbb{E}_{Y_{-i}}\big[(x_i - b_i)\mathbf{1}\{b_i > Y_{-i}\}\big] = G_i(b_i)(x_i - b_i).$$

A best response is therefore any $b(x_i) \in \operatorname{argmax}_{b \in \mathbb{R}_+} G_i(b)(x_i - b)$. Note that when Gi is
log-concave and Gi (xi ) > 0, we can verify by inspection that Ui (bi , xi ) is strictly log-concave
in bi on the support of Fi and therefore it has a unique maximum smaller than xi (Boyd
and Vandenberghe, 2004). This property follows also immediately from the definition of a
strictly concave function.
Since $G_i$ has a pdf, it is continuous and differentiable and therefore so is $U_i(b_i, x_i)$ as a
function of $b_i$. The derivative with respect to $b_i$ is then equal to
$$\frac{\partial U_i}{\partial b_i}(b_i, x_i) = g_i(b_i)(x_i - b_i) - G_i(b_i).$$
Let us assume that $G_i(x_i) > 0$; then $U_i(x_i, x_i) = 0$ but $\frac{\partial U_i(b_i, x_i)}{\partial b_i}\Big|_{b_i = x_i} < 0$. This first implies by
continuity that there exist bids where the utility is positive. Rolle's theorem applied to
$t \mapsto G_i(t)(x_i - t)$ also gives the existence of a stationary point $b_i^* \le x_i$ where $\frac{\partial U_i}{\partial b_i}(b_i^*, x_i) = 0$,
as $U_i(0, x_i) = 0$, too (since we assumed non-negative bids). Since we showed above that
$U_i(b_i, x_i)$ is positive somewhere in a neighborhood of $x_i$, then necessarily $U_i(b_i^*, x_i) > 0$.
Finally, if $g_i(b_i^*) = 0$ then this would imply that $G_i(b_i^*) = 0$ to satisfy the first order condition
and therefore $U_i(b_i^*, x_i)$ would be equal to 0, which is impossible.
The case where Gi (xi ) = 0 is trivial as bidder i cannot have a positive utility (recall that
Gi is a non-decreasing and non-negative function, so 0 ≤ Gi (bi ) ≤ Gi (xi ) if bi ≤ xi ). Bidding
xi is then optimal.

There is a very rich line of work focusing on deriving Nash equilibria in first-price
auctions when bidders have different value distributions. This involves solving complex
systems of coupled first-order differential equations (at least with continuous value distri-
butions, see Krishna, 2009, Section 4.3; see also p. 18). On the other hand, with symmetric
bidders, i.e., with identical value distributions, it is possible to solve explicitly this system
of equations and to derive the unique symmetric Nash equilibrium with increasing strategy.
From now on, we will call a Nash equilibrium increasing if the strategies are all increasing
mappings.

Theorem 2.6. In the symmetric case, if the common pdf f is such that f (x) > 0 (except on
a set of Lebesgue measure 0 within the support of F), there exists a symmetric increasing
Nash equilibrium whose strategy is described by:
$$\beta(x) = \mathbb{E}\big[x^{(1)}_{-i} \mid x^{(1)}_{-i} < x\big],$$
where $x^{(1)}_{-i}$ is the highest value among bidders except bidder i.

This bidding strategy can be interpreted as bidding the expectation of the largest value
of the competition, conditionally on the fact that this value is smaller than bidder i’s value.
We note that this bidding strategy can be derived from the proof of the revenue-equivalence
Theorem 2.8, and specifically the expected payment formula. This is another common
method for finding equilibrium bidding strategies. The proof presented below might lead
more directly to the solution.

Example. Suppose there are n ≥ 2 bidders, and they all have uniform [0,1] value distribution,
i.e., F(x) = x on [0, 1]. Then a symmetric increasing Nash equilibrium exists in first-price
auctions where all bidders bid using the strategy
$$\beta(x) = \frac{n-1}{n}\,x.$$
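
The closed form in this example can be compared with the conditional expectation of Theorem 2.6 evaluated numerically. The sketch below is an illustration only: the function name `equilibrium_bid` and the midpoint Riemann sum are choices made here, not part of the text.

```python
def equilibrium_bid(x, n, steps=20_000):
    """E[max of n-1 uniform values | max < x], by a midpoint Riemann sum.

    Uses G(y) = y**(n-1) and g(y) = (n-1) * y**(n-2) for uniform [0, 1] values.
    """
    if x <= 0:
        return 0.0
    h = x / steps
    numerator = 0.0
    for k in range(steps):
        y = (k + 0.5) * h  # midpoint of the k-th sub-interval of [0, x]
        numerator += y * (n - 1) * y ** (n - 2) * h
    return numerator / x ** (n - 1)

for n in (2, 3, 5):
    # Numerical conditional expectation next to the closed form (n - 1) / n * x.
    print(n, equilibrium_bid(0.6, n), (n - 1) / n * 0.6)
```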
Proof. We assume that n ≥ 2 and that all bidders are using the function β described
above. As we will show below, this function is increasing on the support of F under our
assumptions. Furthermore, when all bidders are using the same increasing strategy on the
union of the support of their value distributions, the probability that bidder i wins the
auction is the same as the probability that he has the highest value; this would not always
be true if the strategy were only non-decreasing.
Furthermore, elementary properties of conditional expectations give, if $G_i$ and $g_i$ are
the cdf and pdf of $x^{(1)}_{-i}$,
$$\beta(x) = \beta_i(x) = \frac{\int_0^x y\, g_i(y)\,dy}{G_i(x)} = x - \frac{\int_0^x G_i(u)\,du}{G_i(x)}, \quad \text{whenever } G_i(x) > 0.$$
Under our assumptions, $G_i = F^{n-1}$, thus $g_i(x) = (n-1) f(x) F^{n-2}(x)$ and $g_i(x) > 0$ on the
support of F, except possibly on a set of Lebesgue measure 0. As a consequence,
$$\beta'(x) = \frac{g_i(x) \int_0^x G_i(u)\,du}{[G_i(x)]^2}$$
and therefore, restricted to the support of F, β is a non-decreasing function whose derivative
is 0 on a set of Lebesgue measure 0. We conclude that β is actually increasing on the
support of F. Since the latter is supposed to be [0, H], with H possibly infinite, bidder i
has no incentive to bid higher than β(H). Then, any other bid b will satisfy b ∈ [0, β(H)]
and since β(0) = 0 and β is continuous, because F is continuous, there must exist z ∈ [0, H]
such that b = β(z). Finally, note that the probability that bidder i wins the auction when
bidding β(z) is just $G_i(z)$, since β is increasing on the support of F. Therefore,

$$\begin{aligned}
U_i(\beta(x), x) - U_i(b, x) &= U_i(\beta(x), x) - U_i(\beta(z), x) \\
&= G_i(x)\left(x - \frac{\int_0^x y\, g_i(y)\,dy}{G_i(x)}\right) - G_i(z)\left(x - \frac{\int_0^z y\, g_i(y)\,dy}{G_i(z)}\right) \\
&= x\,\big(G_i(x) - G_i(z)\big) + \int_x^z y\, g_i(y)\,dy = \int_x^z (y - x)\, g_i(y)\,dy \ge 0.
\end{aligned}$$

This shows that β is the best response, and thus (β, . . . , β) is a symmetric (increasing)
Nash equilibrium. We will prove uniqueness of this increasing differentiable symmetric Nash
equilibrium in symmetric first-price auctions in Section 2.3.

Unlike second-price auctions, first price auctions are not incentive compatible (see
e.g. Corollary 2.5). As a consequence, the strategy of a bidder at an equilibrium depends
on the bidding strategy of the other bidders, and ultimately on other bidders’ valuation
distributions (see Proposition 2.4). So, in practice, computing a good or optimal bidding
strategy would require estimating the distribution of the highest bid of the competition,
which can be very challenging. Nevertheless, because of their relative transparency for
bidders (who know ahead of time what they might pay if they win), first-price auctions
are increasingly used in online advertising auctions (Feng et al., 2021). However, optimal
bidding becomes much more complex for bidders than it is in second price or other
BIC/DSIC/“truthful” auctions.
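
To make the estimation step concrete, here is a hedged Python sketch in which a bidder replaces the unknown $G_i$ by the empirical cdf of observed highest competing bids and best-responds to it. The function name `empirical_best_response`, the simulated data and the tie convention are all assumptions for the example; this is a plug-in illustration, not a method proposed by the text.

```python
import random

def empirical_best_response(value, competing_bids):
    """Bid maximizing G_hat(b) * (value - b), with G_hat the empirical cdf.

    G_hat only jumps at observed bids and the objective decreases between
    jumps, so it suffices to scan the observed bids below `value`.
    Assumption: matching an observed competing bid counts as a win.
    """
    sorted_bids = sorted(competing_bids)
    m = len(sorted_bids)
    best_bid, best_utility = 0.0, 0.0
    for rank, b in enumerate(sorted_bids, start=1):
        if b >= value:
            break  # bidding at or above one's value never yields positive utility
        utility = (rank / m) * (value - b)  # rank / m: share of samples <= b (no ties assumed)
        if utility > best_utility:
            best_bid, best_utility = b, utility
    return best_bid

# Hypothetical data: 3 competitors with uniform [0, 1] values, each bidding
# (n - 1) / n times their value with n = 4, as in the symmetric equilibrium.
rng = random.Random(0)
observed = [0.75 * max(rng.random() for _ in range(3)) for _ in range(20_000)]
bid = empirical_best_response(0.8, observed)
print(bid)  # to be compared with 0.75 * 0.8 = 0.6, up to sampling noise
```

The objective is flat around its maximizer, so the estimated bid is noticeably noisier than the estimated utility; this is one reason why the estimation problem is described as challenging.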

Nash equilibrium in the asymmetric case Asymmetric first price auctions are much
more intricate than symmetric ones as the equilibrium strategy of each bidder depends in
a very subtle manner on the other bidders’ strategies (Lebrun, 1999). Indeed, let us assume
that the distributions F1 , . . . , Fn are supported on [h, H], have a density bounded away from
0 on (h, H) and possibly have a point mass at h.

Theorem 2.7 (Lebrun, 1999). Under these assumptions, there exist deterministic Nash
equilibrium strategies that are increasing. Let us denote them by $\beta_i$ and by $x_i(b) = \beta_i^{-1}(b) \ge b$
their inverse, i.e., the value inducing the bid b. Then the increasing functions $b \mapsto x_i(b)$
solve the system of differential equations:
$$\forall i \in N, \quad \frac{\partial}{\partial b} \log\big(F_i(x_i(b))\big) = \frac{-1}{x_i(b) - b} + \frac{1}{n-1} \sum_{j=1}^{n} \frac{1}{x_j(b) - b}. \tag{2.1}$$

When the distributions are without atoms, the boundary conditions are xi (h) = h for all
i ∈ N , and there exists η > 0 such that for all i, βi (H) = η.

The boundary conditions mean that bidders bid their value at h and have a common
maximal bid (Fibich and Gavious, 2003).
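
As a sanity check on the system (2.1), one can specialize it to the symmetric case and recover the first-order condition of the symmetric first-price equilibrium. The short derivation below is a sketch added for illustration; it assumes $F_i = F$ with pdf $f$, a common inverse strategy $x_j(b) = x(b)$, and the notation $G = F^{n-1}$, $g = (n-1) f F^{n-2}$.

```latex
\begin{align*}
% All n terms of the sum in (2.1) are equal in the symmetric case.
\frac{\partial}{\partial b}\log F(x(b))
  &= -\frac{1}{x(b)-b} + \frac{n}{n-1}\cdot\frac{1}{x(b)-b}
   = \frac{1}{(n-1)\,\bigl(x(b)-b\bigr)} \\
% Expand the left-hand side as f(x) x'(b) / F(x), then use b = \beta(x) and x'(b) = 1/\beta'(x).
\Longrightarrow\quad (n-1)\,\frac{f(x)}{F(x)}\,\bigl(x-\beta(x)\bigr) &= \beta'(x) \\
% Multiply by G = F^{n-1} and use g/G = (n-1) f / F.
\Longrightarrow\quad \bigl(x-\beta(x)\bigr)\,g(x) - \beta'(x)\,G(x) &= 0 .
\end{align*}
```

The last line is equivalent to $(\beta(x) G(x))' = x\, g(x)$, whose solution with $G(0) = 0$ is the strategy of Theorem 2.6.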

Numerical issues in computing Nash equilibrium Finding the solution to the differen-
tial system (2.1) is considered hard essentially because the solutions are unstable near the
boundary (Marshall et al., 1994). The case of n = 2 bidders has received a fair amount
of attention from both theoretical and numerical perspectives (Fibich and Gavish,
2012). Another approach finds yet another form of the differential system of equations
and expands the functions appearing in it in a fixed polynomial basis (Gayle and Richard,
2008).

2.3 A revenue equivalence theorem

Revenue equivalence theorems are general results showing that different auction systems
– sharing nonetheless some properties – are equivalent in terms of expected revenue at
a specific equilibrium. The first revenue equivalence theorem showed that the sealed-bid
second-price auction and the ascending/English auction lead to the same revenue for the
seller (Vickrey, 1961). This result was later extended to all standard auctions - under rather
minimal assumptions.

Theorem 2.8. Consider the family of auctions that are both

i) standard (i.e., the winner is the bidder with the highest bid)
ii) 0-rational (i.e., winning the auction with a bid of 0 induces a payment of 0).
When all bidders have the same value distribution, i.e., in the symmetric case, the seller’s
expected revenue and bidders’ expected utilities at a symmetric increasing Nash equilib-
rium are independent of the specific payment rule.

Remark. The revenue equivalence Theorem 2.8 applies in particular to first and second
price auctions, in the symmetric case where all bidders have the same value distribution
(Myerson, 1981) and use an increasing strategy at equilibrium. The assumption of identical
value distributions matters: there exist asymmetric cases where the first-price auction
brings more revenue than the second-price auction
and vice versa (Krishna, 2009, Section 4.3.2). We give in Section 2.6 an example showing
that the assumption that β is increasing is crucial and cannot be dispensed with.

Proof. Given a specific standard auction, let β be a strategy at an increasing symmetric
Nash equilibrium. Let us denote by $P_i(t)$ the corresponding expected payment of bidder i
when he bids β(t) and the other bidders are using the same strategy β. As before, we denote
by $G_i$ the distribution of $x^{(1)}_{-i}$, the highest value among all the bidders except bidder i. Using
the fact that β is increasing, that all players use this strategy - at the Nash equilibrium -
and the fact that the auction is standard, the probability that bidder i wins when he bids
β(t) is $G_i(t)$.
We can still assume that a deviation of bidder i consists in bidding β(z) instead of β(xi )
because the auction is standard and β is increasing. In particular, the expected utility for
bidder i of this deviation can be written as

Ui (z, xi ) = xi Gi (z) − Pi (z) .

Since β is a strategy corresponding to a Nash equilibrium, bidder i’s expected utility is
maximized when he bids β(xi ) and hence
$$U_i(x_i, x_i) = \max_z U_i(z, x_i) = \max_z \big\{ x_i G_i(z) - P_i(z) \big\}.$$

We now introduce the mapping $V_i(x_i) = U_i(x_i, x_i)$ that is convex, as the maximum of
linear functions, and hence almost everywhere differentiable (Hiriart-Urruty and
Lemaréchal, 2001). Recall also that (Lemma 4.4.1 in Hiriart-Urruty and Lemaréchal,
2001) the subdifferential at some x of a supremum of convex functions contains the
convex hull of the subdifferentials of the functions achieving this supremum (and is empty
if the supremum is not achieved). Before proving formally the result, let us give some
intuition. The function $V_i$ we consider is the maximum (over z) of linear mappings in x,
whose derivatives are simply $G_i(z)$. As a consequence, when $V_i$ is differentiable at x it holds
that $V_i'(x) = G_i(z^*(x))$, where $z^*(x) = \operatorname{argmax}_z U_i(z, x)$, which suggests that “$V_i'(x) = G_i(x)$”.
This intuition is formalized thanks to the envelope theorem (Milgrom and Segal, 2002)
that holds because $U_i(z, \cdot)$ is linear for all z and hence differentiable and therefore absolutely
continuous. Furthermore, $\frac{\partial U_i(z, x)}{\partial x} = G_i(z) \in [0, 1]$ for all z. Since the maximum of $z \mapsto U_i(z, x_i)$
is attained for $z = x_i$ at the Nash equilibrium, the envelope theorem finally states
$$V_i(x) - V_i(0) = \int_0^x G_i(u)\,du.$$

However, it also holds that $V_i(x) - V_i(0) = x G_i(x) - P_i(x) + P_i(0)$. Thus, since $P_i(0) = 0$ because
of 0-rationality, we finally get, integrating by parts,
$$P_i(x) = P_i(0) + x G_i(x) - \int_0^x G_i(u)\,du = \int_0^x t\, dG_i(t) = G_i(x)\, \mathbb{E}\big[x^{(1)}_{-i} \mid x^{(1)}_{-i} < x\big].$$
Hence the expected payment of bidder i is independent of the specific auction format. As
a consequence, so are the expected seller’s revenue and the expected utility of bidder i,
because the auction is standard.
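
Revenue equivalence can be illustrated by simulation. The Python sketch below compares the seller's average revenue in a first-price auction, where bidders use the symmetric equilibrium strategy (n−1)/n · x of the uniform example, with a second-price auction with truthful bids. The function name `simulate_revenues`, the uniform [0, 1] values, the sample size and the seed are assumptions for the example.

```python
import random

def simulate_revenues(n_bidders, n_rounds=100_000, seed=0):
    """Average seller revenue in first- and second-price auctions (n_bidders >= 2)."""
    rng = random.Random(seed)
    first_price_total = second_price_total = 0.0
    for _ in range(n_rounds):
        values = sorted(rng.random() for _ in range(n_bidders))
        # First price: everyone bids (n-1)/n * value and the winner pays his own bid.
        first_price_total += (n_bidders - 1) / n_bidders * values[-1]
        # Second price: everyone bids truthfully and the winner pays the second-highest value.
        second_price_total += values[-2]
    return first_price_total / n_rounds, second_price_total / n_rounds

first_price_revenue, second_price_revenue = simulate_revenues(3)
# Both averages should be close to (n - 1) / (n + 1), up to Monte Carlo error.
print(first_price_revenue, second_price_revenue, (3 - 1) / (3 + 1))
```

The two formats share the same draws of values in each round, so the comparison isolates the effect of the payment rule.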

Remark. This proof shows a principled way to get necessary conditions on bidding strate-
gies forming an increasing Nash equilibrium, through the payment formula derived above.

Informal derivation of symmetric first-price auction equilibrium strategy

The proof of Theorem 2.8 is a bit formal and technical as it relies on convex analysis
arguments; however, it provides insight on how to easily and informally derive symmetric
equilibrium strategies. Denote by β the common strategy of an increasing Nash equilibrium
of the first-price auction, postulated at this point for this informal derivation to exist. Note
that by symmetry, Gj = Gi = G for all i, j and similarly for the pdfs. So we use the notations
G and g for cdf and pdf below. Conducting the same computations as in the proof of
Theorem 2.8, assume that all bidders but i follow β - i.e., bidder j ≠ i bids β(xj ) if xj is his
value - and that bidder i is bidding β(z) instead of β(xi ). Note that because all players are
using the strategy β and β is increasing, the probability that bidder i wins the auction is
exactly the probability that z is higher than the largest value of the competition. In other
words, the probability that he wins the auction is G(z). His utility in the specific case of a
first price auction can then be written as

Ui (β(z), xi ) = (xi − β(z))G(z) .

Since β is a symmetric increasing Nash equilibrium, the maximum utility is attained
by bidding b = β(xi ). Differentiating with respect to z, temporarily assuming that β is
differentiable, then yields
$$0 = \frac{\partial U_i}{\partial z}(\beta(z), x_i)\Big|_{z = x_i} = (x_i - \beta(x_i))\, g(x_i) - \beta'(x_i)\, G(x_i).$$
Notice that the above equation can be rewritten in the more compact form
$$(\beta(x) G(x))' = x\, g(x).$$
Integrating the above equation and using the fact that G(0) = 0, we can compute explicitly
the symmetric equilibrium strategy
$$\beta(x) = \frac{\int_0^x s\, g(s)\,ds}{G(x)} = \mathbb{E}\big[x^{(1)}_{-i} \mid x^{(1)}_{-i} < x\big].$$
We can verify a posteriori that β is differentiable and increasing when g > 0.

This also proves the uniqueness of increasing differentiable Nash equilibria in a sym-
metric first-price auction.

2.4 Deriving revenue-maximizing auctions
We now focus on how the seller can design her auction system to maximize her revenue. A
large part of the recent literature on auctions has focused on this objective since most
auctioneers have a choice in designing the rules of their respective auction platforms.
To compute the revenue-maximizing auction, we assume that the seller has
prior knowledge of the distribution Fi of each bidder’s valuation. These value distributions
quantify the information that the seller has on each bidder.

2.4.1 The role of reserve prices

Setting reserve prices is a crucial tool used to improve or maximize seller’s revenue in
classical auctions. To illustrate its role, we focus on second-price auctions.
Definition 2.9. The reserve price is the minimum price a bidder must pay to acquire an
item. The reserve price is anonymous (respectively, personalized) if it is the same for all
bidders (resp., if it is different from one bidder to the other).
In a second-price auction with reserve prices the buyer that wins the auction must have
bid above his reserve price. He then pays the maximum between his reserve price and the
second highest bid. A crucial and perhaps surprising point is that, in expectation, it can be
beneficial for the seller to sometimes not allocate the item. However, whenever the item is
sold, the payment is higher with this reserve price than without.
To illustrate this point, we quickly recall a historical example, the New-Zealand radio
spectrum rights auction (Milgrom, 2004). In 1990, the NZ government decided to sell some
radio spectrum rights through multiple simultaneous sealed-bid second-price auctions
without reserve prices for the corresponding licenses. These auctions were expected to
raise around NZ$250 million. Instead, the government revenue was around NZ$36 million.
On many licenses, there was a huge discrepancy between the first and the second bid. For
instance, a firm bid NZ$100 000 and the second price was only NZ$36. Had reserve prices
been set beforehand, the government would have been guaranteed that the firm which
bid NZ$100 000 paid a sufficiently high price. This example illustrates
the importance of setting reserve prices. A first way to do it would be to assume that the
seller can compute an intrinsic value for keeping the item. This could be a reasonable
assumption for housing or wine auctions. However, in many other situations (TV rights,
radio spectrum rights auctions...), the seller does not have any intrinsic value for the item.
For illustration purpose, we will compute in the following the optimal reserve price in
the simple case of a single buyer. As before, we denote by F the cdf of the buyer valuation,
and for the sake of simplicity, we assume it has a density f (again, all the following
results generalize to arbitrary distributions, but at the cost of technicalities) and has finite
expectation.

2.4.2 The posted price setting: monopoly pricing
In this particularly simple though practically very common setting, designing an auction
simply reduces to a take-it-or-leave-it offer, also called posted price. In other words, a fixed
selling price is offered and a rational bidder will accept to pay it to acquire the item if and
only if the price is smaller than his valuation.
Lemma 2.10. In a posted price setting, the seller’s expected revenue is
$$\Pi(r) = r(1 - F(r)). \tag{2.2}$$
In the same setting, when F is differentiable and has finite expectation, the optimal reserve
price, called the monopoly price, is a solution of:
$$0 = -[r(1 - F(r))]' = r f(r) - (1 - F(r)) = f(r)\left(r - \frac{1 - F(r)}{f(r)}\right).$$
Proof. In expectation, the seller’s revenue is just the fixed price r multiplied by the
probability that the buyer buys the item.
This latter probability is just the probability that the value of the buyer is above r. Formally,
if x is the value of the item for the buyer, the revenue of the seller Π(r) when selling at the
fixed price r is
$$\Pi(r) = r\,\mathbb{P}\{x \ge r\} = r \int_r^{+\infty} f(x)\,dx = r(1 - F(r)).$$
The result on the monopoly price follows from differentiating the previous relation, since
$$\frac{d\Pi(r)}{dr} = (1 - F(r)) - r f(r).$$
Furthermore, choosing r = 0 or r arbitrarily high gives 0 revenue, the latter because F is
assumed to have finite expectation, which then implies that $\lim_{t\to\infty} t(1 - F(t)) = 0$ by the
dominated convergence theorem. Indeed, if x has distribution F, $t(1 - F(t)) \le \mathbb{E}_{x\sim F}[x \mathbf{1}\{x \ge t\}] \le \mathbb{E}_{x \sim F}[x] < \infty$.
Hence a maximum of the function Π exists among its stationary points,
finishing the proof.
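
The monopoly price of Lemma 2.10 can also be located numerically. The sketch below is a minimal illustration: the function name `monopoly_price`, the grid search, and the two example distributions (uniform on [0, 1] and exponential with mean 2) are assumptions chosen here.

```python
import math

def monopoly_price(cdf, upper, grid_size=100_000):
    """Grid search for the maximizer of Pi(r) = r * (1 - F(r)) on [0, upper]."""
    best_price, best_revenue = 0.0, 0.0
    for k in range(grid_size + 1):
        r = upper * k / grid_size
        revenue = r * (1.0 - cdf(r))
        if revenue > best_revenue:
            best_price, best_revenue = r, revenue
    return best_price, best_revenue

# Uniform [0, 1]: Pi(r) = r (1 - r), whose stationary point is r = 1/2.
uniform_price, uniform_revenue = monopoly_price(lambda r: min(r, 1.0), upper=1.0)
# Exponential with mean 2: Pi(r) = r exp(-r / 2), whose stationary point is r = 2.
exp_price, exp_revenue = monopoly_price(lambda r: 1.0 - math.exp(-r / 2.0), upper=50.0)
print(uniform_price, uniform_revenue, exp_price, exp_revenue)
```

The search interval `upper` has to be chosen by hand for unbounded supports, which is harmless here because Π(r) vanishes for large r under the finite-expectation assumption.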

Notice that without the assumption that F has a finite expectation, the optimal reserve
price could be arbitrarily high: take for instance $F(r) = 1 - r^{-\alpha}$ with 0 < α < 1. Such an
example might actually be relevant in luxury items markets.
One of the purposes of discussing the single bidder case was to introduce organically
the crucial concept of virtual value (Myerson, 1981).
Definition 2.11. The virtual value function ψ : X → R of a distribution F (with pdf f ) is:
$$\psi(x) = x - \frac{1 - F(x)}{f(x)}. \tag{2.3}$$

The virtual value function can be either positive or negative, irrespective of the support
of the value distribution. The expectation under F of the virtual value, i.e., Ex∼F [ψ(x)],
is actually equal to the infimum of the support of F, when F has finite expectation. In
particular, it is equal to 0 if the support of F “starts" at 0.
The virtual value ψ(x) is a crucial concept that can be interpreted as virtual payment, as
we explain now. If the bidder has value x and decides to buy the item, so x ≥ r, his (virtual)
payment can be thought of as ψ(x), independently of the price r set by the seller. Indeed,
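the revenue generated by such a price r is
$$\begin{aligned}
\mathbb{E}_{x \sim F}\big[\psi(x)\mathbf{1}\{x \ge r\}\big] &= \int_r^\infty \psi(x) f(x)\,dx = \int_r^\infty \big(x f(x) - (1 - F(x))\big)\,dx \\
&= -\int_r^\infty \big(x(1 - F(x))\big)'\,dx = r(1 - F(r)) = \Pi(r).
\end{aligned}$$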

The last equality comes from the definition of Π(r), see Equation (2.2). As a consequence,
even though it is traditionally called virtual value, ψ(x) could rather be understood as a
virtual payment: the buyer pays on average ψ(x) when his value is x and he buys/wins
the item, i.e., x ≥ r. See also Proposition 2.29 for an explanation of why this interpretation
holds for general auction systems and buyers optimizing their expected utility.
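
The identity between expected virtual payment and posted-price revenue can be checked by simulation. The sketch below is an illustration under assumptions made here: an exponential value distribution with a given mean (for which ψ(x) = x − mean), a hypothetical function name `virtual_revenue_mc`, and an arbitrary sample size and seed.

```python
import math
import random

def virtual_revenue_mc(reserve, mean, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[psi(x) 1{x >= reserve}] for exponential values."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.expovariate(1.0 / mean)  # expovariate takes the rate, i.e. 1 / mean
        if x >= reserve:
            total += x - mean  # virtual value of the exponential distribution
    return total / n_samples

reserve, mean = 1.0, 1.0
estimate = virtual_revenue_mc(reserve, mean)
posted_price_revenue = reserve * math.exp(-reserve / mean)  # r * (1 - F(r))
print(estimate, posted_price_revenue)
```

The two printed numbers should agree up to Monte Carlo error, even though no individual sample ever "pays" its virtual value.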

Examples of virtual value functions:

• if x ∼ U([0, 1]), the uniform distribution over [0, 1], then ψ(x) = 2x − 1 for x ∈ [0, 1].

• if x ∼ Exp(λ), the exponential distribution, then ψ(x) = x − λ for x ∈ R+ (F(x) = 1 − exp(−x/λ)).

• Generalized Pareto (GP) distributions, parametrized by (µ, σ , ξ) where σ > 0 and ξ ≤ 0, have cdf
$$F_{\mu,\xi,\sigma}(x) = \begin{cases} 1 - \left(1 + \dfrac{\xi(x-\mu)}{\sigma}\right)^{-1/\xi} & \text{for } \xi < 0, \\[2mm] 1 - e^{-(x-\mu)/\sigma} & \text{for } \xi = 0. \end{cases}$$
Their virtual value is affine (Balseiro et al., 2020):
$$\psi_{\mu,\xi,\sigma}(x) = (1 - \xi)x + \xi\mu - \sigma.$$
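
These closed forms can be checked directly against Definition 2.11. In the Python sketch below, the helper names (`virtual_value`, `gp_cdf`, `gp_pdf`) and the parameter values are illustrative assumptions; the generalized Pareto density is obtained by differentiating the cdf given above.

```python
import math

def virtual_value(x, cdf, pdf):
    """psi(x) = x - (1 - F(x)) / f(x), straight from the definition."""
    return x - (1.0 - cdf(x)) / pdf(x)

def gp_cdf(x, mu, sigma, xi):
    if xi == 0:
        return 1.0 - math.exp(-(x - mu) / sigma)
    return 1.0 - (1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi)

def gp_pdf(x, mu, sigma, xi):
    # Derivative of gp_cdf with respect to x.
    if xi == 0:
        return math.exp(-(x - mu) / sigma) / sigma
    return (1.0 + xi * (x - mu) / sigma) ** (-1.0 / xi - 1.0) / sigma

mu, sigma, xi = 0.0, 1.0, -0.5  # illustrative parameters; the support is then [0, 2]
for x in (0.2, 1.0, 1.5):
    from_definition = virtual_value(
        x, lambda t: gp_cdf(t, mu, sigma, xi), lambda t: gp_pdf(t, mu, sigma, xi)
    )
    affine_formula = (1 - xi) * x + xi * mu - sigma
    print(x, from_definition, affine_formula)
```

The same `virtual_value` helper applies to the uniform and exponential examples by passing their cdf and pdf.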

Relationship between monopoly price and virtual value/payment

Lemma 2.10 states that the optimal reserve price against a single bidder (which was
called the monopoly price) is, in the case where the virtual value/payment function ψ is
increasing and changes sign, necessarily the root of ψ (or the point where the sign changes
if ψ is not continuous): if Π(t) is the expected revenue of the seller at reserve price t (see
Equation (2.2)),
$$\Pi'(t) = -f(t)\psi(t).$$
If ψ is strictly positive everywhere, which can happen if the infimum of the support of F
is positive, then the optimal reserve price is that infimum (or equivalently 0). Quite
interestingly, even with multiple other bidders, the optimal reserve price for bidder i is
still i’s monopoly price, i.e., the same as if he were the only bidder. This is illustrated
in the following Section 2.4.3 under the same assumptions of ψ being increasing and/or
changing sign once.
As a consequence, in the following, our main focus will be on these distributions, which
are called regular. The results we will prove can be generalized to non-regular distributions
with a technique called ironing, see Section 2.4.6.

Definition 2.12. The distribution F is regular if its corresponding virtual value ψ is increasing.

The uniform, exponential and generalized Pareto distributions with ξ < 1 are all
regular distributions.

2.4.3 Optimal reserve prices in a second-price auction

We focus in this section on second-price auctions with reserve prices (Riley and Samuelson,
1981b) where n buyers are asymmetric, i.e., their value distribution can be different. As
a consequence, the seller might also set different, personalized, reserve prices so as to
increase her revenue. When introducing the concept of reserve price, we mentioned that
a bidder can only win the auction if his bid was higher than his reserve price and that
the latter is the minimal payment that bidder might pay. There however remains some
ambiguity on how the auction unfolds (depending on which condition “highest bidders"
or “bid above reserve price" is checked first). As a consequence, there exist at least two
different types of second-price auction with reserve prices.

“Lazy” 2nd-price auction: The winner can only be the highest bidder. He gets the item
only if he clears his reserve price (i.e., he bids above it), and pays the maximum
between his reserve price and the second highest bid overall (regardless of whether
the second highest bid cleared its reserve).

“Eager” 2nd-price auction: Bidders that have not cleared their respective reserve price
are disregarded. Thus the winner is the highest bidder amongst those that have
cleared their reserve price and he pays the maximum between his reserve price and
the second highest cleared bid.
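
The difference between the two formats is easiest to see on a concrete profile of bids. The following Python sketch implements both rules as described above; the function names, the return convention `(winner, payment)` with `None` when the item is not sold, the use of `>=` for "clearing" a reserve, and the lowest-index tie-breaking are assumptions made for the example.

```python
def lazy_second_price(bids, reserves):
    """Lazy rule: only the highest bidder can win, and only if he clears his reserve."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner = order[0]
    if bids[winner] < reserves[winner]:
        return None, 0.0  # item not allocated
    second = bids[order[1]] if len(bids) > 1 else 0.0  # second highest bid overall
    return winner, max(reserves[winner], second)

def eager_second_price(bids, reserves):
    """Eager rule: bidders below their reserve are discarded before ranking."""
    cleared = [i for i in range(len(bids)) if bids[i] >= reserves[i]]
    if not cleared:
        return None, 0.0
    cleared.sort(key=lambda i: bids[i], reverse=True)
    winner = cleared[0]
    second = bids[cleared[1]] if len(cleared) > 1 else 0.0  # second highest cleared bid
    return winner, max(reserves[winner], second)

bids = [0.9, 0.6, 0.5]
print(lazy_second_price(bids, [1.0, 0.3, 0.7]), eager_second_price(bids, [1.0, 0.3, 0.7]))
print(lazy_second_price(bids, [0.2, 0.7, 0.1]), eager_second_price(bids, [0.2, 0.7, 0.1]))
```

In the first profile the lazy auction does not sell the item while the eager one does; in the second both sell to the same bidder but at different prices, because the second highest bid did not clear its own reserve.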

First of all, notice that if the reserve prices are anonymous, i.e., the same for all bidders,
as they should be in the symmetric case for instance, both types of auctions coincide.
Optimal reserve prices are easy to compute in lazy auctions, as they have an explicit
form. They are on the other hand hard to compute for eager auctions. Moreover, the eager
2nd-price auction is also not the revenue-maximizing auction for the seller. So this concept
is neither simple (as is the lazy auction) nor optimal (as is the Myerson auction, see Section
3.3.1). As a consequence, we will not put too much emphasis on eager auctions. In practice,
if one wishes to implement eager 2nd-price auctions, a good idea would be to use the
reserve prices of the corresponding lazy 2nd-price auction.
It is quite immediate to see that lazy and eager second price auctions are still DSIC
mechanisms (the proof follows the exact same lines as that without reserve prices), hence we
shall again only consider the truthful equilibrium. We now derive the expected payment
of a bidder at this equilibrium.

Theorem 2.13. Let $P_i$ be the expected payment of bidder i facing reserve price $r_i$ at the
truthful equilibrium of a lazy second-price auction. Then
$$P_i = \mathbb{E}_{x_i \sim F_i}\big[\psi_i(x_i)\, G_i(x_i)\, \mathbf{1}\{x_i \ge r_i\}\big],$$
where $G_i$ still denotes the cdf of $\max_{j \neq i} x_j$.

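Proof. Let us introduce the notation $y_{-i} = \max_{j \neq i} x_j$ so that the pointwise payment of bidder
i, when he has value $x_i$, given all the values $x_j$, is equal to
$$\max\{r_i, y_{-i}\}\,\mathbf{1}\{x_i \ge y_{-i}\}\,\mathbf{1}\{x_i \ge r_i\}.$$
We note that
$$\max\{r_i, y_{-i}\}\,\mathbf{1}\{x_i \ge y_{-i}\}\,\mathbf{1}\{x_i \ge r_i\} = \mathbf{1}\{x_i \ge r_i\}\Big(r_i\,\mathbf{1}\{y_{-i} \le r_i\} + y_{-i}\,\mathbf{1}\{x_i \ge y_{-i}\}\,\mathbf{1}\{y_{-i} \ge r_i\}\Big).$$
As a consequence,
$$\begin{aligned}
P_i &= \mathbb{E}_F\big[\max\{r_i, y_{-i}\}\,\mathbf{1}\{x_i \ge y_{-i}\}\,\mathbf{1}\{x_i \ge r_i\}\big] \\
&= \int_{x_i} \int_{y_{-i}} \max\{r_i, y_{-i}\}\,\mathbf{1}\{x_i \ge y_{-i}\}\,\mathbf{1}\{x_i \ge r_i\}\, f_i(x_i)\, g_i(y_{-i})\,dy_{-i}\,dx_i \\
&= \big(1 - F_i(r_i)\big) G_i(r_i)\, r_i + \int_{x_i = r_i}^{+\infty} \int_{y_{-i} = r_i}^{+\infty} y_{-i}\, g_i(y_{-i})\, f_i(x_i)\,\mathbf{1}\{x_i \ge y_{-i}\}\,dy_{-i}\,dx_i \\
&= \big(1 - F_i(r_i)\big) G_i(r_i)\, r_i + \int_{y_{-i} = r_i}^{+\infty} y_{-i}\, g_i(y_{-i}) \int_{x_i = y_{-i}}^{+\infty} f_i(x_i)\,dx_i\,dy_{-i} \quad \text{(by Fubini)} \\
&= \big(1 - F_i(r_i)\big) G_i(r_i)\, r_i + \int_{y_{-i} = r_i}^{+\infty} y_{-i}\, g_i(y_{-i})\big(1 - F_i(y_{-i})\big)\,dy_{-i} \\
&= \int_{y_{-i} = r_i}^{+\infty} \Big(f_i(y_{-i})\, y_{-i} - \big(1 - F_i(y_{-i})\big)\Big) G_i(y_{-i})\,dy_{-i} \quad \text{(by integration by parts)} \\
&= \mathbb{E}_{F_i}\big[\psi_i(x)\, G_i(x)\,\mathbf{1}\{x \ge r_i\}\big].
\end{aligned}$$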

We can therefore easily derive the optimal reserve prices, as a function of ψi at least for
regular distributions.

Theorem 2.14. If the $F_i$ are regular, the optimal reserve prices in a lazy second-price auction
are

$$(r_1, \dots, r_n) = \big(\psi_1^{-1}(0), \dots, \psi_n^{-1}(0)\big),$$

with the convention that $\psi_i^{-1}(0)$ is the minimum of the support of $F_i$ if $\psi_i$ is positive
everywhere, and the point where $\psi_i$ changes sign if it is discontinuous.

Proof. The seller maximizes the sum of the expected payments:

$$\sum_{i=1}^{n} P_i = \sum_{i=1}^{n} \mathbb{E}_{F_i}\big[\psi_i(x_i)\, G_i(x_i)\, \mathbf{1}\{x_i \ge r_i\}\big].$$

Since $F_i$ is regular, $\psi_i$ has one zero and is negative before it and positive after it. Thus, the
optimal choice for $r_i$ is $\psi_i^{-1}(0)$, as the function $t \mapsto \mathbb{E}_{F_i}\big[\psi_i(x_i)\, G_i(x_i)\, \mathbf{1}\{x_i \ge t\}\big]$ is increasing
before $\psi_i^{-1}(0)$ and decreasing afterwards, owing to the sign of the integrand on both sides
of $\psi_i^{-1}(0)$.

The proof indicates that in a lazy second-price auction, the seller can safely maximize
the payment of each bidder one by one independently. Indeed, in a lazy second price
auction, changing the reserve price of one specific bidder does not change the probability
of winning and the payment of the other bidders. This is not the case for eager second-
price auctions on the other hand, which explains the complexity of computing the optimal
reserve prices in them. Finally, the optimal reserve prices in a lazy second-price auction
correspond to the monopoly prices of each bidder.

Corollary 2.15. The optimal reserve price for a bidder in a lazy second-price auction is
independent of the presence, or not, of other bidders. In particular, it is the same as in the
situation where he is the only bidder.
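
As a small numerical illustration of Corollary 2.15 (a sketch added for this survey, not taken from the cited literature), the expected payment formula of Theorem 2.13 can be evaluated for bidders with i.i.d. U([0, 1]) values, for which ψ(x) = 2x − 1 and G(x) = x^(n−1). The function names `expected_payment` and `best_reserve`, as well as the grid search, are our own illustrative choices.

```python
def expected_payment(r, n):
    """Expected payment of one bidder facing reserve r in a lazy second-price
    auction with n i.i.d. U([0, 1]) bidders bidding truthfully.

    Theorem 2.13 with psi(x) = 2x - 1, G(x) = x**(n - 1) and f(x) = 1 gives
    P(r) = int_r^1 (2x - 1) x**(n - 1) dx, integrated here in closed form.
    """
    def antiderivative(x):
        return 2 * x ** (n + 1) / (n + 1) - x ** n / n
    return antiderivative(1.0) - antiderivative(r)


def best_reserve(n, grid_size=1001):
    """Grid search for the reserve price maximizing the bidder's payment."""
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    return max(grid, key=lambda r: expected_payment(r, n))


for n in (1, 2, 3, 4):
    r_star = best_reserve(n)
    print(n, r_star, round(expected_payment(r_star, n), 4))
```

Under these assumptions the maximizer found on the grid is the monopoly price 1/2 for every number of bidders, while the level of the payment changes with n.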

[Figure 2.1 (plot omitted). Title: "Bidder payment as function of the reserve price, varying number of bidders". Horizontal axis: reserve price r, from 0 to 1. Vertical axis: bidder payment, from 0 to 0.25. Curves: 1, 2, 3 and 4 bidders; the monopoly price is marked.]

Figure 2.1: Bidder’s payment as a function of the reserve price depending on the number of players in a
second-price auction with bidders all having a uniform value distribution U([0, 1]).

The seller’s revenue increases when using personalized reserve prices if bidders have
very different value distributions. Intuitively, it is in her best interest to set a high reserve
price for bidders with high values most of the time (or very high values sometimes) and low
reserve prices for bidders with low values most of the time.

So far, we have only focused on the seller’s revenue when designing auctions. An
alternative objective can be the maximization of the global welfare of the system, which is
the sum of the seller’s revenue and all bidders’ utility.
Even though reserve prices largely increase the seller’s revenue, they actually signif-
icantly decrease the expected total welfare, as the item will sometimes not be allocated.
This happens when all bidders (or at least the highest one in lazy auctions) have values
below their reserve prices.

Example. To illustrate this decrease in welfare, we are going to consider a simple example.
There are n = 2 symmetric bidders with a value drawn uniformly over [0, 1]. Because
of the symmetry, the optimal reserve price is the same for both bidders hence lazy and
eager auctions coincide (and we do not need to specify the rule). In this simple case,
r ∗ = ψ −1 (0) = 1/2. As a consequence, the item is allocated as soon as one bidder bids above
1/2, which happens with probability 3/4. Otherwise, the item is not sold (which obviously
happens with probability 1/4). Simple computations show that changing the design from a
second-price auction without reserve price to a second-price auction with optimal reserve
price yields

• a 12.5 % decrease of the global welfare (from 2/3 to 7/12).

• a 50 % decrease of every single bidder’s utility (from 1/6 to 1/12).

• a 25 % increase of the seller’s revenue (from 1/3 to 5/12).
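
The three figures of this example can be checked numerically. The following sketch (our own illustration; the function name `auction_outcomes` and the midpoint-rule discretization are assumptions, not part of the original text) averages welfare, revenue and bidder utility over a grid of value pairs in [0, 1]^2, with both bidders bidding truthfully.

```python
def auction_outcomes(reserve, grid=400):
    """Average welfare, revenue and per-bidder utility for two truthful
    bidders with independent U([0, 1]) values in a second-price auction
    with an anonymous reserve price (midpoint rule on [0, 1]^2)."""
    welfare = 0.0
    revenue = 0.0
    for a in range(grid):
        x1 = (a + 0.5) / grid
        for b in range(grid):
            x2 = (b + 0.5) / grid
            high, low = max(x1, x2), min(x1, x2)
            if high >= reserve:                  # the item is sold
                welfare += high                  # value obtained by the winner
                revenue += max(low, reserve)     # second price, floored by the reserve
    welfare /= grid ** 2
    revenue /= grid ** 2
    bidder_utility = (welfare - revenue) / 2     # the two bidders are symmetric
    return welfare, revenue, bidder_utility


print(auction_outcomes(0.0))   # no reserve price
print(auction_outcomes(0.5))   # optimal reserve price
```

The welfare is computed as the value of the winner, which is the sum of the seller's revenue and the bidders' utilities.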

2.4.4 Myerson’s lemma and characterization of BIC and DSIC auctions

We mentioned before that the virtual value could (and maybe should) be understood as
a virtual payment in a single-bidder auction and/or in lazy second-price auctions. The
following lemma (that will be referred to as the “Myerson Lemma”) is a crucial result
(Myerson, 1981). It states that the virtual payment of a bidder is the correct quantity to
study in any incentive-compatible auction, and not just lazy second-price ones. The proof,
while conceptually profound, is not very hard technically and requires integration by
parts and Fubini’s theorem, as we used in the case of the lazy second-price auction. This
result holds for any Bayesian Incentive Compatible auction (i.e., truthful bidding is an
equilibrium) that is 0-rational (i.e., bidding 0 ensures a payment of 0). The latter assumption
can be weakened, at the cost of an additive constant in the payment formula.
Theorem 2.16. For any BIC and 0-rational auction, the expected payment of bidder $i$ at
the truthful equilibrium is

$$\mathbb{E}_{x_i \sim F_i}[P_i(x_i)] = \mathbb{E}_{x_i \sim F_i}[\psi_i(x_i)\, Q_i(x_i)] = \mathbb{E}_{x \sim F}[\psi_i(x_i)\, q_i(x)].$$

Here $Q_i(x_i)$ is the winning probability of bidder $i$ at the truthful equilibrium given his
value $x_i$, and $q_i(x)$ is the probability that the item is attributed to bidder $i$ when the values
are $x = (x_1, \dots, x_n)$.
Proof. Let us consider only the truthful equilibrium of the BIC auction. In particular, we
assume that all bidders except possibly $i$ bid their values, i.e., they bid truthfully. Let us
call $Q_i(z)$ the probability that bidder $i$ wins the auction when bidding $z$ (when all other bidders
bid truthfully) and $P_i(z)$ his expected payment (when all other bidders bid truthfully). The
expected utility of bidder $i$ when he has value $x_i$ and bids $z$ is simply $Q_i(z) x_i - P_i(z)$. By
definition of Bayesian incentive-compatibility, at the truthful equilibrium the auction must
verify:

$$\forall z \in \mathbb{R}_+,\ \forall x_i \in \mathcal{X}_i, \quad Q_i(x_i)\, x_i - P_i(x_i) \ge Q_i(z)\, x_i - P_i(z),$$

since at the truthful equilibrium bidder $i$ gets maximum utility by bidding his value. Thus,
if we still denote by $V_i(x_i)$ the expected utility of bidder $i$ when he has value $x_i$,

$$V_i(x_i) = \max_{z} \big\{ Q_i(z)\, x_i - P_i(z) \big\} = x_i Q_i(x_i) - P_i(x_i).$$

As a consequence, $V_i$ is a convex mapping (the maximum of affine mappings) and therefore
is differentiable almost everywhere and absolutely continuous (i.e., it is equal to the integral
of its derivative).
Bayesian incentive-compatibility also implies that

$$\forall z \in \mathcal{X}_i, \quad V_i(z) = \max_{t} \big\{ z\, Q_i(t) - P_i(t) \big\} \ge z\, Q_i(x_i) - P_i(x_i) = V_i(x_i) + Q_i(x_i)(z - x_i).$$

Since $V_i$ is convex, this means that $Q_i(x_i)$ belongs to the subdifferential of $V_i$ at $x_i$, i.e.,

$$Q_i(x_i) \in \partial V_i(x_i),$$

and $\nabla V_i(x_i) = Q_i(x_i)$ if $V_i$ is differentiable at $x_i$. Therefore, using Theorem D.2.3.4 in (Hiriart-Urruty and Lemaréchal, 2001),

$$V_i(x_i) - V_i(0) = \int_0^{x_i} Q_i(z)\, dz,$$

and since $V_i(x_i) = x_i Q_i(x_i) - P_i(x_i)$,

$$P_i(x_i) = P_i(0) + Q_i(x_i)\, x_i - \int_0^{x_i} Q_i(z)\, dz.$$

Taking expectation over $x_i$ now gives:

$$\begin{aligned}
\mathbb{E}_{x_i \sim F_i}[P_i(x_i)] &= \int_{x_i} P_i(x_i)\, f_i(x_i)\, dx_i \\
&= P_i(0) + \int_{x_i} \Big( Q_i(x_i)\, x_i - \int_0^{x_i} Q_i(z)\, dz \Big) f_i(x_i)\, dx_i \\
&= P_i(0) + \int_{x_i} Q_i(x_i)\, x_i\, f_i(x_i)\, dx_i - \int_{z} Q_i(z) \int_{z}^{\infty} f_i(x_i)\, dx_i\, dz \quad \text{(Fubini)} \\
&= P_i(0) + \int_{x_i} \Big( x_i - \frac{1 - F_i(x_i)}{f_i(x_i)} \Big) Q_i(x_i)\, f_i(x_i)\, dx_i \\
&= P_i(0) + \mathbb{E}_{x_i \sim F_i}[\psi_i(x_i)\, Q_i(x_i)].
\end{aligned}$$

The first equality of the theorem follows from the fact that $P_i(0) = 0$ since the auction is 0-rational.
The last equality of the theorem comes from the fact that $Q_i(x_i) = \mathbb{E}_{x \sim F}[q_i(x) \mid x_i]$ and the tower property
of conditional expectations.

Remark. Theorem 2.13 is a direct consequence of this result with the specific choice of
Qi (xi ) = Gi (xi )1{xi ≥ ri }.

Myerson’s lemma indicates that the expected payment of a 0-rational BIC auction
only depends on the allocation rule and the virtual value; the proof actually gives a
characterization of any incentive-compatible auction. However, this characterization is
slightly different for BIC and DSIC auctions.

Corollary 2.17 (Myerson, 1981). Using the notations of Theorem 2.16, an auction is
0-rational and BIC if and only if

i) the allocation rule is monotone, i.e., the probability of winning, as a function of the
bid, is non-decreasing (for any fixed bids of the other bidders) and

ii) the expected payment verifies

$$P_i(x_i) = Q_i(x_i)\, x_i - \int_0^{x_i} Q_i(z)\, dz.$$

Remark. Using the fact that $Q_i(x_i) = \mathbb{E}_{x \sim F}[q_i(x) \mid x_i]$, we see that, given an allocation rule
$q_i(x)$, the expected payment requirement can be fulfilled by requiring, auction by auction,
a payment, given the vector of bids/values $x$, of

$$p_i(x) = x_i\, q_i(x) - \int_0^{x_i} q_i(t, x_{-i})\, dt.$$

(In the last integral all the bids $x_{-i}$ are fixed and the integral is performed over the bid of
bidder $i$, which varies from 0 to $x_i$.)
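
To make the pointwise payment formula of the remark concrete, the following sketch (an illustration of ours; the names `allocation` and `myerson_payment` are hypothetical) applies it to the allocation rule of a lazy second-price auction with at least one competitor. The integral is approximated with a midpoint rule, so the output is only accurate up to the discretization step.

```python
def allocation(bid, others, reserve):
    """q_i(bid, x_{-i}) for a lazy second-price auction: bidder i wins when
    his bid clears both the reserve and the highest competing bid.
    `others` must contain at least one competing bid."""
    return 1.0 if bid >= reserve and bid >= max(others) else 0.0


def myerson_payment(x_i, others, reserve, steps=100_000):
    """p_i(x) = x_i q_i(x) - int_0^{x_i} q_i(t, x_{-i}) dt (midpoint rule)."""
    h = x_i / steps
    integral = h * sum(allocation((k + 0.5) * h, others, reserve)
                       for k in range(steps))
    return x_i * allocation(x_i, others, reserve) - integral


# Winner with value 0.9 facing competing bids 0.6 and 0.3.
print(myerson_payment(0.9, [0.6, 0.3], reserve=0.5))  # close to the second bid
print(myerson_payment(0.9, [0.6, 0.3], reserve=0.7))  # close to the reserve
```

Up to discretization error, the formula returns the maximum of the reserve price and the highest competing bid, i.e., the smallest bid with which the winner would still have won.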

Proof. The proof of Theorem 2.16 gives the first implication. For the reverse, let us assume
that all bidders except $i$ bid truthfully, and let us show that $i$ has an incentive to also bid
truthfully. This will show that truthful bidding constitutes a Nash equilibrium and hence
that the auction is BIC.
Note that because we have assumed that all other bidders bid truthfully, if bidder $i$
bids $z$, the probability that he wins is $Q_i(z)$. Hence, the expected utility derived by bidder $i$
when bidding $z$ while his value is $x_i$ is

$$U_i(z, x_i) = x_i Q_i(z) - P_i(z) = (x_i - z)\, Q_i(z) + \int_0^{z} Q_i(t)\, dt.$$

The second equality comes from assumption ii). Let us call $b_i^*$ the optimal bid of bidder $i$.
Let us now show that $b_i^* = x_i$. To do so, we simply need to establish that

$$\forall z \in \mathbb{R}_+, \quad U_i(x_i, x_i) = \int_0^{x_i} Q_i(t)\, dt \ge (x_i - z)\, Q_i(z) + \int_0^{z} Q_i(t)\, dt.$$

This is equivalent to showing that

$$\forall z \in \mathbb{R}_+, \quad \int_z^{x_i} Q_i(t)\, dt \ge (x_i - z)\, Q_i(z).$$

If $z \le x_i$, since $Q_i$ is non-decreasing, $Q_i(t) \ge Q_i(z)$ on $[z, x_i]$ and hence

$$\int_z^{x_i} Q_i(t)\, dt \ge (x_i - z)\, Q_i(z).$$

If $z \ge x_i$, since $Q_i$ is non-decreasing, $Q_i(t) \le Q_i(z)$ on $[x_i, z]$ and hence

$$\int_{x_i}^{z} Q_i(t)\, dt \le (z - x_i)\, Q_i(z).$$

Multiplying the previous inequality by $(-1)$ on both sides shows that if $z \ge x_i$, we also have

$$\int_z^{x_i} Q_i(t)\, dt \ge (x_i - z)\, Q_i(z).$$

So we have shown that

$$\forall z \in \mathbb{R}_+, \quad U_i(x_i, x_i) \ge U_i(z, x_i).$$

Therefore, bidding truthfully is an optimal strategy for bidder $i$ and the auction is BIC.

Corollary 2.18 (Myerson, 1981). An auction is DSIC if and only if

i) the allocation rule is monotone and

ii) the payment of the winning bidder is the minimum bid guaranteeing that he would
still have won the auction.

Given a monotone allocation rule and assuming 0-rationality, the payment rule is unique.

Proof. The proof is almost identical, one just needs to make the various computations
pointwise (for any vector x−i ) instead of in expectation.

This characterization can be extended to very general mechanisms (Archer and Tardos,
2001).

2.4.5 The Myerson auction: revenue maximization for BIC auctions

After having established the Myerson lemma, it is now possible to derive the revenue-
maximizing auction among all BIC auctions.

Definition 2.19. The Myerson auction, for regular value distributions $F_i$ with associated
virtual values $\psi_i$, is defined by the two following rules:

Allocation rule: Given the bids $b = (b_1, \dots, b_n)$, the winner is the bidder with the highest
non-negative virtual value $\psi_i(b_i)$, i.e.,

$$q_i(b) = \mathbf{1}\Big\{ i = \arg\max \big\{ \psi_j(b_j) \ ;\ j \text{ s.t. } \psi_j(b_j) \ge 0 \big\} \Big\},$$

with the convention that if all virtual values are negative, then the item is not
allocated and $q_i(b) = 0$. Ties are broken arbitrarily.

Payment rule: If bidder $i$ wins the auction, he pays

$$p_i(b) = \max\Big\{ \psi_i^{-1}(0),\ \psi_i^{-1}\Big( \max_{j \ne i} \psi_j(b_j) \Big) \Big\}.$$

This auction amounts to running a second-price auction with reserve price 0 among
the virtualized bids $\psi_k(b_k)$ and converting this “virtual cost” back to the original bid space
of the winner $i$ through the function $\psi_i^{-1}$.
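
The two rules of Definition 2.19 can be sketched in a few lines for a specific regular family. We assume here, purely for illustration, that bidder i has a value drawn from U([0, a_i]), so that ψ_i(x) = 2x − a_i and ψ_i^{-1}(v) = (v + a_i)/2; the function name `myerson_auction` and its interface are our own.

```python
def myerson_auction(bids, upper_bounds):
    """Myerson auction when bidder i's value is U([0, upper_bounds[i]]).

    Returns (winner_index, payment), or (None, 0.0) if no bidder has a
    non-negative virtual value. Ties go to the lowest index.
    """
    def psi(i, x):                      # virtual value of U([0, a_i])
        return 2.0 * x - upper_bounds[i]

    def psi_inv(i, v):                  # inverse virtual value
        return (v + upper_bounds[i]) / 2.0

    virtual = [psi(i, b) for i, b in enumerate(bids)]
    winner = max(range(len(bids)), key=lambda i: virtual[i])
    if virtual[winner] < 0:
        return None, 0.0                # item not allocated
    rivals = [v for j, v in enumerate(virtual) if j != winner]
    # Second price with reserve 0 in virtual space, mapped back through
    # psi_inv. Since psi_inv is increasing, this equals
    # max(psi_inv(0), psi_inv(max of rival virtual values)).
    threshold = max(rivals + [0.0])
    return winner, psi_inv(winner, threshold)


print(myerson_auction([0.6, 0.9], [1.0, 2.0]))
print(myerson_auction([0.8, 1.4], [1.0, 2.0]))
print(myerson_auction([0.4, 0.9], [1.0, 2.0]))
```

In the first call the bidder with the lower bid wins, because his virtual value is the only non-negative one; this illustrates that, with asymmetric distributions, the Myerson auction does not always allocate the item to the highest bidder.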

Theorem 2.20. If $F_1, \dots, F_n$ are regular, the Myerson auction maximizes the seller’s revenue
among all BIC and interim-IR auctions.
Proof. The Myerson auction is BIC as it verifies the conditions of Corollary 2.17: since the $\psi_i$
are non-decreasing (as the $F_i$ are regular), the probability of winning is non-decreasing.
To show individual rationality, we remark that since the auction is BIC,

$$V_i(x_i) = V_i(0) + \int_0^{x_i} Q_i(s_i)\, ds_i = -P_i(0) + \int_0^{x_i} Q_i(s_i)\, ds_i \ge 0,$$

because $P_i(0) = 0$. Thanks to Myerson’s lemma (Theorem 2.16), the expected payment of bidder $i$ in any BIC
auction is equal to

$$P_i(0) + \mathbb{E}_{x_i \sim F_i}[\psi_i(x_i)\, Q_i(x_i)].$$

The Myerson auction maximizes the two terms of this expression. The first term is maximal
since, for any rational auction, $P_i(0) \le 0$. Since the winner in the Myerson auction is the bidder who verifies

$$\psi_i(x_i) = \max_{j \in S} \psi_j(x_j),$$

and the item is not allocated when all $\psi_i$ are negative, the second term is also maximized
pointwise. Indeed, note that, summing over bidders, we can rewrite this second term as

$$\sum_{i=1}^{n} \mathbb{E}_{x_i \sim F_i}[\psi_i(x_i)\, Q_i(x_i)] = \mathbb{E}_{x \sim F}\big[\langle \psi(x), q(x) \rangle\big],$$

where $\langle \cdot, \cdot \rangle$ is the standard inner product.

Corollary 2.21. In the symmetric case, the second-price auction with reserve prices set to
monopoly prices is the revenue-maximizing auction.
Remark. The seller can increase her revenue if the mechanism is only required to satisfy
ex-ante rationality instead of interim rationality, as shown in (Cremer and McLean, 1988).
Indeed, there exists a BIC auction that is ex-ante individually rational and accomplishes
full-surplus extraction for the seller. In other words, the utility of bidders in this auction is
equal to zero. This auction is not interim individually rational, since the expected utility
when the bidder’s value is zero is strictly negative. This setting of ex-ante individual
rationality only makes sense when bidders have to decide to take part in the auction before
learning their value for the item. We shall come back to this setting in more detail
in Section 4.2.2.

2.4.6 Generalization of optimality result

Non-incentive-compatible mechanisms: the revelation principle
Myerson’s optimality result can be extended to any auction, incentive-compatible or not, as long
as there is a Nash equilibrium between bidders.

Theorem 2.22. Given a mechanism and a specific Nash equilibrium for this mechanism,
there exists another BIC mechanism where the bidders’ expected utility and seller’s revenue
at the truthful equilibrium are equal to the ones at the original Nash equilibrium.

Proof. Consider a mechanism and $\beta = (\beta_1, \dots, \beta_n)$ a profile of strategies that is a Nash
equilibrium. This mechanism is defined by an allocation rule $q : B \to \mathbb{R}^n$ and a payment
rule $p : B \to \mathbb{R}^n$. The mechanism $(q \circ \beta, p \circ \beta)$ is then clearly BIC, and bidding truthfully
generates the same bid distributions, allocation and payments as in the original mechanism;
hence utilities and revenue are unchanged.

Corollary 2.23. If value distributions are regular, the Myerson auction is the revenue-
maximizing mechanism among all individually-rational mechanisms which have a Nash
equilibrium.

Proof. This is a direct application of the revelation principle.

We now extend the Myerson auction to cases where value distributions are not regular.

Non-regular distribution: the ironing technique

If the $F_i$ are not regular, the Myerson auction is not always defined, as the $\psi_i$ may not be invertible.
From an allocation standpoint, since $\psi_i$ is not necessarily increasing, the allocation
rule may not be monotone. Hence a bidder might have an incentive to “shade” (or lower) his
bid to increase his virtual bid and his probability of winning. Non-regular distributions
are not uncommon, as a mixture of two distributions is typically not regular (for instance
the mixture of two uniforms, Gaussians, etc.). It is therefore crucial to adapt the Myerson
mechanism to non-regular distributions. The canonical way is to define a slightly different
allocation rule based on a modified virtual value (Myerson, 1981) called the ironed virtual
value.
One key consequence of having a non-decreasing virtual value is that the monopoly
revenue $\Pi(r) = r(1 - F(r)) = \int_r^{\infty} \psi(v) f(v)\, dv$ is “almost” concave, in the sense that $\Pi \circ F^{-1}$ is
concave on $[0, 1]$ with derivative $-\psi \circ F^{-1}(\cdot)$. The ironing technique consists in replacing
$\Pi \circ F^{-1}$, which is not necessarily concave, by its concavification, a.k.a. its least concave
majorant.

Definition 2.24. For a function $h$ defined on some set $E \subset \mathbb{R}^d$, we denote by $\mathrm{cav}(h)$ the concavification
of $h$, which is its smallest concave majorant, i.e., the smallest concave
function above $h$: its hypograph is the convex hull of the hypograph of $h$. Moreover, this
function is defined pointwise as

$$\mathrm{cav}(h)(x) = \sup\big\{ \mathbb{E}_\mu[h(z)] \ ;\ \mu \text{ is a probability distribution on } E \text{ such that } \mathbb{E}_\mu[z] = x \big\}.$$

We refer to (Rockafellar, 1970), p. 36, (Hiriart-Urruty and Lemaréchal, 2001) pp.98-102
and (Groeneboom and Jongbloed, 2014) pp.55-57 for properties of least concave majorant,
greatest convex minorant and convex hull of functions. In particular, if h is bounded and
attains its maximum, cav (h) has the same maximum attained (at least) on the convex hull
of the set of maximizers of h. Moreover, h and cav (h) are equal on the extreme points of the
definition set of h; this implies that if h is defined on [a, b], then necessarily cav (h) (a) = h(a)
and cav (h) (b) = h(b).
We can now define the ironed virtual value.
Definition 2.25. For any non-regular distribution $F$, the ironed virtual value of $\psi$, denoted
by $\tilde{\psi}$, is defined by

$$\tilde{\psi}(x) = \partial\big( -\mathrm{cav}(\Pi \circ F^{-1}) \big)(F(x)), \quad \text{where } \partial \text{ denotes the subdifferential.}$$

In general, the concavification of a function $g$ is either equal to $g$ at some point or linear
on some interval otherwise. Luckily enough, the ironed virtual value has a closed form on
intervals where it is not equal to the virtual value.
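Lemma 2.26. Assume that for some $\alpha < \beta$, it holds that
– $\mathrm{cav}(\Pi \circ F^{-1})(F(\alpha)) = \Pi(\alpha)$,

– $\mathrm{cav}(\Pi \circ F^{-1})(F(\beta)) = \Pi(\beta)$,

– $\mathrm{cav}(\Pi \circ F^{-1})(F(x)) > \Pi(x)$ for $x \in (\alpha, \beta)$;

then the ironed virtual value of $\psi$ on $(\alpha, \beta)$ is constant and equal to

$$\forall x \in (\alpha, \beta), \quad \tilde{\psi}(x) = \frac{\int_\alpha^\beta \psi(t) f(t)\, dt}{F(\beta) - F(\alpha)} = \frac{\Pi(\alpha) - \Pi(\beta)}{F(\beta) - F(\alpha)} = \frac{\alpha(1 - F(\alpha)) - \beta(1 - F(\beta))}{F(\beta) - F(\alpha)}.$$

Proof. By definition of the concavification, $\mathrm{cav}(\Pi \circ F^{-1})$ is linear on $[F(\alpha), F(\beta)]$ and the
result comes from linear interpolation.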

We refer to (Fu, 2016) for more technical details. In particular, the ironed virtual value,
since it is defined as a sub-differential, is not a function but a multi-valued mapping.
On the other hand, selecting the aforementioned ψ̃(x) as ψ(x) when the sub-differential
is not reduced to a singleton is also perfectly valid and implicitly used as convention
from now on. With this latter expression, either ψ̃ is equal to ψ or it is constant on some
interval around x. It is non-decreasing everywhere and intervals where ψ is decreasing are
“flattened”, as illustrated in Figure 2.2.
Recall that the purpose of ironing is to replace the - possibly somewhere decreasing -
virtual value function ψ in the Myerson auction (that might then not be BIC) by ψ̃. We now
show that ironing the virtual value does not decrease the revenue of the Myerson auction.
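
The ironing construction can be sketched numerically as follows. We assume, for illustration only, an equal-weight mixture of U(0, 0.5) and U(0, 1) (the weights are our assumption), discretize the values on a grid, build the upper concave hull of the points (F(x), Π(x)), and read the ironed virtual value as minus the slope of the hull. The helper names `cdf`, `upper_hull` and `ironed_virtual_values` are hypothetical, and on hull vertices the slope of the segment to the right is selected.

```python
import bisect


def cdf(x):
    """cdf of an equal-weight mixture of U(0, 0.5) and U(0, 1) (assumed weights)."""
    return 0.5 * min(2.0 * x, 1.0) + 0.5 * x


def upper_hull(points):
    """Upper concave hull of points sorted by strictly increasing abscissa."""
    hull = []
    for p in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Drop hull[-1] if it lies on or below the chord from hull[-2] to p.
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def ironed_virtual_values(xs):
    """Ironed virtual value at each grid point of xs (strictly increasing)."""
    quantiles = [cdf(x) for x in xs]
    revenue = [x * (1.0 - q) for x, q in zip(xs, quantiles)]   # Pi(x)
    hull = upper_hull(list(zip(quantiles, revenue)))
    hull_q = [q for q, _ in hull]
    ironed = []
    for q in quantiles:
        k = bisect.bisect_right(hull_q, q)        # hull segment [k - 1, k]
        k = min(max(k, 1), len(hull) - 1)
        (q0, r0), (q1, r1) = hull[k - 1], hull[k]
        ironed.append(-(r1 - r0) / (q1 - q0))     # minus the hull slope
    return ironed


xs = [k / 2000 for k in range(2001)]
ironed = ironed_virtual_values(xs)
print(ironed[1000])   # ironed virtual value at x = 0.5
```

For this mixture, ψ(x) = 2x − 2/3 below 1/2 and ψ(x) = 2x − 1 above, so ψ jumps downwards at 1/2; the sketch flattens a neighbourhood of this jump to a constant value, in line with Lemma 2.26, and leaves ψ essentially unchanged elsewhere (up to discretization error).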

[Figure 2.2 (plots omitted). Panels: "Concavification of the revenue curve" (Π ◦ F −1 and cav(Π ◦ F −1) against the quantile p), "Original and ironed virtual values" (ψ(x) and ψ̃(x) against x), and "Original and ironed monopoly revenue curves" (original and ironed Π(x) against x).]

Figure 2.2: Left: Concavification of Π ◦ F −1 for a mixture of U(0, 0.5) and U(0, 1). Right: Associated virtual
value (plain black) and ironed one (dashed orange). Center: Original (plain black) and ironed (dashed orange)
monopoly revenue.

Lemma 2.27. The payment of bidder $i$ at the truthful equilibrium of any BIC auction
satisfies

$$\mathbb{E}_{x_i \sim F_i}[P_i(x_i)] \le P_i(0) + \mathbb{E}_{x_i \sim F_i}\big[ Q_i(x_i)\, \tilde{\psi}_i(x_i) \big],$$

where $\tilde{\psi}_i$ is his ironed virtual value.

Our assumption that $f_i > 0$ almost everywhere, and hence that $F_i$ is increasing, is important;
so is the assumption that $F_i$ has a first moment, which implies that $\psi_i(x_i)$ and $\tilde{\psi}_i(x_i)$ have
finite mean. Note that this assumption is really minimal, as it just means that the expected
payment under $Q_i(x_i) = 1$ is finite.
The papers (Myerson, 1981) and (Fu, 2016) implicitly assume differentiability of Qi
in the previous lemma without stating it explicitly. In the case of differentiable Qi the
proof boils down to integration by parts applied twice and the fact that cav (h) ≥ h for
any function h. At the level of generality of our statement, which is needed for the most
important applications, it is more technical and we give the proof in Subsection 2.6.4.

We now generalize Theorem 2.20 to the case of non-regular value distributions.

Theorem 2.28 (Myerson, 1981). With general value distributions, in a revenue-maximizing
auction, the seller allocates the item to the bidder with the highest non-negative ironed
virtual value $\tilde{\psi}_i(x_i)$, ties broken at random, with the payment rule of Corollary 2.17.

Example. We consider the case of the Myerson auction with symmetric players but non-
regular value distributions in Subsection 2.6.3, where we derive the payment and allocation
rules. With non-regular value distributions the Myerson auction in the symmetric case is
not a second price auction with reserves anymore.

2.4.7 Reserve prices in first price auctions

We start with an abstract result that we then apply to first price auctions.

Proposition 2.29. Suppose bidder $i$ participates in an auction such that the probability of
winning the auction when bidding $b$ is $G_i(b)$ and the corresponding expected payment is
$P_i(b)$. Let $\beta_i$ be an optimal strategy for bidder $i$ in this setup, i.e., one that maximizes his utility. Then
the seller revenue coming from bidder $i$ is

$$\mathbb{E}_{x_i \sim F_i}[P_i(\beta_i(x_i))] - P_i(\beta_i(0)) = \mathbb{E}_{x_i \sim F_i}\big[ G_i(\beta_i(x_i))\, \psi_i(x_i) \big], \tag{2.4}$$

where $\psi_i$ is the virtual value associated to $F_i$.

Remark. This proposition helps explain how our interpretation of $\psi_i(x_i)$ as a virtual
payment makes sense for utility-maximizing bidders in general, and not only in the case of
the BIC or DSIC auctions we encountered previously. It can also be seen as a more quantitative
version of the revelation principle.

Proof. Note that since bidder $i$ tries to maximize his utility,

$$\beta_i(x_i) = \arg\max_{b} \big\{ x_i\, G_i(b) - P_i(b) \big\}.$$

Let us call

$$W(x) = x\, G_i(\beta_i(x)) - P_i(\beta_i(x)).$$

The envelope theorem in the form of Theorem 2 of (Milgrom and Segal, 2002) applies since
$0 \le G_i(t) \le 1$ for all $t$, and hence

$$W(x) - W(0) = \int_0^{x} G_i(\beta_i(t))\, dt.$$

Therefore, the expected payment satisfies

$$P_i(\beta_i(x)) - P_i(\beta_i(0)) = x\, G_i(\beta_i(x)) - \int_0^{x} G_i(\beta_i(t))\, dt.$$

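Taking expectation with respect to $x_i$ with cdf $F_i$ in the previous equation gives

$$\begin{aligned}
\mathbb{E}_{x_i \sim F_i}[P_i(\beta_i(x_i))] - P_i(\beta_i(0)) &= \int_0^{\infty} x\, G_i(\beta_i(x))\, f_i(x)\, dx - \int_0^{\infty} \int_0^{\infty} f_i(x)\, \mathbf{1}\{t \le x\}\, G_i(\beta_i(t))\, dt\, dx \\
&= \int_0^{\infty} x\, G_i(\beta_i(x))\, f_i(x)\, dx - \int_0^{\infty} (1 - F_i(t))\, G_i(\beta_i(t))\, dt \\
&= \mathbb{E}_{x_i \sim F_i}\big[ G_i(\beta_i(x_i))\, \psi_i(x_i) \big].
\end{aligned}$$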

In the case of first-price auctions, we have $P_i(\beta_i(x_i)) = \beta_i(x_i)\, G_i(\beta_i(x_i))$, since the probability
of winning is $G_i(\beta_i(x_i))$. So the arguments given in the proof above also imply the
following result (Kirkegaard, 2009): the optimal strategy for bidder $i$ when the top bid of
the competition has cdf $G_i$ and $P_i(\beta_i(0)) = 0$ satisfies

$$\beta_i(x) = x - \frac{\int_0^{x} G_i(\beta_i(t))\, dt}{G_i(\beta_i(x))}. \tag{2.5}$$

The expected utility of player $i$ at $x$ is then equal to $U_i(\beta_i(x), x) = \int_0^{x} G_i(\beta_i(t))\, dt$, and it
also holds that $\mathbb{E}_{x_i \sim F_i}\big[ U_i(\beta_i(x_i), x_i) \big] = \mathbb{E}_{x_i \sim F_i}\big[ G_i(\beta_i(x_i))\, (x_i - \psi_i(x_i)) \big]$.

Corollary 2.30. In a first-price auction, when bidder $i$ uses the strategy implicitly defined
in Equation (2.5), the seller revenue coming from bidder $i$ is

$$\mathbb{E}_{x_i \sim F_i}\big[ G_i(\beta_i(x_i))\, \psi_i(x_i) \big], \tag{2.6}$$

where $\psi_i$ is the virtual value associated with the value distribution of $x_i$.

The corollary follows from Proposition 2.29 after noticing that Pi (βi (0)) = βi (0)Gi (βi (0)) =
0 when bidder i uses βi defined in Equation (2.5).
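
As a sanity check of Equation (2.5) and Corollary 2.30, consider n symmetric bidders with U([0, 1]) values, a case in which the symmetric first-price equilibrium strategy is the classical β(x) = (n − 1)x/n, so that G_i(β_i(x)) = x^(n−1). The sketch below is our own illustration: the helper names and the midpoint-rule integration are assumptions, and it only verifies consistency for this known strategy rather than solving for an equilibrium.

```python
def integrate(func, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))


def first_price_revenue_per_bidder(n):
    """E[beta(x) * G(beta(x))]: the winner pays his own bid."""
    return integrate(lambda x: ((n - 1) * x / n) * x ** (n - 1), 0.0, 1.0)


def virtual_value_revenue_per_bidder(n):
    """E[G(beta(x)) * psi(x)] with psi(x) = 2x - 1, as in Equation (2.6)."""
    return integrate(lambda x: x ** (n - 1) * (2 * x - 1), 0.0, 1.0)


def bid_from_equation_2_5(x, n):
    """Right-hand side of Equation (2.5) with G(beta(t)) = t**(n - 1)."""
    return x - integrate(lambda t: t ** (n - 1), 0.0, x) / x ** (n - 1)


for n in (2, 3, 5):
    print(n, first_price_revenue_per_bidder(n), virtual_value_revenue_per_bidder(n))
print(bid_from_equation_2_5(0.7, 3), 2 * 0.7 / 3)
```

Both revenue expressions agree with (n − 1)/(n(n + 1)) up to integration error, and the right-hand side of Equation (2.5) returns (n − 1)x/n, as expected for this example.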

Good or optimal reserve prices. Proposition 2.29 and Corollary 2.30 suggest that, from
a seller revenue standpoint, it would be good to avoid bids corresponding to values that
have negative virtual values. In other words, setting individual reserve values at $\psi_i^{-1}(0)$ for
bidders may have a positive impact on seller revenue. However, this interpretation ignores
the impact of setting such reserve values on the strategic response of a bidder who is
optimizing his utility.
Finding optimal reserve prices for first price auctions is much more complicated than
finding them for the second price auctions, even with n = 2 symmetric bidders (Kotowski,
2018) for the following two reasons. First, if the distribution F is not regular and its density
is discontinuous at the monopoly price, giving two different reserve prices to the two
symmetric buyers actually increases the seller revenue at equilibrium, at least when r1

and r2 are close. Second, in the specific case where those two reserve prices r1 and r2 are
sufficiently close, the equilibrium bid distribution of the player with the lower reserve
price becomes discontinuous.
Many further difficulties arise when the seller tries to learn good reserve prices for first
price auctions from data and does not have access to the bidders’ value distributions. We
further detail them in Section 3.4.4.

2.5 Prior-independent optimal auctions

In the previous section, we derived the revenue-maximizing auction given the prior of the
seller on the possible valuations of the different bidders. In some cases, the seller does not
have access to a reliable prior and she has to choose an auction without this information.
An associated challenge is to understand what auctions are robust to a lack of knowledge
about (or some mis-specification in) the bidders’ value distributions.
The following theorem shows that a basic second-price auction without reserve price
- referred to henceforth as the Vickrey auction - with just one extra bidder leads to a
higher revenue than the optimal Myerson auction (Bulow and Klemperer, 1996). Only the
symmetric case is considered, so that the comparison of two auctions with different sets of
bidders is possible.

Theorem 2.31 (Bulow and Klemperer, 1996). In the symmetric setting and with regular
distributions, the revenue of the Myerson auction with n bidders is lower than the revenue
of the Vickrey auction with n + 1 bidders.

[Figure 2.3 (plot omitted). Title: "Expected revenue function of the number of bidders". Horizontal axis: number of bidders, from 1 to 6. Vertical axis: expected revenue, from 0 to 0.8. Curves: Myerson auction with n bidders, Vickrey auction with n bidders, Vickrey auction with n + 1 bidders.]

Figure 2.3: Illustration of the Bulow-Klemperer theorem for the case with value distributions U([0, 1]).

Before proving the Bulow-Klemperer theorem, we introduce a useful lemma.

Lemma 2.32. In the symmetric case and with regular distributions, the Vickrey auction is
revenue-maximizing in the class of individually-rational auctions where the item is always
attributed.

Proof. Based on the revelation principle, we shall only consider incentive-compatible
auctions. Theorem 2.16 then applies. Since the item must always be attributed, in order
to maximize the seller’s revenue, it should be allocated to the bidder with the highest
virtual value, regardless of whether the highest virtual value is non-negative or not. This
is a consequence of the proof of Theorem 2.16. Since the distributions are regular and
identical, the bidder with the highest virtual value is also the bidder with the highest value.
As a consequence, the mechanism we just described is exactly the Vickrey auction.

The proof of the Bulow-Klemperer theorem can now be derived from the previous
lemma.

Proof. Let us assume that there are n + 1 bidders, and consider the following mechanism.
First, the seller runs a Myerson auction on n bidders (chosen arbitrarily). If the item is
not allocated by the Myerson auction, it is allocated for free (i.e., without any payment)
to the (n + 1)-th bidder. The revenue of this auction is equal to the revenue of the Myerson
auction with n bidders, yet it is an auction that always allocates the item amongst n + 1 bidders.
Lemma 2.32 implies that the revenue of this auction is smaller than that of the Vickrey
auction with n + 1 bidders. This gives the result.

As a consequence, it is more interesting, from a seller revenue maximization standpoint,
to have one more bidder in the auction than to implement a complex mechanism. This
is the reason why one of the first recommendations of most economists dealing with
institutions organizing auctions is to maximize the competition before implementing
complex mechanisms (Bulow and Klemperer, 1996; Milgrom, 2004). This is sometimes
referred to as the Wilson doctrine.

Nonetheless, let us assume that a specific mechanism has been chosen, independently
of the value distributions (which are unknown in this setup). A crucial question that
remains is the evaluation of this specific choice of mechanism in the worst-case analysis.
We will restrict ourselves to DSIC auctions. We will use the notion of competitive ratio
defined as the infimum, over all possible value distributions, of the revenue of this auction
divided by the optimal revenue of the Myerson auction for these distributions.
The competitive ratio is obviously smaller than 1, and the bigger the better. Unfortu-
nately, if the class of value distributions is not restricted when computing this infimum,
there does not exist any auction with a positive competitive ratio (Allouah and Besbes,

2020). As a consequence, we will use different types of restrictions to achieve non-trivial ap-
proximation results. Based on the Bulow-Klemperer theorem, we can derive some revenue
guarantees on the second price auction without reserve price.

Corollary 2.33. The Vickrey auction with $n$ symmetric bidders and regular value distributions
is at least a $\frac{n-1}{n}$-approximation of the Myerson auction.

Proof. We denote respectively by $R(\mathrm{Vickrey}_n)$ and $R(\mathrm{Myerson}_n)$ the expected revenue of
the seller in the Vickrey or Myerson auction with $n$ bidders, and by $P_i(\mathrm{Vickrey}_n)$ and
$P_i(\mathrm{Myerson}_n)$ the expected payment of bidder $i$ in those auctions (so that $R(\cdot) = \sum_{i=1}^{n} P_i(\cdot)$).
Let us first prove that $P_i(\mathrm{Myerson}_n) \ge P_i(\mathrm{Myerson}_{n+1})$.
By symmetry and with regular distributions, there exists a unique monopoly price $r^*$
which is independent of the number of players. Note that $\psi(x) \ge 0$ when $x \ge r^*$. Furthermore,
in the symmetric case, the probability that bidder $i$, with value $x \ge r^*$, wins the auction when facing $k$
competitors is just $F^k(x)$. Hence,

$$\begin{aligned}
P_i(\mathrm{Myerson}_n) &= \int \psi(x_i)\, F^{n-1}(x_i)\, f(x_i)\, \mathbf{1}\{x_i \ge r^*\}\, dx_i \\
&\ge \int \psi(x_i)\, F^{n}(x_i)\, f(x_i)\, \mathbf{1}\{x_i \ge r^*\}\, dx_i \\
&= P_i(\mathrm{Myerson}_{n+1}),
\end{aligned}$$

which proves the inequality.

Then, based on the Bulow-Klemperer theorem,

$$R(\mathrm{Vickrey}_n) \ge R(\mathrm{Myerson}_{n-1}) = (n - 1)\, P_i(\mathrm{Myerson}_{n-1}) \ge \Big(1 - \frac{1}{n}\Big) R(\mathrm{Myerson}_n),$$

which gives the result.
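
Both the Bulow-Klemperer theorem and the approximation guarantee above can be checked in closed form for i.i.d. U([0, 1]) values. In this sketch (our own illustration, with hypothetical function names), the Myerson auction is the second-price auction with reserve 1/2, whose revenue follows from Theorem 2.13, and the Vickrey revenue is the expected second-highest of n uniform draws, (n − 1)/(n + 1).

```python
def myerson_revenue(n):
    """Revenue of the second-price auction with reserve 1/2 and n i.i.d.
    U([0, 1]) bidders: n * int_{1/2}^1 (2x - 1) x**(n - 1) dx."""
    def antiderivative(x):
        return 2 * x ** (n + 1) / (n + 1) - x ** n / n
    return n * (antiderivative(1.0) - antiderivative(0.5))


def vickrey_revenue(n):
    """Expected second-highest value among n i.i.d. U([0, 1]) draws."""
    return (n - 1) / (n + 1)


for n in range(1, 7):
    print(n,
          round(myerson_revenue(n), 4),        # Myerson with n bidders
          round(vickrey_revenue(n), 4),        # Vickrey with n bidders
          round(vickrey_revenue(n + 1), 4))    # Vickrey with n + 1 bidders
```

For each n in this range, the Vickrey revenue with n + 1 bidders is above the Myerson revenue with n bidders, and the Vickrey revenue with n bidders is at least (n − 1)/n times the Myerson revenue with n bidders.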

We proved in Corollary 2.33, through the Bulow-Klemperer theorem that the compet-
itive ratio of the Vickrey auction is at least 0.5, when the distributions are restricted to
regular ones. Interestingly, it is possible to do better with a slightly different auction.

Theorem 2.34 (Fu et al., 2015). There exists an incentive-compatible auction with a
competitive ratio of 0.512 against regular value distributions.

The mechanism considered was the first with a competitive ratio higher than 0.5 against
regular distributions. It is a slight modification of the Vickrey auction where the seller
inflates the second highest bid. Formally, the mechanism is the following: with probability
1 − ε, a second-price auction without reserve price is run. With probability ε, the
mechanism allocates the object to the bidder with the highest valuation, but only if his
valuation is greater than 1 + δ times the valuation of the second highest bidder, in which case he pays
(1 + δ) times the second highest bid. Otherwise, the mechanism does not allocate the item.
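
One round of this randomized mechanism can be sketched as follows. This is only an illustration of the verbal description above: the function name `inflated_second_price` is ours, and the values of ε and δ are left as free parameters since they are not specified here.

```python
import random


def inflated_second_price(bids, eps, delta, rng=random):
    """One round of the randomized mechanism, with at least two bids.

    With probability 1 - eps: second-price auction without reserve.
    With probability eps: the highest bidder wins only if his bid exceeds
    (1 + delta) times the second highest bid, and then pays that amount.
    Returns (winner_index, payment), or (None, 0.0) if the item is unsold.
    """
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    top, second = order[0], order[1]
    if rng.random() >= eps:                      # happens with probability 1 - eps
        return top, bids[second]
    inflated = (1.0 + delta) * bids[second]
    if bids[top] > inflated:
        return top, inflated
    return None, 0.0


print(inflated_second_price([1.0, 0.8, 0.3], eps=0.0, delta=0.5))  # plain second price
print(inflated_second_price([1.0, 0.5], eps=1.0, delta=0.5))       # inflated price
print(inflated_second_price([1.0, 0.8], eps=1.0, delta=0.5))       # item not allocated
```

Setting eps to 0 or 1 makes the outcome deterministic, which is convenient for inspecting each branch separately.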
The idea behind this theorem is the following. The Bulow-Klemperer bound of 0.5 on the competitive ratio is rather tight for regular distributions that would induce a high optimal reserve price. On the other hand, it is rather loose for regular distributions with a low associated reserve price. Inflating the second highest bid has a positive effect for the former type of distribution (as it somehow emulates a high reserve price) and a negative effect for the latter type (because it induces some reserve price, bigger than what it should be). As a consequence, the ratio of revenues increases in the first case and decreases in the second one; but thanks to the looseness in the Bulow-Klemperer bound of 0.5, the infimum globally increases.
On the other hand, when restricted to MHR distributions, the ratio is equal to 0.7153
and this result is tight.
For regular distributions, this result was then improved up to 0.519 (Allouah and Besbes,
2020) and further improved in (Hartline et al., 2020) which identified the optimal prior-
independent mechanism. The optimal mechanism is a mixture between a second price
auction and the same auction where the prices are scaled up by a factor of about 2.5. The
authors find the worst-case family of distributions and use these distributions to derive
the optimal mechanism and solve the problem.

2.6 Advanced material: non-unicity of Nash equilibria and related complications

2.6.1 The case of 2nd price auctions

As we have explained while presenting them, second price auctions are DSIC, and hence
there exists a truthful equilibrium.
There exist, however, many other equilibria, such as the following one. Suppose for concreteness that all the value distributions of the bidders are supported on [0, 1]. Suppose now that every bidder always bids 0, except one of them, say bidder 1, who bids arbitrarily high, say 1. Clearly, for bidder 1 this is a best response to a competition of 0, since he wins all auctions and pays 0. For bidders other than 1, winning entails bidding more than 1 and paying 1; but their values are at most 1. So the utility of winning any auction is non-positive (and negative as soon as their value is strictly less than 1), and the maximum utility they can expect is 0. This is achieved by many strategies, in particular by bidding 0 all the time, which is therefore a best response.
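The argument above can be checked numerically on a small instance. The sketch below assumes three bidders, uniform tie-breaking, and a finite grid of candidate deviations; all names are hypothetical and the check is only an illustration, not a proof.

```python
def expected_utility(i, bids, value):
    """Expected utility of bidder i in a second-price auction without
    reserve, with ties among top bidders broken uniformly at random."""
    top = max(bids)
    if bids[i] < top:
        return 0.0
    winners = [j for j, b in enumerate(bids) if b == top]
    price = max(b for j, b in enumerate(bids) if j != i)  # highest competing bid
    return (value - price) / len(winners)

def is_best_response(i, bids, value, deviations):
    """True if no deviation in `deviations` strictly improves bidder i's utility."""
    base = expected_utility(i, bids, value)
    for d in deviations:
        alt = list(bids)
        alt[i] = d
        if expected_utility(i, alt, value) > base + 1e-12:
            return False
    return True

profile = [1.0, 0.0, 0.0]                    # bidder 1 bids 1, the others bid 0
deviations = [k / 10 for k in range(16)]     # candidate bids in [0, 1.5]
values = [k / 10 for k in range(11)]         # candidate values in [0, 1]
stable = all(is_best_response(i, profile, v, deviations)
             for i in range(len(profile)) for v in values)
print(stable)
```

On this grid no bidder gains by deviating, whatever his value in [0, 1], even though nobody bids truthfully.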

2.6.2 No revenue equivalence when β is only non-increasing
We are grateful to an anonymous referee for bringing up this example while discussing
Theorem 2.8. Suppose bidders have the same value distribution on say [0, 1] and that the
value distribution admits a density.
Consider the following 0-rational auction with standard allocation rule, i.e., the winner
is the highest bidder; ties are broken at random among top bidders. Bidders pay 0 when
they bid 0 and pay 1 otherwise. Note that, for any bidder, bidding x > 0 results in non-positive expected utility: either they win and their utility is negative, or they lose and their utility is 0. Hence an equilibrium is for all bidders to bid 0. This equilibrium is symmetric.
The seller’s expected revenue is therefore 0. However, Theorem 2.8 states that the expected
payment for this type of auctions is independent of the payment rule at an increasing
symmetric equilibrium. The issue in this very interesting example is that the symmetric
equilibrium strategy described here is not increasing: it is in fact constant, since it maps all
values to 0. Looking at the proof of Theorem 2.8, it is clear that the expected utility $U_i(z, x_i)$ is then not $x_i G_i(z) - P_i(z)$, which was key to the proof. When the other players use this strategy, the utility of bidding b is $x_i - 1$ if b > 0 and $\frac{1}{n} x_i$ if b = 0.

2.6.3 Example: the Myerson auction in the symmetric case with non-regular value distribution
We consider the symmetric case where the n independent bidders still have a value
distribution denoted by F with a density f with f > 0. In particular, with probability 1 the
values they draw are all different. For simplicity we suppose that there is a single interval
[α, β] on which ψ requires ironing. In this case, using the remark following Corollary 2.17,
an optimal auction is the following: suppose bidder i's value is such that $\tilde\psi(x_i) \ge \max_j \tilde\psi(x_j)$ and $\tilde\psi(x_i) \ge 0$, so that he might win the auction. As before, we call $\tilde\psi^{-1}(0) = \inf_t \{t : \tilde\psi(t) \ge 0\}$.

1. if $\max_{j \neq i} \tilde\psi(x_j) < 0$, bidder i wins the auction and pays $\tilde\psi^{-1}(0)$. For the other cases below, we assume that $\max_{j \neq i} \tilde\psi(x_j) \ge 0$.

2. if $\max_{j \neq i} x_j > \beta$, then bidder i wins the auction and pays the second price, i.e., $\max_{j \neq i} x_j$.

3. if $\max_{j \neq i} x_j < \alpha$, then bidder i wins the auction and pays the second price, i.e., $\max_{j \neq i} x_j$.

4. if $\max_{j \neq i} x_j \in (\alpha, \beta)$, let us call K the number of bidders in i's competition who have $x_j \in (\alpha, \beta)$. Then two situations arise:
4a) either $x_i > \beta$, in which case bidder i wins the auction and pays $\beta - \frac{\beta - \alpha}{K+1}$. An equivalent payment scheme would be to draw K i.i.d. uniform random variables $u_k$ on [α, β] and to charge their maximal value $Y = \max_{1 \le k \le K} u_k$, since $E[Y] = \beta - \frac{\beta - \alpha}{K+1}$;
4b) or $x_i \in (\alpha, \beta)$, in which case the winner is chosen uniformly at random among the K + 1 bidders having value in (α, β), and hence having the same virtualized bid $\tilde\psi(x_k)$. When bidder i wins, he pays α.
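The equivalence of the two payment schemes in case 4a can be illustrated with a quick Monte Carlo check. The numerical values of α, β and K below are arbitrary assumptions, and the function names are hypothetical.

```python
import random

def ironed_payment(alpha, beta, k):
    """Deterministic payment of case 4a: beta - (beta - alpha) / (k + 1)."""
    return beta - (beta - alpha) / (k + 1)

def randomized_payment(alpha, beta, k, rng):
    """Randomized payment: maximum of k i.i.d. uniform draws on [alpha, beta]."""
    return max(rng.uniform(alpha, beta) for _ in range(k))

rng = random.Random(0)
alpha, beta, k = 0.3, 0.7, 3        # illustrative values only
n_draws = 100_000
mc_mean = sum(randomized_payment(alpha, beta, k, rng) for _ in range(n_draws)) / n_draws
print(ironed_payment(alpha, beta, k), mc_mean)
```

The empirical mean of the randomized payment should be close to the deterministic one, so both schemes yield the same expected payment.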

Recall that when ψ is regular, the optimal auction is a second price auction with
monopoly reserve.

2.6.4 Proof of Lemma 2.27

Proof. Consider a BIC auction. According to Theorem 2.16, at the truthful equilibrium,
Exi ∼Fi [Pi (xi )] = Pi (0) + Exi ∼Fi [Qi (xi ) ψi (xi )] .
Furthermore, since the auction is BIC, Corollary 2.17 implies that Qi must be non-
decreasing. Almost by definition we also have 0 ≤ Qi (xi ) ≤ 1 since Qi (t) is the probability
that bidder i wins the item in the auction when bidding t.
So the Lemma will be shown if we can show that
\[
I(Q_i; \psi_i) = E_{x_i \sim F_i}\left[Q_i(x_i)\, \psi_i(x_i)\right] \le E_{x_i \sim F_i}\left[Q_i(x_i)\, \tilde\psi_i(x_i)\right] = I(Q_i; \tilde\psi_i).
\]
The papers Myerson, 1981 and Fu, 2016 implicitly assume differentiability of Qi without
stating it explicitly. We give a rigorous proof without this assumption, but the proof is a
bit technical.
Let us call $x_{k,N} = \inf_x \{x : Q_i(x) \ge k/N\}$ for $0 \le k \le N$ and by definition $x_{N+1,N} = \infty$; recall that $0 \le Q_i(x) \le 1$. Note that because $Q_i$ is non-decreasing, $\{x : Q_i(x) \ge k/N\}$ is a semi-infinite interval.
Let us call
\[
Q_{i,N}(t) = \frac{1}{N} \sum_{k=1}^{N} \mathbf{1}\{t \ge x_{k,N}\}.
\]
Suppose $(x_{k,N}, x_{k+1,N})$ is not empty, so that if $x \in (x_{k,N}, x_{k+1,N})$, then $k/N \le Q_i(x) \le (k+1)/N$. This yields that
\[
\forall k \in \{0, \dots, N-1\},\ \forall x \in (x_{k,N}, x_{k+1,N}),\quad \left| Q_i(x) - \frac{k}{N} \right| \le \frac{1}{N}.
\]
Note that if (xk,N , xk+1,N ) is empty the statements are logically valid as the empty set has
all universal properties. Now since Qi is non-decreasing and bounded by 1, it has at most
N jump discontinuities of size greater than 1/N . We conclude that under Fi , which has a
density, the measure of the set where $|Q_i - Q_{i,N}| > 1/N$ is 0. In particular, it is a (possibly empty) subset of $\bigcup_{k=1}^{N} \{x_{k,N}\}$ as our previous results show.
This implies that
\[
\int_{x_{k,N}}^{x_{k+1,N}} \psi_i(x) f_i(x) \left(Q_i(x) - \frac{k}{N}\right) dx \le \frac{1}{N} \int_{x_{k,N}}^{x_{k+1,N}} |\psi_i(x)| f_i(x)\, dx.
\]

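Recall that we want to prove
\[
E_{x_i \sim F_i}\left[Q_i(x_i)\, \psi_i(x_i)\right] \le E_{x_i \sim F_i}\left[Q_i(x_i)\, \tilde\psi_i(x_i)\right],
\]
i.e.,
\[
\int_0^\infty Q_i(x) \psi_i(x) f_i(x)\, dx \le \int_0^\infty Q_i(x) \tilde\psi_i(x) f_i(x)\, dx.
\]
We have
\[
\begin{aligned}
\int_0^\infty Q_i(x) \psi_i(x) f_i(x)\, dx
&= \sum_{k=0}^{N} \int_{x_{k,N}}^{x_{k+1,N}} Q_i(x) \psi_i(x) f_i(x)\, dx \\
&= \sum_{k=0}^{N} \frac{k}{N} \int_{x_{k,N}}^{x_{k+1,N}} \psi_i(x) f_i(x)\, dx + \sum_{k=0}^{N} \int_{x_{k,N}}^{x_{k+1,N}} \psi_i(x) f_i(x) \left(Q_i(x) - \frac{k}{N}\right) dx \\
&= \sum_{k=0}^{N} \frac{k}{N} \left(\Pi(x_{k,N}) - \Pi(x_{k+1,N})\right) + \sum_{k=0}^{N} \int_{x_{k,N}}^{x_{k+1,N}} \psi_i(x) f_i(x) \left(Q_i(x) - \frac{k}{N}\right) dx \\
&= \frac{1}{N} \sum_{k=0}^{N} \Pi(x_{k,N}) + \sum_{k=0}^{N} \int_{x_{k,N}}^{x_{k+1,N}} \psi_i(x) f_i(x) \left(Q_i(x) - \frac{k}{N}\right) dx \\
&\le \frac{1}{N} \sum_{k=0}^{N} \Pi(x_{k,N}) + \frac{1}{N} \int_0^\infty |\psi_i(x)| f_i(x)\, dx \\
&\le \frac{1}{N} \sum_{k=0}^{N} \Pi(x_{k,N}) + \frac{1}{N} \int_0^\infty \left(x f_i(x) + (1 - F_i(x))\right) dx \\
&\le \frac{1}{N} \sum_{k=0}^{N} \Pi(x_{k,N}) + \frac{2 E_{x_i \sim F_i}[x_i]}{N}.
\end{aligned}
\]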

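Using the exact same argument for $\tilde\Pi$ defined above, which is a primitive of $-\tilde\psi f$ that upper bounds Π, we finally get
\[
\begin{aligned}
\int_0^\infty Q_i(x) \psi_i(x) f_i(x)\, dx
&\le \frac{1}{N} \sum_{k=0}^{N} \Pi(x_{k,N}) + \frac{2 E_{x_i \sim F_i}[x_i]}{N} \\
&\le \frac{1}{N} \sum_{k=0}^{N} \tilde\Pi(x_{k,N}) + \frac{2 E_{x_i \sim F_i}[x_i]}{N} \\
&\le \int_0^\infty Q_i(x) \tilde\psi_i(x) f_i(x)\, dx + \frac{6 E_{x_i \sim F_i}[x_i]}{N}.
\end{aligned}
\]
Hence,
\[
\int_0^\infty Q_i(x) \psi_i(x) f_i(x)\, dx - \int_0^\infty Q_i(x) \tilde\psi_i(x) f_i(x)\, dx \le \frac{6 E_{x_i \sim F_i}[x_i]}{N}.
\]
As the left-hand side does not depend on N, we can take the limit as N → ∞ to conclude that
\[
\int_0^\infty Q_i(x) \psi_i(x) f_i(x)\, dx - \int_0^\infty Q_i(x) \tilde\psi_i(x) f_i(x)\, dx \le 0.
\]
This concludes the proof.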

References
Allouah, A. and O. Besbes. 2020. “Prior-independent optimal auctions”. Management
Science. 66(10): 4417–4432.
Archer, A. and É. Tardos. 2001. “Truthful mechanisms for one-parameter agents”. In:
Proceedings 2001 IEEE International Conference on Cluster Computing. IEEE.
Arnosti, N., M. Beck, and P. Milgrom. 2016. “Adverse selection and auction design for
internet display advertising”. American Economic Review. 106(10): 2852–66.
Balseiro, S. R., O. Candogan, and H. Gurkan. 2020. “Multistage Intermediation in Display
Advertising”. Manufacturing & Service Operations Management.
Boyd, S. and L. Vandenberghe. 2004. Convex Optimization. USA: Cambridge University
Press. isbn: 0521833787.
Bulow, J. and P. Klemperer. 1996. “Auctions Versus Negotiations”. The American Economic
Review. 86(1): 180–194.
Cremer, J. and R. P. McLean. 1988. “Full extraction of the surplus in Bayesian and dominant
strategy auctions”. In: Econometrica: Journal of the Econometric Society. JSTOR. 1247–
1257.
Feng, Z., S. Lahaie, J. Schneider, and J. Ye. 2021. “Reserve Price Optimization for First
Price Auctions in Display Advertising”. International Conference on Machine Learning:
3230–3239.

Fibich, G. and A. Gavious. 2003. “Asymmetric First-Price Auctions: A Perturbation Ap-
proach”. Mathematics of Operations Research. 28(4): 836–852.
Fibich, G. and N. Gavish. 2012. “Asymmetric First-Price Auctions—A Dynamical-Systems
Approach”. Mathematics of Operations Research. 37(2): 219–243.
Fu, H. 2016. “Notes on Myerson’s Revenue Optimal Mechanisms”. http://fuhuthu.com/
notes/iron.pdf. Accessed: 2021-08-25.
Fu, H., N. Immorlica, B. Lucier, and P. Strack. 2015. “Randomization beats second price
as a prior-independent auction”. In: Proceedings of the Sixteenth ACM Conference on
Economics and Computation. 323–323.
Gayle, W.-R. and J. F. Richard. 2008. “Numerical Solutions of Asymmetric, First-Price,
Independent Private Values Auctions”. Computational Economics. 32(3).
Groeneboom, P. and G. Jongbloed. 2014. Nonparametric Estimation under Shape Constraints.
Cambridge University Press.
Hartline, J., A. Johnsen, and Y. Li. 2020. “Benchmark design and prior-independent op-
timization”. 2020 IEEE 61st Annual Symposium on Foundations of Computer Science
(FOCS): 294–305.
Hartline, J. D. et al. 2013. “Bayesian mechanism design”. Foundations and Trends® in
Theoretical Computer Science. 8(3): 143–263.
Hiriart-Urruty, J.-B. and C. Lemaréchal. 2001. Fundamentals of Convex Analysis. isbn: 978-
3-540-42205-1. doi: 10.1007/978-3-642-56468-0.
Kirkegaard, R. 2009. “Asymmetric first price auctions”. Journal of Economic Theory. 144(4):
1617–1635. issn: 0022-0531.
Kotowski, M. H. 2018. “On asymmetric reserve prices”. Theoretical Economics. 13(1): 205–
237.
Krishna, V. 2009. Auction Theory.
Lebrun, B. 1999. “First Price Auctions in the Asymmetric N Bidder Case”. International
Economic Review. (1).
Marshall, R., M. Meurer, J. Richard, and W. Stromquist. 1994. “Numerical analysis of
asymmetric first price auctions”. Games and Economic Behavior. (2). issn: 0899-8256.
Milgrom, P. 2004. Putting auction theory to work. Cambridge University Press.
Milgrom, P. and I. Segal. 2002. “Envelope theorems for arbitrary choice sets”. Econometrica.
70(2): 583–601.
Myerson, R. B. 1981. “Optimal auction design”. Mathematics of operations research. 6(1):
58–73.
Riley, J. G. and W. F. Samuelson. 1981b. “Optimal auctions”. The American Economic Review.
71(3): 381–392.
Rockafellar, R. T. 1970. Convex Analysis. Princeton University Press.
Vickrey, W. 1961. “Counterspeculation, auctions, and competitive sealed tenders”. In: The
Journal of finance. Vol. 16. No. 1. Wiley Online Library.

3 Repeated auctions from a seller’s standpoint
First read of this chapter, key concepts and ideas

This chapter focuses on the complexity/cost of learning optimal, or good enough, mechanisms using datasets of past values. It contains two sections, Section 3.2 and Section 3.6, that provide the theoretical material required for this chapter and the following one. The crucial results of this chapter are Theorem 3.5, which describes how many samples are required in the symmetric case (or with one bidder) to learn the optimal auction, and Corollary 3.18, which extends this result to the asymmetric case (combining Theorems 3.16 and 3.17). Finally, perhaps the most important claim of this chapter is that eager second-price auctions with monopoly prices may offer the best compromise between efficiency and learning cost, as stated in Theorem 3.26.

3.1 Motivation
The first large-scale field experiment in production showed how engineers at Yahoo could handle their huge datasets to learn an optimal reserve price per keyword (Ostrovsky and Schwarz, 2011). Bidders were assumed to be non-strategic and to bid truthfully on their platform. In the eBay case, as buyers are different from one auction to the other, the seller knows that, by running an incentive-compatible auction, bidders will bid truthfully. Hence, the online platform is able to learn an optimal reserve price per object and thereby pursue a revenue-maximizing objective. In these two examples, the seller has access to samples from bidders' past values and aims at exploiting this information to learn a revenue-maximizing auction. The value distributions encompass the variability of values between bidders or between objects sold on the platform.
The emergence of this setting created numerous bridges between statistical learning and auction theory (Bar-Yossef et al., 2002; Blum et al., 2004; Lavi and Nisan, 2004), the former being used to estimate the quantities (e.g. value distributions) needed to compute solutions for the latter. This chapter casts some light on these links, and on how far the underlying problem of learning revenue-maximizing auctions has been tackled.

3.2 Statistical Learning Theory Tools for Revenue Maximization

We first start with a short reminder on statistical learning theory. After introducing the
different notions, we shall indicate to what they correspond in the auction setting.

A learner is given a set of hypotheses A: it is a set of possible auctions to run, e.g., second-price auctions with a set of possible reserve prices. She is also given a set of observations $S_T = \{x_1, \dots, x_T\}$, sampled independently from the joint product distribution $F = F_1 \otimes \dots \otimes F_n$ and belonging to a set of distributions D on a domain X. We emphasize again that $x_t$ is a vector that corresponds to all bidders' values and is sampled according to a distribution F whose marginals correspond respectively to every bidder's value distribution $F_1, \dots, F_n$. For each value vector x ∈ X, and each IC auction a ∈ A, we denote by $r_a(x)$ the revenue of the auction at the truthful equilibrium.
A classical assumption is to consider that for a given distribution F, there exists an optimal hypothesis (i.e. an optimal auction or an optimal vector of reserve prices). This hypothesis is called the target hypothesis or optimal Bayes hypothesis. For the auction setting, the optimal Bayes hypothesis is defined as
\[
a^*_A(F) = \arg\max_{a \in A} R(a) \quad \text{where} \quad R(a) = E_{x \sim F}\left[r_a(x)\right], \tag{3.1}
\]
which is, by definition, the Myerson auction run on F if A is the whole set of auctions. In that case, as it does not depend on a particular class of auctions, we simply denote it $a^*(F)$. The practical objective of the learner is to optimize R(a) accessing only the empirical distribution described by $S_T$ rather than the true distribution F.
A popular approach is to replace the true distribution F in Equation (3.1) by its empirical counterpart. This is referred to as the Empirical Revenue Maximization (ERM) principle in statistical learning:
\[
\hat a_A(S_T) = \arg\max_{a \in A} \hat R_{S_T}(a) \quad \text{where} \quad \hat R_{S_T}(a) = \frac{1}{T} \sum_{t=1}^{T} r_a(x_t).
\]
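As an illustration of the ERM principle, the sketch below takes for the hypothesis class a finite grid of reserve prices in a second-price auction with a common reserve. The class, the toy dataset and the function names are assumptions made for this example only.

```python
def revenue(reserve, values):
    """Revenue r_a(x) of a second-price auction with a common reserve
    on one value vector x, at the truthful equilibrium."""
    ranked = sorted(values, reverse=True)
    if ranked[0] < reserve:
        return 0.0                      # nobody clears the reserve
    second = ranked[1] if len(ranked) > 1 else 0.0
    return max(reserve, second)         # winner pays max(reserve, second bid)

def empirical_revenue(reserve, dataset):
    """Empirical revenue: average of r_a(x_t) over the dataset S_T."""
    return sum(revenue(reserve, x) for x in dataset) / len(dataset)

def erm_reserve(candidates, dataset):
    """ERM hypothesis: the candidate reserve with the largest empirical revenue."""
    return max(candidates, key=lambda r: empirical_revenue(r, dataset))

# Toy dataset of T = 3 value vectors for n = 2 bidders (illustrative only).
dataset = [(1.0, 0.2), (0.8, 0.1), (0.3, 0.25)]
candidates = [0.0, 0.25, 0.5, 0.8, 1.0]
best = erm_reserve(candidates, dataset)
print(best, empirical_revenue(best, dataset))
```

With so few samples the selected reserve is tuned to the dataset itself, which is exactly the overfitting risk that the estimation error quantifies.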

The goal is to provide error guarantees on the ERM hypothesis $\hat a_A(S_T)$ against the Myerson auction $a^*(F)$ depending on the number of samples T and some relevant complexity measure of the hypothesis class A. Indeed, to make the problem tractable and to avoid overfitting, the learner often restricts the complexity of the hypothesis space A. This leads to a classical bias / variance trade-off that can be materialized by the following decomposition of the excess risk between $\hat a_A(S_T)$ and $a^*(F)$:
\[
R(a^*(F)) - R(\hat a_A(S_T)) = \underbrace{R(a^*(F)) - R(a^*_A(F))}_{\text{approximation error}} + \underbrace{R(a^*_A(F)) - R(\hat a_A(S_T))}_{\text{estimation error}}
\]

The challenge for the learner is, given the knowledge of a set of possible distributions
D and the sample size T , to choose a family of auctions A that allows to balance these two
error terms. In the remainder of this section, we briefly describe classical tools to derive
theoretical guarantees on the estimation error. We also describe why guarantees are not

provided for any arbitrarily complex distribution F as it would make worst-case guarantees
mostly void. Approximation error is usually handled in a more ad-hoc way, as it is very
dependent on the hypothesis class.
The rates of convergence of approximation and/or estimation error are formalized
through the notion of sample complexity of an algorithm, i.e., a mapping from the class of
finite datasets into A.
Definition 3.1. Given ε ∈ [0, 1] and δ > 0, the sample complexity of a batch learning algorithm alg against a class of joint distributions $\mathcal{F}$ is the smallest number of samples T such that for all distributions $F \in \mathcal{F}$, if alg learns from a dataset $S_T \sim F^{\otimes T}$ of T samples, the following holds:
\[
P\left\{ R(\mathrm{alg}(S_T)) \ge (1 - \varepsilon) R(a^*(F)) \right\} \ge 1 - \delta.
\]
Stated otherwise, alg is (1 − ε)-optimal with probability at least 1 − δ.

3.2.1 The Need to Restrict the Space of Distributions

There are several reasons explaining the importance of restricting the admissible class of value distributions F to control these error terms. The first is that worst-case bounds taken over arbitrarily badly-behaved distributions would be almost uninformative. The second is more pragmatic: arbitrarily bad distributions (e.g., arbitrary mixtures or Dirac masses) lead to very hard optimization problems. It is not really crucial to provide guarantees on the solutions of problems that cannot be solved (at least yet).
To simply explain why the estimation error term cannot be controlled in a satisfying way without further assumptions on the distribution, we focus the remainder of this sub-section on the case where there is only one buyer with value distribution F: the posted-price setting. In this setting, we recall that to maximize her revenue Π(r) = r(1 − F(r)), the seller must set the reserve price at the monopoly price, a root of the bidder's virtual value function.
Definition 3.2. Let $\hat F$ denote the empirical value distribution in the dataset (with only one bidder or in the symmetric case). The empirical monopoly price is
\[
\hat r^* = \arg\max_r\ r(1 - \hat F(r)).
\]
It is unfortunately impossible to learn a near-optimal revenue-maximizing auction (Roughgarden and Schrijvers, 2016) without additional assumptions.
Proposition 3.3 (Roughgarden and Schrijvers, 2016). Consider any fixed algorithm and
fixed dataset size T ∈ N . Then for every δ > 0 and ε > 0, there exists a value distribution
such that the auction output by the algorithm is at most ε-optimal with probability at least
1 − δ.

Proof. Consider the family of value distributions
\[
\mathcal{F} = \{F_z \mid z \in \mathbb{R}_+^*\} \quad \text{with} \quad F_z = \begin{cases} z^2, & \text{with probability } 1/z, \\ 0, & \text{with probability } 1 - 1/z. \end{cases}
\]
The optimal price of $z^2$ gives an expected revenue of z. For any number of samples T and any δ > 0, if $z \ge (1 - (1 - \delta)^{1/T})^{-1}$ then with probability at least 1 − δ the dataset will be composed of only 0. Let $z_T$ be the price posted by the algorithm in that case; the expected revenue of the algorithm is therefore $z_T / z$. As a consequence, if $z \ge \sqrt{z_T / \varepsilon}$ then the algorithm is only ε-optimal.
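The two thresholds on z used in this proof can be evaluated numerically. The sketch below is only an illustration under assumed values of T, δ, ε and of the posted price; the function names are hypothetical.

```python
def min_z(T, delta):
    """Threshold (1 - (1 - delta)^(1/T))^(-1): above it, T samples from F_z
    are all equal to 0 with probability at least 1 - delta."""
    return 1.0 / (1.0 - (1.0 - delta) ** (1.0 / T))

def prob_all_zero(z, T):
    """Probability that T i.i.d. samples from F_z are all equal to 0."""
    return (1.0 - 1.0 / z) ** T

def revenue_ratio(price, z):
    """Expected revenue of posting `price` against F_z, divided by the optimum z."""
    sale_prob = 1.0 / z if 0 < price <= z * z else 0.0
    return price * sale_prob / z

T, delta, eps = 100, 0.1, 0.01      # illustrative values only
posted = 50.0                        # price chosen after seeing only zeros
z = max(min_z(T, delta), (posted / eps) ** 0.5)
print(z, prob_all_zero(z, T), revenue_ratio(posted, z))
```

Whatever price the algorithm posts after observing only zeros, taking z large enough makes both the dataset uninformative and the posted price far from optimal.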

Even though the counter-example distribution $F_z$ involved in the proof does not satisfy the basic assumption of continuity, it is easy to see that smoothing it would not really change the proof (except for additional technicalities). Proposition 3.3 implies that some restrictive assumptions on the joint distribution F are required, such as regularity of the marginals $F_1, \dots, F_n$. Another, stronger requirement than an increasing virtual value is a monotone hazard rate.
Definition 3.4. A distribution F has Monotonic Hazard Rate (MHR) if the hazard rate $\frac{f(x)}{1 - F(x)}$ is non-decreasing over its support.

Uniform, exponential and normal distributions satisfy the MHR condition and, obvi-
ously, all MHR distributions are regular distributions. The converse is not true since the
distribution F(z) = 1 − 1/z is regular but not MHR. Intuitively MHR distributions have
thinner tails than general regular distributions.

Theorem 3.5 (Dhangwatnotai et al., 2015; Huang et al., 2018). The sample complexity of
the empirical monopoly price is of order

– Θ(ε−3/2 log(1/ε) log(1/δ)) for MHR distributions and

– Θ(Hε−2 log(H/ε) log(1/δ)) for bounded [1, H] distributions

To get some intuition, we provide a simple, but sub-optimal, proof for the case of bounded distributions. But first, let us explain why bounded distributions are assumed to lie in [1, H] and not [0, H]; this will not transpire in the proof, as we prove a weaker statement (with a quadratic dependency in H, but for any bounded distribution). The reason is that usual techniques do not control an error of ε, but an error of εΠ(r∗), which can be arbitrarily small if Π(r∗) is close to 0. The assumption that the support is included in [1, H] ensures that Π(r∗) ≥ 1. With a simple renormalisation, we can show that the sample complexity of the monopoly price scales as $\rho \varepsilon^{-2} \log(\rho/\varepsilon) \log(1/\delta)$ if the distribution is supported on [a, b], with ρ = b/a.

Proof. Let F be a bounded distribution whose support is included in [1, H] and let us denote by $r^* = \arg\max_r r(1 - F(r))$ the monopoly price, by $\hat F$ the empirical CDF and by $\hat r^* = \arg\max_r r(1 - \hat F(r))$ the empirical monopoly price.

We recall the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality (Massart, 1990): if $C_T(\delta) = \sqrt{\log(2/\delta)/2T}$, then
\[
P\left\{ \sup_x |\hat F_T(x) - F(x)| > C_T(\delta) \right\} \le \delta.
\]
As a consequence, with probability 1 − δ,
\[
\begin{aligned}
r^*(1 - F(r^*)) - \hat r^*(1 - F(\hat r^*)) \le{}& \left( r^*(1 - F(r^*)) - r^*(1 - \hat F(r^*)) \right) + \left( r^*(1 - \hat F(r^*)) - \hat r^*(1 - \hat F(\hat r^*)) \right) \\
&+ \left( \hat r^*(1 - \hat F(\hat r^*)) - \hat r^*(1 - F(\hat r^*)) \right) \\
\le{}& 2H \sqrt{\frac{\log(2/\delta)}{2T}}.
\end{aligned}
\]
The second term is negative by definition of the empirical monopoly price.
Choosing $T = 2H^2 \varepsilon^{-2} \log(2/\delta)$ gives a (1 − ε)-approximation of the revenue-maximizing auction with probability 1 − δ.
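The proof suggests a simple procedure: draw the number of samples given by the bound and maximize the empirical revenue over the sample points. The sketch below assumes that a buyer purchases whenever his value is at least the posted price, and uses values uniform on [1, 2] as a hypothetical example (so H = 2, Π(r) = r(2 − r) and Π(r∗) = 1).

```python
import math
import random

def empirical_monopoly_price(samples):
    """Maximizer of the empirical revenue; it is always attained at a sample
    point, where posting xs[k] sells to the T - k samples that are >= xs[k]."""
    xs = sorted(samples)
    T = len(xs)
    best = max(range(T), key=lambda k: xs[k] * (T - k) / T)
    return xs[best]

def dkw_sample_size(H, eps, delta):
    """Sample size T = 2 H^2 eps^-2 log(2 / delta) used in the proof above."""
    return math.ceil(2 * H**2 * math.log(2 / delta) / eps**2)

# Hypothetical example: values uniform on [1, 2].
rng = random.Random(0)
T = dkw_sample_size(H=2.0, eps=0.1, delta=0.05)
r_hat = empirical_monopoly_price([rng.uniform(1.0, 2.0) for _ in range(T)])
true_revenue = r_hat * (2.0 - r_hat)   # to be compared with Pi(r*) = 1
print(T, r_hat, true_revenue)
```

In practice the empirical price is usually much closer to optimal than this worst-case bound suggests, which is consistent with the proof being sub-optimal.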

For the optimal proof, see Huang et al., 2018. These sample complexities match the
lower bounds provided in (Huang et al., 2018) up to logarithmic factors.
Unfortunately, this simple approach does not generalize to regular distributions, especially to heavy-tailed distributions. Intuitively, there is a constant probability that a few outliers generate an arbitrarily large empirical monopoly price. This intuition is formalized in the following proposition.

Proposition 3.6. There exists a regular distribution F and two constants η0, δ0 > 0 such that, for any sample size T,
\[
P_{S_T \sim F^{\otimes T}}\left\{ \Pi(\hat r^*) < (1 - \eta_0) \Pi(r^*) \right\} > \delta_0.
\]

Proof. Consider F(x) = 1 − 1/x for x < 2 and F(x) = 1 − 1/(2(x − 1)) for x > 2. Then F is regular since ψ(x) = 0 for x < 2 and ψ(x) = 1 for x > 2, and the monopolistic revenue is equal to 1. On the other hand, for any sample size T ∈ N, each sample exceeds 4T with probability 1/(2(4T − 1)) ≥ 1/(8T), so that
\[
P_{S_T \sim F^{\otimes T}}\left\{ \exists x \in S_T,\ x \ge 4T \right\} \ge 1 - \exp(-1/8).
\]
Hence, with a constant probability, $\hat r^*$ must verify $\hat r^*(1 - \hat F(\hat r^*)) \ge 4T \cdot \frac{1}{T} = 4$, which entails that $\hat r^* \ge 4$. This implies in turn that $\Pi(\hat r^*) = \hat r^*(1 - F(\hat r^*)) \le \frac{2}{3} = (1 - \frac{1}{3}) \Pi(r^*)$.

This problem is related to the estimation of the mean of heavy tailed distributions.
We refer the interested reader to (Lugosi and Mendelson, 2019) for a precise survey on
algorithms used to estimate the mean of heavy-tailed distributions.
To handle heavy-tailed regular distributions, a solution introduced in (Dhangwatnotai
et al., 2015) consists in removing the largest samples.

Definition 3.7. Given a dataset $S_T = \{x_t \mid t \in [T]\}$, assuming the $x_t$ are ordered so that $x_1 \le x_2 \le \dots \le x_T$, and an accuracy parameter κ > 0, we denote by
\[
S_T^\kappa = \{x_t \mid t \le (1 - \kappa) T\}
\]
the subset of $S_T$ where the highest κT data-points are removed and by $\hat F_\kappa$ the empirical value distribution on it. Then the guarded empirical monopoly price is
\[
\arg\max_r\ r(1 - \hat F_\kappa(r)).
\]
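A direct transcription of this definition is sketched below. It assumes that a buyer purchases whenever his value is at least the posted price; the small tolerance guarding the rounding of (1 − κ)T and the toy dataset are choices made for this illustration only.

```python
import math

def guarded_empirical_monopoly_price(samples, kappa):
    """Empirical monopoly price computed after removing the highest
    kappa * T samples, following Definition 3.7."""
    xs = sorted(samples)
    keep = int(math.floor((1 - kappa) * len(xs) + 1e-9))  # indices t <= (1 - kappa) T
    kept = xs[:keep]
    if not kept:
        raise ValueError("kappa removes every sample")
    m = len(kept)
    # Posting kept[k] sells to the m - k retained samples that are >= kept[k].
    best = max(range(m), key=lambda k: kept[k] * (m - k) / m)
    return kept[best]

# Toy dataset with one outlier (illustrative only).
samples = [1.0, 1.2, 1.5, 2.0, 1000.0]
unguarded = guarded_empirical_monopoly_price(samples, kappa=0.0)
guarded = guarded_empirical_monopoly_price(samples, kappa=0.2)
print(unguarded, guarded)
```

On this toy dataset the single outlier drives the unguarded price, while the guarded price ignores it.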

Theorem 3.8 (Dhangwatnotai et al. (2015)). The sample complexity of the guarded empir-
ical monopoly price with κ = ε is of order Θ(ε−3 log(1/ε) log(1/δ)) for regular distributions.

This result is another instance of the classical bias-variance tradeoff (Dhangwatnotai et al., 2015), as removing the high values from the dataset reduces the variance of the estimator at the cost of introducing a small bias.
The sample complexity is smaller for MHR distributions as they induce a strongly concave expected revenue curve, which limits the number of potential candidates around the actual monopoly price. A crucial point in the formal proof is that estimation errors for different possible prices are highly correlated. Indeed, if there are more low values than expected, the revenue of the true monopoly price will be underestimated, but this will also be the case for all prices near this optimal one. From a computational point of view, MHR and regularity properties also lead to easier optimization problems, as illustrated in Figure 3.1. The monopolistic profit function is represented in three cases: mixture of Dirac, regular and MHR. In general, there is no reason for the monopolistic profit function to have only one (local) maximum, so that optimizing it is challenging. On the other hand, if F is regular, this profit function is pseudo-concave and has only global (and no local) maxima. As a consequence, global optimisation is feasible, but at the cost of uniformly good rates, as the function can be quite flat in some places. If F is MHR, the profit function is log-concave and its optimization can thus be solved more efficiently.

3.2.2 Bounding the Estimation Error

The estimation term quantifies how far the performance of the predictor $\hat a_A(S_T)$ found by ERM in class A is from that of the best predictor in A, denoted $a^*_A(F)$. Intuitively, one can expect that under mild assumptions, the estimation error goes to 0 when T → ∞, the question being how the speed of convergence depends on the size of the hypothesis class A and on the sample size T. When the class of hypotheses is finite, this estimation error can be controlled with a union bound on some basic concentration inequalities.

Figure 3.1: Monopolistic revenue r(1 − F(r)) as a function of the reserve price r, for several possible value distributions. The dotted yellow curve corresponds to a log-normal distribution with mean 0.5 and scale 0.5, the dash-dotted green curve corresponds to a Kumaraswamy distribution with parameters a = 2 and b = 10, and the dashed blue curve corresponds to a mixture of 7 Gaussians with means equal respectively to (0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4) and standard deviation 0.001.

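Proposition 3.9. Consider a finite class of auctions A and $S_T \sim F^{\otimes T}$ a dataset of T value vectors drawn i.i.d. from F. Assume that the support of F is included in $[0, H]^n$. Then, for all ε > 0 and δ > 0, if $T \ge \frac{H^2}{2\varepsilon^2} \left( \ln\frac{4}{\delta} + \ln|A| \right)$, then
\[
P_{S_T \sim F^{\otimes T}}\left\{ \left| R(\hat a_A(S_T)) - R(a^*_A(F)) \right| \le \varepsilon \right\} \ge 1 - \delta.
\]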

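Proof. Given any a ∈ A, the following holds
\[
\begin{aligned}
R(a) - R(\hat a_A(S_T)) &= R(a) - \hat R_{S_T}(\hat a_A(S_T)) + \hat R_{S_T}(\hat a_A(S_T)) - R(\hat a_A(S_T)) \\
&\le R(a) - \hat R_{S_T}(a) + \hat R_{S_T}(\hat a_A(S_T)) - R(\hat a_A(S_T)) \quad \text{(by definition of ERM)} \\
&\le 2 \max_{a \in A} \left| \hat R_{S_T}(a) - R(a) \right|.
\end{aligned}
\]
Then,
\[
P\left\{ \exists a \in A,\ \left| \hat R_{S_T}(a) - R(a) \right| > \varepsilon \right\} \le \sum_{a \in A} P\left\{ \left| \hat R_{S_T}(a) - R(a) \right| > \varepsilon \right\} \le 2|A| \exp\left( \frac{-2T\varepsilon^2}{H^2} \right).
\]
Setting the right-hand side to δ/2 finishes the proof.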
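To get a feel for the orders of magnitude, the sample size of Proposition 3.9 can be evaluated directly. The numerical values below are arbitrary assumptions and the function names are hypothetical.

```python
import math

def finite_class_sample_size(H, eps, delta, num_auctions):
    """Sample size T >= H^2 / (2 eps^2) * (ln(4 / delta) + ln|A|) of Proposition 3.9."""
    return math.ceil(H**2 / (2 * eps**2) * (math.log(4 / delta) + math.log(num_auctions)))

def union_bound(T, H, eps, num_auctions):
    """Right-hand side 2 |A| exp(-2 T eps^2 / H^2) of the union bound in the proof."""
    return 2 * num_auctions * math.exp(-2 * T * eps**2 / H**2)

# Illustrative values: revenues in [0, 1], 100 candidate auctions.
T = finite_class_sample_size(H=1.0, eps=0.1, delta=0.05, num_auctions=100)
T_large = finite_class_sample_size(H=1.0, eps=0.1, delta=0.05, num_auctions=10_000)
print(T, T_large, union_bound(T, 1.0, 0.1, 100))
```

The dependence on the size of the class is only logarithmic: multiplying the number of candidate auctions by 100 increases the required sample size by a modest additive amount.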

This generalization bound is uninformative if the size of the hypothesis space is infinite, which happens for large families of value distributions. This simple proof can nonetheless be extended to an infinite set of hypotheses using standard statistical learning tools. The general idea is to reduce the analysis of an infinite class of auctions to a finite set of hypotheses. First, we introduce below different notions to quantify the complexity of a hypothesis set.

Definition 3.10. Let A be a class of auctions and $S_T = (x_1, \dots, x_T)$ be a fixed dataset of value vectors. The empirical Rademacher complexity of A with respect to $S_T$ is
\[
\widehat{\mathrm{Rad}}_{S_T}(A) = E_\sigma\left[ \sup_{a \in A} \frac{1}{T} \sum_{t=1}^{T} \sigma_t r_a(x_t) \right]
\]
where σ = (σ1, . . . , σT) with $\sigma_t$ i.i.d. uniform random variables in {−1, +1}.

If F is a distribution over value vectors, its Rademacher complexity is the expectation of the empirical Rademacher complexity over F:
\[
\mathrm{Rad}_T(A, F) = E_{S_T \sim F^{\otimes T}}\left[ \widehat{\mathrm{Rad}}_{S_T}(A) \right].
\]
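For a finite class, the empirical Rademacher complexity can be approximated by Monte Carlo over the signs σ. The sketch below assumes a single bidder and a finite grid of posted prices as the hypothesis class; the dataset, the grid and the number of draws are illustrative choices, and the function names are hypothetical.

```python
import random

def posted_price_revenue(price, value):
    """Revenue r_a(x) of posting `price` to a single buyer with value `value`."""
    return price if value >= price else 0.0

def empirical_rademacher(prices, values, num_draws=2000, rng=None):
    """Monte Carlo estimate of E_sigma[ sup_a (1/T) sum_t sigma_t r_a(x_t) ]
    for the finite class of posted prices `prices` on the dataset `values`."""
    rng = rng or random.Random(0)
    T = len(values)
    rev = [[posted_price_revenue(p, v) for v in values] for p in prices]
    total = 0.0
    for _ in range(num_draws):
        sigma = [rng.choice((-1, 1)) for _ in range(T)]
        total += max(sum(s * r for s, r in zip(sigma, row)) for row in rev) / T
    return total / num_draws

values = [0.2, 0.6, 0.9, 1.0]          # toy dataset (illustrative only)
prices = [0.0, 0.5, 1.0]               # finite class of posted prices
estimate = empirical_rademacher(prices, values)
print(estimate)
```

Richer classes can fit random signs better and therefore have a larger complexity, which is what the generalization bound below penalizes.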

In order to use classical concentration bounds, the possible revenue of an auction should be uniformly bounded over all possible value vectors. Hence, the scope of the next theorem, which is a key result from learning theory (Koltchinskii, Panchenko, et al., 2002; Bartlett et al., 2002), is restricted to distributions with support included in $[0, H]^n$ and to the class of ex-post individually-rational auctions.

Theorem 3.11. Let A be a class of ex-post individually-rational auctions and F a distribution over the value vector space X bounded by H. Then, for any δ > 0, with probability 1 − δ over a dataset $S_T = (x_1, \dots, x_T) \sim F^{\otimes T}$, it holds:
\[
\forall a \in A, \quad R(a) \le \frac{1}{T} \sum_{t=1}^{T} r_a(x_t) + 2\,\mathrm{Rad}_T(A, F) + H \sqrt{\frac{\log(\frac{1}{\delta})}{2T}}.
\]

As a consequence, computing the Rademacher complexity bounds the estimation error for all the auctions in the hypothesis class. Unfortunately, in most cases, this task is quite challenging. So weaker concepts, the Vapnik-Chervonenkis (VC) dimension (for binary classification problems) and the pseudo-dimension (for real-valued hypothesis classes), were introduced. They can be used to establish generalization bounds and are easier to compute than the Rademacher complexity since they are purely combinatorial notions.

Definition 3.12. Let A be a set of hypotheses on X. A dataset $S_T = (x_1, \dots, x_T) \in X^T$ is pseudo-shattered by A if there exists $(\theta_1, \dots, \theta_T) \in \mathbb{R}^T$ such that for all $(c_1, \dots, c_T) \in \{-1, 1\}^T$, there exists a ∈ A such that
\[
\forall t \in \{1, \dots, T\}, \quad \mathrm{sign}(r_a(x_t) - \theta_t) = c_t.
\]
If this $(\theta_1, \dots, \theta_T)$ exists, it witnesses the shattering. The pseudo-dimension of A is the cardinality of the largest set of points in X that can be pseudo-shattered by A:
\[
\mathrm{Pdim}(A) = \max\left\{ T \in \mathbb{N} \mid \exists S_T \in X^T \text{ such that } S_T \text{ is pseudo-shattered by } A \right\}.
\]

The following result from learning theory gives a uniform generalization bound in
terms of the pseudo-dimension.

Theorem 3.13 (Haussler, 1992). Let A be a class of ex-post individually-rational auctions and F a distribution over the value vector space X bounded by H. Then, for any δ > 0, with probability 1 − δ over a dataset $S_T = (x_1, \dots, x_T) \sim F^{\otimes T}$, for all a ∈ A,
\[
R(a) = E_{x \sim F}(r_a(x)) \le \frac{1}{T} \sum_{t=1}^{T} r_a(x_t) + H \sqrt{\frac{2\,\mathrm{Pdim}(A) \log\left(\frac{T}{\mathrm{Pdim}(A)}\right)}{T}} + H \sqrt{\frac{\log(\frac{1}{\delta})}{2T}}.
\]

This result is derived from an upper bound of the Rademacher complexity, as a function
of the pseudo-dimension. It can be translated in terms of sample complexity.

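Proposition 3.14 (Haussler, 1992). Let A be a class of ex-post individually-rational auctions and F a distribution over the value vector space X bounded by H. Then, for any δ > 0, for any ε > 0, if $T = \Theta\left( H^2 \varepsilon^{-2} \left( \mathrm{Pdim}(A) \log(\frac{H}{\varepsilon}) + \log(1/\delta) \right) \right)$, then
\[
P_{S_T \sim F^{\otimes T}}\left\{ \exists a \in A,\ |\hat R_{S_T}(a) - R(a)| > \varepsilon \right\} \le \delta.
\]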

This result links the sample size T and the "richness" of the auction class A to the estimation error. It was first applied to learning auctions in (Morgenstern and Roughgarden, 2015), before being extended (Devanur et al., 2016; Gonczarowski and Nisan, 2017; Guo et al., 2019). In Sections 3.3 and 3.4, we will use the pseudo-dimension to quantify the estimation error through this result.

3.3 Auctions with Asymptotically No Approximation Error
We first present families of auctions with no asymptotic approximation error (as T → ∞).
In other words, either there is no approximation error at all, or the approximation error
can be driven to 0 by controlling a parameter that depends on T.

3.3.1 Approximation of Myerson Auction with Empirical Distributions

With multiple bidders, the most natural approach is to generalize the posted-price
techniques and to estimate, for each bidder, the empirical value distribution $\hat{F}_i$. Based on these
empirical distributions, the seller can compute empirical virtual values and run the Myerson
auction with these virtual values. Unfortunately, this approach is prone to overfitting
and the Myerson auction run on empirical distributions has poor revenue guarantees.
Intuitively, this comes from the definition of the virtual value: although a quantile 1 − F(x)
can be easily estimated, this is not the case for the associated density f(x), even in
not-so-pathological cases. The virtual value is then imprecisely estimated, which leads to poor
performance of the empirical Myerson auction.
Theorem 3.15 (Cole and Roughgarden, 2014b; Devanur et al., 2016). With regular value
distributions, the sample complexity of the empirical Myerson auction is of order

$$\Theta\Big(n^{10} \varepsilon^{-7} \log^3\big(\tfrac{n}{\varepsilon}\big) \log\big(\tfrac{1}{\delta}\big)\Big) .$$
This result can be extended to non-regular distributions (Roughgarden and Schrijvers,
2016). The bound can also be improved in the case of distributions with bounded support
included in [1, H]: the sample complexity is of order $\Theta(n^2 H^2 \varepsilon^{-2} \log \frac{1}{\delta})$, disregarding the
computational burden of ironing. Note that running the Myerson auction requires
regular distributions, which empirical distributions are not. To circumvent
this issue, the trick consists in, roughly speaking, ironing the empirical distributions (see
Section 2.4.6).

Algorithmic Complexity. The running time of the empirical Myerson auction is also
quite high. It takes O(nT log T) operations to compute the empirical cdfs. Then, each time
the auction is run, computing the allocation and the payment takes O(nT) operations.

3.3.2 L-level auctions

The class of L-level auctions is a generalization of the second-price auctions with reserve
price (Morgenstern and Roughgarden, 2015). Each bidder has L floors, denoted by $r_i^{\ell}$ for
i ∈ N and ℓ ∈ {0, . . . , L − 1}. Given a bid $b_i$ of bidder i, his index is defined as

$$\iota_i = \max\{\ell \in \{0, \dots, L-1\} \text{ such that } b_i \ge r_i^{\ell}\}, \quad \text{with } \iota_i = -1 \text{ if } b_i < r_i^{0} .$$

The winner of the auction is the bidder with the highest non-negative index: if all
bidders have an index equal to −1, the item is not allocated. Ties are broken in favor
of the highest bid among the bidders with the highest index, or by standard decision rules (such as
uniformly at random).
The payment rule is defined according to Corollary 2.18: to ensure that the auction
is DSIC, the payment is the lowest winning bid. Formally, with the above tie-breaking rule, the
payment of bidder i if he wins is equal to
• $r_i^{0}$ if all other bidders have index −1,

• $\max\{r_i^{\tau}, b_j\}$, where τ > −1 is the index of the bidder j that would have won without
bidder i.
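
The allocation and payment rules above can be sketched as follows. This is only an illustration under stated assumptions: the function name `l_level_auction` is hypothetical, every bidder is assumed to have the same number L of floors sorted in increasing order, and remaining ties are broken in favor of the lowest bidder identifier. The payment is computed as the lowest winning bid; the sketch therefore caps $\max\{r_i^{\tau}, b_j\}$ by the next floor $r_i^{\tau+1}$ when it exists, since reaching a strictly higher index is enough to win. This cap is our reading of "lowest winning bid" and is not spelled out in the text.

```python
def l_level_auction(bids, floors):
    """Run one L-level auction (illustrative sketch).

    bids[i]   : bid of bidder i
    floors[i] : the L floors of bidder i, sorted in increasing order
    Returns (winner, payment), or (None, 0.0) if the item is not allocated.
    """
    def index(i, bid):
        # Largest level whose floor is cleared by the bid, -1 if none
        cleared = [lvl for lvl, floor in enumerate(floors[i]) if bid >= floor]
        return cleared[-1] if cleared else -1

    def rank(i):
        # Highest index first, then highest bid, then lowest bidder id
        return (index(i, bids[i]), bids[i], -i)

    candidates = [i for i in range(len(bids)) if index(i, bids[i]) >= 0]
    if not candidates:
        return None, 0.0
    winner = max(candidates, key=rank)

    others = [j for j in candidates if j != winner]
    if not others:
        # All other bidders have index -1
        return winner, floors[winner][0]

    runner_up = max(others, key=rank)  # would have won without the winner
    tau = index(runner_up, bids[runner_up])
    payment = max(floors[winner][tau], bids[runner_up])
    if tau + 1 < len(floors[winner]):
        # Assumption: reaching the next level is enough to beat the runner-up
        payment = min(payment, floors[winner][tau + 1])
    return winner, payment
```

With L = 1 and floors `[[0.5], [2.0]]`, the bids `[0.8, 1.5]` give the item to bidder 0 at price 0.5, which is the behaviour of an eager second-price auction with personalized reserve prices.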
This class of L-level auctions interpolates between the eager second-price auction and
the Myerson auction. Indeed, the 1-level auction is equivalent to the eager second-price
auction. When L → ∞, the Myerson auction can be approximated with appropriate reserve
prices – i.e. the approximation error can be made arbitrarily small by taking T arbitrarily
large.
Theorem 3.16 (Morgenstern and Roughgarden, 2015). Let F be the class of distributions
whose support is included in $[1, H]^n$ and $L = \Theta(\frac{1}{\varepsilon} + \log_{1+\varepsilon}(H))$. Then, for any F ∈ F, there
exists an L-level auction with a revenue higher than 1 − ε times the revenue of the Myerson
auction.
The idea of this theorem is quite simple: the class of L-level auctions, for the above
well-chosen value of L, is an ε-net for the class of regular value distributions (Morgenstern
and Roughgarden, 2015). Moreover, the estimation error can be controlled by computing
the pseudo-dimension of this class.
Theorem 3.17 (Morgenstern and Roughgarden, 2015). Let A be the class of L-level
auctions with n bidders, then its pseudo-dimension satisfies

$$\mathrm{Pdim}(A) = \Theta_{L \to \infty}\big(nL \log(nL)\big) .$$

Remark. Theorem 3.17 only provides a scaling of Pdim(A) with respect to nL for the sake
of simplicity. A more precise relation can be derived (Morgenstern and Roughgarden,
2015) since $2^{\mathrm{Pdim}(A)} \le (n\,\mathrm{Pdim}(A) + nL)^{3nL}$. This will be useful to derive the pseudo-dimension of the
class of second-price auctions later (i.e. L = 1).
By Theorem 3.16, the approximation error will be smaller than ε by setting $L = \Theta(\frac{1}{\varepsilon} + \log_{1+\varepsilon}(H))$.
Similarly, by Proposition 3.14, the estimation error will be smaller than ε with

$$T = \Theta\Big(H^2 \varepsilon^{-2} \big(\mathrm{Pdim}(A) \log(\tfrac{H}{\varepsilon}) + \log(1/\delta)\big)\Big)$$

samples, where $\mathrm{Pdim}(A) = \Theta_{L \to \infty}(nL \log(nL))$. Combining these two claims gives the
following.

Corollary 3.18. Let F be the class of distributions with support included in $[1, H]^n$. For
ε > 0 and δ > 0, the sample complexity of L-level auctions with $L = \Theta(\frac{1}{\varepsilon} + \log_{1+\varepsilon}(H))$ is of
order

$$T = \Theta\Big(\frac{H^2 n}{\varepsilon^3} \log \frac{1}{\delta}\Big) .$$

Algorithmic Complexity. This improvement over the empirical Myerson auction in
terms of sample complexity comes at a cost: tractability. Even though each run of the auction
only requires nL operations (independent of T) to compute the allocation and the payment,
the computation of the optimal L-level auction, i.e. of the optimal values for $r_i^{\ell}$, is NP-hard
(Paes Leme et al., 2016; Roughgarden and Wang, 2016).

3.3.3 Further improvements.

These results were first improved to reach a linear dependence in H, by preprocessing
the data, removing some outliers and running the Myerson auction on this new empirical
distribution (Devanur et al., 2016). They were further improved to a bound in
$O(nH\varepsilon^{-2} \log \frac{1}{\delta})$ for [1, H]-valued distributions, by building an ad-hoc empirical
distribution called the dominated empirical distribution (Guo et al., 2019). The state-of-the-art
bounds are summarized in Table 3.1, taken from (Guo et al., 2019).

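Setting     Lower bound                        Upper bound

Regular     $\Omega(n\varepsilon^{-3})$        $O(n\varepsilon^{-3} \log \frac{1}{\delta})$
MHR         $\Omega(n\varepsilon^{-2})$        $O(n\varepsilon^{-3} \log \frac{1}{\delta})$
[1, H]      $\Omega(nH\varepsilon^{-2})$       $O(nH\varepsilon^{-2} \log \frac{1}{\delta})$

Table 3.1: Current status of sample complexity bounds in the batch setting depending on the class of value
distributions. Table taken from (Guo et al., 2019).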

The question of finding the optimal sample complexity is more or less settled for
different interesting classes of distributions. On the other hand, most of the “optimal”
techniques (in the sense that some upper bound matches the associated lower bound, up to
logarithmic terms) suffer from the computational complexity of running the learned
auction (and not of learning it), as the empirical Myerson auction does. Indeed, learning
the optimal auction has a cost of O(nT log T), but running it has a cost of O(nT)
operations to compute each allocation and each payment. There is a clear tension: the
smaller the error ε (large values of T), the larger the running time of the optimal auction.
This is simply impractical for large-scale auction systems such as the eBay example. The
next subsection describes how to handle revenue maximization on more tractable auctions,
at the cost of keeping a bounded, yet incompressible, approximation error.

3.4 Tractability at the Cost of Approximation Error

It is quite desirable, for an auction aiming to be implemented in practice, to have a
complexity independent of T both for computing the allocation and the payment. This
might require searching in a parametric class of auctions A rather than in a non-parametric
one, like the different versions of the empirical Myerson auction. Further, the number of
parameters also has to be independent of T, contrary to L-level auctions with
$L = \Theta(\frac{1}{\varepsilon} + \log_{1+\varepsilon}(H))$, as ε is linked to T. For instance, an L-level auction with L fixed independently of
T would be satisfactory: the running time of the auction is independent of T, at the expense
of no longer having an asymptotically zero approximation error. In order to provide simple
insights, we are going to focus on simpler families of auctions A derived from the
second-price auction. We are mostly going to focus on controlling the approximation error, so we
first introduce the required notation.

Definition 3.19. An auction family A is a κ-approximation of the Myerson auction for a
class of distributions $\mathcal{F}$ if

$$\forall F \in \mathcal{F}, \quad R(a_A^*(F)) \ge \frac{1}{\kappa} R(a^*(F)) .$$

Stated otherwise, the approximation error satisfies

$$R(a^*(F)) - R(a_A^*(F)) \le \Big(1 - \frac{1}{\kappa}\Big) R(a^*(F)) .$$

3.4.1 Second-price auctions with anonymous reserve price

First, let us consider the simple family of second-price auctions with an anonymous reserve
price (which contains the Myerson auction in the symmetric case). From the learning point
of view, only one parameter must be learned.

Proposition 3.20. Let A be the class of second-price auctions with anonymous reserve
prices. For n ≥ 2, the pseudo-dimension of A is

Pdim (A) = 2 .

Proof. First, an auction in A is defined by only one parameter, thus we identify it with its
anonymous reserve price and we denote it by a for simplicity. Finding a set of cardinality
2 that can be pseudo-shattered by A is trivial. We only need to prove that no set of
cardinality 3 or more can be pseudo-shattered.

Recall that a dataset $S_T = (x_1, \dots, x_T)$ of size T is pseudo-shattered by A if there
exists $\theta \in \mathbb{R}^T$ such that for any $c \in \{-1, 1\}^T$, there exists a ∈ A such that
$\forall t \in [T],\ \operatorname{sign}(r_a(x_t) - \theta_t) = c_t$.
Regardless of the number of bidders n, the function $a \mapsto r_a(x)$ is quasi-concave in the
reserve price a and thus can (strictly) cross any threshold $\theta_t$ at most twice. Hence, a dataset
$S_T$ of size T can only generate a subset of $\{-1, +1\}^T$ of size at most 2T + 1: when a ranges from 0 to
∞, the vector $(\operatorname{sign}(r_a(x_t) - \theta_t))_{t}$ changes value at most twice per point in $S_T$. Thus, a set $S_T$
can be pseudo-shattered only if $2^T \le 2T + 1$, which means that necessarily T ≤ 2.
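
The counting argument of the proof can be checked numerically. The sketch below sweeps the anonymous reserve price over a grid and counts the distinct sign vectors obtained on random datasets of size T = 3. The helper names, the uniform sampling of values and thresholds in [0, 1] and the grid are assumptions chosen for illustration; this is a sanity check, not a proof.

```python
import random

def revenue(reserve, bids):
    """Revenue of a second-price auction with anonymous reserve on one bid vector."""
    top, second = sorted(bids, reverse=True)[:2]
    return max(reserve, second) if top >= reserve else 0.0

def sign_patterns(dataset, thetas, reserves):
    """Distinct vectors (sign(r_a(x_t) - theta_t))_t reached when a ranges over `reserves`."""
    return {
        tuple(1 if revenue(a, x) > theta else -1 for x, theta in zip(dataset, thetas))
        for a in reserves
    }

rng = random.Random(0)
T, n = 3, 4
grid = [k / 200 for k in range(221)]  # reserve prices from 0 to 1.1
max_patterns = 0
for _ in range(200):
    dataset = [[rng.random() for _ in range(n)] for _ in range(T)]
    thetas = [rng.random() for _ in range(T)]
    max_patterns = max(max_patterns, len(sign_patterns(dataset, thetas, grid)))
```

In every trial the number of patterns stays below 2T + 1 = 7, hence below the $2^3 = 8$ patterns that pseudo-shattering three points would require.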

Proposition 3.14 yields that the sample complexity is $T = \Theta\big(H^2 \varepsilon^{-2} \big(\log(\tfrac{H}{\varepsilon}) + \log(1/\delta)\big)\big)$
for distributions with bounded support on [1, H]. Unfortunately, the approximation power
of such a simple class of auctions is poor and the approximation error remains large.

Theorem 3.21 (Hartline and Roughgarden, 2009). With regular value distributions, the
anonymous second-price auctions are a 4-approximation of the Myerson auction.

Sketch of proof: The proof relies on the following variant of the Bulow-Klemperer lemma.

Lemma 3.22 (Hartline and Roughgarden, 2009). Consider the following two settings. In
the first one, there are n bidders with value distribution Fi and, in the second one, there
are 2n bidders, the original ones and one independent copy of each one of them. The
second price auction without reserve price in the second setting is a 2-approximation of
the Myerson auction in the first setting.

Proof. Let $R_{2n}$ be the revenue generated by the second-price auction without reserve price
with the original set of n bidders (whose values are denoted by $x_i$) and their n copies (whose
values are denoted by $y_i$). Since a bidder is identical to his copy, they both generate the
same revenue for the seller. As a consequence, in this auction, the revenue of the original n
bidders is equal to the revenue of their n copies, hence equal to $R_{2n}/2$. Lemma 3.22 states
that $R_{2n}$ is at least half the revenue of the Myerson auction. This implies that, overall,
the original n bidders generate at least 1/4 of the Myerson auction revenue.
In this auction, one of the original n bidders gets the item only if his bid is the highest
and, in that case, he pays the second highest bid amongst the 2n ones, which is equal to
the maximum between the second highest bid of the original bidders and the highest bid
of the copies. As a consequence, the allocation and payment of that bidder are exactly the
same as in an auction with only the n original bidders and a random reserve price set to
the maximum bid of the copies, i.e., $y^{(1)} := \max\{y_i ;\ i \in [n]\}$. So there is a second-price
auction with random reserve price that recovers at least 1/4 of the Myerson revenue. It
is crucial to notice that this random reserve price $y^{(1)}$ is independent of the highest and
second highest values, denoted respectively by $x^{(1)}$ and $x^{(2)}$, so that the expected revenue at

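the truthful equilibrium satisfies

$$\mathbb{E}_{y_i \sim F_i}\Big[\mathbb{E}_{x_i \sim F_i}\big[\max\{x^{(2)}, y^{(1)}\}\, \mathbb{1}\{x^{(1)} \ge y^{(1)}\}\big]\Big] \le \max_{y^*}\ \mathbb{E}_{x_i \sim F_i}\big[\max\{x^{(2)}, y^*\}\, \mathbb{1}\{x^{(1)} \ge y^*\}\big]$$

by the pigeonhole principle (an average over $y^{(1)}$ cannot exceed the best deterministic choice). As a
consequence, setting as reserve price a value $y^*$ that attains the
maximum on the right generates at least 1/4 of the Myerson auction revenue.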

3.4.2 Second-price auctions with personalized reserve prices

The approximation power of this family of auctions can be increased with personalized
reserve prices, one for each buyer. Recall that two different families of second-price auctions
with personalized reserve prices can be considered, either eager or lazy (Paes Leme et al.,
2016), see Section 2.4.3.
In an eager auction, the item is allocated more often than in the lazy one: it is sold
whenever at least one bidder clears his reserve price, whereas the lazy auction sells it only
when the highest bidder clears his own. However, the item is sold at a lower price in the eager
version: the winner pays the maximum between his reserve price and the highest bid among
the other bidders who cleared their reserve, rather than amongst all other bidders.
The lazy version of the auction is of little practical interest, yet it is a very useful
theoretical tool to understand eager second-price auctions, which are the ones generally implemented
in practice (Drutsa, 2020; Choi et al., 2020). The reason is that, in a lazy second-price
auction, being truthful is only a weakly dominant strategy: bidding 0 when the
value is below the reserve price and truthfully otherwise is also a weakly dominant strategy.
With those bidding strategies, the lazy second-price auction becomes de facto an eager
second-price auction.
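
The difference between the two formats is easiest to see on a single bid vector. The following minimal sketch implements both rules as described above; the function names and the numerical example are hypothetical, and ties are broken in favor of the lowest bidder identifier.

```python
def lazy_second_price(bids, reserves):
    """Highest bidder wins only if he clears his own reserve price.

    He then pays the maximum of his reserve and the highest other bid.
    """
    winner = max(range(len(bids)), key=lambda i: bids[i])
    if bids[winner] < reserves[winner]:
        return None, 0.0
    others = [b for j, b in enumerate(bids) if j != winner]
    return winner, max([reserves[winner]] + others)

def eager_second_price(bids, reserves):
    """Bidders below their reserve are removed first, then a second-price
    auction (with the winner's reserve as a floor) is run among the rest."""
    active = [i for i in range(len(bids)) if bids[i] >= reserves[i]]
    if not active:
        return None, 0.0
    winner = max(active, key=lambda i: bids[i])
    others = [bids[j] for j in active if j != winner]
    return winner, max([reserves[winner]] + others)
```

With bids `[0.8, 1.5]` and reserves `[0.5, 2.0]`, the lazy auction does not sell the item while the eager one sells it to bidder 0 at price 0.5. With bids `[1.0, 0.7]` and reserves `[0.5, 0.9]`, both allocate to bidder 0, but the lazy price is 0.7 and the eager price is 0.5.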

Lazy second-price auction

The main reason explaining the relative simplicity of lazy second-price auctions with
respect to the eager version is the decoupling of the revenue maximization problem across
the bidders. In the lazy version, the reserve price of a bidder j ≠ i has no influence on
whether bidder i gets the item, nor on the price he pays. Those only depend on the bids and
on the reserve price of bidder i, denoted by $r_i$. Hence, as seen in Section 2.4.3, the lazy second-price auction
is optimized by setting the personalized reserve prices to the respective bidders’ monopoly
prices, or rather to their empirical estimates when given a dataset $S_T$. Unsurprisingly, this
gives n independent estimation problems, so that the pseudo-dimension is n times
larger.
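
A minimal sketch of these n independent estimation problems is given below. It assumes that the empirical estimate of a monopoly price is the sample value r maximizing the empirical posted-price revenue r × (fraction of samples ≥ r); the function names and the toy dataset are hypothetical.

```python
def empirical_monopoly_price(values):
    """Reserve r maximizing the empirical revenue r * (fraction of samples >= r).

    The maximum is attained at one of the samples, so it suffices to scan the
    sorted samples: O(T log T) for the sort, O(T) for the scan.
    """
    ordered = sorted(values)
    T = len(ordered)
    best_price, best_revenue = 0.0, 0.0
    for k, r in enumerate(ordered):
        # ordered[k:] contains T - k samples, all >= r
        rev = r * (T - k) / T
        if rev > best_revenue:
            best_price, best_revenue = r, rev
    return best_price

def lazy_reserves(dataset):
    """One independent problem per bidder; dataset[t][i] is the value of bidder i."""
    n = len(dataset[0])
    return [empirical_monopoly_price([x[i] for x in dataset]) for i in range(n)]
```

On the toy dataset `[[1, 10], [2, 10], [3, 1], [4, 1]]`, the sketch returns the reserves `[2, 10]`, for a total cost of O(nT log T).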
Proposition 3.23. Let A be the class of lazy second-price auctions with personalized
reserve prices. For n ≥ 2, its pseudo-dimension is
Pdim (A) = 2n .

Sketch of proof. This is a direct extension of the proof for the second-price auction with
anonymous reserve prices, as the n estimation problems are independent.

Proposition 3.23 states that the estimation problem is not much harder than with
anonymous reserve prices. Using again Proposition 3.14, the associated sample complexity is
$T = \Theta\big(H^2 \varepsilon^{-2} \big(n \log(\tfrac{H}{\varepsilon}) + \log(1/\delta)\big)\big)$ for distributions with bounded support on [0, H].
However, this simple modification already greatly improves the guarantee on the approximation
error.

Theorem 3.24 (Dhangwatnotai et al., 2015). With regular value distributions, the lazy
second-price with monopoly reserve prices is a 2-approximation of the Myerson auction.

Proof. We divide the revenue of the Myerson auction in two parts and bound each term by
the revenue of the lazy second-price auction with monopoly reserves.
First, based on the Myerson lemma, and recalling that $\psi_i$ stands for the virtual value
function associated to the distribution $F_i$ (see Definition 2.11), we get

$$\begin{aligned}
R(\mathrm{Lazy}(F)) &= \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^{n} \psi_i(x_i)\, \mathbb{1}\{i \text{ wins the lazy 2nd-price auction}\}\Big] \\
&\ge \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^{n} \psi_i(x_i)\, \mathbb{1}\{i \text{ wins the Myerson auction and the lazy 2nd-price auction}\}\Big]
\end{aligned}$$

since if i is winning the lazy second-price auction, his virtual value is non-negative.
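The revenue of the lazy auction can also be compared to the revenue of the Vickrey
auction (i.e. the second-price auction with no reserve price), as follows:

$$\begin{aligned}
&\mathbb{E}_{x \sim F}\Big[\sum_{i=1}^{n} \psi_i(x_i)\, \mathbb{1}\{i \text{ wins the Myerson auction and not the lazy 2nd-price auction}\}\Big] \\
&\qquad = \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^{n} \psi_i(x_i)\, \mathbb{1}\{i \text{ wins the Myerson auction and not the Vickrey auction}\}\Big] \\
&\qquad \le \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^{n} x_i\, \mathbb{1}\{i \text{ wins the Myerson auction and not the Vickrey auction}\}\Big] \\
&\qquad \le R(\mathrm{Vickrey}(F)) \\
&\qquad \le R(\mathrm{Lazy}(F)) .
\end{aligned}$$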

Indeed, the equality is a consequence of the fact that if a bidder wins the Myerson
auction, he has a non-negative virtual value and therefore clears his monopoly reserve price.
Hence, if he does not win the lazy second-price auction with monopoly reserves, he does not
win the Vickrey auction either. The first inequality is deduced from the definition of the
virtual value, the second one from the payment rule of the Vickrey auction (a bidder who loses
the Vickrey auction has a value no larger than the price paid by its winner), and the last one
from the Myerson lemma, which shows that the monopoly
reserve prices are the optimal ones for a lazy second-price auction.
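As a consequence,

$$\begin{aligned}
2R(\mathrm{Lazy}(F)) &\ge \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^{n} \psi_i(x_i)\, \mathbb{1}\{i \text{ wins the Myerson auction and the lazy 2nd-price auction}\}\Big] \\
&\quad + \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^{n} \psi_i(x_i)\, \mathbb{1}\{i \text{ wins the Myerson auction and not the lazy 2nd-price auction}\}\Big] \\
&= R(a^*(F)) .
\end{aligned}$$

The last equality comes from the Myerson lemma, and this concludes the proof.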

In a nutshell, if the lazy auction were implementable in practice, it would be a good
compromise: its learning time is O(nT log T), its running time is O(n), its sample complexity
scales as $\varepsilon^{-2}$ and it is a 2-approximation of the Myerson auction.

Eager second-price auction

First, in terms of estimation, the problem of optimizing the eager second price auction is
not much harder than the lazy one, as the pseudo-dimensions are rather similar: they only
differ by a factor log(n).

Proposition 3.25. Let A be the class of second-price auctions with personalized reserve
prices; its pseudo-dimension satisfies

Pdim (A) = O(n log(n)) .

Sketch of proof. This is a corollary of the pseudo-dimension of L-level auctions for L = 1
(Thm. 3.17). As explained in the remark of Section 3.3.2, it amounts to plugging L = 1 into the
last line of the proof for L-level auctions.

Similarly, the guarantee on the approximation error is the same: it is a 2-approximation,
whether the distributions are MHR or regular.

Theorem 3.26 (Hartline and Roughgarden, 2009). With regular value distributions, the
optimal eager second price is a 2-approximation of the Myerson auction.

Proof. We prove that the eager second-price auction with monopoly reserves is a 2-approximation of
the Myerson auction. The proof is very similar to the one of Theorem 3.24. We divide the revenue
of the Myerson auction in two parts and bound each term by the revenue of the eager
second-price auction.

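First, based on the Myerson lemma,

$$\begin{aligned}
R(\mathrm{eager}(F)) &= \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^{n} \psi_i(x_i)\, \mathbb{1}\{i \text{ wins the eager 2nd-price auction}\}\Big] \\
&\ge \mathbb{E}_{x \sim F}\Big[\sum_{i=1}^{n} \psi_i(x_i)\, \mathbb{1}\{i \text{ wins the Myerson auction and the eager 2nd-price auction}\}\Big]
\end{aligned}$$

since if i is winning the eager second-price auction, his virtual value is non-negative.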
The item is allocated in the Myerson auction if and only if the item is allocated in the
eager second-price auction. Hence, there exists a one-to-one mapping between a winner in
the eager second-price auction and a winner in the Myerson auction.
Consider the case where these two winners are different and denote by i the winner of
the eager second-price auction with monopoly reserves and by j the winner of the Myerson
auction. Denote by x the vector of values corresponding to this case and by $p^{\mathrm{eager}}(x)$ the
payment in the eager second-price auction with monopoly reserves for this specific vector of values.
By definition of the payment rule of the eager second-price auction and by definition of
the virtual value,

$$p^{\mathrm{eager}}(x) \ge x_j \ge \psi_j(x_j) .$$

We conclude the proof with the same reasoning as in Theorem 3.24. Since the eager second-price
auction with monopoly reserves is a 2-approximation, so is the eager second-price auction with
optimal reserves.

Algorithmic complexity. Similarly to the lazy version, running the eager second-price
auction has a complexity of O(n). The main difference lies in the complexity of learning
the set of optimal reserve prices. For the lazy version, the optimal reserve prices are the
monopoly prices, which can be computed in O(nT log(T)). This is no longer true for the
eager version, for which finding the optimal reserve prices is NP-hard (Paes Leme et al., 2016;
Roughgarden and Wang, 2016), which explains why the more general L-level auctions
have the same limitation. This seems to contradict the objective of accepting a
non-zero approximation error in order to get tractable learning and running complexities. So the
question we investigate in the following section is the performance of the eager second-price
auction with sub-optimal reserve prices set to the computable monopoly prices.

Eager with monopoly reserve prices

We now prove that the eager second-price auction with monopoly reserve prices generates a higher
revenue for the seller than the lazy second-price auction with monopoly reserve prices. This is not
obvious since, in the eager version, the winning bidder pays the second highest bid among the
bidders who cleared their reserve price. This bid can be lower than the
overall second highest bid, which is the one paid in the lazy version.
Theorem 3.27 (Fu, 2013). With regular value distributions, the revenue of the eager
second-price auction with monopoly reserve prices is higher than the revenue of the lazy
second-price auction with the same reserve prices.
Proof. We denote by $r_i$ the monopoly price corresponding to bidder i. We will compare the
expected payment of bidder i, in the lazy or eager auction, conditionally on the values of all
other bidders. First, we define $x_{-i}^{\mathrm{lazy}} = \max_{j \ne i}\{x_j\}$ and $x_{-i}^{\mathrm{eager}} = \max_{j \ne i,\, x_j \ge r_j}\{x_j\}$. In particular,
this implies that $x_{-i}^{\mathrm{lazy}} \ge x_{-i}^{\mathrm{eager}}$. Moreover, if $x_{-i}^{\mathrm{lazy}} \le r_i$, the two auctions are identical for
bidder i, who pays $r_i$ if $x_i \ge r_i$.
To compare the two auctions, we can therefore restrict ourselves to the case where
$x_{-i}^{\mathrm{lazy}} \ge r_i$. The expected payment of bidder i in a lazy second-price auction is in this case
equal to

$$x_{-i}^{\mathrm{lazy}} \big(1 - F_i(x_{-i}^{\mathrm{lazy}})\big) .$$

For the eager second-price auction, the payment is equal to $\tilde{x}_{-i}^{\mathrm{eager}} = \max\{x_{-i}^{\mathrm{eager}}, r_i\}$ and the
expected payment of bidder i is

$$\tilde{x}_{-i}^{\mathrm{eager}} \big(1 - F_i(\tilde{x}_{-i}^{\mathrm{eager}})\big) .$$

Since $F_i$ is regular, $x \mapsto x(1 - F_i(x))$ is non-increasing for $x \ge \psi_i^{-1}(0) = r_i$. As
$r_i \le \tilde{x}_{-i}^{\mathrm{eager}} \le x_{-i}^{\mathrm{lazy}}$, the
expected payment in the lazy second-price auction is lower than the expected payment in the eager
second-price auction with monopoly reserve prices.
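
Theorem 3.27 can be illustrated by a small Monte Carlo experiment. The sketch below is a numerical sanity check under assumptions that are not in the text: two bidders with values uniform on [0, 1] and [0, 4] (for a uniform distribution on [0, a], the monopoly price is a/2), truthful bids, and hypothetical helper names.

```python
import random

def lazy_revenue(x, r):
    """Seller revenue of the lazy second-price auction on the value vector x."""
    i = max(range(len(x)), key=lambda k: x[k])  # highest bidder overall
    if x[i] < r[i]:
        return 0.0
    return max([r[i]] + [x[j] for j in range(len(x)) if j != i])

def eager_revenue(x, r):
    """Seller revenue of the eager second-price auction on the value vector x."""
    active = [k for k in range(len(x)) if x[k] >= r[k]]  # bidders clearing their reserve
    if not active:
        return 0.0
    i = max(active, key=lambda k: x[k])
    return max([r[i]] + [x[j] for j in active if j != i])

# Hypothetical asymmetric example with monopoly reserves a / 2
upper = [1.0, 4.0]
reserves = [a / 2 for a in upper]
rng = random.Random(0)
samples = [[rng.uniform(0, a) for a in upper] for _ in range(20000)]
lazy = sum(lazy_revenue(x, reserves) for x in samples) / len(samples)
eager = sum(eager_revenue(x, reserves) for x in samples) / len(samples)
```

The comparison only holds in expectation: on a given value vector, the eager auction may collect less than the lazy one (when the second highest bidder does not clear his reserve), but it collects more on average because it sells the item more often.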

In particular, as a corollary, an eager second-price auction with monopoly
reserves is still a 2-approximation of the Myerson auction. Thus, in the end, the eager
auction with monopoly reserves provides a good compromise in terms of tractability versus
optimality. Indeed, the computational complexities are O(n) (running) and O(nT log(T))
(learning), and it enjoys the pseudo-dimension 2n of learning the monopoly prices. The
only concern that may remain is the approximation error, which is only guaranteed to be less
than $\frac{1}{2} R(a^*(F))$. In fact, this guarantee is loose, and the actual error is often much smaller.

Why the approximation error may not be too large

We proved in Section 2.5 that the symmetric Vickrey auction with n bidders and regular
value distributions is a $\frac{n-1}{n}$-approximation of the Myerson auction, i.e., it recovers at least
a fraction (n − 1)/n of its revenue. Overall, even though
it is not straightforward to extend this result to non-symmetric cases, it indicates that
when the competition is strong, the approximation error incurred by second-price auctions
is not as big as $\frac{1}{2} R(a^*(F))$. So, often, the guarantee that the eager second-price auction is a
2-approximation of the Myerson auction is loose and the error is much smaller, as shown
in Figure 2.3.

3.4.3 The boosted second-price auction
A simple extension of the eager second-price auction has been proposed to empirically
improve the seller’s revenue by reducing the approximation error: the boosted second-price
auction (Golrezaei et al., 2017). It relies on the following point of view: the eager
second-price auction can be seen as a Myerson auction with approximated virtual value functions
$\tilde{\psi}_i(x) = x - r_i$, where $r_i$ is the personalized reserve price.
Practically, a drastic improvement in terms of approximation error can be made by
adding a slope parameter, different for each bidder. The result is the boosted second-price
auction, which is a Myerson auction with approximated virtual value functions
$\tilde{\psi}_i(x) = \beta_i x - r_i$, where $r_i$ is the personalized reserve price and $\beta_i$ the "boost". It turns out that
for certain families of distributions (e.g., generalized Pareto distributions) the virtual value
is affine, making the boosted second-price auction coincide with the Myerson auction.
It retains the following two good properties of second-price auctions with personalized
reserve prices: 1) it is parametric and thus has a reasonable pseudo-dimension and 2) it
has a running time independent of the number of samples T. It also has a lower
approximation error, as the auction class is strictly larger. Actually, for some families
of distributions, it has zero approximation error even in an asymmetric setting. The
main caveat is the computational complexity of the training. As for the eager second-price
auction, the global optimization problem is NP-hard. However, initializing with an
eager second-price auction ($\beta_i = 1$) with monopoly reserve prices $r_i$ and launching an
optimization from there proved to perform very well empirically (Golrezaei et al., 2017).
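
The claim that generalized Pareto distributions have affine virtual values can be verified by a direct computation. The derivation below assumes the usual parametrization of the generalized Pareto distribution (shape ξ ≠ 0, scale σ > 0, location μ), which is not specified in the text, and the standard definition ψ(x) = x − (1 − F(x))/f(x) of the virtual value.

```latex
% Assumed parametrization: shape \xi \neq 0, scale \sigma > 0, location \mu,
% on the support where 1 + \xi (x - \mu)/\sigma > 0.
\begin{align*}
1 - F(x) &= \Big(1 + \xi\,\frac{x-\mu}{\sigma}\Big)^{-1/\xi}, \qquad
f(x) = \frac{1}{\sigma}\Big(1 + \xi\,\frac{x-\mu}{\sigma}\Big)^{-1/\xi - 1}, \\
\frac{1 - F(x)}{f(x)} &= \sigma + \xi\,(x - \mu), \\
\psi(x) &= x - \frac{1 - F(x)}{f(x)} = (1 - \xi)\,x - (\sigma - \xi\mu).
\end{align*}
```

Under this parametrization, the virtual value has exactly the boosted form $\beta x - r$ with $\beta = 1 - \xi$ and $r = \sigma - \xi\mu$; it is increasing, i.e. the distribution is regular, when ξ < 1.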

3.4.4 Learning reserve prices in first-price auctions

As indicated by the second price or Myerson auctions, a crucial step when designing an
optimized mechanism consists in estimating the bidders’ value distributions. This step is
even harder in first price auctions (Guerre et al., 2000; Athey and Haile, 2007) than it is in
second price auctions.
The central idea is to assume that bidder i ∈ N optimizes his bids according to the best
response strategy described in Equation 2.4. Hence, keeping the notations of Subsection
2.2.2, when his value is $x_i$, he bids $b_i$, a solution of

$$x_i = b_i + \frac{G_i(b_i)}{g_i(b_i)} . \tag{3.2}$$

When there are sufficiently many repeated auctions, the seller can estimate the
cdf $G_i$ (and the pdf $g_i$, possibly with kernel density estimators) from the distribution of bids
of the competition that bidder i is facing. Plugging this knowledge back into Equation (3.2)
gives an estimate of the value $x_i$. Repeating this operation for all bids produces an estimate of
the value distribution.
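
The inversion of Equation (3.2) can be sketched as follows. The sketch assumes that $G_i$ is the cdf of the highest competing bid faced by bidder i, estimates it with the empirical cdf, and estimates $g_i$ with a Gaussian kernel density estimator. The function name, the bandwidth and the synthetic competition are hypothetical choices made for illustration.

```python
import math

def estimate_value(bid, competing_bids, bandwidth):
    """Invert the first-order condition x = b + G(b) / g(b) of Equation (3.2).

    competing_bids : observed highest bids of the competition faced by the bidder
    G is estimated by the empirical cdf, g by a Gaussian kernel density estimator.
    """
    m = len(competing_bids)
    G = sum(1 for c in competing_bids if c <= bid) / m
    g = sum(
        math.exp(-0.5 * ((bid - c) / bandwidth) ** 2) for c in competing_bids
    ) / (m * bandwidth * math.sqrt(2 * math.pi))
    if g == 0.0:
        raise ValueError("no density estimated at this bid; widen the bandwidth")
    return bid + G / g

# Hypothetical competition: highest competing bids evenly spread over [0, 1],
# so that G(b) = b and g(b) = 1, and Equation (3.2) gives x = 2b.
competition = [(k + 0.5) / 2000 for k in range(2000)]
value = estimate_value(0.5, competition, bandwidth=0.05)
```

In this synthetic example the estimated value for the bid 0.5 is close to 1. In practice, the quality of the estimate depends heavily on the bandwidth and degrades near the boundaries of the support of the bids, where kernel density estimators are biased.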

This approach appeals to stationarity assumptions, a potential drawback. In practice
issues may also arise from the fact that bidders and seller may have different estimates of
the bid distribution the bidders are facing. Then the seller’s estimate of the optimization
problem solved by the bidders could be inaccurate.

Value distribution estimation and reserve price issues

Another major difficulty in setting optimal asymmetric reserve prices is that they have a
somewhat complex and non-linear impact on the optimal bidding strategies of the buyers
and, as a consequence, will affect the allocation probability qi (x) in a potentially complex
fashion at equilibrium.
However, a natural, yet possibly suboptimal, choice is to set $r_i = \psi_i^{-1}(0)$, i.e. to set
the reserve value at the monopoly price, at least for regular distributions. This
guarantees that the term under the expectation in Equation (2.6) is always non-negative:
in a first-price auction, bidders never bid above their value. This principle is quite similar
to the one studied in Section 3.4.2 for eager second-price auctions, as finding optimal reserve
prices is NP-hard (Paes Leme et al., 2016) for that type of auction. However, in first-price
auctions and other non-DSIC auctions, setting this reserve may induce a change of optimal
bidding strategy and hence a different $\beta_i$, making the evaluation of the impact of such a
choice of reserve price on the seller's revenue theoretically delicate. To compute the monopoly
price, the seller needs to estimate the value distribution of bidder i from his bids, a task
we now turn to.

Setting optimal reserve prices in first-price auctions cannot be done by naive ERM

In first price auctions, the bidder requires much information about the competition to
compute best responses. Even if he knew perfectly the value distribution of the other
buyers, it would still be numerically challenging to bid optimally and reach the Nash
equilibrium. The situation is even worse when buyers have to estimate the distribution of
the competition while bidding.
On the other hand, setting the reserve prices is also more complicated for the seller.
In second-price auctions, she can gather data and form a dataset of bids whose
distribution should be close to the value distribution (assuming myopic and non-strategic
agents). It is then possible to run an ERM based on this dataset.
On the contrary, with first price auctions, each bid received has a distribution that
depends on the reserve price chosen at the time. And, in the future, choosing another
reserve price will induce yet another distribution of bids. As a consequence, the data
from the “training” set (past bids) and the “test” set (future bids) have different statistical
properties and naïve empirical risk minimization will not work.

A seller can however use the theory discussed above to account for the impact of reserve
prices on bid distributions, then simulate from these new reserve-price-dependent
distributions and measure the effect of different reserve prices in an unbiased way. The
difficulty of solving for the Nash equilibrium nonetheless creates a hurdle to the practical
implementation of such ideas (Feng et al., 2021).

3.4.5 New numerical methods for multi-item auctions

The multi-item framework is more intricate than single-item. Myerson’s fundamental
result has been extended to specific settings depending on the number of objects and on
the properties of the bidders’ utility functions (Armstrong, 1996; Manelli and Vincent,
2007; Daskalakis et al., 2013; Yao, 2017). A general and analytical optimal auction in the
multi-item framework has yet to be found.
Because of the very large variety of different settings, automatic mechanism design
has been introduced to provide a (numerical) framework for learning revenue-maximizing
mechanisms satisfying constraints chosen by the designer (Conitzer and Sandholm, 2002;
Albert et al., 2017). This framework was complemented by introducing neural networks
for different instances of the multi-item problem (Dütting et al., 2019; Shen et al., 2019b;
Golowich et al., 2018) to take advantage of the large expressive power of neural network
architectures.
A general algorithmic approach to approximately solve the seller’s optimization
problem in multi-item, multi-bidder settings has been implemented (Dütting et al., 2019).
The seller’s auction is parametrized by a weight vector $\omega \in \mathbb{R}^{n \times m}$ corresponding to two
neural networks, which take the valuations for each item and each player as inputs and return
respectively the allocation probability $q^{\omega} \in \mathbb{R}^{n \times m}$ of each item to each player, and the
payment $p^{\omega} \in \mathbb{R}^{n \times m}$ of each player per unit of item. In the case of combinatorial auctions,
bidders would submit a bid for each possible bundle (in our setting, a bid $b_s$ then belongs to
$\mathbb{R}^{n \times 2^m}$). The bidders' valuations in a combinatorial setting would not need to be explicitly
described, as per the large literature on succinct representations of bidding languages
for combinatorial auctions. The networks are trained by batches of size L, i.e., L vectors
$x_t \in \mathbb{R}^{(2^m) n}$ (one value per bundle and per bidder) are sampled at each iteration.
The first term of the loss function used to train the network is the negated empirical
revenue computed on the dataset of valuations $S_T = \{x_1, \dots, x_T\}$,

$$L_{\mathrm{Rev}} = -\frac{1}{T} \sum_{t=1}^{T} \big\langle p^{\omega}(x_t), q^{\omega}(x_t) \big\rangle .$$

To enforce the IC constraint, two different approaches have been considered. The first one is
a hard constraint implemented by defining an architecture which is DSIC by design, called
MyersonNet. Myerson’s lemma is used to design this architecture, which learns the optimal
DSIC auction in the single-item setting. However, for each new setting of the problem, a
new architecture must be designed (Shen et al., 2019b).
The second approach, the RegretNet architecture, uses a soft constraint in a Lagrangian
corresponding to the incentive-compatibility objective. For each bidder $i$, the empirical
ex-post regret is defined as
$$\widehat{\mathrm{Reg}}_i(\omega) = \frac{1}{T}\sum_{t=1}^{T} \Big[ \max_{b_t^*} u_i^\omega(b_t^*, b_{-i,t}) - u_i^\omega(b_{i,t}, b_{-i,t}) \Big].$$

This regret is the difference between the maximum utility bidder $i$ can get by optimizing
his bids $b_t^* \in \mathbb{R}^{n \times 2^m}$ and the utility he gets when bidding truthfully (it is similar to the
regret introduced in Section 3.6.1, except that the maximum is inside the sum instead of
outside). This quantity serves as a proxy for how untruthful an auction is: the higher
the regret, the less truthful the auction, as bidders can substantially increase their utility by
deviating. The augmented Lagrangian method is then used to optimize the Lagrangian
function defined as:
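$$\mathcal{L}(\omega, \lambda) = \mathcal{L}_{\mathrm{Rev}} + \sum_{i=1}^{n} \lambda_i \widehat{\mathrm{Reg}}_i(\omega) + \frac{\rho}{2} \Big( \sum_{i=1}^{n} \widehat{\mathrm{Reg}}_i(\omega) \Big)^2 .$$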

This Lagrangian function is the sum of the negated actual revenue of the mechanism and
two penalties which quantify the lack of incentive compatibility, thus ensuring that the
learned mechanism is approximately DSIC. The bids $b_i$ attaining the maximum in
$\widehat{\mathrm{Reg}}_i(\omega)$ are optimized through gradient descent, which unfortunately makes the
optimization very slow. In some multi-item instances, this approach actually recovers the
optimal revenue-maximizing mechanisms (when the latter are known theoretically).
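
To make the structure of this objective concrete, here is a minimal NumPy sketch of the
augmented Lagrangian, assuming that payments, allocations and per-bidder regrets have
already been computed on a batch. The function names and the array layout (T samples,
n bidders, m items) are illustrative assumptions, not taken from the RegretNet code.

```python
import numpy as np

def revenue_loss(payments, allocations):
    """Negated empirical revenue: -(1/T) * sum_t <p(x_t), q(x_t)>.

    payments, allocations: arrays of shape (T, n, m) (assumed layout).
    """
    T = payments.shape[0]
    return -float(np.sum(payments * allocations)) / T

def augmented_lagrangian(payments, allocations, regrets, lam, rho):
    """L_Rev + sum_i lam_i * Reg_i + (rho / 2) * (sum_i Reg_i)^2."""
    regrets = np.asarray(regrets, dtype=float)   # one empirical regret per bidder
    lam = np.asarray(lam, dtype=float)           # one multiplier per bidder
    linear_penalty = float(np.dot(lam, regrets))
    quadratic_penalty = 0.5 * rho * float(np.sum(regrets)) ** 2
    return revenue_loss(payments, allocations) + linear_penalty + quadratic_penalty
```

In an actual training loop the regrets themselves depend on the weights through an inner
maximization over bids, which this sketch deliberately leaves out.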
This approach can be complemented by introducing a network encompassing the best
bids for one specific bidder, which avoids running a gradient descent for each specific value
and takes advantage of the continuity of the problem (Rahme et al., 2020). The
idea is to leverage the fact that if two valuations are close to each other, their optimal bids
should also be close to each other.
These numerical approaches can help theoreticians to identify some good candidates
for the revenue-maximizing auctions in more exotic cases, such as when bidders have budget
constraints (Feng et al., 2018a). However, a general theory for designing optimal mechanisms
in the general case of multi-item auctions is still out of reach.

3.5 Contextual Estimation of Reserve Prices

In the previous sections, it was more or less implicitly assumed that only one type of
good was being sold repeatedly. In practice, especially in the motivating examples of
repeated auctions like internet advertising, the items sold differ from one
another, at least partly. For instance, successive ad slots sold may have the same size but different
placements, the same placement but different sizes, or may be on different pages of the same
website, or on different websites. An easy solution would be to consider all these different
items separately, but it would mean only having a small number of samples per item,
which would prevent accurate estimation of the monopoly price. To address this issue,
some underlying structure and some regularity are required to formalize the idea that
samples obtained for one item also inform on the distributions of similar items.

3.5.1 Contextual Auctions

In this section (only), we will assume that an item is described by a public set of $d$ features
$z \in \mathbb{R}^d$ such that similar items have similar features (for some distance on $\mathbb{R}^d$). By public, we
mean that $z$ is available to both the seller and the bidders. The bidders use this information
to estimate their values $x = (x_1, \dots, x_n)$ more accurately; in particular the distribution of
values now depends on $z$. For simplicity, we assume in this section that the seller only
estimates an anonymous reserve price, the same for all bidders, but that it may depend
on the available information $z$. The extension to personalized reserve prices is
straightforward when they are independent, like in the case of the lazy second-price auction.
Formally, the seller aims at learning a reserve price as a function of $z$, hence, a mapping
$r^* : \mathbb{R}^d \to \mathbb{R}_+$. For learnability reasons, we will restrict $r^*$ to belong to some compact
sub-class of hypotheses $R \subset (\mathbb{R}^d \to \mathbb{R}_+)$. The learning of this contextual optimal reserve
price relies on the observations of samples from the distribution of values of the bidders.
In fact, only the observation of the highest and the second highest value is necessary, so
we denote by $x^{(1)}$ the highest value amongst the bidders and by $x^{(2)}$ the second highest
value. Further, we denote by $F$ the joint distribution of $(x^{(1)}, x^{(2)}, z)$. Then, $F(\cdot|z)$ is the
distribution of the two highest values conditionally on the contextual information $z$. In the
end, finding the reserve function can be written as follows:
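$$r^* = \operatorname*{argmax}_{r \in R} \; \mathbb{E}_{(x^{(1)}, x^{(2)}, z) \sim F} \big[ \phi(r(z), x^{(1)}, x^{(2)}) \big] \tag{3.3}$$
where $\phi(\rho, x^{(1)}, x^{(2)}) = x^{(2)} 1\{x^{(2)} > \rho\} + \rho 1\{x^{(2)} \le \rho \le x^{(1)}\}$.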
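
As a small illustration of objective (3.3), the sketch below evaluates the per-auction
revenue φ and the empirical objective of a reserve rule on a sample. The function names
and the choice of a linear rule r(z) = ⟨θ, z⟩ are assumptions made for the example only.

```python
import numpy as np

def phi(rho, x1, x2):
    """Second-price revenue with reserve rho, given highest value x1 and second highest x2."""
    rho, x1, x2 = (np.asarray(a, dtype=float) for a in (rho, x1, x2))
    second_price_paid = x2 * (x2 > rho)                 # both top values clear the reserve
    reserve_paid = rho * ((x2 <= rho) & (rho <= x1))    # only the highest value clears it
    return second_price_paid + reserve_paid

def empirical_revenue(theta, Z, x1, x2):
    """Sample average of phi(<theta, z_t>, x1_t, x2_t) for a linear reserve rule."""
    reserves = np.asarray(Z, dtype=float) @ np.asarray(theta, dtype=float)
    return float(np.mean(phi(reserves, x1, x2)))
```

Evaluating `empirical_revenue` on a grid of θ is a simple way to observe that the
objective is in general neither concave nor unimodal in the parameters.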

This problem is quite difficult, both because of the optimization over a set of functions
$R$, and because of the quite complex objective function. Similarly to the monopolistic profit
function, for a given fixed $z$, if the distribution $F(\cdot|z)$ is regular or MHR, the function
$\rho \mapsto \mathbb{E}_{(x^{(1)}, x^{(2)}) \sim F(\cdot|z)}[\phi(\rho, x^{(1)}, x^{(2)})]$ is pseudo-concave or log-concave, but not concave.
Unfortunately, a sum of pseudo-concave or log-concave functions is in general not pseudo-concave. Thus,
whenever considering a parametric class of functions $R$ strictly smaller than the whole set
of functions, such as linear mappings, the loss marginalized over $z$ may not be unimodal
in the parameters, leading to hard optimization problems. In the following, we present a
high-level overview of methods proposed to solve this learning problem.

3.5.2 Linear Reserve Price Function with Surrogate-Based Approaches

A simple class of functions for $R$ is the class of linear functions: $R = \{z \mapsto \langle\theta, z\rangle : \theta \in \mathbb{R}^d\}$.
However, as mentioned right before, the objective to optimize, $\theta \mapsto \mathbb{E}[\phi(\langle\theta, z\rangle, x^{(1)}, x^{(2)})]$, is potentially
multimodal, as a mixture of only pseudo-concave functions. A first classical direction to
tackle such an a priori complex learning problem is to try to design a surrogate loss $\tilde\phi$ that is easier
to optimize in expectation (for instance, concave) than the initial objective $\phi$. Obviously, a key
requirement is that the surrogate problem is consistent with the initial one, meaning that
the optimization of the former leads to a solution of the latter. Unfortunately, simplifying
the learning problem by finding a consistent concave surrogate formulation is actually
impossible, even for the simpler case where the second highest value $x^{(2)} = 0$ (Mohri and
Medina, 2014). To lighten notation, we define $\phi_0(\rho, x^{(1)}) = \phi(\rho, x^{(1)}, 0)$.

Theorem 3.5.1 (Mohri and Medina, 2014). Let $\tilde\phi : [0, 1] \times [0, 1] \to \mathbb{R}$ be a bounded function,
concave in its first argument. If $\tilde\phi$ is consistent with $\phi_0$, then $\tilde\phi(\cdot, x)$ is a constant function
for any $x \in [0, 1]$.

This theorem indicates that learning methods, even based on linear functions for R,
need to rely on non-concave maximization methods. We detail two examples of such
methods in the following.

Solution based on DC-programming

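A first method is based on the following piece-wise linear surrogate (Mohri and Medina,
2014):
$$\tilde\phi(\rho, x^{(1)}, x^{(2)}) = \phi(\rho, x^{(1)}, x^{(2)}) - \frac{1}{\gamma}\big(\rho - (1 + \gamma)x^{(1)}\big) 1\{\rho > x^{(1)}\} 1\{\rho \le (1 + \gamma)x^{(1)}\}, \quad \text{for } \gamma > 0,$$
where the factor $1/\gamma$ makes $\tilde\phi$ continuous at $\rho = x^{(1)}$, in line with the Lipschitz constant $1/\gamma$ discussed below.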

While using a surrogate introduces a bias, as the maximizer in expectation of $\tilde\phi$ will not
maximize exactly the expected monopoly revenue, this bias can be made small by taking
$\gamma$ close to 0. However, this comes at the cost of making the Lipschitz constant $1/\gamma$ of
the surrogate grow significantly. The key idea behind this surrogate $\tilde\phi$ is that, because
it is piece-wise linear, it is possible to explicitly decompose it as a difference of convex
functions. This means that the empirical risk $\frac{1}{T}\sum_{t=1}^{T} \tilde\phi(r(z_t), x_t^{(1)}, x_t^{(2)})$ can, in turn, be
explicitly decomposed as a difference of convex functions as soon as $r$ is a linear function.
Then, it is possible to optimize this empirical objective on classes $R$ of linear functions by
using DC-programming algorithms, such as DCA (Le Thi et al., 2014).

Solution based on Objective Variables
Another method introduces objective variables in a Bayesian
framework (Rudolph et al., 2016). First, the objective $\phi$ is smoothed using a Gaussian to
define the following surrogate:

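$$\tilde\phi(\rho, x_t^{(1)}, x_t^{(2)}) = \log \mathbb{E}_{\epsilon \sim \mathcal{N}(0, \sigma^2)} \Big[ \exp\big(\phi(\rho + \epsilon, x_t^{(1)}, x_t^{(2)})\big) \Big], \quad \text{for some } \sigma > 0.$$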
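
The smoothed surrogate has no simple closed form in general, but it can be approximated
numerically. The sketch below is a plain Monte Carlo estimate of the expression above;
the sample size, the seed and the function names are assumptions made for illustration,
and this is not the estimation procedure of Rudolph et al. (2016).

```python
import numpy as np

def phi(rho, x1, x2):
    """Second-price revenue with reserve rho (vectorized in rho)."""
    rho = np.asarray(rho, dtype=float)
    return x2 * (x2 > rho) + rho * ((x2 <= rho) & (rho <= x1))

def smoothed_phi(rho, x1, x2, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of log E_{eps ~ N(0, sigma^2)} exp(phi(rho + eps, x1, x2))."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n_samples)
    return float(np.log(np.mean(np.exp(phi(rho + eps, x1, x2)))))
```

For a reserve well inside the interval between the two highest values and a small σ,
the estimate stays close to the unsmoothed revenue, as expected from the definition.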

As $\sigma$ tends to 0, this surrogate $\tilde\phi$ converges towards $\phi$. The smoothing makes the objective
more regular, but it is still potentially multimodal in expectation, ruling out simple descent methods
for the optimization. To tackle this, they introduce a probabilistic model using additional
objective variables $\eta$ defined as follows:

$$\eta \sim \mathrm{Bernoulli}\Big( \exp\big( -(x^{(1)} - \phi(\rho + \epsilon, x^{(1)}, x^{(2)})) \big) \Big), \quad \text{where } \epsilon \sim \mathcal{N}(0, \sigma^2).$$

This additional variable intuitively represents how satisfying the revenue is for a given
auction (a sample), and the aim is thus to make it equal to 1. Hence, a dataset
$\{x_t^{(1)}, x_t^{(2)}, z_t, \eta_t = 1\}_{t \in [T]}$, where $(x_t^{(1)}, x_t^{(2)}, z_t)_{t \in [T]} \sim F^{\otimes T}$, is used to estimate the
parameters of the reserve price function $r$ to fit this probabilistic model. The key point is
that the maximum a posteriori (MAP) estimation recovers the parameter that maximizes the
initial smoothed objective, the expectation of $\tilde\phi$. The MAP estimation under this model is
performed using the Expectation-Maximization (EM) algorithm. As such, the guarantee is only
to improve the solution at every step, but there is no global convergence guarantee. However,
it exhibits significant empirical improvements (in reasonable learning time) over the previous
method based on DC programming.

3.5.3 Using a Bid Prediction

Another very different approach does not rely on a surrogate of $\phi$ (Medina and Vassilvitskii,
2017), but on having access to a good prediction of the highest value. For simplicity,
consider the posted-price setting, i.e., when there is only one bidder, or $x^{(2)} = 0$ (and for
this subsection, we then drop $x^{(2)}$ from notations). Assume the seller has access to
a prediction function of the highest value $\hat{x}(z)$ with a given squared error, i.e., such that
$$\mathbb{E}_F\big[ (\hat{x}(z) - x)^2 \big] = \eta^2. \tag{3.4}$$
Using this value prediction and a training dataset $\{x_t^{(1)}, z_t\}_{t \in [T]} \sim F^{\otimes T}$, a reserve price
function is built as follows. First, the feature space $\mathbb{R}^d$ is partitioned by discretizing the
image of the value prediction function $\hat{x}(\cdot)$ into $K$ subsets $C_1, \dots, C_K$. Formally, this partition
is defined by $\tau \in \mathbb{R}_+^K$ and $C_k = \{z \in \mathbb{R}^d : \tau_k \le \hat{x}(z) < \tau_{k+1}\}$ (where by convention $\tau_{K+1} = +\infty$).
This vector $\tau$ is built to minimize the sum of intra-partition variances of the prediction
$\hat{x}(\cdot)$. Then, the reserve price function $r$ is defined in a piece-wise manner on this partition.
Denoting $r_k$ the empirical monopoly price computed on the restriction of the dataset to $C_k$,
the reserve price is defined as
$$r(z) = \sum_{k=1}^{K} r_k 1\{z \in C_k\}. \tag{3.5}$$
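
The construction of (3.5) can be sketched in a few lines. In the snippet below the
thresholds `tau` are taken as given (the variance-minimizing choice of τ is not
implemented), the empirical monopoly price is found by brute force over the observed
values, and all function names are hypothetical.

```python
import numpy as np

def empirical_monopoly_price(values):
    """Price p among the observed values maximizing p * (fraction of values >= p)."""
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        return 0.0  # assumption: no reserve in an empty cell
    revenues = np.array([p * np.mean(values >= p) for p in values])
    return float(values[np.argmax(revenues)])

def fit_piecewise_reserve(predictions, values, tau):
    """Fit one reserve per cell C_k = {tau_k <= prediction < tau_{k+1}}, tau increasing."""
    predictions = np.asarray(predictions, dtype=float)
    values = np.asarray(values, dtype=float)
    tau = np.asarray(tau, dtype=float)
    cells = np.digitize(predictions, tau) - 1      # cell index, -1 if below tau_1
    reserves = [empirical_monopoly_price(values[cells == k]) for k in range(len(tau))]

    def reserve(prediction):
        k = int(np.digitize(prediction, tau)) - 1
        return reserves[k] if k >= 0 else 0.0      # assumption: no reserve below tau_1
    return reserve
```

The returned function takes the prediction of the highest value for a new item and
returns the reserve of the cell it falls into.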

Further, it is possible to provide a guarantee on the performance of $r(\cdot)$ that depends on
the accuracy of the value prediction $\eta$.

Theorem 3.5.2 (Medina and Vassilvitskii, 2017). For $\delta > 0$, with probability at least $(1 - \delta)$
over the learning samples, it holds
$$\mathbb{E}_F\big[ r(z) 1\{r(z) \le x\} \big] \ge \mathbb{E}_F[x] - O\big( K^{-2/3} + \eta^{2/3} + T^{-1/6} \big).$$

The intuition behind this result is clear: the higher the prediction of the value $\hat{x}(z)$,
the higher the revenue extracted. However, it only provides a guarantee relative to the
expected value $\mathbb{E}_F[x]$ rather than the optimal revenue that could be extracted.

Overall, these three algorithms are costly to run on big datasets, highlighting the
complexity of the underlying problem. More efficient methods for computing reserve
prices exist, but they usually sacrifice the objective of revenue maximization for weaker
objectives (Shen et al., 2019a).

3.6 Cost & Online Estimation of Auctions

Given the sample complexity of a class $\mathcal{A}$ of (ex-post individually-rational) auctions,
as given in Proposition 3.14, it is possible to compute the global cost of learning the
optimal auction in that family. Indeed, recalling the statement of Proposition 3.14, only
$$\Theta\Big( H^2 \varepsilon^{-2} \big( P_{\dim}(\mathcal{A}) \log(H/\varepsilon) + \log(1/\delta) \big) \Big)$$
samples are required to find an $\varepsilon$-optimal auction
(within the class $\mathcal{A}$) with probability at least $1 - \delta$.
As a consequence, after $t$ samples, by inverting this equation (and setting arbitrarily
$\delta = 1/t^2$), a learner can compute an $O\big( H \sqrt{P_{\dim}(\mathcal{A}) \log(t) / t} \big)$-optimal auction with probability at
least $1 - 1/t^2$. Summing all the errors from the first ($t = 1$) up to the last ($t = T$) auction, we get
a total cost of learning the optimal auction of the order of $O\big( H \sqrt{P_{\dim}(\mathcal{A}) T \log(T)} \big)$.
Those computations are possible for value distributions with support bounded in
$[0, H]$, but similar computations give, for the case of regular distributions and the class
$\mathcal{A}$ of all ex-post IR auctions, a total cost of learning the Myerson auction of the order of
$O\big( n^{1/3} T^{2/3} \log^{1/3}(T) \big)$.
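
The summation step for bounded supports can be spelled out as follows; this is a short
worked derivation under the rates stated above, with constants left implicit. Bounding
$\log(t)$ by $\log(T)$ and comparing the sum with an integral,
$$\sum_{t=1}^{T} H \sqrt{\frac{P_{\dim}(\mathcal{A}) \log(t)}{t}} \le H \sqrt{P_{\dim}(\mathcal{A}) \log(T)} \sum_{t=1}^{T} \frac{1}{\sqrt{t}} \le H \sqrt{P_{\dim}(\mathcal{A}) \log(T)} \int_0^T \frac{ds}{\sqrt{s}} = 2 H \sqrt{P_{\dim}(\mathcal{A}) T \log(T)} .$$
The rounds on which the guarantee fails add at most $\sum_{t \ge 1} H / t^2 \le 2H$ in
expectation, assuming the per-round loss is at most $H$, so they do not change the order
of the bound.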

A crucial implicit assumption made for these arguments to hold is that, no matter
the auction mechanism chosen at each stage, the seller gets to observe perfectly a sample
of the value distribution of each (or at least one in the symmetric case) buyer. In many
applications, this is unfortunately not true. Consider for instance the posted price
mechanism: the feedback actually received is only whether the value is above or below
the current price. Similarly, if reserve prices in second price auctions are too high, bidders
might decide to opt out of the current auction (as in posted price, see also the discussions on
lazy vs eager auctions) and/or the seller might only have her revenue as feedback, because
she is using some black-box tool to actually run the auction.
This setting is called partial feedback and is closely related to multi-armed
bandit scenarios; therefore similar techniques (quickly recalled in the following section,
see Bubeck and Cesa-Bianchi, 2012; Lattimore and Szepesvári, 2020; Slivkins et al., 2019
for more details) can be used.

3.6.1 A quick reminder on multi-armed bandits

In a bandit problem, an agent faces a sequential decision problem. At each stage $t \in \mathbb{N}$, she
chooses an action (takes a decision, pulls an arm...) $k_t$ in some finite set $\mathcal{K}$ of cardinality $K$.
This generates a reward $X_{k_t,t}$ that belongs to $[0, 1]$ (this assumption can be substantially relaxed)
that is observed by the agent (contrary to the other possible rewards $X_{k,t}$, for $k \ne k_t$, that are
not observed). The objective of the agent is to maximize her expected cumulated reward
$\sum_{t=1}^{T} X_{k_t,t}$.
To evaluate the performance of a learning algorithm, the cumulative reward should
be normalized; a traditional way is therefore to consider the expected regret, which is the
difference between the cumulative reward obtained by always playing the best fixed action at
each stage and the cumulative reward of the agent.
It remains to describe how rewards are generated. Basically, there are two extreme
scenarios that are quite different (and so are the associated algorithms). In
the stochastic case, $X_{k,t}$ are i.i.d., of (unknown) expectation $\mu_k$. In the adversarial case, $X_{k,t}$
can be any sequence of values in $[0, 1]$, and $X_{k,t+1}$ could even depend on all the past history
(up to the previous stage $t$). The regret can be rewritten in the following form:

Stochastic: $R_T = T\mu_\star - \sum_{t=1}^{T} \mu_{k_t} = \sum_k \Delta_k N_k(T)$, where $\mu_\star = \max_k \mu_k$, $\Delta_k = \mu_\star - \mu_k$ is called
the gap (or the cost) of choosing action $k$ instead of an optimal one, and
$N_k(T) = \sum_{t=1}^{T} 1\{k_t = k\}$ is the number of times this decision has been made.

Adversarial: $R_T = \max_k \mathbb{E}\Big[ \sum_{t=1}^{T} X_{k,t} - \sum_{t=1}^{T} X_{k_t,t} \Big]$, where the argument of the maximum might
change with time (unlike with stochastic data).

The multi-armed bandit literature focuses on finding algorithms that provably control the
regret with sub-linear growths (both in T and K). As mentioned before, the techniques
differ quite a lot from stochastic to adversarial data.

UCB and variants for stochastic data

Under the stochastic assumption, a natural proxy for $\mu_k$ is the empirical mean
$$\bar{X}_{k,t} = \frac{1}{N_k(t)} \sum_{s=1}^{t} X_{k,s} 1\{k_s = k\}.$$

Unfortunately, this quantity is biased, and usually negatively, because reinforcement
algorithms tend to select actions that performed well in the past. Moreover, a negative
bias is naturally reinforced with time (i.e., it will not disappear), which is not the case of a
positive bias (as algorithms will certainly sample again that action). As a consequence, a
celebrated algorithm called UCB, for Upper-Confidence Bound, adds a small error term to
the empirical mean so that it will still be biased, but positively (this property is a direct
consequence of Hoeffding's inequality).
The UCB algorithm is defined by
$$k_{t+1} = \arg\max_k \; \bar{X}_{k,t} + \sqrt{\frac{\log(T)}{N_k(t)}} .$$

Theorem 3.28. The UCB algorithm has an expected regret bounded as
$$\mathbb{E}[R_T] \le \min\Big\{ 2\sqrt{2KT\log(T)} + 2K \; ; \; 4\sum_k \frac{\log(T)}{\Delta_k} + 2K \Big\}.$$

Before giving the proof, we should mention that the different universal constants can
be improved with slightly more involved proofs. Similarly, it is possible to replace the
$\log(T)$ term in the definition of UCB by $2\log(t)$. Yet this result is sufficient for us.
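
The index rule defined above translates directly into code. The following is a minimal
sketch rather than a tuned implementation: the initial round-robin over the arms (so
that every count is positive) and the callback `reward_fn` are assumptions added for
the example.

```python
import numpy as np

def ucb(reward_fn, n_arms, horizon):
    """Play UCB with index mean_k + sqrt(log(T) / N_k); return pull counts and total reward."""
    counts = np.zeros(n_arms, dtype=int)   # N_k(t)
    sums = np.zeros(n_arms)                # cumulative observed reward of each arm
    total = 0.0
    for t in range(horizon):
        if t < n_arms:
            arm = t                        # assumed initialization: pull each arm once
        else:
            index = sums / counts + np.sqrt(np.log(horizon) / counts)
            arm = int(np.argmax(index))
        reward = reward_fn(arm)            # only the reward of the chosen arm is observed
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total

rng = np.random.default_rng(0)
means = np.array([0.2, 0.5, 0.8])          # hypothetical Bernoulli arms
counts, total = ucb(lambda k: float(rng.random() < means[k]), n_arms=3, horizon=5000)
```

On such an instance the pull counts of the two suboptimal arms grow only
logarithmically with the horizon, in line with the gap-dependent bound.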

Proof. First, let us recall Hoeffding's inequality. It states that $\frac{1}{t}\sum_{s=1}^{t} X_{k,s} \le \mu_k + \varepsilon$ with
probability at least $1 - \exp(-2t\varepsilon^2)$. This implies that, with probability at least $1 - K/T$,
$$\forall t \le T, \ \forall k \ne \star, \quad \bar{X}_{k,t} \le \mu_k + \sqrt{\frac{\log(T)}{N_k(t)}} \quad \text{and} \quad \mu_\star \le \bar{X}_{\star,t} + \sqrt{\frac{\log(T)}{N_\star(t)}} .$$
It remains to control the regret. To lighten notation, we introduce $\varepsilon_{k,t} = \sqrt{\log(T)/N_k(t)}$. For the
second bound, notice that, by definition of UCB, $k_t = k$ only if
$$\bar{X}_{\star,t} + \varepsilon_{\star,t} \le \bar{X}_{k_t,t} + \varepsilon_{k_t,t},$$
which implies that, on an event of probability at least $1 - K/T$,
$$\mu_\star \le \mu_k + 2\varepsilon_{k_t,t} = \mu_k + 2\sqrt{\frac{\log(T)}{N_k(t)}} .$$
Inverting the above equation gives that necessarily, on that event, $N_k(T) \le \frac{4\log(T)}{\Delta_k^2} + 1$, and
summing over the different actions $k$ gives the second bound.
For the first bound, recall that $R_T = \sum_k N_k(T)\Delta_k$. Since we proved above that, on an event
of probability at least $1 - K/T$, $\Delta_k \le \sqrt{\frac{4\log(T)}{N_k(T) - 1}}$, we get that on this event (using the fact that
$\Delta_k$ is also smaller than 1 to avoid dividing by 0)
$$\begin{aligned}
R_T = \sum_k \Delta_k N_k(T) &\le \sum_{k : N_k(T) \ge 2} \sqrt{\frac{4\log(T)}{N_k(T) - 1}} \, N_k(T) + K \\
&\le 2\sqrt{2\log(T)} \sum_k \sqrt{N_k(T)} + K \\
&\le 2\sqrt{2KT\log(T)} + K,
\end{aligned}$$
where the second inequality comes from the fact that $\frac{N}{\sqrt{N - 1}} \le \sqrt{2N}$ as soon as $N \ge 2$ and
the last one is a consequence of the Cauchy-Schwarz inequality.

The UCB algorithm has been generalized and extended with many different variants to
improve the different dependencies. For our purpose, we only consider the MOSS
algorithm (Audibert and Bubeck, 2009; Degenne and Perchet, 2016), whose expected regret
scales as $O(\sqrt{KT})$.

EXP3 and variants for adversarial data

Unfortunately, the UCB algorithm and its variants only work with i.i.d. data (or with strong
stationarity assumptions). In the adversarial case, when $X_{k,t}$ can be any sequence, its regret
would increase linearly, and another class of algorithms (based on optimization concepts
instead of reinforcement) has been introduced; it is called EXP.3 (for Exploration and
Exploitation with Exponential weights). The basic idea is to always choose actions at
random, with a positive probability, so that importance sampling techniques can be used to
build unbiased estimates of rewards. Let us then denote by $p_{k,t}$ the probability of choosing
action $k$ at stage $t$ (that obviously depends on the past history). A classical way to estimate
the reward $X_{k,t}$ is then to define
$$\hat{X}_{k,t} = 1 - \frac{1 - X_{k,t}}{p_{k,t}} 1\{k_t = k\};$$
this estimate has the good properties of being both unbiased and always smaller than 1
(even if possibly arbitrarily small). The EXP.3 algorithm is defined by
$$p_{k,t+1} = \frac{\exp\big( \eta \sum_{s=1}^{t} \hat{X}_{k,s} \big)}{\sum_{k'} \exp\big( \eta \sum_{s=1}^{t} \hat{X}_{k',s} \big)},$$
where $\eta$ is a parameter to be chosen.
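
The two displays above (the reward estimate and the exponential-weights update) can be
sketched as follows. The full reward matrix is passed in only to simulate the
environment; the learner reads a single entry per round. The function name, the seed
and the max-subtraction for numerical stability are choices made for this example.

```python
import numpy as np

def exp3(rewards, eta, seed=0):
    """Run EXP.3 on a (T, K) array of rewards in [0, 1]; return the collected reward."""
    rng = np.random.default_rng(seed)
    T, K = rewards.shape
    cum_est = np.zeros(K)                        # cumulative estimated rewards of each arm
    total = 0.0
    for t in range(T):
        logits = eta * cum_est
        probs = np.exp(logits - logits.max())    # subtracting the max leaves probs unchanged
        probs /= probs.sum()
        arm = rng.choice(K, p=probs)
        x = rewards[t, arm]                      # the only reward revealed at round t
        total += x
        est = np.ones(K)                         # the estimate is 1 for arms not played
        est[arm] = 1.0 - (1.0 - x) / probs[arm]  # unbiased, at most 1, possibly very negative
        cum_est += est
    return float(total)

T, K = 5000, 3
rewards = np.tile([0.2, 0.5, 0.8], (T, 1))       # hypothetical fixed reward sequence
total = exp3(rewards, eta=np.sqrt(np.log(K) / (K * T)))
```

The value of `eta` used here is the one of the theorem stated next.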

Theorem 3.29. With the choice of $\eta = \sqrt{\frac{\log(K)}{KT}}$, the expected regret of EXP.3 scales as
$$\mathbb{E}[R_T] \le 2\sqrt{K\log(K)T} .$$
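Proof. The proof relies on a careful study of $\Phi(W_t)$, where $W_{k,t} = \sum_{s=1}^{t} \hat{X}_{k,s}$ and
$\Phi(Z) = \frac{1}{\eta} \log\big( \sum_k \exp(\eta Z_k) \big)$. First, notice that by definition of $\Phi$,
$$\Phi(W_{t+1}) - \Phi(W_t) = \frac{1}{\eta} \log\Big( \sum_k p_{k,t+1} \exp(\eta \hat{X}_{k,t+1}) \Big).$$
Using the facts that $\exp(\eta x) \le 1 + \eta x + \eta^2 x^2$ if $x \le 1$, that $\eta \hat{X}_{k,t} \le 1$ and finally that
$\log(1 + x) \le x$, we get
$$\Phi(W_{t+1}) - \Phi(W_t) \le \sum_k p_{k,t+1} \hat{X}_{k,t+1} + \eta \sum_k p_{k,t+1} \hat{X}_{k,t+1}^2 .$$
In particular, plugging back the definition $\hat{X}_{k,t+1} = 1 - \frac{1 - X_{k,t+1}}{p_{k,t+1}} 1\{k_{t+1} = k\}$, we get
$$\Phi(W_{t+1}) - \Phi(W_t) \le \sum_k X_{k,t+1} 1\{k_{t+1} = k\} + \eta \sum_k p_{k,t+1} \Big( 1 - \frac{1 - X_{k,t+1}}{p_{k,t+1}} 1\{k_{t+1} = k\} \Big)^2 .$$
Conditionally on the past history, the first term on the right-hand side satisfies, in expectation,
$$\mathbb{E} \sum_k X_{k,t+1} 1\{k_{t+1} = k\} = \sum_k X_{k,t+1} p_{k,t+1} .$$
For the last term, expanding the square yields that
$$\sum_k p_{k,t+1} \Big( 1 - \frac{1 - X_{k,t+1}}{p_{k,t+1}} 1\{k_{t+1} = k\} \Big)^2 = \sum_k \frac{(1 - X_{k,t+1})^2}{p_{k,t+1}} 1\{k_{t+1} = k\} - 2\sum_k (1 - X_{k,t+1}) 1\{k_{t+1} = k\} + 1 .$$
Taking expectation, conditionally on the past history, gives that this term is controlled as
$$\mathbb{E} \sum_k p_{k,t+1} \Big( 1 - \frac{1 - X_{k,t+1}}{p_{k,t+1}} 1\{k_{t+1} = k\} \Big)^2 \le \sum_k (1 - X_{k,t+1})^2 - 2\sum_k (1 - X_{k,t+1}) p_{k,t+1} + 1,$$
and the latter is always smaller than $K$. To see this, assume, without loss of generality, that
$X_{1,t+1} \ge X_{k,t+1}$ for all $k \in [K]$ so that
$$\begin{aligned}
\mathbb{E} \sum_k p_{k,t+1} \Big( 1 - \frac{1 - X_{k,t+1}}{p_{k,t+1}} 1\{k_{t+1} = k\} \Big)^2
&\le \sum_{k \ge 2} (1 - X_{k,t+1})^2 + (1 - X_{1,t+1})^2 - 2(1 - X_{1,t+1}) + 1 \\
&\le \sum_{k \ge 2} (1 - X_{k,t+1})^2 + X_{1,t+1}^2 \\
&\le K .
\end{aligned}$$
Plugging this back in the bound on $\Phi(W_{t+1}) - \Phi(W_t)$, and taking the expectation
conditionally on the past history, gives
$$\mathbb{E}[\Phi(W_{t+1})] - \Phi(W_t) \le \sum_k X_{k,t+1} p_{k,t+1} + \eta K .$$
Summing over $t$, and using the facts that $\Phi(0) = \frac{\log(K)}{\eta}$ and $\Phi(Z) \ge \max_k Z_k$, we finally get
$$\mathbb{E} \max_k \sum_{t=1}^{T} \hat{X}_{k,t} - \mathbb{E} \sum_{t=1}^{T} X_{k_t,t} \le \frac{\log(K)}{\eta} + \eta K T,$$
which gives the result, as $\mathbb{E} \max_k \sum_{t=1}^{T} \hat{X}_{k,t} \ge \max_k \mathbb{E} \sum_{t=1}^{T} X_{k,t}$ and both terms on the
right-hand side equal $\sqrt{K\log(K)T}$ for the stated choice of $\eta$.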

The EXP.3 algorithm is a standard building block of many online learning algorithms
with adversarial data. As with UCB, it has been improved in many directions, notably to get
rid of the sub-optimal $\sqrt{\log(K)}$ term in the regret bound (yet at the cost of a much more
intricate proof). It is also possible to estimate $X_{k,t}$ with $\frac{X_{k,t}}{p_{k,t}} 1\{k_t = k\}$. However, this estimate
can be arbitrarily large and the variance of EXP.3 cannot be directly controlled. The trick
is then to add a forced exploration term, i.e., to play uniformly at random with probability
$\sqrt{K\log(K)/T}$ at each round.
Similarly to the stochastic case, it is possible to get rid of the $\sqrt{\log(K)}$ term that arises
in the EXP.3 regret analysis with a more involved algorithm (and proof techniques). It is
then “optimal” in the sense that any learning algorithm must (in some difficult problem
instances) have a regret scaling at least as $\Theta(\sqrt{KT})$.

3.6.2 Auctions learning with partial feedback
As mentioned before, there are many instances where a seller only has incomplete data
on auctions run in the past (in posted price, lazy/eager second price, with a black-box
selling mechanism, etc.). However, it is still possible for the seller to learn the optimal
mechanism in many different cases, with actually a very small extra cost compared to the
batch approach.
Consider for instance the online posted price problem. A seller repeatedly posts a
price $p_t \in [0, 1]$ to sell identical items and buyers arrive sequentially, with private value
$x_t \in [0, 1]$. Buyer $t$ buys the item if $x_t \ge p_t$, without revealing the true value. As a
consequence, the partial feedback available to the seller, before fixing the next price $p_{t+1}$,
consists of all the indicators $1\{x_s \ge p_s\}$ for $s \in [t]$. As before, the objective of the seller is to find, as
quickly as possible, the best price $p^*$ or to minimize the regret
$$\max_{p \in [0,1]} \sum_{t=1}^{T} p \, 1\{x_t \ge p\} - \sum_{t=1}^{T} p_t 1\{x_t \ge p_t\} .$$
We recall that if data are stochastic, i.e., i.i.d. with cdf $F$, then $p^*$ is a root of the virtual
function (and the root if $F$ is regular). Yet, the analysis carries over to adversarial sequences
of values $x_t$.
In sequential learning, a first and naive possibility is quite often to discretize the
decision space (here $[0, 1]$) and to run a UCB or EXP.3 algorithm on the discretization
(depending on whether data are stochastic or adversarial), agnostically to the structure at hand.
Given some $\varepsilon > 0$, the size of the discretization in the posted price problem is $1/\varepsilon$, leading
to a global regret of the order of
$$O(T\varepsilon) + O\big( \sqrt{T/\varepsilon} \big) = O(T^{2/3}) \quad \text{with the choice of } \varepsilon = T^{-1/3} .$$

In the above equation, the first term corresponds to the approximation error due to the
discretization and the second to the estimation (or learning) error of the optimal price in
the discretized set.
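
A minimal sketch of this naive strategy in the stochastic case is given below: UCB is
run on the price grid {ε, 2ε, ...}, using only the binary sale/no-sale feedback. The
uniform value distribution, the initial round-robin and the function name are
assumptions made for the example.

```python
import numpy as np

def discretized_posted_price_ucb(values, eps):
    """Post prices from the grid {eps, 2*eps, ...} within [0, 1], chosen by UCB on revenue."""
    prices = np.arange(1, int(1 / eps) + 1) * eps   # about 1/eps candidate prices
    K, T = len(prices), len(values)
    counts = np.zeros(K, dtype=int)
    sums = np.zeros(K)
    revenue = 0.0
    for t, x in enumerate(values):
        if t < K:
            k = t                                   # assumed initialization: try each price once
        else:
            k = int(np.argmax(sums / counts + np.sqrt(np.log(T) / counts)))
        sold = x >= prices[k]                       # the only feedback the seller observes
        reward = prices[k] * sold
        counts[k] += 1
        sums[k] += reward
        revenue += reward
    return float(revenue), float(prices[np.argmax(counts)])

rng = np.random.default_rng(0)
T = 20_000
values = rng.uniform(0.0, 1.0, size=T)              # hypothetical buyers, uniform on [0, 1]
revenue, most_posted_price = discretized_posted_price_ucb(values, eps=T ** (-1 / 3))
```

With uniform values the monopoly price is 1/2, so the most frequently posted price
should end up near the middle of the grid.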
On the other hand, the regret bound can be largely improved by leveraging the structure
of the problem, at least in the stochastic case, when the generating distribution behaves
nicely enough (the worst-case learning cost being indeed $T^{2/3}$, see Kleinberg and Leighton,
2003). For instance, a typical assumption is that the monopoly profit function $\Pi(p) =
p(1 - F(p))$ is approximately quadratic around $p^*$, i.e., that $\Pi(p^*) - \Pi(p) = \Omega((p - p^*)^2)$. Using
a UCB algorithm with a uniform $\varepsilon$-discretization then yields a total regret of the order of
$$O(T\varepsilon^2) + O\Big( \sum_{k=1}^{1/\varepsilon} \frac{\log(T)}{k^2 \varepsilon^2} \Big) = O\big( \sqrt{T\log(T)} \big) \quad \text{with the choice of } \varepsilon = \Big( \frac{T}{\log(T)} \Big)^{-1/4} .$$

This simple technique cannot be improved, even with a stronger assumption: indeed,
if the approximation term is of order $T\Delta$, the estimation error is at least $\log(T)/\Delta$.
However, the problem has a stronger property that can be further leveraged: if a price
$p_t$ is accepted, then any lower price would also have been accepted (and reciprocally). In
particular, this can be used in the following simplest possible problem, where the
solution is nevertheless highly counter-intuitive.
Suppose that each buyer has the exact same value $x_t = x$. Then it is clear that the optimal
price is $p^* = x$; the remaining question is the learning cost. As mentioned before, the
feedback is in that case binary: either “$x$ is greater than $p_t$” or “$x$ is smaller than $p_t$”. In
order to find $x$ with the fewest possible queries, a binary search is optimal. However,
the binary search is exponentially sub-optimal in terms of learning cost.

Proposition 3.30. The regret of a binary search can be as large as Ω(log(T )). On the other
hand, there exists a more cautious search whose regret is smaller than O(log log(T )).

Proof. Assume that $x = \frac{1}{2}$. Then a binary search will use $\log(1/\varepsilon)$ posted (and refused)
prices to reach the precision $\varepsilon$. Even with the optimal choice of $\varepsilon = 1/T$, this gives a $\log(T)$
regret.

The cautious search works in epochs $\ell \in \{0, 1, \dots, \log_2 \log_2(T)\}$ (let us assume for
simplicity here that $\log_2 \log_2(T)$ is an integer). At the $\ell$-th epoch, the prices posted increase
by $1/2^{2^\ell}$ until such a price is refused and the next epoch begins. Let $p_\star^{(\ell)}$ be the last accepted
price at epoch $\ell$ and $p_j^{(\ell+1)}$ the $j$-th price posted at epoch $\ell + 1$, then $p_j^{(\ell+1)} = p_\star^{(\ell)} + \frac{j - 1}{2^{2^{\ell+1}}}$. At
the end of the epoch $\log_2 \log_2(T)$, the cautious binary search posts the last accepted price
until the final stage $T$.
To compute the regret of the cautious binary search, notice that at each epoch $\ell$,
only one price is rejected, and this rejection has a cost smaller than 1. Moreover, since
$p^* \in [p_\star^{(\ell)}, p_\star^{(\ell)} + \frac{1}{2^{2^\ell}})$, then $p^* - p_1^{(\ell+1)} \le \frac{1}{2^{2^\ell}}$ and more generally, as long as posted prices are
not rejected,
$$p^* - p_j^{(\ell+1)} \le \frac{1}{2^{2^\ell}} - \frac{j - 1}{2^{2^{\ell+1}}} .$$
Since there are at most $2^{2^\ell}$ posted prices in epoch $\ell + 1$, the cumulative cost of errors in that
epoch is bounded by
$$\sum_{j=1}^{2^{2^\ell}} \Big( \frac{1}{2^{2^\ell}} - \frac{j - 1}{2^{2^{\ell+1}}} \Big) = \sum_{j=1}^{2^{2^\ell}} \frac{j}{2^{2^{\ell+1}}} \le 1 .$$
As a consequence, each epoch has a bounded cost of (at most) 2, which gives the result as
only $\log_2 \log_2(T)$ epochs are needed to get an error on $p^*$ smaller than $1/T$.
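
The cautious search is short enough to be simulated directly. The sketch below follows
the epoch structure of the proof for a fixed value x, under a few stated assumptions:
the search starts from price 0 (always accepted), each epoch posts increasing prices
starting one step above the last accepted price, and refinement stops once the step is
at most 1/T. The function name and return values are illustrative.

```python
def cautious_search(x, T):
    """Post T prices against a fixed unknown value x in [0, 1]; return (regret, rejections)."""
    low = 0.0            # last accepted price (assumption: price 0 is always accepted)
    regret = 0.0
    rejections = 0
    t = 0
    epoch = 0
    searching = True
    while t < T and searching:
        step = 2.0 ** -(2 ** epoch)      # increment 1 / 2^(2^epoch) used during this epoch
        price = low + step
        t += 1
        if x >= price:                   # sale: the seller earns `price` instead of x
            regret += x - price
            low = price
        else:                            # no sale: the value x is lost and the epoch ends
            regret += x
            rejections += 1
            searching = step > 1.0 / T   # stop refining once the precision reaches 1/T
            epoch += 1
    regret += (T - t) * (x - low)        # post the last accepted price until stage T
    return regret, rejections

regret, rejections = cautious_search(x=0.7314, T=10**6)
```

Each epoch ends with exactly one rejected price, so the number of rejections grows like
log log(T), whereas a plain binary search can incur a rejection at a constant fraction
of its log(T) steps.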

The crucial property to obtain $\log\log(T)$ regret (Kleinberg and Leighton, 2003) is that
the cost function $\Pi(p^*) - \Pi(p)$ is asymmetric and decreases much more slowly on the left than on
the right. This property was later used again in the stochastic case to generalize
Proposition 3.30 when the support of the distribution is finite, to get a worst-case bound of
$\sqrt{KT}$ (Cesa-Bianchi et al., 2019), and in the adversarial case to lower the parameter
dependency in front of the $T^{2/3}$ term (Bubeck et al., 2017).
The fact that the reward mapping $\Pi(p)$ cannot be any function, but must belong to a
specific family, can also be leveraged in learning the optimal reserve price in symmetric
(repeated) second-price auctions (Cesa-Bianchi et al., 2014). Consider for instance the more
complex case where there are not one but $n$ bidders at each auction, and the feedback
to the seller is, as in posted price, the revenue of the auction and not the true values.
In this specific case, the revenue is either the current reserve price $p_t$ (if the highest bid
is above it and the second highest below) or the second highest bid. In both cases, the
learner gets information not only on $\Pi(p_t)$ but on the whole function $\Pi(\cdot)$. The learning
algorithm somehow combines the ideas behind the cautious binary search and UCB. It
proceeds by epochs, and at each stage of epoch $k$ the proposed reserve price $p_k$ is
the same and always smaller than $p^*$ (at least with arbitrarily high probability). At the
end of an epoch, based on the data collected, a confidence interval for $\Pi(\cdot)$ is constructed
using the Dvoretzky-Kiefer-Wolfowitz inequality; this is possible because $\Pi(\cdot)$ only depends
on the distribution of the second highest bid in the symmetric case and only bids above the
current reserve price matter. Epoch after epoch, the error on the optimal
reserve price decreases and it is possible to control the regret (at the cost of intensive
computations).

Proposition 3.31. Learning the optimal reserve price in symmetric second price auctions
has a cost of $O(\sqrt{T\log(T)})$, where the dependency in the value distribution $F$ is independent
of $T$, but hidden in the $O(\cdot)$ notation.

References
Albert, M., V. Conitzer, and P. Stone. 2017. “Automated design of robust mechanisms”. In:
Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 31. No. 1.
Armstrong, M. 1996. “Multiproduct nonlinear pricing”. Econometrica: Journal of the Econo-
metric Society: 51–75.
Athey, S. and P. A. Haile. 2007. “Chapter 60 Nonparametric Approaches to Auctions”. In:
Handbook of Econometrics.
Audibert, J.-Y. and S. Bubeck. 2009. “Minimax policies for adversarial and stochastic
bandits”. In: Proceedings of COLT.

Bar-Yossef, Z., K. Hildrum, and F. Wu. 2002. “Incentive-compatible online auctions for
digital goods.” In: SODA. Vol. 2. 964–970.
Bartlett, P. L., S. Boucheron, and G. Lugosi. 2002. “Model selection and error estimation”.
Machine Learning. 48(1-3): 85–113.
Blum, A., V. Kumar, A. Rudra, and F. Wu. 2004. “Online learning in online auctions”.
Theoretical Computer Science. 324(2-3): 137–146.
Bubeck, S. and N. Cesa-Bianchi. 2012. “Regret Analysis of Stochastic and Nonstochastic
Multi-armed Bandit Problems”. In: Machine Learning. Vol. 5. No. 1. 1–122.
Bubeck, S., N. R. Devanur, Z. Huang, and R. Niazadeh. 2017. “Multi-scale Online Learning
and its Applications to Online Auctions”. Proceedings of the Eighteenth ACM Conference
on Economics and Computation.
Cesa-Bianchi, N., T. Cesari, and V. Perchet. 2019. “Dynamic pricing with finitely many
unknown valuations”. In: Algorithmic Learning Theory. PMLR. 247–273.
Cesa-Bianchi, N., C. Gentile, and Y. Mansour. 2014. “Regret minimization for reserve prices
in second-price auctions”. IEEE Transactions on Information Theory. 61(1): 549–564.
Choi, H., C. F. Mela, S. R. Balseiro, and A. Leary. 2020. “Online display advertising markets:
A literature review and future directions”. Information Systems Research. 31(2): 556–
575.
Cole, R. and T. Roughgarden. 2014b. “The sample complexity of revenue maximization”.
In: Proceedings of the forty-sixth annual ACM symposium on Theory of computing. 243–
252.
Conitzer, V. and T. Sandholm. 2002. “Complexity of mechanism design”. In: Proceedings of
the Eighteenth conference on Uncertainty in artificial intelligence. 103–110.
Daskalakis, C., A. Deckelbaum, and C. Tzamos. 2013. “Mechanism design via optimal
transport”. In: Proceedings of the fourteenth ACM conference on Electronic commerce.
269–286.
Degenne, R. and V. Perchet. 2016. “Anytime optimal algorithms in stochastic multi-armed
bandits”. In: International Conference on Machine Learning. 1587–1595.
Devanur, N. R., Z. Huang, and C.-A. Psomas. 2016. “The sample complexity of auctions
with side information”. In: Proceedings of the forty-eighth annual ACM symposium on
Theory of Computing. 426–439.
Dhangwatnotai, P., T. Roughgarden, and Q. Yan. 2015. “Revenue maximization with a
single sample”. Games and Economic Behavior. 91: 318–333.
Drutsa, A. 2020. “Reserve pricing in repeated second-price auctions with strategic bidders”.
In: International Conference on Machine Learning. PMLR. 2678–2689.
Dütting, P., Z. Feng, H. Narasimhan, D. Parkes, and S. S. Ravindranath. 2019. “Optimal
auctions through deep learning”. In: International Conference on Machine Learning.
PMLR. 1706–1715.

Feng, Z., S. Lahaie, J. Schneider, and J. Ye. 2021. “Reserve Price Optimization for First
Price Auctions in Display Advertising”. International Conference on Machine Learning:
3230–3239.
Feng, Z., H. Narasimhan, and D. C. Parkes. 2018a. “Deep learning for revenue-optimal auc-
tions with budgets”. In: Proceedings of the 17th International Conference on Autonomous
Agents and Multiagent Systems. 354–362.
Fu, H. 2013. “VCG auctions with reserve prices: Lazy or eager”. In: Proceedings of the
Fourteenth ACM Conference on Economics and Computation.
Golowich, N., H. Narasimhan, and D. C. Parkes. 2018. “Deep learning for multi-facility
location mechanism design”. In: Proceedings of the 27th International Joint Conference on
Artificial Intelligence. 261–267.
Golrezaei, N., M. Lin, V. Mirrokni, and H. Nazerzadeh. 2017. “Boosted Second-price
Auctions for Heterogeneous Bidders”. In: Management Science.
Gonczarowski, Y. A. and N. Nisan. 2017. “Efficient empirical revenue maximization
in single-parameter auction environments”. In: Proceedings of the 49th Annual ACM
SIGACT Symposium on Theory of Computing.
Guerre, E., I. Perrigne, and Q. Vuong. 2000. “Optimal Nonparametric Estimation of First-
price Auctions”. Econometrica. 68(3): 525–574.
Guo, C., Z. Huang, and X. Zhang. 2019. “Settling the sample complexity of single-
parameter revenue maximization”. In: Proceedings of the 51st Annual ACM SIGACT
Symposium on Theory of Computing.
Hartline, J. D. and T. Roughgarden. 2009. “Simple versus optimal mechanisms”. In: Pro-
ceedings of the 10th ACM conference on Electronic commerce. 225–234.
Haussler, D. 1992. “Decision theoretic generalizations of the PAC model for neural net and
other learning applications”. In: Information and computation.
Huang, Z., Y. Mansour, and T. Roughgarden. 2018. “Making the most of your samples”.
SIAM Journal on Computing. 47(3): 651–674.
Kleinberg, R. and T. Leighton. 2003. “The value of knowing a demand curve: Bounds on re-
gret for online posted-price auctions”. In: 44th Annual IEEE Symposium on Foundations
of Computer Science, 2003. Proceedings. IEEE. 594–605.
Koltchinskii, V., D. Panchenko, et al. 2002. “Empirical margin distributions and bounding
the generalization error of combined classifiers”. The Annals of Statistics. 30(1): 1–50.
Lattimore, T. and C. Szepesvári. 2020. Bandit algorithms. Cambridge University Press.
Lavi, R. and N. Nisan. 2004. “Competitive analysis of incentive compatible on-line auc-
tions”. Theoretical Computer Science. 310(1-3): 159–180.
Le Thi, H. A., V. N. Huynh, and T. P. Dinh. 2014. “DC Programming and DCA for General
DC Programs”. In: Advanced Computational Methods for Knowledge Engineering. Ed. by
T. van Do, H. A. L. Thi, and N. T. Nguyen. Cham: Springer International Publishing.
15–35.

Lugosi, G. and S. Mendelson. 2019. “Mean estimation and regression under heavy-tailed
distributions: A survey”. Foundations of Computational Mathematics. 19(5): 1145–1190.
Manelli, A. M. and D. R. Vincent. 2007. “Multidimensional mechanism design: Revenue
maximization and the multiple-good monopoly”. Journal of Economic theory. 137(1):
153–185.
Massart, P. 1990. “The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality”. In:
The Annals of Probability. Vol. 18. No. 3. Institute of Mathematical Statistics. 1269–1283.
Medina, A. M. and S. Vassilvitskii. 2017. “Revenue optimization with approximate bid
predictions”. In: Proceedings of the 31st International Conference on Neural Information
Processing Systems. 1856–1864.
Mohri, M. and A. M. Medina. 2014. “Learning theory and algorithms for revenue opti-
mization in second price auctions with reserve”. In: International Conference on Machine
Learning. PMLR. 262–270.
Morgenstern, J. and T. Roughgarden. 2015. “The pseudo-dimension of near-optimal auc-
tions”. In: Proceedings of the 28th International Conference on Neural Information Process-
ing Systems-Volume 1. 136–144.
Ostrovsky, M. and M. Schwarz. 2011. “Reserve prices in internet advertising auctions: A
field experiment”. In: Proceedings of the 12th ACM conference on Electronic commerce.
59–60.
Paes Leme, R., M. Pál, and S. Vassilvitskii. 2016. “A field guide to personalized reserve
prices”. In: Proceedings of the 25th international conference on world wide web. 1093–1102.
Rahme, J., S. Jelassi, and S. M. Weinberg. 2020. “Auction learning as a two-player game”.
arXiv preprint arXiv:2006.05684.
Roughgarden, T. and O. Schrijvers. 2016. “Ironing in the dark”. In: Proceedings of EC. 1–18.
Roughgarden, T. and J. R. Wang. 2016. “Minimizing Regret with Multiple Reserves”. In:
Proceedings of the 2016 ACM Conference on Economics and Computation. 601–616.
Rudolph, M. R., J. G. Ellis, and D. M. Blei. 2016. “Objective variables for probabilistic
revenue maximization in second-price auctions with reserve”. In: Proceedings of the
25th International Conference on World Wide Web. 1113–1122.
Shen, W., S. Lahaie, and R. P. Leme. 2019a. “Learning to clear the market”. In: International
Conference on Machine Learning. PMLR. 5710–5718.
Shen, W., P. Tang, and S. Zuo. 2019b. “Automated mechanism design via neural networks”.
In: Proceedings of the 18th International Conference on Autonomous Agents and Multiagent
Systems. 215–223.
Slivkins, A. et al. 2019. “Introduction to Multi-Armed Bandits”. Foundations and Trends®
in Machine Learning. 12(1-2): 1–286.
Yao, A. C.-C. 2017. “Dominant-strategy versus bayesian multi-item auctions: Maximum
revenue determination and comparison”. In: Proceedings of the 2017 ACM Conference on
Economics and Computation. 3–20.

4 Adaptive and strategic learning agents
First read of this chapter, key concepts and ideas

This chapter focuses on dynamic settings, where the interactions
between bidders and seller are repeated. The first of the three
main results is that, if the mechanism is fixed, a bidder can learn
his value distribution on the fly at a small cost (or regret); see
Proposition 4.1. The two other main results explain how one agent
(either a bidder or the seller) can take advantage of the other.
Indeed, the seller can extract the surplus from the bidder if, and
only if, she is much more patient than him (Theorems 4.8 and 4.9).
On the other hand, if the seller has to commit to some class of
mechanisms, say second price auctions with personalized reserve prices,
then the bidder can use the repetitions to manipulate the seller
to his advantage. Theorem 4.13 gives a simple statement
of this perhaps counter-intuitive fact.

Online auctions are one of the most fundamental tools of the modern economy. A crucial
assumption behind the results of Chapter 3 is that the seller has access to some large
batch of samples of bidders’ valuations. The objective was to learn the optimal mechanism
(or at least the best possible one in some class) based on this dataset. This model particularly
fits problems where the bidders differ from one auction to the next (as is typical
on eBay), so that it is legitimate to assume that they bid their values myopically, i.e.,
non-strategically (as long as they face an incentive compatible auction). Unfortunately,
this assumption of facing new bidders for each different item no longer holds in many
important economic situations, such as online advertising. Indeed, in that market the
so-called demand-side platforms (DSPs), which aggregate bidders, repeatedly interact
with a single seller (called a supply-side platform, SSP), billions of times a day (Choi et al.,
2020).
The global objective of the seller remains identical: maximize the total revenue, typically
by learning (or trying to learn) the value distributions of the bidders. Indeed, it is still
worthwhile for an SSP to optimize the reserve prices (personalized per DSP) since the number
of participants per auction is relatively low (the median number of bidders is equal to 6
(Celis et al., 2014)). The difficulty is now that bidders are also present in the game for a long
period and optimize their cumulative utility, instead of best-replying myopically to the
seller’s choice of mechanism. Thanks to this long-term optimization, it is possible for a bidder
to sequentially learn his “optimal” bidding strategy, even if the mechanism, his opponents’
strategies or even his own distribution of valuations are unknown at first. Such a bidder is
adaptive to his environment, but he could even be strategic. Intuitively, and this will be
detailed later on, a strategic buyer might be tempted to modify his bids if he knows that
the seller is using them to set reserve prices. It can be much more profitable to face a low
reserve price by bidding non-truthfully than to face a high reserve price with truthful bids.
It is worth mentioning here that a bidder might have different values for different
auctions because the intrinsic value of an ad is strongly related to the probability that
the user seeing it clicks on it. As a consequence, the value distribution of a bidder in this
chapter represents the variability over time of the valuations of one specific bidder for the
different items that are sold successively. Notice, on the contrary, that in the previous chapter
the value of a buyer was fixed, but unknown to the seller, who only had some prior (the
distribution) on its realization.
For simplicity, we shall assume in this section that values always belong to [0, 1]; this
assumption can be weakened, but at the cost of technicalities.

4.1 Adaptive bidders - Online learning to bid

As mentioned above, the buyers were assumed to be myopic in the previous chapters (or
equivalently, only present for a single auction), a crucial hypothesis that should be
removed, at least in online ad markets. We will on the contrary assume that a single buyer
bids at each auction and that he can use these repetitions to improve his bidding strategy
by adapting to his unknown environment; there are at least three different aspects that can
be learned sequentially, depending on where the lack of information lies.

1. Bidding without knowing his own value: the bidder does not know the expected value
he gives to the item but can learn his own value distribution by gathering a new
value sample each time he wins an auction. This is particularly relevant in online
advertising, where advertisers have to show ads to potential buyers to understand
their propensity to buy a certain product (or at least, their propensity to click on an
ad).

2. Bidding without knowing the mechanism: a strategic bidder does not know precisely the
mechanism used by the seller (or alternatively, has little or no information on the other
bidders’ valuation distributions). The bidder can sequentially and incrementally
adapt his bidding strategy during the T successive auctions to maximise his expected
cumulative utility, hopefully achieving (almost) the same performance as if he knew
the whole mechanism in advance.

3. Bidding with budget constraints: a strategic bidder could have a pre-specified
constraint on the budget he can spend during the T successive auctions. In this
setting, at each round, he should bid (and spend some of his budget if he wins the
auction) without knowing the exact valuations he will get in future rounds. The
bidder can sequentially and incrementally adapt his bidding strategy during the T
successive auctions to maximise his expected cumulative utility, while respecting
the constraint on his budget.

4.1.1 Online learning the value distribution to bid

In an oversimplified online ad example, a publisher (the website where an ad is displayed)
gets paid by the advertiser each time a user clicks on that ad, because the click is the
relevant random event (as it might generate a sale afterwards), and not simply the
fact that the ad is displayed (as the user might not see it). As a consequence, the actual
value of an ad is random: either 0 or 1 to simplify things again, i.e., whether the user
actually clicks or not, with expectation x, i.e., the probability of a click. If x is known before
participating in a DSIC auction, then due to the linearity of expectation, the optimal
strategy remains to bid x truthfully.
Unfortunately for the seller, there are many examples where the probability of click
is unknown to the advertiser before starting a new campaign; he would be willing to
bid truthfully this probability repeatedly, but it is unknown. This is an example where
multi-armed bandit techniques can be used to learn on the fly the probability of clicks
based on the feedback received. Notice that an ad that is not displayed will never be clicked,
hence the advertiser must acquire some data to estimate this probability.

The simplest model to study this sequential learning problem is the following (Weed et
al., 2016). Bidder $i$ participates in a sequence $t = 1, \dots, T$ of second price auctions (without
reserve prices) where the realized values $v_t \in \{0, 1\}$ are i.i.d. with unknown expectation $x$. Of
course, $v_t$ is not observed before participating in auction $t$, and it is observed afterwards only
if that auction is won. Let us denote by $b^{(1)}_{-i,t}$ the maximal bid of the opponents (which could
actually include a reserve price) and by $b_{i,t}$ the bid of bidder $i$ at that stage. Then the
performance of the optimal truthful strategy is
$\sum_{t=1}^{T} \mathbb{E}\big[x \mathbf{1}\{x \ge b^{(1)}_{-i,t}\}\big]$ while the learning to bid policy has
gathered $\sum_{t=1}^{T} \mathbb{E}\big[x \mathbf{1}\{b_{i,t} \ge b^{(1)}_{-i,t}\}\big]$. As a consequence, the cost of
learning to bid is measured in terms of regret
\[
R_T = \sum_{t=1}^{T} \Big( \mathbb{E}\big[x \mathbf{1}\{x \ge b^{(1)}_{-i,t}\}\big] - \mathbb{E}\big[x \mathbf{1}\{b_{i,t} \ge b^{(1)}_{-i,t}\}\big] \Big).
\]
As in multi-armed bandits, the empirical average
\[
\bar{x}_t = \frac{\sum_{s \le t} v_s \mathbf{1}\{b_{i,s} \ge b^{(1)}_{-i,s}\}}{\#\{s \le t \,;\, b_{i,s} \ge b^{(1)}_{-i,s}\}}
\]
is negatively biased. As a consequence, the algorithm UCBid biases it slightly upward by adding
a small error term to its bid
\[
b_{i,t+1} = \bar{x}_t + 2\sqrt{\frac{\log(T)}{\omega_{i,t}}}, \quad \text{where } \omega_{i,t} = \#\{s \le t \,;\, b_{i,s} \ge b^{(1)}_{-i,s}\}
\]
is the number of auctions won until stage $t$.
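The bidding rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of Weed et al. (2016): the function names, the cap of bids at 1, the choice to bid 1 before the first win, and the i.i.d. uniform competing bids are all hypothetical. The sketch also scores each auction by its expected second-price utility $(x - b^{(1)}_{-i,t})$ when won, which is the quantity tracked by the "net cost" argument in the proof below.

```python
import math
import random


def ucbid_bid(mean_estimate: float, wins: int, horizon: int) -> float:
    """Optimistic bid: empirical mean plus the 2*sqrt(log(T)/wins) bonus."""
    if wins == 0:
        return 1.0  # assumption of this sketch: bid the maximum until a first win
    return min(1.0, mean_estimate + 2.0 * math.sqrt(math.log(horizon) / wins))


def simulate_ucbid(x: float, horizon: int, seed: int = 0):
    """Run UCBid against i.i.d. uniform competing bids (a hypothetical opponent).

    Returns (pseudo_regret, wins, mean_estimate).
    """
    rng = random.Random(seed)
    wins, value_sum, regret = 0, 0.0, 0.0
    for _ in range(horizon):
        mean_estimate = value_sum / wins if wins else 0.0
        bid = ucbid_bid(mean_estimate, wins, horizon)
        competing = rng.random()  # highest opposing bid b^(1)_{-i,t}
        if bid >= competing:  # auction won: the click (0 or 1) is observed
            click = 1.0 if rng.random() < x else 0.0
            wins += 1
            value_sum += click
        # Expected second-price utility of the truthful bid x and of the actual bid.
        truthful = (x - competing) if x >= competing else 0.0
        realised = (x - competing) if bid >= competing else 0.0
        regret += truthful - realised
    return regret, wins, (value_sum / wins if wins else 0.0)
```

Calling `simulate_ucbid(x=0.5, horizon=2000)` returns the accumulated pseudo-regret together with the number of auctions won and the final estimate of $x$; the early rounds overbid heavily because the bonus term dominates while few auctions have been won.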

Proposition 4.1. The UCBid algorithm has a sublinear regret against any sequence of
opponent bids $b^{(1)}_{-i,t}$, as long as they are independent of $v_t$ (conditionally on the history), as
\[
R_T \le O\big(\sqrt{T \log(T)}\big).
\]
Moreover, if the sequence $b^{(1)}_{-i,t}$ is also i.i.d. (with a law unknown to bidder $i$), then the regret
grows even much more slowly, as
\[
R_T \le
\begin{cases}
c_\alpha\, T^{\frac{1-\alpha}{2}} \log^{\frac{1+\alpha}{2}}(T) & \text{if } \alpha < 1, \\
c_\alpha \log^{2}(T) & \text{if } \alpha = 1, \\
c_\alpha \log(T) & \text{if } \alpha > 1,
\end{cases}
\]
for some constants $c_\alpha$ independent of $T$ and where $\alpha$ is some regularity parameter called
“margin” defined as follows. There exists some $C > 0$ such that, for any $\varepsilon > 0$,
\[
\mathbb{P}\big\{ b^{(1)}_{-i,t} \in (x, x + \varepsilon) \big\} \le C \varepsilon^{\alpha}.
\]
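As a hedged illustration of the margin condition (an example added here, not taken from the source), suppose that the highest competing bid has a density $f$ on $[0, 1]$ bounded by some constant $C$. Then the condition holds with $\alpha = 1$:

```latex
\[
\mathbb{P}\big\{ b^{(1)}_{-i,t} \in (x, x+\varepsilon) \big\}
  = \int_{x}^{x+\varepsilon} f(b)\,\mathrm{d}b
  \;\le\; C\,\varepsilon ,
\]
```

so the proposition gives $R_T \le c_1 \log^2(T)$ in that case. At the other extreme, if the competition never bids in some interval $(x, x+\delta)$, then the probability above is zero for $\varepsilon \le \delta$ and at most $1 \le (\varepsilon/\delta)^{\alpha}$ otherwise, so the condition holds for every $\alpha > 1$ with $C = \delta^{-\alpha}$ and the regret is logarithmic.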

Proof. The proof is a bit technical and only sketched; the main ingredients are more or
less classical multi-armed bandit techniques. First of all, notice that, as a direct consequence
of Hoeffding’s inequality, with probability at least $1 - O(1/T)$, all bids are bigger than $x$. As
a consequence, we shall only focus on this event, where regret is only incurred on auctions
such that
\[
x < b^{(1)}_{-i,t} < \bar{x}_t + 2\sqrt{\frac{\log(T)}{\omega_{i,t}}}.
\]
Indeed, the optimal bid $b_{i,t} = x$ loses this auction while UCBid overbids and wins it. It
unfortunately pays more than its expected value. The net cost of this specific auction is
\[
b^{(1)}_{-i,t} - x \le \bar{x}_t + 2\sqrt{\frac{\log(T)}{\omega_{i,t}}} - x \le 4\sqrt{\frac{\log(T)}{\omega_{i,t}}},
\]
where the last inequality holds for all auctions on the event considered (which holds with
probability at least $1 - O(1/T)$). Summing over all the auctions $t = 1, \dots, T$ gives the first bound.

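The other bounds are derived from careful computations of
\begin{align*}
&\mathbb{E}\Big[ (b^{(1)}_{-i,t} - x)\, \mathbf{1}\Big\{ x \le b^{(1)}_{-i,t} \le \bar{x}_t + 2\sqrt{\tfrac{\log(T)}{\omega_{i,t}}} \Big\} \Big] \\
&\quad = \mathbb{E}\Big[ (b^{(1)}_{-i,t} - x)\, \mathbf{1}\Big\{ 0 \le b^{(1)}_{-i,t} - x \le \bar{x}_t - x + 2\sqrt{\tfrac{\log(T)}{\omega_{i,t}}} \Big\} \Big] \\
&\quad \le C\, \mathbb{E}\Big[ \Big( \bar{x}_t - x + 2\sqrt{\tfrac{\log(T)}{\omega_{i,t}}} \Big)_+^{1+\alpha} \Big],
\end{align*}
where the last inequality is a consequence of the margin definition and $x_+ = \max\{x, 0\}$.
An important fact is that, if some regret is incurred at auction $t \in \mathbb{N}$, then this auction is
necessarily won, hence the counter $\omega_{i,t}$ increases by one. As a consequence, the overall
regret can be controlled by
\[
\sum_{s=1}^{T} C\, \mathbb{E}\Big[ \Big( \bar{X}_s - x + 2\sqrt{\tfrac{\log(T)}{s}} \Big)_+^{1+\alpha} \Big],
\]
where $\bar{X}_s$ is the average value of the first $s$ auctions won. Hoeffding’s inequality implies that
$\mathbb{P}\{\bar{X}_s - x \ge \varepsilon\} \le e^{-2s\varepsilon^2}$, hence we get that the total regret is smaller than
\begin{align*}
&\sum_{s=1}^{T} C\, \mathbb{E}\Big[ \Big( \bar{X}_s - x + 2\sqrt{\tfrac{\log(T)}{s}} \Big)_+^{1+\alpha} \Big] \\
&\quad \le C(1+\alpha) \sum_{s=1}^{T} \int_{-2\sqrt{\log(T)/s}}^{\infty} \Big( \varepsilon + 2\sqrt{\tfrac{\log(T)}{s}} \Big)^{\alpha} e^{-2s\varepsilon^2}\, \mathrm{d}\varepsilon \\
&\quad = C(1+\alpha) \sum_{s=1}^{T} \frac{1}{(2s)^{\frac{1+\alpha}{2}}} \int_{-4\sqrt{\log(T)}}^{\infty} \Big( u + 4\sqrt{\log(T)} \Big)^{\alpha} e^{-\frac{u^2}{2}}\, \mathrm{d}u \\
&\quad \le C(1+\alpha) \sum_{s=1}^{T} \Big( \frac{16 \log(T)}{s} \Big)^{\frac{1+\alpha}{2}} + C\, \frac{1+\alpha}{2} \sum_{s=1}^{T} \frac{1}{s^{\frac{1+\alpha}{2}}} \int_{4\sqrt{\log(T)}}^{\infty} u^{\alpha} e^{-\frac{u^2}{2}}\, \mathrm{d}u .
\end{align*}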

The result follows from instantiating the above sum over different values of α. For α > 1,
the first sum is controlled using the fact that all terms are necessarily smaller than 1.

These results only hold if the realized values $v_t$ are i.i.d. with unknown expectation
$x$; they use the basic ideas of stochastic multi-armed bandits to devise a new bidding
algorithm based on UCB. If the values $v_t$ can be any sequence, then it is also possible to achieve
non-trivial regret bounds by using the EXP.3 algorithm as a building block instead of UCB
(Weed et al., 2016). Those results and techniques can also be extended to auction
formats other than Vickrey (Feng et al., 2018b).

4.1.2 Adaptivity to the mechanism and other bidders
If a bidder has, at first, not enough information on the auction mechanism and/or the
distributions of values of his competitors, he cannot compute an appropriate bidding
strategy even if he knows perfectly the values he gives to the item. The repeated auction setting
can then help him learn a strategy on the fly. A possible approach is again to
use ideas from multi-armed bandits, and more precisely from contextual bandits.
For simplicity, assume that the value distribution of the bidder has a finite support
included in [0, 1], denoted by $\{x_1^*, \dots, x_L^*\}$. Then a bidding strategy consists in finding, for
each possible value $x_\ell^*$, a corresponding bid. A classical contextual bandit technique
consists in discretizing the set of bids [0, 1] into $\{b_1^*, \dots, b_K^*\}$ and in running independent
versions of a base bandit algorithm (such as EXP.3 for instance), one for each possible
value, where the set of arms is the discrete set of bids. Such regret minimizing algorithms
have indeed been reported in the online advertising industry (Nekipelov et al., 2015).
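The construction just described (one bandit learner per possible value, with the bid grid as arms) can be sketched as follows. This is an illustrative sketch only: the class and function names are hypothetical, EXP.3 is written in its standard importance-weighted form with a fixed exploration rate, utilities are assumed to lie in $[-1, 1]$ and are rescaled to $[0, 1]$, and the demo opponent (a first price auction against a constant competing bid) is an assumption chosen for simplicity.

```python
import math
import random


class Exp3:
    """EXP3 over n_arms arms with rewards in [0, 1]."""

    def __init__(self, n_arms: int, gamma: float, rng: random.Random):
        self.n_arms, self.gamma, self.rng = n_arms, gamma, rng
        self.log_weights = [0.0] * n_arms

    def probabilities(self):
        top = max(self.log_weights)  # subtract the max for numerical stability
        w = [math.exp(lw - top) for lw in self.log_weights]
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.n_arms for wi in w]

    def draw(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm: int, prob: float, reward: float):
        # Importance-weighted reward estimate for the arm that was played.
        self.log_weights[arm] += self.gamma * (reward / prob) / self.n_arms


class PerValueExp3Bidder:
    """One independent EXP3 learner per possible value; arms are grid bids."""

    def __init__(self, values, n_bids: int, gamma: float = 0.1, seed: int = 0):
        rng = random.Random(seed)
        self.grid = [k / n_bids for k in range(n_bids + 1)]
        self.learners = {v: Exp3(len(self.grid), gamma, rng) for v in values}
        self._last = None

    def bid(self, value: float) -> float:
        arm, prob = self.learners[value].draw()
        self._last = (value, arm, prob)
        return self.grid[arm]

    def observe(self, utility: float):
        value, arm, prob = self._last
        # Assumption: utilities lie in [-1, 1]; rescale them to [0, 1] for EXP3.
        self.learners[value].update(arm, prob, (utility + 1.0) / 2.0)


def run_first_price_demo(rounds: int = 20000, competing_bid: float = 0.25, seed: int = 0):
    """Hypothetical demo: first price auction against a constant competing bid."""
    bidder = PerValueExp3Bidder(values=[0.5, 1.0], n_bids=10, seed=seed)
    for t in range(rounds):
        value = 0.5 if t % 2 == 0 else 1.0
        bid = bidder.bid(value)
        utility = value - bid if bid >= competing_bid else 0.0
        bidder.observe(utility)
    return bidder
```

After `run_first_price_demo()`, the learner attached to the value 1.0 should put little probability on grid bids below the competing bid, since those bids never win; how quickly it separates the winning bids from each other depends on the exploration rate and on the grid size.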
In this setting, we can even assume that the opponents change through time, so that
their sequence of bids might be arbitrary (but, conditionally on the past and on the actual
value $x_\ell^*$, their bids at some given auction $t \in \mathbb{N}$ are independent of the bid of player $i$).
The oracles to which algorithms are compared are stationary strategies,
i.e., fixed mappings from values to bids; denote their set by $\mathcal{B}_i$. Then the maximal expected
cumulative utility bidder $i$ can get in this class of strategies is
\[
\max_{\beta_i \in \mathcal{B}_i} \sum_{t=1}^{T} \mathbb{E}\big[ u_i((\beta_i(x_t), b_{-i,t}), x_t) \big],
\]
where $x_t$ is the value of bidder $i$ for item $t$, $b_{-i,t}$ is the vector of bids of his opponents and
$\beta_i$ is a possible strategy. This quantity serves as a benchmark for a learning to bid policy.
Notice that if the mechanism is DSIC, then the optimal mapping in the above equation
is $\beta(x) = x$, i.e., bidding truthfully. On the other hand, the realized cumulative utility of
bidder $i$ is equal to
\[
\sum_{t=1}^{T} \mathbb{E}\big[ u_i((b_{i,t}, b_{-i,t}), x_t) \big],
\]
where $b_{i,t}$ is his bid at stage $t$, after observing the value $x_t$. As a consequence, the overall
regret of bidder $i$ is the difference between those two terms, the benchmark and the
cumulative utility, and reads as
\[
R_T = \max_{\beta_i \in \mathcal{B}_i} \sum_{t=1}^{T} \mathbb{E}\big[ u_i((\beta_i(x_t), b_{-i,t}), x_t) \big] - \sum_{t=1}^{T} \mathbb{E}\big[ u_i((b_{i,t}, b_{-i,t}), x_t) \big].
\]

As mentioned above, if we assume that bidder $i$ is running independent versions of EXP.3
(one per possible value) then, using standard results on multi-armed bandits recalled in
Section 3.6.1, we get that
\[
R_T \le \sum_{\ell=1}^{L} \Big[ 2\sqrt{K \log(K)\, T_\ell} + \max_{b_\ell \in [0,1]} \sum_{t : x_t = x_\ell^*} \mathbb{E}\big[ u_i((b_\ell, b_{-i,t}), x_\ell^*) \big] - \max_{b_\ell^* \in \{b_1^*, \dots, b_K^*\}} \sum_{t : x_t = x_\ell^*} \mathbb{E}\big[ u_i((b_\ell^*, b_{-i,t}), x_\ell^*) \big] \Big] \tag{4.1}
\]
where $T_\ell = \#\{t \le T : x_t = x_\ell^*\}$ is the number of times the value was equal to $x_\ell^*$. The first
term of Equation 4.1 corresponds to the estimation error, and the second term to the
approximation error, because the EXP.3 learners are restricted to bid in the discretization of [0, 1] while
the optimal bidding strategy does not have this restriction.
We will need some regularity assumption on the mechanism used by the seller; we will
assume that it is “C-almost Lipschitz”, for some constant $C > 0$, in the following sense. For
any value $x_i \in [0, 1]$, for any bids $b_{-i}$ of the opponents and any bid $b_i$ of bidder $i$, there
exists a point $b_k^\varepsilon$ in the $\varepsilon$-regular grid of [0, 1] (i.e., $b_k^\varepsilon = k\varepsilon$ for some integer $k$) such that
$u_i((b_i, b_{-i}), x_i) \le u_i((b_k^\varepsilon, b_{-i}), x_i) + C\varepsilon$. Classical auction mechanisms (first and second price
auctions, with or without reserve prices, the Myerson auction, ...) all satisfy this assumption,
which ensures that the approximation error is of order $T\varepsilon$ if $\{b_1^*, \dots, b_K^*\}$ is the $\varepsilon$-regular grid (in
particular, this implies that $K = 1/\varepsilon$). We emphasize here that auction mechanisms are
usually not Lipschitz (as there are discontinuities around the smallest winning bid).

Proposition 4.2. If the mechanism is C-almost Lipschitz (with $C$ unknown) and the value
distribution has a finite support of size $L$, then there exists a learning to bid policy whose regret,
with respect to the optimal bidding strategy in hindsight, satisfies
\[
R_T \le (2 + C)\big( L T^2 \log(T) \big)^{\frac{1}{3}}.
\]

Proof. One just needs to plug the definition of C-almost Lipschitzness into Equation 4.1, as
this gives that the regret scales as
\[
R_T \le 2\sqrt{\frac{1}{\varepsilon} \log\Big(\frac{1}{\varepsilon}\Big) L T} + C T \varepsilon \le (2 + C)\big( L T^2 \log(T) \big)^{\frac{1}{3}}
\]
with the specific choice of $\varepsilon = (L \log(T)/T)^{1/3}$.
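For completeness, the balancing step behind this choice of $\varepsilon$ can be spelled out as follows. This is a sketch that relies on facts not stated explicitly in the proof: $\sum_\ell T_\ell = T$, the Cauchy–Schwarz inequality, $K = 1/\varepsilon$, and $\log(1/\varepsilon) \le \log(T)$ (which holds as soon as $1/T \le \varepsilon \le 1$).

```latex
\begin{align*}
\sum_{\ell=1}^{L} 2\sqrt{K\log(K)\,T_\ell}
  &\le 2\sqrt{K\log(K)\,L\sum_{\ell=1}^{L}T_\ell}
   = 2\sqrt{\tfrac{1}{\varepsilon}\log\!\big(\tfrac{1}{\varepsilon}\big)\,LT}, \\
\text{and, for } \varepsilon=\Big(\tfrac{L\log T}{T}\Big)^{1/3}: \qquad
\tfrac{1}{\varepsilon}\log\!\big(\tfrac{1}{\varepsilon}\big)\,LT
  &\le \Big(\tfrac{T}{L\log T}\Big)^{1/3} LT\log T
   = \big(LT^{2}\log T\big)^{2/3}, \\
CT\varepsilon &= C\,\big(LT^{2}\log T\big)^{1/3}.
\end{align*}
```

Taking the square root of the second line and adding the third gives the stated bound $(2 + C)(LT^2\log T)^{1/3}$: both error terms are of the same order for this $\varepsilon$, which is why a finer or coarser grid would make one of them dominate.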

This result only holds for distributions with finite support; for continuous distributions,
the trick consists in bucketing the support of the value distribution into small bins of size $\varepsilon$
and using, as before, an independent version of EXP.3 per bin. This only works under some
regularity assumption on each bin. Specifically, given a small bin $I$, we shall assume that there
exists a constant bid $b$ that is $\varepsilon$-optimal on the set of stages where the value $x_s$ belongs to $I$,
i.e.,
\[
\max_{\beta : I \to [0,1]} \frac{\sum_{s \le T : x_s \in I} u_i((\beta(x_s), b_{-i,s}), x_s)}{\#\{s \le T : x_s \in I\}}
\le \max_{b \in [0,1]} \frac{\sum_{s \le T : x_s \in I} u_i((b, b_{-i,s}), x_s)}{\#\{s \le T : x_s \in I\}} + C' \varepsilon.
\]
In particular, this assumption is satisfied if $u_i$ is $C'$-Lipschitz and the bids $b_{-i,t}$ do
not depend (too much, at most in a Lipschitz fashion) on $x_t$. Once again, balancing the
approximation errors (both in the bid and the value spaces) and the estimation error gives the
optimal choice of $\varepsilon = (\log(T)/T)^{1/4}$ for a regret scaling as
\[
R_T \le (2 + C + C')\big( T^3 \log(T) \big)^{\frac{1}{4}}.
\]

If the utility function $u_i$ (and/or the opponents’ bids $b_{-i,t}$) is not Lipschitz but less regular
(such as $\beta$-Hölder, which means that $|u_i(x) - u_i(y)| \le L_\beta \|x - y\|^\beta$ for some constant $L_\beta$),
then the rate of regret growth is affected, as one should find a different tradeoff
between approximation and estimation errors. This would typically lead to a regret scaling as
$(T \log(T))^b$ where $b \in [\frac{1}{2}, 1)$ is some parameter depending on the regularities of
the mappings at hand.

4.1.3 Adaptivity to budget constraints and pacing options in practice

In practice, a common problem for bidders is to find strategies maximizing some cu-
mulative number of “events” (e.g., clicks, views) or some utility, subject to some budget
constraint (Balseiro et al., 2015; Gummadi et al., 2012; Fernandez-Tapia, 2015; Fernandez-
Tapia et al., 2016).
As an example, let us describe bidder $i$’s problem in a second price auction for utility
maximization. Given his budget $B$, the maximum bid $b^{(1)}_{-i,t}$ of the competition at time $t$
(possibly including reserve prices), his value $x_t$ for the item and his bid $b_t$ at time $t$, the objective
is to solve
\[
\max_{\{b_t\}_{t=1}^{T}} \sum_{t=1}^{T} \mathbf{1}\{b_t > b^{(1)}_{-i,t}\}\,(x_t - b^{(1)}_{-i,t}),
\qquad \text{subject to} \quad \sum_{t=1}^{T} b^{(1)}_{-i,t}\, \mathbf{1}\{b_t > b^{(1)}_{-i,t}\} \le B.
\]

Under the assumption that bids are much smaller than the total budget, the problem loses
much of its stochastic component and is often approximated by its so-called fluid
approximation (which replaces both objective and constraint by their expectations, effectively
appealing to uniform laws of large numbers (Dudley, 2014)), turning the problem into
\[
\max_{\{b_t\}_{t=1}^{T}} \sum_{t=1}^{T} \mathbb{E}\big[ \mathbf{1}\{b_t > b^{(1)}_{-i,t}\}\,(x_t - b^{(1)}_{-i,t}) \big],
\qquad \text{subject to} \quad \sum_{t=1}^{T} \mathbb{E}\big[ \mathbf{1}\{b_t > b^{(1)}_{-i,t}\}\, b^{(1)}_{-i,t} \big] \le B.
\]

Under mild assumptions, and as a consequence of strong duality, an
optimal bidding strategy is
\[
b_t = \frac{x_t}{1 + \mu^*},
\]
where $\mu^*$ is the optimal solution of the dual problem associated with the constrained
optimization mentioned above (Balseiro et al., 2015, Proposition 3.1). The bidding strategy
is similar to the optimal bidding strategy in a second price auction; however, the value of
each item is discounted by a factor accounting for the constraint.
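To make the role of the multiplier concrete, the following sketch computes it on a sample of auctions by bisection. It rests on assumptions that are not in the source: expectations are replaced by empirical averages over the sample, the function names are hypothetical, and the multiplier is characterised as the smallest $\mu \ge 0$ for which the average spend of the bids $x_t/(1+\mu)$ fits within the per-auction budget $B/T$ (the average spend is non-increasing in $\mu$, which is what makes bisection applicable).

```python
def expected_spend(mu, values, competing_bids):
    """Average second-price payment per auction when bidding x / (1 + mu)."""
    total = sum(d for x, d in zip(values, competing_bids) if x > (1.0 + mu) * d)
    return total / len(values)


def fluid_multiplier(values, competing_bids, budget, tol=1e-6):
    """Smallest mu >= 0 (up to tol) whose average spend fits the per-auction budget."""
    target = budget / len(values)
    if expected_spend(0.0, values, competing_bids) <= target:
        return 0.0  # the budget is not binding: bidding truthfully is affordable
    low, high = 0.0, 1.0
    while expected_spend(high, values, competing_bids) > target:
        low, high = high, 2.0 * high  # spend is non-increasing in mu
    while high - low > tol:
        mid = 0.5 * (low + high)
        if expected_spend(mid, values, competing_bids) > target:
            low = mid
        else:
            high = mid
    return high
```

For instance, with values and competing bids drawn independently and uniformly on [0, 1], truthful bidding spends about 1/6 per auction on average, so any per-auction budget below that level yields a strictly positive multiplier and therefore shaded bids.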
This result can be extended to a much broader set of auctions, such as first price or
generalized second price auctions, where optimal bidding turns out to be of a similar form to
optimal bidding without constraint, the value of each item being linearly discounted by a
constant accounting for the budget constraint (Gummadi et al., 2012).
Three important conceptual ideas emerge from this type of problem. The first one
concerns the question of pacing. For a wide variety of auctions (and payment rules), the
optimal strategy to maximize the purchased inventory amounts to spending one’s budget
smoothly, i.e., at the rate of arrival of auction requests (Fernandez-Tapia, 2015; Aström and
Murray, 2008). However, a crucial assumption for this result to hold is that the price paid
as a function of the win rate does not depend on time, and hence that the environment is stationary.
Not surprisingly, when this assumption does not hold, “smooth spending” at the rate of
arrival of auction requests is no longer optimal. The same general techniques can however be
used in that non-stationary case to understand the optimal rates of winning auctions and of
spending.
To be implemented in practice, these ideas require of course a forecast for both the
arrival rate of auction requests and the price paid at a certain win rate. When the bidder
has access to such information, ideas of model-predictive control and re-optimization can
be used (Ciocan and Farias, 2012).
A third important line of work concerns online estimation of the parameter $\mu^*$ mentioned
above, possibly without forecast (Balseiro and Gur, 2019). The essential idea is to use the
fact that $\mu^*$ is the solution of an optimization problem and to solve this
optimization problem online, using online gradient descent (Shalev-Shwartz and Ben-
David, 2014). To be slightly more specific, in the problem mentioned above, the optimal
solution for $\mu$ in hindsight is determined through
\[
\inf_{\mu \ge 0} \sum_{t=1}^{T} \big( x_t - (1 + \mu)\, b^{(1)}_{-i,t} \big)_+ + \mu B \;=:\; \inf_{\mu \ge 0} \sum_{t=1}^{T} \ell_t(\mu).
\]

The functions $\ell_t$ are observable at time $t$ and form a sequence of functions arriving in a
streaming fashion. As a consequence, the estimate of the parameter $\mu^*$ can be updated
in an online fashion, without a forecast, by using for instance the online (sub)gradient
descent rule
\[
\mu_{t+1} = \mu_t - \gamma\, \partial_\mu \ell_t(\mu_t),
\]
where $\partial$ is the (sub)gradient operator and $\gamma$ is the stepsize in the online gradient descent
algorithm. Various theoretical guarantees about this scheme, under a variety of optimistic
and pessimistic assumptions about the amount of information that is known about the
environment in which the bidder evolves, can be proved (Balseiro and Gur, 2019).
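A minimal sketch of this update rule is given below. Several choices are assumptions of the sketch rather than statements from the source: the budget is split evenly across rounds, so that $\ell_t(\mu) = (x_t - (1+\mu) b^{(1)}_{-i,t})_+ + \mu B / T$; the iterate is projected back onto $\mu \ge 0$ after each step; the bidder stops winning once a payment would exceed the remaining budget; and the function name and signature are hypothetical.

```python
def adaptive_pacing(values, competing_bids, budget, step_size, mu0=0.0):
    """Bid x_t / (1 + mu_t) and update mu_t by projected online subgradient descent.

    Per-round loss (assumption of this sketch, budget split evenly over rounds):
        l_t(mu) = max(x_t - (1 + mu) * d_t, 0) + mu * budget / T
    Returns (total_utility, total_spend, final_mu).
    """
    horizon = len(values)
    target = budget / horizon
    mu, spent, utility = mu0, 0.0, 0.0
    for x, d in zip(values, competing_bids):
        bid = x / (1.0 + mu)
        # Win only if the bid beats the competition and the payment is affordable.
        win = bid > d and spent + d <= budget
        if win:
            spent += d          # second price payment
            utility += x - d
        # Subgradient of l_t at mu_t: target - d_t * 1{x_t > (1 + mu_t) * d_t}.
        would_win = x > (1.0 + mu) * d
        subgradient = target - (d if would_win else 0.0)
        mu = max(0.0, mu - step_size * subgradient)  # projection onto mu >= 0
    return utility, spent, mu
```

The update raises $\mu_t$ (and thus lowers bids) after rounds where the payment exceeds the per-round budget, and lowers it otherwise, so that spending tracks the target rate without any forecast of future auctions.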
The literature on this topic is very large, with many different variations (Ghosh et al.,
2009; Choi and Mela, 2018; Lee et al., 2013; Yuan et al., 2013; Xu et al., 2015). The question
of handling the situation where the bid to budget ratio is not close to 0, and hence the
fluid approximation is not well justified, is quite open and appears to be a genuine
stochastic control problem. An interesting approach seems to be to use ideas coming out of the
analysis of the online knapsack and related problems (Arlotto and Gurvich, 2019).

4.2 Mechanism design in front of adaptive bidders & Full surplus extraction
As illustrated in the previous section, the standard techniques of bandits and online
learning can be used by bidders to choose their bidding strategy and potentially improve
it over time (e.g., by improving the estimation of their own valuation). This adaptivity to
an initial lack of knowledge comes at a cost for the bidder, the regret that accumulates over
time. In a less straightforward manner, it also impacts the seller, by making mechanism
design more complex. Indeed, even a basic notion such as incentive compatibility is
changed. For instance, in the context of learning to bid in a DSIC auction, the bidder
no longer chooses his bid as a best response that optimizes his utility: due to
the need to find a compromise between exploration and exploitation, UCBid generally
overbids. Even though the auction is DSIC, a bidder using UCBid is no longer truthful. The
following subsections describe how notions such as incentive compatibility and revenue
maximization can be modified when facing adaptive bidders.

4.2.1 Learning against bidders using zero-regret algorithms

As illustrated in the previous section, the standard techniques of bandits and online
learning can be used by bidders to choose their bidding strategy... assuming that the seller
lets them do it. Indeed, it was more or less implicitly assumed that the mechanism was fixed
during the whole interaction, another assumption that is far too restrictive and should be
relaxed, as the seller could actually largely exploit this almost predictable behavior
in her own interest.
Let us consider the simple case where all bidders are independently running regret
minimizing algorithms (Braverman et al., 2018; Deng et al., 2019), and more precisely
where they are using algorithms such as EXP.3, which base their decisions on the mean
rewards observed. Their particularity is that they rarely pick an arm whose current mean is
significantly worse than the current highest mean; for instance, if during the first $t$ stages
an arm $k$ has generated an average reward (denoted by $\bar{X}_{k,t}$) that is smaller than that
of arm $\ell$, then the probability of choosing $k$ rather than $\ell$ is exponentially small. More precisely,
the difference of the log-probabilities scales linearly with $\bar{X}_{k,t} - \bar{X}_{\ell,t}$. We will consider in
the following a general class of algorithms that exhibit similar behavior, called
$\eta$-mean-based, which is more general than just EXP.3.
Definition 4.3. In the standard multi-armed bandit problem, an algorithm is $\eta$-mean-based, for
some $\eta \in (0, 1)$, if at any stage $t \in \mathbb{N}$ and for any pair of arms $(k, \ell)$, if $\bar{X}_{k,t} < \bar{X}_{\ell,t} - \eta$ then
the probability $p_{k,t+1}$ that the algorithm pulls arm $k$ at time $t + 1$ is smaller than $\eta$. An
algorithm is (asymptotically) mean-based if $\eta = o_{T \to \infty}(1)$.
In particular, EXP.3 is asymptotically mean-based, and so is $\varepsilon$-greedy (for $\eta = \varepsilon$).
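The definition can be checked mechanically at a given stage. The sketch below, whose function names are hypothetical, computes the pull probabilities of ε-greedy from the current empirical means and tests the condition of Definition 4.3 for a given η; it illustrates why ε-greedy satisfies the definition with η = ε, since with at least two arms any arm that is not the empirical leader is pulled with probability ε divided by the number of arms.

```python
def epsilon_greedy_probabilities(means, epsilon):
    """Pull probabilities of epsilon-greedy given the current empirical means."""
    n_arms = len(means)
    best = max(range(n_arms), key=lambda k: means[k])
    probs = [epsilon / n_arms] * n_arms  # uniform exploration
    probs[best] += 1.0 - epsilon         # exploitation of the empirical leader
    return probs


def is_mean_based_step(means, probs, eta):
    """Check Definition 4.3 at one stage.

    Any arm whose mean trails some other arm's mean by more than eta
    must be pulled with probability smaller than eta.
    """
    for k, mean_k in enumerate(means):
        trailing = any(mean_k < mean_l - eta for mean_l in means)
        if trailing and probs[k] >= eta:
            return False
    return True
```

By contrast, a policy that ignores the means, such as uniform sampling over three arms, fails the check for η = 0.1 as soon as one arm trails another by more than 0.1.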
To simplify the following statements, we are going to assume that there is only one
bidder, whose value distribution $F$ is known beforehand (an assumption that can be
substantially weakened (Deng et al., 2019)). If the seller were using the same mechanism at each auction
$t \in \{1, \dots, T\}$, then the optimal one would obviously be to post the monopoly price. This
generates a total revenue of $T$ times the monopoly revenue (because the bidder will quickly
learn the optimal strategy). On the other hand, if the seller knows that the bidder is using
a mean-based algorithm, she can generate a much higher revenue (Braverman et al., 2018),
almost as high as the total welfare denoted by $W(F) = \mathbb{E}_{x \sim F}[x]$; with $n$ bidders, the total
welfare would be $W(F) = \mathbb{E}_{x \sim F}[\max_i x_i]$.
To achieve this, the mechanism must change through time and possibly be itself
adaptive to the sequence of bids of the buyer. We therefore introduce the concept of
dynamic mechanism, so that the actual auction rules might change from stage to stage.
Recall that at each stage t ∈ {1, . . . , T }, the bidders' valuations for the item are sampled
from distributions F_i. These distributions are fixed from one auction to the other. We
denote by a_t ∈ A the auction mechanism chosen at time t by the seller and by b_{i,t} the bid of
buyer i. We also denote by H_t = {a_1, b_{1,1}, . . . , b_{n,1}, . . . , a_{t−1}, b_{1,t−1}, . . . , b_{n,t−1}} the finite history at
stage t, which consists of past auctions and past buyers' bids.
Definition 4.4. A dynamic mechanism DM : ∪_t H_t → A is a mapping that associates to
any finite history H_t an auction a_t =: DM(H_t). A bidder's dynamic strategy, S, is a mapping
from H_t to the set B of strategies (i.e., functions from values to bids).
The following theorem states that a seller can extract the full surplus of the system, if
bidders are using naïve learning algorithms.
Theorem 4.5. If the bidder is running a mean-based algorithm, for any ε > 0, there exists
a dynamic selling mechanism such that the seller can get (1 − ε)W(F)T − o(T ).

The intuition behind this result is that the seller can lure the naïve algorithms such
as EXP.3 by setting low prices during a first (large) period of time and then by increasing
drastically the reserve price during a second stage. This is illustrated in the following
example with n = 1 bidder (Braverman et al., 2018).
Assume the bidder value x_t has the following distribution:
\[
x_t = \begin{cases}
1/4 & \text{with probability } 1/2, \\
1/2 & \text{with probability } 1/4, \\
1 & \text{with probability } 1/4.
\end{cases}
\]

Simple computations show that 1/4 is a monopoly price (the prices 1/2 and 1 yield the same
expected revenue) and that posting it at each stage generates a revenue of T/4 after T auctions.
To fool a mean-based algorithm, the seller can use the following scheme. At any stage,
it will only allocate the item if the bid is exactly 1 (any other bid gives a utility of 0). It
remains to define the payment associated to a winning bid of 1. During the first T /2 stages,
it is equal to 0, while it will be equal to 1 during the last T /2 stages.
Recall that the bidder runs multiple independent instances of EXP.3, one for each
possible value (1/4, 1/2 and 1), so we can focus independently on each set of stages where
the value is constant. For the sake of simplicity, we are going to assume that the value
equals 1/2 (resp. 1) exactly T/8 times during each half of the game, and hence equals 1/4
exactly T/4 times during each half.

• On the set of stages where the value is 1, EXP.3 quickly learns that bidding 1 is
optimal during the first half of the game. This generates a cumulative utility of T/8
for the buyer. As a consequence, EXP.3 will keep bidding 1 when the value is 1 at each
stage of the second half with exponentially high probability. The revenue generated
on those stages by the seller is then approximately T/8.

• When the value is 1/2, bidding 1 during the first half generates a cumulative utility of T/16
for the buyer. During the second half, bidding 1 generates a negative utility of −1/2
per stage, so that the cumulative utility of bidding 1 decreases, but remains non-negative
during the whole process (and EXP.3 will keep bidding 1 with arbitrarily high
probability for almost all stages). The revenue generated on those stages by the seller
is then also approximately T/8.

• When the value is 1/4, bidding 1 during the first half generates a cumulative utility of T/16
for the buyer. During the second half, bidding 1 generates a negative utility of −3/4
per stage, so that the cumulative utility of bidding 1 decreases, but remains positive
during T/12 additional stages where EXP.3 will bid 1 with high probability (and
afterwards stops bidding 1 as the cumulative utility of this bid is negative). The
revenue generated on those stages by the seller is then approximately T/12.

At the end, the total revenue of the seller is therefore of the order of T/8 + T/8 + T/12 =
T/3, which is much bigger than T/4. The trick for the seller was to make the bidder
overpay on many auctions by exploiting the behavior of mean-based algorithms, which keep
bidding 1 even when instantaneous negative utilities occur. Unfortunately for the seller,
Theorem 4.5 only holds for mean-based buying algorithms. Even worse, for any dynamic
selling mechanism, there exists a buyer's strategy such that he does not pay more than T
times the monopoly revenue (Braverman et al., 2018).
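The arithmetic of this example can be checked with exact fractions. The sketch below is only a bookkeeping aid under the idealized assumptions of the example (each value occurs exactly its expected number of times in each half, and the buyer bids 1 as long as the cumulative utility of that bid is non-negative); the helper names are illustrative.

```python
from fractions import Fraction as Fr

# Value distribution of the example: value -> probability.
dist = {Fr(1, 4): Fr(1, 2), Fr(1, 2): Fr(1, 4), Fr(1): Fr(1, 4)}

def posted_price_revenue(price):
    """Expected per-round revenue of a fixed posted price."""
    return price * sum(prob for value, prob in dist.items() if value >= price)

monopoly_revenue = max(posted_price_revenue(p) for p in dist)
welfare = sum(value * prob for value, prob in dist.items())

def lure_revenue(value, prob):
    """Seller revenue, per unit of T, from the rounds with a given value.

    First half: the winning bid 1 pays 0, so the buyer earns `value` per round.
    Second half: the winning bid 1 pays 1, so the utility is value - 1 per
    round; the buyer keeps bidding 1 while the cumulative utility is >= 0.
    """
    rounds_per_half = prob / 2           # fraction of T in each half
    saved = value * rounds_per_half      # cumulative utility after the first half
    loss = 1 - value                     # per-round loss in the second half
    if loss == 0:
        paid_rounds = rounds_per_half
    else:
        paid_rounds = min(rounds_per_half, saved / loss)
    return paid_rounds * 1               # price 1 for each paying round

total = sum(lure_revenue(v, p) for v, p in dist.items())
print(monopoly_revenue, total, welfare)  # per-round values: 1/4 1/3 1/2
```

The scheme therefore sits strictly between the monopoly revenue T/4 and the welfare T/2; Theorem 4.5 states that more elaborate mechanisms can get arbitrarily close to the latter.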

4.2.2 Trading off ex-post individual rationality for full surplus extraction

If the buyers are using naïve algorithms, and the seller knows this, then we have seen that
she can extract (almost) the full surplus from the system. This was possible because of
the asymmetry of information between agents. There are other settings where this full
surplus extraction by the seller is possible. The first example we consider is the case where
the individual-rationality assumption of the mechanism is relaxed. This will induce
another strong asymmetry between agents, as bidders are somehow “forced” to participate
in auctions. The idea is to consider the weaker concept of ex-ante, instead of interim,
individual-rationality (see Section 2.1.1 for more details on the differences). In the ex-ante
setting, the bidder does not know the value he will give to the item before he agrees to
take part in the auction; he therefore has the same information as the seller on his private
valuation. An ex-ante individually rational mechanism must give a non-negative expected
utility to the bidder.

Theorem 4.6 (Cremer and McLean, 1988). There exists an ex-ante individually rational
and incentive-compatible auction where the bidders’ utilities are all equal to zero and the
seller extracts the full surplus.

Proof. Since the seller knows the bidders' value distributions, she can compute their
expected utilities in a second price auction.
The mechanism constructed is simple. It consists of an entry fee that must be paid
before participating in the auction (stated otherwise, the bidder must pay this amount no
matter the outcome of the auction); afterwards, a standard second price auction without
reserve price is run. Choosing for the entry fee the expected utility in the second price
auction gives an expected utility of zero to each buyer and the seller extracts the full
surplus of the game. This mechanism is of course ex-ante incentive-compatible.
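As an illustration of this construction, the sketch below estimates the entry fees by Monte Carlo and checks that fees plus second-price revenue recover the welfare. The i.i.d. Uniform[0, 1] values, the number of bidders and the function names are assumptions made for the example only; in the theorem the seller computes the fees from the known distributions rather than by simulation.

```python
import random

def second_price_outcome(values):
    """Winner index and price in a second-price auction without reserve."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    winner, runner_up = order[0], order[1]
    return winner, values[runner_up]

def simulate(n_bidders, n_rounds, seed=0):
    """Monte Carlo estimates with i.i.d. Uniform[0, 1] values (an assumption)."""
    rng = random.Random(seed)
    utility = [0.0] * n_bidders
    revenue = welfare = 0.0
    for _ in range(n_rounds):
        values = [rng.random() for _ in range(n_bidders)]
        winner, price = second_price_outcome(values)
        utility[winner] += values[winner] - price
        revenue += price
        welfare += values[winner]
    fees = [u / n_rounds for u in utility]  # entry fee = estimated expected utility
    return fees, revenue / n_rounds, welfare / n_rounds

fees, auction_revenue, welfare = simulate(n_bidders=3, n_rounds=200_000)
seller_total = auction_revenue + sum(fees)
print(fees)                   # each close to 1 / (n (n + 1)) = 1/12 for n = 3
print(seller_total, welfare)  # fees plus auction revenue recover the welfare
```

Each bidder's expected utility net of the fee is zero, while a bidder who loses a given auction ends up with a negative realized utility, which is the lack of interim and ex-post individual rationality discussed next.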

This mechanism is neither interim nor ex-post individually rational since, for all valuation
vectors, the utility of all losing bidders is negative. In order to mitigate this issue,
Balseiro et al. (2018) and Mirrokni et al. (2016) refined this mechanism to ensure that
bidders' utilities are not too negative at any point in the game; the trick is to spread the
fee over the different time steps instead of charging it at the beginning of the game. They
also generalize the original setting to more complex dynamic auctions.
Considering the ex-ante setting makes sense only in auctions where the buyers do
not know their own valuation (but only its distribution) before participating. It is quite
unrealistic in many single-item auctions, but it could make some sense when T successive
auctions are run, as in online ad markets: indeed, the fee must be paid before taking part
in any of the T auctions. While bidders can compute the distribution of their future
values, they do not know in advance the exact future realizations. This
unfortunately requires the bidders to also know perfectly the number T of future auctions.
This assumption has been weakened by Agrawal et al. (2018), who adapted the above
mechanisms to bidders that do not believe that there will be T auctions. In this case, they
are quite likely to refuse to pay a fee computed on T auctions early in the game.
A crucial assumption of this line of work is that the bidders' value distributions
are known to the seller beforehand; this enables her to compute precisely the extra fees
that can be charged to the bidders without breaking the ex-ante individual rationality
assumption. Similarly, it implicitly assumes that bidders are able to compute best responses
to dynamic mechanisms (and that they implement them); this assumption is weakened in
the following section.

4.2.3 Learning against (almost) myopic buyers

A first attempt to remove the prior knowledge of the bidders' value distributions, and instead
to learn them (Amin et al., 2014; Mohri and Medina, 2015; Golrezaei et al., 2021), is
to consider mechanisms that are incentive compatible (up to a small number of bids)
under the assumption that bidders are almost myopic or impatient, i.e., they have a fixed
discount on future utilities. This again introduces an asymmetry between the bidders, with
a discounted long-term utility, and the seller, with an undiscounted long-term revenue
(infinitely patient).
To simplify the exposition, we will focus on the posted price case. Formally, let us denote
by p_t the price of the item at time t chosen by the mechanism and by d_t the decision of the
buyer to buy (d_t = 1) or to refuse the item (d_t = 0). Since the distribution of values is not
known beforehand, p_{t+1} can only depend on the finite history H'_t = {p_1, d_1, . . . , p_t, d_t}. The
discounted bidder utility is ∑_{t=1}^{T} γ_t d_t (x_t − p_t), where γ = (γ_t)_t is a sequence of non-negative
weights. In this section, we shall assume for simplicity that values are uniformly bounded
by 1.
The objective of the seller is to choose a dynamic selling mechanism DM that will
maximize her revenue against buyers that know DM and respond optimally in
the long run, i.e., so as to maximize their expected discounted utilities. Let
us denote by d*_t(DM) the optimal strategy of the bidder and by p_t(DM) the price posted.

The performance of a dynamic mechanism will be measured in terms of “regret”, whose
definition is slightly different than in the previous section.

Definition 4.7. Given a dynamic mechanism DM, the discount sequence γ and the value
distribution F, the regret of the seller is
\[
R_T(DM, \gamma, F) = \mathbb{E}\Bigl[\sum_{t=1}^{T} d_t^*(DM^*)\, p_t(DM^*) - \sum_{t=1}^{T} d_t^*(DM)\, p_t(DM)\Bigr].
\]

In this setting, DM∗ consists in posting the monopoly price corresponding to the value
distribution F at each round. We emphasize here that the dependencies in γ and F in the
regret definition are hidden in the best responses dt∗ (·).

Theorem 4.8 (Amin et al., 2013). Let γ_t be any positive non-increasing sequence and DM
be any dynamic selling mechanism. Then, there exists a buyer value distribution F such
that the regret satisfies R_T(DM, γ, F) ≥ (1/12) ∑_{t=1}^{T} γ_t. In particular, sublinear regret is impossible to
achieve if ∑_{t=1}^{T} γ_t = Θ(T).

This theorem states that if the buyer is patient enough, the seller cannot learn the
monopoly price quickly enough to reach a sublinear regret. However, when the sequence
γ_t decreases geometrically, i.e., γ_t = γ^t for some γ ∈ (0, 1), sublinear regret is possible as
∑_t γ_t = o(T). In words, this means that if the buyer is much more impatient than the seller,
the latter can extract surplus; moreover, this can be achieved with a simple two-phase
dynamic mechanism (Amin et al., 2014).

1. Phase 1 (of length: αT ) : offer a random price, uniformly in [0, 1]

2. Phase 2 (of length: (1−α)T ) : compute the optimal price using some robust estimation
procedure and post it until the end.
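The two phases above can be sketched as follows. This is a minimal illustration, not the procedure of Amin et al. (2014): it assumes a truthful buyer (who buys if and only if his value is at least the price), and the binned estimator of the revenue curve is a simple stand-in for the robust estimation step. The function name, `n_bins` and the Uniform[0, 1] values in the usage are assumptions of the example.

```python
import random

def two_phase_posted_price(values, alpha, n_bins=10, seed=0):
    """Explore with uniform random prices, then commit to an estimated price.

    Assumes a truthful buyer; the binned estimate below replaces the robust
    estimation procedure of the original mechanism.
    """
    rng = random.Random(seed)
    horizon = len(values)
    explore = int(alpha * horizon)
    offers = [0] * n_bins  # number of prices offered in each bin
    sales = [0] * n_bins   # number of accepted offers in each bin
    revenue = 0.0
    for x in values[:explore]:
        price = rng.random()
        b = min(int(price * n_bins), n_bins - 1)
        offers[b] += 1
        if x >= price:
            sales[b] += 1
            revenue += price

    def estimated_revenue(b):
        # Midpoint price of the bin times its empirical acceptance rate.
        rate = sales[b] / offers[b] if offers[b] else 0.0
        return (b + 0.5) / n_bins * rate

    best_bin = max(range(n_bins), key=estimated_revenue)
    committed_price = (best_bin + 0.5) / n_bins
    revenue += sum(committed_price for x in values[explore:] if x >= committed_price)
    return committed_price, revenue

rng = random.Random(1)
T = 100_000
values = [rng.random() for _ in range(T)]  # Uniform[0, 1]: monopoly price 1/2
price, revenue = two_phase_posted_price(values, alpha=T ** (-1 / 3))
print(price, revenue / T)
```

Against a strategic buyer, some of the observed decisions in the first phase are lies, which is why the actual mechanism needs an estimator that tolerates a bounded number of corrupted observations.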

Theorem 4.9 (Amin et al., 2014). With the choice of α = T^{−1/3}, the regret can be bounded
as O(T^{2/3} \sqrt{\log(T)/\log(1/γ)}) if the discount sequence satisfies γ_t = γ^t, for some γ ∈ (0, 1).

The formal proof of this statement is a bit long and technical, but the main ingredients
are quite easy to understand. First, we are going to assume that the horizon T ∈ N is known
beforehand, otherwise one could just use the doubling trick.
The key idea is to bound the number of times the buyer can lie, i.e., not be truthful,
either by buying the item at a price higher than his value or by refusing a lower price.
Notice that the net cost of a lie at stage t ∈ N, if the stage valuation is x_t and the price
posted is p_t, is exactly equal to |x_t − p_t|. Since the prices p_t are i.i.d. and uniformly drawn on
[0, 1], the potential costs |x_t − p_t| are smaller than ε only 2εαT times during the first
phase (at least in expectation, but we are going to neglect the deviations in this sketch of
proof). As a consequence, if the buyer lies L times during this phase, then at least L/2 of
those lies must have a cost of at least L/(4αT). Recall that the buyer puts weight γ_t = γ^t on
the t-th stage, so that the cumulative discounted cost of those L lies is at least
\[
\sum_{t=\alpha T - L/2}^{\alpha T} \frac{L}{4\alpha T}\,\gamma^{t}
= \frac{L}{4\alpha T}\,\frac{\gamma^{\alpha T+1}}{1-\gamma}\,\bigl(\gamma^{-L/2-1} - 1\bigr)
\ge \frac{L}{8\alpha T}\Bigl(\frac{1}{\gamma}\Bigr)^{L/2+1}\frac{\gamma^{\alpha T+1}}{1-\gamma}.
\]

It remains to control the total gain of those L lies. At best, they will induce a posted price
of p_t = 0 during the second phase, and a per-stage gain of at most 1 for the buyer. As a
consequence, the total cumulative gain of the buyer is at most
\[
\sum_{t=\alpha T+1}^{T} \gamma^{t}
= \frac{\gamma^{\alpha T+1}}{1-\gamma}\,\bigl(1 - \gamma^{T(1-\alpha)}\bigr)
\le \frac{\gamma^{\alpha T+1}}{1-\gamma}.
\]

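All things put together, the cumulative discounted net gain of lying L times is upper-bounded by
\[
\frac{\gamma^{\alpha T+1}}{1-\gamma} - \frac{L}{8\alpha T}\Bigl(\frac{1}{\gamma}\Bigr)^{L/2+1}\frac{\gamma^{\alpha T+1}}{1-\gamma}
= \frac{\gamma^{\alpha T+1}}{1-\gamma}\Bigl(1 - \frac{L}{8\alpha T}\Bigl(\frac{1}{\gamma}\Bigr)^{L/2+1}\Bigr).
\]
A direct consequence of the above inequality is that the number of lies L, for them to be
profitable, must satisfy
\[
\frac{L}{8\alpha T}\Bigl(\frac{1}{\gamma}\Bigr)^{L/2+1} \le 1
\;\Longrightarrow\;
L \le 2\,\frac{\log(8\alpha T)}{\log(1/\gamma)}.
\]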
As a consequence, this gives a simple upper-bound on the number of lies that can be seen
as “outliers” from the point of view of the seller, when trying to estimate the optimal
price.
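To get a feeling for the size of this bound, the short sketch below evaluates 2 log(8αT)/log(1/γ) for a few discount factors and compares it with the length αT of the first phase. The numerical values of T and γ are arbitrary choices for illustration.

```python
import math

def max_profitable_lies(horizon, alpha, gamma):
    """Upper bound 2 * log(8 * alpha * T) / log(1 / gamma) on the number of lies."""
    return 2 * math.log(8 * alpha * horizon) / math.log(1 / gamma)

T = 1_000_000
alpha = T ** (-1 / 3)
for gamma in (0.5, 0.9, 0.99):
    bound = max_profitable_lies(T, alpha, gamma)
    # Compare with the length alpha * T of the exploration phase.
    print(gamma, round(bound, 1), round(alpha * T))
```

The bound grows as γ approaches 1, in line with Theorem 4.8: a patient buyer can afford many lies, and the outliers then overwhelm the estimation.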
From the point of view of the seller, the regret can be decomposed into the cost of the
first phase, bounded by its length αT, and the cost of the second phase, bounded by Tη,
where η is the error on the optimal price computed during the first phase. The remaining
question consists in bounding this error; standard robust estimation techniques (such as
median of means¹ (Lecué and Lerasle, 2020)) or gradient descent with outliers indicate
that η is of the order of \sqrt{L/(αT)}. Adding both terms and considering the previous bound on
L gives a regret scaling as, up to a multiplicative constant,
\[
\alpha T + \sqrt{\frac{\log(T)}{\log(1/\gamma)}}\,\sqrt{\frac{T}{\alpha}}
\;\lesssim\; T^{2/3}\sqrt{\frac{\log(T)}{\log(1/\gamma)}}
\]
with the choice of α = T^{−1/3}. The simple idea behind this algorithm was then refined
(Mohri and Medina, 2015) and extended to the case of K bidders (Golrezaei et al., 2021).

1 This technique consists in dividing the full dataset of size αT into 2L different datasets and estimating
the optimal price on each of them. There necessarily exists a majority of small datasets without outliers that
estimate correctly the optimal price. Hence taking the median value is a robust procedure as long as L = o(αT).
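The block-and-median idea of the footnote can be sketched on a simpler task. The example below estimates a mean rather than an optimal price, and it uses 2L + 1 blocks (instead of 2L) so that uncorrupted blocks always form a strict majority; both simplifications, as well as the data and the magnitude of the outliers, are assumptions of the illustration.

```python
import random
import statistics

def median_of_means(samples, n_blocks):
    """Split samples into n_blocks blocks, average each, return the median."""
    block_size = len(samples) // n_blocks
    means = [
        statistics.fmean(samples[k * block_size:(k + 1) * block_size])
        for k in range(n_blocks)
    ]
    return statistics.median(means)

rng = random.Random(0)
clean = [rng.random() for _ in range(10_000)]  # true mean 1/2
n_outliers = 5
corrupted = clean[:]
for idx in rng.sample(range(len(corrupted)), n_outliers):
    corrupted[idx] = 1e6                        # adversarial outliers ("lies")

print(statistics.fmean(corrupted))              # ruined by the outliers
print(median_of_means(corrupted, n_blocks=2 * n_outliers + 1))
```

Since each outlier corrupts at most one block, at least L + 1 of the 2L + 1 block estimates are clean, and the median falls within the range of the clean estimates.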

Remark. Once again, this surplus extraction is possible only because there is an (artificial)
asymmetry between the seller and the buyer, preventing him from being too strategic. This
can also be enforced through another approach, yet it is valid only with several (almost)
symmetric bidders, leading to another type of asymmetry between the seller and buyers.
The idea is quite simple: make the computations required (e.g., to determine a reserve price)
not as a function of the buyer's bids, but as a function of his competitors' bids (Ashlagi et al.,
2016; Kanoria and Nazerzadeh, 2014; Epasto et al., 2018). Unfortunately, this approach
cannot handle the existence of any dominant buyer, i.e., a buyer with much higher values
than the other bidders (Epasto et al., 2018). Therefore, the impact of this technique is
quite limited since revenue-optimizing mechanisms are mostly important when the buyers
are heterogeneous. Moreover, in the main real-world application of online advertising,
with asymmetric bidders and no specific asymmetry between seller and buyers on future
utilities, none of these mechanisms ends up being able to enforce truthful bidding.

4.2.4 Incentive Compatibility for Adaptive bidders

As illustrated by Theorem 4.8, when facing a patient bidder, the seller cannot expect to get a
revenue equivalent to that of a revenue-maximizing auction, even in expectation. Indeed,
in this case, the mechanisms previously described cannot satisfy the first crucial requirement
of a dynamic selling mechanism: making sure the bidders bid truthfully, at
least often enough. In the simpler case, when the bidders fully observe their value before
bidding, the seller can implement a simple second-price auction without reserve price
(or with a fixed one), which ensures truthfulness and a minimal revenue. However, in the
slightly more complex (but also more realistic) setting used in Section 4.1.1, this is no
longer straightforward.
This section focuses on a simple setting to study this problem, the static bid model;
extensions to more complex settings are discussed at the end. This setting describes well
the situation encountered when buying online advertising space. At time t ∈ [T], the value
x_{i,t} of bidder i is decomposed into two parts: x_{i,t} = α_i c_{i,t}. The first part, α_i ∈ [0, α_max] (e.g.,
a cost-per-click), is private knowledge that bidder i has from the start, while the second part
c_{i,t} (e.g., a click on an ad banner) is a binary random variable with mean ρ_i, whose
realization is observed only for the winner i_t, after the assignment. Because the objective
of this section is to restrict our study to DSIC dynamic mechanisms, we can assume
that the bidders send the same bid at every time step, and use a formalization where
the bidders send their bids b once, at the beginning (at t = 0). We do not assume
the bids are bounded above. Then, the assignment is sequential, in a similar way as in
Definition 4.4. Indeed, let us denote by H_t = {q_1, c_{i_1,1}, . . . , q_{t−1}, c_{i_{t−1},t−1}} the history up to
time t, where q_t = q(b, H_t) ∈ Δ_n is the assignment at time t resulting from the sequential
assignment function q : B × ∪_t H_t → Δ_n. We will explicitly denote it H_t(b, c) if/when
we want to emphasize that H_t is a function of the bids b and of the (potential) click
realizations c ∈ {0, 1}^{T×n}. Finally, because the bidders only give one bid at the beginning,
we can consider that the payment p : B × H_T → R^n is made at the very end (after t = T), as a
sort of final billing.
In fact, this setting can be viewed as a multi-armed bandit (MAB) with an "unusual"
way to define the reward, the assignment function q being the bandit algorithm. So here,
the question will be whether it is possible to recover the performance of an optimal MAB
algorithm (such as KL-UCB) or whether restricting to assignments q for which it is possible to find
a payment that makes the mechanism incentive compatible leads to a degradation
of the performance. Depending on the performance objective, the pseudo-regret can
be defined in terms of the seller's revenue (Devanur and Kakade, 2009), or in terms of social welfare
(Babaioff et al., 2014). In both cases, the comparator of the regret is a weighted second-price
auction for which the ρ_i are known. Recalling that i_t denotes the winning bidder at time
t, and denoting by smax the second-highest element ("second max"), we have
\[
R_T^{W} = T \max_i \alpha_i \rho_i - \sum_{t=1}^{T} \alpha_{i_t} \rho_{i_t},
\qquad
R_T^{p} = T \operatorname{smax}_i \alpha_i \rho_i - p(b, H_T).
\]

Deterministic Assignment. We split the analysis according to whether the assignment is
randomized or not, as incentive compatibility imposes very different constraints in each case,
leading to different orders of performance. When the assignment is deterministic, it is possible
to characterize the dynamic mechanisms that are DSIC. We provide this characterization for
the case of two bidders, i.e., n = 2, for the sake of simplicity; it can be extended to n > 2
bidders (Babaioff et al., 2014). As a technical detail, we assume here that the sequential
assignment is non-degenerate, meaning that when a bid profile b generates a given assignment, all bid
profiles of the form (u, b_{−i}) generate the same assignment, for u ranging in some non-degenerate
interval containing b_i.²

Theorem 4.10 (Babaioff et al., 2014). For n = 2, given a scale-free³ and deterministic
dynamic allocation q, there exists a payment p such that the resulting dynamic mechanism
is 0-rational and DSIC iff

1. (pointwise-monotone) for any bid profile and for any realization of the history, if bidder i
wins at round t, he would still win by bidding higher, i.e.,
∀t ∈ [T], ∀c ∈ {0, 1}^{t×n}, ∀b ∈ B, if bidder i wins the auction at time t,
then for b̃_i > b_i, we have q_i(b̃_i, b_{−i}, H_t(b̃_i, b_{−i}, c)) ≥ q_i(b, H_t(b, c)).

2. (exploration-separated) at any step t whose output impacts a future assignment, the
assignment does not depend on the bids b.

2 This assumption is technical and avoids having to state results that hold only almost surely
w.r.t. the Lebesgue measure.
3 The scale-free property just means that rescaling the bids does not change the outcome of the assignment
(e.g., it does not depend on the currency).

Proof. To simplify notation for the proof, we denote by q(b, c) ∈ R^{n×T} the matrix with columns
q_t(b, c)^⊤ = q(b, H_t(b, c)). Further, we recall that the payment for a truthful mechanism is
defined by p_i(b, c) = ⟨b_i c_i, q_i(b, c)⟩ − ∫_0^{b_i} ⟨c_i, q_i(u, b_{−i}, c)⟩ du (Archer and Tardos, 2001). We
break the proof into three steps, proving first that 0-rationality and DSIC imply monotonicity,
then exploration-separation, and finally the converse statement.
DSIC ⇒ monotone. The proof is by contradiction. Assume there exist t, c, b, b_i^+ such
that b_i < b_i^+ and q_{i,t}(b, c) > q_{i,t}(b_i^+, b_{−i}, c). W.l.o.g. we can assume there are no clicks at any
time t′ ≥ t (as they do not affect the assignment at time t) and we denote c′ = c ⊕ 1{(i, t)}, where
⊕ denotes the bit change, i.e., the addition modulo 2. Because buyer i does not win at step
t by bidding b_i^+, we should have p_i(b_i^+, b_{−i}, c) = p_i(b_i^+, b_{−i}, c′). The contradiction will come from
proving they are not equal.
We can focus on the integral term of the payment, as the first one does not change
between c and c′.

\[
\forall u \in [0, \alpha_{\max}], \quad \langle c_i, q_i(u, b_{-i}, c)\rangle \le \langle c_i, q_i(u, b_{-i}, c')\rangle \qquad \text{(no clicks after time } t\text{)}
\]
\[
\Rightarrow \quad \int_0^{b_i^+} \langle c_i, q_i(u, b_{-i}, c)\rangle \, du \le \int_0^{b_i^+} \langle c_i, q_i(u, b_{-i}, c')\rangle \, du
\]
Further, because the assignment is non-degenerate, there exists an interval I containing b_i
such that for all u ∈ I,
\[
c_i^\top q_i(u, b_{-i}, c) = \langle c_i, q_i(b_i, b_{-i}, c)\rangle < \langle c_i, q_i(b_i, b_{-i}, c')\rangle = \langle c_i, q_i(u, b_{-i}, c')\rangle.
\]
Then, it means that ∫_0^{b_i^+} ⟨c_i, q_i(u, b_{−i}, c)⟩ du < ∫_0^{b_i^+} ⟨c_i, q_i(u, b_{−i}, c′)⟩ du, which contradicts
the payments being equal.
DSIC ⇒ exploration-separated. The proof is again by contradiction. Assume there
exist t < t′, c, b such that

1. q_{2,t}(b, c) = 1, i.e., buyer 2 wins round t, w.l.o.g.,

2. q_{t′}(b, c) ≠ q_{t′}(b, c′) with c′ = c ⊕ 1{(2, t)}, i.e., time t′ is influenced by the output of time t,

3. ∃b′ ∈ B, q_{1,t}(b′, c) = 1, i.e., the assignment at time t depends on the bids,

4. t′ is minimal (w.l.o.g.) and

5. there is no click after t′ (again, w.l.o.g.).

Since q is scale-free, for b_1^+ = (b′_1/b′_2) b_2, we have q_{1,t}(b_1^+, b_2, c) = 1, thus, as q is
pointwise-monotone, b_1^+ > b_1. Because the difference between c and c′ is on buyer 2 at time t,
a round won by buyer 1 under the bids (b_1^+, b_2), we have
p(b_1^+, b_2, c) = p(b_1^+, b_2, c′). The contradiction will come from proving they are not equal.
We focus on the payment of buyer 1 and again on the integral terms ∫_0^{b_1^+} ⟨c_1, q_1(u, b_2, c)⟩ du
vs ∫_0^{b_1^+} ⟨c_1, q_1(u, b_2, c′)⟩ du. Assume w.l.o.g. that q_{1,t′}(b, c) < q_{1,t′}(b, c′); then, by pointwise
monotonicity, we have ∀u < b_1^+, q_{1,t′}(u, b_2, c) ≤ q_{1,t′}(u, b_2, c′). Then, since q is non-degenerate,
the strict inequality holds on a non-degenerate interval, hence
\[
\int_0^{b_1^+} \langle c_1, q_1(u, b_2, c)\rangle \, du < \int_0^{b_1^+} \langle c_1, q_1(u, b_2, c')\rangle \, du,
\]
which contradicts the payments being equal.

exploration-separated + monotone ⇒ DSIC. Since q is pointwise-monotone, it is
monotone, hence the auction is truthful and 0-rational if it can implement the payment
of a truthful mechanism (Archer and Tardos, 2001). The main challenge is to show that p is
H_t(b, c)-adapted, i.e., that it can be computed with access to observable information (observed
clicks) only, and especially the integral term ∫_0^{b_i} c_i^⊤ q_i(u, b_{−i}, c) du. Indeed, for u < b_i, buyer i
can lose an assignment at some time t, which may impact future assignments, potentially
requiring clicks that were not observed with bids b to compute the payment.
\[
\begin{aligned}
p_i(b, c) &= b_i \langle c_i, q_i(b, c)\rangle - \int_0^{b_i} \langle c_i, q_i(u, b_{-i}, c)\rangle \, du \\
&= b_i \langle c_i, q_i(b, c)\rangle - \int_0^{b_i} \sum_{t=1}^{T} c_{i,t}\, q_{i,t}(u, b_{-i}, H_t(u, b_{-i}, c)) \, du \qquad \text{(by definition of } q\text{)}
\end{aligned}
\]
As q is exploration-separated, q_t(b, c) only depends on two elements. The first one
is obviously b, and the second one is a subset E_t(c) of the history H_t(b, c) that is
independent of b. Consequently, we can write q_t(b, H_t(b, c)) = q_t(b, E_t(c)) and thus
\[
p_i(b, c) = b_i \langle c_i, q_i(b, c)\rangle - \int_0^{b_i} \sum_{t=1}^{T} c_{i,t}\, q_{i,t}(u, b_{-i}, E_t(c)) \, du.
\]
This implies that the payment is H_t(b, c)-adapted.

Note that Theorem 4.10 characterizes the dynamic assignment, as the payment is
derived as in Corollary 2.18 (Archer and Tardos, 2001). This theorem confirms earlier
results (Devanur and Kakade, 2009), based on a dynamic selling mechanism with an
explore-then-commit structure (ETC, Perchet and Rigollet, 2013). We will describe this
specific algorithm for the welfare regret (Babaioff et al., 2014) and provide upper bounds for
it. Guarantees can be derived for this algorithm both in terms of welfare and in terms of
revenue, as the seller's revenue regret can be handled quite similarly (Devanur and
Kakade, 2009).
For this algorithm, it turns out the final payment is very naturally decomposed as the
sum of per-step payments, so we describe it this way. For the first τ steps, the assignment
is a round-robin over the bidders, leading each bidder to win ⌊τ/n⌋ times and to pay 0 each
time. This exploration phase allows building an unbiased estimate ρ̂_i of ρ_i. Further, it
ensures that
\[
\mathbb{P}\Bigl(\exists i \in N,\ |\hat\rho_i - \rho_i| \ge \underbrace{\sqrt{\frac{2n}{\tau \wedge T}\log\frac{n}{\delta}}}_{=:\, r}\Bigr) \le \delta. \tag{4.2}
\]
During the remaining T − τ steps, the auction is a weighted second-price auction⁴:
\[
q_i(b) = 1\{i = \operatorname{argmax}_j \hat\rho_j^+ b_j\}, \quad \text{and} \quad p_i(b) = \frac{\operatorname{smax}_j \hat\rho_j^+ b_j}{\hat\rho_i^+}\, q_i(b), \quad \text{where } \hat\rho_j^+ = \hat\rho_j + r.
\]
Both regrets considered, R_T^W and R_T^p, can be upper-bounded by the same rate.
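The explore-then-commit mechanism just described can be sketched as follows. This is an illustrative simulation, not the authors' implementation: the function name, the simulated click probabilities and the reading of the payment as a price charged per click (consistent with the expected payment used in the regret analysis) are assumptions, and the radius r is taken as sqrt((2n / (τ ∧ T)) log(n/δ)).

```python
import math
import random

def etc_weighted_second_price(bids, click_probs, horizon, tau, delta, seed=0):
    """Explore-then-commit assignment with a weighted second-price rule.

    `click_probs` (the rho_i) are used only to simulate clicks; the mechanism
    itself sees bids and observed clicks. Returns the winner of the commit
    phase and the price charged to her per click.
    """
    rng = random.Random(seed)
    n = len(bids)
    explore = min(tau, horizon)
    clicks, pulls = [0] * n, [0] * n
    for t in range(explore):                 # round-robin, free of charge
        i = t % n
        pulls[i] += 1
        clicks[i] += rng.random() < click_probs[i]
    r = math.sqrt(2 * n / explore * math.log(n / delta))  # confidence radius
    rho_plus = [clicks[i] / max(pulls[i], 1) + r for i in range(n)]
    scores = [rho_plus[i] * bids[i] for i in range(n)]
    winner = max(range(n), key=scores.__getitem__)
    second = max(s for i, s in enumerate(scores) if i != winner)
    price_per_click = second / rho_plus[winner]
    return winner, price_per_click

T, n = 100_000, 3
tau = int(n ** (1 / 3) * T ** (2 / 3) * math.sqrt(math.log(n * T)))
winner, price = etc_weighted_second_price(
    bids=[1.0, 0.8, 0.5], click_probs=[0.05, 0.30, 0.10],
    horizon=T, tau=tau, delta=1 / T,
)
print(winner, price)
```

The assignment during exploration ignores the bids, and the commit phase depends on the bids only through a fixed weighted ranking, which is what makes the rule exploration-separated.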

Theorem 4.11 (Devanur and Kakade, 2009; Babaioff et al., 2014). The algorithm described
above guarantees that R_T^W = O(n^{1/3} T^{2/3} \sqrt{\log(nT)}) and R_T^p = O(n^{1/3} T^{2/3} \sqrt{\log(nT)}).

Proof. Before beginning, as τ is a variable to optimize over, we need to handle the case τ > T
properly, in order to avoid vacuous upper bounds. Indeed, the length of the exploration
stage is not τ, but rather τ ∧ T, while there are (T − τ)^+ remaining steps. Further, we will
make extensive use of the concentration inequality (4.2).
We denote i* = argmax_i α_i ρ_i and i^+ = argmax_i α_i ρ̂_i^+. Then with probability at least 1 − δ,
we have
\[
(\rho_{i^+} + 2r)\,\alpha_{i^+} \ge \hat\rho^+_{i^+}\alpha_{i^+} \ge \hat\rho^+_{i^*}\alpha_{i^*} \ge \rho_{i^*}\alpha_{i^*}, \tag{4.3}
\]
from which we can deduce
\[
\rho_{i^*}\alpha_{i^*} - \rho_{i^+}\alpha_{i^+} \le 2\alpha_{\max} r. \tag{4.4}
\]
Then the regret R_T^W can be upper-bounded, denoting τ ∧ T = min{τ, T}, as follows:
\[
R_T^{W} \le (\tau \wedge T)\,\alpha_{\max} + (1-\delta)(T-\tau)^+\, 2\alpha_{\max}\sqrt{\frac{2n}{\tau \wedge T}\log\frac{n}{\delta}} + \delta T \alpha_{\max}. \tag{4.5}
\]
Choosing δ = 1/T and τ = n^{1/3} T^{2/3} \sqrt{\log nT} finishes the proof for R_T^W.

4 Babaioff et al. (2014) propose a slightly different algorithm, using ρ̂_i instead of ρ̂_i^+. It enjoys the same
guarantee in terms of welfare, but it is unclear whether it also enjoys the same guarantee in terms of revenue.
p

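We now prove the upper bound on R_T^p.
\[
\begin{aligned}
\operatorname{smax}_i \rho_i\alpha_i - \frac{\operatorname{smax}_i \hat\rho_i^+\alpha_i}{\hat\rho^+_{i^+}}\,\rho_{i^+}
&= \frac{\operatorname{smax}_i \hat\rho_i^+\alpha_i}{\hat\rho^+_{i^+}}\Bigl(\frac{\operatorname{smax}_i \rho_i\alpha_i}{\operatorname{smax}_i \hat\rho_i^+\alpha_i}\,\hat\rho^+_{i^+} - \rho_{i^+}\Bigr) \\
&\le \alpha_{i^+}\Bigl(\frac{\operatorname{smax}_i \rho_i\alpha_i}{\operatorname{smax}_i \hat\rho_i^+\alpha_i}\,\hat\rho^+_{i^+} - \rho_{i^+}\Bigr) && \text{(by definition of } i^+\text{)} \\
&\le \alpha_{i^+}\bigl(\hat\rho^+_{i^+} - \rho_{i^+}\bigr) && \text{(with proba. } 1-\delta\text{)} \\
&\le 2\alpha_{\max} r
\end{aligned}
\]
where the last step uses ρ̂^+_{i^+} − ρ_{i^+} = (ρ̂_{i^+} − ρ_{i^+}) + r ≤ 2r on the event of (4.2).
Then the regret R_T^p can be upper-bounded as follows:
\[
R_T^{p} \le (\tau \wedge T)\,\alpha_{\max} + (1-\delta)(T-\tau)^+\, 2\alpha_{\max}\sqrt{\frac{2n}{\tau \wedge T}\log\frac{n}{\delta}} + \delta T \alpha_{\max}. \tag{4.6}
\]
Choosing again δ = 1/T and τ = n^{1/3} T^{2/3} \sqrt{\log nT} finishes the proof for R_T^p.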
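For completeness, the effect of this choice of δ and τ on the welfare bound (4.5) can be checked term by term. The computation below is a sketch that assumes τ ≤ T (so that τ ∧ T = τ), log(nT) ≥ 1, and reads the radius in (4.2) as r = \sqrt{(2n/(\tau \wedge T)) \log(n/\delta)}.

```latex
% Plugging \delta = 1/T and \tau = n^{1/3} T^{2/3} \sqrt{\log(nT)} into (4.5),
% assuming \tau \le T and \log(nT) \ge 1.
\begin{align*}
(\tau \wedge T)\,\alpha_{\max}
  &= \alpha_{\max}\, n^{1/3} T^{2/3} \sqrt{\log(nT)}, \\
(T-\tau)^+\, 2\alpha_{\max} \sqrt{\frac{2n}{\tau}\log(nT)}
  &\le 2\sqrt{2}\,\alpha_{\max}\, T\, n^{1/2} (\log nT)^{1/2}\; n^{-1/6}\, T^{-1/3} (\log nT)^{-1/4} \\
  &= 2\sqrt{2}\,\alpha_{\max}\, n^{1/3} T^{2/3} (\log nT)^{1/4}, \\
\delta\, T\, \alpha_{\max} &= \alpha_{\max}.
\end{align*}
% Summing the three terms gives
% R_T^W = O\bigl(n^{1/3} T^{2/3} \sqrt{\log(nT)}\bigr).
```

The exploration term dominates with this choice of τ; the revenue bound (4.6) has the same structure, so the same computation applies to it.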

Lower-bounds. These rates of T^{2/3} for both regrets are tight, as shown by the following
result.

Theorem 4.12 (Devanur and Kakade, 2009; Babaioff et al., 2014). For any deterministic,
scale-free sequential assignment q and payment p such that (q, p) is DSIC, there exists a set
of bids and distributions over c such that R_T^W, R_T^p = Ω(n^{1/3} T^{2/3}).

Comparison to MAB. As the social welfare coincides with the reward from a MAB point
of view, we can compare this performance to the optimal performance on MAB problems. It
turns out the DSIC constraint is actually strong, as it implies a degradation of the regret
by a factor T^{1/6}, "the cost of (ex-post) truthfulness", from O(T^{1/2}) for optimal MAB
algorithms to Ω(T^{2/3}) when ensuring incentive compatibility. To understand intuitively
where this degradation comes from, it is possible to focus on explore-then-commit (ETC)
types of algorithms for the case n = 2, as an exploration-separated assignment rule is a
special case of ETC algorithm. In a pure bandit setting, two types of ETC algorithms can
have a regret of order O(T^{1/2}): either an adaptive ETC that eliminates arms as soon as
they are detected to be sub-optimal, or a fixed-design ETC, provided the gap Δ in performance
between both arms is known in advance. Unfortunately, neither of them is
exploration-separated. For adaptive ETC, this is because, during the exploration phase, the decision
(taken at each time step) to eliminate an arm or to keep it depends on the estimated
reward of the arm and thus on the bids. For a fixed-design ETC, the problem comes from
the need to know the gap in advance: whatever the value of Δ, choosing an exploration
period of length Δ^{−2} ∧ T ensures a regret upper bounded by O(T^{1/2}). However, because the
length of the exploration period depends on Δ (the gap), which in our case is a function
of the bids, such a choice of the length of the exploration period makes the assignment not
exploration-separated. Making an ETC algorithm exploration-separated requires it to
be fixed-design with the length of the exploration period set independently of the gap,
which is known to cause a degradation of the regret, this so-called "cost of truthfulness".
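The size of this degradation is easy to tabulate. The snippet below only compares orders of magnitude, with constants and logarithmic factors dropped; the horizons are arbitrary illustrative values.

```python
def regret_rates(horizon):
    """Orders of magnitude of the regret (constants and log factors dropped)."""
    mab = horizon ** 0.5          # optimal MAB rate
    dsic = horizon ** (2 / 3)     # deterministic DSIC mechanisms
    return mab, dsic, dsic / mab  # the ratio is T^(1/6)

for T in (10**3, 10**6, 10**9):
    mab, dsic, ratio = regret_rates(T)
    print(T, round(mab), round(dsic), round(ratio, 1))
```

For a horizon of a million auctions, the deterministic DSIC rate is about ten times the bandit rate.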

Randomized Assignment. From the previous result, it would seem that the "cost of
truthfulness" may come from the strong requirement of ex-post incentive compatibility.
However, this lower bound can be circumvented by considering non-deterministic sequential
assignments, which make it possible to ensure ex-post incentive compatibility without restricting the
algorithm to be exploration-separated (Babaioff et al., 2010). Consequently, a randomized
dynamic mechanism, ex-post DSIC⁵, with an O(T^{1/2}) regret guarantee in welfare can be
constructed. It relies on two ingredients:

1. a MAB algorithm that leads to an ex-post monotone assignment (e.g., an adaptive
ETC with successive elimination (Perchet and Rigollet, 2013)),

2. a sampling procedure that modifies the bids that are fed to the mechanism.

Using these ingredients, it is possible to obtain a regret in terms of welfare that matches the
one of the underlying MAB algorithm and, as mentioned previously, an adaptive ETC
reaches the rate of O(T^{1/2}), which is optimal (Babaioff et al., 2010).

Non-stationary values. As explained at the beginning of the section, we considered a
simple setting where the values α, and thus the bids b, are constant over time. It is possible
to consider a more complex setting where the values can change over time. In such a case,
the bidders submit a different bid at each time step and the payments need to be per-step
instead of a final bill. To keep the parallel with bandit problems, this is not a MAB anymore,
but rather a contextual linear bandit with finitely many arms (the contextual part coming
from the arm set changing as the values are changing). As it turns out, the algorithm
from Devanur and Kakade (2009) is still valid (remember the payments were designed
per-step) and is DSIC even in a strong sense along the sequence. Further, one can notice that
the proof of Theorem 4.11 for the upper bounds on the regret never relies on the values
being constant over time and holds unchanged in this non-stationary setting. As this setting
is strictly more general, the lower bounds from Theorem 4.12 also hold, meaning that
deterministic mechanisms still have regrets scaling as T^{2/3}. However, it remains an open
question whether moving to randomized mechanisms still allows recovering the rate of
T^{1/2}, which is also the optimal rate for linear bandits.

5 Here, ex-post is related to the realization of c_{i,t}, but in expectation over the randomness of the algorithm.

Extensions. Several other directions can be explored to better model the underlying
applications or to slightly relax the mechanism design constraints. One way to relax the
constraint of ex-post DSIC is to consider an asymptotic version, where the benefit of not
being truthful vanishes over time (Nazerzadeh et al., 2008; Kandasamy et al., 2020). It
turns out that this allows one neither to avoid the explore-then-commit structure of
algorithms for deterministic assignments nor to avoid the degradation of the regret to
T^{2/3}, even for more complicated mechanisms than auctions (Kandasamy et al., 2020).

4.3 Reversing the asymmetry: Strategic buyer vs. myopic seller

In the previous sections, the asymmetry between the seller and the buyers was always in
favor of the former. There are, however, many cases where the converse happens:
the seller has her hands tied, while the buyers can try to exploit this. For instance, the
seller must sometimes disclose (and commit to) the learning algorithm she is using to
devise her mechanism, such as a second price auction with reserve price or, more
generally, the Myerson revenue-maximizing auction, based on the distribution of bids
received. Let us denote by M(FB1, ..., FBn) the mechanism induced by the past bid
distributions FBi; for instance, it can be a second price auction with personalized reserve
prices computed from the FBi (and not from the Fi, which are not known beforehand).

4.3.1 A Stackelberg view

This problem has been tackled under the assumption of perfect knowledge of the opti-
mization algorithm used by the seller (Kanoria and Nazerzadeh, 2014; Tang and Zeng,
2018; Nedelec et al., 2019a). It exploits a conceptual opening in most automatic mecha-
nism design works, i.e., the breakdown of incentive compatibility for the buyer when the
seller optimizes over incentive compatible auctions. In some sense, the computation of
bid distributions FBi instead of value distributions Fi can be seen as an “attack” of the
optimization algorithm of the seller (possibly based on deep learning for complex auction
systems (Dütting et al., 2019)). However, we point out now that those attacks differ from
the celebrated adversarial attacks in computer vision. Indeed, the latter generally rely
on the lack of local robustness of a classifier. Two other major differences are also quite
important: these “attacks” do not necessarily yield lower revenues for the seller (Nedelec
et al., 2019a); and they are also part of a dynamic game between buyers and seller and as
such have a dynamic component that is absent from classical and static machine learning
frameworks, such as image classification.
For concreteness, consider the case of second price auctions. In the classical setting of
auction theory, the buyer is asked to reveal his bid distribution first; facing “truthful
auctions”, he reveals his value distribution. The seller then optimizes her mechanism
based on this information, finding an optimal reserve price for this buyer. This is a
Stackelberg game, as the two players do not play at the same time. In this instance, the seller
is the leader and the buyer is the follower. Most of the literature on optimal auctions is
focused on this version of the Stackelberg game.
However, if the bidder knows that the seller is going to find an optimal mechanism,
and hence that she will optimize the auction based on the information given by his bid
distribution, he can anticipate this optimization to increase his utility. The order of the
Stackelberg game is then reversed. The bidder becomes the leader and the seller the
follower: he reveals his bid distribution knowing the optimization problem that she will
solve. In second price auctions with reserve prices, the bidder has an incentive to disclose
a bid distribution that may be different from his value distribution, as he might then
face a more favorable reserve price.
More formally, the timing of the game we consider is the following:

1. the seller chooses a mapping M : F → A, from the set of bid distributions to the set
of auction mechanisms,

2. based on this choice of mapping, each buyer picks a bidding strategy βi ,

3. bidder i’s utility is computed in expectation when xi ∼ Fi , he bids βi (xi ) and the
outcome of the auction (allocation and payment) is defined by M(FB1 , . . . , FBn ). With
a slight abuse of notations, we will denote by Ui (βi ) his expected utility, assuming
the other bidders strategies are fixed.

4. the seller gets her expected revenue under this mechanism.

This objective is particularly relevant in modern applications, as most data-driven
selling mechanisms use large batches of bids as examples to update their mechanism.

4.3.2 The posted price setting

Let us first consider the posted price setting where n = 1 bidder plays against one seller. We
assume, for simplicity of this introductory example, that the bidder's value distribution Fi is
U[0, 1], i.e., uniform on the interval [0, 1]. Let us initially consider that the bidder is bidding
truthfully, i.e., βi = Id. In this case, FBi = Fi and the seller will set as reserve price the
monopoly price, obtained by maximizing the monopoly revenue r(1 − Fi(r)). This monopoly
price is equal to 0.5 in the case of U[0, 1]. Note that this maximization problem is computationally
simple as the monopoly revenue is a concave function if the value distribution is regular.
The bidder can obviously do better. If he always bids zero (or some ε arbitrarily close to
zero), FBi will be a point mass at zero. Computing the optimal reserve
price corresponding to FBi, the seller chooses zero, which maximizes the bidder's utility.
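
The posted price example can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names `monopoly_price` and `posted_price_utility`, the grid search, and the use of evenly spaced quantiles of U[0, 1] in place of random draws are choices made here, not part of the original text. It computes the reserve price a revenue-maximizing seller would pick from the observed bids, and the resulting bidder utility, for truthful bidding and for zero bidding.

```python
def monopoly_price(bids, grid_size=500):
    """Grid maximizer of r * P(bid >= r), the empirical monopoly revenue."""
    top = max(bids)
    best_reserve, best_revenue = 0.0, 0.0
    for k in range(grid_size + 1):
        reserve = top * k / grid_size
        accepted = sum(1 for b in bids if b >= reserve) / len(bids)
        if reserve * accepted > best_revenue:
            best_reserve, best_revenue = reserve, reserve * accepted
    return best_reserve


def posted_price_utility(values, bids, reserve):
    """Average of (value - reserve) over the rounds where the bid clears the reserve."""
    gains = [v - reserve for v, b in zip(values, bids) if b >= reserve]
    return sum(gains) / len(values)


# Evenly spaced quantiles of U[0, 1] stand in for a large sample of values.
values = [(k + 0.5) / 1000 for k in range(1000)]
zero_bids = [0.0] * len(values)

truthful_reserve = monopoly_price(values)     # close to the monopoly price 0.5
zero_reserve = monopoly_price(zero_bids)      # the seller cannot do better than 0

u_truthful = posted_price_utility(values, values, truthful_reserve)   # close to 1/8
u_zero = posted_price_utility(values, zero_bids, zero_reserve)        # close to 1/2
```

The two utilities correspond to the K=1 column of Table 4.1 for the truthful and zero bidding rows.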

| Strategy | Reserve price, K=1 | K=2 | K=3 | K=4 | Utility, K=1 | K=2 | K=3 | K=4 |
|---|---|---|---|---|---|---|---|---|
| Truthful bidding | 0.5 | 0.5 | 0.5 | 0.5 | 1/8 | 1/12 | 11/192 | 13/320 |
| Zero bidding | 0.0 | 0.0 | 0.0 | 0.0 | 1/2 (+400%) | 0.0 (-100%) | 0.0 (-100%) | 0.0 (-100%) |
| Divide values by 2 | 0.25 | 0.25 | 0.25 | 0.25 | 1/4 (+100%) | ≈ 0.094 (+13%) | ≈ 0.036 (-37%) | ≈ 0.015 (-63%) |
| Thresholded at the monopoly price (Theorem 4.13) | 0.25 | 0.25 | 0.25 | 0.25 | 1/4 (+100%) | ≈ 0.132 (+57%) | ≈ 0.076 (+33%) | ≈ 0.048 (+20%) |
| Optimal regularity-preserving strategies (Theorem 4.17) | 0.0 | 0.162 | 0.204 | 0.22 | 1/2 (+400%) | ≈ 0.147 (+76%) | ≈ 0.079 (+38%) | ≈ 0.049 (+21%) |

Table 4.1: Comparison of the utility of the strategic bidder between the truthful strategy, the strategy corre-
sponding to bidding zero for any values, the linear strategy dividing values by two, the strategy introduced
in Theorem 4.13 and the optimal regularity-preserving strategies for each number of competitors (derived
from Theorem 4.17). The first four strategies are fixed and do not require knowledge of the competition to
be computed. The last one is competition-specific and exact knowledge of the distribution followed by the
highest bid of the competition is needed to compute it. For this example, bidders’ value distributions are
U[0, 1] and opponents are assumed to bid truthfully.

The problem we consider derives from a simple extension of this example to the case of n
bidders. In a lazy second price auction, the optimal reserve price for each bidder is still the
monopoly price. Yet, as soon as there is some competition, bidders cannot bid zero as they
get zero utility in this case. They have to tradeoff between beating the competition and
decreasing their reserve price.

4.3.3 Improving the truthful strategy for any distributions of the competition

Nedelec et al. (2019a) derive a simple strategy that guarantees the bidder an increase
in utility compared to the truthful strategy for any distribution of the competition. The
size of this increase depends on the distribution of the competition. Yet, by playing this
strategy, the bidder is sure to do better than by bidding truthfully. This is an important
practical result as, in many ad platforms, bidders have to bid without knowing the
distribution of the competition. This strategy, which they call thresholding at the monopoly
price, also has the key property of keeping the optimization problem of the seller simple,
i.e., if Fi is regular, the bid distribution FBi induced by this strategy on Fi is also regular.

Definition 4.3.1. Consider a bidder with a regular value distribution Fi . A bidding strategy
βi is regularity-preserving if the bid distribution FBi induced by βi on Fi is a regular
distribution.

When the reserve price is computed from FBi - the bid distribution induced by using β
on Fi - a distinction between the reserve price rβ and the reserve value xβ must be made.
Definition 4.3.2. Given a non-decreasing strategy β, the reserve value xβ is the smallest
value above which the seller accepts bids. In particular, if the bidder bids truthfully,
his reserve value is equal to his reserve price; on the other hand, if β is continuous and
increasing, and rβ is the reserve price associated with the strategy β, then xβ = β −1 (rβ ).
Consider for instance, F = U[0, 1], and the bidding strategy β(x) = x/2, then rβ = 0.25
and xβ = 0.5. By dividing bids by two, the strategic bidder decreases their reserve price but
does not change the reserve value: it is the same as if they were bidding truthfully.
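
The numbers in this example can be recovered by a short computation, spelled out here as a worked illustration for F = U[0, 1] and β(x) = x/2 (the bids are then uniform on [0, 1/2]):

\[
F_B(b) = \mathbb{P}(x/2 \le b) = 2b \quad (0 \le b \le 1/2), \qquad
r_\beta = \arg\max_{b} \, b\,(1 - 2b) = \tfrac{1}{4}, \qquad
x_\beta = \beta^{-1}(r_\beta) = 2 \cdot \tfrac{1}{4} = \tfrac{1}{2}.
\]
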
Theorem 4.13. Suppose the value distribution F has a density f, with f > 0 on the support
of F, that the left end point of its support is 0, and that the other bidders’ strategies are
fixed. Let βr be an increasing strategy with associated reserve value r > 0 in a lazy second
price auction such that the bid distribution associated with βr has a virtual value. Then
there exists another bidding strategy β̃r such that:
1. a reserve value associated with β̃r is 0 and β̃r is increasing;

2. Ui(β̃r) ≥ Ui(βr), i.e., the utility of bidder i is higher;

3. Pi(β̃r) ≥ Pi(βr), i.e., the payment of bidder i to the seller is also higher.
The following continuous function fulfills these conditions:
\[
\tilde\beta_r(x) = \frac{\beta_r(r)\,(1 - F_i(r))}{1 - F_i(x)}\,\mathbf{1}\{x < r\} + \beta_r(x)\,\mathbf{1}\{x \ge r\} .
\]
A reserve value equal to zero means that the seller accepts all bids of the strategic
bidder. It also means that the reserve price is equal to the minimum bid of the strategic
bidder. This result can be applied to improve any preexisting shading strategy. A very
important case is to apply this theorem to the truthful strategy, showing that there exists a
strategy improving the truthful strategy regardless of the competition distribution. We now
explain why we can improve any strategy in this setting without knowing the distribution
of the competition. Myerson’s Lemma is a key element in this understanding.
In this setting, it is optimal for the seller to choose as reserve price for bidder i the
monopoly price corresponding to bidder i's bid distribution, and Myerson's lemma implies that
the expected payment of bidder i in the optimized lazy second price auction is equal to
\[
P_i(\beta_i) = \mathbb{E}_{b \sim F_{B_i}}\Big( \psi_{F_{B_i}}(b)\, G_i(b)\, \mathbf{1}\{b \ge \psi_{B_i}^{-1}(0)\} \Big) .
\]

In order to simplify the computation of the expectation and remove the dependence on Bi,
we rewrite this expected payment in the space of values using the fact that the strategic
bidder is using an increasing strategy βi. We will only consider increasing strategies in the
remainder of the survey and so we define:

\[
h_{\beta_i}(x) = \psi_{F_{B_i}}(\beta_i(x)) .
\]

With this new notation, the expected payment of the strategic bidder i can be rewritten as
\[
P_i(\beta_i) = \mathbb{E}_{x_i \sim F_i}\Big( h_{\beta_i}(x_i)\, G_i(\beta_i(x_i))\, \mathbf{1}\{x_i \ge x_\beta\} \Big) ,
\]
and his expected utility can be written as a function of βi:

\[
U_i(\beta_i) = \mathbb{E}_{x_i \sim F_i}\Big( (x_i - h_{\beta_i}(x_i))\, G_i(\beta_i(x_i))\, \mathbf{1}\{x_i \ge x_\beta\} \Big) , \tag{4.7}
\]

where x_β is the reserve value. If h_{β_i} crosses 0 exactly once and is positive beyond that crossing
point, then x_β = h_{β_i}^{-1}(0). If we call r_i = ψ_{F_{B_i}}^{-1}(0) the reserve price of bidder i and β_i is
increasing, the reserve value is equal to β_i^{-1}(r_i).
If we consider only increasing differentiable strategies, and we denote by I the class of
such functions, the problem of the strategic bidder is therefore to solve supβ∈I U (β) with
U defined in Equation (4.7). This equation is crucial, as it indicates that optimizing over
bidding strategies can be reduced to finding a distribution with a well-specified hβ (·). Our
results extend to the case where the strategies are increasing and differentiable except at
finitely many points, as we only need bFB (b) to be absolutely continuous for the previous
result to go through.
A crucial difference between the long-term vision and the classical, myopic (or one-shot)
auction theory is that in this setup bidders maximize expected utility globally over the
full support of the value distribution. In the classical myopic setting, bidders determine
their bids to maximize their expected utility at each value. In our setup, the strategic
bidder also accounts for the computation of the reserve price, which is a function of his
global bid distribution. He might therefore be willing to sometimes overbid (incurring a
negative utility at some specific auctions/values) or underbid (losing some auctions that he
would have won otherwise) if this reduces his reserve price. Indeed, having a lower reserve
price increases the utility of other auctions: lose small to win big. In other words, the
strategy trades off ex-post individual rationality (IR) for higher utility (of course, ex-ante IR
still holds). This reasoning makes sense only with multiple interactions between bidders and
seller.

Thresholding the virtual value

A truthful bidding strategy can easily be improved by a strategic bidder, as illustrated by
the following elementary example. Consider that the value distribution of the bidder is

[Figure 4.1, two panels. Left: "Virtual value of the truthful bidder (r=0.5)". Right: "Thresholding the virtual value". Horizontal axis: bid b. Vertical axis: virtual value as a function of b. Annotations: "no bid", "bid distribution support".]

Figure 4.1: Virtual value of truthful bidder vs. strategic bidder. The value distribution of the bidder is
U[0, 1], the standard textbook example used for the sake of illustration. Her virtual value is therefore equal
to ψ(x) = 2x − 1, and is represented by the blue line. The dashed red vertical line corresponds to the current
reserve price. The green area corresponds to the bidder’s payment if we picked G = 1, i.e., no competition,
for the sake of clarity of the plot. The left-hand side corresponds to truthful bidding, the right-hand side to
strategic behavior. In both cases, the blue line corresponds to ψB .

U([0, 1]), i.e., uniform between 0 and 1. With truthful bidding, the associated virtual value
is negative below 1/2 and positive above, so the optimal reserve price is 1/2 and no
auction is won if the value is smaller than 1/2. On the other hand, if the strategic bidder
were able to send bids such that the virtual value (of bids) below 1/2 is exactly 0, then the
seller would have no incentive to choose a reserve price, because of Myerson's lemma.
In particular, the latter also implies that, since the virtual value is zero below 1/2, the seller
receives exactly the same expected payment as with a truthful bidder.
This technique is called thresholding the virtual value. We now show formally how
to find a bidding strategy such that the virtual value of the induced bid distribution is
equal to zero below a certain threshold. Before carrying on with reasoning on the virtual
value, such as in our motivating example, we need to ensure we can find the corresponding
strategy βi that will expose a bid distribution FBi with the corresponding virtual value to
the seller. The two following technical lemmas show how to deduce βi from a given hβi .
Lemma 4.14. Suppose bi = βi(xi), where βi is increasing and differentiable and xi is a
random variable with cdf Fi and pdf fi, with fi > 0 on the support of Fi. Then
\[
h_{\beta_i}(x) = \beta_i(x) - \beta_i'(x)\,\frac{1 - F_i(x)}{f_i(x)} = \psi_{F_{B_i}}(\beta_i(x)) . \tag{4.8}
\]

Proof. By definition,
\[
\psi_{F_{B_i}}(b) = b - \frac{1 - F_{B_i}(b)}{f_{B_i}(b)}, \qquad
F_{B_i}(b) = F_i(\beta_i^{-1}(b)), \qquad
f_{B_i}(b) = \frac{f_i(\beta_i^{-1}(b))}{\beta_i'(\beta_i^{-1}(b))} .
\]
Then
\[
h_{\beta_i}(x) = \psi_{F_{B_i}}(\beta_i(x)) = \beta_i(x) - \beta_i'(x)\,\frac{1 - F_i(x)}{f_i(x)} .
\]
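
As a worked illustration of Lemma 4.14 (the choice of U[0, 1] and of the strategy below is ours, made to match the motivating example), take F(x) = x, so that (1 − F(x))/f(x) = 1 − x, and consider β(x) = 1/(4(1 − x)) on [0, 1/2) and β(x) = x on [1/2, 1]:

\[
x < \tfrac12: \quad h_\beta(x) = \frac{1}{4(1-x)} - \frac{1}{4(1-x)^2}\,(1-x) = 0,
\qquad
x \ge \tfrac12: \quad h_\beta(x) = x - 1\cdot(1-x) = 2x - 1 .
\]

The virtual value exposed to the seller is therefore 0 below 1/2 and coincides with the truthful one above 1/2.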

The above results hold when β is increasing, continuous, and differentiable except
at finitely many points. The second lemma shows that for any function g, there exists a
function β such that hβ = g.

Lemma 4.15. Let X be a random variable with cdf F and pdf f, with f > 0 on the support
of F. Let x0 be in the support of X, C ∈ R and g : R → R. Define the function βg by
\[
\beta_g(x) = \frac{C\,(1 - F(x_0)) - \int_{x_0}^{x} g(u) f(u)\,du}{1 - F(x)} , \tag{4.9}
\]
then
\[
h_{\beta_g}(x) = g(x) \quad \text{and} \quad \beta_g(x_0) = C .
\]
Moreover, if for some t ∈ R such that x0 ≤ t, g is non-decreasing on [x0, t], then β_g'(x) ≥
(C − g(x))(1 − F(x0)) f(x)/(1 − F(x)) for x ∈ [x0, t]. Hence βg is increasing on [x0, t] if g is
non-decreasing and g < C.

Proof. The result follows by simply differentiating the expression for βg , and plugging-in
the expression for hβg obtained in Lemma 4.14. The result on the derivative is simple
algebra.
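
To make Lemma 4.15 concrete, here is a minimal numerical sketch of Equation (4.9). Everything specific in it is an assumption made for illustration: the function names `beta_from_h` and `h`, the midpoint quadrature, the finite-difference step, and the choice F = U[0, 1], x0 = 0, C = 0.25 with target g(x) = max(2x − 1, 0). With these choices the construction should return 0.25/(1 − x) below 1/2 and the truthful bid x above 1/2.

```python
def beta_from_h(g, F, f, x0, C, n_steps=2000):
    """Return beta_g from Eq. (4.9), using a midpoint rule for the integral."""
    def beta(x):
        step = (x - x0) / n_steps
        integral = sum(
            g(x0 + (k + 0.5) * step) * f(x0 + (k + 0.5) * step)
            for k in range(n_steps)
        ) * step
        return (C * (1.0 - F(x0)) - integral) / (1.0 - F(x))
    return beta


# Illustrative choice: U[0, 1] values and a virtual value thresholded at 0.
F = lambda x: x
f = lambda x: 1.0
g = lambda x: max(2.0 * x - 1.0, 0.0)

beta = beta_from_h(g, F, f, x0=0.0, C=0.25)


def h(beta, x, eps=1e-5):
    """h_beta(x) = beta(x) - beta'(x) (1 - F(x)) / f(x), via a central difference."""
    slope = (beta(x + eps) - beta(x - eps)) / (2.0 * eps)
    return beta(x) - slope * (1.0 - F(x)) / f(x)


low_bid = beta(0.2)      # 0.25 / (1 - 0.2): overbidding below the monopoly price
high_bid = beta(0.8)     # truthful bid above the monopoly price
```

Evaluating `h(beta, x)` at a few points gives back g(x) up to discretization error, which is the content of the lemma.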

The two technical lemmas 4.14 and 4.15 show that for any non-decreasing function g,
we can find a strategy βi such that the bid distribution induced by using βi on FXi verifies
ψBi (βi (x)) = g(x) for all x in the support of FXi .
We explained why sending to the seller a virtual value equal to zero when the initial one
was negative increases the bidder’s expected utility. To derive the corresponding bidding
strategy β from the virtual value, the strategic bidder only needs to solve the simple ODE
defined in Lemma 4.14.
This improvement of the bidder's utility does not depend on an estimate of the competition
and can thus easily be implemented in practice. We plot in Figure 4.3 the bidding
strategy β̃^{(0)}_{0.5} when the initial value distribution is U[0, 1], together with the virtual value
of the bid distribution induced by β̃^{(0)}_{0.5} on U[0, 1]. We recall that the monopoly price
corresponding to U[0, 1] is equal to 0.5. The strategy consists in overbidding below the
monopoly price of the initial value distribution. The strategic bidder is ready to increase
his payment pointwise when he wins auctions with low values in order to get a large
decrease of the reserve price (going from 0.5 to 0.25). Globally, the payment of the bidder
remains unchanged compared to when the bidder was bidding truthfully with a reserve
price equal to 0.5. Thresholding the virtual value at the monopoly price amounts to
overbidding below the monopoly price, effectively providing over the course of the auctions an
extra payment to the seller in exchange for lowering the reserve price/value faced by the
strategic bidder. This strategy unlocks a very substantial utility gain for the bidder.

[Figure 4.2, two panels. Left: "Thresholded strategy" (bid as a function of value x; curves: truthful, thresholded strategy). Right: "Virtual value of the bid distribution" (virtual value as a function of bid b; annotations: "reserve price = min bid = 0.25", "bid distribution support").]

Figure 4.2: The value distribution is U[0, 1]. Left: Thresholded strategy β̃^{(0)}_{0.5} compared to the traditional
truthful strategy. Right: virtual value of the bid distribution induced by the thresholded strategy. The
optimal reserve price of the thresholded strategy is equal to 0.25 (corresponding to a reserve value of 0)
whereas the reserve price of the truthful strategy is equal to 0.5 (corresponding to a reserve value of 0.5). The
green area represents the expected payment corresponding to the thresholded strategy (we assumed G = 1 for
the sake of clarity).

Naturally, a key question is to understand the impact of this new strategy on the utility
of the strategic bidder. We compare the situation with two bidders bidding truthfully
against an optimal reserve price and the new situation with one bidder using the
thresholded strategy and the second one bidding truthfully. We assume, as is standard in the
numerical examples of many textbooks and research papers, that their value distribution is U[0, 1].
Then, elementary computations show that in this specific illustrative example, the
utility of the strategic bidder increases by 57%, from 1/12 to 1/12 + (log(2) − 1/2)/4 ≈ 0.132, and
the welfare increases by 8%, from 7/12 to 7/12 + (log(2) − 1/2)/4 ≈ 0.632.
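
These "elementary computations" can be reproduced by evaluating Equation (4.7) numerically. The sketch below assumes two bidders with U[0, 1] values and a truthful opponent, so that G(b) = b on [0, 1]; the function names and the midpoint quadrature are illustrative choices, not taken from the original text.

```python
import math


def integrate(fn, a, b, n=20000):
    """Composite midpoint rule on [a, b]."""
    step = (b - a) / n
    return sum(fn(a + (k + 0.5) * step) for k in range(n)) * step


def G(b):
    """cdf of the bid of a truthful opponent with U[0, 1] values."""
    return min(max(b, 0.0), 1.0)


def beta_thresholded(x, r=0.5):
    """Thresholding at r for U[0, 1]: r(1 - r)/(1 - x) below r, truthful above."""
    return r * (1.0 - r) / (1.0 - x) if x < r else x


def h_thresholded(x, r=0.5):
    """Virtual value exposed to the seller: 0 below r, 2x - 1 above."""
    return 0.0 if x < r else 2.0 * x - 1.0


# Truthful bidding: reserve value 0.5 and h(x) = 2x - 1.
u_truthful = integrate(lambda x: (x - (2.0 * x - 1.0)) * G(x), 0.5, 1.0)

# Thresholded strategy: reserve value 0, so the expectation runs over [0, 1].
integrand = lambda x: (x - h_thresholded(x)) * G(beta_thresholded(x))
u_thresholded = integrate(integrand, 0.0, 0.5) + integrate(integrand, 0.5, 1.0)

closed_form = 1.0 / 12.0 + (math.log(2.0) - 0.5) / 4.0
relative_gain = u_thresholded / u_truthful - 1.0
```

Splitting the second integral at 0.5 keeps the kink of the strategy on a cell boundary, which makes the midpoint rule accurate on each piece.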

4.3.4 Best response for a known distribution of the competition

We now show, for a specific given distribution of the competition, what the optimal
increasing and regularity-preserving (RP) strategy is, in the sense of Definition 4.3.1. A direct
way to compute the expected utility of a bidding strategy βi when the seller is using a
second price auction with personalized reserve price and the other bidders are bidding
truthfully was introduced in Subsection 4.3.3. Indeed,
\[
U(\beta_i) = \mathbb{E}_{x \sim F_i}\Big( (x - h_{\beta_i}(x))\, G_i(\beta_i(x))\, \mathbf{1}\{x \ge x_{\beta_i}\} \Big) , \tag{4.10}
\]
with
\[
h_{\beta_i}(x) = \beta_i(x) - \beta_i'(x)\,\frac{1 - F_i(x)}{f_i(x)} \quad \text{and} \quad x_{\beta_i} = h_{\beta_i}^{-1}(0) .
\]
In this section, we assume that the bidder now has access to the distribution of the highest
bid of the competition, denoted by Gi, with associated pdf gi, and that he optimizes his
utility among the strategies with thresholded virtual values introduced in Subsection 4.3.3.

[Figure 4.3, three panels. Left: "Thresholded strategies" (bid b as a function of value x). Middle: "Virtual value of the bid distribution" (as a function of bid b). Right: "Monopoly revenue function of the reserve price" (as a function of reserve price r). Curves in each panel: truthful strategy, monopoly price thresholding, optimal R-P strategy.]

Figure 4.3: The value distribution is U[0, 1]. Left: Thresholded strategy compared to the traditional
truthful strategy. Middle: virtual value of the bid distribution induced by the thresholded strategy. Right:
monopoly revenue of the induced bid distribution as a function of the reserve price.

Definition 4.16. A bidding strategy β is thresholded if there exists r > 0 such that for all
x < r, hβ(x) = ψB(β(x)) = 0. This family of functions can be parametrized as
\[
\beta_r^{\gamma}(x) = \frac{\gamma(r)\,(1 - F(r))}{1 - F(x)}\,\mathbf{1}\{x < r\} + \gamma(x)\,\mathbf{1}\{x \ge r\} ,
\]
with r ∈ R and γ : R → R some continuous and increasing mapping.

This class of continuous bidding strategies has two degrees of freedom: the threshold r
such that for all x < r, hβ (x) = 0 and the strategy γ used beyond the threshold. We do not
restrict the functions γ that can be used beyond the threshold (beside being continuous
and increasing). All the strategies defined in this class have the property that their reserve
value is equal to zero, i.e., their reserve price is equal to their minimum bid, when the seller
is welfare benevolent and the virtual value of γ is positive beyond r. We can prove that the
optimal regularity-preserving strategy belongs to the class of thresholded strategies.
The following result states that there exists an optimal threshold r for the strategic
bidder that depends on the competition and that the optimal strategy to use for x > r is to
be truthful. It is derived by computing the directional derivatives of the utility function
defined in Equation (4.10).

Theorem 4.17. If Fi is regular, then the optimal increasing and regularity-preserving
strategy consists in thresholding at r* and bidding truthfully beyond r*, where r* is defined by
\[
G(r^*) = \mathbb{E}_{x \sim F_i}\left( \frac{x}{1 - F_i(x)}\; g\!\left( \frac{r^*\,(1 - F_i(r^*))}{1 - F_i(x)} \right) \mathbf{1}\{x \le r^*\} \right) ,
\]
where G is the distribution of the largest bid of the competition and g is its density.

In Theorem 4.13, we proved that when the strategic bidder does not know the distri-
bution of the highest bid of the competition, he can use the thresholded strategy at his
monopoly price and increases his utility compared to truthful bidding. Theorem 4.17 gives
the optimal threshold when the strategic bidder knows G.
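
As a hedged illustration of how the condition of Theorem 4.17 can be solved in practice, the sketch below treats the case of one truthful opponent with U[0, 1] values, so that G(b) = b and g = 1 on [0, 1]. The quadrature, the bisection routine, the bracketing interval [0.55, 0.95] and all function names are assumptions made for this example only.

```python
def integrate(fn, a, b, n=20000):
    """Composite midpoint rule on [a, b]."""
    step = (b - a) / n
    return sum(fn(a + (k + 0.5) * step) for k in range(n)) * step


# Strategic bidder: U[0, 1] values. Competition: one truthful U[0, 1] bidder.
F = lambda x: x
f = lambda x: 1.0
G = lambda b: min(max(b, 0.0), 1.0)
g = lambda b: 1.0 if 0.0 <= b <= 1.0 else 0.0


def first_order_gap(r):
    """Left-hand side minus right-hand side of the condition in Theorem 4.17."""
    def integrand(x):
        bid = r * (1.0 - F(r)) / (1.0 - F(x))
        return x / (1.0 - F(x)) * g(bid) * f(x)
    return G(r) - integrate(integrand, 0.0, r)


def bisect(fn, lo, hi, tol=1e-8):
    """Plain bisection; assumes fn(lo) and fn(hi) have opposite signs."""
    f_lo = fn(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = fn(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


r_star = bisect(first_order_gap, 0.55, 0.95)
min_bid = r_star * (1.0 - F(r_star))   # reserve price faced by the strategic bidder

# Utility from Eq. (4.10): h = 0 below r_star, truthful virtual value above.
utility = (
    integrate(lambda x: x * G(min_bid / (1.0 - F(x))) * f(x), 0.0, r_star)
    + integrate(lambda x: (1.0 - F(x)) * G(x), r_star, 1.0)
)
```

Under these assumptions the threshold, reserve price and utility should land close to the values reported for K=2 in Table 4.1 (r* near 0.8, reserve price near 0.162, utility near 0.147).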

Some numerical results. We consider the situation where we have 1 strategic bidder and
1 non-strategic one, both with value distribution U[0, 1]. The strategy introduced in
Subsection 4.3.3 was to bid truthfully beyond the monopoly price (r = 0.5 here) and to use
the thresholding of Theorem 4.13 below it. This strategy yields a utility of 0.1316, a
57% increase over the utility of standard truthful bidding. The optimal strategy coming out
of Theorem 4.17 consists in bidding truthfully beyond r ≈ 0.8 and using the thresholding
completion below it. The utility is then around 0.1468, a 76% increase in bidder
utility compared to bidding truthfully (truthful bidding yields a utility of 1/12 ≈ 0.083).
This second strategy yields a higher utility for the strategic bidder but requires some
knowledge of the competition. The optimal strategy in Theorem 4.17 overbids on small
values, underbids on intermediate values and is truthful on high values. We also recover
that, with no competition, the optimal strategy is to bid zero for any possible valuation.
In Table 4.1, we also notice that the difference in utility decreases with the number of
players since, with increasing competition, the strategic bidder cannot lower his bid for
values above his monopoly price.

4.3.5 Nash equilibrium

In the previous section, only one bidder was strategic, and the other bidders did not directly
react to this strategy. It could be a reasonable assumption in practice since the number of
bidders able to implement sophisticated bidding strategies appears to be limited, but we
still investigate the case where all bidders are strategic. For simplicity, we shall assume
that they are symmetric, with the same value distribution F. We consider a large class of
admissible bidders strategies: the large set of all thresholded bidding strategies introduced
in Definition 4.16. If bidders only use strategies from this set, then there exists a unique
Nash equilibrium, and their utility is the same as in a second price auction without reserve
price.
Theorem 4.18 (Tang and Zeng, 2018; Nedelec et al., 2019a). Assume that bidders are
symmetric, with a value distribution supported on [0, 1], with a continuous positive
density at 0 and 1, and such that the virtual value equals 0 exactly once and is positive
beyond. Then there exists a unique symmetric Nash equilibrium in the class of thresholded
bidding strategies, which can be computed by solving
\[
\frac{n-1}{r^*\,(1 - F(r^*))}\; \mathbb{E}_{x \sim F}\Big( x\, F^{n-2}(x)\,(1 - F(x))\, \mathbf{1}\{x \le r^*\} \Big) = F^{n-1}(r^*) \tag{4.11}
\]
to determine the common reserve price r*. Moreover, at this Nash equilibrium, the revenue
of the seller and the utilities of the buyers are the same as in a second price auction without
reserve prices.
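
A small numerical sketch of Equation (4.11), under the assumption (made here for illustration) that F = U[0, 1]. In that case the integral can be carried out by hand and gives r* = (n + 1)/(2n); this closed form is our own derivation from (4.11), not a statement of the original text, and the code only checks it numerically. The helper names, the quadrature and the bracketing interval are likewise illustrative.

```python
def integrate(fn, a, b, n=5000):
    """Composite midpoint rule on [a, b]."""
    step = (b - a) / n
    return sum(fn(a + (k + 0.5) * step) for k in range(n)) * step


def equilibrium_gap(r, n_bidders):
    """Left-hand side minus right-hand side of Eq. (4.11) for F = U[0, 1]."""
    expectation = integrate(
        lambda x: x * x ** (n_bidders - 2) * (1.0 - x), 0.0, r
    )
    lhs = (n_bidders - 1) / (r * (1.0 - r)) * expectation
    return lhs - r ** (n_bidders - 1)


def bisect(fn, lo, hi, tol=1e-9):
    """Plain bisection; assumes fn(lo) and fn(hi) have opposite signs."""
    f_lo = fn(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = fn(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


thresholds = {
    n: bisect(lambda r, n=n: equilibrium_gap(r, n), 0.05, 0.95)
    for n in (2, 3, 4)
}

# Two symmetric bidders at equilibrium: G(beta(x)) = F(x) = x, h = 0 below r*.
r2 = thresholds[2]
u_equilibrium = (
    integrate(lambda x: x * x, 0.0, r2)
    + integrate(lambda x: (1.0 - x) * x, r2, 1.0)
)
```

For two bidders the equilibrium utility computed this way matches 1/6, the utility of a bidder in a second price auction without reserve price with U[0, 1] values, in line with the last statement of the theorem.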
With appropriate shading functions, the bidders can recover the utility they would get
if the seller were not optimizing her mechanism to maximize her revenue. The fact that
at the symmetric equilibrium bidders recover the same utility as in a second price auction with
no reserves arguably makes it an even more natural class of bidding strategies to consider
from the bidder's standpoint.

4.3.6 Perturbation analysis for the Myerson auction

The preceding results can be extended beyond the lazy second price auction, to the Myerson
auction (see, e.g., Nedelec et al., 2019b, for more details). The Nash equilibrium has a
relatively simple form, given in the following Theorem 4.19.

Theorem 4.19 (Tang and Zeng, 2018; Abeille et al., 2018). In the Myerson auction, the
symmetric equilibrium strategy βeq satisfies
\[
\beta_{eq}(x) + \beta_{eq}'(x)\,(\psi(x) - x) = \beta^{I}(x) ,
\]
where β^I(x) is the symmetric equilibrium strategy in a first price auction with no reserve
price. A solution of this equation is
\[
\beta_{eq}(x) = \mathbb{E}_{X \sim F}\big[ \beta^{I}(X) \mid X \ge x \big] .
\]
At the equilibrium, the bidders’ expected utilities are the same as in a first price auction
without reserve price; in particular, they are strictly greater than their expected payoffs had
they bid truthfully.
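
As a worked illustration (the uniform case is our choice), take n symmetric bidders with values U[0, 1]. The symmetric first price equilibrium without reserve is then the standard β^I(x) = (n − 1)x/n, and ψ(x) − x = −(1 − x), so that

\[
\beta_{eq}(x) = \mathbb{E}\big[ \beta^{I}(X) \mid X \ge x \big] = \frac{n-1}{n}\cdot\frac{1+x}{2},
\qquad
\beta_{eq}(x) + \beta_{eq}'(x)\,(\psi(x) - x) = \frac{n-1}{2n}\,(1+x) - \frac{n-1}{2n}\,(1-x) = \frac{n-1}{n}\,x = \beta^{I}(x) .
\]
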

Discussion. The intuition behind this result is quite clear. In the Myerson auction, the
expected utility of a bidder is the same as in a first price auction where his bids have been
transformed through his virtual value function. We call the corresponding pseudo-bids
“virtualized” bids. Hence, if the bidders can bid in such a way that their virtualized bids are
equal to their symmetric equilibrium first price bids, the situation is completely equivalent
to a first price auction. Their equilibrium strategy in virtualized bid space
should therefore be the strategy they use in a standard first price auction with no reserve price.

4.3.7 Approximations of the Myerson auction via numerical methods

This new variational approach unlocks, through numerical optimization, a method to
find best responses to most approximations of the Myerson auction, such as boosted second
price auctions (Nedelec et al., 2019b). A straightforward optimization can fail because the
objective is discontinuous as a function of the bidding strategy. To circumvent this issue,
a new relaxation of the problem, which is stable to local perturbations of the objective
function and computationally tractable, has been introduced. This new objective
can be numerically optimized through a simple neural network, with very significant
improvements in bidder utility compared to truthful bidding. This simple approach can
be plugged into any modern bidding algorithm that learns the distribution of the highest bid of
the competition, and it has also been tested on other classes of mechanisms without any
known closed-form optimal bidding strategies.
The major and prohibitive drawback of these approaches is that they require that strate-
gic bidders perfectly know the underlying mechanism design problem (i.e., the revenue
maximization problem) solved by the seller, leading to a strong asymmetry between the
bidders and the seller, this time in favor of the former.
It is nonetheless possible to remove the prior knowledge on the exact algorithmic proce-
dure used by the seller to optimize her mechanism by a classical exploration/exploitation
trade-off, inspired by reinforcement learning techniques, thus reducing this asymmetry
(Nedelec et al., 2021).

References
Abeille, M., C. Calauzènes, N. E. Karoui, T. Nedelec, and V. Perchet. 2018. “Explicit shading
strategies for repeated truthful auctions”. In: arXiv preprint arXiv:1805.00256.
Agrawal, S., C. Daskalakis, V. S. Mirrokni, and B. Sivan. 2018. “Robust Repeated Auctions
under Heterogeneous Buyer Behavior”. In: Proceedings of the 2018 ACM Conference on
Economics and Computation. 171–171.
Amin, K., A. Rostamizadeh, and U. Syed. 2013. “Learning prices for repeated auctions
with strategic buyers”. In: Proceedings of the 26th International Conference on Neural
Information Processing Systems-Volume 1. 1169–1177.
Amin, K., A. Rostamizadeh, and U. Syed. 2014. “Repeated contextual auctions with strate-
gic buyers”. In: Proceedings of the 27th International Conference on Neural Information
Processing Systems-Volume 1. 622–630.
Archer, A. and É. Tardos. 2001. “Truthful mechanisms for one-parameter agents”. In:
Proceedings 2001 IEEE International Conference on Cluster Computing. IEEE.
Arlotto, A. and I. Gurvich. 2019. “Uniformly Bounded Regret in the Multisecretary Prob-
lem”. Stochastic Systems. 9(3): 231–260.
Ashlagi, I., C. Daskalakis, and N. Haghpanah. 2016. “Sequential mechanisms with ex-post
participation guarantees”. In: Proceedings of the 2016 ACM Conference on Economics and
Computation. 213–214.
Aström, K. J. and R. M. Murray. 2008. Feedback Systems: An Introduction for Scientists and
Engineers. Princeton University Press.

120
Babaioff, M., R. D. Kleinberg, and A. Slivkins. 2010. “Truthful Mechanisms with Im-
plicit Payment Computation”. In: Proceedings of the 11th ACM Conference on Electronic
Commerce. EC ’10. Cambridge, Massachusetts, USA: Association for Computing Ma-
chinery. 43?52. isbn: 9781605588223. doi: 10 . 1145 / 1807342. 1807349. url: https :
//doi.org/10.1145/1807342.1807349.
Babaioff, M., Y. Sharma, and A. Slivkins. 2014. “Characterizing Truthful Multi-armed
Bandit Mechanisms”. SIAM Journal on Computing. 43(1): 194–230. doi: 10 . 1137 /
120878768.
Balseiro, S. R., O. Besbes, and G. Y. Weintraub. 2015. “Repeated auctions with budgets in
ad exchanges: Approximations and design”. Management Science. 61(4): 864–884.
Balseiro, S. R. and Y. Gur. 2019. “Learning in repeated auctions with budgets: Regret
minimization and equilibrium”. Management Science. 65(9): 3952–3968.
Balseiro, S. R., V. S. Mirrokni, and R. P. Leme. 2018. “Dynamic mechanisms with martingale
utilities”. Management Science. 64(11): 5062–5082.
Braverman, M., J. Mao, J. Schneider, and M. Weinberg. 2018. “Selling to a no-regret buyer”.
In: Proceedings of the 2018 ACM Conference on Economics and Computation. 523–538.
Celis, L. E., G. Lewis, M. Mobius, and H. Nazerzadeh. 2014. “Buy-it-now or take-a-chance:
Price discrimination through randomized auctions”. Management Science. 60(12): 2927–
2948.
Choi, H., C. F. Mela, S. R. Balseiro, and A. Leary. 2020. “Online display advertising markets:
A literature review and future directions”. Information Systems Research. 31(2): 556–
575.
Choi, H. and C. F. . Mela. 2018. “Display advertising pricing in exchange markets”. Working
paper.
Ciocan, D. F. and V. Farias. 2012. “Model Predictive Control for Dynamic Resource Alloca-
tion”. Mathematics of Operations Research.
Cremer, J. and R. P. McLean. 1988. “Full extraction of the surplus in Bayesian and dominant
strategy auctions”. In: Econometrica: Journal of the Econometric Society. JSTOR. 1247–
1257.
Deng, Y., J. Schneider, and B. Sivan. 2019. “Prior-Free Dynamic Auctions with Low Regret
Buyers”. In: Advances in Neural Information Processing Systems. 4804–4814.
Devanur, N. R. and S. M. Kakade. 2009. “The Price of Truthfulness for Pay-per-Click
Auctions”. In: Proceedings of the 10th ACM Conference on Electronic Commerce. EC
’09. Stanford, California, USA: Association for Computing Machinery. 99?106. isbn:
9781605584584. doi: 10.1145/1566374.1566388. url: https://doi.org/10.1145/
1566374.1566388.
Dudley, R. M. 2014. Uniform Central Limit Theorems. Cambridge Studies in Advanced Mathe-
matics. Cambridge University Press.

121
Dütting, P., Z. Feng, H. Narasimhan, D. Parkes, and S. S. Ravindranath. 2019. “Optimal
auctions through deep learning”. In: International Conference on Machine Learning.
PMLR. 1706–1715.
Epasto, A., M. Mahdian, V. Mirrokni, and S. Zuo. 2018. “Incentive-aware learning for large
markets”. In: Proceedings of the 2018 World Wide Web Conference. 1369–1378.
Feng, Z., C. Podimata, and V. Syrgkanis. 2018b. “Learning to bid without knowing your
value”. In: Proceedings of the 2018 ACM Conference on Economics and Computation. 505–
522.
Fernandez-Tapia, J. 2015. “An analytical solution to the budget-pacing problem in pro-
grammatic advertising”. Journal of Information and Optimization Sciences. 40.
Fernandez-Tapia, J., O. Guéant, and J.-M. Lasry. 2016. “Optimal Real-Time Bidding Strate-
gies”. Applied Mathematics Research eXpress.
Ghosh, A., B. I. Rubinstein, S. Vassilvitskii, and M. Zinkevich. 2009. “Adaptive Bidding for
Display Advertising”. In: Proceedings of the 18th International Conference on World Wide
Web. WWW ’09. 251–260.
Golrezaei, N., A. Javanmard, and V. Mirrokni. 2021. “Dynamic incentive-aware learning:
Robust pricing in contextual auctions”. Operations Research. 69(1): 297–314.
Gummadi, R., P. Key, and A. Proutiere. 2012. Optimal Bidding Strategies and Equilibria in
Dynamic Auctions with Budget Constraints.
Kandasamy, K., J. E. Gonzalez, M. I. Jordan, and I. Stoica. 2020. “Mechanism Design with
Bandit Feedback”. arXiv: 2004.08924 [stat.ML].
Kanoria, Y. and H. Nazerzadeh. 2014. “Dynamic Reserve Prices for Repeated Auctions:
Learning from Bids”. In: Web and Internet Economics: 10th International Conference.
Vol. 8877. Springer. 232.
Lecué, G. and M. Lerasle. 2020. “Robust machine learning by median-of-means: theory
and practice”. The Annals of Statistics. 48(2): 906–931.
Lee, K.-C., A. Jalali, and A. Dasdan. 2013. “Real time bid optimization with smooth budget
delivery in online advertising.” Proceedings of the Seventh International Workshop on
Data Mining for Online Advertising.
Mirrokni, V. S., R. P. Leme, P. Tang, and S. Zuo. 2016. “Dynamic Auctions with Bank
Accounts.” In: Proceedings of IJCAI. 387–393.
Mohri, M. and A. M. Medina. 2015. “Revenue optimization against strategic buyers”.
Advances in Neural Information Processing Systems. 2015: 2530–2538.
Nazerzadeh, H., A. Saberi, and R. Vohra. 2008. “Dynamic Cost-per-Action Mechanisms and
Applications to Online Advertising”. In: Proceedings of the 17th International Conference
on World Wide Web. WWW ’08. Beijing, China: Association for Computing Machinery.
179?188. isbn: 9781605580852. doi: 10.1145/1367497.1367522. url: https://doi.org/
10.1145/1367497.1367522.

122
Nedelec, T., M. Abeille, C. Calauzènes, N. E. Karoui, B. Heymann, and V. Perchet. 2019a.
“Thresholding at the monopoly price: an agnostic way to improve bidding strategies in
revenue-maximizing auctions”. In: The Workshop on Learning in the Presence of Strategic
Behavior, EC.
Nedelec, T., J. Baudet, V. Perchet, and N. E. Karoui. 2021. “Adversarial learning for revenue-
maximizing auctions”. In: 20th International Conference on Autonomous Agents and
Multiagent Systems.
Nedelec, T., N. El Karoui, and V. Perchet. 2019b. “Learning to bid in revenue-maximizing
auctions”. In: International Conference on Machine Learning. PMLR. 4781–4789.
Nekipelov, D., V. Syrgkanis, and E. Tardos. 2015. “Econometrics for learning agents”. In:
Proceedings of the Sixteenth ACM Conference on Economics and Computation. 1–18.
Perchet, V. and P. Rigollet. 2013. “The multi-armed bandit problem with covariates”. The
Annals of Statistics. 41(2): 693–721.
Shalev-Shwartz, S. and S. Ben-David. 2014. Understanding Machine Learning: From Theory
to Algorithms. Cambridge University Press.
Tang, P. and Y. Zeng. 2018. “The price of prior dependence in auctions”. In: Proceedings of
the 2018 ACM Conference on Economics and Computation. 485–502.
Weed, J., V. Perchet, and P. Rigollet. 2016. “Online learning in repeated auctions”. In:
Conference on Learning Theory. PMLR. 1562–1583.
Xu, J., K.-c. Lee, W. Li, H. Qi, and Q. Lu. 2015. “Smart pacing for effective online ad cam-
paign optimization”. In: Proceedings of the 21th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining. 2217–2226.
Yuan, S., J. Wang, and X. Zhao. 2013. “Real-time bidding for online advertising: measure-
ment and analysis”. Proceedings of the Seventh International Workshop on Data Mining
for Online Advertising.

123
Bibliography
Abeille, M., C. Calauzènes, N. E. Karoui, T. Nedelec, and V. Perchet. 2018. “Explicit shading
strategies for repeated truthful auctions”. In: arXiv preprint arXiv:1805.00256.
Agrawal, S., C. Daskalakis, V. S. Mirrokni, and B. Sivan. 2018. “Robust Repeated Auctions
under Heterogeneous Buyer Behavior”. In: Proceedings of the 2018 ACM Conference on
Economics and Computation. 171–171.
Albert, M., V. Conitzer, and P. Stone. 2017. “Automated design of robust mechanisms”. In:
Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 31. No. 1.
Allouah, A. and O. Besbes. 2020. “Prior-independent optimal auctions”. Management
Science. 66(10): 4417–4432.
Amin, K., A. Rostamizadeh, and U. Syed. 2013. “Learning prices for repeated auctions
with strategic buyers”. In: Proceedings of the 26th International Conference on Neural
Information Processing Systems-Volume 1. 1169–1177.
Amin, K., A. Rostamizadeh, and U. Syed. 2014. “Repeated contextual auctions with strate-
gic buyers”. In: Proceedings of the 27th International Conference on Neural Information
Processing Systems-Volume 1. 622–630.
Archer, A. and É. Tardos. 2001. “Truthful mechanisms for one-parameter agents”. In:
Proceedings 2001 IEEE International Conference on Cluster Computing. IEEE.
Arlotto, A. and I. Gurvich. 2019. “Uniformly Bounded Regret in the Multisecretary Prob-
lem”. Stochastic Systems. 9(3): 231–260.
Armstrong, M. 1996. “Multiproduct nonlinear pricing”. Econometrica: Journal of the Econo-
metric Society: 51–75.
Arnosti, N., M. Beck, and P. Milgrom. 2016. “Adverse selection and auction design for
internet display advertising”. American Economic Review. 106(10): 2852–66.
Ashlagi, I., C. Daskalakis, and N. Haghpanah. 2016. “Sequential mechanisms with ex-post
participation guarantees”. In: Proceedings of the 2016 ACM Conference on Economics and
Computation. 213–214.
Åström, K. J. and R. M. Murray. 2008. Feedback Systems: An Introduction for Scientists and
Engineers. Princeton University Press.
Athey, S. and P. A. Haile. 2007. “Chapter 60 Nonparametric Approaches to Auctions”. In:
Handbook of Econometrics.
Audibert, J.-Y. and S. Bubeck. 2009. “Minimax policies for adversarial and stochastic
bandits”. In: Proceedings of COLT.
Babaioff, M., R. Kleinberg, and A. Slivkins. 2013. “Multi-Parameter Mechanisms with
Implicit Payment Computation”. In: Proceedings of the Fourteenth ACM Conference on
Electronic Commerce. EC ’13. Philadelphia, Pennsylvania, USA: Association for
Computing Machinery. 35–52. isbn: 9781450319621. doi: 10.1145/2482540.2482602. url:
https://doi.org/10.1145/2482540.2482602.
Babaioff, M., R. D. Kleinberg, and A. Slivkins. 2010. “Truthful Mechanisms with
Implicit Payment Computation”. In: Proceedings of the 11th ACM Conference on Electronic
Commerce. EC ’10. Cambridge, Massachusetts, USA: Association for Computing
Machinery. 43–52. isbn: 9781605588223. doi: 10.1145/1807342.1807349. url:
https://doi.org/10.1145/1807342.1807349.
Babaioff, M., Y. Sharma, and A. Slivkins. 2014. “Characterizing Truthful Multi-armed
Bandit Mechanisms”. SIAM Journal on Computing. 43(1): 194–230. doi:
10.1137/120878768.
Balcan, M.-F., A. Blum, J. D. Hartline, and Y. Mansour. 2008. “Reducing mechanism design
to algorithm design via machine learning”. Journal of Computer and System Sciences.
74(8): 1245–1270.
Balseiro, S. R., O. Besbes, and G. Y. Weintraub. 2015. “Repeated auctions with budgets in
ad exchanges: Approximations and design”. Management Science. 61(4): 864–884.
Balseiro, S. R., O. Candogan, and H. Gurkan. 2020. “Multistage Intermediation in Display
Advertising”. Manufacturing & Service Operations Management.
Balseiro, S. R. and Y. Gur. 2019. “Learning in repeated auctions with budgets: Regret
minimization and equilibrium”. Management Science. 65(9): 3952–3968.
Balseiro, S. R., V. S. Mirrokni, and R. P. Leme. 2018. “Dynamic mechanisms with martingale
utilities”. Management Science. 64(11): 5062–5082.
Bar-Yossef, Z., K. Hildrum, and F. Wu. 2002. “Incentive-compatible online auctions for
digital goods.” In: SODA. Vol. 2. 964–970.
Bartlett, P. L., S. Boucheron, and G. Lugosi. 2002. “Model selection and error estimation”.
Machine Learning. 48(1-3): 85–113.
Blum, A., V. Kumar, A. Rudra, and F. Wu. 2004. “Online learning in online auctions”.
Theoretical Computer Science. 324(2-3): 137–146.
Boyd, S. and L. Vandenberghe. 2004. Convex Optimization. USA: Cambridge University
Press. isbn: 0521833787.
Braverman, M., J. Mao, J. Schneider, and M. Weinberg. 2018. “Selling to a no-regret buyer”.
In: Proceedings of the 2018 ACM Conference on Economics and Computation. 523–538.
Bubeck, S. and N. Cesa-Bianchi. 2012. “Regret Analysis of Stochastic and Nonstochastic
Multi-armed Bandit Problems”. In: Machine Learning. Vol. 5. No. 1. 1–122.
Bubeck, S., N. R. Devanur, Z. Huang, and R. Niazadeh. 2017. “Multi-scale Online Learning
and its Applications to Online Auctions”. Proceedings of the Eighteenth ACM Conference
on Economics and Computation.
Bulow, J. and P. Klemperer. 1996. “Auctions Versus Negotiations”. The American Economic
Review. 86(1): 180–194.
Celis, L. E., G. Lewis, M. Mobius, and H. Nazerzadeh. 2014. “Buy-it-now or take-a-chance:
Price discrimination through randomized auctions”. Management Science. 60(12): 2927–
2948.

Cesa-Bianchi, N., T. Cesari, and V. Perchet. 2019. “Dynamic pricing with finitely many
unknown valuations”. In: Algorithmic Learning Theory. PMLR. 247–273.
Cesa-Bianchi, N., C. Gentile, and Y. Mansour. 2014. “Regret minimization for reserve prices
in second-price auctions”. IEEE Transactions on Information Theory. 61(1): 549–564.
Choi, H., C. F. Mela, S. R. Balseiro, and A. Leary. 2020. “Online display advertising markets:
A literature review and future directions”. Information Systems Research. 31(2): 556–
575.
Choi, H. and C. F. Mela. 2018. “Display advertising pricing in exchange markets”. Working
paper.
Ciocan, D. F. and V. Farias. 2012. “Model Predictive Control for Dynamic Resource Alloca-
tion”. Mathematics of Operations Research.
Cole, R. and T. Roughgarden. 2014a. “The sample complexity of revenue maximization”. In:
Proceedings of the forty-sixth annual ACM symposium on Theory of computing. 243–252.
Cole, R. and T. Roughgarden. 2014b. “The sample complexity of revenue maximization”.
In: Proceedings of the forty-sixth annual ACM symposium on Theory of computing. 243–
252.
Conitzer, V. and T. Sandholm. 2002. “Complexity of mechanism design”. In: Proceedings of
the Eighteenth conference on Uncertainty in artificial intelligence. 103–110.
Cremer, J. and R. P. McLean. 1988. “Full extraction of the surplus in Bayesian and dominant
strategy auctions”. In: Econometrica: Journal of the Econometric Society. JSTOR. 1247–
1257.
Daskalakis, C., A. Deckelbaum, and C. Tzamos. 2013. “Mechanism design via optimal
transport”. In: Proceedings of the fourteenth ACM conference on Electronic commerce.
269–286.
Degenne, R. and V. Perchet. 2016. “Anytime optimal algorithms in stochastic multi-armed
bandits”. In: International Conference on Machine Learning. 1587–1595.
Deng, Y., J. Schneider, and B. Sivan. 2019. “Prior-Free Dynamic Auctions with Low Regret
Buyers”. In: Advances in Neural Information Processing Systems. 4804–4814.
Devanur, N. R., Z. Huang, and C.-A. Psomas. 2016. “The sample complexity of auctions
with side information”. In: Proceedings of the forty-eighth annual ACM symposium on
Theory of Computing. 426–439.
Devanur, N. R. and S. M. Kakade. 2009. “The Price of Truthfulness for Pay-per-Click
Auctions”. In: Proceedings of the 10th ACM Conference on Electronic Commerce. EC
’09. Stanford, California, USA: Association for Computing Machinery. 99–106. isbn:
9781605584584. doi: 10.1145/1566374.1566388. url:
https://doi.org/10.1145/1566374.1566388.
Dhangwatnotai, P., T. Roughgarden, and Q. Yan. 2015. “Revenue maximization with a
single sample”. Games and Economic Behavior. 91: 318–333.

Drutsa, A. 2020. “Reserve pricing in repeated second-price auctions with strategic bidders”.
In: International Conference on Machine Learning. PMLR. 2678–2689.
Dudley, R. M. 2014. Uniform Central Limit Theorems. Cambridge Studies in Advanced Mathe-
matics. Cambridge University Press.
Dütting, P., Z. Feng, H. Narasimhan, D. Parkes, and S. S. Ravindranath. 2019. “Optimal
auctions through deep learning”. In: International Conference on Machine Learning.
PMLR. 1706–1715.
Elkind, E. 2007. “Designing and learning optimal finite support auctions”. In: Proceedings
of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. 736–745.
Epasto, A., M. Mahdian, V. Mirrokni, and S. Zuo. 2018. “Incentive-aware learning for large
markets”. In: Proceedings of the 2018 World Wide Web Conference. 1369–1378.
Feng, Z., S. Lahaie, J. Schneider, and J. Ye. 2021. “Reserve Price Optimization for First
Price Auctions in Display Advertising”. International Conference on Machine Learning:
3230–3239.
Feng, Z., H. Narasimhan, and D. C. Parkes. 2018a. “Deep learning for revenue-optimal auc-
tions with budgets”. In: Proceedings of the 17th International Conference on Autonomous
Agents and Multiagent Systems. 354–362.
Feng, Z., C. Podimata, and V. Syrgkanis. 2018b. “Learning to bid without knowing your
value”. In: Proceedings of the 2018 ACM Conference on Economics and Computation. 505–
522.
Fernandez-Tapia, J. 2015. “An analytical solution to the budget-pacing problem in pro-
grammatic advertising”. Journal of Information and Optimization Sciences. 40.
Fernandez-Tapia, J., O. Guéant, and J.-M. Lasry. 2016. “Optimal Real-Time Bidding Strate-
gies”. Applied Mathematics Research eXpress.
Fibich, G. and A. Gavious. 2003. “Asymmetric First-Price Auctions: A Perturbation Ap-
proach”. Mathematics of Operations Research. 28(4): 836–852.
Fibich, G. and N. Gavish. 2012. “Asymmetric First-Price Auctions—A Dynamical-Systems
Approach”. Mathematics of Operations Research. 37(2): 219–243.
Fu, H. 2013. “VCG auctions with reserve prices: Lazy or eager”. In: Proceedings of the
Fourteenth ACM Conference on Economics and Computation.
Fu, H. 2016. “Notes on Myerson’s Revenue Optimal Mechanisms”.
http://fuhuthu.com/notes/iron.pdf. Accessed: 2021-08-25.
Fu, H., N. Immorlica, B. Lucier, and P. Strack. 2015. “Randomization beats second price
as a prior-independent auction”. In: Proceedings of the Sixteenth ACM Conference on
Economics and Computation. 323–323.
Gayle, W.-R. and J. F. Richard. 2008. “Numerical Solutions of Asymmetric, First-Price,
Independent Private Values Auctions”. Computational Economics. 32(3).

Ghosh, A., B. I. Rubinstein, S. Vassilvitskii, and M. Zinkevich. 2009. “Adaptive Bidding for
Display Advertising”. In: Proceedings of the 18th International Conference on World Wide
Web. WWW ’09. 251–260.
Golowich, N., H. Narasimhan, and D. C. Parkes. 2018. “Deep learning for multi-facility
location mechanism design”. In: Proceedings of the 27th International Joint Conference on
Artificial Intelligence. 261–267.
Golrezaei, N., M. Lin, V. Mirrokni, and H. Nazerzadeh. 2017. “Boosted Second-price
Auctions for Heterogeneous Bidders”. In: Management Science.
Golrezaei, N., A. Javanmard, and V. Mirrokni. 2021. “Dynamic incentive-aware learning:
Robust pricing in contextual auctions”. Operations Research. 69(1): 297–314.
Gonczarowski, Y. A. and N. Nisan. 2017. “Efficient empirical revenue maximization
in single-parameter auction environments”. In: Proceedings of the 49th Annual ACM
SIGACT Symposium on Theory of Computing.
Groeneboom, P. and G. Jongbloed. 2014. Nonparametric Estimation under Shape Constraints.
Cambridge University Press.
Guerre, E., I. Perrigne, and Q. Vuong. 2000. “Optimal Nonparametric Estimation of First-
price Auctions”. Econometrica. 68(3): 525–574.
Gummadi, R., P. Key, and A. Proutiere. 2012. Optimal Bidding Strategies and Equilibria in
Dynamic Auctions with Budget Constraints.
Guo, C., Z. Huang, and X. Zhang. 2019. “Settling the sample complexity of single-
parameter revenue maximization”. In: Proceedings of the 51st Annual ACM SIGACT
Symposium on Theory of Computing.
Hartline, J., A. Johnsen, and Y. Li. 2020. “Benchmark design and prior-independent op-
timization”. 2020 IEEE 61st Annual Symposium on Foundations of Computer Science
(FOCS): 294–305.
Hartline, J. D. et al. 2013. “Bayesian mechanism design”. Foundations and Trends® in
Theoretical Computer Science. 8(3): 143–263.
Hartline, J. D. and T. Roughgarden. 2009. “Simple versus optimal mechanisms”. In: Pro-
ceedings of the 10th ACM conference on Electronic commerce. 225–234.
Haussler, D. 1992. “Decision theoretic generalizations of the PAC model for neural net and
other learning applications”. In: Information and computation.
Hiriart-Urruty, J.-B. and C. Lemaréchal. 2001. Fundamentals of Convex Analysis. isbn: 978-
3-540-42205-1. doi: 10.1007/978-3-642-56468-0.
Huang, Z., Y. Mansour, and T. Roughgarden. 2018. “Making the most of your samples”.
SIAM Journal on Computing. 47(3): 651–674.
Kandasamy, K., J. E. Gonzalez, M. I. Jordan, and I. Stoica. 2020. “Mechanism Design with
Bandit Feedback”. arXiv: 2004.08924 [stat.ML].

Kanoria, Y. and H. Nazerzadeh. 2014. “Dynamic Reserve Prices for Repeated Auctions:
Learning from Bids”. In: Web and Internet Economics: 10th International Conference.
Vol. 8877. Springer. 232.
Kirkegaard, R. 2009. “Asymmetric first price auctions”. Journal of Economic Theory. 144(4):
1617–1635. issn: 0022-0531.
Kleinberg, R. and T. Leighton. 2003. “The value of knowing a demand curve: Bounds on re-
gret for online posted-price auctions”. In: 44th Annual IEEE Symposium on Foundations
of Computer Science, 2003. Proceedings. IEEE. 594–605.
Koltchinskii, V., D. Panchenko, et al. 2002. “Empirical margin distributions and bounding
the generalization error of combined classifiers”. The Annals of Statistics. 30(1): 1–50.
Kotowski, M. H. 2018. “On asymmetric reserve prices”. Theoretical Economics. 13(1): 205–
237.
Krishna, V. 2009. Auction Theory.
Lattimore, T. and C. Szepesvári. 2020. Bandit algorithms. Cambridge University Press.
Lavi, R. and N. Nisan. 2004. “Competitive analysis of incentive compatible on-line auc-
tions”. Theoretical Computer Science. 310(1-3): 159–180.
Le Thi, H. A., V. N. Huynh, and T. P. Dinh. 2014. “DC Programming and DCA for General
DC Programs”. In: Advanced Computational Methods for Knowledge Engineering. Ed. by
T. van Do, H. A. L. Thi, and N. T. Nguyen. Cham: Springer International Publishing.
15–35.
Lebrun, B. 1999. “First Price Auctions in the Asymmetric N Bidder Case”. International
Economic Review. (1).
Lecué, G. and M. Lerasle. 2020. “Robust machine learning by median-of-means: theory
and practice”. The Annals of Statistics. 48(2): 906–931.
Lee, K.-C., A. Jalali, and A. Dasdan. 2013. “Real time bid optimization with smooth budget
delivery in online advertising.” Proceedings of the Seventh International Workshop on
Data Mining for Online Advertising.
Lugosi, G. and S. Mendelson. 2019. “Mean estimation and regression under heavy-tailed
distributions: A survey”. Foundations of Computational Mathematics. 19(5): 1145–1190.
Manelli, A. M. and D. R. Vincent. 2007. “Multidimensional mechanism design: Revenue
maximization and the multiple-good monopoly”. Journal of Economic theory. 137(1):
153–185.
Marshall, R., M. Meurer, J. Richard, and W. Stromquist. 1994. “Numerical analysis of
asymmetric first price auctions”. Games and Economic Behavior. (2). issn: 0899-8256.
Massart, P. 1990. “The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality”. In:
The Annals of Probability. Vol. 18. No. 3. Institute of Mathematical Statistics. 1269–1283.
Medina, A. M. and S. Vassilvitskii. 2017. “Revenue optimization with approximate bid
predictions”. In: Proceedings of the 31st International Conference on Neural Information
Processing Systems. 1856–1864.

Milgrom, P. 2004. Putting auction theory to work. Cambridge University Press.
Milgrom, P. and I. Segal. 2002. “Envelope theorems for arbitrary choice sets”. Econometrica.
70(2): 583–601.
Mirrokni, V. S., R. P. Leme, P. Tang, and S. Zuo. 2016. “Dynamic Auctions with Bank
Accounts.” In: Proceedings of IJCAI. 387–393.
Mohri, M. and A. M. Medina. 2015. “Revenue optimization against strategic buyers”.
Advances in Neural Information Processing Systems. 2015: 2530–2538.
Mohri, M. and A. M. Medina. 2014. “Learning theory and algorithms for revenue opti-
mization in second price auctions with reserve”. In: International Conference on Machine
Learning. PMLR. 262–270.
Morgenstern, J. and T. Roughgarden. 2015. “The pseudo-dimension of near-optimal auc-
tions”. In: Proceedings of the 28th International Conference on Neural Information Process-
ing Systems-Volume 1. 136–144.
Myerson, R. B. 1981. “Optimal auction design”. Mathematics of operations research. 6(1):
58–73.
Nazerzadeh, H., A. Saberi, and R. Vohra. 2008. “Dynamic Cost-per-Action Mechanisms and
Applications to Online Advertising”. In: Proceedings of the 17th International Conference
on World Wide Web. WWW ’08. Beijing, China: Association for Computing Machinery.
179–188. isbn: 9781605580852. doi: 10.1145/1367497.1367522. url:
https://doi.org/10.1145/1367497.1367522.
Nedelec, T., M. Abeille, C. Calauzènes, N. E. Karoui, B. Heymann, and V. Perchet. 2019a.
“Thresholding at the monopoly price: an agnostic way to improve bidding strategies in
revenue-maximizing auctions”. In: The Workshop on Learning in the Presence of Strategic
Behavior, EC.
Nedelec, T., J. Baudet, V. Perchet, and N. E. Karoui. 2021. “Adversarial learning for revenue-
maximizing auctions”. In: 20th International Conference on Autonomous Agents and
Multiagent Systems.
Nedelec, T., N. El Karoui, and V. Perchet. 2019b. “Learning to bid in revenue-maximizing
auctions”. In: International Conference on Machine Learning. PMLR. 4781–4789.
Nekipelov, D., V. Syrgkanis, and E. Tardos. 2015. “Econometrics for learning agents”. In:
Proceedings of the Sixteenth ACM Conference on Economics and Computation. 1–18.
Ostrovsky, M. and M. Schwarz. 2011. “Reserve prices in internet advertising auctions: A
field experiment”. In: Proceedings of the 12th ACM conference on Electronic commerce.
59–60.
Paes Leme, R., M. Pál, and S. Vassilvitskii. 2016. “A field guide to personalized reserve
prices”. In: Proceedings of the 25th international conference on world wide web. 1093–1102.
Perchet, V. and P. Rigollet. 2013. “The multi-armed bandit problem with covariates”. The
Annals of Statistics. 41(2): 693–721.

Rahme, J., S. Jelassi, and S. M. Weinberg. 2020. “Auction learning as a two-player game”.
arXiv preprint arXiv:2006.05684.
Riley, J. G. and W. F. Samuelson. 1981a. “Optimal auctions”. The American Economic Review.
71(3): 381–392.
Riley, J. G. and W. F. Samuelson. 1981b. “Optimal auctions”. The American Economic Review.
71(3): 381–392.
Rockafellar, R. T. 1970. Convex Analysis. Princeton University Press.
Roughgarden, T. and O. Schrijvers. 2016. “Ironing in the dark”. In: Proceedings of EC. 1–18.
Roughgarden, T. and J. R. Wang. 2016. “Minimizing Regret with Multiple Reserves”. In:
Proceedings of the 2016 ACM Conference on Economics and Computation. 601–616.
Rudolph, M. R., J. G. Ellis, and D. M. Blei. 2016. “Objective variables for probabilistic
revenue maximization in second-price auctions with reserve”. In: Proceedings of the
25th International Conference on World Wide Web. 1113–1122.
Shalev-Shwartz, S. and S. Ben-David. 2014. Understanding Machine Learning: From Theory
to Algorithms. Cambridge University Press.
Shen, W., S. Lahaie, and R. P. Leme. 2019a. “Learning to clear the market”. In: International
Conference on Machine Learning. PMLR. 5710–5718.
Shen, W., P. Tang, and S. Zuo. 2019b. “Automated mechanism design via neural networks”.
In: Proceedings of the 18th International Conference on Autonomous Agents and Multiagent
Systems. 215–223.
Slivkins, A. et al. 2019. “Introduction to Multi-Armed Bandits”. Foundations and Trends®
in Machine Learning. 12(1-2): 1–286.
Tang, P. and Y. Zeng. 2018. “The price of prior dependence in auctions”. In: Proceedings of
the 2018 ACM Conference on Economics and Computation. 485–502.
Vickrey, W. 1961. “Counterspeculation, auctions, and competitive sealed tenders”. In: The
Journal of finance. Vol. 16. No. 1. Wiley Online Library.
Weed, J., V. Perchet, and P. Rigollet. 2016. “Online learning in repeated auctions”. In:
Conference on Learning Theory. PMLR. 1562–1583.
Xu, J., K.-c. Lee, W. Li, H. Qi, and Q. Lu. 2015. “Smart pacing for effective online ad cam-
paign optimization”. In: Proceedings of the 21th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining. 2217–2226.
Yao, A. C.-C. 2017. “Dominant-strategy versus bayesian multi-item auctions: Maximum
revenue determination and comparison”. In: Proceedings of the 2017 ACM Conference on
Economics and Computation. 3–20.
Yuan, S., J. Wang, and X. Zhao. 2013. “Real-time bidding for online advertising: measure-
ment and analysis”. Proceedings of the Seventh International Workshop on Data Mining
for Online Advertising.

36 pages
Learning in Budgeted Auctions With Spacing Objectives
No ratings yet
Learning in Budgeted Auctions With Spacing Objectives
53 pages
Auction Theory: Pak-Sing Choi Felix Munoz-Garcia
No ratings yet
Auction Theory: Pak-Sing Choi Felix Munoz-Garcia
304 pages
Edexcel Magnetism 1 QP
No ratings yet
Edexcel Magnetism 1 QP
17 pages
Auction Theory Stephane
No ratings yet
Auction Theory Stephane
45 pages
Learning in Auctions: Regret Is Hard, Envy Is Easy
No ratings yet
Learning in Auctions: Regret Is Hard, Envy Is Easy
45 pages
WP Tse 1593
No ratings yet
WP Tse 1593
48 pages
Ec515 Module17
No ratings yet
Ec515 Module17
7 pages
A Simple and Approximately Optimal Mechanism For An Additive Buyer
No ratings yet
A Simple and Approximately Optimal Mechanism For An Additive Buyer
40 pages
Auction Theory Presentation
No ratings yet
Auction Theory Presentation
23 pages
Sailer Katharina
No ratings yet
Sailer Katharina
92 pages
The Price of Anarchy in Auctions: Tim Roughgarden
No ratings yet
The Price of Anarchy in Auctions: Tim Roughgarden
43 pages
SSMP Vespa Service Manual
No ratings yet
SSMP Vespa Service Manual
25 pages
Differential Economics and Deep Learning
No ratings yet
Differential Economics and Deep Learning
55 pages
A Learning Approach To Auctions
No ratings yet
A Learning Approach To Auctions
24 pages
Symmetries and Optimal Multi-Dimensional Mechanism Design: Constantinos Daskalakis S. Matthew Weinberg
No ratings yet
Symmetries and Optimal Multi-Dimensional Mechanism Design: Constantinos Daskalakis S. Matthew Weinberg
18 pages
MSA - Manual
No ratings yet
MSA - Manual
18 pages
Artificial Intelligence and Auction Design
No ratings yet
Artificial Intelligence and Auction Design
31 pages
Informs
No ratings yet
Informs
17 pages
Online Auctions
No ratings yet
Online Auctions
71 pages
Learning Curve
No ratings yet
Learning Curve
4 pages
Feng Ec18
No ratings yet
Feng Ec18
18 pages
1 s2.0 S0004370221000990 Main
No ratings yet
1 s2.0 S0004370221000990 Main
23 pages
PTDLKD Final Report 2 PDFF
No ratings yet
PTDLKD Final Report 2 PDFF
60 pages
Auctions STOC 02
No ratings yet
Auctions STOC 02
11 pages
Auction and Voting Protocols
No ratings yet
Auction and Voting Protocols
121 pages
Nan Ney Thesis
No ratings yet
Nan Ney Thesis
65 pages
Revenue Equivalence Theorem
No ratings yet
Revenue Equivalence Theorem
4 pages
Trading Agents
No ratings yet
Trading Agents
12 pages
Ice Bayesian Games Auctions
No ratings yet
Ice Bayesian Games Auctions
20 pages
Blum 2011
No ratings yet
Blum 2011
10 pages
Optimal Auctions Through Deep Learning
No ratings yet
Optimal Auctions Through Deep Learning
8 pages
Lecture12 Auctions
No ratings yet
Lecture12 Auctions
41 pages
Putting Auction Theory To Work 1st Edition Paul Milgrom Install Download
No ratings yet
Putting Auction Theory To Work 1st Edition Paul Milgrom Install Download
53 pages