0% found this document useful (0 votes)

15 views53 pages

Machine Learning Framework

Uploaded by

Alex Hope

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views53 pages

Machine Learning Framework

Uploaded by

Alex Hope

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 53

Personalized Dynamic Pricing with Machine Learning:

High Dimensional Features and Heterogeneous Elasticity

Gah-Yi Ban
London Business School, Regent’s Park, London, NW1 4SA, United Kingdom.
[email protected]

N. Bora Keskin
Fuqua School of Business, Duke University, 100 Fuqua Drive, Durham, NC 27708, United States.
[email protected]

We consider a seller who can dynamically adjust the price of a product at the individual customer level, by utilizing
information about customers’ characteristics encoded as a d-dimensional feature vector. We assume a personalized
demand model, parameters of which depend on s out of the d features. The seller initially does not know the
relationship between the customer features and the product demand, but learns this through sales observations over
a selling horizon of T periods. We prove that the seller’s expected regret, i.e., the revenue loss against a clairvoyant
√
who knows the underlying demand relationship, is at least of order s T under any admissible policy. We then design
a near-optimal pricing policy for a “semi-clairvoyant” seller (who knows which s of the d features are in the demand
√
model) that achieves an expected regret of order s T log T . We extend this policy to a more realistic setting where
the seller does not know the true demand predictors, and show that this policy has an expected regret of order
√
s T (log d + log T ), which is also near-optimal. Finally, we test our theory on simulated data and on a data set from
an online auto loan company in the United States. On both data sets, our experimentation-based pricing policy is
superior to intuitive and/or widely-practiced customized pricing methods such as myopic pricing and segment-then-
optimize policies. Furthermore, our policy improves upon the loan company’s historical pricing decisions by 47% in
expected revenue over a six-month period.
Key words : dynamic pricing, demand learning, demand uncertainty, regret analysis, lasso, machine learning
History : First version: May 23, 2017. This version: April 20, 2020. Forthcoming in Management Science.

1. Introduction
1.1. Background and Overview
In recent years, the advent of online sales channels has made an abundance of detailed customer
information available to sellers. Examples of such information include customer demographics
(postal code, date of birth, education/income status), past spending patterns, and social media
activities. Using sales data on these customer characteristics, it is now possible for many sellers to
dynamically improve their pricing decisions.
The availability of such information poses unique challenges and opportunities for online sellers.
In particular, how can a seller dynamically learn the impact of customer characteristics on product
demand, and simultaneously employ this information in pricing decisions to maximize revenue over
time?
To address this question, we investigate personalized dynamic pricing with learning; i.e., dynamic
pricing with imperfect information on how customers’ characteristics affect the demand. Specif-

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
2 Forthcoming in Management Science

ically, we consider a seller who offers personalized prices to different customers whose unique
characteristics are encoded as different vectors, known as features or the feature vector. The seller
is uncertain about the relationship between the customer features and the demand for its product,
but can learn this relationship through sales observations over time. We emphasize that by per-
sonalization we mean customized pricing at the individual level, not via customer segmentation,
which, to the best of our knowledge, seems to be the prevailing scheme for customized pricing in
practice. We thus reserve the term “customized pricing” for price discrimination on at least one
dimension of customer characteristics, and “personalized pricing” for price discrimination at the
most granular level possible with the available information.
At first glance, pricing identical products differently for different customers may seem a ques-
tionable practice. However, price discrimination based on customer characteristics is an age-old
practice that is legal, bar differentiation that violates antitrust or price-fixing laws (Ramasastry
2005). For instance, in the offline world, price discrimination on a per customer basis is prevalent in
insurance and consumer lending industries, where customers accept that rates are set on a case-by-
case basis. In the online world, customized pricing is already a wide-spread practice; e-commerce
websites that practice customized pricing include well-known retailers such as Amazon, Walmart,
and Sears, and travel websites such as Cheaptickets, Expedia, Hotels.com, Priceline, and Orbitz.
Furthermore, as tailor-made discount coupons are just a special case of customized pricing, the
question of finding the optimal personalized price is equivalent to finding the optimal discount
coupon from a common base price.
To formally study personalized dynamic pricing and learning, we consider a seller offering a
product for sale to customers who arrive sequentially over a discrete time horizon of T periods. For
each customer, the seller observes a d-dimensional feature vector pertaining to the characteristics
of that customer. The seller initially does not know the joint impact of customer features and prices
on the demand for the product, but can infer their impact based on individual sales observations.
In practice, d, the dimension of the feature vectors, can be quite large, and not all features may
be informative of the demand. As such, we let s ≤ d denote the number of features that are actual
predictors for the demand, and analyze two different settings: (i) a semi-clairvoyant seller who
knows a priori which s of the d features are non-trivial demand predictors, and (ii) a more realistic
seller who does not know which s of the d features are non-trivial demand predictors. In both
settings, we design and analyze policies that achieve near-optimal revenue performance, where
performance is measured by the seller’s T -period expected regret, i.e., the revenue loss relative to
a clairvoyant who knows the underlying demand model.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 3

1.2. Key Results and Main Contributions

Our paper makes the following four contributions to the literature on learning and earning in the
context of dynamic pricing.
Formulating a personalized demand model with feature-dependent price sensitivity.
The price sensitivity of demand is a key aspect of dynamic pricing models. To gain insights into
customers’ price sensitivity in practice, we analyze a real-life data set from the U.S. auto loan
industry. Our analysis of this data set indicates that the price sensitivity of demand depends on
customer features with strong statistical significance, based on p-values less than 0.01. Table 1
shows the p-values associated with features affecting price sensitivity in our data set (see §5 for
further details).

Table 1 Statistical evidence for feature-dependent price sensitivity in practice

features affecting price sensitivity state FICO score term competitor rate loan amount
p-value 2.7 × 10−8 4.7 × 10−116 3.4 × 10−3 1.6 × 10−99 5.8 × 10−51
Notes. The details of our demand model and data analysis are given in Sections 2 and 5.

Motivated by this observation, we construct and study a personalized dynamic pricing model where
individual customer features affect the price sensitivity of demand as well as the potential market
size and customer taste. Thus, our model captures the joint impact of prices and customer features
in the form of feature-dependent price sensitivity (i.e., individual customers may have different
price-sensitivities; see the demand model (1) for details). To the best of our knowledge, this is the
first work to introduce feature-dependent price sensitivity in the design of personalized dynamic
pricing policies.
Characterizing problem complexity and the need for judicious price experiments.
The generality of our problem formulation (with feature-dependent price sensitivity and no addi-
tional assumptions of all prices being informative) affects the problem complexity in two major
ways. First, the best achievable regret performance in general is substantially worse than the best
achievable regret performance under the assumption that all prices are informative. In Theorem 1,
√
we prove that the seller’s T -period regret is at least of order s T under any admissible policy in
our general setting. In contrast, it has been shown that the seller’s T -period regret can grow loga-
rithmically in T under the assumption that all prices are informative (see, e.g., Qiang and Bayati
2016 and Javanmard and Nazerzadeh 2016). Theorem 1 thus shows that this cannot be achieved
in general. Second, and relatedly, we show that myopic policies can exhibit extremely poor per-
formance in our general problem setting. In §5.1.2 and §5.2.5, we conduct simulation experiments
and analyze a real-life data set from the U.S. auto loan industry to illustrate that not all prices
are necessarily informative in our general setup. As a result, the seller needs to use judicious price

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
4 Forthcoming in Management Science

experiments to achieve near-optimal revenue performance. Thus, the generality of our formulation
reveals the practical necessity of experimentation for optimal personalized dynamic pricing and
learning, which stands in stark contrast with the aforementioned related studies that show that
myopic policies are near-optimal under more restrictive assumptions.
Designing near-optimal policies and deriving performance guarantees. Our work
constructs policies that exhibit near-optimal performance in personalized dynamic pricing with
demand model uncertainty. To that end, we first consider the case of a seller who faces a linear
demand model and knows which features are the true predictors of demand. In this case, we design
an iterated estimation-and-pricing policy, and prove that this policy achieves a T -period regret
√
of order s T log T (see Theorem 2). After that, we extend our analysis to the case where the
seller faces a generalized linear demand model and does not know the true demand predictors. For
this case, we design another policy that employs maximum quasi-likelihood regression with lasso
√
regularization, and show that this policy achieves a T -period regret of order s T (log d + log T )
(see Theorem 3). The performance guarantees in Theorems 2 and 3 indicate that both of our poli-
cies are near-optimal (up to logarithmic terms) in view of the best achievable growth rate of regret
established in Theorem 1. Finally, as alluded to earlier, we validate these theoretical results with
two computational studies, one on simulated data based on a linear demand model, and another
on real-life data from a U.S. auto loan company based on a logit demand model (see §5).
Managerial insights for pricing practice. From an application perspective, the results in our
paper are of immediate relevance to practitioners. Comparing our approach with the actual pricing
decisions of the U.S. auto loan company in the aforementioned real-life data set, we observe that
our policies increase the company’s annual expected revenue by 47% over a six-month period (see
§5 for a more detailed comparison). Furthermore, our results also demonstrate the suboptimality
of policies that are prevalent in practice, which are (i) myopic pricing, and (ii) segment-then-
optimize policies. Our computational results in §5 show that these policies can perform in the
worst possible manner (regret growing linearly with T ), which is tantamount to no learning taking
place. With regard to (i), as mentioned above, the poor performance of myopic pricing is due
to the lack of sufficient price experimentation. With regard to (ii), we consider two versions of
segment-then-optimize policies, one in which the seller performs customer segmentation then sets
the price for the average customer in each segment, and another in which the seller performs
customer segmentation then applies our near-optimal personalized dynamic pricing policy within
each segment. In the former version of the segment-then-optimize policy, the regret grows linearly
because the policy accumulates errors in revenue as every arriving customer deviates from the
average customer (almost surely, if the demand is continuous). In the latter version, the regret
grows sub-linearly but at multiples of our personalized policy applied to the entire non-segmented

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 5

data set. This is because the segment-then-optimize policy unnecessarily reduces the number of
observation samples to learn from, and prevents learning from customers across different segments
(e.g., if customers are segmented by location, they can still have similarities in other dimensions).
We note that these findings are on a conservative example where there are two sufficiently separated
customer populations (with a misclassification rate at just 4%), and similar results hold even when
the correct segmentation information is provided to the seller.
Furthermore, in §6 we demonstrate that our approach performs well even when a generative
demand model different from our assumed one is fitted to the aforementioned real-life data set.
Based on this observation, our demand model and accompanying results seem robust to model
misspecification in practice.

1.3. Related Literature

Our work is related to the literature on dynamic pricing with demand model uncertainty, and the
literature on operations management models with feature (a.k.a. covariate) information.
There is a growing body of literature on dynamic pricing while learning an unknown demand
function (see, e.g., Araman and Caldentey 2009, Besbes and Zeevi 2009, Farias and Van Roy
2010, Harrison et al. 2012, Broder and Rusmevichientong 2012, den Boer and Zwart 2014,
Keskin and Zeevi 2014, 2016, Cheung et al. 2017, Ferreira et al. 2018). In this context,
Araman and Caldentey (2009), Farias and Van Roy (2010), Harrison et al. (2012), Cheung et al.
(2017), and Ferreira et al. (2018) studied dynamic pricing policies in Bayesian settings where a
seller dynamically chooses prices and makes demand observations, iteratively updating a prior
belief distribution on an unknown demand model. On the other hand, Besbes and Zeevi (2009),
Broder and Rusmevichientong (2012), den Boer and Zwart (2014), and Keskin and Zeevi (2014,
2016) studied frequentist settings where a seller conducts sequential price experiments to explore an
unknown demand function. For a thorough overview of dynamic pricing with learning, we refer the
reader to the recent survey paper by den Boer (2015). Our work is distinct from the aforementioned
studies because none of them consider demand functions that depend on customer characteristics.
More recently, Chen et al. (2015), Cohen et al. (2016), Qiang and Bayati (2016), and
Javanmard and Nazerzadeh (2016) studied pricing problems with demand learning and customer
feature information. In the presence of existing sales data on customer features, Chen et al. (2015)
consider an offline statistical estimation problem involving a logit model that represents customers’
choices, and analyze the performance of maximum likelihood estimation in their setting. On the
other hand, Cohen et al. (2016), Qiang and Bayati (2016), and Javanmard and Nazerzadeh (2016)
consider dynamic pricing problems with demand model uncertainty. Cohen et al. (2016) study
dynamic pricing of differentiated products on a homogeneous customer base, where the market

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
6 Forthcoming in Management Science

value of of each product is a linear function of the feature vector. They design a pricing policy
based on ellipsoid methods and derive performance guarantees on the worst-case regret of their
policy. Qiang and Bayati (2016) study the performance of a myopic policy called greedy iterated
least squares, and show that this policy can exhibit near-optimal revenue performance under cer-
tain conditions. Javanmard and Nazerzadeh (2016) consider a dynamic pricing problem with a
binary choice model, and construct a near-optimal policy in their setting. Our work is distinct
from the aforementioned studies in major ways. First of all, our model captures the case of feature-
dependent price sensitivity (i.e., the price sensitivity of demand is allowed to depend on individual
customer features), and we do not assume that all prices are informative. As discussed in §1.2 and
shown in detail below, these aspects of our model significantly affect the problem complexity and
shed light on the design of near-optimal policies in personalized pricing and learning. As shown
by Qiang and Bayati (2016), a myopic policy can be near-optimal when the price sensitivity of
demand does not depend on customer features and all prices are assumed to be informative. In
contrast, we demonstrate that such myopic policies can perform poorly in our general problem
formulation (see §5). Secondly, we show that the best achievable revenue performance in the full
generality of our setting is different from those in the dynamic pricing studies above (see §3).
Informed by this result, we construct policies that achieve said performance benchmark, providing
key practical guidelines on how to conduct price experimentation in the presence of customer fea-
ture information. In addition to the above studies, Nambiar et al. (2017) recently studied model
misspecification in the context of dynamic pricing and product features. Our work is differentiated
from theirs since they also analyze the case where the price sensitivity of demand does not depend
on features and their focus is on model misspecification.
More broadly, our paper also contributes to the growing literature in operations management on
the inclusion of feature (a.k.a. attribute/covariate) information in decision making (see, for instance,
Ferreira et al. 2015 for an empirical analytics study, and Ban and Rudin 2018, Ban et al. 2019 for
theoretical studies, in addition to Chen et al. 2015, Cohen et al. 2016, Qiang and Bayati 2016,
Javanmard and Nazerzadeh 2016 discussed in detail above).

2. Problem Formulation
Basic model elements. Consider a firm, hereafter referred to as the seller, that offers a product
for sale to T customers who arrive sequentially. Viewing each sales opportunity as a “period,” the
seller has a discrete time horizon of T periods and can dynamically adjust the product’s price over
the time horizon.
At the beginning of period t = 1, 2, . . . , T, the seller observes a d-dimensional vector of fea-
tures pertaining to the customer arriving in period t. We denote this random vector by Zt =

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 7

(Zt1 , Zt2 , . . . , Ztd ) and assume that {Zt , t = 1, 2, . . . , T } are independent and identically distributed
with a compact support Z ⊆ B0 (zmax ) ⊂ Rd , where B0 (zmax ) is the d-dimensional ball of radius
zmax > 0, and E[Zt ] is normalized to 0.1 We denote by ΣZ = E[Zt ZtT ] the covariance matrix of {Zt }
and assume that ΣZ is a symmetric and positive definite matrix. We note that the seller need
not know ΣZ . We allow Zt to include individual customer data and macroeconomic factors that
entail categorical features (e.g., postal code and income bracket) as well as continuous features
(e.g., credit score). This means that some components of Zt are continuous random variables, and
others discrete random variables. For the features modeled as continuous random variables, the
only assumption we make is that they have positive measure in the interior of their domains and
zero on the boundary. We also introduce the augmented feature vector Xt := Z1t ∈ Rd+1 for expo-

sitional convenience which is made apparent below. Accordingly, we denote the support of Xt by
X = {1} × Z , and the expectation over the product measure on X1 × · · · × XT by EX {·}.
Upon observing Xt = xt , the seller chooses a price pt ∈ [ℓ, u] to be offered to the customer arriving
in period t, where 0 < ℓ < u < ∞. Then, the seller observes this customer’s demand in response to
pt , which is given by

Dt = g α · xt + (β · xt ) pt + εt for t = 1, 2, . . . , T, (1)

where α, β ∈ Rd+1 are demand parameter vectors unknown to the seller, g(·) is a known function,
εt is the unobservable and idiosyncratic demand shock of the customer arriving in period t, and
Pd+1
u · v = i=1 ui vi denotes the inner product of vectors u and v. Note that the demand model (1)
captures feature-dependent customer taste and potential market size (through α · xt ) as well as
feature-dependent price sensitivity (through β · xt ).
Let θ := (α, β) be the vector of all unknown demand parameters, and Θ be a compact rectangle
in R2(d+1) from which the value of θ is chosen. We allow for d to be large, possibly larger than the
selling horizon T , but assume that a smaller subset of the d features have a sizable effect in the
demand model. We express this sparsity structure as follows: let Sα := {i = 1, . . . , d + 1 : αi 6= 0},
Sβ := {i = 1, . . . , d + 1 : βi 6= 0}, and S := Sα ∪ Sβ . Note that S contains the indices of all non-
zero components of α and β. For notational convenience, we use the set S to express the sparsity
structure in the unknown parameter vector θ = (α, β). (If the non-zero components of α and β are
distinct, one could use Sα and Sβ to express the sparsity structures in α and β separately; our

1
In many applications, E[Zt ] need not be 0, but this is subsumed in our analysis: if E[Zt ] equals a non-zero vector
µZ ∈ Rd , then we replace the first components of the vectors α, β ∈ Rd+1 in the demand model (1) with α̃1 =
α1 + (α2 , . . . , αd+1 )T µZ and β̃1 = β1 + (β2 , . . . , βd+1 )T µZ . This normalizes the value of E[Zt ] to 0 without influencing
other distributional assumptions on {Zt }. Moreover, by absorbing µZ into the unknown vectors α and β, we capture
the fact that the seller does not know µZ . We note that our analysis in the following section is valid also when the
seller uses unnormalized feature vectors with an unknown, non-zero mean µZ .

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
8 Forthcoming in Management Science

analysis is valid for that case because S already includes all components that influence demand.)
Define αS = (αi )i∈S and βS = (βi)i∈S as the vectors consisting of the components of α and β,
respectively, whose indices are in S , and θS = (αS , βS ). Note that θS is a compressed vector that
contains all non-zero components of θ; hence, we hereafter refer to θS as the compressed version
of θ. Let s ∈ {1, . . . , d + 1} be the cardinality of S , and denote the compressed versions of the
key quantities defined earlier with a subscript S . Thus, the compressed version of Θ is ΘS =
{θS : θ ∈ Θ} ⊂ R2s . For t = 1, . . . , T, the compressed versions of Zt and Xt are ZS ,t ∈ ZS ⊂ Rs and
1
XS ,t = ZS,t ∈ XS ⊂ Rs+1 , respectively, where ZS = {(zi )i∈S : z ∈ Z} and XS = {1} × ZS .
The demand function in (1) is known as a generalized linear model (GLM) because, given x ∈ X ,
the function that maps price p to expected demand is the composition of the function g : R → R
and the linear function p 7→ α · x + (β · x) p. In this relationship, the function g(·) is referred to
as the “link” function that captures potential nonlinearities in the demand-price relationship. We
assume that g(·) is differentiable and increasing. This assumption is satisfied for a broad family of
functions including linear, logit, probit, and exponential demand functions. It also implies that the
link function has bounded derivatives over its compact domain.2
We assume that {εt , t = 1, 2, . . .} is a sub-Gaussian martingale difference sequence; that is,
E[εt |Ft−1 ] = 0, and there exist positive constants σ0 and η0 such that E[ε2t |Ft−1 ] ≤ σ02 and
E[eηεt |Ft−1 ] < ∞ for all η satisfying |η| < η0 , where Ft = σ(p1 , . . . , pt , ε1 , . . . , εt , X1 , . . . , Xt+1 ) and the
construction of admissible price sequences {pt , t = 1, 2, . . .} is specified below. (A simple example
of this setting is where {εt } are bounded and have zero mean.) We note that the distribution of εt
can depend on price and feature observations. This implies that the idiosyncratic demand shocks
of customers are allowed to be dependent on prices and customer features in our formulation.
Moreover, the generality of the above demand-shock distribution allows for continuous as well as
discrete demand distributions. A noteworthy example within discrete demand distributions is the
binary customer response model, where {εt } are such that Dt ∈ {0, 1} for all t. In this case, the
event {Dt = 1} corresponds to a sale at the offered price pt , whereas {Dt = 0} corresponds to no
sale.
1
Given θ = (α, β) ∈ Θ and x = z ∈ X , the seller’s expected single-period revenue is

r(p, θ, x) = p g α · x + (β · x)p for p ∈ [ℓ, u]. (2)

Let ϕ(θ, x) = argmaxp {r(p, θ, x)} denote the unconstrained revenue-maximizing price in terms of
θ ∈ Θ and x ∈ X . We assume that ϕ(θ, x) is in the interior of the feasible set [ℓ, u] for all θ ∈ Θ and
x ∈ X.
2
That is, there exist ℓ̃, ũ ∈ R satisfying 0 < ℓ̃ ≤ |g ′ (ξ)| ≤ ũ < ∞ for all ξ = α · x + (β · x) p such that (α, β) ∈ Θ, x ∈ X ,
and p ∈ [ℓ, u] (here and later, a prime denotes a derivative).

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 9

Pricing policies, induced probabilities, and performance metric. Let us construct a

sequence of historical data vectors as follows: let H0 = X1 , and for t = 1, 2, . . . , T, let Ht be a
vector consisting of the observations until just after the beginning of period t + 1, when the
feature vector for period t + 1 has been revealed but before the pricing decision; i.e., Ht =
(p1 , . . . , pt , D1 , . . . , Dt , X1 , . . . , Xt+1 ). Define an admissible policy as a sequence of functions π =
(π1 , π2 , . . .), where πt : R(d+3)t−2 → [ℓ, u] is a measurable function that maps Ht−1 to the price to be
offered in period t. Thus, pt = πt (Ht−1 ) for all t = 1, 2, . . . , T, under policy π. We denote by Π the
set of all admissible pricing policies. Given π ∈ Π and θ = (α, β) ∈ Θ, define a probability measure
Pπθ {·} on the sample space of demand sequences D = (D1 , D2 , . . .) such that
T
Y
Pπθ {D1 ∈ dξ1 , . . . , DT ∈ dξT } =

Pε g α · Xt + (β · Xt )pt + εt ∈ dξt Ht−1 for ξ1 , ξ2 , . . . , ξT ∈ R,
t=1

where Pε {·} is the probability measure governing {εt , t = 1, 2, . . .}. The seller’s conditional expected
revenue loss in T periods relative to a clairvoyant who knows the underlying demand parameter
vector θ is defined as
T
X
∆πθ (T ; X T ) = Eπθ ∗
(θ, Xt ) − r(pπt , θ, Xt )

r XT (3)
t=1

for θ ∈ Θ, π ∈ Π, and X T = (X1 , . . . , XT ) ∈ X T , where: Eπθ {·} is the expectation operator associated
with Pπθ {·}, r ∗ (θ, x) = r(ϕ(θ, x), θ, x) is the maximum single-period revenue function, and pπt is the
price charged in period t under policy π. We call this performance metric the seller’s T -period
conditional regret, which is a random variable that depends on the realization of X T = (X1 , . . . , XT ).
The seller aims to minimize its T -period expected regret, given by

∆πθ (T ) = EX ∆πθ (T ; X T )

(4)

for θ ∈ Θ and π ∈ Π, where EX {·} is the expectation operator associated with the probability
measure governing {Xt , t = 1, 2, . . .}. Throughout the sequel, we also use the expectation notation
EπX,θ {·} := EX {Eπθ {·}}, and let PπX,θ {·} be the probability measure associated with EπX,θ {·}. Finally,
to describe the complexity of our problem setting, we use the seller’s worst-case expected regret,
which is defined as ∆π (T ) = sup{∆πθ (T ) : θ ∈ Θ}.

3. Characterizing the Problem Complexity

In this section, we characterize the complexity of our problem in terms of the best achievable regret
performance. To that end, we focus on an important special case of the general demand model (1)
by letting the expected demand be a linear function of price; i.e., g(ξ) = ξ for all ξ ∈ R. In this
case, the demand in period t is given by

Dt = α · xt + (β · xt ) pt + εt for t = 1, 2, . . . , T. (5)

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
10 Forthcoming in Management Science

We hereafter refer to (5) as the linear demand model. Note that, for this model, the unconstrained
revenue-maximizing price is ϕ(θ, x) = −(α · x)/(2β · x) for θ = (α, β) ∈ Θ and x ∈ X .
We consider in the following result the linear demand model (5) in the case where {εt } follow
an independent and identically distributed sequence of normal random variables. In this setting,
we present a lower bound on the seller’s expected regret under any admissible policy.

Theorem 1. (lower bound on regret) Let {Dt } be given by the linear demand model (5),
iid
and εt ∼ N (0, σ02 ) with σ0 > 0. Then, there exists a finite positive constant c such that
√
∆π (T ) ≥ cs T for all π ∈ Π and T ≥ 2.

Remark 1. The constant c is independent of s and T (see the proof of Theorem 1 in Appendix A
iid
for an explicit computation of this constant). We consider the case where εt ∼ N (0, σ02 ) for expo-
sition purposes, but the proof of Theorem 1 is valid for the broader, exponential family of distri-
butions.
√
Remark 2. Theorem 1 implies that supg∈G inf π∈Π {∆g,π (T )} ≥ cs T for all T ≥ 2, where G
denotes the set of all differentiable and increasing functions, and ∆g,π (T ) is the T -period expected
regret of policy π, with its dependence on the link function g(·) expressed explicitly. In other words,
the lower bound in Theorem 1 is a worst-case lower bound on the minimum regret for a broad
class of link functions.

Theorem 1 characterizes the complexity of our problem setting, stating that the expected regret
√
of any admissible policy must grow at least in the order of s T . Thus, any policy whose expected
√
regret is of order s T (up to logarithmic terms) is hereafter called a first-order optimal policy. In
the following section, we design such first-order optimal policies, and prove that the lower bound
in Theorem 1 is tight (up to logarithmic terms).
The proof of Theorem 1 is based on the analysis of the empirical Fisher information matrix,
which is given by
Pt Pt X t
xk xTk pk xk xTk 1 1 T
Jt := Pt k=1
T
k=1
Pt 2 T
= pk pk ⊗ xk xTk , (6)
k=1 pk xk xk k=1 pk xk xk k=1

where ⊗ denotes the Kronecker product of matrices. In this proof, we show that the seller’s expected
revenue loss in period t is inversely proportional to the smallest eigenvalue of Jt , denoted by
µmin (Jt ). Simply put, the seller’s cumulative information in period t, as measured by µmin (Jt ),
characterizes the growth rate of the expected revenue loss. For example, if all feasible prices were
assumed to provide substantial information about the underlying demand model, then µmin (Jt )
would increase by a positive amount in every period, growing linearly over time under any given

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 11

policy. In such a setting, there would be no need for active price experimentation. To be more
precise, under a myopic policy that puts no emphasis on price experimentation, the seller’s expected
revenue loss in period t would be proportional to 1/t. By the growth rate of harmonic series, this
implies that the seller’s expected revenue loss over T periods would be of order log T . As a result,
in a setting where all prices are assumed to be informative, the seller can use a myopic policy
to achieve an expected regret of order log T . Theorem 1 shows that this performance benchmark
cannot be achieved in our general setting. In general, not all prices are necessarily informative,
and hence, the seller needs to implement a judicious amount of price experiments. This qualitative
insight is key in our design of first-order optimal policies.
The above observation is consistent with Broder and Rusmevichientong (2012), who studied
dynamic pricing with learning in the absence of customer feature information. In particular,
Broder and Rusmevichientong (2012) showed that if the problem setup assumes that all prices are
informative (also known as the “well-separated” setting), then even myopic pricing policies can
rapidly accumulate information and achieve near-optimal revenue performance with logarithmically
growing regret. What Qiang and Bayati (2016) and Javanmard and Nazerzadeh (2016) study are
precisely pricing problems in this well-separated setting. As such, these papers show that myopic
pricing is near-optimal with logarithmic regret growth. To reiterate, attaining this benchmark is
not possible with our more general demand model. In contrast to the settings of Qiang and Bayati
(2016) and Javanmard and Nazerzadeh (2016), our setting does not assume that all prices are infor-
mative, which necessitates some price experimentation for optimal learning. Table 2 summarizes
these differences to position our work within the related literature.

Table 2 Positioning our work within the related literature

general well-separated
Qiang and Bayati (2016)
with feature information this paper
Javanmard and Nazerzadeh (2016)
no feature information Broder and Rusmevichientong (2012) Broder and Rusmevichientong (2012)

Therefore, our setting is differentiated from the settings of Qiang and Bayati (2016) and
Javanmard and Nazerzadeh (2016), allowing us to construct a new family of policies and results
that shed light on how price experiments should be implemented in the presence of customer feature
information.
In §5, we investigate both simulated and real-life data sets to illustrate the above points. This
data analysis also indicates that myopic policies exhibit a more severe complication beyond the
complexity characterized in Theorem 1. With no experimentation, the price paths of myopic policies
can converge to the boundary of the feasible price range. This results in an expected regret of

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
12 Forthcoming in Management Science

order T , which is substantially worse than the performance benchmark in Theorem 1. That is,
the convergence of price paths to the feasible price boundary makes myopic policies perform much
more poorly than the general regret bound in Theorem 1. We also note that the boundedness of the
feasible price set plays a minimal role in the derivation of Theorem 1. We use this boundedness for
replacing the expected value of the optimal price with an infimum, and the theorem’s conclusion
holds as long as that expected value is finite and positive. The proof of this theorem, as explained
above, is based on the rate of information accumulation, rather than bounds on feasible prices.

4. Designing First-order Optimal Policies

In this section, we design pricing policies that achieve the first-order optimality benchmark in
Theorem 1. For purposes of exposition, we first consider the linear demand model with known
sparsity structure. We then analyze our general demand model with unknown sparsity structure.

4.1. Linear Demand Model with Known Sparsity Structure

Let us consider the linear demand model (5) and suppose that the seller knows the underlying spar-
sity structure in the demand parameter vector θ = (α, β); i.e., the seller knows which components of
θ are zero, but does not know the values of the non-zero components of θ. In this semi-clairvoyant
setting, the seller simply focuses on learning the non-zero components of θ. As explained in §2, the
(compressed) parameter vector θS ∈ R2s contains all of the non-zero components of θ; hence, it is
sufficient for the seller to learn θS in this case. For this purpose, we design a policy, called iterated
least-squares regression with price experimentation (hereafter abbreviated ILSX), that conducts
price experiments with a decreasing frequency and iteratively estimates the unknown parameter
vector θS via least-squares regression. In what follows, we describe the details of experimentation
and estimation under ILSX.
Price experiments. We construct an experimentation scheme under which the number of price
√
experiments conducted over periods {1, 2, . . . , t} is at least in the order of t. To that end, let m1
and m2 be two distinct experimental prices in [ℓ, u]. For i ∈ {1, 2}, the set of periods in which the
experimental price mi is charged is

Mi = {t = L2 + i − 1 : L = 1, 2, . . .}. (7)

For brevity, we denote by M = M1 ∪ M2 the set of all experimentation periods. The price exper-
imentation scheme in (7) ensures that, for all t ≥ 5, each experimental price is charged at least
1
√
4
t times. This scheme uses two prices for experimentation; as seen below, one needs at least
two distinct experimental prices to ensure that regression estimates are well-defined in all periods.
For expositional purposes, we use exactly two experimental prices in the scheme described above,
noting that our analysis remains valid as long as two or more experimental prices are used.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 13

Parameter estimation. We now describe an iterated least-squares regression procedure to

estimate the unknown demand parameter vector θS = (αS , βS ). Given a history of compressed
feature vectors (XS,1 , . . . , XS ,t ) = (xS ,1 , . . . , xS ,t ) observed through the end of period t, consider the
following estimate of θS :

θ̂S ,t+1 = argminθ̃S ∈R2s St (θ̃S ) , (8)
Pt
where St (θ̃S ) = k=1 I{k ∈ M }[Dk − α̃S · xS ,k − (β̃S · xS ,k )pk ]2 for θ̃S = (α̃S , β̃S ) ∈ R2s . Letting χk =
I{k ∈ M }, we note that
Pt
α̂S ,t+1 k=1 χk Dk xS ,k
θ̂S ,t+1 = = J˜S−1
,t
P t for all t ≥ 2, (9)
β̂S ,t+1 k=1 χk pk Dk xS ,k

where
Pt T
Pt T
X t
χ x x χ p x x T
J˜S ,t := Pt k=1 k S ,k S ,k
T
P k=1
t
k k
2
S ,k S
T
,k
= χk p1k p1k ⊗ xS ,k xTS ,k , (10)
χ p x x
k=1 k k S ,k S ,k χ p x
k=1 k k S ,k S ,kx k=1
Pt
and ⊗ denotes the Kronecker product of matrices. Moreover, letting MS ,t := k=1 χk p1k ⊗

xS ,k εk , we deduce from (5) and (9) that

θ̂S ,t+1 = J˜S−1 ˜ ˜−1

,t JS ,t θS + MS ,t = θS + JS ,t MS ,t for all t ≥ 2.

We thus have the following estimation error:

θ̂S ,t+1 − θS = J˜S−1

,t MS ,t for all t ≥ 2. (11)

Let ϑ̂S ,t+1 be the truncated estimate that satisfies ϑ̂S ,t+1 = PΘ {θ̂S ,t+1 }, where PΘ : R2s → ΘS is the
projection mapping from R2s onto the compressed parameter space ΘS . After observing the feature
vector XS ,t = xS ,t in period t, the ILSX policy with parameters m1 , m2 , denoted by ILSX(m1 , m2 ),
charges the price 
 m1 if t ∈ M1 ,
pt = m2
if t ∈ M2 , (12)
ϕ ϑ̂S ,t , xS ,t otherwise.

Analysis of the estimation error and regret. To study the estimation error under the ILSX
policy, we measure information with the following metric:
t
X
Jt = χk (pk − p̄t )2 for all t ≥ 1, (13)
k=1
Pt Pt
where p̄t = k=1 χk pk / k=1 χk . Note that Jt is the cumulative squared deviation in the price

sequence {p1 , p2 , . . . , pt }, which provides a measure of quadratic variation in prices. In the compu-
tation of Jt , the indicator variables {χk } ensure that only the experimentation periods are counted
for measuring information. As the experimental prices m1 and m2 are selected deterministically
under the policy ILSX(m1 , m2 ), the metric Jt is non-random for all t under this policy. Based on
this, our next result shows how Jt is connected to the smallest eigenvalue of the information matrix
J˜S ,t under ILSX(m1 , m2 ).

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
14 Forthcoming in Management Science

Lemma 1. (eigenvalues of Fisher information under known sparsity structure)

Assume that π = ILSX(m1 , m2 ), and for any given matrix A, let µmin (A) denote the smallest eigen-
value of A. Then, there exist finite and positive constants γ1 , κ1 , and ρ1 such that

PπX,θS µmin (J˜S ,t ) ≥ γ1 Jt ≥ 1 − κ1 se−ρ1 Jt

(14)

for all θS ∈ ΘS and t ≥ 2.

Remark 3. The constants γ1 , κ1 , and ρ1 are characterized in the proof of Lemma 1.

Lemma 1 states that, with high probability, the minimum eigenvalue of the information matrix
J˜S ,t grows at least in the order of Jt under the policy ILSX(m1 , m2 ). To characterize the growth
√
rate of Jt under ILSX(m1 , m2 ), recall that this policy charges each experimental price at least 41 t
times through the end of period t ≥ 5. This implies that, under ILSX(m1 , m2 ), we have

√
Jt ≥ 18 (m1 − m2 )2 t for all t ≥ 5. (15)

Given this growth rate of Jt , we derive the following result regarding the estimation error under
ILSX(m1 , m2 ).

Lemma 2. (estimation error under known sparsity structure) Let π = ILSX(m1 , m2 ).

Then, there exist finite and positive constants κ2 , ρ2 , and t0 such that

ρ2 s log t κ2 s log t
PπX,θS kJ˜S−1
,t MS ,t k2
≤ √ ≥1− √ (16)
t t

for all θS ∈ ΘS and t ≥ t0 , where k · k denotes the Euclidean norm.

Remark 4. The constants κ2 , ρ2 , and t0 are characterized in the proof of Lemma 2.

According to Lemma 2, with high probability, the estimation error under ILSX(m1 , m2 ) converges
√
to zero at a rate of (log t)/ t. Using this lemma, we obtain the following performance guarantee
for ILSX(m1 , m2 ).

Theorem 2. (upper bound on regret under known sparsity structure) Let π =

ILSX(m1 , m2 ). Then, there exists a finite and positive constant C such that

√
∆πθS (T ) ≤ Cs T log T for all θS ∈ ΘS and T ≥ 2.

Remark 5. The constant C is independent of s and T , and is characterized in the proof of

Theorem 2.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 15
√
Theorem 2 proves that the regret of the policy ILSX(m1 , m2 ) is at most in the order of s T log T .
As shown in Theorem 1, the lowest achievable growth rate of regret by any admissible policy in
√
our setting is s T . Therefore, ILSX(m1 , m2 ) is first-order optimal in the sense that it achieves
the lowest possible growth rate of regret (up to logarithmic terms). We note that ILSX(m1 , m2 )
relies on the knowledge of the underlying sparsity structure in demand parameters. In the following
section, we construct a policy that achieves a similar near-optimal performance without relying on
this knowledge.

4.2. General Demand Model with Unknown Sparsity Structure

In practice, the seller makes observations of the d-dimensional feature vectors {Zt } without knowing
which components play non-negligible roles in the demand model. Furthermore, demand can be
a nonlinear function of price pt . In accordance with these points, we now extend the main ideas
developed in the preceding subsection to the general demand model (1) and unknown sparsity
structure. Note that, for the following analysis, the seller need not know the sparsity parameter s.
Generalizing the estimation procedure and pricing policy. To extend our analysis, we
generalize the seller’s estimation objective in two major ways. First, regarding the extension from
the linear demand model (5) to the general demand model (1), we use a maximum quasi-likelihood
estimation objective (instead of the least-squares estimation objective). Second, regarding the
extension from known to unknown sparsity structure, we augment the aforementioned estima-
tion objective with a lasso regularization penalty. Therefore, given a history of feature vectors
(X1 , . . . , Xt ) = (x1 , . . . , xt ), our generalized estimation objective in period t is given by
t g(θ̃·uk )
Dk − y
X Z
Qt (θ̃, λ̃) := χk dy − λ̃kθ̃ k1 for θ̃ ∈ R2(d+1) and λ̃ ≥ 0, (17)
k=1 Dk ν(y)

where: χk = I{k ∈ M } and uk = p1k ⊗ xk for k ∈ {1, 2, . . .}, ν(y) = g ′ g −1 (y) for y ∈ R, and

P2(d+1)
kθ̃ k1 = i=1 |θ̃i | denotes the ℓ1 -norm of θ̃. The estimation objective in (17) is a lasso-regularized
quasi-likelihood function for the seller’s observations in the first t periods (for early references on
maximum quasi-likelihood estimation and lasso regularization, see Nelder and Wedderburn 1972
and Tibshirani 1996, respectively). This estimation objective subsumes as a special case the least-
squares estimation objective studied in the preceding subsection: if the demand model were linear
(i.e., g(ξ) = ξ for all ξ ∈ R) and there were no lasso regularization penalty (i.e., λ̃ = 0) then the
Pt
maximizer of Qt (θ̃, λ̃) would equal the least-squares error minimizer of St (θ̃) = k=1 χk (Dk − θ̃ · uk )2
for all θ̃. Moreover, in other special cases, maximizing the estimation objective in (17) is equivalent
to standard maximum likelihood estimation with lasso regularization; see, e.g., §5 for an application
to logit demand models.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
16 Forthcoming in Management Science

It is straightforward to show that, given λ̃ ≥ 0, the mapping θ̃ 7→ Qt (θ̃, λ̃) is strictly concave and
has a unique maximizer, which we use as an estimate of θ̃. We denote this maximum quasi-likelihood
estimate by
(lasso)
θ̂t+1 (λ̃) = argmaxθ̃∈R2(d+1) Qt (θ̃, λ̃) for λ̃ ≥ 0, (18)

and let ϑ̂(lasso) (lasso) (lasso)

t+1 (λ̃) be the truncated estimate satisfying ϑ̂t+1 (λ̃) = PΘ {θ̂t+1 (λ̃)} for λ̃ ≥ 0, where
PΘ : R2(d+1) → Θ denotes the projection mapping from R2(d+1) onto Θ.
The use of lasso regression has recently attracted a lot of attention due to its ability to retrieve
sparse models (see Belloni and Chernozhukov 2013 and the references therein for a review of recent
results on lasso regression). In our setting, if the seller uses a policy with no regularization (e.g., the
iterated least-squares policy studied in the preceding subsection) then the estimation error would
√
be of order d/ t (up to logarithmic terms) with high probability, following similar arguments as in
√
Lemma 2. This would result in an upper bound on regret in the order of d T (up to logarithmic
√
terms), which is undesirable if d is large. In particular, if d is of higher order than T , then the
upper bound on regret would be linear in T , which is tantamount to no learning taking place at
all. This is why we design a regularized learning policy in the generalized setting of this subsection.
Our policy is called iterated lasso-regularized quasi-likelihood regression with price experimentation
(abbreviated ILQX). Upon observing the feature vector Xt = xt in period t, the ILQX policy with
parameters m1 , m2 , and λ = (λ1 , λ2 , . . .), denoted by ILQX(m1 , m2 , λ), charges the price

 m1 if t ∈ M1 ,
pt = m2 if t ∈ M2 , (19)
(lasso)
ϕ ϑ̂t (λt ), xt otherwise,


where M1 and M2 are as defined in (7).

Analysis of the estimation error and regret. In what follows, we study the revenue per-
formance of the ILQX policy. First, we characterize the convergence rate for the squared norm of
the estimation error under ILQX(m1 , m2 , λ).

Lemma 3. (estimation error under unknown sparsity structure) Let π =

√
ILQX(m1 , m2 , λ), where λ = (λ1 , λ2 , . . .) with λt+1 = c̃t1/4 log d + log t for all t, and c̃ is a positive
constant. Then, there exist finite and positive constants κ3 , ρ3 , and t1 such that

π (lasso) 2 ρ3 s(log d + log t) κ3 s(log d + log t)
PX,θ kθ̂t+1 (λt+1 ) − θ k ≤ √ ≥1− √ (20)
t t
for all θ ∈ Θ and t ≥ t1 , where k · k denotes the Euclidean norm.

Remark 6. The constants κ3 , ρ3 , and t1 are characterized in the proof of Lemma 3.

Using Lemma 3, we obtain the following performance guarantee for ILQX(m1 , m2 , λ).

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 17

Theorem 3. (upper bound on regret under unknown sparsity structure) Let π =

√
ILQX(m1 , m2 , λ), where λ = (λ1 , λ2 , . . .) with λt+1 = c̃t1/4 log d + log t for all t, and c̃ is a positive
constant. Then, there exists a finite and positive constant C̃ such that
√
∆πθ (T ) ≤ C̃s T (log d + log T ) for all θ ∈ Θ and T ≥ 2.

Remark 7. The constant C̃ is independent of s, d, T , and is obtained in the proof of Theorem 3.

We also note that, if d grows in the order of T a , where a > 1, then the upper bound in Theorem 3
√
would be of order s T log T , which is in the same order as the upper bound in Theorem 2.

Theorem 3 shows that the lasso-based ILQX policy achieves the lowest possible growth rate of
regret presented in Theorem 1 (up to logarithmic terms) and is therefore first-order optimal.
As argued earlier, using a policy with no regularization in the case of unknown sparsity structure
√ √
would make the T -period regret grow in the order of d T , which is substantially larger than s T ,
the best achievable rate given in Theorem 1. In contrast, Theorem 3 establishes that ILQX has
the smallest growth rate of regret that can be achieved in the case of a linear demand model with
known sparsity structure (up to logarithmic terms). Thus, a significant consequence of Theorem 3
is that the ILQX policy effectively recovers the added cost of not knowing the sparsity structure
and having a generalized linear demand model instead of a linear one, thereby achieving first-order
optimality in a fairly general setting.

5. Computational Results
In this section, we first run illustrative simulation experiments in the context of the linear demand
model. After that, to shed further light on our general analysis, we study a real-life data set from
the consumer lending industry that involves nonlinear customer response.

5.1. Case Study I: Simulation Experiments for the Linear Demand Model
In the following simulation experiments, we employ the linear demand model (5) to demonstrate the
value of: (a) regularization, (b) price experimentation over greedy pricing, and (c) individualized
pricing over segmentation-based pricing.

5.1.1. Value of regularization. To illustrate the theoretical performance results derived in §4,
we examine the performance of three personalized dynamic pricing policies, namely the policies
of: (i) a semi-clairvoyant seller who employs the unregularized price experimentation policy ILSX
on the relevant s out of d dimensions of features, as described in §4.1; (ii) a seller who employs
ILSX on all d dimensions of features; and (iii) a seller who employs the lasso-regularized price
experimentation policy ILQX described in §4.2. Note that (i) is the case of known sparsity structure
(i.e., the seller knows which customer features are relevant), whereas (ii) and (iii) correspond to

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
18 Forthcoming in Management Science

the case of unknown sparsity structure. Moreover, since we consider the linear demand model in
this section, the essential difference between ILSX and ILQX is lasso regularization.
In our simulation experiments, we consider the following problem parameters. The feasible price
interval is [ℓ, u] = [0.2, 2] and the first two prices of all policies are chosen arbitrarily at p1 =
m1 = 1.1 and p2 = m2 = 1.2. The (unnormalized) customer feature vectors {Zt } are multivariate
normal random variables with d = 14 dimensions, with their mean µZ equal to a vector of ones
and covariance matrix ΣZ = 0.2I14 , where I14 is the 14 × 14 identity matrix. The unknown demand
parameter vector is θ = (α, β) ∈ R30 , where

α = [ 1.1, −0.1, 0, 0.1, 0, 0.2, 0, 0.1, −0.1, 0, 0, 0.1, −0.1, 0.2, −0.2 ],
β = (−1) × [ 0.5, 0.1, −0.1, 0, 0, 0, 0, 0.2, 0.1, 0.2, 0, 0.2, −0.1, −0.2, 0 ].

Note that 11 out of 30 components of θ are zero in this model; thus, the presence of zero components
would have a nontrivial effect. The parameter space is Θ = [−1.5, 1.5]30 , and the demand shocks
{ǫt } are normally distributed with mean zero and standard deviation σ0 = 0.01. For ILQX, the
√
regularization parameter sequence λ = (λ1 , λ2 , . . .) is selected such that λt+1 = 0.05t1/4 log d + log t
for all t; this choice is informed by the theory developed in §4.2.
500
semi-clairvoyant ILSX
400
ILSX
ILQX
∆πθ (T )

300

200

100

0
0 0.5 1 1.5 2
T 4
10
Figure 1 Value of regularization. The solid, dashed, and dotted curves display the T -period regret of (i) a
semi-clairvoyant seller who employs the unregularized price experimentation policy ILSX on the relevant s out of d
dimensions of features; (ii) a seller who employs ILSX on all d dimensions of features; and (iii) a seller who employs
the lasso-regularized price experimentation policy ILQX, respectively. The problem parameters are as given in §5.1.1.
The displayed regret values are computed by averaging the realized regret over 30 sample paths.

The performance of the three pricing policies, as measured by regret, is shown in Figure 1. First
√
of all, we observe that the seller’s regret over T periods, i.e., ∆πθ (T ), grows roughly at rate T
under all three policies. Second, we observe that using ILSX on all d dimensions of features yields
the worst performance, which is expected because ILSX neither knows nor adjusts for the sparsity
present in the model considered. Third, we observe that the lasso-regularized policy, ILQX, performs
slightly better than the first-order optimal semi-clairvoyant policy for short time horizons, whereas
the performance of these two policies are essentially the same for long time horizons. As noted
by Hastie et al. (2009, p. 57), shrinking or setting some coefficients to zero can help reduce the
variance of least-squares estimates; thus, the simple and sparse model of ILQX can be more effective

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 19

at handling noise that arises from having a small number of observations in estimation (note that,
√
for T = 20,000 there are about 20000 ≈ 141 experimental periods used in the estimation step of
the policies). This observation is encouraging from a practical standpoint because it suggests that
a seller (who would in practice not know the sparse components of a model, if present) can do very
well by employing ILQX even when the number of samples is small.

5.1.2. Value of price experimentation. As alluded to in §3, myopic policies, which do not
explicitly experiment with prices, can lead to remarkably poor revenue performance in the context
of personalized dynamic pricing. Specifically, we demonstrate that a myopic pricing policy can
make the seller’s regret grow linearly in T , which is the worst possible growth rate of regret.
First, let us define what is meant by myopic pricing in our formulation. Consider the linear
demand model (5), and suppose that the seller charges two distinct prices p1 , p2 ∈ [ℓ, u] in the
first two periods. Then, at the end of every period t ≥ 2, the seller computes the least-squares
estimate of the unknown parameter vector θ, and in the following period, charges the price that
would have been optimal if θ were equal to its most recent estimate. The seller subsequently
repeats this estimate-and-optimize routine until the end of the time horizon. This pricing policy
is typically referred to as “greedy iterated least squares” (abbreviated greedy ILS, or GILS) (see
Keskin and Zeevi 2014, Qiang and Bayati 2016). Note that this is a myopic policy that puts no
emphasis on price experimentation to accelerate learning, and simply maximizes the single-period
revenue function based on the most recent estimates.
To demonstrate the performance of the myopic pricing policy versus one based on price exper-
imentation (as in §4.1), we compare greedy ILS against ILQX in the problem setting in §5.1.1.
Figure 2 shows that the regret of greedy ILS grows at a linear rate, whereas ILQX maintains a
sublinear growth rate—a stark difference in performance.
1500
greedy ILS
ILQX
1000
∆πθ (T )

500

0
0 1000 2000 3000 4000 5000
T
Figure 2 Value of price experimentation. The solid and dotted curves display the T -period regret of (i)
greedy ILS, which is a myopic policy that puts no emphasis on learning, and (ii) ILQX, which dedicates a judiciously
selected number of periods to price experimentation, respectively. The problem setting is as described in §5.1.2. The
displayed regret values are computed by averaging the realized regret over 30 sample paths.

To delve further into the difference between the two policies, we plot in Figure 3 the price paths
followed by the two policies in 30 independent simulation runs, together with the optimal price

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
20 Forthcoming in Management Science

path. Interestingly, in many instances the price path of greedy ILS converges to the lower price
boundary of 0.2 and stays there, despite the optimal prices being far away from that boundary. The
histogram in Figure 3(b) makes this clear, showing that 29 out of 30 prices charged by greedy ILS
in period 5000 are equal to 0.2. We note that, when there is a subset of estimate values that
make greedy ILS charge a price at the boundary of feasible range, this boundary price would be
an uninformative action for the seller. That is, if greedy ILS keeps charging the boundary price,
there would be very limited cumulative price variation, and thus, the demand parameter estimates
would not change significantly over time. In turn, this would make greedy ILS charge future prices
near the same boundary, resulting in a vicious cycle. This is how the presence of uninformative
prices in our general formulation leads to poor regret performance under greedy ILS. In contrast,
the experimentation-based policy, ILQX, does not suffer from charging prices at the boundary, with
Figure 3(d) showing that most of the prices charged by this policy is between 0.5 and 1 by period
5000, which is a narrow range where the optimal price sequence falls in as well.
(a) price paths of greedy ILS (b) price histogram of greedy ILS
30
optimal price
number of sample paths

2
25

1.5 20
pt

15
1
10
0.5
5

0 0
0 1000 2000 3000 4000 5000 0.5 1 1.5 2
t p5000
(c) price paths of ILQX (d) price histogram of ILQX
30
optimal price
number of sample paths

2
25

1.5 20
pt

15
1
10
0.5
5

0 0
0 1000 2000 3000 4000 5000 0.5 1 1.5 2
t p5000
Figure 3 Price evolution under greedy ILS and ILQX. Panels (a) and (b) depict sample paths of the price
sequence {pt } and the histogram of the price in period 5000, respectively, generated under greedy ILS. Panels (c)
and (d) show sample paths of the price sequence {pt } and the histogram of the price in period 5000, respectively,
generated under ILQX. All panels are based on the setting considered in §5.1.2, and there are 30 sample paths in total.
In panels (a) and (c), each solid curve displays one of the 30 sample paths of {pt }, and the dotted curve displays the
underlying optimal price sequence. Under greedy ILS, the vast majority of the sample paths result in uninformative
price choices at the boundary of the feasible price range, whereas under ILQX, the sample paths concentrate around
the underlying optimal price instead of the boundaries.

To prevent greedy ILS from getting stuck at the feasible price boundary, it is possible to slightly
modify this myopic policy such that, whenever its price path hits the same boundary in two

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 21

consecutive periods, the price path is pulled towards the interior of the feasible price set by a
margin δ > 0. To be precise, if the myopic price equals ℓ (resp. u) in periods t − 1 and t, then
this modified policy, denoted as δ-greedy ILS, charges the price ℓ + δ (resp. u − δ) in period t,
and otherwise, it simply charges the myopic price. This introduces a small amount of variation
in pricing decisions, and as shown in Figure 4, can significantly improve the regret performance.
But, ILQX still outperforms this modification of greedy ILS; hence, the judiciously designed price
experiments of ILQX appear to be a more efficient tool for collecting information.
1500 greedy ILS
δ-greedy ILS with δ = 0.1
δ-greedy ILS with δ = 0.3
1000 δ-greedy ILS with δ = 0.5
∆πθ (T )

ILQX
500

0
0 1000 2000 3000 4000
T
Figure 4 Experimentation near the boundary of feasible prices. The dashed curves display the T -period
regret of δ-greedy ILS for δ ∈ {0.1, 0.3, 0.5}, whereas the solid and dotted curves show the T -period regret of greedy ILS
and ILQX, respectively. For the three curves displayed for δ-greedy ILS, the regret is the lowest when δ = 0.3, but this
is still higher than the regret of ILQX. The problem setting is as described in §5.1.2. The displayed regret values are
computed by averaging the realized regret over 30 sample paths.

As explained in §3, in the absence of uninformative prices, greedy ILS would collect sufficient
information from demand observations, but when uninformative prices exist, it can perform poorly,
as illustrated above. In such cases, experimentation-based pricing policies offer substantial value
relative to greedy ILS. We further illustrate this point in our analysis of a real-life data set in §5.2.

5.1.3. Value of personalization at the individual level over customer segmentation. To

study individualization and customer segmentation in dynamic pricing with demand learning, let us
consider a market consisting of two distinct customer populations of equal sizes (hereafter referred
to as populations 1 and 2), and compare ILQX against a policy that first classifies a new customer
according to a clustering algorithm, then prices for each population separately. Specifically, we
consider a mixture of two normal distributions to represent the customers in the market. That is,

d
Zt = ξ N (µ1 , Σ1 ) + (1 − ξ) N (µ2 , Σ2 ), (21)

where: ξ ∼ Bernoulli(0.5), and (µ1 , Σ1 ) and (µ2 , Σ2 ) are two distinct parameter sets pertaining to
the customer feature distributions in populations 1 and 2, respectively. A cross section of these
feature distributions is illustrated in Figure 5, and a detailed account of the parameter values is
given in Appendix D.
For a conservative comparison of segmentation with ILQX, we consider the following two
“segment-then-optimize” policies. For the first segment-then-optimize policy (denoted by segment),

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
22 Forthcoming in Management Science

z2
1

-1
-1 0 1 2 3 4
z1
Figure 5 Distribution of features. The figure shows two contour plots for the feature density functions belonging
to the two customer populations in §5.1.3, with each contour plot displaying the cross section of a higher-dimensional
density function in the first two dimensions. The two populations are sufficiently “separated” that segmentation is
not prone to substantial classification errors—the average misclassification rate using the k-means algorithm is 4%
for T = 5000 customers.

the seller knows which segment each arriving customer belongs to and the demand parameters with
certainty. For each customer, the segment policy chooses the price that maximizes the expected
revenue for an average individual in the current customer’s segment. This policy represents the
segmentation-based pricing used widely in practice. For the second segment-then-optimize policy
(denoted by segment-ILQX), the seller knows which segment each arriving customer belongs to
but has to learn the demand parameters. For each customer, segment-ILQX chooses the price that
maximizes the expected revenue for the individual customer, using the same price experimentation
as ILQX, but using only the data collected from the particular customer’s segment when estimating
the demand model parameters.
For a more realistic comparison, we also consider the policy segment-ILQX′ , where the seller does
not know which segment an arriving customer belongs to and has to learn the segments over time.
For segment-ILQX′ , we assume the seller has access to the features of 100 customers to start with,
so that customer segmentation can operate starting from the first period.
Figure 6 compares ILQX, which treats every customer as an individual, against segment,
segment-ILQX, and segment-ILQX′ . From this figure, we see that ILQX is noticeably better than
these segment-then-optimize policies. The policy segment exhibits a linear regret growth; this poor
performance stems from the fact that targeting the average customer in each population, however
tightly clustered as they may be, leads to biased pricing for every individual customer which does
√
not disappear over time. While segment-ILQX′ achieves a regret of order T , it performs worse
than the plain version of ILQX that does not classify the customers. This observation is due to
the fact that, within each segment, segment-ILQX′ learns from a smaller sized data set. As such,
segment-ILQX′ would perform worse than ILQX that learns from the entire data set, regardless of
the customer population model under consideration.
We see in Figure 6 that segment-ILQX performs better than segment-ILQX′ because its misclas-
sification rate is zero, but it still does not beat ILQX that does not know a priori the existence
of two clusters in the customer base. This observation thus rules out the possibility that the

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 23

misclassification error, however small, is contributing to the underperformance of segment-ILQX′

relative to ILQX (which does not segment customers). Therefore, even if we entirely eliminate the
misclassification error of segment-ILQX′ , ILQX would outperform segment-ILQX′ .
(a) comparison with segment (b) comparison with segment-ILQX
ILQX ILQX
200
segment segment-ILQX′
150
150
segment-ILQX

∆πθ (T )
∆πθ (T )

100
100

50 50

0 0
0 1000 2000 3000 4000 5000 0 1000 2000 3000 4000 5000
T T
Figure 6 Value of personalization at the individual level over customer segmentation. Panel (a) displays
the T -period regret of ILQX (dotted curve) and segment (dashed curve), which knows which population each customer
is coming from and charges the price targeting the average customer in the arriving customer’s population. Panel (b)
depicts the T -period regret of ILQX (dotted curve) and segment-ILQX′ (solid curve), which first classifies the customer
using the k-means classification algorithm and then uses ILQX within each cluster. Panel (b) also shows the T -period
regret of segment-ILQX (dashed curve), which operates similar to segment-ILQX′ except that it knows which population
each customer is coming from, thereby eliminating the classification error. The problem parameters are as described
in §5.1.3 and Appendix D. The displayed regret values are computed by averaging the realized regret over 30 sample
paths.

To gain some intuition for these results, consider a simple example of two clusters of customers
from two very different locations; say Manhattan, New York and Louisville, Colorado. While at
first glance segmenting then targeting the individual customer in each region separately (as in
segment-ILQX′ ) may appear to be the right approach, this unnecessarily reduces the number of
observations to learn from because such a pricing policy would not take advantage of learning from
customers with similar characteristics in dimensions other than location (e.g., income bracket and
credit score). We conclude that, while conceptualizing a heterogenous customer base by segmenta-
tion may be intuitive and helpful, the optimal way to implement personalized dynamic pricing is
at the individual customer level.

5.2. Case Study II: Analyzing Real-life Data from an Auto Loan Company
In this section, we explore the validity of our policy for computing personalized lending rates for
an online auto loan company in the United States. As mentioned in §1, consumer lending is a
prominent industry in which personalization of prices (i.e., rates) is both socially acceptable and in
current practice, albeit at varying degrees of granularity. One purpose of this section is to compare
our first-order optimal policy with historical decisions made by a company. We also compare our
policy with other dynamic pricing-and-learning policies that have been proposed in the literature.

5.2.1. Description of data. We use the data set CPRM-12-001: On-Line Auto Lending pro-
vided by the Center for Pricing and Revenue Management at Columbia University. This includes
information about all auto loan applications received by a major online lender in the United States

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
24 Forthcoming in Management Science

from July 2002 through November 2004. The data set includes the date of an application made by
prospective borrowers, the type of loan they requested (term and amount) as well as some personal
information. The data set also includes whether or not the online lender approved the application,
and for the approved applications, an annual percentage rate (APR) offered, and whether or not a
contract eventuated. In contrast to Case Study I, the customers’ demand responses are binary in
this setting, indicating whether or not a loan was agreed upon.
We use the first 50,000 data entries for the case study. A summary of the data set (with descriptive
statistics on the demand and available features) is shown in Table 3 in Appendix E. Note that
the variable apply is the binary demand indicator for eventual contract, and there are 18 feature
variables, both discrete categorical (e.g., the state of residency for the applicant) and continuous
(e.g., FICO score).

5.2.2. Problem formulation. The pricing problem of the online auto loan company in our data
set is a special case of the problem formulation in §2 such that the demand is a binary variable. In
this case, we compute the price p of a loan as the net present value of future payments minus the
loan amount; i.e.,
TX
erm
p = M onthly P ayment × (1 + Rate)−τ − Loan Amount,
τ =1

where Rate is the monthly London interbank offered rate (LIBOR) for the studied time period.3
We use the interval [0, 7500] as the set of feasible prices. To represent the customers’ choices, we
employ a logit demand model. That is, given a price p and a feature vector x, the binary variable
apply that represents the customer choice is given by
eα·x+(β·x) p
(
1 with probability 1+eα·x+(β·x) p
,
apply = 1
(22)
0 with probability 1+eα·x+(β·x) p
,

where α and β are demand parameter vectors.4 The model (22) is a special case of (1) such that the
α·x+(β·x) p
e
variable apply is the demand realization, the expected demand is g(α · x + (β · x)p) = 1+e α·x+(β·x) p ∈
n o
eα·x+(β·x) p eα·x+(β·x) p
[0, 1], and the demand shock is given by ε = I U[0,1] ≤ 1+eα·x+(β·x) p − 1+eα·x+(β·x) p , where U[0,1] is
a uniform random variable in [0, 1].

3
Because this net present value is calculated from the auto loan company’s perspective, the relevant interest rate is
the one used for financial exchanges between commercial banks.
4
It is possible to include higher powers of the price and continuous customer features in the demand model, which
can improve the fit of the model to the data. At the same time, this also raises an overfitting concern because a very
good fit to a given data set could result in poor predictions in future data. The primary goal of the case study in
this section is to provide a simple yet realistic test bed to show the value of our approach, which is based on price
experimentation and lasso regularization. To ensure a fair comparison, all of the studied policies and benchmarks use
the same set of explanatory variables.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 25

5.2.3. Pricing policies. We compare the following pricing policies in our data analysis:
1. The company’s actual pricing policy (denoted by company).
2. Iterated lasso logistic regression with price experimentation (denoted by ILQX, as described
in §4.2). Note that, under a logit demand model, the quasi-likelihood function in §4.2 reduces
to the log-likelihood function for logit demand response.
3. Iterated logistic regression with price experimentation (denoted by IQX). This policy is the
same as ILQX but with no lasso regularization; i.e., λt = 0 for all t.
4. Semi-clairvoyant iterated logistic regression with price experimentation (denoted by
semi-clairvoyant IQX). This policy is the same as IQX but with known sparsity structure.
5. Greedy iterated logistic regression (denoted by greedy IL). This policy uses no price experi-
mentation and is an adaptation of the greedy policy in Qiang and Bayati (2016) for the binary
response model (22). Accordingly, there is no model misspecification for this policy.
6. Greedy episodic iterated lasso logistic regression (denoted by greedy EILL) adapted for the
model (22). This is the policy in Javanmard and Nazerzadeh (2016), which uses no price
experimentation and performs parameter estimation only at the beginning of periodic episodes
of geometrically increasing length. The regularization parameter is chosen according to
Javanmard and Nazerzadeh (2016). There is no model misspecification for this policy.
7. Segmented iterated logistic regression with price experimentation (denoted by segment-X).
First, for the 50,000 customers in the data set used for fitting the population model, we use
the k-means classification algorithm to find their optimal segmentation, as measured by the
average silhouette value.5 We then apply semi-clairvoyant IQX separately within each segment.
This would represent an optimistic result for a segmentation-based policy, as in practice the
optimal number and boundaries of the segments would need to be learned on the fly.
We implement each of the policies above to customers who arrived at the company starting from
July 2002, and compute the company’s expected regret based on a population model fitted to the
first 50,000 observations of the data set. The first 550 observations were used for a burn-in stage,
which was sufficient for the feature design matrix to be positive definite, as assumed by the above
policies. We note that specifying a generative model is necessary for backtesting pricing policies as
responses depend on prices selected. This is in contrast to other supervised learning problems such
as prediction problems, whereby an isolated test data set can be used to evaluate a policy without
specifying a generative model.
We set the population model to be the best model found by logistic regression using the backward
elimination method. That is, we fit a logistic regression model for the variable apply (the binary
5
The silhouette value is used to evaluate cluster analysis of a data set. The silhouette value for each point is a
measure of how similar that point is to points in its own cluster, when compared to points in other clusters. See
https://www.mathworks.com/help/stats/silhouette.html for further details on calculating the silhouette value.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
26 Forthcoming in Management Science

dependent variable for customer choices) using all 18 available features described in Table 3, and
then progressively remove any features that are not statistically significant at the 1% significance
level. The model fitted from this backward elimination method is given by

α = [ −11.07, 0, 0, 0, 0, 4.948, 0, 5.136, 4.051, 0, 0, 0, 0.7972, 0, 0, 0, 0, −1.539, 0 ],

β = [ 0, 0, −0.0460, 0, 0, 0, 0, −3.234, 1.731, 0, 0, −0.3021, 0, 0, 0, 0, 0, 0.5127, 0 ] × 10−3 ,

where the feature vectors x reside in R19 such that x1 = 1 (to allow for an intercept) and x2 , . . . , x19
correspond to the 18 features described in Table 3, normalized by their empirical means for numer-
ical stability.6 In the model fitted above, the two key terms in the expected demand function

g α · x + (β · x)p , namely the terms α · x and (β · x)p, are of comparable order. Note that, because
the prices are expressed in $1000s, the value of β above is of smaller magnitude relative to α, and
this difference in the order of magnitude disappears when β is multiplied by price p.

5.2.4. Results and discussion. Figure 7 shows the regret performance of ILQX against the
√
benchmark policies described in §5.2.3. Note that λt+1 = c̃t1/4 log d + log t for ILQX, as dictated
by Theorem 3, where c̃ = 0.005 is chosen from a grid search. (We revisit and further discuss the
choice of c̃ below.) The experimental prices are arbitrarily chosen as m1 = $391.1 and m2 = $6960.
In Figure 7(a)-(b), we observe that our first-order optimal policy, ILQX, is vastly better than the
company’s policy, and its performance is similar to semi-clairvoyant IQX that knows which features
are non-zero. We also see the importance of lasso regularization, as can be seen by the substantially
larger regret incurred by IQX, which is ILQX with no regularization. In sum, we can conclude that
ILQX is a fast learning policy with sublinear growth of regret. (We remark that the relative noise
in the plots compared to Case Study I is due to the fact that Figure 7 shows a single sample path
rather than an average over many simulation paths.)
Figure 7(c) compares ILQX with two greedy policies, greedy IL of Qiang and Bayati (2016) and
greedy EILL of Javanmard and Nazerzadeh (2016). Because our general problem formulation entails
potentially uninformative prices, the greedy policies perform strikingly worse, even worse than the
company policy, demonstrating the importance of price experimentation. We further investigate
the performance of greedy policies in the following subsection.
Finally, Figure 7(d) shows the value of personalization at the individual level and the subopti-
mality of customer segmentation in dynamic pricing, as observed in Case Study I. Note that the
performance of segment-X is an optimistic lower bound on the regret of segmentation policies used
in practice, as segment-X knows the optimal segmentation for the customers from the beginning.

6
This is a standard data processing step to put the different features on a similar scale. Note the dependent variable
apply is also normalized this way.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 27

10
5 (a) 10
5 (b)
2.5 2.5
company IQX
2 ILQX 2 ILQX
semi-clairvoyant IQX

∆πθ (T )

∆πθ (T )
1.5 1.5

1 1

0.5 0.5

0 0
0 0.5 1 1.5 2 0 0.5 1 1.5 2
T 10
4
T 104

105 (c) 105 (d)

8 2.5
greedy EILL segment-X
2 ILQX
6 greedy IL
ILQX

∆πθ (T )
∆πθ (T )

1.5
4
1
2
0.5

0 0
0 0.5 1 1.5 2 0 0.5 1 1.5 2
T 104 T 104
Figure 7 Performance of ILQX against various benchmark policies. In all panels, the dotted curves show
the T -period regret of ILQX. On top of these curves, panel (a) shows the T -period regret of company (solid curve),
panel (b) shows the T -period regret of IQX (dashed curve) and semi-clairvoyant IQX (solid curve), panel (c) shows the
T -period regret of greedy IL (solid curve) and greedy EILL (dashed curve), and panel (d) shows the T -period regret
of segment-X (dashed curve). Problem parameters are calibrated from a real-life data set as described in §5.2; the
displayed regret values are based on the sample path of customer feature vectors observed in the data set.

We summarize all results in Figure 8, which displays the expected cumulative revenues for the
first 20,000 customers (who arrive over approximately six months) under the five policies in this
section. The figure shows that ILQX accumulates the highest expected revenue (about $600,000),
which is 47% higher than that of the company policy.
T -period expected revenue (USD)

105
6 ILQX
5 segment-X
company
4
greedy IL
3 greedy EILL

0
0 0.5 1 1.5 2
T 104
Figure 8 Expected revenue of ILQX against various benchmark policies. The graph shows the T -period
expected revenue of ILQX as well as several other benchmark policies. Problem parameters are calibrated from a
real-life data set as described in §5.2; the displayed regret values are based on the sample path of customer feature
vectors observed in the data set.

5.2.5. More on the value of price experimentation. The reason for greedy poli-
cies performing poorly in our setting but near-optimally in Qiang and Bayati (2016) and
Javanmard and Nazerzadeh (2016) is that both papers consider population demand models with

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
28 Forthcoming in Management Science

constant price sensitivity that does not depend on customer features Xt and consequently all prices
are informative in their settings. Specifically, in both papers, β is effectively one-dimensional, with
our setting retrieving theirs by letting β = 1 and β2 = · · · = βd = 0. As a consequence of this mod-
eling choice, the settings of Qiang and Bayati (2016) and Javanmard and Nazerzadeh (2016) do
not entail uninformative prices, and accordingly, greedy policies exhibit good performance. As
explained earlier in §3, this does not necessarily happen in our general setting, thereby indicating
that price experimentation is crucial in general.
To see this point, we plot in Figure 9 the evolution of the prices charged by ILQX and the two
greedy policies, greedy IL and greedy EILL, where the population model and the sequence of feature
vectors are as in §§5.2.1-5.2.4. From Figure 9, we observe that the price paths of greedy IL and
greedy EILL eventually converge to u, the upper boundary of the feasible price interval, thereby
limiting the cumulative price variation and the amount of information collected. By contrast, ILQX
employs explicit price experiments to facilitate learning, and consequently, is able to charge prices
closer to the optimal price. The greedy policies’ convergence to the uninformative upper boundary
of feasible prices is similar to the phenomenon we observed earlier in §5.1.2 for the linear demand
model. This indicates that the personalized pricing problem for the binary demand model also
requires a small amount of judicious price experimentation.
8000

6000 optimal price

greedy EILL
pt

4000
greedy IL
2000 ILQX

0
0 0.5 1 1.5 2
t
Figure 9 Price paths of ILQX and greedy policies. The solid curves show the sequence of prices {pt } charged
by ILQX, greedy IL, and greedy EILL, and the dotted curve shows the optimal price sequence. The population model
and the sequence of feature vectors are as in §§5.2.1-5.2.4.

To further investigate this point, we generate additional results on 30 new customer feature
paths for 20,000 periods by bootstrapping with replacement. We display in Figure 10 the evolution
of prices charged by greedy IL and greedy EILL in this setting. For greedy IL, 24 out of 30 paths
converge to u by period 20,000, and for greedy EILL, 23 out of 30 paths. These observations highlight
the necessity of price experimentation in personalized pricing practice; greedy policies that do not
experiment with prices periodically (to steer the price sequence away from uninformative prices)
can perform very poorly.7
7
As in §5.1.2, it is possible to consider slight modifications of greedy policies such that prices are chosen myopically
unless the price path hits the same boundary in two consecutive periods, in which case the price path is pulled towards
the interior of the feasible set by a margin δ > 0 (see §5.1.2 for a precise description). For the sample path of customer
feature vectors in our real-life data set, we observe that the regret performance of such slightly modified greedy
policies is very close to the regret performance of plain (unmodified) greedy policies. For details, see Appendix F.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 29

(a) price paths of greedy IL (b) price histogram of greedy IL

8000 optimal price

number of sample paths

20
6000
15

pt 4000
10

2000 5

0 0
0 0.5 1 1.5 2 0 2000 4000 6000
t 104 p20,000
(c) price paths of greedy EILL (d) price histogram of greedy EILL
8000 optimal price

number of sample paths

20
6000
15
pt

4000
10

2000 5

0 0
0 0.5 1 1.5 2 0 2000 4000 6000
t 104
p20,000
Figure 10 Price evolution under greedy IL and greedy EILL. Panels (a) and (b) show sample paths of the price
sequence {pt } and the histogram of the price in period 20,000, respectively, generated under greedy IL. Panels (c)
and (d) depict sample paths of the price sequence {pt } and the histogram of the price in period 20,000, respectively,
generated under greedy EILL. Problem parameters are calibrated from a real-life data set as described in §5.2, and
the sample paths of customer feature vectors are bootstrapped from the data set with replacement.

5.2.6. On the choice of regularization parameters. We investigate in Figure 11 the effect

of the regularization parameter λt on the performance of ILQX.

(a) effect of c̃ (b) effect of growth rate of λt

4 4
10 10
c̃ = 2.5 × 10 −3 λt ≈ log d
6 6 λt ≈ log d + log t
c̃ = 5.0 × 10−3
c̃ = 7.5 × 10−3 λt ≈ t1/4 √
(log d + log t)1/4
∆πθ (T )
∆πθ (T )

4 c̃ = 1.0 × 10−2 4 λt ≈ t1/4 log d + log t

2 2

0 0
0 0.5 1 1.5 2 0 0.5 1 1.5 2
T 10
4
T 10
4

Figure 11 The effect of the regularization parameter λt on the performance of ILQX. Panel (a) and
(b) show the T -period
√ regret of ILQX for different choices of the regularization parameter sequence {λt }. In panel
(a), λt+1 = c̃t1/4 log d + log t for all t, and different values of the constant c̃ are considered. In panel (b), different
(suboptimal) growth rates of λt are considered; the solid and dashed curves correspond to the following cases: (i)
λt+1 = 0.005 log d for all t; (ii) λt+1 = 0.005(log d + log t) for all t; and (iii) λt+1 = 0.005t1/4 (log d + log t)√1/4 for all t.
On the other hand, the dotted curve corresponds to the near-optimal growth rate of λt : λt+1 = 0.005t1/4 log d + log t
for all t. The rest of the problem parameters are calibrated from a real-life data set as described in §5.2; the displayed
regret values are based on the sample path of customer feature vectors observed in the data set.

In Figure 11(a), we consider different values for the constant c̃ in the near-optimal choice of λt in
Theorem 3. We find that, for this case study, c̃ in the order of 10−3 yields the best results, and
that, within the same order of magnitude, the expected regret is not too sensitive to the exact

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
30 Forthcoming in Management Science

value of the constant. To demonstrate the importance of using the correct growth rate for λt , we
compute the performance of ILQX under three suboptimal choices for λt . In the first choice, we
have λt+1 = 0.005 log d for all t; in the second one, λt+1 = 0.005(log d + log t) for all t; finally, in the
third one, λt+1 = 0.005t1/4 (log d + log t)1/4 for all t. In Figure 11(b), we observe that the rate of
convergence of λt is crucial, with the expected regret being sensitive to this choice.

6. Impact of Model Misspecification

In this section, we study the performance of our approach in the presence of model misspecification.
To that end, suppose that the customer base consists of separate subpopulations (segments), each
with different demand model parameters. If the seller assumes that (1) holds for the entire popu-
lation with the same α and β, then the underlying model would result in misspecification errors.
We explore the impact of such model misspecification, using the real-life data set from §5.2.1.
To specify a multi-segment generative model, we first find the optimal clustering of the first
50,000 customers using the k-means clustering algorithm, and then fit separate logistic regression
models via backward elimination for each cluster of customers. For the logistic regression step, we
restrict the dimension of β to equal 1, in order to avoid giving any advantage to our policy, ILQX,
which is designed to perform well for the general model with feature-dependent price sensitivity
(i.e., high-dimensional β). This yields a three-segment demand model with three sets of param-
eters (α1 , β1 ), (α2 , β2 ) and (α3 , β3 ). Further details on the multi-segment generative model are in
Appendix G.
We compare three policies in this setting: (i) the lasso-regularized price experimentation policy
ILQX described in §4.2; (ii) the first segmentation policy described in §5.1.3, namely segment,
which targets the average customer in each segment; and (iii) the second segmentation policy
described in §5.1.3, segment-ILQX, which places customers in segments and applies ILQX within
each segment. Note that both ILQX and segment are misspecified, the former because it does
not distinguish between the underlying customer segments, and the latter because it ignores the
fact that the non-price determinants of demand are heterogeneous over individual customers. We
further disadvantage ILQX by giving perfect knowledge of the segments to the other two policies,
segment and segment-ILQX. In practice, segmentation policies would make classification errors
because segments need to be discovered dynamically. We expect segment-ILQX to exhibit good
performance because it correctly specifies the underlying demand model. However, it is not clear a
priori if ILQX or segment would perform well, as these policies misspecify the true demand model.
Figure 12(a) shows the empirical performance of ILQX, segment, and segment-ILQX. Perhaps
surprisingly, ILQX performs the best, with sublinearly growing regret that is slightly better than
segment-ILQX. On the other hand, segment performs very poorly with linearly growing regret. The

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 31

surprisingly good performance of ILQX indicates that, for this data set, (i) the three clusters are
not particularly well-separated, and (ii) the demand responses for the different clusters are not too
different from each other, which make learning and earning for a single, overarching demand model
for the entire customer base sufficient. The slightly better performance of ILQX over segment-ILQX
is likely due to the fact that ILQX works with a greater sample size than segment-ILQX.
Is it then possible to find a generative model where ILQX does not perform well? We are able to
find such a model by trial and error, by adding progressively larger random perturbations to the
coefficients of the demand models of each cluster. With sufficiently large perturbations, we find a
model in which segment-ILQX performs the best with sublinear regret growth, whereas ILQX and
segment perform poorly. The computational results for this model are shown in Figure 12(b), and
the details of this model are given in Appendix G.

(a) first misspecified model 4

(b) second misspecified model
10
12000 6

10000
segment-ILQX segment-ILQX
segment segment
∆πθ (T )

8000 ILQX 4 ILQX

∆πθ (T )

6000

4000 2

2000

0 0
0 0.5 1 1.5 2 0 0.5 1 1.5 2
T 10 4
T 10
4

Figure 12 Impact of model misspecification. Panel (a) shows the T -period regret of three policies, namely,
ILQX (dotted curve), segment (solid curve), and segment-ILQX (dashed curve) in the three-segment model derived
from the real-life data set described in §5.2.1. Panel (b) shows the T -period regret of the same three policies in the
same model with added random perturbations to demand coefficients. Problem parameters are calibrated from the
aforementioned real-life data set as described in §6 and Appendix G. The displayed regret values are computed by
averaging the realized regret over 30 sample paths of customer feature vectors that are bootstrapped from the data
set with replacement.

We thus conclude that, while a demand model that cannot be captured by the generalized
linear model (1) exists, ILQX is robust to this type of misspecification for the real-life data set
CPRM-12-001: On-Line Auto Lending.

7. Concluding Remarks
In this paper, we investigate personalized dynamic pricing with demand learning and information
about individual customer characteristics. After establishing that the optimal policy must have
√
expected regret growing at a rate at least of order s T , we propose price experimentation policies
and prove their near-optimality. Of particular note is the analysis of the seller who has possibly
a large number of customer features but can nevertheless earn near-optimal profit by employing
the lasso-regularized price experimentation policy, ILQX. Our extensive data analysis validates
the theoretical performance bounds, and firmly establishes the practical value of our personalized
policies over other intuitive and/or widely-practiced policies, such as unregularized pricing policies

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
32 Forthcoming in Management Science

that do not account for potential sparsity in the demand model, myopic pricing policies that do
not experiment with prices to facilitate learning, and segmentation policies that do not utilize
across-segment information.
We believe the feature-based modeling approach and techniques used in this paper can be trans-
ferred to analyze personalization in other operational decision problems. These include assortment
optimization, joint pricing-inventory management, and healthcare, which are all rapidly gaining
attention for the potential value of personalization.

Acknowledgments
The authors thank the Department Editor Prof. Noah Gans, the associate editor, and three referees for their
helpful comments that improved the presentation and structuring of the paper.

References
Araman, V. F. and Caldentey, R. (2009). Dynamic pricing for nonperishable products with demand learning.
Operations Research, 57(5):1169–1188.
Ban, G.-Y., Gallien, J., and Mersereau, A. J. (2019). Dynamic procurement of new products with covariate
information: The residual tree method. Manufacturing & Service Operations Management, 21(4):798–815.
Ban, G.-Y. and Rudin, C. (2018). The big data newsvendor: Practical insights from machine learning.
Operations Research, 67(1):90–108.
Belloni, A. and Chernozhukov, V. (2013). Least squares after model selection in high-dimensional sparse
models. Bernoulli, 19(2):521–547.
Besbes, O. and Zeevi, A. (2009). Dynamic pricing without knowing the demand function: Risk bounds and
near-optimal algorithms. Operations Research, 57(6):1407–1420.
Broder, J. and Rusmevichientong, P. (2012). Dynamic pricing under a general parametric choice model.
Operations Research, 60(4):965–980.
Chen, X., Owen, Z., Pixton, C., and Simchi-Levi, D. (2015). A statistical learning approach to personalization
in revenue management. Working Paper.
Cheung, W. C., Simchi-Levi, D., and Wang, H. (2017). Dynamic pricing and demand learning with limited
price experimentation. Operations Research, 65(6):1722–1731.
Cohen, M. C., Lobel, I., and Paes Leme, R. (2016). Feature-based dynamic pricing. Working Paper.
den Boer, A. V. (2015). Dynamic pricing and learning: historical origins, current research, and new directions.
Surveys in Operations Research and Management Science, 20(1):1–18.
den Boer, A. V. and Zwart, B. (2014). Simultaneously learning and optimizing using controlled variance
pricing. Management Science, 60(3):770–783.
Farias, V. F. and Van Roy, B. (2010). Dynamic pricing with a prior on market response. Operations Research,
58(1):16–29.
Ferreira, K. J., Lee, B. H. A., and Simchi-Levi, D. (2015). Analytics for an online retailer: Demand forecasting
and price optimization. Manufacturing & Service Operations Management, 18(1):69–88.
Ferreira, K. J., Simchi-Levi, D., and Wang, H. (2018). Online network revenue management using Thompson
sampling. Operations Research, 66(6):1586–1602.
Harrison, J. M., Keskin, N. B., and Zeevi, A. (2012). Bayesian dynamic pricing policies: Learning and earning
under a binary prior distribution. Management Science, 58(3):570–586.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning. Springer, New York.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 33

Javanmard, A. and Nazerzadeh, H. (2016). Dynamic pricing in high-dimensions. Working Paper.

Keskin, N. B. and Zeevi, A. (2014). Dynamic pricing with an unknown demand model: Asymptotically
optimal semi-myopic policies. Operations Research, 62(5):1142–1167.
Keskin, N. B. and Zeevi, A. (2016). Chasing demand: Learning and earning in a changing environment.
Mathematics of Operations Research, 42(2):277–307.
Nambiar, M., Simchi-Levi, D., and Wang, H. (2017). Dynamic learning and pricing with model misspecifi-
cation. Working Paper.
Nelder, J. and Wedderburn, R. (1972). Generalized linear models. Journal of the Royal Statistical Society.
Series A (General), 135(3):370–384.
Qiang, S. and Bayati, M. (2016). Dynamic pricing with demand covariates. Working Paper.
Ramasastry, A. (2005). Web sites change prices based on customers’ habits. CNN Law Center. Posted on
June 24, 2005.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical
Society. Series B (Methodological), 58(1):267–288.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
34 Forthcoming in Management Science

Appendix A: Proof of Theorem 1

We first assume that all components of Xt are continuous random variables, with positive measure
in the interior of X and zero on the boundary, and show at the end that this can be generalized to
Xt with discrete components.
We derive our lower bound for the more general case where the distribution of {εt } is from
the exponential family of distributions; that is, {εt } are independent and identically distributed
random variables whose density has the following parametric form: fε (ξ | φ) = eφ·T (ξ)−A(φ)+B(ξ) ,
where φ ∈ Rn is the vector of distribution parameters, T : R → Rn , A : Rn → R, and B : R → R
are differentiable functions, and n is a natural number that represents the number of distribution
iid
parameters. Note that the case where εt ∼ N (0, σ02 ) is a special case of the above setting, obtained
by letting n = 1, φ = − 2σ12 , T (ξ) = ξ 2 and B(ξ) = − 12 log(2π) for all ξ, and A(φ) = 21 log − 2φ
1

for
0
all φ.
Given θ = (α, β) ∈ Θ and conditional on X T = xT , the density of the history vector Ht =
(p1 , . . . , pt , D1 , . . . , Dt , X1 , . . . , Xt+1 ) is given by
t
Y
ℓt (Ht , θ, xT ) = fε Dk − α · xk − (β · xk )pk φ for t = 1, 2, . . . , T. (23)
k=1

By elementary analysis, (23) implies that Ht has the following Fisher information matrix under
any given admissible policy π ∈ Π:
( T )
∂ log ℓ t (H t , θ, xT ) ∂ log ℓ t (H t , θ, xT )
Itπ (xT ) := Eπθ = ζ(φ) Eπθ [Jt (xT )], (24)
∂θ ∂θ
∂ ∂ ∂ ∂

where ζ(φ) = Eπθ [φ ·∇T (ε1 ) +B ′ (ε1 )], ∇T (ξ) = ∂ξ T 1 (ξ), ∂ξ T 2 (ξ), . . . , ∂ξ T n (ξ) and B ′ (ξ) = ∂ξ
B(ξ)
for all ξ, Jt (xT ) is the empirical Fisher information matrix given by
Pt Pt X t
xk xTk pk xk xTk 1 1 T
Jt (xT ) = Pt k=1
T
P k=1
t 2 T
= pk · pk ⊗ xk xTk ,
p x
k=1 k k kx p x
k=1 k k k x k=1

and ⊗ denotes the Kronecker product of matrices. In the remainder of the proof, we consider two
cases:
Case 1: d + 1 ≥ T . In this case, we use the following lemma.

Lemma A.1. (lower bound on cumulative pricing error) There exist finite positive con-
stants c0 and c1 such that
( T ) T
X X c0
sup EX Eπθ [(pt − ϕ(θ, Xt ))2 ] ≥ ,
θ∈Θ
t=2 t=2
c1 + supθ∈Θ EX Ct (θ, X t ) Eπθ [Jt−1 (X t−1 )] Ct (θ, X t )T

where Ct (· , ·) is a 2(d + 1)-dimensional function on Θ × X t such that

Ct (θ, xt ) = − tk=1 ϕ(θ, xk ) vkT

P Pt
k=1 vkT ,

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 35
Pt
and vk ∈ Rd+1 are (column) vectors constructed as follows: for k = 1, . . . , t, let vk := ℓ=1 γkℓ xℓ ,
where {γkℓ , ℓ = 1, . . . , t} solve the following t equations:
t
(
X
T 1 if ℓ′ = k,
γkℓ xℓ xℓ′ =
ℓ=1
0 otherwise.
Proof of Lemma A.1. Let µ be an absolutely continuous density on Θ, taking positive values
on the interior of Θ and zero on its boundary, and let Eµ {·} be the expectation operator associated
with the density µ. We consider estimating the vector [ϕ(θ, X1 ), . . . , ϕ(θ, XT )].
Given that the components of Xt are continuous random variables, with positive measure in the
interior of X and zero on the boundary, the multivariate van Trees inequality (see Gill and Levit
1995) implies that
2
(Eµ,X {tr[Ct (θ, X t )(∂ϕ(θ, Xt)/∂θ)T ]})
Eµ,X Eπθ (pt − ϕ(θ, Xt))2 ≥

π
, (25)
Eµ,X {tr[Ct (θ, X t )It−1 (X t )Ct (θ, X t )T ]} + Ĩ (µ)
where Ĩ (µ) is a constant that depends on µ, and Eµ,X = Eµ {EX (·)}. Since

∂ϕ(θ, xt ) xt ϕ(θ, xt )xt
= − − ,
∂θ 2β · xt β · xt
we have

∂ϕ(θ, xt )T ϕ(θ, xt)
tr Ct (θ, xt ) =− .
∂θ 2β · xt
π
By (24), we have It−1 (X t ) = ζ(φ) Eπθ [Jt−1 (X t )] = ζ(φ) Eπθ [Jt−1 (X t−1 )]. Using these facts and sum-
ming up over t = 2, . . . , T ,
2
ϕ(θ, Xt)
T T Eµ,X
X π 2
X 2β · Xt
Eµ,X Eθ (pt − ϕ(θ, Xt )) ≥ π
,
t=2 t=2
ζ(φ) Eµ,X {tr[Ct (θ, X t )Eθ [Jt−1 (X t−1 )]Ct (θ, X t )T ]} + Ĩ (µ)
and since Eµ {·} is a monotone operator,
2
ϕ(θ, Xt )
T T inf EX
X X θ∈Θ 2β · Xt
sup EX {Eπθ [(pt − ϕ(θ, Xt))2 ]} ≥ π
.
θ∈Θ
t=2 t=2
ζ(φ) sup EX {tr [Ct (θ, X t )Eθ [Jt−1 (X t−1 )]Ct (θ, X t )T ]} + Ĩ (µ)
θ∈Θ

Because 0 < ℓ ≤ ϕ(θ, x) for all θ and x, the numerator of the right-hand side of the preceding
inequality is greater than or equal to ℓ2 /[4βmax
2
(max{1, zmax })2 ], where βmax = max(α,β)∈Θ {kβ k}.
Thus, letting c0 = ℓ2 /[4ζ(φ)βmax
2
(max{1, zmax })2 ] and c1 = Ĩ (µ)/ζ(φ), we arrive at the desired result.
Q.E.D.

For each k = 1, . . . , t, the constants {γkℓ , ℓ = 1, . . . , t} in Lemma A.1 are found by solving the
following system of linear equations:

XTt Xt γk = ek , (26)

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
36 Forthcoming in Management Science

where: Xt = [x1 , . . . , xt ] is the (d + 1) × t matrix of feature vectors up to time t, γk = (γk1 , . . . , γkt ) ∈

Rt , and ek ∈ Rt is the k-th basis vector in Rt . Because d + 1 ≥ T , the matrix XTt Xt is full-rank;
hence, there exists a unique solution for γk . Note that

Ct (θ, xt ) Jt−1 (xt ) Ct (θ, xt )T

Pt−1 Pt−1 Pt
Pt Pt
k=1 xk xTk k=1 pk xk xTk − k=1 ϕ(θ, xk )vk
= − k=1 ϕ(θ, xk ) vk T
k=1 vk
T Pt−1 Pt−1 2 Pt
k=1 vk
T T
k=1 pk xk xk k=1 pk xk xk
Pt−1 Pt
Pt Pt − k=1 k′ =1 {ϕ(θ, xk′ )xk xk vk′ + pk xk xTk vk′ }
T
= − k=1 ϕ(θ, xk ) vk T
k=1 vk
T
t−1 t
− k=1 k′ =1 {pk ϕ(θ, xk′ )xk xTk vk′ + p2k xk xTk vk′ }
P P

− t−1 {ϕ(θ, xk )xk + pk xk }

P
(a)
= − tk=1 ϕ(θ, xk ) vkT
P Pt k=1
k=1 vk
T
t−1
− k=1 {pk ϕ(θ, xk )xk + p2k xk }
P

X t Xt−1
= {ϕ(θ, xk′ )ϕ(θ, xk )vkT′ xk − 2pk ϕ(θ, xk′ )vkT′ xk + p2k vkT′ xk }
k′ =1 k=1
t−1
(b) X
= {pk − ϕ(θ, xk )}2 ,
k=1

where: (a) and (b) follow because, by construction, vkT′ xk = 0 unless k = k ′ , in which case vkT xk = 1.
Thus,

Ct (θ, xt ) Eπθ [Jt−1 (xt )] Ct (θ, xt )T = Eπθ [Ct (θ, xt )Jt−1 (xt )Ct (θ, xt )T ]
t−1
X
Eπθ {pk − ϕ(θ, xk )}2 .

=
k=1

Consequently, we have
( T
)
X
π
EX Eπθ 2
T

∆ (T ) = sup −(β Xt )(pt − ϕ(θ, Xt))
θ∈Θ
t=1
( T )
(c) X
||XS ,t ||1 Eπθ 2

≥ |βmin | sup EX (pt − ϕ(θ, Xt ))
θ∈Θ
( t=1
T
)
X
XminEπθ 2

≥ |βmin | sup EX (pt − ϕ(θ, Xt)) ,
θ∈Θ
t=1
Ps i
where: βmin = min(α,β)∈Θ {kβ k}, ||XS ,t ||1 = i=1 |XS ,t | is the ℓ1 -norm of the compressed feature
vector XS ,t ; XSi ,t is the i-th component of XS ,t for i = 1, . . . , s; Xmin := min{||XS ,1 ||1 , . . . , ||XS ,T ||1 };
and (c) follows because β̃ = |βmin |[sgnXS1,t , . . . , sgnXSs,t ] is a feasible solution to the supremum
problem in the first line. Now, since no component of XS ,t is almost surely zero, there is a positive
constant cmin = mini∈{1,...,s} {E|XSi ,t |}. Then, Xmin ≥ cmin s, and we get
( T
)
X
π
EX Eπθ 2

∆ (T ) ≥ |βmin | cmin s sup (pt − ϕ(θ, Xt)) .
θ∈Θ
t=1

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 37

Combining the above with Lemma A.1, we can lower bound the worst-case regret by
T
X c0
∆π (T ) ≥ βmin
2
c2mins2 .
t=2
c1 |βmin |cmin s + ∆π (t − 1)
2
Letting K1 = c0 βmin and K2 = c1 |βmin |, we further obtain the following:
(d) K1 c2min s2 (T − 1) (e) sK1 c2min s2 T
∆π (T ) ≥ ≥ ,
K2 cmin s + ∆π (T ) 2∆π (T )(K2 cmin s/∆π (T ) + 1)
where: (d) follows because ∆π (T ) ≥ ∆π (t − 1) for t ∈ {1, . . . , T }, and (e) follows because T ≥ 2. Now

∆π (T ) ≥ ∆π (1) ≥ |βmin |cmin s(u − ℓ)2 /4.

K2
Thus, letting K3 = |βmin |(u−ℓ)2 /4
+ 1, we get

K1 1/2
√
∆π (T ) ≥

2K3
cmin s T .

Case 2: d + 1 < T . In this case the t systems of linear equations (26) may become inconsistent by
the Rouché-Capelli theorem, because the right-hand side of (26) spans the entire Rt space but the
rank of XTt Xt may be less than t. To avoid such inconsistencies, we consider instead augmented
feature vectors, x̃k ∈ RT , where the first d + 1 elements of x̃k equal xk and the rest are determined
by the requirement X̃t = [x̃1 , . . . x̃t ] be of rank t. With this augmentation, the proof of in this
case follows by the same arguments for the preceding case. This concludes the proof when the
components of Xt are continuous random variables.
Finally, if some components of Xt are discrete random variables, we can take the conditional
expectation over all possible realizations of the discrete components first, then apply (25) for each
realization. To illustrate, let D denote the set of all realizations of the discrete components of Xt ;
e.g., if Xt ∈ R3 , with Xt1 = 1 almost surely, Xt2 = ±1/2 with probability 1/2 (half male, half female),
and Xt3 a continuous random variable, then D = {[1, 1/2], [1, −1/2]}. For d ∈ D , let XtC (d) denote
the conditioned random variable where the discrete components of Xt are set to the values in d.
Then, we have
X
Eµ,X Eπθ (pt − ϕ(θ, Xt))2 = PX Xt = XtC (d) Eµ,X C Eπθ (pt − ϕ(θ, Xt ))2 | Xt = XtC (d) ,

d∈D

where Eµ,X C denotes taking expectation over µ and the reduced feature vector that only
contains the continuous components. Applying the multivariate van Trees inequality on
Eµ,X C {Eπθ [(pt − ϕ(θ, Xt ))2 | Xt = XtC (d)]} for each d ∈ D , we arrive at the same conclusion
as before by following the same proof arguments for the conditional regret ∆π,C (T ) :=
PT π 2 C
t=1 EX C ,θ [−(β Xt )(pt − ϕ(θ, Xt )) | Xt = Xt (d)] for each d ∈ D . Q.E.D.
T
supθ∈Θ

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
38 Forthcoming in Management Science

Appendix B: Proofs of the Results in Section 4.1

1
1 1
Proof of Lemma 1. Letting US ,t = pt ⊗ XS ,t = pt ⊗ ZS,t and Gt = σ(ε1 , ε2 , . . . , εt ) for t ≥ 1,
note that
 
1 ZST,t pt pt ZST,t
 ZS ,t ZS ,t ZST,t pt ZS ,t pt ZS ,t ZST,t
EπX,θS [US ,t UST,t |Gt−1 ] = EπX,θS 

 pt T 2 2 T Gt−1 
pt ZS ,t pt pt ZS ,t 
T 2 2 T
pt ZS ,t pt ZS ,t ZS ,t pt ZS ,t pt ZS ,t ZS ,t
1 0 pt 0
 

(a)  0 ΣZ,S 0 pt ΣZ,S  1 pt 1 0
=  = ⊗
pt 0 p2t 0  pt p2t 0 ΣZ,S
2
0 pt ΣZ,S 0 pt ΣZ,S
for all t ∈ M , where: (a) follows because pt ∈ {m1 , m2 } is non-random for all t ∈ M , and E[ZS ,t ] = 0
Pt Pt h1 p i
and E[ZS ,t ZS ,t ] = ΣZ,S for all t. Thus, k=1 χk EX,θS [US ,k US ,k |Gk−1 ] = k=1 χk p pk2 ⊗ 10 ΣZ,S
0
T π T

,
k k

where χk = I{k ∈ M }. By Lemma 2 in Keskin and Zeevi (2014), we deduce that there exists a finite
Pt h1 p i
and positive constant γ̃1 such that the smallest eigenvalue of k=1 χk p pk2 is greater than or
k k

equal to γ̃1 Jt . Consequently,

X t
π T
1 0
µmin χk EX,θS [US ,k US ,k |Gk−1 ] ≥ γ̃1 µmin 0 ΣZ,S Jt . (27)
k=1
1 0

To obtain a lower bound on µmin 0 ΣZ,S , let y = (y1 , y2 ) ∈ Rs such that y1 ∈ R, y2 ∈ Rs−1 , and
ky k = 1. Note that
1 (b) (c)
0
y = y12 + y2T ΣZ,S y2 ≥ y12 + µmin (ΣZ,S )ky2 k2 = y12 + µmin (ΣZ,S )(1 − y12 )

yT 0 ΣZ,S

≥ min {ξ + (1 − ξ)µmin (ΣZ,S )}

ξ∈[0,1]
(d)
= min{1, µmin (ΣZ,S )},

where: (b) follows by the Rayleigh-Ritz theorem, (c) follows because ky k = 1, and (d) follows
because ξ + (1 − ξ)µ(ΣZ,S ) is linear in ξ. Thus, (27) implies that
X t
π T
µmin χk EX,θS [US ,k US ,k |Gk−1 ] ≥ γ̃1 min{1, µmin (ΣZ,S )}Jt . (28)
k=1

To conclude the proof, we use Theorem 3.1 in Tropp (2011). Note that the maximum eigenvalue

of χk US ,k UST,k is bounded above by tr χk US ,k UST,k = χk UST,k US ,k = χk (1 + p2k )(1 + Zk1
2 2
+ · · · + Zkd )≤
(1 + u2 )(1 + zmax
2
). Therefore, by Theorem 3.1 in Tropp (2011), (28) implies that
X t
π
PX,θS µmin χk US ,k US ,k ≤ 2 γ̃1 min{1, µmin (ΣZ,S )}Jt ≤ 2se−ρ1 Jt ,
T 1

k=1

1
where ρ1 = 2
(1 − log 2)γ̃1 min{1, µmin (ΣZ,S )}/[(1 + u2 )(1 + zmax
2
)] > 0. Finally, letting γ1 =
1
γ̃ min{1, µmin (ΣZ,S )}
2 1
and κ1 = 2, we obtain the desired result. Q.E.D.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 39

Proof of Lemma 2. Fix π = ILSX(m1 , m2 ), and let ρ > 0. We first note that, if s and t satisfy
√ √
t/ log t < ρs, then (ρ s log t)/ t > 1 and (16) trivially holds by choosing κ2 ≥ ρ. Thus, in the
√
remainder of the proof, we consider the case where s and t satisfy t/ log t ≥ ρs. Define FS ,k =
q √
σ p1 , . . . , pk , ε1 , . . . , εk , XS ,1 , . . . , XS ,k+1 for k = 1, 2, . . . , and δs,t = (ρ s log t)/ t ∈ [0, 1]. Letting
vs,t = δs,t J˜S−1 ˜−1
,t MS ,t /kJS ,t MS ,t k for all t = 1, 2, . . . , we note that

PπX,θS {kJ˜S−1 2 2 π T T ˜
,t MS ,t k > δs,t } ≤ PX,θS {vs,t MS ,t ≥ vs,t JS ,t vs,t }. (29)

Recalling that EπX,θS [εk |FS ,k−1 ] = 0 and EπX,θS [ε2k |FS ,k−1 ] ≤ σ02 for k ≥ 2, and EπX,θS [eηεk |FS ,k−1 ] < ∞
for all η satisfying |η| ≤ η0 and k ≥ 2, we deduce that EπX,θS [eηεk |FS ,k−1 ] = 1 + ηEπX,θS [εk |FS ,k−1 ] +
µ(η)σ02 η 2
P∞ q π q 2 2
q=2 η EX,θS [εk |FS ,k−1 ]/q! ≤ 1 + µ(η)σ0 η ≤ e for all η satisfying |η| ≤ η0 and k ≥ 2, where
P∞ q−2 π 2 2
µ(η) = q=2 η EX,θS [εqk |FS ,k−1 ]/(q!σ02 ). Therefore, EπX,θS [eηεk |FS ,k−1 ] ≤ eµ0 σ0 η for all η satisfying
η0
|η| ≤ η0 and k ≥ 2, where µ0 = max{µ(η) : |η| ≤ η0 }. Letting ψ = min 2µ1σ2 , [(1+u2 )(1+z

2 )]1/2
, we
0 0 max
obtain from (29) that
T 1 T ˜ 1 T ˜
PπX,θS {kJ˜S−1 2 2 π
,t MS ,t k > δs,t } ≤ PX,θS {e
ψvs,t MS,t − 2 ψvs,t JS,t vs,t
≥ e 2 ψvs,t JS,t vs,t }. (30)

Now, let Bs,t = {v ∈ R2s : kv k = δs,t }. For all v ∈ Bs,t , define {Mkv , k = 1, 2, . . .} such that M1v = 1,
T 1
MS,k − ψv T J˜S,k v
and Mkv = eψv 2 for k ≥ 2. Based on this, we re-express (30) as
1 T ˜ (a) 1 T ˜
vs,t
PπX,θS {kJ˜S−1 2 2 π
,t MS ,t k > δs,t } ≤ PX,θS {Mt ≥ e 2 ψvs,t JS,t vs,t } ≤ PπX,θS {Mtv ≥ e 2 ψv JS,t v for some v ∈ Bs,t },
(31)
where (a) follows because vs,t ∈ Bs,t . Consequently, letting µ1 = 18 (m1 − m2 )2 γ1 > 0, and A1 =
√
{µmin (J˜S,t ) ≥ µ1 t}, we deduce from (31) that
(b) 1 T ˜
PπX,θS {kJ˜S−1 2 2 π v
,t MS ,t k > δs,t } ≤ PX,θS {Mt ≥ e
2 ψv JS,t v for some v ∈ B π c
s,t , A1 } + PX,θS {A1 }
(c) 1 2 √
≤ PπX,θS {Mtv ≥ e 2 ψµ1 δs,t t
for some v ∈ Bs,t } + PπX,θS {Ac1 }
(d) 1
= PπX,θS {Mtv ≥ e 2 ψµ1 ρs log t for some v ∈ Bs,t } + PπX,θS {Ac1 }
(e) 1
√
≤ PπX,θS Mtv ≥ e 2 ψµ1 ρs log t for some v ∈ Bs,t + κ1 se−ρ̃1 t ,

(32)

where: ρ̃1 = 81 (m1 − m2 )2 ρ1 > 0; (b) follows by the law of total probability; (c) follows because
√ √
µmin (J˜S ,t ) ≥ µ1 t on A1 , and kvs,t k = δs,t ; (d) follows because δs,t
2
t = ρ s log t; and (e) follows by
Lemma 1 and (15). Our next goal is to apply the union bound on the first term of the right-hand
side of (32). For that purpose, define
√ √
B̃s,t = {−δs,t + i/ t : i = 0, 1, . . . , 2δs,t t}2s ∩ {v ∈ R2s : kv k ≤ δs,t }.

Note that B̃s,t is a discrete grid over the 2s-dimensional ball of radius δs,t , where the increments
√
between adjacent grid points equal 1/ t. Thus, for all v ∈ Bs,t , there exists ṽ ∈ B̃s,t such that kv −

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
40 Forthcoming in Management Science

√ √
ṽ k ≤ 1/ t. Now, let µ2 = (1 + u2 )(1 + zmax
2
) > 0, and A2 = {kMS ,t k ≤ µ2 t}. Then, by elementary
algebra, we have Mtṽ ≥ κ̃1 Mtv on A2 , where κ̃1 = e−3ψµ2 . Therefore, (32) implies that

PπX,θS {kJ˜S−1 2 2
,t MS ,t k > δs,t }
(f) 1
√
≤ PπX,θS {Mtṽ ≥ κ̃1 e 2 ψµ1 ρs log t for some ṽ ∈ B̃s,t , A2 } + PπX,θS {Ac2 } + κ1 se−ρ̃1 t

(g) P 1
√
π ṽ 2 ψµ1 ρs log t } + Pπ c −ρ̃1 t
≤ ṽ∈B̃s,t PX,θS {Mt ≥ κ̃1 e X,θS {A2 } + κ1 se
(h) 1
√
1
e− 2 ψµ1 ρs log t EπX,θS [Mtṽ ] + PπX,θS {Ac2 } + κ1 se−ρ̃1 t
P
≤ ṽ∈B̃s,t κ̃1 (33)

where: (f) follows by the law of total probability and the fact that Mtṽ ≥ κ̃1 Mtv on A2 , (g) follows
by the union bound, and (h) follows by Markov’s inequality. We now prove that, for every ṽ ∈ B̃s,t ,
1
{Mkṽ , k = 1, 2, . . .} is a supermartingale. Letting ṽ ∈ B̃s,t , and US ,k = p1k ⊗ XS ,k = p1k ⊗ ZS,k

for
k = 1, 2, . . . , we note that

T 1 ψṽ T J˜ T
EπX,θS [Mkṽ | FS ,k−1 ] = eψṽ MS,k−1 − 2 S,k ṽ EπX,θS [eψχk ṽ US,k εk
| FS ,k−1 ]
(i)ṽ 1 T
US,k )2 T
= Mk−1 e− 2 ψχk (ṽ EπX,θS [eψχk ṽ US,k εk
| US ,k ] (34)

for k ≥ 2, where (i) follows from the definition of {Mkṽ } and the independence of {εk }. Since kṽ k ≤

δs,t for all ṽ ∈ B̃s,t , and the maximum eigenvalue of χk US ,k UST,k cannot exceed tr χk US ,k UST,k =
χk UST,k US ,k ≤ (1 + u2 )(1 + zmax
2
), we deduce that |ψχk ṽ T US ,k | ≤ ψδs,t [(1 + u2 )(1 + zmax
2
)]1/2 ≤ η0 .
Thus,
(j) (k)
1 T
US,k )2 µ0 σ02 ψ 2 χk (ṽ T US,k )2
EπX,θS [Mkṽ | FS ,k−1 ] ≤ Mk−1
ṽ
e− 2 ψχk (ṽ e ṽ
≤ Mk−1

2 2
for k ≥ 2, where: (j) follows by (34) and the fact that EπX,θS [eηεk |FS ,k−1 ] ≤ eµ0 σ0 η for all η satis-
fying |η| ≤ η0 , and (k) follows because µ0 σ02 ψ ≤ 12 . As a result, (Mkṽ , FS ,k ) is a supermartingale.
Consequently, by (33) and the fact that M1ṽ = 1, we deduce that

1
√
PπX,θS {kJ˜S−1 2 2 1
e− 2 ψµ1 ρs log t + PπX,θS {Ac2 } + κ1 se−ρ̃1 t
P
,t MS ,t k > δs,t } ≤ ṽ∈B̃s,t κ̃1
(l) √ √
t log t)− 1
≤ κ̃11 es log(4ρs 2 ψµ1 ρs log t + PπX,θS {Ac2 } + κ1 se−ρ̃1 t
(35)

√ 2
where (l) follows because the cardinality of B̃s,t is at most (2δs,t t)2s = es log(4δs,t t) . Moreover,
√
repeating the above supermartingale argument, we obtain PπX,θS {Ac2 } = PπX,θS {kMS ,t k > µ2 t} ≤
√
η
√
µ √
κ̃2 se−ρ̃2 t/s , where κ̃2 = 4 and ρ̃2 = min 4µµ2σ2 , 0√2 2 . Because t/ log t ≥ ρs, this implies that

√ 0 0

PπX,θS {Ac2 } = PπX,θS {kMS ,t k > µ2 t} ≤ κ̃2 se−ρ̃2 ρ log t . As a result, (35) implies that
√ 1 ψµ ρs log t
√
PπX,θS {kJ˜S−1 2 2 1 s log(4ρs
,t MS ,t k > δs,t } ≤ κ̃1 e
t log t)− 2 1
+ κ̃2 se−ρ̃2 ρ log t + κ1 se−ρ̃1 t . (36)

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 41
√
Now, let ρ2 = max{1, 8/(ψµ1 ), 1/(2ρ̃2 )}, and t0 be the smallest integer such that log t0 ≤ 2ρ̃1 t0
and t0 ≥ 4ρ22 . Therefore, choosing ρ = ρ2 , we deduce from (36) and the definition of δs,t that

˜ ρ2 s log t κ2 s log t
π −1 2
PX,θS kJS ,t MS ,t k > √ = PπX,θS {kJ˜S−1 2 2
,t MS ,t k > δs,t } ≤ √ . (37)
t t
1
for all t ≥ t0 , where κ2 = κ̃1
+ κ̃2 + κ1 . Q.E.D.

Proof of Theorem 2. Fix π = ILSX(m1 , m2 ). Given θS = (αS , βS ) ∈ ΘS , we have the following by

elementary algebra:
T
X
∆πθS (T ) EπθS ∗

= EX r (θS , XS ,t ) − r(pt , θS , XS ,t ) XT
t=1
T
X 2
EπX,θS (−βST XS ,t ) ϕ(θS , XS ,t ) − pt

= (38)
t=1

for T ≥ 2, where EπX,θS {·} = EX { EπθS {· | X T }}. By (12), we have

T
X 2
EπX,θS (−βST XS ,t ) ϕ(θS , XS ,t ) − pt

t=1
T
X 2
EπX,θS (−βST XS ,t ) ϕ(θS , XS ,t ) − pt I{t ∈ M }

=
t=1
T
X 2
EπX,θS (−βST XS ,t ) ϕ(θS , XS ,t ) − pt I{t 6∈ M }

+ (39)
t=1
PT
for T ≥ 2. With regard to the first term on the right-hand side of (39), note that t=1 χt =
PT √
t=1 I{t ∈ M } ≤ 2 T under π = ILSX(m1 , m2 ). Thus,
T
X 2 √
EπX,θS (−βST XS ,t ) ϕ(θS , XS ,t ) − pt I{t ∈ M } ≤ C1 T

(40)
t=1
1

for T ≥ 2, where C1 = 2C0 (u − ℓ)2 , and C0 = max βS · zS : θS = (αS , βS ) ∈ ΘS , zS ∈ ZS . With
regard to the second term on the right-hand side of (39), note that
T
X 2
EπX,θS (−βST XS ,t ) ϕ(θS , XS ,t ) − pt I{t 6∈ M }

t=1
T
(a) X 2
EπX,θS (−βST XS ,t ) ϕ(θS , XS ,t ) − ϕ ϑ̂S ,t , XS ,t I{t 6∈ M }

=
t=2
T
X 2
EπX,θS

≤ C0 ϕ(θS , XS ,t ) − ϕ ϑ̂S ,t , XS ,t I{t 6∈ M }
t=2
T
(b) X
EπX,θS kθS − ϑ̂S ,t k2 I{t 6∈ M }

≤ C0 K0
t=2
T
(c) X
EπX,θS kθS − ϑ̂S ,t k2

≤ C0 K0 (41)
t=2

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
42 Forthcoming in Management Science

for T ≥ 2, where K0 = max{k∇θS ϕ(θS , xS )k2 : θS ∈ ΘS , xS ∈ XS }; (a) follows because 1 ∈ M and

pt = ϕ ϑ̂S ,t , XS ,t for t 6∈ M , under π = ILSX(m1 , m2 ); (b) follows by the mean value theorem; and
(c) follows because I{t 6∈ M } ≤ 1. For brevity, denote the event in Lemma 2 as follows:

ρ2 s log t
At = kJ˜S−1 ,t MS ,t k2
≤ √ .
t
Let t0 be as given in Lemma 2, and recall that ϑ̂S ,t is the truncated estimate that satisfies ϑ̂S ,t =
PΘS {θ̂S ,t }, where PΘS : R2s → ΘS is the projection mapping from R2s onto ΘS . Then, (41) implies
that
T
X 2
EπX,θS (−βST XS ,t ) ϕ(θS , XS ,t ) − pt I{t 6∈ M }

t=1
T
X T
X
≤ C0 K0 t0 d2Θ + PπX,θS {Act−1 } d2Θ + EπX,θS kθS − ϑ̂S ,t k2 I{At−1 }

t=t0 +1 t=t0 +1
T
X −1 T
X −1
= C0 K0 t0 d2Θ + PπX,θS {Act } d2Θ + EπX,θS kθS − ϑ̂S ,t+1 k2 I{At }

(42)
t=t0 t=t0

for T ≥ 2, where: dΘ = max{kϑ − ϑ̃k : ϑ, ϑ̃ ∈ ΘS }, and PπX,θS {·} is the probability measure associ-
√
ated with EπX,θS {·}. By Lemma 2, we know that PπX,θS {Act } ≤ (κ2 s log t)/ t for t ≥ t0 . Therefore,
PT −1 π c
√
t=t0 PX,θS {At } ≤ 2κ2 s T log T . Hence, we deduce from (42) that

T
X 2
EπX,θS (−βST XS ,t ) ϕ(θS , XS ,t ) − pt I{t 6∈ M }

t=1
T −1

2 2
√ X
EπX,θS kθS − ϑ̂S ,t+1 k2 I{At

≤ C0 K0 t0 dΘ + 2κ2 dΘ s T log T + (43)
t=t0

for T ≥ 2. Now, note that

(d)
EπX,θS kθS − ϑ̂S ,t+1 k2 I{At } ≤ EπX,θS kθS − θ̂S ,t+1 k2 I{At }

(e) π
= EX,θS kJ˜S−1 2

,t MS ,t k I{At }
(f) ρ2 s log t
≤ √ (44)
t
for all t ≥ t0 ; (d) follows because ϑ̂S ,t = PΘ {θ̂S ,t }, (e) follows by (11), and (f) follows by the definition
of At . Combining (43) and (44), we obtain
T
X 2 √ √
EπX,θS (−βST XS ,t ) ϕ(θS , XS ,t ) − pt I{t 6∈ M } ≤ C0 K0 t0 d2Θ + 2κ2 d2Θ s T log T + 4ρ2 s T log T

t=1

for T ≥ 2. This implies that

T
X 2 √
EπX,θS (−βST XS ,t ) ϕ(θS , XS ,t ) − pt I{t 6∈ M } ≤ C2 s T log T

(45)
t=1

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 43

for T ≥ 2, where C2 = C0 K0 (t0 d2Θ + 2κ2 d2Θ + 4ρ2 ). By (38)-(40) and (45), we conclude that ∆πθS (T ) ≤
√
Cs T log T for T ≥ 2, where C = C1 + C2 . Q.E.D.

Appendix C: Proofs of the Results in Section 4.2

√
Proof of Lemma 3. Fix π = ILQX(m1 , m2 , λ), and let ρ > 0. Note that, if t/(log d + log t) <
√
ρs, then ρs(log d + log t)/ t > 1 and (20) trivially holds by choosing κ3 ≥ ρ. Accordingly, in the
√
remainder of the proof, we consider the case where t/(log d + log t) ≥ ρs.
For all t, the estimation objective function Qt (·, λt+1 ) is continuous and strictly concave. Thus,
q √ (lasso)
letting δs,t = ρs(log d + log t)/ t ∈ [0, 1], we deduce that if kθ̂t+1 (λt+1 ) − θ k > δs,t then there
exists a random vector θ̃t+1 ∈ R2(d+1) such that kθ̃t+1 − θ k = δs,t and Qt (θ, λt+1 ) < Qt (θ̃t+1 , λt+1 ).
Therefore,
(lasso)
PπX,θ kθ̂t+1 (λt+1 ) − θ k2 > δs,t
2
≤ PπX,θ Qt (θ, λt+1 ) − Qt (θ̃t+1 , λt+1 ) < 0 .

(46)

Pt R g(θ̃·Uk ) D
k −y
To analyze the right-hand side of (46), we first define Q̃t (θ̃) := k=1 χk Dk ν(y)
dy for all
θ̃ ∈ R2(d+1) , where Uk = p1k ⊗ Xk for all k. Then, we have

Qt (θ, λt+1 ) − Qt (θ̃t+1 , λt+1 ) = Q̃t (θ) − λt+1 kθ k1 − Q̃t (θ̃t+1 ) − λt+1 kθ̃t+1 k1

= Q̃t (θ) − Q̃t (θ̃t+1 ) − λt+1 kθ k1 + λt+1 kθ̃t+1 k1 . (47)

We now examine the Taylor series expansion of the second term on the right-hand side of (47),
namely Q̃t (θ̃t+1 ), around θ. To that end, we deduce from elementary analysis that, for all θ̃ ∈ R2(d+1) ,

∂ Q̃t (θ̃) ∂ Q̃t (θ̃)
∇Q̃t (θ̃) = ∂ θ̃1
, ∂ θ̃ , . . . , ∂∂θ̃Q̃t (θ̃)
2 2(d+1)
t
X Dk − g(θ̃ · Uk )
= χk g ′ (θ̃ · Uk )Uk
k=1
ν g(θ̃ · Uk )
t
(a) X
= χk Uk Dk − g(θ̃ · Uk ) , (48)
k=1

where (a) follows because ν(y) = g ′ g −1 (y) for y ∈ R. In addition, for all θ̃ ∈ R2(d+1) ,

∂ 2 Q̃t (θ̃) ∂ 2 Q̃t (θ̃) ∂ 2 Q̃t (θ̃)

 
∂ θ̃12 ∂ θ̃1 ∂ θ̃2
··· ∂ θ̃1 ∂ θ̃2(d+1)
∂ 2 Q̃t (θ̃) ∂ 2 Q̃t (θ̃) ∂ 2 Q̃t (θ̃)
 

∂ θ̃2 ∂ θ̃1 ∂ θ̃22
··· ∂ θ̃2 ∂ θ̃2(d+1)

2
 
∇ Q̃t (θ̃) =  
 .. .. .. .. 
 . . . . 
∂ 2 Q̃t (θ̃) ∂ 2 Q̃t (θ̃) ∂ 2 Q̃t (θ̃)
 
∂ θ̃2(d+1) ∂ θ̃1 ∂ θ̃2(d+1) ∂ θ̃2
··· 2
∂ θ̃2(d+1)

X t
= − χk g ′ (θ̃ · Uk )Uk UkT . (49)
k=1

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
44 Forthcoming in Management Science

We deduce from (48), (49), and Taylor’s theorem that there exists a random vector ξt+1 ∈ R2(d+1)
on the line segment connecting θ and θ̃t+1 such that

Q̃t (θ̃t+1 ) = Q̃t (θ) + (θ̃t+1 − θ)T ∇Q̃t (θ) + 12 (θ̃t+1 − θ)T ∇2 Q̃t (ξt+1 )(θ̃t+1 − θ)
t
X t
X
1
χk g ′ (ξt+1 · Uk )Uk UkT (θ̃t+1 − θ)
T
T
= Q̃t (θ) + (θ̃t+1 − θ) χk Uk Dk − g(θ · Uk ) − 2
(θ̃t+1 − θ)
k=1 k=1
t t
(b) X X
= Q̃t (θ) + (θ̃t+1 − θ)T χk Uk εk − 21 (θ̃t+1 − θ)T χk g ′ (ξt+1 · Uk )Uk UkT (θ̃t+1 − θ), (50)
k=1 k=1

where (b) follows because Dk = g(θ · Uk ) + εk for all k. Consequently, letting ζk,t = g ′ (ξt+1 · Uk ) for
all k and t, we deduce from (47) and (50) that

Qt (θ, λt+1 ) − Qt (θ̃t+1 , λt+1 )

Xt t
X
= −(θ̃t+1 − θ)T χk Uk εk + 21 (θ̃t+1 − θ)T χk ζk,t Uk UkT (θ̃t+1 − θ) − λt+1 kθ k1 + λt+1 kθ̃t+1 k1 .
k=1 k=1

Combining the preceding identity with (46), we obtain the following:

(lasso)
PπX,θ kθ̂t+1 (λt+1 ) − θ k2 > δs,t
2

Xt t
X
π 1 T T T
≤ PX,θ 2 (θ̃t+1 − θ) χk ζk,t Uk Uk (θ̃t+1 − θ) − (θ̃t+1 − θ) χk Uk εk < λt+1 kθ k1 − λt+1 kθ̃t+1 k1
k=1 k=1

PπX,θ
1
= 2
(θ̃t+1 − θ) Vt (θ̃t+1 − θ) < λt+1 kθ k1 − λt+1 kθ̃t+1 k1 + (θ̃t+1 − θ)T Mt ,
T
(51)
Pt Pt
where Vt = k=1 χk ζk,t Uk UkT and Mt = k=1 χk Uk εk for all t. Thus, letting B1 = {kMt k∞ ≤ λt+1 },
we deduce from (51) that
(lasso)
PπX,θ kθ̂t+1 (λt+1 ) − θ k2 > δs,t 2

≤ PπX,θ 21 (θ̃t+1 − θ)T Vt (θ̃t+1 − θ) < λt+1 kθ k1 − λt+1 kθ̃t+1 k1 + (θ̃t+1 − θ)T Mt , B1 + PπX,θ B1c . (52)

To obtain an upper bound on PπX,θ B1c , we use the following lemma, the proof of which is deferred
to the end of this section.

Lemma C.1. There exist finite and positive constants κ̃3 and ρ̃3 such that, if ρ ≥ ρ̃3 , then

PπX,θ kMt k∞ > λt+1 ≤ κ̃3 s(log√d+log t)

t
.

By (52) and Lemma C.1, we obtain the following: if ρ ≥ ρ̃3 , then

(lasso)
PπX,θ kθ̂t+1 (λt+1 ) − θ k2 > δs,t 2

≤ PπX,θ 12 (θ̃t+1 − θ)T Vt (θ̃t+1 − θ) < λt+1 kθ k1 − λt+1 kθ̃t+1 k1 + (θ̃t+1 − θ)T Mt , B1 + κ̃3 s(log√d+log t)

t
.
(53)

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 45

On the other hand, we have the following on B1 :

where (c) follows from Hölder’s inequality. By (53) and (54), we deduce that, if ρ ≥ ρ̃3 , then

(lasso)
PπX,θ kθ̂t+1 (λt+1 ) − θ k2 > δs,t 2

≤ PπX,θ 12 (θ̃t+1 − θ)TVt (θ̃t+1 − θ) < λt+1 kθ k1 − λt+1 kθ̃t+1 k1 + λt+1 kθ̃t+1 − θ k1 + κ̃3 s(log√d+log t)

t

= PπX,θ 21 (θ̃t+1 − θ)T Vt (θ̃t+1 − θ) < λt+1 kθ k1 − kθ̃t+1 k1 + kθ̃t+1 − θ k1 + κ̃3 s(log√d+log t)

t
. (55)

Recall that θS ∈ R2s is the compressed parameter vector containing all non-zero components of θ.
Let θN ∈ R2(d+1−s) be the vector consisting of the components of θ that are not contained in θS .
Thus, all components of θN are zero. In accordance with this, we separate θ̃t+1 into two vectors in
the same way: let θ̃S ,t+1 ∈ R2s be the vector consisting of the components of θ̃t+1 whose indices are
in S , and θ̃N ,t+1 ∈ R2(d+1−s) be the vector consisting of the components of θ̃t+1 whose indices are
not in S . Then, we have the following

kθ k1 = kθS k1 , (56a)

kθ̃t+1 k1 = kθ̃S ,t+1 k1 + kθ̃N ,t+1 k1 , (56b)

kθ̃t+1 − θ k1 = kθ̃S ,t+1 − θS k1 + kθ̃N ,t+1 k1 . (56c)

Equations (56a)-(56c) imply that

(d)
kθ k1 − kθ̃t+1 k1 + kθ̃t+1 − θ k1 = kθS k1 − kθ̃S ,t+1 k1 + kθ̃S ,t+1 − θS k1 ≤ 2kθ̃S ,t+1 − θS k1 , (57)

where (d) follows from Minkowski’s inequality. We deduce from (55) and (57) that, if ρ ≥ ρ̃3 , then

(lasso)
PπX,θ kθ̂t+1 (λt+1 ) − θ k2 > δs,t 2

≤ PπX,θ 12 (θ̃t+1 − θ)T Vt (θ̃t+1 − θ) < 2λt+1 kθ̃S ,t+1 − θS k1 + κ̃3 s(log√d+log t)

t

= PπX,θ (θ̃t+1 − θ)TVt (θ̃t+1 − θ) < 4λt+1 kθ̃S ,t+1 − θS k1 + κ̃3 s(log√d+log t)

t
. (58)

√
Let µ3 = 81 (m1 − m2 )2 γ1 ℓ̃ > 0 and B2 = {µmin (Vt ) ≥ µ3 t}, where µmin (Vt ) denotes the smallest
eigenvalue of Vt . We use the following lemma to derive an upper bound on PπX,θ B2c . The proof of

this lemma is at the end of this section.

Lemma C.2. There exist finite and positive constants κ̃4 and ρ̃4 such that, if ρ ≥ ρ̃4 , then
√
PπX,θ µmin (Vt ) < µ3 t ≤ κ̃4 s(log√d+log t)

t
for t ≥ 5.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
46 Forthcoming in Management Science

Lemma C.2 and (58) imply that, if ρ ≥ max{ρ̃3 , ρ̃4 } and t ≥ 5, then

(lasso)
PπX,θ kθ̂t+1 (λt+1 ) − θ k2 > δs,t
2

≤ PπX,θ (θ̃t+1 − θ)T Vt (θ̃t+1 − θ) < 4λt+1 kθ̃S ,t+1 − θS k1 , B2 + κ3 s(log√d+log t)

t
(e) √
≤ PπX,θ µ3 tkθ̃t+1 − θ k2 < 4λt+1 kθ̃S ,t+1 − θS k1 , B2 + κ3 s(log√d+log
t
t)
, (59)

√
where κ3 = κ̃3 + κ̃4 , (e) follows from the Rayleigh-Ritz theorem. Since ky k1 ≤ 2sky k for all y ∈ R2s ,
√ √
we have kθ̃S ,t+1 − θS k1 ≤ 2skθ̃S ,t+1 − θS k ≤ 2skθ̃t+1 − θ k. Thus, we deduce from (59) that, if
ρ ≥ max{ρ̃3 , ρ̃4 } and t ≥ 5, then

(lasso)
PπX,θ kθ̂t+1 (λt+1 ) − θ k2 > δs,t
2

√ √
≤ PπX,θ µ3 tkθ̃t+1 − θ k2 < 4λt+1 2skθ̃t+1 − θ k + κ3 s(log√d+log t
t)

√ √
= PπX,θ µ3 tkθ̃t+1 − θ k < 4λt+1 2s + κ3 s(log√d+log t
t)

= PπX,θ µ23 tkθ̃t+1 − θ k2 < 32sλ2t+1 + κ3 s(log√d+log t)

t
(f) π
n o
= PX,θ kθ̃t+1 − θ k2 < ρ̃5 s(log√d+log
t
t)
+ κ3 s(log√d+log
t
t)

(g) π
= PX,θ ρ < ρ̃5 + κ3 s(log√d+log
t
t)
. (60)

2 √
where ρ̃5 = 32c̃ µ2
, (f) follows because λt+1 = c̃t1/4 log d + log t, and (g) follows because kθ̃t+1 − θ k2 =
3
2
= ρs(log√d+log t) π

δs,t t
. Note that, if ρ ≥ ρ̃5 , then P X,θ ρ < ρ̃5 = 0. Thus, letting ρ = ρ3 = max{ρ̃3 , ρ̃4 , ρ̃5 }
and t1 = 5, we deduce from (60) that

(lasso) κ3 s(log d+log t)

PπX,θ kθ̂t+1 (λt+1 ) − θ k2 > δs,t
2
≤ √
t
for all t ≥ t1 ,

or equivalently,

(lasso) ρ3 s(log d+log t) κ3 s(log d+log t)
π
PX,θ kθ̂t+1 (λt+1 ) − θ k2 > √
t
≤ √
t
for all t ≥ t1 . Q.E.D.

Proof of Theorem 3. Fix π = ILQX(m1 , m2 , λ). For p ∈ [ℓ, u], θ ∈ Θ, and x ∈ X , consider the
Taylor series expansion of r(p, θ, x) around the revenue-maximizing price, ϕ(θ, x), noting that there
exists a price p̃ between p and ϕ(θ, x) such that

∂ ∂2
2
r ϕ(θ, x), θ, x p − ϕ(θ, x) + 21 ∂p

r(p, θ, x) = r ϕ(θ, x), θ, x + ∂p 2 r p̃, θ, x p − ϕ(θ, x) . (61)

∂

Because ∂p
r ϕ(θ, x), θ, x = 0 for all θ ∈ Θ and x ∈ X , (61) implies that

2
r ∗ (θ, x) − r(p, θ, x) = r ϕ(θ, x), θ, x − r(p, θ, x) ≤ C3 ϕ(θ, x) − p

(62)

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 47

∂2
for all θ ∈ Θ and x ∈ X , where C3 = max{ 12 ∂p

2 r p, θ, x : p ∈ [ℓ, u], θ ∈ Θ, x ∈ X }. We deduce from

(62) that, given θ ∈ Θ,

XT
∆πθ (T ) π
∗
= EX Eθ r (θ, Xt ) − r(pt , θ, Xt ) X T
t=1
T
X 2
EπX,θ C3 ϕ(θ, Xt ) − pt

≤ (63)
t=1

for T ≥ 2, where EπX,θ {·} = EX { Eπθ {· | X T }}. Using (63) and repeating the arguments used for
deriving (39)-(42) in the proof of Theorem 2, we obtain the following:
T
X 2 √
EπX,θ C3 ϕ(θ, Xt) − pt I{t ∈ M } ≤ C4 T

(64)
t=1

for T ≥ 2, where C4 = 2C3 (u − ℓ)2 , and furthermore,

T
X 2
EπX,θ C3 ϕ(θ, Xt) − pt I{t 6∈ M }

t=1
T
X −1 T
X −1
(lasso)
≤ C5 t1 d2Θ + PπX,θ {Act } d2Θ + EπX,θ kθ − ϑ̂t+1 (λt+1 )k2 I{At }

(65)
t=t1 t=t1

for T ≥ 2, where C5 = C3 max{k∇θ ϕ(θ, x)k2 : θ ∈ Θ, x ∈ X }, dΘ = max{kϑ − ϑ̃k : ϑ, ϑ̃ ∈ Θ}, At =

(lasso)
kθ − θ̂t+1 (λt+1 )k2 ≤ ρ3 s(log√d+log t)

t
, and PπX,θ {·} is the probability measure associated with
EπX,θ {·}. Lemma 2 implies that PπX,θ {Act } ≤ κ3 s(log√d+log
t
t)
for t ≥ t1 , from which we deduce that
PT −1 π c
√
t=t1 PX,θ {At } ≤ 2κ3 s T (log d + log T ). Invoking this inequality in (65) and repeating the argu-

ments used to derive (43)-(45), we further deduce that

T
X 2 √
EπX,θ C3 ϕ(θ, Xt) − pt I{t 6∈ M } ≤ C6 s T (log d + log T ),

(66)
t=1

for T ≥ 2, where C6 = C5 (t1 d2Θ + 2κ3 d2Θ + 4ρ3 ). Combining (63), (64), and (66), we deduce that
√
∆πθ (T ) ≤ C̃s T (log d + log T ) for T ≥ 2, where C̃ = C4 + C6 . Q.E.D.
√
Proof of Lemma C.1. We first note that, if t/(log d + log t) < ρs, then we deduce that
PπX,θ kMt k∞ > λt+1 ≤ 1 < ρs(log√d+log t)

t
and obtain the desired result by choosing κ̃3 ≥ ρ. Now,
√
suppose that t/(log d + log t) ≥ ρs, and note that

π π
i
PX,θ kMt k∞ > λt+1 = PX,θ max |Mt | > λt+1
i=1,...,2(d+1)
( 2(d+1) )
[
= PπX,θ |Mit | > λt+1
i=1
2(d+1)
(a) X
PπX,θ |Mit | > λt+1 ,

≤ (67)
i=1

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
48 Forthcoming in Management Science

where Mit denotes the i-th component of Mt for i = 1, . . . , 2(d + 1), and (a) follows by the union
bound. We construct an upper bound on the right-hand side of (67). For all v ∈ R and i ∈
i 1 2 Pk
{1, 2, . . . , 2(d + 1)}, define {Mkv,i , k = 1, 2, . . .} such that M1v,i = 1 and Mkv,i = eψ̃vMk − 2 ψ̃v l=1 χl for
1
k ≥ 2, where ψ̃ = 2µ σ1 2 υ2 and υ0 = max p ⊗ x ∞ : p ∈ [ℓ, u], x ∈ X . Based on this, we deduce
0 0 0

that, for all i,

PπX,θ |Mit | > λt+1 = PπX,θ Mit > λt+1 + PπX,θ Mit < −λt+1

t t
(c)
π i λt+1 X π i λt+1 X
≤ PX,θ Mt > √ χl + PX,θ Mt < − √ χl
2 t l=1 2 t l=1
v ,i 1 2 Pt v ,i 1 2 Pt
≤ PπX,θ Mt + > e 2 ψ̃v+ l=1 χl + PπX,θ Mt − > e 2 ψ̃v− l=1 χl , (68)

λt+1 λ Pt √
where v+ = √ ,
2 t
v− = − 2t+1
√ , and (c) follows because
t l=1 χl ≤ 2 t. Given v ∈ {v+ , v− } and i ∈
1 2 i
{1, . . . , 2(d + 1)}, we have EπX,θ [Mkv,i | Fk−1 ] = Mk−1 v,i
e− 2 ψ̃v χk EπX,θ [eψ̃vχk Uk εk | Fk−1 ] for k ≥ 2, where

Fk = σ p1 , . . . , pk , ε1 , . . . , εk , X1 , . . . , Xk+1 and Uki denotes the i-th component of Uk . Note that
q q
ψ̃λ υ
√ 0 = 1 ψ̃c̃υ0 log d+log t 1 1 ψ̃c̃υ0 2

|ψ̃vχk Uki | ≤ 2t+1 t 2
√
t
≤ 2
ψ̃c̃υ0 ρs
. Thus, if ρ ≥ ρ̃3 = 2η0
, then |ψ̃vχk Uki | ≤ η0 .
2 2
Therefore, because EπX,θ [eηεk |Fk−1 ] ≤ eµ0 σ0 η for all η satisfying |η| ≤ η0 , we have EπX,θ [Mkv,i | Fk−1 ] ≤
2
v,i 1 χk +µ0 σ02 ψ̃ 2 v 2 χk (Uki )2 v,i
Mk−1 e− 2 ψ̃v ≤ Mk−1 for k ≥ 2, where the latter inequality follows because
µ0 σ02 ψ̃(Uki )2 ≤ 12 . Consequently, for v ∈ {v+ , v− } and i ∈ {1, . . . , 2(d + 1)}, (Mkv,i , Fk ) is a super-
martingale with M v,i = 1. Hence, letting c̃ = 4ψ̃−1/2 , we deduce that, if ρ ≥ ρ̃3 , then
2 1 2
1 2 Pt 2 Pt (d) λt+1 (f) e− 16 ψ̃c̃ log d
1 − 1 ψ̃ √ (e) 1 2
PπX,θ Mtv,i > e 2 v l=1 χl ≤ e− 2 ψ̃v l=1 χl ≤ e 16 t = e− 16 ψ̃c̃ (log d+log t) ≤

√ , (69)
t
λ2 Pt √
where (d) follows because v 2 = t+1 4t
for v ∈ {v+ , v− } and l=1 χl ≥ 21 t, (e) follows because λt+1 =
√
c̃t1/4 log d + log t, and (f) follows because ψ̃ ≥ 8c̃−2 . Combining (68) and (69), we further deduce
that, if ρ ≥ ρ̃3 , then
1 2
2e− 16 ψ̃c̃ log d
PπX,θ
i
|Mt | > λt+1 ≤ √
t

for all i = 1, . . . , 2(d + 1). Using the preceding inequality in (67), we obtain the following: if ρ ≥ ρ̃3 ,
1 ψ̃c̃2 log d
− 16
then PπX,θ kMt k∞ > λt+1 ≤ 4(d+1)e √t ≤ κ̃3 s(log√d+log t)

t
, where κ̃3 = 8 and the latter inequality
follows because ψ̃ ≥ 16c̃−2 . Q.E.D.
√ √
Proof of Lemma C.2. Note that, if t/(log d + log t) < ρs, then PπX,θ µmin (Vt ) < µ3 t ≤ 1 <
ρs(log d+log t)
√
t
and the desired result follows by choosing κ̃4 ≥ ρ. Consequently, we suppose that
√
t/(log d + log t) ≥ ρs in the remainder of the proof. Repeating the arguments in the proof of
√ √
Lemma 1 and noting that Jt ≥ 18 (m1 − m2 )2 t for all t ≥ 5, we deduce that PπX,θ µmin (Vt ) < µ3 t ≤

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 49
√
1
2(d + 1)e−ρ̃6 t , where ρ̃6 = 16
(m1 − m2 )2 (1 − log 2)γ̃1 min{1, µmin (ΣZ )}ℓ̃/[(1 + u2 )(1 + zmax
2
)ũ]. Let-
ting ρ̃4 = 1/ρ̃6 , we deduce that, if ρ ≥ ρ̃4 , then
√ (a) (b)
PπX,θ µmin (Vt ) < µ3 t ≤ 4de−ρ̃6ρs(log d+log t) ≤ κ̃4 s(log√d+log t)

t
, (70)
√
for all t ≥ 5, where: κ̃4 = 4, (a) follows because t ≥ ρs(log d + log t), and (b) follows because
ρ ≥ 1/ρ̃6 . Q.E.D.

Appendix D: Model Parameters for Section 5.1.3

In our simulation experiments in §5.1.3, we consider the following problem parameters. The feasible
price interval is [ℓ, u] = [0.2, 2] and the first two prices of all policies are chosen arbitrarily at
p1 = m1 = 1.1 and p2 = m2 = 1.2. The unknown demand parameter vector is θ = (α, β) ∈ R30 , where

α = [ 1.1, −0.1, 0, 0.1, 0, 0.2, 0, 0.1, −0.1, 0, 0, 0.1, −0.1, 0.2, −0.2 ],
β = (−1) × [ 0.5, 0.1, −0.1, 0, 0, 0, 0, 0.2, 0.1, 0.2, 0, 0.2, −0.1, −0.2, 0 ].

The parameter space is Θ = [−1.5, 1.5]30 , and the demand shocks {ǫt } are normally distributed
with mean zero and standard deviation σ0 = 0.01.
Further, µ1 equals a vector of ones, Σ1 = 0.1I14 (where I14 is the 14 × 14 identity matrix), µ2
equals a vector of twos, and Σ2 = 0.06AA, where A is a 14 × 14 matrix generated randomly by
setting each element to a uniform random variable between 0 and 1. For ILQX, the regularization
√
parameter sequence λ = (λ1 , λ2 , . . .) is selected such that λt+1 = 0.05t1/4 log d + log t for all t; this
choice is informed by the theory developed in §4.2.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
50 Forthcoming in Management Science

Appendix E: Summary of the Data Set Used in Section 5.2

Variable Type Description Summary statistics

apply Categorical Indicator for eventual contract 1 for Funded (21.91%),
(dependent variable) 0 for Non-funded (78.09%)
1 Tier Categorical Segmentation (1–7) based on 1 (48.44%), 2 (21.85%), 3 (18.66%),
FICO scores (defined by the 4–6 (0%), 7 (11.05%)
auto finance company)
2 State Categorical Customer state; 49 states Top 3 states: 13.90% from CA,
present 9.70% from TX, 6.99% from FL
3 termclass Categorical Segmentation based on terms 1: 0–36 months (15.19%),
2: 37–48 months (15.53%),
3: 49–60 months (49.14%),
4: > 60 months (20.14%)
4 Type Categorical Type of financing Finance (77.39%) or Refinance
(22.61%)
5 CarType Categorical Type of car New (58.48%), Used (18.90%) or
Refinanced (22.61%)
6 partnerbin Categorical Segmentation based on partners 1: Direct auto finance company
(40.41%), 2: Partner A (16.11%), 3:
Other Partners (43.48%)
7 Primary FICO Continuous FICO score range: 587–854, mean: 726.7, std:
44.61
8 Term Continuous Approved term in months 36 months (15.19%), 48 months
(15.53%), 60 months (49.14%),
66 months (2.50%), 72 months
(17.64%)
9 rate Continuous Customer rate range: 2.4500%–15.5300%,
mean: 5.5779%, std: 1.5029%
10 Previous Rate Continuous Previous interest rate for a refi- range: 3.00%–24.00%,
nanced car mean: 8.140%, std: 26.99%
11 Competition rate Continuous Competitor’s rate range: 2.99%–6.45%, mean: 4.80%,
std: 0.5858%
12 onemonth Continuous Prime rate range: 1.0200%–2.1270%,
mean: 1.3288%, std: 0.2781%
13 rate1 Continuous Rate relative to the prime rate; range: 0.57%–12.87%,
(rate − onemonth)/onemonth mean: 3.42%, std: 1.32%
14 rel compet rate Continuous Rate relative to the competi- range: −1.68%–10.07%,
tor’s rate; mean: 0.63%, std: 1.10%
(rate − Competition rate)/onemonth
15 mp Continous Monthly payment range: $80.04–$3085,
mean: $529.74, std: $217.81
16 mp rto amtfinance Continuous Monthly payment over amount range: 1.5414%–3.4925%,
financed; mp/amount approved mean: 2.0951%, std: 0.4348%
17 Amount Approved Continuous Loan amount approved range: $4527.6–$10,000,
mean: $26,132, std: $11,140
18 days to approve Continuous No. of days until approval range: 0–116, mean: 0.25, std: 1.36
Table 3 Summary of the data set used in Section 5.2

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 51

Appendix F: Robustness Studies

Impact of ILQX parameters on regret performance. For completeness, we also explore
how the various parameters of ILQX influence the regret performance. Specifically, we explore in
Figure 13 the sensitivity of the results to (a) the regularization parameter, (b) the experimental
prices, (c) the number of experimentation cycles over T -period time horizon, and (d) the number
of times an experimental price is charged per experimentation cycle.
(a) (b)
150 150
c̃ = 0.01
c̃ = 0.025 m1 = 1.1, m2 = 1.2
c̃ = 0.05 m1 = 1.0, m2 = 1.3
c̃ = 0.075 m1 = 0.9, m2 = 1.4
100 100

∆πθ (T )
∆πθ (T )

c̃ = 0.1

50 50

0 0
0 1000 2000 3000 4000 5000 0 1000 2000 3000 4000 5000
T T
(c) (d)
150 κ = 0.5 150
κ=1 1×experiments/cycle
κ=2 2×experiments/cycle
3×experiments/cycle
100 100
∆πθ (T )
∆πθ (T )

50 50

0 0
0 1000 2000 3000 4000 5000 0 1000 2000 3000 4000 5000
T T
Figure 13 Sensitivity of ILQX to variations in policy parameters. Panel (a) shows the impact of different
constants c̃ in the near-optimal regularization parameter in Theorem 3. Panel (b) shows the impact of increasing the
difference between the two experimental prices m1 and m2 . Panel (c) shows the impact of scaling the experimentation
cycle lengths by a constant factor κ, whereas panel (d) shows the impact of charging experimental prices one to three
times per experimentation cycle. The rest of the problem parameters are the same as in §5.1.1.

In Figure 13(a), we find c̃ = 0.025 and c̃ = 0.05 to be the best constant multipliers for the
near-optimal regularization parameter in Theorem 3. We also observe that the performance of
ILQX is not too sensitive to the chose of c̃. In Figure 13(b), we observe that regret decreases with
the difference between the two experimental prices m1 and m2 . Since in practice, businesses are
constrained in the degree of price experimentation they may engage in, we can recommend from this
observation that good experimental prices are ones that differ as much as possible within practical
limits. In Figure 13(c), we observe that more frequent price experiments can lead to substantial
improvements in the regret performance, possibly due to the increased sample size in the demand
estimation step of ILQX. In Figure 13(d), we observe that the number of times each experimental
price is charged per experimentation cycle does not affect the results very much.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
52 Forthcoming in Management Science

Impact of using all historical data points in estimation. As explained in §4, ILQX uses
only the data points from the price experimentation periods. To examine the impact of using all
data, Figure 14 compares ILQX and a variant of ILQX that uses all historical data points (denoted
as ILQX with full data) in terms of regret performance. As shown in Figure 14(a), using all historical
points improves expected regret performance in our simulation experiments in §5.1. But, this does
not necessarily ensure an almost sure improvement. For the sample path realized in our real-life
data set in §5.2, ILQX turns out to have slightly better performance than ILQX with full data; please
see Figure 14(b). It is worth noting that both policies outperform other benchmarks in our real-life
data analysis (see §5.2.3 for the definitions of these policies).

(a) 8
5
10 (b)
300 ILQX greedy EILL
ILQX with full data 6 greedy IL

∆πθ (T )
company
∆πθ (T )

200 ILQX with full data

4
ILQX
100 2

0 0
0 1000 2000 3000 4000 5000 0 0.5 1 1.5 2
T T 104
Figure 14 Impact of using all data points versus those in the experimentation periods. Panel (a) shows
the T -period regret of ILQX (dashed curve) and ILQX with full data (dash-dotted curve) in the setting described in
§5.1. Panel (b) shows the T -period regret of ILQX (dashed curve), ILQX with full data (dash-dotted curve), company
(lower solid curve), greedy IL (upper solid curve), and greedy EILL (dashed curve) in the setting described in §5.2,
which is based on real-life data.

More on experimentation near the boundary of feasible prices. As demonstrated in

§5.1.2 and mentioned in §5.2.5, myopic policies can be modified to conduct price experiments when
their price paths hit the boundary of the feasible price set. To implement this idea in the context of
our real-life data set, we consider slight modifications of greedy policies (greedy IL and greedy EILL)
such that if the greedy policy’s price equals ℓ (resp. u) in periods t − 1 and t, then the modified
greedy policy would charge the price ℓ+δ (resp. u − δ) in period t, and otherwise, it would charge the
greedy price. Following the notation in §5.1.2, we denote these modified policies as δ-greedy IL and
δ-greedy EILL. Figure 15 shows that, for the sample path of customer feature vectors observed in our
real-life data set, the aforementioned modification does not yield a substantial change in the regret
performance of greedy policies. The regret gap between greedy policies and their modified δ-greedy
versions is less than 0.05% for this data set. Regarding these results, we note that experimenting
near the feasible price boundary is helpful in learning, but because the underlying optimal price is
unknown and time-varying, the cost of experimentation is not necessarily the same near the lower
and upper boundaries. This cost could be almost negligible on one boundary, but non-negligible
on the other.

Electronic copy available at: https://ssrn.com/abstract=2972985

Ban & Keskin: Personalized Dynamic Pricing with Machine Learning
Forthcoming in Management Science 53

(a) (b)

(relative to greedy EILL)

1000 1000

(relative to greedy IL)

T -period regret gap

δ-greedy IL with δ = 50 δ-greedy EILL with δ = 50

T -period regret gap

δ-greedy IL with δ = 100 δ-greedy EILL with δ = 100
500 500

0 0

-500 -500

-1000 -1000
0 0.5 1 1.5 2 0 0.5 1 1.5 2
T 104
T 10
4

Figure 15 More on experimentation near the boundary of feasible prices. Panel (a) shows the T -period
regret gap between greedy IL and δ-greedy IL, i.e., ∆greedy
θ
IL
(T ) − ∆δ-greedy
θ
IL
(T ), for δ ∈ {50, 100}. Panel (b) shows the
T -period regret gap between greedy EILL and δ-greedy EILL, i.e., ∆θ greedy EILL
(T ) − ∆δ-greedy
θ
EILL
(T ), for δ ∈ {50, 100}. The
feasible price set is [ℓ, u] = [0, 15000]. Problem parameters are calibrated from a real-life data set as described in §5.2;
the displayed regret differences are based on the sample path of customer feature vectors observed in the data set.

Appendix G: Details of the Misspecified Models in Section 6

For the first misspecified model in Section 6, we find K = 3 clusters to be the optimal number
of clusters in terms of the average silhouette value when applying the k-means clustering algo-
rithm with the squared Euclidean distance metric. In the logistic regression step, we use a feasible
parameter set Θ that ensures that the optimal prices for the demand models of all three clusters
are within the feasible price range [ℓ, u]. This yields a three-segment demand model with three sets
of parameters (α1 , β1 ), (α2 , β2 ) and (α3 , β3 ) given by:

α1 = [ −4.670, 0.8203, 04 , −0.3348, −2.156, 03 , 9.037, −2.195, 0, −0.677, 04 ], β1 = −7.222 × 10−4,

α2 = [ −5.838, 1.025, 04 , −0.3611, −2.695, 03 , 11.30, −2.743, 0, −0.8464, 04 ], β2 = −9.027 × 10−4,

α3 = [ −4.670, 0.8203, 04 , −0.2889, −2.156, 03 , 9.037, −3.292, 0, −1.016, 04 ], β3 = −7.222 × 10−4,

where 0k represents a sequence of k zeros.

For the second misspecified model, we add progressively larger random perturbations to the
coefficients in the above model until a model where segment-ILQX performs the best is found. We
arrive at the following three-segment demand model with this process:

α1 = [ −26.21, 2.336, 04 , −2.069, −10.67, 03 , 88.82, −4.348, 0, −18.47, 04 ], β1 = −6.681 × 10−3,

α2 = [ −5.623, 0.5540, 04 , −0.4038, −3.264, 03 , 14.49, −0.4707, 0, −4.520, 04 ], β2 = −1.154 × 10−3,

α3 = [ −7.104, 0.9869, 04 , −1.927, −4.637, 03 , 46.82, −0.9174, 0, −5.512, 04 ], β3 = −3.887 × 10−3.

References
Gill, R. D. and Levit, B. Y. (1995). Applications of the van Trees inequality: a Bayesian Cramér-Rao bound.
Bernoulli, 1(1/2):59–79.
Keskin, N. B. and Zeevi, A. (2014). Dynamic pricing with an unknown demand model: Asymptotically
optimal semi-myopic policies. Operations Research, 62(5):1142–1167.
Tropp, J. A. (2011). User-friendly tail bounds for matrix martingales. Technical report, California Institute
of Technology.

Electronic copy available at: https://ssrn.com/abstract=2972985

Deep Reinforcement Learning Algorithms For Dynamic Pricing
No ratings yet
Deep Reinforcement Learning Algorithms For Dynamic Pricing
18 pages
Journal of Business Research: Gerrit Hufnagel, Manfred Schwaiger, Louisa Weritz
No ratings yet
Journal of Business Research: Gerrit Hufnagel, Manfred Schwaiger, Louisa Weritz
20 pages
Dynamic Pricing in High-Dimensions: Adel Javanmard
No ratings yet
Dynamic Pricing in High-Dimensions: Adel Javanmard
49 pages
Dynamic Pricing
No ratings yet
Dynamic Pricing
35 pages
Demand Learning
No ratings yet
Demand Learning
30 pages
Customer-Centric Marketing: A Pragmatic Framework
From Everand
Customer-Centric Marketing: A Pragmatic Framework
R. Ravi
No ratings yet
1 s2.0 S2192440623000175 Main
No ratings yet
1 s2.0 S2192440623000175 Main
29 pages
Price Optimization With Practical Constraints
No ratings yet
Price Optimization With Practical Constraints
32 pages
Thompson Sampling For Dynamic Pricing
No ratings yet
Thompson Sampling For Dynamic Pricing
22 pages
SSRN 4259622
No ratings yet
SSRN 4259622
36 pages
Dynamic Air Ticket Pricing Using Reinforcement Learning Method
No ratings yet
Dynamic Air Ticket Pricing Using Reinforcement Learning Method
19 pages
Basepaper 3
No ratings yet
Basepaper 3
50 pages
The Price Is Right: Mastering Pricing Methods and Strategies
From Everand
The Price Is Right: Mastering Pricing Methods and Strategies
Dale Allman
No ratings yet
Personalized Pricing and Competition
No ratings yet
Personalized Pricing and Competition
34 pages
Dynamic Targeted Pricing in B2B Settings: Zaozao Zhang
No ratings yet
Dynamic Targeted Pricing in B2B Settings: Zaozao Zhang
79 pages
Durable Goods Pricingwith Reference Price Effects
No ratings yet
Durable Goods Pricingwith Reference Price Effects
32 pages
Novel Pricing Strategies For Revenue Maximization
No ratings yet
Novel Pricing Strategies For Revenue Maximization
24 pages
Oded Dynamic Pricing in B2B Context-05!12!2011
No ratings yet
Oded Dynamic Pricing in B2B Context-05!12!2011
53 pages
Prakash, D., Dan Spann, M. 2022. Dynamic Pricing and Reference Price Effects. Journal of Business Research
No ratings yet
Prakash, D., Dan Spann, M. 2022. Dynamic Pricing and Reference Price Effects. Journal of Business Research
15 pages
SSRN Id3489355
No ratings yet
SSRN Id3489355
37 pages
05 - Optimal Dynamic Pricing of Perishable Items
No ratings yet
05 - Optimal Dynamic Pricing of Perishable Items
21 pages
Basepaper 5
No ratings yet
Basepaper 5
10 pages
SSRN 5075202
No ratings yet
SSRN 5075202
11 pages
A Systematic Requirement Analysis For Dynamic Pricing in Retail E-Commerce
No ratings yet
A Systematic Requirement Analysis For Dynamic Pricing in Retail E-Commerce
7 pages
BF02706246
No ratings yet
BF02706246
26 pages
Dynamic Pricing and Its Forming Factors: December 2012
No ratings yet
Dynamic Pricing and Its Forming Factors: December 2012
9 pages
Business School Research Paper
No ratings yet
Business School Research Paper
10 pages
02+HR 738 JIER With DOI
No ratings yet
02+HR 738 JIER With DOI
7 pages
Dynamic Pricing Model of E-Commerce Platforms Base
No ratings yet
Dynamic Pricing Model of E-Commerce Platforms Base
17 pages
SRM Paper
No ratings yet
SRM Paper
11 pages
Lean Pricing: Pricing Strategies for Startups
From Everand
Lean Pricing: Pricing Strategies for Startups
Omar Mohout
No ratings yet
Dynamic Pricing and Learning
No ratings yet
Dynamic Pricing and Learning
39 pages
Mastering the Art and Science of Management
From Everand
Mastering the Art and Science of Management
Sepantaarmaeiti Momeni
No ratings yet
The Role of Dynamic Pricing Models in Increasing M
No ratings yet
The Role of Dynamic Pricing Models in Increasing M
15 pages
The Broker’s Bible: The Way Forward
From Everand
The Broker’s Bible: The Way Forward
Nancy Gardner
No ratings yet
Value-based Marketing Strategy: Pricing and Costs for Relationship Marketing
From Everand
Value-based Marketing Strategy: Pricing and Costs for Relationship Marketing
Santiago Lopez
No ratings yet
Behavior Analytics in Retail: Measure, Monitor and Predict Employee and Customer Activities to Optimize Store Operations and Profitably, and Enhance the Shopping Experience.
From Everand
Behavior Analytics in Retail: Measure, Monitor and Predict Employee and Customer Activities to Optimize Store Operations and Profitably, and Enhance the Shopping Experience.
Ronny Max
No ratings yet
Lecture 27 - Poisson Regression: I TH 1 N I I I TH I TH
No ratings yet
Lecture 27 - Poisson Regression: I TH 1 N I I I TH I TH
4 pages
Macro and Micro Economics Made Easy For Beginners: For Adults, College and High School Students, Quick Study Guide, Cheatsheet
From Everand
Macro and Micro Economics Made Easy For Beginners: For Adults, College and High School Students, Quick Study Guide, Cheatsheet
Roggie Clark
No ratings yet
Glimmix
No ratings yet
Glimmix
244 pages
Marketing Analytics: How to Achieve Success, #1
From Everand
Marketing Analytics: How to Achieve Success, #1
Ricardo Moreno
No ratings yet
Pukelsheim Optimal DoE
100% (2)
Pukelsheim Optimal DoE
487 pages
Price: Maximizing Customer Loyalty through Personalized Pricing
From Everand
Price: Maximizing Customer Loyalty through Personalized Pricing
Cactus Raazi
No ratings yet
How to Price Your Product or Service Just Right
From Everand
How to Price Your Product or Service Just Right
Barney Kemps
No ratings yet
Designing and Conducting Stated Choice Experiments
No ratings yet
Designing and Conducting Stated Choice Experiments
42 pages
Customer willingness to pay: How to use pricing to boost your sales
From Everand
Customer willingness to pay: How to use pricing to boost your sales
Marika Päiväniemi
No ratings yet
Optimal Design For Nonlinear Response Models 1st Edition High-Resolution PDF Download
100% (10)
Optimal Design For Nonlinear Response Models 1st Edition High-Resolution PDF Download
14 pages
MARKETING DATA ANALYST HANDBOOK: A DATA WHISPERERS DREAM
From Everand
MARKETING DATA ANALYST HANDBOOK: A DATA WHISPERERS DREAM
DR JAMES SELIGMAN
No ratings yet
Arning Time Series Classification With Fisher Information
No ratings yet
Arning Time Series Classification With Fisher Information
22 pages
Pricing Tactics
From Everand
Pricing Tactics
Robert C. Brenner
3.5/5 (4)
Pricing Types: Signalling Market Positioning Intent
From Everand
Pricing Types: Signalling Market Positioning Intent
Robert David Hughes
No ratings yet
History and Development of Statistics
96% (27)
History and Development of Statistics
4 pages
Caie D 22 03171
No ratings yet
Caie D 22 03171
36 pages
Advanced E-Commerce Business Questions and Analytical Hints
From Everand
Advanced E-Commerce Business Questions and Analytical Hints
Zemelak Goraga
No ratings yet
Generalized Linear Models
No ratings yet
Generalized Linear Models
109 pages
Unlocking the Code: Crack the Business Success Formula
From Everand
Unlocking the Code: Crack the Business Success Formula
Sachin Naha
No ratings yet
Fischer Information Matrix
No ratings yet
Fischer Information Matrix
2 pages
MCQs
No ratings yet
MCQs
8 pages
Value-based Intelligent Pricing: Marketing and Business, #1
From Everand
Value-based Intelligent Pricing: Marketing and Business, #1
Miguel Santiago Lopez Arrazola
No ratings yet
45 Business Concepts in 500 Words Each: In 500 words, #15
From Everand
45 Business Concepts in 500 Words Each: In 500 words, #15
Nietsnie Trebla
No ratings yet
Documento 1 Bmantenimiento
No ratings yet
Documento 1 Bmantenimiento
70 pages
Eece 522 Notes - 05 CH - 3b
No ratings yet
Eece 522 Notes - 05 CH - 3b
10 pages
The Confined Helium Atom An Informational Approach
No ratings yet
The Confined Helium Atom An Informational Approach
20 pages
Bayesian Parameter Estimation
No ratings yet
Bayesian Parameter Estimation
40 pages
(Azzalini, 1985) A Class of Distributions Which Includes The Normal Ones
No ratings yet
(Azzalini, 1985) A Class of Distributions Which Includes The Normal Ones
9 pages
The Essential BYOM Manual: Tjalling Jager April 3, 2015
100% (1)
The Essential BYOM Manual: Tjalling Jager April 3, 2015
18 pages
Business Intelligence Questions, Analytical & Reporting Hint
From Everand
Business Intelligence Questions, Analytical & Reporting Hint
Dr. Zemelak Goraga
No ratings yet
VK Malik
No ratings yet
VK Malik
25 pages
Cleaning Business Pricing Strategies
From Everand
Cleaning Business Pricing Strategies
Business Success Shop
No ratings yet
Maximum Likelihood
No ratings yet
Maximum Likelihood
16 pages
Information Geometry Univariate Time Series
No ratings yet
Information Geometry Univariate Time Series
12 pages
The Marketing Audit
From Everand
The Marketing Audit
Orlando Skelton
No ratings yet
Fisher Information For GLM
No ratings yet
Fisher Information For GLM
35 pages
SampleQs Solutions PDF
No ratings yet
SampleQs Solutions PDF
20 pages
Entropy 24 00713 v2
No ratings yet
Entropy 24 00713 v2
12 pages
Lect 4 Notes
No ratings yet
Lect 4 Notes
21 pages
Tests Based On Asymptotic Properties
No ratings yet
Tests Based On Asymptotic Properties
6 pages
Kalman Filter
No ratings yet
Kalman Filter
11 pages
Statistical Inference A
No ratings yet
Statistical Inference A
4 pages
How To Win Customers Every Day _ Volume 7: Data-Driven Selling: The Complete Guide to Success
From Everand
How To Win Customers Every Day _ Volume 7: Data-Driven Selling: The Complete Guide to Success
MAX EDITORIAL
No ratings yet
Assignment 5: E1 244 - Detection and Estimation Theory (Jan 2023) Due Date: April 02, 2023 Total Marks: 55
No ratings yet
Assignment 5: E1 244 - Detection and Estimation Theory (Jan 2023) Due Date: April 02, 2023 Total Marks: 55
2 pages
Monotonicity of Entropy PDF
No ratings yet
Monotonicity of Entropy PDF
8 pages
Local Influence - by Manuel Galea Et - Al 1997
No ratings yet
Local Influence - by Manuel Galea Et - Al 1997
9 pages
Strategy
From Everand
Strategy
Jacob Varghese
4/5 (1)
Smart Pricing (Review and Analysis of Raju and Zhang's Book)
From Everand
Smart Pricing (Review and Analysis of Raju and Zhang's Book)
BusinessNews Publishing
4/5 (1)
Customer Relationship Management: A powerful tool for attracting and retaining customers
From Everand
Customer Relationship Management: A powerful tool for attracting and retaining customers
50minutes
3.5/5 (3)
Customers Rule! (Review and Analysis of Blackwell and Stephan's Book)
From Everand
Customers Rule! (Review and Analysis of Blackwell and Stephan's Book)
BusinessNews Publishing
No ratings yet
The Customer Revolution (Review and Analysis of Seybold's Book)
From Everand
The Customer Revolution (Review and Analysis of Seybold's Book)
BusinessNews Publishing
No ratings yet
The Art of Pricing (Review and Analysis of Mohammed's Book)
From Everand
The Art of Pricing (Review and Analysis of Mohammed's Book)
BusinessNews Publishing
No ratings yet
The Perfect Price
From Everand
The Perfect Price
MENTES LIBRES
No ratings yet