overall optimal conversion. For example, Meituan, one of the most popular e-commerce platforms for services, takes tens of millions of orders daily. In its payment platform, bonuses are allocated to new customers through discounts on the order amount. To meet the needs of different users as much as possible, there are dozens of optional bonus amounts.

[Figure 1 plot: predicted conversion probability ("Prediction", 0.000–0.006) versus bonus treatment index ("Treatment", 1–51) for three users.]
Figure 1: We randomly select three new customers online and obtain the potential outcomes predicted by the intent detection model under each bonus treatment, called the response curve.

User intent detection is a multi-treatment effect estimation problem. Methods can be divided into several categories, such as stratification methods, matching models, meta-algorithms, and tree-based methods. Representation learning methods based on deep neural networks have also attracted much attention, including the pioneering method TARNET [21] and the PM model [20] for multi-treatment scenarios. Compared with traditional machine learning approaches, deep representation learning models are capable of automatically searching for correlated features and combining them to enable more effective and accurate counterfactual estimation [29].

In terms of optimization, the bonus allocation problem can be formulated as a multi-choice knapsack problem (MCKP). Bonus allocation is non-trivial because of the following challenges.

• For user intent detection, existing models lack interpretability: their predictions do not conform to domain knowledge, so they cannot be relied on in the real world. In economic theory, the marginal gain of increasing the bonus amount should be non-negative [11]. However, due to the sparsity of interactions and the presence of noise, the user response curve predicted by traditional treatment effect estimation methods is not entirely monotonic and exhibits many jitters, as shown in Figure 1.
• There is an optimality gap between the two stages of user intent detection and optimization. We prove that some bonus treatments on the response curve obtained in the first stage have no chance of being selected, which reduces the upper bound of the optimal value.
• The distribution of online orders and users changes over time, for example between weekdays and weekends. Adjusting the allocation strategy efficiently and dynamically so that the daily customer acquisition cost does not exceed the budget limit is crucial for budget control.

The main contribution of this work is an online bonus allocation framework that addresses the above challenges. It consists of three modules: a User Intent Detection Module, an Online Allocation Module, and a Feedback Control Module.

For the first challenge, we analyze the typical pattern of multi-treatment effect estimation methods based on representation learning and, building on it, propose a monotonically constrained user intent detection model. It still deploys a multi-head network structure; the difference is that it implicitly models the effect increment between two adjacent bonus amounts and obtains the individual treatment effect through accumulation. The monotonic constraint is implemented by restricting each effect increment to be non-negative. In addition, the accumulation operator added to the output layer makes the information fusion between the head layers more sufficient and improves the stability of model training when there are many treatments.

For the second challenge, the bonus allocation optimization problem involves millions of new customers daily and dozens of bonus treatments, so the policy space is combinatorial and enormous. Inspired by [27, 32], we leverage Lagrangian dual theory to calculate the Lagrangian multiplier and thereby compress the policy space. Furthermore, we propose a user intent detection model with convex constraints that can be theoretically shown to raise the upper bound on the optimal value.

For the third challenge, we treat budget pacing as a feedback control problem and apply feedback control theory to handle the dynamic system's response in the presence of external noise. Without loss of generality, the feedback control module applies not only in our scenario, where customer acquisition cost (CAC) is the target constraint, but also under other cost-related constraints.

We conduct extensive experiments to evaluate the performance of the proposed framework. Both online and offline experiments demonstrate that it achieves better performance than competing methods. The proposed bonus allocation system has brought significant profits to the platform and still runs online.

The rest of the paper is organized as follows. We describe the bonus allocation system in detail in Section 2 and introduce the three modules of the proposed framework in Section 3. We present the experimental setups and results in Section 4, briefly review related work in Section 5, and conclude in Section 6.

2 THE BONUS ALLOCATION FRAMEWORK IN MEITUAN PAYMENT

In this section, we introduce a bonus allocation algorithm framework for customer acquisition marketing in Meituan Payment, as illustrated in Figure 2. It comprises a business layer and an algorithm layer. When a new customer places an order and makes a payment on the Meituan platform, the user intent detection module in the algorithm layer evaluates the customer's conversion probability for each optional bonus amount, which is provided by the
business layer and is consistent for all users. In the bonus allocation optimization module, we formulate an optimization problem based on the user conversion probabilities to determine the optimal allocation strategy, the one that maximizes the total number of converted customers within a given budget limit while adhering to the allocation rules in the business layer. We then apply the optimal allocation strategy in real time to allocate each order. As the traffic of online orders is non-uniform, budget consumption fluctuates. To ensure that actual consumption accurately approaches the budget limit, we deploy a feedback control strategy that adjusts the allocation strategy in real time.

Randomized controlled trials (RCTs) are considered the gold standard for estimating treatment effects. In this study, we collected samples through RCTs to assess the potential effects of different bonus treatments. The experiment was conducted as an online multivariate A/B test, deployed on the actual traffic of Meituan Payment, and lasted for several days.

3 METHODOLOGY

In this section, we first formulate the bonus allocation problem in Section 3.1 and then propose the online allocation strategy, based on the user conversion probabilities produced by the User Intent Detection Module, in Section 3.2. Next, in order to incorporate business knowledge and increase the upper bound of the optimal value under the Lagrangian relaxation paradigm, we add monotonic and convex constraints to the common multi-treatment effect estimation model in the User Intent Detection Module, detailed in Section 3.3. Finally, we propose a feedback control strategy in Section 3.4 so that online cost consumption does not violate the budget limit.

3.1 Preliminaries

Within a time period of one day, we assume that there are $N$ new customers who visit the Meituan payment platform for transactions and $M$ types of candidate bonus amounts. For a given user $i$, we use $p_{i,j}$ to denote the conversion probability under bonus amount $j$. The objective of bonus allocation is to identify an optimal allocation policy that converts as many users as possible within a given budget constraint $B$. The bonus allocation problem can therefore be formulated as the following MCKP:

$$\max_{x_{i,j}} \; \sum_{i=1}^{N} \sum_{j=1}^{M} p_{i,j} \, x_{i,j}, \qquad (1)$$

$$\text{s.t.} \; \sum_{i=1}^{N} \sum_{j=1}^{M} b_j \, p_{i,j} \, x_{i,j} \le B, \qquad (2)$$

$$\sum_{j=1}^{M} x_{i,j} = 1, \quad \forall i \in [N], \qquad (3)$$

$$x_{i,j} \in \{0, 1\}, \quad \forall i \in [N], \; \forall j \in [M], \qquad (4)$$

where $x_{i,j} = 1$ if bonus $j$ is allocated to user $i$, and $X = \{x_{i,j}\}$ denotes the allocation policy. Eq.(1) is the objective, namely the expected total number of converted customers. Eq.(2) is the budget constraint, where $b_j$ denotes the amount of bonus $j$; only converted users consume the budget, so $p_{i,j} b_j$ represents the expected cost. Eq.(3) holds because we issue only one bonus per user visit.

To solve this problem, we propose a framework that (1) predicts the conversion probabilities in the User Intent Detection Module, (2) solves the optimization problem in the Online Allocation Module to obtain the optimal policy and performs the online allocation, and (3), since new customers do not arrive uniformly, adjusts the pace of budget consumption in the Feedback Control Module so that the cost does not violate the budget limit.
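To make the problem concrete, the following is a minimal sketch of Eqs.(1)–(4) on a toy instance with synthetic numbers, solved by brute-force enumeration. It is only a reference point, not the production solver; the production-scale problem is handled by the Lagrangian method of Section 3.2.

```python
# Toy MCKP from Eqs.(1)-(4): enumerate all M^N assignments (synthetic data).
import itertools

p = [[0.10, 0.15, 0.18],   # p[i][j]: conversion probability of user i under bonus j
     [0.05, 0.11, 0.12],
     [0.20, 0.22, 0.30]]
b = [1.0, 3.0, 5.0]        # b[j]: bonus amounts
B = 2.0                    # budget

N, M = len(p), len(b)
best_value, best_assign = -1.0, None
for assign in itertools.product(range(M), repeat=N):  # one bonus per user, Eq.(3)-(4)
    cost = sum(b[j] * p[i][j] for i, j in enumerate(assign))  # expected cost, Eq.(2)
    value = sum(p[i][j] for i, j in enumerate(assign))        # objective, Eq.(1)
    if cost <= B and value > best_value:
        best_value, best_assign = value, assign

print(best_assign, best_value)
```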
Figure 2: Our proposed framework for multi-stage bonus allocation at the algorithm layer, including a user intent detection module, an online allocation module that solves the bonus allocation optimization problem, and a feedback control module.

3.2 Online Allocation Module

The methods for solving MCKPs fall into two categories: exact algorithms, such as dynamic programming and branch and bound, and inexact algorithms, mainly including the Lagrangian dual algorithm and evolutionary algorithms. Due to the need for real-time allocation, it is difficult to obtain an exact solution. Inspired by [27, 32], we leverage Lagrangian dual theory, so that we can calculate the Lagrangian multiplier $\lambda$ to compress the parameter space $X$.

By introducing the Lagrangian multiplier $\lambda$, we obtain the Lagrangian relaxation of the original problem as follows:

$$L(X, \lambda) = \sum_{i=1}^{N} \sum_{j=1}^{M} p_{i,j} \, x_{i,j} - \lambda \left( \sum_{i=1}^{N} \sum_{j=1}^{M} b_j \, x_{i,j} \, p_{i,j} - B \right). \qquad (5)$$

Correspondingly, the dual problem is formulated as:

$$\min_{\lambda} \max_{x_{i,j}} \; L(X, \lambda), \qquad (6)$$

$$\text{s.t.} \; \sum_{j=1}^{M} x_{i,j} = 1, \quad \forall i \in [N], \qquad (7)$$

$$x_{i,j} \in \{0, 1\}, \quad \forall i \in [N], \; \forall j \in [M], \qquad (8)$$

with optimality conditions:

$$\lambda \left( \sum_{i=1}^{N} \sum_{j=1}^{M} b_j \, x_{i,j} \, p_{i,j} - B \right) = 0, \qquad (9)$$

$$\sum_{i=1}^{N} \sum_{j=1}^{M} b_j \, x_{i,j} \, p_{i,j} - B \le 0, \quad \lambda \ge 0. \qquad (10)$$

Thanks to the Lagrangian dual decomposition, Eq.(2) is removed from the constraints, and for a fixed $\lambda$, the above optimization problem turns into a set of sub-problems (i.e., one for each new customer $i^*$):

$$\max_{x_{i^*,j}} \; \sum_{j=1}^{M} x_{i^*,j} \left( p_{i^*,j} - \lambda \, p_{i^*,j} \, b_j \right), \qquad (11)$$

$$\text{s.t.} \; \sum_{j=1}^{M} x_{i^*,j} = 1, \quad x_{i^*,j} \in \{0, 1\}, \quad \forall j \in [M]. \qquad (12)$$

Eq.(11) holds because obtaining the optimal value for each new customer leads to the overall optimal value. When $\lambda$ is given, all sub-problems become independent. Compared to the original problem with $O(MN)$ decision variables, solving a sub-problem only requires traversing the $M$ bonuses, reducing the computational complexity to $O(M)$. The optimal allocation policy is expressed as follows:

$$x_{i,j^*} = \mathbb{1}\{\, j^* = \arg\max_{j} \; p_{i,j} - \lambda \, p_{i,j} \, b_j \,\}, \qquad (13)$$

where $\mathbb{1}\{\cdot\}$ is the 0/1 indicator function.

To solve the dual problem in Eq.(6), we can alternate between the following steps: (1) fix $\lambda^*$ and calculate the optimal allocation solution $X$ using Eq.(13), and (2) based on the fixed $X^*$, iterate $\lambda$ to satisfy the optimality conditions described in Eq.(9,10). To improve the efficiency of the iterative process for $\lambda$, we leverage the bisection algorithm [27, 33] for searching. The initial value of $\lambda$ is recommended to be set between 0 and 2 [26].
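Eq.(13) plus the bisection search admits a compact sketch. The following folds the two alternating steps into a bisection on the budget residual of Eq.(9)–(10); the response matrix here is synthetic (in production, $p_{i,j}$ comes from the User Intent Detection Module), and the sizes are illustrative.

```python
# Lagrangian-dual allocation sketch: per-user argmax (Eq.13) inside a
# bisection on lambda so that expected spend meets the budget (Eq.9-10).
import numpy as np

rng = np.random.default_rng(0)
N, M = 10_000, 51
p = np.sort(rng.uniform(0.01, 0.3, size=(N, M)), axis=1)  # toy monotone response curves
b = np.linspace(1.0, 20.0, M)                              # bonus amounts b_j
B = 30_000.0                                               # daily budget

def allocate(lam):
    """Per-user sub-problem, Eq.(13): argmax_j of p_ij - lam * p_ij * b_j."""
    j_star = np.argmax(p - lam * p * b, axis=1)
    expected_spend = (p[np.arange(N), j_star] * b[j_star]).sum()
    return j_star, expected_spend

lo, hi = 0.0, 2.0              # recommended initial range for lambda [26]
for _ in range(50):            # bisection on the budget residual
    lam = 0.5 * (lo + hi)
    _, spend = allocate(lam)
    if spend > B:              # over budget -> raise lambda (stingier bonuses)
        lo = lam
    else:                      # under budget -> lower lambda (more generous)
        hi = lam

j_star, spend = allocate(lam)
print(f"lambda = {lam:.4f}, expected spend = {spend:.0f}, budget = {B:.0f}")
```

Bisection is applicable because the expected spend of the Eq.(13) policy decreases as $\lambda$ grows: a larger multiplier penalizes expensive bonuses more heavily in every sub-problem.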
3.3 User Intent Detection Module

Due to the explosive development of deep learning, treatment effect estimation based on representation learning has become a hot topic. The typical architecture of representation learning is shown in Figure 3 (a) [16, 20, 21, 30]. To prevent the influence of the treatment variable from being lost during training, a multi-head network structure is often deployed. The loss of the network consists of two parts. $\mathcal{L}_{predict}$ represents the error between the predicted outcome $\hat{y}_k$ and the actual outcome $y_k$. In real scenarios, treatments are not always randomly issued, so the sample distributions of the treatments differ, which may cause the model to learn a spurious relationship between treatment and outcome instead of the real causal one. In deep representation learning, we can transform the covariates from the original space into a latent space. $\mathcal{L}_{balance}$ represents the difference between the latent-space distributions of samples under different treatments, where the integral probability metric is widely used. However, we obtain samples through randomized controlled trials in Meituan Payment customer acquisition, so this regularization term can be eliminated. In our scenario, the treatment corresponds to the bonus amount $t$, and we need to predict the effect on a customer under different bonus amounts, that is, the conversion probability.
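Before the constrained variants, the baseline pattern of Figure 3 (a) is easy to state in code. Below is a minimal PyTorch sketch with illustrative layer sizes; since our samples come from an RCT, the $\mathcal{L}_{balance}$ term is omitted, as the text notes.

```python
# Multi-head treatment effect estimator: shared representation + per-treatment heads.
import torch
import torch.nn as nn

class MultiHeadEstimator(nn.Module):
    def __init__(self, n_features: int, n_treatments: int, hidden: int = 64):
        super().__init__()
        self.shared = nn.Sequential(          # shared dense layers: phi = L(x)
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(           # one outcome head per bonus treatment
            nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_treatments)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        phi = self.shared(x)                  # common representation
        logits = torch.cat([head(phi) for head in self.heads], dim=1)
        return torch.sigmoid(logits)          # (batch, M) conversion probabilities

model = MultiHeadEstimator(n_features=32, n_treatments=5)
y_hat = model(torch.randn(4, 32))             # outcome under each of the 5 treatments
```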
3.3.1 Monotonic constrained.

Assumption 1. With other conditions unchanged, the user conversion probability and the bonus amount maintain a monotonically increasing relationship.

Obviously, the bonus is a direct discount on the order amount in our scenario and does not include personalized creative content. Therefore, the higher the amount, the more attractive it is for user conversion. However, due to the presence of noise and the sparsity of the interaction data, we observe that even in randomized controlled trials, monotonicity cannot be guaranteed. To ensure monotonicity, as shown in Figure 3 (b), we propose a multi-treatment effect estimation model with monotonic constraints. We implicitly model the effect increment between two adjacent treatments and obtain individual treatment effects through accumulation. Next, we describe the model structure in detail.

A new customer $u$ is represented by a set of characteristics $(x, c, t)$, where $x$ includes statistical characteristics of the user's historical behavior, such as the number of transaction orders and the average order amount in the past $n$ days, as well as user identity descriptions, such as age, membership status, and the consumption preference level for each business line; $c$ represents real-time contextual features, such as the order amount, the business line it originated from, and the order time; and $t$ represents the exposed bonus treatment. The output of the shared dense layers $L$ is defined as:

$$\phi = L(x), \qquad (14)$$

where $\phi$ denotes the common user representation learned from the shared layers; it enables the efficient sharing of information across multiple treatments.

As shown in Figure 3 (b), $\delta_k$ is used as the input to the head layers to model incremental effects. It includes two parts: the context features $c$ and the treatment information $\delta_k^t$. Since the contextual features are more important, drawing on the idea of the wide&deep model [7], we place them close to the output to strengthen the memorization ability of the model. $\delta_k^t$ includes the embeddings of the $k$-th and $(k-1)$-th treatments.
Figure 3: (a) The multi-treatment effect estimation pattern based on representation learning; a multi-head network structure is adopted, where $T$ represents the treatment. (b) The proposed multi-treatment effect estimation model with the monotonic constraint; the outcome is calculated by accumulating the positive effect increments modeled between two adjacent sorted treatments. (c) The proposed convex constrained model, realized by restricting the increments to be monotonically decreasing.
Considering the physical meaning of accumulation, $\delta_1^t$ is used to model $\hat{y}_1$. For convenience, we construct a treatment placeholder $t_0$, and then $\delta_1^t$ contains the embeddings of $t_0$ and the first treatment. We define:

$$\hat{\Delta}_k = f_k([\phi; \delta_k]), \qquad (15)$$

where the function $f_k(\cdot)$ is the head and $[\cdot\,;\cdot]$ denotes the concatenation of two vectors.

Finally, the prediction for user $u$ under the $t$-th bonus treatment is formulated as:

$$\hat{y}_t = \begin{cases} \hat{y}_1, & \text{if } t = 1, \\ \hat{y}_1 + \sum_{k=2}^{t} \hat{\Delta}_k, & \text{if } t > 1. \end{cases} \qquad (16)$$

To implement the monotonic constraint, we only need to make the incremental effect $\hat{\Delta}_k$ greater than 0, which is easily achieved by squaring the raw output of the last head layer. The loss function of the proposed model is defined as follows:

$$\mathcal{L}_{predict} = -\frac{1}{N} \sum_{(u,y) \in \mathcal{D}} \left( y \log \hat{y}_t + (1-y) \log(1-\hat{y}_t) \right), \qquad (17)$$

where $u = (x, c, t)$ is the feature tuple, $y$ is the label indicating whether the user converts, and $N$ is the number of samples in the entire sample space $\mathcal{D}$.
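A compact PyTorch sketch of Eqs.(14)–(16): each head sees $[\phi; c; \text{emb}(t_{k-1}); \text{emb}(t_k)]$, raw outputs are squared to enforce $\hat{\Delta}_k \ge 0$, and a cumulative sum yields $\hat{y}_t$. The single-linear heads, the layer sizes, and clamping the accumulated sum into $[0, 1]$ for the BCE loss of Eq.(17) are our assumptions, not details the paper specifies.

```python
# Monotonically constrained estimator (Figure 3(b)): squared increments + cumsum.
import torch
import torch.nn as nn

class MonotonicEstimator(nn.Module):
    def __init__(self, n_features, n_context, n_treatments, emb_dim=8, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        # embedding table for the placeholder t_0 plus the M real treatments
        self.t_emb = nn.Embedding(n_treatments + 1, emb_dim)
        head_in = hidden + n_context + 2 * emb_dim  # [phi; c; emb(t_{k-1}); emb(t_k)]
        self.heads = nn.ModuleList(nn.Linear(head_in, 1) for _ in range(n_treatments))

    def forward(self, x: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        phi = self.shared(x)
        increments = []
        for k, head in enumerate(self.heads):   # head 0 models y_1 via (t_0, t_1)
            pair = torch.cat([self.t_emb.weight[k], self.t_emb.weight[k + 1]])
            pair = pair.unsqueeze(0).expand(x.size(0), -1)
            raw = head(torch.cat([phi, c, pair], dim=1))
            increments.append(raw ** 2)         # squaring enforces Delta_k >= 0
        inc = torch.cat(increments, dim=1)      # (batch, M) non-negative increments
        return torch.cumsum(inc, dim=1).clamp(max=1.0)  # y_hat_t, monotone in t
```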
3.3.2 Convex constrained.

Proposition 3.1. Given $b_l$ and $b_u$ such that $b_l \le b_j \le b_u$, $\forall j \in [M]$, the optimal value of Eq.(1) will increase when new bonus candidates are added.

Proposition 3.2. For the sub-problem (11), let $C_i = \{c_{i,j} \mid c_{i,j} = p_{i,j} \cdot b_j\}$ represent the expected cost set. Sort the $(c_{i,j}, p_{i,j})$ pairs in increasing order of $c_{i,j}$; then only pairs that are monotonically increasing on the convex hull can be selected.

Combining Propositions 3.1 and 3.2, in order to obtain a higher optimal value in the bonus allocation problem, we should add new optional bonuses or reduce the number of non-monotonic and non-convex cases.

Theorem 3.3. If $y = f(x)$ is a monotonically increasing and convex function and $g$ is defined by $y = g(x f(x))$, then $g(\cdot)$ is also a monotonically increasing and convex function if $x > 0$.

Thanks to Theorem 3.3, we can transform the convex constraint between the conversion probability and the expected cost into one between the conversion probability and the bonus amount, which is convenient for modeling. Therefore, we propose a multi-treatment effect estimation model with monotonically increasing and convex constraints.

As shown in Figure 3 (c), the model inference process consists of two steps. First, the increment of the effect incremental slope is formulated as:

$$\hat{\nu}'_k = f_k([\phi; \delta_k]). \qquad (18)$$

We square the output of the last head layer to ensure that $\hat{\nu}'_k \ge 0$, and then the effect incremental slope $\hat{\nu}_k$ is obtained through reverse accumulation, so that it decreases monotonically as the bonus amount increases, satisfying the definition of a convex function:

$$\hat{\nu}_k = \sum_{i=k}^{M} \hat{\nu}'_i. \qquad (19)$$

Second, we define the incremental effect $\hat{\Delta}_k$ between two adjacent treatments as:

$$\hat{\Delta}_k = \hat{\nu}_k \times \alpha_k, \qquad (20)$$

where $\alpha_k$ is the amount difference between the $(k-1)$-th and the $k$-th bonus treatments. Finally, the predicted outcome is obtained by accumulating the effect increments, as shown in Eq.(16).
The loss function is identical to Eq.(17). Obviously, according to Eq.(16) and Eq.(19), the monotonically increasing and convex constraints between the outcome $\hat{y}$ and the bonus amount are realized.
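Turning the monotonic model into the convex variant of Figure 3 (c) only changes the post-processing of the head outputs. A minimal sketch, assuming the raw per-head outputs and the sorted bonus amounts are given; taking $\alpha_1$ as the gap from a zero-amount placeholder is our assumption:

```python
# Convex-constrained post-processing (Eqs. 18-20): reversed cumsum of squared
# slope increments gives non-increasing slopes nu_k; Delta_k = nu_k * alpha_k
# then feeds the same forward accumulation as Eq.(16).
import torch

def convex_outcomes(raw: torch.Tensor, amounts: torch.Tensor) -> torch.Tensor:
    """raw: (batch, M) raw head outputs; amounts: (M,) sorted bonus amounts."""
    nu_prime = raw ** 2                                               # nu'_k >= 0, Eq.(18)
    nu = torch.flip(torch.cumsum(torch.flip(nu_prime, [1]), 1), [1])  # Eq.(19)
    alpha = torch.diff(amounts, prepend=amounts.new_zeros(1))         # gaps alpha_k
    delta = nu * alpha                                                # Delta_k, Eq.(20)
    return torch.cumsum(delta, dim=1).clamp(max=1.0)                  # accumulate, Eq.(16)
```

The reversed cumulative sum makes the slopes $\hat{\nu}_k$ non-increasing in $k$, so the accumulated response curve rises with diminishing increments per unit of bonus, exactly the constraint of Figure 3 (c).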
3.4 Feedback Control Module

As shown in Section 3.2, we iterate $\lambda$ to satisfy the budget constraint based on historical data. Unfortunately, obtaining the optimal $\lambda$ on the dynamic payment platform is extremely difficult. First, uncontrollable changes in factors including the customer profile, traffic, and the coupon write-off rate make the distributions of the variables non-stationary. Second, marketing campaign settings such as the budget and the target audience may be changed irregularly.

Feedback control, which deals with dynamic systems subject to feedback and outside noise [3], is widely adopted for its robustness and effectiveness. In our scenario, to cope with the fluctuations of the dynamic bonus allocation environment, we regard the Lagrangian multiplier $\lambda$ as the adjustable system input and use CAC as the performance indicator. Naturally, the constraint problem is transformed into a feedback control problem: a low $\lambda$ leads to a generous bonus distribution, so CAC might be higher than the target and the budget may run out early; on the contrary, a high $\lambda$ results in a stingy bonus distribution, so CAC might be lower than the target and the budget may not be fully spent. Concretely, the error $e(t)$ between the reference CAC $r(t)$ and the observed CAC $h(t)$ is turned into a control signal $u_\lambda(t)$ by $k_p$, $k_i$, and $k_d$, the parameters of a PID controller:

$$e(t) = r(t) - h(t), \qquad (21)$$

$$u_\lambda(t) = k_p \, e(t) + k_i \sum_{k=1}^{t} e(k) + k_d \big( e(t) - e(t-1) \big), \qquad (22)$$

$$\lambda(t+1) = F_\lambda(\lambda(0), u_\lambda(t)). \qquad (23)$$
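A minimal sketch of this loop, with illustrative gains; the concrete update map $F_\lambda$ below (exponential scaling of the initial multiplier) is one plausible choice, not the paper's stated one:

```python
# Discrete PID controller on the CAC error that steers lambda between rounds.
import math

class PIDLambdaController:
    def __init__(self, lam0: float, kp: float, ki: float, kd: float):
        self.lam0, self.kp, self.ki, self.kd = lam0, kp, ki, kd
        self.integral = 0.0        # running sum of errors for the integral term
        self.prev_error = 0.0

    def step(self, target_cac: float, observed_cac: float) -> float:
        error = target_cac - observed_cac            # e(t) = r(t) - h(t), Eq.(21)
        self.integral += error
        u = (self.kp * error                         # proportional term
             + self.ki * self.integral               # integral term, Eq.(22)
             + self.kd * (error - self.prev_error))  # derivative term
        self.prev_error = error
        # One possible F_lambda: exponential scaling keeps lambda positive.
        # CAC above target -> e < 0 -> u < 0 -> lambda grows -> stingier bonuses.
        return self.lam0 * math.exp(-u)

ctrl = PIDLambdaController(lam0=0.5, kp=1.0, ki=0.1, kd=0.2)
lam_next = ctrl.step(target_cac=8.0, observed_cac=9.5)   # overspending raises lambda
```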
4 EXPERIMENTS

To verify the effectiveness of the proposed framework, we conduct offline experiments and online A/B tests. We first introduce the datasets, baseline methods, and evaluation metrics, and then present the experimental results and analysis.

4.1 Experimental Setup

4.1.1 Experimental Settings. To demonstrate the effectiveness of the proposed framework, our experiments consist of three parts: a user intent detection evaluation, a budget allocation evaluation, and a feedback control evaluation. We mainly conduct experiments on real industrial data, and the sample collection process is shown in Figure 2. Through an online randomized controlled trial, we collected data on 6.6 million orders and 2 million users within 11 days, covering 51 optional bonus amounts. We randomly selected 5.6 million orders as the training set in the user intent detection module to train the multi-treatment effect estimation model.
[Figure: relative error (%); y-axis ticks from 0.0 to 6.0.]

Gain vs. S-learner:

Method                    CVR                        CAC
Monotonic constrained     -0.44% (p-value = 0.68)    -5.07% (p-value = 0.0082)
Convex constrained        -2.59% (p-value = 0.0327)  -6.62% (p-value < 0.001)
[Figure 8 plot: optimal value (y-axis, roughly 42500–50000) versus budget constraint $B$ (x-axis, $4$–$8 \times 10^7$) for $M \in \{5, 15, 30, 51\}$.]
Figure 8: The X-axis denotes the budget constraint $B$, and the Y-axis denotes the optimal value obtained using the Lagrangian dual method. As the number of bonuses $M$ increases, the optimal value obtained within the same budget constraint increases.

Further, it yields:

$$\frac{p_{i,b} - p_{i,a}}{c_{i,b} - c_{i,a}} - \frac{p_{i,c} - p_{i,b}}{c_{i,c} - c_{i,b}} \ge 0,$$

which contradicts the definition of non-convexity. Thus, we complete the proof. □

A.1.3 Theorem 3.3. If $y = f(x)$ is a monotonically increasing and convex function and $g$ is defined by $y = g(x f(x))$, then $g(\cdot)$ is also a monotonically increasing and convex function if $x > 0$.

Proof. $y = f(x)$ yields $x = f^{-1}(y)$, and thus $g^{-1}(y) = y f^{-1}(y)$. The derivatives of $f^{-1}$ are as follows:

$$\left(f^{-1}\right)' = \frac{1}{f'}, \qquad (29)$$

$$\left(f^{-1}\right)'' = -\frac{f''}{(f')^3}. \qquad (30)$$
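As a quick numeric sanity check of Theorem 3.3 (ours, not part of the paper): the paper uses "convex" in the sense of non-increasing marginal slopes, $f(x) = \log(1+x)$ satisfies this on $x > 0$, and $g$ defined by $y = g(x f(x))$ should inherit both properties.

```python
# Numeric check of Theorem 3.3 on an example function.
import numpy as np

f = lambda x: np.log1p(x)        # increasing, with decreasing slope, on x > 0
x = np.linspace(0.1, 10.0, 2000)
c = x * f(x)                     # the "expected cost" axis; strictly increasing
y = f(x)                         # g is given implicitly by the (c, y) pairs

slopes = np.diff(y) / np.diff(c)            # finite-difference slopes of g
assert np.all(np.diff(y) > 0)               # g is monotonically increasing
assert np.all(np.diff(slopes) < 1e-12)      # slopes non-increasing, as claimed
print("Theorem 3.3 holds on this example.")
```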