
Preference-based Multi-Objective Bayesian Optimization with Gradients

Anonymous Author(s)
Affiliation
Address
email

Abstract

We propose PUB-MOBO for personalized multi-objective Bayesian optimization. PUB-MOBO combines utility-based MOBO with local multi-gradient descent to refine user-preferred solutions to be near-Pareto-optimal. Unlike traditional methods, PUB-MOBO does not require estimating the entire Pareto-front, making it more efficient. Experimental results on synthetic and real-world benchmarks show that PUB-MOBO consistently outperforms existing methods in terms of proximity to the Pareto-front and utility regret.

1 Introduction

Multi-objective Bayesian optimization (MOBO) is a particularly useful multi-objective optimization (MOO) strategy when the objectives are black-box functions constructed from noisy observations. Traditional MOBO methods such as q-EHVI [1] assume that all Pareto-optimal solutions are equally desirable to the user, which might not be the case in practice. There has been growing interest in preference-based MOBO (e.g., [2, 3]), which leverages user preferences, typically in the form of pairwise comparisons between solutions generated by the optimization algorithm, to guide the optimization process towards regions of interest within the Pareto-front. These comparisons are used to estimate an underlying utility function that describes user preferences. In [4], the authors propose the EUBO and qEIUU acquisition functions, which take advantage of user preferences when querying new points. However, while preference-based MOBO can effectively identify solutions with high utility as informed by user feedback, the resulting solutions may not be Pareto-optimal.

We present Preference-Utility-Balanced MOBO (PUB-MOBO), which systematically determines the user-informed regions of interest within the Pareto-front by synergizing global and local search strategies. PUB-MOBO begins with a global search driven by utility maximization to identify regions in the solution space that align with user preferences. Subsequently, a local search is conducted in the vicinity of these solutions to discover dominating solutions that are closer to Pareto-optimality. Additionally, a new utility function, the Preference-Dominated Utility Function (PDUF), is proposed that encapsulates the concept of dominance within a single function. PDUF allows for consistently identifying dominating solutions, while providing a straightforward means for expressing all possible user preferences. This differs from existing utility functions for preference-based MOBO, such as the negative ℓ1 distance from an ideal solution, irrespective of whether that solution lies on the Pareto-front or is an infeasible ideal point [5], or the weighted sum, for which not all Pareto-optimal points can be assigned the highest utility value by any choice of weights [6]. PDUF is then used in conjunction with gradient descent (GD) to seamlessly combine user preferences with the notion of dominance and identify user-preferred solutions that are approximately Pareto-optimal. Empirical demonstrations on several synthetic benchmark problems and real-world problems show that PUB-MOBO not only enhances the utility of the optimization solutions, but also yields near-Pareto-optimal solutions.

Submitted to 38th Conference on Neural Information Processing Systems (NeurIPS 2024). Do not distribute.
2 Problem Formulation

We minimize nf expensive-to-evaluate objective functions, denoted by fi(x) for i ∈ {1, ..., nf}. Consequently, the objective function vector is denoted f(x), where x ∈ ℝ^{nx} denotes the decision variables. We assume that, for a candidate x, the function f(x) can be evaluated, but no first- or higher-order information about any component of f is available, and no analytical form of f is known. For MOO problems without user preferences, the objective is to attain Pareto-optimality [7]: a point x ∈ X is Pareto-optimal if there is no x′ ∈ X such that fi(x′) ≤ fi(x) for all i and fj(x′) < fj(x) for at least one j. Note that accurately computing the set of Pareto-optimal points, referred to as the Pareto-front Xpareto, can often be computationally prohibitive, even for small nf. In the presence of a user, estimating the entire Pareto-front may become unnecessary, especially when only specific sub-regions of the feasible set X are of interest. Mathematically, such user preferences are often abstracted in the MOBO literature via utility functions. Specifically, the MOO problem is recast as a (scalar) utility maximization problem

    max_{x∈X} u(f(x)),    (1)

where u : ℝ^{ny} → ℝ is the unknown utility function that dictates the behavior of the user. Note that the input to the utility is a noise-corrupted outcome vector y = f(x) + ε, where ε is zero-mean noise with covariance σε² I_{ny}, and I_{ny} is the ny × ny identity matrix. Let the highest-utility Pareto point be defined as

    x∗ ∈ arg max_{x∈Xpareto} u(f(x)).    (2)

Following the preference-based BO literature, we assume that the utility function cannot be evaluated and that its functional form is unknown. Additionally, it is well established that user preferences are difficult to assign to continuous numerical values; instead, we suppose that users are more inclined to provide weak supervision in the form of pairwise comparisons [8, 4]. The following assumption (Assumption 1) asserts that a typical user will select dominating solutions when possible: if y1 and y2 are candidate outcomes presented to the user and y1 ≻ y2, then the user will always select y1; that is, u(y1) > u(y2). This assumption should be enforced when modeling preference-based MOBO problems to accurately reflect real user behavior.
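
To make the pairwise-comparison protocol concrete, the following minimal Python sketch simulates a user who always selects a dominating outcome (per the assumption above) and otherwise ranks outcomes with a possibly noise-corrupted utility. The helper names (dominates, simulated_user) and the example weighted-sum utility are illustrative placeholders, not part of our implementation.

import numpy as np

def dominates(y1, y2):
    # True if outcome y1 Pareto-dominates y2 (minimization): no worse in
    # every objective and strictly better in at least one.
    y1, y2 = np.asarray(y1), np.asarray(y2)
    return np.all(y1 <= y2) and np.any(y1 < y2)

def simulated_user(y1, y2, utility, noise_std=0.0, rng=None):
    # Return 0 if the user prefers y1, and 1 if the user prefers y2.
    # A dominating outcome is always selected (the dominance assumption);
    # otherwise the choice follows the (noisy) utility values.
    rng = np.random.default_rng() if rng is None else rng
    if dominates(y1, y2):
        return 0
    if dominates(y2, y1):
        return 1
    u1 = utility(y1) + noise_std * rng.standard_normal()
    u2 = utility(y2) + noise_std * rng.standard_normal()
    return 0 if u1 > u2 else 1

# Example: a simple weighted-sum utility over two minimization objectives.
pref = simulated_user([0.2, 0.5], [0.4, 0.4], utility=lambda y: -np.dot([0.7, 0.3], y))

Any utility that respects dominance, such as the PDUF introduced in Section 3.2, can be plugged in for the utility argument.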

3 Preference-Utility-Balanced (PUB) MOBO

Users often require some assurance that the suggested candidates are not only high in utility, but also near-Pareto-optimal. PUB-MOBO relies on utility maximization to ascertain candidate solutions that are preferred by the user, while promoting a local search towards the Pareto-front using estimated gradients. We observe that the local search finds solutions near Pareto points, which subsequently accelerates the search for high-utility solutions.

3.1 PUB-MOBO Algorithm

The proposed PUB-MOBO method operates in three stages. We extend the two stages of [4] (PE: preference exploration, and EXP: outcome evaluation via experiments) with an additional stage based on local multi-gradient descent, termed the GD stage. In each PUB-MOBO iteration, these three stages are executed, and the process is repeated ad infinitum or, more practically, until a pre-decided budget on the total number of outcome evaluations is reached; see Algorithm 1 in Appendix D.
Preference Exploration: Here, the user expresses their preferences over a query of two candidate solutions in the form of a pairwise comparison. The comparison is used to update the estimate û of the utility, obtained implicitly with a pairwise GP and the EUBO acquisition function proposed in [4]; see Appendix B.1 for the closed-form expression. Note that no evaluation of f is required for PE.
Outcome Evaluation via Experiments: Here, we compute the optimal decision variables and evaluate true outcomes to update the outcome model f̂, using the expected improvement under utility uncertainty (qEIUU) acquisition function [9]; see Appendix B.2 for the closed-form expression. Maximizing qEIUU involves taking Monte Carlo samples [10, 11] and yields the optimal decision variables xEXP. After xEXP is obtained, we append it along with its true outcome value f(xEXP) to the current dataset.
Multi-gradient descent: This GD stage is motivated by the fact that xEXP, while expected to be high in utility, is not specifically designed to be near the Pareto-front. Analogous to single-objective optimization, we pursue local gradients that are expected to generate a trajectory of x candidates that evolves towards a nearby Pareto-optimal point. We refer to these gradient-following decision variables as xGD, and set the initial xGD to be xEXP.
For a MOO problem, gradient descent must be adapted to multiple objectives. We propose the use of the multiple gradient descent algorithm (MGDA) [12], which was designed for smooth multi-outcome objective functions. MGDA exhibits theoretical properties that, as we hypothesize and demonstrate via experiments, are beneficial in the MOBO context.
MGDA exploits the KKT conditions [13] via a quadratic cost constrained to the probability simplex:

    min_{α≥0} ‖α⊤∇f(x)‖²   subject to   1⊤α = 1.    (3)

It is well known, cf. [12], that a solution to (3) satisfies either α⊤∇f(x) = 0, in which case the current parameters x are Pareto-optimal, or α⊤∇f(x) ≠ 0, in which case α⊤∇f(x) is a feasible descent direction. Given that (3) is a quadratic cost over linear constraints, we can use the Frank-Wolfe algorithm [14, 15] to efficiently compute optimal solutions; see the pseudocode in Algorithm 3 in Appendix D. Solving (3) yields an optimal α with which we can take a gradient step xGD ← xGD − η α⊤∇f(xGD). However, there are two clear difficulties at this juncture. The first is that this update may yield an xGD ∉ X. To counter this, we stop updating when this happens and terminate the local gradient search phase, moving on to the next PUB-MOBO iteration with an updated dataset D that contains all the xGD and corresponding yGD observed so far. The second, and more debilitating, problem is that we do not have access to gradients of f. Thankfully, we do have a surrogate model f̂ with which we can obtain an estimate of the gradient at any x as µ∇ := E[∇f̂(x)] through (6a). The gradient step is then xGD ← xGD − η α⊤µ∇(xGD). Unfortunately, there is no clear correlation between the uncertainties in f and ∇f, so µ∇ could have large uncertainties even near previously observed points. Therefore, it is imperative to incorporate techniques that can reduce uncertainty in the posterior of the gradient estimate. To this end, we propose to use the gradient information (GI) acquisition function [16].
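
As a concrete illustration of this step, the following NumPy sketch performs one multi-gradient descent update, assuming the estimated Jacobian µ∇ is available as an (nf × nx) array. The Frank-Wolfe loop follows Algorithm 3 in Appendix D; the step size η, iteration limits, and tolerance are illustrative defaults rather than the values used in our experiments.

import numpy as np

def frank_wolfe_simplex(M, max_steps=50, tol=1e-6):
    # Minimize alpha^T M alpha over the probability simplex (cf. Algorithm 3).
    nf = M.shape[0]
    alpha = np.full(nf, 1.0 / nf)
    for _ in range(max_steps):
        t_hat = int(np.argmin(M @ alpha))     # vertex minimizing the linearized cost
        d = np.eye(nf)[t_hat] - alpha         # Frank-Wolfe direction toward that vertex
        curvature = d @ M @ d
        slope = alpha @ M @ d
        # Exact line search for the quadratic objective along d, clipped to [0, 1].
        if curvature <= 1e-12:
            gamma = 1.0 if slope < 0 else 0.0
        else:
            gamma = float(np.clip(-slope / curvature, 0.0, 1.0))
        alpha = alpha + gamma * d
        if gamma < tol:
            break
    return alpha

def mgda_step(x, jac, eta=0.1):
    # One multi-gradient step; jac is the (nf, nx) matrix of estimated gradients at x.
    M = jac @ jac.T                           # nf x nf Gram matrix of the gradients
    alpha = frank_wolfe_simplex(M)
    direction = alpha @ jac                   # common descent direction alpha^T grad f(x)
    return x - eta * direction, direction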
Multi-gradient descent with GI acquisition: We briefly explain the mechanism of the GI acquisition. Suppose we select the best candidate from the EXP stage, xEXP, and set it as the initial candidate for the local gradient search, xGD. The GI acquisition tries to select a subsequent point x′ that would minimize the uncertainty of the gradient at xGD if x′ and its corresponding y′ were known. Treating all nf objectives as independently distributed, we formulate the uncertainty reduction using an A-optimal design criterion [17], which, for Gaussian distributions, involves maximizing

    GI(x′) = Σ_{i=1}^{nf} Tr( ∇ki(xGD, X′) Kσ^{-1}(X′) ∇ki⊤(xGD, X′) ),    (4)

where X′ = {X ∪ x′}. For each of the nGD gradient steps, the GI acquisition function is optimized nGI times to reduce gradient uncertainty. Upon each optimization, we evaluate the outcome function to obtain a corresponding yGD, which is appended to the dataset D for subsequent PUB-MOBO iterations. We provide the derivation of the GI acquisition function in Appendix B.3 and pseudocode for multi-gradient descent in Algorithm 2 in Appendix D.

3.2 Preference-Dominated Utility Function

We propose the PDUF, which merges the concept of dominance with user preferences to help locate high-utility points that are close to Pareto-optimality. See Appendix C for more details.

4 Experiments

We validate the proposed PUB-MOBO method on benchmarks commonly found in the MOO literature: DTLZ1 (nx = 9, nf = 2) [18], DH1 (nx = 10, nf = 2) [7], Conceptual Marine Design (nx = 6, nf = 4) [19], and Car Side Impact (nx = 7, nf = 4) [20]. The baselines and ablations that we compare are: (i) the EUBO+qEIUU baseline, which contains only the PE and EXP stages; (ii) PUB-MOBO-PG, which uses the predicted gradients (PG) without any outcome evaluations or GI optimizations in the GD stage, making it relatively inexpensive but ignoring the fact that additional samples can yield useful derivative information; (iii) PUB-MOBO-PG+OE, a PUB-MOBO ablation that uses the predicted gradients as in PUB-MOBO-PG, but where an Outcome Evaluation (OE) is performed at every gradient descent step in an effort to lower gradient uncertainty around observed points; and (iv) PUB-MOBO, the proposed method. Figure 1 illustrates the performance of the experiments in terms of utility regret and distance to the Pareto-front with respect to outcome evaluations and user queries.

Figure 1: Performance comparison on the benchmarks DTLZ1, DH1, Conceptual Marine Design, and Car Side Impact. Continuous lines show the median over 100 runs, and shading indicates the 25-75 percentiles.

EUBO+qEIUU is the poorest-performing algorithm on all metrics, affirming the effectiveness of the additional stage based on local gradient search. However, PUB-MOBO-PG performs equally poorly, largely due to inaccurate gradient estimation obtained with the surrogate model f̂ in (6). We frequently observe that the evolution of xGD in the GD stage is prematurely terminated, either due to infeasibility in x or because of incorrect solutions to MGDA caused by erroneous µ∇(xGD). The PG+OE variant significantly outperforms the PG variant due to its enhanced gradient estimation accuracy, which justifies the additional computational cost of updating the outcome model. PUB-MOBO further improves on the PG+OE variant by using the GI acquisition function to reduce gradient uncertainty, leading to even more accurate gradient estimates.

5 Conclusion

In this work we presented PUB-MOBO, a sample-efficient multi-objective Bayesian optimization algorithm that combines user preferences with a gradient-based search to compute near-Pareto-optimal solutions. We verify that the proposed method yields solutions with high utility and reduced distance to the Pareto-front, and also demonstrate the importance of gradient uncertainty reduction in the gradient-based search. Finally, the proposed utility function respects dominance while modeling different user preferences.

References

[1] Samuel Daulton, Maximilian Balandat, and Eytan Bakshy. Differentiable expected hypervolume improvement for parallel multi-objective Bayesian optimization. Advances in Neural Information Processing Systems, 33:9851–9864, 2020.

[2] Ketong Shao, Diego Romeres, Ankush Chakrabarty, and Ali Mesbah. Preference-guided Bayesian optimization for control policy learning: Application to personalized plasma medicine. In NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World, 2023.

[3] Ryota Ozaki, Kazuki Ishikawa, Youhei Kanzaki, Shion Takeno, Ichiro Takeuchi, and Masayuki Karasuyama. Multi-objective Bayesian optimization with active preference learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 14490–14498, 2024.

[4] Zhiyuan Jerry Lin, Raul Astudillo, Peter Frazier, and Eytan Bakshy. Preference exploration for efficient Bayesian optimization with multiple outcomes. In International Conference on Artificial Intelligence and Statistics, pages 4235–4258. PMLR, 2022.

[5] Kaisa Miettinen. Nonlinear Multiobjective Optimization, volume 12. Springer Science & Business Media, 1999.

[6] Giorgio Chiandussi, Marco Codegone, Simone Ferrero, and Federico Erminio Varesio. Comparison of multi-objective optimization methodologies for engineering applications. Computers & Mathematics with Applications, 63(5):912–942, 2012.

[7] Kalyanmoy Deb and Himanshu Gupta. Searching for robust Pareto-optimal solutions in multi-objective optimization. In International Conference on Evolutionary Multi-Criterion Optimization, pages 150–164. Springer, 2005.

[8] Wei Chu and Zoubin Ghahramani. Preference learning with Gaussian processes. In Proceedings of the 22nd International Conference on Machine Learning, pages 137–144, 2005.

[9] Raul Astudillo and Peter Frazier. Multi-attribute Bayesian optimization with interactive preference learning. In Silvia Chiappa and Roberto Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 4496–4507. PMLR, 26–28 Aug 2020.

[10] James Wilson, Frank Hutter, and Marc Deisenroth. Maximizing acquisition functions for Bayesian optimization. Advances in Neural Information Processing Systems, 31, 2018.

[11] Maximilian Balandat, Brian Karrer, Daniel Jiang, Samuel Daulton, Ben Letham, Andrew G. Wilson, and Eytan Bakshy. BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. Advances in Neural Information Processing Systems, 33:21524–21538, 2020.

[12] Jean-Antoine Désidéri. Multiple-gradient descent algorithm (MGDA) for multiobjective optimization. Comptes Rendus Mathematique, 350(5-6):313–318, 2012.

[13] Stefan Schäffler, Reinhart Schultz, and Klaus Weinzierl. Stochastic method for the solution of unconstrained vector optimization problems. Journal of Optimization Theory and Applications, 114:209–222, 2002.

[14] Ozan Sener and Vladlen Koltun. Multi-task learning as multi-objective optimization. Advances in Neural Information Processing Systems, 31, 2018.

[15] Martin Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In International Conference on Machine Learning, pages 427–435. PMLR, 2013.

[16] Sarah Müller, Alexander von Rohr, and Sebastian Trimpe. Local policy search with Bayesian optimization. Advances in Neural Information Processing Systems, 34:20708–20720, 2021.

[17] Ankush Chakrabarty, Gregery T. Buzzard, and Ann E. Rundell. Model-based design of experiments for cellular processes. Wiley Interdisciplinary Reviews: Systems Biology and Medicine, 5(2):181–203, 2013.

[18] Kalyanmoy Deb, Lothar Thiele, Marco Laumanns, and Eckart Zitzler. Scalable test problems for evolutionary multiobjective optimization. In Evolutionary Multiobjective Optimization: Theoretical Advances and Applications, pages 105–145. Springer, 2005.

[19] Michael G. Parsons and Randall L. Scott. Formulation of multicriterion design optimization problems for solution with scalar numerical optimization methods. Journal of Ship Research, 48(01):61–76, 2004.

[20] Himanshu Jain and Kalyanmoy Deb. An evolutionary many-objective optimization algorithm using reference-point based nondominated sorting approach, part II: Handling constraints and extending to an adaptive approach. IEEE Transactions on Evolutionary Computation, 18(4):602–622, 2013.

[21] Mauricio A. Alvarez, Lorenzo Rosasco, Neil D. Lawrence, et al. Kernels for vector-valued functions: A review. Foundations and Trends in Machine Learning, 4(3):195–266, 2012.

[22] Christopher K. I. Williams and Carl Edward Rasmussen. Gaussian Processes for Machine Learning, volume 2. MIT Press, Cambridge, MA, 2006.

[23] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.

Appendix

A Preliminaries

A.1 Modeling with Gaussian processes

We first discuss the modeling choices considered to learn the outcome function f and the utility function u; their respective approximations are denoted f̂ and û.

A.1.1 Modeling outcomes

Gaussian process (GP) regression is a popular choice for constructing the surrogate f̂ of the true outcome function f. We train an independent GP for each objective fi, though a multi-output GP that models correlations between the objectives could also be considered [21]. Each GP is defined a priori by a mean function m(x) and a covariance function ki(x, x′), called the kernel. For this work, any C² kernel is admissible.

Let X_T = [x1, x2, ..., xT]; we drop the subscript for brevity. Given a dataset D := (X, Y) comprising input-outcome pairs, the mean and variance of the posterior are given by

    µi(x) = m(x) + ki(x, X) Kσ^{-1}(X) (Yi − m(X)),    (5a)
    Σi(x) = ki(x, x) − ki(x, X) Kσ^{-1}(X) ki(X, x),    (5b)

where Kσ(X) := ki(X, X) + σ² I and m(·) is the prior mean. Since differentiation is a linear operator, the derivative GP is another GP [22], characterized fully by the mean and covariance functions

    µ∇i(x) = ∇m(x) + ∇ki(x, X) Kσ^{-1}(X) (Yi − m(X)),    (6a)
    Σ∇i(x) = ∇²ki(x, x) − ∇ki(x, X) Kσ^{-1}(X) ∇ki(X, x).    (6b)

In the implementation, each GP is designed with a Matérn 5/2 kernel with ARD, a lengthscale prior defined by Gamma(α = 3, β = 6), and an outputscale prior defined by Gamma(α = 2, β = 0.15). To infer the gradient mean (6a) and covariance (6b) of the posterior, automatic differentiation [23] is used. The inputs X are normalized to [0, 1] and the outcomes Y are standardized to zero mean and unit variance during GP fitting. We initialize the model with 6 outcomes.
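
To illustrate how (6a) can be obtained via automatic differentiation, the following minimal PyTorch sketch differentiates the posterior mean of a zero-prior-mean GP with respect to the query point. For brevity, an isotropic RBF kernel and ad-hoc hyperparameters are used here instead of the Matérn 5/2 ARD kernel of our implementation, and all names are illustrative.

import torch

def rbf_kernel(x1, x2, lengthscale=0.5, outputscale=1.0):
    # Isotropic RBF kernel; shapes (n, d) and (m, d) -> (n, m).
    sq_dist = torch.cdist(x1, x2).pow(2)
    return outputscale * torch.exp(-0.5 * sq_dist / lengthscale**2)

def posterior_mean(x, X, y, noise=1e-4):
    # GP posterior mean at x with zero prior mean, cf. (5a).
    K = rbf_kernel(X, X) + noise * torch.eye(X.shape[0])
    k_x = rbf_kernel(x.unsqueeze(0), X)               # (1, n)
    return (k_x @ torch.linalg.solve(K, y)).squeeze()

# Toy data: one objective observed at a handful of points.
X = torch.rand(8, 2)
y = (X ** 2).sum(dim=1, keepdim=True)

x = torch.rand(2, requires_grad=True)
mu = posterior_mean(x, X, y)
# Gradient of the posterior mean, i.e. the mean of the derivative GP in (6a).
grad_mu = torch.autograd.grad(mu, x)[0]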

A.1.2 Modeling preferences

We assume the user is only capable of weak supervision in the form of pairwise comparisons (PC). That is, if the user prefers outcome y over y′, the pairwise comparison function returns r(y, y′) = 0; in the event that the user prefers y′ instead, r(y, y′) = 1. Pairwise GPs, cf. [8], allow us to learn a latent functional representation û of the true user utility based on this preference feedback. The latent function satisfies û(y) > û(y′) if the user prefers y, and vice versa. In the implementation, we use the RBF kernel with ARD, a lengthscale prior defined by Gamma(α = 2.4, β = 2.7), and an outputscale prior defined by a smoothed box prior on [0.01, 100]. The outcomes Y are normalized to [0, 1] during GP fitting. We initialize with 12 Sobol points and form pairwise comparisons with every consecutive pair of outcomes to yield 6 user comparisons.
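
As a minimal sketch of how the initial comparison data could be assembled under the convention above (r = 0 when the first outcome is preferred), the following Python snippet pairs consecutive outcomes and records the user's choice; the build_comparisons helper and the example preference oracle are illustrative, not our implementation.

import numpy as np

def build_comparisons(Y, prefer):
    # Form pairwise comparisons between consecutive outcome pairs.
    # Y      : (2m, ny) array of initial outcomes (e.g. from 12 Sobol points).
    # prefer : callable(y1, y2) -> 0 if y1 is preferred, 1 otherwise.
    # Returns an (m, 2) integer array of (winner_index, loser_index) pairs.
    comparisons = []
    for i in range(0, len(Y) - 1, 2):
        choice = prefer(Y[i], Y[i + 1])
        winner, loser = (i, i + 1) if choice == 0 else (i + 1, i)
        comparisons.append((winner, loser))
    return np.asarray(comparisons)

# 12 initial outcomes with two objectives -> 6 user comparisons.
Y_init = np.random.rand(12, 2)
comps = build_comparisons(Y_init, prefer=lambda a, b: 0 if a.sum() < b.sum() else 1)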

B Acquisition functions

B.1 EUBO

The EUBO acquisition is given by

    EUBO(x1, x2) = E[ max( û(f̂(x1)), û(f̂(x2)) ) ],    (7)

where the hat notation denotes surrogate models of the corresponding functions.

B.2 qEIUU

The expected improvement under utility uncertainty is given by

    qEIUU(x) = E[ max( û(f̂(x)) − û(f(xbest)), 0 ) ],    (8)

where xbest = arg max_{x∈X} û(f̂(x)) and xEXP := arg max_{x∈X} qEIUU(x). Since the expectation in (8) is with respect to both the outcome and utility models, an analytical expression is challenging to obtain, and it is instead approximated via Monte Carlo sampling [10, 11].
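
The following Python sketch shows one possible Monte Carlo estimator of (8), assuming samplers from the outcome posterior f̂ and the utility posterior û are available; the sampler interfaces, sample counts, and the treatment of the incumbent utility as a fixed scalar are simplifying, illustrative choices rather than our implementation.

import numpy as np

def qeiuu_mc(x, sample_outcome, sample_utility, u_best, n_outcome=64, n_utility=16):
    # Monte Carlo estimate of the expected improvement under utility
    # uncertainty (qEIUU) at a candidate x, cf. (8).
    # sample_outcome : callable(x) -> one draw of f_hat(x) from the outcome posterior
    # sample_utility : callable(y) -> one draw of u_hat(y) from the utility posterior
    # u_best         : utility of the incumbent best observed outcome (held fixed here)
    improvements = []
    for _ in range(n_outcome):
        y = sample_outcome(x)                 # sample a plausible outcome vector
        for _ in range(n_utility):
            u = sample_utility(y)             # sample a plausible utility value
            improvements.append(max(u - u_best, 0.0))
    return float(np.mean(improvements))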

B.3 GI

We derive the GI acquisition function described in (4). The derivation is an adaptation of the Gradient Information acquisition function in [16] to the case of independent multiple objectives. We begin by expressing the expected difference in the trace of the gradient posterior covariance before and after the addition of a new datapoint (x′, y′) to the dataset D:

    GI = Σ_{i=1}^{nf} E[ Tr(Σ∇i(xGD | D)) − Tr(Σ∇i(xGD | D, (x′, y′))) ].

This can be expressed as the Lebesgue-Stieltjes integral

    GI = Σ_{i=1}^{nf} ∫ [ Tr(Σ∇i(xGD | D)) − Tr(Σ∇i(xGD | D, (x′, y′))) ] dF(y′),

with F denoting the distribution of the unobserved outcome y′ = f(x′) given D. For optimization purposes, maximizing GI is equivalent to maximizing

    argmax_{x′} GI ≡ argmax_{x′} Σ_{i=1}^{nf} ∫ −Tr(Σ∇i(xGD | D, (x′, y′))) dF(y′),

since the first term does not depend on the optimization variable x′. Rewriting this formulation as a Riemann integral yields

    argmax_{x′} GI = argmin_{x′} Σ_{i=1}^{nf} ∫_ℝ Tr(Σ∇i(xGD | D, (x′, y′))) · p(f(x′) = y′ | D) dy′.

As seen from (6b), the covariance of a Gaussian posterior is independent of the observed outcomes, so the acquisition function can be further reduced to

    argmax_{x′} GI = argmin_{x′} Σ_{i=1}^{nf} Tr(Σ∇i(xGD | D, (x′, y′))) ∫_ℝ p(f(x′) = y′ | D) dy′
                   = argmin_{x′} Σ_{i=1}^{nf} Tr(Σ∇i(xGD | D, (x′, y′)))
                   = argmax_{x′} Σ_{i=1}^{nf} Tr( ∇ki(xGD, X′) Kσ^{-1}(X′) ∇ki⊤(xGD, X′) ),

where the integral in the first line equals 1, and X′ = {X ∪ x′}.
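
As a compact numerical illustration of this final expression, the following NumPy sketch evaluates the trace criterion for a single objective GP, assuming an isotropic RBF kernel so that ∇k is available in closed form; the kernel choice, hyperparameters, and helper names are illustrative (our implementation uses a Matérn 5/2 ARD kernel and automatic differentiation).

import numpy as np

def rbf(x1, x2, ls=0.5, var=1.0):
    # Isotropic RBF kernel matrix between rows of x1 (n, d) and x2 (m, d).
    d2 = ((x1[:, None, :] - x2[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ls**2)

def grad_k(x, X, ls=0.5, var=1.0):
    # Closed-form gradient of the RBF kernel, grad_x k(x, X), shape (d, n).
    k = rbf(x[None, :], X, ls, var)               # (1, n)
    return (-(x[:, None] - X.T) / ls**2) * k      # (d, n)

def gi_acquisition(x_new, x_gd, X, ls=0.5, var=1.0, noise=1e-4):
    # GI(x_new): trace criterion of (4) for a single objective GP. Larger
    # values mean observing f(x_new) would reduce the posterior uncertainty of
    # the gradient estimate at x_gd by more. Sum over the nf independent
    # objective GPs for the multi-objective case.
    X_aug = np.vstack([X, x_new[None, :]])        # X' = X u {x'}
    K = rbf(X_aug, X_aug, ls, var) + noise * np.eye(len(X_aug))
    G = grad_k(x_gd, X_aug, ls, var)              # grad k(x_gd, X'), shape (d, n+1)
    return float(np.trace(G @ np.linalg.solve(K, G.T)))

# Example: score a candidate x' for its expected gradient-uncertainty reduction.
X = np.random.rand(8, 2)
score = gi_acquisition(np.array([0.45, 0.55]), np.array([0.4, 0.6]), X)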

C Preference-Dominated Utility Function

The utility function represents the user preference in preference-based MOBO algorithms and is used to simulate user responses. Practically, the utility function is employed to respond to user queries, such as providing pairwise comparisons between two outcomes [4]. A utility function used to test preference-based MOBO algorithms should satisfy two key properties:

(P1) Dominance Preservation: When evaluating a query, the true utility function should satisfy Assumption 1.

(P2) Preference Integration: The utility function should have parameters θu that allow unique, strictly maximal-utility Pareto-optimal solutions. That is, for any x ∈ Xpareto, there exists an easily computable θu ∈ ℝ^{nu} such that u(f(x) | θu) > u(f(x′) | θu) for all x′ ∈ Xpareto \ {x}.

For instance, the commonly used ℓ1 distance fails to satisfy the Preference Integration property when calculated from the utopia point, and violates Dominance Preservation when calculated from any other point. This is illustrated in Fig. 2a, where the contours of an ℓ1 distance utility function are shown with an example Pareto-front. Here, the two red points are indistinguishable according to the utility function, demonstrating the limitations of the ℓ1 distance in distinguishing between Pareto-optimal solutions.

Figure 2: Contour plots of (a) the commonly used negative ℓ1 distance utility function and (b) the proposed PDUF.

Therefore, we propose the preference-dominated utility function (PDUF), which merges the concept of dominance with user preferences; an illustration of its contours in a 2D case is shown in Fig. 2b. The PDUF integrates the concept of dominance with user preferences by combining multiple logistic functions centered around different points in the objective function space, and is expressed as

    u(y) = (1/nc) Σ_{i=1}^{nc} Π_{j=1}^{ny} Lβ(yj, ci,j),    (10)

where Lβ(yj, ci,j) = 1 / (1 + exp(β (yj − ci,j))). Here, ci = (ci,1, ci,2, ..., ci,ny) denotes the i-th center of one logistic function, β denotes a parameter that controls the steepness of the logistic function, and nc denotes the number of centers. The logistic function Lβ(yj, ci,j) approximates the step function and enforces dominance for each objective yj, as seen in the red dashed lines in Fig. 2b, and the product aggregates this approximation over all objectives. Furthermore, the sum of logistic-function products preserves dominance in the objective space: for every ȳ that dominates a user query ci, PDUF expresses the user preference with u(ȳ) > u(ci). Finally, the centers define the parameters θu; aligning them along an arbitrary line (the grey line in Fig. 2b) ensures that the utility function adheres to the Preference Integration property.
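
For reference, (10) can be transcribed directly into a few lines of Python; the centers, the steepness β, and the example query below are illustrative choices.

import numpy as np

def pduf(y, centers, beta=20.0):
    # Preference-Dominated Utility Function of (10).
    # y       : (ny,) outcome vector (minimization objectives).
    # centers : (nc, ny) array of logistic-function centers c_i encoding the
    #           user preference (e.g. placed along a line in objective space).
    y = np.asarray(y)[None, :]                                 # (1, ny)
    logistic = 1.0 / (1.0 + np.exp(beta * (y - centers)))      # L_beta(y_j, c_ij)
    return float(np.mean(np.prod(logistic, axis=1)))           # (1/nc) * sum of products

# Two objectives, centers placed along a line expressing the user preference.
centers = np.stack([np.linspace(0.1, 0.9, 5), np.linspace(0.9, 0.1, 5)], axis=1)
u_dominating = pduf([0.2, 0.2], centers)   # dominates [0.3, 0.3] ...
u_dominated = pduf([0.3, 0.3], centers)    # ... and therefore receives higher utility
assert u_dominating > u_dominated

The final assertion illustrates the Dominance Preservation property: any outcome that dominates another is assigned a strictly higher PDUF value.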

D PUB-MOBO Algorithms

Algorithm 1 PUB-MOBO
1: Generate initial data: xINIT, yINIT, r(yINIT)
2: D = (xINIT, yINIT)
3: P = (yINIT, r(yINIT))
4: Update outcome model f̂ with (xINIT, yINIT)
5: Update preference model û with (yINIT, r(yINIT))
6: while # outcome evaluations ≤ budget do
7:   PE stage
8:   x1, x2 ← argmax_{x1,x2} EUBO
9:   y1, y2 = f̂(x1), f̂(x2)
10:  r(y1, y2) ← user provides a comparison
11:  Append P with (y1, y2, r(y1, y2))
12:  Update preference model û with (y1, y2, r(y1, y2))
13:  EXP stage
14:  xEXP ← argmax_x qEIUU
15:  yEXP = f(xEXP)
16:  Append D with (xEXP, yEXP)
17:  Update outcome model f̂ with (xEXP, yEXP)
18:  GD stage
19:  (XGD, YGD) ← Multi-Gradient Descent(xEXP)
20:  Append D with (XGD, YGD)
21: end while

Algorithm 2 Multi-Gradient Descent
1: Initialize xGD ← xEXP
2: (XGD, YGD) = (∅, ∅)
3: # of multi-gradient steps, nGD    ▷ default: 10
4: # of GI optimizations, nGI    ▷ default: 1
5: Early stopping threshold, εGD    ▷ default: 0.1
6: for i ≤ nGD do
7:   Compute µ∇(xGD) using (6a)
8:   Compute M = µ∇(xGD)⊤ µ∇(xGD)
9:   α ← Frank-Wolfe(M)
10:  xGD ← xGD − η α⊤µ∇(xGD)
11:  if xGD ∈ X and ‖α⊤µ∇(xGD)‖₂² > εGD then
12:    Evaluate the true objective: yGD = f(xGD)
13:    Append (XGD, YGD) with (xGD, yGD)
14:    Update outcome model f̂ with (xGD, yGD)
15:    for j ≤ nGI do
16:      xGI ← argmax_{x′} GI
17:      Evaluate the true objective: yGI = f(xGI)
18:      Append (XGD, YGD) with (xGI, yGI)
19:      Update outcome model f̂ with (xGI, yGI)
20:    end for
21:  else
22:    break
23:  end if
24: end for
25: return (XGD, YGD)

Algorithm 3 Frank-Wolfe Algorithm
1: input M
2: initialize α = [1/nf, ..., 1/nf] s.t. 1⊤α = 1
3: for j ≤ max # of Frank-Wolfe steps do
4:   t̂ = argmin_r Σ_t αt Mrt
5:   γ̂ = argmin_γ ((1 − γ)α + γ e_t̂)⊤ M ((1 − γ)α + γ e_t̂)
6:   α = (1 − γ̂)α + γ̂ e_t̂
7:   if γ̂ ∼ 0 then
8:     break
9:   end if
10: end for
11: return α

